Michael J. Black, February 2002
Learning the Appearance and Motion of People in Video
(The Science of Silly Walks)
Hedvig Sidenbladh, Defense Research Institute, Stockholm, Sweden
http://www.nada.kth.se/~hedvig
Michael J. Black, Department of Computer Science, Brown University
http://www.cs.brown.edu/~black
Collaborators
David Fleet, Xerox PARC
Nancy Pollard, Brown University
Dirk Ormoneit and Trevor Hastie, Dept. of Statistics, Stanford University
Allan Jepson, University of Toronto
The (Silly) Problem
Unsolved without manual intervention.
Inferring 3D Human Motion
* No special clothing
* Monocular, grayscale sequences (archival data)
* Unknown, cluttered environment
* Incremental estimation
* Infer 3D human motion from 2D image properties.
Why is it Hard?
Low contrast
Self occlusion
Singularities in viewing direction
Unusual viewpoints
Ambiguous matches
Large Motions
Limbs move rapidly with respect to their width.
Non-linear dynamics.
Motion blur.
Requirements
1. Represent uncertainty and multiple hypotheses.
2. Model non-linear dynamics of the body.
3. Exploit image cues in a robust fashion.
4. Integrate information over time.
5. Combine multiple image cues.
Simple Body Model
* Limbs are truncated cones
* Parameter vector of joint angles and angular velocities: $\phi$
Inference/Issues
Bayesian formulation:
p(model | cues) = p(cues | model) p(model) / p(cues)
1. Need a constraining likelihood model that is also invariant to variations in human appearance.
2. Need a prior model of how people move.
3. Need an effective way to explore the model space (very high dimensional) and represent ambiguities.
What Image Cues?
Pixels?
Temporal differences?
Background differences?
Edges?
Color?
Silhouettes?
Optical flow?
Brightness Constancy
I(x, t+1) = I(x+u, t) + η (noise)
Image motion of foreground as a function of the 3D motion of the body.
Problem: no fixed model of appearance (drift).
[Figure: image frames at times t and t+1.]
Bregler and Malik ’98
State of the Art.
* Brightness constancy cue
• insensitive to appearance
* Full-body required multiple cameras.
* Single hypothesis.
• MAP estimate
Cham and Rehg ’99
State of the Art.
* Single camera, multiple hypotheses.
* 2D templates (solves drift but is view dependent)
I(x, t) = I(x+u, 0) + η (noise)
Edges as a Cue?
• Probabilistic model?
• Under/over-segmentation, thresholds, …
Deutscher, North, Bascle, & Blake ’99
* Multiple cameras
* Simplified clothing, lighting, and background.
State of the Art.
Changing background
Low contrast limb boundaries
Occlusion
Varying shadows
Deforming clothing
What do people look like?
What do non-people look like?
Key Idea #1 (Rigorous Likelihood)
1. Use the 3D model to predict the location of limb boundaries (not necessarily features) in the scene.
2. Compute various filter responses steered to the predicted orientation of the limb.
3. Compute likelihood of filter responses using a statistical model learned from examples.
Natural Image Statistics
Ruderman; Lee, Mumford, & Huang; Portilla & Simoncelli; Olshausen & Field; Zhu, Wu, & Mumford; …
* Statistics of image derivatives are non-Gaussian.
* Consistent across scale.
Statistics of Edges
The statistics of filter responses, F, on edges, p_on(F), differ from the background statistics, p_off(F).
The likelihood ratio, p_on / p_off, can be used for edge detection and road following.
Geman & Jedynak; Konishi, Yuille, & Coughlan
What about the object specific statistics of limbs?
* edge may be present or not.
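The on/off edge statistics above can be illustrated with a small sketch. It assumes the two distributions are stored as histograms learned from labeled training responses; the function names, bin layout, pseudo-count, and toy training values are all hypothetical and are not taken from the talk.

```python
import math

def learn_histogram(samples, lo, hi, bins, pseudo_count=1.0):
    """Estimate a discrete density over [lo, hi) from training filter responses.

    The pseudo-count keeps every bin non-zero so that ratios stay finite.
    """
    counts = [pseudo_count] * bins
    width = (hi - lo) / bins
    for s in samples:
        idx = min(bins - 1, max(0, int((s - lo) / width)))
        counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts]

def log_likelihood_ratio(response, p_on, p_off, lo, hi):
    """Return log(p_on(F) / p_off(F)) for a single filter response F."""
    bins = len(p_on)
    width = (hi - lo) / bins
    idx = min(bins - 1, max(0, int((response - lo) / width)))
    return math.log(p_on[idx] / p_off[idx])

# Toy training responses, made up for illustration only.
on_samples = [0.8, 0.9, 0.7, 0.85, 0.6, 0.95, 0.75]   # measured on edges
off_samples = [0.0, 0.1, -0.1, 0.05, 0.2, -0.2, 0.0]  # measured on background

p_on = learn_histogram(on_samples, -1.0, 1.0, 8)
p_off = learn_histogram(off_samples, -1.0, 1.0, 8)

strong = log_likelihood_ratio(0.8, p_on, p_off, -1.0, 1.0)  # edge-like response
weak = log_likelihood_ratio(0.0, p_on, p_off, -1.0, 1.0)    # background-like response
```

A positive log ratio says the response is better explained by the edge statistics, a negative one by the background statistics.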
Edge Filters
Normalized derivatives of Gaussians (Lindeberg; Granlund & Knutsson; Perona; Freeman & Adelson; …)
Edge filter response steered to limb orientation:
$$f_e(\mathbf{x}, \theta, \sigma) = \sin\theta \, f_x(\mathbf{x}, \sigma) - \cos\theta \, f_y(\mathbf{x}, \sigma)$$
[Figure: filter responses steered to arm orientation.]
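A minimal sketch of a steered edge response follows. It assumes that theta is the angle of the limb axis measured from the image x axis, that f_x and f_y come from sampled first derivatives of a 2D Gaussian, and it omits the scale normalization mentioned on the slide. The helper names and the synthetic step-edge image are illustrative assumptions.

```python
import math

def gaussian_derivative_kernels(sigma):
    """Sample the x and y first derivatives of a 2D Gaussian on a square grid."""
    radius = int(math.ceil(3 * sigma))
    offsets, gx, gy = [], [], []
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            g = math.exp(-(dx * dx + dy * dy) / (2 * sigma ** 2)) / (2 * math.pi * sigma ** 2)
            offsets.append((dx, dy))
            gx.append(-dx / sigma ** 2 * g)
            gy.append(-dy / sigma ** 2 * g)
    return offsets, gx, gy

def steered_edge_response(image, col, row, theta, sigma):
    """Compute f_e = sin(theta) * f_x - cos(theta) * f_y at pixel (col, row)."""
    offsets, gx, gy = gaussian_derivative_kernels(sigma)
    f_x = f_y = 0.0
    for (dx, dy), kx, ky in zip(offsets, gx, gy):
        value = image[row - dy][col - dx]  # convolution: I(p - d) * G(d)
        f_x += value * kx
        f_y += value * ky
    return math.sin(theta) * f_x - math.cos(theta) * f_y

# Synthetic 21x21 image: dark on the left, bright from column 10 onward.
image = [[1.0 if c >= 10 else 0.0 for c in range(21)] for r in range(21)]

# A vertical limb boundary (theta = 90 degrees) matches the step edge...
along_edge = steered_edge_response(image, 10, 10, math.pi / 2, sigma=1.5)
# ...while a horizontal orientation (theta = 0) sees no intensity change.
across_edge = steered_edge_response(image, 10, 10, 0.0, sigma=1.5)
```

Steering only recombines the two derivative images, so the response for any predicted limb orientation is available without re-filtering the image.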
Distribution of Edge Filter Responses
[Figure: distributions p_on(F) and p_off(F).]
Contrast Normalization?
$$w = \frac{1 + \tanh(S \cdot \text{contrast} - O)}{2 \cdot \text{contrast}}$$
Lee, Mumford & Huang:
$$I_{\text{norm}} = \log(I / \hat{I})$$
Contrast Normalization
Maximize difference between distributions
* e.g. Bhattacharyya distance:
$$\delta_B(p_{\text{on}}, p_{\text{off}}) = -\log \int \sqrt{p_{\text{on}}(y) \, p_{\text{off}}(y)} \, dy$$
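As a rough illustration, the Bhattacharyya distance can be computed for two histograms by replacing the integral with a sum over bins. The discretization and the example histograms are assumptions made for this sketch.

```python
import math

def bhattacharyya_distance(p, q):
    """Return -log(sum_i sqrt(p_i * q_i)) for two normalized histograms.

    Returns infinity when the histograms do not overlap at all.
    """
    coefficient = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    if coefficient == 0.0:
        return math.inf
    return -math.log(coefficient)

same = bhattacharyya_distance([0.5, 0.5], [0.5, 0.5])       # identical histograms
different = bhattacharyya_distance([0.5, 0.5], [0.9, 0.1])  # partially overlapping
disjoint = bhattacharyya_distance([1.0, 0.0], [0.0, 1.0])   # no overlap
```

A larger distance means the on and off distributions are easier to tell apart, which is the quantity the normalization parameters would be tuned to maximize.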
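Ridge Features
$$f_r(\mathbf{x}, \theta, \sigma) = \left| \sin^2\theta \, f_{xx}(\mathbf{x}, \sigma) + \cos^2\theta \, f_{yy}(\mathbf{x}, \sigma) - 2 \sin\theta \cos\theta \, f_{xy}(\mathbf{x}, \sigma) \right| - \left| \cos^2\theta \, f_{xx}(\mathbf{x}, \sigma) + \sin^2\theta \, f_{yy}(\mathbf{x}, \sigma) + 2 \sin\theta \cos\theta \, f_{xy}(\mathbf{x}, \sigma) \right|$$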
Scale specific
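A small sketch of a steered ridge response, assuming the second derivatives f_xx, f_yy, and f_xy at one pixel and one scale are already available, and that theta is the limb axis angle measured from the image x axis. The response is the magnitude of the second derivative across the limb minus the magnitude along it. The function name and the test values are hypothetical.

```python
import math

def steered_ridge_response(f_xx, f_yy, f_xy, theta):
    """Return |second derivative across the limb| - |second derivative along the limb|."""
    s, c = math.sin(theta), math.cos(theta)
    across = abs(s * s * f_xx + c * c * f_yy - 2 * s * c * f_xy)
    along = abs(c * c * f_xx + s * s * f_yy + 2 * s * c * f_xy)
    return across - along

# Idealized vertical bright bar: strong curvature in x, none in y.
f_xx, f_yy, f_xy = -2.0, 0.0, 0.0

aligned = steered_ridge_response(f_xx, f_yy, f_xy, math.pi / 2)  # limb axis vertical
misaligned = steered_ridge_response(f_xx, f_yy, f_xy, 0.0)       # limb axis horizontal
```

The response is positive when the predicted orientation agrees with an elongated structure and negative when it is perpendicular to one.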
Ridge Filters
Relationship between limb diameter in image and scale of maximum ridge filter response.
Brightness Constancy
What are the statistics of the brightness variation I(x, t) - I(x+u, t+1)?
Variation due to clothing, self shadowing, etc.
[Figure: I(x, t) and I(x+u, t+1).]
Brightness Constancy
• well fit by t-distribution or Cauchy distribution (heavy tails)
• related to robust statistics
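The link to robust statistics can be seen by comparing log densities. This sketch contrasts a Gaussian with a Cauchy model of the brightness difference; the unit scale parameters are arbitrary assumptions rather than values learned from data.

```python
import math

def gaussian_log_pdf(residual, sigma):
    """Log density of a zero-mean Gaussian with standard deviation sigma."""
    return -0.5 * (residual / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))

def cauchy_log_pdf(residual, scale):
    """Log density of a zero-centered Cauchy distribution with the given scale."""
    return -math.log(math.pi * scale * (1.0 + (residual / scale) ** 2))

small_gauss = gaussian_log_pdf(0.0, 1.0)
small_cauchy = cauchy_log_pdf(0.0, 1.0)
outlier_gauss = gaussian_log_pdf(10.0, 1.0)
outlier_cauchy = cauchy_log_pdf(10.0, 1.0)
```

With heavy tails, a large brightness change (for example from a shadow or a fold in clothing) is penalized far less than under a Gaussian, so a few bad pixels do not dominate the likelihood.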
Key Idea #2 (Explain the Image)
Foreground should explain what the background can’t.
* Generic, unknown background
* Foreground person
$$p(\text{image} \mid \text{foreground}, \text{background}) = \text{const} \cdot \frac{\prod_{\text{fore pixels}} p(\text{image} \mid \text{fore})}{\prod_{\text{fore pixels}} p(\text{image} \mid \text{back})}$$
See also MacCormick and Isard, ICCV’01.
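One way to see where the constant in the foreground/background ratio comes from, assuming pixels are conditionally independent given the model (an assumption made for this sketch, not stated on the slide): multiply and divide by the background likelihood of the foreground pixels.

```latex
\begin{aligned}
p(\text{image} \mid \text{fore}, \text{back})
  &= \prod_{\text{fore pixels}} p(\text{image} \mid \text{fore})
     \prod_{\text{back pixels}} p(\text{image} \mid \text{back}) \\
  &= \underbrace{\prod_{\text{all pixels}} p(\text{image} \mid \text{back})}_{\text{const}}
     \cdot
     \frac{\prod_{\text{fore pixels}} p(\text{image} \mid \text{fore})}
          {\prod_{\text{fore pixels}} p(\text{image} \mid \text{back})}
\end{aligned}
```

The product over all pixels does not depend on the pose of the person, so only the ratio over the predicted foreground region has to be evaluated for each hypothesis.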
Likelihood
Steered edge filter responses
$$\prod_{\text{limbs}} \prod_{\text{cues}} \frac{p(\text{filter response} \mid \text{person})}{p(\text{filter response} \mid \text{background})}$$
Crude assumption: filter responses independent across scale.
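The product over limbs and cues can be sketched as a sum of log ratios. The cue names, the Gaussian stand-in densities, and the response values below are hypothetical placeholders; the talk uses distributions learned from examples rather than these toy functions.

```python
import math

def gaussian_density(mean, std):
    """Return a 1D Gaussian density function (a stand-in for a learned distribution)."""
    def density(x):
        return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))
    return density

def log_likelihood(limb_responses, densities):
    """Sum log p(response | person) - log p(response | background) over limbs and cues.

    Summing logs corresponds to the product of ratios under the independence assumption.
    """
    total = 0.0
    for responses in limb_responses:           # one dict of cue responses per limb
        for cue, value in responses.items():
            p_person, p_background = densities[cue]
            total += math.log(p_person(value)) - math.log(p_background(value))
    return total

# Hypothetical per-cue densities: (person, background).
densities = {
    "edge": (gaussian_density(1.0, 0.5), gaussian_density(0.0, 0.5)),
    "ridge": (gaussian_density(1.0, 0.5), gaussian_density(0.0, 0.5)),
}

good_pose = [{"edge": 0.9, "ridge": 1.1}, {"edge": 1.0, "ridge": 0.8}]
bad_pose = [{"edge": 0.1, "ridge": -0.1}, {"edge": 0.0, "ridge": 0.2}]

good_score = log_likelihood(good_pose, densities)
bad_score = log_likelihood(bad_pose, densities)
```

Working in the log domain avoids numerical underflow when many limbs, cues, and pixels are multiplied together.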
Inference/Issues
Bayesian formulation:
p(model | cues) = p(cues | model) p(model) / p(cues)
1. Need a constraining likelihood model that is also invariant to variations in human appearance.
2. Need a prior model of how people move.
Learning Human Motion
* Constrain the posterior to likely and valid poses/motions
* Model the variability
3D motion-capture data.
* Database with multiple actors and a variety of motions.
(from M. Gleicher)
[Figure: joint angles plotted against time.]
Key Idea #3 (Trade learning for search.)
Problem:
* insufficient data to learn a prior probabilistic model of human motion.
Alternative:
* the data represents all we know
* replace representation and learning with search. (challenge: search has to be fast)
Texture Synthesis
[Figure: “database” texture and synthetic texture (Efros & Freeman ’01).]
* De Bonet & Viola, Efros & Leung, Efros & Freeman, Pasztor & Freeman, Hertzmann et al., …
* Image(s) as an implicit probabilistic model.
Implicit Probabilistic Model
Key idea: probabilistic search (log time) of this tree approximates sampling from p(stored sequence | generated sequence).
Synthesis
* Colors indicate different training sequences.
* For graphics, we need: editability, constraints (ground contact, pose, interpenetration), key frames, style, …
Tracking
* Efficiently generate samples (image data will sort out which are good).
* Temperature parameter controls randomness of tree search.
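The role of a temperature parameter can be illustrated with a simplified stand-in for the tree search: a flat list of distances between the current motion and stored examples, sampled with probability proportional to exp(-distance / temperature). This flat sampling scheme, the function name, and the distances are assumptions for illustration and not the search procedure used in the talk.

```python
import math
import random

def sample_match(distances, temperature, rng):
    """Pick an index with probability proportional to exp(-distance / temperature)."""
    best = min(distances)
    # Subtracting the best distance keeps the largest weight at exactly 1.0.
    weights = [math.exp(-(d - best) / temperature) for d in distances]
    return rng.choices(range(len(distances)), weights=weights, k=1)[0]

rng = random.Random(0)
distances = [0.5, 0.1, 0.9]  # hypothetical distances to three stored sequences

# Low temperature: almost always the closest match (index 1).
cold = {sample_match(distances, 1e-6, rng) for _ in range(200)}
# High temperature: close to uniform, so every stored sequence gets picked.
hot = {sample_match(distances, 100.0, rng) for _ in range(2000)}
```

Low temperatures make the search nearly deterministic, while high temperatures spread the samples over more of the stored motions.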
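Bayesian Formulation
$$p(\phi_t \mid \mathbf{I}_t) \propto p(I_t \mid \phi_t) \int p(\phi_t \mid \phi_{t-1}) \, p(\phi_{t-1} \mid \mathbf{I}_{t-1}) \, d\phi_{t-1}$$
* $p(\phi_t \mid \mathbf{I}_t)$: posterior over model parameters given an image sequence.
* $p(I_t \mid \phi_t)$: likelihood of observing the image given the model parameters.
* $p(\phi_t \mid \phi_{t-1})$: temporal model (prior).
* $p(\phi_{t-1} \mid \mathbf{I}_{t-1})$: posterior from previous time instant.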
What does the posterior look like?
[Figure: arm model with x, y, z axes.]
Shoulder: 3 dof
Elbow: 1 dof
Elbow bends
Inference/Issues
Bayesian formulation:
p(model | cues) = p(cues | model) p(model) / p(cues)
1. Need a constraining likelihood model that is also invariant to variations in human appearance.
2. Need a prior model of how people move.
3. Need an effective way to explore the model space (very high dimensional) and represent ambiguities.
Key Idea #4 (Represent Ambiguity)
Samples from a distribution over 3D poses.
* Represent a multi-modal posterior probability distribution over model parameters
  - sampled representation
  - each sample is a pose and its probability
  - predict over time using a particle filtering approach.
Particle Filtering
* large literature (Gordon et al. ’93, Isard & Blake ’96, …)
* non-Gaussian posterior approximated by N discrete samples
* explicitly represent the ambiguities
* exploit stochastic sampling for tracking
Samples: $\phi_t^{(n)}$, $n = 1, \ldots, N$ ($N \approx 10^3$)
Particle Filter
[Diagram: one time step of the particle filter.]
Posterior $p(\phi_{t-1} \mid \mathbf{I}_{t-1})$ → sample → Temporal dynamics $p(\phi_t \mid \phi_{t-1})$ → sample → Likelihood $p(I_t \mid \phi_t)$ → normalize → Posterior $p(\phi_t \mid \mathbf{I}_t)$
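The sample, sample, normalize loop can be sketched for a toy one-dimensional state. The random-walk dynamics, the Gaussian observation model, and all parameter values are assumptions chosen to keep the example short; the tracker in the talk uses a learned motion prior and the image likelihood described earlier.

```python
import math
import random

def particle_filter_step(particles, weights, observation, rng,
                         dynamics_std=0.5, obs_std=1.0):
    """One time step: resample, propagate, weight by likelihood, normalize."""
    n = len(particles)
    # 1. Sample from the previous posterior (draw particles according to weight).
    resampled = rng.choices(particles, weights=weights, k=n)
    # 2. Sample from the temporal dynamics (assumed here: a Gaussian random walk).
    predicted = [p + rng.gauss(0.0, dynamics_std) for p in resampled]
    # 3. Evaluate the likelihood of the observation for each particle.
    unnormalized = [math.exp(-0.5 * ((observation - p) / obs_std) ** 2)
                    for p in predicted]
    # 4. Normalize so the weights represent the new posterior.
    total = sum(unnormalized)
    if total == 0.0:
        new_weights = [1.0 / n] * n   # fall back to uniform if every weight underflows
    else:
        new_weights = [w / total for w in unnormalized]
    return predicted, new_weights

rng = random.Random(0)
n_particles = 500
particles = [rng.uniform(-10.0, 10.0) for _ in range(n_particles)]
weights = [1.0 / n_particles] * n_particles

# Feed the same noiseless observation repeatedly; the true state is 3.0.
for _ in range(20):
    particles, weights = particle_filter_step(particles, weights, 3.0, rng)

estimate = sum(p * w for p, w in zip(particles, weights))
```

Because the posterior is carried as a weighted sample set rather than a single estimate, several competing hypotheses can survive from frame to frame until the image data resolves them.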
Tracking with Occlusion
1500 samples, ~2 minutes/frame.
Stochastic 3D Tracking
* 2500 samples (now down as low as 300 with the new prior).
Conclusions
Inferring human motion, silly or not, from video is challenging.
We have tackled three important parts of the problem:
1. Probabilistically modeling human appearance in a generic, yet useful, way.
2. Representing the range of possible motions using techniques from texture modeling.
3. Dealing with ambiguities and non-linearities using particle filtering for Bayesian inference.
Ongoing and Future Work
Better search algorithms:
* Hybrid Monte Carlo tracker (Choo and Fleet ’01)
* Covariance scaled sampling (Sminchisescu & Triggs ’01)
Richer prior models of motion.
Estimate background motion.
Statistical models of color and texture.
Automatic initialization.
Training data and likelihood models to be made available on the web.