EdMotionCapturev5.0


  • Motion Capture in 3D Animation (Edward Tse)

  • Motion Capture as a Tool: Motion capture (MOCAP) is an effective 3D animation tool for realistically capturing human motion

  • Outline: Rotoscoping; The MOCAP Pipeline; Limitations of MOCAP; The Future of MOCAP

  • Rotoscoping, History: Captured Video (Johnston et al., 1995)

  • Rotoscoping, History: Tracing (Johnston et al., 1995)

  • Rotoscoping: Post Processing (Johnston et al., 1995)

  • The MOCAP Pipeline: Calibration; Capture; 3D Position Reconstruction; Fitting to the Skeleton; Post Processing

  • Capture (Costa-Sousa, M., 2004)

  • Calibration

  • 3D Position Reconstruction (Utopia)

  • 3D Position Reconstruction (Reality)

  • Multiple Markers: For a small number of markers, size; Occlusions are a problem; Multiple Hypothesis Tracking

  • Multiple Markers: Mechanical-Optical Hybrids (http://www.animazoo.com/services/gypsyHybrid.htm)

  • Fitting to the Skeleton: Utopian approach; 10-20% length changes; Markers on both sides; Joint Displacement; Use Rotation Angles Only

  • Post Processing: Motion Editing (Cut, Copy, Paste); Motion Warping (Speed Up or Slow Down; Rotate, Scale or Translate); Motion Signal Processing (Smoother Motions)

  • Limitations of MOCAP: Subject to data inaccuracy; Expensive

    Device           Cost for body capture (USD)
    Mechanical       $20,000 - $30,000
    Electromagnetic  $100,000 - $120,000
    Optical          $90,000 - $210,000

  • Limitations of MOCAP: Only realistic motion captured; Cartoony animations not possible; WYSIWYG (what you see is what you get); Can't add more expression; Continually need to recapture motion (Robertson, B., 2001)

  • The Future of MOCAP: Markerless MOCAP; Cheaper MOCAP; MOCAP as Input to Large Displays

  • Conclusion: Motion capture (MOCAP) is an effective 3D animation tool for realistically capturing human motion

  • References
    Johnston, O., Thomas, F. (1995) The Illusion of Life: Disney Animation, Disney Editions, ISBN: 0786860707
    Costa-Sousa, M. (2004) Motion Synthesis, PowerPoint presentation
    Parent, R. (2002) Computer Animation: Algorithms and Techniques, Morgan Kaufmann Publishing
    Ringer, M., Lasenby, J. (2002) Multiple Hypothesis Tracking for Automatic Optical Motion Capture, European Conference on Computer Vision (ECCV 02)
    Robertson, B. (2001) Medieval Magic, Computer Graphics World, April 2001
    Tse, E. (2004) The Single Display Groupware Toolkit, MSc Thesis, University of Calgary, AB, Canada
    Bregler, C., Loeb, L., Chuang, E., Deshpande, H. (2002) Turning to the Masters: Motion Capturing Cartoons, Proceedings of ACM SIGGRAPH 2002, pp. 399-407
    Web: vicon.com, inition.co.uk, polhemus.com, wikipedia.org, animazoo.com, hollywoodjesus.com

  • Bonus Slides: Stuff that's too cool for you!

  • Stats

  • Calibration

  • Outline: History; Motion Capture Pipeline; Limitations; Future Work

    In this talk I will discuss some of the algorithms used in motion capture technology. But I need to warn you ahead of time that there will be some hand waving when it comes to the mathematical proofs found in Appendix B. Just to make this hand waving explicit, the hand-wavy bits will be flagged with a blue "By the power of Appendix B" starburst.

    A rotoscope is a device that enables animators to trace live action movement, frame by frame, for use in animation. It might be called a clumsy forerunner to digital motion capture. The device was invented by Max Fleischer in 1914. Fleischer used his brother, dressed in a clown outfit, as the live-film reference for the animated character Koko the Clown.

    Rotoscoping works by first capturing video of a real actor around props that resemble elements of the scene.

    Animators would then use a transparent matte to trace over each frame of the recorded video; this would result in a figure that moved in a very realistic fashion. Finally, the animated figure would be coloured and then integrated with various background layers to create the final shot. Walt Disney and his animators employed rotoscoping carefully and very effectively in films such as Snow White and Peter Pan, as seen in this scene where Tinker Bell peers through a keyhole.

    The 3D motion capture pipeline has similarities to the process of rotoscoping. Both capture a live actor, use the captured frames as references in an animation, and both require lots of post processing to achieve the intent of the animator.

    The motion capture pipeline consists of five steps: calibration, capture, 3D position reconstruction, fitting to the skeleton, and post processing. I am going to talk about the capture process first because you need this information to understand the rest of the motion capture pipeline. I will discuss three MOCAP input techniques. First, optical tracking devices generally use an infrared camera to track small reflective spheres or markers placed on an actor. Optical tracking allows large areas to be tracked but suffers from occlusion problems when multiple people are involved. Most of my discussion about the motion capture pipeline will focus on using optical tracking technology.

    Second, electromagnetic systems use a centrally located transmitter which emits an electromagnetic field. A set of receivers is attached to a body suit so that their position and orientation can be tracked relative to the transmitter. This system does not need to worry about occlusion, but the end user must be tethered to the tracking device and the actor must stay in close proximity (roughly about 5 feet) to the transmitter.

    Finally, electromechanical suits can track an actor's joint angles using potentiometers that measure the angles of a mechanical exoskeleton. Electromechanical suits do not have to deal with occlusion, and there are wireless versions available. However, the downside is that there is a limited range of motions possible with such a suit, and the system is not as durable as the reflective markers in optical systems. Before the 3D position of a marker can be determined in an optical system, we need to know where each camera is located in world space coordinates.

    Camera calibration is performed by recording a number of image space points whose world space locations are known, such as those on a checkerboard or a calibration frame. By measuring several image space to world space pairs, we can create a set of linear equations that can be solved to obtain the location of each camera.

    I'm not going to go into how to solve this equation (the book suggests a least squares solution discussed in Appendix B, p. 502). For the purposes of this example I will assume that the magic of Appendix B allows us to obtain a view volume for each camera, which includes the x, y and z positions of each camera in world space coordinates. To reconstruct the 3D coordinates of a marker, the point P must be seen in at least two camera views. The greater the orthogonality of the two cameras, the better the chance of an accurate reconstruction.
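
    To make the calibration step a little more concrete, here is a minimal NumPy sketch of the kind of least-squares set-up Appendix B refers to (the function name and the use of NumPy's SVD are my own assumptions, not the book's exact method): each known world-space/image-space pair contributes two linear equations, the stacked system is solved for the camera's projection matrix, and the camera position falls out as its null space.

```python
# Sketch of Direct Linear Transform (DLT) style camera calibration.
# Assumes >= 6 known world-space points (e.g. corners of a calibration
# frame or checkerboard) and their measured image-space projections.
import numpy as np

def calibrate_camera(world_pts, image_pts):
    """Estimate the 3x4 projection matrix P and the camera centre C.

    world_pts: (N, 3) array of known world-space coordinates.
    image_pts: (N, 2) array of the same points measured in the image.
    """
    rows = []
    for (X, Y, Z), (u, v) in zip(world_pts, image_pts):
        # Each correspondence contributes two linear equations in the
        # twelve unknown entries of P.
        rows.append([X, Y, Z, 1, 0, 0, 0, 0, -u*X, -u*Y, -u*Z, -u])
        rows.append([0, 0, 0, 0, X, Y, Z, 1, -v*X, -v*Y, -v*Z, -v])
    A = np.asarray(rows, dtype=float)

    # Least-squares solution: the singular vector with the smallest
    # singular value minimises ||A p|| subject to ||p|| = 1.
    _, _, vt = np.linalg.svd(A)
    P = vt[-1].reshape(3, 4)

    # The camera centre is the right null space of P, i.e. P @ C = 0.
    _, _, vt_p = np.linalg.svd(P)
    C = vt_p[-1]
    return P, C[:3] / C[3]   # de-homogenise to get (x, y, z)
```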

    In an ideal world, if we were only tracking a single object, we could determine its position in world space by creating two vectors from the camera origins (C1 and C2) to point P's projections on the image planes (I1 and I2). The location of point P would be some scalar multiple (k1 and k2) of the vector from each camera to its image point, for example C1 + k1(I1 - C1). Since P should be the same point for both cameras, we can combine these two equations into one. That gives us three equations and two unknowns (k1 and k2). Once we've solved for k1 and k2, we can find the value of P.
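
    Written out in my own notation (following the description above), the ideal-case equations are:

```latex
% The same marker P seen from both cameras:
P = C_1 + k_1\,(I_1 - C_1), \qquad P = C_2 + k_2\,(I_2 - C_2)
% Setting the two expressions equal gives one vector equation, i.e.
% three scalar equations in the two unknowns k_1 and k_2:
C_1 + k_1\,(I_1 - C_1) = C_2 + k_2\,(I_2 - C_2)
```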

    Unfortunately, noise tends to complicate the real world. In practice, these two rays will not exactly intersect, so the points of closest encounter must be found on each line. This requires us to find P1 and P2, which lie somewhere along the vector from each camera to its image point. Since we know that the actual point P is somewhere between P1 and P2, we know that the vector from P2 to P1 is perpendicular to each ray direction, such as (I1 - C1). Hence their dot products must equal zero.

    Combining these two conditions yields a small system that we can solve. The actual method for solving it is in Appendix B (p. 502), but I'll assume that we get two points P1 and P2. By calculating the midpoint of these two points we can get a fairly accurate approximation of the actual position of point P.
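
    For illustration, here is a small NumPy sketch of the "points of closest encounter" reconstruction just described. The names C1, I1, C2, I2 follow the slide notation; the implementation itself is my own, not Appendix B's derivation.

```python
# Closest-point reconstruction of a marker seen by two cameras.
# C1, C2 are camera centres; I1, I2 are the marker's projections on
# each image plane, all given in world-space coordinates.
import numpy as np

def reconstruct_marker(C1, I1, C2, I2):
    u = I1 - C1          # direction of the ray from camera 1 through I1
    v = I2 - C2          # direction of the ray from camera 2 through I2
    w0 = C1 - C2

    a, b, c = u @ u, u @ v, v @ v
    d, e = u @ w0, v @ w0
    denom = a * c - b * b      # ~0 only if the two rays are near-parallel

    s = (b * e - c * d) / denom    # parameter along ray 1 -> point P1
    t = (a * e - b * d) / denom    # parameter along ray 2 -> point P2

    P1 = C1 + s * u
    P2 = C2 + t * v
    return 0.5 * (P1 + P2)         # midpoint approximates the marker P
```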

    If we desire to track only one or two markers, we could easily vary the size of the reflective balls to distinguish between them. However, this is not feasible in actual motion capture, since it's not uncommon to have 31 unique markers for each actor in a multiple actor scene.

    The problem with having so many markers is that there is a high probability of occlusion. For example, the marker on this actor's left foot is hidden from the view of cameras C1 and C2. One solution is to add more cameras and re-record this particular scene so that occlusion does not occur. But when another camera costs tens of thousands of dollars, we instead use a vision technique called multiple hypothesis tracking to make informed guesses as to where these points actually are and present the user with a choice of several different tracked options. On a Vicon system this is done by traversing through a playback until the system gets confused and asking the user for input. Another approach is to use electromechanical sensors and attach reflective spheres to the body suit. This greatly reduces the total number of markers, so that mechanical sensor technology can be augmented with absolute position tracking.
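
    Full multiple hypothesis tracking (Ringer & Lasenby, 2002) keeps several candidate frame-to-frame assignments alive at once; the sketch below is deliberately much simpler. It is a single-hypothesis, greedy nearest-neighbour matcher with a distance gate, included only to illustrate the assignment problem that has to be decided at every frame, and all names and the gate value are my own.

```python
# Simplified frame-to-frame marker matching.  Real optical systems use
# multiple hypothesis tracking; this greedy, single-hypothesis version
# only illustrates the per-frame assignment problem.
import numpy as np

def match_markers(prev_pts, curr_pts, gate=50.0):
    """Assign each previous marker to its nearest detection this frame.

    prev_pts: (N, 3) marker positions in the previous frame.
    curr_pts: (M, 3) reconstructed detections in the current frame.
    gate:     maximum allowed jump; beyond it the marker is treated as
              occluded for this frame.
    Returns a list where entry i is an index into curr_pts, or None.
    """
    assignment = []
    taken = set()
    for p in prev_pts:
        dists = np.linalg.norm(curr_pts - p, axis=1)
        match = None
        for j in np.argsort(dists):
            if dists[j] > gate:
                break                      # nothing close enough: occluded
            if j not in taken:
                match = int(j)
                taken.add(j)
                break
        assignment.append(match)
    return assignment
```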

    Once the motion of the individual markers looks smooth and reasonable, the next step is to attach them to the underlying bone structure. In a straightforward utopian approach, the position of each marker in each frame would be used to absolutely position a joint in the skeleton. In practice, you'd find that the distance from an actor's knee joint to their hip changes over time. Length changes of 10-20% are not uncommon; this usually results in the foot of the skeleton sliding even when the actor is not moving (an effect also known as skating).

    One of the reasons for this problem is that the markers are not actually located on the joints of the performer but outside the joints, on the surface of the body. One strategy is to place markers on both sides of each joint and map the midpoint as the location of the actual bone structure. This usually provides a more accurate representation, but it also doubles the number of points that need to be tracked.

    Another solution is to algorithmically displace the captured joint positions so that they are closer to the actor's actual joints. This can be done by treating every three joints as a plane and displacing these three joints along a perpendicular axis.
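
    The note above is terse, so here is one reading of it as a minimal sketch: three marker-derived joint positions (say hip, knee, ankle) define a plane, and each position is pushed along that plane's normal by roughly the marker-to-bone offset. The offset value, sign handling and function name are my own assumptions.

```python
# Sketch of joint displacement: three captured joint positions define a
# plane, and each is pushed along the plane normal by an assumed
# marker-to-bone offset so the joints sit nearer the real bone centres.
import numpy as np

def displace_joints(p0, p1, p2, offset=0.03):
    """p0, p1, p2: (3,) marker-derived joint positions (metres).
    offset: approximate distance from skin-mounted marker to joint centre."""
    normal = np.cross(p1 - p0, p2 - p0)
    normal /= np.linalg.norm(normal)       # unit normal of the joint plane
    # The sign of the normal depends on marker placement; in practice it
    # may need to be flipped so the displacement points into the limb.
    return p0 - offset * normal, p1 - offset * normal, p2 - offset * normal
```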

    Another approach is to calculate joint rotations and use only those to manipulate the skeleton, based on the IK constraints set in the 3D model. Often additional constraints will need to be added to prevent oddities in the scene, like the feet of the model going through the floor.
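
    As a toy illustration of such an added constraint (not any particular package's IK solver), a post-processing pass might simply keep the foot joints at or above the floor plane:

```python
# Minimal example of an added constraint: clamp foot joints to the floor
# plane (y >= floor_y) after the skeleton has been posed from rotation
# angles.  A production solver would re-run IK rather than just clamping.
def apply_floor_constraint(joint_positions, foot_names, floor_y=0.0):
    """joint_positions: dict mapping joint name -> [x, y, z] list."""
    for name in foot_names:
        x, y, z = joint_positions[name]
        if y < floor_y:
            joint_positions[name] = [x, floor_y, z]
    return joint_positions
```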

    After the motion has been captured and mapped to a 3D model there are a number of things that we can do in post processing to improve our animation.

    We can edit captured motion by cutting, copying, and pasting bits of motion in parts of our scene. We can warp motion by changing its speed or by transforming sections of the animation through rotation, translation, and scaling. For example, we can scale the size of an actor's motion so that it appears as if an adult is dancing with a child.
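
    A minimal sketch of two of the warps mentioned above, under the assumption that the clip is stored as one row of channel values per frame (the function names and data layout are mine): uniform time-scaling by resampling, and scaling the root translation so an adult's motion fits a child-sized skeleton.

```python
# Sketch of two simple motion-warping operations on a captured clip.
import numpy as np

def time_warp(frames, speed=2.0):
    """frames: (N, D) array, one row of channel values per frame.
    speed > 1 plays the clip faster (fewer frames), < 1 slower."""
    n = frames.shape[0]
    new_n = max(2, int(round(n / speed)))
    old_t = np.linspace(0.0, 1.0, n)
    new_t = np.linspace(0.0, 1.0, new_n)
    # Linear interpolation per channel; a real tool would slerp rotations.
    return np.stack([np.interp(new_t, old_t, frames[:, d])
                     for d in range(frames.shape[1])], axis=1)

def scale_translation(frames, scale=0.6):
    """Uniformly scale the root translation channels (assumed to be the
    first three columns) to shrink the motion onto a smaller skeleton."""
    out = frames.copy()
    out[:, :3] *= scale
    return out
```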

    Finally, we can apply motion signal processing to reduce the effects of input noise.
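
    As one illustration of this signal processing step, a simple low-pass filter over each channel already removes much of the capture jitter; the moving average below stands in for the more careful filtering a real pipeline would use.

```python
# Simple motion signal processing: low-pass filter each channel with a
# moving average to suppress high-frequency capture noise ("jitter").
import numpy as np

def smooth_motion(frames, window=5):
    """frames: (N, D) array of per-frame channel values.
    window: odd number of frames to average over."""
    kernel = np.ones(window) / window
    pad = window // 2
    out = np.empty_like(frames, dtype=float)
    for d in range(frames.shape[1]):
        # Edge-pad so the filtered clip keeps its original length.
        padded = np.pad(frames[:, d], pad, mode="edge")
        out[:, d] = np.convolve(padded, kernel, mode="valid")
    return out
```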

    Some of the limitations of motion capture include data inaccuracy and cost. Most motion capture devices range from $20,000 US to $210,000 US, which makes motion capture an option available only to large animation studios with lots of cash. Usually a studio will rent time in a motion capture facility rather than go out and buy all of this expensive equipment. Another problem with motion capture is that only realistic motions can be captured. For this reason, motion capture was not used in the movie Shrek, since the director wanted a cartoony look for each of the characters.

    Animating Shrek involved the manipulation of 750 controls covering everything from general characteristics like facial expression, walking, and breathing down to low-level joint angles and muscle movements. Animation controls made the characters' lips stick together prior to opening, and Farquaad's forehead was programmed to wrinkle automatically when talking. While motion capture is a useful tool for 3D animation, sometimes it does not give the director enough control over the subtleties of an animation. For example, you can't add more expression to a captured motion; directors have to continually recapture motion to get the action to occur exactly as they desire.

    There are a number of researchers looking at capturing motion without the need for markers. For example, Bregler et al. presented a paper at SIGGRAPH 2002 about motion capturing cartoons. This figure shows a Jiminy Cricket animation in which his blue top hat has been replaced with a yellow sorcerer's hat. The motion-captured hat path is mapped to a set of two-dimensional witch-hat images to create a very believable new Jiminy.

    Most of you are familiar with the specialized light-emitting input device used by Tom Cruise in Minority Report to manipulate the positions of various videos on a holographic screen. Well, there exists technology that allows us to create similar interaction techniques today. Using several infrared light-emitting diodes (IR LEDs), two off-the-shelf web cameras, and an open source computer vision library developed at Intel, members of the Interactions Laboratory where I work have developed a 3D wand input device that can be used to interact with a large display.

    There are also companies selling motion tracking devices such as data gloves for use in computer games. Essential Reality sells a 6 DOF data glove that can track the position of the hand along with the bend of each finger for under $100 US. The availability of this technology has attracted the attention of researchers interested in interactions over large wall or table displays.

    After the conclusion: Before the 3D position of a marker can be determined, we need to know where each camera is located in world space coordinates.

    Camera calibration is performed by recording a number of image space points whose world space locations are known, such as points on a checkerboard or a calibration frame. These pairs can be used to create a set of linear equations that can be solved to obtain the location and orientation of each camera.