
Binocular and monocular depth cues in online feedback control of 3D pointing movement

Bo Hu, Center for Visual Science, University of Rochester, Rochester, NY, USA

David C. Knill, Center for Visual Science, University of Rochester, Rochester, NY, USA

Previous work has shown that humans continuously use visual feedback of the hand to control goal-directed movements online. In most studies, visual error signals were predominantly in the image plane and, thus, were available in an observer's retinal image. We investigate how humans use visual feedback about finger depth provided by binocular and monocular depth cues to control pointing movements. When binocularly viewing a scene in which the hand movement was made in free space, subjects were about 60 ms slower in responding to perturbations in depth than in the image plane. When monocularly viewing a scene designed to maximize the available monocular cues to finger depth (motion, changing size, and cast shadows), subjects showed no response to perturbations in depth. Thus, binocular cues from the finger are critical to effective online control of hand movements in depth. An optimal feedback controller that takes into account the low peripheral stereoacuity and inherent ambiguity in cast shadows can explain the difference in response time in the binocular conditions and the lack of response in the monocular conditions.

Keywords: online feedback control, goal-directed movement, depth perception, binocular disparity, cast shadow

Citation: Hu, B., & Knill, D. C. (2011). Binocular and monocular depth cues in online feedback control of 3D pointing movement. Journal of Vision, 11(7):23, 1–13, http://www.journalofvision.org/content/11/7/23, doi:10.1167/11.7.23.

Introduction

Recent studies on goal-directed pointing movements (Brenner & Smeets, 2003, 2006; Izawa & Shadmehr, 2008; Liu & Todorov, 2007; Prablanc & Martin, 1992; Sarlegna et al., 2004; Saunders & Knill, 2003, 2004, 2005; van Mierlo, Louw, Smeets, & Brenner, 2009) converge on the conclusion that the visual system continuously uses visual information from both the target and the moving hand to guide it to a target. The main paradigm used in these studies has been to measure subjects' corrective responses to perturbations in the visual information about the position of the finger and/or the target. Comparisons of subjects' corrective responses with those predicted by ideal observers/actors show that the CNS uses information online in an optimal way, integrating visual signals from the moving hand and the target over time in a manner commensurate with the reliabilities of the signals (Izawa & Shadmehr, 2008; Saunders & Knill, 2004).

While subjects' movements in many of these studies included movement in depth, the perturbations applied, except for Brenner and Smeets (2006), included components that were readily present in the retinal image plane. This makes it difficult to examine the contribution of visual depth cues to online control in goal-directed pointing tasks. In fact, the tasks in most of these studies were essentially 2D: either the target and starting points were at the same depth or the movement was confined to a plane (typically a horizontal plane). Yet many natural pointing tasks consist of free movements in three dimensions, including motion in depth. The visual information about the hand's position and movement in depth is qualitatively different from visual information about its position and movement in the image plane. The latter is given directly by the 2D projection of the hand on the retina, while depth information is carried more indirectly by a rich set of visual depth cues, both binocular and monocular. We set out to investigate how humans use visual depth cues from the hand for online feedback control of 3D movements.

Binocular depth information from a target has been shown to be important for online control in grasping (Bradshaw & Elliott, 2003; Greenwald & Knill, 2009b; Jackson, Jones, Newport, & Pritchard, 1997; Loftus, Servos, Goodale, Mendarozqueta, & Mon-Williams, 2004) and object placement tasks (Greenwald, Knill, & Saunders, 2005; Knill, 2005; Knill & Kersten, 2004; van Mierlo et al., 2009). The CNS also uses monocular depth cues about the orientation of a target object (texture and figure outline shape) to control the orientation of the hand online during grasping and object placement movements (Greenwald & Knill, 2009a; Greenwald et al., 2005; Knill, 2005; Knill & Kersten, 2004). Little is known, however, about the role played by different depth cues in providing feedback about the depth of the moving hand for online control.


A recent study by Brenner and Smeets (2006) provides information about the speed at which subjects can respond to abrupt changes in the depth of a target or of an effector controlled by the observer. In their task, subjects used a mouse to move a cursor quickly from a starting location on a computer screen to a target location. The target location appeared at two different depths relative to the screen, and both the cursor and the target were shown in stereo. Early in the movement on some trials, either the cursor or the target jumped 15 cm in depth toward or away from the observer. Subjects corrected for these perturbations within approximately 200 ms; thus, it is clear that in principle subjects can correct for changes in the depth of an effector quickly, though not quite as quickly as they can correct for perturbations in the image plane.

These data provide a useful bound on the possible speed with which the CNS can correct for movement errors in depth based on visual feedback, but it is unclear whether they reflect the behavior of an autopilot feedback system that normally controls hand movements or something else. First, the perturbations used in the study were extremely large, much larger than the errors one would normally find from noise in the sensorimotor system. Second, the large perturbations created sizable, sharp temporal transients in the visual feedback from the hand that are not normally associated with naturally occurring movement errors. Finally, in part because of the size and unnaturalness of the perturbations, the perturbations must have been clearly detectable by subjects, who could then have been attuned to detect and correct for them.

In the current experiments, subjects performed a pointing task in free 3D space behind a mirror through which visual feedback was given by a binocularly rendered virtual finger optically co-aligned with the subjects' real finger. The subjects' goal was to touch a target ball positioned in the workspace by a robot arm and rendered in the virtual display to be optically co-aligned with the actual ball. We measured corrective responses on trials in which the position and/or motion of the virtual finger was perturbed by a small amount in depth or in the image plane (within the variance of movement trajectories) when the finger disappeared behind a virtual occluder.

The first experiment was designed to assess basic properties of how the CNS uses visual depth information about the moving hand during online control of pointing movements. Subjects made unconstrained pointing movements in free space, so that the only visual objects in the scene were the starting position, the target ball, and the finger. We measured subjects' corrective responses to small perturbations of the virtual finger in depth and compared them with responses to equal-sized perturbations in the image plane. We further explored the relative efficacy of position and motion information by looking at corrective responses to simple step perturbations in depth and to perturbations in which the virtual finger was rotated around the target, causing an initial shift in depth but a change in motion direction that kept the motion of the virtual finger relative to the target unchanged.

In the first experiment, visual cues to finger depth included both binocular disparity cues (static and dynamic) and a few monocular cues: finger size (static and dynamic) and motion parallax (using kinesthetic motion signals as a baseline for scaling velocity in the image plane to estimate depth and motion in depth). In a second experiment, we created a visual scene designed to maximize monocular cues to depth and measured the contributions of these cues to online feedback control by measuring subjects' corrective responses to depth perturbations under monocular viewing and comparing them to their corrective responses to image plane perturbations. In particular, we rendered a textured tabletop over which subjects moved their fingers and illuminated the scene with a directional light source to add cast shadows of the finger and the target. While cues like size change may be ineffective for feedback control because of the relatively small size changes created by natural variations in hand movements, numerous studies (Kersten, Knill, Mamassian, & Bülthoff, 1996; Mamassian, Knill, & Kersten, 1998) have shown that cast shadows can greatly affect depth perception in both static and dynamic scenes. In the "ball in box" experiment, for example, Kersten, Mamassian, and Knill (1997) demonstrated that cast shadows induce a strong sense of motion in depth even when the size of an object stays constant. It thus seems plausible that the visual system can utilize cast shadows to guide finger movement. On the other hand, the depth information from cast shadows is inherently ambiguous; it depends on the visual system's knowledge of the light source and of the geometry of the surface that the shadows are cast on. In this experiment, we presented subjects with the information needed to remove the ambiguity, making cast shadows a theoretically viable depth cue. Similar to binocular disparities, the CNS could use the shadow information to infer depth or it could use the relative motion of the finger's and target's shadows directly to guide the hand.

Methods

General design

Subjects performed a pointing task by moving theirindex finger from a starting ball to a target ball in 3Dspace. Visual information about the finger and the ballswas provided by computer graphics displayed on a CRTmonitor and reflected into the workspace by a mirror(Figure 1). In over half of the trials, the virtual fingerrendered in the display was displaced from the real fingerposition by a small amount. This perturbation happenedwhen the finger was behind a virtual occluder, whichmasked the onset of the perturbations (Figures 2 and 3).


The perturbations were applied in a moving coordinate frame centered at the current position of the real finger. At each instant, the line of sight defined by the cyclopean eye (the point midway between the two eyes) and the finger was what we call the depth dimension, and the plane perpendicular to it was the image plane (Figure 4a). The goal was to separate and compare how visual signals in depth and in the image plane were processed; we therefore applied the same perturbations both in depth (in-depth perturbations) and in the image plane (in-image perturbations).

We applied two types of perturbation to the virtual finger in Experiment 1. The first was a small, 1-cm step perturbation that added a fixed offset between the real and virtual finger until the end of the movement (Figure 4b). The second was a small rotation perturbation, in which the angle between the virtual finger and the target was increased or decreased by 2.6–3.9 degrees (so as to create an initial 1-cm shift in position when the virtual finger appeared from behind the occluder). These were also imposed in a moving coordinate frame in which the z-axis was aligned with the line of sight, and they included rotations in the X–Z plane (in-depth rotation perturbations) and rotations in the X–Y plane (image plane rotations). Rotation perturbations caused position shifts that started at 1 cm when the finger emerged from the occluder and decreased over time, vanishing at the target position (Figure 4c). The rotation perturbations kept the motion of the virtual finger relative to the target unchanged, and corrective responses actually decrease endpoint accuracy.

The visual display in Experiment 2 was richer than that in Experiment 1, to give subjects information about where the light source was and about the distance and orientation of the plane that the shadows were cast on. We used a fixed directional light source in all trials and set the light source direction from above the subjects' heads, in agreement with people's prior assumptions about light source direction. We put a checkerboard texture on the ground plane, which all the shadows were cast on and which covered the field of view (Figure 3b). The ground plane coincided with the tabletop used during the initial calibration of subjects' eye positions (see Procedures section) at the beginning of each session. The ground plane was about 50 cm from a subject's cyclopean eye. The virtual target ball was rendered on top of a pole, which was perpendicular to the ground plane and whose height changed with the position of the target ball. The pole, along with its shadow, provided extra information for estimating the light source direction and localizing the virtual target. This scenario also created the possibility for subjects to use the relative position of the two cast shadows directly as a control signal for correcting movement errors in depth.

In view of humans' high sensitivity to dynamic monocular cues to motion in depth (for example, results showing a dramatic influence of cast shadow motion on perceived motion in depth; Kersten et al., 1997), it is clear that motions of various kinds in the image can provide strong monocular cues to motion in depth. Static monocular depth cues, however, are poor indicators of absolute depth from the viewer. We therefore perturbed the direction of motion of the virtual finger rather than its position when it emerged from behind the occluder. The perturbations started at 0 and increased over time so that, if subjects did not correct for the perturbation, the virtual finger would be 1 cm away from the target in the appropriate dimension (in-image or in-depth; Figure 4d).
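To make the geometry concrete, here is a minimal sketch (Python/NumPy; not the experiment code, and all names are ours) of the moving viewer-centered frame and of the step and rotation perturbations defined above. The world "up" vector used to complete the frame is an assumption.

```python
import numpy as np

def viewing_frame(eye, finger):
    """Moving coordinate frame at the real finger (cf. Figure 4a):
    z along the line of sight from the cyclopean eye to the finger,
    x horizontal in the image plane, y from the right-hand rule."""
    z = finger - eye
    z /= np.linalg.norm(z)
    up = np.array([0.0, 1.0, 0.0])        # assumed world "up"
    x = np.cross(up, z)
    x /= np.linalg.norm(x)
    y = np.cross(z, x)
    return x, y, z

def step_perturbation(finger, eye, in_depth, size_cm=1.0):
    """Fixed 1-cm offset between real and virtual finger (Figure 4b)."""
    x, y, z = viewing_frame(eye, finger)
    return finger + size_cm * (z if in_depth else y)

def rotation_perturbation(finger, target, eye, angle_deg, in_depth):
    """Rotate the target-to-finger vector about the target (Figure 4c).
    In-depth rotations lie in the x-z plane (axis y); image plane
    rotations lie in the x-y plane (axis z)."""
    x, y, z = viewing_frame(eye, finger)
    axis = y if in_depth else z
    a = np.radians(angle_deg)
    v = finger - target
    # Rodrigues' rotation formula about a unit axis
    v_rot = (v * np.cos(a) + np.cross(axis, v) * np.sin(a)
             + axis * (axis @ v) * (1.0 - np.cos(a)))
    return target + v_rot
```

Because the rotation is applied about the target, the induced position offset shrinks as the real finger approaches the target and vanishes at contact, which is why corrective responses to this perturbation can only hurt endpoint accuracy.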

Subjects

Sixteen subjects participated in the study, eight in each experiment. All subjects had corrected-to-normal vision, and the eight subjects in Experiment 1 had scores of eight or higher on the Randot (Precision Vision, IL) stereo test. All subjects were right-handed. All subjects were students at the University of Rochester and were naïve to the purpose of the experiments. All provided informed consent in accordance with the guidelines of the University of Rochester Research Subjects Review Board.

Apparatus and display

Figure 1 illustrates the physical setup for the experiments. The tasks were performed in a calibrated virtual reality setup. The stimuli and visual feedback were displayed on a CRT monitor and reflected into the workspace by a half-silvered mirror. The monitor had a resolution of 1152 × 768 pixels and a refresh rate of 120 Hz. Subjects in Experiment 1 wore LCD shutter glasses (CrystalEye, RealD, CA) and viewed the scene binocularly. Subjects in Experiment 2 viewed the scene with their left eye and covered their right eye with an eye patch. The scene was drawn in red to take advantage of the fast decay time of the red phosphor of the CRT display and to minimize cross-talk between the two eyes in Experiment 1. Subjects could not see their hands during the experiment.

Figure 1. Schematic of the experimental setup. Subjects moved their finger to reach for the target ball mounted on a robot arm. Visual information about the finger and the balls was provided by computer graphics on the computer monitor and reflected into the workspace. Subjects could not see their hand or the balls during the movement.

The starting ball was fixed to a platform in the right field of the subjects' visual space and the target ball was moved by a Denso V-series robot (Aichi, Japan). Both balls were rendered in the virtual display as shaded spheres with their true physical sizes and locations (Figure 3). The center-to-center distance of the start and target balls was always 30 cm. The position of the starting ball was fixed across all trials and was designated the origin of the frontal plane, whose normal was the vector from the cyclopean eye to the center of the reflected screen and whose horizontal axis was parallel to the vector from the left eye to the right eye. The position of the target ball was randomly chosen from a patch of the sphere centered at the starting ball with a radius of 30 cm. The patch was centered on a point in the frontal plane 28.2 cm to the left of the starting position and 10.3 cm above the midline of the display. The patch spanned ±7.5 degrees in azimuth and elevation around the center point, translating into a range in the image plane and in depth of approximately ±4 cm.

Figure 2. A trial sequence. (a) Subjects started from a fixed starting point in their right field of view and moved their finger toward a target ball 30 cm away. At the initial stage of the movement, the displayed finger (virtual finger) coincided with the unseen real finger (the green line). The point of view here is from above, not from the subject's point of view. (b) In perturbed trials, the position of the virtual finger was changed (the blue line) while it was behind the occluder. Subjects would have to compensate for the perturbation to consistently reach the target. The scene is drawn from a side view to show the otherwise occluded perturbation.

Figure 3. Visual stimuli. (a) In Experiment 1, subjects saw the virtual finger, the starting ball (on the right side), the target ball, and the occluder, rendered to both eyes. (b) In Experiment 2, the target ball was displayed on top of a pole. Subjects also saw the cast shadows of the objects. The scene was rendered only to the left eye.

Three infrared markers were attached to a metal splint worn on the subjects' index finger, and the positions of the markers were tracked by an Optotrak 3020 system (NDI, Ontario, Canada) at 120 Hz. The position and orientation (or pose) of the finger was calculated from the tracking data in real time. The virtual finger was rendered as a half-sphere with a radius of 0.85 cm on top of a cylinder with a length of 1 cm, about the same size as the metal splint worn by subjects on their right index finger. It was rendered at the measured 3D position and orientation of a subject's finger, except when it was perturbed from this position on perturbation trials. To compensate for the delay caused by the tracking system and the computer rendering loop, the pose of the displayed finger was determined by linearly extrapolating 17 ms (2 frames) ahead from the latest 25 ms (3 frames) of marker data.
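A minimal sketch of this latency compensation, assuming a least-squares line fit per coordinate to the last three tracked positions (the 17-ms/2-frame and 25-ms/3-frame figures come from the text; the function name is ours):

```python
import numpy as np

def extrapolate_pose(last_three, ahead_frames=2):
    """Predict the finger position `ahead_frames` frames (~17 ms at
    120 Hz) into the future from the latest 3 frames (~25 ms) of
    marker data, by fitting a line to each coordinate.

    last_three: array-like of shape (3, 3), oldest sample first."""
    t = np.array([-2.0, -1.0, 0.0])                    # frame times, newest = 0
    coeffs = np.polyfit(t, np.asarray(last_three), 1)  # slope & intercept per axis
    return coeffs[0] * ahead_frames + coeffs[1]
```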

An annular-shaped virtual occluder was displayed 5 cm above the frontal plane. At this distance, the finger was behind the occluder in a natural movement. The virtual occluder was centered at the starting ball with an inner radius of 5 cm and a width of 5 cm. With this configuration, the finger disappeared behind the occluder at about 1/6 of the total distance to the target and emerged from behind the occluder at about 1/3 of the total distance.

The starting and target balls were made of aluminum, with a radius of 0.94 cm. The balls and the splint were connected in a contact detection circuit. When the finger touched either ball, it closed one of the two circuit loops, and the state of the circuit was sampled at the same rate (120 Hz) as the infrared markers by the ODAU module of the Optotrak system. This provided a direct measure of the beginning and end of a movement.

Figure 4. Experimental conditions. (a) All perturbations were applied in a local coordinate frame. The depth or z-axis is defined by the cyclopean eye and the current real finger position. The x-axis is horizontal in the image plane and the y-axis is defined by the right-hand rule. In-depth perturbations were added along the z-axis and in-image perturbations along the y-axis. The blue arrows in the figure show the motion of the virtual image of the finger when the motion of the real finger is directly toward the target. (b) Step perturbations added a fixed amount (±1 cm) relative to the current real finger position (left) in depth or (right) in the image plane. (c) Rotation perturbations (left) in depth and (right) in the image plane were ±1 cm when the finger came out of the occluder and decreased so that they would be 0 when the finger reached the target. (d) Direction perturbations (left) in depth and (right) in the image plane were 0 when the finger emerged from the occluder and increased so that the virtual finger would be ±1 cm away from the target in the appropriate dimension.

Procedures

Each subject completed five 1-h sessions on separate days, each of which consisted of five to eight blocks, depending on how fast the subject could perform the task. Each block had thirty baseline trials and twenty perturbed trials covering all four types of perturbation. The trials were randomly intermixed under the constraint that no two consecutive trials were of the same perturbation type.

At the beginning of each session, subjects performed a geometric calibration procedure by repeatedly moving an Optotrak marker on a flat tabletop and aligning the marker to reference points displayed on the monitor. Geometric transformations among the coordinate frames associated with the Optotrak, the computer monitor, and the subjects' eyes were computed from this procedure. The information was used to render the scene with correct perspective and binocular disparity to both eyes.

Subjects initiated a trial by touching and resting on the starting ball with their index finger. Once the contact detection circuit signaled that the finger was on the ball, the robot moved the target ball to a randomly chosen position and the virtual target was displayed on the screen. Subjects started the movement on an auditory cue. Data collection started when the finger left the starting ball. Subjects were instructed to reach for the target at their natural pace. A successful trial ended when the finger touched the target ball within 500–1200 ms. If the finger did not touch the target within the time limit but was close (within 3 cm) to the target at any time during the movement, the trial was recorded as a miss. Otherwise, the trial was treated as invalid and the recorded data were discarded. A text message would inform the subject to speed up or slow down accordingly. On a successful trial, subjects saw the virtual finger touching the virtual target and also felt the physical contact between the splint and the target ball. The virtual target disappeared upon contact and the program waited for the subject to initiate the next trial.

Data processing

We removed various irregularities before analyzing the data. We used linear interpolation to fill in missing frames in the recorded marker data when necessary; if there were 3 or more consecutive missing frames, however, the trial was discarded. We removed the slowest and fastest 10% of the trials. We then removed all of the remaining trials that had a maximal acceleration of 57 m/s² or higher. These were typically trials in which subjects rushed to the target and searched around the target before the trial time expired.

We measured the timing and strength of subjects' responses to the perturbations using an autoregressive (AR) model. Let t = 0 be the time (in Optotrak frames) at which the virtual finger emerged from the occluder, and let the finger position at time t be $x_t$; thus, we have

$$x_t = \sum_{i=1}^{n} a_i x_{t-i} + u_t + \varepsilon_t, \qquad (1)$$

where $a_i$ ($i = 1, \ldots, n$) are the coefficients of the AR model and $u_t$ is a constant term. The residual $\varepsilon_t$ has zero mean by construction. We fitted the AR model from baseline trials and computed the raw perturbation influence function $w_t$ of each perturbation type from

$$x_t = \sum_{i=1}^{n} a_i x_{t-i} + u_t + w_t p_t, \qquad (2)$$

where $p_t$ is the amount of perturbation (set to +1 or −1). The raw influence function and the residual noise were smoothed by a causal exponential filter, $f(t) = e^{t/\tau}/\tau$ for $t < 0$, with time constant $\tau = 37$ ms.
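The sketch below shows one way to implement these fits, simplifying the per-frame constant $u_t$ of Equations 1 and 2 to a single constant and standing in an exponential moving average for the causal exponential filter; each trace holds one coordinate of a finger trajectory aligned so that frame 0 is occluder exit. All names are ours.

```python
import numpy as np

def fit_ar(baseline_trials, n=10):
    """Least-squares fit of the AR coefficients a_1..a_n and a
    constant term from unperturbed (baseline) trials (Equation 1)."""
    A, y = [], []
    for x in baseline_trials:
        for t in range(n, len(x)):
            A.append(np.r_[x[t - n:t][::-1], 1.0])  # x_{t-1},...,x_{t-n}, bias
            y.append(x[t])
    sol, *_ = np.linalg.lstsq(np.asarray(A), np.asarray(y), rcond=None)
    return sol[:n], sol[n]

def influence_function(perturbed_trials, p_signs, a, u):
    """Raw influence function w_t (Equation 2): at each frame, regress
    the AR prediction residuals on the perturbation sign p = +/-1."""
    n = len(a)
    T = min(len(x) for x in perturbed_trials)
    p = np.asarray(p_signs, dtype=float)
    w = np.zeros(T)
    for t in range(n, T):
        resid = np.array([x[t] - a @ x[t - n:t][::-1] - u
                          for x in perturbed_trials])
        w[t] = (p @ resid) / (p @ p)       # least-squares slope on p
    return w

def smooth_causal(w, tau_ms=37.0, dt_ms=1000.0 / 120.0):
    """Causal exponential smoothing with time constant tau = 37 ms."""
    k = np.exp(-dt_ms / tau_ms)
    out, acc = np.zeros_like(w), 0.0
    for t, v in enumerate(w):
        acc = k * acc + (1.0 - k) * v
        out[t] = acc
    return out
```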

We used the following procedure to estimate each subject's reaction time to begin correcting for perturbations. For each subject, we estimated the standard error of the smoothed influence functions under the null hypothesis of no correction ($w_t = 0$) using a bootstrap procedure in which we fit Equation 2 to resampled data from the no-perturbation trials with the perturbation ($p_t$) set randomly to ±1 on each trial. The standard deviation of the resulting bootstrapped estimates of the smoothed influence functions provides a measure of the standard deviation of weights expected under the null hypothesis that subjects did not correct; technically, it is the standard deviation of the values of $w_t$ one would expect if subjects did not begin correcting at a time less than or equal to t. We used as our measure of reaction time the first time at which a smoothed influence function deviated from 0 by one standard error and remained more than one standard error away from 0 for more than 15 frames (125 ms). A reaction time greater than 40 frames (333 ms) indicated a late correction, and we considered the subject as not responding to the perturbation.

The response at time t, $c(t)$, was given by the time integral of $w_t$: $c(t) = \sum_{i=1}^{t} w_i$. The result gives a normed measure of subjects' responses, in which a response of 1 corresponds to a 1-cm deviation of perturbed movements from movements on unperturbed trials. We therefore express subjects' responses in centimeters. The magnitude of the response at the end of the movement was computed by summing the weights up to the point at which subjects first touched the target.


We used a 10th order (n = 10) AR model in theanalysis, though any model above 4th order gave similarresults.
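Continuing the sketch above (and reusing its helpers), the bootstrap null distribution and the reaction-time criterion might look as follows; the 15-frame (125 ms) hold and 40-frame (333 ms) cutoff come from the text, and the rest is our simplification.

```python
import numpy as np

def null_se(baseline_trials, a, u, n_boot=1000, rng=None):
    """SE of the smoothed influence function under the null hypothesis
    of no correction: refit Equation 2 to unperturbed trials with the
    perturbation sign set randomly to +/-1 on each trial."""
    rng = rng or np.random.default_rng()
    ws = []
    for _ in range(n_boot):
        signs = rng.choice([-1.0, 1.0], size=len(baseline_trials))
        ws.append(smooth_causal(influence_function(baseline_trials,
                                                   signs, a, u)))
    return np.std(ws, axis=0)

def reaction_time_frames(w_smooth, se, hold=15, late=40):
    """First frame at which |w| exceeds 1 SE and stays beyond it for
    15 frames (125 ms); corrections starting after 40 frames (333 ms)
    count as no response. Returns None when no response is detected."""
    beyond = np.abs(w_smooth) > se
    for t in range(len(beyond) - hold):
        if beyond[t:t + hold].all():
            return None if t > late else t
    return None
```

The normed response c(t) is then just `np.cumsum(w_smooth)[t]`, and the endpoint correction is c(t) evaluated at the frame of first contact with the target.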

Results

Experiment 1: Binocular conditions

Subjects performed well in the binocular task; the average miss rate across all eight subjects was 6.8%. The proportion of missed trials for each perturbation type was similar to the overall rate, so the perturbations themselves were not the cause of the misses.

All subjects responded to the perturbations in the image plane, which is consistent with the findings of Saunders and Knill (2004). The mean reaction time to the step perturbation was 173 ms (SEM 11 ms) and that to the rotation perturbation was 187 ms (SEM 12 ms). All subjects responded to the step perturbation in depth, with an average reaction time of 230 ms (SEM 12 ms). Only 2 of the 8 subjects showed responses to the rotation perturbation in depth, and their average reaction time was 175 ms (SEM 13 ms).

Figure 5 shows the average influence functions of all subjects. The apparent divergence from 0 of the influence function for the in-depth rotation perturbation was driven entirely by the 2 subjects who responded to those perturbations (Figure 6); however, even those subjects responded more strongly to the rotation perturbations in the image plane. Within the in-image and in-depth conditions, subjects showed stronger responses to the step perturbations than to the rotation perturbations, which agrees with previous findings (Saunders & Knill, 2004). Figures 5 and 6 also show the transitory effect of the rotation perturbations, both in the in-image and in-depth conditions.

Figure 7a plots the average response to perturbations as a function of the distance to the target. For step perturbations, negative responses reflect appropriate corrections that bring the finger closer to the target at the end of the movement. For rotation perturbations, responses (positive or negative) result in larger endpoint errors. The mean endpoint correction for the in-image step perturbation was 0.53 cm, with a 95% confidence interval of [0.49, 0.57], corresponding to a 53% correction, while it was 0.3 cm (95% confidence interval of [0.24, 0.36]) for the in-depth perturbations (Figure 7b).

No subject noticed the perturbations, as reported in the exit debriefing, even after we explained what the perturbations were. Fixation during the task was not enforced by an eye tracker, but all subjects reported having fixated the target.

Figure 5. Average influence functions of all the subjects in Experiment 1. Error bands are ±1 SE. The reaction time was defined as the time when the influence function deviated from 0 by 1 SE. Time 0 was when the virtual finger emerged from the occluder. The four panels (a–d) correspond to the four perturbation conditions. The response to the in-depth rotation perturbation was driven by the data from two subjects (see Figure 6).

Discussion

Assuming subjects indeed fixated the target, the average eccentricity of the finger when it came out of the occluder was 17 degrees and the average pedestal disparity was 114 arcmin (crossed), corresponding to a position approximately 2.8 cm nearer to the observer than the geometric horopter, though this varied from trial to trial with a standard deviation of 126 arcmin. Though stereoacuity values are highly dependent on the measurement method and stimuli, and no direct data are available at this eccentricity and pedestal disparity, extrapolating from one set of measurements (Howard & Rogers, 1995), the threshold disparity difference for depth discrimination would be expected to be above 400 arcmin. Stereoacuity is thus very low compared to the acuity of position judgments on the retina (approximately 51 arcmin at that eccentricity, computed using an estimated Weber fraction for position estimates of 0.05; Burbeck, 1987; Burbeck & Yap, 1990; Whitaker & Latham, 1997). Because of the markedly low acuity of binocular disparities in the periphery, it would take longer for an optimal controller to integrate the disparity signals into its estimate of finger position, explaining why subjects responded more slowly and more weakly to the in-depth step perturbation than to the in-image one. A similar difference in delay was observed by Brenner and Smeets (2006). An alternative explanation of the apparent difference in delay for corrective responses is that the visual system takes longer to process information about depth than position signals in the retinal plane. Measurements of the time constant for integrating inputs from the two eyes for stereopsis (Ludwig, Pieper, & Lachnit, 2007; Ogle, 1963) put the latency of binocular depth processing in perceptual tasks at 50 ms to 100 ms, which could explain the observed differences in delay. However, a recent study found that the latency can be much lower, about 14 ms, in visuomotor tasks (Wilson, Pearson, Matheson, & Marotta, 2008). Since the two factors lead to similar effects on performance, we cannot sort out from the current data the relative contribution of each one to the difference in apparent delay.
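For a sense of scale, the standard small-angle approximation relates a depth offset $\Delta z$ at viewing distance $D$ to a relative binocular disparity $\delta$. Taking an interocular distance $I \approx 6.3$ cm and $D \approx 50$ cm (assumed values, consistent with the viewing geometry described in the Methods section), a 1-cm perturbation in depth yields

$$\delta \approx \frac{I\,\Delta z}{D^2} = \frac{6.3 \times 1}{50^2}\ \text{rad} \approx 2.5 \times 10^{-3}\ \text{rad} \approx 8.7\ \text{arcmin},$$

a disparity change far smaller than the extrapolated peripheral discrimination threshold quoted above, so an optimal controller would have to integrate the disparity signal over time before it could reliably drive a correction.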

Six of the eight subjects did not respond to the rotation perturbations in depth. Given that they responded to the step perturbations, which had the same initial size (1 cm), it seems plausible that those subjects relied mainly on the motion of the finger in depth relative to the target and/or the change in relative disparity to reach for the target (a homing strategy), in contradiction with the findings of Brenner and Smeets (2006). Two considerations argue against this. First, those same subjects showed a response to the rotation perturbations in the image plane. One would, therefore, have to posit that the CNS uses a different strategy for controlling hand movement in depth than for controlling hand movement in the image plane. On the face of it, this seems unlikely, as the distinction between depth and image plane dimensions does not map naturally onto the motor system being controlled.

Figure 6. Average influence functions of the 2 subjects who responded to the in-depth rotation perturbation (dotted line) and the 6 subjects who did not (solid line). Note that the reaction time was defined as the first time that the influence function differed from 0 by 1 standard error and remained different for 125 ms. Under this definition, the 6 subjects did not show a response to the rotation perturbation in depth.

Figure 7. Responses to perturbations in Experiment 1. Negative responses reflect corrections that bring the finger closer to the target for step perturbations. For rotation perturbations, responses presumably move the finger away from the target at the end. (a) The average response to perturbations during the movement. (b) Endpoint corrections for all the conditions. Error bars are 95% confidence intervals.

Second, Saunders and Knill (2004) have shown that, for image plane perturbations, subjects' behavior is consistent with an optimal controller that simultaneously estimates the position and velocity of the finger and uses these estimates as input to a controller that generates online motor commands. While such a system necessarily displays corrective responses to simple position shifts of the visual feedback, its response to rotational perturbations depends on the relative contributions of the sensory signals for position and velocity to their internal estimates. As the relative uncertainties of those signals change, the response to the rotation perturbations changes. As was found here, Saunders and Knill found that a simple rotation of visual feedback in the image plane led to fast initial responses in a direction opposite to the position shift in the visual finger position created by the rotation. The response was weaker than the response to a position shift perturbation, consistent with a model in which sensory information about the motion of the finger contributes to internal estimates of finger motion, mitigating the effects of the position shift signaled by a rotation perturbation. They found that when the visual feedback was perturbed by a combination of a rotation and a shift, so as to initially create a feedback signal shifted 1 cm in one direction away from the finger but with movement toward a point shifted 1 cm away from the target in the opposite direction (for which subjects should correct in the same direction as the positional shift), responses to the perturbation were delayed by over 100 ms relative to the responses to simple rotation perturbations. Simulation results showed this pattern to be consistent with the known sensory noise parameters on position and velocity signals; that is, early in the movement, the sensory position signal contributes more strongly to internal estimates of finger position than do velocity signals, so that the motion feedback had to be doubled relative to the position feedback to effectively cancel out the responses.

Given the consistency of subjects' behavior in response to image plane perturbations with an optimal feedback controller constrained by sensory noise on position and motion signals, it is natural to interpret the current data as indicating that, for 6 of the eight subjects, the sensory noise on position and motion signals in depth is such that the two signals are effectively given equal weight when integrated into internal estimates of finger position and motion, while for two of the subjects, the sensory noise on motion in depth is effectively higher than the sensory noise on position in depth (relative to the noise levels in the other subjects). Since we do not have data on the relative uncertainties of those signals in depth as good as those we have for signals in the image plane, we are unable to parameterize an optimal feedback control model to test these predictions. The fact that six subjects showed early corrective responses for step perturbations but not for rotational perturbations provides strong evidence that the CNS uses not only sensory information about depth but also about motion in depth for online control.

Experiment 2: Monocular conditions

Subjects performed worse in the monocular conditions than in the binocular conditions: the average miss rate across all subjects was 17.7%, significantly higher than in the binocular conditions of Experiment 1 (6.8%). Subjects were also 23% more likely to miss on trials with in-depth perturbations than on trials with in-image ones (22.2% for in-depth perturbations and 18.1% for image plane perturbations).

All subjects responded to the in-image perturbations. The reaction time for the step perturbation was 173 ms (SEM 12 ms) and that for the direction perturbation was 221 ms (SEM 19 ms; see Figure 8). The average amount of correction to the in-image step perturbations was 0.73 cm, with a 95% confidence interval of [0.65, 0.81], and that to the in-image direction perturbations was 0.51 cm, with a 95% confidence interval of [0.45, 0.57]. Subjects' corrections to in-image step perturbations were larger than we found in Experiment 1, but this can be explained by the increased movement duration in Experiment 2 (777 ms, compared to 659 ms in Experiment 1). Subjects showed no significant response to the perturbations in depth (see Figures 8 and 9). Average corrections to the step and direction perturbations in depth were 0.01 cm, with a 95% confidence interval of [−0.068, 0.088], and 0.03 cm, with a 95% confidence interval of [−0.048, 0.108], respectively. The 95% confidence bounds on these estimates put the maximal corrections at 0.088 and 0.108 cm; thus, while we cannot conclude definitively that subjects did not correct for the perturbations in depth, we can conclude that any corrections they made were very small in proportion to the size of the perturbations.

No subject noticed the perturbations, and no subject reported having used a simple homing strategy of aligning the shadow of the finger with the shadow of the target ball.

Discussion

The reaction time to the in-image step perturbation was the same as in Experiment 1. The reaction time to the in-image direction perturbations was slightly longer, which is consistent with the results of Saunders and Knill (2004). Subjects showed no significant response to the perturbations in depth. This suggests that none of the monocular cues present in the display, including cast shadows, is an effective cue for online control of fast pointing movements.

Several factors may hinder the use of shadows as a reliable online depth cue. Although in theory, given that the light source is directional, its direction can be solved for from just two frames of finger–shadow correspondences, in practice the solution involves the intersection of the vectors connecting those correspondences, and the estimate can be very noisy. Strong prior assumptions about light source directions can also negatively affect how people estimate depth from cast shadows.
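To illustrate why the cue is nevertheless well defined once the light direction and ground plane are known, the following sketch (our illustration of the geometry, not the analysis used in the experiments) recovers the finger's 3D position as the near-intersection of the view ray through the finger's image and the ray traced from the shadow back toward the light; small errors in the assumed light direction propagate directly into the recovered depth.

```python
import numpy as np

def finger_from_shadow(eye, view_dir, shadow_pt, light_dir):
    """Intersect the view ray eye + a*d1 with the reverse shadow ray
    shadow_pt + b*d2 (d2 pointing from the shadow back toward the
    light). With noisy measurements, the rays are skew, so return
    the midpoint of their closest approach."""
    d1 = view_dir / np.linalg.norm(view_dir)
    d2 = -light_dir / np.linalg.norm(light_dir)   # back toward the light
    r = shadow_pt - eye
    c = d1 @ d2
    denom = 1.0 - c * c                           # 0 if the rays are parallel
    a = (r @ d1 - (r @ d2) * c) / denom
    b = ((r @ d1) * c - r @ d2) / denom
    p1 = eye + a * d1
    p2 = shadow_pt + b * d2
    return 0.5 * (p1 + p2)
```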

Figure 8. Average influence functions of all the subjects in Experiment 2. Error bands are ±1 SE. The four panels (a–d) correspond to the four perturbation conditions. The influence functions for the in-depth perturbations were not significantly different from 0, showing that subjects did not respond to the in-depth perturbations.

Figure 9. Responses to perturbations in Experiment 2. As before, negative responses reflect corrections that improve pointing accuracy. (a) The average amount of correction during the movement. (b) Subjects made only small corrections to the in-depth perturbations, and the endpoint corrections were not significantly different from 0 (error bars are 95% confidence intervals). Subjects corrected more for the in-image perturbations than for the same perturbations in Experiment 1.


Though we put the virtual light source above the subjects' heads, Mamassian and Goutcher (2001) reported that the prior on light source position can be biased toward the left by as much as 26 degrees. Given a strong, and likely biased, prior and noisy visual input, it could take an optimal estimator a very long time to infer the 3D position of the finger from its shadow, rendering cast shadows ineffective as an online depth cue. That subjects did not use shadows as an online visual cue does not mean they cannot use them to reach for the target at the very end. Studies on how shadows help in motor tasks are sparse, but everyday examples abound.

It is always a concern that the scene was not rendered realistically enough for subjects to trust the shadow information. For example, since we used a directional light source, no penumbrae were rendered, whereas in the real world penumbrae are always present. This, however, does not change the underlying geometry; furthermore, the effectiveness of shadows in perceptual tasks is remarkably robust to the realism of the rendering (Mamassian et al., 1998).

General discussion

All of the reaction times to the in-image perturbations are somewhat longer than those for the corresponding perturbations in Saunders and Knill (2004; 143 ms for rotation and 146 ms for step perturbations), and all of the reaction times to the in-depth perturbations are longer than that in Brenner and Smeets (2006). This can be attributed to at least two factors. First, the size of our perturbations was only half of that used in Saunders and Knill (1 cm vs. 2 cm) and much smaller than those in Brenner and Smeets. Given the conservative statistical techniques for detecting the beginning of corrective responses, smaller perturbations (hence, smaller corrective responses) necessarily result in later estimates of response times. Second, and perhaps more importantly, the current task was different from previous ones. Subjects had to bring the speed of the finger to near zero when approaching the target ball for an accurate touch, whereas in Saunders and Knill subjects could hit the tabletop at any speed, knowing they would be stopped. Subjects in Brenner and Smeets were required to hit virtual targets with a virtual cursor and had no endpoint speed constraint either. Liu and Todorov (2007) observed that when subjects were required to stop at the target, the movement duration was longer than when they were simply asked to hit the target at an arbitrary speed. This trade-off between endpoint stability and accuracy is nicely captured by an optimal feedback control model. The model also predicts that the amount of correction is smaller in the stop-at-target case, which explains why the corrections to the step perturbations in both experiments (53% and 73%, respectively) were smaller than that (80%) in Saunders and Knill.

While we have focused our discussion on cast shadows, the apparent size of the finger (and how it changes over time) is another potentially salient monocular cue to depth and motion in depth. With the average distance between the fingertip and the cyclopean eye at 50 cm in a natural pointing task and the size of the finger at 1.7 cm, a 1-cm perturbation in depth corresponds to about 2 arcmin of size change. Considering that the Weber fraction for angular size judgments is around 6% (McKee & Welch, 1992), or 7 arcmin under our experimental conditions, the uncertainty in the size cue renders it effectively useless for online control.
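The arithmetic behind these numbers follows from the small-angle approximation for angular size, using only quantities given in the text:

$$\theta \approx \frac{s}{D} = \frac{1.7}{50}\ \text{rad} \approx 117\ \text{arcmin}, \qquad \Delta\theta \approx \frac{s\,\Delta z}{D^2} = \frac{1.7 \times 1}{50^2}\ \text{rad} \approx 2.3\ \text{arcmin},$$

while the 6% Weber fraction puts the discrimination threshold at roughly $0.06 \times 117 \approx 7$ arcmin, about three times the size change produced by a 1-cm depth perturbation.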

The results suggest that binocular disparities are indispensable for fast, accurate pointing movements, not only because they are important for estimating the depth of a target object but also because they are used by the CNS for online feedback control of the moving hand. While the relevant studies do not dissociate the contribution of stereopsis to estimates of target depth from its contribution to online control, it is clear that stereoscopic function significantly impacts performance on close-range motor tasks (O'Connor, Birch, Anderson, Draper, & FSOS Research Group, 2010) and that people with poor stereoscopic vision, such as amblyopes, perform worse than normal observers on tasks like grasping (Grant, Melmoth, Morgan, & Finlay, 2007). In general, stereoacuity has a direct impact on how well we perform close-range, accurate motor tasks (O'Connor et al., 2010).

Conclusions

The experiments show that the CNS uses stereoscopic depth information about the moving hand for online control of pointing movements. Monocular depth cues, however, appear ineffective for online control, suggesting that stereoscopic disparities dominate online control of hand movements in depth. That depth information has less of an influence on online control than information about the position of the hand in the image plane is consistent with what one would expect from an optimal controller that weighs the sensory input for estimating hand state in proportion to the reliability of the signals.

Acknowledgments

This work was supported by NIH Grant R01 EY017939to the second author.

Commercial relationships: none.
Corresponding author: Bo Hu.
Email: [email protected].
Address: Center for Visual Science, University of Rochester, P.O. Box 270270, Rochester, NY 14627, USA.


References

Bradshaw, M. F., & Elliott, K. M. (2003). The role of binocular information in the on-line control of prehension. Spatial Vision, 16, 295–309.

Brenner, E., & Smeets, J. B. J. (2003). Fast corrections of movements with a computer mouse. Spatial Vision, 16, 365–376.

Brenner, E., & Smeets, J. B. J. (2006). Two eyes in action. Experimental Brain Research, 170, 302–311.

Burbeck, C. A. (1987). Position and spatial frequency in large-scale localization judgments. Vision Research, 27, 417–427.

Burbeck, C. A., & Yap, Y. L. (1990). Two mechanisms for localization? Evidence for separation-dependent and separation-independent processing of position information. Vision Research, 30, 739–750.

Grant, S., Melmoth, D. R., Morgan, M. J., & Finlay, A. L. (2007). Prehension deficits in amblyopia. Investigative Ophthalmology & Visual Science, 48, 1139–1148.

Greenwald, H. S., & Knill, D. C. (2009a). A comparison of visuomotor cue integration strategies for object placement and prehension. Visual Neuroscience, 26, 63–72.

Greenwald, H. S., & Knill, D. C. (2009b). Cue integration outside central fixation: A study of grasping in depth. Journal of Vision, 9(2):11, 1–16, http://www.journalofvision.org/content/9/2/11, doi:10.1167/9.2.11. [PubMed] [Article]

Greenwald, H. S., Knill, D. C., & Saunders, J. A. (2005). Integrating visual cues for motor control: A matter of time. Vision Research, 45, 1975–1989.

Howard, I. P., & Rogers, B. J. (1995). Binocular vision and stereopsis. New York: Oxford University Press.

Izawa, J., & Shadmehr, R. (2008). On-line processing of uncertain information in visuomotor control. Journal of Neuroscience, 28, 11360–11368.

Jackson, S. R., Jones, C. A., Newport, R., & Pritchard, C. (1997). A kinematic analysis of goal-directed prehension movements executed under binocular, monocular, and memory-guided viewing conditions. Visual Cognition, 4, 113–142.

Kersten, D., Knill, D. C., Mamassian, P., & Bülthoff, I. (1996). Illusory motion from shadows. Nature, 379, 31.

Kersten, D., Mamassian, P., & Knill, D. C. (1997). Moving cast shadows induce apparent motion in depth. Perception, 26, 171–192.

Knill, D. C. (2005). Reaching for visual cues to depth: The brain combines depth cues differently for motor control and perception. Journal of Vision, 5(2):2, 103–115, http://www.journalofvision.org/content/5/2/2, doi:10.1167/5.2.2. [PubMed] [Article]

Knill, D. C., & Kersten, D. (2004). Visuomotor sensitivity to visual information about surface orientation. Journal of Neurophysiology, 91, 1350.

Liu, D., & Todorov, E. (2007). Evidence for the flexible sensorimotor strategies predicted by optimal feedback control. Journal of Neuroscience, 27, 9354–9368.

Loftus, A., Servos, P., Goodale, M. A., Mendarozqueta, N., & Mon-Williams, M. (2004). When two eyes are better than one in prehension: Monocular viewing and end-point variance. Experimental Brain Research, 158, 317–327.

Ludwig, I., Pieper, W., & Lachnit, H. (2007). Temporal integration of monocular images separated in time: Stereopsis, stereoacuity, and binocular luster. Perception & Psychophysics, 69, 92–102.

Mamassian, P., & Goutcher, R. (2001). Prior knowledge on the illumination position. Cognition, 81, B1–B9.

Mamassian, P., Knill, D. C., & Kersten, D. (1998). The perception of cast shadows. Trends in Cognitive Sciences, 2, 288–295.

McKee, S. P., & Welch, L. (1992). The precision of size constancy. Vision Research, 32, 1447–1460.

O'Connor, A. R., Birch, E. E., Anderson, S., Draper, H., & FSOS Research Group. (2010). The functional significance of stereopsis. Investigative Ophthalmology & Visual Science, 51, 2019–2023.

Ogle, K. N. (1963). Stereoscopic depth perception and exposure delay between images to the two eyes. Journal of the Optical Society of America, 53, 1296–1304.

Prablanc, C., & Martin, O. (1992). Automatic control during hand reaching at undetected two-dimensional target displacements. Journal of Neurophysiology, 67, 455.

Sarlegna, F., Blouin, J., Vercher, J.-L., Bresciani, J.-P., Bourdin, C., & Gauthier, G. M. (2004). Online control of the direction of rapid reaching movements. Experimental Brain Research, 157, 468–471.

Saunders, J. A., & Knill, D. C. (2003). Humans use continuous visual feedback from the hand to control fast reaching movements. Experimental Brain Research, 152, 341–352.

Saunders, J. A., & Knill, D. C. (2004). Visual feedback control of hand movements. Journal of Neuroscience, 24, 3223–3234.

Saunders, J. A., & Knill, D. C. (2005). Humans use continuous visual feedback from the hand to control both the direction and distance of pointing movements. Experimental Brain Research, 162, 458–473.

van Mierlo, C. M., Louw, S., Smeets, J. B. J., & Brenner, E. (2009). Slant cues are processed with different latencies for the online control of movement. Journal of Vision, 9(3):25, 1–8, http://www.journalofvision.org/content/9/3/25, doi:10.1167/9.3.25. [PubMed] [Article]

Whitaker, D., & Latham, K. (1997). Disentangling the role of spatial scale, separation and eccentricity in Weber's law for position. Vision Research, 37, 515–524.

Wilson, K. R., Pearson, P. M., Matheson, H. E., & Marotta, J. J. (2008). Temporal integration limits of stereovision in reaching and grasping. Experimental Brain Research, 189, 91–98.