
Saliency-Driven Tactile Effect Authoring for Real-Time Visuotactile Feedback

Myongchan Kim 1   Sungkil Lee 2   Seungmoon Choi 1

1 Pohang University of Science and Technology   2 Sungkyunkwan University
[email protected]   [email protected]   [email protected]

Abstract. New-generation media such as the 4D film have appeared lately to deliver immersive physical experiences, yet their authoring has relied on content artists, impeding the popularization of such media. An automated authoring approach becomes increasingly crucial for lowering production costs and reducing manual intervention. This paper presents a fully automated framework for authoring tactile effects from existing video images to render synchronized visuotactile stimuli in real time. The spatiotemporal features of video images are analyzed in terms of visual saliency and translated into tactile cues that are rendered on tactors installed on a chair. A user study was conducted to evaluate the usability of visuotactile rendering against visual-only presentation. The results indicated that visuotactile rendering can make a movie more interesting, immersive, appealing, and understandable.

Keywords: Tactile Effect, Visual Saliency, 4D Film, Multimedia

1 Introduction and Background

Recent advances in haptic technologies have proven their worth as an effective communicative medium in a wide variety of applications. While the majority of multimedia content is mediated through the visual and auditory channels, recent research and industrial applications, such as 4D films, are extending beyond this bimodal interaction to encompass diverse physical experiences including vibration, breeze, smell, mist, and tickling [6, 13]. Such new-generation films provide better experiences not only in terms of immersion and entertainment, but also in terms of better content delivery through otherwise unallocated haptic sensory channels. Just as the commercialization of stereoscopic TV was perceived as a particularly successful occasion, the day when physical movies can be felt in everyday life is not far away. In creating such immersive experiences, haptic sensation is one of the crucial components to bring about pervasive changes in multimedia.

Generating a haptic film requires haptic content that is coordinated with the existing semantics presented audiovisually; otherwise, the haptic cues cause confusion in understanding the director's intention. If there existed an explicit computational model that could reproduce the audiovisual signals, the authoring of haptic content would become much easier. However, most real-world content is directly captured without an explicit model of its sources.


This poses a challenge in new media creation. A straightforward approach is to rely on the insights of content designers, but non-trivial effort is necessary for the coordination with other modalities. For instance, in the early 1970s, a static black-and-white picture was used to generate tactile cues for a 400-point tactile stimulator mounted on a chair [3]. Recently, 3D videos with depth videos were used to produce force feedback synchronized with visual signals [1]. Kim et al. used a manual line-drawing interface to author tactile motion segments in a video for their tactile glove system [9]. Intuitive GUIs have often been helpful for pre-encoding lengthy haptic media with a number of actuators [12, 9]. However, such manual authoring is laborious and time-consuming, and does not account for what is important in the content. It is also clearly impractical in situations that require complex scenarios and a tremendous amount of content in real time. This challenge inspired us to explore an automated approach for haptic film authoring that stays in accordance with the visual media.

While some 4D media are already popular to a limited extent, it is not clear how to derive haptic cues from the existing media and integrate them into it. This is particularly challenging in an automated approach, without awareness of the semantics and spatiotemporal structure of a scene; object recognition is still ongoing research in the computer vision community. In the absence of particular context information, one of the mainstream approaches in vision research attempting to find visual importance suggests using visual saliency as a key aspect to generate tactile stimuli. Salient inhomogeneous structures of visual features are prioritized and perceived first by humans. Various features including color, brightness, and edges were reported as significant in such perception [7, 14, 4]. Such visual saliency is the basis of our approach for extracting tactile content from visual information. This computational autonomy is enabled by the feature integration theory, one of the most influential theories on bottom-up (feature-driven) visual perception [15, 10, 5]. Using the theory, streaming media can be processed spatiotemporally, and the corresponding saliency map can be generated. In this work, we note the possibility of automatically translating the saliency map into a tactile map that can drive an array of tactors.

This paper reports our research on algorithms to transform streaming visual signals into tactile cues using visual saliency, a real-time tactile display built upon an array of tactors installed on a chair, and a user study that evaluated the usability of our system. Furthermore, unlike previous research, our system is aimed at real-time interaction, having the great benefit of distributing haptic content without manual pre-encoding. To our knowledge, this is the first attempt at an automatic, real-time tactile effect authoring system for movies. Our system can play synchronized visuotactile effects in real time directly from a movie source. In addition, it has a potential advantage for human-aided design. An initial seed can be provided rapidly by our system, and then designers can take it over and enhance the tactile scenes, thereby reducing production costs significantly. This can be a viable alternative considering that no semantics is taken into account in our system. In addition, our study uses low-cost vibration motors for tactile rendering. This choice greatly elevates the practicability of our system, but it comes with a large actuation latency to take care of. For synchronous visuotactile stimulation, our system uses asynchronous commanding, that is, it issues tactile commands earlier than visual commands by pre-calibrated differences between the display latencies.

2 Overview of Framework

In this section, we provide a brief overview of our system. Fig. 1 illustrates its pipeline. In the system, visual and tactile renderings are asynchronously executed using two different threads. For every frame, the thread for visual display runs as usual, but meanwhile, it builds the saliency map that spatiotemporally abstracts perceptual importance in a visual scene. The resulting saliency map is translated and stored in tactile buffers, the resolution of which is identical to that of the physical tactor array. In the other thread, for tactile rendering, the tactile buffers are read into a tactile map at a lower frame rate, e.g., 5 Hz. This tactile map is mapped to the actuation commands to be sent to the tactors. In particular, the tactile commands are issued in advance of the visual commands to compensate for vibration latency. Each step is detailed in the following sections.
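The two-thread flow can be summarized with a minimal sketch (not the authors' implementation). The saliency computation is stubbed with a grayscale stand-in, send_to_tactors() is a hypothetical placeholder for the tactor control interface, and the video file name is an assumption.

```python
# Minimal sketch of the rendering flow in Fig. 1: a ~30 FPS visual/saliency loop
# and a ~5 Hz tactile loop sharing a tactor-resolution buffer.
import threading
import time

import cv2
import numpy as np

TACTOR_GRID = (3, 3)                          # resolution of the physical tactor array
tactile_buffer = np.zeros(TACTOR_GRID, np.float32)
buffer_lock = threading.Lock()
running = True


def send_to_tactors(commands):
    print("tactor voltages:", np.round(commands, 2))   # stand-in for the control circuit


def tactile_thread():
    """~5 Hz: read the current tactile buffer, map it to commands, actuate tactors."""
    while running:
        with buffer_lock:
            tactile_map = tactile_buffer.copy()
        send_to_tactors(3.0 * tactile_map)    # scale to the 0-3 V drive range
        time.sleep(0.2)                       # 5 Hz tactile update rate


threading.Thread(target=tactile_thread, daemon=True).start()

cap = cv2.VideoCapture("movie.mp4")           # assumed input movie
while True:
    ok, frame = cap.read()                    # ~30 FPS visual loop
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float32) / 255.0
    saliency = gray                           # placeholder for the Sec. 3 saliency pipeline
    small = cv2.resize(cv2.GaussianBlur(saliency, (9, 9), 0), TACTOR_GRID[::-1])
    with buffer_lock:
        tactile_buffer[:] = small             # store the tactor-resolution tactile buffer
    cv2.imshow("movie", frame)
    cv2.waitKey(33)                           # ~30 FPS display pacing
running = False
```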

[Figure 1 diagram: a 30 FPS thread for saliency extraction (read a newer image from the movie, build contrast maps, build spatial and spatiotemporal saliency maps, store current tactile buffers) and a 5 FPS thread for tactile generation (read current tactile buffers, update the tactile map, choose the tactors to activate, compensate vibration latency, issue tactor commands), feeding the movie on the visual display and the tactile display.]

Fig. 1: Rendering flow of our system. Two threads for visual and tactile display run simultaneously but at different frame rates.

3 Tactile Movie Generation Based on Visual Saliency

First, we briefly review the neuroscientific background on visual saliency and its implementation. It is well known that attentional allocation involves the reflexive (bottom-up) capture of visual stimuli, in the absence of the user's volitional shifts [14, 4]. Although humans are generally efficient in searching for visual information in complex scenes, this does not necessarily mean that everything is perceived simultaneously. Preattentive primitives such as color and lightness are first detected in parallel and then separately encoded into feature maps. A slow serial conjunctive search follows to integrate the feature maps into a single topographical saliency map [15, 5]. The neuronal mechanisms of early vision underlying bottom-up attention give us an important insight for detecting salient areas. The periphery of the retinal zone (surround) suppresses neuronal activation in narrow receptive fields of the highest spatial acuity (center). Therefore, visual structures are particularly visible when they occupy a region popping out of its local neighborhood.

The extraction of visual saliency from an input image relies on the standard computational method proposed by Itti et al. [10, 8], which has been distinguished for its effectiveness and plausible outcomes in analyzing gaze behaviors. The key idea of their algorithm is to find salient regions by subtracting pairs of images from each other, spatially convolved with different kernel sizes; they further accelerate this procedure using an image pyramid. The image averaged with a smaller blur kernel preserves finer structures than that with a larger blur kernel. Therefore, the difference between them effectively captures spatially salient areas that differ from their local neighborhood, which simulates the biological process in the human visual system. Such an image difference is called the center-surround difference. We first describe their basic algorithm in more detail, and then describe our extensions concerning the use of the CIE L∗a∗b∗ color space, temporal saliency, and binary thresholding. See also Fig. 2 for the whole pipeline of our algorithm.

The basic algorithm of Itti et al. is as follows. Given an input RGB image, visual feature maps (e.g., color, luminance, and orientation) are first extracted. The image pyramid of each feature map is built by successively downsampling an image to 1/4 the size of its predecessor until reaching the coarsest image of 1×1. For example, for an image with a resolution of 2^N × 2^N, the levels of its image pyramid range from 0 (the finest image) to N (the coarsest image). For each image pyramid, six pairs of center (c) and surround (s) levels are defined; we used the common configuration from the previous studies, c ∈ {2, 3, 4} and s = c + δ, δ ∈ {3, 4} [8]. For all the center-surround pairs, cross-scale image differences (i.e., center-surround differences) are computed (we call the results contrast maps). Computationally, the center-surround difference is realized by upsampling a coarser surround image to the resolution of the finer center image and subtracting one from the other. Finally, the contrast maps are linearly combined to yield a single topographical saliency map. Further details can be found in [8, 11]. We note that the parameters used here are commonly accepted when using a visual saliency map, and their choice is decoupled from a particular tactile rendering algorithm and hardware. Since optimal parameters to further enhance tactile sensation are unknown, finding such parameters would be an interesting direction for future research.
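As an illustration only, the following sketch computes the six contrast maps for a single-channel, float-valued feature map under the configuration above; the pyramid depth and interpolation choices are assumptions, not the authors' exact implementation.

```python
# Sketch of the center-surround differences of Itti et al. for one feature map.
# Assumes a single-channel float32 feature map large enough for 9 pyramid levels.
import cv2
import numpy as np


def gaussian_pyramid(feature: np.ndarray, levels: int = 9) -> list:
    """Successively downsample the feature map; each level has ~1/4 the area."""
    pyramid = [feature]
    for _ in range(levels - 1):
        pyramid.append(cv2.pyrDown(pyramid[-1]))
    return pyramid


def contrast_maps(feature: np.ndarray) -> list:
    """Six cross-scale differences: c in {2,3,4}, s = c + delta, delta in {3,4}."""
    pyramid = gaussian_pyramid(feature)
    maps = []
    for c in (2, 3, 4):
        for delta in (3, 4):
            s = c + delta
            # Upsample the coarser surround level to the center level's resolution.
            surround = cv2.resize(pyramid[s],
                                  (pyramid[c].shape[1], pyramid[c].shape[0]),
                                  interpolation=cv2.INTER_LINEAR)
            maps.append(np.abs(pyramid[c] - surround))
    return maps
```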

We improve the definition of visual features (e.g., color, luminance, and orientation) using the CIE L∗a∗b∗ color space (in short, Lab space), a widely known perceptual color space [2]. One common challenge in using the saliency map is to find appropriate weights for the linear combination of different contrast maps.


[Figure 2 diagram: the source movie goes through feature extraction (RGB to CIE Lab); parallel spatial mapping branches perform Gaussian pyramid decomposition, center-surround differences, and normalization to build spatial contrast maps and spatial saliency maps; temporal mapping applies temporal center-surround differences to build temporal contrast maps and the temporal saliency map, followed by binary thresholding; tactile mapping applies Gaussian pre-filtering and downsampling to generate the output tactile map.]

Fig. 2: Overall pipeline of the visuotactile mapping algorithm.

One could use unit weights, but there is no guarantee that this is an optimal selection. To cope with this, we use the Lab color space, wherein the Euclidean distance between points roughly scales with their perceived color difference. Instead of multiple feature maps, we only define a single Lab image as a feature map. This allows us to efficiently evaluate the perceptual color difference in a single step without the linear combination issue.
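A sketch of how such a single Lab feature map might be formed and compared is given below; the exact distance formulation is our assumption, and OpenCV's Lab conversion is used for illustration.

```python
# Sketch: one Lab feature map per frame; the center-surround "difference" is the
# per-pixel Euclidean distance in Lab space, so no per-feature weights are needed.
import cv2
import numpy as np


def lab_feature(frame_bgr: np.ndarray) -> np.ndarray:
    """Convert an 8-bit BGR frame to a floating-point CIE Lab image."""
    return cv2.cvtColor(frame_bgr.astype(np.float32) / 255.0, cv2.COLOR_BGR2Lab)


def lab_contrast(center: np.ndarray, surround: np.ndarray) -> np.ndarray:
    """Perceptual color difference between two Lab images of equal resolution."""
    return np.linalg.norm(center - surround, axis=2)
```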

Also, we augment the previous definition of visual saliency along the temporal dimension. Since the saliency map was initially designed for static image analysis, it only deals with the spatial dimension. Therefore, directly applying it to streaming images with dynamic scenes may not be appropriate. For instance, salient yet static objects may be less significant than dynamic objects in a scene. Hence, in order to preclude such static spots and track dynamic motions, we also use temporal saliency. The principle is the same as that of spatial saliency, but the temporal image pyramids are built along the time axis. In other words, a level-n image averages the spatial saliency maps of the 2^n previous frames including the current frame. We note that the temporal image pyramids have the same image size as the spatial saliency map and are built on the fly. The final temporal saliency map is computed using the temporal center-surround differences, except that the temporal center levels {0, 1, 2} were used instead of {2, 3, 4}.
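The following sketch shows one way to maintain such temporal averages with a ring buffer of past spatial saliency maps; carrying over the surround offsets δ ∈ {3, 4} from the spatial case is an assumption.

```python
# Sketch of the temporal saliency map: level n averages the last 2**n spatial
# saliency maps; temporal contrast maps use center levels {0, 1, 2}.
from collections import deque

import numpy as np

CENTERS = (0, 1, 2)
DELTAS = (3, 4)                                  # assumed, as in the spatial case
MAX_LEVEL = max(CENTERS) + max(DELTAS)           # deepest temporal level needed
history = deque(maxlen=2 ** MAX_LEVEL)           # ring buffer of recent spatial maps


def temporal_saliency(spatial_map: np.ndarray) -> np.ndarray:
    """Push the current spatial saliency map and return the temporal saliency map."""
    history.append(spatial_map.astype(np.float32))
    frames = list(history)

    def level_average(n: int) -> np.ndarray:
        return np.mean(frames[-(2 ** n):], axis=0)   # average of the 2**n newest maps

    temporal = np.zeros_like(frames[0])
    for c in CENTERS:
        for d in DELTAS:
            temporal += np.abs(level_average(c) - level_average(c + d))
    return temporal
```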

The resulting spatiotemporal saliency map was globally scaled using the nonlinear normalization operator of [8], which uses the ratio between the mean of the local maxima and the global maximum. While the fine details of a saliency map help the gaze analysis of a static image, they often inhibit the maximum saliency response in an image. To draw more focus on the most salient spots, it is more effective to discard excessive details. Thus, we applied binary thresholding under a certain cutoff value (in our case, 50 percent of the maximum level) [16].
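One plausible reading of this normalization and thresholding step is sketched below; the local-maximum neighborhood size and the exact form of the operator are our assumptions rather than the paper's specification.

```python
# Sketch: promote maps with a few strong peaks, then keep only spots above
# 50% of the map's maximum (binary thresholding).
import numpy as np
from scipy.ndimage import maximum_filter


def normalize_and_threshold(saliency: np.ndarray, cutoff: float = 0.5) -> np.ndarray:
    s = saliency / (saliency.max() + 1e-8)                      # scale to [0, 1]
    peaks = s[(maximum_filter(s, size=15) == s) & (s > 0)]      # local maxima (size assumed)
    m_bar = peaks[peaks < s.max()].mean() if peaks.size > 1 else 0.0
    s = s * (s.max() - m_bar) ** 2                              # nonlinear promotion, as in Itti et al.
    return (s >= cutoff * s.max()).astype(np.float32)           # binary map of the most salient spots
```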

The last step relates the final saliency map to tactile cues that actuate the tactors. To abstract the tactile display hardware, a tactile map, such as a 3×3 array, is used. The resolution of the saliency map is usually much higher than that of the tactor array, and hence, we need to define the mapping between the saliency map and the tactile map. In our implementation, we used simple linear downsampling with Gaussian prefiltering, leaving room for better mappings that consider scene semantics and the user's volitional factors.
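In code, that mapping can be as small as the following; the Gaussian kernel size and the 3×3 grid are assumed defaults.

```python
# Sketch: Gaussian pre-filtering followed by linear downsampling to the
# tactor-array resolution.
import cv2
import numpy as np


def saliency_to_tactile(saliency: np.ndarray, grid=(3, 3)) -> np.ndarray:
    blurred = cv2.GaussianBlur(saliency.astype(np.float32), (31, 31), 0)   # kernel size assumed
    return cv2.resize(blurred, (grid[1], grid[0]), interpolation=cv2.INTER_LINEAR)
```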

At the tactile rendering stage, the intensities of the tactile map are interpreted as vibration levels and actuate the tactors, whose array dimension is the same as that of the tactile map. The actuation is performed continuously, since the tactile map streams along with the source video. The mapping between the tactile map and the commands is straightforward, and thus, the software cost of issuing haptic signals is virtually negligible. However, the mechanical latency is longer than the time for a single video frame, and thus, it needs to be compensated; this is discussed in the next section.

We here report the computational performance of our haptic content generation algorithm. Our system was implemented on an Intel i5 2.66 GHz CPU with the Intel OpenCV library. For most movie clips, up to a resolution of 1000×1000, our system performed at more than 30 FPS, the common requirement for real-time rendering. For HD resolutions such as 1080p, parallel GPU processing can be exploited on demand, as was done in [11].

4 Tactile Rendering

In order to test our saliency-based algorithm for visuotactile mapping, although it is independent of particular hardware platforms, we built a test platform for tactile display. The display is designed to provide vibrotactile stimuli to the lower back of a user sitting on a chair. Coin-type eccentric-mass vibration motors, one of the most popular and inexpensive actuators, are installed on a chair in a 3×3 array. Each tactor is 10 cm apart from its neighboring tactors and independently connected to a customized control circuit. The maximum voltage to actuate the tactors is 3 V, which can provide a vibration intensity of up to 49 G at a 77-Hz vibration frequency. When tactors are fastened on a solid chair, a tactor's vibration may propagate to neighboring tactors and the perceived stimulus may be dislocated. To alleviate this problem, the tactors are installed in tactor housings of a chair cushion cover made of thin nylon, and the cover is used to wrap the sponge fabric of the chair cushion (see Fig. 3).

The procedure of tactile rendering is as follows. The tactile map for rendering reads the tactile buffers, which store the translated results of the video thread. Since the visual and tactile renderings run in different threads with different refresh rates, the latency of the vibrotactile actuators requires suitable compensation to avoid confusion in visuotactile presentation.


Fig. 3: Our test platform for tactile display.

[Figure 4 plots: raw and Hilbert-transformed displacement (×10⁻³ m) versus time (s), showing the measured signal, its envelope, the 90% boundary, and the rising time.]

Fig. 4: Example of a vibration intensity response for the definition of rising time.

The lag is mainly caused by the mechanical delay of triggering vibration on the tactors. When a delay falls short of the duration required to play a single visual frame (e.g., 33 ms), the latency is in an acceptable range in practice and does not need to be compensated.

A simple remedy for the latency problem is pre-issuing tactile commands a few frames earlier than the corresponding visual signals. To do so, we measured the latency values for a number of starting and target vibration voltages. When a target voltage was smaller than a starting voltage, i.e., when the motor was decelerated, we observed that the maximum latencies were in the acceptable range and thus decided not to consider those falling times further. However, the acceleration required when the target voltage is larger than the starting voltage showed a noticeably longer delay. Hence, we defined the rising time of actuation as the time period required to reach 90 percent of the steady-state vibration level, as shown in Fig. 4, and compensated for it in tactile rendering.

The vibration intensity was noninvasively measured by observing the tactor's vibration displacement with a laser vibrometer (SICK, model AOD5-N1). During the measurements, the vibration motor was fixed on a flat sponge 30 mm away from the vibrometer. Starting voltages and offsets to the target voltages were sampled in the range from 0 to 3.0 V with a step size of 0.1 V. The recorded data were fed to the Hilbert transform to reconstruct their signal envelope for accurate amplitude estimation (Fig. 4).
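The envelope-and-rising-time computation can be illustrated as below on a synthetic signal; the sampling rate and the toy displacement are assumptions, not the measured data.

```python
# Sketch: Hilbert-transform envelope of a (synthetic) 77-Hz vibration and the
# rising time to 90% of the steady-state level.
import numpy as np
from scipy.signal import hilbert

fs = 10_000                                       # assumed sampling rate (Hz)
t = np.arange(0, 0.3, 1 / fs)
displacement = (1 - np.exp(-t / 0.05)) * np.sin(2 * np.pi * 77 * t)   # toy signal

envelope = np.abs(hilbert(displacement))          # instantaneous amplitude
steady = envelope[int(0.25 * fs):].mean()         # steady-state vibration level
rising_time = t[np.argmax(envelope >= 0.9 * steady)]
print(f"rising time to 90% of steady state: {rising_time * 1000:.1f} ms")
```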

Page 8: Saliency-Driven Tactile E ect Authoring for Real-Time ...cg.skku.edu/pub/papers/2012-kim-eh-hsalc-cam.pdftactile rendering, the tactile bu ers are read into a tactile map at a lower

8 M. Kim, S. Lee, and S. Choi

Fig. 5: Example of vibration displacement measured using a laser vibrometer.

For rendering purposes, a function in parametric form is more convenient than interpolating the latency data in real time. Thus, we regressed the rising time data to a quadratic form (R² = 0.98):

fr(Vs, Vd) = 241.9 − 175.7 Vd − 117.7 Vs + 38.86 Vd² + 49.59 Vd Vs + 18.07 Vs²,   (1)

where fr(Vs, Vd) is the estimated rising time in milliseconds, and Vs and Vd are the starting voltage and the voltage offset, respectively. In practice, we do not have to consider all target voltages. Weak voltage commands below a certain threshold (e.g., 0.5 V) can be discarded. By excluding such inputs, we have a set of rising times of 150 ms or less, so the maximum resolution of tactile cues corresponds to five video frames. Accordingly, a single data block for tactile signals is filled out of five video frames. For instance, when 60 ms is found as the rising time for the next tactile data block, the first three frames of the data block are written with the previous input voltage and the remaining two frames are filled with the target voltage of the next data block. A concern about force discontinuity might arise here because discrete force commands are issued within the tactile data block. However, it does not manifest itself, since the low-bandwidth dynamics of the actuator is likely to interpolate the abrupt changes of the vibration intensity. Hence, this simple strategy can effectively compensate for the motor delay of vibrotactile rendering to synchronize the visual and tactile stimulations, while avoiding torque discontinuity.
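The pre-issuing logic can be sketched as follows; the per-frame block filling and the clamping are our reading of the description above, and the millisecond unit of Eq. (1) is inferred from the reported values.

```python
# Sketch of latency compensation: compute the rising time from Eq. (1) and
# pre-issue the next block's target voltage during the last frames of the
# current five-frame tactile data block.
FRAME_MS = 1000.0 / 30.0          # duration of one video frame (30 FPS)
BLOCK_FRAMES = 5                  # a tactile data block spans five video frames


def rising_time_ms(v_start: float, v_offset: float) -> float:
    """Regressed rising time fr(Vs, Vd) from Eq. (1)."""
    vs, vd = v_start, v_offset
    return (241.9 - 175.7 * vd - 117.7 * vs
            + 38.86 * vd ** 2 + 49.59 * vd * vs + 18.07 * vs ** 2)


def fill_data_block(v_current: float, v_next: float) -> list:
    """Per-frame voltages for the current block, pre-issuing the next target."""
    if v_next <= v_current:       # deceleration: latency already acceptable, no pre-issue
        lead = 0
    else:
        lead = round(rising_time_ms(v_current, v_next - v_current) / FRAME_MS)
        lead = max(0, min(BLOCK_FRAMES, lead))
    return [v_current] * (BLOCK_FRAMES - lead) + [v_next] * lead


# Example: from 1.0 V to 2.0 V the regression gives ~55 ms, i.e. two frames of
# lead, so the block is [1.0, 1.0, 1.0, 2.0, 2.0].
print(fill_data_block(1.0, 2.0))
```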

5 Evaluation

We conducted a user experiment to assess the usability of our system. Six usability items comparing visual-only and visuotactile presentations for different types of movies were collected via a questionnaire. This section reports the methods used in our evaluation and the experimental results.

Twelve paid undergraduate students (6 males and 6 females; 19–30 years old, average 22.3) participated in the experiment. The participants were asked to wear a thin T-shirt and lean back on the chair to feel the vibration directly. They also wore earplugs and a headset to isolate them from the tactor noise.


(a) Static motion (b) Dynamic motion (c) Real-world example

Fig. 6: Examples of (a) a static-motion movie (two balls appear at various locations), (b) a dynamic-motion movie (a single ball moves at various speeds), and (c) a real-world movie (a documentary film of two bears in a zoo).

The experiment used a two-factor within-subject design. The first factor was the presence of tactile cues while playing a movie. The other factor was the type of movie presented. Three movies, two synthetic and one natural, were used in the experiment (see Fig. 6). A video resolution of 1000×1000 at 30 Hz was used for all of them. One synthetic movie showed static motions in which one or two balls repeatedly appeared at various locations and remained at the same position for more than one second. The other synthetic movie contained dynamic motions with a ball moving around at various speeds with sudden stalling motions. The last one was a real-world movie showing bears in a zoo. By combination, each participant went through a total of six successive experimental sessions. Their presentation order was balanced using Latin squares.

After each session, a break longer than two minutes was provided for the participant to fill out the questionnaire. The questionnaire consisted of six questions. Four questions were common to all the sessions, and the other two were tactile-specific questions given only in the conditions where tactile cues were presented. One additional survey asking for a free evaluation of the overall system followed. The common questions were: Q1. How interesting did you find the system? Q2. How much did you like the whole system? Q3. How much were you immersed in the movie with the given system? Q4. How well did you understand the contents of the movie? The tactile-specific questions were: QA. How well were the vibrations matched with the movie? QB. How much did the vibrations improve the immersion into the movie? Each question except the survey used a 100-point continuous scale, where 100 represents the strongest agreement with the question, 50 a neutral opinion, and 1 the strongest disagreement.

The subjective ratings of the participants obtained in the experiment are plotted with standard errors in Fig. 7. Overall, the presence of tactile stimulation elicited much more positive responses. The participants preferred the tactile-enabled movies to the original movies. Furthermore, the tactile cues significantly enhanced immersion and content delivery as well.


Fig. 7: Average subjective ratings measured in the user experiment.

We applied ANOVA to test the statistical significance of these differences between the visuotactile and visual-only presentations. Statistical significance was found for all four common questions; all the p values were less than 0.001 and their F-statistics were F1,11 = 39.95, F1,11 = 33.05, F1,11 = 30.87, and F1,11 = 21.79, respectively. In contrast, none of the responses showed statistical significance for the factor of movie type. The tactile-specific question QA examined the system's performance in motion extraction; as expected, the static-motion movie was evaluated as the best matched (mean score 89, SE = 2.68), while the real-world movie had the weakest performance (mean score 60, SE = 5.48). The question on the absolute measurement of immersion, QB, showed that the real-world movie was the least favored of the three movies.

6 Discussion

The user study showed that the synchronized tactile display system improved the multimedia presentation; the participants were more engaged and immersed in the movie experience, and the tactile cues were effective in improving content delivery. The positive responses indicated that the tactile cues were translated relatively well and emphasized the motions of salient spots in the images.

More qualitative evaluation of the overall system was also collected via the additional survey. The participants found the system innovative and new, and the tactile cues fairly well generated according to the salient spots of the movies. On the other hand, some of them found the system unusual or difficult to follow when they first tried it out. One suggestion for revising the rendering algorithm was to limit the duration of tactile cues, which would make them more memorable and avoid excessive tactile events as well.

As seen in the experimental results, the quality of translation in creating tactile cues is the key to conveying better experiences. The simple configurations of the synthetic movies were translated almost perfectly and preferred in terms of immersion with higher ratings, while the real-world movie was evaluated as rather unorganized.

The visuotactile translation can be enhanced by tweaking our system. For instance, the simple linear mapping between the visual and tactile viewports can be dynamically adapted. Cinematography commonly places salient objects in the middle of the scene to draw attention. However, this may cause tactile effects to occur in the same location for a long period and generate an unpleasant, concentrated feeling for the user. In addition, the periphery of the screen is often of less importance, and small movements in the center often need to be exaggerated. To account for this, the region of interest can be dynamically widened and narrowed according to the present context, and the provision of tactile cues can be limited to short periods, solely for shots that are important in the scene semantics.

On the other hand, excessive or incorrect translation is a natural consequence of our system because we do not hold the high-level semantics of the objects present in the scene. Without such scene semantics, the translation is hardly perfect. One good way to address this problem is combining automated processing with manual authoring in a postprocessing step. In the automated step, we aim at generating rather excessive tactile cues without filtering. Given the tactile cues automatically generated from the visual information, a tactile content designer can prune redundant cues to provide more focused feedback or add semantics missed during the translation. Diverse tastes from different cultures and preferences can also be incorporated in this postprocessing stage. We envision that tactile authoring will be automated in this fashion to provide sophisticated tactile cues.

7 Conclusion

We presented an automated framework of tactile effect authoring to provide synchronized visuotactile stimuli in real time. Visual saliency served as the basis for extracting spatiotemporal importance from existing visual media and translating the visual importance into tactile cues. Vibrotactors installed on a chair were used to render tactile effects synchronously, along with compensation of the vibrotactile latency. The user study found that visuotactile rendering was preferred to visual-only presentation, eliciting more immersion and involvement, and a better understanding of the content.

Since our algorithm is independent of particular rendering methods, our approach has special importance for haptic content creators and interaction designers who strive to create online or offline physical content inducing spatially present experiences. Haptic cues extracted automatically from existing media can facilitate the rapid manipulation of a tactile movie in the postproduction stage. In the future, we plan to include more sophisticated support for semi-automatic postproduction.


Acknowledgements. This work was supported in part by a BRL program (2011-0027953) from NRF and by an ITRC program (NIPA-2012-H0301-12-3005), both funded by the Korean government.

References

1. Cha, J., Kim, S.Y., Ho, Y.S., Ryu, J.: 3D Video Player System with Haptic Interaction based on Depth Image-Based Representation. IEEE Trans. Consumer Electronics 52(2), 477–484 (2006)

2. CIE: Recommendations on uniform colour spaces, colour difference equations, psychometric colour terms. Supplement No. 2 to CIE Publication No. 15 (E.-1.3.1) 1971/(TC-1.3.) (1978)

3. Collins, C.: Tactile Television - Mechanical and Electrical Image Projection. IEEE Trans. Man-Machine Systems 11(1), 65–71 (1970)

4. Connor, C.E., Egeth, H.E., Yantis, S.: Visual attention: Bottom-up versus top-down. Current Biology 14(19), R850–R852 (2004)

5. Desimone, R., Duncan, J.: Neural mechanisms of selective visual attention. Annual Review of Neuroscience 18(1), 193–222 (1995)

6. Hamed, H.M., Hema, R.A.: The Application of the 4-D Movie Theater System in Egyptian Museums for Enhancing Cultural Tourism. Journal of Tourism 10(1), 37–53 (2009)

7. Hoffman, D., Singh, M.: Salience of visual parts. Cognition 63(1), 29–78 (1997)
8. Itti, L., Koch, C., Niebur, E.: A model of saliency-based visual attention for rapid scene analysis. IEEE Trans. Pattern Analysis and Machine Intelligence 20(11), 1254–1259 (1998)

9. Kim, Y., Cha, J., Ryu, J., Oakley, I.: A Tactile Glove Design and Authoring System for Immersive Multimedia. IEEE Multimedia 17(3), 34–45 (2010)

10. Koch, C., Ullman, S.: Shifts in selective visual attention: towards the underlying neural circuitry. Human Neurobiology 4(4), 219–227 (1985)

11. Lee, S., Kim, G.J., Choi, S.: Real-Time Tracking of Visually Attended Objects in Virtual Environments and Its Application to LOD. IEEE Trans. Visualization and Computer Graphics 15(1), 6–19 (2009)

12. Lemmens, P., Crompvoets, F., Brokken, D., van den Eerenbeemd, J., De Vries, G.: A body-conforming tactile jacket to enrich movie viewing. In: Third Joint EuroHaptics Conference and Symposium on Haptic Interfaces for Virtual Environment and Teleoperator Systems, pp. 7–12 (2009)

13. Oh, E., Lee, M., Lee, S.: How 4D Effects cause different types of Presence experience? In: Proc. the 10th International Conference on Virtual Reality Continuum and Its Applications in Industry, pp. 375–378 (2011)

14. Parkhurst, D., Law, K., Niebur, E.: Modeling the role of salience in the allocation of overt visual attention. Vision Research 42(1), 107–123 (2002)

15. Treisman, A.M., Gelade, G.: A Feature-Integration Theory of Attention. Cognitive Psychology 12, 97–136 (1980)

16. Walther, D., Itti, L., Riesenhuber, M., Poggio, T., Koch, C.: Attentional selection for object recognition - a gentle way. In: Biologically Motivated Computer Vision, pp. 251–267 (2002)