
Department of Science and Technology Institutionen för teknik och naturvetenskap Linköping University Linköpings universitet

SE-601 74 Norrköping, Sweden

LIU-ITN-TEK-A-18/047--SE

Gaze-driven interaction in video games

Mohamed Al-Sader

2018-10-19


LIU-ITN-TEK-A-18/047--SE

Gaze-driven interaction in video games

Degree project carried out in Media Technology at the Institute of Technology at Linköping University

Mohamed Al-Sader

Supervisor: Jimmy Johansson
Examiner: Stefan Gustavson

Norrköping 2018-10-19


Copyright

The publishers will keep this document online on the Internet - or its possible replacement - for a considerable time from the date of publication barring exceptional circumstances.

The online availability of the document implies a permanent permission for anyone to read, to download, to print out single copies for your own use and to use it unchanged for any non-commercial research and educational purpose. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional on the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility.

According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement.

For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its WWW home page: http://www.ep.liu.se/

© Mohamed Al-Sader


Linköping University, SE-601 74 Norrköping, Sweden

+46 11 36 30 00, www.liu.se

Linköping University | Department of Science and Technology

Master thesis, 30 ECTS | Media Technology

2018 | LIU-ITN/LITH-EX-A--2018/xxx--SE

Gaze-driven interaction in video games
This thesis is presented for the degree of Master of Science of Linköping University

Mohamed Al-Sader

Supervisor: Jimmy Johansson
Examiner: Stefan Gustavson



Abstract

The introduction of input devices with natural user interfaces in gaming hardware has changed the way we interact with games. Hardware with motion-sensing and gesture-recognizing capabilities removes the constraint of interacting with games through typical traditional devices like mouse-keyboard and gamepads. This changes the way we approach games and how the game communicates back to us as players, opening new levels of interactivity.

In this thesis we examine how eye tracker technology can be used in games. Eye tracking technology has previously been in extensive use within areas of support, aiding people with disabilities. It has also been used in marketing and usability testing. To date, the use of eye tracking technology within games has been very limited. This thesis will cover how to integrate Tobii's eye tracker in EA DICE's Frostbite 3 game engine and how to improve the gaze accuracy of the device through filtering. It will also cover the use of eye tracking technology in rendering methods. In addition, we will study how eye tracker technology can be used to simulate the human visual system when changing our focus point and when we adapt to new luminance conditions in the scene. This simulation will be integrated with the implementation of depth of field and tone mapping in Frostbite 3.

Keywords: gaze direction, gaze-dependent DOF, gaze-dependent tone mapping, depth of field, eye tracker, gaze precision, gaze accuracy, measurement noise.


Acknowledgments

I would like to thank my supervisor Torbjörn Söderman, Technical Director at EA DICE, for the guidance and valuable feedback he provided me. I would like to thank the developers from the Frostbite and Battlefield 4 teams for the support and input they gave me: Mattias Unger for giving me input on the filter implementation, Toby Vockrodt who helped me integrate my filter implementations with the entity system, Charles de Rousiers who helped me understand the depth of field implementation in Frostbite 3 and Sébastien Hillaire who gave me very valuable feedback on my implementation of gaze-dependent depth of field, which is based on his work. In particular I would like to thank Tobias Bexelius for his valuable advice, encouragement and help in solving problems in all of my work throughout my time at EA DICE. Next I would like to thank the staff at Tobii Technology for their support: Dzenan Dzemidzic and Fredrik Lindh who familiarized me with the eye tracker, and Tobias Lindgren and Anders Olsson for supporting me with any questions and problems I had with the hardware. I would also like to thank my examiner Stefan Gustavson at Linköping University for the patient guidance and support he provided me and for making this project possible together with EA DICE. Finally I would like to thank my family and friends for proofreading and for their patience and endless encouragement.


Contents

Abstract iv

Acknowledgments v

Contents vi

List of Figures viii

List of Tables x

1 Introduction 1

1.1 Motivation 1
1.2 Aim 1
1.3 EA DICE 2
1.4 Tobii Technology 2
1.4.1 Tobii Eye Tracker 2
1.4.2 How eye tracking works - Pupil Centre Corneal Reflection 2

2 Background and Related Work 4

2.1 Eye Tracker Filtering 4
2.1.1 Gaze accuracy 5
2.1.2 Gaze precision 6
2.2 Gaze-dependent Depth Of Field 6
2.3 Gaze-dependent Tone Mapping 9

3 Implementation and results 11

3.1 Eye Tracker integration in Frostbite 3 11
3.2 Eye Tracker Filtering 13
3.2.1 Average filter 14
3.2.2 Moving Average Filter 16
3.2.3 First-order Low-pass filter 18
3.2.3.1 Time constant against Cutoff Frequency 22
3.2.4 Online Cursor Filter 24
3.2.4.1 False Alarms 28
3.2.5 Spatial Filter 30
3.3 Gaze-dependent Depth of Field 32
3.3.1 Simulation of accommodation phenomenon 39
3.4 Gaze-dependent tone mapping 40
3.4.1 Scene dependency 45

4 Discussion 46

4.1 Eye tracker filters 46
4.2 Gaze-dependent Depth of Field 47
4.3 Gaze-dependent tone mapping 48


5 Conclusion 50

5.1 Future work and improvements 50
5.1.1 Foveated rendering 50
5.1.2 Maladaptation with gaze-dependent tone mapping 51
5.1.3 Gaze interaction with game logic 51

References 52


List of Figures

1 Pupil Centre Corneal Reflection (PCCR) remote eye tracking technique where an image of the reflections on the cornea and pupil, created from the illumination of a light source, is captured and then used to calculate the gaze direction. Image courtesy to [1]. 3
2 Gaze precision and accuracy metrics from the performance of an eye tracker. The black circle illustrates the real gaze point of the client while the red x marks are the measured gaze points. The bottom and upper left examples show less dispersion between each gaze sample and thus good precision compared to the bottom and upper right examples. For the accuracy the bottom left and upper right examples show that samples are at a much smaller angular distance from the real gaze point giving better accuracy. Image courtesy to [2]. 4
3 The dashed line is the measured gaze point and the solid line is the actual gaze point. The distance between them shows the gaze accuracy. 6
4 Thin lens camera model. Image courtesy to [17]. 7
5 Pinhole camera model. Image courtesy to [17]. 8
6 The Zone system maps scene zones to print zones with the middle brightness of the scene mapped to the middle print zone. Image courtesy to [23]. 9
7 HDRI technique showing pictures of a scene taken at different stops. Image courtesy to [27]. 10
8 Flow chart of eye tracker integration in Frostbite 3. 13
9 Signal with measurement noise. 13
10 Signal filtered by average filter. 15
11 Signal filtered by average filter (zoomed). 16
12 Block Diagram of Moving Average Filter. 18
13 First-order low-pass filter. The square represents the cutoff frequency. 18
14 Butterworth First-order low-pass filter. The square represents the cutoff frequency. 19
15 Backward Euler Differentiation Approximation. Image courtesy to [34]. 20
16 Signal filtered by First-order Low-pass filter. 21
17 Signal filtered by First-order Low-pass filter (zoomed). 21
18 Time constant τ against cutoff frequency fc. 22
19 First-order Low-pass filter applied to a signal with low and high cutoff frequencies. 23
20 First-order Low-pass filter applied to a signal with low and high cutoff frequencies (zoomed). 23
21 Block diagram of Online Cursor Filter. Image courtesy to [11]. 26
22 Signal filtered by Online Cursor Filter. 26
23 Signal filtered by Online Cursor Filter (zoomed). 27
24 Edge of signal filtered by Online Cursor Filter (zoomed). 27
25 False Alarms in Online Cursor Filter. 29
26 False Alarms in Online Cursor Filter (zoomed). Left: Arrow 1, Middle: Arrow 2, Right: Arrow 3. 29
27 Online Cursor Filter with large threshold giving slow response when steady-state of the signal changes. 30


28 Hyperbolic Tangent Function. The blue line shows the standard function while the red line shows the function scaled. 31
29 Signal filtered by Spatial filter. 32
30 Signal filtered by Spatial filter (zoomed). 32
31 Circle of Confusion. Image courtesy to [19]. 33
32 Flow chart of gaze-dependent DOF in Frostbite 3. 34
33 Autofocus system represented by a focus zone centered on a filtered gaze point. 34
34 Gaussian Function. 35
35 Focal point calculated using spatial weighting of the focus zone. 36
36 Focal point calculated as average depth of the focus zone. 37
37 Gaze-dependent DOF. 38
38 Gaze-dependent DOF. 38
39 Focal distance calculated with accommodation effect. 39
40 Focal distance calculated with accommodation effect (zoomed). 40
41 Flow chart of gaze dependent tone mapping in Frostbite 3. 41
42 The tone mapping operator (a) [32] uses the constraint from Equation 37 while (b) [37] does not use it. We see that (b) has poor quality from pure black regions when gazing at a high luminance region (the sun). In (a) we get better quality since we still see details in lower luminance regions similar to the global method in (c) [23]. Image courtesy to [32]. 42
43 Without the constraint in step 3 the tone mapping operator gives large values for the log-average luminance in high luminance regions of the scene as seen in the red curve. Using the constraint gives the tone mapping operator a logarithmic behavior similar to the global method. Image courtesy to [32]. 42
44 Gaze-dependent tone mapping when gazing at regions with low luminance. In this scene we see details in low luminance regions clearly while high luminance regions are overexposed. 43
45 Gaze-dependent tone mapping when gazing at high luminance regions. We see details in the high luminance regions while low luminance regions becomes darker. 44
46 Gaze-dependent tone mapping with the constraint in [32] being used. This method preserves more details in the regions with low luminance while gazing at regions with high luminance. 44
47 Gaze-dependent tone mapping without the constraint used. The luminance in dark regions becomes lower. 45
48 Spatial Filter and Online Cursor Filter. 47
49 Spatial Filter and Online Cursor Filter (zoomed). 47
50 Gaze accuracy and gaze-dependent DOF. 48


List of Tables

1 Abstract interface class for eye trackers. 11
2 Interface for Tobii Eye Trackers. 12


1 Introduction

1.1 Motivation

Input commands in video games traditionally come from three different devices: mouse, keyboard and gamepad. Games played on a computer use a mouse-keyboard combination while home game systems generally use a gamepad. This has been the preferred format for the current generation of home game systems and computers, and it was the preferred format for decades of previous generations. In the past decade this restriction has gradually been removed from both consoles and computers with the introduction of devices using a natural user interface: voice and gesture commands (Kinect), motion commands (Wii Remote and PS Move) and touch commands (Wii U gamepad and tablets). One of the most natural input channels of the human body is the eyes. With an eye tracker we get a natural user interface that communicates between us, the client, and the software by tracking the gaze of our eyes. By integrating this technology in video games, new ways of interacting with games become possible. For example, rendering effects or game logic that in a real environment would depend on the eyes could use an eye tracker to receive feedback from the eyes and create a more immersive experience. The aim of a player-controlled character, AI behavior and interaction could all be driven by analyzing the gaze of the player.

1.2 Aim

The aim of this thesis is to research how Tobii's eye tracker technology can be used to interact with games. The thesis will cover integration of the Tobii X2-30 eye tracker in Frostbite 3, filtering of gaze data received from the eye tracker and finally integration in some of the rendering techniques in Frostbite 3. The report is structured in the following way:

• Chapter 1 gives an introduction to eye trackers and eye tracker technology. It also gives a short introduction to EA DICE and Tobii Technology, who collaborated in this thesis.

• Chapter 2 provides background and related work on the filtering and on the integration of the eye tracker with the rendering techniques in Frostbite 3.

• Chapter 3 demonstrates the implementations and results of the eye tracker integration, the filtering of input data from the eye tracker and the integration with rendering techniques in Frostbite 3.

• Chapter 4 includes a discussion of the results obtained throughout this thesis: advantages and disadvantages of the filtering of input data and the effect of an eye tracker on the rendering techniques.

• Chapter 5 concludes the report and provides thoughts on future improvements.


1.3 EA DICE

EA DICE is a Swedish video game developer. Founded in 1992 and based in Stockholm, EA DICE is today well-established within the game industry with critically acclaimed video game series such as Battlefield and Mirror's Edge. The company is also the developer of the Frostbite game engine which today is used by multiple EA studios.

1.4 Tobii Technology

Tobii Technology is a Swedish hi-tech company specializing in development of hardware and software for eye tracking. Founded in 2001 with headquarters in Stockholm, Tobii Technology is a global leader in eye tracker technology with a heavy focus on research and development. With its eye tracking technology involved in a wide range of fields such as usability, automotive, computer games, human behavior, marketing research and neuroscience, Tobii Technology has multiple partnerships with well-established software and hardware vendors around the world. The company is divided into three business units: Tobii Dynavox, Tobii Pro and Tobii Tech. Each unit focuses on different fields where their technology is applicable.

1.4.1 Tobii Eye Tracker

Tobii eye trackers are used in three main fields: assistive communication, human behavior and volume products (computer hardware, automotive and games). Development of eye trackers with assistive technology comes from Tobii Dynavox and focuses on people with speech disabilities. These eye trackers are devices integrated with computers or tablets that offer other assistive technology together with the eye tracking capability, such as the software or touching mechanism. Tobii Pro develops eye trackers used to study human behavior. For example, eye trackers can be used by firms wanting client feedback by studying where the client is gazing in their websites or advertisements. They could be used by researchers in academic institutions to study human behavior and interaction with virtual environments, among other things. Eye trackers developed by Tobii Pro come in various formats: as standalone devices mounted on monitors, as glasses and as screen-based eye trackers. Tobii Tech develops eye trackers used in volume products such as VR, games and automotive. These eye trackers also come in different formats. For gaming, eye trackers are mounted on the computer monitor to track the player's gaze. In automotive applications such as cars, a chip is integrated and then used for tasks such as identifying the driver for personalization of car settings, or for safety purposes such as detection of distraction or drowsiness of the driver.

1.4.2 How eye tracking works - Pupil Centre Corneal Reflection

The pupil center corneal reflection (PCCR) is one of the most common remote eye tracking techniques and also the one Tobii's eye trackers are based on. In PCCR a camera is used to capture an image of the reflections on the cornea and pupils created by the illumination of a light source. The two reflections form an angle from which a direction vector is calculated. This vector is used together with other geometrical features coming from the reflections to retrieve the gaze direction. Tobii's technique works similarly to PCCR: a near-infrared light source in the eye tracker illuminates the eyes to create the reflections on the cornea and pupil. The image sensors capture two images of the reflections and the gaze direction is then calculated by an image processing algorithm [1], see Figure 1.


Figure 1: Pupil Centre Corneal Reflection (PCCR) remote eye tracking technique where an image of the reflections on the cornea and pupil, created from the illumination of a light source, is captured and then used to calculate the gaze direction. Image courtesy to [1].


2 Background and Related Work

2.1 Eye Tracker Filtering

The performance of eye trackers is important as it determines how well we interact with the virtual world of a game. Poor interaction caused by poor gaze data from the eye tracker makes the client feel disconnected from the response of the game, while good gaze data provides new intuitive ways of interacting with the game alongside increased immersion. The performance of eye trackers is defined by three core metrics affecting the gaze data the most: robustness, accuracy and precision [2], see Figure 2. By determining the size of a focus zone centered on the client's gaze point and by filtering gaze data, both gaze accuracy and precision are improved, which gives the client a more polished in-game experience with the eye tracker.

Figure 2: Gaze precision and accuracy metrics from the performance of an eye tracker. The black circle illustrates the real gaze point of the client while the red x marks are the measured gaze points. The bottom and upper left examples show less dispersion between each gaze sample and thus good precision compared to the bottom and upper right examples. For the accuracy the bottom left and upper right examples show that samples are at a much smaller angular distance from the real gaze point giving better accuracy. Image courtesy to [2].


2.1.1 Gaze accuracy

The gaze accuracy is defined as the average angular distance between the gaze point measured by the eye tracker and the real gaze point (see Figure 2), and it is measured in degrees of the visual angle [3,4]. This is illustrated in Figure 3, where the dashed line shows the measured gaze point and the solid line shows the real gaze point. Gaze points measured by the eye tracker rarely correspond to the real gaze points. Head position relative to the eye tracker, changes in gaze direction and environmental interference such as illumination affect the gaze accuracy, and each degree of accuracy is an error from the real gaze point. For instance, the Tobii X2-30 Eye Tracker measured under ideal conditions and at a distance of 60-65 cm from the eye tracker has an average gaze accuracy of 0.4° [3]. Assuming we are sitting at a distance of 65 cm from the eye tracker and we are using a 24" display (1920x1200), we can show that a gaze accuracy of 0.4° corresponds to an error larger than one screen pixel. Let C denote the measured gaze point and F denote the real gaze point in Figure 3. The gaze accuracy is denoted as α, the distance as d and the eye as E. We want to solve for the unknown variable b, which is the average angular distance error. From Figure 3 we have the triangle ECF, giving Equation 1 from which we can solve for b:

$$\tan\left(\frac{\alpha \pi}{180}\right) = \frac{b}{d} \;\Leftrightarrow\; b = d \cdot \tan\left(\frac{\alpha \pi}{180}\right) \qquad (1)$$

With d = 26" (65 cm) and α = 0.4° as input we get an average angular distance error of b ≈ 0.18" (4.57 mm). The pixel density, measured in pixels per inch (PPI), is calculated from Equation 2:

$$\mathrm{PPI} = \frac{d_p}{d_i}, \qquad d_p = \sqrt{w_p^2 + h_p^2} \qquad (2)$$

d_p is the diagonal resolution of the monitor in pixels, w_p and h_p are the width and height of the resolution in pixels and d_i is the diagonal size in inches. With w_p = 1920, h_p = 1200 and d_i = 24" we get PPI ≈ 94.3. Multiplying this by the angular distance error b gives us an error of 94.3 · 0.18 ≈ 17 pixels from the real gaze point. We see that even under ideal conditions the gaze accuracy of the Tobii X2-30 Eye Tracker translates into an error larger than one pixel, which means the client is not necessarily gazing at the measured point. This error is handled in calculations that are heavily influenced by the gaze area by letting all pixels in a focus zone centered on the measured gaze point contribute to the final result. The size of the focus zone is then based on the average angular distance error.
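The chain of calculations above can be collected into a few lines of code. The sketch below only illustrates Equations 1 and 2; the function and parameter names are hypothetical and not taken from the thesis or the Frostbite codebase.

#include <cmath>

// Rough estimate of the on-screen error (in pixels) caused by the angular
// gaze accuracy of an eye tracker, following Equations 1 and 2.
double gazeErrorInPixels(double accuracyDegrees,       // e.g. 0.4 for the Tobii X2-30
                         double viewingDistanceInches, // e.g. 26 (about 65 cm)
                         double widthPx, double heightPx,
                         double diagonalInches)        // e.g. 1920, 1200, 24
{
    const double pi = 3.14159265358979323846;
    // Equation 1: angular error converted to a linear error on the screen plane.
    double errorInches = viewingDistanceInches * std::tan(accuracyDegrees * pi / 180.0);
    // Equation 2: pixel density (PPI) of the display.
    double ppi = std::sqrt(widthPx * widthPx + heightPx * heightPx) / diagonalInches;
    return errorInches * ppi; // roughly 17 pixels for the example values above
}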


Figure 3: The dashed line is the measured gaze point and the solid line is the actual gaze point. The distance between them shows the gaze accuracy.

2.1.2 Gaze precision

When interacting with the virtual world of a game the player will fixate their gaze on points of interest in the world. This makes the precision of the input device very important, because its output must accurately match the player's fixations. Having the eyes fixed on a point when using an eye tracker is referred to as a fixation, and gaze precision defines how accurate the measured fixation is, based on how well the eye tracker measures the same gaze point [5]. However, measurement systems output digitized signals corrupted by noise, resulting in poor precision [6]. In an eye tracker this gives fixations where gaze points are dispersed instead of being focused on the player's point of interest, which yields poor precision (see Figure 2). High frequency measurement noise in an eye tracker comes from different factors such as influence from the eyes and interference from the environment in which the measurement is taking place [7,8]. The noise caused by the eyes comes from different eye movements such as micro saccades, tremors, drifts, blinking, gaze directions in extreme peripheral regions where the precision degrades, and other physiological properties [2,8–10]. System inherent noise comes from interference in the environment. Examples of this could be the illumination affecting the image captured by the image sensors of the eye tracker, imperfections in the algorithm used to estimate the gaze point from the image [11,12] and limitations of the hardware used for the algorithm calculation. Poor precision particularly affects eye trackers used in entertainment mediums such as video games where the input from the player needs to result in an immediate and accurate output on the screen. In this case the output from the game would not correspond to the player's expected input, causing the player and the game to be out of sync. Similar to the gaze accuracy this can lead to various issues such as difficulties in UI interactions or rendering artifacts caused by incorrect gaze data. To improve the precision a low-pass filter is applied to attenuate high frequencies. Choices of low-pass filters could be a first-order low-pass filter, an average filter or dynamic filters such as a spatial filter that adjusts its filter characteristics based on the distance between two gaze points.

2.2 Gaze-dependent Depth Of Field

The Human Visual System (HVS) regularly changes the focus point as the human eye scans the environment. The view in the central part of the eye's visual field is perceived sharply while the view outside of it (the parafoveal and peripheral vision) has less detail and is


perceived as blurry. This is directly related to the foveal vision, in which the fovea gives a sharp vision of the view seen in the central two degrees of the visual field [13–15]. Changing the focus point causes the fovea to be directed in a new direction and the image of the view in the new direction is then projected onto the fovea. Similarly, there is a depth range around the focus point where the projected image is perceived as sharp by the eyes, while anything in front of or behind this range loses detail and is perceived as blurry. The depth range where the image is in focus is known as depth of field (DOF). The distance to this range is known as the focal distance (also known as focus distance).

DOF is similar to cameras, where the optical lens focuses or diverges incoming light. Light rays emitted at the focal distance will be refracted by the optical lens and converge to a single point on the image plane (the film in the camera), giving us a sharp image. Light rays emitted at a distance outside the focal distance will diverge more from each other after the lens refraction and end up intersecting the image plane in a conic-like shape, see Figure 4. This shape is approximated by the Circle of Confusion (CoC) [16–18] whose diameter is proportional to the distance where the light rays are emitted from. If the diameter becomes large enough to make the intersecting shape distinguishable from the smallest point that a human eye can see, it contributes to a blurring effect.
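For reference, the blur-circle diameter can be written down with the standard thin-lens approximation. The sketch below is generic textbook geometry, not the exact formulation used by Frostbite 3, and the parameter names are illustrative.

#include <cmath>

// Diameter of the circle of confusion for a thin lens (standard thin-lens
// approximation). All distances and the aperture are in the same unit.
float circleOfConfusion(float objectDist,  // distance to the point being imaged
                        float focalDist,   // distance the lens is focused at
                        float focalLength, // focal length of the lens
                        float aperture)    // diameter of the lens opening
{
    // The blur grows with the distance between the object and the focal plane.
    return aperture * std::fabs(objectDist - focalDist) / objectDist
                    * focalLength / (focalDist - focalLength);
}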

In computer graphics the default camera model used is a pinhole camera model. This model always gives a sharp image because theoretically it has an infinitely small aperture that only allows one light ray from each point in the scene to pass through [17], see Figure 5. This causes the projected point on the image plane to be infinitely small. However, since real cameras are equipped with lenses and have a finite size on their aperture, thereby allowing multiple light rays from each point in the scene to pass through, they can cause the occurrence of the CoC. Rendering the projected point on the image plane corresponds to a pixel on the screen. For an infinitely small aperture this means we only sample the pixel we are currently processing when determining its color. If we simulate an aperture with a finite size and the CoC is large enough, the neighboring pixels are sampled when determining the color of the current pixel, which results in a blurry effect.

Figure 4: Thin lens camera model. Image courtesy to [17].


Figure 5: Pinhole camera model. Image courtesy to [17].

DOF has been simulated in fields like photography and cinematography for a long time [16] and was introduced in the early age of computer graphics [19]. Today it is a powerful rendering technique in the VFX and games industry, where developers use it to give a deeper immersion. It is also used as a tool to guide the client's focus in the scene being displayed. In games depth of field has traditionally been static; a parameter controlling the focal distance is set and the scene is then blurred accordingly. For some types of games this doesn't impose any gameplay restriction. For example, in FPS games the player focuses almost exclusively on the visor, which is located in the center of the screen [20]. Based on this, developers may simply set the focal distance to be within a range that keeps the center of the screen always in focus. Nevertheless, this contradicts the natural behavior of the HVS and of cameras, where the focal distance changes according to the player's gaze. Gaze-dependent DOF removes this restriction by simulating the behavior of the eyes. In this technique the focal distance is used as a dynamic parameter whose value is determined by the area the player is gazing at. This makes the focal distance always dependent on the player's gaze point. This technique is implemented in two ways:

Focal distance from depth of a pixel: The coordinates of the gaze point are used to retrieve the focal distance from the depth of a pixel [21]. A pixel is sampled from the coordinates, and the depth of the pixel is then sampled from the depth buffer of the rendered scene, transformed to the correct coordinate system and then used as the focal distance when calculating the CoC.

Focal distance from focus zone: A focus zone centered on the filtered gaze point is used to calculate the focal distance [22]. The depth of each pixel in the focus zone is sampled from the depth buffer and stored in a new buffer. The average depth is then calculated from the focus zone depth buffer. This method is further enhanced by associating weights with the pixels. The first approach uses a Gaussian function to calculate the weight of each pixel in the focus zone. The center of the focus zone, which corresponds to the gaze point, has the maximum weight, which gradually decreases for pixels further away from the center. The effect of this is that more importance is given to depths located around the center of the focus zone. The second approach uses semantic weighting where objects in the game have a weight. This creates a priority system where the depth of one object has higher priority than that of another object when both fall within the focus zone.
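As an illustration of the second approach, the sketch below computes a focal distance as a Gaussian-weighted average of the depths inside a square focus zone around the gaze point. It is a simplified CPU-style sketch with hypothetical names (the zone radius, sigma and the linear-depth buffer are assumptions); the Frostbite 3 implementation described in Chapter 3 runs on the GPU and may differ in detail.

#include <cmath>
#include <vector>

// Gaussian-weighted average depth inside a square focus zone centered on the
// (filtered) gaze point. 'depth' is assumed to hold linear view-space depth per pixel,
// and the gaze point itself is assumed to lie on the screen.
float focalDistanceFromFocusZone(const std::vector<float>& depth,
                                 int width, int height,
                                 int gazeX, int gazeY,
                                 int zoneRadius, float sigma)
{
    float weightedDepth = 0.0f;
    float weightSum     = 0.0f;

    for (int y = gazeY - zoneRadius; y <= gazeY + zoneRadius; ++y)
    {
        for (int x = gazeX - zoneRadius; x <= gazeX + zoneRadius; ++x)
        {
            if (x < 0 || y < 0 || x >= width || y >= height)
                continue; // ignore pixels outside the screen

            // Weight falls off with the distance from the zone center (the gaze point).
            float dx = float(x - gazeX);
            float dy = float(y - gazeY);
            float w  = std::exp(-(dx * dx + dy * dy) / (2.0f * sigma * sigma));

            weightedDepth += w * depth[y * width + x];
            weightSum     += w;
        }
    }

    // Degenerate case: fall back to the depth under the gaze point itself.
    return weightSum > 0.0f ? weightedDepth / weightSum
                            : depth[gazeY * width + gazeX];
}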


2.3 Gaze-dependent Tone Mapping

The dynamic range of luminance in the real world is very high, spanning from the light of a star to the light from the sun. The range of luminance between the two spans ten orders of absolute range, and four orders of dynamic range from shadows to highlights within a scene [23]. Mapping the high dynamic range of luminance in the real world to a display format such as print on photographic paper or monitors (CRT, LCD etc.) so that the luminance details are reproduced is challenging due to the significantly lower dynamic range they have. Mapping from high dynamic range to low dynamic range is known as tone mapping, and a well-known tone mapping method for non-digital black-and-white photography is the Zone System [24–26]. This method divides photography into two categories of zones: scene zones and print zones. Scene zones represent an approximation of the luminance range in the scene while print zones represent an approximation of the reflectance of a print. There are eleven print zones ranging from pure black to pure white. In turn there may potentially be many more scene zones because of the much higher dynamic range in the scene. Reproducing the high dynamic range in the print is done by mapping the scene zones to print zones. This is done by first finding the luminance range of the middle brightness of the scene (known as the middle gray) and mapping it to the middle print zone, and then finding the luminance ranges for darker and lighter regions and mapping those to other print zones; see [23–26] for more details and Figure 6 for an illustration of the Zone System. In digital photography there are techniques such as high dynamic range imaging (HDRI) to reproduce the high dynamic range from exposure values known as stops. In this technique several pictures of a scene are captured at different exposure levels, or stops, and then blended together to get the final stop, see Figure 7 for an illustration of this technique.

Figure 6: The Zone system maps scene zones to print zones with the middle brightness of the scene mapped to the middle print zone. Image courtesy to [23].


Figure 7: HDRI technique showing pictures of a scene taken at different stops. Image courtesy to [27].

Many tone mapping algorithms used in computer graphics and digital photography are based on traditional photography and treat all parts of the scene equally. While this gives good results it contradicts the natural behavior of the HVS. The dynamic range of the eye at any given moment is limited to four orders of magnitude, but the eyes have temporal adaptation to "extend" this dynamic range under varying luminance conditions. This is done by moving the detailed vision to the new luminance range in the scene, which makes the eye sensitive to a dynamic range of ten orders overall [23, 28, 29]. For instance, when entering a dark room from a bright room the change in luminance is intense and the temporal adaptation adjusts the eye to the dark environment, allowing it to see more details in the dark. Another example is when stepping into a movie theater whilst a movie is playing. At first the visible details come primarily from the movie display while the theater is dark. After a while the eye has adapted to a new range of luminance that includes darker regions, allowing the observer to see some details of the movie theater itself. The temporal adaptation in the HVS adapts to an area covered by one degree of the viewing angle around the gaze direction of the observer, and areas that fall outside have significantly less impact on the result of the adaptation [30]. Most tone mapping algorithms do not simulate the temporal adaptation of the HVS [29], opting instead for a global approach that treats the luminance of the whole scene equally. In addition the global approach influences the image quality, as it has been shown that the areas being gazed at highly influence the quality of the final image [31]. In gaze-dependent tone mapping a global tone mapping approach [23] can be used to derive a local gaze-dependent approach by using the pixels of an HDR image in a focus zone corresponding to the gaze area instead of the entire image [32]. This is a closer simulation of the temporal adaptation and helps improve the image quality.
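To make the idea concrete, the sketch below restricts the log-average luminance of Reinhard's global operator [23] to a focus zone around the gaze point and uses it to scale the scene luminance. This is a simplified illustration with hypothetical names; the gaze-dependent operator in [32] and its Frostbite 3 integration (Chapter 3) add further constraints on top of this.

#include <cmath>
#include <vector>

// Log-average luminance over a square focus zone around the gaze point.
float focusZoneLogAverage(const std::vector<float>& luminance,
                          int width, int height,
                          int gazeX, int gazeY, int zoneRadius)
{
    const float delta = 1e-4f; // avoids log(0) for pure black pixels
    float logSum = 0.0f;
    int   count  = 0;

    for (int y = gazeY - zoneRadius; y <= gazeY + zoneRadius; ++y)
        for (int x = gazeX - zoneRadius; x <= gazeX + zoneRadius; ++x)
            if (x >= 0 && y >= 0 && x < width && y < height)
            {
                logSum += std::log(delta + luminance[y * width + x]);
                ++count;
            }

    return count > 0 ? std::exp(logSum / float(count)) : 1.0f;
}

// Reinhard-style mapping of a single luminance value, where the adaptation
// level comes from the focus zone instead of the whole frame.
float toneMap(float luminance, float focusZoneLogAvg, float key /* e.g. 0.18 */)
{
    float scaled = key / focusZoneLogAvg * luminance;
    return scaled / (1.0f + scaled); // compresses to the range [0, 1)
}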


3 Implementation and results

3.1 Eye Tracker integration in Frostbite 3

Using C++ as the codebase, a class is implemented to integrate the eye tracker in Frostbite 3. The engine delegates all tasks related to the eye tracker device to the class. This includes connecting and disconnecting from the device, starting and stopping the eye tracking, loading user profile data for the device and fetching and sending gaze data. To allow different eye trackers of the same or different hardware vendors to work with the engine, an abstract class is first implemented. This class is a base class for all eye tracking devices and all eye tracker implementations derive from this class. To keep the interface simple the abstract class has four pure virtual functions handling common functionality in all eye trackers: initializing the eye tracker, sampling gaze data, retrieving gaze data and retrieving the sampling frequency of the eye tracker. The interface of the class is shown in Table 1.

class EyeTracker
{
public:
    // Ctor and Dtor
    EyeTracker();
    virtual ~EyeTracker();

    virtual void initialize() = 0;
    virtual void sample(float deltaTime) = 0;
    virtual Vec2 getRawGazePoint() const = 0;
    virtual float getSamplingFrequency() const = 0;

    // Other member functions...

private:
    // Member variables...
};

Table 1: Abstract interface class for eye trackers.

initialize() loads the eye tracker settings, connects the eye tracker and starts the tracking. Utility functions for initializing the eye tracker may be available depending on the supplied library. Among the utility functions supplied with the Tobii X2-30 library are loading and validation of system and profile configurations, error registration when connecting and disconnecting to the eye tracker and error registration when starting and stopping the eye tracking. The error registration is useful for debugging purposes if the eye tracker fails to track the client. Potential causes are disconnections, failures in connecting to the eye tracker or failures in preparing or starting the tracking. The Tobii X2-30 library also uses an event loop to process the input data coming from the eye tracker. Since the event loop is a run loop (it runs indefinitely until the application is closed or aborted) it is a blocking call and is therefore run on a separate thread to prevent it from blocking the main thread of the application. Once the event loop is up and running, a connection to the eye tracker is established and tracking is started and processed by the event


loop. The library function starting the tracking has a callback function which is called every time new gaze data is sampled.

sample(...) becomes a special case when using the Tobii X2-30. Originally the function was included in the base class to stay consistent with the interface used by all input devices in Frostbite 3. Since the sampling function is called every frame by the engine, the idea was extended so it would sample the gaze data every frame. However, this created a conflict with the Tobii X2-30 C API. As mentioned earlier the function starting the tracking registers a callback function called every time new gaze data is available, and the gaze data is cached in this callback function. This makes the callback function in essence a sample function. Using the sample function derived from the engine as the callback function is not possible because its type signature differs from the type signature required for the callback function in the Tobii X2-30 C API. Due to this the sampling function derived from the engine was left empty at first, and then later used to check the connection of the eye tracker every frame. If the state showed that it wasn't connected then it would try to re-connect to the device. This goes against the intention of the sample function in the engine. However, due to constraints imposed by both APIs it was left in this state.

getRawGazePoint() and getSamplingFrequency() are ordinary getter functions. The former returns a two-dimensional vector representing the raw gaze data on the monitor. This is the gaze data cached each time by the callback function registered with the tracking. The latter simply returns the sampling frequency of the eye tracker. The Tobii X2-30 has a sampling frequency of 30 Hz.

class TobiiEyeTracker : public EyeTracker
{
public:
    // Ctor and Dtor
    TobiiEyeTracker();
    ~TobiiEyeTracker();

    void initialize() override;

    // Sampling function of engine
    void sample(float deltaTime) override;

    Vec2 getRawGazePoint() const override;
    float getSamplingFrequency() const override;

    // Other member functions...

private:
    // Callback function to register with the tracking
    static void gazeCB(const tobiigaze_gaze_data* gaze_data,
                       void* userData);

    // Member variables...
};

Table 2: Interface for Tobii Eye Trackers.

The class declaration in Table 2 shows the interface for Tobii eye trackers. From the class declaration we see that gazeCB(...) (the callback function registered with the tracking) has a completely different signature from sample(...) inherited from the base class, which shows the signature constraint described earlier. An instance of the Tobii eye tracker class is created in an input device manager handled by Frostbite 3, which in turn calls initialize() to start the tracking. A flow chart of the integration is shown in Figure 8.
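To illustrate the caching pattern described above (the callback is invoked on the SDK's event-loop thread while the engine reads the value from its own thread), here is a small self-contained sketch. GazeSample and the class below are stand-ins for the roles played by gazeCB(...) and getRawGazePoint(); they are not the real Tobii Gaze SDK or Frostbite types.

#include <mutex>

// Stand-in types; NOT the real Tobii or Frostbite types.
struct Vec2f      { float x = 0.0f, y = 0.0f; };
struct GazeSample { float x = 0.0f, y = 0.0f; }; // hypothetical screen coordinates

class GazeCache
{
public:
    // Called from the SDK event-loop thread each time a new sample arrives
    // (the role played by gazeCB in the TobiiEyeTracker class).
    void onNewSample(const GazeSample& s)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_latest = Vec2f{ s.x, s.y };
    }

    // Called from the engine thread, e.g. by getRawGazePoint().
    Vec2f latest() const
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_latest;
    }

private:
    mutable std::mutex m_mutex;
    Vec2f              m_latest;
};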


Figure 8: Flow chart of eye tracker integration in Frostbite 3.

3.2 Eye Tracker Filtering

As previously covered in section 2.1.2, high frequency measurement noise gives bad precision during fixations. This yields a noisy signal because the eye tracker samples gaze data dispersed around the point of interest of the player. This noise is seen as the high spikes in the signal, see Figure 9.

Figure 9: Signal with measurement noise.


Reducing the measurement noise by attenuating high frequencies requires the signal to be filtered. Since the noise mainly consists of high frequencies, low-pass filters are very good candidates for noise attenuation. Four different low-pass filters are implemented and compared to each other:

• Average filter: A filter that uses N samples and calculates the arithmetic mean.

• First-order low-pass filter: A discretized first-order low-pass filter that allows frequencies below a cutoff frequency to pass while attenuating frequencies above the cutoff frequency. The filter is affected by the sampling frequency and a time constant (a common discretization is sketched below).

• Online cursor filter: A dynamic filter that adjusts itself according to the signal trend between two mean values. This filter aims to simulate a mouse cursor [11].

• Spatial filter: A dynamic filter that adjusts its filter trend based on the distance between two gaze points.

The implementation and results of each filter are described in the following sections. Note that each filter is shown on a different session of recorded gaze data. Since we're mainly concerned with the results of applying the filters rather than the results for a specific set of gaze data, different recording sessions don't affect the final result; any set of gaze data should show similar results when the same filter is applied to it. Similarly, the dimensions of the gaze data don't affect the result and thus only one dimension is shown in the results for simplification.
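Of the filters above, the first-order low-pass filter has the most compact discrete form. A common backward-Euler discretization is sketched below; the class and parameter names are illustrative and not taken from the thesis implementation in section 3.2.3.

// Discrete first-order low-pass filter, a backward Euler discretization of
// tau * dy/dt + y = x (illustrative sketch only).
class FirstOrderLowPass
{
public:
    explicit FirstOrderLowPass(float timeConstant) : m_tau(timeConstant) {}

    // x: new raw sample, dt: time since the previous sample (1 / sampling frequency).
    float filter(float x, float dt)
    {
        float alpha = dt / (m_tau + dt); // 0 < alpha <= 1
        m_y += alpha * (x - m_y);        // move part of the way toward the new sample
        return m_y;
    }

private:
    float m_tau;      // time constant; for a first-order filter, tau = 1 / (2 * pi * fc)
    float m_y = 0.0f; // previously filtered value
};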

3.2.1 Average filter

One of the more common choices of filter when attenuating noise is the average filter. As the name implies, the average of a number of samples is calculated using Equation 3:

$$\bar{x} = \frac{1}{N}\sum_{i=0}^{N-1} x_i \qquad (3)$$

x̄ denotes the mean value, x_i is the measured sample and N is the total number of samples used in the calculation of the mean value. We see how the filter attenuates noise in a signal by looking at the standard deviation, which shows the dispersion between the samples and their mean value. The standard deviation is calculated from Equation 4:

$$\sigma = \sqrt{\frac{\sum_{i=0}^{N-1}(x_i - \bar{x})^2}{N-1}} \qquad (4)$$

where σ is the standard deviation. Equation 4 shows the dispersion of the samples around the mean; if the number of samples N is increased, the dispersion of the averaged output around the mean value becomes smaller. When samples are less dispersed from the mean the noise is being attenuated. Applying the average filter to a signal is shown in Figures 10 and 11. In Figure 10 we see the filtered signals (green and red) being much smoother than the raw signal, which fluctuates, and we see that Equation 4 holds true. Figure 11 shows a subset of the same result so the smoothing effect of the filter is seen better.

Although the average filter performs good noise attenuation, it shows poor results in maintaining edge sharpness, which is seen in Figure 10. In comparison to the filtered signals the raw signal maintains very sharp edges. Increasing the number of samples makes the edges lose their sharpness even more. This is seen when comparing the green signal with the red signal, where the number of samples is increased from 20 to 100. The edges represent the step responses, which are sudden changes in the signal. Step responses in eye tracking occur when saccades occur, and less edge sharpness means a slower step response. This should be avoided since a slow step response makes the eye tracker slow at tracking the player's gaze. This would be the equivalent of having input lag when moving the mouse cursor from one point to another.

Another disadvantage of the average filter is that it considers all samples, even the oldest, to be equally important. If the eye tracker were producing a signal with a constant mean (i.e. a fixation was occurring) this would be desirable. However, the player will generally be scanning the virtual world, causing the signal to have varying means as the gaze point regularly moves to different positions in the virtual world. If the mean varies regularly the signal is changing its trend, and in that case old samples should not weigh equally (or at all) on the signal, to avoid being influenced by outdated gaze data. Overall the average filter performs noise attenuation well but suffers on two points: maintaining edge sharpness, which affects the trend of the signal, and the prioritization of gaze data. A tradeoff between edge sharpness and noise attenuation should be considered when using this filter.
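As an illustration of Equation 3, a minimal sketch of an average filter for one signal dimension could look as follows; the class name and the use of std::deque are illustrative choices, not the thesis implementation.

#include <deque>

// Average filter over the last N samples (Equation 3).
class AverageFilter {
public:
    explicit AverageFilter(std::size_t n) : m_n(n) {}

    float filter(float sample) {
        m_samples.push_back(sample);
        if (m_samples.size() > m_n)
            m_samples.pop_front();          // keep only the N most recent samples

        float sum = 0.0f;
        for (float s : m_samples)           // N additions per output sample
            sum += s;
        return sum / static_cast<float>(m_samples.size());
    }

private:
    std::size_t m_n;
    std::deque<float> m_samples;
};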

Figure 10: Signal filtered by average filter.


Figure 11: Signal filtered by average filter (zoomed).

3.2.2 Moving Average Filter

The average filter has drawbacks in memory consumption and computational efficiency. Equation 3 shows that the filter requires us to store N samples in memory when calculating the mean value. If N is small this is trivial on modern computers, but if it is large, which may be the case in simulations of complex systems, the filter becomes very wasteful with memory. For example, let us assume we have a data set where each sample consists of two variables of type float. The amount of memory consumed by this data set can be calculated from Equation 5:

$$\mathrm{size}_{\mathrm{dataSet}}(N) = N \cdot \mathrm{sizeof(float)} \cdot 2 \qquad (5)$$

Equation 5 shows that the memory consumption grows linearly. Assuming the size of a float is four bytes (the size used by floating point units following the single-precision floating-point format of the IEEE 754 standard, see [33]), a set of 1 000 samples (two floats per sample) requires 8 000 bytes and a set of 10 000 samples requires 80 000 bytes, which is wasteful of memory. Similarly, the computational efficiency is unsatisfying; looking at Equation 3 again we see that N additions and one division are performed. For example, if we have a data set of 1 000 samples and need to calculate the mean in every frame of a program running at 30 FPS, this adds up to 30 000 additions and 30 divisions per second. Although 30 000 additions are trivial on modern hardware, they still waste many clock cycles that could be used for other instructions. Furthermore, divisions are generally expensive compared to other arithmetic operations, since the division algorithms executed by the CPU's arithmetic logic unit are less efficient, so a good rule is to avoid divisions if not absolutely necessary.

One way to optimize the filter is to remove the division in Equation 3 by caching the value of 1/N in memory and multiplying by it instead. The computational efficiency of the filter can then be optimized further by making the filter recursive. A recursive filter uses the previous output from the filter as a reference, see Equation 6.

$$\bar{x}_k = \bar{x}_{k-1} + \frac{1}{N}\left(x_k - x_{k-N}\right) \qquad (6)$$

Here $\bar{x}_k$ is the average of the N latest samples at instant k. We see that $\bar{x}_k$ depends on three variables: the previous average $\bar{x}_{k-1}$, the newest sample $x_k$ and the oldest sample from the previous average, $x_{k-N}$. Equations 7-9 show how Equation 6 is derived:

$$\bar{x}_k = \frac{1}{N}\sum_{i=k-N+1}^{k} x_i \qquad (7)$$

$$\bar{x}_{k-1} = \frac{1}{N}\sum_{i=k-N}^{k-1} x_i \qquad (8)$$

$$\bar{x}_k - \bar{x}_{k-1} = \frac{1}{N}\left(\sum_{i=k-N+1}^{k} x_i - \sum_{i=k-N}^{k-1} x_i\right) \;\Leftrightarrow\; \bar{x}_k - \bar{x}_{k-1} = \frac{1}{N}(x_k - x_{k-N}) \;\Leftrightarrow\; \bar{x}_k = \bar{x}_{k-1} + \frac{1}{N}(x_k - x_{k-N}) \qquad (9)$$

Equations 7 and 8 calculate the current and previous averages, while Equation 9 subtracts the latter from the former and solves for $\bar{x}_k$. This filter is called the Moving Average Filter (MAF) because the filtered output uses the most recent N samples. One can think of it as a window moving over the data set for each instant k that is calculated.

Equation 6 shows that only one addition is required, and this is true for any number of samples used in this filter. Comparing the number of additions performed by MAF to the Average Filter when using 1 000 samples shows that MAF performs three orders of magnitude fewer additions, which is a large improvement at the cost of very small changes in the filter implementation. While MAF improves computational efficiency it still needs to store N samples like the Average Filter, since $x_{k-N}$ (the oldest sample from the average of the previous instant) must always be available when calculating the output at instant k. In addition it also needs to store the previous average. Overall MAF is preferred, but it should be noted that it suffers from the same drawback in memory consumption as the Average Filter. A block diagram in standard DSP notation for Equation 6 is shown in Figure 12, illustrating how the previous output is added into the summation, giving the filter its recursive trait.
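A sketch of the recursive form in Equation 6 is shown below, assuming a one-dimensional signal; a fixed-size ring buffer holds the N samples that must still be stored, while only one addition, one subtraction and one multiplication by the cached 1/N are needed per output. The class name and buffer handling are illustrative.

#include <vector>

// Moving Average Filter (Equation 6): avg_k = avg_{k-1} + (1/N)(x_k - x_{k-N}).
class MovingAverageFilter {
public:
    explicit MovingAverageFilter(std::size_t n)
        : m_buffer(n, 0.0f), m_invN(1.0f / static_cast<float>(n)) {}

    float filter(float sample) {
        const float oldest = m_buffer[m_head];   // x_{k-N}
        m_buffer[m_head] = sample;               // overwrite with x_k
        m_head = (m_head + 1) % m_buffer.size();

        m_average += (sample - oldest) * m_invN; // cached 1/N avoids a division
        return m_average;
    }

private:
    std::vector<float> m_buffer;  // the N most recent samples
    std::size_t m_head = 0;
    float m_invN;
    float m_average = 0.0f;       // previous average
};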


Figure 12: Block Diagram of Moving Average Filter.

3.2.3 First-order Low-pass filter

A first-order low-pass filter (LPF) allows low frequencies to pass through while attenuating frequencies higher than the cutoff frequency. The cutoff frequency is defined as the frequency where the filter attenuates the output −3 dB below the nominal output of the passband. The gaze data from the eye tracker gives a signal with high-frequency fluctuations, so applying a low-pass filter attenuates the fluctuations, which yields a smoother signal.

Two examples of low-pass filters are shown in Figure 13 and Figure 14. In the figures the magnitude of frequencies starts being attenuated from 1.01 rad/s and 0.398 rad/s respectively, which are the cutoff frequencies of the filters. Smaller frequencies are not attenuated and fall within the range of the passband.

Figure 13: First-order low-pass filter. The square represents the cutoff frequency.


Figure 14: Butterworth First-order low-pass filter. The square represents the cutoff frequency.

The equation for a digital LPF can be derived from the transfer function of the filter in the Laplace domain, shown in Equation 10:

$$F(s) = \frac{Y(s)}{X(s)} = \frac{1}{1 + \tau s} \qquad (10)$$

F(s) is the transfer function, X(s) and Y(s) are the Laplace transforms of the filter input and output in the time domain, τ is the time constant describing the time to reach 63% of the new steady-state value of the filter's step response¹ and s is the complex argument. The transfer function describes the ratio of the filter output to its input. If s is substituted with jω the transfer function outputs lower values as ω increases, which conforms to the behavior of a low-pass filter. Multiplying both sides of Equation 10 by the denominator yields Equation 11:

$$(1 + \tau s)\cdot Y(s) = X(s) \;\Leftrightarrow\; Y(s) + \tau s\cdot Y(s) = X(s) \qquad (11)$$

Equation 11 is then transformed from the Laplace domain to the time domain by taking its inverse Laplace transform, yielding Equation 12:

$$y(t) + \tau\cdot\frac{dy}{dt} = x(t) \qquad (12)$$

Equation 12 gives us the equation of an LPF in the continuous time domain. Since digital filters operate in discrete time we need to discretize Equation 12. We substitute t with $t_n$ to denote a point in discrete time and discretize the derivative using the Backward Euler differentiation approximation. This method gives us an approximation of the derivative, illustrated in Figure 15.

¹A step response is the output of a system when the input is a step input such as the Heaviside step function.


Figure 15: Backward Euler Differentiation Approximation. Image courtesy of [34].

Figure 15 shows the plot of a function of time y(t), the function at two discrete-time points $y(t_k)$ and $y(t_{k-1})$, the step size h, the exact slope of the function $\dot{y}(t_k)$ and the slope of the secant line formed between the two points, representing the approximated derivative, which we denote $y'(t_k)$. We see the error between the exact and approximated derivative, $|\dot{y}(t_k) - y'(t_k)|$, becoming smaller with smaller step sizes, since the slope formed by the secant lines approaches the exact slope; see [35] for examples illustrating this. While a smaller step size gives better approximations of the exact derivative, making it too small introduces a round-off error. The round-off error then affects the result more than the error between the real and approximated value, and as such the approximation stops improving beyond a certain threshold of the step size h [35]. The equation for the Backward Euler differentiation approximation is shown in Equation 13:

$$\frac{dy}{dt} \approx \frac{y(t_n) - y(t_{n-1})}{h}, \qquad h = \frac{1}{s_{freq}} \qquad (13)$$

$t_n$ is a time in discrete time and h is our step size, which represents the sampling interval of the eye tracker, where $s_{freq}$ is the sampling frequency of the eye tracker. Substituting the derivative in Equation 12 with Equation 13 gives us Equation 14:

$$y(t_n) + \tau\cdot\frac{y(t_n) - y(t_{n-1})}{h} = x(t_n) \qquad (14)$$

Rearranging Equation 14 to solve for $y(t_n)$ yields Equation 15:

$$y(t_n) = \left(\frac{\tau}{h+\tau}\right)\cdot y(t_{n-1}) + \left(\frac{h}{h+\tau}\right)\cdot x(t_n) \qquad (15)$$


The fraction $\frac{\tau}{h+\tau}$ is commonly denoted α:

$$\alpha = \frac{\tau}{h+\tau}, \qquad (1-\alpha) = \frac{h}{h+\tau} \qquad (16)$$

Substituting the fractions in Equation 15 according to Equation 16 yields Equation 17, the final equation of the discretization process and the discretized LPF:

$$y(t_n) = \alpha\cdot y(t_{n-1}) + (1-\alpha)\cdot x(t_n) \qquad (17)$$

Figures 16 and 17 show the LPF applied to a noisy signal:

Figure 16: Signal filtered by First-order Low-pass filter.

Figure 17: Signal filtered by First-order Low-pass filter (zoomed).


The filtered output from applying the LPF shows sufficiently good noise reduction, similar to the AF and MAF, but the memory consumption is vastly decreased. Equation 17 removes the constraint of storing data from N time instances in the past (see Equations 3 and 9 of the AF and MAF). Instead, it only requires data from the current and previous time instance. Similar to AF and MAF, division can be avoided by caching the values of α and (1 − α), reducing the filter to two multiplications.

One problem in Equation 17 is choosing the value of $y(t_{n-1})$ at the beginning of an eye tracking session, where n = 0. Since no filtering has been performed at that time instance we are left with an undefined value for $y(t_{n-1})$. One solution is to let $y(t_{n-1}) = x(t_n)$. Using the same value as the first input gives a filtered output equal to the first raw input, and this becomes the first "true" filtered output for the previous time instance.
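A minimal sketch of the discretized LPF (Equation 17) with the initialization strategy just described could look as follows; deriving τ from a cutoff frequency through Equation 18 inside the constructor is an illustrative choice.

// Discretized first-order low-pass filter (Equation 17).
class LowPassFilter {
public:
    // h: sampling interval of the eye tracker (1 / s_freq), fc: cutoff frequency.
    LowPassFilter(float h, float fc) {
        const float kPi = 3.14159265f;
        const float tau = 1.0f / (2.0f * kPi * fc);  // Equation 18
        m_alpha = tau / (h + tau);                   // Equation 16, cached so no division per sample
    }

    float filter(float x) {
        if (!m_initialized) {        // y(t_{n-1}) is undefined for the first sample,
            m_prev = x;              // so let y(t_{n-1}) = x(t_n) as described above
            m_initialized = true;
        }
        m_prev = m_alpha * m_prev + (1.0f - m_alpha) * x;  // Equation 17
        return m_prev;
    }

private:
    float m_alpha = 0.0f;
    float m_prev = 0.0f;
    bool m_initialized = false;
};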

3.2.3.1 Time constant against Cutoff Frequency

The relation between the time constant τ and the cutoff frequency $f_c$ of the LPF is shown in Equation 18:

$$\tau = \frac{1}{2\pi f_c} \qquad (18)$$

Recall that τ describes the time it takes for a step response to reach 63% of the new steady-state of the signal. If we set $f_c$ to a large frequency, τ becomes smaller and the filtered output reaches the steady-state faster. However, a larger $f_c$ also means allowing a wider range of high frequencies to fall within the passband in the frequency domain, which introduces more noise in the signal. A balance needs to be found between τ and $f_c$ where a new steady-state is reached fast enough without introducing too much noise. Equation 18 is illustrated in Figure 18, where the curve has a horizontal asymptote at τ = 0:

Figure 18: Time constant τ against cutoff frequency fc.

The result of applying a small and a large time constant is shown in Figure 19. The green signal has a large τ from a small $f_c$, which is the opposite of the red signal. The noise attenuation should then be considerably stronger in the green signal compared to the red signal, while the edges are less sharp since it takes longer to reach the new steady-state of the signal. This is illustrated more clearly in Figure 20. The red signal shows the opposite trait of the green signal for a small τ and a large $f_c$.

Figure 19: First-order Low-pass filter applied to a signal with low and high cutoff frequencies.

Figure 20: First-order Low-pass filter applied to a signal with low and high cutoff frequencies (zoomed).


3.2.4 Online Cursor Filter

One restriction of the low-pass filter is its static behavior. The filter does not change its characteristics and instead performs the same filtering regardless of the signal trend. In Figure 19 we see the low-pass filter showing two trends. The first is that the noise in the signal is attenuated at the cost of an increased time to reach the new steady-state value when a saccadic eye movement occurs. The second is that the new steady-state of the signal is reached fast but at the cost of more noise in the signal. This relation between noise attenuation and increased time to reach the steady-state of the signal is shown in Equation 18, as described in the previous section, and is clearly seen from the green and red signals in Figures 19 and 20. An improvement to the filter would be strong noise attenuation during fixations while also reaching the steady-state of the signal fast when saccades occur in the input signal. The filter would adjust itself to the signal trend, making it dynamic. In other words, if the input signal has a constant mean then a fixation is occurring and strong noise attenuation is applied, and if the input signal has a significant change in the mean the filter applies weaker attenuation to reduce the time for reaching the new steady-state of the signal.

A filter with dynamic behavior is the Online Cursor Filter (OCF) [11]. OCF was used to simulate mouse cursor movement by using the gaze data as input to the mouse cursor on the monitor. Various experiments were performed, such as allowing the user to fixate the cursor on desktop icons and to respond fast to saccadic eye movements. During fixations the noise was attenuated, and for prompt responses to saccades the gaze data was attenuated less to avoid long times to reach the new steady-state of the signal.

In the implementation of the LPF we saw in Equation 18 that the time constant controls the noise attenuation and the time it takes to respond to saccades in the input signal. This is used in the implementation of OCF, where the time constant τ depends on the trend of the signal. The underlying idea is that if the signal has a constant mean a fixation is occurring and τ is set large to perform strong noise attenuation. However, if the mean of the signal changes significantly a saccade occurs and τ is set to a very small value to avoid the filtered output lagging behind the new steady-state of the signal. Once the filtered output matches the new mean of the input signal, τ grows large again to increase the noise attenuation unless a new saccade occurs. The steps for implementing OCF are described below:

1. Calculate two means from samples of six discrete time instances: one mean from samples {$s[t_n]$, $s[t_{n-1}]$, $s[t_{n-2}]$} and the other mean from samples {$s[t_{n-3}]$, $s[t_{n-4}]$, $s[t_{n-5}]$}.

2. Calculate the mean difference $mean_{diff}$ between the two means from the previous step.

3. Check if $mean_{diff}$ exceeds a threshold g. If it exceeds the threshold a saccade occurs and an alarm is triggered, which sets the time constant τ to a very small value $\tau_{small}$ for weak noise attenuation.

4. Use Equation 17 to calculate the filtered output with the same α defined in Equation 16.

5. Reset τ along an exponential curve with a step size dτ until it reaches a sufficiently large value $\tau_{large}$ for strong noise attenuation.

In step 1 the oldest sample $s[t_{n-5}]$ is removed when the sample $s[t_n]$ of the next time instance is received, followed by the remaining samples being pushed back one time instance. This is similar to the AF and MAF where we store samples from N discrete time instances in the past and always remove the oldest one when a new sample is received.

In step 2 the difference between the means is calculated to allow a smaller threshold to be used. If the difference was only between two samples taken from the raw gaze data, the difference could exceed the threshold even when the player is gazing at the same point, because of the noise. This would lead to false saccades being triggered, which would set the time constant to a very small value. Increasing the threshold value avoids this but in turn degrades the filter sensitivity for real saccades. For example, if the player is gazing at a point close to their previous gaze point, the difference between the two points may fall within the threshold that had been increased to avoid false saccades, and the movement to the new gaze point would be slow because the time constant would stay large. Taking the mean of several samples reduces the noise and gives a better representation of the point the player is gazing at, which avoids false saccades while maintaining filter sensitivity. The filter sensitivity could be increased further if the number of samples used per mean value in step 1 were increased. The noise would then be attenuated even more, but this adds to the delay we already get from taking the mean difference in the first place. When using three samples per mean, the difference between the two means is separated at $s[t_{n-2}]$, giving a delay of two samples. In [11] two samples correspond to a delay of 40 ms at a sampling frequency of 50 Hz. The sampling frequency of the Tobii X2-30 eye tracker is 30 Hz, giving a delay close to 66 ms. These delays are small enough not to be noticeable by the player, but increasing the filter sensitivity with more samples could result in delays large enough to be noticeable. A tradeoff between filter sensitivity and response delay should be considered.

Step 3 sets τ to $\tau_{small}$ if an alarm is triggered. The filtered output is then calculated in step 4, and if τ was changed in step 3 it is reset through an exponential trajectory in step 5, see [11] for more details. The choice of an exponential trajectory in step 5 is to let the time constant grow slowly in the beginning, so that the filter output has enough time to reach the new steady-state of the signal before the time constant becomes large. If a linear trajectory, or a trajectory where the time constant becomes large in a short time, was used, resetting it would not have the same effect. This is because it would yield a time constant with a large value in the next time instance, resulting in a longer time to reach the new steady-state of the signal.

Figure 21 shows a block diagram of OCF, where we see how the time constant is set to a small value if the change detector exceeds the threshold g(n), which triggers an alarm. Figures 22 and 23 show gaze data filtered by OCF. The filtered signal in Figure 22 demonstrates the dynamic behavior of the OCF, which is seen when the signal has a constant mean (fixation) and when the mean changes (saccades). The parts with a constant mean have a mean difference that does not exceed the threshold, so no alarm is triggered and the time constant is kept large to give strong noise attenuation. This is better seen in Figure 23 in the red signal, which is smooth in comparison to the raw signal. When a saccade occurs the threshold is exceeded and an alarm is triggered. This sets the time constant to a small value so the filtered output can reach the new steady-state value of the signal in a very short time. This is seen from the sharp edges of the red filtered signal in Figure 22.

The result from OCF is comparable to the AF and MAF. Furthermore it does not suffer from the drawback of taking a long time to reach the steady-state of the signal, which the LPF suffers from if a small cutoff frequency is preferred for stronger noise attenuation. Although the time it takes to respond to saccadic eye movement is very small when using the OCF, it is still slightly slower than the response of the raw signal, as illustrated in Figure 24. Theoretically one could set $\tau_{small}$ to zero, which would be the same as applying no filtering at all. However, doing so would make the measurement noise slightly visible after an alarm is triggered, so applying some filtering is preferred [11]. In Figures 22-24 the threshold g was set to 100, $\tau_{small}$ was set to 50 ms and $\tau_{large}$ was set to 1500 ms. For resetting $\tau_{small}$ along the exponential trajectory, a dτ value of 0.01 was found to be good at a sampling rate of 30 Hz.
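A sketch of the OCF steps for one signal dimension, using the parameter values above as defaults, is given below. The exact shape of the exponential reset in step 5 is not fully specified here, so the multiplicative update of τ is one possible interpretation of [11]; the class name and initialization are illustrative.

#include <algorithm>
#include <array>
#include <cmath>

// Online Cursor Filter for one dimension, following steps 1-5 above.
class OnlineCursorFilter {
public:
    OnlineCursorFilter(float samplingHz = 30.0f, float g = 100.0f,
                       float tauSmallMs = 50.0f, float tauLargeMs = 1500.0f,
                       float dTau = 0.01f)
        : m_h(1000.0f / samplingHz), m_g(g), m_tauSmall(tauSmallMs),
          m_tauLarge(tauLargeMs), m_dTau(dTau), m_tau(tauLargeMs) {}

    float filter(float x) {
        if (!m_initialized) { m_samples.fill(x); m_prev = x; m_initialized = true; }

        // Step 1: push the samples back one instance and form two means of three samples.
        for (std::size_t i = m_samples.size() - 1; i > 0; --i)
            m_samples[i] = m_samples[i - 1];
        m_samples[0] = x;
        const float meanNew = (m_samples[0] + m_samples[1] + m_samples[2]) / 3.0f;
        const float meanOld = (m_samples[3] + m_samples[4] + m_samples[5]) / 3.0f;

        // Steps 2-3: an alarm is triggered if the mean difference exceeds the threshold g.
        if (std::abs(meanNew - meanOld) > m_g)
            m_tau = m_tauSmall;                               // weak attenuation for the saccade
        else                                                  // Step 5: grow tau exponentially
            m_tau = std::min(m_tauLarge, m_tau * std::exp(m_dTau));

        // Step 4: filtered output with alpha as in Equations 16-17 (h and tau in ms).
        const float alpha = m_tau / (m_h + m_tau);
        m_prev = alpha * m_prev + (1.0f - alpha) * x;
        return m_prev;
    }

private:
    std::array<float, 6> m_samples{};  // s[t_n] ... s[t_{n-5}]
    float m_h, m_g, m_tauSmall, m_tauLarge, m_dTau, m_tau;
    float m_prev = 0.0f;
    bool m_initialized = false;
};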


Figure 21: Block diagram of Online Cursor Filter. Image courtesy of [11].

Figure 22: Signal filtered by Online Cursor Filter.


Figure 23: Signal filtered by Online Cursor Filter (zoomed).

Figure 24: Edge of signal filtered by Online Cursor Filter (zoomed).

OCF needs to store more samples compared to the LPF, which only required two samples (the input sample of the current time instance and the filtered output from the previous time instance, see Equation 17). In addition to the filtered output from the previous time instance, OCF needs to store the samples used for the two means in step 1 of the implementation. It is also preferable to store the threshold g, the step size for the exponential trajectory dτ and the boundaries of the time constant $\tau_{small}$ and $\tau_{large}$, so there is a slightly larger memory requirement for OCF.


OCF also requires more operations to be performed. When calculating the filtered output it performs the same number of operations as the LPF. When calculating the two means in step 1 it performs the same number of operations as the MAF. Furthermore, the mean difference needs to be calculated, and since we are only interested in the magnitude of the mean difference, the absolute value in each dimension needs to be computed. Step 5 also requires that τ is increased by dτ and that the exponential of that value is calculated to move along the exponential curve. Finally there are conditional statements used in several of the steps, such as making sure that τ does not go outside its boundaries and checking whether the mean difference exceeds the threshold. Comparisons, additions and absolute values are cheap operations, but a call to an exponential function is expensive [36]. In conclusion we see that OCF requires more data to be stored and more operations to be performed. On modern computers the memory footprint is still considerably low and the calculations are done fast, but it is nevertheless good practice to know that dynamic filters are more complex and often have more parameters and operations involved.

3.2.4.1 False Alarms

A drawback of OCF is false alarms triggered by false saccades. As mentioned, a threshold controlling the filter sensitivity is used. Using a small threshold introduces a greater risk of false alarms being triggered due to the noise in the raw signal. One example is when two gaze points lie very close to each other. Another example is when a real alarm is triggered. In this situation the eyes try to fixate on the new point of interest on the monitor while the eye tracker tracks neighboring points surrounding the new point of interest. This could cause the mean difference between the new point of interest and its neighboring points to exceed the threshold, which triggers another, false alarm. False alarms should be avoided because they are very distracting to the player.

Figure 25 shows OCF with threshold values of 100 and 200 respectively. The three arrows point to peaks that occurred from false alarms being triggered in the raw and red signals. The first and third peaks (arrows 1 and 3) are from false alarms triggered when the signal had a mean with small changes. This occurred when the player gazed only slightly away from the point of interest, which the filter picked up due to its high sensitivity. The second peak (arrow 2) occurred after a saccadic eye movement when the player attempted to fixate on a new point. In comparison the blue signal shows a small peak or no peak at all at the same parts. The small peak in the blue signal pointed at by the second arrow is not caused by a false alarm but by a slow response between two gaze points, since the time constant is kept large (τ = $\tau_{large}$). This becomes evident when observing the edges, since they are less sharp in comparison to the raw and red signals. Figure 26 shows the peaks zoomed in for better clarity. Figure 27 demonstrates the effect of having a large threshold to avoid false alarms. The red filtered signal catches up to the new state of the signal fast because the mean difference exceeded the threshold, which triggered an alarm giving the filter a small time constant (τ = $\tau_{small}$). In the blue signal the mean difference does not exceed the threshold, so the time constant stays large and the filtered output responds slowly to the saccades. A good balance is required between avoiding false alarms and filter sensitivity for smaller changes.


Figure 25: False Alarms in Online Cursor Filter.

Figure 26: False Alarms in Online Cursor Filter (zoomed). Left: Arrow 1, Middle: Arrow 2, Right: Arrow 3.


Figure 27: Online Cursor Filter with large threshold giving slow response when the steady-state of the signal changes.

3.2.5 Spatial Filter

The Spatial filter (SF) is a dynamic low-pass filter behaving similarly to the OCF. The distance between two gaze points is calculated, and if the distance is large the noise attenuation is weak, allowing the filter output to reach the steady state of the signal quickly. If the distance is small a fixation is likely occurring and strong noise attenuation is performed to keep a precise fixation. Unlike the OCF there is no dependency on a threshold or time constant; instead the filter output follows the trend of the signal based on the hyperbolic tangent of the calculated distance. The filter is recursive in its behavior; the filtered output from the previous time instance is used to calculate the filtered output of the current time instance. Let $x(t_n)$ and $y(t_n)$ be our input and filtered output at the current time instance n and let $y(t_{n-1})$ be the filtered output from the previous time instance n − 1. We first calculate the distance between $x(t_n)$ and $y(t_{n-1})$ using Equation 19:

$$dist_{xy} = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2 + (x_3 - y_3)^2 + \cdots + (x_k - y_k)^2} \qquad (19)$$

where k is the dimension of the data. The hyperbolic tangent of the distance is then calculated with Equation 20:

$$\alpha = \tanh(dist_{xy}) = \frac{e^{2\cdot dist_{xy}} - 1}{e^{2\cdot dist_{xy}} + 1}, \qquad 0 \le \alpha \le 1 \qquad (20)$$

The hyperbolic tangent of the distance returns a value lying in the interval [0, 1], where endpoint 1 corresponds to no filtering and endpoint 0 corresponds to keeping the filtered output from the previous time instance. We modify Equation 17 to match this in Equation 21. In other words, if α is 0 then Equation 21 gives the filtered output of the previous time instance, and if α is 1 it gives the raw input.

$$y(t_n) = (1 - \alpha)\cdot y(t_{n-1}) + \alpha\cdot x(t_n) \qquad (21)$$


The hyperbolic tangent is illustrated in Figure 28. We make two observations in this figure. The first is that the function lies in the interval [−1, 1]. Since we are only interested in the interval [0, 1] we clamp all values of the function to that interval. The second observation is the need to scale the curve by a factor s to ensure that some filtering is always performed. This is to avoid the player potentially perceiving noise when α = 1. By scaling the curve the maximum value of the function is decreased, which ensures α < 1. This means the filtered output can never be exactly equal to the raw input (see Equation 21), therefore giving us an α lying in the interval [0, 1). Scaling the curve yields Equation 22:

$$\alpha = \tanh(dist_{xy})\cdot s = \frac{e^{2\cdot dist_{xy}} - 1}{e^{2\cdot dist_{xy}} + 1}\cdot s, \qquad s < 1 \qquad (22)$$

Equation 22 is plotted as the red line in Figure 28. Figures 29 and 30 show gaze data filtered with the SF with s = 0.9. The figures show the filter output maintaining sharp edges and strong noise attenuation when fixations occur. This makes it a very good candidate among the choices of dynamic filters.

When it comes to storage we need to store less data compared to OCF. Like the OCF, the filter requires the input of the current time instance and the filtered output from the previous time instance to be stored. If the scaling factor needs to be adjustable it needs to be stored as well. Although the SF has fewer steps involved in its calculations compared to OCF, these steps use more expensive operations on the CPU. Calls to square root, exponential functions and a division are expensive operations [36]. This differs from the other filters, where less expensive operations (additions, subtractions, multiplications) are mostly used. The efficiency of these operations can be improved; for example, the square root can be computed with fast approximate versions through SIMD extensions, but this is in exchange for less precision [36] and beyond the scope of this thesis.
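A sketch of the spatial filter for a two-dimensional gaze point (Equations 19-22) could look as follows; the GazePoint struct and the use of std::tanh instead of the explicit exponential form are illustrative.

#include <cmath>

struct GazePoint { float x; float y; };

// Spatial filter (Equations 19-22) with scale factor s < 1.
class SpatialFilter {
public:
    explicit SpatialFilter(float scale = 0.9f) : m_scale(scale) {}

    GazePoint filter(const GazePoint& in) {
        if (!m_initialized) { m_prev = in; m_initialized = true; }

        // Equation 19: distance between current input and previous filtered output.
        const float dx = in.x - m_prev.x;
        const float dy = in.y - m_prev.y;
        const float dist = std::sqrt(dx * dx + dy * dy);

        // Equations 20 and 22: alpha lies in [0, s), so some filtering is always applied.
        const float alpha = std::tanh(dist) * m_scale;

        // Equation 21: blend between previous output and raw input.
        m_prev.x = (1.0f - alpha) * m_prev.x + alpha * in.x;
        m_prev.y = (1.0f - alpha) * m_prev.y + alpha * in.y;
        return m_prev;
    }

private:
    float m_scale;
    GazePoint m_prev{0.0f, 0.0f};
    bool m_initialized = false;
};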

Figure 28: Hyperbolic Tangent Function. The blue line shows the standard function while the red line shows the scaled function.


Figure 29: Signal filtered by Spatial filter.

Figure 30: Signal filtered by Spatial filter (zoomed).

3.3 Gaze-dependent Depth of Field

The blur effect in DOF is approximated by the CoC. To control where the blur is applied in the scene we make the focal distance dynamic by feeding the gaze point to it, and the focal distance is then used as input to the CoC. The equation for the diameter of the CoC is derived from the thin-lens model formula used in ray optics [19]. The equation for the thin-lens model is given in Equation 23:

$$\frac{1}{U} + \frac{1}{V} = \frac{1}{F} \qquad (23)$$

U is the distance from the lens to a point on an object which light rays come from, V is the image distance from the lens to a virtual plane where the light rays converge to a point (not to be confused with the actual image plane in the camera) and F is the focal length of the lens. We can solve the equation for V, giving us Equation 24:

$$V = \frac{FU}{U - F} \qquad (24)$$

The diameter of the CoC is then expressed as in [19]. Let $V_u$ and $V_p$ be the image distances of points U and P. The point U is at a distance from the lens resulting in light rays projecting and forming a CoC on the image plane as they converge towards $V_u$. The point P is a focal point (a point at the focal distance) resulting in light rays projecting and forming a point on the image plane as they converge towards $V_p$. See Figure 31 for an illustration of this. $V_u$ and $V_p$ are expressed in Equation 25:

$$V_u = \frac{FU}{U - F}, \qquad V_p = \frac{FP}{P - F} \qquad (25)$$

From Figure 31 we see that the triangles △LDA and △CBA are similar triangles, which gives us Equation 26:

$$\frac{\overline{LD}}{V_u} = \frac{\overline{CB}}{V_u - V_p} \qquad (26)$$

$\overline{LD}$ is the lens size, which we denote $\overline{LD} = A$, and $\overline{CB}$ is the diameter of the CoC, which we denote $\overline{CB} = C_d$. Solving for $C_d$ we obtain Equation 27:

$$C_d = |V_u - V_p|\,\frac{A}{V_u} = \left(\frac{AF}{P - F}\right)\frac{|U - P|}{U} \qquad (27)$$
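As a small sketch of Equation 27, the CoC diameter can be evaluated directly from the lens size A, the focal length F, the focal distance P and the object distance U; the function name and the example values in the comment are illustrative only.

#include <cmath>

// Diameter of the circle of confusion (Equation 27).
// A: lens size, F: focal length, P: focal distance, U: object distance.
float cocDiameter(float A, float F, float P, float U) {
    return (A * F / (P - F)) * std::abs(U - P) / U;
}

// Example (illustrative numbers, meters): cocDiameter(0.035f, 0.05f, 10.0f, 12.0f)
// gives a small blur diameter for a point slightly behind the focal distance.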

Figure 31: Circle of Confusion. Image courtesy of [19].

We want to make P in Equation 27 gaze-dependent. This is done with the algorithm described in [22]:

1. Retrieve the gaze point $fp_{scr}$ from the eye tracker. This is the gaze point on the monitor.


2. Filter the gaze point using a low-pass filter to get the filtered gaze point $\overline{fp}_{scr}$. In [22] a low-pass filter with a cutoff frequency of 15 Hz is used. We use the spatial filter with a scale factor s = 0.9.

3. Simulate the autofocus system in [18] by calculating the focal point from the depth of the pixels in a square focus zone centered on $\overline{fp}_{scr}$.

4. Use the focal point in Equation 27.

In step 1, $fp_{scr}$ is sent as input to an Eye Tracker entity for filtering. In step 2 the entity filters the gaze point and outputs the filtered gaze point $\overline{fp}_{scr}$. This output is stored in an eye tracker data structure which is sent to the post-process system in Frostbite 3. The post-process system then goes through the final two steps of the algorithm. The flow of the algorithm is illustrated in Figure 32:

Figure 32: Flow chart of gaze-dependent DOF in Frostbite 3.

Step 3 simulates a system similar to the autofocus system found in digital cameras. Autofocus systems in digital cameras change the distance to the focal point through an internal motor that moves the camera lens until it has the best distance to the point of interest. We choose to look at a square focus zone, similar to autofocus systems, instead of a single pixel because the gaze accuracy error is larger than one pixel (see chapter 2.1.1). The square focus zone represents the region being gazed at, as illustrated in Figure 33.

Figure 33: Autofocus system represented by a focus zone centered on a filtered gaze point.


The focal point can be derived from the focus zone in several ways. Two methods were chosen for this purpose: a function calculating the average depth and a function expressed in semantic and spatial weights as described in [18]. In both methods the depths of all pixels are stored in a buffer. Given a depth buffer of the focus zone, the average depth is obtained from Equation 28:

$$P = \frac{1}{N}\sum_{x,y} U(x, y), \qquad x, y \in FZ_{eye} \qquad (28)$$

where P and U are the same as in Equation 27, (x, y) denotes a pixel in the focus zone, $FZ_{eye}$ denotes all pixels in the focus zone and N is the number of pixels in the focus zone.

The depth-weight approach in [18] associates the depth of each pixel with a semantic and a spatial weight. In semantic weighting each pixel corresponding to an object in the scene has a set weight. Depending on the importance of the object this weight can be set to a high or low value, for more details see [18]. In our implementation semantic weighting is not used since it was considered too time-consuming to implement a framework where objects have an associated weight in Frostbite 3. In spatial weighting, pixels in the center are given more weight than those further out in the focus zone. A buffer acting as a filter kernel is created to store the spatial weight of each pixel. The spatial weights are calculated from the Gaussian function in Equation 29:

$$G(x, \sigma) = a\cdot e^{-\frac{(x-b)^2}{2\sigma^2}} + c \qquad (29)$$

σ is the standard deviation controlling the width of the bell curve, b is the center position of the peak of the bell curve, a is the amplitude of the peak, c is a constant that raises or lowers the curve and x is the distance from a pixel in the focus zone to the center of the focus zone. This is illustrated in Figure 34. We denote the distance to the center of the focus zone as $x = dist(p, p_{center})$.

Figure 34: Gaussian Function.


Using the depths of the pixels in the focus zone and their associated spatial weights, and excluding the semantic weighting, yields Equation 30, which is a slightly modified version of the equation described in [18]:

$$P = \frac{\sum_{x,y} G(dist(U(x, y), U_{center}), \sigma)\cdot U(x, y)}{\sum_{x,y} G(dist(U(x, y), U_{center}), \sigma)}, \qquad x, y \in FZ_{eye} \qquad (30)$$

Figure 35 shows the focal point calculated using spatial weighting and Figure 36 shows the focal point calculated as an average depth. The spatial weights have a very subtle effect on the end result. In both figures the focus zone is located at the very front of the boat, while a small portion of the outer parts of the focus zone lies on the water. With spatial weighting the portion of the focus zone on the boat should be given more focus while the portion on the water should be more blurred. If the average depth is used, where all pixels are treated as equally important, the water will have less blur because the center pixels no longer have weights that give them more importance. This gives a larger depth range to the focal point because the pixels lying on the water contribute more to its calculation. With spatial weights those pixels contribute less and give a smaller range, which helps blur the water. This is seen in Figure 35, where the water has more blur compared to Figure 36 where it has less blur.
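A CPU-side sketch of the two focal point estimators over a square focus zone is given below; the depth-buffer layout is a simplifying assumption, and the Gaussian is evaluated on the pixel's distance to the center of the focus zone as described for Equation 29 (x = dist(p, p_center)), which is one reading of Equation 30. This is not the Frostbite 3 post-process code.

#include <cmath>
#include <vector>

// depths: row-major depth values of the (size x size) focus zone, size assumed odd.

// Equation 28: focal point as the average depth of the focus zone.
float focalPointAverage(const std::vector<float>& depths) {
    float sum = 0.0f;
    for (float d : depths) sum += d;
    return sum / static_cast<float>(depths.size());
}

// Equations 29-30: focal point as a Gaussian-weighted depth, giving pixels near
// the center of the focus zone more influence on the result.
float focalPointSpatial(const std::vector<float>& depths, int size,
                        float sigma, float a = 1.0f, float b = 0.0f, float c = 0.0f) {
    const float half = (size - 1) * 0.5f;
    float weightedSum = 0.0f;
    float weightSum = 0.0f;
    for (int y = 0; y < size; ++y) {
        for (int x = 0; x < size; ++x) {
            const float dist = std::sqrt((x - half) * (x - half) + (y - half) * (y - half));
            const float w = a * std::exp(-(dist - b) * (dist - b) / (2.0f * sigma * sigma)) + c;
            weightedSum += w * depths[y * size + x];
            weightSum += w;
        }
    }
    return weightedSum / weightSum;   // Equation 30
}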

Figure 35: Focal point calculated using spatial weighting of the focus zone.


Figure 36: Focal point calculated as average depth of the focus zone.

When the focal point has been calculated it is set as a constant in a shader that computes and applies the final blur to the scene. The final result is shown in Figures 37 and 38, where spatial weighting was used when calculating the focal point. The region gazed at always stays in focus while everything outside the range of the focal point is blurred. In the figures the values of the independent variables in Equation 29 were chosen to be the same as the values displayed for the black signal in Figure 34. The size of the focus zone was set to 31x31 pixels on a 24" display (1920x1200). This focus zone is larger than the focus zone needed under ideal environment conditions. The large size of the focus zone was intentional because the eye tracker was measuring data in a darker environment, which made the angular distance error larger.


Figure 37: Gaze-dependent DOF.

Figure 38: Gaze-dependent DOF.


3.3.1 Simulation of accommodation phenomenon

When the eye’s focus point is changed there is a delay of a few milliseconds until they haveadjusted to the new focus area. This is known as the accommodation of the eye in the HVS.We simulate this feature by following the method described in [18] where a LPF filter is used tofilter the focal point. Recall that the time constant in LPF describes the time to reach 63 % of thenew steady-state value for the filter’s step response. Therefore to increase the delay to reachinga new focal distance we want to increase the value of the time constant. The focal distance isfiltered using Equation 15 and 16. The time constant is calculated from Equation 31:

$$\tau = \frac{\pi\cdot f_c}{2} \qquad (31)$$

This yields Equation 32 which calculates the focal distance with the accommodation effect:

$$\bar{P}(n) = \alpha\cdot\bar{P}(n-1) + (1-\alpha)\cdot P(n) \qquad (32)$$

n is the current frame being processed and P is the focal distance calculated from Equation 30. Figures 39 and 40 illustrate the difference between having and not having the accommodation effect. The red signal has a slightly less sharp edge, making it slower at reaching the new steady-state value of the system (the focal point in this case) compared to the unfiltered black signal. The cutoff frequency $f_c$ was set to 0.01.
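A minimal sketch of the accommodation simulation (Equations 31-32), assuming the focal distance is filtered once per frame, could look as follows; the class name and the frame-time parameter are illustrative.

// Accommodation of the focal distance (Equations 31-32): a first-order
// low-pass filter applied to the focal point computed from the focus zone.
class AccommodationFilter {
public:
    // fc: cutoff frequency (0.01 in the text), h: frame time used in Equation 16.
    AccommodationFilter(float fc, float h) {
        const float tau = 3.14159265f * fc / 2.0f;   // Equation 31
        m_alpha = tau / (h + tau);                    // Equation 16
    }

    float filter(float focalDistance) {
        if (!m_initialized) { m_prev = focalDistance; m_initialized = true; }
        m_prev = m_alpha * m_prev + (1.0f - m_alpha) * focalDistance;  // Equation 32
        return m_prev;
    }

private:
    float m_alpha = 0.0f;
    float m_prev = 0.0f;
    bool m_initialized = false;
};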

Figure 39: Focal distance calculated with accommodation effect.


Figure 40: Focal distance calculated with accommodation effect (zoomed).

3.4 Gaze-dependent tone mapping

Many tone mapping algorithms for digital images use a global approach which operates on the whole HDR image. Gaze-dependent tone mapping only operates on the part of the image in the focus zone of the observer. We use the algorithm described in [32], which is based on the tone mapping algorithm introduced in [23]. The algorithm in [23] simulates the Zone System (see chapter 2.3) by approximating the middle gray, first calculating the log-average luminance of the HDR image, see Equation 33:

$$\bar{L}_{HDR} = \frac{1}{N}\exp\left(\sum_{x,y}\log(\delta + L_{HDR}(x, y))\right), \qquad x, y \in HDR_{Global} \qquad (33)$$

The log-average luminance is denoted $\bar{L}_{HDR}$. HDR is the HDR image, while $L_{HDR}(x, y)$ denotes the luminance value in the image for pixel (x, y). The total number of pixels in the HDR image is denoted N, δ is a small positive value to avoid a singularity if the luminance of a pixel is pure black, and $HDR_{Global}$ denotes all pixels in the image. The log-average luminance is then mapped to the middle gray of the displayed image using Equation 34:

$$L(x, y) = \frac{a}{\bar{L}_{HDR}}\,L_{HDR}(x, y), \qquad a \in [0, 1] \qquad (34)$$

L(x, y) is the scaled luminance and a is a scale factor that allows the log-average luminance to be mapped to different values of a. To correct the issue of high luminance appearing in highlights, a tone mapping operator compressing high luminance is used to calculate the final luminance for the display [23], see Equation 35:

$$L_f(x, y) = \frac{L(x, y)}{1 + L(x, y)} \qquad (35)$$

The algorithm in [32] follows Equations 33-35 but calculates the log-average luminance of the image only from the pixels in the focus zone centered on the gaze point. The process of the algorithm is as follows:


1. Approximate the middle gray by calculating the log-average luminance of the HDR image in the focus zone at frame t. This log-average luminance is denoted $P_{HDR}(t)$.

2. Approximate the middle gray using the global method in [23] by calculating the log-average luminance of the whole HDR image.

3. If the middle gray from the focus zone is smaller than the middle gray of the whole image multiplied by a scaling factor, use the middle gray from the focus zone and proceed to steps 4-6. If it is larger, use the middle gray of the whole image and calculate the final luminance from Equations 34-35.

4. Calculate the sum of the log-average luminances of several frames with a weight function to avoid a cascade rendering artifact. This sum is denoted $\overline{P}_{HDR}(t)$.

5. Map the sum to the middle gray and compress high luminance by substituting $\bar{L}_{HDR}$ with $\overline{P}_{HDR}(t)$ in Equation 34 and then using Equation 35.

6. Normalize the final luminance.

This algorithm is illustrated in Figure 41.

Figure 41: Flow chart of gaze dependent tone mapping in Frostbite 3.

The focus zone centered on the gaze point is created in the same way as in the algorithm for gaze-dependent DOF (see chapter 3.3). In step 1, $P_{HDR}(t)$ is calculated from Equation 36:

$$P_{HDR}(t) = \frac{1}{N_1}\exp\left(\sum_{x,y}\log(\delta + L_{HDR}(x, y, t))\right), \qquad x, y \in FZ_{eye} \qquad (36)$$

where $FZ_{eye}$ denotes all pixels in the focus zone and $N_1$ denotes the number of pixels in it. In step 2 we use the global tone mapping method in [23] to calculate the log-average luminance of the whole image. In step 3 a constraint is imposed to decide whether the global or local tone mapping method is used. This is because the global method approximates the middle gray of the scene better than the local method [32]. This constraint yields Equation 37:

$$P_{HDR}(t) = \begin{cases} P_{HDR}(t) & P_{HDR}(t) < \bar{L}_{HDR}\cdot\sigma \\ \bar{L}_{HDR}\cdot\sigma & \text{otherwise} \end{cases}, \qquad \sigma > 1 \qquad (37)$$

where $\bar{L}_{HDR}$ is as previously defined in Equation 33 and σ is a constant. Equation 37 removes the sliding range of the middle gray. Without the constraint the middle gray mapped to the display image grows in a linear fashion, which makes it very high when looking at high-luminance regions in a scene. This results in poor behavior in darker regions because they become even darker (see Figure 42 (b) and Equation 34). With the constraint the tone mapping operator has a logarithmic behavior similar to the global method, which converges to a lower value for the middle gray being mapped (see Figure 43).


Figure 42: The tone mapping operator in (a) [32] uses the constraint from Equation 37 while (b) [37] does not use it. We see that (b) has poor quality with pure black regions when gazing at a high-luminance region (the sun). In (a) we get better quality since we still see details in lower-luminance regions, similar to the global method in (c) [23]. Image courtesy of [32].

Figure 43: Without the constraint in step 3 the tone mapping operator gives large values for the log-average luminance in high-luminance regions of the scene, as seen in the red curve. Using the constraint gives the tone mapping operator a logarithmic behavior similar to the global method. Image courtesy of [32].

If the algorithm picks the middle gray of the focus zone, we proceed to step 4, where we sum the middle gray of the current frame together with those from previous frames to get $\overline{P}_{HDR}(t)$. This is shown in Equation 38:

$$\overline{P}_{HDR}(t) = P_{HDR}(t)\cdot W(t) + P_{HDR}(t-1)\cdot W(t-1) + \ldots + P_{HDR}(t-n)\cdot W(t-n) \qquad (38)$$

where W is a weighting function and n is the number of previous frames summed together. In step 5 we substitute $\bar{L}_{HDR}$ with $\overline{P}_{HDR}(t)$ in Equation 34 and Equation 35 and then normalize $L_f$ with Equation 39:

$$L_d(x, y, t) = \frac{L_f(x, y, t) - L_{min}}{L_{max} - L_{min}} \qquad (39)$$

The post-processing system in Frostbite 3 only provided the HDR texture as RGB values instead of the luminance for each pixel, so the luminance of the texture had to be calculated before the gaze-dependent tone mapping algorithm could be used. This was done by first storing the RGB values for each pixel in an auxiliary buffer. Then for each pixel in the buffer the relative luminance was calculated from the RGB color based on the ITU-R BT.709 standard for monitors, using Equation 40:

Y = 0.2126R + 0.7152G + 0.0722B (40)

Y is the relative luminance and R, G and B are the color components of the pixel. In Equation 38 a linear weight function is used in [32] when calculating the sum of middle grays from different frames. In this implementation the non-linear function in Equation 41 was used since it gave better results on the scene being tested:

$$W(t) = \frac{1}{(1 + t)\cdot c}, \qquad c > 0 \qquad (41)$$

t denotes the frame and c is a constant. In Figures 44 and 45 we see the results from gaze-dependent tone mapping. In Figure 44 the gaze area is on darker regions of the scene and we see the details in those regions while bright regions (in the far background) are overexposed. This is similar to the behavior of the HVS, where the pupil of the eye widens to gather more light and allow the eye to see details in the dark. This makes light sources appear extremely bright since the eye becomes overexposed by their brightness while the pupil is wide open. Similarly in Figure 45, the gaze area falls on a region with high luminance and the gaze-dependent tone mapping adjusts the scene to let us see details in the bright region while the darker regions become darker. In Figures 46 and 47 we see gaze-dependent tone mapping with and without the constraint introduced in [32]. We see a clear difference between the two scenes; the scene in Figure 47 is much darker because a much higher middle gray was used when mapping the luminance. With the constraint the global method gave a lower middle gray, which helped preserve some of the details in the darker regions much better when gazing at regions with high luminance. In our examples we used a focus zone of 127x127 pixels to better demonstrate the effects of the system. In Figures 44 and 45, σ was set to 20 in the constraint, c was set to 8 in our weight function and the number of past frames n was set to 9.
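A CPU-side sketch of the building blocks of the gaze-dependent tone mapping (Equations 33-41) is given below; the function names, the frame-history handling and the default δ are assumptions, and the 1/N of Equations 33 and 36 is applied inside the exponential here (the usual geometric-mean form) to keep the computation numerically stable. This is not the Frostbite 3 shader code.

#include <cmath>
#include <deque>
#include <vector>

// Relative luminance from RGB according to ITU-R BT.709 (Equation 40).
float luminance(float r, float g, float b) {
    return 0.2126f * r + 0.7152f * g + 0.0722f * b;
}

// Log-average luminance of a set of pixel luminances (Equations 33 and 36).
float logAverage(const std::vector<float>& lum, float delta = 1e-4f) {
    float sum = 0.0f;
    for (float L : lum) sum += std::log(delta + L);
    return std::exp(sum / static_cast<float>(lum.size()));
}

// Constraint of Equation 37: cap the focus-zone middle gray by the global one.
float constrainMiddleGray(float focusZoneGray, float globalGray, float sigma) {
    return (focusZoneGray < globalGray * sigma) ? focusZoneGray : globalGray * sigma;
}

// Weight function of Equation 41.
float weight(int t, float c) {
    return 1.0f / ((1.0f + static_cast<float>(t)) * c);
}

// Equation 38: weighted sum of the focus-zone middle grays of the last frames,
// with history[0] being the current frame.
float summedMiddleGray(const std::deque<float>& history, float c) {
    float sum = 0.0f;
    for (std::size_t t = 0; t < history.size(); ++t)
        sum += history[t] * weight(static_cast<int>(t), c);
    return sum;
}

// Equations 34, 35 and 39 for a single pixel luminance L; Lmin and Lmax are the
// minimum and maximum final luminance used for the normalization in Equation 39.
float toneMapPixel(float L, float middleGray, float a, float Lmin, float Lmax) {
    const float scaled = (a / middleGray) * L;   // Equation 34
    const float Lf = scaled / (1.0f + scaled);   // Equation 35
    return (Lf - Lmin) / (Lmax - Lmin);          // Equation 39
}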

Figure 44: Gaze-dependent tone mapping when gazing at regions with low luminance. In this scene we see details in low-luminance regions clearly while high-luminance regions are overexposed.


Figure 45: Gaze-dependent tone mapping when gazing at high-luminance regions. We see details in the high-luminance regions while low-luminance regions become darker.

Figure 46: Gaze-dependent tone mapping with the constraint in [32] being used. This method preserves more details in the regions with low luminance while gazing at regions with high luminance.


Figure 47: Gaze-dependent tone mapping without the constraint. The luminance in dark regions becomes lower.

3.4.1 Scene dependency

The gaze-dependent tone mapping described in the previous chapter suffers from scene dependencies [32]. From the denominator in Equation 34 we see that gazing at dark regions might lower the luminance of the scene if the middle gray of the focus zone is large enough. A region appearing dark to the player could still have a luminance high enough to give a large middle gray, which makes darker and brighter regions even darker. Similarly, gazing at a bright region might give a middle gray small enough to make brighter and darker regions brighter. Both outcomes give us the opposite of the intended effect (see Figures 44-45). In one scene the middle gray could give satisfactory results while in another scene it gives unsatisfying results. One way of solving this is to control the middle gray of the scene by using a weight function (see Equation 38) that scales the middle gray so it adapts well to the entire scene. However, this is very inefficient because not only is it hard to find a weight function that fulfills this condition, it is also not guaranteed to work well with the luminance of a different scene. For example, the scene seen in Figures 44-45 first used the weight function shown in Equation 42:

$$W(t) = \frac{1}{1 + t} \qquad (42)$$

where t is the frame, similar to Equation 41. The problem with this weight function was that for some dark regions of the scene the luminance gave a very low middle gray value, which was correct behavior, while in other dark regions it gave a very large middle gray, giving incorrect results, and this led to the weight function in Equation 41 being preferred. The dependency on the scene luminance and weight function makes this method scene-referred. A better option would be a general gaze-dependent tone mapping method that does not have a strict dependency on the scene luminance.


4 Discussion

This chapter discusses which filters were practical and whether the gaze-dependent DOF and tone mapping increased the immersion of the gameplay.

4.1 Eye tracker filters

Four filters were implemented to improve the gaze precision of the eye tracker.

The AF has good noise attenuation but very poor edge sharpness, which leads to high tracking latency when saccades occur. It is also a poor choice for the hardware regarding memory consumption and computational efficiency.

The moving average filter improves the computational efficiency thanks to its recursive behavior but still suffers from the other drawbacks found in the average filter.

The LPF gives good noise attenuation, improved memory consumption and improved computational efficiency. The drawback is that the filter does not differentiate between fixations and saccades. This restricts the client to either a filter with strong noise attenuation and high tracking latency, which is poor for saccades, or a filter with low tracking latency and poor noise attenuation, which is poor for fixations. Neither choice is suitable in games where scanning the scene and fixating on different points of interest are common actions. Using dynamic filters solves this problem; the SF and OCF are both recursive dynamic filters that base their noise attenuation on fixations and saccades.
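
As a concrete illustration of the trade-off, the following is a minimal C++ sketch of a fixed recursive low-pass filter applied to the gaze point. The class and its single smoothing factor are assumptions made for illustration; the point is that one alpha has to serve both fixations and saccades.

// One fixed smoothing factor for all samples: a small alpha attenuates noise
// well but lags behind saccades, while a large alpha follows saccades but is
// noisy during fixations.
struct Gaze2D { float x, y; };

class FixedLowPassFilter
{
public:
    explicit FixedLowPassFilter(float alpha) : mAlpha(alpha) {}

    Gaze2D filter(const Gaze2D& sample)
    {
        if (!mInitialized) { mOutput = sample; mInitialized = true; return mOutput; }
        mOutput.x += mAlpha * (sample.x - mOutput.x);
        mOutput.y += mAlpha * (sample.y - mOutput.y);
        return mOutput;
    }

private:
    Gaze2D mOutput{0.0f, 0.0f};
    bool mInitialized = false;
    float mAlpha;
};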

In the implementation of the OCF the noise is strongly attenuated during fixations because of the small difference between the two means of the latest N samples. In the SF this corresponds to a short distance between the input of the current time instance and the filtered output of the previous time instance. When a saccadic eye movement occurs an alarm is triggered in the OCF and the time constant is set to a small value, so the filtered output avoids lagging behind the new steady state of the signal, which keeps the tracking latency low. In the SF this corresponds to a large distance between the two points. Both filters have the same behavior when fixations and saccadic eye movements occur. Figures 48 and 49 show a comparison between the two filters. The parameters for the filters are the same as the ones used in Figures 22-23 (OCF) and Figures 29-30 (SF). Both filters output smooth signals during fixations and maintain sharp edges during saccades. In Figure 48 we see that the SF is slightly slower than the OCF when saccades occur, but this can be improved by setting the scale factor in Equation 22 closer to the value of one. Both filters show good quality for fixations and saccades. However, the OCF causes occasional spikes during fixations because of false alarms (see Chapter 3.2.4.1). This is distracting because the gaze point occasionally moves away from the point of interest. The SF removes this issue at the cost of more expensive operations. However, with better control during fixations and similar gaze precision it gives a much better experience. Moreover, the filter is much more straightforward to implement, and together these factors make it the preferred filter.
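
A minimal sketch of a dynamic, SF-style filter is shown below: the smoothing strength is driven by the distance between the new sample and the previous output, so fixations are smoothed strongly while saccades are followed with low latency. The distance-to-alpha mapping and the tuning constants are assumptions, not the exact formulation used in the thesis.

#include <algorithm>
#include <cmath>

struct GazePoint { float x, y; };

class DistanceAdaptiveFilter
{
public:
    GazePoint filter(const GazePoint& sample)
    {
        if (!mInitialized) { mOutput = sample; mInitialized = true; return mOutput; }

        const float dx = sample.x - mOutput.x;
        const float dy = sample.y - mOutput.y;
        const float dist = std::sqrt(dx * dx + dy * dy);

        // Small distance -> small alpha (strong smoothing for fixations),
        // large distance -> alpha close to 1 (low latency for saccades).
        const float alpha = std::clamp(dist / mSaccadeDistance, mMinAlpha, 1.0f);

        mOutput.x += alpha * dx;
        mOutput.y += alpha * dy;
        return mOutput;
    }

private:
    GazePoint mOutput{0.0f, 0.0f};
    bool mInitialized = false;
    float mSaccadeDistance = 100.0f;  // pixels; assumed tuning value
    float mMinAlpha = 0.05f;          // assumed tuning value
};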


Figure 48: Spatial Filter and Online Cursor Filter.

Figure 49: Spatial Filter and Online Cursor Filter (zoomed).

4.2 Gaze-dependent Depth of Field

Gaze-dependent DOF was tried on several scenes in Battlefield 4. The participants were employees of EA DICE. Some of the participants had worked on Battlefield 4 and were very familiar with the scenes, while others from teams such as Frostbite were less familiar. The general consensus among the participants was that the outcome improved the immersion and the fun aspect of the game, but with some drawbacks that took away from the overall experience. Allowing the player to change the focal distance to anywhere in the scene, thereby removing the constraint of a static focal distance, was seen as an improvement to the realism and immersion of the game, and the new level of interactivity was considered to improve the fun aspect of the game. This is similar to the results of experimental evaluations in other studies on gaze-dependent depth of field, where the eye tracking system was shown to have a strong influence on immersion and fun [22].

One of the early problems was using the implementation without the accommodation phenomenon (see chapter 3.3.1). This leads to an instant change of the focal distance, and participants felt they perceived little to no effect when changing their focus in the scene. With the accommodation phenomenon participants felt more aware of the focus change, which they perceived as increasing the immersion. Another drawback which all participants experienced was the gaze accuracy. Although a focus zone was used to account for the error in the gaze accuracy, participants had difficulties focusing on smaller objects. This broke some of the immersion, since they felt that the gaze feedback on the monitor did not exactly match their actual gaze because their point of interest in the scene would be out of focus.
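
A minimal sketch of how the accommodation phenomenon can be approximated is given below, assuming a first-order low-pass filter on the focal distance so it glides toward the depth under the gaze point instead of snapping. The function name and the time constant are assumptions.

// Moves the focal distance a fraction of the way toward the gaze depth each
// frame instead of jumping instantly, which is what participants noticed.
float updateFocalDistance(float currentFocalDistance,
                          float gazeFocalDistance,   // depth under the gaze point
                          float deltaTime,           // seconds since last frame
                          float timeConstant = 0.3f) // assumed accommodation speed
{
    const float alpha = deltaTime / (timeConstant + deltaTime);
    return currentFocalDistance + alpha * (gazeFocalDistance - currentFocalDistance);
}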

One of the biggest drawbacks with gaze-dependent DOF, which was not simulated in the scenes, is the problem of knowing what the player is gazing at. This is very difficult and problematic in scenes where focus is required on objects close to each other, because it is impossible for the application to know which object the client is truly gazing at. This scenario is illustrated in Figure 50. In the figure, a focus zone centered on the gaze point measured by the eye tracker is created in response to the error in the gaze accuracy. The figure shows that the client is actually gazing at the cogwheel, while the focus zone says the client could be gazing at any of the objects. One way of improving this is semantic weighting, where objects of greater importance are given more weight in the calculation of the focal distance [18]. However, this assumes that not all objects are equally important; if all objects in Figure 50 are of equal importance, semantic weighting will not work in this particular scene. The problem of knowing what the player is gazing at is one of the biggest restrictions of gaze-dependent DOF.

Figure 50: Gaze accuracy and gaze-dependent DOF.
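
A minimal sketch of semantic weighting in the spirit of [18] is given below: every candidate object inside the focus zone contributes to the focal distance in proportion to an authored importance weight. The struct layout and field names are assumptions; as discussed above, if all candidates have equal importance the weighting degenerates to a plain average and cannot resolve the ambiguity in Figure 50.

#include <vector>

struct FocusCandidate
{
    float depth;       // distance from the camera to the object
    float importance;  // authored semantic weight (e.g. cogwheel > background prop)
};

float semanticFocalDistance(const std::vector<FocusCandidate>& candidatesInFocusZone,
                            float fallbackDepth)
{
    float weightedDepth = 0.0f;
    float totalWeight = 0.0f;
    for (const FocusCandidate& c : candidatesInFocusZone)
    {
        weightedDepth += c.importance * c.depth;
        totalWeight += c.importance;
    }
    // With equal importance everywhere this is a plain average of the depths,
    // which cannot disambiguate the true gaze target.
    return totalWeight > 0.0f ? weightedDepth / totalWeight : fallbackDepth;
}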

4.3 Gaze-dependent tone mapping

Gaze-dependent tone mapping was tried on two scenes in Battlefield 4. The same participants as in the gaze-dependent DOF evaluation tried this implementation. Overall, participants had a very positive experience with the effect and felt that a closer resemblance to how the HVS adapts to the luminance conditions increased the realism and the immersion of the game.

The HVS adaptation to a dark environment in the real world takes minutes, and simulating the real speed would be impractical in a game where things need to be responsive and happen fast. For this reason the adaptation in the game happens significantly faster. Participants did not feel this took away from the immersion or the increased realism, and felt it was more practical to give the player immediate feedback. The implementation only had a noticeable effect on the first scene that was tried. The second scene had a different environment where the luminance conditions barely changed regardless of where the player was gazing. The same problem existed, to a much lesser extent, in the first scene, where gazing at certain areas would have a less profound effect on the overall scene luminance. This is due to the method being scene dependent, which can cause the middle gray of the focus zone to give the opposite effect. This scene dependency also affects the size of the focus zone. Compared to gaze-dependent DOF, which used a 31x31 focus zone, this method used a 127x127 focus zone. Using a 31x31 focus zone did not yield any strong luminance adaptation, causing the scene luminance to change very little. When the larger focus zone was used, the luminance adaptation was stronger, which demonstrated the effect better. This is very similar to the experimental evaluation done in [32], where different sizes for the focus zone were chosen to get the best result for each HDR image. Although the large focus zone demonstrates the effect better, it also contradicts the gaze-dependency of the method, since the size should be based on the gaze accuracy. As such it makes the method less adaptive to the local area around the gaze. The dependency on the middle gray and the size of the focus zone makes this implementation extremely restricted in a game, since both factors would need to be adjusted for each scene, independently of the scene luminance.
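
The focus-zone dependence described above can be illustrated with a small C++ sketch that gathers the luminance samples in an N-by-N window around the gaze point (31x31 for DOF, 127x127 for tone mapping in this implementation). The luminance-buffer layout and accessor are assumptions.

#include <algorithm>
#include <vector>

std::vector<float> gatherFocusZone(const std::vector<float>& luminanceBuffer,
                                   int width, int height,
                                   int gazeX, int gazeY, int zoneSize)
{
    std::vector<float> samples;
    samples.reserve(static_cast<size_t>(zoneSize) * zoneSize);
    const int half = zoneSize / 2;
    for (int y = gazeY - half; y <= gazeY + half; ++y)
    {
        for (int x = gazeX - half; x <= gazeX + half; ++x)
        {
            // Clamp so the zone stays inside the screen near the edges.
            const int cx = std::clamp(x, 0, width - 1);
            const int cy = std::clamp(y, 0, height - 1);
            samples.push_back(luminanceBuffer[cy * width + cx]);
        }
    }
    return samples;
}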


5 Conclusion

We have shown how gaze interaction can be used in different ways to interact with a game. We showed how we integrated the eye tracker in Frostbite 3 by creating an interface which was used when implementing a concrete class for any eye tracker device (in our case the Tobii X2-30 Eye Tracker). We have seen how measurement noise affects the gaze precision and gaze accuracy of the eye tracker. To improve the gaze precision we implemented four filters, each with their advantages and disadvantages. With regards to noise attenuation, tracking latency and hardware efficiency, it was shown that dynamic filters were preferred. We implemented gaze-dependent DOF and showed how it was used to control the focal distance. This allowed us to simulate the natural behavior of the HVS or a camera, where the focal distance changes depending on where the user's gaze is in the virtual world. In our implementation we also used a focus zone to make up for the gaze accuracy of the eye tracker. We also used an LPF to simulate the accommodation phenomenon of the eyes. Finally, we implemented gaze-dependent tone mapping to simulate how the HVS adjusts itself to new luminance conditions in the environment. Like gaze-dependent DOF we used a focus zone, here to calculate the middle gray of the scene before calculating the new luminance, which showed how the gaze of the user affects the exposure of the scene. We also discussed how our implementation of gaze-dependent tone mapping has a strong scene dependency, making it a poor choice for games.

5.1 Future work and improvements

The gaze can be used to interact with games in many other ways. DOF rendering and tone mapping are two of many rendering techniques that can use gaze integration. The gaze of the player can also be used in areas of a game not related to rendering, such as the game logic itself. Some of these are discussed below.

5.1.1 Foveated rendering

Large-scale polygon mesh assets are costly to render interactively in games. Games using many polygon meshes often utilize techniques such as level of detail (LOD) to simplify the complexity of the meshes and keep the renderer efficient. For example, meshes far away from the player are rendered with fewer polygons. This removes some of the workload on the renderer while still retaining the visual fidelity of the meshes due to the distance. To integrate gaze into this, one could use the gaze direction in a foveated renderer. In a foveated renderer the region of the scene falling in the foveal vision is rendered with full complexity, while regions with a falloff of visual acuity, namely the parafoveal and peripheral vision, use a polygonal simplification scheme. This technique has been used in model-based LOD frameworks using view-dependent polygonal simplification models to improve the efficiency of the renderer. For example, it is used when mesh simplification is tested against a contrast sensitivity function (CSF) [38]. Another example is when the visual angle between the gaze direction and the vertices of a polygon in a base mesh is calculated and then used in a visual acuity degradation function to decide whether the triangle should be subdivided to create a higher resolution [39].
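
As an illustration of gaze-driven LOD selection, the sketch below maps the visual angle between the gaze direction and the direction to a mesh to a discrete LOD index. The fixed angle thresholds stand in for a visual acuity degradation function such as the one in [39] and are assumptions, not measured values.

#include <cmath>

struct Vec3 { float x, y, z; };

static float dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static float length(const Vec3& a) { return std::sqrt(dot(a, a)); }

// Both directions are assumed to originate at the eye position.
int selectLod(const Vec3& gazeDirection, const Vec3& toMeshDirection)
{
    const float cosAngle = dot(gazeDirection, toMeshDirection) /
                           (length(gazeDirection) * length(toMeshDirection));
    const float angleDeg = std::acos(std::fmin(std::fmax(cosAngle, -1.0f), 1.0f)) * 57.2958f;

    if (angleDeg < 5.0f)  return 0;  // foveal vision: full-complexity mesh
    if (angleDeg < 20.0f) return 1;  // parafoveal vision: simplified mesh
    return 2;                        // peripheral vision: coarsest mesh
}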


In addition to model-based foveated renderers, the gaze direction can be used in hybrid image-/model-based techniques where ray casting is used to sample an intermediate mesh between the eye and the scene geometry [40]. The rendering efficiency in this technique comes from the ray distribution of the ray casting. The distribution is dictated by the CSF, where it decreases as the visual angle from the gaze direction increases (which leads us closer to the parafoveal and peripheral regions of our view). Similarly, the distribution of rays per unit area is decreased in volume rendering as the visual angle from the gaze direction increases [41]. In ambient occlusion, foveated rendering is used to decrease the number of rays sampled from every point on the hemisphere surrounding a point on the surface, the further away the point is from the foveal vision. The shadows of gaze-dependent ambient occlusion have been shown to retain much of their original full quality while the performance of the renderer was improved by 276 % [42].
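
A minimal sketch of eccentricity-based sample reduction in the spirit of gaze-dependent ambient occlusion [42] is shown below; points farther from the foveal region receive fewer hemisphere rays. The linear falloff and the sample budgets are assumptions standing in for a CSF-driven distribution.

#include <algorithm>

int aoSampleCount(float visualAngleDeg,      // angle from the gaze direction
                  int maxSamples = 64,       // assumed foveal ray budget
                  int minSamples = 4,        // assumed peripheral ray budget
                  float maxAngleDeg = 40.0f) // beyond this, use the minimum
{
    // Linearly blend from the foveal budget down to the peripheral budget.
    const float t = std::clamp(visualAngleDeg / maxAngleDeg, 0.0f, 1.0f);
    return static_cast<int>(maxSamples + t * (minSamples - maxSamples));
}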

NVIDIA created a foveated renderer using a SensoMotoric Instruments head-mounted eye tracker device to improve render performance by rendering the image outside the foveal vision with lower quality [43]. In their implementation they explored artifacts arising in the peripheral vision from existing foveated renderers, such as tunnel vision, flicker and blur. By applying post-process contrast preservation combined with blur, their foveated renderer could use twice as much blur before differences between foveated and non-foveated images were noticed. Their technique showed that even with the number of shades reduced by 70 % they could still closely match their perceptual target.

In conclusion, using an eye tracker allows us to implement a foveated renderer which reduces the complexity of the scene as we move away from the foveal vision, and this simplification scheme helps improve the performance of the renderer.

5.1.2 Maladaptation with gaze-dependent tone mapping

The biggest drawback with the gaze-dependent tone mapping in chapter 3.4 was the scene dependency, which made it extremely limited for dynamic virtual environments. Instead of using an approach with a focus zone from which the middle gray is calculated, a maladaptation model can be simulated and used in a tone compression algorithm. Maladaptation occurs when the gaze direction is changed while the HVS is trying to adapt to different luminance ranges. Since adapting to different luminance ranges is slower than changing gaze direction, the HVS never finishes its current luminance adaptation before jumping to another one, thereby never reaching its target [29]. This simulates the HVS much more accurately, making it more applicable to interactive virtual environments.
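
A minimal sketch of such a maladaptation state is shown below, assuming an exponential adaptation model: the adaptation luminance chases the luminance under the gaze but needs time to get there, so rapid gaze changes leave the viewer temporarily mal-adapted. The model and its time constant are assumptions, not the formulation in [29].

#include <cmath>

float updateAdaptationLuminance(float adaptedLuminance,       // current adaptation state
                                float gazeLuminance,          // luminance at the gaze point
                                float deltaTime,              // seconds since last frame
                                float adaptationTime = 2.0f)  // assumed; faster than the real HVS
{
    // Exponentially approach the gaze luminance; a new gaze target before
    // convergence leaves the state mal-adapted, as described above.
    const float blend = 1.0f - std::exp(-deltaTime / adaptationTime);
    return adaptedLuminance + blend * (gazeLuminance - adaptedLuminance);
}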

5.1.3 Gaze interaction with game logic

Eye tracking can be used in many ways to interact with game logic. For example, the gaze could be used to interact with and navigate a game's menus. This would allow players to multitask, such as controlling their character while interacting with a menu at the same time, and it would also allow players to access menu options much faster by just looking at the options to select them. Another way of using the gaze with game logic is to integrate the gaze with AI logic. For example, if you are gazing at an NPC in a virtual town, it could be made to behave in a certain way in response to your gaze, such as becoming worried or aggressive.
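
A minimal sketch of gaze-driven game logic is shown below: an NPC changes behavior once the player's gaze has dwelt on it long enough. The mood states and dwell thresholds are invented for illustration.

enum class NpcMood { Idle, Worried, Aggressive };

struct NpcGazeReaction
{
    float gazeDwellTime = 0.0f;

    NpcMood update(bool npcIsUnderGaze, float deltaTime)
    {
        // Accumulate dwell time while the gaze rests on the NPC, reset otherwise.
        gazeDwellTime = npcIsUnderGaze ? gazeDwellTime + deltaTime : 0.0f;
        if (gazeDwellTime > 4.0f) return NpcMood::Aggressive; // stared at for a while
        if (gazeDwellTime > 1.5f) return NpcMood::Worried;    // noticed the player's gaze
        return NpcMood::Idle;
    }
};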


References

[1] Tobii Technology AB, "Tobii Eye Tracking: An introduction to eye tracking and Tobii Eye Trackers," March 2014.

[2] Tobii Technology AB, "Accuracy and precision test method for remote eye trackers," https://www.tobiipro.com/siteassets/tobii-pro/accuracy-and-precision-tests/tobii-accuracy-and-precisiontest-method-version-2-1-pdf/?v=2.1.1, August 2017.

[3] Tobii Technology AB, "Specification of Gaze Accuracy and Gaze Precision, Tobii X2-30 Eye Tracker," https://www.tobiipro.com/siteassets/tobii-pro/technical-specifications/tobii-pro-x2-30-technical-specification.pdf/?v=1.0, August 2017.

[4] Hornof A.J. and Halverson T., "Cleaning up systematic error in eye-tracking data by using required fixation locations," Behavior Research Methods, Instruments and Computers 34(4):592-604, 2002.

[5] Holmqvist K., Nyström M., Andersson R., Dewhurst R., et al., "Eye Tracking: A Comprehensive Guide To Methods And Measures," 2011.

[6] Smith S.W., "Digital Signal Processing: A Practical Guide for Engineers and Scientists," Newnes, ISBN 9780750674447, 2003.

[7] "A Robust 3D Eye Gaze Tracking System Using Noise Reduction," Proceedings of the 2008 Symposium on Eye Tracking Research and Applications, 2008, Jixu Chen, Yan Tong, Wayne Gray and Qiang Ji, Savannah, Georgia, 189-196.

[8] Klingner J., Kumar R., and Hanrahan P., "Measuring the Task-Evoked Pupillary Response with a Remote Eye Tracker," 69-72, 2008.

[9] Duchowski A.T., "Eye movement analysis," Eye Tracking Methodology: Theory and Practice, Springer-Verlag New York, Inc., Secaucus, NJ, USA, 2007.

[10] Larsson G., "Evaluation Methodology of Eye Movement Classification Algorithms," M.S. thesis, School of Engineering Physics, KTH, 2010.

[11] Olsson P., "Real-time and offline filters for eye tracking," M.S. thesis, Electrical Engineering, KTH, 2007.

[12] Tobii Technology AB, "The Tobii I-VT Fixation Filter," January 2014.

[13] Lauwereyns J., "Free Viewing," Brain and the Gaze: On the Active Boundaries of Vision, MIT Press, 2012, 1-38.

[14] Besner D., "Fovea vs Parafovea: A definition," Basic processes in reading: visual word recognition, L. Erlbaum Associates, Hillsdale, N.J., 1991, 199-200.

[15] Snyder H.L., "The Visual System: Capabilities and Limitations," In: Tannas L., editor. Flat-panel displays and CRTs, Van Nostrand Reinhold, New York, 1985, 54-69.


[16] Demers J., "Depth of Field: A Survey of Techniques," In: Fernando R., editor. GPU Gems: Programming Techniques, Tips and Tricks for Real-Time Graphics, Pearson Higher Education, 2004.

[17] Riguer G., Tatarchuk N., and Isidoro J., "Real-Time Depth of Field Simulation," In: Engel W.F., editor. ShaderX2: Shader Programming Tips & Tricks With DirectX 9, 2003, 529-554.

[18] Hillaire S., Lecuyer A., Cozot R., and Casiez G., "Depth-of-Field Blur Effects for First-Person Navigation in Virtual Environments," IEEE Computer Graphics and Applications 28(6):47-55, 2008.

[19] "A Lens and Aperture Camera Model for Synthetic Image Generation," Proceedings of the 8th Annual Conference on Computer Graphics and Interactive Techniques, 1981, Michael Potmesil and Indranil Chakravarty, Dallas, Texas, USA, 297-305, doi:10.1145/800224.806818.

[20] Kenny A., Koesling H., Delaney D., Mcloone S., et al., "A preliminary investigation into eye gaze data in a first person shooter game," 2017.

[21] Mantiuk R., Bazyluk B., and Tomaszewska A., "Gaze-Dependent Depth-of-Field Effect Rendering in Virtual Environments," In: Ma M., Fradinho Oliveira M., and Madeiras Pereira J., editors. Serious Games Development and Applications: Second International Conference, SGDA 2011, Lisbon, Portugal, September 19-20, 2011. Proceedings, Springer Berlin Heidelberg, Berlin, Heidelberg, 2011, 1-12.

[22] "Using an Eye-Tracking System to Improve Camera Motions and Depth-of-Field Blur Effects in Virtual Environments," 2008 IEEE Virtual Reality Conference, 2008, S. Hillaire, A. Lecuyer, R. Cozot and G. Casiez, 47-50.

[23] "Photographic Tone Reproduction for Digital Images," Proceedings of the 29th Annual Conference on Computer Graphics and Interactive Techniques, 2002, Erik Reinhard, Michael Stark, Peter Shirley and James Ferwerda, San Antonio, Texas, 267-276, doi:10.1145/566570.566575.

[24] Adams A., "The camera," Little, Brown, Boston, ISBN 978-0821221846, 1995.

[25] Adams A., "The negative," Little, Brown, Boston, ISBN 978-0821221860, 1995.

[26] Adams A., "The print," Little, Brown, Boston, ISBN 978-0821221877, 1983.

[27] Prindle D., "HDR as easy as 1, 2, 3: A beginner's guide to High Dynamic Range photography," https://www.digitaltrends.com/photography/what-is-hdr-beginners-guide-to-high-dynamic-, September 2017.

[28] Reinhard E., Ward G., Pattanaik S., and Debevec P., "High Dynamic Range Imaging: Acquisition, Display, and Image-Based Lighting (The Morgan Kaufmann Series in Computer Graphics)," Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, ISBN 0125852630, 2005.

[29] Mantiuk R. and Markowski M., "Gaze-Dependent Tone Mapping," In: Kamel M. and Campilho A., editors. Image Analysis and Recognition: 10th International Conference, ICIAR 2013, Povoa do Varzim, Portugal, June 26-28, 2013. Proceedings, Springer Berlin Heidelberg, Berlin, Heidelberg, 2013, 426-433.

[30] Hood D.C., "Lower-Level Visual Processing and Models of Light Adaptation," Annual Review of Psychology 49(1):503-535, 1998.

[31] Miyata K., Saito M., Tsumura N., Haneishi H., et al., "Eye Movement Analysis and its Application to Evaluation of Image Quality," ITE Technical Report 22.3:17-20, 1998.


[32] Yamauchi T., Mikami T., Ouda O., Nakaguchi T., et al., "Improvement and Evaluation of Real-Time Tone Mapping for High Dynamic Range Images Using Gaze Information," 6468:440-449, 2010.

[33] "IEEE Standard for Binary Floating-Point Arithmetic," ANSI/IEEE Std 754-1985, 1985.

[34] Haugen F., "Derivation of a Discrete-Time Lowpass Filter," http://techteach.no/simview/lowpass_filter/doc/filter_algorithm.pdf, September 2017.

[35] Jones D., "Numerical Approximations," https://math.la.asu.edu/~dajones/class/275/ch2.pdf, September 2017.

[36] Hindriksen V., "How expensive is an operation on a CPU?" https://streamhpc.com/blog/2012-07-16/how-expensive-is-an-operation-on-a-cpu/, September 2017.

[37] "Real-time tone-mapping of high dynamic range image using gazing area information," Proc. International Conference on Computer and Information, 2010, T. Mikami, K. Hirai, T. Nakaguchi and N. Tsumura.

[38] "Perceptually Driven Simplification for Interactive Rendering; IEEE Standard for BinaryFloating-Point Arithmetic," Proceedings of the 12th Eurographics Conference on Render-ing, 2001;, David Luebke and Benjamin Hallen. , London, UK, 223;-234.

[39] Murphy H. and Duchowski A.T., "Gaze-Contingent Level of Detail Rendering," 2001.

[40] "Hybrid Image-/Model-based Gaze-contingent Rendering," Proceedings of the 4th Sym-posium on Applied Perception in Graphics and Visualization, 2007, Hunter Murphy andAndrew T. Duchowski. , Tubingen, Germany, 107-114.

[41] "Gaze-directed Volume Rendering," Proceedings of the 1990 Symposium on Interactive 3DGraphics, 1990, Marc Levoy and Ross Whitaker. , Snowbird, Utah, USA, 217-223.

[42] Mantiuk R. and Janus S., "Gaze-Dependent Ambient Occlusion," In: Bebis G., Boyle R.,Parvin B., Koracin D., et al., editors. Advances in Visual Computing: 8th InternationalSymposium, ISVC 2012, Rethymnon, Crete, Greece, July 16-18, 2012, Revised Selected Pa-pers, Part I, Springer Berlin Heidelberg, Berlin, Heidelberg, 2012, 523-532.

[43] Patney A., Salvi M., Kim J., Kaplanyan A., et al., "Towards Foveated Rendering for Gaze-tracked Virtual Reality," ACM Trans.Graph. 35(6):179:1-179:12, 2016.
