Transcript of "Eye gaze modelling for Attention prediction" (27 slides)

Page 1:

EYE GAZE MODELLING FOR ATTENTION PREDICTION
Omran Kaddah
7/3/2019

Page 2:

OVERVIEW
• Motivation: why attention prediction is important for autonomous driving.
• How the current state-of-the-art machine learning models tackle the problem.
• Datasets & training
• Models
• Conclusion & what we propose

Page 3:

MOTIVATION

Page 4:

MOTIVATION
Driver distraction is one of the main causes of road accidents [1].
Predicting where the driver should have their gaze makes it possible to assist them with warnings.
Current eye-gaze prediction models are powerful; however, they still need improvement:
❖ Improve accuracy
❖ Decrease false-negative and false-positive rates

Page 5:

SUPERVISED MACHINE LEARNING SETTING
Any improvement has to do with one of:
❖ Data
❖ Model
❖ The way a model is trained

Page 6:

ATTENTION PREDICTION

Page 7:

ATTENTION DATASETS
• Datasets used by previous research papers [4][5]
➢ From 2009 and 2011, respectively.
➢ Few frames.
➢ Lab settings.
• DR(eye)VE [2]
➢ In-car settings.
➢ Collected from 74 rides by 8 drivers: 1.0 car and 0.04 pedestrians per frame, 464 braking events.
➢ Duration of 6 hours.
➢ Attention maps made by aggregating gaze over temporally adjacent frames.
➢ Drawbacks: see next slide.
• Berkeley DeepDrive Attention, a.k.a. BDD-A [3]
➢ In-lab settings.
➢ More on the next slide.

Page 8:

ATTENTION DATASETS
Drawbacks of DR(eye)VE: Xia et al. [3] note that DR(eye)VE records only a single focus of attention, whereas humans can also attend covertly [6]. It also contains false-positive gazes (glances that are irrelevant to the driving situation [7]).
More critical situations are still needed.
Proposed solution: Xia et al. [3] provide the solution with the BDD-A dataset and a way to train on it.
• A protocol that uses crowd-sourced driving videos, with 45 gaze providers who played a driving-instructor role.
• Because several visual cues can demand attention simultaneously (psychological studies [8][9]), the gazes of independent observers are aggregated and smoothed into attention maps (see the sketch at the end of this slide).
• Driving situations collected from 1,232 rides in more crowded areas: 0.25 pedestrians and 4.4 cars per frame, plus 3x more braking events compared to DR(eye)VE.
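As a rough illustration of this aggregation step (a sketch under our own assumptions, not the authors' pipeline), the snippet below pools normalized fixations from several observers into one grid, smooths them with a Gaussian, and normalizes the result into a probability map; the grid size and smoothing width are illustrative.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def make_attention_map(gaze_points, width=64, height=36, sigma=2.0):
    """gaze_points: (x, y) fixations in [0, 1) normalized image coordinates,
    pooled over all observers for one frame."""
    heatmap = np.zeros((height, width), dtype=np.float32)
    for x, y in gaze_points:
        col = min(int(x * width), width - 1)
        row = min(int(y * height), height - 1)
        heatmap[row, col] += 1.0                      # accumulate fixations
    heatmap = gaussian_filter(heatmap, sigma=sigma)   # smooth individual fixations
    return heatmap / (heatmap.sum() + 1e-8)           # normalize to a probability map

# Fixations of three independent observers on the same frame
attn = make_attention_map([(0.50, 0.50), (0.52, 0.48), (0.80, 0.50)])
```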

Page 9:

THE MODEL
The model proposed by Xia et al. [3] predicts the driver attention map for a video frame given the current frame and the previous one.
A visual feature processing module encodes each frame and feeds it into an LSTM.
Dropout layers were also used.
The output is a 64x36 grid of probability distributions.
Cross-entropy is used as the loss function.
Xia, Y., Zhang, D., Kim, J., Nakayama, K., Zipser, K., and Whitney, D. (2017). Predicting Driver Attention in Critical Situations. arXiv preprint arXiv:1711.06406.
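A minimal sketch of this kind of pipeline (not the authors' implementation) is given below in PyTorch, assuming pooled per-frame CNN features of a fixed dimension; the feature dimension, hidden size, and dropout rate are illustrative assumptions.

```python
import torch
import torch.nn as nn

class AttentionPredictor(nn.Module):
    """Per-frame visual features -> LSTM -> dropout -> 36x64 attention grid."""
    def __init__(self, feat_dim=1280, hidden=256, grid_h=36, grid_w=64):
        super().__init__()
        self.grid_h, self.grid_w = grid_h, grid_w
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.dropout = nn.Dropout(0.5)
        self.head = nn.Linear(hidden, grid_h * grid_w)

    def forward(self, feats):
        # feats: (batch, time, feat_dim), one pooled CNN feature vector per frame
        out, _ = self.lstm(feats)
        logits = self.head(self.dropout(out))              # (batch, time, 36*64)
        return logits.view(feats.shape[0], feats.shape[1], self.grid_h, self.grid_w)

def attention_loss(logits, target_maps):
    # Cross-entropy between the predicted grid and the ground-truth attention map,
    # both treated as distributions over the grid cells.
    log_p = torch.log_softmax(logits.flatten(start_dim=-2), dim=-1)
    target = target_maps.flatten(start_dim=-2)
    return -(target * log_p).sum(dim=-1).mean()

# Example: a batch of 2 clips, 4 frames each
feats = torch.randn(2, 4, 1280)
maps = torch.softmax(torch.randn(2, 4, 36 * 64), dim=-1).view(2, 4, 36, 64)
loss = attention_loss(AttentionPredictor()(feats), maps)
```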

Page 10:

TRAINING
Problem: the most important situations during driving are rare, and the loss is the same whether the model makes an error in these rare, important incidents or in ordinary ones.
We need to find a way to detect those situations and sample more of them. But how?
BDD-A has a higher rate of pedestrians, cars, and braking events, but that alone is not enough.

https://www.flickr.com/photos/stretchybill/5723445927

Page 11:

TRAINING

Proposed in [3]: compute the mean attention field over the dataset, then compute the Kullback–Leibler divergence between the current frame's attention map and the mean attention field (both treated as distributions). The result is the sampling weight for the given frame.

KL(F ‖ F̄) = Σ_pixel F(pixel) · log( F(pixel) / F̄(pixel) )
where F, F̄ : pixel → [0, 1] and pixel ∈ [0, dimX × dimY].

Sequences are now sampled with probabilities proportional to the sequence sampling weights.
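A small sketch of this weighting scheme, under our own assumptions about variable names and with an epsilon added for numerical stability: each frame's weight is its KL divergence from the dataset-mean attention map, and training frames are then drawn with probabilities proportional to those weights.

```python
import numpy as np

def sampling_weights(attention_maps, eps=1e-8):
    """attention_maps: array of shape (num_frames, H, W), each map summing to 1."""
    mean_map = attention_maps.mean(axis=0)
    mean_map = mean_map / mean_map.sum()
    ratio = (attention_maps + eps) / (mean_map + eps)
    kl = (attention_maps * np.log(ratio)).sum(axis=(1, 2))  # KL(F || F_mean) per frame
    return kl / kl.sum()                                    # sampling probabilities

# Draw a training batch in proportion to the weights
maps = np.random.dirichlet(np.ones(36 * 64), size=100).reshape(100, 36, 64)
weights = sampling_weights(maps)
batch_idx = np.random.choice(len(maps), size=16, p=weights)
```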

Page 12:

RESULTS

https://github.com/pascalxia/driver_attention_prediction

Page 13:

RESULTS

https://github.com/pascalxia/driver_attention_prediction

Page 14:

RESULTS

https://github.com/pascalxia/driver_attention_prediction

Page 15:

RESULTS


[3]

Page 16:

EYE-GAZE MODELLING

Page 17:

DATASETS

MSP-Gaze corpus [10]

❖ 46 participants from different ethnicities
EYEDIAP [11]
MPIIGaze [12]

Page 18:

DATASETS

https://ecs.utdallas.edu/research/researchlabs/msp-lab/MSP-gaze.html

Page 19:

MODEL
Jha and Busso [13] proposed a model that requires no user calibration or invasive equipment, inspired by the model in [14].
❖ Inputs: images of both eyes; the two eyes together also carry information about head rotation [10].
❖ The model is a CNN; see the next slide for the architecture.
❖ Output: a 2D visual map describing the probability of the gaze direction, which turns the problem into a classification task instead of the usual regression: regression by classification.
❖ A Gaussian filter is applied to the ground-truth labels and predictions to address the cost-sensitivity problem (see the sketch below).
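The sketch below shows one way to build such Gaussian-smoothed classification targets ("regression by classification"); the grid resolution, angle ranges, and smoothing width are assumptions for illustration, not the settings of [13].

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def soft_gaze_target(yaw, pitch, grid=(41, 41), yaw_range=(-60.0, 60.0),
                     pitch_range=(-40.0, 40.0), sigma=1.5):
    """Bin a continuous gaze angle (degrees) into a 2D grid and blur the one-hot
    cell so that near-miss predictions are penalized less than distant ones."""
    target = np.zeros(grid, dtype=np.float32)
    col = (yaw - yaw_range[0]) / (yaw_range[1] - yaw_range[0]) * (grid[1] - 1)
    row = (pitch - pitch_range[0]) / (pitch_range[1] - pitch_range[0]) * (grid[0] - 1)
    target[int(np.clip(row, 0, grid[0] - 1)), int(np.clip(col, 0, grid[1] - 1))] = 1.0
    target = gaussian_filter(target, sigma=sigma)   # spread mass to neighbouring bins
    return target / target.sum()

soft_label = soft_gaze_target(yaw=12.0, pitch=-5.0)  # soft 2D target for one sample
```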

Page 20:


Page 21:

RESULTS

Jha, S., and Busso, C. (2019). Estimation of Gaze Region Using Two Dimensional Probabilistic Maps Constructed Using Convolutional Neural Networks. In ICASSP 2019 - IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 3792-3796.

Page 22:

RESULTS

d_eye-* is the distance between the subject's eye-pair centre and the ground-truth/estimated gaze point; d_eye-mc is the distance between the user and the monitor centre; d_mc-estimate is the distance between the monitor centre and the predicted gaze position; and d_mc-true is the distance between the monitor centre and the ground-truth gaze position.
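Purely as an illustration (not necessarily the exact metric computed in [13]), distances like these can be converted into an angular error at the eye via the law of cosines; the helper below introduces a hypothetical distance between the true and estimated gaze points for that purpose.

```python
import math

def angular_error(d_eye_true, d_eye_estimate, d_true_estimate):
    """Angle (degrees) at the eye between the true and estimated gaze points.
    d_true_estimate, the distance between the two gaze points, is a hypothetical
    extra quantity introduced for this illustration; all distances share one unit."""
    cos_theta = ((d_eye_true ** 2 + d_eye_estimate ** 2 - d_true_estimate ** 2)
                 / (2.0 * d_eye_true * d_eye_estimate))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))

print(angular_error(d_eye_true=60.0, d_eye_estimate=61.0, d_true_estimate=5.0))
```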

Page 23:

PROPOSAL

Training models for eye gaze and attention prediction can be done simultaneously:
• Two birds, one stone.
• One can simulate different lighting conditions at the same time for both models.
• It is easier to match the outputs of both models, as both are in car-driving settings.
Use state-of-the-art NN architectures:
• In [3] AlexNet was used; although it is simple, it has many parameters. A modern architecture such as MobileNetV2 [15] has 12x fewer parameters, the same number of operations, and better accuracy (see the sketch below).
• Use pretrained upsampling layers from semantic-segmentation models for the model proposed in [3].
• Using an LSTM for eye gaze might yield further improvement by learning eye-movement patterns.
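A short sketch of the suggested backbone swap, assuming a recent torchvision build: a pretrained MobileNetV2 [15] feature extractor replaces AlexNet and yields per-frame feature vectors that could feed the LSTM of [3]; the pooling choice is our own.

```python
import torch
import torchvision

# Pretrained MobileNetV2 feature extractor (requires a recent torchvision)
backbone = torchvision.models.mobilenet_v2(weights="IMAGENET1K_V1").features
backbone.eval()

frames = torch.randn(4, 3, 224, 224)         # a short clip of 4 frames
with torch.no_grad():
    feats = backbone(frames)                  # (4, 1280, 7, 7) feature maps
pooled = feats.mean(dim=(2, 3))               # (4, 1280) per-frame vectors for the LSTM
```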

Page 24:

CONCLUSION

There is always room for improvement.
What was proposed in [3] overcame the drawbacks of the in-car setting and the bias of the dataset.
The form of the output and the choice of loss function are significant for eye-gaze modelling [13].

Page 25:

REFERENCES
[1] Klauer, S. G., Guo, F., Simons-Morton, B. G., Ouimet, M. C., Lee, S. E., and Dingus, T. A. (2014). Distracted driving and risk of road crashes among novice and experienced drivers. New England Journal of Medicine, 370(1), 54-59.
[2] Alletto, S., Palazzi, A., Solera, F., Calderara, S., and Cucchiara, R. (2016). DR(eye)VE: a dataset for attention-based tasks with applications to autonomous and assisted driving. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 54-60.
[3] Xia, Y., Zhang, D., Kim, J., Nakayama, K., Zipser, K., and Whitney, D. (2017). Predicting Driver Attention in Critical Situations. arXiv preprint arXiv:1711.06406.
[4] Simon, L., Tarel, J.-P., and Brémond, R. (2009). Alerting the drivers about road signs with poor visual saliency. In IEEE Intelligent Vehicles Symposium, 2009, pp. 48-53.
[5] Underwood, G., Humphrey, K., and Van Loon, E. (2011). Decisions about objects in real-world scenes are influenced by visual saliency before and during their inspection. Vision Research, 51(18), 2031-2038.
[6] Cavanagh, P., and Alvarez, G. A. (2005). Tracking multiple targets with multifocal attention. Trends in Cognitive Sciences, 9(7), 349-354.

Page 26:

REFERENCES
[7] Palazzi, A., Solera, F., Calderara, S., Alletto, S., and Cucchiara, R. (2017). Learning where to attend like a human driver. In IEEE Intelligent Vehicles Symposium (IV), 2017, pp. 920-925.
[8] Groner, R., Walder, F., and Groner, M. (1984). Looking at faces: Local and global aspects of scanpaths. In Advances in Psychology, Vol. 22, Elsevier, pp. 523-533.
[9] Mannan, S. K., Ruddock, K. H., and Wooding, D. S. (1997). Fixation sequences made during visual examination of briefly presented 2D images. Spatial Vision, 11(2), 157-178.

Page 27:

REFERENCES
[10] Li, N., and Busso, C. (2018). Calibration free, user independent gaze estimation with tensor analysis. Image and Vision Computing, 74, 10-20.
[11] Mora, K. A. F., Monay, F., and Odobez, J.-M. (2014). EYEDIAP: a database for the development and evaluation of gaze estimation algorithms from RGB and RGB-D cameras. In Proceedings of the Symposium on Eye Tracking Research and Applications (ETRA 2014), Safety Harbor, FL, USA, pp. 255-258.
[12] Zhang, X., Sugano, Y., Fritz, M., and Bulling, A. (2019). MPIIGaze: Real-world dataset and deep appearance-based gaze estimation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(1), 162-175.
[13] Jha, S., and Busso, C. (2019). Estimation of Gaze Region Using Two Dimensional Probabilistic Maps Constructed Using Convolutional Neural Networks. In ICASSP 2019 - IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 3792-3796.
[14] Jha, S., and Busso, C. (2018). Probabilistic estimation of the gaze region of the driver using dense classification. In IEEE International Conference on Intelligent Transportation Systems (ITSC 2018), Maui, HI, USA, pp. 697-702.
[15] Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., and Chen, L.-C. (2018). MobileNetV2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4510-4520.