
SNAG: Spoken Narratives and Gaze Dataset

Preethi Vaidyanathan1, Emily Prud’hommeaux2, Jeff B. Pelz3, Cecilia O. Alm3

1LC Technologies, Inc., Fairfax, Virginia, USA; 2Boston College, Boston, Massachusetts, USA; 3Rochester Institute of Technology, Rochester, New York, USA

• Multimodal data are useful for understanding human perception

• No publicly available dataset with co-collected spoken narration and gaze information during naturalistic free viewing

• Unique multimodal dataset comprising co-captured gaze and audio data, plus transcriptions, for the language and vision communities

• Application of SNAG to visual-linguistic annotation framework (Vaidyanathan et al. 2016) to label image regions

• 30 American English speakers, 18-25 yrs old, 13 female & 17 male
• 100 general-domain images selected from the MSCOCO dataset
• TASCAM DR-100MKII recorder with lapel microphone
• SMI RED250 remote eye tracker running at 250 Hz
• Modified Master-Apprentice method to elicit rich details
• “Describe the action in the images and tell the experimenter what is happening.”

Conclusions and Future Work

RegionLabeler: Image Annotation Tool

Multimodal Dataset

Background

Labeling Images via Multimodal Alignment

Data Collection

• Visual-linguistic alignment framework is independent of image type and of observer expertise.

• It can serve researchers in computer vision, computational linguistics, psycholinguistics, and related fields.

• Co-collect modalities such as facial expressions, galvanic skin response, or other biophysical signals with static and dynamic visual materials.

• A unique resource for understanding how humans view and describe scenes containing common objects.

References

• Transcripts generated with IBM Watson Speech-to-Text (WER ~5%)
• Fixations represented as green circles; radius indicates fixation duration
• Green lines represent saccades
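As a rough illustration of this rendering (not the authors' code), a fixation/saccade overlay can be sketched as below; the coordinates, durations, and frame size are hypothetical values:

```python
# Minimal sketch: fixations as green circles whose radius reflects duration,
# saccades as green lines between consecutive fixation centers.
# Fixation values are hypothetical (x, y, duration in ms).
import matplotlib.pyplot as plt

fixations = [(420, 310, 260), (505, 340, 180), (610, 295, 420)]

fig, ax = plt.subplots()
# ax.imshow(stimulus_image)  # the MSCOCO stimulus would normally be drawn here

# Saccades: green lines connecting consecutive fixation centers.
xs = [x for x, _, _ in fixations]
ys = [y for _, y, _ in fixations]
ax.plot(xs, ys, color="green", linewidth=1)

# Fixations: green circles, radius proportional to fixation duration.
for x, y, dur in fixations:
    ax.add_patch(plt.Circle((x, y), radius=dur / 10.0, color="green", fill=False, linewidth=2))

ax.set_xlim(0, 800)
ax.set_ylim(480, 0)  # inverted y-axis to match image pixel coordinates
ax.set_aspect("equal")
plt.show()
```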

• Wide range of type-token ratios corresponds to the range of image complexity
• Overall mean type-token ratio (0.75) shows substantial lexical diversity
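For reference, a minimal sketch of the type-token ratio assumed here (unique word types divided by word tokens per narrative; plain whitespace tokenization is an assumption, not the paper's preprocessing):

```python
# Type-token ratio (TTR) = number of unique word types / number of word tokens.
def type_token_ratio(transcript: str) -> float:
    tokens = transcript.lower().split()  # simple whitespace tokenization (assumption)
    return len(set(tokens)) / len(tokens) if tokens else 0.0

narrative = "there's a female cutting a cake she's smiling and has sunglasses on her head"
print(round(type_token_ratio(narrative), 2))
```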

• Alignments from the framework compared against a 1-second-delay baseline
• Best AER = 0.54 using MSFC vs. baseline AER = 0.64 (AER sketch follows these bullets)

• Alignments generated with the Berkeley aligner, a word aligner originally developed for machine translation
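The alignment error rate itself is not spelled out on the poster; the sketch below assumes the standard definition from machine-translation alignment evaluation, AER = 1 − (|A∩S| + |A∩P|) / (|A| + |S|), with hypothetical word-region alignment pairs:

```python
# AER over predicted alignments A, sure reference links S, possible links P (S ⊆ P).
def aer(predicted: set, sure: set, possible: set) -> float:
    """AER = 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|); lower is better."""
    a_s = len(predicted & sure)
    a_p = len(predicted & possible)
    return 1.0 - (a_s + a_p) / (len(predicted) + len(sure))

# Hypothetical (word index, region label) alignment pairs.
sure = {(0, "cake"), (3, "plates")}
possible = sure | {(5, "sunglasses")}
predicted = {(0, "cake"), (5, "sunglasses"), (7, "camera")}
print(round(aer(predicted, sure, possible), 2))
```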

[Figure: region labels for two example images (“Ironman” cake scene and “Plates” scene), comparing reference alignments with mean shift fixation clustering (MSFC), adaptive k-means, and gradient segmentation (GSEG); recovered labels include cake, plates, sunglasses, and camera.]
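As a loose illustration of the mean-shift idea behind MSFC (an assumption about the general technique, not the authors' pipeline), raw gaze samples can be grouped into fixation-like clusters with scikit-learn; the sample coordinates and bandwidth below are hypothetical:

```python
# Cluster gaze samples (pixel coordinates) into fixation-like groups with mean shift.
import numpy as np
from sklearn.cluster import MeanShift

gaze_xy = np.array([[412, 305], [418, 309], [421, 312],   # hypothetical gaze samples
                    [604, 290], [610, 295], [607, 298]])

ms = MeanShift(bandwidth=30)     # bandwidth in pixels is an assumed setting
labels = ms.fit_predict(gaze_xy)

print(labels)                    # cluster id per gaze sample
print(ms.cluster_centers_)       # one (x, y) center per fixation-like cluster
```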

Dataset and tool available at: https://mvrl-clasp.github.io/SNAG/

[Figure: data collection setup with lapel mic and eye tracker; example ASR transcript paired with eye movements.]

Example ASR transcript: “there’s a female cutting a Kate uh she’s smiling and has sunglasses on her head uh the cake has a picture of uh don’t know who also uh an iron man cake and alcohol maybe champagne uh she is wearing a black tank top uh there are plates and other things on the table and they seem to be in a bar or something”

[Figure: mean number of word tokens and mean number of word types per narrative.]

Vaidyanathan, P., Prud’hommeaux, E., Alm, C. O., Pelz, J. B., and Haake, A. R. (2016). Fusing eye movements and observer narratives for expert-driven image-region annotations. In Proceedings of the Symposium on Eye Tracking Research and Applications (ETRA), pages 27-34. ACM.