Voice-led interactive exploration of audio
Rishi Shukla, November 2019
bbc.co.uk/rd
Research & Development
CONTENTS

EXECUTIVE SUMMARY
ACKNOWLEDGEMENTS
INTRODUCTION
  Digital audio and smart speakers
  Current limitations of voice-led audio interaction
  Future possibilities for voice-led audio interaction
  Existing BBC R&D work related to these future possibilities
RATIONALE AND DESIGN DECISIONS
  Aims and design scope
  System architecture
  Voice interaction design
IMPLEMENTATION AND ENGINEERING
  Prototype realisation
  Sound design
  Binaural spatial audio rendering
  Smart technology simulation
USER RESEARCH DESIGN
  Participant recruitment
  Study format
  Data collection
FINDINGS
  Patterns of use for voice-led content discovery
  Specific profile differences for voice-led content discovery
  Smart headphones and smart speaker – behaviour and usability ratings
  Smart headphones and smart speaker – connectedness
CONCLUSIONS AND RECOMMENDATIONS
  Review of research aims
  Future work
APPENDIX
  References
  Mental model drawings
Executive summary
• The dominance of digital audio distribution and rapid growth in smart speaker
technology present new challenges for interactive exploration of audio-only media.
• Smart headphones are an emergent technology whose unique affordances for
accessing, interacting with and displaying content are only starting to be explored.
• This research project developed a working prototype of a voice-led interactive
application for navigating the BBC’s Glastonbury 2019 highlights on smart speaker
and smart headphones, called the Auditory Archive Explorer.
• The voice-led interaction design and two prototype versions were evaluated using a
mixed methods user experience research approach. The two iterations featured
identical content, but alternate modes of audio presentation. Mono playback was used
in the case of the smart speaker. Full binaural synthesis (i.e. static placement of sound
sources in virtual 3D space using head-tracking) augmented the smart headphone
experience.
• Twenty-two people were selected to participate in the research study after an open
call to register interest.
• Analysis of user interaction showed strong evidence that the design supports effective
browsing and discovery of content, with a learning curve that was quickly surmounted
without guidance, irrespective of either prior exposure to voice technologies or first
language. These quantitative patterns were also supported by participants’ self-
reported qualitative ratings on usability. It was notable, however, that frequent smart
assistant users were significantly more favourable about the system’s capacity to
support content discovery than infrequent users (despite these two groups showing
no notable differences in their patterns of interaction or activity). This suggests that the
merits of the prototype were more evident to those familiar with the limitations of
current voice interaction design.
• No significant differences were found in the pattern of interactions, activity, or recall
of core system features between users of the two prototype versions, indicating that
the binaural spatial design did not present any practical benefits in this instance.
Two potential limiting factors in the study design could, however, have prevented
differences from emerging clearly: the interaction delay inherent in voice command
input, and the short duration of the task. Nonetheless, analysis of mental model
representations suggests a markedly different character in headphone users’ interaction
with, and connection to, the content.
Acknowledgements
This research was conducted within the Internet Research and Future Services section of
BBC R&D. Many minds contributed towards its development, but the project benefitted
from the specific contribution of the following people: Nicky Birch, for conceiving and
commissioning the research and steering its conceptualisation and focus; Alex Robertson
for guidance on decisions around content and its presentation, including production and
editing of all music and speech featured in the prototype; Holly Challenger and Joanna
Rusznica for significant work on the design, recruitment, management and analysis of the
user experience research. BBC R&D also extends thanks and gratitude to CereProc for
kind permission to use its synthetic voice technology in the prototype for this research.
Introduction
This research project was motivated by recent trends in digital audio consumption
through smart technology. An overview of these factors is presented here to provide
context for the research.
Digital audio and smart speakers
Digital distribution has accounted for more than 50% of UK radio engagement since
2016 – either online or via digital audio broadcasting receivers [1]. In the same year, over
half of global recorded music revenue was generated by download or streaming services
for the first time. The current annual proportion of digital streaming and download
revenue stands at 59% [2], [3]. In parallel, smart speaker technology has quickly gained a
niche but significant footing in the audio technology market, with an estimated 20% of
UK households adopting devices between 2016 and 2019 [4]. In summary, digital delivery
is now firmly established as the dominant mode of audio distribution. Smart speakers are
a fresh and immediate gateway to any form of content within this realm, incorporating
convenient access to live radio, music streaming services and podcast or catch-up
programming.
Although they offer ready access to all forms of digital audio, current trends indicate that
the proportion of smart speaker usage directed towards on-demand content is
comparatively high, if examined against the overall distribution of listening activity. If all
audio devices and sources are considered collectively, 75% of consumption in 2018 was
dedicated to live radio. By comparison, smart speaker engagement attracts usage with a
relatively balanced combination of live audio broadcasting (54%) and request-led
listening (45%). [1]
Current limitations of voice-led audio interaction
Discovering unfamiliar content on audio-only devices is problematic because it relies
entirely on a user’s memory and current mood to motivate listening activity. There are no
external triggers to direct exploration. For on-demand content, in particular, voice-led
search technologies encourage either known-item requests (i.e. “play me this
track/album/artist”), or unscrutinised use of provider-generated collections (i.e. “play me
happy/newly released/1980s tracks”). Recent UK music industry analysis of adoption
noted concerns that smart speakers could encourage listeners towards less engaged
forms of interaction with audio. As acceptance of pre-curated or algorithmically
generated recommendations and playlists increases, so listeners might become
disconnected from individual works and artists that form these compilations. Two
answers to this possible shortcoming are offered. Firstly, it is supposed that use of
“branded” recommendations or playlists with smart speakers caters to types of casual
listeners who, in the past, typically used radio as background anyway. Secondly, it is
suggested that these devices are simply not designed for discovery and exploration,
which will continue through other media and that user data gained there can be used to
populate tailored recommendation lists more effectively. [5]
Future possibilities for voice-led audio interaction
Classing voice-led audio interaction as a reductive or secondary experience would ignore
three potential opportunities encompassed by this technology:
1. As the same industry report goes on to note, speech-delivered search invites the
possibility of more verbose and nuanced queries. These could be harnessed to
achieve better recommendations if semantic analysis algorithms and metadata
structures are suitably designed in future. [5]
2. Interaction design for voice-led technology is still in its infancy. More consideration
can be given to how pre-curated or auto-compiled content might be previewed and
navigated efficiently with voice-controlled, audio-only devices.
3. Voice-controlled devices are not limited to smart speakers. Smart headphones or
“hearables” are an emergent technology that (amongst other features) enable
voice interaction for eyes-free control of a connected smart device. Many
manufacturers now offer headphones or earbuds with inbuilt voice interaction
capability and, increasingly, some form of layered or augmented reality audio,
which blends transmitted audio with real-world sounds using adaptive ambient noise
cancellation.¹ Specifically, the Bose AR platform further supports motion detection in
enabled hardware, offering the potential to deliver responsive binaural audio – or
“surround sound” over headphones.²
Existing BBC R&D work related to these future possibilities
The first of the three opportunities identified above is a vast field of research, not simply
within BBC R&D, but for the media industry and academic research institutions
worldwide. The relationship between recommender systems and voice assistants was far
beyond the scope of investigation for this research project. The BBC has been working
prominently in the second area of potential, releasing its first of several services for smart
speakers in 2017³ and aiming to launch its own assistant in 2020⁴. These offers for
current audiences have been accompanied in parallel by BBC R&D’s creative investigation
¹ For recent overviews of some available smart headphones at the time of writing, see: https://www.wired.co.uk/article/best-wireless-earbuds; https://www.wired.co.uk/article/best-bluetooth-headphones; https://www.wareable.com/hearables/best-hearables-smart-wireless-earbuds
² https://www.bose.com/en_us/better_with_bose/augmented_reality.html
³ https://www.bbc.co.uk/mediacentre/latestnews/2017/smart-speakers
⁴ https://www.bbc.co.uk/news/technology-49481210
of interaction and user experience design for voice content, spanning multiple projects
over two years [6]–[8]. Finally, BBC R&D has also begun early experimentation with the
third point of innovation, using Bose AR to explore how audio augmented reality could
redefine programming and audience experiences.⁵
Creative approaches to voice experience design and the affordances of smart
headphones both informed the objectives and design process for this research project.
⁵ https://www.bbc.co.uk/rd/blog/2019-03-audio-augmented-reality-spatial-sound
Rationale and design decisions
The project sought to research novel interaction mechanisms for surfacing BBC content
on voice-controlled devices. This section outlines the specific objectives and design
principles that informed the process.
Aims and design scope
Two research aims were established:
1. Develop and evaluate an interactive prototype for navigation of audio content by
voice
2. Compare smart headphone and smart speaker engagement in terms of:
• exploration behaviour
• connectedness to content
Additionally, two key design principles determined the prototype development. Firstly, it
was to be populated with a defined set of existing BBC content arranged in segmented
form, rather than full length programming. Use of short form audio was necessary to
support the time-constrained, lab-based interactions necessary in planned user
experience research. (Creative applications of segmented content are also a significant
focus of BBC R&D’s current workstream on Object-Based Media⁶). Secondly, the
navigation design had to include seamless previewing of content to enable rapid catch-up
style exploration, with a significant degree of agency devolved to the user.
⁶ https://www.bbc.co.uk/rd/object-based-media
System architecture
The prototype that resulted was not conceived as a completist experience with tractable
pathways to each piece of featured content. To be effective, it had to avoid complex
layers of menus and navigation and instead encourage engrossing onward journeys
commonly associated with discovery platforms like YouTube. YouTube-style engagement
was a conscious conceptual influence, since it currently accounts for 47% of all on-
demand music streaming [3]. The resulting design was termed Auditory Archive Explorer
(AAE) and is represented visually in figure 1:
Figure 1: Auditory Archive Explorer overview
The prototype is populated with 50 highlights from the BBC’s coverage of the
Glastonbury 2019 festival. Each of the 50 segments is categorised into one of five moods:
‘happy’, ‘sad’, ‘energetic’, ‘mellow’ or ‘dark’. Users hear a 30-second introduction using
synthesised speech generated from one of CereProc’s commercially available voices⁷.
⁷ https://www.cereproc.com/en/node/1166
This opening summarises the navigation concept and presents the voice commands
“start”, “select”, “forward”, “back” and “change”. The latter four commands are used to
both navigate through menus and control playback of audio tracks once selected, as
shown in figure 1. The 50 audio pieces are accessed via two menu layers (to select a mood
and then a track). A range of notification sounds serve different purposes including:
feedback confirmation for voice command recognition; delineation of menu items;
signposting of navigation transitions. The same synthesised voice adds short announcements
for further contextual orientation. Menu options are represented by eight-second
auditory previews of content – either short montages with a voiceover (for the five mood
categories), or a representative excerpt from a given piece (in the track menu). Menus
only feature five items – which repeat on a loop – but the track menu features a further
“refresh” option to repopulate the content. When selected, the title and artist are
announced and segue over the start of the track at the beginning of playback.
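The two-layer navigation described above can be sketched as a simple state machine. This is an illustrative Python reconstruction only (the prototype itself was implemented in Max, not Python), and the class and method names are hypothetical:

```python
# Hypothetical sketch of the AAE two-layer menu logic under the four
# navigation commands. The real prototype was built in Max.

MOODS = ["happy", "sad", "energetic", "mellow", "dark"]

class Explorer:
    def __init__(self, tracks_by_mood):
        self.tracks_by_mood = tracks_by_mood   # mood -> list of track titles
        self.level = "mood"                    # "mood" | "track" | "playback"
        self.index = 0                         # highlighted item (loops)
        self.mood = None

    def _items(self):
        """Current menu: five moods, or the five tracks offered for a mood."""
        return MOODS if self.level == "mood" else self.tracks_by_mood[self.mood][:5]

    def command(self, word):
        if word == "forward":                  # next item (menus wrap on a loop)
            self.index = (self.index + 1) % len(self._items())
        elif word == "back":                   # previous item
            self.index = (self.index - 1) % len(self._items())
        elif word == "select":                 # descend: mood -> track -> playback
            if self.level == "mood":
                self.mood = MOODS[self.index]
                self.level, self.index = "track", 0
            elif self.level == "track":
                self.level = "playback"
        elif word == "change":                 # ascend one level
            if self.level == "playback":
                self.level = "track"
            elif self.level == "track":
                self.level, self.index = "mood", 0
        return self.level, self._items()[self.index]
```

In playback mode, "forward" and "back" double as track-skip controls, which is the dual-purpose semantics the command set was chosen to support.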
Voice interaction design
It should be noted at this stage that only limited consideration was given to the voice
interaction design. The focus of the research was to evaluate the audio-only mode of
displaying and navigating content – not the efficacy of the voice control mechanics per
se. Given this premise, the four navigation voice commands were defined, as far as
possible, to:
• be succinct and clear for the purpose of recognition;
• echo functions familiar to visual interface paradigms (e.g. “select”) and media
player controllers (e.g. “forward” and “back”);
• provide semantic coherence with their dual-purpose functions for navigating
menus and controlling audio playback.
The inclusion of the “change” command is evidently less intuitive, since it meets these
criteria less well and its function is too easily confused with “back” in the menu navigation
context. It would also have been preferable to limit the voice navigation commands to
just three, a common approach in media playback systems. In these cases, it
is quite typical for “back” to return to a previous context if triggered within the first three
seconds of playback. “Change” was included with these drawbacks and inconsistencies
fully in mind, but as a means of enabling the research construct and in the absence of an
immediate and more elegant solution.
Implementation and engineering
Existing technologies and hardware were used to make a fully-interactive prototype that
simulated both smart headphone and smart speaker engagements. This section details
some of the software and sound engineering decisions that created the experience.
Prototype realisation
Four pieces of software were combined to produce the AAE prototype. All system code
was implemented in the Max visual programming language, with audio content and
mixing handled in the Reaper digital audio workstation:
Figure 2: Auditory Archive Explorer software architecture⁸

• SpeakOSC – leverages the Mac OSX ‘Dictation’ function to convert recognised voice commands to OSC messages.
• Max – interprets and sends OSC (and head-tracking data) to control navigation (and 3D binaural scene rendering).
• Reaper – acts as a database for all audio files (music excerpts, speech, notifications) and as a playback engine.
• Head-tracking data (smart headphone implementation only) – 3DoF head rotation co-ordinates transmitted wirelessly.

⁸ https://github.com/dlublin/SpeakOSC; https://cycling74.com/; https://www.reaper.fm/; https://edtrackerpro.mybigcommerce.com/
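The SpeakOSC stage of this pipeline turns each recognised voice command into an OSC packet. As a rough illustration of what travels between the components, here is a minimal pure-Python encoder for OSC 1.0 string messages; the address `/voice/command` and port 7400 are invented for the example, not taken from the prototype:

```python
def _osc_str(s: str) -> bytes:
    """OSC strings are null-terminated and padded to a 4-byte boundary."""
    b = s.encode("ascii") + b"\x00"
    return b + b"\x00" * (-len(b) % 4)

def osc_message(address: str, *args: str) -> bytes:
    """Encode an OSC message with string arguments (subset of the OSC 1.0 spec):
    padded address, then a type-tag string (',' plus one 's' per argument),
    then each padded string argument."""
    typetags = "," + "s" * len(args)
    payload = _osc_str(address) + _osc_str(typetags)
    for a in args:
        payload += _osc_str(a)
    return payload

# A recognised command could then be fired at Max over UDP, e.g.:
#   import socket
#   socket.socket(socket.AF_INET, socket.SOCK_DGRAM).sendto(
#       osc_message("/voice/command", "forward"), ("127.0.0.1", 7400))
```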
Sound design
Spatial placement of sound sources in remote conferencing has been shown not only to
be a preferred format for presenting audio information, but also to benefit memory, focal
assurance (assignment of identity to location) and comprehension [9]. Binaural signal
processing enables 3D sound placement over headphones – an effect that can be
enhanced by incorporating head tracking to simulate static virtual scenes.⁹ In contrast,
smart speakers typically provide mono playback of content. On these devices, sound
sources usually have no spatial separation except through a combination of volume level
and applied reverberation, for limited impression of relative distance.
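That volume-plus-reverberation cue can be sketched numerically. The figures below are illustrative assumptions rather than the prototype's actual DSP: the dry (direct) level follows an inverse-distance law while the reverberant level stays roughly constant, so the wet/dry balance alone conveys relative distance:

```python
# Illustrative only: implying distance on a mono device via level and reverb.
# The 1/d law and the constant wet level of 0.2 are assumptions for the sketch.

def mono_distance_mix(distance_m: float, ref_m: float = 1.0):
    """Return (dry_gain, wet_gain) for a source at distance_m metres."""
    d = max(distance_m, ref_m)
    dry = ref_m / d            # inverse-distance law for the direct sound
    wet = 0.2                  # roughly constant reverberant energy
    return dry, wet

# Doubling the distance halves the direct level (about -6 dB) but leaves the
# reverb unchanged, so the source is heard as "further away".
```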
Linear arrangement of content for the headphone and speaker simulations of AAE was
identical, but the spatial sound production differed significantly. Figure 3 illustrates how
spatial sound positioning was used to segregate audio information streams in the smart
headphone version. In navigation mode (a), menu item previews were spatialised in front
of the listener using incremental positions from around nine o’clock to three o’clock
(-85°, -30°, 0°, 35° and 85° azimuth). Voice announcements and navigation transition
effects were placed in elevated positions, whilst speech recognition sounds used regular
stereophonic playback (i.e. were heard from “inside” the listener’s head). In track playback
mode (b), speaker positions were simulated binaurally over headphones, which created
the impression of “externalised” listening. By contrast, all audio streams in the smart
speaker version were co-located at the same point in space – the position of the smart
speaker itself.
⁹ There is considerable work available on how binaural perception and synthesis can be exploited in the design of audio information systems; [10] represents a major contribution.
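As a simplified stand-in for that spatial layout, the five menu azimuths can be mapped to constant-power stereo gains. This is only a sketch of the idea of segregating streams by azimuth; the prototype itself used head-tracked binaural rendering, which is considerably more involved:

```python
import math

# Constant-power stereo panning at the five AAE menu azimuths, as a
# simplified illustration of azimuth-based stream segregation.

MENU_AZIMUTHS = [-85, -30, 0, 35, 85]   # degrees; negative = listener's left

def pan_gains(azimuth_deg: float):
    """Map an azimuth in [-90, 90] to constant-power (left, right) gains."""
    theta = (azimuth_deg + 90) / 180 * (math.pi / 2)   # 0..pi/2 across the arc
    return math.cos(theta), math.sin(theta)

for az in MENU_AZIMUTHS:
    left, right = pan_gains(az)
    assert abs(left * left + right * right - 1.0) < 1e-9   # power preserved
```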
Figure 3: Sound design for navigation and audio playback
a) Navigation
b) Playback
Binaural spatial audio rendering
Effective binaural rendering is computationally expensive and contingent on a number of
considerations. Hardware performance and the complexity of the binaural rendering
implementation have an interdependent and fundamental bearing on fidelity and,
therefore, spatial realism [10]. Quality of experience is also dependent on the unique
anatomic and cognitive makeup of individuals, so tends to be highly subjective [11].
Bose AR is a prominent commercial development platform for authoring smart
headphone experiences. The technical constraints in its software and associated
hardware were used as a yardstick for optimising the prototype. By design, the AAE
binaural implementation therefore:
• runs virtual Third Order Ambisonics in Reaper using AmbiX plugins¹⁰ – equivalent
to the most complex 3D audio algorithm available to Bose AR;
• uses the Institute for Electronic Music and Acoustics 24 speaker “Cube” room
impulse response set (at 2048 sample length to include early room reflection data)
for real-time binaural rendering – avoiding application of additional reverb, which
would be computationally challenging to implement on a mobile platform;
• has 68 milliseconds (ms) of known system latency in DSP buffering, but a total
interval likely to be beyond 100 ms – probably below, but approaching, the known
round-trip responsiveness of 196-246 ms for Bose AR¹¹.
¹⁰ http://www.matthiaskronlachner.com/?p=2015
¹¹ https://developer.bose.com/guides/bose-ar/end-end-system-latency (requires developer login).
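The principle behind the Ambisonics-plus-head-tracking approach can be illustrated at first order (the prototype used third order via the AmbiX plug-ins, which follows the same logic with more channels). A mono sample is encoded into a spatial sound field, and head yaw is subtracted so sources stay fixed in the virtual room; the function names here are hypothetical:

```python
import math

# First-order Ambisonic encoding (ACN channel order W, Y, Z, X; SN3D-style
# unity W) as an illustration of the higher-order pipeline used in the
# prototype. Not the prototype's actual code.

def encode_foa(sample, azimuth_deg, elevation_deg=0.0):
    """Encode a mono sample into first-order ambisonic components."""
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    w = sample                                # omnidirectional component
    y = sample * math.sin(az) * math.cos(el)  # left-right
    z = sample * math.sin(el)                 # up-down
    x = sample * math.cos(az) * math.cos(el)  # front-back
    return [w, y, z, x]

def apply_head_yaw(sample, source_az_deg, head_yaw_deg):
    """Head tracking: re-encode at the azimuth relative to the head, so the
    source stays static in world space as the listener turns."""
    return encode_foa(sample, source_az_deg - head_yaw_deg)
```

Turning the head 30° towards a source at 30° azimuth leaves that source dead ahead in the rotated field, which is what creates the impression of a stable external scene.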
In short, the AAE prototype was purposefully designed with a spatial audio resolution and
head-movement response time that approximated current or near-future high-street
technology.
Smart technology simulation
Likewise, the smart headphone and speaker experiences were simulated with hardware
that approximated the capabilities of consumer devices, illustrated in figure 4. The
onboard microphone of a 13-inch 2017 model MacBook Air provided input for voice
capture and all software described ran on the same device. For the headphone version,
open-back wired Sennheiser HD650 headphones were paired with a wireless EdTracker Pro for motion
detection. The speaker version was delivered through a Zamakol ZK606 with wired
connection.
Figure 4: Hardware configuration for smart headphone/speaker experiences
User research design
Both versions of the prototype were evaluated in a user experience testing lab at the BBC
Broadcast Centre, London, using the processes described in this section.
Participant recruitment
Twenty-two participants were recruited from an open call and selected to achieve desired
balance across the criteria in table 1.
Table 1: Participant profiles recruited for user evaluation

                          Headphones (10)            Speaker (12)
Age                       23-38                      23-34
Gender                    6 female, 4 male           5 female, 7 male
First language            5 English, 5 another       6 English, 6 another
Voice assistant usage     6 infrequent, 4 regular    5 infrequent, 7 regular
Headphone participants either had no previous exposure to binaural audio listening
(eight people) or had only experienced the technology on a few occasions (two people).
Study format
Participants undertook an individual one-hour study session, comprising:
1. Pre-task interview (approx. 10 minutes)
A short introductory discussion investigated participants’ existing behaviour in music
exploration and discovery, before they were given a high-level summary of the
prototype concept.
2. Onboarding (approx. 10 minutes)
Some training and supervised use of the system was required before participants
completed the evaluation task. Those undertaking the headphone experience were
exposed to a one-minute audio demonstration. This presented a direct comparison
between spatial positioning in standard stereo playback and sound placement using
head-tracked binaural audio. Those undertaking the speaker experience received a
short tutorial on projecting their voice with sufficient loudness and clarity to be
recognised over system playback, using the keyword “testing”. In either mode,
participants were then given a short time to navigate the system freely. They were
provided with a written prompt reiterating the four navigation commands and
informed only that these enabled interaction with the system menus and controlled
track playback. They were given a minimum of 1m30s (but never more than 2m00s) to
ensure they successfully completed two or more voice commands. This pre-exposure
phase also included the 30 second narrated system introduction:
Welcome to the Glastonbury 2019 audio browser. Browse the performance highlights
to suit your mood, using just your voice. You can use four voice commands to navigate
the browser: “select”, “forward”, “back”, “change”. Say “start” to begin browsing or
wait to hear this information again.
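The four-command navigation scheme can be sketched as a small state machine. The menu behaviour below is an illustrative assumption for a two-level menu (moods, then tracks), not the prototype's actual implementation.

```python
class AudioBrowser:
    """Toy model of the four voice commands ("select", "forward", "back",
    "change"). Menu behaviour is an assumption for illustration only."""

    def __init__(self, moods, tracks_by_mood):
        self.moods = moods                  # e.g. five mood categories
        self.tracks_by_mood = tracks_by_mood
        self.level = "mood"                 # current menu level
        self.mood_idx = 0
        self.track_idx = 0

    def _menu(self):
        if self.level == "mood":
            return self.moods
        return self.tracks_by_mood[self.moods[self.mood_idx]]

    def current(self):
        idx = self.mood_idx if self.level == "mood" else self.track_idx
        return self._menu()[idx]

    def command(self, word):
        menu = self._menu()
        if word == "forward":               # next item, wrapping around
            if self.level == "mood":
                self.mood_idx = (self.mood_idx + 1) % len(menu)
            else:
                self.track_idx = (self.track_idx + 1) % len(menu)
        elif word == "back":                # previous item, wrapping around
            if self.level == "mood":
                self.mood_idx = (self.mood_idx - 1) % len(menu)
            else:
                self.track_idx = (self.track_idx - 1) % len(menu)
        elif word == "select":              # descend into the track menu
            if self.level == "mood":
                self.level = "track"
                self.track_idx = 0
        elif word == "change":              # jump back up to the mood menu
            self.level = "mood"
        return self.current()
```

A session then reduces to a stream of recognised keywords driving `command`, with the returned menu item determining what audio is spoken or played next.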
Participants were invited to ask questions after the pre-exposure phase, but no further
instruction or indication was given on how or when commands were to be used, or the
actions they effected.
3. Evaluation task (15 minutes)
Participants were given a copy of the following task, which they also retained for
reference:
You have 15 minutes to explore the archive as far as you can and find six new
tracks that you like. Use the pen and paper provided to make your list as you go.
Make a note of the track names and artists that you choose and anything
particular you liked about each one.
A final opportunity to ask questions was provided before starting. Again, no
guidance was given about how to operate the system or access track and artist
names. When ready, participants were left alone to complete the task with the
prototype, which stopped responding automatically after exactly 15 minutes.
4. Post-task questionnaire (approx. 10 minutes)
Participants were given a short questionnaire to assess the prototype’s usability and
evaluate user experience (discussed in detail in the following section).
5. Post-task interview (approx. 10 minutes)
A short closing discussion explored participants’ responses to interacting with the
prototype in greater depth.
Data collection
Data was collected from participants using four sources:
• real-time logs of their interactions with AAE during the task
• qualitative ratings from the post-task survey evaluation
• a hand-drawn mental model of the system they interacted with
• video recordings of the pre- and post-task interviews
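Real-time interaction logs of this kind might, for example, be captured as one timestamped JSON object per event. The field names here are illustrative only; the study's actual log schema is not documented in this report.

```python
import json
import time

def log_event(stream, participant_id, command, location):
    """Append one interaction event as a JSON object on its own line.
    Field names are hypothetical, for illustration."""
    event = {
        "t": time.time(),             # wall-clock timestamp
        "participant": participant_id,
        "command": command,           # e.g. "select", "forward"
        "location": location,         # e.g. current menu position
    }
    stream.write(json.dumps(event) + "\n")
```

One line per event keeps the log append-only during the task and trivially parseable afterwards for the analyses that follow.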
Findings
Data analysis was focussed on the original project aims: evaluating AAE’s voice-led
interactive navigation mechanism and comparing the two smart technology
implementations.
Patterns of use for voice-led content discovery
Figure 5 illustrates the average activity of all 22 participants viewed collectively.
Approximately two thirds of all time was used to browse content and a third was
dedicated to listening. This division seems an appropriate balance given the task – i.e.
that the majority of time was spent previewing options, but a significant minority
checking full content of tracks, to affirm choices and discover artist/title information.
In their 15 minutes, participants on average previewed half of the available content and
selected six tracks for further listening, but tended to fall short of their assigned target to
note six new tracks they liked. All participants accessed more than one mood category.
All but one participant listened to multiple tracks and 19 listened to five or more. These
patterns of interaction suggest that, as a cohort, users had no real difficulty in
navigating the system to discover new content and in striving for the specified response,
even if they were unable to fulfil this in the allotted time.
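The mean and median figures reported in figure 5 are straightforward per-participant summaries; a minimal sketch:

```python
from statistics import mean, median

def summarise(counts):
    """Mean (to 2 d.p.) and median of a per-participant count,
    e.g. previews heard, track listens or responses given."""
    return round(mean(counts), 2), median(counts)
```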
Figure 5: Average activity for all participants

Listening time: 36%; searching time: 64%

                                     Mean     Median
Previews heard (50 tracks)           24.05    25
Track listens (50 tracks)            5.64     6
Responses given (6 track target)     4.27     5

Figure 6: All visits by content area
[Node diagram; shading legend runs from 'no participants' to 'all participants'.]
The range and concentration of exploration presents a similarly positive outlook. Figure 6
shows the combined reach of participant activity during the 15-minute task. Each circular
node represents a location in AAE, spanning from the primary mood menu (1 x large),
through track menus (10 x medium) and individual tracks (50 x small). The value within
circles specifies how many participants visited that point in the system. As expected, visit
counts reduce at deeper locations (outer points in the diagram), since these destinations
are more removed from the starting point of the user. However, more notable is that
activity is fairly well balanced in terms of menu item precedence – that visits are quite
evenly distributed left-to-right, at all levels in the diagram. (Though there was slightly
greater traffic through the ‘happy’ category, this could be ascribed partly to initial trial-
and-error experimentation with voice commands and orientation. It could also be
reasonably supposed that the slightly lower level of traffic through the ‘dark’ route might
be due to the more specialist appeal of this category.) This demonstrates that the system
design supported exploration across all sections of the content, seemingly without any
undue effect from the order in which categories or tracks were presented.
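The left-to-right balance described above could be quantified simply, for instance as the largest deviation of any sibling menu item's share of visits from a perfectly even split. Both the measure and any threshold applied to it are illustrative assumptions, not the analysis actually performed.

```python
def precedence_bias(visit_counts):
    """Crude menu-order bias measure for one set of sibling items:
    the largest deviation of any item's visit share from an even split.
    0.0 means perfectly even; this measure is an illustrative assumption."""
    total = sum(visit_counts)
    if total == 0:
        return 0.0
    expected = 1.0 / len(visit_counts)
    return max(abs(c / total - expected) for c in visit_counts)
```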
Specific profile differences for voice-led content discovery
Prior experience with voice assistant technology appeared to influence how favourably
users viewed AAE in its capacity for content discovery. Frequent users included all those
who self-reported daily or weekly voice assistant interaction, infrequent were those who
declared their usage to be monthly or never. Regular users were significantly more likely
to provide favourable responses to the statement in figure 7 (Mann-Whitney U-test
p=0.045). However, equivalent disparities were not found between these two groups in
any of the other four qualitative usability ratings (shown separately in figure 10).
Additionally, there was no notable difference in the extent of task completion (i.e.
number of written responses given) between these groups. These patterns suggest,
therefore, that the prototype’s potential to aid music discovery was more evident to
those familiar with (current limitations of) accessing audio content through voice
assistants.
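The group comparison above used a Mann-Whitney U-test. As a sketch of the underlying statistic only (the conversion to a p-value, e.g. scipy.stats.mannwhitneyu's normal approximation, is omitted here):

```python
from itertools import product

def mann_whitney_u(xs, ys):
    """U statistic for sample xs against sample ys: the number of (x, y)
    pairs with x > y, counting ties as a half. A full test converts this
    to a p-value, as scipy.stats.mannwhitneyu does."""
    return sum((x > y) + 0.5 * (x == y) for x, y in product(xs, ys))
```

A U near the extremes (0 or len(xs) * len(ys)) indicates that one group's ratings systematically dominate the other's.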
Figure 7: ‘I was able to discover something new’ ratings by voice assistant
usage
Participants with English as their first language were significantly more likely to progress
further with completing the task (two-way ANOVA p=0.038)12. However, there was no
notable discrepancy found between these groups in their actual interactions with the
prototype. The group without English as a first language tended to register a comparable
count of voice commands and encountered similar proportions of preview content and
full track playback. This points towards the likelihood that language fluency did not
12 A two-way ANOVA compared the effect of gender and first language and these were found to be independent. Female participants were more likely to progress with the task to a similarly significant extent.
[Stacked bar chart: counts of 'Somewhat disagree', 'Somewhat agree' and 'Totally agree' responses, plotted for infrequent vs regular voice assistant users.]
present a barrier to engaging with the system itself, but that deciphering the spoken
artist names and track titles was more challenging. Post-task interviews revealed a
number of comments that the narrator's pace, volume and accent (a Scottish male)
made these names difficult to discern in some cases. Added to this is the fact that transcribing artist
and track names is a relatively artificial construct included for the purposes of the
research study, which would not typically be pursued in a real-world content discovery
journey.
Figure 8: Task completion rate (number of choices provided) by first
language
Smart headphones and smart speaker – behaviour and usability ratings
No significant differences were found in any of the interactions (t-test), the task
completion rate (Mann-Whitney U-test) or self-reported usability ratings (ANOVA)
between users of either implementation. Figure 9 shows how similar the two groups were
[Chart: responses given (out of 6), grouped by first language (English vs another).]
in their use of the respective versions of AAE over 15 minutes. Speaker users appear, on
aggregate, to have been marginally more proactive in their behaviour, but in no instance
was this to a statistically significant degree. Likewise, figure 10 illustrates that opinion
expressed through usability ratings coalesced very comparably between users of either
version. Both the absolute and relative values between rating statements are mirrored
for the headphone and speaker responses. Although the headphone ratings are
consistently more favourable in this instance, again this is never to any statistically
notable extent.
Figure 9: Mean command use and activity counts by platform

              Headphones    Speaker
“Select”      15.5          17
“Forward”     24.3          29
“Back”        6.2           8
“Change”      11.6          13
Listens       5.1           6
Previews      24.8          24
Responses     4.4           4.5
Figure 10: Self-reported usability ratings by platform
Though no differences in behaviour or usability between the experiences were
revealed, figures 9 and 10 both present promising data regarding the effectiveness of the
prototype design. Participants were exposed to AAE for a very limited time and without
any operating instructions. Both groups of users nevertheless felt confident skipping
through menus to identify content of potential interest (“forward” was by far the most used
command) and rarely needed to revisit an option (“back” was used the least). The fact
that “select” was used noticeably more frequently than “change” emphasises that at least
a proportion of users discovered the extended uses of the former command – i.e. to
refresh a track menu list and/or to restart playback from any position during a piece. This
quantitative data is corroborated by the self-reported usability ratings, with all responses
averaging within the affirmative (where the statement is favourable) or negative (where
the statement is pejorative) thirds. In summary, both versions of AAE seemed to enable
users to onboard themselves and pursue a specific time-bound task straightforwardly and
with a good degree of success.
[Chart: mean ratings on a four-point scale (1 = don't agree at all, 2 = somewhat disagree, 3 = somewhat agree, 4 = totally agree), plotted separately for headphones and speaker, for the statements: 'I was able to successfully complete the task I was given'; 'I found the system difficult to use'; 'The system was easy to navigate'; 'I was able to discover something new'; 'I needed to learn a lot of things before I was able to get going with this system'.]
Smart headphones and smart speaker – connectedness
Following the evaluation task, participants were asked to recall details of the main mood
menu that they encountered. A scoring system was applied to assess how well users
remembered the number, name and sequence of the mood categories (‘happy’, ‘sad’,
‘energetic’, ‘mellow’, ‘dark’) in the primary menu of the prototype. The maximum
available score was 12. Of the twenty-two participants, twelve scored full marks and only
five scored less than 9, with 5 being the lowest score registered. Importantly, there was
no significant difference found (t-test) in users’ ability to recall the makeup of the mood
menu between either version of AAE. In this instance, therefore, there is no evidence to
suggest that spatial presentation of menu options benefitted storage and recall of that
information from users’ short-term memory.
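The report does not specify how the 12 available marks were divided between number, name and sequence of categories. One hypothetical split (2 points for recalling the correct count, 1 per correct name, 1 per name recalled in its correct position) could be scored as follows; the weighting is an illustrative assumption only.

```python
def recall_score(recalled, actual=("happy", "sad", "energetic", "mellow", "dark")):
    """Hypothetical scoring of mood-menu recall out of 12.
    Split (2 for count, 5 for names, 5 for sequence) is an assumption."""
    score = 2 if len(recalled) == len(actual) else 0       # correct count
    score += sum(1 for name in recalled if name in actual) # correct names
    score += sum(1 for i, name in enumerate(recalled)      # correct positions
                 if i < len(actual) and name == actual[i])
    return min(score, 12)
```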
Participants were also asked to provide a mental model illustration of the system they
had experienced (“draw a visual representation of the system you interacted with”). Full
analysis of all of these diagrams is beyond the scope of this report. However, submissions
were subsequently discussed collectively by three BBC researchers, who together
summarised characteristic differences in the representations.
Table 2: Characterisation of mental model illustrations, by prototype version

Headphones                        Speaker
• physical space and entities     • hierarchical structures
• portrayals of experience        • flowcharts / decision trees
• narrative explanations          • process definitions
• curves and circles              • hard angles
The descriptions in table 2 were not all evident universally, but they represent a
consensus overview on the general character of the two collections, which was quite
clearly distinct. One of each illustration is appended to this report to show instances
where these traits were exemplified perhaps most strongly. Figure 11 shows a depiction
that superimposed a physical journey onto the headphone experience; figure 12 shows a
logic diagram interpretation of the speaker experience. The qualitative summary in table
2 suggests that, overall, the spatialised auditory environment had a markedly different
effect for headphone users in the nature or character of their interaction and connection
with the content.
Conclusions and recommendations
The original project aims are revisited to summarise research outcomes and areas of
future work to be considered.
Review of research aims
1. Develop and evaluate an interactive prototype for navigation of audio content by
voice
The data gathered in this user study indicates that the AAE design presents a good
potential basis for voice-led interactive audio exploration. Users of the prototype
were shown to navigate through a wide range of content, without any prominent
precedence bias and with a balance of activity in line with expectations, given the
task they were set. Despite only 16.5 to 17 minutes’ exposure and relying on self-
orientation, aggregate interaction patterns also suggest that users were able to
navigate confidently, fluently and accurately, with some use of more advanced
features. These statistical findings about the user experience are supported by
qualitative, self-declared usability ratings.
2. Compare a smart headphone vs. smart speaker engagement in terms of
• exploration behaviour
There was no evidence that the spatial arrangement of content and the system
notifications used in this study had any effect on patterns of interaction, activity
or task success. It should be noted that aspects of the study design could have
presented potential limiting factors in this respect (see Future Work section
below).
• connectedness to content
Spatial presentation of menu options did not significantly aid retention or recall
of category information, compared to monaural presentation. However, data
gathered from users’ mental model system representations strongly suggest
that the smart headphone version did create an evidently different type of
interactive experience.
Future work
There are three considerations worth noting that could be addressed if subsequent
iteration occurs in this area of investigation.
• Allow more time (at least 20 minutes) for user interaction
It is possible that the 15-minute engagement (plus 1m30s–2m pre-exposure)
was too short to establish full fluency with the system. Users will have spent a
good proportion of their allotted task time continuing to orient themselves to the
voice commands and system structure. If differences in exploration behaviour
were to emerge between the two prototype versions, it’s possible they might
only present when a base degree of interaction proficiency is established by
users. There is a possibility that the relatively short duration used by this study
did not allow that threshold to be surpassed for a sufficient amount of time.
• Test gesture instead of voice interaction with headphones
Likewise, it is possible that the inherent latency of voice interaction itself could
limit users’ ability to take advantage of any added perceptual orientation
afforded by the headphone version. In the prototype, spatial cues communicate
where, in a menu list of five, the user is currently located. If the interaction
method was instantaneously responsive (as in most media playing technology),
this would potentially allow skipping forward or back to previously heard menu
options by relying on (memory of) its virtual position as an anchor. Delayed
responsiveness in voice interaction could have been a further limiting factor on
users’ ability to exploit the navigational benefits in the headphone versions.
• Use other content and browsing contexts
Although the prototype used Glastonbury 2019 content for the purposes of
this study, it could have been populated with any segmented content, or even
full programme content (though the latter would have been more challenging
to evaluate in a controlled study). Rather than using pre-determined
categorisation, content could also be presented using dynamic categories to
introduce a recommender system mechanic to the experience. These
adaptations would provide more insight on potential avenues for exploring how
BBC content can be surfaced more effectively on audio-only devices.
Appendix
Mental model drawings
Figure 11: Example mental model drawing of smart headphone experience
Figure 12: Example mental model drawing of smart speaker experience