Exploring tools for expressive voice
![Page 1: Exploring tools for expressive voice](https://reader033.fdocuments.in/reader033/viewer/2022052410/554c485bb4c90570648b5463/html5/thumbnails/1.jpg)
Exploring Tools for Expressive Voice
Affective Computing, Fall 2011
Natalie Freed
![Page 2: Exploring tools for expressive voice](https://reader033.fdocuments.in/reader033/viewer/2022052410/554c485bb4c90570648b5463/html5/thumbnails/2.jpg)
Project Goal
• Build and evaluate tools to support people modulating their voice (speed, loudness, pitch) in a performative context
• Approach: real-time feedback, playfulness
• Technology: speech analysis, audio manipulation
![Page 3: Exploring tools for expressive voice](https://reader033.fdocuments.in/reader033/viewer/2022052410/554c485bb4c90570648b5463/html5/thumbnails/3.jpg)
Applications
1. “Stretching Your Range” (loudness and speech rate)
2. “Playing with Voices” (pitch and intonation)
3. Pilot study: expert vs. computer analysis of voice modulation
4. Study design to determine effect of interventions 1 and 2
![Page 4: Exploring tools for expressive voice](https://reader033.fdocuments.in/reader033/viewer/2022052410/554c485bb4c90570648b5463/html5/thumbnails/4.jpg)
System Architecture (both)
Client: Flash + ActionScript + Adobe AIR
Server: Praat running on web server
Flow: client sends audio file (CGI call) → server runs analysis → client shows visualization
Goal: flexible and cross-platform; can run on portable devices; Praat audio analysis is made into a web service that can be used for future applications.
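A minimal sketch of the server side of this round trip, assuming a recent Praat build that accepts the `--run` command-line switch. The script name `analyze.praat`, the function names, and the `name: value` output format are hypothetical and are not taken from the project.

```python
import subprocess


def build_praat_command(script_path, audio_path):
    """Build the command line that runs a Praat script on one audio file."""
    # Assumption: Praat 6 or later, where `praat --run script args...` runs without the GUI.
    return ["praat", "--run", script_path, audio_path]


def parse_analysis(output_text):
    """Parse hypothetical 'name: value' lines printed by the script into floats."""
    results = {}
    for line in output_text.splitlines():
        name, sep, value = line.partition(":")
        if sep:  # skip lines that contain no colon
            results[name.strip()] = float(value)
    return results


def analyze(script_path, audio_path):
    """Run Praat on an uploaded file and return the parsed measurements."""
    completed = subprocess.run(
        build_praat_command(script_path, audio_path),
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_analysis(completed.stdout)
```

A CGI handler would save the uploaded audio to disk, call `analyze("analyze.praat", saved_path)`, and return the resulting values to the client for plotting.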
![Page 5: Exploring tools for expressive voice](https://reader033.fdocuments.in/reader033/viewer/2022052410/554c485bb4c90570648b5463/html5/thumbnails/5.jpg)
Application 1: Stretching Your Range (Loudness and Speech Rate)
Navigate through exercises
Target area
feedback is plotted
![Page 6: Exploring tools for expressive voice](https://reader033.fdocuments.in/reader033/viewer/2022052410/554c485bb4c90570648b5463/html5/thumbnails/6.jpg)
Evaluation of “Stretching Your Range”
1. Read book
2. Speech modulation exercises with feedback
3. Read book with feedback
n = 4
Not a controlled study: participants were not compared to a group that read the book twice without using the tool, or with the prompts alone.
Goal: qualitative analysis of the interface, and learning how to measure effectiveness and degree of voice modulation from different audio recordings, in preparation for a controlled study.
![Page 7: Exploring tools for expressive voice](https://reader033.fdocuments.in/reader033/viewer/2022052410/554c485bb4c90570648b5463/html5/thumbnails/7.jpg)
Human Analysis
Four 20-second audio samples from each participant: two from each recording (pre and post)
• 30 seconds after start
• 1 minute before end
Order randomized
Expert evaluator: public speaking instructor Bill Hoogterp
Ratings on a 1-7 scale
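The sampling rule can be sketched as follows. This is only an illustration: it assumes each 20-second clip starts at the stated offset (30 seconds after the start, and 1 minute before the end), which the slide does not spell out, and the function name is hypothetical.

```python
def sample_windows(duration_s, clip_s=20.0, start_offset_s=30.0, end_offset_s=60.0):
    """Return (start, end) times in seconds for the two clips cut from one recording."""
    first_start = start_offset_s
    # Assumption: the second clip begins one minute before the end of the recording.
    second_start = duration_s - end_offset_s
    if second_start < first_start + clip_s:
        raise ValueError("recording too short for two non-overlapping clips")
    return [
        (first_start, first_start + clip_s),
        (second_start, second_start + clip_s),
    ]


windows = sample_windows(240.0)
```

For a 240-second recording this yields the windows 30-50 s and 180-200 s.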
![Page 8: Exploring tools for expressive voice](https://reader033.fdocuments.in/reader033/viewer/2022052410/554c485bb4c90570648b5463/html5/thumbnails/8.jpg)
Human Analysis
Questions rated by the expert:
Q1. How effectively does this speaker keep the listener's attention?
Q2. How much is this speaker modulating the speed of his or her voice?
Q3. How much is this speaker modulating the loudness of his or her voice?
Q4. How much is this speaker modulating other aspects of his or her voice, such as pitch, rhythm, or intonation?

| Audio (mean of 2 samples per participant) | Q1 | Q2 | Q3 | Q4 |
|---|---|---|---|---|
| A PRE | 2 | 3 | 2 | 2 |
| A POST | 2.5 | 2 | 2.5 | 2 |
| B PRE | 4.5 | 4 | 4 | 3.5 |
| B POST | 4.5 | 4 | 4 | 3.5 |
| C PRE | 3.5 | 3.5 | 3.5 | 3.5 |
| C POST | 4 | 4 | 4 | 3.5 |
| D PRE | 2.5 | 2.5 | 2 | 2 |
| D POST | 4 | 3.5 | 4 | 2.5 |
![Page 9: Exploring tools for expressive voice](https://reader033.fdocuments.in/reader033/viewer/2022052410/554c485bb4c90570648b5463/html5/thumbnails/9.jpg)
Human Analysis
One-tailed t-test for correlated samples (within-groups), alpha = 0.05
Not statistically significant, but would be if there were one more participant who upheld the trend. => Need a larger n!
[Bar chart: pre (mean) vs. post (mean) expert ratings for Keep attention, Modulate loudness, Modulate speed, Modulate other; vertical axis 0-4]
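As a worked illustration of the test named on this slide, the sketch below computes the paired t statistic for the "keep the listener's attention" ratings, using the pre and post values from the Human Analysis table. It is not the author's analysis script; the critical value 2.353 is the standard one-tailed Student's t threshold for 3 degrees of freedom at alpha = 0.05.

```python
import math
from statistics import mean, stdev

# Expert "keeps the listener's attention" ratings for participants A-D,
# copied from the Human Analysis table.
pre = [2.0, 4.5, 3.5, 2.5]
post = [2.5, 4.5, 4.0, 4.0]


def paired_t(before, after):
    """t statistic for correlated samples: mean difference over its standard error."""
    diffs = [a - b for a, b in zip(after, before)]
    standard_error = stdev(diffs) / math.sqrt(len(diffs))
    return mean(diffs) / standard_error


T_CRITICAL = 2.353  # one-tailed, df = n - 1 = 3, alpha = 0.05

t_stat = paired_t(pre, post)
significant = t_stat > T_CRITICAL
print(f"t = {t_stat:.2f}, significant: {significant}")
```

With these four participants t is about 1.99, which falls short of 2.353, consistent with the slide's conclusion that a larger n is needed.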
![Page 10: Exploring tools for expressive voice](https://reader033.fdocuments.in/reader033/viewer/2022052410/554c485bb4c90570648b5463/html5/thumbnails/10.jpg)
Software Analysis (Praat)
| ID | duration | speaking rate | articulation rate | loudness range | pitch range | pitch standard deviation | intensity range | intensity standard deviation | mode intensity |
|---|---|---|---|---|---|---|---|---|---|
| A PRE | 240.83 | 2.69 | 4.40 | 58.34 | 448.83 | 48.65 | 35.68 | 9.88 | 57.84 |
| A POST | 253.42 | 2.56 | 4.31 | 56.99 | 360.89 | 39.77 | 35.09 | 9.39 | 56.36 |
| B PRE | 119.26 | 3.79 | 4.87 | 58.12 | 440.20 | 59.61 | 39.86 | 8.39 | 62.95 |
| B POST | 120.59 | 3.46 | 4.82 | 58.59 | 434.35 | 67.06 | 38.68 | 9.30 | 60.70 |
| C PRE | 179.42 | 3.41 | 4.63 | 61.72 | 446.26 | 69.52 | 39.70 | 10.08 | 64.52 |
| C POST | 212.30 | 2.96 | 4.51 | 60.49 | 447.73 | 76.11 | 43.66 | 10.77 | 62.26 |
| D PRE | 232.86 | 2.31 | 2.90 | 52.73 | 454.57 | 68.46 | 33.95 | 7.94 | 58.25 |
| D POST | 206.29 | 2.61 | 3.74 | 60.44 | 441.37 | 88.80 | 47.96 | 9.65 | 61.99 |
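One way to read the two rate columns together is sketched below. It rests on an assumption the slides do not confirm: that speaking rate is syllables divided by total duration and articulation rate is syllables divided by time spent actually speaking. Under those common definitions, their ratio is the fraction of time spent speaking, and one minus that ratio is the fraction spent pausing. The function name is hypothetical.

```python
# (speaking rate, articulation rate) pairs copied from the Praat table above.
rates = {
    "A PRE": (2.69, 4.40), "A POST": (2.56, 4.31),
    "B PRE": (3.79, 4.87), "B POST": (3.46, 4.82),
    "C PRE": (3.41, 4.63), "C POST": (2.96, 4.51),
    "D PRE": (2.31, 2.90), "D POST": (2.61, 3.74),
}


def pause_fraction(speaking_rate, articulation_rate):
    """Estimated share of the recording spent pausing.

    Assumes speaking rate = syllables / total time and
    articulation rate = syllables / speaking time.
    """
    return 1.0 - speaking_rate / articulation_rate


pause_by_recording = {
    name: pause_fraction(speaking, articulation)
    for name, (speaking, articulation) in rates.items()
}
```

Under this reading, every participant's estimated pause share is higher in the post recording than in the pre recording (for example, roughly 0.20 to 0.30 for participant D).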
![Page 11: Exploring tools for expressive voice](https://reader033.fdocuments.in/reader033/viewer/2022052410/554c485bb4c90570648b5463/html5/thumbnails/11.jpg)
Pre and post recordings (same book)
[Figure: one panel per recording, A pre / A post through D pre / D post]
![Page 12: Exploring tools for expressive voice](https://reader033.fdocuments.in/reader033/viewer/2022052410/554c485bb4c90570648b5463/html5/thumbnails/12.jpg)
Pre and post recordings (time-stretched)
[Figure: one panel per recording, A pre / A post through D pre / D post]
![Page 13: Exploring tools for expressive voice](https://reader033.fdocuments.in/reader033/viewer/2022052410/554c485bb4c90570648b5463/html5/thumbnails/13.jpg)
“Effectiveness” (self report and evaluated)
[Bar chart: expert evaluation of effectiveness at keeping listener’s attention of audio samples (1-7), participants A-D, pre vs. post]
[Bar chart: self-report of own “public speaking effectiveness” (Likert, 1-7), participants A-D]
Again, there is not enough data for meaningful results (and the two questions are not comparable here), but this raises an interesting question for future work: how accurately do people estimate their own effectiveness?
![Page 14: Exploring tools for expressive voice](https://reader033.fdocuments.in/reader033/viewer/2022052410/554c485bb4c90570648b5463/html5/thumbnails/14.jpg)
Human/Software?
Software analysis:
[Bar charts, participants A-D, pre vs. post: loudness range (max – min); pitch range (max – min); speech rate (pause ratio)]
Human analysis:
[Bar charts, participants A-D, pre vs. post: speed modulation; other/pitch modulation; loudness modulation]
How to most accurately evaluate and compare overall success at modulation? Exploring which metrics might map to expert human evaluation, or how they can be used together.
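One simple way to explore such a mapping is a correlation between a software metric and the expert rating. The sketch below pairs the Praat intensity range with the expert's loudness-modulation rating for the eight recordings (values copied from the two tables earlier in the deck). Choosing intensity range as the candidate metric is an assumption made for illustration, and the eight points are not independent, since each participant contributes two.

```python
import math

# Order: A pre, A post, B pre, B post, C pre, C post, D pre, D post.
intensity_range = [35.68, 35.09, 39.86, 38.68, 39.70, 43.66, 33.95, 47.96]
expert_loudness = [2.0, 2.5, 4.0, 4.0, 3.5, 4.0, 2.0, 4.0]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)


r = pearson_r(intensity_range, expert_loudness)
print(f"r = {r:.2f}")
```

The same function can be applied to any other column pair, which is one way to shortlist metrics worth testing in a larger study.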
![Page 15: Exploring tools for expressive voice](https://reader033.fdocuments.in/reader033/viewer/2022052410/554c485bb4c90570648b5463/html5/thumbnails/15.jpg)
Redesign of “Stretching Your Range”
The prompts themselves are helpful, so it is important to identify the value of the feedback component:
• “You can go louder!” Personalized and credible encouragement.
• Identifying what you can’t hear about your own voice.
Can we avoid the calibration/microphone/perceptual loudness issue for ease of deployment? Calibration is possible, but it may not be necessary to have an absolute sense of loudness to get people to stretch their range.
New approach: continuous visualization. This eliminates the problem of when feedback arrives and allows people to speak without unnatural breaks.
New playback button: reinforces feedback and introduces distance from one's own voice.
![Page 16: Exploring tools for expressive voice](https://reader033.fdocuments.in/reader033/viewer/2022052410/554c485bb4c90570648b5463/html5/thumbnails/16.jpg)
Proposed redesign
Three main exercises based on study results:
1. Stretching range to limits
2. Slowing down (pausing more and longer)
3. Spanning full range
![Page 17: Exploring tools for expressive voice](https://reader033.fdocuments.in/reader033/viewer/2022052410/554c485bb4c90570648b5463/html5/thumbnails/17.jpg)
1. Stretching range to limits
continuous audio level
target area
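A minimal sketch of a continuous level meter with a target band. It measures level relative to digital full scale (dBFS) rather than absolute loudness, so it needs no microphone calibration. The function names and the idea of processing short frames of float samples are assumptions, not details from the slides.

```python
import math


def rms_dbfs(frame):
    """RMS level of one frame of float samples in [-1, 1], in dB relative to full scale."""
    if not frame:
        return float("-inf")
    mean_square = sum(sample * sample for sample in frame) / len(frame)
    if mean_square == 0:
        return float("-inf")  # digital silence
    return 10 * math.log10(mean_square)


def in_target(level_db, target_low_db, target_high_db):
    """True when the current level falls inside the target area."""
    return target_low_db <= level_db <= target_high_db


# Hypothetical frame: a constant half-scale signal.
level = rms_dbfs([0.5] * 512)
hit = in_target(level, target_low_db=-10.0, target_high_db=-3.0)
```

Calling `rms_dbfs` on each incoming frame gives the continuous trace, and `in_target` decides whether to highlight the target area.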
![Page 18: Exploring tools for expressive voice](https://reader033.fdocuments.in/reader033/viewer/2022052410/554c485bb4c90570648b5463/html5/thumbnails/18.jpg)
2. Slowing down
pause duration target
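Pause durations could be derived from a stream of per-frame levels along these lines. The silence threshold, frame length, and function name are hypothetical choices for illustration.

```python
def pause_durations(levels_db, frame_seconds, silence_db):
    """Return the length in seconds of each run of consecutive frames below silence_db."""
    pauses = []
    run_length = 0
    for level in levels_db:
        if level < silence_db:
            run_length += 1
        elif run_length:
            pauses.append(run_length * frame_seconds)
            run_length = 0
    if run_length:  # recording ended during a pause
        pauses.append(run_length * frame_seconds)
    return pauses


# Hypothetical levels for seven 0.1-second frames.
levels = [-20.0, -60.0, -60.0, -60.0, -20.0, -60.0, -20.0]
pauses = pause_durations(levels, frame_seconds=0.1, silence_db=-40.0)
target_s = 0.25
met_target = [duration >= target_s for duration in pauses]
```

Here the first pause (three frames) meets the 0.25-second target and the second (one frame) does not.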
![Page 19: Exploring tools for expressive voice](https://reader033.fdocuments.in/reader033/viewer/2022052410/554c485bb4c90570648b5463/html5/thumbnails/19.jpg)
3. Spanning full range
infrequently occurring loudness
frequently occurring loudness
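The frequently and infrequently occurring loudness levels can be found with a histogram over per-frame levels. This is a sketch under assumed bin edges (-60 to 0 dB in six bins); none of these values come from the slides.

```python
def loudness_histogram(levels_db, low_db, high_db, n_bins):
    """Count how many frames fall into each equal-width loudness bin."""
    width = (high_db - low_db) / n_bins
    counts = [0] * n_bins
    for level in levels_db:
        if low_db <= level < high_db:
            index = int((level - low_db) // width)
            counts[min(index, n_bins - 1)] += 1  # guard against float rounding at the top edge
    return counts


def unused_bins(counts):
    """Indices of loudness bins the speaker has not visited yet."""
    return [index for index, count in enumerate(counts) if count == 0]


# Hypothetical per-frame levels in dB.
levels = [-50.0, -45.0, -30.0, -30.0, -29.0, -10.0]
counts = loudness_histogram(levels, low_db=-60.0, high_db=0.0, n_bins=6)
gaps = unused_bins(counts)
```

Bins with high counts are the speaker's habitual loudness; the empty bins in `gaps` are the parts of the range still left to span.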
![Page 20: Exploring tools for expressive voice](https://reader033.fdocuments.in/reader033/viewer/2022052410/554c485bb4c90570648b5463/html5/thumbnails/20.jpg)
Interface 2: Playing with Voices (Pitch and intonation)
Demo at: http://vimeo.com/33385700
![Page 21: Exploring tools for expressive voice](https://reader033.fdocuments.in/reader033/viewer/2022052410/554c485bb4c90570648b5463/html5/thumbnails/21.jpg)
Interface 2: Playing with Voices (Pitch and intonation)
Video at: http://vimeo.com/33385700
1. Reader’s voice is recorded and sent to server for analysis.
2. Audio is compared to different recordings, closest match in pitch and intonation is returned.
3. “Doctor” character (the hand puppet) plays back the audio through embedded speaker.
=> Puppet mimics the reader’s character voices, encouraging silliness.
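Step 2 can be pictured as a nearest-neighbor lookup. The sketch below assumes each recording is summarized by two features, mean pitch and pitch standard deviation in Hz, and picks the stored voice closest to the reader's. The feature choice, the file names, and the numbers are all hypothetical; the slides do not say how the match was computed.

```python
import math

# Hypothetical library: file name -> (mean pitch in Hz, pitch standard deviation in Hz).
voice_library = {
    "deep_gruff.wav": (110.0, 15.0),
    "neutral.wav": (180.0, 30.0),
    "squeaky.wav": (320.0, 60.0),
}


def closest_recording(query_features, library):
    """Return the name of the stored recording nearest to the query in feature space."""
    return min(library, key=lambda name: math.dist(query_features, library[name]))


# Reader used a high, lively character voice.
match = closest_recording((300.0, 55.0), voice_library)
```

In practice the features would likely need scaling so that no single one dominates the distance.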
![Page 22: Exploring tools for expressive voice](https://reader033.fdocuments.in/reader033/viewer/2022052410/554c485bb4c90570648b5463/html5/thumbnails/22.jpg)
“Playing with Voices” User Feedback
“Wanted to try the extremes – because the extreme voices are funny!” “I liked it when it spoke with the same rhythm.” “I wanted it to mimic me.” “Turning the pages breaks the rhythm, but the pause before it speaks is right.”
First tested with random voices for puppet playback to learn what people expected, what was engaging. Feedback from (5) users:
Mimicry and extreme voices appealed, so built into final application (video on previous slide).
![Page 23: Exploring tools for expressive voice](https://reader033.fdocuments.in/reader033/viewer/2022052410/554c485bb4c90570648b5463/html5/thumbnails/23.jpg)
Controlled Study Design
Research question: Do these interventions impact speech modulation and expressiveness?
n > 10. All participants read the same book.
• Control: read book once, prompt to read more expressively, read book again.
• Group A: read book, [study 1: exercises with no feedback OR study 2: read “no more monkeys” book with random voices], read book again.
• Group B: read book, [study 1: exercises with feedback OR study 2: read “no more monkeys” book with pitch-matched voices], read book again.
Secondary question: compare human evaluations (not expert only) to software evaluation and identify correlated measures.
![Page 24: Exploring tools for expressive voice](https://reader033.fdocuments.in/reader033/viewer/2022052410/554c485bb4c90570648b5463/html5/thumbnails/24.jpg)
References
Boersma, P. and Weenink, D. “Praat: doing phonetics by computer.” Version 4.4.16, 2006.
Camlot, J. et al. “The Victorianator.” 2011. http://ludicvoice.concordia.ca/?page_id=28
Hoque, M. E., Lane, J. K., el Kaliouby, R., Goodwin, M., Picard, R. W. “Exploring Speech Therapy Games with Children on the Autism Spectrum.” Proceedings of InterSpeech, Brighton, UK, September 6-10, 2009.
Lewis, J. and Tsonis, F. “SenseText: Gesture Based Control of Text Visualization.” Proceedings of the 6th International Workshop on Gesture in Human-Computer Interaction and Simulation, Berder Island, France, May 18-20, 2005.
Rodenburg, P. The Actor Speaks: Voice and the Performer. New York, NY: St. Martin's Press, 2000.
![Page 25: Exploring tools for expressive voice](https://reader033.fdocuments.in/reader033/viewer/2022052410/554c485bb4c90570648b5463/html5/thumbnails/25.jpg)
Thank you!
Ehsan Hoque: speech analysis scripts, guidance, COUHES assistance
Bill Hoogterp: expert evaluation of audio data
Ryan McDermott: help with web server setup and XML parsing
Adam Setapen: read book for demo video
Cynthia Breazeal: help with COUHES approval
User study participants