Virtual Podium with HTC Vive
CS294W Final Paper, Team Soapbox
Jesse Min (jesikmin, 05786379), JeongWoo Ha (jwha, 05833965), Min Kim (tomas76, 05860540)
Abstract
Public speaking is difficult for many people because speakers can be intimidated by addressing large groups and by speaking in unfamiliar environments. Speakers would benefit from regular feedback on their speaking, but constant feedback is not feasible for most people. Our project addresses this problem with a Virtual Reality application that offers diverse virtual presentation environments augmented with features such as custom PowerPoint slides and speaker notes. To give meaningful speech feedback, the Hound SDK and Fitbit API are incorporated to automatically report verbal tics and heart rate.
I. Introduction
To make speech practice more effective, we have created a virtual presentation application with which a speaker can practice: Virtual Podium with HTC Vive. In this VR environment, speakers can experience in advance the circumstances in which they are going to give a speech, becoming accustomed to the setting and familiar with a large audience. In addition, our VR app provides personalized speech feedback, such as reports on heart rate and verbal tics, using Natural Language Processing and third-party APIs. With the Hound SDK, we convert voice into text and search for verbal tics such as "like" and "you know." With the Fitbit API, we retrieve the user's heart rate during mock presentations.
This paper is organized as follows. First, relevant related work is reviewed. Second, we provide a high-level and then a detailed description of our application. Third, the process and results of our user study are discussed. Fourth, the conclusion and possible future work are presented. Lastly, we discuss what we learned throughout the project.
II. Related Work
There are a number of web, mobile, and standalone speech practice applications. Some mobile applications, such as "Articulation Station," compare a user's pronunciation to standard pronunciation and analyze which parts the user should correct. Most relevant apps target people with special needs, particularly patients who need speech therapy for stroke, aphasia, or autism. In addition, most existing speech practice or analysis applications give users intensive feedback, such as how long they paused during their speech and the overall pace of the speech.
In the field of virtual reality, there have been applications that give users extensive and realistic vicarious experiences; for example, there are VR applications that make people feel as if they are in a Star Wars movie or standing in the middle of the Sahara.
However, there has been no substantial effort to combine these two fields to build an app for public speakers who would like to practice, improve their ability, and receive feedback. We were motivated by the idea that by using open APIs such as the Hound SDK and Fitbit API, we could give people speech analysis and rich feedback, as existing commercial speech practice apps do. We can also render a virtual conference room or 360-video-recorded environments in virtual reality (more precisely, on the HTC Vive HMD). As a result, we decided to integrate these two state-of-the-art technologies to offer users a new paradigm of speech practice in VR.
III. High-level Project Description
Each run of Virtual Podium consists of three major stages: the entry stage, the practice stage, and the analysis stage. Figure 1 below describes the three stages in more detail.
Figure 1. Program execution stages
In the entry stage, after a user runs the program, he first needs to sign in to the app so that the program can retrieve his personalized presets and basic personal data from the web server. Then, the user can select between two practice modes: a virtual conference room mode and a real room mode rendered from 360 video. We currently have rough mockups for these and still need to complete the work in the future.
In the practice stage, if the user chooses the 360-video mode, he can pick among several options, such as a small classroom setting or a large auditorium setting. The user can then begin practicing his or her speech. On the HTC Vive head-mounted display (HMD), the user can view simple statistics, including elapsed time and heart rate, in real time. In addition, the user can provide a typed script and PowerPoint slides to the program beforehand, view the script and slides in the HMD while practicing, and move on to the next sentence or slide by pulling a trigger on one of the two HTC controllers. The user can also dismiss the text script by holding his head up and make the text reappear by bowing his head slightly.
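The head-bow toggle described above can be sketched as a small state update driven by head pitch. This is an illustrative sketch, not our exact Vizard code: the pitch value would come from the HMD pose each frame, the sign convention (negative pitch means looking up) is an assumption, and two thresholds (hysteresis) are used so the text does not flicker near a single cutoff angle.

```python
PITCH_HIDE_DEG = -10.0   # head tilted up past this angle hides the script
PITCH_SHOW_DEG = 5.0     # head bowed past this angle shows it again

def update_script_visibility(pitch_deg, currently_visible):
    """Return the new visibility state for the on-screen script.

    pitch_deg is assumed negative when the user looks up. Two separate
    thresholds keep the text from flickering when the head hovers near
    a single cutoff angle.
    """
    if currently_visible and pitch_deg < PITCH_HIDE_DEG:
        return False          # user held their head up: remove the text
    if not currently_visible and pitch_deg > PITCH_SHOW_DEG:
        return True           # user bowed slightly: bring the text back
    return currently_visible
```

In the real application this update would run once per frame, with the result driving the visibility of the rendered script node.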
During the analysis stage, as soon as the user finishes practicing, the program uses the Fitbit API to analyze the overall heart rate. It also employs the Hound SDK to report how many times, and how often, the user unconsciously exhibited verbal tics such as "like" and "you know" during the practice. After fetching the HTTP responses from those APIs, Virtual Podium displays a neatly organized report on a web dashboard personalized for each user.
IV. Detailed Project Description
Vizard was used to build the main virtual reality application. The advantage of Vizard is that many Python libraries and scripts can be incorporated into our application. To model the 3D conference room, we used predefined models in SketchUp and modified them to our needs. The main feature of the program is practicing a speech in the modeled virtual environment with 3D-rendered PowerPoint slides and a script, controlled with the two HTC Vive controllers. Vizard automatically parses the user-provided slides and script and renders them in the virtual environment. When the user clicks the right controller, the presentation moves to the next slide; similarly, the next line of the speaker notes is rendered when the left controller is clicked. In addition, we rendered a 3D timer in the virtual environment to show the user how much time has elapsed. The screenshots below illustrate what the rendered slides and speaker notes look like in the virtual environment.
Figure 2. Screenshots of custom speaker notes (top) and PowerPoint slides (bottom) in the virtual modeled environment.
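The controller-driven slide and script advance described above can be sketched as follows. This is a simplified illustration, not the actual Vizard implementation: slides and script lines are assumed to be pre-parsed into Python lists, and `on_right_click`/`on_left_click` stand in for the real HTC Vive controller event handlers.

```python
class PresentationState:
    """Tracks the current slide and speaker-note line during practice."""

    def __init__(self, slides, script_lines):
        self.slides = slides
        self.script_lines = script_lines
        self.slide_idx = 0
        self.line_idx = 0

    def on_right_click(self):
        """Right controller: advance to the next slide (clamped at the end)."""
        if self.slide_idx < len(self.slides) - 1:
            self.slide_idx += 1
        return self.slides[self.slide_idx]

    def on_left_click(self):
        """Left controller: advance to the next speaker-note line."""
        if self.line_idx < len(self.script_lines) - 1:
            self.line_idx += 1
        return self.script_lines[self.line_idx]
```

The returned slide or line would then be handed to the renderer to update the 3D text and slide textures in the HMD.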
To provide the user with useful feedback, we incorporated the Fitbit API and the Hound API. The Fitbit API provides the user with heart rates recorded during the presentation, and the Hound API is used to run NLP algorithms. On the desktop that runs the Vizard application, a set of GUI elements enables calls to the Hound and Fitbit APIs. When the user clicks the Hound button, the Hound API is activated to transcribe the user's speech into text. If the Fitbit button is clicked, the heart rate from the presentation start time to the presentation end time is retrieved. To avoid interference with the main virtual reality program, the Fitbit and Hound API calls are made on separate threads; we discovered that making these calls on a single thread freezes the program. Appropriate tokens and authorizations are supplied through REST API calls to access personal data from the server. The program exits when the analysis is done, and the analysis result is stored in a MySQL database for use by the dashboard web application.
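The multithreaded analysis calls can be sketched as below. The worker bodies are hypothetical placeholders for the real authorized Hound and Fitbit requests (OAuth token handling and the HTTP requests are omitted); the point is that the network calls run off the main render thread so the Vizard loop never blocks.

```python
import threading

# Results shared between worker threads; each worker writes a distinct key,
# so no lock is needed in this simple sketch.
results = {}

def fetch_fitbit_heart_rate():
    # Hypothetical stand-in for the authorized Fitbit REST call.
    results["heart_rate"] = [72, 80, 85]

def fetch_hound_transcript():
    # Hypothetical stand-in for the Hound speech-to-text request.
    results["transcript"] = "so like I think you know this works"

threads = [
    threading.Thread(target=fetch_fitbit_heart_rate),
    threading.Thread(target=fetch_hound_transcript),
]
for t in threads:
    t.start()
for t in threads:
    # In the real app the render loop keeps running while the workers fetch;
    # here we simply wait for both to finish before reading the results.
    t.join()
```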
The dashboard web application is built with Ruby on Rails and hosted on Heroku with a Postgres database. Users can sign up and sign in to access their personalized analysis data, and D3.js is used to display the Fitbit and Hound analysis data on the web dashboard.
We offer the user several modes with different room sizes. We filmed 360 videos of a large classroom and of small classrooms, both filled with students, for use in our virtual reality application (a CS106A review session, a CS161 class, and a CS294S class were filmed with the instructors' consent). The videos were converted to sphere format and rendered in the HTC virtual environment. A user can choose which room he or she wants to present in and start presenting in the selected environment. The limitation of this approach is that users cannot walk around in the 360-video environment, whereas the 3D-modeled virtual environment allows them to move through the virtual space.
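The sphere ("equirectangular") mapping used for 360 video can be illustrated with the standard formula that sends each normalized video coordinate to a direction on the unit sphere. Axis conventions vary by engine, so this is one common choice rather than necessarily what Vizard does internally.

```python
import math

def equirect_to_direction(u, v):
    """Map normalized equirectangular video coordinates to a unit direction.

    u in [0, 1] spans longitude (0 to 2*pi); v in [0, 1] spans latitude
    (pi/2 at the top of the frame down to -pi/2 at the bottom).
    """
    lon = u * 2.0 * math.pi
    lat = (0.5 - v) * math.pi
    x = math.cos(lat) * math.cos(lon)
    y = math.sin(lat)
    z = math.cos(lat) * math.sin(lon)
    return (x, y, z)
```

Sampling the video texture along these directions wraps the flat frame around the viewer, which is why low-resolution footage looks noticeably soft: each pixel is stretched over a much larger visual angle than in a flat display.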
Figure 3. Application multithreading execution flow
V. User Research
Our user research consisted of two parts: first, each participant was interviewed about his or her overall impression of the application and completed a survey (Appendix B); second, we analyzed every user's recorded heart rate and voice. We conducted the user study with 8 potential users: five recruits and the three of us. Each of the five recruited subjects was a Stanford undergraduate, drawn from different majors and class years. Participants were briefed on instructions such as how to calibrate the HMD's focus and how to use the HTC controllers. Each of them practiced his or her own speech in three different modes: the virtual conference room, a 360-degree small lecture room, and a 360-degree large lecture hall. We then asked the users to complete the survey form, interviewed them for general comments on the app, and analyzed the collected heart rate and verbal tic data.
Figure 4 displays each user's heart rate change across the virtual environment modes. Six of the eight participants (75%) showed increasing heart rates as the environment shifted from the virtual conference room to the 360-video small lecture room, and from the small lecture room to the 360-video large lecture hall. Except for subjects 4 and 7, each participant's heartbeat became faster, in other words, the participant felt more anxious, as the room changed from a virtual room to a more realistic setting and from a small room to a spacious lecture hall. This heart rate data, collected by Fitbit in real time, was consistent with the users' explicit remarks during interviews and surveys: the virtual room was interesting yet unrealistic, while the 360-video settings, particularly the large lecture hall, were surprisingly realistic, compelling, and effective, not only because of the size of the room but also because of the larger audience.
Figure 4. Average heart rates of subjects according to virtual environments
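The per-environment averaging behind this comparison can be sketched as follows. The sample heart-rate readings are invented for illustration; real values would come from the Fitbit heart-rate data collected during each practice session.

```python
def average_by_mode(samples):
    """samples: list of (mode, bpm) pairs -> dict mapping mode to mean bpm."""
    totals = {}
    for mode, bpm in samples:
        acc = totals.setdefault(mode, [0.0, 0])
        acc[0] += bpm   # running sum of readings for this mode
        acc[1] += 1     # number of readings for this mode
    return {mode: total / count for mode, (total, count) in totals.items()}

# Made-up readings for one subject across the three practice modes.
readings = [
    ("virtual_room", 74), ("virtual_room", 78),
    ("360_small", 82), ("360_small", 86),
    ("360_large", 90), ("360_large", 96),
]
averages = average_by_mode(readings)
```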
Besides the heartbeat, we converted testers' voice data into text in real time using the Hound SDK. The transcribed text was then linearly searched for certain verbal tics, including "like," "well," and "you know." Example statistics for subject 6 are shown below:
Figure 5. Verbal tics for subject 6
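The linear filler-word search over the Hound transcript can be sketched as below. The transcript string is a made-up example, and this naive whole-word matching cannot distinguish filler uses of "like" from ordinary verb uses.

```python
import re

FILLERS = ["you know", "like", "well", "um", "ah"]

def count_fillers(transcript):
    """Count occurrences of each filler phrase (case-insensitive, whole words).

    Multi-word tics such as "you know" are matched as phrases. Note that
    this simple search also counts non-filler uses (e.g. "I like this").
    """
    text = transcript.lower()
    counts = {}
    for phrase in FILLERS:
        pattern = r"\b" + re.escape(phrase) + r"\b"
        counts[phrase] = len(re.findall(pattern, text))
    return counts

# Made-up transcript for illustration.
example = "Well, I like this, you know, and, um, it went well, like, really well."
tic_counts = count_fillers(example)
```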
Users rated this verbal-tic-specific feedback as one of the most useful features of the application, as most of them were not aware of occasional habits such as saying "like" while speaking. However, there was a limitation as well. Because tester recruiting was done during Week 9, one of the busiest weeks for Stanford students, we could not ask testers to prepare a 5- or 10-minute speech. Since users could give speeches of only 2 to 3 minutes at most, capturing verbal tics in such short speeches did not always yield meaningful results. Nonetheless, according to the survey, users were satisfied with the feature.
In addition, many users offered suggestions for future work in their survey responses. Some suggested employing more realistic audio alongside the current realistic visuals, for instance, letting users hear their own voice with appropriate echo and ambient sound through earphones. Others asked for higher-resolution 360 video. From our experience, 1080p video, usually considered fairly high resolution, was not sufficient for a 360 setting where sphere mapping onto the HMD is required. Our Samsung Gear 360 was acceptable for a prototype but not for a complete application.
VI. Conclusion
VR technologies have become more widely accessible to the general public, and there are many more ways they could be used beyond gaming and entertainment. There are already implementations of VR technology in the medical field, for example, to treat specific phobias. Public speaking is an important and helpful skill, yet many people have difficulty acquiring it; VR programs could help them reduce public speaking anxiety and practice before an actual speech. Leveraging various existing technologies, such as the Hound SDK and Fitbit API, we have created a virtual environment in which users can practice public speaking anywhere, at any time. Our user research showed that people are indeed interested in such a technology and feel that virtual environments can partially replace actual presentation environments for practice.
This project is just a first step toward what could become a very powerful application. What we make of it depends on how we create new demand and meet future needs. Virtual reality can change human behavior, and this powerful method could and should be used wisely to help users change for the better. As we incorporate other high-end technologies, such as haptic sensors, it will become possible to create even more realistic virtual speech-practicing environments to relieve public speaking anxiety.
VII. Future Work
In this section, we will discuss the limitations of our current technology and four possible future
improvements that can make our application more feasible and effective.
Firstly, more realistic rendering of the real-world environments where users would actually give speeches would be crucial. Although we have created virtual 3D environments using SketchUp and 360 videos to make them look real, our app still has limits. One crucial addition would be to let the user interact with the virtual audience in the HTC Vive. What makes speakers in real life truly nervous is an intimidating audience. If we could program the virtual audience to react to the user's speech, volume, gestures, and so on, users would get a much better sense of what it is like to give a pitch in front of an anticipating audience.
Secondly, another powerful addition would be more accurate integration of various feedback features. Our current integration of Hound and Fitbit is just the start of what we could provide through virtual speech practice. We could implement gesture tracking through Leap Motion to observe the user's hand gestures and walking patterns on stage; we could implement eye-gaze tracking to analyze how much time the user spends engaging with each audience member in the room; and we could record the user practicing in VR so that the user can watch himself afterward.
Thirdly, enabling multiple users to join the same virtual environment to practice public speaking would be another key addition. Multiple users in the same virtual environment could help people who are not physically together practice ahead of time in the anticipated speech setting.
Lastly, we could build a platform where people share the virtual speech environments they have created or filmed. The more options users have, the better they can experience diverse environments with dynamic audiences.
VIII. Discussion
The VR industry has made great progress in recent years, and many technologies can now be combined with VR in many ways. Powerful software tools such as Unity, Unreal Engine, and Vizard exist for creating high-quality virtual 3D environments. Furthermore, 360-video technology is now widely used in various fields. Integrating real-world videos of various conference rooms, lecture halls, and auditoriums helped our users become familiar with the actual environment much more effectively than the 3D-modeled room did. Accessing these real environments through virtual reality helped many testers feel comfortable when they later gave speeches in the real environments.
We also integrated features that give users fruitful feedback after practice. Using the Hound SDK, we detected how many times users used filler words such as "like," "um," and "ah." Showing users at which points in their speech they used filler words helped them locate where they felt more anxious or uncomfortable about the content. We also used Fitbit: analyzing the user's heart rate alongside the speech helped them identify which parts of the speech to focus on. In addition, we could potentially use the heart rate data to analyze how to help users prepare for specific speech environments.
Another key aspect of the project was user testing and user research. We wanted comprehensive feedback from people with different academic backgrounds, and we wanted to analyze the psychological effects of the application. There was much useful feedback, such as the dual controllers being too burdensome, since speakers would not be holding controllers in a real speech setting. Nevertheless, many subjects really liked being able to choose between the pre-filmed real-world 360-video mode and the virtual 3D model mode. Some said that repeated practice in the virtual environment would actually help them reduce anxiety about public speaking. However, more accurate heart rate and other scientific measurements are needed to prove that this application actually reduces public speaking anxiety.
IX. Team Member Contributions
All of our members contributed equally to the project over the entire quarter. We worked together to tackle major issues involving the Vizard IDE, sphere mapping of 360 video, integration of the Hound SDK and Fitbit API with the HTC Vive application, and so on. Still, each member took primary responsibility for particular portions of the project. Jesse Min spent most of his time sphere-mapping 360 video onto the HMD and implementing the Fitbit API and Hound SDK for heart rate measurement and speech analysis. JeongWoo Ha set up the SketchUp environment and rendered it smoothly on the HTC HMD through Vizard; he put a great deal of effort into fine-tuning visual details in the VR environment, such as the coordination and display of the text script and slides. Min Kim committed most of his time to implementing the motion-sensing part of the VR application and sending triggers to the program with the HTC controllers. He also worked a great deal on streamlining the VR application with the analysis stage (Fitbit API / Hound SDK), including the web dashboard.
X. Bibliography
1. Hound SDK Documentation: https://www.houndify.com/docs
2. Houndify Python Github Example: https://github.com/Mause/houndipy
3. Fitbit API Documentation: https://dev.fitbit.com/docs/
4. Fitbit API Blog Tutorial: https://roboticape.wordpress.com/2014/01/13/first-steps-into-the-quantified-self-getting-to-know-the-fitbit-api/
5. Vizard 5.0 Documentation: http://docs.worldviz.com/vizard/
6. Python Multi-threading: https://www.toptal.com/python/beginners-guide-to-concurrency-and-parallelism-in-python
7. Samsung GEAR 360 Camera How-to: http://www.samsung.com/global/galaxy/gear-360/how-to/get-start

XI. Appendix
A. JSON Response of Hound SDK and Fitbit API
B. User Research Survey
User Test Questionnaire
Name: ____ Gender: ____ Age: ____
● Personal Speech Experience
Q1. Do you usually feel nervous giving a speech in front of the public? Not At All 1 2 3 4 5 6 Definitely
Q2. What part of the public speaking are you most uncomfortable with?
Q3. Have you completed the Stanford PWR2 course? No / Yes
Q3-1. If yes, how did you practice your final presentation?
Q3-2. If yes, do you think the app might have helped your PWR2 presentations? Not At All 1 2 3 4 5 6 Definitely
Q3-3. If no, do you think this app will help your future PWR2 presentations? Not At All 1 2 3 4 5 6 Definitely
_____________________________________________________________________________
● Virtual Reality Experience
Q4. How real was the virtual conference room? Not At All 1 2 3 4 5 6 Definitely
Q5. How real was the small 360-video rendered room? Not At All 1 2 3 4 5 6 Definitely
Q6. How real was the large 360-video rendered room? Not At All 1 2 3 4 5 6 Definitely
Q7. Did the large 360-video rendered room make you more nervous than the small room? Not At All 1 2 3 4 5 6 Definitely
Q8. Would you use the 360-video rendered room or the 3D virtual conference room for practicing? Not At All 1 2 3 4 5 6 Definitely
_____________________________________________________________________________
● Overall Experience
Q9. Will you use this app if it is officially released in the future? Not At All 1 2 3 4 5 6 Definitely
Q10. What were some advantages of this app?
Q11. What were some drawbacks of this app?
Q12. Please briefly summarize your overall impression after using this VR speech practicing application. (Any suggestions are also welcome.)
Q13. What additional features would you like to see in this application?