A Multimodal Ouija Board for Aircraft Carrier Deck Operations
by
Birkan Uzun
S.B., C.S. M.I.T., 2015
Submitted to the
Department of Electrical Engineering and Computer Science
in Partial Fulfillment of the Requirements for the Degree of
Master of Engineering in Computer Science and Engineering
at the
Massachusetts Institute of Technology
June 2016
Copyright 2016 Birkan Uzun. All rights reserved.
The author hereby grants to M.I.T. permission to reproduce and to distribute publicly paper and electronic copies of this thesis document in whole or in part in any medium now known or
hereafter created.
Author ……………………………………………………………………………………………... Department of Electrical Engineering and Computer Science
April 6, 2016
Certified by ………………………………………………………………………………………... Randall Davis, Professor
Thesis Supervisor
Accepted by ……………………………………………………………………………………….. Dr. Christopher J. Terman
Chairman, Masters of Engineering Thesis Committee
A Multimodal Ouija Board for Aircraft Carrier Deck Operations by
Birkan Uzun
Submitted to the
Department of Electrical Engineering and Computer Science
April 6, 2016
in Partial Fulfillment of the Requirements for the Degree of
Master of Engineering in Computer Science and Engineering
Abstract
In this thesis, we present improvements to DeckAssistant, a system that provides a traditional
Ouija board interface by displaying a digital rendering of an aircraft carrier deck that assists deck
handlers in planning deck operations. DeckAssistant has a large digital tabletop display that
shows the status of the deck and has an understanding of certain deck actions for scenario
planning. To preserve the conventional way of interacting with the old-school Ouija board, where
deck handlers move aircraft by hand, the system takes advantage of multiple modes of
interaction. Deck handlers plan strategies by pointing at aircraft, gesturing and talking to the
system. The system responds with its own speech and gestures, and it updates the display to
show the consequences of the actions taken by the handlers. The system can also be used to
simulate certain scenarios during the planning process. The multimodal interaction described
here creates a communication of sorts between deck handlers and the system. Our contributions
include improvements in hand-tracking, speech synthesis and speech recognition.
Acknowledgements
Foremost, I would like to thank my advisor, Professor Randall Davis, for his support of
my work and for his patience, motivation, and knowledge. His door was always open whenever I had
a question about my research. He consistently allowed this research to be my own work, but
steered me in the right direction with his meaningful insights whenever he thought I needed it.
I would also like to thank Jake Barnwell for helping with the development environment
setup and documentation.
Finally, I must express my gratitude to my parents and friends, who supported me
throughout my years of study. This accomplishment would never have been possible without them.
Contents

1. Introduction
   1.1. Overview
   1.2. Background and Motivation
      1.2.1. Ouija Board History and Use
      1.2.2. Naval Push for Digital Information on Decks
      1.2.3. A Multimodal Ouija Board
   1.3. System Demonstration
   1.4. Thesis Outline
2. DeckAssistant Functionality
   2.1. Actions in DeckAssistant
   2.2. Deck Environment
      2.2.1. Deck and Space Understanding
      2.2.2. Aircraft and Destination Selection
      2.2.3. Path Calculation and Rerouting
   2.3. Multimodal Interaction
      2.3.1. Input
      2.3.2. Output
3. System Implementation
   3.1. Hardware
   3.2. Software
      3.2.1. Libraries
      3.2.2. Architecture
4. Hand Tracking
   4.1. The Leap Motion Sensor
      4.1.1. Pointing Detection
      4.1.2. Gesture Detection
5. Speech Synthesis and Recognition
   5.1. Speech Synthesis
   5.2. Speech Recognition
      5.2.1. Recording Sound
      5.2.2. Choosing a Speech Recognition Library
      5.2.3. Parsing Speech Commands
      5.2.4. Speech Recognition Stack in Action
6. Related Work
   6.1. Navy ADMACS
   6.2. Deck Heuristic Action Planner
7. Conclusion
   7.1. Future Work
8. References
9. Appendix
   9.1. Code and Documentation
List of Figures

Figure 1: Deck handlers collaboratively operating on an Ouija Board. Source: Google Images.
Figure 2: The ADMACS Ouija board. Source: Google Images.
Figure 3: DeckAssistant’s tabletop display with the digital rendering of the deck [1].
Figure 4: A deck handler using DeckAssistant with hand gestures and speech commands [1].
Figure 5: The initial arrangement of the deck [1].
Figure 6: Deck handler points at the aircraft to be moved while speaking the command [1].
Figure 7: DeckAssistant uses graphics to tell the deck handler that the path to the destination is blocked [1].
Figure 8: DeckAssistant displays an alternate location for the F-18 that is blocking the path [1].
Figure 9: The logic for moving aircraft [1].
Figure 10: Regions on an aircraft carrier’s deck. Source: Google Images.
Figure 11: (a) Orange dot represents where the user is pointing. (b) Aircraft being hovered over is highlighted green [1].
Figure 12: (a) Single aircraft selected. (b) Multiple aircraft selected [1].
Figure 13: Aircraft circled in red, meaning there is not enough room in the region [1].
Figure 14: Alternate region to move the C-2 is highlighted in blue [1].
Figure 15: The hardware used in DeckAssistant.
Figure 16: DeckAssistant software architecture overview.
Figure 17: The Leap Motion Sensor mounted on the edge of the tabletop display.
Figure 18: Leap Motion’s InteractionBox, colored in red. Source: Leap Motion Developer Portal.
Figure 19: Demonstration of multiple aircraft selection with the pinch gesture.
Figure 20: A summary of how the speech recognition stack works.
List of Tables

Table 1: Set of commands that are recognized by DeckAssistant.
List of Algorithms

Algorithm 1: Summary of the pointing detection process in pseudocode.
1. Introduction
1.1. Overview
In this thesis, we present improvements to DeckAssistant, a digital aircraft carrier Ouija
Board interface that aids deck handlers with planning deck operations. DeckAssistant supports
multiple modes of interaction, aiming to improve the user experience over the traditional Ouija
Boards. Using hand-tracking, gesture recognition and speech recognition, it allows deck handlers
to plan deck operations by pointing at aircraft, gesturing and talking to the system. It responds
with its own speech using speech synthesis and updates the display, which is a digital rendering
of the aircraft carrier deck, to show results when deck handlers take action. The multimodal
interaction described here creates a communication of sorts between deck handlers and the
system. DeckAssistant has an understanding of deck objects and operations, and can be used to
simulate certain scenarios during the planning process.
The initial work on DeckAssistant was done by Kojo Acquah, and we build upon his
implementation [1]. Our work makes the following contributions to the fields of
Human-Computer Interaction and Intelligent User Interfaces:
- It discusses how using the Leap Motion Sensor is an improvement over the Microsoft
Kinect in terms of hand-tracking, pointing and gesture recognition.
- It presents a speech synthesis API which generates speech that has high pronunciation
quality and clarity. It investigates several speech recognition APIs, argues which one is
the most applicable, and introduces a way of enabling voice-activated speech recognition.
- Thanks to the refinements in hand-tracking and speech, it provides a natural, multimodal
way of interacting with the first large-scale Ouija Board alternative that has been built to
help with planning deck operations.
1.2. Background and Motivation
1.2.1. Ouija Board History and Use
The flight deck of an aircraft carrier is a complex scene, riddled with incoming aircraft,
personnel moving around to take care of a variety of tasks, and the ever-present risk of hazards
and calamity. Flight Deck Control (FDC) is where the deck scene is coordinated, and during
flight operations it is one of the busiest places on the ship. The deck handlers in FDC send
instructions to the aircraft directors on the flight deck, who manage all aircraft movement,
placement and maintenance for the deck regions they are responsible for.
FDC is filled with computer screens and video displays of all that is occurring outside on
deck, but it is also home to one of the most crucial pieces of equipment in the Navy, the Ouija
board (Figure 1). The Ouija board is a waist-high replica of the flight deck at 1/16 scale that has
all the markings of the flight deck, as well as its full complement of aircraft — all in cutout
models, and all tagged with items like thumbtacks and bolts to designate their status. The board
offers an immediate glimpse of the deck status and allows the deck handlers in charge to
manipulate the model deck objects and make planning decisions, should the need arise. The
board has been in use since World War II and has provided a platform of collaboration for deck
handlers in terms of strategy planning for various scenarios on deck.
It is widely understood that the first round of damage to a ship will likely take out the
electronics; so to ensure the ship remains functional in battle, everything possible has a
mechanical backup. Even though the traditional board has the advantage of being immune to
electronic failures, there is potential for digital Ouija board technology to enhance the
deck-operation planning functionality and experience.
Figure 1: Deck handlers collaboratively operating on an Ouija Board. Source: Google Images.
1.2.2. Naval Push for Digital Information on Decks
Even though the Ouija board has been used to track aircraft movement on aircraft carriers
for over seventy years, the Navy is working on a computerized replacement due to limitations of
the current model. As one of the simplest systems aboard Navy ships, the Ouija boards can only
be updated manually, i.e., when the deck handlers move models of aircraft and other assets
around the model deck to match the movements of their real-life counterparts. The board does not
offer any task automation, information processing or validation to help with strategy planning for
various deck scenarios.
Figure 2: The ADMACS Ouija board. Source: Google Images.
The new Ouija board replacement (Figure 2) is part of the Aviation Data Management
and Control System (ADMACS) [2], a set of electronic upgrades for carriers designed to make
use of the latest technologies. This system requires the deck handler to track flight deck activity
via computer, working with a monitor that will be fed data directly from the flight deck. In
addition, the deck handler can move aircraft around on the simulated deck view using mouse and
keyboard.
1.2.3. A Multimodal Ouija Board
The ADMACS Ouija board fixes the problem of updating the deck status in real-time
without any manual work. It also allows the deck handlers to move aircraft on the simulated deck
view using mouse and keyboard, as noted. However, most deck handlers are apparently skeptical
of replacing the existing system, holding that things that are not broken should not be fixed
[6]. Considering these facts, imagine a new Ouija board with a large digital tabletop display that
could show the status of the deck and had an understanding of certain deck actions for scenario
planning. To preserve the conventional way of interacting with the old-school Ouija board, where
deck handlers move aircraft by hand, the system would take advantage of multiple modes of
interaction. Utilizing hand-tracking and speech recognition techniques, the system could let deck
handlers point at objects on deck and speak their commands. In return, the system could respond
with its own synthesized speech and update the graphics to illustrate the consequences of the
commands given by the deck handlers. This would create a two-way communication between the
system and the deck handlers.
1.3. System Demonstration
To demonstrate how the multimodal Ouija Board discussed in Section 1.2.3 works in
practice and preview DeckAssistant in action, we take a look at an example scenario from [1] in
which a deck handler is trying to prepare an aircraft for launch on a catapult. The deck handler
needs to move the aircraft to be launched to the catapult while moving other aircraft that are
blocking the way to other locations on deck.
The system has a large tabletop display showing a digital, realistic rendering of an
aircraft carrier deck with a complete set of aircraft (Figure 3).
Figure 3: DeckAssistant’s tabletop display with the digital rendering of the deck [1].
The deck handler stands in front of the table and issues commands using both hand
gestures and speech (Figure 4). DeckAssistant uses either the Leap Motion Sensor (mounted on
the edge of the display) or the Microsoft Kinect (mounted above the display) for hand-tracking.
The deck handler wears a wireless Bluetooth headset that supports a two-way conversation with
the system through speech.
Figure 4: A deck handler using DeckAssistant with hand gestures and speech commands [1].
Figure 5 shows the initial aircraft arrangement of the deck. There are eleven F-18s (grey
strike fighter jets) and two C-2s (white cargo aircraft) placed on the deck. There are four
catapults at the front of the deck, and two of them are open. The deck handler will now try to
launch one of the C-2s on one of the open catapults, and that requires moving a C-2 from the
elevator, which is at the rear of the deck, to an open catapult, which is at the front of the deck.
After viewing the initial arrangement of the deck, the deck handler points at the aircraft to
be moved, the lower C-2, and speaks the following command: “Move this C-2 to launch on
Catapult 2”. The display shows where the deck handler is pointing with an orange dot, and the
selected aircraft is highlighted in green (Figure 6).
Figure 5: The initial arrangement of the deck [1].
Figure 6: Deck handler points at the aircraft to be moved while speaking the command [1].
Now, DeckAssistant does its analysis to figure out whether the command given by the
deck handler can be accomplished without any extra action. In this case, there is an F-18
blocking the path the C-2 needs to take to go to the catapult (Figure 7).
Figure 7: DeckAssistant uses graphics to tell the deck handler that the path to the destination is blocked [1].
DeckAssistant knows that the F-18 has to be moved out of the way. It uses graphics and
synthesized speech to let the deck handler know that additional actions need to be taken
and to ask for the handler’s permission in the form of a yes-no question (Figure 8).
Figure 8: DeckAssistant displays an alternate location for the F-18 that is blocking the path [1].
The aircraft are moved in the simulation if the deck handler agrees to the actions
proposed by the system. If not, the system reverts to the state before the command. If the
deck handler does not like the action proposed by the system, they can cancel the command and
move aircraft around based on their own strategies. The goal of DeckAssistant here is to take
care of small details while the deck handler focuses on the more important deck operations without
wasting time.
1.4. Thesis Outline
In the next section, we describe the types of actions available in DeckAssistant and
how they are taken, what the system knows about the deck environment, and how the
multimodal interaction works. Section 3 discusses the hardware and software used as well as
introducing the software architecture behind DeckAssistant. Sections 4 and 5 look at
implementation details discussing handtracking, speech synthesis and recognition. Section 6
talks about related work. Section 7 discusses future work and concludes.
2. DeckAssistant Functionality
This section gives an overview of actions available in DeckAssistant, discusses what
DeckAssistant knows about the deck environment and the objects, and explains how the
multimodal interaction happens.
2.1. Actions in DeckAssistant
The initial version of DeckAssistant focuses only on simple deck actions for aircraft
movement and placement. These actions allow deck handlers to perform tasks such as
moving an aircraft from one location to another or preparing an aircraft for launch on a catapult.
These deck actions comprise the logic to perform a command given by the deck handler (Figure
9). As the example in Section 1.3 suggests, these actions are built to be flexible and interactive.
This means that the deck handler is always consulted for their input during an action: they can
make alterations with additional commands or suggest alternate actions if needed. The
system takes care of the details, saving the deck handler’s time and allowing them to concentrate
on more important tasks.
There are four actions available within DeckAssistant, as noted in [1]:
- Moving aircraft from start to destination.
- Finding an alternate location for aircraft to move if the intended destination is full.
- Clearing a path for aircraft to move from start to end location.
- Moving aircraft to launch on catapults.
Figure 9: The logic for moving aircraft [1].
2.2. Deck Environment
DeckAssistant has an understanding of the deck environment, which includes various
types of aircraft, regions on deck and paths between regions (See Chapter 4 of [1] for the
implementation details of the deck environment and objects).
2.2.1. Deck and Space Understanding
DeckAssistant’s user interface represents a scale model of a real deck just like a
traditional Ouija Board. The system displays the status of aircraft on this user interface and uses
the same naming scheme that the deck handlers use for particular regions of the deck (Figure
10). The deck handlers can thus refer to those regions by their names when using the system.
Each of these regions contains a set of parking spots in which the aircraft can reside. These
parking spots help the system determine the arrangement of parked aircraft and figure out the
occupancy in a region. This means that the system knows if a region has enough room to move
aircraft to or if the path from one region to another is clear.
Figure 10: Regions on an aircraft carrier’s deck. Source: Google Images.
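The region and parking-spot bookkeeping described above can be sketched as follows. This is a minimal illustration: the class and method names are our own, not DeckAssistant’s actual code, and the real system tracks far richer state.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a deck region with named parking spots (illustrative only).
public class Region {
    private final String name;          // deck handlers' naming scheme, e.g. "Fantail"
    private final List<Boolean> spots;  // true = occupied parking spot

    public Region(String name, int spotCount) {
        this.name = name;
        this.spots = new ArrayList<Boolean>();
        for (int i = 0; i < spotCount; i++) spots.add(false);
    }

    // Number of free parking spots in this region.
    public int openSpots() {
        int open = 0;
        for (boolean occupied : spots) if (!occupied) open++;
        return open;
    }

    // The occupancy check made before moving aircraft into this region.
    public boolean hasRoomFor(int aircraftCount) {
        return openSpots() >= aircraftCount;
    }

    public void park(int spotIndex) { spots.set(spotIndex, true); }

    public String getName() { return name; }
}
```

With this shape, answering “does the Fantail have room for three aircraft?” reduces to one `hasRoomFor` call.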
2.2.2. Aircraft and Destination Selection
Each aircraft on deck is a unique object that has a tail number (displayed on each
aircraft), type, position, status and other information that is useful for the system’s simulation.
Currently, we support two different types of aircraft within DeckAssistant: F-18s and C-2s.
Selection of aircraft can be done in two ways. The deck handler can either point at the
aircraft (single or multiple) as shown in the example in Section 1.3, or they can refer to the
aircraft by their tail numbers, for instance, “Aircraft Number 8”.
Destination selection is similar. Since destinations are regions on the deck, they can be
referred to by their names or they can be pointed at.
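The per-aircraft state and the tail-number lookup described above might look roughly like this; the field and method names are our assumptions for illustration, not the actual implementation.

```java
import java.util.Arrays;
import java.util.List;

// Sketch of per-aircraft state and spoken tail-number resolution.
public class Aircraft {
    public enum Type { F18, C2 }

    public final int tailNumber;  // displayed on each aircraft
    public final Type type;
    public double x, y;           // position on the rendered deck
    public String status;

    public Aircraft(int tailNumber, Type type) {
        this.tailNumber = tailNumber;
        this.type = type;
        this.status = "parked";
    }

    // Resolve a spoken reference like "Aircraft Number 8" to an object.
    public static Aircraft byTailNumber(List<Aircraft> deck, int tail) {
        for (Aircraft a : deck)
            if (a.tailNumber == tail) return a;
        return null;  // unknown tail number
    }
}
```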
2.2.3. Path Calculation and Rerouting
During path planning, the system draws straight lines between regions and uses the
wingspan length as the width of the path to make sure that there are no aircraft blocking the way
and that the aircraft to move can fit into its path.
If a path is clear but the destination does not have enough open parking spots, the system
suggests alternate destinations and routes, checking the nearest neighboring regions for open
spots.
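The corridor test described above can be approximated with simple segment geometry: widen the straight line between regions to the moving aircraft’s wingspan and check whether any parked aircraft falls inside. This is a deliberately simplified sketch (aircraft treated as points, names our own), not the system’s actual routing code.

```java
// Sketch of the wingspan-wide corridor test used during path planning.
public class PathCheck {
    // Distance from point (px,py) to the segment (x1,y1)-(x2,y2).
    static double distToSegment(double px, double py,
                                double x1, double y1, double x2, double y2) {
        double dx = x2 - x1, dy = y2 - y1;
        double lenSq = dx * dx + dy * dy;
        double t = lenSq == 0 ? 0 : ((px - x1) * dx + (py - y1) * dy) / lenSq;
        t = Math.max(0, Math.min(1, t));             // clamp to the segment
        double cx = x1 + t * dx, cy = y1 + t * dy;   // closest point on segment
        return Math.hypot(px - cx, py - cy);
    }

    // True if no parked aircraft lies inside the wingspan-wide corridor
    // from the start region center (x1,y1) to the destination (x2,y2).
    public static boolean pathClear(double[][] parked, double wingspan,
                                    double x1, double y1, double x2, double y2) {
        for (double[] p : parked)
            if (distToSegment(p[0], p[1], x1, y1, x2, y2) < wingspan / 2)
                return false;  // blocking aircraft found
        return true;
    }
}
```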
2.3. Multimodal Interaction
The goal of the multimodal interaction created by DeckAssistant’s user interface is to
establish a communication between the deck handler and the system. The input in this interaction is
a combination of hand gestures and speech performed by the deck handler. The output is the
system’s response with synthesized speech and graphical updates.
2.3.1. Input
DeckAssistant uses either the Leap Motion Sensor or the Microsoft Kinect for tracking
hands. Hand-tracking allows the system to recognize certain gestures using the position of the
hands and fingertips. Currently, the system can only interpret pointing gestures where the deck
handler points at aircraft or regions on the deck.
Commands are spoken into the microphone of the wireless Bluetooth headset that the
deck handler wears, allowing the deck handler to issue a command using speech alone. In this
case, the deck handler has to provide the tail number of the aircraft to be moved as well as the
destination name. An example could be: “Move Aircraft Number 8 to the Fantail”.
Alternatively, the deck handler can combine speech with one or more pointing gestures. In this
case, for example, the deck handler can point at an aircraft to be moved and say “Move this
aircraft”; and then they can point at the destination and say “over there”.
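One minimal way to combine a pointing target with a deictic spoken command is sketched below: when the utterance says “this aircraft,” substitute whatever tail number the orange dot is currently over. The class name and approach are our illustration, not DeckAssistant’s actual parser.

```java
// Illustrative deictic resolution for combined speech + pointing input.
public class DeicticResolver {
    // spoken: the recognized utterance.
    // pointedAt: tail number under the pointing dot, or -1 if none.
    public static String resolve(String spoken, int pointedAt) {
        if (spoken.contains("this aircraft") && pointedAt >= 0)
            return spoken.replace("this aircraft", "Aircraft Number " + pointedAt);
        return spoken;  // fully spoken commands pass through unchanged
    }
}
```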
2.3.2. Output
The system is very responsive to any input. As soon as the deck handler makes a pointing
gesture, an orange dot appears on the screen, indicating where the deck handler is
pointing (Figure 11 (a)). If the deck handler is pointing at an aircraft, the system highlights
that aircraft with a green color, indicating a potential for selection (Figure 11 (b)). Eventually, if
the deck handler takes an action to move aircraft on deck, the selected aircraft are highlighted in
orange. As mentioned earlier, the deck handler can select multiple aircraft (Figure 12).
Figure 11: (a) Orange dot represents where the user is pointing. (b) Aircraft being hovered over is
highlighted green [1].
Figure 12: (a) Single aircraft selected. (b) Multiple aircraft selected [1].
The system’s responses to the deck handler’s input depend on the type of action and the
aircraft arrangement on deck. If a certain action can be processed without additional actions, the
system completes it and confirms it by saying “Okay, done”. If the action cannot be completed
for any reason, the system explains why using its synthesized speech and graphical updates, and
asks for the deck handler’s permission to take an alternate action. In the case of deck handler
approval, the system updates the arrangement on deck. If the deck handler declines the suggested
alternate action, the system reverts to its state before the deck handler issued their
command.
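The propose/confirm/revert behavior can be sketched as a small state holder that snapshots the deck before each tentative change. The deck state here is collapsed to a string for illustration; DeckAssistant’s real state is of course much richer.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch of the propose -> confirm-or-revert interaction flow.
public class ProposalFlow {
    private final Deque<String> history = new ArrayDeque<String>();
    private String deckState;

    public ProposalFlow(String initial) { deckState = initial; }

    // Apply a proposed action tentatively, remembering the prior state.
    public void propose(String newState) {
        history.push(deckState);
        deckState = newState;
    }

    // Deck handler said yes: keep the change.
    public void confirm() { history.clear(); }

    // Deck handler said no: revert to the state before the command.
    public void decline() {
        if (!history.isEmpty()) deckState = history.pop();
    }

    public String state() { return deckState; }
}
```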
Section 1.3 gave an example of this scenario: the system warned the user about the
aircraft that was blocking the path to a catapult and recommended an alternate spot for the
aircraft blocking the way. Once the deck handler approved, the system could move the aircraft to
launch on the catapult.
Let’s take a look at another scenario. Figure 13 shows an example of a situation where a
C-2 cannot be moved to the fantail since there are no open parking spots there. The system
circles all the blocking aircraft in red and suggests an alternate region on deck to move the C-2.
In that case, the new region is highlighted in blue and a clear path to it is drawn (Figure 14). If
the deck handler accepts this suggested region, the system moves the C-2 there. If not, it reverts
to its original state and waits for new commands.
Figure 13: Aircraft circled in red, meaning there is not enough room in the region [1].
Figure 14: Alternate region to move the C-2 is highlighted in blue [1].
3. System Implementation
In this section, we introduce DeckAssistant’s hardware setup, the software libraries used
and the software architecture design.
3.1. Hardware
Figure 15: The hardware used in DeckAssistant.
As can be seen in Figure 15, DeckAssistant’s hardware setup consists of:
- Four downward-facing Dell 5100MP projectors mounted over the tabletop. These
projectors create a 42 by 32 inch seamless display with a 2800 x 2100 pixel resolution.
- A white surface digitizer. The display is projected onto this surface.
- A Leap Motion Sensor or a Microsoft Kinect (V1) for tracking hands over the table
surface. The system can use either sensor.
- A Logitech C920 Webcam for viewing the entire surface. This webcam is used to
calibrate the seamless display using the ScalableDesktop Classic software.
- A wireless Bluetooth headset for supporting a two-way conversation with the system.
This setup is powered by a Windows 7 desktop computer with an AMD Radeon HD 6870
graphics card. It should be noted that the need for the surface digitizer, projectors and webcam
would be eliminated if the system were configured to use a flat panel for the display.
3.2. Software
All of DeckAssistant’s code is written in Java 7 in the form of a standalone application.
This application handles all the system functionality: graphics, speech recognition, speech
synthesis, and gesture recognition.
3.2.1. Libraries
Four libraries are used to provide the desired functionality:
- Processing: for graphics; it is a fundamental part of our application framework.
- AT&T Java Codekit: for speech recognition.
- Microsoft Translator Java API: for speech synthesis.
- Leap Motion Java SDK: provides the interface to the Leap Motion Controller sensor for
hand-tracking.
3.2.2. Architecture
DeckAssistant’s software architecture is structured around three stacks that handle the
multimodal input and output. These three stacks run in parallel and are responsible for speech
synthesis, speech recognition and hand-tracking. The Speech Synthesis Stack constructs
sentences in response to a deck handler’s command and generates an audio file for that sentence
that is played through the system’s speakers. The Speech Recognition Stack constantly listens for
commands, does speech-to-text conversion and parses the text to figure out the command that
was issued. The Hand-Tracking Stack interfaces with either the Leap Motion Sensor or the
Microsoft Kinect, processes the data received and calculates the position of the user’s pointing
finger over the display as well as detecting additional gestures. These three stacks each provide
an API (Application Programming Interface) so that the other components within DeckAssistant can
communicate with them for a multimodal interaction.
Another crucial part of the architecture is the Action Manager component. The Action
Manager’s job is to manipulate the deck by communicating with the three multimodal interaction
stacks. Once a deck handler’s command is interpreted, it is passed into the Action Manager
which updates the deck state and objects based on the command and responds by leveraging the
Speech Synthesis Stack and graphics.
Finally, all of these stacks and components run on a Processing loop that executes every
30 milliseconds. Each execution of this loop makes sure the multimodal input and output are
processed. Figure 16 summarizes the software architecture. The DeckAssistant Software Guide
(see Appendix for URL) details the implementation of each component within the system.
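The loop structure just described might look roughly like the sketch below: each tick polls the three stacks, and the Action Manager would react to any interpreted command. The `Stack` interface and all names are our own abstraction, not DeckAssistant’s actual API.

```java
// Rough shape of the ~30 ms Processing-driven main loop.
public class MainLoop {
    // One interface standing in for the speech synthesis, speech
    // recognition and hand-tracking stacks.
    public interface Stack { void poll(); }

    private final Stack[] stacks;

    public MainLoop(Stack... stacks) { this.stacks = stacks; }

    // One iteration of the draw loop: process multimodal input/output.
    public void tick() {
        for (Stack s : stacks) s.poll();
        // ...the Action Manager would update deck state and graphics here...
    }
}
```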
Figure 16: DeckAssistant software architecture overview.
4. Hand Tracking
In Chapter 5 of his thesis [1], Kojo Acquah discusses methods for tracking hands and
recognizing pointing gestures using a Microsoft Kinect (V1). These initial hand-tracking
methods of DeckAssistant can only recognize outstretched fingers on hands that are held mostly
perpendicular to the focal plane of the camera. They do not work well with other hand poses,
leaving no way to recognize other gestures. The authors of [8] provide a detailed analysis of the
accuracy and resolution of the Kinect sensor’s depth data. Their experimental results show that
the random error in depth measurement increases with increasing distance to the sensor, ranging
from a few millimeters to approximately 4 centimeters at the maximum range of the sensor. The
quality of the data is also found to be affected by the low resolution of the depth measurements
that depend on the frame rate (30fps [7]). The authors thus suggest that the obtained accuracy, in
general, is sufficient for detecting arm and body gestures, but is not sufficient for precise finger
tracking and hand gestures. Experimenting with DeckAssistant’s initial version to take certain
actions, we noted laggy, low-accuracy hand-tracking performance from the Kinect sensor. In
addition, the Kinect always has to be calibrated before DeckAssistant can be used, which is a
time-consuming process. Finally, the current setup has a usability problem: when deck handlers
stand in front of the tabletop and point at the aircraft on the display, their hands block the
projectors’ lights, causing shadows in the display.
The authors of [9] present a study of the accuracy and robustness of the Leap Motion Sensor.
They use an industrial robot with a reference pen allowing suitable position accuracy for the
experiment. Their results show high precision (an overall average accuracy of 0.7mm) in
fingertip position detection. Even though the sensor does not achieve the accuracy of 0.01mm stated
by the manufacturer [3], they find that the Leap Motion Sensor performs better than the
Microsoft Kinect in the same experiment.
This section describes our use of the Leap Motion Sensor to track hands and recognize
gestures, allowing for a high degree of subjective robustness.
4.1. The Leap Motion Sensor
The Leap Motion Sensor is a 3” long USB device that tracks hand and finger motions. It
works by projecting infrared light upward from the device and detecting reflections using
monochromatic infrared cameras. Its field of view extends from 25mm to 600mm above the
device with a 150° spread and a high frame rate (>200 fps) [3]. In addition, the Leap Motion
Sensor’s Application Programming Interface (API) provides more information about the hands
than the Microsoft Kinect’s (V1) does.
Figure 17: The Leap Motion Sensor mounted on the edge of the tabletop display.
The Leap Motion Sensor is mounted on the edge of the tabletop display, as shown above
in Figure 17. In this position, hands no longer block the projectors’ lights, thereby eliminating
the shadows in the display. The sensor also removes the need for calibration before use, enabling
DeckAssistant to run without any extra work. Finally, thanks to its accuracy in finger-tracking,
the sensor creates the opportunity for more hand gestures to express detail in deck actions (see
Section 4.1.2).
4.1.1. Pointing Detection
The Leap Motion API provides motion-tracking data as a series of frames. Each
frame contains measured positions and other information about detected entities. Since we are
interested in detecting pointing, we look at the fingers. The Pointable class in the API reports
the physical characteristics of detected extended fingers, such as tip position and direction. From
these extended fingers, we choose the pointing finger as the one farthest toward the front
in the standard Leap Motion frame of reference. Once we have the pointing finger, we retrieve its
tip position by calling the Pointable class' stabilizedTipPosition() method. This
method applies smoothing and stabilization to the tip position, removing the flickering
caused by sudden hand movements and yielding more accurate pointing detection that
improves interaction with our 2D visual content. The stabilized tip position lags behind the
raw tip position by a variable amount (not specified by the manufacturer) [3] that depends on
the speed of movement.
Finally, we map the tip position from the Leap Motion coordinate system to our system's
2D display. For this, we use the API class InteractionBox, which represents a
cuboid-shaped region contained in the Leap Motion's field of view (Figure 18). The
InteractionBox provides normalized coordinates for detected entities within itself: calling
its normalizePoint() method returns the normalized 3D coordinates of the tip
position within the range [0...1]. Multiplying the X and Y components of these normalized
coordinates by our system's screen dimensions completes the mapping and yields
the 2D coordinates on our display. Algorithm 1 summarizes the pointing detection process.
Figure 18: Leap Motion’s InteractionBox, colored in red. Source: Leap Motion Developer Portal.
Algorithm 1: Summary of the pointing detection process in pseudocode.
As discussed in Section 2.3.2, the mapped tip position is displayed on the screen as an
orange dot.
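As a concrete illustration, the pointing detection steps above can be sketched as follows. This is a simplified sketch, not DeckAssistant's actual source: the finger records, field names, and screen size are hypothetical stand-ins for the Leap Motion API's Pointable and InteractionBox objects.

```python
# Simplified sketch of Algorithm 1. Finger records stand in for the Leap
# Motion API's Pointable objects; all names and values are illustrative.

def pick_pointing_finger(extended_fingers):
    # In the standard Leap Motion frame of reference, -z points forward
    # (away from the user), so the pointing finger is the one with the
    # smallest z component of its stabilized tip position.
    return min(extended_fingers, key=lambda f: f["stabilized_tip"][2])

def map_to_screen(normalized_tip, screen_w, screen_h):
    # normalized_tip stands in for InteractionBox.normalizePoint() output:
    # each component is in [0, 1]. Scale X and Y by the display dimensions.
    nx, ny, _ = normalized_tip
    return (nx * screen_w, ny * screen_h)

fingers = [
    {"name": "thumb", "stabilized_tip": (-40.0, 150.0, 20.0),
     "normalized_tip": (0.40, 0.50, 0.60)},
    {"name": "index", "stabilized_tip": (10.0, 170.0, -35.0),
     "normalized_tip": (0.50, 0.25, 0.30)},
]
pointer = pick_pointing_finger(fingers)          # index finger: z = -35.0
x, y = map_to_screen(pointer["normalized_tip"], 1920, 1080)
print(pointer["name"], x, y)                     # index 960.0 270.0
```

In the running system, the resulting (x, y) pair is what gets rendered as the orange dot on each frame.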
4.1.2. Gesture Detection
We implemented a new gesture for multiple aircraft selection, using a combination of
pointing and pinching. The deck handler points with the index finger while pinching with
the thumb and middle finger to select multiple aircraft. We detect this gesture using the Leap
Motion API's pinchStrength() method: if the deck handler is pinching, the value returned
by this method is 1, and 0 otherwise. However, since this value can fluctuate with movements of
the deck handler's hand due to the device's sensitivity, we apply a moving average to
ensure that the majority of the values we receive indicate pinching. In
addition, we recognize this gesture only if the user is pinching with the thumb and the middle
finger. We check this by iterating through the list of fingers in a frame and measuring the
distance between each fingertip and the thumb's tip position; the middle finger's tip should be
the closest to the thumb's. This check prevents other hand poses from being recognized as a
pinch gesture. For example, if the deck handler is pointing with the index finger while the
other fingers are curled, the system might otherwise conclude that the user is pinching; the
distance check, together with the moving-averaged pinch strength, rules out such
cases. Figure 19 shows an example of multiple aircraft selection using the pinch gesture.
Figure 19: Demonstration of multiple aircraft selection with the pinch gesture.
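A minimal sketch of this filtering logic follows. The window size and threshold are illustrative assumptions, not the values used in DeckAssistant, and the class is our own construction rather than the system's source.

```python
from collections import deque

def distance(a, b):
    # Euclidean distance between two 3D fingertip positions.
    return sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5

class PinchDetector:
    """Moving average over pinch-strength samples, plus a check that the
    middle finger is the fingertip closest to the thumb. Window size and
    threshold are illustrative, not DeckAssistant's actual settings."""

    def __init__(self, window=10, threshold=0.7):
        self.samples = deque(maxlen=window)
        self.threshold = threshold

    def update(self, pinch_strength, thumb_tip, finger_tips):
        # finger_tips maps non-thumb finger names to tip positions,
        # e.g. {"index": (x, y, z), "middle": (x, y, z), ...}.
        self.samples.append(pinch_strength)
        average = sum(self.samples) / len(self.samples)
        if average < self.threshold:
            return False
        # Require the middle finger to be the tip closest to the thumb,
        # so a bare index-finger point is not misread as a pinch.
        closest = min(finger_tips, key=lambda n: distance(thumb_tip, finger_tips[n]))
        return closest == "middle"

detector = PinchDetector()
thumb = (0.0, 0.0, 0.0)
tips = {"index": (80.0, 10.0, -40.0), "middle": (15.0, 5.0, 5.0)}
for _ in range(10):
    result = detector.update(1.0, thumb, tips)
print(result)  # True: strength is consistently high and middle is closest
```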
5. Speech Synthesis and Recognition
This section details the improvements in Speech Synthesis and Speech Recognition for
DeckAssistant.
5.1. Speech Synthesis
The initial version of DeckAssistant, as discussed in [1, Section 6.1], used the FreeTTS
package for speech synthesis. Even though FreeTTS provides an easy-to-use API and is
compatible with many operating systems, it lacks pronunciation quality and clarity of speech. To
solve this problem, we implemented a speech synthesizer interface that acts as a front end to any
speech synthesis library we plug in. One library that works successfully with our system is
the Microsoft Translator API, a cloud-based automatic machine translation service that supports
multiple languages. Since our application uses only English, we do not use any of the
service's translation features; instead, we use it to generate a speech file from the text we
feed in.
As explained in Section 3.2.2, speech is synthesized in response to a deck handler's
commands. Any module in the software can call the Speech Synthesis Engine of the Speech
Synthesis Stack to generate speech. Once called, the Speech Synthesis Engine feeds the text to
be spoken into the Microsoft Translator API through the interface we created. The interface
then makes a request to the Microsoft Translator service, which returns a WAV file
that we play through our system's speakers. In the case of multiple speech synthesis requests, the
system queues the requests and handles them in order. Using the Microsoft Translator API
enables us to provide high-quality speech synthesis with clear voices. It should also be noted that
future developers of DeckAssistant can incorporate any speech synthesis library into the system
with ease.
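To illustrate the interface-plus-queue design, here is a sketch under our own naming assumptions; the class and method names below are hypothetical stand-ins, not the actual DeckAssistant source.

```python
import queue

class SpeechSynthesizer:
    """Abstract front for any speech synthesis back end. A concrete
    subclass wraps a real library or web service and returns audio data
    (e.g. WAV bytes) for the given text."""
    def synthesize(self, text):
        raise NotImplementedError

class SpeechSynthesisEngine:
    """Accepts synthesis requests from any module, queues them, and plays
    the results strictly in order of arrival."""
    def __init__(self, backend):
        self.backend = backend
        self.pending = queue.Queue()
        self.played = []            # stands in for audio playback

    def speak(self, text):
        self.pending.put(text)

    def drain(self):
        # Synthesize and "play" queued requests one at a time, in order.
        while not self.pending.empty():
            self.played.append(self.backend.synthesize(self.pending.get()))

class FakeBackend(SpeechSynthesizer):
    def synthesize(self, text):
        return "WAV<%s>" % text     # placeholder for real audio bytes

engine = SpeechSynthesisEngine(FakeBackend())
engine.speak("Moving this C2 to the fantail.")
engine.speak("Done.")
engine.drain()
print(engine.played)  # ['WAV<Moving this C2 to the fantail.>', 'WAV<Done.>']
```

Swapping in a different synthesis library then amounts to writing one new SpeechSynthesizer subclass.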
5.2. Speech Recognition
The initial version of DeckAssistant used the CMU Sphinx 4 library for speech
recognition [1, Section 6.2]. Even though Sphinx provides an easy API to convert speech into
text with acoustic models and a grammar (rules for specific phrase construction) of our choice,
its recognition performance is poor in both speed and accuracy. In the
experiments we ran during development, we often had to repeat ourselves several times before
the recognizer picked up what we were saying. In response, we introduced a speech recognizer
interface that gives us the flexibility to use any speech recognition library. Other
modules in DeckAssistant can call this interface and use the recognized speech as needed.
5.2.1. Recording Sound
The user can talk to DeckAssistant at any time, without extra actions such as
push-to-talk or gestures. For this reason, the system must constantly record from the
microphone, determine when the user has finished issuing a command, and generate a WAV
file of the spoken command. Sphinx's Live Speech Recognizer handled this by default.
However, since the speech recognizer library we decided to use (discussed in the next section)
does not provide live speech recognition, we had to implement our own sound recorder that
generates WAV files of the spoken commands. For this task, we use SoX (Sound eXchange), a
cross-platform command-line utility that can record and process audio files. The SoX
command runs constantly in the background to record any sound. It stops recording once no
sound is detected after the user has started speaking, then trims out certain noise bursts and
writes the recorded speech to a WAV file that is sent back to DeckAssistant. Once the speech
recognizer finishes the speech-to-text operation, this background process is run again to
record new commands. For more details about SoX, please refer to the SoX Documentation [4].
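For reference, a voice-activated SoX recording of this kind can be assembled roughly as below. The exact silence-detection parameters DeckAssistant uses are not documented here, so the threshold and duration values shown are illustrative.

```python
def build_sox_command(outfile, stop_after_secs=2.0, noise_threshold="1%"):
    # Builds an argv list equivalent to:
    #   sox -d out.wav silence 1 0.1 1% 1 2.0 1%
    # where:
    #   -d                  records from the default audio device
    #   silence 1 0.1 1%    starts writing once 0.1s of audio exceeds 1% volume
    #   1 2.0 1%            stops after 2.0s below the 1% threshold
    # Thresholds and durations here are illustrative, not DeckAssistant's.
    return [
        "sox", "-d", outfile,
        "silence", "1", "0.1", noise_threshold,
        "1", str(stop_after_secs), noise_threshold,
    ]

cmd = build_sox_command("command.wav")
print(" ".join(cmd))  # sox -d command.wav silence 1 0.1 1% 1 2.0 1%
# In the real system this would run in the background, e.g.:
# subprocess.run(cmd)
```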
5.2.2. Choosing a Speech Recognition Library
To pick the most suitable speech recognition library for our needs, we experimented with
four popular APIs:
Google Speech: This service did not provide an official API. We had to send an HTTP request to
the service with a recorded WAV file to get the speech-to-text response, and we were
limited to 50 requests per day. Even though the responses for the random test sentences we
tried were accurate, the service did not work well with our own grammar, since it offers no
grammar configuration. A simple example is the sentence "Move this C2", which the
recognizer transcribed as "Move this see too". Since we had many similar issues with other
commands, we decided not to use this library.
IBM Watson Speech API: A brand-new, easy-to-use API. It transcribed the incoming audio
and sent it back to our system with minimal delay, and its recognition seemed to
improve as it heard more. However, like Google Speech, it provides no grammar
configuration, which caused inaccuracy in recognizing certain commands in our system.
Therefore, we did not use this library either.
Alexa Voice Service: Amazon recently made this service available. Even though its
speech recognition works well for the purposes it was designed for, it unfortunately
cannot be used as a pure speech-to-text service: instead of returning the spoken text, the
service returns an audio file containing a response, which is not useful for us. After some
experimentation with the service, we managed to extract the transcribed text from the audio file
we sent in. However, it turns out that the Alexa Voice Service can only be used when the
user says "Alexa, tell DeckAssistant to…" before issuing a command. That is
not very usable for our purposes, so we chose not to work with this service.
AT&T Speech: This system allowed us to configure a vocabulary and a grammar, which
made recognition of our specific commands very accurate. Like the IBM
Watson Speech API, it returned the transcription of the audio file we sent in with
minimal delay. We therefore ended up using this library for our speech recognizer. The
one downside of this library is that we had to pay a fee to receive Premium Access to
the Speech API.¹
As explained in Section 5.2.1, recognition is performed after each spoken command
followed by a brief period of silence. Once the AT&T Speech library recognizes a phrase in our
grammar, we pass the transcribed text into our parser.
5.2.3. Parsing Speech Commands
The parser extracts metadata that represents the type of the command being issued, as well
as any other relevant information. Each transcribed text sent to the parser is called a base
command. Of all the base commands, only the Decision Command (Table 1) represents a
meaningful action by itself. The parser interprets the rest of the commands in two stages, which
allows for gestural input alongside speech; we call these combined commands. Consider an
example with the command "Move this aircraft, over there". When issuing this
¹ AT&T Developer Premium Access costs $99.
command, the deck handler points at the aircraft to be moved and says "Move this aircraft...",
followed by "...over there" while pointing at the destination. In the meantime, the parser sends
the metadata extracted from the text to the Action Manager, which holds the information until
the two base commands can be combined into a single command for an action to be taken. In
addition, the Action Manager provides visual and auditory feedback to the deck handler during
the process. A full breakdown of speech commands is found in [1] and listed here:
Base Commands

Move Command - Selects aircraft to be moved. Example: "Move this C2…"
Location Command - Selects the destination of a move. Example: "…to the fantail."
Launch Command - Selects the catapult(s) to launch aircraft on. Example: "…to launch on Catapult 2."
Decision Command - Responds to a question from DeckAssistant. Examples: "Yes", "No", "Okay".

Combined Commands

Move to Location Command - Moves aircraft to a specified destination. Combination: Move Command + Location Command.
Move Aircraft to Launch Command - Moves aircraft to launch on one or more catapults. Combination: Move Command + Launch Command.

Table 1: Set of commands recognized by DeckAssistant.
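The base commands in Table 1 can be classified by a small pattern-based parser along the following lines. This is a sketch: the real DeckAssistant grammar is richer, and these regular expressions are our own illustration rather than the system's actual rules.

```python
import re

# Illustrative patterns for the base commands in Table 1; DeckAssistant's
# actual grammar is more elaborate than these regular expressions.
BASE_PATTERNS = [
    ("move",     re.compile(r"^move (?:this|the) (?P<aircraft>.+)$", re.I)),
    ("launch",   re.compile(r"^to launch on (?P<catapults>.+)$", re.I)),
    ("location", re.compile(r"^to the (?P<region>.+)$", re.I)),
    ("decision", re.compile(r"^(?P<answer>yes|no|okay)$", re.I)),
]

def parse_base_command(text):
    """Return metadata for a transcribed phrase, or None if unrecognized."""
    cleaned = text.strip().strip(".")
    for kind, pattern in BASE_PATTERNS:
        match = pattern.match(cleaned)
        if match:
            return {"type": kind, **match.groupdict()}
    return None

print(parse_base_command("Move this C2"))     # {'type': 'move', 'aircraft': 'C2'}
print(parse_base_command("to the fantail."))  # {'type': 'location', 'region': 'fantail'}
print(parse_base_command("to launch on Catapult 2"))
```

Note that the launch pattern is tried before the location pattern, since both begin with "to"; the returned dictionary is the metadata handed to the Action Manager.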
5.2.4. Speech Recognition Stack in Action
In Figure 20, we outline how the Speech Recognition Stack works with the Action
Manager to create deck actions. As discussed in Section 5.2.1, the SoX process we
run is constantly recording and waiting for commands. Figure 20 uses as an example a
command that moves an aircraft to a deck region. When the deck handler issues the first
command, the SoX process sends the speech recognizer a WAV file to transcribe. The
transcribed text is then sent to the speech parser, which extracts the metadata. Once the speech
recognizer finishes transcribing, it restarts the SoX recording to listen for future
commands. Step 1 in Figure 20 shows that the extracted metadata represents a Move Command
for the aircraft being pointed at. The Action Manager receives this information at Step 2,
recognizes it as a base command, and waits for another command to combine with it into a
single command representing a deck action. In the meantime, the Action Manager consults the
Selection Engine at Step 3 to get the information for the aircraft being pointed at, which
allows the Action Manager to highlight the selected aircraft. Meanwhile, the deck handler
speaks the rest of the command, which is sent to the parser. Step 4 shows the metadata
assigned to this second base command: in this case, a Location Command and the name
of the destination deck region. In Step 5, the Action Manager constructs the final
command from the second base command, and it fetches the destination information through the
Deck Object. Finally, a Deck Action is created (Step 7) with the information gathered from the
Speech Recognition Stack and the other modules.
Implementation of Deck Actions is described in [1, Section 7].
Figure 20: A summary of how the speech recognition stack works.
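The two-stage combination described above can be sketched as a small state machine. The names below are our own; the actual Action Manager also drives highlighting and auditory feedback, which is omitted here.

```python
# Minimal sketch of the Action Manager's combination step. A base Move
# Command is held until a Location or Launch Command arrives, then the
# pair is emitted as one combined command. All names are illustrative.

COMBINATIONS = {
    "location": "move_to_location",
    "launch": "move_aircraft_to_launch",
}

class ActionManager:
    def __init__(self):
        self.pending_move = None    # held base Move Command, if any

    def receive(self, command):
        """Return a completed command when one is ready, else None."""
        kind = command["type"]
        if kind == "decision":
            return command          # meaningful by itself (Table 1)
        if kind == "move":
            self.pending_move = command
            return None             # wait for the second half
        if kind in COMBINATIONS and self.pending_move is not None:
            combined = {"type": COMBINATIONS[kind],
                        "move": self.pending_move,
                        "target": command}
            self.pending_move = None
            return combined
        return None

manager = ActionManager()
assert manager.receive({"type": "move", "aircraft": "C2"}) is None
action = manager.receive({"type": "location", "region": "fantail"})
print(action["type"])  # move_to_location
```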
6. Related Work
This section presents previous work that inspired the DeckAssistant project.
6.1. Navy ADMACS
As mentioned in Section 1.2.2, the Navy is moving toward ADMACS, a more
technologically advanced and connected real-time data management system that links the
carrier's air department, ship divisions, and the sailors who manage aircraft launch and
recovery operations.
6.2. Deck Heuristic Action Planner
Ryan et al. have developed "a decision support system for flight deck operations that
utilizes a conventional integer linear program-based planning algorithm" [5]. In this system, a
human operator inputs the end goals as well as constraints, and the algorithm returns a proposed
schedule of operations for the operator's approval. Even though their experiments showed that
human heuristics outperform the plans produced by the algorithm, human decisions are
usually conservative, and the system can offer alternative plans. This is an early attempt to aid
planning on aircraft carriers.
7. Conclusion
In this thesis, we introduced improvements to DeckAssistant, a system that provides a
traditional Ouija board interface by displaying a digital rendering of an aircraft carrier deck that
assists deck handlers in planning deck operations. DeckAssistant has a large digital tabletop
display that shows the status of the deck and has an understanding of certain deck actions for
scenario planning. To preserve the conventional way of interacting with the old-school Ouija
board, where deck handlers move aircraft by hand, the system takes advantage of multiple modes
of interaction. Deck handlers plan strategies by pointing at aircraft, gesturing, and talking to the
system. The system responds with its own speech and updates the display to show the
consequences of the actions taken by the handlers. The system can also be used to simulate
certain scenarios during the planning process. The multimodal interaction described here creates
a communication of sorts between deck handlers and the system.
Our work comprises three improvements to the initial version of DeckAssistant built by
Kojo Acquah [1]. The first is the introduction of the Leap Motion Sensor for pointing detection
and gesture recognition; we presented our reasons for concluding that the Leap Motion device
performs better than the Microsoft Kinect, and we explained how we achieve pointing detection
and gesture recognition using the device. The second improvement is better speech synthesis,
from our introduction of a new speech synthesis library that provides high-quality pronunciation
and clarity of speech. The third improvement is better speech recognition: we discussed several
speech recognition libraries, determined which one best suits our purposes, and explained how
we integrated this new library into the system together with our own method of recording voice.
7.1. Future Work
While the current version of DeckAssistant focuses only on aircraft movement driven by
deck handler actions, future versions could implement algorithms that let the system
simulate the optimal ordering of operations for an end goal, while accounting for deck and
aircraft status such as maintenance needs.

Currently, DeckAssistant's display, created by the four downward-facing projectors
mounted over the tabletop (discussed in Section 3.1), has a high pixel resolution. However, it is
not as seamless as it should be. The ScalableDesktop software performs automatic
edge blending of the four projected displays, but the regions where the projectors overlap
are still visible. Moreover, ScalableDesktop has to be run for calibration every time
a user starts DeckViewer, and the brightness of the display is low. Instead of the projectors
and the tabletop surface, a high-resolution touch-screen LED TV could be mounted flat on a
table. This would provide a seamless display free of projector overlaps and remove the need for
time-consuming calibration. In addition, the touch screen would let us introduce drawing
gestures with which the deck handler could draw out aircraft movements and take notes on the
screen.
8. References
[1] Kojo Acquah. Towards a Multimodal Ouija Board for Aircraft Carrier Deck Operations. June 2015.
[2] US Navy Air Systems Command. Navy Training System Plan for Aviation Data Management and Control System. March 2002.
[3] Leap Motion. Leap Motion for Mac and PC. November 2015.
[4] SoX Documentation. http://sox.sourceforge.net/Docs/Documentation. February 2013.
[5] Ryan et al. Comparing the Performance of Expert User Heuristics and an Integer Linear Program in Aircraft Carrier Deck Operations. 2013.
[6] Ziezulewicz, Geoff. "Old-school 'Ouija Board' Being Phased out on Navy Carriers." Stars and Stripes, 10 Aug. 2011. Web. 3 Mar. 2016.
[7] Microsoft. Kinect for Windows Sensor Components and Specifications. Web. 7 Mar. 2016.
[8] Khoshelham, K.; Elberink, S.O. Accuracy and Resolution of Kinect Depth Data for Indoor Mapping Applications. Sensors 2012, 12, 1437–1454.
[9] Weichert, F.; Bachmann, D.; Rudak, B.; Fisseler, D. Analysis of the Accuracy and Robustness of the Leap Motion Controller. Sensors 2013, 13, 6380–6393.
9. Appendix
9.1. Code and Documentation
The source code of DeckAssistant, documentation on how to get up and running with the
system, and the DeckAssistant Software Guide are available on GitHub:
https://github.mit.edu/MUGCSAIL/DeckViewer.