A Multimodal Ouija Board for Aircraft Carrier Deck Operations
by
Birkan Uzun
S.B., C.S. M.I.T., 2015
Submitted to the
Department of Electrical Engineering and Computer Science
in Partial Fulfillment of the Requirements for the Degree of
Master of Engineering in Computer Science and Engineering
at the
Massachusetts Institute of Technology
June 2016
Copyright 2016 Birkan Uzun. All rights reserved.
The author hereby grants to M.I.T. permission to reproduce and to distribute publicly paper and electronic copies of this thesis document in whole or in part in any medium now known or
hereafter created.
Author ……………………………………………………………………………………………... Department of Electrical Engineering and Computer Science
April 6, 2016
Certified by ………………………………………………………………………………………... Randall Davis, Professor
Thesis Supervisor
Accepted by ……………………………………………………………………………………….. Dr. Christopher J. Terman
Chairman, Masters of Engineering Thesis Committee
A Multimodal Ouija Board for Aircraft Carrier Deck Operations by
Birkan Uzun
Submitted to the
Department of Electrical Engineering and Computer Science
April 6, 2016
in Partial Fulfillment of the Requirements for the Degree of
Master of Engineering in Computer Science and Engineering
Abstract
In this thesis, we present improvements to DeckAssistant, a system that provides a traditional
Ouija board interface by displaying a digital rendering of an aircraft carrier deck that assists deck
handlers in planning deck operations. DeckAssistant has a large digital tabletop display that
shows the status of the deck and has an understanding of certain deck actions for scenario
planning. To preserve the conventional way of interacting with the old-school Ouija board, where
deck handlers move aircraft by hand, the system takes advantage of multiple modes of
interaction. Deck handlers plan strategies by pointing at aircraft, gesturing and talking to the
system. The system responds with its own speech and gestures, and it updates the display to
show the consequences of the actions taken by the handlers. The system can also be used to
simulate certain scenarios during the planning process. The multimodal interaction described
here creates a communication of sorts between deck handlers and the system. Our contributions
include improvements in hand-tracking, speech synthesis and speech recognition.
Acknowledgements
Foremost, I would like to thank my advisor, Professor Randall Davis, for his support of
my work and for his patience, motivation, and knowledge. His door was always open whenever I had
a question about my research. He consistently allowed this research to be my own work, but
steered me in the right direction with his meaningful insights whenever he thought I needed it.
I would also like to thank Jake Barnwell for helping with the development environment
setup and documentation.
Finally, I must express my gratitude to my parents and friends, who supported me
throughout my years of study. This accomplishment would never have been possible without them.
Contents

1. Introduction
   1.1. Overview
   1.2. Background and Motivation
      1.2.1. Ouija Board History and Use
      1.2.2. Naval Push for Digital Information on Decks
      1.2.3. A Multimodal Ouija Board
   1.3. System Demonstration
   1.4. Thesis Outline
2. DeckAssistant Functionality
   2.1. Actions in DeckAssistant
   2.2. Deck Environment
      2.2.1. Deck and Space Understanding
      2.2.2. Aircraft and Destination Selection
      2.2.3. Path Calculation and Rerouting
   2.3. Multimodal Interaction
      2.3.1. Input
      2.3.2. Output
3. System Implementation
   3.1. Hardware
   3.2. Software
      3.2.1. Libraries
      3.2.2. Architecture
4. Hand Tracking
   4.1. The Leap Motion Sensor
      4.1.1. Pointing Detection
      4.1.2. Gesture Detection
5. Speech Synthesis and Recognition
   5.1. Speech Synthesis
   5.2. Speech Recognition
      5.2.1. Recording Sound
      5.2.2. Choosing a Speech Recognition Library
      5.2.3. Parsing Speech Commands
      5.2.4. Speech Recognition Stack in Action
6. Related Work
   6.1. Navy ADMACS
   6.2. Deck Heuristic Action Planner
7. Conclusion
   7.1. Future Work
8. References
9. Appendix
   9.1. Code and Documentation
List of Figures

Figure 1: Deck handlers collaboratively operating on an Ouija Board. Source: Google Images.
Figure 2: The ADMACS Ouija board. Source: Google Images.
Figure 3: DeckAssistant’s tabletop display with the digital rendering of the deck [1].
Figure 4: A deck handler using DeckAssistant with hand gestures and speech commands [1].
Figure 5: The initial arrangement of the deck [1].
Figure 6: Deck handler points at the aircraft to be moved while speaking the command [1].
Figure 7: DeckAssistant uses graphics to tell the deck handler that the path to the destination is blocked [1].
Figure 8: DeckAssistant displays an alternate location for the F-18 that is blocking the path [1].
Figure 9: The logic for moving aircraft [1].
Figure 10: Regions on an aircraft carrier’s deck. Source: Google Images.
Figure 11: (a) Orange dot represents where the user is pointing. (b) Aircraft being hovered over is highlighted green [1].
Figure 12: (a) Single aircraft selected. (b) Multiple aircraft selected [1].
Figure 13: Aircraft circled in red, meaning there is not enough room in the region [1].
Figure 14: Alternate region to move the C-2 is highlighted in blue [1].
Figure 15: The hardware used in DeckAssistant.
Figure 16: DeckAssistant software architecture overview.
Figure 17: The Leap Motion Sensor mounted on the edge of the tabletop display.
Figure 18: Leap Motion’s InteractionBox, colored in red. Source: Leap Motion Developer Portal.
Figure 19: Demonstration of multiple aircraft selection with the pinch gesture.
Figure 20: A summary of how the speech recognition stack works.
List of Tables

Table 1: Set of commands that are recognized by DeckAssistant.
List of Algorithms

Algorithm 1: Summary of the pointing detection process in pseudocode.
1. Introduction
1.1. Overview
In this thesis, we present improvements to DeckAssistant, a digital aircraft carrier Ouija
Board interface that aids deck handlers with planning deck operations. DeckAssistant supports
multiple modes of interaction, aiming to improve the user experience over the traditional Ouija
Boards. Using hand-tracking, gesture recognition and speech recognition, it allows deck handlers
to plan deck operations by pointing at aircraft, gesturing and talking to the system. It responds
with its own speech using speech synthesis and updates the display, which is a digital rendering
of the aircraft carrier deck, to show results when deck handlers take action. The multimodal
interaction described here creates a communication of sorts between deck handlers and the
system. DeckAssistant has an understanding of deck objects and operations, and can be used to
simulate certain scenarios during the planning process.
The initial work on DeckAssistant was done by Kojo Acquah, and we build upon his
implementation [1]. Our work makes the following contributions to the fields of
Human-Computer Interaction and Intelligent User Interfaces:
- It discusses how using the Leap Motion Sensor is an improvement over the Microsoft
Kinect in terms of hand-tracking, pointing and gesture recognition.
- It presents a speech synthesis API which generates speech that has high pronunciation
quality and clarity. It investigates several speech recognition APIs, argues which one is
the most applicable, and introduces a way of enabling voice-activated speech recognition.
- Thanks to the refinements in hand-tracking and speech, it provides a natural, multimodal
way of interacting with the first large-scale Ouija Board alternative that has been built to
help with planning deck operations.
1.2. Background and Motivation
1.2.1. Ouija Board History and Use
The flight deck of an aircraft carrier is a complex scene, riddled with incoming aircraft,
personnel moving around to take care of a variety of tasks, and the ever-present risk of hazards
and calamity. Flight Deck Control (FDC) is where the deck scene is coordinated, and during
flight operations it is one of the busiest places on the ship. The deck handlers in FDC send
instructions to the aircraft directors on the flight deck, who manage all aircraft movement,
placement and maintenance for the deck regions they are responsible for.
FDC is filled with computer screens and video displays of all that is occurring outside on
deck, but it is also home to one of the most crucial pieces of equipment in the Navy, the Ouija
board (Figure 1). The Ouija board is a waist-high replica of the flight deck at 1/16 scale that has
all the markings of the flight deck, as well as its full complement of aircraft — all in cutout
models, and all tagged with items like thumbtacks and bolts to designate their status. The board
offers an immediate glimpse of the deck status and allows the deck handlers in charge to
manipulate the model deck objects and make planning decisions, should the need arise. The
board has been in use since World War II and has provided a platform of collaboration for deck
handlers in terms of strategy planning for various scenarios on deck.
It is widely understood that the first round of damage to a ship will likely take out the
electronics; so to ensure the ship remains functional in battle, everything possible has a
mechanical backup. Even though the traditional board has the advantage of being immune to
electronic failures, there is potential for digital Ouija board technology to enhance the
deck-operation planning functionality and experience.
Figure 1: Deck handlers collaboratively operating on an Ouija Board. Source: Google Images.
1.2.2. Naval Push for Digital Information on Decks
Even though the Ouija board has been used to track aircraft movement on aircraft carriers
for over seventy years, the Navy is working on a computerized replacement due to limitations of
the current model. As one of the simplest systems aboard Navy ships, the Ouija boards can only
be updated manually, i.e., when the deck handlers move models of aircraft and other assets
around the model deck to match the movements of their real-life counterparts. The board does not
offer any task automation, information processing or validation to help with strategy planning for
various deck scenarios.
Figure 2: The ADMACS Ouija board. Source: Google Images.
The new Ouija board replacement (Figure 2) is part of the Aviation Data Management
and Control System (ADMACS) [2], a set of electronic upgrades for carriers designed to make
use of the latest technologies. This system requires the deck handler to track flight deck activity
via computer, working with a monitor that will be fed data directly from the flight deck. In
addition, the deck handler can move aircraft around on the simulated deck view using mouse and
keyboard.
1.2.3. A Multimodal Ouija Board
The ADMACS Ouija board fixes the problem of updating the deck status in real-time
without any manual work. It also allows the deck handlers to move aircraft on the simulated deck
view using mouse and keyboard, as noted. However, most deck handlers are apparently skeptical
of replacing the existing system, holding that things that are not broken should not be fixed
[6]. Considering these facts, imagine a new Ouija board with a large digital tabletop display that
could show the status of the deck and had an understanding of certain deck actions for scenario
planning. To preserve the conventional way of interacting with the old-school Ouija board, where
deck handlers move aircraft by hand, the system would take advantage of multiple modes of
interaction. Utilizing hand-tracking and speech recognition techniques, the system could let deck
handlers point at objects on deck and speak their commands. In return, the system could respond
with its own synthesized speech and update the graphics to illustrate the consequences of the
commands given by the deck handlers. This would create a two-way communication between the
system and the deck handlers.
1.3. System Demonstration
To demonstrate how the multimodal Ouija Board discussed in Section 1.2.3 works in
practice and preview DeckAssistant in action, we take a look at an example scenario from [1] in
which a deck handler is trying to prepare an aircraft for launch on a catapult. The deck handler
needs to move the aircraft to be launched to the catapult while moving other aircraft that are
blocking the way to other locations on deck.
The system has a large tabletop display showing a digital, realistic rendering of an
aircraft carrier deck with a complete set of aircraft (Figure 3).
Figure 3: DeckAssistant’s tabletop display with the digital rendering of the deck [1].
The deck handler stands in front of the table and issues commands using both hand
gestures and speech (Figure 4). DeckAssistant uses either the Leap Motion Sensor (mounted on
the edge of the display) or the Microsoft Kinect (mounted above the display) for hand-tracking.
The deck handler wears a wireless Bluetooth headset that supports a two-way conversation with
the system through speech.
Figure 4: A deck handler using DeckAssistant with hand gestures and speech commands [1].
Figure 5 shows the initial aircraft arrangement of the deck. There are eleven F-18s (grey
strike fighter jets) and two C-2s (white cargo aircraft) placed on the deck. There are four
catapults at the front of the deck, and two of them are open. The deck handler will now try to
launch one of the C-2s on one of the open catapults, and that requires moving a C-2 from the
elevator, which is at the rear of the deck, to an open catapult, which is at the front of the deck.
After viewing the initial arrangement of the deck, the deck handler points at the aircraft to
be moved, the lower C-2, and speaks the following command: “Move this C-2 to launch on
Catapult 2”. The display shows where the deck handler is pointing with an orange dot, and the
selected aircraft is highlighted in green (Figure 6).
Figure 5: The initial arrangement of the deck [1].
Figure 6: Deck handler points at the aircraft to be moved while speaking the command [1].
Now, DeckAssistant does its analysis to figure out whether the command given by the
deck handler can be accomplished without any extra action. In this case, there is an F-18
blocking the path the C-2 needs to take to go to the catapult (Figure 7).
Figure 7: DeckAssistant uses graphics to tell the deck handler that the path to the destination is blocked [1].
DeckAssistant knows that the F-18 has to be moved out of the way. It uses graphics and
synthesized speech to let the deck handler know that additional actions need to be taken
and to ask for the handler’s permission in the form of a yes-no question (Figure 8).
Figure 8: DeckAssistant displays an alternate location for the F-18 that is blocking the path [1].
The aircraft are moved in the simulation if the deck handler agrees to the actions
proposed by the system. If not, the system reverts to the state before the command. If the
deck handler does not like the action proposed by the system, they can cancel the command and
move aircraft around based on their own strategies. The goal of DeckAssistant here is to take
care of small details while the deck handler focuses on the more important deck operations without
wasting time.
1.4. Thesis Outline
In the next section, we describe the types of actions available in DeckAssistant and
how they are taken, what the system knows about the deck environment, and how the
multimodal interaction works. Section 3 discusses the hardware and software used as well as
introducing the software architecture behind DeckAssistant. Sections 4 and 5 look at
implementation details discussing handtracking, speech synthesis and recognition. Section 6
talks about related work. Section 7 discusses future work and concludes.
2. DeckAssistant Functionality
This section gives an overview of actions available in DeckAssistant, discusses what
DeckAssistant knows about the deck environment and the objects, and explains how the
multimodal interaction happens.
2.1. Actions in DeckAssistant
The initial version of DeckAssistant focuses only on simple deck actions for aircraft
movement and placement. These actions allow deck handlers to perform tasks such as
moving an aircraft from one location to another or preparing an aircraft for launch on a catapult.
These deck actions comprise the logic to perform a command given by the deck handler (Figure
9). As the example in Section 1.3 suggests, these actions are built to be flexible and interactive.
This means that the deck handler is always consulted for their input during an action: they can
make alterations with additional commands or suggest alternate actions if needed. The
system takes care of the details, saving the deck handler’s time and allowing them to concentrate
on more important tasks.
There are four actions available within DeckAssistant, as noted in [1]:
- Moving aircraft from start to destination.
- Finding an alternate location for aircraft to move if the intended destination is full.
- Clearing a path for aircraft to move from start to end location.
- Moving aircraft to launch on catapults.
Figure 9: The logic for moving aircraft [1].
2.2. Deck Environment
DeckAssistant has an understanding of the deck environment, which includes various
types of aircraft, regions on deck and paths between regions (See Chapter 4 of [1] for the
implementation details of the deck environment and objects).
2.2.1. Deck and Space Understanding
DeckAssistant’s user interface represents a scale model of a real deck just like a
traditional Ouija Board. The system displays the status of aircraft on this user interface and uses
the same naming scheme that the deck handlers use for particular regions of the deck (Figure
10). The deck handlers can thus refer to those regions by their names when using the system.
Each of these regions contains a set of parking spots in which the aircraft can reside. These
parking spots help the system determine the arrangement of parked aircraft and figure out the
occupancy in a region. This means that the system knows if a region has enough room to move
aircraft to or if the path from one region to another is clear.
Figure 10: Regions on an aircraft carrier’s deck. Source: Google Images.
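The region and parking-spot bookkeeping described above can be sketched as follows. This is a minimal illustration: the class and method names are our own, not DeckAssistant’s actual code, and the real system tracks far richer state.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a deck region with named parking spots (illustrative only).
public class Region {
    private final String name;          // deck handlers' naming scheme, e.g. "Fantail"
    private final List<Boolean> spots;  // true = occupied parking spot

    public Region(String name, int spotCount) {
        this.name = name;
        this.spots = new ArrayList<Boolean>();
        for (int i = 0; i < spotCount; i++) spots.add(false);
    }

    // Number of free parking spots in this region.
    public int openSpots() {
        int open = 0;
        for (boolean occupied : spots) if (!occupied) open++;
        return open;
    }

    // The occupancy check made before moving aircraft into this region.
    public boolean hasRoomFor(int aircraftCount) {
        return openSpots() >= aircraftCount;
    }

    public void park(int spotIndex) { spots.set(spotIndex, true); }

    public String getName() { return name; }
}
```

With this shape, answering “does the Fantail have room for three aircraft?” reduces to one `hasRoomFor` call.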
2.2.2. Aircraft and Destination Selection
Each aircraft on deck is a unique object that has a tail number (displayed on each
aircraft), type, position, status and other information that is useful for the system’s simulation.
Currently, we support two different types of aircraft within DeckAssistant: F-18s and C-2s.
Selection of aircraft can be done in two ways. The deck handler can either point at the
aircraft (single or multiple) as shown in the example in Section 1.3, or they can refer to the
aircraft by their tail numbers, for instance, “Aircraft Number 8”.
Destination selection is similar. Since destinations are regions on the deck, they can be
referred to by their names or they can be pointed at.
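The per-aircraft state and the tail-number lookup described above might look roughly like this; the field and method names are our assumptions for illustration, not the actual implementation.

```java
import java.util.Arrays;
import java.util.List;

// Sketch of per-aircraft state and spoken tail-number resolution.
public class Aircraft {
    public enum Type { F18, C2 }

    public final int tailNumber;  // displayed on each aircraft
    public final Type type;
    public double x, y;           // position on the rendered deck
    public String status;

    public Aircraft(int tailNumber, Type type) {
        this.tailNumber = tailNumber;
        this.type = type;
        this.status = "parked";
    }

    // Resolve a spoken reference like "Aircraft Number 8" to an object.
    public static Aircraft byTailNumber(List<Aircraft> deck, int tail) {
        for (Aircraft a : deck)
            if (a.tailNumber == tail) return a;
        return null;  // unknown tail number
    }
}
```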
2.2.3. Path Calculation and Rerouting
During path planning, the system draws straight lines between regions and uses the
wingspan length as the width of the path to make sure that there are no aircraft blocking the way
and that the aircraft to move can fit into its path.
If a path is clear but the destination does not have enough open parking spots, the system
suggests alternate destinations and routes, checking the nearest neighboring regions for open
spots.
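The corridor test described above can be approximated with simple segment geometry: widen the straight line between regions to the moving aircraft’s wingspan and check whether any parked aircraft falls inside. This is a deliberately simplified sketch (aircraft treated as points, names our own), not the system’s actual routing code.

```java
// Sketch of the wingspan-wide corridor test used during path planning.
public class PathCheck {
    // Distance from point (px,py) to the segment (x1,y1)-(x2,y2).
    static double distToSegment(double px, double py,
                                double x1, double y1, double x2, double y2) {
        double dx = x2 - x1, dy = y2 - y1;
        double lenSq = dx * dx + dy * dy;
        double t = lenSq == 0 ? 0 : ((px - x1) * dx + (py - y1) * dy) / lenSq;
        t = Math.max(0, Math.min(1, t));             // clamp to the segment
        double cx = x1 + t * dx, cy = y1 + t * dy;   // closest point on segment
        return Math.hypot(px - cx, py - cy);
    }

    // True if no parked aircraft lies inside the wingspan-wide corridor
    // from the start region center (x1,y1) to the destination (x2,y2).
    public static boolean pathClear(double[][] parked, double wingspan,
                                    double x1, double y1, double x2, double y2) {
        for (double[] p : parked)
            if (distToSegment(p[0], p[1], x1, y1, x2, y2) < wingspan / 2)
                return false;  // blocking aircraft found
        return true;
    }
}
```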
2.3. Multimodal Interaction
The goal of the multimodal interaction created by DeckAssistant’s user interface is to
establish a communication between the deck handler and the system. The input in this interaction is
a combination of hand gestures and speech performed by the deck handler. The output is the
system’s response with synthesized speech and graphical updates.
2.3.1. Input
DeckAssistant uses either the Leap Motion Sensor or the Microsoft Kinect for tracking
hands. Hand-tracking allows the system to recognize certain gestures using the position of the
hands and fingertips. Currently, the system can only interpret pointing gestures where the deck
handler points at aircraft or regions on the deck.
Commands are spoken into the microphone of the wireless Bluetooth headset that the
deck handler wears, allowing the deck handler to issue a command using speech alone. In this
case, the deck handler has to provide the tail number of the aircraft to be moved as well as the
destination name. An example could be: “Move Aircraft Number 8 to the Fantail”.
Alternatively, the deck handler can combine speech with one or more pointing gestures. In this
case, for example, the deck handler can point at an aircraft to be moved and say “Move this
aircraft”; and then they can point at the destination and say “over there”.
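One minimal way to combine a pointing target with a deictic spoken command is sketched below: when the utterance says “this aircraft,” substitute whatever tail number the orange dot is currently over. The class name and approach are our illustration, not DeckAssistant’s actual parser.

```java
// Illustrative deictic resolution for combined speech + pointing input.
public class DeicticResolver {
    // spoken: the recognized utterance.
    // pointedAt: tail number under the pointing dot, or -1 if none.
    public static String resolve(String spoken, int pointedAt) {
        if (spoken.contains("this aircraft") && pointedAt >= 0)
            return spoken.replace("this aircraft", "Aircraft Number " + pointedAt);
        return spoken;  // fully spoken commands pass through unchanged
    }
}
```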
2.3.2. Output
The system is very responsive to any input. As soon as the deck handler makes a pointing
gesture, an orange dot appears on the screen, indicating where the deck handler is
pointing (Figure 11 (a)). If the deck handler is pointing at an aircraft, the system highlights
that aircraft with a green color, indicating a potential for selection (Figure 11 (b)). Eventually, if
the deck handler takes an action to move aircraft on deck, the selected aircraft are highlighted in
orange. As mentioned earlier, the deck handler can select multiple aircraft (Figure 12).
Figure 11: (a) Orange dot represents where the user is pointing. (b) Aircraft being hovered over is
highlighted green [1].
Figure 12: (a) Single aircraft selected. (b) Multiple aircraft selected [1].
The system’s responses to the deck handler’s input depend on the type of action and the
aircraft arrangement on deck. If a certain action can be processed without additional actions, the
system completes it and confirms it by saying “Okay, done”. If the action cannot be completed
for any reason, the system explains why using its synthesized speech and graphical updates, and
asks for the deck handler’s permission to take an alternate action. In the case of deck handler
approval, the system updates the arrangement on deck. If the deck handler declines the suggested
alternate action, the system reverts to its state before the deck handler issued their
command.
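The propose/confirm/revert behavior can be sketched as a small state holder that snapshots the deck before each tentative change. The deck state here is collapsed to a string for illustration; DeckAssistant’s real state is of course much richer.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch of the propose -> confirm-or-revert interaction flow.
public class ProposalFlow {
    private final Deque<String> history = new ArrayDeque<String>();
    private String deckState;

    public ProposalFlow(String initial) { deckState = initial; }

    // Apply a proposed action tentatively, remembering the prior state.
    public void propose(String newState) {
        history.push(deckState);
        deckState = newState;
    }

    // Deck handler said yes: keep the change.
    public void confirm() { history.clear(); }

    // Deck handler said no: revert to the state before the command.
    public void decline() {
        if (!history.isEmpty()) deckState = history.pop();
    }

    public String state() { return deckState; }
}
```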
Section 1.3 gave an example of this scenario: the system warned the user about the
aircraft that was blocking the path to a catapult and recommended an alternate spot for the
aircraft blocking the way. Once the deck handler approved, the system could move the aircraft to
launch on the catapult.
Let’s take a look at another scenario. Figure 13 shows an example of a situation where a
C-2 cannot be moved to the fantail since there are no open parking spots there. The system
circles all the blocking aircraft in red and suggests an alternate region on deck to move the C-2.
In that case, the new region is highlighted in blue and a clear path to it is drawn (Figure 14). If
the deck handler accepts this suggested region, the system moves the C-2 there. If not, it reverts
to its original state and waits for new commands.
Figure 13: Aircraft circled in red, meaning there is not enough room in the region [1].
Figure 14: Alternate region to move the C-2 is highlighted in blue [1].
3. System Implementation
In this section, we introduce DeckAssistant’s hardware setup, the software libraries used
and the software architecture design.
3.1. Hardware
Figure 15: The hardware used in DeckAssistant.
As can be seen in Figure 15, DeckAssistant’s hardware setup consists of:
- Four downward-facing Dell 5100MP projectors mounted over the tabletop. These
projectors create a 42 by 32 inch seamless display with a 2800 x 2100 pixel resolution.
- A white surface digitizer. The display is projected onto this surface.
- A Leap Motion Sensor or a Microsoft Kinect (V1) for tracking hands over the table
surface. The system can use either sensor.
- A Logitech C920 Webcam for viewing the entire surface. This webcam is used to
calibrate the seamless display using the ScalableDesktop Classic software.
- A wireless Bluetooth headset for supporting a two-way conversation with the system.
This setup is powered by a Windows 7 desktop computer with an AMD Radeon HD 6870
graphics card. It should be noted that the need for the surface digitizer, projectors and webcam
would be eliminated if the system were configured to use a flat panel for the display.
3.2. Software
All of DeckAssistant’s code is written in Java 7 in the form of a standalone application.
This application handles all the system functionality: graphics, speech recognition, speech
synthesis, and gesture recognition.
3.2.1. Libraries
Four libraries are used to provide the desired functionality:
- Processing: for graphics; it is a fundamental part of our application framework.
- AT&T Java Codekit: for speech recognition.
- Microsoft Translator Java API: for speech synthesis.
- Leap Motion Java SDK: provides the interface to the Leap Motion Controller sensor for
hand-tracking.
3.2.2. Architecture
DeckAssistant’s software architecture is structured around three stacks that handle the
multimodal input and output. These three stacks run in parallel and are responsible for speech
synthesis, speech recognition and hand-tracking. The Speech Synthesis Stack constructs
sentences in response to a deck handler’s command and generates an audio file for that sentence
that is played through the system’s speakers. The Speech Recognition Stack constantly listens for
commands, does speech-to-text conversion and parses the text to figure out the command that
was issued. The Hand-Tracking Stack interfaces with either the Leap Motion Sensor or the
Microsoft Kinect, processes the data received and calculates the position of the user’s pointing
finger over the display as well as detecting additional gestures. These three stacks each provide
an API (Application Programming Interface) so that the other components within DeckAssistant can
communicate with them for a multimodal interaction.
Another crucial part of the architecture is the Action Manager component. The Action
Manager’s job is to manipulate the deck by communicating with the three multimodal interaction
stacks. Once a deck handler’s command is interpreted, it is passed into the Action Manager
which updates the deck state and objects based on the command and responds by leveraging the
Speech Synthesis Stack and graphics.
Finally, all of these stacks and components run on a Processing loop that executes every
30 milliseconds. Each execution of this loop makes sure the multimodal input and output are
processed. Figure 16 summarizes the software architecture. The DeckAssistant Software Guide
(see Appendix for URL) details the implementation of each component within the system.
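The loop structure just described might look roughly like the sketch below: each tick polls the three stacks, and the Action Manager would react to any interpreted command. The `Stack` interface and all names are our own abstraction, not DeckAssistant’s actual API.

```java
// Rough shape of the ~30 ms Processing-driven main loop.
public class MainLoop {
    // One interface standing in for the speech synthesis, speech
    // recognition and hand-tracking stacks.
    public interface Stack { void poll(); }

    private final Stack[] stacks;

    public MainLoop(Stack... stacks) { this.stacks = stacks; }

    // One iteration of the draw loop: process multimodal input/output.
    public void tick() {
        for (Stack s : stacks) s.poll();
        // ...the Action Manager would update deck state and graphics here...
    }
}
```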
Figure 16: DeckAssistant software architecture overview.
4. Hand Tracking
In Chapter 5 of his thesis [1], Kojo Acquah discusses methods for tracking hands and
recognizing pointing gestures using a Microsoft Kinect (V1). These initial hand-tracking
methods of DeckAssistant can only recognize outstretched fingers on hands that are held mostly
perpendicular to the focal plane of the camera. They do not work well with other hand poses,
leaving no way to recognize other gestures. The authors of [8] provide a detailed analysis of the
accuracy and resolution of the Kinect sensor’s depth data. Their experimental results show that
the random error in depth measurement increases with increasing distance to the sensor, ranging
from a few millimeters to approximately 4 centimeters at the maximum range of the sensor. The
quality of the data is also found to be affected by the low resolution of the depth measurements
that depend on the frame rate (30fps [7]). The authors thus suggest that the obtained accuracy, in
general, is sufficient for detecting arm and body gestures, but is not sufficient for precise finger
tracking and hand gestures. Experimenting with DeckAssistant’s initial version to take certain
actions, we noted laggy, low-accuracy hand-tracking performance from the Kinect sensor. In
addition, the Kinect always has to be calibrated before DeckAssistant can be used, which is a
time-consuming process. Finally, the current setup has a usability problem: when deck handlers
stand in front of the tabletop and point at the aircraft on the display, their hands block the
projectors’ lights, causing shadows in the display.
The authors of [9] present a study of the accuracy and robustness of the Leap Motion Sensor.
They use an industrial robot with a reference pen allowing suitable position accuracy for the
experiment. Their results show high precision (an overall average accuracy of 0.7mm) in
fingertip position detection. Even though the sensor does not achieve the accuracy of 0.01mm stated
by the manufacturer [3], they find that the Leap Motion Sensor performs better than the
Microsoft Kinect in the same experiment.
This section describes our use of the Leap Motion Sensor to track hands and recognize
gestures, allowing for a high degree of subjective robustness.
4.1. The Leap Motion Sensor
The Leap Motion Sensor is a 3” long USB device that tracks hand and finger motions. It
works by projecting infrared light upward from the device and detecting reflections using
monochromatic infrared cameras. Its field of view extends from 25mm to 600mm above the
device with a 150° spread and a high frame rate (>200 fps) [3]. In addition, the Leap Motion
Sensor’s Application Programming Interface (API) provides more information about the hands
than the Microsoft Kinect’s (V1) does.
Figure 17: The Leap Motion Sensor mounted on the edge of the tabletop display.
The Leap Motion Sensor is mounted on the edge of the tabletop display, as shown above
in Figure 17. In this position, hands no longer block the projectors’ lights, thereby eliminating
the shadows in the display. The sensor also removes the need for calibration before use, enabling
DeckAssistant to run without any extra work. Finally, thanks to its accuracy in finger-tracking,
the sensor creates the opportunity for more hand gestures to express detail in deck actions (see
Section 4.1.2).
4.1.1. Pointing Detection
The Leap Motion API provides motion-tracking data as a series of frames. Each
frame contains measured positions and other information about detected entities. Since we are
interested in detecting pointing, we look at the fingers. The Pointable class in the API reports
the physical characteristics of detected extended fingers, such as tip position and direction. From
these extended fingers, we choose the pointing finger as the one farthest toward the front
in the standard Leap Motion frame of reference. Once we have the pointing finger, we retrieve its
tip position by calling the Pointable class' stabilizedTipPosition() method. This
method applies smoothing and stabilization to the tip position, removing the flickering
caused by sudden hand movements and yielding more accurate pointing detection that
improves interaction with our 2D visual content. The stabilized tip position lags behind the
raw tip position by a variable amount (not specified by the manufacturer) [3] that depends on
the speed of movement.
Finally, we map the tip position from the Leap Motion coordinate system to our system's
2D display. For this, we use the API class InteractionBox, which represents a
cuboid-shaped region contained in the Leap Motion's field of view (Figure 18). The
InteractionBox provides normalized coordinates for detected entities within itself: calling
its normalizePoint() method returns the normalized 3D coordinates of the tip
position within the range [0...1]. Multiplying the X and Y components of these normalized
coordinates by our system's screen dimensions completes the mapping and yields
the 2D coordinates on our display. Algorithm 1 summarizes the pointing detection process.
Figure 18: Leap Motion’s InteractionBox, colored in red. Source: Leap Motion Developer Portal.
Algorithm 1: Summary of the pointing detection process in pseudocode.
As discussed in Section 2.3.2, the mapped tip position is displayed on the screen as an
orange dot.
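As a concrete illustration, the pointing detection steps above can be sketched as follows. This is a simplified sketch, not DeckAssistant's actual source: the finger records, field names, and screen size are hypothetical stand-ins for the Leap Motion API's Pointable and InteractionBox objects.

```python
# Simplified sketch of Algorithm 1. Finger records stand in for the Leap
# Motion API's Pointable objects; all names and values are illustrative.

def pick_pointing_finger(extended_fingers):
    # In the standard Leap Motion frame of reference, -z points forward
    # (away from the user), so the pointing finger is the one with the
    # smallest z component of its stabilized tip position.
    return min(extended_fingers, key=lambda f: f["stabilized_tip"][2])

def map_to_screen(normalized_tip, screen_w, screen_h):
    # normalized_tip stands in for InteractionBox.normalizePoint() output:
    # each component is in [0, 1]. Scale X and Y by the display dimensions.
    nx, ny, _ = normalized_tip
    return (nx * screen_w, ny * screen_h)

fingers = [
    {"name": "thumb", "stabilized_tip": (-40.0, 150.0, 20.0),
     "normalized_tip": (0.40, 0.50, 0.60)},
    {"name": "index", "stabilized_tip": (10.0, 170.0, -35.0),
     "normalized_tip": (0.50, 0.25, 0.30)},
]
pointer = pick_pointing_finger(fingers)          # index finger: z = -35.0
x, y = map_to_screen(pointer["normalized_tip"], 1920, 1080)
print(pointer["name"], x, y)                     # index 960.0 270.0
```

In the running system, the resulting (x, y) pair is what gets rendered as the orange dot on each frame.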
4.1.2. Gesture Detection
We implemented a new gesture for multiple aircraft selection, using a combination of
pointing and pinching. The deck handler points with the index finger while pinching with
the thumb and middle finger to select multiple aircraft. We detect this gesture using the Leap
Motion API's pinchStrength() method: if the deck handler is pinching, the value returned
by this method is 1, and 0 otherwise. However, since this value can fluctuate with movements of
the deck handler's hand due to the device's sensitivity, we apply a moving average to
ensure that the majority of the values we receive indicate pinching. In
addition, we recognize this gesture only if the user is pinching with the thumb and the middle
finger. We check this by iterating through the list of fingers in a frame and measuring the
distance between each fingertip and the thumb's tip position; the middle finger's tip should be
the closest to the thumb's. This check prevents other hand poses from being recognized as a
pinch gesture. For example, if the deck handler is pointing with the index finger while the
other fingers are curled, the system might otherwise conclude that the user is pinching; the
distance check, together with the moving-averaged pinch strength, rules out such
cases. Figure 19 shows an example of multiple aircraft selection using the pinch gesture.
Figure 19: Demonstration of multiple aircraft selection with the pinch gesture.
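A minimal sketch of this filtering logic follows. The window size and threshold are illustrative assumptions, not the values used in DeckAssistant, and the class is our own construction rather than the system's source.

```python
from collections import deque

def distance(a, b):
    # Euclidean distance between two 3D fingertip positions.
    return sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5

class PinchDetector:
    """Moving average over pinch-strength samples, plus a check that the
    middle finger is the fingertip closest to the thumb. Window size and
    threshold are illustrative, not DeckAssistant's actual settings."""

    def __init__(self, window=10, threshold=0.7):
        self.samples = deque(maxlen=window)
        self.threshold = threshold

    def update(self, pinch_strength, thumb_tip, finger_tips):
        # finger_tips maps non-thumb finger names to tip positions,
        # e.g. {"index": (x, y, z), "middle": (x, y, z), ...}.
        self.samples.append(pinch_strength)
        average = sum(self.samples) / len(self.samples)
        if average < self.threshold:
            return False
        # Require the middle finger to be the tip closest to the thumb,
        # so a bare index-finger point is not misread as a pinch.
        closest = min(finger_tips, key=lambda n: distance(thumb_tip, finger_tips[n]))
        return closest == "middle"

detector = PinchDetector()
thumb = (0.0, 0.0, 0.0)
tips = {"index": (80.0, 10.0, -40.0), "middle": (15.0, 5.0, 5.0)}
for _ in range(10):
    result = detector.update(1.0, thumb, tips)
print(result)  # True: strength is consistently high and middle is closest
```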
5. Speech Synthesis and Recognition
This section details the improvements in Speech Synthesis and Speech Recognition for
DeckAssistant.
5.1. Speech Synthesis
The initial version of DeckAssistant, as discussed in [1, Section 6.1], used the FreeTTS
package for speech synthesis. Even though FreeTTS provides an easy-to-use API and is
compatible with many operating systems, it lacks pronunciation quality and clarity of speech. To
solve this problem, we implemented a speech synthesizer interface that acts as a front end to any
speech synthesis library we plug in. One library that works successfully with our system is
the Microsoft Translator API, a cloud-based automatic machine translation service that supports
multiple languages. Since our application uses only English, we do not use any of the
service's translation features; instead, we use it to generate a speech file from the text we
feed in.
As explained in Section 3.2.2, speech is synthesized in response to a deck handler's
commands. Any module in the software can call the Speech Synthesis Engine of the Speech
Synthesis Stack to generate speech. Once called, the Speech Synthesis Engine feeds the text to
be spoken into the Microsoft Translator API through the interface we created. The interface
then makes a request to the Microsoft Translator service, which returns a WAV file
that we play through our system's speakers. In the case of multiple speech synthesis requests, the
system queues the requests and handles them in order. Using the Microsoft Translator API
enables us to provide high-quality speech synthesis with clear voices. It should also be noted that
future developers of DeckAssistant can incorporate any speech synthesis library into the system
with ease.
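To illustrate the interface-plus-queue design, here is a sketch under our own naming assumptions; the class and method names below are hypothetical stand-ins, not the actual DeckAssistant source.

```python
import queue

class SpeechSynthesizer:
    """Abstract front for any speech synthesis back end. A concrete
    subclass wraps a real library or web service and returns audio data
    (e.g. WAV bytes) for the given text."""
    def synthesize(self, text):
        raise NotImplementedError

class SpeechSynthesisEngine:
    """Accepts synthesis requests from any module, queues them, and plays
    the results strictly in order of arrival."""
    def __init__(self, backend):
        self.backend = backend
        self.pending = queue.Queue()
        self.played = []            # stands in for audio playback

    def speak(self, text):
        self.pending.put(text)

    def drain(self):
        # Synthesize and "play" queued requests one at a time, in order.
        while not self.pending.empty():
            self.played.append(self.backend.synthesize(self.pending.get()))

class FakeBackend(SpeechSynthesizer):
    def synthesize(self, text):
        return "WAV<%s>" % text     # placeholder for real audio bytes

engine = SpeechSynthesisEngine(FakeBackend())
engine.speak("Moving this C2 to the fantail.")
engine.speak("Done.")
engine.drain()
print(engine.played)  # ['WAV<Moving this C2 to the fantail.>', 'WAV<Done.>']
```

Swapping in a different synthesis library then amounts to writing one new SpeechSynthesizer subclass.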
5.2. Speech Recognition
The initial version of DeckAssistant used the CMU Sphinx 4 library for speech
recognition [1, Section 6.2]. Even though Sphinx provides an easy API to convert speech into
text with acoustic models and a grammar (rules for specific phrase construction) of our choice,
its recognition performance is poor in both speed and accuracy. In the
experiments we ran during development, we often had to repeat ourselves several times before
the recognizer picked up what we were saying. In response, we introduced a speech recognizer
interface that gives us the flexibility to use any speech recognition library. Other
modules in DeckAssistant can call this interface and use the recognized speech as needed.
5.2.1. Recording Sound
The user can talk to DeckAssistant at any time, without extra actions such as
push-to-talk or gestures. For this reason, the system must constantly record from the
microphone, determine when the user has finished issuing a command, and generate a WAV
file of the spoken command. Sphinx's Live Speech Recognizer handled this by default.
However, since the speech recognizer library we decided to use (discussed in the next section)
does not provide live speech recognition, we had to implement our own sound recorder that
generates WAV files of the spoken commands. For this task, we use SoX (Sound eXchange), a
cross-platform command-line utility that can record and process audio files. The SoX
command runs constantly in the background to record any sound. It stops recording once no
sound is detected after the user has started speaking, then trims out certain noise bursts and
writes the recorded speech to a WAV file that is sent back to DeckAssistant. Once the speech
recognizer finishes the speech-to-text operation, this background process is run again to
record new commands. For more details about SoX, please refer to the SoX Documentation [4].
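For reference, a voice-activated SoX recording of this kind can be assembled roughly as below. The exact silence-detection parameters DeckAssistant uses are not documented here, so the threshold and duration values shown are illustrative.

```python
def build_sox_command(outfile, stop_after_secs=2.0, noise_threshold="1%"):
    # Builds an argv list equivalent to:
    #   sox -d out.wav silence 1 0.1 1% 1 2.0 1%
    # where:
    #   -d                  records from the default audio device
    #   silence 1 0.1 1%    starts writing once 0.1s of audio exceeds 1% volume
    #   1 2.0 1%            stops after 2.0s below the 1% threshold
    # Thresholds and durations here are illustrative, not DeckAssistant's.
    return [
        "sox", "-d", outfile,
        "silence", "1", "0.1", noise_threshold,
        "1", str(stop_after_secs), noise_threshold,
    ]

cmd = build_sox_command("command.wav")
print(" ".join(cmd))  # sox -d command.wav silence 1 0.1 1% 1 2.0 1%
# In the real system this would run in the background, e.g.:
# subprocess.run(cmd)
```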
5.2.2. Choosing a Speech Recognition Library
To pick the most suitable speech recognition library for our needs, we experimented with
four popular APIs:
Google Speech: This service did not provide an official API. We had to send an HTTP request to
the service with a recorded WAV file to get the speech-to-text response, and we were
limited to 50 requests per day. Even though the responses for the random test sentences we
tried were accurate, the service did not work well with our own grammar, since it offers no
grammar configuration. A simple example is the sentence "Move this C2", which the
recognizer transcribed as "Move this see too". Since we had many similar issues with other
commands, we decided not to use this library.
IBM Watson Speech API: A brand-new, easy-to-use API. It transcribed the incoming audio
and sent it back to our system with minimal delay, and its recognition seemed to
improve as it heard more. However, like Google Speech, it provides no grammar
configuration, which caused inaccuracy in recognizing certain commands in our system.
Therefore, we did not use this library either.
Alexa Voice Service: Amazon recently made this service available. Even though its
speech recognition works well for the purposes it was designed for, it unfortunately
cannot be used as a pure speech-to-text service: instead of returning the spoken text, the
service returns an audio file containing a response, which is not useful for us. After some
experimentation with the service, we managed to extract the transcribed text from the audio file
we sent in. However, it turns out that the Alexa Voice Service can only be used when the
user says "Alexa, tell DeckAssistant to…" before issuing a command. That is
not very usable for our purposes, so we chose not to work with this service.
AT&T Speech: This system allowed us to configure a vocabulary and a grammar, which
made recognition of our specific commands very accurate. Like the IBM
Watson Speech API, it returned the transcription of the audio file we sent in with
minimal delay. We therefore ended up using this library for our speech recognizer. The
one downside of this library is that we had to pay a fee to receive Premium Access to
the Speech API.¹
As explained in Section 5.2.1, recognition is performed after each spoken command
followed by a brief period of silence. Once the AT&T Speech library recognizes a phrase in our
grammar, we pass the transcribed text into our parser.
5.2.3. Parsing Speech Commands
The parser extracts metadata that represents the type of the command being issued, as well
as any other relevant information. Each transcribed text sent to the parser is called a base
command. Of all the base commands, only the Decision Command (Table 1) represents a
meaningful action by itself. The parser interprets the rest of the commands in two stages, which
allows for gestural input alongside speech; we call these combined commands. Consider an
example with the command "Move this aircraft, over there". When issuing this
¹ AT&T Developer Premium Access costs $99.
command, the deck handler points at the aircraft to be moved and says "Move this aircraft...",
followed by "...over there" while pointing at the destination. In the meantime, the parser sends
the metadata extracted from the text to the Action Manager, which holds the information until
the two base commands can be combined into a single command for an action to be taken. In
addition, the Action Manager provides visual and auditory feedback to the deck handler during
the process. A full breakdown of speech commands is found in [1] and listed here:
Base Commands

Move Command - Selects aircraft to be moved. Example: "Move this C2…"
Location Command - Selects the destination of a move. Example: "…to the fantail."
Launch Command - Selects the catapult(s) to launch aircraft on. Example: "…to launch on Catapult 2."
Decision Command - Responds to a question from DeckAssistant. Examples: "Yes", "No", "Okay".

Combined Commands

Move to Location Command - Moves aircraft to a specified destination. Combination: Move Command + Location Command.
Move Aircraft to Launch Command - Moves aircraft to launch on one or more catapults. Combination: Move Command + Launch Command.

Table 1: Set of commands recognized by DeckAssistant.
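The base commands in Table 1 can be classified by a small pattern-based parser along the following lines. This is a sketch: the real DeckAssistant grammar is richer, and these regular expressions are our own illustration rather than the system's actual rules.

```python
import re

# Illustrative patterns for the base commands in Table 1; DeckAssistant's
# actual grammar is more elaborate than these regular expressions.
BASE_PATTERNS = [
    ("move",     re.compile(r"^move (?:this|the) (?P<aircraft>.+)$", re.I)),
    ("launch",   re.compile(r"^to launch on (?P<catapults>.+)$", re.I)),
    ("location", re.compile(r"^to the (?P<region>.+)$", re.I)),
    ("decision", re.compile(r"^(?P<answer>yes|no|okay)$", re.I)),
]

def parse_base_command(text):
    """Return metadata for a transcribed phrase, or None if unrecognized."""
    cleaned = text.strip().strip(".")
    for kind, pattern in BASE_PATTERNS:
        match = pattern.match(cleaned)
        if match:
            return {"type": kind, **match.groupdict()}
    return None

print(parse_base_command("Move this C2"))     # {'type': 'move', 'aircraft': 'C2'}
print(parse_base_command("to the fantail."))  # {'type': 'location', 'region': 'fantail'}
print(parse_base_command("to launch on Catapult 2"))
```

Note that the launch pattern is tried before the location pattern, since both begin with "to"; the returned dictionary is the metadata handed to the Action Manager.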
5.2.4. Speech Recognition Stack in Action
In Figure 20, we outline how the Speech Recognition Stack works with the Action
Manager to create deck actions. As discussed in Section 5.2.1, the SoX process we
run is constantly recording and waiting for commands. Figure 20 uses as an example a
command that moves an aircraft to a deck region. When the deck handler issues the first
command, the SoX process sends the speech recognizer a WAV file to transcribe. The
transcribed text is then sent to the speech parser, which extracts the metadata. Once the speech
recognizer finishes transcribing, it restarts the SoX recording to listen for future
commands. Step 1 in Figure 20 shows that the extracted metadata represents a Move Command
for the aircraft being pointed at. The Action Manager receives this information at Step 2,
recognizes it as a base command, and waits for another command to combine with it into a
single command representing a deck action. In the meantime, the Action Manager consults the
Selection Engine at Step 3 to get the information for the aircraft being pointed at, which
allows the Action Manager to highlight the selected aircraft. Meanwhile, the deck handler
speaks the rest of the command, which is sent to the parser. Step 4 shows the metadata
assigned to this second base command: in this case, a Location Command and the name
of the destination deck region. In Step 5, the Action Manager constructs the final
command from the second base command, and it fetches the destination information through the
Deck Object. Finally, a Deck Action is created (Step 7) with the information gathered from the
Speech Recognition Stack and the other modules.
Implementation of Deck Actions is described in [1, Section 7].
Figure 20: A summary of how the speech recognition stack works.
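The two-stage combination described above can be sketched as a small state machine. The names below are our own; the actual Action Manager also drives highlighting and auditory feedback, which is omitted here.

```python
# Minimal sketch of the Action Manager's combination step. A base Move
# Command is held until a Location or Launch Command arrives, then the
# pair is emitted as one combined command. All names are illustrative.

COMBINATIONS = {
    "location": "move_to_location",
    "launch": "move_aircraft_to_launch",
}

class ActionManager:
    def __init__(self):
        self.pending_move = None    # held base Move Command, if any

    def receive(self, command):
        """Return a completed command when one is ready, else None."""
        kind = command["type"]
        if kind == "decision":
            return command          # meaningful by itself (Table 1)
        if kind == "move":
            self.pending_move = command
            return None             # wait for the second half
        if kind in COMBINATIONS and self.pending_move is not None:
            combined = {"type": COMBINATIONS[kind],
                        "move": self.pending_move,
                        "target": command}
            self.pending_move = None
            return combined
        return None

manager = ActionManager()
assert manager.receive({"type": "move", "aircraft": "C2"}) is None
action = manager.receive({"type": "location", "region": "fantail"})
print(action["type"])  # move_to_location
```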
6. Related Work
This section presents previous work that inspired the DeckAssistant project.
6.1. Navy ADMACS
As mentioned in Section 1.2.2, the Navy is moving toward ADMACS, a more
technologically advanced and connected real-time data management system that links the
carrier's air department, ship divisions, and the sailors who manage aircraft launch and
recovery operations.
6.2. Deck Heuristic Action Planner
Ryan et al. have developed "a decision support system for flight deck operations that
utilizes a conventional integer linear program-based planning algorithm" [5]. In this system, a
human operator inputs the end goals as well as constraints, and the algorithm returns a proposed
schedule of operations for the operator's approval. Even though their experiments showed that
human heuristics outperform the plans produced by the algorithm, human decisions are
usually conservative, and the system can offer alternative plans. This is an early attempt to aid
planning on aircraft carriers.
7. Conclusion
In this thesis, we introduced improvements to DeckAssistant, a system that provides a
traditional Ouija board interface by displaying a digital rendering of an aircraft carrier deck that
assists deck handlers in planning deck operations. DeckAssistant has a large digital tabletop
display that shows the status of the deck and has an understanding of certain deck actions for
scenario planning. To preserve the conventional way of interacting with the old-school Ouija
board, where deck handlers move aircraft by hand, the system takes advantage of multiple modes
of interaction. Deck handlers plan strategies by pointing at aircraft, gesturing, and talking to the
system. The system responds with its own speech and updates the display to show the
consequences of the actions taken by the handlers. The system can also be used to simulate
certain scenarios during the planning process. The multimodal interaction described here creates
a communication of sorts between deck handlers and the system.
Our work comprises three improvements to the initial version of DeckAssistant built by
Kojo Acquah [1]. The first is the introduction of the Leap Motion Sensor for pointing detection
and gesture recognition; we presented our reasons for concluding that the Leap Motion device
performs better than the Microsoft Kinect, and we explained how we achieve pointing detection
and gesture recognition using the device. The second improvement is better speech synthesis,
from our introduction of a new speech synthesis library that provides high-quality pronunciation
and clarity of speech. The third improvement is better speech recognition: we discussed several
speech recognition libraries, determined which one best suits our purposes, and explained how
we integrated this new library into the system together with our own method of recording voice.
7.1. Future Work
While the current version of DeckAssistant focuses only on aircraft movement driven by
deck handler actions, future versions could implement algorithms that let the system
simulate the optimal ordering of operations for an end goal, while accounting for deck and
aircraft status such as maintenance needs.

Currently, DeckAssistant's display, created by the four downward-facing projectors
mounted over the tabletop (discussed in Section 3.1), has a high pixel resolution. However, it is
not as seamless as it should be. The ScalableDesktop software performs automatic
edge blending of the four projected displays, but the regions where the projectors overlap
are still visible. Moreover, ScalableDesktop has to be run for calibration every time
a user starts DeckViewer, and the brightness of the display is low. Instead of the projectors
and the tabletop surface, a high-resolution touch-screen LED TV could be mounted flat on a
table. This would provide a seamless display free of projector overlaps and remove the need for
time-consuming calibration. In addition, the touch screen would let us introduce drawing
gestures with which the deck handler could draw out aircraft movements and take notes on the
screen.
8. References
[1] Kojo Acquah. Towards a Multimodal Ouija Board for Aircraft Carrier Deck Operations. June 2015.
[2] US Navy Air Systems Command. Navy Training System Plan for Aviation Data Management and Control System. March 2002.
[3] Leap Motion. Leap Motion for Mac and PC. November 2015.
[4] SoX Documentation. http://sox.sourceforge.net/Docs/Documentation. February 2013.
[5] Ryan et al. Comparing the Performance of Expert User Heuristics and an Integer Linear Program in Aircraft Carrier Deck Operations. 2013.
[6] Ziezulewicz, Geoff. "Old-school 'Ouija Board' Being Phased out on Navy Carriers." Stars and Stripes, 10 Aug. 2011. Web. 3 Mar. 2016.
[7] Microsoft. Kinect for Windows Sensor Components and Specifications. Web. 7 Mar. 2016.
[8] Khoshelham, K.; Elberink, S.O. Accuracy and Resolution of Kinect Depth Data for Indoor Mapping Applications. Sensors 2012, 12, 1437–1454.
[9] Weichert, F.; Bachmann, D.; Rudak, B.; Fisseler, D. Analysis of the Accuracy and Robustness of the Leap Motion Controller. Sensors 2013, 13, 6380–6393.
9. Appendix
9.1. Code and Documentation
The source code of DeckAssistant, documentation on how to get up and running with the
system, and the DeckAssistant Software Guide are available on GitHub:
https://github.mit.edu/MUGCSAIL/DeckViewer.