
Transcript of: IEEE 2013 6th International Conference on Human System Interactions (HSI), Sopot, Poland

Recognition of visual signals and firing positions for virtual military training systems

Yoon Suk Kwak, Soon Ki Jung
School of Computer Science and Engineering, Kyungpook National University, Daegu, Korea
[email protected], [email protected]


Abstract. In this paper, we present visual hand and arm signal recognition and rifle firing position estimation for virtual military training systems, in which various military tasks are performed in virtual environments. Among these tasks, we deal with platoon leader training and rifle firing position training for various military situations. In platoon leader training, the trainee, whose role is a platoon leader, keeps command and control over his/her members by visual signals. The proposed visual signal and rifle firing position recognition system exploits an optical motion capture system. For real-time performance, our recognition method localizes the head and hands in user-centered partitions of 3D space, analyzes their temporal transitions, and finally makes a simple decision to classify the visual hand and arm signals. Experimental results show the evaluation of the proposed method.

Keywords: visual signal, recognition, military training, motion capture.

I. INTRODUCTION

There has been an increasing interest in the application of virtual reality (VR) in the field of training; the use of virtual environments may be training's future [1]. Virtual military training is no exception, because battle training with troops and equipment in the real world is both expensive and dangerous.

A stand-alone VR system is used for training individual soldiers, while a networked VR system can create a shared virtual battlefield for joint training of the various armed services [2]. For team training, intelligent virtual soldiers cohabit with the human subjects in the virtual world and collaborate with them on training scenarios. The human subjects should interact naturally with the simulated agents; they can communicate through voice and gesture to complete team tasks [3]. In this paper, the human subject gives commands with visual hand and arm signals for military tasks.

This paper is organized as follows. In Section 2, we describe the optical motion capture system using infrared cameras and markers. In Section 3, we present visual signal recognition and pose estimation using the motion capture system. In Section 4, we show the experimental results. Finally, we conclude the paper in Section 5 with some pointers to future work.

This work was supported by the Republic of Korea Dual Use Program Cooperation Center (DUPC) of the Agency for Defense Development (ADD).

II. SYSTEM OVERVIEW

In our virtual military system, the human subject, holding a rifle mock-up, stands in front of a large screen onto which the virtual environment of a field training exercise is projected. Fig. 1 illustrates the functional block diagram of our system. For sensing the trainee, optical tracking and virtual rifle systems are installed in the virtual military system. The virtual perception component analyzes the sensing data of the trainee and obtains meaningful information about his/her activities through visual signal recognition, human pose estimation, aiming point estimation, and rifle fire detection.

Fig. 1. Functional block diagram of our system (sensing: optical tracking system and virtual rifle system; virtual perception: visual signal recognition, human pose estimation, aiming point estimation, and rifle fire detection; interaction and simulation for military training; display system presenting the virtual military environment to the trainee).

For sensing the human subject, we use the OptiTrack optical tracking system [4]. It consists of six IR cameras, three in front and three in back, for detecting the IR markers on the body of the trainee, as shown in Fig. 2 (a) and (b). We also use one extra camera for estimating the aiming position of the rifle, as shown in Fig. 2 (c).

Fig. 2. Sensing in our virtual military system ((a), (b): IR cameras and body markers; (c): the extra camera for aiming estimation).

The virtual rifle system consists of a rifle mock-up equipped with an IR laser and a trigger module. One IR camera detects the position of the IR laser spot on the screen, and the aiming position of the rifle is estimated using the screen-camera


calibration information. The trigger module detects rifle firing and sends a firing event to the virtual military system via Bluetooth.
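As an illustration of this step, the following is a minimal sketch (our assumption, not the authors' implementation) that maps the detected IR laser dot from camera image coordinates to screen coordinates using a homography obtained from the screen-camera calibration; the corner correspondences and the screen resolution are hypothetical:

```python
import cv2
import numpy as np

# Calibration (assumed procedure): image positions of the four screen corners,
# paired with their known screen-pixel coordinates, define the homography.
image_corners = np.array([[102, 88], [538, 95], [530, 412], [95, 405]], dtype=np.float32)
screen_corners = np.array([[0, 0], [1920, 0], [1920, 1080], [0, 1080]], dtype=np.float32)
H, _ = cv2.findHomography(image_corners, screen_corners)

def aiming_point(dot_xy):
    """Map the laser dot detected in the camera image to screen coordinates."""
    p = np.array([[dot_xy]], dtype=np.float32)   # shape (1, 1, 2) for OpenCV
    return cv2.perspectiveTransform(p, H)[0, 0]  # (x, y) on the screen

print(aiming_point((320, 240)))
```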

In this paper, we focus on visual signal recognition, to recognize the military commands for platoon leader training, and on human pose estimation, to distinguish the rifle shooting positions.

III. VISUAL SIGNALS RECOGNITION

A. Marker labeling

We use four sets of markers for real-time processing: head, hand, rifle, and foot markers.
· Head marker set: a rigid marker with three markers.
· Hand marker set: a flock marker with several markers around the right wrist.
· Rifle marker set: a rigid marker with two markers on the rifle.
· Left/right foot marker: one marker for each foot.

Fig. 3. (a) Rigid marker and (b) flock marker.

Here, a rigid marker consists of two or more markers attached to a rigid body segment, so that the distances between any two of its markers remain fixed during motion; it is regarded as one virtual marker. Rigid marker sets are easy to distinguish from one another by their number of markers and the 3D distances between them. Fig. 3 (a) shows a rigid marker with three markers.

A flock marker consists of many markers placed close together, as shown in Fig. 3 (b). It is used for robust detection, because some of its markers remain visible unless all of them are occluded. The position of the flock marker is estimated by averaging the positions of its visible markers.
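A minimal sketch of these two marker abstractions (our assumption; the function names and the tolerance are hypothetical): a rigid set is identified by its marker count and fixed pairwise 3D distances, and a flock marker's position is the centroid of its currently visible markers.

```python
import numpy as np

def pairwise_distances(points):
    """Sorted pairwise distances: a pose-invariant signature of a rigid set."""
    pts = np.asarray(points)
    n = len(pts)
    d = [np.linalg.norm(pts[i] - pts[j]) for i in range(n) for j in range(i + 1, n)]
    return np.sort(d)

def matches_rigid(points, reference_signature, tol=0.01):
    """True if the candidate points match the reference distance signature."""
    sig = pairwise_distances(points)
    return len(sig) == len(reference_signature) and np.allclose(sig, reference_signature, atol=tol)

def flock_position(visible_points):
    """Flock marker position: average of the currently visible member markers."""
    return np.mean(np.asarray(visible_points), axis=0) if len(visible_points) else None
```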

For real-time processing, we label the markers in order, as illustrated in Fig. 4. First, the rigid markers are labeled as the head marker set and the rifle marker set; the two rigid marker sets are distinguished by their number of markers.

Fig. 4. Labeling order of markers

After labeling the rigid markers, the foot markers are labeled. Since the feet are naturally placed on the floor, the foot markers are easily determined by their height values. Finally, the remaining IR markers, which lie close together, are labeled as the hand marker set. A sketch of this labeling order is given below.
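The following minimal sketch (our assumption, not the authors' code; it reuses pairwise_distances, matches_rigid, and flock_position from the previous sketch) illustrates the labeling order: rigid sets first, then feet by height, then the remaining markers as the hand flock. The y-up axis convention, the proximity pre-grouping into clusters, and the left/right assignment by x coordinate are all hypothetical.

```python
import numpy as np

def label_markers(clusters, head_signature, rifle_signature, foot_height=0.15):
    """clusters: candidate marker groups, each a list of 3D points (np arrays),
    already grouped by spatial proximity (a hypothetical preprocessing step)."""
    labels, rest = {}, []

    # 1) Rigid marker sets: match by marker count and pairwise-distance signature.
    for cluster in clusters:
        if matches_rigid(cluster, head_signature):
            labels["head"] = np.mean(cluster, axis=0)    # one virtual marker
        elif matches_rigid(cluster, rifle_signature):
            labels["rifle"] = np.mean(cluster, axis=0)
        else:
            rest.extend(cluster)

    # 2) Foot markers: single markers near the floor (y is assumed to be up).
    feet = sorted((p for p in rest if p[1] < foot_height), key=lambda p: p[0])
    for name, p in zip(("left_foot", "right_foot"), feet):
        labels[name] = p

    # 3) Hand marker set: the remaining close-together markers form the flock.
    labels["hand"] = flock_position([p for p in rest if p[1] >= foot_height])
    return labels
```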

B. Marker Tracking

Due to self-occlusion or interference from bright objects in the environment, the result of the marker labeling process includes some errors. To overcome this problem, we use a Kalman filter to predict the 3D position of each marker. If an irrelevant 3D measurement appears, it is regarded as an outlier. Also, if confident marker data are lost, the position of the lost marker is held at the predicted position until a confident measurement is obtained. After performing data association for all markers, every track is filtered by the Kalman filter, so we obtain smoothed tracks with the noise eliminated.
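As a rough illustration, here is a minimal per-marker tracker sketch under assumed parameters (the paper does not specify its filter model; the constant-velocity model, noise levels, and gating threshold below are our assumptions):

```python
import numpy as np

class MarkerTrack:
    """Per-marker constant-velocity Kalman filter with outlier gating. A
    measurement far from the prediction is rejected, and dropouts are bridged
    by the prediction until a confident measurement returns."""

    def __init__(self, pos, dt=1 / 100, q=1e-3, r=1e-4, gate=0.1):
        self.x = np.hstack([pos, np.zeros(3)])            # state: [position, velocity]
        self.P = np.eye(6)                                # state covariance
        self.F = np.eye(6); self.F[:3, 3:] = dt * np.eye(3)  # constant-velocity model
        self.H = np.hstack([np.eye(3), np.zeros((3, 3))]) # we observe position only
        self.Q, self.R, self.gate = q * np.eye(6), r * np.eye(3), gate

    def step(self, z):
        """z: measured 3D position (np array), or None if the marker is lost."""
        # Predict.
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        pred = self.H @ self.x
        # Gate: treat distant measurements as outliers; hold the prediction.
        if z is None or np.linalg.norm(z - pred) > self.gate:
            return pred
        # Update with the accepted measurement.
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ (z - pred)
        self.P = (np.eye(6) - K @ self.H) @ self.P
        return self.H @ self.x
```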

C. Feature definition

Our recognition system deals with 5 rifle firing positions and 9 different visual signals, illustrated in Fig. 5. In order to classify them, we define the following features (a sketch of how such features can be computed is given after Fig. 5):
· Height of head
· Distance between head and hand
· Position of head and hand
· Angle between head and hand
· Movement of hand (stop/backward/forward movement)
· Repetition of hand (vertical/horizontal repetition)
· The number of visible foot markers

Fig. 5. Firing positions and visual signals
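A minimal sketch of such feature computations (our assumption; the paper gives no formulas, and the axis conventions, thresholds, and helper names below are hypothetical):

```python
import numpy as np

def features(head, hand, visible_foot_markers, hand_history):
    """head, hand: 3D positions (np arrays); hand_history: recent hand positions.
    Axis convention (assumed): y is up, z points toward the screen."""
    d = hand - head
    return {
        "head_height": head[1],
        "head_hand_distance": np.linalg.norm(d),
        "head_hand_angle": np.degrees(np.arctan2(d[1], np.linalg.norm(d[[0, 2]]))),
        "hand_movement": movement(hand_history),
        "foot_markers": visible_foot_markers,
    }

def movement(history, eps=0.02):
    """Stop / forward / backward from net hand displacement toward the screen."""
    dz = history[-1][2] - history[0][2]
    if abs(dz) < eps:
        return "stop"
    return "forward" if dz > 0 else "backward"

def repetitions(history, axis, eps=0.02):
    """Count direction reversals along an axis (1 = vertical, 0 = horizontal);
    several reversals indicate a repeated vertical/horizontal hand motion."""
    deltas = np.diff([p[axis] for p in history])
    signs = np.sign(deltas[np.abs(deltas) > eps])
    return int(np.sum(signs[1:] != signs[:-1]))
```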

D. Feature selection

Not all features are used for classifying every visual signal, for reasons of computational efficiency. We select a specific set of features for each signal, because each feature has a different discriminating power. For example, the 'position of head and hand' feature can classify many signals, but not the 'stop' and 'move slowly' signals, as shown in Fig. 6; in this case, the 'repetition of hand' feature is additionally used. We empirically select two to four features per signal. Table 1 shows the selected feature sets; a minimal rule-based classification sketch follows the table.


Fig. 6. Domain of hand visual signal

TABLE 1: FEATURE SELECTION
(Position, Distance, Angle, and Movement describe the relation between head and hand.)

Signal              Position  Distance  Angle  Movement    Head height  Foot markers
Keeping an eye      High      Close     -      Horizontal  -            -
Disperse            High      -         -      H and V     -            -
Stop                High      Close     -      Stop        -            -
Warping out         High      Close     -      Vertical    -            -
Sit down            Low       Far       High   Vertical    -            -
Move slowly         Low       Far       Low    Horizontal  -            -
Squatting           -         -         -      -           Middle       2
Kneeling position   -         -         -      -           Middle       1
1st prone position  -         -         -      -           Low          2
2nd prone position  -         -         -      -           Low          1
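The promised sketch, built from a few rows of Table 1 (our assumption; the paper only states that each signal is checked against its selected features, and the quantization of raw features into labels such as "high" or "close" is hypothetical):

```python
# Each signal lists only its selected (two to four) features; a signal is
# recognized when all of them match the quantized observation.
RULES = {
    "keeping an eye": {"position": "high", "distance": "close", "movement": "horizontal"},
    "stop":           {"position": "high", "distance": "close", "movement": "stop"},
    "sit down":       {"position": "low", "distance": "far", "angle": "high", "movement": "vertical"},
    "move slowly":    {"position": "low", "distance": "far", "angle": "low", "movement": "horizontal"},
    "squatting":      {"head_height": "middle", "foot_markers": 2},
    "kneeling":       {"head_height": "middle", "foot_markers": 1},
}

def classify(observation):
    """observation: dict of quantized features, e.g. {"position": "high", ...}.
    Returns the first signal whose selected features all match, else None."""
    for signal, rule in RULES.items():
        if all(observation.get(k) == v for k, v in rule.items()):
            return signal
    return None  # no rule matched: the piece is rejected
```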

IV. EVALUATION

In this section, we describe the evaluation method and the recognition results of the proposed method.

A. Evaluation method

In order to determine the performance of our system, we evaluate three measures: the recognition rate, the rejection rate, and the false recognition rate, defined as follows (a direct transcription into code is given after the definitions):
· Recognition rate = c / N
· Rejection rate = r / N
· False recognition rate = e / N
where c is the number of correctly recognized pieces, r the number of rejected pieces, e the number of falsely recognized pieces, and N = c + r + e the total number of processed pieces.
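The three measures as code; the counts in the usage line are the totals from Table 2:

```python
def rates(c, r, e):
    """c: correct, r: rejected, e: false pieces."""
    n = c + r + e                # processed pieces, N = c + r + e
    return c / n, r / n, e / n   # recognition, rejection, false recognition rates

print(rates(225, 7, 13))         # ~ (0.918, 0.029, 0.053), matching Table 2
```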

The test subjects were ten men aged from 20 to 40 years, none of whom had any prior experience with the visual signals. Before the evaluation test, an experienced demonstrator performed all of the visual signals for the participants, and every participant then had two attempts at performing the visual signals.

B. Recognition Result

The signal recognition results are as follows: the recognition rate is 91.8%, the rejection rate is 2.85%, and the false acceptance rate is 5.31%, as shown in Table 2. The recognition rate of each signal can be found in Table 3. Most visual signals have reasonable recognition rates, but two signals, 'move slowly' and 'attention', have high false acceptance rates. The reason is that most participants confused these two signals with other visual signals while performing them. This problem is partially solved by coaching participants to perform the correct actions, but similar visual signals can hardly be discriminated, because we use only one hand marker for computational efficiency.

TABLE 2: TOTAL RECOGNITION RATES

Processed pieces  Correct pieces  Rejected pieces  False pieces
245               225             7                13

Recognition rate (%)  Rejection rate (%)  False acceptance rate (%)
91.8                  2.9                 5.3

TABLE 3: RECOGNITION RATE PER SIGNAL

Signal          Processed pieces  Recognition rate (%)  Rejection rate (%)  False acceptance rate (%)
Move slowly     22                68.2                  13.6                18.2
Warping out     20                90.0                  10.0                0.0
Approach        20                100.0                 0.0                 0.0
Stop            20                100.0                 0.0                 0.0
Sit down        20                90.0                  0.0                 10.0
Attention       23                60.9                  8.7                 30.4
Keeping an eye  20                100.0                 0.0                 0.0
Disperse        20                100.0                 0.0                 0.0

V. CONCLUSION

We proposed a simple visual hand signal recognition method using optical trackers for virtual military training. In our system, we use four sets of markers to discriminate 14 poses, which allows the marker labeling and tracking processes to be performed in real time. However, the proposed method has a high false acceptance rate for two similar visual signals. Future research will involve developing a more robust visual signal recognition method with more markers; sensor fusion of the optical trackers with motion sensors is also planned. This work is a preliminary result toward a complete team-based military training system.

REFERENCES

[1] Seidel RJ, Chatelier PR (1997) Virtual reality, training’s future? Perspectives on Virtual Reality and Related Emerging Technologies. Plenum Press, New York

[2] Karr R, Reece D, Franceschini R (1997) Synthetic soldiers. IEEE Spectrum, pp 39-45

[3] Rickel J, Johnson WL (1999) Virtual humans for team training in virtual reality. In: Proc Ninth World Conference on AI in Education, pp 578-585

[4] http://www.optitrack.com
