[IEEE 2013 6th International Conference on Human System Interactions (HSI) - Sopot, Poland...
978-1-4673-5637-4/13/$31.00 ©2013 IEEE
Abstract. In this paper, we present recognition of visual hand
and arm signals and estimation of rifle firing positions for
virtual military training systems, in which various military
tasks are performed in virtual environments. Among them, we
deal with platoon leader training and rifle firing position
training for various military situations. In platoon leader
training, the trainee, whose role is the platoon leader, keeps
command and control over his/her unit members by visual signals.
The proposed recognition system for visual signals and rifle
firing positions exploits an optical motion capture system.
For real-time performance, our recognition method localizes
the head and hands in user-centered partitions of 3D space,
analyzes their temporal transitions, and finally makes a
simple decision to classify the visual hand and arm signals.
Experimental results show the evaluation of the proposed
method.
Keywords: Visual signal, recognition, military training,
motion capture.
I. INTRODUCTION
There has been an increasing interest in the application
of virtual reality (VR) in the field of training. The use
of virtual environments may be training's future [1]. Virtual
military training is no exception, because battle training
with troops and equipment in the real world is both
expensive and dangerous.
A stand-alone VR system is used for training individual
soldiers, while a networked VR system can create a shared
virtual battlefield for joint training of various armed
services [2]. For team training, intelligent virtual
soldiers cohabit with the human subjects in the virtual
world and collaborate with them on training scenarios. The
human subjects should interact naturally with the simulated
agents; they can communicate through voice and gesture to
complete team tasks [3]. In this paper, the human subject
gives commands with visual hand and arm signals for
military tasks.
This paper is organized as follows. In section 2, we
describe the optical motion capture system using infrared
cameras and markers. In section 3, we present visual signals
recognition and pose estimation using the motion capture
system. In section 4, we show the experimental results.
Finally, we conclude the paper in section 5 with some
pointers to future work.
This work was supported by the Republic of Korea Dual Use Program
Cooperation Center (DUPC) of the Agency for Defense Development (ADD).
II. SYSTEM OVERVIEW
In our virtual military system, the human subject, holding a
rifle mock-up, stands in front of a large screen on which the
virtual environment of a field training exercise is projected.
Fig. 1 illustrates the functional block diagram of our
system. For sensing the trainee, optical tracking and
virtual rifle systems are installed in the virtual military
system. The virtual perception component analyzes the
sensing data of the trainee and obtains meaningful
information about his/her activities through visual signals
recognition, human pose estimation, aiming point
estimation and rifle fire detection.
Fig. 1. Functional block diagram of our system: the optical tracking
and virtual rifle systems sense the trainee; the virtual perception
modules (visual signals recognition, human pose estimation, aiming
point estimation, and rifle fire detection) feed the interaction and
simulation for military training, whose output is shown to the
trainee on the display system in the virtual military environment.
For sensing the human subject, we use the OptiTrack
optical tracking system [4]. It consists of six IR cameras,
three in front and three in back, for detecting the IR markers
on the body of the trainee, as shown in Fig. 2 (a) and (b). We
also use one extra camera for estimating the aiming position
of the rifle, as shown in Fig. 2 (c).
Fig. 2. Sensing in our virtual military system: (a), (b) IR markers
on the body of the trainee; (c) extra camera for aiming estimation.
The virtual rifle system consists of a rifle mock-up with an
IR laser and a trigger module. One IR camera detects the
position of the IR laser spot on the screen, and the aiming
position of the rifle is estimated with the screen-camera
calibration information. The trigger module detects rifle
firing and sends a firing event to the virtual military system
via Bluetooth.

Recognition of visual signals and firing positions for virtual military training systems
Yoon Suk Kwak, Soon Ki Jung
School of Computer Science and Engineering, Kyungpook National University, Daegu, Korea
[email protected], [email protected]
In this paper, we focus on visual signals recognition,
which recognizes military commands for platoon leader
training, and human pose estimation, which distinguishes
rifle firing positions.
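Since the screen is planar, the screen-camera calibration can be represented by a 3x3 homography from camera image coordinates to screen coordinates. The paper does not detail its calibration procedure, so the following is only a minimal sketch under that assumption: the homography is estimated from four known image-screen correspondences via the direct linear transform, then used to map each detected laser spot to an aiming point.

```python
import numpy as np

def estimate_homography(img_pts, scr_pts):
    """Estimate a 3x3 homography from >= 4 image-screen point pairs
    via the direct linear transform (DLT)."""
    A = []
    for (x, y), (u, v) in zip(img_pts, scr_pts):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    # The homography is the null vector of A (smallest singular vector).
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]

def to_screen(H, spot):
    """Map a detected laser spot (image pixels) to screen coordinates."""
    x, y = spot
    w = H @ np.array([x, y, 1.0])
    return w[0] / w[2], w[1] / w[2]
```

For example, calibrating once with the four screen corners as seen by the camera lets `to_screen` convert every subsequent laser detection into an aiming point on the screen.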
III. VISUAL SIGNALS RECOGNITION
A. Marker labeling
We use four sets of markers for real-time processing:
head, hand, rifle, and foot markers.
· Head marker set: a rigid marker with three markers.
· Hand marker set: a flock marker with several markers
around the right wrist.
· Rifle marker set: a rigid marker with two markers on
the rifle.
· Left/right foot marker: one marker for each foot.
(a) rigid marker (b) flock marker
Fig. 3. Rigid marker and flock marker
Here, a rigid marker consists of two or more markers
attached to a rigid body segment so that the distances
between any two markers remain fixed during motion. Each
rigid marker set is regarded as one virtual marker. Rigid
markers are easily distinguished from one another by their
number of markers and their pairwise 3D distances. Fig. 3 (a)
shows a rigid marker with three markers.
A flock marker consists of many markers placed close
together, as shown in Fig. 3 (b). It is used for robust
detection because some of its markers remain visible unless
the whole flock is occluded. The position of the flock
marker is estimated by averaging the positions of its visible
markers.
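The averaging rule above can be sketched directly (the marker layout and names are illustrative):

```python
import numpy as np

def flock_position(markers, visible):
    """Estimate the flock-marker position as the mean of its visible
    markers; return None when the whole flock is occluded."""
    pts = np.array([m for m, vis in zip(markers, visible) if vis])
    if pts.size == 0:
        return None
    return pts.mean(axis=0)
```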
For real-time processing, we label the markers in the order
illustrated in Fig. 4. First, the rigid markers are labeled as
the head marker set and the rifle marker set; the two rigid
marker sets are distinguished by their numbers of markers.
Fig. 4. Labeling order of markers
After the rigid markers are labeled, the foot markers are
labeled. Since the feet naturally rest on the floor, the
foot markers are easily identified by their height values.
Finally, the remaining IR markers, which lie close together,
are labeled as the hand marker set.
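The labeling order above (rigid sets by marker count, feet by height, remainder as the hand flock) can be sketched as follows, assuming the rigid clusters have already been separated by their fixed pairwise distances, z is the vertical axis, and the floor threshold is illustrative:

```python
import numpy as np

FOOT_HEIGHT = 0.15  # metres; assumed threshold for markers near the floor

def label_markers(rigid_clusters, loose_markers):
    """Label markers in order: rigid sets first (3 markers = head,
    2 = rifle), then feet by height, then the hand flock."""
    labels = {}
    for cluster in rigid_clusters:
        name = 'head' if len(cluster) == 3 else 'rifle'
        labels[name] = np.mean(cluster, axis=0)  # one virtual marker
    labels['feet'] = [m for m in loose_markers if m[2] < FOOT_HEIGHT]
    hand = [m for m in loose_markers if m[2] >= FOOT_HEIGHT]
    labels['hand'] = np.mean(hand, axis=0) if hand else None
    return labels
```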
B. Marker Tracking
Due to self-occlusion or interference from bright objects in
the environment, the marker labeling process produces
occasional errors. To overcome this problem, we use a
Kalman filter to predict the 3D position of each marker.
Measurements far from the prediction are rejected as
outliers, and if a reliable marker is temporarily lost, its
position is propagated from the prediction until a reliable
measurement is obtained again. After performing data
association for all markers, every track is filtered by the
Kalman filter, so we obtain smoothed tracks with the noise
eliminated.
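The paper does not give its filter model, so the following is a minimal per-marker sketch under a constant-velocity assumption (all parameter values are illustrative): measurements beyond a gating distance are rejected as outliers, and when a measurement is missing the predicted position stands in.

```python
import numpy as np

class MarkerTrack:
    """Constant-velocity Kalman filter for one marker (a sketch)."""
    def __init__(self, pos, dt=0.01, q=1e-3, r=1e-4, gate=0.2):
        self.x = np.hstack([pos, np.zeros(3)])     # state: [position, velocity]
        self.P = np.eye(6)
        self.F = np.eye(6); self.F[:3, 3:] = dt * np.eye(3)
        self.H = np.hstack([np.eye(3), np.zeros((3, 3))])
        self.Q = q * np.eye(6); self.R = r * np.eye(3)
        self.gate = gate                            # outlier distance (m)

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:3]

    def update(self, z):
        # Missing or gated-out measurement: keep the prediction.
        if z is None or np.linalg.norm(z - self.x[:3]) > self.gate:
            return self.x[:3]
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ (z - self.H @ self.x)
        self.P = (np.eye(6) - K @ self.H) @ self.P
        return self.x[:3]
```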
C. Feature definition
Our recognition system deals with the 5 rifle firing positions
and 9 different visual signals illustrated in Fig. 5. In order
to classify them, we define the following features.
· Height of head
· Distance between head and hand
· Position of head and hand
· Angle between head and hand
· Movement of hand
- Stop/Backward/Forward movement
· Repetition of hand
- Vertical/Horizontal repetition
· The number of visible foot markers
Fig. 5. Firing positions and visual signals
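Most of the features above are simple geometric quantities of the labeled head and hand positions. A minimal sketch of the static ones follows; the dictionary keys, units, and the choice of z as the vertical axis are our assumptions, and the movement/repetition features would additionally need the temporal track:

```python
import numpy as np

def extract_features(head, hand, n_foot_markers):
    """Static features from one labeled frame (names are illustrative)."""
    diff = hand - head
    dist = float(np.linalg.norm(diff))
    # Elevation angle of the hand relative to the head, z being up.
    angle = float(np.degrees(np.arctan2(diff[2], np.linalg.norm(diff[:2]))))
    return {
        'head_height': float(head[2]),
        'head_hand_distance': dist,
        'hand_position': diff,            # hand position relative to head
        'head_hand_angle': angle,
        'foot_markers': n_foot_markers,
    }
```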
D. Feature selection
For computational efficiency, not all features are used to
classify every visual signal. We select a specific set of
features for each signal because each feature has different
discriminating power. For example, the 'position of head and
hand' feature can classify many signals, but not the 'stop'
and 'move slowly' signals, as shown in Fig. 6; in this case,
the 'repetition of hand' feature is additionally used. We
empirically select two to four features per signal.
Table 1 shows the feature sets for 8 signals.
Fig. 6. Domain of hand visual signal
TABLE 1: FEATURE SELECTION
(Position, Distance, Angle, and Movement describe the relation between head and hand.)

Signal             | Position | Distance | Angle | Movement   | Head height | Foot markers
Keeping an eye     | High     | Close    |       | Horizontal |             |
Disperse           | High     |          |       | H and V    |             |
Stop               | High     | Close    |       | Stop       |             |
Warping out        | High     | Close    |       | Vertical   |             |
Sit down           | Low      | Far      | High  | Vertical   |             |
Move slowly        | Low      | Far      | Low   | Horizontal |             |
Squatting          |          |          |       |            | Middle      | 2
Kneeling position  |          |          |       |            | Middle      | 1
1st Prone position |          |          |       |            | Low         | 2
2nd Prone position |          |          |       |            | Low         | 1
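The firing-position rows of Table 1 reduce to a small decision rule over head height and the number of visible foot markers. A sketch with hypothetical thresholds (the table gives only the qualitative levels Low/Middle and the foot-marker counts, with standing taken here as the default above the middle band):

```python
def classify_pose(head_height, n_foot_markers):
    """Distinguish firing positions from head height (m) and visible
    foot markers, per Table 1. Thresholds are assumptions."""
    if head_height > 1.2:               # above the 'Middle' band
        return 'standing'
    if head_height > 0.7:               # 'Middle' head height
        return 'squatting' if n_foot_markers == 2 else 'kneeling'
    return '1st prone' if n_foot_markers == 2 else '2nd prone'
```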
IV. EVALUATION
In this section, we describe the evaluation methods and
the recognition results by the proposed method.
A. Evaluation method
In order to determine the performance of our system, we
evaluate three measures: the recognition rate, the rejection
rate, and the false recognition rate. Each is defined as
follows:
· Recognition rate = c / N
· Rejection rate = r / N
· False recognition rate = e / N
- c: correctly recognized pieces
- r: rejected pieces
- e: falsely recognized pieces
- N: processed pieces (= c + r + e).
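The three rates above follow directly from the piece counts:

```python
def recognition_rates(c, r, e):
    """Recognition, rejection, and false-recognition rates (%) from
    correct, rejected, and false piece counts."""
    n = c + r + e
    return 100.0 * c / n, 100.0 * r / n, 100.0 * e / n
```

For example, with the counts reported later in Table 2 (c = 225, r = 7, e = 13), this yields 91.8%, 2.9%, and 5.3% after rounding.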
The test subjects were ten men aged from 20 to 40 years.
None of the participants had prior experience with the visual
signals; before the evaluation test, an experienced
demonstrator performed all visual signals for them. Every
participant then had two chances to perform the visual signals.
B. Recognition Result
The results of recognizing the signals are as follows: the
rejection rate is 2.9%, the false acceptance rate is 5.3%,
and the recognition rate is 91.8%, as shown in Table 2. The
recognition rate of each signal can be found in Table 3.
Most visual signals have a reasonable recognition rate, but
two signals, 'Move slowly' and 'Attention', have a high
false acceptance rate. The reason is that most participants
confused these two signals with other visual signals while
performing them. This problem is partially solved by
coaching participants to perform the correct actions, but
similar visual signals can hardly be discriminated because
we use only one hand marker set, for computational efficiency.
TABLE 2: TOTAL RECOGNITION RATES

Processed pieces | Correct pieces | Rejected pieces | False pieces
245              | 225            | 7               | 13

Recognition rate (%) | Rejection rate (%) | False acceptance rate (%)
91.8                 | 2.9                | 5.3
TABLE 3: RECOGNITION RATE PER EACH SIGNAL

Signal         | Processed pieces | Recognition rate (%) | Rejection rate (%) | False acceptance rate (%)
Move slowly    | 22               | 68.2                 | 13.6               | 18.2
Warping out    | 20               | 90.0                 | 10.0               | 0.0
Approach       | 20               | 100.0                | 0.0                | 0.0
Stop           | 20               | 100.0                | 0.0                | 0.0
Sit down       | 20               | 90.0                 | 0.0                | 10.0
Attention      | 23               | 60.9                 | 8.7                | 30.4
Keeping an eye | 20               | 100.0                | 0.0                | 0.0
Disperse       | 20               | 100.0                | 0.0                | 0.0
V. CONCLUSION
We have proposed a simple visual hand signal recognition
method using optical trackers for virtual military training.
Our system uses four sets of markers to discriminate 14
poses, which allows the marker labeling and tracking process
to run in real time. However, the proposed method has a
high false acceptance rate for two similar visual signals.
Future research will involve developing a more robust visual
signals recognition method with more markers, as well as
sensor fusion of optical trackers and motion sensors. This
work is a preliminary result toward a complete team-based
military training system.
REFERENCES
[1] Seidel RJ, Chatelier PR (1997) Virtual reality, training's future? Perspectives on Virtual Reality and Related Emerging Technologies. Plenum Press, New York.
[2] Karr R, Reece D, Franceschini R (1997) Synthetic soldiers. IEEE Spectrum, pp 39-45.
[3] Rickel J, Johnson WL (1999) Virtual humans for team training in virtual reality. In: Proc Ninth World Conference on AI in Education, pp 578-585.
[4] OptiTrack. http://www.optitrack.com