Honda Research Institute JP - Rochester Institute of ... fileHonda Research Institute JP 1 Robot...
Transcript of Honda Research Institute JP - Rochester Institute of ... fileHonda Research Institute JP 1 Robot...
Honda Research Institute JP
1
Robot audition and its deployment
Kazuhiro Nakadai
Principal Researcher, Honda Research Institute Japan Co. Ltd.Visiting Professor, Tokyo Institute of TechnologyVisiting Professor, Waseda University
2nd Workshop on Alternative Sensing for Robot Perception: Beyond Laser and Vision
Honda Research Institute JPOutline
1. Background of Robot Audition
2. Introduction to Robot Audition Research
3. Open Source Software for Robot Audition
4. Deployment of Robot Audition
5. Summary
2
Honda Research Institute JPBackground
Robot as our partner
Necessity of auditory processing → robot audition
Humanoid robot Interaction with human is expected to be
a partner.
welfare companyHouse keeping News provider
Service, Interaction, Information, Entertainment…
Honda Research Institute JPRobot Audition
When a robot listens to sound with its ears, ….
It should deal with the mixture of sounds.
Ego-noiseMotors, self-voice
Honda Research Institute JPRobot Audition
• Proposed by Prof. Okuno (Kyoto Univ. →Waseda Univ.) and Nakadai at AAAI-2000– http://winne.kuis.kyoto-u.ac.jp/SIG/
• A research field bridging Robotics, AI and Signal processing
• Continuously expanding – Japan: Kyoto Univ., Honda RI, Tokyo Tech., ATR, AIST,
Kumamoto Univ., Waseda Univ.,etc– Europe: CNRS-LAAS (France), INRIA (France), Univ. of Erlangen-
Nuremberg (Germany), Ruhr-Universität Bochum (Germany),ITU (Turkey), Imperial College London (UK), etc
– North America: Sherbrooke Univ.(Canada), MERL (USA), Virginia Tech. (USA), Willow Garage (USA), etc
– Oceania: UTS (Australia)
RobotAudition
Honda Research Institute JPOur Activities for Robot Audition
Organized Sessions on IEEE/RSJ Int’l Conf. on Intelligent Robots and Systems (IROS 2005-2013)* Since 2014, robot audition is registered as an official keyword in IEEE-RAS.
Special Session on IEEE Int’l Conf. on Acoustics, Speech and Signal Processing(ICASSP 2009)@Taipei, Taiwan(ICASSP 2015)@Brisbane, Australia
HARK Tutorial (OSS)France: 2009, 2012, 2013Korea: 2008Japan: once a year since 2008
• Migration to Taxai at Willow Garage 2010 @ Palo Alto, USA• International workshop on Music Robot 2010 @ Taipei, Taiwan
Honda Research Institute JPOutline
1. Background of Robot Audition
2. Introduction to Robot Audition Research
3. Open Source Software for Robot Audition
4. Deployment of Robot Audition
5. Summary
7
Honda Research Institute JPRobot is surrounded by various noises.
Target Speech
Directional noise
Ego-noise such as motion and voice)(near field, loud)
Reverberation(echo)
Diffuse noise(BGN, omni-directional)
Different characteristics → one-by-one approach• Sound Source Separation mainly for directional noise• Dereverberation• Ego-noise suppression
Honda Research Institute JPSound Source Separation
9
Source Input)(x
+
+
Separation Matrix)(W
Output
Separation process
)()()( xWy
Incremental SSS: Update to reduce mixing cost
:step-size parameter
W )(WJ
)(y
Source Separation
)('1 WJWW tt
Separation Matrix
Honda Research Institute JP
Adaptive Step-size (AS)Newton’s method
Sound Source Separation with Adaptive Step-size control
Fixed step-size: Difficult to adapt to environmental changes like robot motions and moving sources => GHDSS-AS
[IEEE-TSLP Nakajima 10]
10
otearai
Recorded sound
Manually-tuned
Small value
Large value
0 200 400 600 800 1000 1200-50
-40
-30
-20
-10
0
Number of Updates
Leve
l [dB
]
GSS with u = 1
2505001k2k
0 200 400 600 800 1000 1200-50
-40
-30
-20
-10
0
Number of Updates
Leve
l [dB
]
GSS with u = 1
2505001k2k
Fixed μ
Time (# of frames)
Sepa
ratio
n de
pth
Adaptively-controlled μ
Sepa
ratio
n de
pth
Time (# of frames)
SSS
Honda Research Institute JPExperiment with Texai [IEEE ICRA 2011]
Reverberant conference room(RT > 1s), around 20m x 10m.
11
http://www.youtube.com/watch?v=xpjPun7Owxg
Time (frame)
Dire
ction (d
egre
e)
Talker1
Talker2
Talker3
Talker4
Garbage
Recorded
Honda Research Institute JPEgo-noise suppression
12
Robot’s voice & motion noise• closer to mics• Higher power
Interactive Dancing Robot
Key ideaRobot knows what it utters and what kind of motions it does.
)(
)()(
100
010,(0,(
),(
),(),(
MfS
fS
fNMHHA
MfS
fS
fY
Known signal
Noise siganalobservation
Known signal(utterance)
Semi-blind ICA⇒barge-in-able robot Template-based ego-motion noise suppression
Posture noise Pos
ture noise
[Neural Computation ‘12, IEEE IROS ‘09-’12]
Honda Research Institute JP
Mismatch between two blocks
Noise suppression
Automatic speech recognition (ASR)
Missing-Feature-Theory-based Integration [ASRU 07]
Noise Suppression
AutomaticSpeech
Recognition
Noisy/SimultaneousSpeech
Text
Distorted speech
Clean speech, or speech with known noise
Missing Feature Theory (MFT)for better integration
Honda Research Institute JPMissing Feature Theory (MFT)
The features of corrupted sound at time t
Missing features caused by separation
Smallerror
Normal ASR
The features of corrupted sound at time t
MFT-based ASR
i
)(ix
)(ix
i
Missing feature mask (MFM)
An acoustic model stored in ASR
Largeerror
One of the most important issues is automatic MFM generation.
Honda Research Institute JPAn example of automatic generated MFM
left
center
right
spectrogram MFMcaptured 1 (reliable)0 (unreliable)
Arayuru Genjitsu wo …
Isshukan bakari …
Terebi gemu ya pasokon de …
leakage
leakage
speechpass
masked
masked
Honda Research Institute JPOutline
1. Background of Robot Audition
2. Introduction to Robot Audition Research
3. Open Source Software for Robot Audition
4. Deployment of Robot Audition
5. Summary
16
Honda Research Institute JPOpen Source Robot Audition Software HARK
HRI-JP Audition for Robots with Kyoto University
http://www.hark.jp/
hark = listen (old English)Research: Free(Commercial: Licensing)
Sound Source Localization
Sound Source Separation
Automatic Speech Recognition
ArrayDialog
Developing under collaboration between Kyoto Univ., HRI-JP, and Tokyo Tech.
Honda Research Institute JPHistory and Tutorials
1. Apr. 2008, First release (0.1.7)– 1st Tutorial: Nov. 17th, 2008, Kyoto University, Kyoto, Japan– 2nd Tutorial: Dec. 5th, 2008, KIST, Seoul, Korea
2. Nov. 2009, 1.0.0 Pre-release– 3rd Tutorial: Nov. 20th, 2009, Keio University, Yokohama, Japan– 4th Tutorial: Dec. 5th, 2009, Univ. de Pierre et Marie Curie, Paris, France
3. Nov. 2010, Major version-up (1.0.0) – performance, rich documents– 5th Tutorial: Nov. 20th, 2010, Kyoto University, Kyoto, Japan
4. Feb. 2012, Version-up (1.1) – performance, 64bit processing, ROS– 6th Tutorial: Feb. 29th, 2012, Univ. de Pierre et Marie Curie, Paris, France– 7th Tutorial: Mar. 9th, 2012, Nagoya University, Nagoya, Japan
5. Mar. 2013, Version-up (1.7) – Window, Kinect, PSEye– 8th Tutorial: Mar. 19th, 2013, Kyoto University, Kyoto, Japan
6. Oct. 2013, Major Version-up (2.0) – HARKDesigner, Microcone– 9th Tutorial: Oct. 2nd, 2013, LAAS-CNRS, Toulouse, France– 10th Tutorial: Dec. 5th, 2013, Waseda University, Tokyo, Japan
7. Nov. 2014, Version-up (2.1)– 11th Tutorial: Nov. 21th, 2014, Waseda University, Tokyo, Japan
8. Nov., 2015 Version-up (2.2) planned
Honda Research Institute JPFeatures in HARK (1)
GUI programming environment (HARK Designer)– Web-based programming environment
(jQuery, node.js, HTML5)
– Chrome/Safari/Firefox on Linux/Windows/Mac
– Small overhead in module communication (frame-based processing) provided by FlowDesigner [Cote04]
An example of robot audition system with HARKa) Module network
b) Property setting
Honda Research Institute JPFeatures in HARK (2)
Support many multi-channel sound input devices
Advanced signal processing technologies – Localization: GEVD/GSVD [Nakamura’11], 3D localization
– Separation: GHDSS [Nakajima ‘09], HRLE [Nakajima ‘10], etc.
Easy to install– Just use a package management tool “apt-get” !
Rich documentation– Manual and cookbook over 300 pages in Japanese and English
Packages: ROS, OpenCV, Python, …
ALSA supported sound cards (e.g. RME)Kinect (4mics) PlayStation eye
(4mics)
Microcone(7mics)
Honda Research Institute JPOutline
1. Background of Robot Audition
2. Introduction to Robot Audition Research
3. Open Source Software for Robot Audition, HARK
4. Deployment of Robot Audition
5. Summary
21
Honda Research Institute JPMusical Robot [IEEE IROS 09 workshop on musical robots]
22
Human-Robot Interactionaccording to musical beats• Adaptive beat tracking• HRP2, Nao : Thereminist
Honda Research Institute JPSLAM-based Mic array Calibration
Issues in mic array processing
– Given microphone positions
– Synchronous recording
Measurements are required.Embedded mic array
Special multichannel A/D is necessary.
EKF-SLAM based online mic-array calibration
beforereference afterx [m] x [m] x [m]y [m] y [m] y [m]Av
erag
e en
ergy
Beam-forming output
[Miura et al., IROS 2011] (Best Paper Finalist)
Honda Research Institute JPApplication to UAV (SSL) [IROS 2014]
Reference SEVD-MUSIC iGSVD-MUSIC w/ CMS
Highly noisy sound sources (-15dB) can be localized.
Quadrotor with 16 mics
SSL with iGSVD-MUSIC
Localization Results
Honda Research Institute JPApplication of HARK-Embedded [IROS ‘15]
Sound Source Localization and Visualization• 2D sound source localization using 1D sound source
localization, motion, and gyro information
Honda Research Institute JP
26
Robot Audition based IVI system [IROS’15]
• Talk button less, highly noise-robust voice recognition, • multi-party dialog with a robot agent• Hybrid of local and cloud services
Honda Research Institute JPSpatio-Temporal Analysis of Frog Chorus
Firefly– Takeshi Mizumoto, Ikkyu Aihara, Takuma Otsuka, Ryu Takeda, Kazuyuki Aihara, Hiroshi G.
Okuno: Sound Imaging of Nocturnal Animal Calls in Their Natural Habita, Journal of Comparative Physiology A, Accepted, 26 Apr. 2011. doi:10.1007/s00359-011-0652-7 (IF 1.8)
Discovery of Three-Group Chorus (=Tri-phase synchronization)– Ikkyu Aihara, Ryu Takeda, Takeshi Mizumoto, Takuma Otsuka, Toru Takahashi, Hiroshi G.
Okuno, Kazuyuki Aihara: Complex and transitive synchronization in a frustrated system of calling frogs, Physical Review E, Vol.83, Issue 3. 031913 (2011) [5 pages], 21 Mar. 2011. doi:10.1103/PhysRevE.83.031913 (IF 2.4)
27
Rice field
Footpath
Oki Island
Honda Research Institute JPSummary
Introduced an overview of robot audition
Introduced open source software for robot audition HARK
Introduced deployment of robot audition technologies to robotics and other fields.
28
Honda Research Institute JPTake Home Message
Audio processing is powerful as well as visual sensors, and it is essential to HRI and HMI. Robot audition is a research field to consider techniques working in various real-world scenes. When you are interested in robot audition, please join us (we continuously have sessions in IROS), and try to use HARK.
http://www.hark.jp/
29/60