Honda Research Institute JP - Rochester Institute of ... fileHonda Research Institute JP 1 Robot...

Honda Research Institute JP

1

Robot audition and its deployment

Kazuhiro Nakadai

Principal Researcher, Honda Research Institute Japan Co. Ltd.Visiting Professor, Tokyo Institute of TechnologyVisiting Professor, Waseda University

2nd Workshop on Alternative Sensing for Robot Perception: Beyond Laser and Vision

Honda Research Institute JPOutline

1. Background of Robot Audition

2. Introduction to Robot Audition Research

3. Open Source Software for Robot Audition

4. Deployment of Robot Audition

5. Summary

2

Honda Research Institute JPBackground

Robot as our partner

Necessity of auditory processing → robot audition

Humanoid robot Interaction with human is expected to be

a partner.

welfare companyHouse keeping News provider

Service, Interaction, Information, Entertainment…

Honda Research Institute JPRobot Audition

When a robot listens to sound with its ears, ….

It should deal with the mixture of sounds.

Ego-noiseMotors, self-voice

Honda Research Institute JPRobot Audition

• Proposed by Prof. Okuno (Kyoto Univ. →Waseda Univ.) and Nakadai at AAAI-2000– http://winne.kuis.kyoto-u.ac.jp/SIG/

• A research field bridging Robotics, AI and Signal processing

• Continuously expanding – Japan: Kyoto Univ., Honda RI, Tokyo Tech., ATR, AIST,

Kumamoto Univ., Waseda Univ.，etc– Europe: CNRS-LAAS (France), INRIA (France), Univ. of Erlangen-

Nuremberg (Germany), Ruhr-Universität Bochum (Germany),ITU (Turkey), Imperial College London (UK), etc

– North America: Sherbrooke Univ.(Canada), MERL (USA), Virginia Tech. (USA), Willow Garage (USA), etc

– Oceania: UTS (Australia)

RobotAudition

http://winne.kuis.kyoto-u.ac.jp/SIG/

Honda Research Institute JPOur Activities for Robot Audition

Organized Sessions on IEEE/RSJ Int’l Conf. on Intelligent Robots and Systems (IROS 2005-2013)* Since 2014, robot audition is registered as an official keyword in IEEE-RAS.

Special Session on IEEE Int’l Conf. on Acoustics, Speech and Signal Processing(ICASSP 2009)@Taipei, Taiwan(ICASSP 2015)@Brisbane, Australia

HARK Tutorial (OSS)France: 2009, 2012, 2013Korea: 2008Japan: once a year since 2008

• Migration to Taxai at Willow Garage 2010 @ Palo Alto, USA• International workshop on Music Robot 2010 @ Taipei, Taiwan






5. Summary

7

Honda Research Institute JPRobot is surrounded by various noises.

Target Speech

Directional noise

Ego-noise such as motion and voice)(near field, loud)

Reverberation(echo)

Diffuse noise(BGN, omni-directional)

Different characteristics → one-by-one approach• Sound Source Separation mainly for directional noise• Dereverberation• Ego-noise suppression

Honda Research Institute JPSound Source Separation

9

Source Input)(x

+

+

Separation Matrix)(W

Output

Separation process

)()()( xWy

Incremental SSS: Update to reduce mixing cost

：step-size parameter

W )(WJ

)(y

Source Separation

)('1 WJWW tt

Separation Matrix


Adaptive Step-size (AS)Newton’s method

Sound Source Separation with Adaptive Step-size control

Fixed step-size: Difficult to adapt to environmental changes like robot motions and moving sources => GHDSS-AS

[IEEE-TSLP Nakajima 10]

10

otearai

Recorded sound

Manually-tuned

Small value

Large value

0 200 400 600 800 1000 1200-50

-40

-30

-20

-10

0

Number of Updates

Leve

l [dB

]

GSS with u = 1

2505001k2k

0 200 400 600 800 1000 1200-50

-40

-30

-20

-10

0

Number of Updates

Leve

l [dB

]

GSS with u = 1

2505001k2k

Fixed μ

Time (# of frames)

Sepa

ratio

n de

pth

Adaptively-controlled μ

Sepa

ratio

n de

pth

Time (# of frames)

SSS

Honda Research Institute JPExperiment with Texai [IEEE ICRA 2011]

Reverberant conference room(RT > 1s), around 20m x 10m.

11

http://www.youtube.com/watch?v=xpjPun7Owxg

Time (frame)

Dire

ction (d

egre

e)

Talker1

Talker2

Talker3

Talker4

Garbage

Recorded

http://www.youtube.com/watch?v=xpjPun7Owxg

Honda Research Institute JPEgo-noise suppression

12

Robot’s voice & motion noise• closer to mics• Higher power

Interactive Dancing Robot

Key ideaRobot knows what it utters and what kind of motions it does.

)(

)()(

100

010,(0,(

),(

),(),(

MfS

fS

fNMHHA

MfS

fS

fY

Known signal

Noise siganalobservation

Known signal(utterance)

Semi-blind ICA⇒barge-in-able robot Template-based ego-motion noise suppression

Posture noise Pos

ture noise

[Neural Computation ‘12, IEEE IROS ‘09-’12]


Mismatch between two blocks

Noise suppression

Automatic speech recognition (ASR)

Missing-Feature-Theory-based Integration [ASRU 07]

Noise Suppression

AutomaticSpeech

Recognition

Noisy/SimultaneousSpeech

Text

Distorted speech

Clean speech, or speech with known noise

Missing Feature Theory (MFT)for better integration

Honda Research Institute JPMissing Feature Theory (MFT)

The features of corrupted sound at time t

Missing features caused by separation

Smallerror

Normal ASR

The features of corrupted sound at time t

MFT-based ASR

i

)(ix

)(ix

i

Missing feature mask (MFM)

An acoustic model stored in ASR

Largeerror

One of the most important issues is automatic MFM generation.

Honda Research Institute JPAn example of automatic generated MFM

left

center

right

spectrogram MFMcaptured 1 (reliable)0 (unreliable)

Arayuru Genjitsu wo …

Isshukan bakari …

Terebi gemu ya pasokon de …

leakage

leakage

speechpass

masked

masked






5. Summary

16

Honda Research Institute JPOpen Source Robot Audition Software HARK

HRI-JP Audition for Robots with Kyoto University

http://www.hark.jp/

hark = listen (old English)Research: Free(Commercial: Licensing)

Sound Source Localization

Sound Source Separation

Automatic Speech Recognition

ArrayDialog

Developing under collaboration between Kyoto Univ., HRI-JP, and Tokyo Tech.

Honda Research Institute JPHistory and Tutorials

1. Apr. 2008, First release (0.1.7)– 1st Tutorial: Nov. 17th, 2008, Kyoto University, Kyoto, Japan– 2nd Tutorial: Dec. 5th, 2008, KIST, Seoul, Korea

2. Nov. 2009, 1.0.0 Pre-release– 3rd Tutorial: Nov. 20th, 2009, Keio University, Yokohama, Japan– 4th Tutorial: Dec. 5th, 2009, Univ. de Pierre et Marie Curie, Paris, France

3. Nov. 2010, Major version-up (1.0.0) – performance, rich documents– 5th Tutorial: Nov. 20th, 2010, Kyoto University, Kyoto, Japan

4. Feb. 2012, Version-up (1.1) – performance, 64bit processing, ROS– 6th Tutorial: Feb. 29th, 2012, Univ. de Pierre et Marie Curie, Paris, France– 7th Tutorial: Mar. 9th, 2012, Nagoya University, Nagoya, Japan

5. Mar. 2013, Version-up (1.7) – Window, Kinect, PSEye– 8th Tutorial: Mar. 19th, 2013, Kyoto University, Kyoto, Japan

6. Oct. 2013, Major Version-up (2.0) – HARKDesigner, Microcone– 9th Tutorial: Oct. 2nd, 2013, LAAS-CNRS, Toulouse, France– 10th Tutorial: Dec. 5th, 2013, Waseda University, Tokyo, Japan

7. Nov. 2014, Version-up (2.1)– 11th Tutorial: Nov. 21th, 2014, Waseda University, Tokyo, Japan

8. Nov., 2015 Version-up (2.2) planned

Honda Research Institute JPFeatures in HARK (1)

GUI programming environment (HARK Designer)– Web-based programming environment

(jQuery, node.js, HTML5)

– Chrome/Safari/Firefox on Linux/Windows/Mac

– Small overhead in module communication (frame-based processing) provided by FlowDesigner [Cote04]

An example of robot audition system with HARKa) Module network

b) Property setting

Honda Research Institute JPFeatures in HARK (2)

Support many multi-channel sound input devices

Advanced signal processing technologies – Localization: GEVD/GSVD [Nakamura’11], 3D localization

– Separation: GHDSS [Nakajima ‘09], HRLE [Nakajima ‘10], etc.

Easy to install– Just use a package management tool “apt-get” !

Rich documentation– Manual and cookbook over 300 pages in Japanese and English

Packages： ROS, OpenCV, Python, …

ALSA supported sound cards (e.g. RME)Kinect (4mics) PlayStation eye

(4mics)

Microcone(7mics)

http://www.synthax.jp/products/multiface_ae/index.html

http://www.synthax.jp/products/multiface_ae/index.html




3. Open Source Software for Robot Audition, HARK


5. Summary

21

Honda Research Institute JPMusical Robot [IEEE IROS 09 workshop on musical robots]

22

Human-Robot Interactionaccording to musical beats• Adaptive beat tracking• HRP2, Nao : Thereminist

Honda Research Institute JPSLAM-based Mic array Calibration

Issues in mic array processing

– Given microphone positions

– Synchronous recording

Measurements are required.Embedded mic array

Special multichannel A/D is necessary.

EKF-SLAM based online mic-array calibration

beforereference afterx [m] x [m] x [m]y [m] y [m] y [m]Av

erag

e en

ergy

Beam-forming output

[Miura et al., IROS 2011] (Best Paper Finalist)

Honda Research Institute JPApplication to UAV (SSL) [IROS 2014]

Reference SEVD-MUSIC iGSVD-MUSIC w/ CMS

Highly noisy sound sources (-15dB) can be localized.

Quadrotor with 16 mics

SSL with iGSVD-MUSIC

Localization Results

Honda Research Institute JPApplication of HARK-Embedded [IROS ‘15]

Sound Source Localization and Visualization• 2D sound source localization using 1D sound source

localization, motion, and gyro information


26

Robot Audition based IVI system [IROS’15]

• Talk button less, highly noise-robust voice recognition, • multi-party dialog with a robot agent• Hybrid of local and cloud services

Honda Research Institute JPSpatio-Temporal Analysis of Frog Chorus

Firefly– Takeshi Mizumoto, Ikkyu Aihara, Takuma Otsuka, Ryu Takeda, Kazuyuki Aihara, Hiroshi G.

Okuno: Sound Imaging of Nocturnal Animal Calls in Their Natural Habita, Journal of Comparative Physiology A, Accepted, 26 Apr. 2011. doi:10.1007/s00359-011-0652-7 (IF 1.8)

Discovery of Three-Group Chorus (=Tri-phase synchronization)– Ikkyu Aihara, Ryu Takeda, Takeshi Mizumoto, Takuma Otsuka, Toru Takahashi, Hiroshi G.

Okuno, Kazuyuki Aihara: Complex and transitive synchronization in a frustrated system of calling frogs, Physical Review E, Vol.83, Issue 3. 031913 (2011) [5 pages], 21 Mar. 2011. doi:10.1103/PhysRevE.83.031913 (IF 2.4)

27

Rice field

Footpath

Oki Island

Honda Research Institute JPSummary

Introduced an overview of robot audition

Introduced open source software for robot audition HARK

Introduced deployment of robot audition technologies to robotics and other fields.

28

Honda Research Institute JPTake Home Message

Audio processing is powerful as well as visual sensors, and it is essential to HRI and HMI. Robot audition is a research field to consider techniques working in various real-world scenes. When you are interested in robot audition, please join us (we continuously have sessions in IROS), and try to use HARK.

http://www.hark.jp/

29/60

Honda Research Institute JP - Rochester Institute of ... fileHonda Research Institute JP 1 Robot...

Documents

Transcript of Honda Research Institute JP - Rochester Institute of ... fileHonda Research Institute JP 1 Robot...