rospeex: a cloud-based speech communication toolkit for ROS
![Page 1: rospeex: a cloud-based speech communication toolkit for ROS](https://reader035.fdocuments.in/reader035/viewer/2022081809/557c22ffd8b42a925b8b4bea/html5/thumbnails/1.jpg)
rospeex: A cloud-based speech communication toolkit for ROS
Komei Sugiura
National Institute of Information and Communications Technology, [email protected]
2013/12/13
![Page 2: rospeex: a cloud-based speech communication toolkit for ROS](https://reader035.fdocuments.in/reader035/viewer/2022081809/557c22ffd8b42a925b8b4bea/html5/thumbnails/2.jpg)
ROS (Robot Operating System)
• ROS: middleware for robots
  – Version 1.0 released in 2010
  – Global de facto standard
  – Covers everything from driver and package management to learning and visualization
![Page 3: rospeex: a cloud-based speech communication toolkit for ROS](https://reader035.fdocuments.in/reader035/viewer/2022081809/557c22ffd8b42a925b8b4bea/html5/thumbnails/3.jpg)
Speech communication toolkit for ROS
• ROS compatible
• Speech recognition using VoiceTra engine
• Other functionalities
  – Noise reduction, non-monologue speech synthesis

| | Conventional packages | rospeex |
|---|---|---|
| Speech recognition/synthesis | Sphinx, Festival, Julius (or commercial tools) | VoiceTra engine (or third-party engines) |
| Engine | Stand-alone | Cloud-based |
| Language | Single language | ja, en, zh, ko |
![Page 4: rospeex: a cloud-based speech communication toolkit for ROS](https://reader035.fdocuments.in/reader035/viewer/2022081809/557c22ffd8b42a925b8b4bea/html5/thumbnails/4.jpg)
Position in Cloud Robotics
• Cloud robotics [James Kuffner@Google, 2011]
  – Manipulation using Google Goggles [Kehoe+ 2013]
  – Knowledge sharing based on RoboEarth [Tenorth+ 2012]
  – Speech communication for robots: rospeex

| | Robot middleware incompatible | Robot middleware compatible |
|---|---|---|
| Cloud-based | Commercial systems (Nuance, ToSpeak, AmiVoice Cloud, ..) | rospeex |
| Stand-alone | Many | OpenHRI, HARK, PocketSphinx, Festival |
![Page 5: rospeex: a cloud-based speech communication toolkit for ROS](https://reader035.fdocuments.in/reader035/viewer/2022081809/557c22ffd8b42a925b8b4bea/html5/thumbnails/5.jpg)
Quadrilingual communication using rospeex
![Page 6: rospeex: a cloud-based speech communication toolkit for ROS](https://reader035.fdocuments.in/reader035/viewer/2022081809/557c22ffd8b42a925b8b4bea/html5/thumbnails/6.jpg)
rospeex provides speech recognition/synthesis; the user constructs dialogue processing

[Figure: system architecture. Components by provider:]
• Provided by rospeex: speech module (speech input, VAD, noise reduction, speech recognition, speech synthesis, speech output); speech recognition & synthesis servers
• Provided by the user: dialogue processing; task manager
  – Input from other modules (sensors, recognized objects, etc.)
  – Output to other modules (actuators, learning, etc.)
• Provided by third parties: speech recognition & synthesis servers
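
The division of labour on this slide can be sketched in plain Python. This is only an illustration under stated assumptions: `SpeechModule`, `simple_dialogue`, and `run_turn` are hypothetical names, not part of the rospeex API, and the "audio" is stand-in UTF-8 text so the example runs without ROS or a network connection.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class SpeechModule:
    """Hypothetical stand-in for the rospeex side (recognition and synthesis)."""
    spoken: List[str] = field(default_factory=list)

    def recognize(self, audio: bytes) -> str:
        # A real speech module would apply VAD and noise reduction and then
        # query a recognition server. Here the "audio" is just UTF-8 text.
        return audio.decode("utf-8")

    def synthesize(self, text: str) -> None:
        # A real speech module would fetch audio from a synthesis server and
        # play it. Here the reply is only recorded.
        self.spoken.append(text)


def simple_dialogue(utterance: str) -> str:
    """User-provided dialogue processing: map recognized text to a reply."""
    if "time" in utterance.lower():
        return "I do not have a clock."
    return "Sorry, I did not understand."


def run_turn(module: SpeechModule,
             dialogue: Callable[[str], str],
             audio: bytes) -> str:
    """One turn: speech input, recognition, dialogue processing, synthesis."""
    text = module.recognize(audio)
    reply = dialogue(text)
    module.synthesize(reply)
    return reply
```

Only `simple_dialogue` corresponds to what the user writes; swapping in a different callable changes the robot's behaviour without touching the speech side.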
![Page 7: rospeex: a cloud-based speech communication toolkit for ROS](https://reader035.fdocuments.in/reader035/viewer/2022081809/557c22ffd8b42a925b8b4bea/html5/thumbnails/7.jpg)
Non-monologue speech synthesis for robots
• Reading-style robot voice
  – Monotonous, unnatural and unfriendly
  – Hard to tell that the robot is asking a question
• Conventional text-to-speech (TTS) systems are not optimized for communication

Voice talent
XIMERA 3 (Text reading)
![Page 8: rospeex: a cloud-based speech communication toolkit for ROS](https://reader035.fdocuments.in/reader035/viewer/2022081809/557c22ffd8b42a925b8b4bea/html5/thumbnails/8.jpg)
Demo
http://komeisugiura.jp/software/nm_tts.html
![Page 9: rospeex: a cloud-based speech communication toolkit for ROS](https://reader035.fdocuments.in/reader035/viewer/2022081809/557c22ffd8b42a925b8b4bea/html5/thumbnails/9.jpg)
Using speech recognition/synthesis without ROS
• Send JSON file to the server
  – Recognition
  – Synthesis
• Sample code (JavaScript, Python, C++) is available
Synthesis
http://rospeex.ucri.jgn-x.jp/nauth_json/jsServices/VoiceTraSS

    { "method": "speak",
      "params": ["ja", "こんにちは", "*", "audio/x-wav"] }

Recognition
http://rospeex.ucri.jgn-x.jp/nauth_json/jsServices/VoiceTraSR

    { "method": "recognize",
      "params": ["ja",
                 { "audio": "base64-encoded wav",
                   "audioType": "audio/x-wav",
                   "voiceType": "*" }] }
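
A minimal Python sketch of building the two JSON payloads above. The field names and URLs are taken from the slide; the function names are hypothetical, and `post_json` assumes the server accepts an HTTP POST with a JSON body, which the slide does not state. The response format is not shown on the slide either, so the raw bytes are returned unparsed.

```python
import base64
import json
import urllib.request

# Endpoint URLs as given on the slide.
SYNTHESIS_URL = "http://rospeex.ucri.jgn-x.jp/nauth_json/jsServices/VoiceTraSS"
RECOGNITION_URL = "http://rospeex.ucri.jgn-x.jp/nauth_json/jsServices/VoiceTraSR"


def build_synthesis_request(text: str, lang: str = "ja") -> str:
    """Build the 'speak' payload: language, text, voice type, audio format."""
    payload = {"method": "speak", "params": [lang, text, "*", "audio/x-wav"]}
    return json.dumps(payload, ensure_ascii=False)


def build_recognition_request(wav_bytes: bytes, lang: str = "ja") -> str:
    """Build the 'recognize' payload with base64-encoded WAV audio."""
    audio_b64 = base64.b64encode(wav_bytes).decode("ascii")
    payload = {
        "method": "recognize",
        "params": [
            lang,
            {"audio": audio_b64, "audioType": "audio/x-wav", "voiceType": "*"},
        ],
    }
    return json.dumps(payload)


def post_json(url: str, body: str, timeout: float = 10.0) -> bytes:
    """Send the payload. Assumption: HTTP POST with a JSON content type."""
    request = urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

For example, `post_json(SYNTHESIS_URL, build_synthesis_request("こんにちは"))` would send the synthesis request shown on the slide, provided the service is reachable and the POST assumption holds.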