Instant Speech Translation - 10BM60080


  • 8/8/2019 Instant Speech Translation - 10BM60080


    INSTANT SPEECH TRANSLATION

    By SATHIYASEELAN M

    10BM60080

    I Year M.B.A

    VGSOM, IIT Kharagpur


    Index

    1. Abstract
    2. Instant Speech Translation Eliminating Language Barriers
    3. System Requirements
       3.1. Speech Recognition
       3.2. Language Parsing
       3.3. Translation
    4. Applications and their Business Potential
       4.1. Mobile Applications and Services
       4.2. Voice Interface Devices with Local Language support
       4.3. Data Entry Applications in Multiple Languages
       4.4. e-Learning
       4.5. Business Applications
    5. Key Players
    6. Challenges Ahead
    7. Conclusion
    8. References


    1. Abstract

    With the current pace of globalization, every industry needs to look beyond geographical borders. Indian IT firms serve Japanese and Korean clients, among others, and invest heavily in foreign-language training programs. An application that provides instant translation would not only cut these costs but also help in gathering requirements more precisely and in a shorter span of time. Instant speech translation (IST) has wide applications in other industries as well. In a country like India, where numerous vernacular languages are in use, IST can be used in many ways in day-to-day life. There is huge potential for IST applications on mobile phones. Major players such as Google, Microsoft and IBM have already built prototypes of such applications; Google Translate is one early, primitive example, and many more such applications are likely to reach our gadgets soon. This paper elaborates on a few such applications and their business potential.

    2. Instant Speech Translation Eliminating Language Barriers

    The Internet and mobile services have reached even remote villages, and rural markets are now considered significant in countries like China and India. Breaking language barriers will open up these markets further for international business. Knowledge, wherever it exists and in whatever form, should be used for the growth of humanity, and we should create opportunities for those who want to learn and share knowledge in their own native languages. Instant speech translation will create a platform for them, and could bring to light much that is currently unknown to the wider world.

    In The Hitchhiker's Guide to the Galaxy, the Babel fish, a fictitious creature, performs instant translation when placed in the ear. Imagine a mobile application that does the same: if I call a person in Japan and speak to him in English, the application translates my speech into Japanese, which is then transmitted through the telecom service provider. Such an application would eliminate language boundaries and create a truly connected world.

    3. System Requirements

    We think speech-to-speech translation should be possible and work reasonably well in a few years' time. Clearly, for it to work smoothly, you need a combination of high-accuracy machine translation and high-accuracy voice recognition, and that's what we're working on. If you look at the progress in machine translation and corresponding advances in voice recognition, there has been huge progress recently.

    - Franz Och, Google's head of translation services

    To develop an instant speech translation application, we need robust speech recognition and machine translation systems. The following figure depicts the basic blocks of an instant speech translation system.


    Fig. Basic Functional Blocks of Instant Speech Translation
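    The functional blocks in the figure can be read as a chain of stages in which each block consumes the previous block's output. The sketch below is a minimal illustration under stated assumptions: the `ISTPipeline` class and stage signatures are hypothetical, a fourth speech-synthesis stage is assumed so that the output is speech again, and the stand-in stages in the usage lines are placeholders rather than real engines.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical stage signatures: each functional block is modelled as a plain function.
Recognizer = Callable[[bytes], str]        # audio -> source-language text
Parser = Callable[[str], List[str]]        # text -> parsed token stream
Translator = Callable[[List[str]], str]    # parsed stream -> target-language text
Synthesizer = Callable[[str], bytes]       # target text -> audio (assumed output stage)


@dataclass
class ISTPipeline:
    recognize: Recognizer
    parse: Parser
    translate: Translator
    synthesize: Synthesizer

    def run(self, audio: bytes) -> bytes:
        """Pass one utterance through every block, in order."""
        text = self.recognize(audio)
        tokens = self.parse(text)
        translated = self.translate(tokens)
        return self.synthesize(translated)


if __name__ == "__main__":
    # Placeholder stages: the "audio" is just UTF-8 text so the data flow can be traced.
    toy_lexicon = {"hello": "namaste"}
    pipeline = ISTPipeline(
        recognize=lambda audio: audio.decode("utf-8"),
        parse=lambda text: text.lower().split(),
        translate=lambda tokens: " ".join(toy_lexicon.get(t, t) for t in tokens),
        synthesize=lambda text: text.encode("utf-8"),
    )
    print(pipeline.run(b"Hello"))
```

    Keeping each block behind a plain function boundary means a better recognizer or translator can be swapped in without touching the other stages.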

    3.1. Speech Recognition

    Speech-recognition and dictation technology has made stunning leaps forward in recent years, although it is still not perfect. The word error rate (WER) has come down drastically in the recent past.

    Fig. Word Error Rate of Speech Recognition Systems over Years

    Source: Communications of the ACM, http://cacm.acm.org/
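    WER is the standard measure behind such comparisons: WER = (S + D + I) / N, where S, D and I are the word substitutions, deletions and insertions in the minimum-edit alignment between the reference transcript and the recognised text, and N is the number of words in the reference. The minimal sketch below computes it with a word-level Levenshtein (edit distance) table; the function name is illustrative.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N via a word-level Levenshtein alignment."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # d[i][j] = fewest edits turning the first i reference words into the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            mismatch = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j - 1] + mismatch,  # match or substitution
                d[i - 1][j] + 1,             # deletion
                d[i][j - 1] + 1,             # insertion
            )
    return d[len(ref)][len(hyp)] / len(ref)
```

    For example, recognising "the cat sat on the mat" as "the cat sit on mat" costs one substitution and one deletion, so WER = 2 / 6. Because insertions are counted, WER can exceed 1 for very poor output.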


    Speech recognition has reached a good level of usability, and there has been a sudden surge in speech-controlled devices. Even Microsoft Windows Vista had speech recognition capabilities; the feature turned out to be a failure, although basic commands did work in it. A system that merely listens and guesses is not going to take this forward.

    Robust speech recognition technology is a crucial part of instant speech translation. The main problem such systems face is understanding the nuances of a user's enunciation and voice patterns. A system used by the same person over a period of time could learn these patterns and so reduce its recognition error rate. Mobile phones have an advantage over other gadgets here, because a mobile phone is mostly used by only one person, and used constantly. Mobile phones may therefore soon recognise a user's natural, free-style speech. A speech recognition system can be customised to a particular user by having the user utter a predefined set of commands or words, which helps the system learn its owner's voice patterns. In the early stages of development, this sort of customisation could be done with the help of a professional.
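    Customising a recogniser with a predefined set of spoken commands is classically done as speaker-dependent template matching with dynamic time warping (DTW), which compares utterances spoken at different speeds. The minimal sketch below assumes feature extraction (for example MFCCs) has already turned each utterance into a sequence of feature vectors; `SpeakerTemplates`, `enroll` and `recognize` are hypothetical names, and modern recognisers use statistical acoustic models instead.

```python
import math
from typing import Dict, List, Sequence

Frame = Sequence[float]      # one feature vector (assumed already extracted)
Utterance = Sequence[Frame]  # a time-ordered sequence of frames


def dtw_distance(a: Utterance, b: Utterance) -> float:
    """Dynamic time warping cost between two frame sequences (Euclidean frame distance)."""
    inf = float("inf")
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            local = math.dist(a[i - 1], b[j - 1])
            # Either advance both sequences, or stretch one of them in time.
            cost[i][j] = local + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    return cost[len(a)][len(b)]


class SpeakerTemplates:
    """Command templates recorded from one user during a hypothetical enrolment session."""

    def __init__(self) -> None:
        self.templates: Dict[str, List[Utterance]] = {}

    def enroll(self, command: str, utterance: Utterance) -> None:
        self.templates.setdefault(command, []).append(utterance)

    def recognize(self, utterance: Utterance) -> str:
        """Return the enrolled command whose closest template has the lowest DTW cost."""
        if not self.templates:
            raise ValueError("no commands enrolled yet")
        return min(
            self.templates,
            key=lambda cmd: min(dtw_distance(utterance, t) for t in self.templates[cmd]),
        )
```

    Enrolling several recordings per command makes matching more tolerant of the natural variation in how the same user says the same word.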

    3.2. Language Parsing

    Human sentences cannot be parsed by programs as easily as mathematical expressions, because there is substantial ambiguity in the structure of human language. Some linguistic analysis is needed to extract the relevant information. The language parser splits the raw text into word units, selects the correct form and class for each word that can have more than one interpretation, and identifies the head words of the sentence. The parser's analysis is then passed to the machine translation engine for further processing.
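    The three parser tasks named above (splitting text into word units, choosing a class for ambiguous words, and finding the head word) can be illustrated with a deliberately tiny rule-based sketch. The lexicon, the two disambiguation rules and the choice of the first verb as the sentence head are simplifying assumptions made for illustration; practical parsers rely on grammar-based or statistical models.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical toy lexicon: each word lists the classes it can belong to.
LEXICON: Dict[str, List[str]] = {
    "the": ["DET"], "a": ["DET"], "i": ["PRON"], "want": ["VERB"], "to": ["PART"],
    "book": ["NOUN", "VERB"], "train": ["NOUN", "VERB"], "ticket": ["NOUN"],
}


def tokenize(text: str) -> List[str]:
    """Split raw text into lower-case word units, dropping punctuation."""
    return re.findall(r"[a-z]+", text.lower())


def tag(tokens: List[str]) -> List[Tuple[str, str]]:
    """Choose one class per word; ambiguous words are resolved from the previous class."""
    tagged: List[Tuple[str, str]] = []
    for word in tokens:
        options = LEXICON.get(word, ["NOUN"])  # assumption: unknown words are nouns
        if len(options) > 1:
            prev = tagged[-1][1] if tagged else None
            if prev == "DET" and "NOUN" in options:
                options = ["NOUN"]   # "the book" -> noun
            elif prev == "PART" and "VERB" in options:
                options = ["VERB"]   # "to book" -> verb
        tagged.append((word, options[0]))
    return tagged


def head_word(tagged: List[Tuple[str, str]]) -> Optional[str]:
    """Treat the first verb as the head of the sentence (a simplification)."""
    return next((w for w, cls in tagged if cls == "VERB"), None)
```

    With these rules, "book" is tagged as a verb in "I want to book a ticket" but as a noun in "the book", which is exactly the kind of ambiguity the parser must settle before translation.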

    A set of protocols should be defined for communication between different languages. For example, Indian languages generally use a SUBJECT-OBJECT-VERB pattern, whereas English generally uses SUBJECT-VERB-OBJECT. The role of the language parser is to provide a parsed language stream that translators can interpret easily.
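    One way to picture such a protocol is as a table of constituent orders into which the parser's output is rearranged. The minimal sketch below assumes the parser has already labelled the subject, verb and object; the `WORD_ORDER` table and `reorder` function are hypothetical names.

```python
from typing import Dict, List

# Hypothetical "protocol" table: the usual constituent order for each language.
WORD_ORDER: Dict[str, List[str]] = {
    "en": ["SUBJECT", "VERB", "OBJECT"],  # English: SVO
    "hi": ["SUBJECT", "OBJECT", "VERB"],  # Hindi: SOV
    "ta": ["SUBJECT", "OBJECT", "VERB"],  # Tamil: SOV
}


def reorder(constituents: Dict[str, str], target_lang: str) -> List[str]:
    """Arrange labelled constituents in the target language's usual order.

    Roles missing from the parse (e.g. no object in an intransitive sentence) are skipped.
    """
    order = WORD_ORDER[target_lang]
    return [constituents[role] for role in order if role in constituents]
```

    For instance, `reorder({"SUBJECT": "Ram", "VERB": "eats", "OBJECT": "a mango"}, "hi")` yields `["Ram", "a mango", "eats"]`, the order a Hindi sentence would follow before the words themselves are translated.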

    3.3. Translation

    The machine translator translates the parsed input language stream into a well-defined output language stream. The translation abides by the set of protocols defined for communication between the languages involved.
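    A very small transfer-style sketch of this step is shown below. It assumes the parsed stream has already been arranged in the target word order, looks each unit up in a bilingual lexicon, and passes unknown units through unchanged while recording them, for example so they can be flagged for user correction. The toy English-to-Hindi (romanised) lexicon and the `transfer` function are illustrative assumptions; practical systems use far richer models.

```python
from typing import Dict, List, Tuple

# Toy bilingual lexicon (English -> romanised Hindi); entries are illustrative only.
LEXICON_EN_HI: Dict[str, str] = {
    "ram": "Ram",
    "mango": "aam",
    "eats": "khaata hai",
}


def transfer(units: List[str], lexicon: Dict[str, str]) -> Tuple[str, List[str]]:
    """Translate each unit; return the output sentence and the units left untranslated."""
    output: List[str] = []
    unknown: List[str] = []
    for unit in units:
        translated = lexicon.get(unit.lower())
        if translated is None:
            unknown.append(unit)
            output.append(unit)  # pass the unknown unit through unchanged
        else:
            output.append(translated)
    return " ".join(output), unknown
```

    Given the SOV-ordered stream `["Ram", "mango", "eats"]`, this returns `("Ram aam khaata hai", [])`; reporting unknown units separately keeps gaps in the lexicon visible instead of silently dropping words.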


    Fig. A Model of IST Services on mobile

    IST as a product:

    These services can also be packaged into a product. However, an application that supports almost perfect translation will be heavy, so in the initial stages, language packs chosen by the user can be packaged into a product and sold.

    Fig. Users interacting through an IST application on mobile

    The service model will suit Indian languages, while the product model will suit international languages such as Japanese. The service model will help these applications spread widely and will also bring various players into the market.
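    The two delivery models can coexist in one client: language packs bought as a product run on the device, and every other language pair is sent to the service. The routing rule below, including the `TranslationRouter` name and the language codes, is a hypothetical sketch of that idea rather than a description of any existing application.

```python
from dataclasses import dataclass, field
from typing import Set, Tuple

LanguagePair = Tuple[str, str]  # (source, target), e.g. ("en", "ja")


@dataclass
class TranslationRouter:
    installed_packs: Set[LanguagePair] = field(default_factory=set)

    def install_pack(self, source: str, target: str) -> None:
        """Product model: a purchased pack makes this language pair available offline."""
        self.installed_packs.add((source, target))

    def route(self, source: str, target: str, online: bool) -> str:
        """Decide where a translation request for this language pair is handled."""
        if (source, target) in self.installed_packs:
            return "on-device pack"
        if online:
            return "cloud service"  # service model, e.g. via the telecom operator
        return "unavailable"
```

    Preferring an installed pack keeps frequently used pairs working without network access, while the service covers the long tail of language pairs.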


    IST applications can also be used on other gadgets such as the iPod and iPad. A few basic applications are already available in the App Store, for example Jibbigo Voice Translation.

    Fig. Screenshot of Jibbigo Application on iPod

    IST Development Standards

    To make development and learning easier, a set of standards needs to be established, similar to HTML in web design. Just as XML and JSON are used for machine-readable data sharing, VoiceXML (Voice Extensible Markup Language) can be used for these types of applications.
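    To show what such a standard looks like in practice, the sketch below builds a minimal VoiceXML 2.1 dialogue with Python's standard `xml.etree.ElementTree`: one form containing one field, a spoken prompt, and a reference to an external SRGS speech grammar listing the accepted answers. The form id, field name, prompt wording, language tag and grammar file name are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"


def build_vxml_form(prompt_text: str, field_name: str, grammar_src: str, lang: str = "en-IN") -> str:
    """Return a minimal VoiceXML document that asks the caller one question."""
    root = ET.Element("vxml", {"version": "2.1", "xmlns": VXML_NS, "xml:lang": lang})
    form = ET.SubElement(root, "form", {"id": "enquiry"})
    fld = ET.SubElement(form, "field", {"name": field_name})
    prompt = ET.SubElement(fld, "prompt")
    prompt.text = prompt_text
    # The SRGS XML grammar defines which spoken answers the field accepts.
    ET.SubElement(fld, "grammar", {"src": grammar_src, "type": "application/srgs+xml"})
    return ET.tostring(root, encoding="unicode")
```

    A call such as `build_vxml_form("Which station are you travelling to?", "destination", "stations.grxml")` produces a document a VoiceXML interpreter could run; for a local-language service, the prompt, grammar and `lang` tag would be in that language.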

    4.2. Voice Interface Devices with Local Language support

    Voice interface devices that support local languages will soon be in use. For example, local residents could interact with a railway information kiosk by speaking in their own language. Instant speech translation will play a vital role in such interfaces, with IST applications at the front end of these devices. Resolving a query this way will also take less time than with a traditional key-entry enquiry system. Most voice-driven applications currently support English, as does the Windows 7 operating system. An IST application at the front end can translate local-language speech input into English, which can then be processed by the speech recognition systems supported by various operating systems.


    Fig. Various blocks in a railway information kiosk that supports regional languages through speech
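    The "Command / Query Generator" block of the kiosk can be pictured as a step that turns the English text produced by the IST front end into a structured query for the kiosk's existing back end. The keyword rules, station list and `KioskQuery` fields below are hypothetical and only show where this block sits; a deployed kiosk would need a proper language-understanding component.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical station list and keyword rules for one kiosk.
STATIONS = {"chennai", "delhi", "kharagpur", "kolkata", "mumbai"}
INTENT_KEYWORDS = {  # checked in this order (dicts keep insertion order)
    "platform": ("platform",),
    "fare": ("fare", "cost", "price", "ticket"),
    "departure_time": ("when", "time", "leave", "leaves", "depart", "departs"),
}


@dataclass
class KioskQuery:
    intent: str
    destination: Optional[str]


def generate_query(english_text: str) -> Optional[KioskQuery]:
    """Map English text from the IST front end to a structured kiosk query."""
    words = re.findall(r"[a-z]+", english_text.lower())
    intent = next(
        (name for name, keys in INTENT_KEYWORDS.items() if any(k in words for k in keys)),
        None,
    )
    if intent is None:
        return None  # not understood: the kiosk could ask the user to repeat
    destination = next((w for w in words if w in STATIONS), None)
    return KioskQuery(intent, destination)
```

    Because the back end only ever sees `KioskQuery` objects, supporting a new regional language means improving the IST front end, not rewriting the kiosk's normal processing.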

    4.3. Data Entry Applications in Multiple Languages

    IST applications can help with data entry in multiple languages, for instance by assisting in translating legal documents into various languages. We have witnessed many court proceedings delayed by the lack of documents in regional languages, and our government also invests heavily in translating documents into regional languages. In the years to come, Microsoft Word may offer options to view translated versions of a document while typing. This could cut down the cost and time involved in such activities.
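    Showing a translated version while the user types raises a practical issue: the text changes on every keystroke, but most sentences do not. A common remedy is a translation memory that stores sentences already translated and sends only new or edited sentences to the translation engine. The sketch below assumes sentences end with '.', '?' or '!' and takes the engine as a plain function; `LiveTranslator` is a hypothetical name.

```python
import re
from typing import Callable, Dict, List


class LiveTranslator:
    """Re-translate a document as it is typed, reusing earlier sentence translations."""

    def __init__(self, engine: Callable[[str], str]) -> None:
        self.engine = engine              # any sentence-level translation function
        self.memory: Dict[str, str] = {}  # translation memory: source sentence -> translation

    def update(self, document: str) -> List[str]:
        """Translate every completed sentence; an unfinished trailing sentence is skipped."""
        # Toy boundary rule (assumption): a sentence ends with '.', '?' or '!'.
        sentences = re.findall(r"[^.?!]+[.?!]", document)
        result: List[str] = []
        for sentence in (s.strip() for s in sentences):
            if sentence not in self.memory:
                self.memory[sentence] = self.engine(sentence)
            result.append(self.memory[sentence])
        return result
```

    Since unchanged sentences are served from memory, the engine is called roughly once per sentence rather than once per keystroke, and the same memory can be reused across similar documents.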

    4.4. e-Learning

    Advances in computing and bandwidth have brought the benefits of traditional classroom education into the distance learning environment. IST will take this a step further by removing the language barriers that impede the sharing of ideas and knowledge. The figure below depicts the schema of an e-classroom that uses IST.

    [Figure blocks (railway information kiosk): Local Language Speech Input → IST Applications → English → Command / Query Generator → Normal Processing done in a Railway Information Kiosk]


    Fig. IST Applications supporting Distance Learning in Various Languages

    IST applications could also be used for webcasting in a similar way.
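    In an e-classroom like the one in the figure, one lecturer's speech has to reach students who prefer different languages. A natural design, sketched below under assumed names, is to translate each transcript segment once per language present in the class and deliver the shared result to every student of that language, rather than translating once per student. `distribute_segment` and the language codes are hypothetical, and the translation engine is passed in as a function.

```python
from collections import defaultdict
from typing import Callable, Dict, List


def distribute_segment(
    segment: str,
    student_languages: Dict[str, str],     # student id -> preferred language code
    translate: Callable[[str, str], str],  # (text, target language) -> translated text
    source_lang: str = "en",
) -> Dict[str, str]:
    """Return the text each student should receive for one lecture segment."""
    by_language: Dict[str, List[str]] = defaultdict(list)
    for student, lang in student_languages.items():
        by_language[lang].append(student)

    delivered: Dict[str, str] = {}
    for lang, students in by_language.items():
        # Translate once per language, not once per student.
        text = segment if lang == source_lang else translate(segment, lang)
        for student in students:
            delivered[student] = text
    return delivered
```

    The cost of translation then grows with the number of languages in the class rather than the number of students, which matters for large webcasts.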

    4.5. Business Applications

    IST applications could also help business enterprises interact with customers located across different geographies, and understand customer requirements in a short span of time.

    User contributions to IST applications are crucial. Users can suggest improvements to the translations provided by the application, and regular users who provide valuable suggestions can be given credits. This will encourage local participation, which would ultimately improve the quality of service provided by IST applications.
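    The suggestion-and-credit scheme described above can be modelled with a small amount of state: each proposed correction collects votes from other users, is accepted once it reaches a threshold, and earns its author credits. The vote threshold, credit amount and `SuggestionBoard` name below are illustrative assumptions, not values from this paper.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class SuggestionBoard:
    votes_needed: int = 3        # assumed acceptance threshold
    credit_per_accept: int = 10  # assumed reward for an accepted suggestion
    accepted: Dict[str, str] = field(default_factory=dict)  # source phrase -> translation
    credits: Dict[str, int] = field(default_factory=lambda: defaultdict(int))
    _votes: Dict[Tuple[str, str, str], Set[str]] = field(default_factory=dict)

    def vote(self, author: str, source: str, suggestion: str, voter: str) -> bool:
        """Record a vote for author's suggestion; return True when it gets accepted."""
        if voter == author:
            return False  # authors cannot vote for their own suggestions
        voters = self._votes.setdefault((author, source, suggestion), set())
        voters.add(voter)  # a set ignores repeat votes from the same user
        if len(voters) == self.votes_needed and self.accepted.get(source) != suggestion:
            self.accepted[source] = suggestion
            self.credits[author] += self.credit_per_accept
            return True
        return False
```

    Requiring votes from distinct users, and crediting a suggestion only at the moment it is accepted, limits how easily the credit system can be gamed.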

    The applications of IST discussed here are just the tip of the iceberg. Many more will appear once IST applications are usable in real time, and IST could then be extended to sensitive areas such as health care and defence.


    5. Key Players

    Google was the first company to announce that it was working on speech-to-speech translation for mobile phones. The latest Android app that supports translation is Babylon, which gives dictionary results in 75 languages as well as full-text translations in over 12 languages. Apple is working with IBM to roll out a speech-to-speech translator for the iPhone; the two companies are already working closely on a few applications that will run on the iPhone and iPad.

    IBM has been working on translation software and machine translation for years. In fact, it created MASTOR and the statistical machine translation (SMT) technology that many other translation applications use.
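    Statistical machine translation, pioneered by researchers at IBM, learns word-translation probabilities from parallel text instead of hand-written rules. Its best-known starting point is IBM Model 1, trained with expectation-maximisation (EM). The sketch below is a minimal version of that algorithm on a three-sentence toy German-English corpus; it omits the NULL word and everything beyond word-translation probabilities, and the function names are illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (foreign word, English word)


def train_ibm_model1(corpus: List[Tuple[str, str]], iterations: int = 10) -> Dict[Pair, float]:
    """Estimate t(f | e): the probability that English word e translates as foreign word f."""
    pairs = [(f.lower().split(), e.lower().split()) for f, e in corpus]
    foreign_vocab = {w for fs, _ in pairs for w in fs}
    uniform = 1.0 / len(foreign_vocab)
    t: Dict[Pair, float] = defaultdict(lambda: uniform)  # start from uniform probabilities

    for _ in range(iterations):
        count: Dict[Pair, float] = defaultdict(float)
        total: Dict[str, float] = defaultdict(float)
        # E-step: share each foreign word's unit count among the English words it may align to.
        for fs, es in pairs:
            for f in fs:
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    fraction = t[(f, e)] / norm
                    count[(f, e)] += fraction
                    total[e] += fraction
        # M-step: renormalise the expected counts into probabilities.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t


def best_translation(t: Dict[Pair, float], english_word: str) -> str:
    """Most probable foreign word for an English word under the trained table."""
    candidates = {f: p for (f, e), p in t.items() if e == english_word}
    return max(candidates, key=candidates.get)


if __name__ == "__main__":
    corpus = [("das Haus", "the house"), ("das Buch", "the book"), ("ein Buch", "a book")]
    table = train_ibm_model1(corpus)
    for word in ("the", "house", "book"):
        print(word, "->", best_translation(table, word))
```

    Although "Haus" and "das" always appear together with "house", EM separates them because "das" is better explained by "the", which it also accompanies in another sentence pair.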

    Microsoft has built-in speech recognition support in its operating systems. It recently demonstrated German-English translation of a conversation between two Microsoft employees, but has made no official announcement of projects on instant speech translation.

    Videos of instant speech translation applications from other major players such as AT&T, NEC and ATR can be found on YouTube. Nespole, Babylon, Verbmobil and MATRIX are a few well-known speech translation systems in this field, and extensive research projects are under way to improve the usability of speech translation systems. PDA manufacturers could collaborate with these application developers to accelerate these projects, which would also help them gain an upper hand over their competitors.

    6. Challenges Ahead

    Only a system that works well in real-time environments will be successful in the long run. Numerous hurdles must be crossed to reach a perfect real-time IST. One is highly accurate speech recognition, which depends heavily on the quality of the input speech: acoustic degradation caused by additive noise is an obstacle to reaching the desired accuracy. In real use, a user will not be operating an IST application in a noise-free environment, so the application should be intelligent enough to separate the user's voice from the noise in the environment.
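    Separating a user's voice from background noise is a hard signal-processing problem in its own right, but a first step most systems need is deciding which stretches of audio contain speech at all. The sketch below shows energy-based voice activity detection, a simpler technique than true noise separation: it estimates the noise floor from the first few frames (assumed to contain no speech) and flags frames whose energy rises well above it. The frame length, number of noise frames and 10 dB threshold are illustrative assumptions.

```python
import math
from typing import List, Sequence


def frame_energies(samples: Sequence[float], frame_len: int) -> List[float]:
    """Mean squared amplitude of each complete, non-overlapping frame."""
    return [
        sum(x * x for x in samples[start:start + frame_len]) / frame_len
        for start in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def detect_speech(
    samples: Sequence[float],
    frame_len: int = 160,        # 20 ms at an assumed 8 kHz sampling rate
    noise_frames: int = 5,       # leading frames assumed to be noise only
    threshold_db: float = 10.0,  # how far above the noise floor counts as speech
) -> List[bool]:
    """Flag each frame as speech (True) or background noise (False)."""
    energies = frame_energies(samples, frame_len)
    if len(energies) < noise_frames:
        raise ValueError("signal too short to estimate the noise floor")
    noise_floor = max(sum(energies[:noise_frames]) / noise_frames, 1e-12)
    return [10 * math.log10(max(e, 1e-12) / noise_floor) > threshold_db for e in energies]
```

    A fixed threshold over a fixed noise estimate is fragile when the background changes; practical systems keep updating the noise floor during non-speech frames.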

    In the future, IST applications are also expected to be intelligent enough to capture the user's mood. A monotonous voice from an IST application will soon make users bored with it, while even a customisable voice would make these applications more expressive and friendly. Adding phonemes to the computerised voice will bring it nearer to a human voice.
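    A customisable, less monotonous voice can already be requested from many speech synthesisers through the W3C Speech Synthesis Markup Language (SSML), whose `<prosody>` element adjusts pitch, speaking rate and volume. The mapping from a detected mood to prosody settings below is a hypothetical illustration built with Python's standard `xml.etree.ElementTree`; mood detection itself is not shown, and how faithfully a given synthesiser honours these settings varies.

```python
import xml.etree.ElementTree as ET
from typing import Dict

SSML_NS = "http://www.w3.org/2001/10/synthesis"

# Hypothetical mood-to-prosody table using SSML's predefined label values.
MOOD_PROSODY: Dict[str, Dict[str, str]] = {
    "neutral": {"pitch": "medium", "rate": "medium", "volume": "medium"},
    "cheerful": {"pitch": "high", "rate": "fast", "volume": "loud"},
    "calm": {"pitch": "low", "rate": "slow", "volume": "soft"},
}


def to_ssml(text: str, mood: str = "neutral", lang: str = "en-IN") -> str:
    """Wrap translated text in SSML so the synthesiser varies its delivery by mood."""
    speak = ET.Element("speak", {"version": "1.1", "xmlns": SSML_NS, "xml:lang": lang})
    settings = MOOD_PROSODY.get(mood, MOOD_PROSODY["neutral"])  # unknown moods fall back
    prosody = ET.SubElement(speak, "prosody", settings)
    prosody.text = text
    return ET.tostring(speak, encoding="unicode")
```

    Keeping the mood table separate from the synthesis call lets users customise the voice without changing the translation pipeline.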


    Industry should collaborate with research communities to overcome these hurdles and achieve human-like performance.

    7. Conclusion

    Speech and text translation applications are being used in a variety of forms on a number of devices. To attain human-like performance, we must continue to invest in research, and other sensory user inputs can be integrated with IST applications alongside speech. Once that is achieved, instant speech translation will soon spread to devices such as the TV. It would not be a surprise if text on the web were replaced by audio and video in the future glocal world.


    8. References

    1. Ivan Ho, Hajime Kiyohara, Akira Sugimoto and Kazuo Yana, "Enhancing Global and Synchronous Distance Learning and Teaching by Using Instant Transcript and Translation", Hosei University Research Institute, California.

    2. http://mashable.com/2010/02/08/speech-to-speech/

    3. http://domino.research.ibm.com/comm/research.nsf/pages/r.uit.innovation.html

    4. http://technology.timesonline.co.uk/tol/news/tech_and_web/personal_tech/article7017831.ece

    5. http://blog.gts-translation.com/2010/03/02/microsoft-demos-speech-to-speech-translator/

    6. http://www.jibbigo.com/website/index.php

    7. http://cacm.acm.org/magazines/2004/1/6588-challenges-in-adopting-speech-recognition