


Language Technology at MILE Lab

Ramakrishnan Angarai Ganesan

MILE Laboratory, Department of Electrical Engineering
Indian Institute of Science, Bangalore, India

[email protected]

Philosophy of MILE

⇒ Research relevant to people and life around us.
⇒ No download of research topics, data or code!
⇒ Having chosen to work on an applied area, we deal with whatever is needed to reach the goal.
⇒ All the data we use have been collected by us: India has a huge population, so there is no dearth of opportunity for creating standard databases.

Deployment of our OCRs

⇒ Using MILE Tamil OCR (Tamil Gnani), Worth Trust, Chennai digitized 600+ Tamil books; the Braille books are used by 100s of students.
⇒ Kannada school & college books, digitized using our Kannada OCR, are available to all the blind schools as audio books.
⇒ Many organizations are using our OCRs to digitize old books, which are now out of print.
⇒ Many blind individuals use our OCR & TTS.
⇒ Manthan Award (South East Asia and Asia Pacific) 2014 - e-inclusion & accessibility category.

Deployment of our TTS

⇒ Uses our DCT-based prosody modification.
⇒ Ranked second in Blizzard TTS Challenge 2013.
⇒ Gives different output each time for the same text.
⇒ Using MILE Tamil TTS (Thirukkural), Anna Centenary Library, Chennai sends voice messages to 1000+ blind members around Tamil Nadu.
⇒ Using our Kannada TTS, Kannada school books are available as audio books on multiple platforms (.mp3, iTunes, etc.).
⇒ Manthan Award, 2015 - e-education category.
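The poster does not spell out how the DCT-based prosody modification works. As a rough illustration only, the sketch below shows one generic way a DCT can change the length of a short signal segment (for example, one pitch period): transform, truncate or zero-pad the coefficients, and invert at the new length. The function names `dct_ii`, `idct_ii` and `resample_via_dct` are hypothetical, and this should not be read as the lab's actual implementation.

```python
import math


def dct_ii(x):
    """Orthonormal DCT-II of a real sequence (direct O(n^2) form)."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        total = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        out.append(scale * total)
    return out


def idct_ii(c):
    """Inverse of dct_ii (orthonormal DCT-III)."""
    n = len(c)
    out = []
    for i in range(n):
        total = c[0] * math.sqrt(1.0 / n)
        for k in range(1, n):
            total += c[k] * math.sqrt(2.0 / n) * math.cos(math.pi * (i + 0.5) * k / n)
        out.append(total)
    return out


def resample_via_dct(segment, new_length):
    """Stretch or shrink `segment` to `new_length` samples in the DCT domain.

    Assumption: zero-padding lengthens the segment, truncation shortens it,
    and a sqrt(new_length / n) gain keeps the amplitude unchanged.
    """
    n = len(segment)
    coeffs = dct_ii(segment)
    if new_length >= n:
        coeffs = coeffs + [0.0] * (new_length - n)
    else:
        coeffs = coeffs[:new_length]
    gain = math.sqrt(new_length / n)
    return idct_ii([gain * c for c in coeffs])
```

Lengthening a segment this way lowers its local pitch and shortening raises it; a complete system would also have to decide which segments to modify and how to join them, which this sketch leaves out.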

Camera Captured Document Analysis and Recognition

• Text extraction from scene images
• Segmentation of coloured scene word images
• Recognition of the segmented word images
• Translation/transliteration of the words into the target language/script.
• Text to speech conversion of the words
• Top positions in ICDAR 2011, 2013, 2015, 2017 Robust Reading Competition – word recognition.

Free tools from MILE lab

• Read web text in your script
• Tool for typing in any Indian script using a QWERTY keyboard - using any one of many keyboard mappings - on Linux & Windows.
• Recognition of any one of 11 scripts at the word level from a multilingual document.
• Recognition of online handwritten documents in Tamil, Kannada, Hindi.
• Enhancement of binary, low-resolution, scanned document images using superresolution techniques, increasing OCR accuracy & readability.
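To make the idea of typing an Indian script through a QWERTY keyboard mapping concrete, here is a minimal sketch of a table-driven transliterator. The keymap is a tiny, hypothetical ITRANS-like subset for Devanagari chosen for illustration; it is not the mapping used by the MILE tool, and the names `DEVANAGARI_KEYMAP`, `longest_key` and `transliterate` are assumptions.

```python
# Hypothetical, deliberately tiny keymap (illustration only).
DEVANAGARI_KEYMAP = {
    "consonants": {"k": "\u0915", "n": "\u0928", "m": "\u092E", "l": "\u0932"},
    "vowels": {"a": "\u0905", "aa": "\u0906", "i": "\u0907", "u": "\u0909"},
    # Dependent vowel signs; the inherent vowel "a" needs no sign.
    "vowel_signs": {"a": "", "aa": "\u093E", "i": "\u093F", "u": "\u0941"},
    "virama": "\u094D",
}


def longest_key(text, pos, table):
    """Return the longest key of `table` found at text[pos:], or None."""
    for size in range(max(map(len, table)), 0, -1):
        key = text[pos:pos + size]
        if key in table:
            return key
    return None


def transliterate(text, keymap=DEVANAGARI_KEYMAP):
    """Convert Latin keystrokes to script characters using `keymap`."""
    out = []
    pos = 0
    while pos < len(text):
        cons = longest_key(text, pos, keymap["consonants"])
        if cons:
            out.append(keymap["consonants"][cons])
            pos += len(cons)
            vowel = longest_key(text, pos, keymap["vowel_signs"])
            if vowel:
                out.append(keymap["vowel_signs"][vowel])
                pos += len(vowel)
            else:
                # No vowel typed: suppress the inherent vowel.
                out.append(keymap["virama"])
            continue
        vowel = longest_key(text, pos, keymap["vowels"])
        if vowel:
            out.append(keymap["vowels"][vowel])
            pos += len(vowel)
        else:
            out.append(text[pos])  # pass unknown characters through
            pos += 1
    return "".join(out)
```

Because the mapping is plain data, supporting another script or another keyboard convention in this sketch only means passing a different `keymap` dictionary; the scanning logic stays the same.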

ASR of code-mixed speech

⇒ Working on recognition of Hindi, Kannada & Tamil speech, including Hinglish.
⇒ Tamil & Kannada are morphologically rich; each verb root gives rise to 1000s of derived words; trying sub-words as units/grams.

1 The author thanks the Tata Trust Travel Grant for funding to participate in this conference.
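The motivation for sub-word units in the ASR section can be illustrated with a toy count. In the sketch below, word forms are built as root + tense marker + person marker, so the word vocabulary grows multiplicatively while the sub-word inventory grows only additively. The romanized strings are made-up stand-ins, not real Tamil or Kannada morphology, and `segment` is a hypothetical greedy longest-match segmenter, not the lab's method.

```python
from itertools import product

# Made-up romanized pieces for illustration only.
ROOTS = ["padi", "ezhudu", "nada"]
TENSE_MARKERS = ["kkir", "tt", "pp"]
PERSON_MARKERS = ["en", "aay", "aan", "aal", "om", "aarkal"]


def segment(word, units):
    """Greedy longest-match split of `word` into known sub-word units.

    Characters not covered by any unit fall back to single-character pieces.
    """
    max_len = max(len(u) for u in units)
    pieces = []
    pos = 0
    while pos < len(word):
        for size in range(min(max_len, len(word) - pos), 0, -1):
            piece = word[pos:pos + size]
            if piece in units or size == 1:
                pieces.append(piece)
                pos += size
                break
    return pieces


units = set(ROOTS) | set(TENSE_MARKERS) | set(PERSON_MARKERS)
words = {r + t + p for r, t, p in product(ROOTS, TENSE_MARKERS, PERSON_MARKERS)}
units_used = {piece for w in words for piece in segment(w, units)}

print(len(words), "word forms from", len(units_used), "sub-word units")
```

With real roots and many more suffix slots, the gap widens quickly, which is why a recognizer whose vocabulary and language model are built over sub-word units can cover far more word forms than a word-level one of the same size.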