Interlingual word mapping
MTP I Stage Project Presentation
Guided by: Prof. Pushpak Bhattacharyya
Presented by: Abhijeet Padhye
Department of Computer Science and Engineering, Indian Institute of Technology, Bombay
1. Motivation
2. Introduction
3. Introduction to Transliteration
4. Syllables and their structure types
5. Sonority Theory
6. Relation between Sonority and Syllables
7. What is Schwa?
8. A Sonority theory based Syllabification module
9. Results obtained
10. References
Language is an integral part of society.
Each language has its own structure and rules, but some basic concepts are common to all languages.
We are trying to exploit these common concepts for the process of syllabification.
This helps in processes like transliteration, ultimately leading to better Cross-Lingual Information Retrieval (CLIR).
“To study some phonological similarities between English, Hindi and Marathi, and to exploit them to achieve high-accuracy transliteration, so as to tackle problems like out-of-vocabulary (OOV) words in Cross-Lingual Information Retrieval.”
Concepts being emphasized: transliteration, the theory of syllables, sonority theory, the relation between syllables and sonority, and the theory of schwa and schwa deletion.
These are mainly based on the properties of sound, the driving force behind word pronunciation in any language.
Transliteration is the process of phonetically “translating” named entities, such as proper nouns, from a source language into a target language.[1]
The transliteration should be as accurate as possible.
A key difficulty is that the same word can have multiple spelling variants.
“A syllable is a unit of spoken language consisting of a single uninterrupted sound, generally formed by a vowel and preceded or followed by one or more consonants.”
Vowels are the heart of a syllable (the most sonorous element).
Consonants act as sounds attached to the vowels.
A syllable consists of three major parts: Onset (C), Nucleus (V) and Coda (C).
Vowels sit in the nucleus of a syllable; consonants may attach as the onset or the coda.
Basic structure: CV
The nucleus is always present.
The onset and coda may be absent.
Possible structures: V, CV, VC, CVC
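To make the V, CV, VC and CVC templates concrete, here is a minimal Java sketch (Java being the language of the module described later). It works on spelling rather than sound and assumes that the letters a, e, i, o, u form the nucleus and every other letter is a consonant; it also assumes the input is already a single syllable. The class SyllableShape and its methods are hypothetical, not part of the original module.

```java
/** Hypothetical helper: describes one syllable as a C/V template. */
final class SyllableShape {

    // Assumption: the five vowel letters from the sonority slide act as the
    // nucleus; every other letter is treated as a consonant.
    static boolean isVowel(char c) {
        return "aeiou".indexOf(Character.toLowerCase(c)) >= 0;
    }

    /** Maps each letter to C or V, e.g. "cat" gives "CVC". */
    static String cvPattern(String syllable) {
        StringBuilder sb = new StringBuilder();
        for (char c : syllable.toCharArray()) {
            sb.append(isVowel(c) ? 'V' : 'C');
        }
        return sb.toString();
    }

    /** Splits a single syllable into {onset, nucleus, coda}; the nucleus must exist. */
    static String[] split(String syllable) {
        int start = 0;
        while (start < syllable.length() && !isVowel(syllable.charAt(start))) start++;
        if (start == syllable.length()) {
            throw new IllegalArgumentException("no nucleus in: " + syllable);
        }
        int end = start;
        while (end < syllable.length() && isVowel(syllable.charAt(end))) end++;
        return new String[] {
            syllable.substring(0, start),   // onset (may be empty)
            syllable.substring(start, end), // nucleus (always present)
            syllable.substring(end)         // coda (may be empty)
        };
    }

    public static void main(String[] args) {
        for (String s : new String[] {"a", "ma", "at", "cat"}) {
            String[] p = split(s);
            System.out.printf("%-4s %-4s onset='%s' nucleus='%s' coda='%s'%n",
                    s, cvPattern(s), p[0], p[1], p[2]);
        }
    }
}
```

The four sample syllables produce the four templates V, CV, VC and CVC, with an empty onset or coda wherever the template lacks one.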
Theories of the syllable:
Prominence theory: e.g. entertaining /entəteɪnɪŋ/ has the peaks of prominence /e ə eɪ ɪ/ (its vowels), so it has 4 syllables.
Chest pulse theory: based on muscular activity.
Sonority theory: based on the relative sonority of segments within words.
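The prominence-theory count can be sketched by treating each maximal run of vowel symbols in a broad IPA transcription as one peak. This is a simplification under stated assumptions: the vowel inventory below is small and incomplete, a diphthong such as /eɪ/ counts once, and two vowels in hiatus would wrongly merge into a single peak. The class ProminencePeaks is hypothetical; IPA letters are written as Unicode escapes so the file compiles regardless of source encoding.

```java
/** Hypothetical sketch: count prominence peaks in a broad IPA transcription. */
final class ProminencePeaks {

    // Assumed, incomplete vowel inventory: a e i o u plus schwa (U+0259),
    // U+026A, U+028A, U+025B, U+00E6, U+0254, U+028C and U+0251.
    private static final String VOWELS =
            "aeiou\u0259\u026A\u028A\u025B\u00E6\u0254\u028C\u0251";

    static boolean isVowel(char c) {
        return VOWELS.indexOf(c) >= 0;
    }

    /** Counts maximal runs of vowel symbols; each run is taken as one peak. */
    static int countPeaks(String ipa) {
        int peaks = 0;
        boolean inVowelRun = false;
        for (char c : ipa.toCharArray()) {
            boolean v = isVowel(c);
            if (v && !inVowelRun) peaks++; // a new peak starts here
            inVowelRun = v;
        }
        return peaks;
    }

    public static void main(String[] args) {
        // "entertaining" as transcribed on the slide (U+014B is the final velar nasal).
        String entertaining = "ent\u0259te\u026An\u026A\u014B";
        System.out.println(countPeaks(entertaining)); // 4
    }
}
```

For the slide's example the four runs are e, ə, eɪ and ɪ, matching the four syllables.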
“The sonority of a sound is its loudness relative to other sounds with the same length, stress and pitch.”
Every language has sounds associated with it, and some sounds are more sonorous than others.
Words in a language can be divided into syllables, and sonority theory distinguishes syllables on the basis of the sonority of their sounds.
Sonority is defined on the basis of the amount of sound associated with a segment. The sonority hierarchy, from most to least sonorous, is:
1. Vowels (a, e, i, o, u)
2. Liquids (y, r, l, v)
3. Nasals (n, m)
4. Fricatives (s, z, f, sh, th, etc.)
5. Affricates (ch, j)
6. Stops (b, d, g, p, t, k)
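The hierarchy maps naturally onto a lookup table, in the spirit of the HashMap-based SonorityHierarchy() function described for the module. The sketch below assigns ranks from 6 (vowels) down to 1 (stops) and prints a letter-by-letter sonority profile. The numeric ranks, placing h among the fricatives, the default of 0 for unlisted letters (c, q, w, x) and the class name SonorityTable are all assumptions; digraphs such as sh, th and ch are not handled.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a SonorityHierarchy-style lookup table. */
final class SonorityTable {

    private static final Map<Character, Integer> RANK = new HashMap<>();

    static {
        // Higher number = more sonorous, following the order on the slide.
        put("aeiou", 6);  // vowels
        put("yrlv", 5);   // liquids
        put("nm", 4);     // nasals
        put("szfh", 3);   // fricatives ('h' added here as an assumption)
        put("j", 2);      // affricates (digraph "ch" not handled letter by letter)
        put("bdgptk", 1); // stops
    }

    private static void put(String letters, int rank) {
        for (char c : letters.toCharArray()) RANK.put(c, rank);
    }

    /** Sonority of one letter; unlisted letters default to 0 in this sketch. */
    static int sonority(char c) {
        return RANK.getOrDefault(Character.toLowerCase(c), 0);
    }

    /** The word's sonority profile: one value per letter. */
    static int[] profile(String word) {
        int[] p = new int[word.length()];
        for (int i = 0; i < word.length(); i++) p[i] = sonority(word.charAt(i));
        return p;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(profile("ABHIJEET")));
        // [6, 1, 3, 6, 2, 6, 6, 1]
    }
}
```

For ABHIJEET the profile peaks at the vowels A, I and EE and dips at the consonants between them, which is the wave shape the sonority profiles on the following slides depict.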
Obstruents can be further classified into fricatives, affricates and stops.
“A syllable is a cluster of sonority, defined by a sonority peak acting as a structural magnet to the surrounding lower-sonority elements.”
This can be represented as a wave of sonority, the sonority profile of the syllable.
[Figure: sonority profile of a syllable, with the Nucleus at the peak and the Onset and Coda on either side]
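Sonority Sequencing Principle: “The sonority profile of a syllable must rise until its peak (nucleus), and then fall.”
[Figure: sonority rising through the Onset to the Peak (Nucleus) and falling through the Coda]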
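The principle can be checked mechanically for a single syllable: sonority must rise strictly to a vocalic peak, may stay at vowel level for a long or double vowel, and must then fall strictly. This sketch assigns letter-level ranks following the hierarchy on the sonority slide (vowels 6 down to stops 1); treating h as a fricative and giving unlisted letters 0 are assumptions, and the class SonoritySequencing with its method obeysSsp is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical check of the Sonority Sequencing Principle for one syllable. */
final class SonoritySequencing {

    private static final int VOWEL = 6;
    private static final Map<Character, Integer> RANK = new HashMap<>();

    static {
        // Least to most sonorous, as on the slide; 'h' as a fricative is assumed.
        String[] groups = {"bdgptk", "j", "szfh", "nm", "yrlv", "aeiou"};
        for (int r = 0; r < groups.length; r++) {
            for (char c : groups[r].toCharArray()) RANK.put(c, r + 1);
        }
    }

    static int sonority(char c) {
        return RANK.getOrDefault(Character.toLowerCase(c), 0);
    }

    /** True if sonority strictly rises to a vocalic peak and then strictly falls. */
    static boolean obeysSsp(String syllable) {
        int n = syllable.length();
        if (n == 0) return false;
        int[] s = new int[n];
        for (int k = 0; k < n; k++) s[k] = sonority(syllable.charAt(k));

        int i = 0;
        while (i + 1 < n && s[i] < s[i + 1]) i++;  // rising onset
        if (s[i] != VOWEL) return false;            // the peak must be the nucleus
        while (i + 1 < n && s[i + 1] == VOWEL) i++; // long or double vowel nucleus
        while (i + 1 < n && s[i] > s[i + 1]) i++;   // falling coda
        return i == n - 1;                          // nothing may rise again
    }

    public static void main(String[] args) {
        for (String syl : new String[] {"plo", "lpo", "dip", "jeet", "stop"}) {
            System.out.println(syl + " -> " + obeysSsp(syl));
        }
    }
}
```

Here plo, dip and jeet pass while lpo fails. The sketch also rejects stop, because s is more sonorous than the following t; onsets of /s/ plus a stop are a well-known exception to the principle in English, so a practical syllabifier needs language-specific exceptions.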
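Example: ABHIJEET
[Figure: two alternative sonority profiles of ABHIJEET (Sonority Profile 1 and Sonority Profile 2), with the vowels A, I, E, E at the peaks and the consonants B, H, J, T in the troughs]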
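Maximal Onset Principle: “The intervocalic consonants are maximally assigned to the onsets of syllables in conformity with universal and language-specific conditions.”
It determines the underlying syllable division.
Example: DIPLOMA can be divided as DIP LO MA or as DI PLO MA; since pl is a legal English onset, the principle selects DI PLO MA.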
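A minimal sketch of the maximal onset principle: the consonants between two vowels are split so that the following syllable receives the longest onset that is legal in the language. The set of legal two-consonant onsets below is a small, hypothetical sample standing in for the "language-specific conditions"; any single consonant is assumed to be a legal onset, vowels are assumed to be the letters a, e, i, o, u, and the class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical maximal-onset syllabifier working on spelling. */
final class MaximalOnset {

    // Hypothetical, incomplete sample of legal English two-consonant onsets.
    private static final Set<String> LEGAL_CLUSTERS = Set.of(
            "pl", "pr", "bl", "br", "tr", "dr", "kl", "kr",
            "gl", "gr", "fl", "fr", "st", "sp", "sk");

    static boolean isVowel(char c) {
        return "aeiou".indexOf(Character.toLowerCase(c)) >= 0;
    }

    // Any single consonant (or an empty onset) is assumed to be legal.
    static boolean legalOnset(String cluster) {
        return cluster.length() <= 1 || LEGAL_CLUSTERS.contains(cluster.toLowerCase());
    }

    static List<String> syllabify(String word) {
        List<String> syllables = new ArrayList<>();
        int n = word.length();
        int start = 0; // start index of the syllable being built
        int i = 0;
        while (i < n && !isVowel(word.charAt(i))) i++;     // word-initial onset
        while (i < n) {
            while (i < n && isVowel(word.charAt(i))) i++;  // nucleus
            int clusterStart = i;
            while (i < n && !isVowel(word.charAt(i))) i++; // consonants up to the next vowel
            if (i == n) break;                             // no more nuclei: final coda
            String cluster = word.substring(clusterStart, i);
            int split = 0;                                 // smallest coda = longest legal onset
            while (!legalOnset(cluster.substring(split))) split++;
            syllables.add(word.substring(start, clusterStart + split));
            start = clusterStart + split;
        }
        syllables.add(word.substring(start));
        return syllables;
    }

    public static void main(String[] args) {
        System.out.println(syllabify("DIPLOMA")); // [DI, PLO, MA]
        System.out.println(syllabify("atlas"));   // [at, las]
    }
}
```

Because pl is in the legal set, DIPLOMA comes out as DI PLO MA; atlas comes out as at las, since tl is not listed as a legal onset and only the l can move to the second syllable.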
Schwa is the first letter of IAL, {a}: an unstressed and toneless neutral vowel.
Sanskrit is phonetically perfect: every written schwa is pronounced, so it has no neutral vowels.
Hindi, Bengali etc. allow the schwa to be neutral; some schwas are deleted and some are not.
Schwa deletion is therefore an important issue for grapheme-to-phoneme conversion.
1) Saphalya and Amantrana
2) Priya and Tritiya
3) Kavya and Ashva
4) Badhai
5) Samuha and Chehara
6) Badara and Kalama
7) Kalama and Banda
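The deck does not spell out its schwa-deletion rules, so the following is a substitute, not the author's method: a simplified form of a widely used rule for Hindi that deletes a word-final schwa after a consonant and then, scanning from right to left, deletes any schwa standing between a vowel-consonant sequence and a consonant-vowel sequence (VC_CV). It assumes a hypothetical romanization in which lowercase a is the schwa, uppercase A is long ā, and every non-vowel letter is exactly one consonant, so digraphs such as bh or sh are not supported. Plain romanizations like the word list above do not mark the difference (Priya, for instance, ends in a long vowel), which is part of what makes the problem hard.

```java
/** Hypothetical sketch of rule-based schwa deletion on a toy romanization. */
final class SchwaDeletion {

    // Assumed notation: 'a' = schwa, 'A' = long aa, I/U = long i/u; any other letter is one consonant.
    private static final String VOWELS = "aAiIuUeo";

    static boolean isVowel(char c) {
        return VOWELS.indexOf(c) >= 0;
    }

    static String deleteSchwas(String word) {
        StringBuilder w = new StringBuilder(word);
        int n = w.length();
        // Rule 1: delete a word-final schwa that follows a consonant.
        if (n >= 2 && w.charAt(n - 1) == 'a' && !isVowel(w.charAt(n - 2))) {
            w.deleteCharAt(n - 1);
        }
        // Rule 2: right to left, delete a schwa in the context V C _ C V.
        // Deleting at i shifts later letters left, so earlier positions see the updated word.
        for (int i = w.length() - 2; i >= 2; i--) {
            if (w.charAt(i) == 'a'
                    && !isVowel(w.charAt(i - 1)) && isVowel(w.charAt(i - 2))
                    && !isVowel(w.charAt(i + 1))
                    && i + 2 < w.length() && isVowel(w.charAt(i + 2))) {
                w.deleteCharAt(i);
            }
        }
        return w.toString();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"Kalama", "sarakanA", "priyA"}) {
            System.out.println(s + " -> " + deleteSchwas(s));
        }
    }
}
```

Under these assumptions Kalama becomes Kalam and sarakanA becomes saraknA (the schwa before n goes, which then blocks deletion of the schwa before k), while priyA is left unchanged because its final vowel is long.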
The module is developed entirely in Java and is therefore platform independent.
It tries to perform syllabification of words, relying on the concepts of sonority theory, mainly the sonority sequencing principle.
It uses Java's HashMap class to save execution time (HashMap lookups take constant time on average).
It consists of three major functions, plus a fourth under development:
SonorityHierarchy(): stores the sonority hierarchy and references it from the HashMap.
syllabify(String word): tries to find the syllable boundaries according to the word's sonority profile.
accuracy()
Delete_schwa() [under development]: tries to delete the schwas present in the input.
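Since the deck does not show the code of syllabify(String word), here is a hypothetical sketch of what a sonority-based version could look like; it is an illustration under stated assumptions, not the module itself. Vowel runs are taken as nuclei, and between two nuclei the boundary is placed just before the least sonorous consonant (the rightmost one on ties), so that the next onset rises towards its vowel as the sonority sequencing principle requires. Letter ranks follow the hierarchy slide, with h assumed to be a fricative and unlisted letters given 0; the HashMap mirrors the stated use of Java's HashMap, and the class name is illustrative.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sonority-based syllabifier: split at sonority troughs. */
final class SonoritySyllabifier {

    private static final int VOWEL = 6;
    private static final Map<Character, Integer> RANK = new HashMap<>();

    static {
        // Least to most sonorous, as on the slide; 'h' as a fricative is assumed.
        String[] groups = {"bdgptk", "j", "szfh", "nm", "yrlv", "aeiou"};
        for (int r = 0; r < groups.length; r++) {
            for (char c : groups[r].toCharArray()) RANK.put(c, r + 1);
        }
    }

    static int sonority(char c) {
        return RANK.getOrDefault(Character.toLowerCase(c), 0);
    }

    static List<String> syllabify(String word) {
        List<String> syllables = new ArrayList<>();
        int n = word.length();
        int start = 0;
        int i = 0;
        while (i < n && sonority(word.charAt(i)) != VOWEL) i++;     // initial onset
        while (i < n) {
            while (i < n && sonority(word.charAt(i)) == VOWEL) i++; // nucleus
            int clusterStart = i;
            while (i < n && sonority(word.charAt(i)) != VOWEL) i++; // intervocalic consonants
            if (i == n) break;                                      // final coda
            // Cut before the least sonorous consonant (rightmost on ties).
            int cut = clusterStart;
            for (int k = clusterStart; k < i; k++) {
                if (sonority(word.charAt(k)) <= sonority(word.charAt(cut))) cut = k;
            }
            syllables.add(word.substring(start, cut));
            start = cut;
        }
        syllables.add(word.substring(start));
        return syllables;
    }

    public static void main(String[] args) {
        for (String w : new String[] {"ABHIJEET", "DIPLOMA", "napkin", "atlas"}) {
            System.out.println(w + " -> " + syllabify(w));
        }
    }
}
```

This yields A BHI JEET, DI PLO MA and nap kin. It also illustrates one possible source of the kind of cluster errors reported later: the rising cluster tl in atlas is kept as an onset (a tlas), because sonority alone does not know that tl is not a legal English onset.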
Syllabification and PRR generation modules have been implemented.
Number of manually syllabified words: 27614
Number of words fed as input: 27614
Number of words correctly syllabified: 26253
Accuracy obtained: 95.86% for English and about 70% for Hindi
Accuracy of schwa deletion in English: 77%
Schwa deletion for Hindi is under development.
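The accuracy figure is the share of correctly syllabified words. A hypothetical accuracy() helper (the name matches the function listed for the module, but its body here is an assumption) computes it from the counts above. With the counts as reported, 26253 / 27614 works out to about 95.07%, a little below the 95.86% stated, so one of the reported figures may contain a typo.

```java
/** Hypothetical accuracy() helper: percentage of words syllabified correctly. */
final class SyllabificationAccuracy {

    static double accuracy(int correct, int total) {
        if (total <= 0) throw new IllegalArgumentException("total must be positive");
        return 100.0 * correct / total; // floating-point division, not integer division
    }

    public static void main(String[] args) {
        // Counts as reported on the results slide.
        double acc = accuracy(26253, 27614);
        System.out.printf("%.2f%%%n", acc); // prints 95.07%
    }
}
```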
Problems faced:
The first rule-based implementation failed.
Some specific consonant and vowel clusters still result in erroneous syllabification.
Future work:
Schwa deletion for Hindi and Marathi.
Implementation of the Maximal Onset Principle.
Packaging the above implementation into a stable transliteration module for further use in CLIR.
1) Giegerich, H. J. 1992. English Phonology: An Introduction. Cambridge University Press.
2) Kahn, D. 1976. Syllable-based generalizations in English phonology. PhD dissertation, MIT.
3) Lass, R. 1984. Phonology: An Introduction to Basic Concepts. Cambridge University Press.