COMPUTATIONAL MODELING OF YORÙBÁ SPLIT-VERBS FOR
ENGLISH TO YORÙBÁ MACHINE TRANSLATION SYSTEM
ELUDIORA SAFIRIYU I.¹, OKUNOLA TUNDE², ODETUNJI A. ODEJOBI³
¹, ² & ³ Department of Computer Science & Engineering, Obafemi Awolowo University, Ile-Ife, Nigeria.
ABSTRACT
Sentences that contain split verbs pose a serious challenge in the translation process of an English to Yorùbá Machine Translation system, because they do not follow the pattern of normal verbs in the sentential form. These split verbs differ from one another in their structure depending on the part of speech that precedes them in the sentence. When translated, they can generate invalid sentences. We examined the computational model of the split verbs and developed a software artefact that we used to test the model. We modelled the verbs within the context of context-free grammar and used a rule-based approach to develop the software. The results showed that split verbs can be modelled computationally. The software code will be used as a module in the on-going development of an English to Yorùbá machine translation system.
Keywords: split-verb, word, word-swapping, model, linguistics, rule-based
I. INTRODUCTION
People from different nations speak different languages to communicate, exchange ideas, sell products, offer help, and formulate and accept theories. Both parties need to understand the language in which they communicate. Early researchers made use of human translators, which was not efficient at the time. Later they concentrated exclusively on the translation of scientific text and technical documents in textual form. This approach led to the development of machine translation.
The history of machine translation can be traced back to the seventeenth century, when ideas of universal, philosophical languages and of mechanical dictionaries came into existence. The first practical suggestions were made in 1933, with two patents issued in France and Russia to Georges Artsrouni and Petr Trojanskij respectively, both for mechanical dictionaries. Trojanskij later went further with proposals for coding and interpreting grammatical functions using universal (Esperanto-based) symbols in a multilingual translation device. Since then, machine translators have been built for different languages. Depending on the language, a machine translator may involve the integration of different structures or aspects of the language to form a complete machine translation system for that language.
In Yorùbá, some verbs differ in pattern when translated: the parts of the verb do not follow each other. They are called split verbs. An example of such a verb is believed ( ). Its form in the sentence can be seen in the examples below:
1. He believed the boy ( Ọmọ
náà gb )
Also, cheat - r jẹ
2. He cheats Ola ( Ọ ẹ)
Compare with verbs like dance ( ). Dance is a non-splitting verb, as the sentence below shows.
3. Olu danced at the party (Ọ
náà)
INTERNATIONAL JOURNAL OF ADVANCED RESEARCH IN COMPUTER SCIENCE AND APPLICATIONSISSN 2321-872X ONLINE ISSN 2321-8932 PRINT IMPACT FACTOR : 0.782VOLUME 3, ISSUE 4, APRIL 2015.
www.ijarcsa.org 1 admin@ijarcsa.org
Also sang (kọrin)
4. Ade sang at the party ( ọrin ní patí
náà).
Other splitting verbs are rebuke ( ), scatter ( ), lost (sọ ) etc. These verbs and others follow the same pattern as those in examples 1 and 2 above.
The challenge in Machine Translation (MT) arises from the fact that the number of words in the sentence increases by one in the equivalent Yorùbá sentence, which is not so for the English language verbs.
In addition, the split verb in the sentence may be followed by a preposition like in (nínú), as we have in example 5.
5. He believed in God (Ó nínú
Ọ ).
In sentence
machine translation system development. The
development of machine translators
system, the project reported here focused on the
translation of simple sentences in the form
subject-verb-object (SVO). For instance, sentences 6 and 7:
6. Olu<subject> damaged<verb> the<determiner> radio<object> (Olú<subject> ba<verb1> rédíó<noun> náà<determiner> j <verb2>)
7. The<determiner> tall<adjective> boy<noun> believed<verb> in<preposition> his<pronoun> friend<noun> (Ọmọ <verb> nínú<preposition> r <noun> r <pronoun>).
It should be noted that the two languages are SVO, but word-swapping does occur among some parts of speech. For example, verb ( r ìṣ e).
We applied a rule-based approach. This approach uses linguistic information about the source and target languages, for which a bilingual dictionary was used. English is the source language (SL) while Yorùbá is the target language (TL). The rule-based approach provides the re-write rules that generate the output in the TL from the SL.
II. YORÙBÁ LANGUAGE
Yorùbá is a Niger-Congo language spoken in Nigeria and other parts of the world. According to [1], it has a population of over 25 million (South West Nigeria only). Its loan
words are mostly from Arabic, English, Hausa
and Igbo languages. Its dialects include Ẹ̀gbá, Ìjẹ̀bú, Ọ̀yọ́/Ìbàdàn, Èkìtì, Ìgbómìnà, Ìjẹ̀sà, Ìkálẹ̀/Òndó and Ifẹ̀ [2].
According to the International African Institute
[3], the Yorùbá language is used by the media
i.e. the Press, Radio and Television. It is also
used as a language of formal instruction and a
curriculum subject in the primary, secondary and
post-secondary school (including University). It
has a standard orthography.
Yorùbá speakers are in contact with many other language groups in Nigeria and in some African countries, so the language has several exonyms
(outside names) like Yáríbà, Yórúbáwá, Nàgó
Ànàgó, Lùkúmì, and Akú [4].
A. Grammar
In linguistics,
grammar is the set of structural rules that govern
the composition of clauses, phrases,
and words in any given natural language. It
refers to the study of such rules in the fields of
morphology, syntax, and phonology, often
complemented by phonetics, semantics,
and pragmatics.
B. Alphabets
The Yorùbá alphabet has twenty-five letters, of which seven are vowels and the rest are consonants:
Aa Bb Dd Ee Ẹẹ Ff Gg GBgb Hh Ii Jj Kk Ll Mm Nn Oo Ọọ Pp Rr Ss Ṣṣ Tt Uu Ww Yy
(a) Vowels: a e ẹ i o ọ u
There are five nasalised vowels (an, ọn, in, ẹn, un) and two syllabic nasals (m, n) [5], [6].
do)] respectively
[7]
. For example, a word that
has the same form (i.e. vowels and consonants)
can have different meanings depending on the
tones that it has:
Igba -- two hundred
-- calabash
-- time
-- the season when perennial crops
have the least production
-- garden egg
-- climbing rope
The mid tone is usually left unmarked on
vowels. Out of the three basic (high, low and
mid) tones that are attested in the language, only
the high tone cannot occur on a word initial
vowel [7]. This is why potential words such as
those given below are not possible in the
language.
orí (cf. orí) -- a head
i (cf. ) -- a bottle
) -- a curse
(cf. ) -- bitter leaves
cf (common form)
Loan words that have closed syllables in the
source languages are made to conform to the
forms acceptable in the language [7]:
tì -- shirt
k sì -- course
Here, the vowel /i/ is inserted to re-syllabify the coda from the English loan. Consonant clusters are not allowed in Yorùbá either. Therefore, consonant clusters in the loan words are re-syllabified. The most common method for consonant cluster simplification is vowel insertion. For example, the vowel /i/ is inserted to simplify the consonant clusters in:
tì -- slate
sì -- class
dir -- driver
-- trailer
A Yorùbá word does not end with a consonant; rather it ends with a vowel sound [A E Ẹ I O Ọ U AN ẸN IN ỌN UN].
C. Morphology
Morphology is the branch of linguistics that studies the internal structure of words and how they are formed in a language [8]; it accounts for word formation. The basic unit of analysis in morphology is the “morpheme”. A morpheme is defined as the minimal meaningful unit of grammatical analysis, that is, a meaningful sequence of sounds which is not divisible into smaller meaningful units. Yorùbá has some productive methods of word derivation. The main morphological processes in the language include affixation, compounding and reduplication.
D. Affixation
Yorùbá uses prefixation and infixation to derive new words. Each of the Yorùbá oral vowels (except /u/ in the standard dialect) can be used as a prefix to derive a new word. Each of
the usable six oral vowels (a, e, ẹ, i, o, ọ) has
two forms as a prefix: mid toned and low toned.
They are attached to verbs to derive nouns as
follows:
Low toned prefixes
+ d ‘to be soft’ = d -- idiot
ì + ‘to break’ = ì -- poverty
+ gún ‘to pierce’ = gún -- thorn
Mid toned prefixes
ẹ + ‘to carry’ = ẹ -- load
ọ + dẹ ‘to hunt’ = ọdẹ -- hunter
a + ‘to sieve’ = a -- sieve
Infixes are (usually) inserted between two forms of the same word to derive a new word:
‘house’ + kí + ( ) -- a bad house / any house
ọmọ ‘child’ ọmọ + kí + ọmọ (ọmọk mọ) -- a bad child
D. Compounding
Yorùbá also derives new words by combining two independent words:
ẹran ‘meat’ + oko ‘farm’ = ẹranko -- animal
‘mother’ + ọkọ ‘husband’ = iyakọ -- mother-in-law
E. Reduplication
Yorùbá derives nominal items/adjectives from verbs through a partial reduplication of the verb. New nouns can also be derived by a total reduplication of an existing noun.
jẹ ‘to eat’ = jíjẹ -- edible
‘to cook’ = -- cooked
ọmọ ‘child’ = ọmọọmọ -- grand-children
‘mother’ = ì -- grand-mother
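The two reduplication patterns can be sketched in a few lines of Python. This is only an illustration and not the authors' code: the function names are hypothetical, and the partial rule (copy the verb's initial consonant, add a high-toned í, then the verb) is generalised here from the single intact example jẹ → jíjẹ, so it assumes a consonant-initial verb.

```python
import unicodedata


def partial_reduplicate(verb: str) -> str:
    """Derive a nominal/adjectival form: initial consonant + 'í' + verb."""
    verb = unicodedata.normalize("NFC", verb)
    # 'gb' is written with two letters but counts as one consonant.
    onset = "gb" if verb.startswith("gb") else verb[0]
    return unicodedata.normalize("NFC", onset + "í" + verb)


def total_reduplicate(noun: str) -> str:
    """Derive a new noun by repeating the whole noun."""
    noun = unicodedata.normalize("NFC", noun)
    return noun + noun


print(partial_reduplicate("jẹ"))
print(total_reduplicate("ọmọ"))
```

Normalising to NFC first keeps tone-marked and dotted letters as single code points, so taking the first character of the verb does not split a letter from its diacritic.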
III. ENGLISH AND YORÙBÁ
The parts of speech that are attested in Yorùbá include verbs, nouns, adjectives, prepositions, pronouns etc., as we have in the English language.
A. Ọ̀rọ̀ Orúkọ (Noun)
A noun in Yorùbá, as in English, is the name of a person, animal, place, thing etc. A noun can take two roles in the language structure: it can be the subject (the performer of an action) or the object (the receiver of an action) in a sentence. For example:
Olú – name of a person
Ìbàdàn – name of a place
Ewúrẹ́ – goat (animal)
B. Arọ́pò Ọ̀rọ̀ Orúkọ (Pronoun)
Pronouns are words used in place of nouns. For example:
àwa – we
àwọn – they/them
Ó – he/she/it (Yorùbá has no gendered pronouns as English does)
C. Ọ̀rọ̀ Ìṣe (Verbs)
A verb is a word in a sentence expressing the action performed by the subject or received by the object. It is known as “ọ̀rọ̀ ìṣe” in Yorùbá. Yorùbá verbs can be monosyllabic, as in the case of:
lọ -- to go
n -- to sleep
-- to die
-- to break
A verb can also have more than one syllable, as in the case of:
-- to forget
t -- to follow
-- to insult
Some of the verbs are discontinuous morphemes. They are called splitting verbs in the traditional grammar [9].
j -- to get spoiled / to damage
n -- to introduce
Yorùbá does not mark any agreement between the verb and the number feature of the nouns:
Like/likes – fẹ́ràn
D. Splitting Verbs
According to [9], “when splitting verbs [are] used with an object, each verb in this class is always split into halves and the object is inserted between them”. Reference [10] reported on split verbs in relation to the work of [11] and [12], which further analyse the work of [9]. However, [10] observed that a split verb usually gives an idiomatic expression in the Yorùbá equivalent meaning. The research reported in this paper is limited to context-free grammar.
For example: Case 1
1. Ade damaged the radio
2. Ó ba rédìò náà j
Also:
1. Taye introduced Ade
2. Táyé fi Adé hàn, according to [13].
In the example above, the English word “damaged” is split into “ba” and “j ” in the Yorùbá context. This makes the translation difficult. A more complex situation is encountered in the following examples:
Case 2
1. Taye introduced Ade to me (Táyé fi Adé hàn mí)
Also
2. He reported Bayo to me (Ó fi ẹj Báy sùn mí)
In the two examples above, the second syllable does not take the last position as it does in Case 1. Some splitting verbs split into two or more syllables in the Yorùbá context. Other examples of splitting verbs include:
Helped -- “ran” + “lowo”
Hungry -- “n” + “pa”
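The placement of the object between the two halves of a splitting verb can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the function name `place_split_verb` and the `trailing` parameter are hypothetical, and the only data used is the "introduced" example quoted above from [13].

```python
def place_split_verb(subject, verb_parts, obj, trailing=()):
    """Return the Yorùbá word order: subject, V1, object, V2, then any trailing words."""
    first_half, second_half = verb_parts
    return [*subject, first_half, *obj, second_half, *trailing]


# Case 1: the object sits between the two halves of the verb.
case1 = place_split_verb(["Táyé"], ("fi", "hàn"), ["Adé"])
print(" ".join(case1))

# Case 2: a further object follows the second half, so V2 is no longer last.
case2 = place_split_verb(["Táyé"], ("fi", "hàn"), ["Adé"], trailing=["mí"])
print(" ".join(case2))
```

Keeping the two halves as a pair makes the difference between Case 1 and Case 2 a matter of what follows the second half, rather than a separate rule.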
An example of how verbs split can be found in bilingual N-gram statistical machine translation. References [14] and [15] implemented a translation model based on the finite-state perspective, which is used along with a log-linear combination of four additional feature functions [16].
Moreover, [17] presents a lexical decomposition of composite forms within the framework of lexical morphology, drawing on the author's knowledge as a native speaker of the language. He found that some complex verbs have had their forms modified, thereby causing some disturbance in the recognition of their component forms.
IV. Machine Translation Approaches
Statistical machine translation is a machine
translation paradigm where translations are
generated on the basis of statistical models
whose parameters are derived from the analysis
of bilingual text corpora [18].
Statistical machine translation (SMT) is
characterised by the use of machine learning
methods, for example, Hidden Markov Model
for POS tagging. In less than two decades, SMT
has come to dominate academic MT research
and has gained a share of the commercial MT
market [19].
Hybrid machine translation (HMT) leverages the
strengths of statistical and rule-based translation
methodologies [20].
Rule-based Machine Translation (RBMT), also known as ‘Knowledge-based Machine Translation’ or the ‘Classical Approach’ of MT, is a general term that denotes machine translation systems based on linguistic information about the source and target languages. Basically, the linguistic information can be retrieved from (bilingual) dictionaries and grammars covering the main semantic, morphological and syntactic regularities of each language [21], [22].
Reference [23] presented a rule-based machine translation (RBMT) approach for English to Bangla. The authors used fuzzy rules, that is, if-then rules, and built a bilingual dictionary for the languages.
In [24], a shallow-transfer MT system from Swedish to Danish was developed. A transfer-based MT model based on the Apertium
platform was used. The following steps were used to develop the system: 1) resources, 2) analysis and generation, 3) disambiguation, 4) lexical transfer, 5) syntactic transfer and 6) status.
V. SYSTEM ARCHITECTURE
The software architecture of a system is the set
of structures needed to reason about the system,
which comprise software elements, externally
visible properties, relations among them, and
properties of both. The term also refers to the documentation of a system's "software architecture". The architecture explains the decomposition of the system into sub-systems. This software contains the following components.
A. The User Interface
The user interface is important in most application software because the user interacts with the system through it. The testing of any software starts at the interface level; a failure at the interface might lead to the software being rejected. The interface contains two textboxes, two labels, a list box, a file menu and a button. The labels describe the textboxes: the first label reads “English sentence” and the second “Yorùbá sentence”. The first textbox is designed to accept the user input, that is, the English sentence; the second textbox displays the Yorùbá translation of the input sentence and is not editable. The list box displays the word(s) in the input sentence that is (are) not in the database. The file menu allows the user to view the words in the database, add to the list of words, and edit and delete words. Finally, a button labelled “Translate” handles the event of the translation process.
B. Software Modules
The modules, which can also be called classes or functions, perform the various operations leading to the output of the translation. They include:
1) Parts of Speech (POS) Tagging
This module splits (tokenizes) the input sentence into separate words that have meaning in the English context. For example, the sentence “He damaged the radio” will be tokenized into an array of words,
He<Prn>damaged<V>the<Det>radio<N>.
The words were tagged manually. The module attaches to each word in the array its corresponding part of speech, then returns the pattern to the parser, which rearranges it. This function also gets the translation of each word in the array. The translations, the parts of speech and the words are stored in a dictionary for easy referencing in other parts of the program. The structure of the dictionary can be seen in the illustration below:
English sentence: “He damaged the radio”
He<Prn>damaged<V>the<Det>radio<N>
Yorùbá equivalent sentence:
Ó<Prn>ba<V1>rédìò<N>náà<Det>j <V2>
Being listed among the split verbs already qualifies a word as a verb, so there is no need to store its part of speech in the database; rather, we store the two parts it splits into.
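The tagging step described above can be sketched as a dictionary lookup. This is a hedged illustration rather than the authors' module: `LEXICON`, `SPLIT_VERBS` and `tag_sentence` are hypothetical names, the entries are taken from examples in this paper, and words not found are collected separately, as the list box in the user interface would display them.

```python
# Hypothetical in-memory lexicon: English word -> (part of speech, Yorùbá word).
LEXICON = {
    "he": ("Prn", "Ó"),
    "the": ("Det", "náà"),
    "taye": ("N", "Táyé"),
    "ade": ("N", "Adé"),
}

# Split verbs need no stored POS: English verb -> (first half, second half).
SPLIT_VERBS = {"introduced": ("fi", "hàn")}


def tag_sentence(sentence):
    """Tokenize a sentence and return (tagged words, unknown words)."""
    tagged, unknown = [], []
    for token in sentence.split():
        key = token.lower().strip(".,!?")
        if key in SPLIT_VERBS:
            tagged.append((token, "V", SPLIT_VERBS[key]))
        elif key in LEXICON:
            pos, translation = LEXICON[key]
            tagged.append((token, pos, (translation,)))
        else:
            unknown.append(token)
    return tagged, unknown


tagged, unknown = tag_sentence("Taye introduced Ade")
print(tagged)
print(unknown)
```

Each tagged entry carries the word, its tag and its translation together, which mirrors the dictionary the section describes for referencing in other parts of the program.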
C. Lexicon
In computational linguistics, a lexicon supplies paradigmatic information about words, including parts of speech, labels, irregular plurals and subcategorization information for verbs. Traditionally, lexicons have been small and constructed largely by hand. There is a growing concern that effective natural language processing requires an increased amount of lexical information. A recent trend has been the use of automatic techniques applied to large corpora for the purpose of acquiring lexical information from text [25].
D. System Database
The database is responsible for storing the English and Yorùbá words and their parts of speech. This collection of words and their meanings can also be called a corpus. The
database contains two tables, one for the splitting verbs and the other for words that are used with the splitting verbs. Figure 1 shows the words, translations and POS used with the splitting verbs, and Figure 2 shows the splitting verbs and their translations; the views are presented as screenshots of the words in Notepad.
Figure 1. Words frequently used with
splitting verbs
Figure 2 Splitting verbs table
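A two-table layout of this kind could look like the following sqlite3 sketch. The table names, column names and the `build_lexicon` and `lookup` helpers are assumptions made for illustration; the paper shows the tables only as screenshots, so the actual schema may differ.

```python
import sqlite3


def build_lexicon(conn):
    """Create the two tables and load a few entries taken from the paper's examples."""
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS words (
            english TEXT PRIMARY KEY,
            yoruba  TEXT NOT NULL,
            pos     TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS split_verbs (
            english TEXT PRIMARY KEY,
            part1   TEXT NOT NULL,
            part2   TEXT NOT NULL
        );
        """
    )
    conn.executemany(
        "INSERT OR REPLACE INTO words VALUES (?, ?, ?)",
        [("taye", "Táyé", "N"), ("ade", "Adé", "N"), ("the", "náà", "Det")],
    )
    conn.execute(
        "INSERT OR REPLACE INTO split_verbs VALUES (?, ?, ?)",
        ("introduced", "fi", "hàn"),
    )
    conn.commit()


def lookup(conn, word):
    """Return (pos, translation parts), or None when the word is not stored."""
    key = word.lower()
    row = conn.execute(
        "SELECT part1, part2 FROM split_verbs WHERE english = ?", (key,)
    ).fetchone()
    if row is not None:
        return ("V", tuple(row))  # a split verb is a verb by definition
    row = conn.execute(
        "SELECT pos, yoruba FROM words WHERE english = ?", (key,)
    ).fetchone()
    if row is not None:
        return (row[0], (row[1],))
    return None


conn = sqlite3.connect(":memory:")
build_lexicon(conn)
print(lookup(conn, "introduced"))
print(lookup(conn, "Ade"))
```

The split-verb table has no POS column, in line with the remark that a stored split verb is already known to be a verb.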
VI. SYSTEM DESIGN
Automata theory is the study of abstract machines and how they are used to solve computational problems. It is used in application areas like communication systems, compilers, computer hardware systems, numerical computation etc. Its application relative to this project is the compiler. A language translator is a form of compiler that takes an input sentence from a source language and maps it to a sentence in the target language.
The automata structure of the sentence is a context-free grammar (CFG). It is the most common way of modelling constituency.
CFG = Context-Free Grammar = Phrase
Structure Grammar = BNF = Backus-Naur Form
A grammar (G) is a mechanism for describing a language; it is used to describe the sentences of that language. The context-free grammar can be described as a four-tuple given as:
G = {T, N, S, R}
Where:
T is the set of terminals (the lexicon).
N is the set of non-terminals. For NLP, we usually distinguish a set P ⊂ N of pre-terminals, which always rewrite as terminals.
S is the start symbol (one of the non-terminals).
R is the set of rules or productions of the form X → γ, where X is a non-terminal and γ is a sequence of terminals and non-terminals.
A production specifies how a grammar transforms one string into another, thus defining the language associated with the grammar.
T = {a noun, a verb, a pronoun, a preposition, a determiner or article, an adjective; e.g. He as a pronoun, damaged as a verb, the as a determiner, in as a preposition}.
N = {NP = noun phrase, VP = verb phrase, PP = prepositional phrase, also DET (determiner), N (noun or pronoun), V (verb), P (preposition),
ADJ(adjective), which will be replaced by
corresponding terminal symbols.
and
is either N or DET in English}.
R = {the rules of sentence formation}.
Table 1 below shows the sentence formation rules, the decomposition and the arrangement of the non-terminals.
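The four-tuple can be written down directly as a data structure. The sketch below is an illustration under stated assumptions, not the authors' code: the `Grammar` class and helper names are hypothetical, the part-of-speech tags are treated as the terminals (standing in for the actual words), and the rules are the English productions of Table 1. Because those rules are not recursive, every tag sequence the grammar generates can simply be enumerated.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Grammar:
    terminals: frozenset      # T
    nonterminals: frozenset   # N
    start: str                # S
    rules: dict               # R: non-terminal -> list of right-hand sides


ENGLISH = Grammar(
    terminals=frozenset({"Det", "Adj", "N", "V"}),
    nonterminals=frozenset({"S", "NP", "VP"}),
    start="S",
    rules={
        "S": [("NP", "VP")],
        "VP": [("V", "NP")],
        "NP": [("Det", "N"), ("Det", "Adj", "N"), ("N",)],
    },
)


def expand(grammar, symbol):
    """Return every terminal sequence derivable from symbol (non-recursive grammars only)."""
    if symbol in grammar.terminals:
        return [(symbol,)]
    sequences = []
    for rhs in grammar.rules[symbol]:
        parts = [expand(grammar, s) for s in rhs]
        for combo in product(*parts):
            sequences.append(tuple(tag for part in combo for tag in part))
    return sequences


def generates(grammar, tags):
    """Check whether the grammar derives exactly this tag sequence."""
    return tuple(tags) in set(expand(grammar, grammar.start))


print(generates(ENGLISH, ["N", "V", "Det", "N"]))
print(generates(ENGLISH, ["N", "Det", "V"]))
```

Enumeration is enough here because the simple-sentence grammar is finite; a recursive grammar would need a proper parser instead.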
The sentence structures can also be represented with a transition graph. It is a graph that consists of three things:
A finite set of states
Input strings formed from non-terminal symbols
A finite set of transitions that show how to move from one state to another depending on the input string.
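A transition graph of this kind can be sketched as a table of (state, input) pairs. The example below is an assumption-laden illustration: the state numbers are hypothetical and are not those of the paper's figures; the transitions are derived from the Yorùbá productions of Table 1, in which a sentence is a noun phrase, V1, a noun phrase and then V2.

```python
# (current state, input tag) -> next state; state 0 is the start state.
TRANSITIONS = {
    (0, "N"): 1,                               # subject noun or pronoun
    (1, "Adj"): 2, (1, "Det"): 3, (1, "V1"): 4,
    (2, "Det"): 3,
    (3, "V1"): 4,                              # first half of the split verb
    (4, "N"): 5,                               # object noun
    (5, "Adj"): 6, (5, "Det"): 7, (5, "V2"): 8,
    (6, "Det"): 7,
    (7, "V2"): 8,                              # second half of the split verb
}
ACCEPTING = {8}


def accepts(tags):
    """Follow the transitions for each tag and report whether we stop in an accepting state."""
    state = 0
    for tag in tags:
        state = TRANSITIONS.get((state, tag))
        if state is None:  # no transition defined for this input
            return False
    return state in ACCEPTING


print(accepts("N V1 N Det V2".split()))
print(accepts("N V1 N".split()))
```

The second call is rejected because the sequence ends before the second half of the verb is read.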
Figures 3 and 4 below show the transition graphs of the English and Yorùbá automata structures respectively. The graphical representation of the English structure has one verb, and the verb does not split. The English structure can be used to generate many sentences by following the transitions. It can only be used for a simple sentence because it has only one verb. Two phrases, NP + VP, make a sentence. There is provision for an adjectival phrase, which can take care of a noun qualifier. Figure 4 describes the Yorùbá sentence structure and shows how we can realise the translation of simple sentences using the finite state diagram. The rules were used to design the state diagram. It is possible to qualify the subject and the object in the diagram shown in figure 4. For example, the tall boy damaged the red car - ọmọkùnrin gíga náà ba ọk pupa náà j . We tested the rules and the transitions using JFLAP, as shown in figure 5.
Table 1. Sentence Formation Rules
English productions          Yorùbá productions
<S> → <NP><VP>               <S> → <NP><V1P><V2>
<VP> → <V><NP>               <V1P> → <V1><NP>
<NP> → <DET><N>              <NP> → <N><DET>
<NP> → <DET><ADJ><N>         <NP> → <N><ADJ><DET>
<NP> → <N>                   <NP> → <N>
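Read side by side, the two columns of Table 1 amount to a reordering of tags. The sketch below shows one way this could be done; it is not the paper's parser, and the names `NP_ORDER`, `reorder_np` and `english_to_yoruba_pattern` are hypothetical. It assumes a simple sentence with exactly one verb, with pronouns already tagged as N.

```python
# Yorùbá noun phrase order per Table 1: noun, then adjective, then determiner.
NP_ORDER = {"N": 0, "Adj": 1, "Det": 2}


def reorder_np(tags):
    """Rearrange an English noun phrase (Det Adj N) into Yorùbá order (N Adj Det)."""
    return sorted(tags, key=lambda tag: NP_ORDER[tag])


def english_to_yoruba_pattern(tags):
    """Map an English tag sequence NP V NP onto the Yorùbá pattern NP V1 NP V2."""
    verb_index = tags.index("V")
    subject, obj = tags[:verb_index], tags[verb_index + 1:]
    return reorder_np(subject) + ["V1"] + reorder_np(obj) + ["V2"]


print(english_to_yoruba_pattern(["N", "V", "Det", "N"]))
print(english_to_yoruba_pattern(["Det", "Adj", "N", "V", "Det", "N"]))
```

The single English verb becomes two positions in the output, which is why the Yorùbá sentence has one word more than the English one.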
Figure 3. Transition graph for English sentence structure
Figure 4. Transition graph for Yorùbá sentence structure
The diagrams above show the transition graphs for English and Yorùbá; they represent the sequence in which parts of speech are arranged to form a sentence. Examples are:
1. Olu believed God → <N V N>
1(a) Ọ Ọ → <N V1 N V2>
2. He damaged the radio → <N V Det N>
2(a) → <N V1 N Det V2>
3. The tall boy damaged the box → <Det Adj N V Det N>
3(a) Ọmọ → <N Adj Det V1 N Det V2>
4. Ade cheated the tall boy → <N V Det Adj N>
4(a) Adé r ọmọ ẹ → <N V1 N Adj Det V2>
Pronouns are treated as nouns because they follow the same pattern in the sentence. The examples above show sequences of N, V1, V2, Adj and Det being combined to generate a valid sentence. These can be concatenated following the transition graphs above to generate valid sentences for the two languages. We checked the validity of the re-write rules in Table 1 using JFLAP, as shown in figure 5.
Figure 5 JFLAP sample demonstration of the
re-write rules
VII. System Implementation
System implementation deals with the process of converting a system specification into an executable system in the case of software; in the case of hardware, it involves the construction of the components that will make up the system (obtained from the specification). The process of implementation involves all the steps taken from the analysis of the problem to the production of the executable file. These include gathering the requirements, gathering the tools, and managing the tools and the requirements to obtain the result. The tools used in the implementation of this application include a Python IDE, wxPython, sqlite3 and py2exe.
The Python IDE provides an interface for writing Python code; wxPython is a package used with the Python IDE for building the GUI; sqlite3 is a package built into the Python installer that provides the database functionality for the application; and py2exe is used to compile Python code into an executable (.exe) file. The entire program code was developed in line with the program design.
A. System Description
The system is described in terms of its data format, the requirements for populating the database and the user interface.
B. System Usage
The user can only enter an English sentence in the space provided and then press the Enter key or click the Translate button. If the user does not enter anything in the space provided for the English sentence before pressing the Enter key or clicking the Translate button, the application simply pops up a message box telling the user that no text has been entered.
C. Requirements for Populating the Database
Populating the database requires tone-marked Yorùbá words; the database was populated using a keyboard that has the capability of tone marking. So, the user will need a keyboard with this capability whenever there is a need to add to the words in the database.
D. User Interface
As previously mentioned, the user interface has two textboxes: the first accepts the user input (the English sentence) and the second displays the translation of the input as a Yorùbá sentence in a well-arranged format based on the Yorùbá grammar. The user interface also has a button labelled “Translate” and a menu for activities like viewing the database, editing the database and exiting the application.
Figure 6 shows the user interface for the application and figure 7 shows the popup message displayed when the user tries to translate an empty text, followed by the translation of sentences in figures 8 and 9.
Figure 6. User interface for the application
Figure 7. Popup error message for empty
string
Figure 8. Sample Sentence
All these and many more can be translated with the translator. The one important requirement is that the user includes a split verb in the sentence, because the structure of the system is built around split verbs.
E. Error Handling
This is another important aspect of application development. Error handling is important in the development of software to prevent undesirable behaviour such as hanging and halting when unexpected input is given to the system. Two approaches were used in this application; the first
one is the management of empty text submitted for translation; figure 7 already shows the nature of the popup error message when this is done. The other approach is to notify the user whenever the system comes across a string that is not in the database. Figure 9 shows a snapshot of the popup message. This technique only admits a word whose part of speech is suggested to be “Noun”; when OK is clicked, the exact word is carried over into the translated sentence, otherwise the system will give a wrong translation or no translation of the sentence. This was done to guard against populating the database with names (of objects, places, things).
Figure 9. Popup menu for a word not in the
database
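The two error-handling approaches can be sketched without any GUI code. This is a hypothetical illustration, not the application's source: `check_input` and `resolve_word` are assumed names, and the `confirm_as_noun` callback stands in for the user clicking OK on the popup.

```python
def check_input(text):
    """First approach: reject empty input before translation starts."""
    cleaned = text.strip()
    if not cleaned:
        raise ValueError("No text has been entered.")
    return cleaned


def resolve_word(word, lexicon, confirm_as_noun):
    """Second approach: keep an unknown word unchanged only if it is confirmed as a noun."""
    entry = lexicon.get(word.lower())
    if entry is not None:
        return entry
    if confirm_as_noun(word):
        return ("N", word)  # e.g. a personal name passes through untranslated
    raise LookupError(f"'{word}' is not in the database")


lexicon = {"he": ("Prn", "Ó")}
print(resolve_word("He", lexicon, confirm_as_noun=lambda w: True))
print(resolve_word("Bola", lexicon, confirm_as_noun=lambda w: True))
```

Passing the confirmation step in as a function keeps the logic testable; in the application it would be wired to the popup dialog.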
CONCLUSION
This model will be integrated into an English to Yorùbá machine translation system. The subsystem modelled can translate simple sentences of the subject-verb-object form. Again, a simple sentence is expected to have one verb. The subsystem cannot handle compound and complex sentences. There are other concepts that we have to handle separately and then integrate with the system. Some of these concepts are tone-changing verbs, ambiguity, the numbering system etc. These and other concepts will be handled in future.
References
1. NPC (National Population Commission),
2006 Census. url: www.population.gov.ng,
(Accessed: 25/06/2012).
2. F. A. Fabunmi, “A GPSG Structure of
Aspect in Àkókó” in a book titled
Current Perspectives in Phono-Syntax and
Dialectology, Ghana, 2009, Pp: 159.
3. International African Institute, “Provisional
Survey of Major Languages in the
Independent States of Sub-Saharan Africa”,
P. Baker (ed.). UNESCO: International
African Institute, 1980.
4. F. A. Fabunmi, and A. S. Salawu, “Is
Yorùbá an Endangered Language?”, Nordic
Journal of African Studies 14(3): 391-408,
2005, url: http://www.njas.helsinki.fi/pdf-
files/vol14num3/fabunmi.pdf [accessed:
22/11/2014]
5. A. Bamgbose, A Short Yoruba Grammar.
Heinemann Educational Books Ltd, Ibadan,
First edition, 1967.
6. L. O. Adewole, The Yoruba Language:
Published Works and Doctoral
Dissertations. Helmut Buske, Hamburg,
1987.
7. K. Owolabi, Ìjìnl Ìtúpal Èdè Yorùbá (1)
Fòn tíìkì ati Fon lọjì, Extension
Publications Limited, Molete, Ibadan, 2009.
8. J. Lyons, “Language and Linguistics: An Introduction”, Cambridge University Press, United Kingdom, 1981.
9. O. Awobuluyi, “Essentials of Yoruba Grammar”, Oxford University Press, Ibadan, 1978.
10. A. Howell, “Abstracting over Degrees in Yoruba Comparison Constructions”, Universität Tübingen, 2012, url:
http://semanticsarchive.net/sub2012/Howell.pdf [accessed: 22/11/2014].
11. O. Awobuluyi, The Yoruba Verb Phrase. In
A. A. (Ed.), Yoruba language and literature,
University of Ife Press, 1982.
12. O. G. Bode, Yorùbá Clause Structure. Ph. D.
thesis, University of Iowa, Iowa City, 2000.
13. J. D. Atoyebi, “Complex Verbs and Valency
Classes in Yorùbá, Workshop on Valency
Classes”, MPI-EVA, Leipzig, Aug. 2010,
Pp: 1-12, [Accessed: 13/11/2014].
14. A. De Gispert and J. B. Mariño, “Using X-grams for speech-to-speech translation”, in Proceedings of the 7th Int. Conf. on Spoken Language Processing, 2002.
15. A. De Gispert, J. B. Mariño and J. M. Crego, “TALP: X-gram-based spoken language translation system”, Proceedings of the Int. Workshop on Spoken Language Translation, Kyoto, Japan, October 2004, pp. 85–90.
16. J. M. Crego, J. B. Mariño and A. De Gispert, “A Ngram-based Statistical Machine Translation Decoder”, submitted to INTERSPEECH, 2005.
17. J. A. Ogunwale, “Problems of Lexical
Decomposition: The Case of Yoruba
Complex Verbs”, Nordic Journal of African
Studies vol: 14, issue: 3, 2005, Pp: 318–333.
18. W. Weaver, Translation, in Machine
Translation of Languages. MIT Press,
Cambridge, 1955.
19. A. Lopez, “Hierarchical Phrase-Based Translation with Suffix Arrays”, in Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), Association for Computational Linguistics, url: http://www.aclweb.org/anthology/D/D07/D07-1104, (Accessed: 23/06/2011).
20. A. Boretz, “AppTek Launches Hybrid Machine Translation Software”, Speech Technology Magazine, 2009, url: http://www.speechtechmag.com/Articles/News/News-eature/AppTek-Launches-Hybrid-Machine-Translation-Software-52871.aspx (Accessed: 04/08/2011).
21. F. Bond, “Introduction to Machine Translation”, 2006, url: www.cs.mu.oz.au/research/it/nlp06/materials/Bond/mt-intro.pdf (Accessed: 02/03/2011).
22. M. Osborne, “Machine Translation (MT)
History and Rule-Based Systems”, School of
Informatics, University of Edinburgh, 2012.
23. C. P. Francisca, S. L. Virach, C. P. Anee,
“Improving Translation Quality of Rule
Based Machine Translation”. Proceeding of
the workshop on Machine Translation; Asia,
2002.
24. F. M. Tyers and J. Nordfalk, “Shallow transfer RBMT for Swedish to Danish”, Proceedings of the First International Workshop on Free/Open-Source RBMT, pp. 27-33, 2009.
25. U. Zernik, “Lexical Acquisition: Exploring On-Line Resources to Build a Lexicon”, Hillsdale, NJ: Lawrence Erlbaum Associates, 1991.