Summer Research Project (Anusaaraka) Report
Uploaded by: anwar-jameel
Abstract
Anusaaraka is an English–Hindi language accessing software. With insights from Panini's Ashtadhyayi (grammar rules), Anusaaraka is a machine translation tool being developed by the Chinmaya International Foundation (CIF), the International Institute of Information Technology, Hyderabad (IIIT-H) and the University of Hyderabad (Department of Sanskrit Studies). Fusion of traditional Indian shastras and advanced modern technologies is what Anusaaraka is all about.
Anusaaraka allows users to access text in any Indian language, after translation from the source language (i.e. English or any other regional Indian language). In today's Information Age, large volumes of information are available in English, whether it be information for competitive exams or even general reading. However, many educated people whose primary language is Hindi or a regional Indian language are unable to access information in English. Anusaaraka aims to bridge this language barrier by allowing a user to enter an English text into Anusaaraka and get its translation in an Indian language. The Anusaaraka being referred to here has English as the source language and Hindi as the target language.
Anusaaraka derives its name from the Sanskrit word ‘Anusaran’, which means ‘to follow’. It is so called because the translated Anusaaraka output appears in layers, i.e. a sequence of steps that follow each other till the final translation is displayed to the user.
International Institute of Information Technology (IIIT), Hyderabad
The International Institute of Information Technology, Hyderabad (IIIT-H) is an autonomous university founded in 1998. It was set up as a not-for-profit public private partnership (NPPP) and is the first IIIT to be set up (under this model) in India. The Government of Andhra Pradesh lent support to the institute by grant of land and buildings. A Governing Council consisting of eminent people from academia, industry and government presides over the governance of the institution.
IIIT-H was set up as a research university focused on the core areas of Information Technology, such as Computer Science, Electronics and Communications, and their applications in other domains. The institute evolved strong research programs in a host of areas, with computation or IT providing the connecting thread, and with an emphasis on the development of technology and applications, which can be transferred for use to industry and society. This required carrying out basic research that can be used to solve real life problems. As a result, a synergistic relationship has come to exist at the Institute between basic and applied research. Faculty carries out a number of academic industrial projects, and a few companies have been incubated based on the research done at the Institute.
IIIT-H is organized as research centers and labs, instead of the conventional departments, to facilitate inter-disciplinary research and a seamless flow of knowledge within the Institute. Faculty assigned to the centers and labs conduct research, as well as academic programs, which are owned by the Institute, and not by individual research centers.
Machine Translation
Machine Translation is an important technology for localization, and is particularly relevant in a linguistically diverse country like India. Human translation in India is a rich and ancient tradition. Works of philosophy, arts, mythology, religion, science and folklore have been translated among the ancient and modern Indian languages. Numerous classic works of art, ancient, medieval and modern, have also been translated between European and Indian languages since the 18th century. In the current era, human translation finds application mainly in administration, media and education, and to a lesser extent in business, arts, and science and technology. India is a linguistically rich area: it has 18 constitutional languages, which are written in 10 different scripts. Hindi is the official language of the Union. English is very widely used in the media, commerce, science and technology, and education. Many of the states have their own regional language, which is either Hindi or one of the other constitutional languages. Only about 5% of the population speaks English. In such a situation, there is a big market for translation between English and the various Indian languages. Currently, this translation is essentially manual. Use of automation is largely restricted to word processing. Two specific examples of high volume manual translation are the translation of news from English into local languages, and the translation of annual reports of government departments and public sector units among English, Hindi and the local language.
As is clear from above, the market is largest for translation from English into Indian languages, primarily Hindi. Hence, it is no surprise that a majority of the Indian Machine Translation (MT) systems are for English-Hindi translation. Natural language processing presents many challenges, of which the biggest is the inherent ambiguity of natural language. MT systems have to deal with ambiguity, and various other NL phenomena. In addition, the linguistic diversity between the source and target language makes MT a bigger challenge. This is particularly true of widely divergent languages such as English and Indian languages. The major structural difference between English and Indian languages can be summarized as follows. English is a highly positional language with rudimentary morphology, and a default subject-verb-object (SVO) sentence structure. Indian languages are highly inflectional, with a rich morphology, relatively free word order, and a default subject-object-verb (SOV) sentence structure. In addition, there are many stylistic differences. For example, it is common to see very long sentences in English, using abstract concepts as the subjects of sentences, and stringing several clauses together (as in this sentence!). Such constructions are not natural in Indian languages, and present major difficulties in producing good translations.
As is recognized the world over, with the current state of the art in MT, it is not possible to have Fully Automatic, High Quality, and General-Purpose Machine Translation. Practical systems need to handle ambiguity and the other complexities of natural language processing, by relaxing one or more of the above dimensions. Thus, we can have automatic high-quality ‘sub-language’ systems for specific domains, or automatic general-purpose systems giving rough translation, or interactive general-purpose systems with pre or post editing.
Why Machine Translation?
Today technology has made it possible for individuals worldwide to access large volumes of information at the click of a button. However, very often the information sought may not be in a language that the individual is familiar with. Thus, Machine Translation is an endeavor to minimize the language barrier, by making it possible to access a text in the language of one's choice. For technology to be able to provide the above facility, many aspects of language are involved. To name a few:
• Script
• Spelling
• Vocabulary
• Morphology
• Syntax
Keeping the above in mind, machine translation systems need to be equipped to translate a text within seconds and yet capture the information of the text to the best possible extent.
Anusaaraka
The focus in Anusaaraka is not mainly on machine translation, but on Language Access between Indian languages. Using principles of Paninian Grammar (PG), and exploiting the close similarity of Indian languages, Anusaaraka essentially maps local word groups between the source and target languages. Where there are differences between the languages, the system introduces extra notation to preserve the information of the source language. Thus, the user needs some training to understand the output of the system. The project has developed Language Accessors from many Indian languages into Hindi.
Anusaaraka maps constructions in the source language to the corresponding constructions in the target language wherever possible. For example, a noun or pronoun in the source language is mapped to an appropriate noun or pronoun, respectively, in the target language as shown below:
@H: Apa pustaka paDha_raHA_[HE|thA]_kyA{23_ba.}?
!E: You book read_ing_[is|was] Q.?
E: Are/were you reading a book?
(Where the prefixes mean the following: @H = anusaaraka Hindi, !E = English gloss, E = English.)
In the example above, the last word in the sentence is a verb and illustrates the mapping morpheme by morpheme: the root is mapped to 'paDha' (read), and similarly the tense-aspect-modality (TAM) label is mapped to 'raHA_[HE|thA]' (is_*ing or was_*ing), which is followed by the 'A' suffix, which gets mapped to 'kyA' (what) as a question mark in Hindi. Gender, number, and person (GNP) information is also shown separately in curly brackets ('{23_ba.}' for second or third person and plural).
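The morpheme-by-morpheme composition described above can be pictured with a small C sketch. It is only an illustration: the report does not show Anusaaraka's internal data structures, so the `VerbGloss` struct and the `format_anusaaraka_verb` function are hypothetical names, and the joining rule (underscores between morphemes, GNP note in braces) is inferred from the single example above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for one analysed verb; not Anusaaraka's real data model. */
struct VerbGloss {
    const char *root;   /* mapped root, e.g. "paDha" (read) */
    const char *tam;    /* mapped TAM label, e.g. "raHA_[HE|thA]" */
    const char *suffix; /* mapped suffix, e.g. "kyA", or NULL if there is none */
    const char *gnp;    /* gender-number-person note, e.g. "23_ba." */
};

/* Joins the mapped morphemes with '_' and appends the GNP note in braces.
   Returns the number of characters written, or -1 if the buffer is too small. */
int format_anusaaraka_verb(const struct VerbGloss *v, char *out, size_t size)
{
    int n;
    assert(v != NULL && v->root != NULL && v->tam != NULL && v->gnp != NULL);
    if (v->suffix != NULL)
        n = snprintf(out, size, "%s_%s_%s{%s}", v->root, v->tam, v->suffix, v->gnp);
    else
        n = snprintf(out, size, "%s_%s{%s}", v->root, v->tam, v->gnp);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```

Filling the struct with the root, TAM label, suffix and GNP note from the example reproduces the verb group of the @H line. Each morpheme stays visible in the output, which is what lets a trained reader trace it back to the source.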
Sometimes, for a construction in the source language, the same construction is not available in the target language. In such a case, the system chooses another construction in the target language in which the same information can be expressed. In the example below, the system chooses the complementizer construction in Hindi (EsA) to express the same sense:
@H: hamArA_ladakI_ko` nOkarI karanA_EsA nahIM_[hE|WA].
!E: Our daughter(dat.) job do_should_that not (fem.)
E: It is not the case that our daughter should get a job.
Anusaaraka, however, aims to show an image of the source text, and therefore it uses the complementizer (EsA). Sometimes there are slight differences between a construction in the source language and a similar construction in the target language, because of which information might not be preserved. In such a situation additional notation is introduced to express the information which would otherwise get lost. A simple example of this is the lack of distinction between personal pronoun and pronominal adjective in Hindi: vaha.
@H: vaha` pAThshAlA_ko` gayA.
!E: he school(dat.) went.
E: He went to school.
@H: vaha- pAThshAlA_ko` TrophI AyI.
!E: that school(dat.) trophy came
E: That school received the trophy.
When transferring from one language to the other, this distinction would have disappeared if care was not taken. In Anusaaraka, the two forms are made different by introducing additional notation:
vaha` (he)
vaha- (that)
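As a rough illustration of the extra notation, the C sketch below appends the distinguishing marker to "vaha" depending on which sense the source word had. The enum `VahaSense` and the function `annotate_vaha` are hypothetical names introduced here; only the two markers themselves come from the report.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* The two source-language senses that both surface as "vaha" in Hindi. */
enum VahaSense { VAHA_PRONOUN, VAHA_ADJECTIVE };

/* Writes "vaha`" for the personal pronoun (he) and "vaha-" for the
   pronominal adjective (that). Returns the number of characters written,
   or -1 if the buffer is too small. */
int annotate_vaha(enum VahaSense sense, char *out, size_t size)
{
    const char marker = (sense == VAHA_PRONOUN) ? '`' : '-';
    int n;
    assert(out != NULL && size > 0);
    n = snprintf(out, size, "vaha%c", marker);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```

The marker costs one character in the output but keeps a distinction that plain Hindi spelling would erase.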
Salient Features of Anusaaraka
Faithful representation of text in source language:
Throughout the various layers of Anusaaraka output there is an effort to ensure that the user is able to understand the information contained in the English sentence. This is given greater importance than giving perfect sentences in Hindi, for it would be pointless to have a translation that reads well but does not truly capture the information of the source text.
The layered output is unique to Anusaaraka. Through it, the user can access both the source language text information and the way the Hindi translation is finally arrived at. The important feature of the layered output is that the information transfer is done in a controlled manner at every step, thus making it possible to revert without any loss of information. Also, any loss of information that cannot be avoided in a translation process happens gradually. Therefore, even if the translated sentence is not as 'perfect' as human translation, with some effort and orientation on reading Anusaaraka output, an individual can understand what the source text is implying by looking at the layers and the context in which that sentence appears.
Reversibility:
The gradual transfer of information from one layer to the next gives Anusaaraka the additional advantage of bringing reversibility into the translation process, a feature which cannot be achieved by a conventional machine translation system. A bilingual user of Anusaaraka can, at any point, access the source language text in English, because of the transparency in the output. Some amount of orientation on how to read the Anusaaraka output would be required for this.
Transparency:
Display of step-by-step translation layers gives an increased level of confidence to the end-user, who can trace back to the source and get clarity regarding the translated text by analysing the output layers and referring to the context.
Champollion
Champollion is a Robust Parallel Text Sentence Aligner. Parallel text is a very valuable resource for a number of natural language processing tasks, including machine translation, cross language information retrieval, and word disambiguation. Parallel text provides the maximum utility when it is sentence aligned. The sentence alignment process maps sentences in the source text to their translation. The labour intensive and time consuming nature of manual sentence alignment makes large parallel text corpus development difficult. Thus a number of automatic sentence alignment approaches have been proposed and utilized; some are pure length based approaches, some are lexicon based, and some are a mixture of the two approaches.
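To make the three families of approaches concrete, here is a minimal C sketch of a length-based score, a lexicon-based score, and a mixture of the two. The formulas are deliberately simple stand-ins chosen for illustration (published aligners use statistical models), and the function names `length_score`, `lexical_score` and `mixed_score` are assumptions, not part of any aligner mentioned in this report.

```c
#include <assert.h>
#include <stddef.h>

/* Length-based: ratio of the shorter sentence length to the longer one,
   so equal lengths score 1.0 and very different lengths approach 0.0. */
double length_score(size_t src_len, size_t tgt_len)
{
    size_t shorter = src_len < tgt_len ? src_len : tgt_len;
    size_t longer  = src_len < tgt_len ? tgt_len : src_len;
    if (longer == 0)
        return 1.0; /* two empty sentences match trivially */
    return (double)shorter / (double)longer;
}

/* Lexicon-based: fraction of source words for which a dictionary
   translation was found in the candidate target sentence. */
double lexical_score(size_t translated_words, size_t src_words)
{
    assert(translated_words <= src_words);
    if (src_words == 0)
        return 0.0;
    return (double)translated_words / (double)src_words;
}

/* Mixture: weighted average of the two scores; lexical_share in [0, 1]
   says how much of the final score comes from lexical evidence. */
double mixed_score(double lexical, double length, double lexical_share)
{
    assert(lexical_share >= 0.0 && lexical_share <= 1.0);
    return lexical_share * lexical + (1.0 - lexical_share) * length;
}
```

For instance, a 5-word source sentence paired with a 10-word target sentence gets a length score of 0.5 regardless of content, which hints at why length alone becomes unreliable once the data is noisy.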
While existing approaches perform reasonably well on close language pairs, their performance degrades quickly on remote language pairs such as English and Chinese. Performance degradation is exacerbated by noise in the data.
Champollion was initially developed for aligning Chinese-English parallel text. It was later ported to other language pairs, including Arabic–English and Hindi–English.
Champollion differs from other sentence aligners in two ways. First, it assumes a noisy input, i.e. that a large percentage of alignments will not be one to one alignments, and that the number of deletions and insertions will be significant. The assumption is against declaring a match in the absence of lexical evidence. Non-lexical measures, such as sentence length information, which are often unreliable when dealing with noisy data, can and should still be used, but they should only play a supporting role when lexical evidence is present. Second, Champollion differs from other lexicon-based approaches in assigning weights to translated words. Translation lexicons usually help sentence aligners in the following way: first, translated words are identified by using entries from a translation lexicon; second, statistics of translated words are then used to identify sentence correspondences.
In most existing sentence alignment algorithms, translated words are treated equally, i.e. translated word pairs are assigned equal weight when deciding sentence correspondences. These algorithms were also evaluated on relatively clean corpora, in which 1-1 alignments dominate: they constitute 89% of the UBS English-French corpus, while 1-0 and 0-1 alignments constitute merely 1.3%. However, when creating very large parallel corpora, the data can be very noisy. For example, in a UN Chinese-English corpus, 6.4% of all alignments are either 1-0 or 0-1 alignments.
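The difference between treating translated word pairs equally and weighting them can be sketched in C as follows. The report does not give Champollion's actual weighting formula, so the inverse-frequency weight used here is purely an assumption made for illustration, as are the names `MatchedPair`, `equal_weight_score` and `weighted_score`.

```c
#include <assert.h>
#include <stddef.h>

/* One translated word pair found in a candidate sentence pair. */
struct MatchedPair {
    unsigned corpus_count; /* how often the source word occurs in the corpus (assumed known) */
};

/* Equal weighting: every matched pair counts as 1. */
double equal_weight_score(const struct MatchedPair *pairs, size_t n)
{
    (void)pairs; /* the pairs themselves do not matter, only how many there are */
    return (double)n;
}

/* Assumed weighting: a pair contributes 1/corpus_count, so a rare word
   is stronger evidence of correspondence than a very common one. */
double weighted_score(const struct MatchedPair *pairs, size_t n)
{
    double score = 0.0;
    size_t i;
    for (i = 0; i < n; i++) {
        assert(pairs[i].corpus_count > 0);
        score += 1.0 / (double)pairs[i].corpus_count;
    }
    return score;
}
```

Under equal weighting, two matches on frequent function words look as convincing as two matches on rare content words; any frequency-sensitive weight, including the simple one assumed here, separates the two cases.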
Some of the omissions and insertions were introduced during the translation of the text. Most of the omissions and insertions, however, are introduced during different stages of processing before sentence alignment is carried out. The pre-processing steps include converting the raw data to plain text format, removing tables, footnotes, endnotes, etc. Most of these steps introduce noise. For instance, while a table in an English document can be completely removed, this is not necessarily the case in any given Chinese document. Because of the sheer number of documents involved, manually examining each document after pre-processing is impossible. A robust sentence aligner needs not only to detect most categories of noise, but also to recover quickly if an error is made. It has been shown that existing methods work very well on clean data, but their performance goes down quickly as data becomes noisy.
CODES
Code for extracting regular text from xml file:
#include<stdio.h>
#include<string.h>
#include<stdlib.h>
//NEEDED FOR isdigit()
#include<ctype.h>
//MAXIMUM NUMBER OF PAGES ALLOWED
#define MAX 200
//EXTENSION OF THE FILES BEING CREATED FOR EACH PAGE
#define EXTENSION ".xml"
//LENGTH OF THE EXTENSION OF THE FILE
#define EXTENSION_LENGTH strlen(EXTENSION)
char temp[MAX];
//EXACT NUMBER OF PAGES IN THE SOURCE XML FILE
int totalPages;
//CONTAINS THE CURRENT PAGE NUMBER CONVERTED TO ITS CORRESPONDING FILENAME
char pageNumber[20];
//FILE POINTERS FOR READING THE PAGE FILE AND WRITING TO FINAL TEXT FILES
//TWO TEXT FILES ARE CREATED
//ONE FOR NON SORTED AND THE OTHER FOR SORTED DATA ACCORDING TO CO-ORDINATES OF THE TEXT ON THE PAGE
FILE *fr,*fw;
//STRUCTURE FOR THE CONTENTS OF A SINGLE LINE OF THE XML FILE
struct Line
{
int top;
int left;
int width;
int height;
int font;
char text[10000];
};
//STRUCTURE FOR THE CONTENTS OF A SINGLE PAGE OF XML FILE
struct Page
{
struct Line line[MAX];
int lines;
};
//STRUCTURE FOR THE PAGE HEADER
struct Header
{
int fontId;
char fontSize[10];
char color[10];
struct Header *link;
};
typedef struct Header* HEADER;
struct Page pages[MAX];
HEADER head;
//CONTAINS THE FONTS FOR WHICH THE TEXT IS TO BE EXTRACTED
int fonts[MAX];
//CONTAINS TOTAL NUMBER OF FONTS
int totalFonts;
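//ALLOCATES A NEW HEADER NODE; EXITS IF MEMORY IS EXHAUSTED
HEADER getHeader()
{
    HEADER h=(HEADER)malloc(sizeof(struct Header));
    if(h==NULL)
    {
        printf("Out of memory\nEXITING\n");
        exit(1);
    }
    return(h);
}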
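//COMPILES genPages.c AND RUNS IT ON THE SOURCE XML FILE TO SPLIT IT INTO ONE FILE PER PAGE
void generatePages(char *arg)
{
    char arr[512];
    //snprintf NEVER WRITES PAST THE BUFFER, UNLIKE strcat ON A FIXED ARRAY
    if(snprintf(arr,sizeof(arr),"./genPages.out %s",arg)>=(int)sizeof(arr))
    {
        printf("File name %s is too long\nEXITING\n",arg);
        exit(1);
    }
    printf("Creating Pages\n");
    system("cc genPages.c -o genPages.out");
    system(arr);
    printf("Pages created\n");
}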
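//BUILDS THE FILE NAME OF A PAGE IN pageNumber, E.G. 12 BECOMES "12.xml"
//ASSUMES page IS AT LEAST 1
void convertToText(int page)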
{
int l,i,j;
char rev[20];
i=0;
while(page!=0)
{
rev[i++]=(page%10)+48;
page=page/10;
}
l=i;
i--;
for(j=0;j<l;j++)
{
pageNumber[j]=rev[i--];
}
for(i=0;i<EXTENSION_LENGTH;i++)
pageNumber[i+l]=EXTENSION[i];
pageNumber[i+l]='\0';
}
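//READS THE fontspec TAGS AT THE TOP OF A PAGE FILE INTO THE LINKED LIST head
//STOPS AT THE FIRST TAG THAT IS NOT A fontspec TAG OR AT THE END OF THE FILE
void fetchHeader()
{
int c,i;
HEADER t,cur;
while(1)
{
c=getc(fr);
if(c==EOF)
break;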
if(c=='<')
{
c=getc(fr);
if(c=='f')
{
t=getHeader();
t->link=NULL;
while(!isdigit(c=getc(fr)));
i=0;
while(isdigit(c))
{
temp[i++]=c;
c=getc(fr);
}
temp[i]='\0';
t->fontId=atoi(temp);
while(!isdigit(c=getc(fr)));
i=0;
while(isdigit(c))
{
t->fontSize[i++]=c;
c=getc(fr);
}
t->fontSize[i]='\0';
while((c=getc(fr))!='#');
c=getc(fr);
i=0;
while(c!='\"')
{
t->color[i++]=c;
c=getc(fr);
}
t->color[i]='\0';
if(head==NULL)
{
head=t;
}
else
{
cur=head;
while(cur->link!=NULL)
cur=cur->link;
cur->link=t;
}
while(getc(fr)!='>');
}
else
break;
}
}
}
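//READS THE REST OF A TAG UP TO '>' AND RETURNS 1 IF IT IS THE CLOSING </text> TAG
int checkLineEnd()
{
    int c,i;
    i=0;
    //STOP AT THE END OF THE FILE AND NEVER WRITE PAST THE END OF temp
    while((c=getc(fr))!='>'&&c!=EOF)
    {
        if(i<MAX-1)
            temp[i++]=c;
    }
    temp[i]='\0';
    if(strcmp(temp,"/text")==0)
        return(1);
    return(0);
}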
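//COPIES THE CHARACTERS OF THE CURRENT LINE INTO THE PAGE STRUCTURE UNTIL </text> IS REACHED
//NESTED TAGS SUCH AS <b> OR <i> ARE SKIPPED
void fetchText(int pgNo)
{
    int c,i;
    struct Line *ln=&pages[pgNo].line[pages[pgNo].lines];
    i=0;
    //STOP AT THE END OF THE FILE INSTEAD OF LOOPING FOREVER
    while((c=getc(fr))!=EOF)
    {
        if(c=='<')
        {
            if(checkLineEnd())
                break;
            else
                continue;
        }
        //DROP CHARACTERS THAT WOULD OVERFLOW THE TEXT BUFFER
        if(i<(int)sizeof(ln->text)-1)
            ln->text[i++]=c;
    }
    ln->text[i]='\0';
}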
//READS THE NEXT UNSIGNED INTEGER FROM THE PAGE FILE
//RETURNS -1 IF THE END OF THE FILE IS REACHED BEFORE ANY DIGIT IS FOUND
int readNumber()
{
    int c,i;
    while((c=getc(fr))!=EOF&&!isdigit(c));
    if(c==EOF)
        return(-1);
    i=0;
    while(isdigit(c)&&i<MAX-1)
    {
        temp[i++]=c;
        c=getc(fr);
    }
    temp[i]='\0';
    return(atoi(temp));
}
//READS THE top, left, width, height AND font ATTRIBUTES AND THE TEXT OF EVERY LINE ON THE PAGE
void fetchPageInfo(int pgNo)
{
    int c,top;
    struct Line *ln;
    c=getc(fr);
    while(c!=EOF&&pages[pgNo].lines<MAX)
    {
        top=readNumber();
        if(top<0)
            break;
        ln=&pages[pgNo].line[pages[pgNo].lines];
        ln->top=top;
        ln->left=readNumber();
        ln->width=readNumber();
        ln->height=readNumber();
        ln->font=readNumber();
        printf("Fetching text for line %d\n",pages[pgNo].lines);
        //SKIP THE '>' THAT CLOSES THE OPENING text TAG
        c=getc(fr);
        fetchText(pgNo);
        pages[pgNo].lines++;
        //SKIP THE NEWLINE AND THE '<' OF THE NEXT TAG, OR DETECT THE END OF THE FILE
        c=getc(fr);
        c=getc(fr);
    }
}
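//COLLECTS THE IDS OF ALL FONTS WHOSE SIZE AND COLOR MATCH A <fontSize> <color> PAIR GIVEN ON THE COMMAND LINE
void fetchFontId(int argc,char *argv[])
{
    int i;
    HEADER cur;
    for(i=3;i<argc-1;i=i+2)
    {
        cur=head;
        while(cur!=NULL)
        {
            if((strcmp(argv[i],cur->fontSize)==0)&&(strcmp(argv[i+1],cur->color)==0)&&(totalFonts<MAX))
            {
                fonts[totalFonts++]=cur->fontId;
            }
            cur=cur->link;
        }
    }
}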
//READS EVERY PAGE FILE (1.xml, 2.xml, ...) INTO THE pages ARRAY
void createPages()
{
int i;
for(i=1;i<=totalPages;i++)
{
convertToText(i);
fr=fopen(pageNumber,"r");
if(fr==NULL)
{
printf("Cannot open the file %s\nEXITING\n",pageNumber);
exit(1);
}
printf("Fetching information of Page %d\n",i);
fetchHeader();
pages[i].lines=0;
fetchPageInfo(i);
printf("Information of Page %d fetched\n",i);
fclose(fr);
}
}
//RETURNS 1 IF THE FONT ID IS ONE OF THE FONTS SELECTED FOR EXTRACTION
int checkFont(int fnt)
{
int i;
for(i=0;i<totalFonts;i++)
{
if(fnt==fonts[i])
return(1);
}
return(0);
}
//SORTS THE LINES OF A PAGE FROM TOP TO BOTTOM AND, FOR EQUAL top VALUES, FROM LEFT TO RIGHT
void sortPage(int pgNo)
{
struct Line temp;
int i,j;
for(i=0;i<pages[pgNo].lines-1;i++)
{
for(j=i+1;j<pages[pgNo].lines;j++)
{
if(pages[pgNo].line[i].top>=pages[pgNo].line[j].top)
{
if(pages[pgNo].line[i].top==pages[pgNo].line[j].top)
{
if(pages[pgNo].line[i].left>pages[pgNo].line[j].left)
{
temp=pages[pgNo].line[i];
pages[pgNo].line[i]=pages[pgNo].line[j];
pages[pgNo].line[j]=temp;
}
}
else
{
temp=pages[pgNo].line[i];
pages[pgNo].line[i]=pages[pgNo].line[j];
pages[pgNo].line[j]=temp;
}
}
}
}
}
//WRITES THE TEXT OF EVERY LINE ON THE PAGE THAT USES A SELECTED FONT, ONE LINE PER ROW
void writeText(int pgNo)
{
int i;
for(i=0;i<pages[pgNo].lines;i++)
{
if(checkFont(pages[pgNo].line[i].font))
{
fputs(pages[pgNo].line[i].text,fw);
putc('\n',fw);
}
}
}
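//WRITES THE UNSORTED TEXT TO THE ALREADY OPEN OUTPUT FILE, THEN WRITES THE TEXT
//SORTED BY POSITION ON THE PAGE TO A SECOND FILE NAMED <arg>_sorted
void createTextFile(char *arg)
{
    int i;
    //SEPARATE BUFFER: APPENDING TO arg WOULD WRITE PAST THE END OF THE COMMAND LINE ARGUMENT
    char sortedName[512];
    for(i=1;i<=totalPages;i++)
    {
        writeText(i);
    }
    fclose(fw);
    fw=NULL;
    if(snprintf(sortedName,sizeof(sortedName),"%s_sorted",arg)>=(int)sizeof(sortedName))
    {
        printf("File name %s is too long\nEXITING\n",arg);
        return;
    }
    fw=fopen(sortedName,"w");
    if(fw==NULL)
    {
        printf("Cannot create the file %s\nEXITING\n",sortedName);
        return;
    }
    for(i=1;i<=totalPages;i++)
    {
        sortPage(i);
        writeText(i);
    }
}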
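//USAGE: <program> <source.xml> <totalPages> [<fontSize> <color>]... <outputFile>
//<color> IS THE HEX VALUE FROM THE fontspec TAG WITHOUT THE LEADING '#'
int main(int argc,char *argv[])
{
    if(argc<4)
    {
        printf("Usage: %s <source.xml> <totalPages> [<fontSize> <color>]... <outputFile>\n",argv[0]);
        return(1);
    }
    totalPages=atoi(argv[2]);
    //PAGES ARE STORED AT INDEXES 1..totalPages OF AN ARRAY OF MAX ELEMENTS
    if(totalPages<1||totalPages>=MAX)
    {
        printf("Number of pages must be between 1 and %d\n",MAX-1);
        return(1);
    }
    generatePages(argv[1]);
    head=NULL;
    createPages();
    totalFonts=0;
    fetchFontId(argc,argv);
    fw=fopen(argv[argc-1],"w");
    if(fw==NULL)
    {
        printf("Cannot create the file %s\nEXITING\n",argv[argc-1]);
        return(1);
    }
    createTextFile(argv[argc-1]);
    //fw IS NULL IF THE SORTED FILE COULD NOT BE CREATED
    if(fw!=NULL)
        fclose(fw);
    return(0);
}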
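The convertToText function in the listing above builds page file names such as 12.xml by reversing digits by hand. As a hedged alternative, and not as the code used in the project, the same name can be produced with the standard snprintf function, which also reports when the buffer is too small. The function name `page_filename` is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes "<page>.xml" into out, e.g. 12 gives "12.xml".
   Returns 0 on success, or -1 if page is negative or the buffer is too small. */
int page_filename(int page, char *out, size_t size)
{
    int n;
    assert(out != NULL);
    if (page < 0)
        return -1;
    n = snprintf(out, size, "%d.xml", page);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

Unlike the digit-reversal loop, this version also handles page 0, and it never writes past the end of the destination buffer.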
Code for dividing the xml into pages in accordance with the .pdf file used:
#include<stdio.h>
#include<string.h>
#define MAX 2000
#define START_PATTERN "<page"
#define START_PATTERN_LENGTH strlen(START_PATTERN)
#define END_PATTERN "</page>"
#define END_PATTERN_LENGTH strlen(END_PATTERN)
#define EXTENSION ".xml"
#define EXTENSION_LENGTH strlen(EXTENSION)
FILE *fr,*fw;
char temp[MAX];
char pageNumber[20];
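//BUILDS THE FILE NAME OF A PAGE IN pageNumber, E.G. 12 BECOMES "12.xml"
//ASSUMES page IS AT LEAST 1
void convertToText(int page)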
{
int l,i,j;
char rev[20];
i=0;
while(page!=0)
{
rev[i++]=(page%10)+48;
page=page/10;
}
l=i;
i--;
for(j=0;j<l;j++)
{
pageNumber[j]=rev[i--];
}
for(i=0;i<EXTENSION_LENGTH;i++)
pageNumber[i+l]=EXTENSION[i];
pageNumber[i+l]='\0';
}
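//ADVANCES THE INPUT TO JUST AFTER THE NEXT OPENING <page ...> TAG
//RETURNS 1 IF A PAGE WAS FOUND AND EOF IF THE FILE ENDED FIRST
int skip()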
{
int i,c;
for(i=0;i<START_PATTERN_LENGTH;i++)
{
c=getc(fr);
if(c==EOF)
return(EOF);
temp[i]=c;
}
temp[i]='\0';
do
{
if(strcmp(temp,START_PATTERN)==0)
{
c=getc(fr);
while(c!='>')
c=getc(fr);
c=getc(fr);
return(1);
}
c=getc(fr);
if(c==EOF)
return(EOF);
for(i=0;i<START_PATTERN_LENGTH-1;i++)
temp[i]=temp[i+1];
temp[i++]=c;
temp[i]='\0';
}while(1);
}
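//READS THE NEXT CHARACTERS INTO temp AND RETURNS 1 IF THEY ARE THE CLOSING </page> TAG
//STOPS EARLY AT THE END OF THE FILE SO THAT EOF IS NEVER STORED AS A CHARACTER
int checkPageEnd()
{
    int i,c;
    for(i=0;i<END_PATTERN_LENGTH;i++)
    {
        c=getc(fr);
        if(c==EOF)
            break;
        temp[i]=c;
    }
    temp[i]='\0';
    if(strcmp(temp,END_PATTERN)==0)
        return(1);
    return(0);
}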
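//USAGE: <program> <source.xml>
//WRITES THE CONTENTS OF EACH <page> ELEMENT TO ITS OWN FILE: 1.xml, 2.xml, ...
int main(int argc,char *argv[])
{
    int c,r,i,page;
    page=1;
    if(argc<2)
    {
        printf("Usage: %s <source.xml>\n",argv[0]);
        return(1);
    }
    fr=fopen(argv[1],"r");
    if(fr==NULL)
    {
        printf("Cannot open %s\n",argv[1]);
        return(1);
    }
    do
    {
        if(skip()==EOF)
            break;
        convertToText(page);
        fw=fopen(pageNumber,"w");
        if(fw==NULL)
        {
            printf("Cannot create file %s\n",pageNumber);
            fclose(fr);
            return(1);
        }
        printf("File for Page Number %s created\n",pageNumber);
        do
        {
            c=getc(fr);
            //THE FILE ENDED WITHOUT A CLOSING </page> TAG
            if(c==EOF)
                break;
            if(c=='<')
            {
                ungetc(c,fr);
                r=checkPageEnd();
                if(r)
                    break;
                //NOT THE END OF THE PAGE: COPY THE CHARACTERS THAT WERE READ
                for(i=0;temp[i]!='\0';i++)
                    putc(temp[i],fw);
            }
            else
            {
                putc(c,fw);
            }
        }while(1);
        fclose(fw);
        page++;
    }while(c!=EOF);
    fclose(fr);
    return(0);
}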
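The page splitter above detects the closing tag by reading seven characters at once whenever it meets a '<'. If another tag starts fewer than seven characters before </page>, that window swallows the '<' of the closing tag and the end of the page is missed. The sketch below shows one possible alternative, a matcher that advances one character at a time; the function name `copy_until_page_end` is hypothetical and the code is not taken from the project.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Copies bytes from in to out until the tag "</page>" is found.
   The tag itself is consumed but not copied.
   Returns 1 if the tag was found and 0 if the input ended first. */
int copy_until_page_end(FILE *in, FILE *out)
{
    static const char tag[] = "</page>";
    const size_t len = sizeof tag - 1;
    size_t matched = 0; /* number of leading tag characters seen so far */
    int c;

    while ((c = getc(in)) != EOF) {
        if (c == tag[matched]) {
            if (++matched == len)
                return 1;
            continue;
        }
        /* Mismatch: the partial match was ordinary text, so write it out. */
        fwrite(tag, 1, matched, out);
        /* '<' occurs only at the start of the tag, so a fresh match can
           begin only if the current character is itself '<'. */
        if (c == tag[0]) {
            matched = 1;
        } else {
            matched = 0;
            putc(c, out);
        }
    }
    fwrite(tag, 1, matched, out); /* flush a partial match cut off by EOF */
    return 0;
}
```

Because at most one character is examined per step, a tag that begins right before </page> cannot hide it, and the end of the file is reported to the caller instead of being written into the page file.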
Word-Sense Disambiguation (WSD)
In computational linguistics, word-sense disambiguation (WSD) is an open problem of natural language processing, which governs the process of identifying which sense of a word (i.e. meaning) is used in a sentence, when the word has multiple meanings. The solution to this problem impacts other computer-related writing, such as discourse, improving relevance of search engines, anaphora resolution, and coherence. A disambiguation process strictly requires two things: a dictionary to specify the senses which are to be disambiguated and a corpus of language data to be disambiguated (in some methods, a training corpus of language examples is also required). The WSD task has two variants: the "lexical sample" task and the "all words" task. The former comprises disambiguating the occurrences of a small sample of target words which were previously selected, while in the latter all the words in a piece of running text need to be disambiguated. The latter is deemed a more realistic form of evaluation, but the corpus is more expensive to produce because human annotators have to read the definitions for each word in the sequence every time they need to make a tagging judgement, rather than once for a block of instances for the same target word.

To give a hint of how all this works, consider two of the distinct senses that exist
for the (written) word "bass":
• a type of fish
• tones of low frequency
and the sentences:
• I went fishing for some sea bass.
• The bass line of the song is too weak.

To a human, it is obvious that the first sentence uses the word "bass (fish)", as in
the former sense above, and that the second sentence uses the word "bass (instrument)",
as in the latter sense above. Developing algorithms to replicate this human ability can
often be a difficult task, as is further exemplified by the implicit equivocation
between "bass" (sound) and "bass" (musical instrument).
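
The dictionary-and-context idea described above can be illustrated with a simplified Lesk-style overlap count: pick the sense whose dictionary gloss shares the most words with the sentence. The following is only a minimal sketch under stated assumptions. The two glosses, the names `BASS_SENSES` and `disambiguate_bass`, and the crude "longer than three letters" stop-word filter are all hypothetical, and nothing here is claimed to be the method Anusaaraka itself uses.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical two-entry sense inventory for "bass"; the glosses are
   illustrative and not taken from any real dictionary. */
struct sense {
    const char *label;
    const char *gloss;
};

static const struct sense BASS_SENSES[] = {
    { "fish",  "a type of fish caught in the sea or in a lake by fishing" },
    { "sound", "tones of low frequency in a song or a line of music" },
};
#define NUM_SENSES (sizeof BASS_SENSES / sizeof BASS_SENSES[0])

/* Return 1 if the first `len` letters of `word` occur as a whole word
   in `text`, compared case-insensitively. */
int contains_word(const char *text, const char *word, size_t len)
{
    const char *p = text;
    while (*p) {
        while (*p && !isalpha((unsigned char)*p)) p++;  /* skip separators */
        const char *start = p;
        while (isalpha((unsigned char)*p)) p++;         /* scan one word */
        if ((size_t)(p - start) == len) {
            size_t k = 0;
            while (k < len &&
                   tolower((unsigned char)start[k]) ==
                   tolower((unsigned char)word[k]))
                k++;
            if (k == len) return 1;
        }
    }
    return 0;
}

/* Count the words of `sentence` longer than three letters that also
   appear in `gloss` (the length test is a crude stop-word filter). */
int overlap(const char *sentence, const char *gloss)
{
    int count = 0;
    const char *p = sentence;
    while (*p) {
        while (*p && !isalpha((unsigned char)*p)) p++;
        const char *start = p;
        while (isalpha((unsigned char)*p)) p++;
        size_t len = (size_t)(p - start);
        if (len > 3 && contains_word(gloss, start, len)) count++;
    }
    return count;
}

/* Return the label of the sense with the largest overlap; on a tie the
   earlier entry in BASS_SENSES wins. */
const char *disambiguate_bass(const char *sentence)
{
    size_t best = 0;
    int best_score = -1;
    for (size_t i = 0; i < NUM_SENSES; i++) {
        int score = overlap(sentence, BASS_SENSES[i].gloss);
        if (score > best_score) {
            best_score = score;
            best = i;
        }
    }
    return BASS_SENSES[best].label;
}
```

Calling `disambiguate_bass` on the two example sentences selects "fish" for the first (through the shared word "fishing") and "sound" for the second (through "line" and "song"). A sentence that shares no words with either gloss falls back to the first sense, which shows how fragile a pure overlap count is.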
C Language Integrated Production System (CLIPS):
CLIPS is an expert system tool originally developed by the Software Technology Branch
(STB), NASA/Lyndon B. Johnson Space Center. Since its first release in 1986, CLIPS has
undergone continual refinement and improvement. It is now used by thousands of people
around the world. CLIPS is designed to facilitate the development of software to model
human knowledge or expertise. There are three ways to represent knowledge in CLIPS:
• Rules, which are primarily intended for heuristic knowledge based on experience.
• Deffunctions and generic functions, which are primarily intended for procedural
knowledge.
• Object-oriented programming, also primarily intended for procedural knowledge.
The generally accepted features of object-oriented programming are supported: classes,
message-handlers, abstraction, encapsulation, inheritance and polymorphism. Rules may
pattern-match on objects and facts.

We can develop software using only rules, only objects, or a mixture of objects and
rules. Rules and objects also form an integrated system, since rules can pattern-match
on both facts and objects. CLIPS has also been designed for integration with other
languages such as C and Java. In addition to being used as a stand-alone tool, CLIPS
can be called from a procedural language, perform its function, and then return control
to the calling program. Likewise, procedural code can be defined as external functions
and called from CLIPS. When the external code completes execution, control returns to
CLIPS. This rule-based design makes CLIPS a good fit for word-sense disambiguation.
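
To make the idea of rules pattern-matching on facts more concrete, here is a minimal forward-chaining sketch in C. It is an illustration only: it does not use CLIPS syntax or any part of the CLIPS API, and the types and functions (`struct fact_base`, `struct rule`, `has_fact`, `assert_fact`, `run_rules`) as well as the string format of the facts are hypothetical. Each rule has a single condition, whereas real CLIPS rules can match several patterns with variables.

```c
#include <stddef.h>
#include <string.h>

#define MAX_FACTS 16

/* Working memory: a small set of string facts (hypothetical format). */
struct fact_base {
    const char *facts[MAX_FACTS];
    size_t count;
};

/* A rule with one condition: if `when` is present, add `then`. */
struct rule {
    const char *when;
    const char *then;
};

/* Return 1 if fact `f` is already in working memory. */
int has_fact(const struct fact_base *fb, const char *f)
{
    for (size_t i = 0; i < fb->count; i++)
        if (strcmp(fb->facts[i], f) == 0)
            return 1;
    return 0;
}

/* Add `f` unless it is already present or memory is full.
   Returns 1 if the fact was added, 0 otherwise. */
int assert_fact(struct fact_base *fb, const char *f)
{
    if (has_fact(fb, f) || fb->count == MAX_FACTS)
        return 0;
    fb->facts[fb->count++] = f;
    return 1;
}

/* Fire rules repeatedly until a full pass adds no new fact.
   Returns the number of rule firings that added a fact. */
int run_rules(struct fact_base *fb, const struct rule *rules, size_t n)
{
    int fired = 0;
    int changed;
    do {
        changed = 0;
        for (size_t i = 0; i < n; i++) {
            if (has_fact(fb, rules[i].when) &&
                assert_fact(fb, rules[i].then)) {
                fired++;
                changed = 1;
            }
        }
    } while (changed);
    return fired;
}
```

As a usage example, start with the facts "word bass" and "context fishing" and the rules "context fishing" → "topic angling", "topic angling" → "sense bass fish", and "context song" → "sense bass sound". Running `run_rules` fires the first two rules in turn and leaves "sense bass fish" in working memory, while the third rule never fires. Chaining of this kind is what a rule engine automates, which is why disambiguation heuristics are convenient to write as rules.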
Conclusion
MT is relatively new in India, about a decade old. In comparison with MT efforts in
Europe and Japan, which are at least 3 decades old, it would seem that Indian MT has a
long way to go. However, this can also be an advantage, because Indian researchers can
learn from the experience of their global counterparts. There are close to a dozen
projects now, with about 6 of them in the advanced prototype or technology transfer
stage, and the rest newly initiated.

The Indian NLP/MT scene has so far been characterized by an acute scarcity of basic
lexical resources such as corpora, machine-readable dictionaries (MRDs), lexicons,
thesauri and terminology banks. Also, the various MT groups have used different
formalisms best suited to their specific applications, and hence there has been little
sharing of resources among them. These issues are being addressed now. There are
governmental as well as voluntary efforts under way to develop common lexical
resources, and to create forums for consolidating and coordinating NLP and MT efforts.
It appears that the exploratory phase of Indian MT is over, and the consolidation phase
is about to begin, with the focus moving from proof-of-concept prototypes to
productionization, deployment, collaborative resource sharing and evaluation.

The core Anusaaraka output is in a language close to the target language, and can be
understood by the human reader after some training. The question is how much training
is necessary to reach a very high degree of comprehension. Our experience of working
among Indian languages shows that this training is likely to be small. The reason is
that India forms a linguistic area: Indian languages share vocabulary and grammatical
constructions. There are also shared pragmatics and culture. A similar approach can be
applied to build an English-to-Hindi Anusaaraka. A study can be conducted on the
training required to read such output. The expectation is that a usable
English-to-Hindi system can be built, except that it will require longer training.