Proposal for a Kannada Script Root Zone Label Generation Ruleset (LGR)
LGR Version: 3.0
Date: 2019-03-06
Document version: 2.6
Authors: Neo-Brahmi Generation Panel [NBGP]
1. General Information/Overview/Abstract
The purpose of this document is to give an overview of the proposed Kannada LGR in the XML format and the rationale behind the design decisions taken. It includes a discussion of relevant features of the script, the communities or languages using it, the process and methodology used and information on the contributors. The formal specification of the LGR can be found in the accompanying XML document: proposal-kannada-lgr-06mar19-en.xml
Labels for testing can be found in the accompanying text document: kannada-test-labels-06mar19-en.txt
2. Script for which the LGR is Proposed
ISO 15924 Code: Knda
ISO 15924 N°: 345
ISO 15924 English Name: Kannada
Latin transliteration of the native script name: Kannada
Native name of the script: ಕನ್ನಡ
Maximal Starting Repertoire (MSR) version: MSR-4
Some languages using the script and their ISO 639-3 codes: Kannada (kan), Tulu (tcy), Beary, Konkani (kok), Havyaka, Kodava (kfa)
3. Background on Script and Principal Languages Using It
3.1 Kannada language
Kannada is one of the scheduled languages of India. It is spoken predominantly by the people of Karnataka State of India and is one of the major Dravidian languages. Kannada is also spoken by significant linguistic minorities in the states of Andhra Pradesh, Telangana, Tamil Nadu, Maharashtra, Kerala and Goa, and abroad. According to scholars, Kannada was a spoken language during the 3rd century B.C. Ptolemy, a scholar from Alexandria, mentions some Kannada words in his Geography, written during the first half of the second century A.D. Ptolemy speaks of many places in Karnataka such as Kalgeris (identified as Kalkeri), Modogoulla (Mudugal), Badamios (Badami) and so on. These are not only places in Karnataka but also names of Kannada origin. The famous Halmidi Record of the Kadambas, an inscription of the 5th century A.D., is the oldest available evidence of the Kannada language written in the pre-Old Kannada script. Kappe Arabhatta's Record at Badami (700 A.D.) has the first Kannada poem in the ತ್ರಿಪದಿ (tripadi) metre. The oldest available literary work in Kannada is ಕವಿರಾಜಮಾರ್ಗ (Kavirajamarga), a book on poetics belonging to the 9th century. This work speaks of some earlier poets in Kannada. Hence, Kannada must have been a fully developed language by the 5th or the 6th century A.D. and must have been a spoken language for at least a few centuries earlier. Kannada is attested epigraphically for about one and a half millennia, and literary Old Kannada flourished under the 6th-century Ganga dynasty and during the 9th-century Rashtrakuta dynasty. Kannada has an unbroken literary history of over a thousand years.
3.2 Evolution of Kannada script
The Kannada language is written using the Kannada script, which evolved from the 5th-century Kadamba script. The oldest form of the Kannada script dates back to the 3rd century B.C. The first popular and well-known Kannada script was the Kadamba script, used by the Kadamba dynasty during the 5th century A.D. Buhler, the famous epigraphist, says that the Kadamba script is the earliest form of the present-day Kannada script. During the Ganga dynasty, in the 6th century A.D., the script used is known as the Adi Ganga script, which resembles the Kadamba script. During the 6th-7th centuries A.D., the Chalukyas of Badami used a script which historians now call the Badami Chalukya script. The Rashtrakutas were the next famous dynasty, ruling during the 8th-10th centuries A.D., and the script used during that time is referred to as the Rashtrakuta script. The script used by the Kalyana Chalukya rulers is called the Kalyana Chalukya script; it can be seen in records of the 10th-12th centuries A.D. Cursive writing was started during the 13th century by the Hoysala kings, who built a decorative cursive way of writing based on the script of the Kalyana Chalukyas. Inscriptions at Beluru and Halebeedu have text written in this kind of script. The Vijayanagar kings, who ruled during the 14th-16th centuries A.D., did not make any major modifications to the script. The last dynasty of Karnataka, the kings of Mysore, developed what is known as the Modi script, also called ಮೋಡಿ ಬರಹ (Modi baraha). Most of the public records written during the period of the Mysore kings are in the Modi script. No inscriptions were written in the Modi script, as this style is difficult to inscribe on stone. It may be considered the latest developed form of the script, and it is still taught in schools as cursive writing for Kannada.
Figure 1: Evolution of Kannada script from 3rd century B.C. to 18th century A.D.
(from https://karnatakaitihasaacademy.org/karnataka-history/evolution-of-kannada-script/)
3.3 Languages considered
Apart from the Kannada language, other languages that use the Kannada script are Tulu, Kodava (Coorgi), Konkani, Havyaka, Sanketi, Beary (byaari), Arebaase, Koraga, etc. Tulu had its own script, which is not in much use nowadays, even though many efforts have been made of late to revive it. The Konkani language is also written in the Devanagari, Roman, and Malayalam scripts.
3.4 Structure of written Kannada
The structure of Kannada is similar to that of other Indian languages, especially Telugu. The heart of the writing system is the Akshar. The Kannada alphabet is known as aksharamale or varnamale. The modern alphabet contains 49 characters. This has been arrived at by removing two characters that are mainly used to write classical Kannada texts; these two characters were in use until about 50 years ago. Characters combine to form compound characters called samyuktakshara (conjuncts). These compound characters have distinct display forms. The total number of such combinations is about 650,000. The basic characters in varnamale are classified into three main categories: swara (vowels), vyanjana (consonants) and yogavahas.
3.4.1 Swaras (vowels)
There are thirteen vowels:

Letter   Diacritic   ISO notation
ಅ        N/A         a
ಆ        ◌ಾ          ā
ಇ        ◌ಿ          i
ಈ        ◌ೀ          ī
ಉ        ◌ು          u
ಊ        ◌ೂ          ū
ಋ        ◌ೃ          r̥
ಎ        ◌ೆ          e
ಏ        ◌ೇ          ē
ಐ        ◌ೈ          ai
ಒ        ◌ೊ          o
ಓ        ◌ೋ          ō
ಔ        ◌ೌ          au

Table 1: Kannada Swaras (vowels)
(from https://en.wikipedia.org/wiki/Kannada_alphabet)
When a vowel follows a consonant, it is written with a diacritic rather than as a separate letter. These diacritics are referred to as vowel signs or matras. Vowel signs (matras) are attached only to consonants.
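To make the vowel/matra relationship of Table 1 concrete, here is a minimal Python sketch that maps each independent vowel to its vowel sign and appends it to a consonant. The names `VOWEL_TO_MATRA` and `consonant_with_vowel` are illustrative only and are not part of the LGR; the code points are the ones listed in the repertoire of section 5.2.

```python
import unicodedata

# Independent vowels (Table 1) mapped to the vowel signs (matras) written
# after a consonant. ಅ (a) maps to "" because it is the implicit vowel.
VOWEL_TO_MATRA = {
    "\u0C85": "",        # ಅ a
    "\u0C86": "\u0CBE",  # ಆ ā
    "\u0C87": "\u0CBF",  # ಇ i
    "\u0C88": "\u0CC0",  # ಈ ī
    "\u0C89": "\u0CC1",  # ಉ u
    "\u0C8A": "\u0CC2",  # ಊ ū
    "\u0C8B": "\u0CC3",  # ಋ r̥
    "\u0C8E": "\u0CC6",  # ಎ e
    "\u0C8F": "\u0CC7",  # ಏ ē
    "\u0C90": "\u0CC8",  # ಐ ai
    "\u0C92": "\u0CCA",  # ಒ o
    "\u0C93": "\u0CCB",  # ಓ ō
    "\u0C94": "\u0CCC",  # ಔ au
}


def consonant_with_vowel(consonant: str, vowel: str) -> str:
    """Return the syllable formed when `vowel` follows `consonant`."""
    return consonant + VOWEL_TO_MATRA[vowel]


syllable = consonant_with_vowel("\u0C95", "\u0C92")  # ಕ + ಒ
print(syllable)  # ಕೊ
print([unicodedata.name(ch) for ch in syllable])
# ['KANNADA LETTER KA', 'KANNADA VOWEL SIGN O']
```

In Unicode, each matra's name mirrors its vowel letter (KANNADA LETTER O pairs with KANNADA VOWEL SIGN O), which makes a table like this easy to cross-check.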
3.4.2 Yogavahas (Semiconsonants)
The Yōgavāha (part-vowel, part-consonant) include two letters:
1. The anusvara: ಅಂ (aṁ)
2. The visarga: ಅಃ (aḥ)
3.4.3 Vyanjanas (consonants)
Two categories of consonant characters (Vyanjanas) are defined in Kannada: the structured consonants (Vargīya Vyañjana) and the unstructured consonants (Avargīya Vyañjana). The structured consonants are classified according to where the tongue touches the palate of the mouth, which divides them into five structured groups. These consonants are shown here:
            Voiceless   Voiceless aspirate   Voiced    Voiced aspirate   Nasal
Velars      ಕ (ka)      ಖ (kha)              ಗ (ga)    ಘ (gha)           ಙ (ṅa)
Palatals    ಚ (ca)      ಛ (cha)              ಜ (ja)    ಝ (jha)           ಞ (ña)
Retroflex   ಟ (ṭa)      ಠ (ṭha)              ಡ (ḍa)    ಢ (ḍha)           ಣ (ṇa)
Dentals     ತ (ta)      ಥ (tha)              ದ (da)    ಧ (dha)           ನ (na)
Labials     ಪ (pa)      ಫ (pha)              ಬ (ba)    ಭ (bha)           ಮ (ma)

Table 2: Kannada Consonants
(from https://en.wikipedia.org/wiki/Kannada_alphabet)
The unstructured consonants are consonants that do not fall into any of the above structures: ಯ (ya), ರ (ra), ಱ (ṟa) (obsolete), ಲ (la), ವ (va), ಶ (śa), ಷ (ṣa), ಸ (sa), ಹ (ha), ಳ (ḷa), ೞ (ḻa) (obsolete). From this list, the two obsolete characters (ಱ and ೞ) have been removed in the modern varnamale, bringing the total number of characters to 49.
3.4.4 Implicit vowel ಅ (a) in consonants
All consonants (vyanjanas) in Kannada, when written as ಕ (ka), ಖ (kha), ಗ (ga), etc., contain an implicit vowel ಅ (a). The forms ಕ್, ಖ್, ಗ್, etc., show the consonants after removing the implicit vowel ಅ (a). In fact, many grammar books on Kannada list the consonants with the implicit ಅ (a) removed. The Unicode character U+0CCD, the Kannada equivalent of Devanagari's Halant U+094D (called VIRAMA in Unicode), follows a consonant to remove the implicit ಅ (a). The Halant can only follow a consonant and no other character. A vowel sign (matra) following the consonant replaces the implicit vowel with a different vowel.
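The Halant rule can be sketched in a few lines of Python. This is an illustration only: `remove_implicit_a` is a hypothetical helper, and the consonant test is simplified to any single code point from U+0C95 (ಕ) to U+0CB9 (ಹ), a range that also contains a few unassigned or excluded code points.

```python
VIRAMA = "\u0CCD"  # KANNADA SIGN VIRAMA, i.e. the Halant


def is_consonant(ch: str) -> bool:
    # Simplified test: any single code point from U+0C95 (ಕ) to U+0CB9 (ಹ).
    # The range also contains a few unassigned or excluded code points.
    return len(ch) == 1 and "\u0C95" <= ch <= "\u0CB9"


def remove_implicit_a(consonant: str) -> str:
    """Append the Halant so that the consonant loses its implicit ಅ (a)."""
    if not is_consonant(consonant):
        raise ValueError("a Halant can only follow a consonant")
    return consonant + VIRAMA


for c in "\u0C95\u0C96\u0C97":  # ಕ ಖ ಗ
    print(c, "->", remove_implicit_a(c))  # ಕ್, ಖ್, ಗ್
```

Passing an independent vowel such as ಅ raises `ValueError`, mirroring the statement that the Halant can follow only a consonant.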
3.4.5 Conjuncts
Kannada is known to have a large number of conjuncts, which are combinations of consonants and vowel signs (matras). These are also known as syllables. Different types of consonant and vowel sign combinations are possible. They are the following:
• Consonant + Vowel sign, e.g., ಕ (ka, U+0C95) + ◌ೊ (U+0CCA, matra of vowel ಒ) = ಕೊ
• Consonant + Halant + Consonant, e.g., ಕ (ka, U+0C95) + Halant (U+0CCD) + ಕ (ka, U+0C95) = ಕ್ಕ
• Consonant + Halant + Consonant + Vowel sign, e.g., ಕ (ka, U+0C95) + Halant (U+0CCD) + ಕ (ka, U+0C95) + ◌ೊ (U+0CCA, matra of vowel ಒ) = ಕ್ಕೊ
• Consonant + Halant + Consonant + Halant + Consonant, e.g., ಷ (ṣa, U+0CB7) + Halant (U+0CCD) + ಟ (ṭa, U+0C9F) + Halant (U+0CCD) + ರ (ra, U+0CB0) = ಷ್ಟ್ರ
• Consonant + Halant + Consonant + Halant + Consonant + Vowel sign, e.g., ಷ (ṣa, U+0CB7) + Halant (U+0CCD) + ಟ (ṭa, U+0C9F) + Halant (U+0CCD) + ರ (ra, U+0CB0) + ◌ೊ (U+0CCA, matra of vowel ಒ) = ಷ್ಟ್ರೊ
Conjunct clusters having more than three consonants in one syllable are normally not seen in Kannada.
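The patterns above can be captured by one regular expression: up to three consonants joined by Halants, optionally followed by a single vowel sign. This is an illustrative sketch, not the LGR's own rule syntax. The character classes are taken from the consonants and matras of the section 5.2 repertoire; a bare consonant also matches, and anusvara/visarga are deliberately left out because the list above does not cover them.

```python
import re

# Consonants and vowel signs of the section 5.2 repertoire.
C = "[\u0C95-\u0CA8\u0CAA-\u0CB0\u0CB2\u0CB3\u0CB5-\u0CB9]"
M = "[\u0CBE-\u0CC3\u0CC6-\u0CC8\u0CCA-\u0CCC]"
H = "\u0CCD"  # Halant

# One consonant, then at most two more "Halant + consonant" links
# (three consonants in all), then an optional vowel sign.
SYLLABLE = re.compile(f"{C}(?:{H}{C}){{0,2}}{M}?")


def is_listed_pattern(s: str) -> bool:
    """True if `s` is a bare consonant or one of the listed conjunct patterns."""
    return SYLLABLE.fullmatch(s) is not None


print(is_listed_pattern("\u0CB7\u0CCD\u0C9F\u0CCD\u0CB0\u0CCA"))  # ಷ್ಟ್ರೊ: True
```

A four-consonant cluster is rejected, which matches the observation that such clusters are normally not seen in Kannada.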
3.4.6 Pure vowels in the middle of a word
In Kannada, it is common to have words starting with a vowel. Of late, people sometimes write words that have pure vowels in the middle of a word. This kind of writing was originally not seen in Kannada; it has become necessary for writing words imported from other languages, especially from English. Linguistically this is not invalid and hence can be allowed.
3.4.7 Illegal combinations
There are some combinations which are invalid as per Kannada grammar. They are listed below:
3.4.7.1 Having two or more consecutive vowel signs (matras).
3.4.7.2 Having a vowel sign (matra) after a vowel.
3.4.7.3 Having a vowel sign (matra) after a Yōgavāha (anusvara or visarga).
3.4.7.4 Having a Halant after a vowel or vowel sign (matra).
3.4.7.5 Having a Yōgavāha after a Halant.
For 3.4.7.4, there could be cases involving multi-word domains where a vowel (V) may need to be allowed to follow a Halant (H). This is the case where two different words are joined together, the first of which ends with a Halant and the second of which begins with a vowel. Some sections of the linguistic community require the explicit presence of H for full representation of the intended sound. By and large, however, the form of the first word without an H is considered sufficient to represent the sound intended for the first word. This situation arises only because the hyphen, the space and the Zero Width Non-Joiner are not in the permissible set of characters of the Root Zone repertoire; otherwise, V never needs to be allowed to follow an H. Permitting it may create a perceptual similarity between two labels (with and without H) for the majority of the linguistic community; hence this is explicitly prohibited by the NBGP. Depending on the prevailing requirements of the community, a future NBGP may consider revisiting this rule.
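The restrictions of 3.4.7, together with the NBGP prohibition of a vowel after a Halant described above, can be sketched as a table of forbidden adjacent character classes. This Python sketch is illustrative only: the class letters V, C, M, H, B, X anticipate the notation of section 7, the helper names are hypothetical, and the name-based classifier is only valid for code points in the section 5.2 repertoire.

```python
import unicodedata


def char_class(ch: str) -> str:
    """Classify a repertoire code point as V, C, M, H, B or X."""
    name = unicodedata.name(ch, "")
    if name == "KANNADA SIGN ANUSVARA":
        return "B"
    if name == "KANNADA SIGN VISARGA":
        return "X"
    if name == "KANNADA SIGN VIRAMA":
        return "H"
    if name.startswith("KANNADA VOWEL SIGN"):
        return "M"
    if name.startswith("KANNADA LETTER"):
        # Within the repertoire, independent vowels precede KA (U+0C95).
        return "V" if ord(ch) < 0x0C95 else "C"
    raise ValueError(f"U+{ord(ch):04X} is not handled by this sketch")


# (preceding class, following class) -> the restriction it breaks
FORBIDDEN_PAIRS = {
    ("M", "M"): "3.4.7.1: two consecutive vowel signs",
    ("V", "M"): "3.4.7.2: vowel sign after a vowel",
    ("B", "M"): "3.4.7.3: vowel sign after a Yogavaha",
    ("X", "M"): "3.4.7.3: vowel sign after a Yogavaha",
    ("V", "H"): "3.4.7.4: Halant after a vowel",
    ("M", "H"): "3.4.7.4: Halant after a vowel sign",
    ("H", "B"): "3.4.7.5: Yogavaha after a Halant",
    ("H", "X"): "3.4.7.5: Yogavaha after a Halant",
    ("H", "V"): "NBGP: vowel after a Halant",
}


def illegal_combinations(label: str) -> list:
    """Return (index of second character, reason) for each forbidden pair."""
    classes = [char_class(ch) for ch in label]
    return [
        (i, FORBIDDEN_PAIRS[pair])
        for i, pair in enumerate(zip(classes, classes[1:]), start=1)
        if pair in FORBIDDEN_PAIRS
    ]


print(illegal_combinations("\u0C95\u0CCA\u0CCA"))  # ಕ followed by two matras
# [(2, '3.4.7.1: two consecutive vowel signs')]
```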
4. Overall Development Process and Methodology
The Neo-Brahmi Generation Panel (NBGP) has been formed by members having experience in linguistics and computational linguistics. The NBGP covers nine scripts belonging to separate Unicode blocks. Each of these scripts has a separate LGR; however, the NBGP ensures that the fundamental philosophy behind building those LGRs is in sync across all the Brahmi-derived scripts.
The NBGP considered all the languages with EGIDS scale 1 to 4 and found that the Kannada script is used for Kannada, Tulu, Beary, Konkani, Havyaka and Kodava, among other languages.
4.1 Guiding Principles
The NBGP adopts the following broad principles for selecting code points for the code point repertoire, across the board for all the scripts within its ambit.
4.1.1 Inclusion principles:
4.1.1.1 Modern usage:
Every character proposed should be in the everyday usage of a particular linguistic community. Characters that have been encoded in Unicode only for transcription or archival purposes will not be considered for inclusion in the code point repertoire.
4.1.1.2 Unambiguous use:
Every character proposed should have an unambiguous understanding among the linguistic community about its usage in the language.
4.1.2 Exclusion principles:
The main exclusion principle is that of Acknowledgement of Environmental Limitations. These comprise protocols or standards which are prerequisites to the Label Generation Rulesets. All further principles are in fact subsumed under these limitations but have been spelt out separately for the sake of clarity.
4.1.2.1 External limits on Scope:
Because the code point repertoire for the root zone sits at the top of the protocol hierarchy, the set of characters available for selection as part of the Root Zone code point repertoire is already constrained by the protocol layers beneath it. The following three main protocols/standards act as successive filters:
i. The Unicode Chart: If a character needed by the given script is not encoded in Unicode, it cannot be incorporated in the code point repertoire. Such cases are quite rare, given the elaborate and exhaustive character inclusion efforts made by the Unicode Consortium.
ii. IDNA Protocol: Since Unicode is the character encoding standard meant to provide the maximum possible representation of a given script/language, it has encoded, as far as possible, all the characters needed by the script. However, domain names being a specialized case, they are governed by an additional protocol known as IDNA (Internationalized Domain Names in Applications). The IDNA protocol excludes some characters of the Unicode repertoire from being part of domain names.
Example: KANNADA SIGN CANDRABINDU ◌ಁ (U+0C81) is not allowed to be part of a domain name.
iii. Maximal Starting Repertoire: The Root Zone LGR is a repertoire of the characters that will be used to create root zone TLDs, which in turn are an even more specialized case of domain names. The Root Zone LGR procedure therefore introduces additional exclusions on the IDNA-allowed set of characters. Example: KANNADA SIGN AVAGRAHA "ಽ" (U+0CBD), even though allowed by the IDNA protocol, is not permitted in the Root Zone repertoire as per the [MSR]. To sum up, the restrictions start off by admitting only those characters that are part of the code block of the given script/language. This is further narrowed down by the IDNA2008 protocol, and finally an additional filter in the form of the Maximal Starting Repertoire restricts the character set associated with the given language even more.
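The three successive filters can be pictured as set differences over code points. In this minimal sketch, the first stage uses Python's `unicodedata` to keep the assigned code points of the Kannada block (the result depends on the Unicode version bundled with Python). The IDNA2008 and MSR stages use small, hypothetical example sets holding only the two code points named in the text; the real lists come from the IDNA2008 tables and the [MSR].

```python
import unicodedata

# Stage i: assigned code points of the Kannada block (U+0C80..U+0CFF).
# The exact set depends on the Unicode version bundled with Python.
encoded = {cp for cp in range(0x0C80, 0x0D00)
           if unicodedata.category(chr(cp)) != "Cn"}

# Stage ii (hypothetical, incomplete): code points IDNA2008 does not allow.
IDNA_EXCLUDED_EXAMPLE = {0x0C81}  # KANNADA SIGN CANDRABINDU, per the text

# Stage iii (hypothetical, incomplete): allowed by IDNA but not in the MSR.
MSR_EXCLUDED_EXAMPLE = {0x0CBD}   # KANNADA SIGN AVAGRAHA, per the text


def apply_filters(candidates, *exclusion_sets):
    """Return the candidate set as it stands after each successive filter."""
    stages = [set(candidates)]
    for excluded in exclusion_sets:
        stages.append(stages[-1] - excluded)  # a filter can only narrow the set
    return stages


stages = apply_filters(encoded, IDNA_EXCLUDED_EXAMPLE, MSR_EXCLUDED_EXAMPLE)
for label, stage in zip(("Unicode", "IDNA2008", "MSR"), stages):
    print(f"{label:9} {len(stage)} code points")
```

The design point is that each layer only removes code points, so nothing excluded by a lower layer can reappear in the Root Zone repertoire.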
4.1.2.2 No Rare and Obsolete Characters:
Some characters have been added to Unicode to accommodate rare forms, for example KANNADA LETTER VOCALIC L "ಌ" (U+0C8C) and its matra form "◌ೢ" (U+0CE2). None of these characters will be included. This is in consonance with the Conservatism principle as laid down in the Root Zone LGR procedure.
4.2 Methodology to incorporate the feedback received through the Public Comment process
The Kannada script LGR proposal was published for public comment to allow those who had not participated in the NBGP to make their views known. The NBGP analyzed all comments received to finalize the proposal. The analysis of public comments can be accessed online at [111].
5. Repertoire
Section 5.1 provides the section of the [MSR] applicable to the Kannada script, on which the Kannada code point repertoire is based. Section 5.2 details the code point repertoire that the Neo-Brahmi Generation Panel [NBGP] proposes to be included in the Kannada LGR.
5.1 Kannada section of Maximal Starting Repertoire [MSR] Version 4
Figure 2: Kannada Code Page from [MSR]
Color convention¹:
All characters that are included in the [MSR] - Yellow background
PVALID in IDNA2008 but excluded from the [MSR] - Pinkish background
Not PVALID in IDNA2008 - White background
¹ This document needs to be printed in color for this to be read correctly.
5.2 Code point repertoire
Given below is the repertoire for Kannada based on the Unicode character set.
Sr. No.  Unicode Code Point  Glyph  Character Name                  Category       Reference
1        0C82                ◌ಂ     KANNADA SIGN ANUSVARA           Anusvara       [110]
2        0C83                ◌ಃ     KANNADA SIGN VISARGA            Visarga        [110]
3        0C85                ಅ      KANNADA LETTER A                Vowel          [110]
4        0C86                ಆ      KANNADA LETTER AA               Vowel          [110]
5        0C87                ಇ      KANNADA LETTER I                Vowel          [110]
6        0C88                ಈ      KANNADA LETTER II               Vowel          [110]
7        0C89                ಉ      KANNADA LETTER U                Vowel          [110]
8        0C8A                ಊ      KANNADA LETTER UU               Vowel          [110]
9        0C8B                ಋ      KANNADA LETTER VOCALIC R        Vowel          [110]
10       0C8E                ಎ      KANNADA LETTER E                Vowel          [110]
11       0C8F                ಏ      KANNADA LETTER EE               Vowel          [110]
12       0C90                ಐ      KANNADA LETTER AI               Vowel          [110]
13       0C92                ಒ      KANNADA LETTER O                Vowel          [110]
14       0C93                ಓ      KANNADA LETTER OO               Vowel          [110]
15       0C94                ಔ      KANNADA LETTER AU               Vowel          [110]
16       0C95                ಕ      KANNADA LETTER KA               Consonant      [110]
17       0C96                ಖ      KANNADA LETTER KHA              Consonant      [110]
18       0C97                ಗ      KANNADA LETTER GA               Consonant      [110]
19       0C98                ಘ      KANNADA LETTER GHA              Consonant      [110]
20       0C99                ಙ      KANNADA LETTER NGA              Consonant      [110]
21       0C9A                ಚ      KANNADA LETTER CA               Consonant      [110]
22       0C9B                ಛ      KANNADA LETTER CHA              Consonant      [110]
23       0C9C                ಜ      KANNADA LETTER JA               Consonant      [110]
24       0C9D                ಝ      KANNADA LETTER JHA              Consonant      [110]
25       0C9E                ಞ      KANNADA LETTER NYA              Consonant      [110]
26       0C9F                ಟ      KANNADA LETTER TTA              Consonant      [110]
27       0CA0                ಠ      KANNADA LETTER TTHA             Consonant      [110]
28       0CA1                ಡ      KANNADA LETTER DDA              Consonant      [110]
29       0CA2                ಢ      KANNADA LETTER DDHA             Consonant      [110]
30       0CA3                ಣ      KANNADA LETTER NNA              Consonant      [110]
31       0CA4                ತ      KANNADA LETTER TA               Consonant      [110]
32       0CA5                ಥ      KANNADA LETTER THA              Consonant      [110]
33       0CA6                ದ      KANNADA LETTER DA               Consonant      [110]
34       0CA7                ಧ      KANNADA LETTER DHA              Consonant      [110]
35       0CA8                ನ      KANNADA LETTER NA               Consonant      [110]
36       0CAA                ಪ      KANNADA LETTER PA               Consonant      [110]
37       0CAB                ಫ      KANNADA LETTER PHA              Consonant      [110]
38       0CAC                ಬ      KANNADA LETTER BA               Consonant      [110]
39       0CAD                ಭ      KANNADA LETTER BHA              Consonant      [110]
40       0CAE                ಮ      KANNADA LETTER MA               Consonant      [110]
41       0CAF                ಯ      KANNADA LETTER YA               Consonant      [110]
42       0CB0                ರ      KANNADA LETTER RA               Consonant      [110]
43       0CB2                ಲ      KANNADA LETTER LA               Consonant      [110]
44       0CB3                ಳ      KANNADA LETTER LLA              Consonant      [110]
45       0CB5                ವ      KANNADA LETTER VA               Consonant      [110]
46       0CB6                ಶ      KANNADA LETTER SHA              Consonant      [110]
47       0CB7                ಷ      KANNADA LETTER SSA              Consonant      [110]
48       0CB8                ಸ      KANNADA LETTER SA               Consonant      [110]
49       0CB9                ಹ      KANNADA LETTER HA               Consonant      [110]
50       0CBE                ◌ಾ     KANNADA VOWEL SIGN AA           Matra          [110]
51       0CBF                ◌ಿ     KANNADA VOWEL SIGN I            Matra          [110]
52       0CC0                ◌ೀ     KANNADA VOWEL SIGN II           Matra          [110]
53       0CC1                ◌ು     KANNADA VOWEL SIGN U            Matra          [110]
54       0CC2                ◌ೂ     KANNADA VOWEL SIGN UU           Matra          [110]
55       0CC3                ◌ೃ     KANNADA VOWEL SIGN VOCALIC R    Matra          [110]
56       0CC6                ◌ೆ     KANNADA VOWEL SIGN E            Matra          [110]
57       0CC7                ◌ೇ     KANNADA VOWEL SIGN EE           Matra          [110]
58       0CC8                ◌ೈ     KANNADA VOWEL SIGN AI           Matra          [110]
59       0CCA                ◌ೊ     KANNADA VOWEL SIGN O            Matra          [110]
60       0CCB                ◌ೋ     KANNADA VOWEL SIGN OO           Matra          [110]
61       0CCC                ◌ೌ     KANNADA VOWEL SIGN AU           Matra          [110]
62       0CCD                ◌್     KANNADA SIGN VIRAMA             Halant/Virama  [110]

Table 3: Code point repertoire
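As a cross-check of Table 3, the repertoire can be written as a handful of code point ranges and compared against the Unicode character names. The sketch below also emits `<char>` elements in the general style of the RFC 7940 LGR XML format; it is illustrative only, and the accompanying XML file remains the formal specification. The range grouping and the helper name `to_lgr_char_elements` are assumptions of this sketch.

```python
import unicodedata

# Table 3 as inclusive code point ranges.
REPERTOIRE_RANGES = [
    (0x0C82, 0x0C83),                                      # anusvara, visarga
    (0x0C85, 0x0C8B), (0x0C8E, 0x0C90), (0x0C92, 0x0C94),  # vowels
    (0x0C95, 0x0CA8), (0x0CAA, 0x0CB0),                    # consonants
    (0x0CB2, 0x0CB3), (0x0CB5, 0x0CB9),
    (0x0CBE, 0x0CC3), (0x0CC6, 0x0CC8), (0x0CCA, 0x0CCC),  # matras
    (0x0CCD, 0x0CCD),                                      # virama (Halant)
]
REPERTOIRE = {cp for lo, hi in REPERTOIRE_RANGES for cp in range(lo, hi + 1)}

# Table 4: code points deliberately left out.
EXCLUDED = {0x0C8C, 0x0CB1, 0x0CBC, 0x0CC4, 0x0CD5, 0x0CD6}


def to_lgr_char_elements(code_points):
    """Render code points as RFC 7940-style <char> elements (illustrative)."""
    return [f'<char cp="{cp:04X}" />' for cp in sorted(code_points)]


assert len(REPERTOIRE) == 62                # rows 1-62 of Table 3
assert REPERTOIRE.isdisjoint(EXCLUDED)      # Tables 3 and 4 do not overlap
assert all(unicodedata.name(chr(cp)).startswith("KANNADA ") for cp in REPERTOIRE)
print("\n".join(to_lgr_char_elements(REPERTOIRE)[:3]))
# <char cp="0C82" />
# <char cp="0C83" />
# <char cp="0C85" />
```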
5.3 Code points not included
The following code points have not been included in the repertoire.

Sr. No.  Unicode Code Point  Glyph  Character Name                   Reason for Exclusion
1.       0C8C                ಌ      KANNADA LETTER VOCALIC L         Not used in Kannada
2.       0CB1                ಱ      KANNADA LETTER RRA               Obsolete character, not used in modern Kannada
3.       0CBC                ◌಼     KANNADA SIGN NUKTA               Does not belong to Kannada, not needed in LGR
4.       0CC4                ◌ೄ     KANNADA VOWEL SIGN VOCALIC RR    Not used in Kannada
5.       0CD5                ◌ೕ     KANNADA LENGTH MARK              Not in use
6.       0CD6                ◌ೖ     KANNADA AI LENGTH MARK           Not in use

Table 4: Code points not included
6. Variants
6.1 In-script variants
There are no variants within the Kannada script when the formation of a label is governed by the Whole Label Evaluation rules in section 7.
6.2 Cross-script variants analysis
Some characters of Kannada look the same as some characters in Devanagari, Gujarati, Telugu, Malayalam and Sinhala. The tables below list them.
6.2.1 Cross-script variant analysis for Kannada and Telugu
The Telugu and Kannada code point sets in Table 5 are cross-script variant code points.
Variant Set   Telugu Code Point   Kannada Code Point
1 ◌ం (0C02) ◌ಂ (0C82)
2 ◌ః (0C03) ◌ಃ (0C83)
3 అ (0C05) ಅ (0C85)
4 ఆ (0C06) ಆ (0C86)
5 ఇ (0C07) ಇ (0C87)
6 ఈ (0C08) ಈ (0C88)
7 ఐ (0C10) ಐ (0C90)
8 ఒ (0C12) ಒ (0C92)
9 ఓ (0C13) ಓ (0C93)
10 ఔ (0C14) ಔ (0C94)
11 ఖ (0C16) ಖ (0C96)
12 గ (0C17) ಗ (0C97)
13 జ (0C1C) ಜ (0C9C)
14 ఝ (0C1D) ಝ (0C9D)
15 ఞ (0C1E) ಞ (0C9E)
16 ట (0C1F) ಟ (0C9F)
17 ఠ (0C20) ಠ (0CA0)
18 డ (0C21) ಡ (0CA1)
19 ఢ (0C22) ಢ (0CA2)
20 ణ (0C23) ಣ (0CA3)
21 థ (0C25) ಥ (0CA5)
22 ద (0C26) ದ (0CA6)
23 ధ (0C27) ಧ (0CA7)
24 న (0C28) ನ (0CA8)
25 బ (0C2C) ಬ (0CAC)
26 భ (0C2D) ಭ (0CAD)
27 మ (0C2E) ಮ (0CAE)
28 య (0C2F) ಯ (0CAF)
29 ర (0C30) ರ (0CB0)
30 ల (0C32) ಲ (0CB2)
31 ళ (0C33) ಳ (0CB3)
32 ◌ి (0C3F) ◌ಿ (0CBF)
33 ◌ు (0C41) ◌ು (0CC1)
34 ◌ృ (0C43) ◌ೃ (0CC3)
Table 5: Telugu and Kannada code point analysis

6.2.2 Cross-script variant analysis for Kannada and Devanagari
Visarga is the only potential variant code point that exhibits shape similarity between the Kannada and Devanagari scripts. However, as it is a combining mark and there are no other variant code points between the two scripts, it is not defined as a variant code point.
Devanagari Code Point   Kannada Code Point
◌ः (0903)               ◌ಃ (0C83)
Table 6: Devanagari and Kannada code point analysis
6.2.3 Cross-script variant analysis for Kannada and Gujarati
Visarga is the only potential variant code point that exhibits shape similarity between the Kannada and Gujarati scripts. However, as it is a combining mark and there are no other variant code points between the two scripts, it is not defined as a variant code point.
Gujarati Code Point   Kannada Code Point
◌ઃ (0A83)             ◌ಃ (0C83)
Table 7: Gujarati and Kannada code point analysis
6.2.4 Cross-script variant analysis for Kannada and Malayalam
Anusvara and Visarga are the only potential variant code points that exhibit shape similarity between the Kannada and Malayalam scripts. However, as they are combining marks and there are no other variant code points between the two scripts, they are not defined as variant code points.
Kannada Code Point   Malayalam Code Point
◌ಂ (0C82)            ◌ം (0D02)
◌ಃ (0C83)            ◌ഃ (0D03)
Table 8: Kannada and Malayalam code point analysis
6.2.5 Cross-script variant analysis for Kannada and Sinhala
Initially, the NBGP considered three variant sets between the Kannada script and the Sinhala script, as shown in Table 9.
Variant Set   Kannada Code Point   Sinhala Code Point
1             ◌ಂ (0C82)            ◌ං (0D82)
2             ◌ಃ (0C83)            ◌ඃ (0D83)
3             ರ (0CB0)             ර (0DBB)
Table 9: Kannada and Sinhala candidate variant sets
The Sinhala Generation Panel (Sinhala GP) disagreed that ರ (0CB0) and ර (0DBB) are variant code points. The Sinhala GP also disagreed that ර (0DBB) and the corresponding Telugu code point ర (0C30) are variant code points. As a follow-up, the Sinhala GP and the NBGP met and discussed these three code points.
Kannada Code Point   Telugu Code Point   Sinhala Code Point
ರ (0CB0)             ర (0C30)            ර (0DBB)
Table 10: Kannada, Telugu, and Sinhala candidate variant set
The two GPs agreed that the variant relationship does not exist. Based on the consensus of the two GPs, this set of characters has been removed from the variant sets. Therefore, Anusvara and Visarga are the only potential variant code points that exhibit shape similarity between the Kannada and Sinhala scripts. However, as they are combining marks and there are no other variant code points between the two scripts, they are not defined as variant code points.
Variant Set   Kannada Code Point   Sinhala Code Point
1             ◌ಂ (0C82)            ◌ං (0D82)
2             ◌ಃ (0C83)            ◌ඃ (0D83)
Table 11: Kannada and Sinhala code point analysis
7. Whole Label Evaluation Rules (WLE)
The structure of written Kannada is described in section 3.4. The rules below use the following variables:
V → Vowel
M → Matra
C → Consonant
H → Halant/Virama
B → Anusvara
X → Visarga
The rules for forming an akshar, or a cluster of one unit, are given below:
1. Rule 1: H must be preceded by C
2. Rule 2: M must be preceded by C
3. Rule 3: B must be preceded by C, V or M
4. Rule 4: X must be preceded by C, V or M
5. Rule 5: V cannot be preceded by H
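The five WLE rules only constrain what may immediately precede a given character, so they can be checked in a single left-to-right pass. The Python sketch below is a hypothetical illustration, not the XML rule syntax of the LGR. It treats the start of the label as having no preceding character (so, under this reading, a label cannot start with H, M, B or X), and its range-based classifier assumes the label has already been checked against the Table 3 repertoire.

```python
def char_class(ch: str) -> str:
    """Classify a code point as V, C, M, H, B or X (section 7 notation).

    Assumes the label was already checked against the Table 3 repertoire;
    the ranges below also cover a few code points outside it.
    """
    cp = ord(ch)
    if cp == 0x0C82:
        return "B"  # anusvara
    if cp == 0x0C83:
        return "X"  # visarga
    if cp == 0x0CCD:
        return "H"  # Halant/Virama
    if 0x0C85 <= cp <= 0x0C94:
        return "V"
    if 0x0C95 <= cp <= 0x0CB9:
        return "C"
    if 0x0CBE <= cp <= 0x0CCC:
        return "M"
    raise ValueError(f"U+{cp:04X} is not handled by this sketch")


RULES = {  # class -> (rule name, classes allowed immediately before it)
    "H": ("Rule 1", {"C"}),
    "M": ("Rule 2", {"C"}),
    "B": ("Rule 3", {"C", "V", "M"}),
    "X": ("Rule 4", {"C", "V", "M"}),
}


def wle_violations(label: str) -> list:
    """Return (index, rule name) for every WLE rule the label breaks."""
    classes = [char_class(ch) for ch in label]
    errors = []
    for i, cls in enumerate(classes):
        prev = classes[i - 1] if i > 0 else None  # None: start of the label
        if cls in RULES:
            rule, allowed = RULES[cls]
            if prev not in allowed:
                errors.append((i, rule))
        elif cls == "V" and prev == "H":
            errors.append((i, "Rule 5"))
    return errors


print(wle_violations("\u0C95\u0CA8\u0CCD\u0CA8\u0CA1"))  # ಕನ್ನಡ: []
```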
8. Contributors
1. Dr. U. B. Pavanaja, [email protected]., [email protected], [email protected]
9. References
[MSR] Integration Panel, "Maximal Starting Repertoire - MSR-4 Overview and Rationale", 7 February 2019, https://www.icann.org/en/system/files/files/msr-4-overview-25jan19-en.pdf (Accessed on 18 February 2019)
[NBGP] Neo-Brahmi Generation Panel
[101] A Comparative Grammar of the Dravidian or South Indian Family of Languages by Robert Caldwell, Trubner & Co, London, Second Edition, 1875
[102] Evolution of Kannada script - https://karnatakaitihasaacademy.org/karnataka-history/evolution-of-kannada-script/ (Accessed on 18 February 2019)
[103] History of the Kannada Script and Language - https://bookstalkist.com/history-of-the-kannada-script-and-language/ (Accessed on 18 February 2019)
[104] History of the Kannada Literature - http://kamat.com/kalranga/kar/literature/history1.htm (Accessed on 18 February 2019)
[105] Kannada alphabet - https://en.wikipedia.org/wiki/Kannada_alphabet (Accessed on 18 February 2019)
[106] About Kannada language - https://en.wikipedia.org/wiki/Kannada (Accessed on 18 February 2019)
[107] Ethnologue entry about Kannada - http://www.ethnologue.com/19/language/kan/ (Accessed on 18 February 2019)
[108] OLAC resources in and about the Kannada language - http://www.language-archives.org/language/kan (Accessed on 18 February 2019)
[109] Encyclopaedia Britannica entry about Kannada - https://www.britannica.com/topic/Kannada-language (Accessed on 18 February 2019)
[110] ಕನ್ನಡ ಮಧ್ಯಮ ವ್ಯಾಕರಣ, ತೀ.ನಂ. ಶ್ರೀಕಂಠಯ್ಯ, ಗೀತಾ ಬುಕ್ ಹೌಸ್, ಮೈಸೂರು, ೨೦೦೧ (Kannada Madhyama Vyakarana (An Intermediate Kannada Grammar), T. N. Sreekantaiya, Geetha Book House, Mysore, 2001.)
[111] Public comment feedback for Kannada, Telugu, Oriya Script LGR Proposals, https://docs.google.com/document/d/1m9MbBfNBQZAFc9SOYpt0lgeeyM3N-DsUP173J4Vb948 (Accessed on 18 February 2019)
Appendix I: Confusable Code Points Analysis
A-1. Kannada and Telugu
Some code points of the Kannada and Telugu scripts were discussed, and the NBGP concluded that these are not confusable code points. The table below lists all discussed code points and their resolution.
Ser. No.  Kannada Character  Unicode Code Point  Telugu Character  Unicode Code Point  NBGP Resolution
1         ಎ                  0C8E                ఎ                 0C0E                Distinguishable
2         ಘ                  0C98                ఘ                 0C18                Distinguishable
3         ಙ                  0C99                ఙ                 0C19                Distinguishable
4         ಚ                  0C9A                చ                 0C1A                Distinguishable
5         ಛ                  0C9B                ఛ                 0C1B                Distinguishable
6         ಪ                  0CAA                ప                 0C2A                Distinguishable
7         ಫ                  0CAB                ఫ                 0C2B                Distinguishable
8         ವ                  0CB5                వ                 0C35                Similar
9         ಶ                  0CB6                శ                 0C36                Similar
10        ಷ                  0CB7                ష                 0C37                Distinguishable
11        ಸ                  0CB8                స                 0C38                Similar
12        ◌ೌ                 0CCC                ◌ౌ                0C4C                Distinguishable
Table A-1: NBGP resolution of Telugu and Kannada code points
A-2. Kannada and Devanagari
The following table defines Kannada and Devanagari code points which are confusable.
Devanagari Code Point   Kannada Code Point
◌ः (0903)               ◌ಃ (0C83)
Table A-2: Devanagari and Kannada confusable code points

A-3. Kannada and Gujarati
The following table defines Kannada and Gujarati code points which are confusable.
Gujarati Code Point   Kannada Code Point
◌ઃ (0A83)             ◌ಃ (0C83)
Table A-3: Gujarati and Kannada confusable code points

A-4. Kannada and Malayalam
The following table defines Kannada and Malayalam code points which are confusable.
Kannada Code Point   Malayalam Code Point
◌ಂ (0C82)            ◌ം (0D02)
◌ಃ (0C83)            ◌ഃ (0D03)
Table A-4: Kannada and Malayalam confusable code points

A-5. Kannada and Sinhala
The following table defines Kannada and Sinhala code points which are confusable.
Kannada Code Point   Sinhala Code Point
◌ಂ (0C82)            ◌ං (0D82)
◌ಃ (0C83)            ◌ඃ (0D83)
Table A-5: Kannada and Sinhala confusable code points
Appendix II: Zero Width Joiner and Zero Width Non Joiner in Kannada domain names
Zero Width Joiner (ZWJ) and Zero Width Non Joiner (ZWNJ) have special roles in Kannada. ZWNJ is used in sequences of the kind Consonant (C) + Halant (U+0CCD) + Consonant, where the second C should not take the vattu form after (below) the first consonant. Example:
1. ಜ (U+0C9C) + ◌್ (U+0CCD) + ಕ (U+0C95) = ಜ್ಕ – without using ZWNJ
2. ಜ (U+0C9C) + ◌್ (U+0CCD) + ZWNJ (U+200C) + ಕ (U+0C95) = ಜ್‌ಕ – using ZWNJ
Example of the word not using ZWNJ – ರಾಜ್ಕುಮಾರ್ (Rajkumar)
Example of the word using ZWNJ – ರಾಜ್‌ಕುಮಾರ್ (Rajkumar)
Both words mean the same, but their pronunciations are slightly different. The second form does not originally belong to Kannada, but nowadays, due to the influence of English and Hindi, some people use it. This form is also needed to write many English words in Kannada, such as software (ಸಾಫ್ಟ್‌ವೇರ್, using ZWNJ). The word software becomes ಸಾಫ್ಟ್ವೇರ್ if ZWNJ is not used.
ZWJ has a unique purpose in Kannada for conjuncts whose first consonant is ರ (Ra, U+0CB0). When a conjunct is formed with Ra + Halant + <consonant>, the default form is the <consonant> followed by an arkavattu, which is the Kannada equivalent of the reph of Devanagari. Example: ರ (U+0CB0) + ◌್ (U+0CCD) + ಕ (U+0C95) = ರ್ಕ
Original Kannada did not have this arkavattu or reph. ರ + Halant + C2 behaved like any other C1 + Halant + C2, where C2 takes the vattu form, even when C1 = ರ (Ra). To get the vattu form for ರ + Halant + <consonant>, ZWJ is used. Example:
ರ (U+0CB0) + ◌್ (U+0CCD) + ZWJ (U+200D) + ಕ (U+0C95) = ರ್‍ಕ – Microsoft's implementation
ರ (U+0CB0) + ZWJ (U+200D) + ◌್ (U+0CCD) + ಕ (U+0C95) = ರ‍್ಕ – as per Unicode's definition. The Firefox and Chrome browsers use this rendering rule.
The original Kannada form (not using arkavattu or reph) is needed when the first letter of the word itself is ರ (Ra, U+0CB0) followed by a Halant and a consonant. Example: ರ (U+0CB0) + Halant (U+0CCD) + ಯ (U+0CAF) = ರ್ಯ, as in ರ್ಯಾಲಿ (rally written in Kannada). This is wrong as per the writing style followed in Kannada: here the second consonant ಯ (Ya, U+0CAF) must take the vattu form, and to get that, one must use ZWJ as explained above. The correct form of rally written in Kannada is ರ‍್ಯಾಲಿ, making use of ZWJ. One such example is the company Rallis India. When it registers a domain name in Kannada, if ZWJ is not allowed in domain names, it will have to use ರ್ಯಾಲೀಸ್.ಭಾರತ, which is the wrong form in Kannada. It should be written as ರ‍್ಯಾಲೀಸ್.ಭಾರತ by making use of ZWJ. Thus both ZWJ and ZWNJ are needed for proper domain names in Kannada.
How to avoid duplicate domain names involving ZWJ and ZWNJ?
ZWJ and ZWNJ are used mainly to produce two display forms of what is linguistically the same word or combination of characters in Kannada. If ZWJ and ZWNJ were allowed in domain names for Kannada, they would create two domain names that have two display forms but are linguistically the same. To make browsers and the DNS treat them as equal, ZWJ and ZWNJ would have to be ignored when comparing two words. This methodology is followed by the spell checker used in Microsoft Word, and the same philosophy could be applied here. However, accepting ZWJ and ZWNJ in domain names creates confusion for the majority of the linguistic community, and joiner characters are prohibited for the Root Zone; hence they are explicitly prohibited by the NBGP.
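The comparison approach described above amounts to removing the joiners before comparing two labels. A minimal sketch, assuming a plain string comparison after deleting U+200C (ZWNJ) and U+200D (ZWJ); `comparison_key`, `same_for_comparison` and `allowed_in_root_zone` are hypothetical names, and the last function simply encodes the NBGP decision that joiners are not permitted in the Root Zone.

```python
ZWNJ, ZWJ = "\u200C", "\u200D"


def comparison_key(label: str) -> str:
    """Drop ZWNJ and ZWJ so that display-form variants compare as equal."""
    return label.replace(ZWNJ, "").replace(ZWJ, "")


def same_for_comparison(a: str, b: str) -> bool:
    return comparison_key(a) == comparison_key(b)


def allowed_in_root_zone(label: str) -> bool:
    """Per the NBGP decision, any label containing a joiner is rejected."""
    return ZWNJ not in label and ZWJ not in label


# ರಾಜ್ಕುಮಾರ್ (Rajkumar) written without and with ZWNJ after the Halant.
plain = "\u0CB0\u0CBE\u0C9C\u0CCD\u0C95\u0CC1\u0CAE\u0CBE\u0CB0\u0CCD"
with_zwnj = "\u0CB0\u0CBE\u0C9C\u0CCD" + ZWNJ + "\u0C95\u0CC1\u0CAE\u0CBE\u0CB0\u0CCD"
print(same_for_comparison(plain, with_zwnj))  # True
print(allowed_in_root_zone(with_zwnj))        # False
```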
Appendix III: NBGP Cross-script Variant Inclusion Policy
If, in any two given scripts, all the potential cross-script variants consist ONLY of dependent characters (e.g. Vowel Signs, Anusvara, Visarga, Chandrabindu, etc.), then that entire set can be ignored and no cross-script variants need be proposed between those two scripts.
If, in any two given scripts, there is AT LEAST ONE non-dependent (e.g. Consonant, Vowel, etc.) cross-script variant character/sequence present, all the potential cross-script variants should be considered and proposed between the two scripts.
This cross-script analysis has been restricted to the scripts that have descended from Brahmi, as most of them share similar usage patterns. By and large, all of these scripts have a common set of characters that existed in the Brahmi script and bear the same identities. However, as the scripts branched out from Brahmi, the shapes of the characters changed depending on various factors. This change in shape was not uniform across all the characters and scripts: some characters' shapes changed significantly, whereas others retained their similarity. The cross-script similarity analysis also aims to identify cases where the same character retained almost the same shape despite being part of different scripts. These sets of characters are variants of each other in a truer sense than mere coincidental visual similarity. Since such labels are a realistic possibility and the corresponding labels look almost exactly alike, the NBGP has proposed them as blocked variants.
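The two policy statements above reduce to a single test per script pair: propose the candidate variants only if at least one of them is a non-dependent character. A minimal Python sketch with a hypothetical `CandidateVariant` record; the sample data restates a few pairs from Tables 5 and 6, and the outcomes match those reached in section 6.2 (variants proposed for Telugu, none for Devanagari).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateVariant:
    kannada: str     # Kannada code point, e.g. "0C83"
    other: str       # code point in the other script
    dependent: bool  # True for vowel signs, anusvara, visarga, chandrabindu, ...


def propose_variants(candidates):
    """Apply the Appendix III policy to the candidates for one script pair."""
    if any(not c.dependent for c in candidates):
        return list(candidates)  # at least one non-dependent: propose them all
    return []                    # dependent characters only: propose none


telugu = [
    CandidateVariant("0C82", "0C02", True),   # anusvara
    CandidateVariant("0C85", "0C05", False),  # ಅ / అ
    CandidateVariant("0CC1", "0C41", True),   # vowel sign U
]
devanagari = [CandidateVariant("0C83", "0903", True)]  # visarga only

print(len(propose_variants(telugu)))   # 3
print(propose_variants(devanagari))    # []
```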
The NBGP acknowledges the concern that this shape is quite generic and may have parallels in other scripts not under its ambit. However, as the NBGP does not have any exposure to the actual usage of those characters in those particular scripts, it desisted from including them in the analysis. As the NBGP has already considered all the related scripts in the cross-script variant analysis, any similarity between characters of NBGP scripts and characters of scripts outside the NBGP ambit may be merely coincidental.
Additionally, this concern is not limited to these two characters but applies to all the characters in all the scripts within the scope of the Root Zone LGR procedure. In practice, this analysis can only be carried out with the Generation Panels that exist while the NBGP is active, which leaves out of scope those scripts that may not yet have a Generation Panel. Hence, carrying out this exercise in its entirety is quite impracticable. This conundrum can be resolved if all such cases are handled by the "String Similarity Assessment Panel" of ICANN.