
Undeciphered Scripts in the Unicode Age Challenges for encoding early writing systems of the Near East Internationalization and Unicode Conference #42 September 2018 Dr. Anshuman Pandey Dr. Deborah (Debbie) Anderson Script Encoding Initiative, UC Berkeley


Undeciphered Scripts in the Unicode Age

Challenges for encoding early writing systems of the Near East

Internationalization and Unicode Conference #42 • September 2018

Dr. Anshuman Pandey, Dr. Deborah (Debbie) Anderson

Script Encoding Initiative, UC Berkeley

WELL-KNOWN UNDECIPHERED SCRIPTS (OUTSIDE THE NEAR EAST)

INDUS VALLEY

RONGORONGO

MESO-AMERICAN SCRIPTS

UNDECIPHERED (OR PARTLY DECIPHERED) N.E. SCRIPTS NOT IN UNICODE

1. Proto-Cuneiform
2. Proto-Elamite
3. Linear Elamite
4. Cretan Hieroglyphs
5. Byblos
6. Proto-Sinaitic
7. Cypro-Minoan

Sample images: Byblos, Proto-Elamite, Proto-Cuneiform, Proto-Sinaitic, Cypro-Minoan, Cretan Hieroglyphs, Linear Elamite


TIMELINE OF UNDECIPHERED/PARTLY DECIPHERED SCRIPTS OF THE NEAR EAST (3200-1500 BC)

Timeline chart (4000 BC to 1000 BC): Proto-Cuneiform, Proto-Elamite, Linear Elamite, Cypro-Minoan, Byblos, Proto-Sinaitic / Proto-Canaanite, Cretan Hieroglyphs


KEYS TO DECIPHERMENT

• Is the corpus large enough? Can the number of signs be determined?

EXAMPLE: LINEAR B: over 5,000 tablets with inscriptions; 85-90 distinct syllabic signs


• Are there known relationships to other scripts?

EXAMPLE: LINEAR B: shown to be related to the Cypriot Syllabary, which had been deciphered earlier

Cypriot Syllabary


• Is the underlying language (or language family) known?

EXAMPLE: LINEAR B: some nouns varied by only one sign, suggesting that the language was inflected (a clue that led to identifying it as an Indo-European language, Greek)


• Is a bilingual text available? (Or is there another way to confirm a reading?)

Letter from Michael Ventris to E. Bennett, May 1953
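The first key above, whether the corpus is large enough to fix the number of signs, is often paired with a rule of thumb: the size of the sign inventory hints at the type of script. The sketch below counts tokens and distinct signs in a transliterated corpus and maps the count to a likely typology. The thresholds (40 and 150) and the sign labels are illustrative assumptions, not values from the talk; they are simply chosen so that the figures cited in this talk (Proto-Sinaitic ~20 signs, Linear B 85-90, Byblos ~90-120, Proto-Elamite 300-400) fall into the expected bands.

```python
from collections import Counter

# Illustrative thresholds (an assumption, not a fixed rule): alphabets and
# abjads need a few dozen signs, syllabaries roughly 50-150, and
# logo-syllabic or mixed systems several hundred.
TYPOLOGY_BANDS = [
    (40, "alphabetic (or abjad)"),
    (150, "syllabic"),
]


def sign_inventory(inscriptions: list[list[str]]) -> tuple[int, int]:
    """Return (token count, distinct sign count) for a transliterated corpus.

    Each inscription is a list of sign labels (hypothetical IDs here).
    """
    counts = Counter(sign for text in inscriptions for sign in text)
    return sum(counts.values()), len(counts)


def guess_typology(distinct_signs: int) -> str:
    """Map an inventory size onto a rough script type."""
    for upper_bound, label in TYPOLOGY_BANDS:
        if distinct_signs <= upper_bound:
            return label
    return "logo-syllabic (or mixed)"


corpus = [["A", "B", "C"], ["A", "D"], ["B", "C", "E"]]
tokens, distinct = sign_inventory(corpus)
print(tokens, distinct)      # 8 5
print(guess_typology(87))    # syllabic
```

A small corpus tends to undercount the inventory, because rare signs may simply not be attested yet; that is why corpus size comes first in the list.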

COMMENTS ON N.E. UNDECIPHERED SCRIPTS

1. The scripts in this talk reflect a spectrum, ranging from partially deciphered to completely undeciphered.

2. They may attract wide-ranging, unusual theories.

Proto-Cuneiform

Linear Elamite

ENCODING PROCESS

PROPOSAL IS WRITTEN

REVIEWED BY UNICODE SCRIPT AD HOC

REVIEWED BY UNICODE TECHNICAL COMMITTEE AND APPROVED

REVIEWED BY ISO/IEC JTC 1/SC 2 AND ITS WORKING GROUP 2 (WG2) AND PUT ON ISO BALLOT


FACTORS TO CONSIDER RE: UNDECIPHERED SCRIPTS (FROM THE SCRIPT AD HOC)

• Does the script have a stable list of the characters that scholars refer to?

From CHIC = Corpus Hieroglyphicarum Inscriptionum Cretae (Godart and Olivier)


• How much material in the script exists today?

Proto-Sinaitic stele


• What is the state of decipherment?

• Is the underlying language known?

Alice Kober’s files for Linear B


• Can a strong case be made to encode the script? Is text in the script being interchanged?

CYPRO-MINOAN


• Found on ca. 250 objects

• Current proposal is stalled:

• Some characters in the proposal are not regarded by scholars today as valid

• Apparent duplicates in the repertoire (from the Enkomi tablet)

Enkomi tablet ENKO Atab 001
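The two problems that stalled the Cypro-Minoan proposal, characters scholars no longer accept and apparent duplicates, are the kind of thing a repertoire check can surface mechanically. The sketch below compares a proposed repertoire against a current scholarly sign list. All character names and sign IDs are hypothetical placeholders, and the check assumes each proposed character has already been matched to a sign ID.

```python
from collections import defaultdict


def check_repertoire(
    proposed: dict[str, str], accepted_signs: set[str]
) -> tuple[list[str], dict[str, list[str]]]:
    """Flag proposed characters that are invalid or that duplicate each other.

    proposed maps a proposed character name to the scholarly sign ID it is
    meant to represent; accepted_signs is the current scholarly sign list.
    """
    # Characters whose sign is not in the current scholarly list.
    invalid = sorted(
        name for name, sign in proposed.items() if sign not in accepted_signs
    )
    # Group proposed characters by the sign they represent.
    by_sign: dict[str, list[str]] = defaultdict(list)
    for name, sign in proposed.items():
        by_sign[sign].append(name)
    duplicates = {
        sign: sorted(names) for sign, names in by_sign.items() if len(names) > 1
    }
    return invalid, duplicates


# Hypothetical data: SIGN-2 and SIGN-3 both encode S02; SIGN-4 encodes a
# sign that is not on the accepted list.
proposed = {"SIGN-1": "S01", "SIGN-2": "S02", "SIGN-3": "S02", "SIGN-4": "S99"}
invalid, duplicates = check_repertoire(proposed, {"S01", "S02", "S03"})
print(invalid)      # ['SIGN-4']
print(duplicates)   # {'S02': ['SIGN-2', 'SIGN-3']}
```

The hard part in practice is the matching step itself: deciding whether two similar shapes on the Enkomi tablet are one sign or two is a scholarly judgment that the code cannot make.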

PROTO-ELAMITE


• Found on over 1,600 tablets, most from Susa in southwestern Iran

• A short-lived writing system (ca. 3100-2900 BC)


• Closed set of characters (300-400)

• Numerical system similar to that of Proto-Cuneiform

• New texts are expected to become available from Tehran soon

BYBLOS SYLLABARY


• Location: Byblos (modern Lebanon) and other parts of the Mediterranean

• 18th-15th c. BCE

• 10 extant records

• Origins unknown

• Possibly derived from Egyptian hieratic script?

• Syllabic script


• Structure

• signs represent Semitic CV syllables

• Directionality

• believed to be left to right


• Repertoire

• ~1,050 sign tokens in the corpus

• ~90 to ~120 distinctive signs

• No number signs

• No punctuation


• Decipherment status

• No consensus on repertoire

• Variants vs. distinctive signs?


• Unicode status

• Allocated to SMP roadmap

• No proposal

• Challenges for encoding

• Open repertoire

• Character-glyph distinctions

• Unconfirmed sign values
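Because there is no consensus on what counts as a variant, the size of the Byblos repertoire depends on how glyph shapes are grouped, which is one reason the estimate ranges from ~90 to ~120 signs. The sketch below counts distinct signs in a list of glyph tokens under two hypothetical grouping schemes; the glyph labels and groupings are invented for illustration.

```python
def distinct_signs(glyph_tokens: list[str], variant_of: dict[str, str]) -> int:
    """Count distinct signs after folding each glyph variant into its sign.

    variant_of maps a glyph label to the canonical sign it is judged to be a
    variant of; glyphs not listed are treated as distinctive signs.
    """
    return len({variant_of.get(glyph, glyph) for glyph in glyph_tokens})


# Hypothetical tokens transcribed glyph by glyph.
tokens = ["g1", "g2", "g3", "g4", "g1", "g5"]

# "Splitter" view: every glyph shape is a separate sign.
splitter: dict[str, str] = {}
# "Lumper" view: g2 is a variant of g1, and g4 a variant of g3.
lumper = {"g2": "g1", "g4": "g3"}

print(distinct_signs(tokens, splitter))  # 5
print(distinct_signs(tokens, lumper))    # 3
```

An encoding has to commit to one of these views, since each distinctive sign becomes a code point while each variant becomes a font matter; this is the character-glyph distinction listed among the challenges.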

PROTO-SINAITIC


• Also known as ‘Early Alphabetic’

• ~18th-17th c. BCE

• Inspired by Egyptian Hieroglyphs

• Believed to be the first alphabetic script


• ~50 inscriptions

• Serabit el-Khadim (Sinai), 17th c. BCE

• Wadi el-Hol (Qena, Egypt)

... lbʿlt (‘...to the Lady’), Gardiner 1916


• Ancestor

• Egyptian Hieroglyphs

• Descendants

• Proto-Canaanite and, in turn, Phoenician

• ultimately, all organically evolved alphabets, abjads, and abugidas


• Repertoire

• Closed set of characters

• ~20 base signs

• some variants

• No number signs

• No punctuation


• Structure

• Directionality:

• Horizontal

• left to right

• right to left

• Vertical: top to bottom

• glyphs may be rotated

• Non-joining, non-cursive

Wadi el-Hol inscriptions (a) and (b)
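In Unicode terms, varying direction is usually treated as a layout property rather than a property of the characters: text is stored once in reading (logical) order and laid out according to a direction. The sketch below illustrates that separation with a hypothetical `display_lines` helper, using Latin transliterations as stand-ins for signs. It does not mirror or rotate glyphs, which would be left to the font or renderer.

```python
from enum import Enum


class Direction(Enum):
    LTR = "left-to-right"
    RTL = "right-to-left"
    TTB = "top-to-bottom"


def display_lines(signs: list[str], direction: Direction) -> list[str]:
    """Lay out signs stored in logical (reading) order for display.

    Returns one string per displayed line, with signs separated by spaces.
    """
    if direction is Direction.LTR:
        return [" ".join(signs)]
    if direction is Direction.RTL:
        # The first sign read ends up at the right edge.
        return [" ".join(reversed(signs))]
    # Vertical: one sign per line, read from top to bottom.
    return list(signs)


# Reading order of the Gardiner example lbʿlt, stored once.
text = ["l", "b", "ʿ", "l", "t"]
for d in Direction:
    print(d.value, display_lines(text, d))
```

The stored sequence never changes; only the layout step depends on the direction of a given inscription.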


• Users / Inventors

• Two hypotheses:

• ‘Illiterate’ miners

• Literate foreman

Serabit el-Khadim, Sinai


• Origin

• Hieroglyphs selected because the shape of the sign resembled a familiar object

• No apparent semantic or phonetic connection between source and target

• Acrophonic? Logographic?

mʿhbʿl ... (‘beloved of the La(dy)...’), Gardiner 1916
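The acrophonic hypothesis can be stated compactly: a sign takes as its value the first consonant of the Semitic name of the object it depicts, not the Egyptian reading of the source hieroglyph. The sketch below applies that rule to a few commonly cited reconstructed names. The transliterations are simplified, reconstructions vary between scholars, and the list is illustrative rather than taken from the talk.

```python
# Commonly cited reconstructed West Semitic names for objects depicted by
# early alphabetic signs (simplified transliterations; reconstructions vary).
OBJECT_NAMES = {
    "ox": "ʾalp",
    "house": "bayt",
    "water": "mayim",
    "eye": "ʿayn",
    "head": "raʾš",
}


def acrophonic_value(object_name: str) -> str:
    """Acrophony: the sign's value is the first sound of the object's name."""
    return object_name[0]


values = {obj: acrophonic_value(name) for obj, name in OBJECT_NAMES.items()}
print(values)
```

Whether the signs actually worked this way is exactly what remains open, as the question marks above indicate.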


• Unicode status

• proposed, but rejected

• Everson (N1688), 1988

• not allocated on the roadmap


• Current usage

• active scholarship

• representation of signs in publications

• exchange of documents containing signs

• fonts

Goldwasser, “From the Iconic to the Linear”, 2016


• Status of decipherment

• sign values not firmly established

• no agreement on typology (alphabetic, logographic, rebus?)


• Issues with encoding

• Typology?

• Sign values?

Darnell, others: rb ...

Colless: "Excellent (R) banquet (mšt) of the celebration (H) of `Anat (`nt). ’El (’l) will provide (ygš) plenty (rb) of wine (wn) and victuals (mn) for the celebration (H). We will sacrifice (ngt_) to her (h) an ox (’) and (p) a prime (R) fatling (mX)."


• Issues with encoding

• Representative glyphs?

• Directional variants?

• Horizontal (mirror)

• Vertical (rotated)

• Variant vs. distinctive

N.E. SCRIPTS IN THE UNICODE STANDARD NOT FULLY UNDERSTOOD

• Scripts in Unicode containing some characters whose values are still unknown:

• Linear B
• Carian
• Anatolian Hieroglyphs

U+145E8 ANATOLIAN HIEROGLYPH A435

= syllabic a-x?
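An encoded but undeciphered sign can still be identified unambiguously, because its character name is built from a sign-list number rather than a sound value. The sketch below uses Python's standard `unicodedata` module to look up the code point cited above and attaches the provisional reading from a separate, hypothetical annotation table (`PROVISIONAL_READINGS`), since the reading is not part of the Unicode character data. The printed name depends on the Unicode version bundled with the Python build.

```python
import unicodedata

# Code point cited in the talk for Anatolian Hieroglyph A435.
A435 = "\U000145E8"

# Hypothetical project-level table: provisional readings live outside the
# standard, so they can change without touching the encoded text.
PROVISIONAL_READINGS = {A435: "a-x?"}


def describe(ch: str) -> str:
    """Combine the standard character name with any provisional reading."""
    name = unicodedata.name(ch, "<no name in this Unicode version>")
    reading = PROVISIONAL_READINGS.get(ch, "unknown")
    return f"U+{ord(ch):04X} {name} (reading: {reading})"


print(describe(A435))
```

If the sign is later read differently, only the annotation table changes; documents that already use U+145E8 remain valid.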


• Scripts that are partly deciphered, with the language not fully understood:

Linear A

• Underlying language not fully understood:

Etruscan (Old Italic script)

Etruscan inscription in Old Italic script

APPROACHES TO UNDECIPHERED SCRIPTS

• Use image-based solutions or the Private Use Area (PUA) until the script is better understood, or until a stronger case for encoding can be made

• Option: encode the characters as symbols, as was done for the Phaistos Disc signs
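The two approaches differ in what software can know about a character. A Private Use Area code point has no standard name and only the general category `Co`, so its meaning rests on a private agreement between sender and receiver, whereas an encoded symbol such as a Phaistos Disc sign carries a standard name. The sketch below assigns PUA code points to a hypothetical provisional sign list and contrasts them with U+101D0 using Python's `unicodedata` module; the sign IDs and the `assign_pua` helper are illustrative assumptions.

```python
import unicodedata

PUA_FIRST, PUA_LAST = 0xE000, 0xF8FF  # Basic Multilingual Plane Private Use Area


def assign_pua(sign_ids: list[str], start: int = PUA_FIRST) -> dict[str, str]:
    """Give each provisional sign ID a private-use character (project-specific)."""
    if start < PUA_FIRST or start + len(sign_ids) - 1 > PUA_LAST:
        raise ValueError("assignment falls outside the BMP Private Use Area")
    return {sign_id: chr(start + i) for i, sign_id in enumerate(sign_ids)}


# Hypothetical provisional sign list for an unencoded script.
pua_map = assign_pua(["X01", "X02", "X03"])
first = pua_map["X01"]
print(hex(ord(first)), unicodedata.category(first))   # 0xe000 Co
print(unicodedata.name(first, "<no standard name>"))  # <no standard name>

# An encoded symbol from the Phaistos Disc block, by contrast, has a name.
phaistos = "\U000101D0"
print(unicodedata.name(phaistos), unicodedata.category(phaistos))
```

The PUA mapping works only inside a project whose fonts and tools share it; the standardized symbol survives interchange between unrelated systems.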

APPROACHES TO UNDECIPHERED SCRIPTS – PROTO-SINAITIC

• Model A

• Use Phoenician or Hebrew encoding (current practice for existing fonts)

• prevents distinctive representation of the script in plain text

• Model B

• Encode as a separate script

• Character repertoire

• encode all attested signs?

• directional variants?

• Handle directionality using mark-up

• Goal: interchange, not perfection
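Model A can be tried directly with Python's standard `unicodedata` module. The sketch below keys in the Gardiner reading lbʿlt with Phoenician code points (U+1090B, U+10901, U+1090F, U+10915) and shows the two limitations listed above: in plain text the string is simply Phoenician, and because Phoenician letters have the right-to-left bidi class `R`, a left-to-right inscription needs markup such as HTML's `<bdo dir="ltr">` to display in its original direction. The transliteration table and helper names are illustrative assumptions, not the mapping of any existing font.

```python
import unicodedata
from html import escape

# Model A: transliterated Proto-Sinaitic signs keyed in with Phoenician
# code points (illustrative mapping for the letters used below).
PHOENICIAN = {
    "l": "\U0001090B",  # PHOENICIAN LETTER LAMD
    "b": "\U00010901",  # PHOENICIAN LETTER BET
    "ʿ": "\U0001090F",  # PHOENICIAN LETTER AIN
    "t": "\U00010915",  # PHOENICIAN LETTER TAU
}


def encode_model_a(translit: str) -> str:
    """Map a consonantal transliteration onto Phoenician code points."""
    return "".join(PHOENICIAN[c] for c in translit)


def script_labels(text: str) -> set[str]:
    """First word of each character name, a rough stand-in for its script."""
    return {unicodedata.name(ch).split()[0] for ch in text}


def mark_direction(text: str, direction: str) -> str:
    """Record the inscription's direction in HTML markup, not in the characters.

    <bdo> overrides the bidi algorithm, which is needed to show Phoenician
    (bidi class R) letters in left-to-right order.
    """
    if direction not in ("ltr", "rtl"):
        raise ValueError("direction must be 'ltr' or 'rtl'")
    return f'<bdo dir="{direction}">{escape(text)}</bdo>'


text = encode_model_a("lbʿlt")
print(script_labels(text))                             # {'PHOENICIAN'}
print({unicodedata.bidirectional(ch) for ch in text})  # {'R'}
print(mark_direction(text, "ltr"))
```

Under Model B the same inscription would use code points of its own, so plain text could identify the script; direction and glyph orientation would still be handled in markup, in line with the goal of interchange rather than perfection.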

CONCLUSION

• Semantic-gap conundrum

• Information is recorded, but we cannot access or decode it

• Usage conundrum:

• We need to represent this information digitally, but there is no support

• Encoding conundrum:

• How to define semantics for the unknown?