Multilingual, Multi-script Catalog Requirements
(An Arcadia Project)________________________
January 29, 2010
Jan 2010
Outline_____________________________________________________
• Background about the Arcadia non-Roman script project
• Introductions
• Orbis vs. YUFind and systems like YUFind
• Requirements discussion
• Wrap-up
Project Goals _____________________________________________________
• Gap analysis of multilingual, multi-script functionality in Lucene-Solr-Solrmarc discovery applications (e.g., YUFind)
• Identification of desirable functionality
• Collaboration opportunities, community interest
• Recommendations with level-of-effort analysis
Orbis vs. YUFind_____________________________________________________
Chinese example:
“中日韩经济合作的新起点”
N-gram tokens, where N=2: <中日> <日韩> <韩经> <经济> <济合> <合作> <作的> <的新> <新起> <起点>
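The bigram example above can be reproduced with a few lines of Python. This is only a minimal sketch of overlapping character N-grams; the function name `char_ngrams` is made up for illustration and is not taken from Lucene, Solr, or YUFind.

```python
def char_ngrams(text, n=2):
    """Return every overlapping run of n characters in text, left to right."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


title = "中日韩经济合作的新起点"
for token in char_ngrams(title):
    print(f"<{token}>")
```

Because every adjacent pair becomes a token, a query such as 经济 matches without any dictionary-based word segmentation, at the cost of also indexing pairs like 济合 that are not words.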
Background: NR Scripts in Catalog Records_____________________________________________________
JACKPHY_____________________________________________________
One-to-Many (CJK)_____________________________________________________
Example: “Mao Zedong”
毛泽东 Simplified
毛澤東 Traditional
毛沢東 Kanji (Modern)
One-to-Many (CJK) _____________________________________________________
“Mao Zedong” in simplified Chinese characters retrieves 527 results
One-to-Many (CJK) _____________________________________________________
The same search in traditional Chinese characters yields 154 hits.
Also note the paired fields.
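One way to make the simplified, traditional, and Japanese forms retrieve the same records is to fold variant characters to a single search key at both index time and query time. The sketch below assumes a tiny hand-written table covering only the characters in the “Mao Zedong” example; a real system would need a full variant table, and the choice of the simplified form as the key is arbitrary.

```python
# Hypothetical variant table: only the characters from the example above.
VARIANT_TO_KEY = str.maketrans({
    "澤": "泽",  # traditional -> simplified
    "沢": "泽",  # Japanese modern kanji -> simplified
    "東": "东",  # traditional and Japanese -> simplified
})


def search_key(text):
    """Fold known variant characters so equivalent names compare equal."""
    return text.translate(VARIANT_TO_KEY)


for name in ("毛泽东", "毛澤東", "毛沢東"):
    print(name, "->", search_key(name))
```

The folded key would be used only for matching; the record display can keep whichever form the cataloger entered.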
One-to-Many (Digraphs)_____________________________________________________
ווירטשאפט
The Yiddish word “Virtshaft” is entered here with two separate vavs (i.e., keystroke ‘u’ in Microsoft’s Hebrew IME): U+05D5 + U+05D5
One-to-Many (Digraphs) _____________________________________________________
N = 49 results
One-to-Many (Digraphs)_____________________________________________________
װירטשאפט
This time the same word is entered with the double-vav digraph, U+05F0 (via the MS Hebrew IME key combination Right Alt+u)
One-to-Many (Digraphs)_____________________________________________________
N = 11 results
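The 49-versus-11 split suggests folding the digraph and the two-letter spelling to one form before indexing and searching. The following is a sketch under that assumption, not a description of what Orbis or YUFind does. Only the double-vav mapping comes from the slides; the other two Yiddish digraphs are added here for completeness, and the name `fold_digraphs` is hypothetical.

```python
# Map each Yiddish digraph code point to its two-letter spelling.
DIGRAPH_FOLD = str.maketrans({
    "\u05f0": "\u05d5\u05d5",  # double vav -> vav + vav (from the slides)
    "\u05f1": "\u05d5\u05d9",  # vav yod -> vav + yod (added assumption)
    "\u05f2": "\u05d9\u05d9",  # double yod -> yod + yod (added assumption)
})


def fold_digraphs(text):
    """Replace digraph characters with their two-letter equivalents."""
    return text.translate(DIGRAPH_FOLD)


rest = "\u05d9\u05e8\u05d8\u05e9\u05d0\u05e4\u05d8"  # yod resh tet shin alef pe tet
with_two_vavs = "\u05d5\u05d5" + rest
with_digraph = "\u05f0" + rest
print(fold_digraphs(with_digraph) == with_two_vavs)
```

Applying the same fold to the query and to the indexed text would let either keyboard entry method retrieve the combined result set.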
NR Spelling Suggestions_____________________________________________________
Unhelpful suggestion?
Labels and Facets_____________________________________________________
Should script/language of query determine script/language of facets?
Labels and Facets_____________________________________________________
Better would be:
杉本つとむ, 1927- (11)
高橋幹夫, 1935- (11)
野口武彦. (8)
渡辺信一郎, 1934- (7)
OR:
Sugimoto, Tsutomu, 1927- (11)
Takahashi, Mikio, 1935- (11)
Noguchi, Takehiko. (8)
Watanabe, Shin’ichirō, 1934- (7)
But not both mixed together.
Let the end user decide?
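If the end user does decide, the facet list needs both forms of each heading stored together so that one script can be chosen consistently at display time. The sketch below assumes each facet value carries an original-script form, a romanized form, and a count; the field names and the function `facet_labels` are illustrative only.

```python
# Facet values from the example above, with both forms kept side by side.
AUTHOR_FACETS = [
    {"original": "杉本つとむ, 1927-", "romanized": "Sugimoto, Tsutomu, 1927-", "count": 11},
    {"original": "高橋幹夫, 1935-", "romanized": "Takahashi, Mikio, 1935-", "count": 11},
    {"original": "野口武彦.", "romanized": "Noguchi, Takehiko.", "count": 8},
    {"original": "渡辺信一郎, 1934-", "romanized": "Watanabe, Shin’ichirō, 1934-", "count": 7},
]


def facet_labels(values, script="original"):
    """Render every facet label in one script, never a mixture of both."""
    if script not in ("original", "romanized"):
        raise ValueError(f"unknown display script: {script!r}")
    return [f"{value[script]} ({value['count']})" for value in values]


print("\n".join(facet_labels(AUTHOR_FACETS, script="romanized")))
```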
Labels and Facets_____________________________________________________
We would like to choose our preferred display script here. For example,
<Original scripts>
江戸
By: 野村兼太郎, 1896-1960.
Published: 1942
Format: Book, Electronic Resource
江戶の翻訳家たち
By: 杉本つとむ, 1927-
Published: 1995
Format: Book, Electronic Resource
We would like to ask library users which option is best for displaying parallel field data:
<Original scripts>
江戶 / 田中優子編.
Contributors: 田中優子, 1952-
Format: Book
Language: Japanese
Published: 東京 : 作品社, 1998.
Series: 日本の名随筆. 03 别卷 ; 94
<Paired w/OS first>
江戶 / 田中優子編.
Edo / Tanaka Yūko hen.
Contributors: 田中優子, 1952-
Tanaka, Yūko, 1952-
Format: Book
Language: Japanese
Published: 東京 : 作品社, 1998.
Tōkyō : Sakuhinsha, 1998.
Series: 日本の名随筆. 03 别卷 ; 94
Nihon no meizuihitsu. 03 Bekkan ; 94
Language/Script of Interface _____________________________________________________
OCLC’s brief record display
Interface easily flipped to one of several languages
Language/Script of Interface_____________________________________________________
OCLC’s detailed record display with Japanese-language interface
Language/Script of Interface_____________________________________________________
OCLC WorldCat.org localizes labels and instructions as well as mapped facet values. The examples here are in Chinese.
Language/Script of Interface_____________________________________________________
Language/Script of Interface & Text Directionality_____________________________________________________
Sorting of Results_____________________________________________________
江戸文学俗信辞典 Edo bungaku zokushin jiten
江戸文学地名辞典 Edo bungaku chimei jiten
江戸文学辞典 Edo bungaku jiten
江戸文様辞典 Edo mon’yo jiten
Sorting of Results_____________________________________________________
Also note bi-directional text
Sorting within Result Sets: Options to Consider_____________________________________________________
For multiple languages sharing a script (e.g., Chinese ideographs, Arabic, Hebrew, or Latin), how would users prefer to see result sets sorted?
We consider here the Chinese & Arabic cases…
Sorting within Result Sets: Options to Consider_____________________________________________________
Sorting of results returned in Chinese script:
Three candidate sort strategies: (a) sort by Romanized equivalents; (b) sort by pronunciation; or (c) sort by radical-stroke order. Which would users prefer?
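To show how much the choice of sort key changes the order, the sketch below sorts the four Edo titles from the earlier sorting slide in two ways: by their Romanized equivalents (strategy a) and by raw Unicode code point of the original script, which is what a system does if no collation is configured. Radical-stroke and pronunciation ordering need external collation data and are not shown. The function names are illustrative only.

```python
# (original script, romanization) pairs from the sorting example.
TITLES = [
    ("江戸文学俗信辞典", "Edo bungaku zokushin jiten"),
    ("江戸文学地名辞典", "Edo bungaku chimei jiten"),
    ("江戸文学辞典", "Edo bungaku jiten"),
    ("江戸文様辞典", "Edo mon’yo jiten"),
]


def sort_by_romanization(records):
    """Strategy (a): order records by the Romanized form, ignoring case."""
    return sorted(records, key=lambda record: record[1].casefold())


def sort_by_code_point(records):
    """Fallback: order records by the code points of the original script."""
    return sorted(records, key=lambda record: record[0])


for original, romanized in sort_by_romanization(TITLES):
    print(original, romanized)
```

Sorting by romanization moves “chimei” ahead of “zokushin”, while code-point order keeps the titles in the sequence listed above, an order that follows neither reading nor stroke count.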
Sorting within Result Sets: Arabic Script_____________________________________________________
How to handle additional Arabic-script characters in use for languages such as Persian, Kurdish, and/or Urdu?
ڤ (vah, derived from ف, fah)
پ (pah)
چ (chah, derived from ج, gim)
گ (gaf)
ژ (zāī, derived from ز, zayin)
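One possible policy, offered only as a sketch, is to sort each additional letter immediately after the letter it is derived from rather than by raw code point, where these letters all fall after the basic Arabic alphabet. The first three pairs in the table come from the list above; the bases given for pah and gaf are assumptions added for illustration, and the function names are hypothetical.

```python
# Derived letter -> base letter it should sort directly after.
DERIVED_FROM = {
    "\u06a4": "\u0641",  # vah  <- fah
    "\u0686": "\u062c",  # chah <- gim
    "\u0698": "\u0632",  # zai  <- zayin
    "\u067e": "\u0628",  # pah  <- beh (assumed base, not stated above)
    "\u06af": "\u0643",  # gaf  <- kaf (assumed base, not stated above)
}


def letter_key(ch):
    """Place a derived letter just after its base letter."""
    base = DERIVED_FROM.get(ch)
    return (base, 1, ch) if base is not None else (ch, 0, ch)


def sort_key(word):
    """Build a per-letter sort key for a whole word."""
    return [letter_key(ch) for ch in word]


letters = ["\u0698", "\u0633", "\u0632"]  # zai, sin, zayin
print(sorted(letters))                # raw code-point order
print(sorted(letters, key=sort_key))  # derived letter follows its base
```

Whether Persian, Kurdish, and Urdu readers expect this interleaved order or a language-specific alphabet order is exactly the question to put to users.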
Discussion
User Needs and Expectations