Testing multilingual support in Mail User Agents TERENA Pilot Project
1998. Yuri Demchenko. TNC'98, Dresden.
ML MUA Testing - TERENA Pilot Project ML_MUA_1
Testing multilingual support in Mail User Agents
TERENA Pilot Project
Yuri Demchenko, TERENA <[email protected]>
TNC’98 Dresden October 5-8, 1998
TERENA Pilot Project on Testing Multilingual MUAs
• Officially ran from April 1998 to September 1998
• The project objectives were to:
– Develop a benchmarking methodology for multilingual MUAs, and specify templates for collecting the results in a coherent way
– Design a set of composite multilingual test messages
– Configure each MUA for all supported national character sets and send the test messages to other MUAs and to itself
– Compile the results, analyzing how each MUA composes, sends, receives and displays the test messages
– Prepare recommendations for users on the correct setup and operation of popular multilingual MUAs
The list of mail clients to be tested
• Derived from TERENA MUA usage statistics, based on an analysis of more than 3000 messages from the TERENA mail archives collected between August 1997 and March 1998
Microsoft Windows (NT, 3.11, 95)
• Microsoft Outlook Express
• Netscape Mail 3.x and 4.x
• Netscape Messenger
• Qualcomm Eudora 3.0 and 4.0 beta
• Pegasus Mail
• The Bat!
• ESYS Simeon
• Alis Tango Mailer
UNIX Terminal
• Elm
• MH
• Pine
UNIX GUI (with X11R6)
• Netscape Mail
• EXMH
• Z-Mail
Activity and Projects in i18n and Multilingual Support
• i18n activity (ISO, IETF, ECMA, TERENA, Unicode Consortium)
• CEN/TC304 work on European character sets and keyboards
• MAITS project
• Internet Mail Consortium - Report on Using International Characters in Internet Mail
• TERENA Pilot Project on Testing Multilingual Support in MUAs
Internet Mail Consortium - i18n Report
Summary of recommendations
1. Explicit charset parameter
2. Sending UTF-8
3. Displaying UTF-8
4. Choosing charsets on creation
5. Specifying languages
6. Multi-language text
7. Non-ASCII headers
8. Handling all common charsets
9. MTAs and 8-bit content
The report strongly recommends that all mail-creating and mail-displaying programs created or revised after January 1, 1999 be able to create and display mail using UTF-8, and be able to handle all common charsets in addition to UTF-8.
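The first two recommendations (an explicit charset parameter, and sending UTF-8) can be illustrated with Python's standard email package. This is a minimal sketch, not something taken from the IMC report: the function names, addresses and sample text are hypothetical, and quoted-printable is chosen only so that the message stays 7-bit on the wire.

```python
from email import policy
from email.message import EmailMessage
from email.parser import BytesParser


def make_utf8_message(sender: str, recipient: str, subject: str, body: str) -> EmailMessage:
    """Compose a text/plain message whose body charset is declared explicitly as UTF-8."""
    msg = EmailMessage()  # uses email.policy.default
    msg["From"] = sender
    msg["To"] = recipient
    # Non-ASCII header text is written out as RFC 2047 encoded-words when serialised.
    msg["Subject"] = subject
    # Explicit charset parameter (recommendation 1), UTF-8 content (recommendation 2).
    msg.set_content(body, charset="utf-8", cte="quoted-printable")
    return msg


def read_back(raw: bytes) -> EmailMessage:
    """Parse serialised bytes the way a receiving MUA would."""
    return BytesParser(policy=policy.default).parsebytes(raw)


if __name__ == "__main__":
    message = make_utf8_message("alice@example.org", "bob@example.org", "Привет", "Добро пожаловать!")
    wire = message.as_bytes()
    print(wire.decode("ascii"))  # the serialised form is plain 7-bit ASCII
    print(read_back(wire).get_content())
```

Declaring the charset in Content-Type lets the receiving MUA decode the body without guessing, which is exactly what the font-based MUAs discussed later cannot do.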
Standards on i18n and Character Set Technologies
• ISO standards
– ISO 2022 Character Set Concept and Terminology
– ISO 8859-x Character Sets
– ISO standards on i18n APIs and FDCC
• Unicode standards
• RFC 2277 IETF Policy on Character Sets and Languages
• Recommendation of IAB Workshop on character sets technology (RFC 2130)
• MIME format of messages (Using MIME in Internet Mail) RFC 2045-RFC 2049
• RFC 822 - Syntax of the electronic message format
Standards in i18n and Multilingual Support in Internet Mail
• RFC 2045 - RFC 2049, RFC 2231 - MIME
– Coded Character Set
– Character Encoding Scheme, specified by the charset parameter of the Content-Type header field
– Transfer Encoding Syntax such as Base64 or QP, specified by the Content-Transfer-Encoding header field
• RFC 2277 - IETF Policy on Character Sets and Languages
– main definitions and requirements for language tagging
• RFC 2130 - Recommendation of IAB Workshop on character sets technology
– framework for interoperability between the many character sets in use
– an architecture model for on-the-wire transmission of text
– recommendations for tagging transmitted (and stored) text
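The two MIME header fields above can be seen by building a single-part text entity by hand. In this sketch the MIME charset name covers both the coded character set and its encoding scheme, while Content-Transfer-Encoding names the transfer encoding syntax. The helper names are hypothetical, and only base64 and quoted-printable are handled. The standard-library parser is used to undo both layers.

```python
import base64
import quopri
from email import message_from_bytes, policy


def build_text_entity(text: str, charset: str, transfer_encoding: str) -> bytes:
    """Hand-build a single-part text/plain MIME entity.

    The charset parameter names the coded character set and its encoding scheme;
    Content-Transfer-Encoding names the transfer encoding syntax.
    """
    data = text.encode(charset)  # Python accepts MIME names such as "koi8-r"
    if transfer_encoding == "base64":
        body = base64.encodebytes(data)  # 76-character lines, each ending in b"\n"
    elif transfer_encoding == "quoted-printable":
        body = quopri.encodestring(data)  # non-ASCII bytes become =XX
    else:
        raise ValueError(f"unsupported transfer encoding: {transfer_encoding}")
    headers = (
        "MIME-Version: 1.0\n"
        f'Content-Type: text/plain; charset="{charset}"\n'
        f"Content-Transfer-Encoding: {transfer_encoding}\n"
        "\n"
    )
    return headers.encode("ascii") + body


def decode_entity(raw: bytes) -> str:
    """Undo both layers: the transfer encoding first, then the charset."""
    msg = message_from_bytes(raw, policy=policy.default)
    return msg.get_content()
```

The same text can travel in any charset/transfer-encoding combination. The receiver recovers it only if it honours both header fields.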
RFC 2130 Architecture model
• User interface issues (OS, GUI, API)
– Layout
– Culture
– Locale
– Language
• On-the-wire
– The Coded Character Set
– The Character Encoding Scheme
– The Transfer Encoding Syntax
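The three on-the-wire layers can be made concrete by following one word through them. In this sketch, Unicode code points stand in for the abstract characters. Strictly, each legacy charset such as KOI8-R defines its own coded character set with a one-byte encoding scheme. The function name is hypothetical, and base64 is just one possible transfer encoding.

```python
import base64


def wire_layers(text: str, charset: str) -> dict:
    """Show one string at each on-the-wire layer of the RFC 2130 model.

    Unicode code points stand in for the abstract characters (an assumption of
    this sketch); the charset supplies the bytes, and base64 the 7-bit form.
    """
    code_points = [f"U+{ord(ch):04X}" for ch in text]  # characters and their numbers
    encoded = text.encode(charset)  # character encoding scheme: characters to bytes
    transfer = base64.b64encode(encoded).decode("ascii")  # transfer encoding syntax: bytes to 7-bit text
    return {"characters": code_points, "bytes": encoded, "transfer": transfer}


for name in ("koi8-r", "windows-1251", "utf-8"):
    layers = wire_layers("мир", name)
    print(name, layers["characters"], layers["bytes"].hex(" "), layers["transfer"])
```

The character layer is identical for all three charsets, while the byte layer differs. This is why Russian text can arrive in several incompatible forms.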
The testing and the evaluation scheme
[Diagram: MUA testing and evaluation scheme]
• Sending side - MUA Message Composer (Message Editor, Message Sender), working in its OS environment (language, keyboard, TTFs, l10n, etc.) on the set of ML test messages
– Compose settings: Font (A, S, B, Q), Mapping
– Change settings: Language/Encoding
– Send settings: MIME (QP, Base64), uuencode
– Compose message: Type, Cut&Paste, Reply, Forward, Attachment
• Transport - the sent message passes from the sender's MTA to the receiver's MTA
• Receiving side - MUA Message Receiver and Message Reader, working in its own OS environment (language, keyboard, TTFs, l10n, etc.), with a human user reading the message
– Read settings: Font (A, S, B, Q), Mapping
– Change settings: Language/Encoding
– Receiving settings: MIME (QP, Base64), uuencode
– Read message: Replied message, Forwarded message, Attachment
(A, S, B, Q: fonts for Address, Subject, Body, Quoted text)
Testing of Multilingual support in MUAs
• Includes the following phases:
– Evaluation of Multilingual features/settings of MUAs
– Testing Message Reading procedure
– Testing Message Composing procedure
– Testing Message Sending and Receiving procedure
Evaluation of Multilingual features/settings of MUAs
• READ operation mode
– choose Language/Encoding
– choose Fonts (optional for Address, Subject, Message Body, Quoted Text)
• Optional - Font mapping
• COMPOSE operation mode
– choose Language/Encoding Settings
• Optional - Possibility to switch Language/Encoding during composition/typing
– choose Fonts (optional for Address, Subject, Message Body, Quoted Text)
• Optional - choose Spelling/Language/Dictionary
• SEND operation mode
– set MIME encoding (Quoted Printable, Base64)
• Optional - select/disable Uuencode mode (non-standard)
– allow/disallow 8-bit in Header Fields
– select/disable HTML in body parts
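The standards-conforming alternative to raw 8-bit header fields is the RFC 2047 encoded-word. It comes in a "B" (base64) form and a "Q" (quoted-printable-like) form, which mirror the QP/Base64 choice for the body. This is a minimal sketch with hypothetical helper names. It does not split text that would exceed the 75-character encoded-word limit, and it conservatively escapes everything except ASCII letters and digits.

```python
import base64
from email.header import decode_header, make_header


def encoded_word(text: str, charset: str, encoding: str) -> str:
    """Build one RFC 2047 encoded-word for a header such as Subject.

    encoding is "B" (base64) or "Q"; long text is not split into several words.
    """
    data = text.encode(charset)
    encoding = encoding.upper()
    if encoding == "B":
        payload = base64.b64encode(data).decode("ascii")
    elif encoding == "Q":
        parts = []
        for byte in data:
            if byte == 0x20:
                parts.append("_")  # a space is written as an underscore in Q encoding
            elif byte < 0x80 and chr(byte).isalnum():
                parts.append(chr(byte))  # ASCII letters and digits are safe in every header context
            else:
                parts.append(f"={byte:02X}")  # every other byte becomes =XX
        payload = "".join(parts)
    else:
        raise ValueError("encoding must be 'B' or 'Q'")
    return f"=?{charset}?{encoding}?{payload}?="


def decode_word(header_value: str) -> str:
    """Decode encoded-words back to text with the standard library."""
    return str(make_header(decode_header(header_value)))


print(encoded_word("Тест сообщения", "koi8-r", "B"))
print(encoded_word("Тест сообщения", "koi8-r", "Q"))
```

Because every encoded-word names its own charset, a reader can decode a KOI8-R Subject correctly even when the message body uses another charset.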
Message Reading procedure
• Multilingual MUAs should support the following features:
– Reading/Displaying non-ASCII characters in Message Body
– Reading/Displaying non-ASCII characters in Message Header (Address, Subject Lines)
– Reading Forwarded Message with non-ASCII characters in Address, Subject, Message Body, using the same or different MIME character set attributes
– Reading Attached non-ASCII Text File (Document)
• Possible problems are detected by comparing the appearance of the original and the delivered test messages
– This includes evaluating whether the MUA processes the MIME attributes of the test message correctly
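Part of the comparison between original and delivered test messages can be automated when the delivered message is available as raw bytes. In this sketch the function and helper names are hypothetical. The second usage case simulates a typical failure: a body written in KOI8-R but labelled windows-1251. Visual checks such as fonts and layout still need a human reader.

```python
import base64
from email import policy
from email.parser import BytesParser


def check_delivery(raw: bytes, expected_subject: str, expected_body: str) -> dict:
    """Compare a delivered message with the text of the original test message."""
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    part = msg.get_body(preferencelist=("plain",))
    body = part.get_content() if part is not None else ""
    return {
        "subject_ok": str(msg["Subject"]) == expected_subject,
        "body_ok": body.rstrip("\n") == expected_body,
        "body_charset": part.get_content_charset() if part is not None else None,
    }


def hypothetical_delivery(body_bytes: bytes, declared_charset: str) -> bytes:
    """Build a raw single-part message whose charset label may not match its bytes."""
    return (
        "Subject: tmsg3\n"
        "MIME-Version: 1.0\n"
        f"Content-Type: text/plain; charset={declared_charset}\n"
        "Content-Transfer-Encoding: base64\n\n"
        + base64.encodebytes(body_bytes).decode("ascii")
    ).encode("ascii")


text = "Текст"
print(check_delivery(hypothetical_delivery(text.encode("koi8-r"), "koi8-r"), "tmsg3", text))
print(check_delivery(hypothetical_delivery(text.encode("koi8-r"), "windows-1251"), "tmsg3", text))
```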
Message Composing procedure
• Message composition operations to be tested
– Typing message from keyboard
– Copy and Paste operations
– Text/File attachments
– Quoted text/message
– Edit different parts of message
– Charset/Encoding processing by Message Composer/Editor
• Real Message composition also includes operations like:
– Typing non-ASCII text in Message Body and Message Header
– Pasting non-ASCII-Text into Body and Header fields
– Reply to message with non-ASCII Text
– Forward message with non-ASCII content
– Attach text documents containing non-ASCII characters
Test messages set
Each test is performed in at least two character sets: US-ASCII (or ISO 8859-1), and another one with characters that are not part of US-ASCII or ISO 8859-1.
• Mandatory
– tmsg1 - Message with non-ASCII characters/text in the Subject line
– tmsg2 - Message with non-ASCII characters/text in the Mail Address free-form name
– tmsg3 - Message with non-ASCII characters/text in the Message Body text (single part)
– tmsg4 - Message with non-ASCII characters/text in a text/plain attachment
• Optional
– tmsg6* - Message with UTF-7/UTF-8 character set in Message Body and Header
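The mandatory test messages can be generated programmatically, which keeps them identical across MUAs. This is a hypothetical sketch, not the project's test message constructor: the sample text, addresses and file name are invented. One limitation, as we understand the standard library, is that Python's email package writes non-ASCII header text (tmsg1, tmsg2) as UTF-8 encoded-words regardless of the chosen charset. Here the charset therefore controls only the body and the attachment.

```python
from email.headerregistry import Address
from email.message import EmailMessage

SAMPLE = "Проверка связи"  # hypothetical non-ASCII sample text


def make_test_messages(charset: str, sample: str = SAMPLE,
                       sender: str = "tester@example.org",
                       recipient: str = "mua@example.org") -> dict:
    """Build tmsg1-tmsg4; body and attachment text use the given charset."""

    def new_message(subject: str, from_value) -> EmailMessage:
        msg = EmailMessage()
        msg["From"] = from_value
        msg["To"] = recipient
        msg["Subject"] = subject
        return msg

    # tmsg1: non-ASCII text in the Subject line
    tmsg1 = new_message(sample, sender)
    tmsg1.set_content("tmsg1 body (ASCII only)")

    # tmsg2: non-ASCII text in the free-form name of the address
    tmsg2 = new_message("tmsg2", Address(display_name=sample, addr_spec=sender))
    tmsg2.set_content("tmsg2 body (ASCII only)")

    # tmsg3: non-ASCII text in a single-part message body
    tmsg3 = new_message("tmsg3", sender)
    tmsg3.set_content(sample, charset=charset, cte="quoted-printable")

    # tmsg4: non-ASCII text in a text/plain attachment
    tmsg4 = new_message("tmsg4", sender)
    tmsg4.set_content("tmsg4: see the attached text file")
    tmsg4.add_attachment(sample, charset=charset, cte="base64", filename="tmsg4.txt")

    return {"tmsg1": tmsg1, "tmsg2": tmsg2, "tmsg3": tmsg3, "tmsg4": tmsg4}


for name, message in make_test_messages("koi8-r").items():
    print(f"----- {name} -----")
    print(message.as_string())
```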
Testing program map
• test-1: display
• test-2: print
• test-3: reply to tmsg1, tmsg2
• test-4: reply to tmsg3
• test-5: reply to tmsg3 (Cut&Paste)
• test-6: forward all
• test-7: type from keyboard
• test-8: exchange tmsg5
• test-9: test-1 to test-5 with tmsg6
Test messages:
– tmsg1: non-ASCII Subject
– tmsg2: non-ASCII Address
– tmsg3: non-ASCII Body
– tmsg4: non-ASCII Attachment
– tmsg5: non-Latin1 default
– tmsg6: UTF-8 in Body, Header
Testing Methodology - The tests to be performed
• test-1 - Receive all 4 test messages tmsg1-tmsg4 and display them correctly (Change Language/Alphabet/Encoding Options if needed)
• test-2 - Print all 4 messages tmsg1-tmsg4 to the standard printer
• test-3 - Reply to messages tmsg1 and tmsg2, and check that information is returned in the same character set as it arrived in
• test-4 - Reply to message tmsg3 using "reply including quote of body"
• test-5 - Reply to message tmsg3 using the environment's "cut and paste" function to insert the non-ASCII characters into the outgoing message
• test-6 - Forward all 4 messages to the originator address
• test-7 - Generate, as completely as possible, the same messages from the keyboard of the IUT (implementation under test)
• test-8* - Check for possible text distortion when exchanging tmsg1, tmsg2 and tmsg3 with a non-ASCII default Language/Alphabet/Encoding
• test-9* - Perform tests 1-5 for message tmsg6* with UTF-7/UTF-8
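The header part of test-3 ("information is returned in the same character set as it arrived in") can be checked mechanically by comparing the charsets named in the RFC 2047 encoded-words of the original and the reply. This is a minimal sketch with hypothetical function names. It inspects raw header values only and ignores the body charset.

```python
import base64
from email.header import decode_header


def encoded_word_charsets(raw_header: str) -> set:
    """Charsets named by the RFC 2047 encoded-words in a raw header value (lower-cased)."""
    return {charset for _, charset in decode_header(raw_header) if charset}


def reply_keeps_charset(original_raw: str, reply_raw: str) -> bool:
    """test-3 check: the reply uses exactly the charsets the original arrived in."""
    original = encoded_word_charsets(original_raw)
    return bool(original) and encoded_word_charsets(reply_raw) == original


def b_word(text: str, charset: str) -> str:
    """Hypothetical helper for the example: build a base64 encoded-word."""
    return f"=?{charset}?B?{base64.b64encode(text.encode(charset)).decode('ascii')}?="


original_subject = b_word("Привет", "koi8-r")
print(reply_keeps_charset(original_subject, "Re: " + original_subject))  # True
print(reply_keeps_charset(original_subject, "Re: " + b_word("Привет", "utf-8")))  # False
```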
Testing Results Presentation
[Table: sample results sheet for MS Outlook Express 97 for Windows 95, URL: http://www.microsoft.com/outlook/]
• Rows - Language/Encoding Setting: Central European (ISO, Windows); Cyrillic (ISO, Windows, KOI8-R, KOI8-RU); …; Universal Alphabet (UTF-7, UTF-8)
• Examination: non-ASCII text (8-bit) Send/Receive/Attachment; Compose as: As is, MIME (QP, Base64), UTF7/UTF8, HTML, HTML (Multipart/Alternative)
• Support of non-ASCII text in RFC 822 message parts/fields: Body, Subject, Address free-form
• Testing support of non-ASCII text: Read, Type, Paste, Send, Forward message, Attached text, Messages list
• Notes, Problems, Recommendations, e.g. "You can't change encoding for Cyrillic text when reading message"
ML MUAs Testing Results and Data Analysis
• Testing results are documented and presented at
– http://park.kiev.ua/multiling/ml-mua/prjdocs/mlmua-repv1.html
• Standards overview on Internationalisation and Multilinguality
– http://park.kiev.ua/multiling/ml-mua/mldoc-review.html
• Test messages constructor pilot version
– http://park.kiev.ua/multiling/ml-mua/testcon.html
Evaluation of ML MUAs
• First group - MUAs that support multiple languages/alphabets by supporting multiple charsets and using internal language/charset conversion
– Microsoft Outlook Express
– Netscape Messenger 4.04 and its predecessor Netscape Mail 3
– exmh for X Windows
• Second group - MUAs that provide ML support by selecting the proper font for creating and displaying messages
– Eudora Pro 3.0
– Pegasus
– Forte Agent
– The Bat!
– Simeon
UNIX Terminal Products
– pine
– elm
First group - Full Multilingual Support
• Microsoft Outlook Express
– has the best and richest multilingual support
– uses an effective internal conversion scheme that users can control well via setup and the Alphabet/Charset selection menu
• Netscape Messenger 4.04 and Netscape Mail 3.04
– provide rich multilingual support for many charsets/encodings
– but are very inflexible for languages that have several charsets in use (e.g., Cyrillic Windows CP-1251 and KOI8-R/U for Russian/Ukrainian, or ISO 8859-2 and Windows CP-1250 for Central European languages)
– Netscape products for X Windows have the same features
• exmh for X Windows
– provides good support for the main groups of European languages using the Latin-1, Latin-2 and Cyrillic charsets
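The "internal conversion" approach of the first group can be sketched as decoding from the message charset into a common internal form (here Unicode) and re-encoding into the charset needed for display or sending. We do not know how these products implement it internally; this only illustrates the idea. The function name and sample text are hypothetical. The conversion fails loudly when the target charset lacks a character, as KOI8-R lacks the Ukrainian letters "ї" and "ґ".

```python
def convert_charset(data: bytes, source: str, target: str) -> bytes:
    """Re-encode text between two charsets via Unicode: decode, then encode.

    Raises UnicodeEncodeError when the target charset lacks a character.
    """
    return data.decode(source).encode(target)


koi8u = "Київ, ґанок".encode("koi8-u")
cp1251 = convert_charset(koi8u, "koi8-u", "windows-1251")
print(cp1251.decode("windows-1251"))

try:
    convert_charset(koi8u, "koi8-u", "koi8-r")
except UnicodeEncodeError as err:
    print("cannot convert to KOI8-R:", err.reason)
```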
Second group – Simplified Multilingual Support
• Popular in the Latin-1 (ISO 8859-1) and English-speaking communities
• Language and charset/encoding support is provided by selecting the proper font for creating and displaying messages
– Eudora Pro 3.0
– Pegasus
– Forte Agent
– The Bat! (provides simple conversion between Cyrillic encodings: ISO 8859-5, Windows CP-1251, KOI8-R)
– Simeon
– pine and elm for UNIX
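The weakness of the font-based approach can be modelled as drawing the received bytes with a font built for one fixed charset, without any conversion. This is a simplified model, not how any particular MUA is coded, and the function name is hypothetical. Text is readable only when the sender used the same charset. Windows CP-1251 text shown with a Latin-1 font turns "добро" into "äîáðî".

```python
def display_with_font(data: bytes, font_charset: str) -> str:
    """Model of font-based display: bytes are drawn with a font built for one charset.

    No conversion happens, so the text is readable only when the message was
    written in that same charset.
    """
    return data.decode(font_charset, errors="replace")


word = "добро"
for font in ("windows-1251", "koi8-r", "latin-1"):
    shown = display_with_font(word.encode("windows-1251"), font)
    print(f"windows-1251 text shown with a {font} font: {shown}")
```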
Common problems of multilingual support in MUAs
• Conversion between different Encodings/Charsets for the same language
• Correct processing of MIME-encoded header fields (Subject and Address lines) during display when the charset named in the header differs from that of the Message Body
• The same problems occur when the user tries to change the Charset/Encoding while displaying or composing a message, or uses Copy&Paste between different Charsets
• Viewing the message source and/or message info (charset/encoding of the Header and Body, multipart MIME structure, and so on)
• Using common and correct terminology for language/charset settings in MUAs
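A "message info" view of the kind listed above mostly needs the MIME tree with each part's content type, charset and transfer encoding. This is a minimal sketch using Python's email package. The function name and the sample message, which mixes KOI8-R and windows-1251 parts, are hypothetical.

```python
from email.message import EmailMessage


def describe_structure(part: EmailMessage, depth: int = 0) -> list:
    """List (depth, content type, charset, transfer encoding) for every MIME part."""
    cte = part.get("Content-Transfer-Encoding")
    rows = [(depth, part.get_content_type(), part.get_content_charset(),
             str(cte) if cte is not None else None)]
    if part.is_multipart():
        for sub in part.iter_parts():
            rows.extend(describe_structure(sub, depth + 1))
    return rows


msg = EmailMessage()
msg["Subject"] = "Mixed charsets"
msg.set_content("Текст письма", charset="koi8-r", cte="quoted-printable")
msg.add_attachment("Вложение", charset="windows-1251", cte="base64", filename="note.txt")
for depth, ctype, charset, cte in describe_structure(msg):
    print("  " * depth + f"{ctype} charset={charset} cte={cte}")
```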
Project’s Main Results
• The international environment of the project made it possible to identify the main problems in multilingual MUA support
• Multilingual test message set
• Evaluation scheme for the forthcoming ML MUAs
• Project activity was conducted in coordination with other multilingual-related projects:
– IMC MAIL-I18N report on Internationalization and Character Set technologies
– Mozilla i18n project (Netscape 5.0)
• Project team (PT) members contributed to the new Ukrainian-language-enabled Mozilla
• the proposed model of multilingual support in MUAs was discussed
– ESYS Simeon IMAP Mail multilingual features testing
Follow-on Projects and Activities
• Testing new products using the proposed methodology
– New releases: Outlook Express 98, Netscape Messenger 4.5 and 5.0
– New products of 1999 that are expected to implement the IETF/IMC recommendations
• Other areas of further activity
– Establishing an ML/i18n charsets repository for online support of multilingual mail (downloading mapping reference tables, translation, configuration, etc.)
– Creating a Web-based ML test message constructor, a pilot version of which is demonstrated on the project's page
• http://park.kiev.ua/multiling/ml-mua/testcon.html
Test Messages Constructor http://park.kiev.ua/multiling/ml-mua/testcon.html
Test Messages Constructor - Creating a Test Message
Project Team
Yuri Demchenko, TERENA
Konstantin Chuguev, Ural Technical University, Russia
Janja Faganel, Jozef Stefan Institute, Slovenia
Vadim Shevchenko, Kiev Polytechnic Institute
Alexey Medvedev, Kiev Polytechnic Institute
Acknowledgments
• Borka Jerman-Blazic, Jozef Stefan Institute, Slovenia
• Claudio Allocchio, Sincrotrone Trieste & INFN Trieste, Italy
• Peter Heijmens Visser, TERENA, for providing the MUA usage statistics
• Harald T. Alvestrand, Maxware Norway
IMPORTANT NOTE
The Multilingual page will be moved to and maintained on the TERENA web server
http://www.terena.nl/multiling/
Russian/Ukrainian Languages: Historical Overview
• VI-XI cent. - Ancient Rus written language
• X-XIV cent. - Cyrillic written language
– Invented by Cyril and Methodius (Saloniki) in the IX cent.
– First introduced in Moravia with the advent of Christianity
– Introduced in Kiev Rus with the advent of Christianity in the X cent.
• XIV-XVII cent. - Formation of the Russian literary language
– Together with the formation of the Moscow State after the Mongol yoke
• XVIII-XIX cent. - Development of the modern Russian literary language
– Lomonosov, Pushkin
Ukrainian Literary Language
• Common ancient roots with Russian and all Slavic languages
• Was influenced by centuries of conquerors’ languages
– features of an analytic language (as in English)
• 1818 - Publication of the Grammar of the Ukrainian (Malorussian) dialect
– introduced the Ukrainian “i”, “Ґґ” (for the “g” sound), and the spellings “дз”, “дж”
– Formation of the modern Ukrainian literary language (Taras Shevchenko)
• 1921 - Publication of the “Main rules of Ukrainian orthography”
• 1984 - Introduction of the new/lost Ukrainian letter “Ґґ”