CELAN WP 2 – DELIVERABLE D2.1

ANNOTATED CATALOGUE OF BUSINESS-RELEVANT SERVICES, TOOLS, RESOURCES, POLICIES AND STRATEGIES AND THEIR CURRENT UPTAKE IN THE BUSINESS COMMUNITY

Project Title: CELAN
Project Type: Network Project
Programme: LLP – KA2
Project No: 196466-LLP-1-2010-1-BE-KA2-KA2PLA

Version: 1.1
Date: 2013-01-30
Author: Infoterm
Contributors: Universität Wien (interviews), FAV (CELAN Typology and interview format); FU Berlin (CELAN Typology and interview format), other CELAN partners (comments) and external experts (comments)

The CELAN network project has been funded with support from the European Commission, LLP programme, KA2. This communication reflects the views only of the authors, and the Commission cannot be held responsible for any use which may be made of the information contained therein.

CELAN D2.1_fv1.1


Executive Summary

Within the framework of the

“overarching aim of WP2 to identify and analyse existing language (and language-related) services, tools and resources relevant to business users, i.e. those that have proved practical usefulness and can be shown to enable business users to address specific language needs”

Task 2.1 is to cover “desk research on business-relevant language policies/strategies, language training and assessment, language technology tools, linguistic/language services, language and other content resources, guidelines and standards, and consultancy services with a view to producing a catalogue of such services, tools, and resources. In addition, the task will seek to indicate the current uptake of respective resources within the business community. … /and include/ the identification and evaluation of pertinent standards.”

Starting from the requirement that “WP2 and especially T2.1 are about highlighting language services, tools, and resources that are used by and useful for enterprises – in particular SMEs”, industry is analysed from an LI perspective, followed by an in-depth investigation of the products and services of the language industry sector in the form of an evaluation of:

Language technologies (LT) and language technology tools/systems (LTT),

Language and other content resources (LCR),

Language services (LS) and language service providers (LSP),

LI sector-internal services,

LI-relevant standards, guidelines and certification,

LI-related policies/strategies.

T2.1 was carried out in close coordination with T2.2 from the very beginning. Desk research, supported by preliminary consultations for the interviews, resulted in a comprehensive overview (see: CELAN D2.1 Annex 1) of LI products and services, which are evaluated in this document. Since the number and variety of LTT, LCR and LSP is increasing virtually by the day, the “CELAN Typology of Language Industry Products and Services” (see: CELAN D2.1 Annex 3) was developed as a comprehensive meta-catalogue providing guidance through the jungle of fast-increasing LI products and services.

CELAN D2.1 collects ample indications of the current uptake of LI resources within the industry&business community, bearing in mind the increasing language- and LI-related needs (and the demands stemming from those needs) of industry&business. These needs are triggered by accelerated globalization and by the development of the Internet (and other global networks) as the technological driving force of globalization.

The investigations in WP2 show that the LI has become quite complex over the last ten years. There are numerous and widely differing kinds of LI products (comprising LTT and LCR) as well as services (LS) and LSP on the market. On the one hand, the demand for qualified language experts has therefore clearly risen. On the other hand, it also became evident that the success of an SME on global markets was often attributed to the use of LT (or LSP) rather than to language competences/skills. In other cases, it was attributed directly to the application of ICT, although LT deserved the credit. This shows that both language experts and the LI in general have an image problem.

Concerning the current uptake of LI offers in the business community, one can say that the LI in all its facets can meet customers’ language-related needs and demands. CELAN D2.1 shows that the resources offered by the LI come at different levels of complexity and sophistication and address an array of customer needs/demands of correspondingly different levels, depending on, among other things, the size of the enterprise, its degree of specialization, the industry sector it belongs to and the customer demands of the target markets it aims at. Nevertheless, the uptake of LI products and services, while widespread in large-scale industry, is still limited among SMEs. The reasons why the visibility of the LI among SMEs is still comparatively low will be investigated and analysed in CELAN D2.4.


Abbreviations

AAC augmentative and alternative communication
AAL ambient assisted living
CALL computer-assisted language learning
CAT computer-assisted translation
CBT computer-based training
CDP content development platforms
CEFR Common European Framework of Reference for Languages: Learning, Teaching, Assessment guidelines
CL corporate language
CLIL content and language integrated learning
CMS content management systems
CNL controlled natural language
CPU central processing unit
CRM customer relationship management
CMT controlled machine translation
CT corpus technology
DITA Darwin Information Typing Architecture
DfA design for all
DTP desktop publishing
ERP enterprise resource planning
ECM enterprise content management
EUATC European Union of Associations of Translation Companies
FOSS free and open source software
GILT globalization, internationalization, localization and translation
G11N globalization
HEI higher education institutions
HLT human language technologies
HTML hypertext mark-up language
I&D information and documentation
I18N internationalization
ICT information and communication technologies
IMT interactive machine translation
IPR intellectual property rights
L10N localization
LCR language and other content resources
LI language industry
LMS learning management system
LO learning objects
LOM learning object metadata
LS language services
LSP language service providers
LT language technology
LT&T language teaching and training
LTT language technology tools
MALL mobile assisted language learning
MT machine translation
MoU/MG Management Group of the ITU-ISO-IEC-UN/ECE Memorandum of Understanding concerning eBusiness standardization
NLP natural language processing
NMB national member bodies
OA office automation
OAXAL open architecture for XML authoring and localization reference model
OCR optical character recognition
OER open educational resources
OPE one person enterprise
OPI over the phone interpreting
OS operating system
OSS open source software
PDA personal digital assistant
PDM product data management
PIM product information management
PR public relations
PwD persons with disabilities
QA quality assurance
SaaS software as a service
SC sub-committee
SCORM sharable content object reference model
SDO standards developing organizations
SME small and medium-sized enterprises
SOP standard operating procedures
ST speech technology
STT speech technology tools
STM scientific, technical and medical (writing)
TBT Technical Barriers to Trade (Agreement on)
TD technical documentation
TC technical committee
TEL technology enhanced learning
TL-PMS translation/localization project management systems
TM translation memory
TMS terminology management systems
UI user interface
UNCRPD United Nations Convention on the Rights of Persons with Disabilities
VUI voice user interface
WCAG Web Content Accessibility Guidelines
WCXM web content & experience management
WG working group
WTO World Trade Organization
WYSIWYG what you see is what you get
XML eXtensible Markup Language


Table of contents

Executive Summary
Abbreviations
Table of contents
1 Industry at large and the language industry (LI)
1.1 Fragmentation of industry&business
1.2 The Language industry (LI)
1.3 Globalization and multilingualism
1.4 Preliminary assumptions
2 Methodology
3 Evaluation of language technologies (LT) and language technology tools/systems (LTT)
3.1 Translation technology
3.2 Text technologies
3.3 Terminology management systems (TMS)
3.4 Speech technology (ST) and speech technology tools (STT)
3.5 Content management systems (CMS)
3.6 Language teaching/learning systems/tools
4 Evaluation of language and other content resources (LCR)
4.1 Structured content
4.2 Unstructured content
4.3 Analysis of LCR
5 Evaluation of language services (LS) and language service providers (LSP)
5.1 Text creation, editing, re-purposing services
5.2 Translation services
5.3 Interpreting services
5.4 Localization (L10N), globalization (G11N), internationalization (I18N) services
5.5 Desktop publishing (DTP) services (complementary to GILT services)
5.6 Language teaching&training (LT&T)
5.7 Language-related industry consultancy services
5.8 Communication services for persons with disabilities (PwD)
6 Evaluation of LI sector-internal services
7 Evaluation of LI-related standards, guidelines and certification relevant to industry&business
7.1 Basic standards related to the ICT infrastructure with particular impact on the LI
7.2 Specific standards pertaining to LTT, LCR, LS and LI-related competences and skills
7.2.1 Specific standards pertaining to LT
7.2.2 Specific standards pertaining to LCR
7.2.3 Specific standards pertaining to language services
7.2.4 Specific standards on LI related competences and skills and certification schemes
7.3 LI-relevant certification
8.1 Standardization and certification as a service to the public/society at large
8.1.1 Standardization (as a service to the public/society)
8.1.2 Certification (as a service to the public/society)
8.2 Language policy (as a service to the public/society)
8.3 Accessibility policies and strategies (as a service to the public/society)
8.3.1 Accessibility policies at international and European level
8.3.2 Accessibility-related standardization activities at European level
8.4 Business-relevant language policies and strategies
8.4.1 Overall enterprise language policies/strategies
8.4.2 Language policies/strategies for enterprises belonging to the LI
9 Indication of the current uptake of LI offers in the business community
10 Conclusions
References (documents):
References (standards and legislation):
List of Annexes
Appendixes:
Appendix 1: Tables
Appendix 2: LI Mind map
Appendix 3: Recommendation on software and content development principles 2010


CELAN D2.1

Annotated catalogue of business-relevant services, tools, resources, policies and strategies

and their current uptake in the business community

1 Industry at large and the language industry (LI)

According to the proposal for the CELAN project:

“The principal short-term target group of the proposed project is business users (both employers and employees) who need support in the development of multi-lingual expertise in order to enhance and sustain their economic activity. In particular, it is expected that these users will be primarily active in SMEs, who are typically less aware of/or have more limited access to the resources provided by the language industry. A further potential target would be young people seeking initial employment, who by analysing the needs expressed by employers and appreciating the various systems now offered to support and expand multilingualism, may be better placed to prepare themselves for the business world.”

In order to relate the essence of this statement to industrial reality, one has to analyse where in industry (incl. the LI) employers or employees need language- or LI-related competences, and how the following industrial aspects relate to each other:

Overall structure of industry from a language and LI perspective,

Sector-internal structure of the language industry sector.

1.1 Fragmentation of industry&business

Industry in general is highly fragmented: many different categories of enterprises are active in hundreds of sectors of industry&business. As large-scale enterprises are mostly active at global level, they usually know about globalization (G11N), localization (L10N) and internationalization (I18N) as well as the language, legal and cultural requirements necessary to perform well on foreign markets. This has given “communication” in all its guises new dimensions. In this connection the various products and services of the LI have become an important factor in improving communication within a given language community and between language communities.

Globalization (G11N) – largely accelerated by the development of the Internet – has fundamentally changed the way exports and trade are conducted: from physical presence, (analogue) communication and documentation… to electronic exchange of information. This is also reflected by a growing number of international legal instruments (such as the WTO Agreements – see also clause 7.1) and by increased standardization activities, which have impacted language use and application to a degree unimaginable 10-20 years ago. In this context the rapid rise of the language industry over the last ten years becomes understandable (see also: Study on the size of the language industry in the EU, 2009).

These reflections led to a clarification of the relationship between the various sectors of the LI and industry at large, which – from an LI perspective – is represented graphically in Figure 1 below. Roughly speaking, industries/businesses can be subdivided into:

Large-scale enterprises,

Medium-sized enterprises,

Small enterprises,

Micro-enterprises (including OPE).

In this figure industry&business can also be differentiated into manufacturing (producing), trading (trade) and rendering services. Enterprises can develop and use LI products and services internally or outsource their development and use to LI service providers. Increasingly, enterprises are looking for the most appropriate combination of doing things themselves and outsourcing. As information and communication technologies (ICT) infrastructures, systems/tools and services are everywhere, they are not depicted in the table but are duly taken into account throughout WP 2.

*including LI sector-internal business: language services and LTT & LCR development

Figure 1: Relations between categories of enterprises and the LI as well as among LI

1.2 The Language industry (LI)

The language industry (LI) is that sector of industrial activity dedicated to designing, producing, and marketing tools, products, or services related to or based on human language technology (HLT). Undeniably, there would be no LI if there were no language technologies (LT) with their language technology tools/systems (LTT). Thus, from a technology point of view, the LI is a part of the ICT (information and communication technologies) industry, but it also draws upon the fields of linguistics, lexicography, software engineering, artificial intelligence, and interface design. It is geared towards supporting a range of different traditional and new applications.

As a sign of its maturity, the LI is also quite fragmented, with probably more than a hundred categories of enterprises, which are mushrooming and competing for customers. On the one hand, these LI enterprises know their markets quite well – it is part of their survival strategy. On the other hand, complex sector-internal business relations have developed over time and continue to increase – in spite of certain concentration and integration tendencies. As in European industry at large, small and medium-sized enterprises (SME) prevail in the LI; among language service providers (LSP) the vast majority are individuals, many of them OPE.

Aiming at obtaining “representative” results in WP2 proved to be a challenge. The LI is a multi-category industrial sector and there is no established statistical category in industry, research and education called “language industry”. Language technologies (LT) are a foundational part of the LI; they support the rendering of language services (LS – e.g. done by LSP), language-related activities in industry&business at large, the development of language and other content resources (LCR) as well as the application of computer-assisted teaching methods in language learning (and beyond). Thus, the industry sector called language industry (LI) is highly self-referential on the one hand (see: Figure 2 below) and is increasingly expected to integrate or become interoperable with other tools, methods and activities in the enterprise on the other hand.

Figure 2: Relations between the various aspects of the LI

It is important to recognize that most of the identified products and services on the LI market did not exist (at all or in their present form) 15 years ago – many of them not even 5 years ago. Similarly, many of the individual features of LI products and services on the market today did not yet exist around 2000. This must also be reflected in education and training. Last but not least, the sector-internal development and provision of tools and services has developed into a substantial business within the LI. This, too, offers job opportunities for language and language technology experts.

1.3 Globalization and multilingualism

Globalization (G11N) triggered an exponential increase:

In terms of quantities and different forms of communication (largely in the form of linguistic information and documentation or structured and unstructured content – increasingly in electronic form),

In the number of languages to be dealt with even by small enterprises.

No company – not even the biggest ones – could have coped with this exponential increase without the results of R&D in the fields of human language technologies (HLT) after World War II turning into industrial applications. Undeniably, there would be no language industry (LI) if there were no language technologies (LT) with their language technology tools/systems (LTT). G11N also triggered the need first for language-related standards, then for language-technology-related standards, and later for certification and new kinds of managerial policies/strategies. Some of these standards, certification schemes and policies/strategies rely on or refer to new professional profiles, competences and skills, which are not taught at formal educational institutions. Because of this, and given that the LI has become quite fragmented, the needed linguistic and technical competences and skills are increasingly acquired through vocational training and on-the-job training. But even vocational training can hardly keep up with the speed of development. While many traditional language-related jobs, such as employed translators, were abolished, minimized or redefined in the course of this development, a multitude of new jobs emerged:

Based on the demand for new professions as well as professional profiles, competences and skills in general,

Throughout the rapidly growing LI, especially by language service providers (LSP),

In response to new demands for language competence in enterprises as well as in public administration.

Formal education clearly lags behind in meeting the need for language-related competences and skills in industry (including the LI). Thus, there are many gaps which need to be filled in the future.

Figure 3: T-Index – Which markets offer the most potential for your website? (Taken from Translated (2012-08-14_23:01_CET): http://www.translated.net/en/languages-that-matter)


In order to make the question of language in relation to globalization tangible, the language service provider (LSP) Translated s.r.l. (based in Italy) prepared the T-Index, which provides answers to the question “Which markets offer the most potential for your website?” T-Index is a statistical index that shows online market share per country (see: Figure 3 above). It is a percentage value that indicates the online market share of each country on the Internet by combining the Internet population and its estimated GDP per capita. The higher the T-Index, the higher the online sales potential in a country. However, its figures can also be taken as a general compass concerning the importance of language in the globalization strategies of enterprises. Sorted by language (see also Table A in Appendix 1), T-Index claims (based on somewhat simplified statistical calculations) that:

Translating a website into these 3 languages gives you access to 50% of the worldwide online sales potential: English, (simplified) Chinese, Spanish;

Translating a website into these 10 languages gives you access to 80% of the worldwide online sales potential: (in addition) Japanese, German, French, Portuguese, Russian, Arabic, Korean;

Translating a website into these 15 languages gives you access to 90% of the worldwide online sales potential: (in addition) Italian, (traditional) Chinese, Dutch, Turkish, and Farsi/Persian.

However, these figures reflect a snapshot of today and neglect many language communities of many millions of people. The world is changing fast and the figures may look quite different in a couple of years. Besides, these figures match those calculated by country only to some degree. Furthermore, mobile phone manufacturers are making efforts to extend the number of interfaces localized into different languages to more than 1,000 in order to reach an ever increasing number of communities of lesser-used languages.

In 2006 it would have taken 130 languages to reach 1 billion Internet users. In 2009 one could potentially reach 1 billion users with only the languages of the top 10 language communities. By 2015, as growth migrates to developing nations and it becomes conceivable to connect as many as 5 billion users, it will take more than 1,000 languages to reach them. (Lionbridge, 2009)

Sorted by country (again based on somewhat simplified statistical calculations – see also Table B in Appendix 1):

Localizing a website for these 5 markets gives you access to 50% of the worldwide online sales potential: USA, PR China, Japan, Germany, UK;

Localizing a website for these 20 markets gives you access to 80% of the worldwide online sales potential (in addition to the above): France, Brazil, Russia, South Korea, Italy, Canada, Spain, Mexico, Turkey, India, Australia, Taiwan, Iran, Netherlands, and Argentina.

However, only a few countries (e.g. China) show a fast-growing share of the online market.
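How coverage figures of this kind arise can be illustrated with a minimal Python sketch. It assumes, as a simplification of the description above, that a market's weight is its Internet population multiplied by its estimated GDP per capita, and that shares are these weights normalized to 1; the actual T-Index calculation may well differ. The `Market` class, both function names and all sample figures are hypothetical and serve illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Market:
    name: str
    internet_users_m: float  # Internet population in millions (hypothetical)
    gdp_per_capita: float    # estimated GDP per capita (hypothetical)


def online_shares(markets):
    """Return each market's share of the total weight (users x GDP per capita)."""
    weights = {m.name: m.internet_users_m * m.gdp_per_capita for m in markets}
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


def markets_needed(shares, target):
    """Pick the largest markets first until their combined share reaches target."""
    chosen, covered = [], 0.0
    ranked = sorted(shares.items(), key=lambda item: item[1], reverse=True)
    for name, share in ranked:
        if covered >= target:
            break
        chosen.append(name)
        covered += share
    return chosen, covered


# Purely illustrative sample data, not taken from the T-Index.
SAMPLE = [
    Market("A", 100, 40),
    Market("B", 300, 10),
    Market("C", 50, 40),
    Market("D", 100, 10),
]

if __name__ == "__main__":
    shares = online_shares(SAMPLE)
    for target in (0.5, 0.8):
        chosen, covered = markets_needed(shares, target)
        print(f"target {target:.0%}: {chosen} cover {covered:.0%}")
```

The same ranking can be run per language instead of per country by first summing the weights of all markets that share a language, which is one possible reason why the two lists above match only to some degree.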

Figure 4: Internet users distribution by region (Taken from: Education Sector Factbook 2012, http://gsvadvisors.com/wordpress/wp-content/uploads/2012/04/GSV-EDU-Factbook-Apr-13-2012.pdf)


Multilingualism is usually defined in connection with language competences in more than two languages. The narrow definition used by some researchers in bilingualism studies (namely “only those individuals who are very close to two monolinguals in one should be considered bilingual”), applied to multilingualism, has been abandoned by many researchers in favour of “a common human condition that makes it possible for an individual to function, at some level, in more than one language”. The key to this very broad and inclusive definition of bi-/multilingualism is 'more than one'. However, from the perspective of the LI there is also another key, namely “to function, at some level”. Let us consider certain knowledge of or about:

Certain or some languages for software engineering or adaptation purposes,

The cultural, legal and other requirements of localization of products and services into other languages,

The cultural, legal and other requirements for doing business in other language communities (even within the same language).

A beginner’s or even “rudimentary” knowledge might be sufficient for performing well under given requirements. Different kinds of “language+” competences – where language is possibly not the dominant factor – as indicated above have become important today at every level, from the workshop level via academia and public administration up to decision making in many application fields. In many circumstances the ability to communicate – rather than a perfect knowledge of the language – is the key to achieving the intended results.

1.4 Preliminary assumptions

Could it be that the LI is booming because people from outside the traditional language professions were looking for, and finding, new ways to overcome language and communication barriers? One of these ways came from management, whose function is, among other things, to reduce costs. Another came from human language technologies (HLT), which overcome language barriers and reduce costs by means of technology. Both influenced each other in the LI. The Internet megatrend also opened the doors to new learning technologies, systems and methods, thus enabling people to exchange and create ever more information, which is an important source of knowledge building and sharing. Another push factor is the general development of the ICT sector, which has become one of the main types of customer of the LI. While making extensive use of LT and LS, the ICT sector tends to disregard the LI as a sector of its own.

The highly significant statement of one company owner in the field of LSP, “If business in industry is thriving, there is business for LI products and services”, indicates that:

The development of the LI is dependent on the demand from industry at large;

Obviously there is a need for having the LI in all its facets today;

Finding solutions to the issues of language in conjunction with legal and cultural requirements is one of the prerequisites for the success of business. This fact should be reflected in business strategies and policies at individual enterprise level as well as at the policy level of a country.

An enterprise that wants to reach out into new markets is strongly advised to analyse language, legal and cultural requirements necessary to perform well on the target markets.

This new market could well be domestic – for instance targeting a given migrants community (e.g. by ethno-marketing).

Instead of making the effort&investment for gaining another 20% of the domestic market, an enterprise may well gain a substantial share in – let us say – four external markets with the same effort&investment. (See: SMEs go global – Steiner, 2003)


The LI as a whole shows high double-digit growth rates because demand is growing exponentially in the course of globalization, which triggers localization.

Language technologies (LT) and language technology tools/systems (LTT) were the key to the efficiency increase in language services (LS) and the emergence of more (as well as some large and very large) language service providers (LSP). Without LT, the ICT sector would not have been as successful as it is today.

It should not be overlooked that LI products and services also represent a cost factor – and an increasing one, if not controlled. Financially strong large-scale industry can afford to develop and apply any kind of LTT and LCR, and to develop language services internally or use any kind of LSP through outsourcing. SMEs are under heavy financial constraints and subject to fierce competition. In most cases they do not have the capacity to investigate how state-of-the-art LTT, LCR and LSP would benefit their enterprise. In many countries one-person enterprises (OPE) are already the majority; for them the higher levels of LI products and services are out of reach in terms of background information, cost, technical complexity, and competences and skills concerning information and communication technologies (ICT).

Large-scale enterprises also have the financial potential to employ or train the necessary human resources – often through high-level R&D activities. In addition, their constant business intelligence activities make decisions about future developments less risky for them than for SMEs (let alone micro-enterprises and OPE). Therefore, CELAN focuses on SMEs (including micro-enterprises and OPE) without neglecting large-scale industry.

Other studies, such as PIMLICO (Hagen, 2011), focus on language competences/skills in trade and industry. Concerning language use in trade,

“it is clear from the PIMLICO case studies and the ELAN Report that English is, and continues to be, the dominant language of global trade, but it is not exclusive and our educationists should plan for a multilingual global trading environment. For example, English proficiency is now seen by business more as a generic skill much like computing skills or numeracy which people in international trade are assumed to possess. It is also recognised that certain sectors use only English in all their trade dealings, e.g. biotechnology and the aeronautical industry. For a long time, its use has moved far away from its cultural roots in the Anglo-Saxon world and there is increasing recognition of the emergence of new simplified, or abbreviated varieties, often referred to ’mid-Atlantic’, or ‘off-shore’ English, prompting the wry view of one businessman that the most widely-spoken language in the world is ‘broken English’!” (Hagen, 2011)

According to the ELAN Report (2006) the languages used by European SMEs for exporting were: English 51%, German 13%, French 9%, Russian 8%, Spanish 4% and others 15%. This differs considerably from the distribution of Internet users by region. However, whereas most European SMEs in the ELAN report cite English as the primary language used for business communication in major export markets, there is widespread use of other languages as well: … While English remains as important as ever on the Internet, other languages such as Chinese, Russian, Spanish, and Portuguese are becoming comparatively more important. In Eastern Europe, German and Russian are still used almost as often as English as international languages of trade. A Slovenian company indicated it had suffered substantial losses in Europe due to language barriers, but particularly in Spain owing to its lack of language skills. (Hagen, 2011)

What can be said for sure is that

The number of learners of English worldwide is, however, likely to peak at around 2 billion in the next decade and the world is becoming increasingly multilingual.

Generally, the PIMLICO companies recognise that excellent English is essential for international trade, but espouse functional multilingualism in their international trade.

This is apparent because of the number of markets where English does not suffice: e.g. trading in Latin America can be impossible without some Spanish (or Portuguese in Brazil); doing business in Russia with only English increasingly becomes unmanageable outside of the main centres of population.

The change of the use of languages in trade by region, by industry branch, by language community etc. is obvious and has probably accelerated;

The world is becoming more multilingual;

Some languages, such as Chinese, Russian, Spanish, and Portuguese are becoming comparatively more important.

All sectors have to cope with these changes, if they are globalized or if they are going to globalize.

CELAN D2.1 assumes that, to be successful in foreign markets such as China, Russia or Brazil, it is crucial to have websites, product descriptions, manuals, and marketing and promotion material localized into the target markets’ languages. For this, industry needs a multitude of LSP properly using LT and LCR. LT and LCR are indispensable for LSP to achieve good results. This role of the LI – with its products and services – is grossly undervalued in the public eye.

2 Methodology

According to the description of CELAN WP2 Task 1 (i.e. this Deliverable D2.1) the subtasks were:

“T 2.1 Desk research on business-relevant language policies/strategies, language training and assessment, language technology tools, linguistic/language services, language and other content resources, guidelines and standards, and consultancy services with a view to producing a catalogue of such services, tools, and resources. In addition, the task will seek to indicate the current uptake of respective resources within the business community. … /and include/ the identification and evaluation of pertinent standards.”

Thus, the planned main results of CELAN WP2 Task 1 were to come up with an annotated catalogue of business-relevant:

Language technology tools (LTT),

Language and content resources (LCR),

Language services and language service providers (LSP),

Guidelines and standards,

Language policies and strategies (concerning LTT, LCR and LSP as well as standards).

In parallel, the catalogue was to indicate the current uptake of the respective resources within the business community. The sub-contract for the identification and evaluation of pertinent standards was carried out under the guidance of Infoterm.

The investigation of business-relevant language services, tools and resources and the preparation of interviews with industry were carried out in parallel, in frequent consultation with experts in pertinent industry sectors and academia. It soon became clear that the two tasks had to be carried out in close coordination. In this connection the loose use of terminology in the ICT sector proved confusing in the beginning – for instance, what is the difference between a program, software, tool, software package, suite, solution etc.? The investigation started off with a rough categorization and the collection of information on LI products and services:

1 LT&LTT – LANGUAGE TECHNOLOGY AND LANGUAGE TECHNOLOGY TOOLS
For the purpose of WP 2 the LT&LTT were, in the first phase, subdivided into:
1.1 Translation technology & translation technology tools
1.2 Text technology
1.3 ST&STT – speech technology & speech technology tools
1.4 CMS – content management systems
2 LCR – LANGUAGE AND OTHER CONTENT RESOURCES
3 LS&LSP – LANGUAGE SERVICES AND LANGUAGE SERVICE PROVIDERS
For the purpose of WP 2 the LS&LSP were, in the first phase, subdivided into:
3.1 Interpretation services
3.2 Translation services
3.3 Localization (L10N), globalization (G11N), internationalization (I18N) services
3.4 LT&T – Language teaching & training

It soon became apparent that the range of LI products and services had differentiated substantially beyond this categorization over the last few years. The identification of such products and services turned up such overwhelming numbers of products (LTT and LCR) and services (and the respective LSP) that complete registration was beyond the reach of the CELAN Project. On the basis of the investigations one can estimate (nota bene on the market):

More than a hundred LT developers (sometimes marketing several LTT with several versions, levels, releases, combinations each with high development dynamics) adding up to more than a thousand LTT,

More than ten thousand LCR of all sorts,

Probably well over a hundred thousand LSP in Europe (micro-enterprises and OPE included).

Figures about LSP are difficult to investigate for several reasons:

New types of LTT developers are not yet organized in professional associations, but seek to find business in existing ones;

Individual LSP are often translator and interpreter, or translator and localizer, or localizer and technical writer, etc.;

SME among LSP struggle to offer all kinds of services in order to survive and compete in this market.

The traditional publishing sector has not been included in this investigation, although it obviously is also making more and more use of LT. In some cases encyclopedia or dictionary publishers “reinvented” themselves as online publishers of LCR.

In the course of the investigations, the best indications of the degree and ways of uptake of LI products and services within the business community came from the interviews with LT developers and LSP, who were rightly considered to know their markets better than anybody else. This was taken into account in the course of the surveys and interviews resulting in CELAN D2.2.

The sub-contract for the identification and evaluation of pertinent standards was accordingly carried out under the guidance of Infoterm. Its results were compiled in CELAN D2.1 Annex 2: Investigation of business-relevant standards and guidelines in the fields of the language industry, and are summarized in this Deliverable. The investigation revealed that there is a close connection between standards and certification, which is why certification was included here.

Business-relevant language policies/strategies, which often also refer to standardization and certification, proved to be so significant that they justified a detailed chapter in this document. (see: chapter 8) Language training and assessment – excluding that occurring in formal public education – was recognized as one of the thriving businesses under language services and is addressed in clause 5.6. Consultancy services are dealt with in several places in this Deliverable due to the growing demand for them among LSP and industry&business customers alike.

On the basis of a mind map of LI products and services (Appendix 2), which had already been developed in cooperation with LI experts, a number of preliminary consultations were carried out.
They ultimately led to the final version of the CELAN Typology (CELAN D2.1 Annex 3) and to the Language Industry Supply-side Questionnaire (Appendix 3), both major conceptual and design inputs into the navigation tool. In addition, the contacts established in the course of the consultations and preliminary interviews with LI and other experts were used to conduct additional in-depth interviews concerning the uptake of LI products and services (as well as of standards etc.) by industry&business. For these interviews a “free format” was used (Appendix 4). Some of the results of these interviews were also taken into account in various parts of this Deliverable.

The CELAN Typology was found most appropriate as the structure behind the navigation tool to guide users to the information they are looking for. In view of more than a thousand individual language technology tools/systems (LTT), probably more than ten thousand language and other content resources (LCR) and about a hundred thousand language service providers (LSP) in Europe, it proved to be the optimal meta-catalogue for structuring and evaluating the great number of LI products and services. Therefore, the CELAN Management Committee decided on 3-4 December 2011 to take it as the foundation for the CELAN navigation system (WP4). In the weeks thereafter the CELAN Typology was further developed in such a way that it could provide hyperlinks between the boxes (i.e. categories) and links to external sources of information. It was tested again by experts in academia and enterprises. Later it was supplemented with examples of best practice and success stories.

Thus, the final version of the CELAN Typology – meeting with the approval of academic as well as of industry&business experts – served as the first major input into WP 4. It has the following overall structure (see: CELAN D2.1 Annex 3 Typology of LI products and services):

1. Language technologies (LT) & language technology tools (LTT)

a. Translation technology
b. Text technologies
c. Terminology management systems (TMS)
d. Speech technology (ST) and speech technology tools (STT)
e. Some kinds of content management systems (CMS)
f. Language teaching/learning systems

2. Language and other content resources (LCR)

a. Terminological data and similar
b. Lexicographical data and similar
c. Other kinds of structured content online
d. Unstructured content

3. Language services (LS) & language service providers (LSP)

a. Text creation, editing, re-purposing
b. Translation services
c. Interpreting services
d. Localization (L10N) services
e. Desktop publishing (DTP) services
f. Language teaching and training services
g. Language industry consultancy services
h. Communication services for persons with disabilities (PwD)

4. Standardization, certification and language policy

a. Standardization
b. Certification
c. Language policy

For the sake of coherence and consistency, CELAN D2.1 Annex 1 Overview on the language industry (LI) products and services was prepared in line with the CELAN Typology.
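
As an illustration of how such a typology can serve as the structure behind a navigation tool, the sketch below stores the categories listed above as a nested mapping and looks up sub-categories by keyword. The category names are taken from the list above; the dictionary layout and the helper `find_paths` are assumptions made for this example and do not describe the actual CELAN navigation system.

```python
# Illustrative only: category names come from the CELAN Typology list above;
# the dict layout and the find_paths helper are assumptions, not the CELAN tool.
TYPOLOGY = {
    "Language technologies (LT) & language technology tools (LTT)": [
        "Translation technology",
        "Text technologies",
        "Terminology management systems (TMS)",
        "Speech technology (ST) and speech technology tools (STT)",
        "Some kinds of content management systems (CMS)",
        "Language teaching/learning systems",
    ],
    "Language and other content resources (LCR)": [
        "Terminological data and similar",
        "Lexicographical data and similar",
        "Other kinds of structured content online",
        "Unstructured content",
    ],
    "Language services (LS) & language service providers (LSP)": [
        "Text creation, editing, re-purposing",
        "Translation services",
        "Interpreting services",
        "Localization (L10N) services",
        "Desktop publishing (DTP) services",
        "Language teaching and training services",
        "Language industry consultancy services",
        "Communication services for persons with disabilities (PwD)",
    ],
    "Standardization, certification and language policy": [
        "Standardization",
        "Certification",
        "Language policy",
    ],
}


def find_paths(keyword):
    """Return (top-level category, sub-category) pairs whose sub-category contains keyword."""
    needle = keyword.lower()
    return [
        (top, sub)
        for top, subs in TYPOLOGY.items()
        for sub in subs
        if needle in sub.lower()
    ]


if __name__ == "__main__":
    for top, sub in find_paths("translation"):
        print(f"{top} > {sub}")
```

A keyword such as "translation" matches entries in more than one top-level category (technology and services), which is the kind of cross-link between boxes that the text describes.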

3 Evaluation of language technologies (LT) and language technology tools/systems (LTT)

Due to the pervasive marketing slang in the LI, next to no distinction can be made between systems, tools, suites, solutions, platforms, modules, work benches, etc. Price indications do not help either, as there are complex open source software (OSS) tools/systems available free of charge or at low cost, while some comparatively low-complexity tools/systems are offered at a high price on the market. In addition, the terminology used to describe these tools/systems is fuzzy and may range from flowery expressions, via all kinds of euphemisms, to outright misleading performance indications.

Because of the high degree of fragmentation into too many categories in thousands of business fields, it is impossible to find a representative sample of enterprises that are customers/users of LI products or services. Most enterprises using LTT probably do not even know they are using them – and certainly not under the name of LI. Considering their closeness to the market, it was decided to interview the LI enterprises and, through them, survey the market uptake of LI products and services by industry&business at large. This in fact proved to be successful, so that the CELAN Typology represents the “meta-catalogue” for the evaluated LI products and services. (see: CELAN D2.1 Annex 1)

Undeniably, there would be no language industry (LI) if there were no language technologies (LT) with their language technology tools/systems (LTT), which emerged out of the field of human language technology (HLT – or natural language processing, NLP). According to the CELAN Typology the language technologies (LT) and LTT are subdivided into:

a. Translation technology,
b. Text technologies,
c. Terminology management systems (TMS),
d. Speech technology (ST) and speech technology tools,
e. Content management systems (CMS),
f. Language teaching/learning systems.

The focus of LTT – especially in commercial development – is not primarily on technology, but on the applications they shall support. If applied vertically LTT can be an add-on for pure business software, if properly designed with integration and interoperability in mind. As customers have many different ideas about their own needs – and more often than not they do not know their language- or LI-related needs well – LT suffer from an array of disadvantages e.g. in terms of preconceptions and lack of awareness on the market. Nevertheless, the LI market is growing fast.

Because applications are increasing in volume and type, there are many complaints about cost, quality, capability for integration and interoperability, problems of content interchange etc. In addition, LTT are often language-dependent and nearly always need adaptation. Domain adaptation for quality improvement entails higher costs. However, the market demands that LTT be usable immediately (in as many languages as possible), be easy to adapt and integrate, AND at the same time be less expensive. This is a challenge which can only be met by increasing the development efficiency of LTT on the one side, and the productivity of LTT applications on the other. This is particularly demanded by LSP enterprises, which are under pressure from their customers in industry&business.

The demands and requirements from the customers (or internal ICT departments) concerning system and content integration have triggered the need to combine some (potentially all) LTT under the emerging GILT conception. GILT originally comprised globalization (G11N), internationalization (I18N), localization (L10N) and translation, to which text technologies and desktop publishing (DTP) systems/tools and services have to be added. (see: chapter 5)

Although LTT are of growing importance in the ICT industry and in industry&business at large, they are not sufficiently represented in the education and training of higher education institutions (HEI).

3.1 Translation technology

Translation technology comprises (see: CELAN D2.1 Annex 1):

Machine translation (MT) systems,

Computer-assisted translation (CAT) tools/systems,

Localization (L10N) systems,

Translation/localization project management systems (TL-PMS).

Common to translation technology is that at least one source text in a given language is translated into a target text. In combination with other LTT, translation technology is heading for multilingual applications – beyond the original language-pair orientation. All types of translation technology can be further subdivided into sub-types – often representing

re-combinations of some of their features, functionalities or modules,

combination of their features, functionalities or modules with those of other LTT.

In many/most language services one can find functionalities or modules of translation technology, even in those originally designed for monolingual purposes or for non-linguistic purposes. In any case, translation technology is among the fields of LT showing the fastest development cycles.

Given enough data, machine translation programs often work well enough for a native speaker of one language to get the approximate meaning of what was written by a native speaker of another. The difficulty is getting enough data of the right kind to support the particular method. For example, the large multilingual corpus of data needed for statistical methods to work is not necessary for the grammar-based methods. But then, the grammar-based methods need a skilled linguist to carefully design the grammar that they use.

Current machine translation software often allows for customization by domain or profession (such as weather reports), improving output by limiting the scope of allowable substitutions. This technique is particularly effective in domains where formal or formulaic language is used. It follows that machine translation of government and legal documents more readily produces usable output than conversation or less standardized text.

In general the use of an MT system is costly, whether the system is purchased, leased or self-developed – not to mention the customization necessary for each application environment. Pre- and/or post-editing may require the employment of specialized experts, the respective training schemes and continuous training programmes. Therefore, in general the use of tailored MT systems is not feasible for SMEs. This applies less to some dedicated MT systems, for instance, for structured content in databases, such as for product catalogues. However, translation and localization services normally use specially adapted translation memory (TM) tools for such purposes.
On the other hand, there are a few types of SME which make extensive use of the MT services on offer. This will be covered under MT services.

Some advanced computer-assisted translation solutions include controlled machine translation (CMT). This type of technology is widely known amongst professional translators and terminologists and is also available to any individual translator who wishes to invest in such technology. Higher-priced MT modules generally provide a more complex set of tools to the translator, which may include terminology management features and various other linguistic tools and utilities. Carefully customized user dictionaries based on correct terminology significantly improve the accuracy of MT and, as a result, increase the efficiency of the entire translation process.

A viable application for MT is content scanning, that is, using a translation system simply to obtain a rough draft so as to get the general gist of a text. MT is widely used in the European Commission for this purpose, for example, and it is widely used on the Internet.

For the purpose of this overview MT (incl. interactive machine translation – IMT) has been separated from computer-assisted translation (CAT) and localization (L10N) and refers to the software used for MT. Online machine translation offered through the Internet is covered under services.

On 24 March 2011 the European Patent Office (EPO) and Google signed a long-term agreement to collaborate on machine translation of patents into multiple European, Slavonic and Asian languages. Similar efforts are undertaken in East Asia, among others within the framework of the Advanced Industrial Property Network (AIPN). Besides, MT is already quite common in the field of trademark application. It can be expected that multilingual approaches will bring about significant advances in MT. Furthermore, they will facilitate access to the world’s IPR (intellectual property rights) system for SMEs and make it more affordable for them.

According to Rinsche/Portera-Zanotti (2009), TM tools (alignment tools, bilingual and multilingual databases, storing information by segments) remain the leading technology in terms of efficient production. Apart from TM tools, software user interface localization tools are commonly used in localization projects. The localization industry has blossomed through the development of suites of tools which improve production in terms of time and quality.
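
The segment-based storage that TM tools rely on, as mentioned above, can be illustrated with a minimal sketch: previously translated segments are stored as source/target pairs, and a new segment is compared against the stored sources to retrieve the closest earlier translation (a "fuzzy match"). The sample segments, the similarity measure (Python's standard `difflib.SequenceMatcher`) and the 0.75 threshold are assumptions chosen for illustration; commercial TM tools use their own matching methods.

```python
from difflib import SequenceMatcher

# Hypothetical in-memory translation memory: source segment -> target segment.
TM = {
    "Press the power button.": "Drücken Sie die Ein/Aus-Taste.",
    "Remove the battery cover.": "Entfernen Sie die Batterieabdeckung.",
}


def best_match(segment, memory, threshold=0.75):
    """Return (stored source, stored target, score) for the closest stored segment, or None."""
    best = None
    for source, target in memory.items():
        # Character-level similarity in [0, 1]; case is ignored for the comparison.
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if best is None or score > best[2]:
            best = (source, target, score)
    if best is not None and best[2] >= threshold:
        return best
    return None


if __name__ == "__main__":
    print(best_match("Press the red power button.", TM))
    print(best_match("xyz 123", TM))
```

A match below the threshold returns nothing, which corresponds to a segment that has to be translated from scratch (or passed on to MT).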
In multilingual projects, tools of this kind have become more sophisticated as systems improve and the global market grows. There is a growing market for developing plug-ins or connectors to extend and improve other existing LTT. Localization is also at the heart of mobile application development.

Development trends:

(1) Experts say that development is moving in a direction where MT will be available freely or at low cost, either as a module of LTT or as a functionality in many online services. For the time being only large enterprises will really need (and have the resources to develop or adapt at considerable cost) MT for a specific purpose or for an array of purposes. But the situation is changing rapidly – use of MT is increasingly coming within reach of any organization. There is a tendency to have CAT tools feed into MT, which is then used to improve the productivity of CAT tools. Besides, new advanced MT systems (e.g. the MOSES statistical machine translation system) are becoming easily accessible to any enterprise, incl. SME.

(2) There is an increasing and positive tendency in large organizations towards the automation of translation-related tasks and the use of CAT tools, which are performing better by the day.

(3) Tools/systems belonging to translation technology can be combined with text condensation, text analysis, social media content, content management systems (CMS) etc. for information retrieval and analysis or other purposes. Nearly all high-end LTT or their combinations comprise translation technology – in particular translation memory (TM).

(4) Machine translation – e.g. in combination with speech technology – is going to become a common feature of mobile devices (often in combination with assistive technologies complying with the requirements for persons with disabilities, PwD).

Uptake in industry&business:

(1) Large-scale industry mostly either develops or uses translation technology or outsources the language services. Some use highly sophisticated combinations with different kinds of content management systems (CMS) for business intelligence and other purposes.

(2) Most exporting SME use – or will sooner or later have to use – translation technology either themselves or through language service providers (LSP).

(3) Nearly all LSP use translation technology in one way or another. Some of the large ones offer sophisticated combinations with CMS for business intelligence and other purposes as a service.

(4) For micro-enterprises or OPE among the LSP, translation technology is on the one hand more and more an absolute necessity, but on the other hand a big technological barrier due to the lack of interoperability of systems.

Standards and certification: There are competing standards in the field of translation technology, which makes it difficult especially for LSP to work with them. (see: CELAN D2.1 Annex 2 Chapter 3) Given the increasing call for integratability and interoperability of translation tools/systems, standards are being developed with the aim of bridging interoperability gaps. Certification of systems for standards compliance will probably become reality within the next five years.

Additional gaps:

(1) The quality of translation technology largely depends on the quality of available content. On the one hand, more and more quality content is emerging; on the other hand, it still lacks standards and the application of standards to make content usable, i.e. interoperable, independently of individual systems.

(2) Systems to measure the cost-effectiveness of translation technology more or less reliably, as well as standards for inter-lingually comparable word counts and for checking the quality of translated texts, are still missing.

(3) For micro-enterprises or OPE among the LSP, translation technology is a big technological barrier due to the lack of interoperability of systems.

(4) Test programs and benches to validate the effectiveness of different MT systems are not easily accessible.

Recommendations: A way of taking advantage of MT systems is to combine them with the knowledge of a human translator, e.g. in the form of interactive machine translation (IMT), or to combine them with various pre-editing and post-editing approaches.

3.2 Text technologies

Text technologies comprise in particular (see: CELAN D2.1 Annex 1):

Scientific and technical writing and the respective authoring tools,

Technical documentation (TD) and the respective TD systems,

Corpus technology (CT) and the respective tools/systems,

Desktop publishing (DTP) and the respective tools/systems.

All types of text technologies can be further subdivided into sub-types – often representing

Re-combinations of some of their features, functionalities or modules,

Combination of their features, functionalities or modules with those of other LTT.

Even dedicated learning authoring tools exist. In many/most language services functionalities or modules of text technologies are used, even in those designed from the outset for multilingual purposes or for non-linguistic purposes.

Professional authoring systems/tools, TD systems/tools, and DTP systems/tools include text layout and graphics functionalities to an extent which exceeds those of office automation software by far. Some also cover the functionalities of professional pre-print software. More and more they are shortcutting the process into professional printing – thus, they are also used by translation and other language service providers (LSP) to finish the results of their services for publication. These systems/tools have to cope with all kinds of symbols, formulas, graphics and other non-linguistic elements – not to mention high-quality and high-resolution color graphics or photos. Increasingly, they are also required to be capable of handling different languages and scripts.

Some scientific publishers offer their authors templates on the basis of advanced authoring tools including functionalities of text layout and extended graphics, sometimes with features of professional pre-print software in the background. This considerably reduces the time from text creation to publication from a technical point of view by avoiding revision and proofreading. Authoring systems/tools and TD systems/tools are often combined with controlled natural language (CNL) approaches and tools in order to support style checking and other text quality control processes.

Corpus technology largely focuses on text corpora and speech corpora or on combinations of both. It can be applied for a number of purposes. In fact certain functions and features of corpus technology are also used in different kinds of applications such as word processors, translation memory modules/systems, database programs, web browsers, editing programs, and communication programs.

Although most of the major office automation software today can handle foreign languages and their scripts to an amazing degree, text technologies developed for certain languages and their scripts, or fully localized versions of such software, still perform better and with more functions for the native and non-native user. However, features and functionalities of some text technologies (such as DTP) are gradually being integrated into existing high-level office automation software so that even non-experts in text technology are enabled to use them. This puts constant development pressure on the commercial developers of such software and poses investment risks on the LSP side.
Development trends:

(1) The tools/systems belonging to the text technologies can be combined with or are converging with (desktop) publishing systems/tools. Corpus technology has become indispensable for browsers and other tools on the Internet. The efficiency of text technologies often depends on the availability of good language and other content resources (LCR).

(2) Some text technologies (and DTP) – adding multilingual features and functionalities – are converging towards the GILT (globalization, internationalization, localization and translation) approach and technology development.

Uptake in industry&business:

(1) Large-scale industry uses sophisticated text technologies if it has enough volume or other reasons to justify the cost of the software and specialized human resources involved. The same applies to larger LSP.

(2) On the other hand, quite a number of large-scale enterprises and larger LSP ask specialized LSP to carry out the respective tasks for which they need highly sophisticated software and the respective human capacities.

(3) For most LSP that are micro-enterprises or OPE, sophisticated text technologies are out of reach. This can imply that they are driven out of certain segments of the LSP market.

Standards and certification: There are competing standards and gaps in standardization with respect to interoperability and integratability in the field of text technologies, which makes it difficult even for highly professional and specialized LSP to work with them. (see: CELAN D2.1 Annex 2 Chapter 2 for a general overview of some pertinent standards)

Additional gaps:

(1) Increasingly, controlled natural languages (CNL – or simplified natural languages) are used to increase the degree of consistency and coherence, but the respective systems still lack maturity and standards.

(2) More and more quality content is emerging which can be used in support of text technologies, but it still lacks standards and the application of standards to make content usable independently of individual systems.

Recommendations: Text technology today is no longer out of reach for SME. Using authoring tools for writing scientific-technical texts considerably reduces the time from text creation to publication. Web CMS have become state-of-the-art. Technical writing and TD often use “controlled natural language” (CNL – or simplified language) approaches and tools to control linguistic variation for the sake of clarity and understandability. Developers of TD systems are gradually including functionalities of professional text layout and extended graphics, even features of professional desktop publishing software. This considerably reduces the time from the creation of the documentation to its delivery.
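
To make the idea of controlling linguistic variation with a CNL more concrete, the following minimal sketch checks a sentence against two kinds of rule: a maximum sentence length and a list of words to be replaced by preferred alternatives. The limit of 20 words and the word list are hypothetical values chosen for this example; they are not taken from any particular CNL specification or tool.

```python
import re

# Hypothetical rule set; real CNL specifications define their own vocabularies and limits.
MAX_WORDS = 20
DISALLOWED = {"utilize": "use", "terminate": "stop"}


def check_sentence(sentence):
    """Return a list of rule violations found in one sentence."""
    words = re.findall(r"[A-Za-z']+", sentence)
    issues = []
    if len(words) > MAX_WORDS:
        issues.append(f"sentence has {len(words)} words (limit {MAX_WORDS})")
    for word in words:
        preferred = DISALLOWED.get(word.lower())
        if preferred:
            issues.append(f"replace '{word}' with '{preferred}'")
    return issues


if __name__ == "__main__":
    for issue in check_sentence("Utilize the lever to terminate the motor."):
        print(issue)
```

Style checkers built into authoring and TD systems apply rules of this general kind, typically combined with terminology checks.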

3.3 Terminology management systems (TMS)

TMS belonged to the first market-ready products of the LI. On the one hand, they have emerged as a support for large-scale terminology work in organizations. On the other hand, they allow in-house translation or localization services or commercial translation or localization service providers to manage their terminology centrally and systematically in database form – e.g. as terminology modules of a computer-assisted translation system or localization system. (see: Schmitz/Straub 2010) There are large-scale TMS only for linguistic purposes and other TMS which include also non-linguistic kinds of concept representations. Sometimes these systems become key elements of a language strategy. Governments and industries use TMS for harmonizing terminology as a goal in itself or for supporting translation, such as IATE, the inter-institutional terminology database of the European Union. (see: http://iate.europa.eu/) Nowadays, virtually all universities with translation studies in some European countries train their students in the use of TMS systems. Many translators’ associations are offering training in the application of TMS systems. This is often done in cooperation with the respective developers or their distributors. Therefore, the proper use of TMS can be considered as a state-of-the-art competence of young graduated translators today. Today, Terminology tools are used by professionals as standalone products, products integrated to other systems or translator workbench solutions. Term extraction tools are developed based on two approaches: linguistic (usually to work in a single language) and statistical. Different aspects of these approaches are combined to develop further term extraction tools. However, the user´s purposes vary so much that the degree of sophistication of these tools has become disputed on the market. The localization industry has contributed to the development of additional terminology tools. 
Advanced search engines cannot be effective for professional purposes without terminological data. Search engines rank websites according to certain priorities, which involve defining key terms and repeating them systematically, as well as ad words bought to increase visibility on the Internet; this also contributes to increased awareness of terminology work and tools. Besides, these terminological and other linguistic data can be used to improve the performance of online machine translation. Quality management and multilingualism have contributed to the growing demand for terminology tools in industry&business. The positive effects of appropriate terminology management, whether in view of an enterprise-wide corporate language (CL) or in non-linguistic applications (such as parts administration), clearly make up for the cost of TMS and terminology work in an enterprise.
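
The statistical approach to term extraction mentioned in this clause can be illustrated with a small sketch. It is not taken from any particular TMS or extraction tool: the function name `extract_candidate_terms`, the tiny stopword list and the frequency threshold are assumptions made purely for illustration, and real tools typically combine such counts with linguistic filters.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real tools use fuller lists).
STOPWORDS = {"the", "a", "an", "of", "and", "or", "in", "to",
             "is", "are", "for", "with", "on", "by"}

def extract_candidate_terms(text, max_len=2, min_freq=2):
    """Return (candidate, frequency) pairs, most frequent first.

    Candidates are word n-grams of 1..max_len words that neither start
    nor end with a stopword and occur at least min_freq times.
    Note: this naive version lets n-grams span sentence boundaries.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            counts[" ".join(gram)] += 1
    return [(term, freq) for term, freq in counts.most_common()
            if freq >= min_freq]

sample = (
    "The terminology database stores terms. "
    "A terminology database supports translation. "
    "Translation memory and the terminology database are linked."
)
candidates = extract_candidate_terms(sample)
for term, freq in candidates:
    print(f"{freq}  {term}")
```

A purely frequency-based ranking like this one also returns general vocabulary, which is one reason why statistical and linguistic approaches tend to be combined in practice.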

Development trends:
(1) The more quality content, especially structured content, is available from large-scale commercial or non-commercial content providers, the more present-day TMS or terminology management modules might become obsolete or change function. Large-scale TMS may converge towards CMS or develop into a particular type of CMS for multilingual structured domain content. This is shown by the emergence of web-based cooperative content development platforms (CDP) for some kinds of content, such as multilingual product classifications, which are overcoming the limitations of today’s terminology systems.
(2) The development of search engines and the need to include language and content resources (LCR) in mobile tools and applications will bring about new developments in the field of terminology management.

Uptake in industry&business:
(1) Large enterprises use sophisticated TMS if they have enough volume or other reasons to justify the cost of the software and the specialized human resources involved. The same applies to larger LSP.
(2) An increasing number of large enterprises and larger LSP ask specialized LSP to carry out terminology management tasks (and tasks concerning other language and content resources) that require highly sophisticated software and the corresponding human capacities.
(3) For most micro-enterprises or OPE among LSP, sophisticated TMS are out of reach. This can imply that they will be driven out of certain segments of the LSP market.

Standards and certification: The field of TMS, also called computer-assisted terminography, is one of those in the LI where the first methodology standards were developed. In principle, the potential for a high degree of interoperability exists. However, because existing TMS allow a great deal of freedom in adapting the system to the needs of the individual user, considerable organization-internal standardization efforts are required to achieve interoperability of systems and content within and across organizations. Otherwise, the exchange of data between individuals may soon become next to impossible or pointless because the data are unreliable. The “ECQA Certified Terminology Manager” can be considered the first attempt to certify skills and competences in this field. (see: CELAN D2.1 Annex 2 item 3.2)

Additional gaps:
(1) More and more good content is emerging which can be used in support of text technologies, but standards, and their consistent application, are still lacking to make content usable independently of individual systems.
(2) In order to make content usable independently of individual systems, computer-assisted terminology workflow and data quality management have to be developed and the respective methods standardized (with certification in mind).
(3) Terminologists still have to understand that other linguistic and non-linguistic data entities (such as proper names, icons etc.) can be as important as terminological entries. TMS would be best suited to record, process and maintain them for an array of purposes.
(4) Many companies still use spreadsheets for their terminology data gathering and exchange needs.
(5) Even more so than “translation”, terminology is considered only as a cost factor to be avoided as much as possible.

Recommendations: TMS are most useful for harmonizing company terminology (e.g. for the sake of a coherent corporate language), enhancing its quality and supporting any multilingual activity. As terminology modules of a computer-assisted translation system or localization system, TMS allow translation or localization service providers to manage their terminology centrally and systematically in database form as part of the respective LSP language policy. The use of TMS helps to enhance technical communication in the company so that content and technical documentation are accurate and consistent. TMS drastically increase productivity by shortening document revision time. TMS could easily be further developed into a sort of language-oriented CMS for all kinds of structured content including terminology. This is well understood by several LI experts, but while it is a hidden need, there is no “market” for this in terms of funding.

3.4 Speech technology (ST) and speech technology tools (STT)

STT were designed to respond to or duplicate the human voice. The field started with speech recognition and speech synthesis and later branched out into different applications. STT are used for aiding the voice- or hearing-disabled as well as the blind, communicating with computers without a keyboard, marketing goods or services by telephone, enhancing computer games etc. They comprise speech recognition and speech synthesis tools, high-speed speech transcription and dictation tools, speech compression and manipulation, and voice access to information, up to innovative systems such as video rewrite and other dubbing systems. Research and development in the field of speech technology often uses large-scale speech corpora, for instance in the development of instant automatic interpretation (e.g. for military applications, which will sooner or later find their way into general applications). The uses of speech technology today comprise among others (see: CELAN D2.1 Annex 1):

Communicate with computers without a keyboard,

Market goods or services by telephone,

Aid the voice-disabled, the hearing-disabled and the blind,

Enhance game software,

Increase productivity.

In this connection some features of speech technology are already widely used in consumer electronics and other low-end devices. Meanwhile, the first instant interpretation software has appeared on the market. According to Global Industry Analysts (see: Myron 2012) "Organizations [that] intend to differentiate themselves in a crowded business environment, through improved automated customer care, are increasingly deploying speech recognition systems." As a result, the worldwide market for voice/speech recognition systems and software is expected to reach $69.4 billion by 2015. This growth is largely due to investments in multimodal technology with talk, touch, and type interfaces; analytics for better quality assurance in the contact center; voice security, especially from healthcare companies gearing up for government-mandated electronic health records; and, of course, mobile voice search, where Apple has made speech technology cool with the release of its iPhone, which includes Siri, the speech-enabled personal digital assistant. So far the biggest application has been in telephone answering services, operated either by a user organization itself or by a call center service contracted by the organization. The improvement in call-completion rates in speech recognition applications is a consequence of three forces: (a) higher recognition accuracy, (b) better application design, and (c) greater customer adaptation. Recognition accuracy of acoustic models has been trending upwards for several years now, resulting in better performance for both native and non-native speakers across both mobile and VoIP channels. As sales of mobile devices climb, more people will use mobile voice search capabilities to locate nearby businesses of interest.

Development trends: (1) In conjunction with the integration of multimodal technology into ICT at large with among others

Talk, touch, and type interfaces,

Analytics for better quality assurance in the contact center,

Voice security, especially from healthcare companies gearing up for government-mandated electronic health records,

Mobile voice search,

STT (including its manifold HLT and content aspects) will become a driver of ICT development. Lately the first instant interpretation software has appeared on the market, still with limited usefulness, but the development direction is clear.
(2) By adding voice technology capabilities, the virtual (i.e. mobile) office has finally come of age. It frees companies from computers and keyboards, allowing work to be done anywhere, at any time. One of the key factors driving the growth of the speech industry is that speech is the only modality that can provide a consistent user interface across all devices.

Uptake in industry&business: Industry&business makes good use of STT in connection with telephone answering tools/services, often down to smaller SMEs. Telephone answering services have upgraded their STT to a degree unimaginable 10 years ago. Over-the-phone (or teleconference) interpreting is a booming business, obviously meeting the needs of enterprises and other organizations for interpreting, especially for short assignments and at short notice.

Standards and certification: There is a trend in the speech technology industry to develop specialized products and applications for niche segments, such as customized speech transcription software for the legal and healthcare industries. This implies that the increasingly important interoperability requirements are rarely met. As business processes handled by answering services become more standardized, one can expect to see turnkey applications for tasks that are structured, common, and quick to execute. This will trigger the need for new or extended standards, standards-compliant speech corpora and the respective certification schemes. (see: CELAN D2.1 Annex 2 item 2.3.4)

Additional gaps:
(1) The importance of establishing standards for the speech paradigm is similar to that of HTML for the Web paradigm.
Since general Web services inevitably touch on a wide range of software, hardware, client/server, PDA, and telephone services (not to mention content), a standard way must be adopted to add speech to Web services and applications. The development of open standards supported by the industry would be a key enabler of making speech mainstream. In this way, duplication of development work performed for different environments can be effectively avoided and the investment in deploying speech recognition can be preserved.
(2) In computer-assisted language learning (CALL), early native speakers and foreign language learners make a constantly evolving set of mistakes as they develop an ear for the language and get used to putting phonemes together in combinations they have never mouthed before. What is needed is a solution that works for learners of any background through these stages of learning, and across 25 languages.
(3) There are many examples of sentences that sound the same (within one language and between languages) but can only be disambiguated by an appeal to context. Therefore, the "understanding" of the meaning of spoken words by means of natural language understanding is the next barrier to be overcome.

Recommendations:
(1) It is necessary to continue investing in research and development to bridge the recognition performance gap between human and machine, and, in particular, to invest in novel approaches with the potential to create breakthrough progress. Dealing with conversational speech in a natural, free style and making systems robust in all possible practical acoustic environments are two of the most critical technical challenges.
(2) With more recent advancements in customer-facing technology solutions, mining calls for specific content and spoken phrases is readily available, affordable, and now within reach of different sectors. Businesses can now discover critical customer intelligence without hardware or software installations, and can implement an automated market research tool at a much more affordable cost.

3.5 Content management systems (CMS)

For the purpose of CELAN only those CMS which primarily process language data and LCR, such as web CMS, are considered as LTT. However, language technologies play a critical role in many CMS, e.g. for authoring assistance, document indexing, search, version control, summary generation or the automatic translation of content into multilingual versions. That is why CMS have been considered in the CELAN framework, not least because they provide employment opportunities for language and language technology experts. On the other hand, content management is a business world of its own with its own approaches and business models. The industry-leading web content management solutions provide an eBusiness platform that can streamline business operations and support sales, marketing, public relations, human resources, collaboration, and customer care. The Web CMS industry’s current rise in mobile projects and requirements is clearly just the beginning of a larger trend that will continue to shape the industry. Besides, open source software (OSS) initiatives (such as Drupal, with its large user base) are emerging which offer flexible upgrading and a low entry threshold. These features meet the needs of SMEs in particular. However, although there are no or only modest license fees, the cost of a full-fledged implementation may be similar to that of commercial solutions.
CMS hosting is usually expensive for several reasons: software maintenance; data maintenance (even well-coded dynamic websites take up more server CPU and memory than the same website produced as static HTML), as CMS are extremely content-hungry; proper management of shared hosting resources and server maintenance (requiring high-level technical support); the need to license multiple products (at least for complete solutions, whose complicated architectures with multiple modules result in heavy infrastructural demands, including several different database types); new versions appearing all the time; and hackers finding ways to exploit popular CMS (which requires updating at least once every 6 months). Because of the complex nature of CMS (integrating or combining different tools/systems under one roof architecture), keeping track of several licenses of different kinds for an array of tools/systems, possibly adapting the whole architecture when a new release of one of the components appears, etc. involves costs which may not be affordable to SMEs, whether in industry&business or among LSP. In addition, CMS are, because of their complexity, comparatively vulnerable to hacker attacks and other kinds of unfriendly intrusion. That is why ICT security issues rank very high with users of CMS. High-end CMS, such as enterprise CMS, are still out of reach of most SMEs. But some language-oriented CMS, such as web CMS, deserve to be more widely used.

Development trends:
(1) The need for managing and understanding information (already more than 99% of it in digital form!) in order to effectively impact job performance will continue to grow exponentially. Therefore, content management systems (CMS) will continue to enjoy sales growth as companies deploy new systems or revise current ones. The CMS industry will follow its trend of consolidation by acquisition, merger and partnership to provide deeper services in managing content. More and more workers can be classified as “knowledge workers”, whose performance requires efficient access to and use of information, and the necessary ICT competences to perform satisfactorily.
(2) Organizations related to portal and web content & experience management (WCXM) technologies highlight four trends, one of which is a renewed focus on content (driven by mobile, digital marketing, etc.). The “post-PC Web CMS” industry will bring many changes. Web CMS software will continue to evolve by supporting better mobile and tablet delivery, and by aligning (with high return on investment) business intelligence, digital marketing and forms processing solutions. Web CMS software will also join the ranks of the broader sales cloud movement by integrating customer relationship management (CRM), becoming easier to use and implement, and being delivered as a much more affordable, value-priced software as a service (SaaS) solution. The authors of the book Semantic technologies in content management systems (Maass/Kowatsch 2012) investigate how semantic technologies can further increase the interactivity and integration capabilities of CMS and discuss their business value to millions of end-user organizations.

Uptake in industry&business:
(1) Basic CMS are used in almost every industry by millions of enterprises. In contrast to the 90s, they are no longer used as isolated applications in one organization; they support critical core operations in business ecosystems.
Content management today is more interactive and more integrative: interactive because end-users are increasingly content creators themselves, and integrative because content elements can be embedded into various other applications.
(2) Content management growth remains strong, with a booming market for applications that promise great cost savings and productivity gains for companies that invest in them. According to a Deloitte report on Enterprise Content Management (ECM) (2012), content management cuts labor costs associated with authoring and design by 50% in both online and print endeavors. ECM has evolved to provide far more business value than just content management and governance.
(3) Recently, a strong development push from PIM system developers to include all kinds of content of the organization can be recognized. (see: clause 5.1) Advanced PIM systems give an enterprise’s own staff and customers access (possibly filtered according to their requirements) to all product-related data anywhere and anytime, through any channel, especially mobile devices. This is in line with the general ICT trend in the direction of mobile devices and approaches.

Standards and certification: The CMS industry is not fond of standards for a number of reasons, one of which is the very raison d’être and definition of a standard as “a publication that provides rules, guidelines or characteristics for activities or their results, for common and repeated use. … Everyone benefits from standardization through increased product safety and quality as well as lower transaction costs and prices.” (ISO/IEC: 2004) Like cars, CMS depreciate and have to be upgraded at shorter intervals than many other ICT products. Standards would make the system design transparent and bring down prices. (see: CELAN D2.1 Annex 2 Chapter 5. Recommendation 2)

Additional gaps: Complete CMS are costly from various points of view.
They represent a highly integrated cluster of different systems, which are difficult to maintain, upgrade and secure. Licensing is an alternative to purchasing a CMS. Often the content, or large parts of the content elements, handled in a CMS is locked into that CMS with reduced content interoperability, so that porting it to another CMS is very difficult at best.

Recommendations: Buy or rent a CMS? If you buy a content management system, you are responsible for the costs of upgrades. Only if you rent a content management system from an application service provider are upgrades and troubleshooting no longer a cost that has to be budgeted for. The drawback of renting an application is that you are unable to customize the site, and this may negate the benefit of having a CMS that fits your needs. There are web CMS which allow non-technical users to make changes to a website with little training. Most web CMS include WYSIWYG editing tools allowing non-technical individuals to create and edit multilingual content. Active users of web CMS software usually receive regular updates that include new features and keep the system up to current web standards.

3.6 Language teaching/learning systems/tools

eLearning is essentially the computer-assisted and increasingly network-enabled transfer of skills and knowledge. It comprises all forms of electronically supported learning and teaching. In general, a language-independent approach to the creation and maintenance of content in eLearning systems/tools is used so that any kind of subject can be taught in any language. Content is delivered via the Internet, intranet/extranet, audio or video tape/CD, satellite TV, etc. OECD (2005) suggests that different types or forms of eLearning can be considered as a continuum, from no eLearning, i.e. no use of computers and/or the Internet for teaching and learning, through classroom aids, such as making classroom lecture PPT slides available to students through a course web site or learning management system, to laptop programs, where students are required to bring laptops to class and use them as part of a face-to-face class, to hybrid learning, where classroom time is reduced but not eliminated, with more time devoted to online learning, through to fully online learning, which is a form of distance education. Through the Internet, blended learning is turned into online blended learning.

Learning management systems (LMS) are software packages, usually on a large scale, that enable the management (i.e. creation and maintenance) and delivery (i.e. knowledge transfer) of learning content and resources to students. Most LMS are web-based to facilitate "anytime, anywhere access to learning content and administration." (see: http://en.wikipedia.org/wiki/Learning_Management_System) Although learning CMS and LMS have some overlapping technologies, the products can be quite different. A learning CMS includes the following functionalities: “Content creation (...), content management (...), collaboration tools, (...), assessments and analytics, search and retrieval (...), formal learning (...), performance support and informal learning”. However, the platform as such is usually poor in terms of content. (see: http://wikieducator.org/Exemplary_Collection_of_tools_and_standards_for_producing_open_educational_content)

Although LMS could also cover computer assisted language learning (CALL), they usually do not. LMS are criticized for being content-centric; moreover, the technology is used for organizational control rather than the empowerment of the learner. Organizations using an LMS have a central place to store course material online for access by specified users. These organizations can track and analyze learning results over time, and are able to administer learning evaluations online. (see: Ann Brown and Jordy Johnson, 2007 http://www.microburstlearning.com/articles/Five%20Advantages%20of%20Using%20a%20Learning%20Management%20System.pdf)

Language teaching/learning may be supported by different eLearning or technology enhanced learning (TEL) or computer-based training (CBT) systems/tools for various purposes, such as

Web-based learning,

Computer-based learning,

Virtual education opportunities,

Digital collaboration,

Other kinds of eLearning methods and technologies.

Although a multitude of software is available at different price levels, only a tiny minority of eLearning users in Europe use language training tools/systems, while the vast majority use all kinds of LTT. (see: Rinsche/Portera-Zanotti 2009) From a teaching and learning theory perspective it is important to separate the content creation process from the specific tool used to deploy the content, not only to focus attention on the process of creating truly compelling and interactive learning objects, but also to ensure that the content can be easily shared and reused without being locked into a specific tool/system. Through the European R&D Framework Programmes since 1984, the EU Commission, in coordination with the member states, has made great efforts to finance cooperative projects, including in the field of language education and the respective teaching/learning technologies. There were many projects to adapt and implement

Computer-assisted language learning (CALL),

Content and language integrated learning (CLIL),

Other language teaching/learning solutions.

Many of these projects follow the Common European Framework of Reference for Languages: Learning, Teaching, Assessment (CEFR) guidelines.

The EU Commission stated in 2000 (when nearly 80% of online resources came from the USA) that Europe produces too little of the educational multimedia software, products and services available to serve training and education. It further stated that Europe suffers from a worrying shortage of qualified staff, particularly teachers and trainers with ICT at their fingertips. Globally, the e-learning market is estimated at $50 billion, and currently represents the fastest growing segment in the global education market, which is valued at $2.3 trillion. (see: http://www.myngle.com/blog/2009/10/01/the-language-e-learning-industry-part-3-language-learning-market-ii) Much of the USD 83 billion currently spent on language training is heavily fragmented and offered in a variety of forms. Within these arenas several methods and concepts are presented as viable and promising ways to learn a language; however, there is no clearly accepted method for learning a second language. Although the language acquisition market may be significantly large at $83 billion, current language training spending within the eLearning market is still marginal. (see: http://www.myngle.com) According to another source (ambientinsight.com) the worldwide language learning market (all languages combined) was $58.2 billion in 2011. The overall worldwide language learning market is gradually shrinking due to the adoption of cost-efficient technology-based products and the migration away from classroom and print products. As market figures diverge significantly depending on the source and on which aspects of the language learning market are analysed, the figures above should be taken as revealing tendencies rather than accurate figures.

Development trends:
(1) Any of the general forms of eLearning can also be found in language learning: learning management systems (LMS) and course management systems (not to be confused with learning content management systems). High-end solutions are practically accessible only to large organizations, including universities. But since open educational resources (OER) are becoming more widely available, and many universities are opening their content to the public, individuals can make use of computer-assisted language learning (CALL) at little or no cost.
(2) At the lower end many different kinds of systems/tools are emerging and are used by enterprises, or by individuals in enterprises, to improve language skills, such as vocabulary trainers and language training courses on CD-ROM. In competition with expensive classroom language teaching services, web-based learning platforms, learning content exchange platforms, social media inspired learning platforms etc. are mushrooming. Some of the web-based multiple language learning platforms are very professional and attract millions of users.
(3) In the course of these developments the demand for differentiation according to language level (i.e. language register such as workshop language), target group (e.g. migrants, youths, etc.), domain (such as scientific/technical domain or application subject), and specific purpose (such as speed learning for business negotiations) is growing.
(4) Developments in Internet and multimedia technologies are the basic enabler of eLearning, with consulting, content, technologies, services and support being identified as the five key sectors of the eLearning industry. Social networks are increasingly becoming an important part of eLearning.
(5) Given the developments in mobile technologies, the term mobile assisted language learning (MALL) has been coined to describe the use of handheld computers, mobile phones or other mobile devices to assist in language learning.

Uptake in industry&business: Only large enterprises can afford to develop or purchase/lease high-end eLearning systems, often within the framework of a systematic language training policy.
All kinds of uptake can be found in enterprises: organizing or offering (in-house or external) language learning opportunities using language teaching/learning systems. Awareness of the growing number of alternative language learning approaches/technologies and opportunities is comparatively low.

Standards and certification:
(1) In the educational area, IEEE 1484.12.1:2002 (multipart) Standard for Learning Object Metadata (LOM) describes the attributes that learning objects may have, and can also be applied to computer-assisted language learning (CALL). The closely related SCORM (Sharable Content Object Reference Model), a collection of standards and specifications for web-based eLearning, is also a good example of the fact that increasingly several standards have to be applied for a given purpose. Standard-conformant learning objects are self-contained units that are properly tagged with keywords or other metadata and stored in an XML file format.
(2) These standards themselves are early in the maturity process with the oldest being less than years old, and certification schemes are still in their infancy. Besides, they are also relatively vertical-specific, although LOM and SCORM have become widely used over the last years.

Additional gaps:
(1) There is a lack of interoperability among the plethora of “teaching/learning systems” that have been created, not to mention the literacy required to know these tools and adopt these technologies to their full extent. In addition, the content developed for and used in one of the systems is hardly re-usable in another system.
(2) There is a lack of didactic experience on the side of the system (and educational content) developers and a lack of ICT skills and competences on the side of the professional language teachers/trainers.
(3) A multitude of software is available at different price levels, but only a tiny minority of eLearning users use language training tools/systems, while the vast majority use all kinds of LTT. (see: Rinsche/Portera-Zanotti 2009)

(4) Increasingly there are authoring tools and other text technology tools/systems on the market which meet LOM and SCORM standards. Therefore, content created in such tools can be hosted e.g. on a SCORM-certified LMS.

Recommendations: Today’s eLearning systems/tools geared towards language teaching/learning should be designed to empower the learners on the basis of social principles. Instead of creating more platforms, developers should reuse, repurpose or reorient the existing ones and work collaboratively, so that the urgently needed system and content interoperability can be achieved. “LMS are widely used by regulated industries (e.g. financial services and biopharma) for compliance training. … Many corporate organizations use LMS as part of training and employee management”. (see: 5 Advantages of using the Learning Management System (LMS). November 24, 2011, by TLTTeam http://www.timelesslearntech.com/blog/5-advantages-of-using-the-learning-management-system-lms/) Ever-improving technology, new laws and regulations, increased job requirements, and a changing workforce are all factors that create an environment where employers must efficiently and effectively deliver and manage learning experiences for their employees.
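
The idea of a learning object tagged with keywords and other metadata and stored in XML, as described under Standards and certification above, can be sketched as follows. This is a simplified illustration only: the element names loosely follow the “general” category of LOM, no namespace is declared, and the result is not validated against the official LOM schema. The function `build_lom_record` and the sample title and keywords are hypothetical.

```python
import xml.etree.ElementTree as ET

def build_lom_record(title, language, keywords):
    """Build a simplified, LOM-like metadata record as an XML element.

    Assumption: only a few 'general' fields (title, language, keywords)
    are modelled; a real LOM record has many more categories.
    """
    lom = ET.Element("lom")
    general = ET.SubElement(lom, "general")

    # The title is stored as a language-tagged string.
    title_el = ET.SubElement(general, "title")
    ET.SubElement(title_el, "string", {"language": language}).text = title

    # Language of the learning object itself.
    ET.SubElement(general, "language").text = language

    # One keyword element per keyword, each with a language-tagged string.
    for kw in keywords:
        kw_el = ET.SubElement(general, "keyword")
        ET.SubElement(kw_el, "string", {"language": language}).text = kw
    return lom

record = build_lom_record(
    "Business German: telephone phrases",  # hypothetical learning object
    "de",
    ["CALL", "business language"],
)
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Because the metadata live in a tool-independent XML record rather than inside one authoring system, another system can in principle read the same description, which is the interoperability argument made in this clause.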

4 Evaluation of language and other content resources (LCR)

LCR in the LI are collections of structured or unstructured content published or accessible in electronic form: in databases, on CD-ROM or dedicated devices (e.g. electronic dictionaries), or on the Internet through online access. It is estimated that only 1% of today’s content is conventionally published on paper. The major kinds of online content resources comprise:

Many kinds of structured content, such as terminological data online, lexicographical data online and other kinds of structured content online,

Many kinds of unstructured content.

Education was the first field to systematically prepare content to be used as learning material for computer-assisted learning. IEEE 1484.12.1:2002 (multipart) Standard for Learning Object Metadata (LOM) describes the attributes that learning objects may have, which can also be applied to computer-assisted language learning (CALL). The closely related SCORM (Sharable Content Object Reference Model) is about creating sharable content objects (SCO) of online training material that can be shared/reused across systems and in different contexts. (see: http://scorm.com/) LOM and SCORM apply to learning objects (LO) of both structured and unstructured nature. General unstructured content, however, is usually composed of many items of structured content of all sorts, which could be reused as LO. There is ongoing R&D concerning methods and technology for the extraction of structured content out of unstructured content for an array of purposes – including eLearning.

The following statement is important in view of the fact that potentially any digital object falling under LCR could be re-used/re-purposed as a learning object, although the tertiary sector is beginning to show examples of opening its eLearning systems and content to the public (e.g. in the form of OER, open educational resources):

“The authoring, storage, delivery and reuse of educational content is rapidly becoming a significant problem in the tertiary education sector where significant content is generated for the plethora of courses delivered each year. Effectively being able to manage this authoring process (authoring, storage, delivery and reuse) will offer significant advantages for the tertiary education sector. The challenges being faced in the content authoring process in tertiary education sector can be summarised as follows:

Little or no archiving of content (each lecturer redevelops content).

Tools used are content developer specific.

Content types supported depend on the platform used by each developer.

Important standards are not necessarily supported (i.e. WCAG, SCORM, etc.).

Content is typically recreated for each delivery mode (i.e. PDF, PowerPoint slides, lecture notes, etc…).

Content cannot be updated easily.” (Blackhall, 2011)
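The idea of a learning object as a self-contained unit that is tagged with metadata and stored as XML can be illustrated with a minimal sketch. The element names below are loosely modelled on LOM categories (general, title, keyword), but they are a simplification chosen for illustration and not the normative IEEE 1484.12.1 XML binding; the identifier and the sample values are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_learning_object_record(identifier, title, language, keywords):
    """Build a simplified, LOM-inspired metadata record (not the normative binding)."""
    lom = ET.Element("lom")
    general = ET.SubElement(lom, "general")
    ET.SubElement(general, "identifier").text = identifier
    ET.SubElement(general, "title", {"language": language}).text = title
    ET.SubElement(general, "language").text = language
    for word in keywords:
        ET.SubElement(general, "keyword", {"language": language}).text = word
    return lom


def keywords_of(record):
    """Read the keywords back, as a reusing system might do when indexing content."""
    return [kw.text for kw in record.findall("./general/keyword")]


record = build_learning_object_record(
    identifier="example-lo-001",  # hypothetical identifier
    title="Business English: writing a quotation",
    language="en",
    keywords=["business English", "quotation", "CALL"],
)

# Serialize to XML text; another system can parse it and reuse the metadata.
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
print(keywords_of(ET.fromstring(xml_text)))
```

Because the metadata travel with the object in a predictable structure, a second system can index or filter the content without having to interpret the learning material itself.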

4.1 Structured content

Structured content refers to information or content that has been broken down and classified in a semantic way, i.e. by using metadata (in the field of terminology called data categories). The metadata themselves also constitute a specific kind of structured content and are or should be maintained in metadata repositories. Structured content usually is stored and processed in database management systems. Today’s best practices in the development, maintenance and distribution of structured content make use of web-based cooperative and distributed approaches, which are (increasingly) multilingual and (less often) multimodal from the outset. If based on international standards, they could lead to content repositories federated according to the language or domain or other relevant criteria. Structured content today is evidenced by many types of information:

Terminology
o Nomenclature, taxonomy, typology, ...
o Glossary, vocabulary, ...
o Terminological phraseology, morphology, …
o Graphical symbols and other non-linguistic representations
o Properties, characteristics, attributes, ...
o Proper names of all sorts ...

Thesauri, classification schemes, keywords

Encyclopedic (knowledge) entries

Knowledge-enriched terminology entries

(explained) proper names, ...

Ontologies, topic maps, ...

Directories of all sorts

Metadata registries/repositories

However, more often than not the content is contradictory, neither coherent nor consistent (even within the same repository), not integrable, not reliable, and not interoperable between repositories and applications. The number of online resources is increasing, but quality-oriented LCR are still comparatively rare. Their number will increase in order to provide more reliable content to users, whose efforts to compare different databases in search of quality data can be as painful as with conventional publications.

The variety of such resources, among others including proper names and non-linguistic data which can also – sometimes even must – be used as works of reference, is ever increasing and increasingly accessible on the Internet.

Collection of formulas,

List of names of pharmaceutical substances,

Non-proprietary names of drugs (WHO),

Catalogue of hazardous goods,

Names of laws and other regulations,

Names of organizations and their abbreviations,

Structured directories of producers, service providers etc.,

ZIP (postal) codes and the locations they refer to,

Time zones,

Currencies (and exchange rates),

Graphical symbols,

Traffic signs (in different parts of the world),

The structure of the government and its public administration.

Some of the above are connected to text corpora, such as legal information systems, which can then be used indirectly as works of reference. New mass markets, still little recognized, are emerging for lexicographical resources: the publishers of more or less all big “national” general dictionaries (Trésor de la langue française, Larousse, Duden, Webster, OED etc.) and encyclopedias (Encyclopaedia Britannica etc.) have developed online versions or even stopped print publication. Some of them can be regarded as best practice – especially those (like www.leo.org) which link up with others and provide additional data, such as British or American English pronunciation, collocations, etc. The possibly hundreds of existing specialized encyclopedias (e.g. medical encyclopedias) will sooner or later follow the online trend.

Many of these resources can be – or could be – used for several purposes if they are/were interoperable. For this reason the aspect of content interoperability is increasingly gaining importance. It would be a huge waste of human resources to continue developing resources with lots of basically the same content entities for eBusiness, eHealth, etc. – especially with respect to eLearning, where all of them could be reused/re-purposed. Some resources are just publications turned into comparatively simple databases, but more and more the usefulness of “integration” (physically if not virtually) and interoperability is recognized. In Europe, the multilinguality of such repositories has become a basic requirement – even for language variants, such as Austrian German or Swiss French.

In the field of eBusiness for instance, the Internet provides easy access to lots of hitherto difficult to obtain scientific, legal and administrative content resources (here, too, increasingly including multilingual, multimodal and multimedia data). For example, product information is handled in a structured and consistent way in the form of electronic catalogues by product catalogue management, which is part of product data management (PDM). PDM is essential for creating and developing cost-effective means to help customers and channel partners understand the functionalities and usability of certain products or services. It includes comparison of product features and advice on related products and alternative products and services. Product catalogue information is typically used in websites, mail order catalogues, web shops, enterprise resource planning (ERP) systems, price comparison services, and manufacturer websites. Increasingly product data management – whether integrated in or interoperable with ERP systems or not – is outsourced to specialized LSP.
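The requirement that a structured repository serve several languages and language variants can be sketched as follows. This is a minimal illustration under stated assumptions: the catalogue entry, its SKU, the language tags and the fallback order (variant, then base language, then a default) are hypothetical and do not describe any particular PDM system.

```python
def localized(texts, language_tag, default="en"):
    """Return the text for a tag such as 'de-AT', falling back to 'de', then to the default."""
    candidates = [language_tag, language_tag.split("-")[0], default]
    for tag in candidates:
        if tag in texts:
            return texts[tag]
    raise KeyError(f"no text available for {language_tag!r}")


# Hypothetical catalogue entry: one product, names in several languages and variants.
product = {
    "sku": "EX-1001",
    "name": {
        "en": "Whipping cream",
        "de": "Schlagsahne",
        "de-AT": "Schlagobers",
        "fr": "Crème à fouetter",
    },
}

print(localized(product["name"], "de-AT"))  # variant-specific name
print(localized(product["name"], "fr-CH"))  # no Swiss French entry: general French is used
print(localized(product["name"], "it"))     # no Italian entry: default language is used
```

Keeping the names in one structured entry means that web shop, print catalogue and ERP export can all draw on the same data instead of maintaining separate copies per channel.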

4.2 Unstructured content

Unstructured content covers a broad range of different kinds of semi-structured or unstructured documents (containing text more or less organized into sections, sometimes comprising non-linguistic content), or other visual, audio or audiovisual content. This includes music, films, etc., which often have linguistic content embedded in or combined with them. Using language technology tools, content elements of unstructured content can be turned into re-usable structured language and other content resources (LCR). Increasingly, unstructured documents are used as parallel texts for translation purposes, as an online service offering text samples, or in translation memory systems, computer-assisted translation (CAT) tools/systems or content management systems (CMS). Examples of unstructured content (also called unstructured data) may include books, journals, internal documents, manuals, health records, audio, video, files, and unstructured text such as the body of an e-mail message, web page, or word processor document. While the main content being conveyed does not have a defined structure, it generally comes packaged in objects (e.g. in files or documents...) that themselves have structure and are thus a mix of structured and unstructured data, but collectively this is still referred to as "unstructured data". Unstructured content often includes bitmap images/objects, text and other data types that are not part of a database. Most enterprise data today and most of the Internet data can actually be considered unstructured.
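The mix of structured packaging and unstructured main content can be made concrete with an e-mail message. The sketch below uses Python's standard `email` module; the message itself, including all addresses, is a made-up example.

```python
from email import message_from_string

raw_message = """\
From: sales@example.com
To: customer@example.org
Subject: Offer for spare parts
Date: Mon, 15 Oct 2012 09:30:00 +0200

Dear customer,
please find our offer for the requested spare parts below.
"""

message = message_from_string(raw_message)

# Structured part: header fields with defined names and value formats.
structured = {name: message[name] for name in ("From", "To", "Subject", "Date")}

# Unstructured part: free text without a predefined schema.
unstructured = message.get_payload()

print(structured)
print(unstructured)
```

The header fields can be stored and queried like database fields, whereas the body would first need language technology (e.g. term or entity extraction) before its content elements could be reused as structured content.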

4.3 Analysis of LCR

The common perception is that there would be no language industry (LI) if there were no language technologies (LT). This overlooks that content is as important as technology; without content, LT would be pointless. Usually the importance of content is underestimated, which makes many language technology tools/systems (LTT) useless or language services inefficient. It is increasingly recognized that only “quality content” is good content and that interoperable (i.e. reusable and repurposable) content saves costs and increases efficiency. Increasingly LSP and LCR providers are

Offering content maintenance services,

Making their LCR interoperable.

But there is still a long way to go: private content owners – nearly all enterprises are also content (=knowledge) owners – still fear that they may lose control over their content, which they consider their major know-how, if they let external service providers take care of the maintenance of their content. However, given the exponentially growing amounts of content, enterprises are starting to analyze the cost of in-house content management against the benefits and risks of outsourcing content management. The savings in costs and time through content integration are exemplified by the following figure from the eLearning field, where the SCORM standard is used to integrate content by making it interoperable.

Figure 5: The Cost of Content Integration

(Taken from: http://scorm.com/scorm-explained/ 2012-10-15)

In technical documentation/communication it was already recognized years ago that not following the basic principles of content management, namely “single source” and “resource sharing”, results in high costs that rise exponentially with every new language added to the documentation.

Figure 6: Potential cost increase in technical documentation per content item

(Taken from: presentation by Ben Martin 2001)

Figure 6 was intended to back the following calculation. 10 words (costing at that time USD 0.23 per word), if translated into 7 languages and copied manually into 10 publications, would cost USD 161.00 (10 × 0.23 × 7 × 10). If these same words translated into 7 languages can be reused/replaced automatically in 10 publications, the costs would be only one tenth, namely USD 16.10 (10 × 0.23 × 7).
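The calculation can be reproduced with a short sketch. It assumes that USD 0.23 is a price per word (which is what the totals in the text imply) and that manual copying means the translated text is paid for once per publication, whereas automatic reuse means it is paid for once in total; the function and parameter names are illustrative.

```python
from decimal import Decimal


def publication_cost(words, price_per_word, languages, paid_copies):
    """Cost of a content item translated into several languages and paid for `paid_copies` times."""
    return words * price_per_word * languages * paid_copies


price = Decimal("0.23")  # USD per word, figure taken from the text

# Manual copying: the translated item is handled separately in each of 10 publications.
manual = publication_cost(10, price, languages=7, paid_copies=10)

# Single sourcing: the translated item is created once and reused automatically.
reused = publication_cost(10, price, languages=7, paid_copies=1)

print(manual)
print(reused)
print(manual / reused)
```

In this simple model the saving factor equals the number of publications, so the benefit of single sourcing grows with every additional publication that reuses the same content item.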

Gradually, it is being recognized that the real costs are incurred not in system development or use, but in content development and even more so in maintenance. Effective content creation starts with measures to guarantee quality content, and maintenance should be governed by quality-oriented workflows. Most of the standards needed to comply with the above-mentioned requirements exist, but are still rarely applied, because content is considered trivial by LT developers as well as industry&business users. The resulting costs are more often than not hidden somewhere in the balance sheets or, worse, deliberately concealed from financial control.

The above considerations explain why there is an increasing interest in data and information quality in industry&business and public administration. This coincides with the development of web-based cooperative content development platforms (CDP) for some kinds of quality content, such as multilingual product classifications, which are overcoming the limitations of today’s terminology and other language resources.

Development trends:

(1) On the market there is a growing demand for quality-oriented LCR and web-based content creation and maintenance. There are resources freely available as a contribution to society and others where the LCR developer or distributor benefits from advertising. Some resources can be freely used (under fair use premises), others (such as many sources for business data) only provide access after payment. It can be expected that new business models for structured and unstructured content will emerge. In parallel social web approaches and mobile technologies are spreading.

(2) Increasingly there are services directly geared towards the developers and users of LCR, such as:

LTT system development for LCR management and maintenance,

Content development, maintenance or hosting services.

(3) Traditional publishers’ business models (incl. certain digital carriers) are threatened by the emergence of online LCR databases. It should be mentioned that the traditional business model for publishing dictionaries and other language resources favors publications in the languages of large language communities. This is of great concern to UNESCO, which reckons that about half of the more than 6000 estimated languages of the world are in fact endangered languages.

(4) In technical documentation (e.g. using DITA for creating content in reusable XML entities), web-based collective authoring, educational publishing etc., structured authoring approaches are, on the one hand, making the transition from print to digital publishing faster and, on the other hand, speeding up the process of content delivery and localization, resulting in lower costs and happier customers. The need for change is driven by the desire to better manage information assets (documents, creative ideas, illustrations, charts, graphics, multimedia, etc.) and eliminate costly processes that fail to facilitate the effective and consistent reuse/repurposing of content.

(5) At the same time the demands on the quality of content are increasing. Globalization requires a higher degree of multilinguality and multimodality from the outset. Thus, technology – and especially mobile and social technologies – is bringing us nearer to a solution of the time-honored information dilemma, whereby information is usually not readily available when needed, where needed and in the form needed. At the same time content redundancy should be reduced. The keys to this solution are the two principles of content management: single sourcing and resource sharing. The demand for quality, reusability and re-purposability of content is growing.
(6) The above-mentioned demands gave rise to content management systems which at the same time provide the ability to reuse and repurpose content – author it once (single source), use it in all kinds of products (articles, reports, text books, standard operating procedures – SOP, etc.) and in all kinds of channels (print, eLearning, eBook, etc.) for all kinds of purposes (training, sales & marketing collateral, regulatory reporting, etc.). Reuse is a common practice in technical communications, where there is great opportunity to reuse specifications and user instructions across the delivery of user manuals, online help, and user training for a software or hardware product. Today, organizations are looking at how this can be expanded across other areas of the enterprise to get the maximum benefit from their content development efforts (and consistency across their branding).

Uptake in industry&business:

(1) Large-scale enterprises have already entered the path towards developing and using highly sophisticated content management systems (CMS) for their content, which are beyond the horizon of SMEs at present. Larger SMEs try single-purpose tools or outsourcing in order to cope with their content problems. Smaller SMEs do not even know that such things as online content resources and technology to manage all kinds of content exist.

(2) Recently, activities to produce guidelines, manuals, policy guides etc. for professionally dealing with structured content have yielded results. Some of them are developed with the active cooperation of enterprises as users.

Standards and certification:

(1) There are several technical committees (TC) at international level dealing with various – more or less generic – aspects of content interoperability. The standards developed by these TC do not (yet) take into account the specific requirements of eLearning and eAccessibility for persons with disabilities (PwD).

(2) From a quality management perspective, ISO 8000 (series) Data quality is a starting point for developing standardized methods and principles with respect to data and information quality as well as content integration and interoperability. In addition there exist quite a number of standards (or parts of standards) referring to the methods for handling structured content. (see: CELAN D2.1 Annex 2 Chapter 3.2)

(3) There are thousands of standards at international, European and national levels containing standardized items of structured content – often developed (and increasingly published in database format) by authorities other than the official standardizing organizations.

(4) Certification systems for content quality are still at an early stage of development. But certification systems for “certified terminology manager” and the like are welcomed by individual experts as well as enterprises. (see: CELAN D2.1 Annex 2 clause 3.4)

Additional gaps:

(1) The general awareness of the fact that “content” is not just data, and that content structuring goes beyond syntactical structuring of content – namely to semantically structured or identified content – is still underdeveloped. This applies particularly to ICT experts who graduated from the existing HEI or vocational training providers.

(2) There is a large gap between existing standards and quite a few best practices on the one side and the general level of information on these topics in SMEs on the other. Above all, there is a lack of experienced and vendor- or LSP-independent consultants in this field.

Recommendations:

There are four basic tenets critical to structuring information:

Defining information types (metadata) in a systematic way;

Identifying rules of content hierarchy;

Creating modular content units;

Applying standards consistently.

What is needed is a promotion campaign for the existing methodology standards as well as standardized structured content, “soft pressure” on LTT developers to respect them, and the promotion of as much freely accessible (multilingual and multimodal) structured content as possible.

When creating online content for instance in eLearning systems, it is important that the content is modular, allowing students to learn a manageable amount of information while giving them the opportunity to create links between bundles of information. A good content authoring system should thus allow opportunities to link between content modules as well as linking to additional content, thus ensuring that students are able to take advantage of the surplus of information available in other digital and print resources.
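Modular content units with typed metadata and links between modules can be sketched as a small data structure. This is an illustration only: the class, the field names, the information types and the sample modules are hypothetical and not taken from any particular authoring system.

```python
from dataclasses import dataclass, field


@dataclass
class ContentModule:
    """A small, self-contained unit of learning content with typed metadata."""
    module_id: str
    title: str
    info_type: str  # hypothetical information types, e.g. "concept" or "task"
    body: str
    links: list = field(default_factory=list)  # ids of related modules


def resolve_links(module, repository):
    """Return the titles of linked modules and the ids of links that point nowhere."""
    titles, broken = [], []
    for target in module.links:
        if target in repository:
            titles.append(repository[target].title)
        else:
            broken.append(target)
    return titles, broken


modules = [
    ContentModule("m1", "Greeting a customer", "task",
                  "Sample dialogue text.", links=["m2", "m9"]),
    ContentModule("m2", "Forms of address", "concept",
                  "Explanation of formal and informal address."),
]
repository = {m.module_id: m for m in modules}

titles, broken = resolve_links(repository["m1"], repository)
print(titles)   # linked modules that exist
print(broken)   # links that need attention during maintenance
```

Checking links at the level of module identifiers is one simple way to keep a growing body of modular content maintainable when individual modules are updated or withdrawn.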

5 Evaluation of language services (LS) and language service providers (LSP)

Globalization has led to more contacts at any level and in any domain or field of application, which has triggered an exponentially growing demand for

Technical documentation/communication and other text creation, editing, re-purposing services,

Translation services,

Interpreting services,

Localization services,

Desktop publishing services (here as complementary to text-related, translation and localization services),

Language teaching and training,

Language industry consultancy services,

Communication services for persons with disabilities (PwD).

Any of these services may use any of the language technology tools (LTT) mentioned in the previous chapter. This gave rise to the GILT conception, which responds to the demands and requirements of industry&business customers and their language service providers (LSP) to integrate the most pertinent systems and services – and increasingly also content. GILT originally comprised globalization (G11N), internationalization (I18N), localization (L10N) and translation, to which text technologies and desktop publishing (DTP) systems/tools and services have to be added. The GILT conception and system integration methods/architectures force managers and users – for their own benefit – to strongly consider the manifold aspects of interoperability up to content (i.e. data and information) quality on the basis of standards.

Figure 7: The GILT conception

In today’s global economy, businesses are faced with cultural and linguistic challenges and needs going beyond language. New requests to LSP may include: global marketing operations, products and documentation; software adaptation and/or localization, website and marketing content localization, multimedia, eLearning and training, machine translation (MT), localization testing, language quality services. Besides, many large LSPs offer industry solutions. (see: http://www.greenbook.org/marketing-research.cfm/five-tips-to-finding-the-right-language-services-provider-07790)

Nowadays, some LSPs offer a complete set of GILT services which include: localization quality assurance (QA) & testing, graphics & multimedia localization, localization process automation, desktop publishing of multilingual documentation, translation & linguistic review, localization engineering, online help engineering & testing, project management, I18N engineering and QA, localization consulting.

5.1 Text creation, editing, re-purposing services

Text technologies comprise in particular (CELAN D2.1 Annex 1):

Technical documentation (TD) and the respective TD systems,

Scientific and technical writing and the respective authoring tools,

Corpus technology (CT) and the respective tools/systems,

Desktop publishing (DTP) and the respective tools/systems.

Not all of these correspond to distinct services. CT for instance is used today by many services, but integrated in certain technologies or tools. DTP on the other hand, is offered as a service at different levels of sophistication and specialization. The most prominent services which can be referred to here are

TD (also, especially in the US, called technical communication),

Scientific and technical writing and other text creation, editing and re-purposing services.

They apply also to scientific writing (e.g. for publication in scientific journals), technical writing, advertising and promotion (if not done under technical communication), media presentations, high-level company documents (board documents etc.) and presentations to investors, which are often quite specific in terms of domain terminology.

When, in conjunction with the quality and safety management discussions concerning industrial production and services in the 1990s, Directive 98/37/EC of the European Parliament and of the Council of 22 June 1998 on the approximation of the laws of the Member States relating to machinery was published, only few people recognized that it would have a huge impact on technical documentation and the related localization and translation services. The Directive referred to technical documentation and user manuals as part of the product, which makes an enterprise potentially liable for faults in the documentation (in the original language and all localized/translated versions). The revised Machinery Directive 2006/42/EC aims at consolidating the achievements of the Machinery Directive in terms of free circulation and safety of machinery while improving its application. It was published on 9th June 2006 and the Member States had until 29th June 2008 to adopt and publish the national laws and regulations transposing the provisions of the new Directive into national law. The provisions of the new Directive became applicable on 29th December 2009.

These legal provisions – similar to legislation in the US and other countries in the world – triggered the first standards on technical documentation, localization and translation under a quality management perspective. Today, standards concerning the quality of LI products and services are not only accepted, but increasingly also demanded by customers and LSP.
Technical documentation (TD – also, especially in the US, subsumed under technical communication) has always been closely related to production and services development. It overlaps to some extent with technical writing, a form of technical communication which refers to styles of writing used in fields as diverse as computer hardware and software, engineering, chemistry, the aerospace industry, robotics, finance, consumer electronics, and biotechnology. Technical writers explain technology and related ideas to technical and nontechnical audiences. From this perspective it is sometimes used in combination with scientific, technical and medical (STM) writing. But there are also other services on the market offering a range of writing and editing services, such as copywriting, ghostwriting, public relations (PR) writing, editing and revising, rewriting and reshaping of texts. This may go so far as turning a text into presentational material or more or less laborious high-end publications, e.g. by adding graphic design and illustrations (often comprising linguistic elements).

The number – and in a few cases also the size – of text creation, editing and re-purposing services is growing. There are LSP specialized in only one or a few individual services. Common to all is the use of all kinds of language technology tools (LTT) available on the market. Probably the most thriving professions in this category are those related to technical documentation/communication. In spring 2008 (see: Hager 2008), tekom recorded a minimum of 84,571 employees holding a position in the field of technical communication. A similar tekom study conducted in 2002 calculated only around 67,000 workers spending more than 75 percent of their work time on topics related to technical communication. This indicates a rise of 26 percent over the past six years.

Service providers offering technical communication services marked the strongest increase in their demand for skilled personnel. Forty-two percent of service providers had added at least one new team member over the past year.

Thirty-three percent of the surveyed technical communication departments in enterprises had recorded an increase in employees.

According to tekom, this increase is mainly due to the fact that the technical communications sector has become a specialized corporate division: a couple of years ago, technical documentation was left in the hands of the product developers. Nowadays, most companies employ staff with specialized skills in technical communication. Specialists in the field of technical communication are rare. tekom found that there are currently up to 4,000 job vacancies on the German market. However, there is a dramatic shortage of qualified people. There are no exact figures showing the demand for skilled workers in developing markets around the world. But one thing is certain: there is no quality communication without qualified staff.

Development trends:

(1) Technical documentation/communication and other text creation, editing, re-purposing services were at first largely monolingual – i.e. carried out in a specific language. However, increasingly multilingual approaches are needed – in any case the LTT used are required to cope with multilingual content. In the course of this development towards closing the gap to localization and translation, the GILT (globalization, internationalization, localization and translation) related LTT are converging in terms of technical requirements. GILT related LTT increasingly have to become interoperable or even integrated into product information management (PIM) systems. The same can be said of certain desktop publishing (DTP) systems and services.

(2) There is a development push from PIM system developers to include all kinds of content of the organization.
PIM refers to processes and technologies focused on centrally managing information about products, with a focus on the data required to market and sell the products through one or more distribution channels (e.g., web sites, print catalogs, electronic data feeds). A central set of product data can be used to feed consistent, accurate and up-to-date information to multiple output media such as web sites, print catalogs, ERP systems, and electronic data feeds to trading partners. PIM systems generally need to support multiple geographic locations, multi-lingual data, and maintenance and modification of product information within a centralized catalog to provide consistently accurate information to multiple channels in a cost-effective manner.

(3) Advanced PIM systems allow an enterprise’s own staff and customers to access – possibly filtered according to their requirements – all product-related data anywhere and anytime, through any channel and especially through mobile devices. The Internet, mobile technology and social web applications also have an impact on most of the other services, for instance in the form of online writing, collective writing, self-publishing, etc.

(4) For the sake of a coherent corporate language (CL) and in order to facilitate the translatability (incl. machine translatability) of texts into other languages, controlled natural language (CNL, in some application fields also called simplified language, plain language etc.) is used more and more widely. This has an impact on LTT development, which has to support style guides, CL handbooks etc.
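How an LTT might support a corporate style guide based on controlled natural language can be sketched with a very simple checker. The two rules used here (a maximum sentence length and a list of preferred terms) and their values are hypothetical; a real CNL specification defines its own, usually far richer, rule set.

```python
import re

# Hypothetical style-guide rules; a real corporate language guide would define its own.
MAX_WORDS_PER_SENTENCE = 20
PREFERRED_TERMS = {"utilize": "use", "commence": "start"}


def check_controlled_language(text):
    """Return a list of (sentence_number, message) findings for two simple CNL rules."""
    findings = []
    # Naive sentence split on whitespace that follows ., ! or ?
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    for number, sentence in enumerate(sentences, start=1):
        words = re.findall(r"[A-Za-z']+", sentence)
        if len(words) > MAX_WORDS_PER_SENTENCE:
            findings.append((number, f"sentence has {len(words)} words"))
        for word in words:
            preferred = PREFERRED_TERMS.get(word.lower())
            if preferred:
                findings.append((number, f"use '{preferred}' instead of '{word}'"))
    return findings


sample = "Commence the pump. Press the green button to utilize the filter."
for finding in check_controlled_language(sample):
    print(finding)
```

Restricting vocabulary and sentence length in this way is intended to make source texts more consistent and easier to translate, whether by human translators or by machine translation.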

Uptake in industry&business:

The increase of such services (and technologies) over the last years indicates a growing need resulting in the increased uptake in industry&business. Because of the highly technical nature of the ICT involved the uptake is largely confined to large and medium-sized companies (with the exception of some highly specialized industry sectors). LSP are following this demand and offer services based on the respective new tools/systems to companies considering outsourcing.

Standards and certification: Quite a few standards already exist, some are under development, and some more are needed. There are standards for formal layout and syntactic features, but increasingly standards for semantic aspects are required. Before long, this will trigger certification schemes for the quality of the services rendered, the text produced, the skills/competences required, etc.

Additional gaps:

Controlled natural language approaches and the respective standards need to be further developed.

Recommendations:

Technical documentation/communication goes beyond writing or supplying information: it is about identifying the right data for the audience and then delivering it in a format that best suits the audience's needs. Effective technical documentation increases customer satisfaction and decreases support costs. It leads to tangible benefits since it can make business processes more reliable. Well-written content can make intellectual capital even more valuable.

“The important thing when charging an external documentation service with creating your documents is that you reserve all rights regarding templates, style sheets, texts, and source code (especially in the case of online help / online documentation).

All processes should be designed in such a way so that they can be mirrored in your company. This means: All documentation processes should be documented.

Only software should be used that can be licensed by everybody. If any custom software is involved, it should also be sufficiently documented and available for use with an appropriate license”.

“Using external documentation services can increase your flexibility and reduce time to market”.

“Outsourcing documentation gives people back the time to focus on their core tasks”.

“Tasks that require special skills can be performed much more efficiently by a specialized service provider”.

See: Pros and Cons of Outsourcing Technical Documentation. Proper, user-friendly documentation ensures a good return on companies’ investment. (see: http://www.indoition.com/en/services/pros-cons-outsourcing-technical-documentation.htm, retrieved 2012-11-25)

5.2 Translation services

Although the demand for literary translation has increased over the years and is a substantial part of the publishing business, the demand for specialized translations has grown and is still growing exponentially. This growth has enhanced the development of translation technologies to improve the performance and quality of translation. In the past, there were mainly two types of translation services: literary and specialized translation. Literary translation, of course, still exists and has developed some specializations over the years. However, specialized translation activities have diversified tremendously over the last decades into many different kinds including:

Specialized translation in domains, like legal, technical, scientific, medical, etc.

Specialized translation in application fields, such as patent translation;

Media translation;

Subtitling and dubbing, etc.

All of the above are differentiating further into new specializations, and not only because of the fast-growing demand for translations. Specialized translation covers the translation of many kinds of technical and scientific texts and requires a high level of domain knowledge and mastery of the relevant terminology as well as of textual, social, cultural and linguistic conventions. Aside from linguistic skills, medical and pharmaceutical translation today requires specific training as well as legal-administrative and domain knowledge, because many of these texts are highly technical, societally sensitive and legally regulated. Similar developments have taken place in the fields of legal, scientific and business translation, where demand and the number of service providers have grown in line with the diversification of specialization.

If the estimated value of the translation industry in the EU really amounts to 4.7 billion Euros, the number of pages translated annually may well exceed 100 million. It is said that the institutions of the EU alone translate nearly 2 million pages per year.

This development has an impact on LTT and LCR development. The need for consistent terminology in highly specialized translation makes computer-assisted translation, the use of translation memories and sophisticated terminology databases a prerequisite. This is also one reason why commercial machine translation technology continues to grow: translation service providers (and related services) are increasingly combining machine translation and translation memory with human pre- and post-editing.

There are about 40 associations of professional translators in Europe, which have a total of more than 16,000 members (Pym e.a. 2012). Usually the number of non-affiliated translators in a country is higher than that of association members. In Europe, there were an estimated 1,500 translation companies in 2005 (EUATC 2005). Their average turnover was in the region of 300,000 EUR at that time. According to the European Union of Associations of Translation Companies (EUATC 2006), translation companies accounted for approximately 25% of the total revenue of the market in 2006. In 2005 (EUATC) the combined turnover of the 15 biggest companies in the world represented 10% of the world market and 50% of the market for translation companies. As predicted by EUATC, the market share of translation companies has constantly increased over the years, while that of self-employed freelancers is dropping. This may have several reasons:

The volume of a growing number of translation projects has increased;

The number of translation projects with more languages involved has increased;

The time to delivery has decreased;

The translation companies can afford to use state-of-the-art LTT, including sophisticated translation/localization project management systems (TL-PMS);

Large customers have outsourced their translation activities, also in response to time-to-delivery and translation quality requirements.

The value added of outsourcing to independent LSP lies in the fact that large LSP:

Can work with and coordinate translators, editors and interpreters with appropriate linguistic and subject matter expertise;

Have the capability to manage large-scale multilingual projects;

Need to have sophisticated content management (including TMS) in order to guarantee quality.

Some translation service providers have also entered the field of globalization, internationalization and localization services, even venturing into desktop publishing (DTP). Some localization service providers have engaged in or returned to translation, as margins in translation are higher for the moment. Generally speaking, the gap between translation services and the field of globalization, internationalization and localization services is closing, not least due to convergence tendencies in language technology development. Some hitherto high-end localization systems/tools have also become affordable for translation service providers. The larger the translation service provider, the more likely it is to use the whole range of LTT.

The continual growth of languages on the internet is altering the way companies use the web to engage with a global audience. The Common Sense Advisory’s review of 1,000 websites, “Gaining Global Web Presence”, showed that despite the general slowing of economic growth, investment in translation and localization services continued to grow, and in fact the average number of languages available on websites actually increased. Parallel to this development, customers of translation service providers are becoming increasingly demanding with respect to the quality of translation services.

Development trends:

(1) Some of the main trends of this highly fragmented market can be summarized as follows:

Large translation and localization service providers are growing fast – some of the largest ones acquiring one or two companies per month adding to their size;

The share of self-employed freelancers compared to the number of companies is growing, while their market share in terms of revenue is declining;

Translation eMarkets are evolving as intermediaries between customers and translators. They threaten the role of traditional translation service providers by offering customers a faster and less expensive service on the one hand, and individual translators a larger (albeit lower-paid) market for their services on the other.

(2) As stated by Rinsche/Portera-Zanotti (2009), market consolidation represents a threat to smaller LSP and individuals rendering translation services. In 2009 the new breed of translation marketplaces was still in its infancy. For a couple of years now, some of the big translation and localization service providers have been trying to reduce personnel costs (including those for outsourcing to small service providers and freelancers) by using crowd sourcing methods over the Internet.

(3) Globalization and the associated growth will provide the GILT sectors (globalization, internationalization, localization and translation) with an enormous boom. EUATC assumes that the translation market will see annual growth of approximately five percent during the next few years. Most of this is due to the exponentially growing need for specialized translation. The number of human translators cannot keep pace with this increase. GILT service providers respond in different ways, among others:

To ensure terminological consistency and to simplify terminology work, companies are now making their translation databases accessible to other companies. Some 40 leading companies recently founded the so-called TAUS Data Association (TDA), which enables its members to share translation files. All members load their language combinations onto a server in the form of Translation Memories or multilingual glossaries and can in return download the language pairs of other members. This creates an immense volume of linguistic data.

To further develop machine translation (MT): large companies such as IBM, Sun, SAP, etc. are already achieving satisfactory results with more sophisticated MT systems, whose output, of course, only becomes fit for print after human post-editing.

Online MT services such as Google Translate or Bing are becoming more common, and different kinds of “instant translation” apps increasingly appear on mobile devices. This indicates that MT will in the near future become a common functionality in many devices and systems for private and commercial use.

In recent MT technology and services, human intervention or controlled natural language (CNL, also called simplified language) approaches can improve results considerably.

Crowd sourced translations really come into their own in the social networking context.

Diversify into niche market services, such as transcreation (used by advertising and marketing professionals referring to the process of adapting a message from one language to another, while maintaining its intent, style, tone and context as well as evoking the same emotions and carrying the same implications in the target language as it does in the source language).
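The role of translation memories in these responses, including the pooling of memories among the members of a data-sharing association, can be illustrated with a minimal Python sketch. It looks up a new source segment in a small memory and distinguishes exact from fuzzy matches. The sample segments, the 0.75 similarity threshold and the function names are assumptions made for this example; commercial TM systems use their own, more elaborate matching methods.

```python
from difflib import SequenceMatcher

# Toy translation memory: English source segment -> stored Italian target.
OWN_MEMORY = {
    "Press the start button.": "Premere il pulsante di avvio.",
    "Switch off the device.": "Spegnere il dispositivo.",
}


def lookup(segment, memory, threshold=0.75):
    """Return (match_type, source, target) for the closest stored segment.

    Returns None when no stored segment is similar enough, in which case the
    segment would be translated from scratch or passed on to MT.
    """
    best_source, best_score = None, 0.0
    for source in memory:
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if score > best_score:
            best_source, best_score = source, score
    if best_source is None or best_score < threshold:
        return None
    match_type = "exact" if best_score == 1.0 else "fuzzy"
    return match_type, best_source, memory[best_source]


def merge(own, shared):
    """Pool two memories; entries from `own` take precedence on conflict."""
    return {**shared, **own}


if __name__ == "__main__":
    print(lookup("Press the start button.", OWN_MEMORY))
    print(lookup("Press the stop button.", OWN_MEMORY))
```

A fuzzy match is only a suggestion: the stored target still has to be adapted by a translator, which is why such systems are combined with human pre- and post-editing.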

(4) Automatic interpreting systems, which combine speech recognition (input rendered into written form), translation into another language, and output in written form as well as by speech, have become very popular on mobile devices. Furthermore, there are many tools/apps providing instant translation online via mobile devices.

Uptake in industry&business:

As the steady high growth of the translation market shows, industry has obviously turned to outsourcing to meet its demand for translations. This is also reflected by the interviews carried out in the framework of CELAN. This trend comes at the expense of translators employed in enterprises. On the other hand, there are figures that indicate a growing demand on the job market for highly qualified GILT (globalization, internationalization, localization and translation) experts, e.g. 4000 in Germany alone (Hager 2008), many of whom could be specialized translators.

Standards and certification:

(1) Although costs are an issue, reliability in terms of time to delivery of translations and translation quality seems to be equally important to customers. Therefore, standards-based certification of translation service providers has become an issue on the market.

(2) There are new standards and new types of standards in the pipeline concerning (the measurability of) translation quality, the appropriateness of LTT, certain personnel skills and competences (such as those of terminology managers) etc. Standards concerning the quality of translations are not only accepted, but increasingly also demanded by customers and LSP.

(3) As with any comparatively young industry, regulatory and accreditation authorities are still rare. The established technical certification authorities are too expensive for smaller LSP and OPE.

Additional gaps:

From the customers’ point of view the translation market suffers from a lack of transparency. Hence the call for more standards related to services, translation quality, individuals’ skills and competences, etc.

Recommendations:

“It is imperative to understand the depth of the translation opportunity. Managed correctly, it can enable key growth and competitive advantage. Managed poorly, it is a fragmented, unstructured expense that is hidden from the glare of executive review, and ripe for optimization. You cannot afford to let key expense areas with a material impact on your global growth go unchecked. Even if you are personally not able to run the program, you have the capacity to manage those who can, to be engaged in the planning, to ask informed questions, to conduct quarterly performance reviews, and to ensure maximum value is being generated for the translation spend”. (Lionbridge, 2009, Ten translation best practices)

Some vital issues to take into account when using translation services are: acquiring general knowledge about the translation process and quality assurance procedures; carrying out an internal needs assessment; allocating appropriate resources for LSP or in-house language experts; and planning the translation process and supervising it throughout.

“Finally, always look ahead to the future of translation in your industry. When you begin to think of translation as more than just compliance, but a tool to reach new markets, you’ll see that quality translations aren’t just necessary; they’re part of an investment strategy that will eventually have a major payoff”. (Adapted from Avantpage, see: http://www.avantpage.com/services/)

5.3 Interpreting services

An interpreter is a person who converts a thought or expression in a source language into an expression with a comparable meaning in a target language in "real time". The interpreter's function is to convey every semantic element (tone and register) and every intention and feeling of the message that the source-language speaker is directing to target-language recipients. The main modes of interpreting are the following:

Conference interpreting (simultaneous interpreting),

Consecutive interpreting,

Community interpreting,

Telephone interpreting (over the phone interpreting),

Signing (sign interpreting).

Dubbing, depending on the way it is done, can also be seen as a kind of interpretation. More and more combinations of these modes and new modes are required and emerging in practice.

Conference interpreting is closely linked to conference organization. Individual conference interpreters are used to working and organizing themselves in teams. Some thus develop into interpreting service providers that also offer other conference organizing services. Large interpreting service companies usually also offer a broad range of conference organization services.

The interpretation market represents approximately 10% of the translation market. Within the interpretation industry, the over the phone interpreting (OPI) market has seen enormous growth in recent years. The following figure illustrates the customer distribution of the interpretation market in general.

Figure 8: Market shares in interpreting services

Interpreting services are using a specific mix of language technology tools/systems and increasingly also language and other content resources (LCR) for their services. Demand for interpreting services in Europe has increased and continues to do so.

Development trends:

(1) Increasingly, LTT are

Also used to prepare a conference interpreting job (e.g. to have fast access to LCR)

Providing means to remove the need for the interpreter to be in close proximity to the speaker and the audience.

(2) Probably the most prominent new development in the field of interpreting is telephone interpreting (or over the phone interpreting, OPI). This is an on-demand service that allows individuals who speak different languages to communicate effectively through a human interpreter by telephone or via a conferencing system.

(3) Automatic interpreting systems are becoming a new hot topic for interpreting services. Such a system combines speech recognition (input rendered into written form), translation into another language, and output in written form as well as by speech. In mobile technology it appears as apps, still at a basic level, on mobile phones and other mobile devices. This development is being pushed not only by the disappearance of keyboards, but also by the need for interpretation into sign language and other augmentative and alternative communication (AAC) means.

Uptake in industry&business:

Whereas SME are usually not aware of the difference between translation and interpreting, large industry knows and values interpreting services.

Standards and certification:

In addition to the international standard on the form, size and equipment of interpreters’ booths, new standards concerning the quality of interpreting services are emerging. There is also a growing need for standards-based skills and competences certification for certain interpreting modes, such as community interpreting.

Additional gaps:

Migration and other factors of mobility within Europe and between Europe and the world require more interpreting, and more modes of interpreting, for which job/role descriptions as well as social, legal and technical standards are mostly lacking.

Recommendations:

In view of the development trends over the last few years (which seem to be accelerating), higher education institutions (HEI) and other organizations that train interpreters are advised to design new schemes or adapt their existing schemes to train the next generation of interpreters. More and more combinations of interpreting modes and new modes are required and emerging in practice. Demand in society (such as for community interpreting) and LTT drive the diversification in this field of language mediation. In consecutive interpreting, differentiation is taking place in various forms, too. In interpreting at negotiations, any appropriate interpreting technique may be used depending on the situation, the listeners' abilities, the number of languages involved, and the complexity of the discussions.

5.4 Localization (L10N), globalization (G11N), internationalization (I18N) services

Although often referred to as part of GILT, localization services developed out of technical documentation/communication rather than translation. G11N, I18N and L10N have developed with multilingual approaches from the outset. L10N developed by distinguishing itself from translation, not least due to the formerly conservative nature of the translation services. Desktop publishing (DTP) systems are more and more included in the GILT framework.

GILT services make extensive use of any language technology tools/systems necessary for their services. More than that, they are among the most intensive users of LTT and LCR. Under the growing demand for interoperability and integration, their combination with or integration into PIM systems can be observed.

In general, localization (L10N) is the adaptation of a product or communication to a community of speakers with respect to cultural, linguistic, legal, political and other aspects. More specifically, it means adapting computer software (software localization) to the different languages, regional differences and technical requirements of a target market.

Globalization (G11N) refers to a broad range of processes necessary to prepare and launch products and company activities internationally. In conjunction with launching a product globally, all the business issues associated with this decision have to be addressed, such as integrating localization throughout a company after proper internationalization of the product design. G11N goes far beyond localization and includes the revision of business processes, management procedures and even the adaptation of marketing tools, among other initiatives.

Internationalization (I18N) was developed as an approach to facilitate localization into a multitude of target markets. It is the process of designing a software application so that it can be adapted to various languages and regions without engineering changes. In ICT today, localization most commonly means the process of adapting internationalized software for a specific region or language by adding locale-specific components and translating text. I18N evolved out of translation and localization. The goal is for original content (or software) to be ready for international use. Deployment of I18N is considered strategically at the beginning of content development, not after original content is already developed. I18N can be implemented to different degrees depending on the goals of international distribution:

Level 1: Minimum level

Application independent from any language/character set encoding

Application independent from any cultural conventions

Level 2: User-visible text strings

Hard-coded text

Concatenations/Variables

Intensified use of Tools

Level 3: Support for non-western languages

Unicode

Essential to penetrate Asian market

Level 4: Highest level

Multi-national products

Support for processing and storing data originating from different locales

Needless to say, the higher the I18N level, the more LTT has to be used for I18N implementation.

In connection with internationalization, localization pertains to or is concerned with anything that is not global and is bound through specified sets of constraints of:

Linguistic nature including natural and special languages and associated multilingual requirements,

Jurisdictional nature, i.e., legal, regulatory, geopolitical, etc.,

Sectoral nature, i.e., industry sector, scientific, professional, etc.,

Human rights nature, i.e., privacy, disabled/handicapped persons, etc.,

Consumer behavior requirements,

Safety or health requirements.

Localization has diversified among others into:

Manual localization,

Catalogue localization,

Software localization,

Website localization,

Games localization.

People all over the world treat the Internet as their main location for information and services. As these people do not speak the same language, website localization has become one of the primary tools for global business expansion. True localization includes more than translating documentation. Each localization effort may include: integration into country- or language-specific operating systems (OS), user interfaces (UI), graphics, colours, technical support numbers, fonts used, warranty statements.

www.projectconnections.com/.../LocalizationGuidelines.doc

“If you plan on distributing your application to an international audience, there are a number of things you will need to keep in mind during the design and development phases. Even if you do not have such plans, a small effort up front can make things considerably easier should your plans change in future versions of your application”.

“Sensitivity towards cultural and political issues is an especially important topic when developing world-ready applications. In general, these items would not prevent your application from running; instead, they are items that may create such negative feelings about the application that customers may seek alternatives from other companies”.

“Automated translation tools can significantly cut down on localization vendor's costs. But automatic translation tools only work if standard phrases are being used. Many localization vendors are paid per word. Consider the amount of money that can be saved if one standard phrase can be easily, or automatically translated into multiple languages”.

From: http://msdn.microsoft.com/en-us/library/aa292604(v=vs.71).aspx
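The Level 2 issues named above (hard-coded text and concatenated strings) can be illustrated with a minimal Python sketch. Instead of assembling a sentence from fragments in code, the application looks up a complete template per locale and fills in named placeholders, so that each language can use its own word order and date pattern. The catalogue contents, the locale codes and the function names are assumptions made for this example; real applications typically keep such strings in external resource files and also have to handle plural forms, which this sketch ignores.

```python
from datetime import date

# Hypothetical message catalogue. Real projects usually keep such strings in
# external resource files so that translators never touch the source code.
CATALOGUE = {
    "en": {"files_removed": "{user} removed {count} files on {date}."},
    "de": {"files_removed": "Am {date} hat {user} {count} Dateien entfernt."},
}

# Locale-specific date patterns (illustrative, not a complete locale database).
DATE_FORMATS = {"en": "%m/%d/%Y", "de": "%d.%m.%Y"}
DEFAULT_LOCALE = "en"


def translate(locale, key, **values):
    """Fetch the template for a locale (falling back to English) and fill it."""
    messages = CATALOGUE.get(locale, CATALOGUE[DEFAULT_LOCALE])
    template = messages.get(key, CATALOGUE[DEFAULT_LOCALE][key])
    return template.format(**values)


def files_removed_message(locale, user, count, when):
    """Build the message without concatenating sentence fragments in code."""
    pattern = DATE_FORMATS.get(locale, DATE_FORMATS[DEFAULT_LOCALE])
    return translate(locale, "files_removed",
                     user=user, count=count, date=when.strftime(pattern))


if __name__ == "__main__":
    print(files_removed_message("en", "Ana", 3, date(2012, 10, 18)))
    print(files_removed_message("de", "Ana", 3, date(2012, 10, 18)))
```

In the German template the date comes first, which a concatenation such as user + " removed " + count would not allow; adding a further language then means adding a catalogue entry rather than changing the program.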

The localization, internationalization and globalization industry consists of many different types of services (and LSP) providing outsourced language technology and services or in-house support within multinational companies. This may comprise:

(1) Localization service providers: These LSP adapt products so they seem natural to a particular region's residents. This process considers language, culture, customs, and other characteristics of the target locale.

(2) Technology developers: Language technology facilitates efficient use of multilingual content and accelerates time-to-market, with solutions ranging from TM and CAT tools to machine translation and global content management systems.

(3) In-house localization and translation departments: Many multi-national companies have in-house teams that coordinate translation strategy and implementation for their companies internally, most often in partnership with supplier partners.

(4) Translation companies: With more than 6,700 languages spoken in 230 countries worldwide, translation providers make the world feel a little smaller.

(5) Research analysts, publications, and training institutes: The globalization industry includes top research analysts, publications, and training companies.

(6) Globalization and internationalization consultants: Consulting covers the revision of business processes and management procedures and the adaptation of marketing tools.

In contrast to the good market performance of translation services, the global financial crisis since 2008 has had a slightly negative impact on localization service providers, which are closer to the market for concrete products and services. The responses to the latest Global Business Confidence Survey of Common Sense Advisory were mostly positive, although there was a drop in expected demand among some respondents. For some companies blessed by growth rates ranging from 20% to 40% over the past years, growth might not be as aggressive as in the past. Others complain that their customers are taking longer to pay, but that is a different story. (see: http://www.commonsenseadvisory.com/Default.aspx?Contenttype=ArticleDetAD&tabID=63&Aid=542&moduleId=391)

Development trends:

(1) Markets are evolving quickly; numerous technologies, trends and models in localization are emerging and disappearing, which makes it difficult to keep pace with all the changes.

(2) New niche markets such as community localization and crowd sourcing are being explored. Open source is a new model of software development which in recent years has come into use in the localization service sector.

(3) The general evolution of mobile technologies and social web approaches is also affecting localization services, e.g. in the form of mobile website localization.

(4) New needs for localization are emerging, such as the localization of product descriptions, product classifications and other kinds of structured content.

(5) Quite a while ago, large-volume manuals were replaced or supplemented by CD-ROM. Today the trend is towards replacing manuals with online help, frequently asked questions, support services, or whole manuals accessible online.

Uptake in industry&business:

(1) Producing/manufacturing industry, not to mention software development, recognized the need for localization in technical documentation/communication quite early. Later it became a topic of marketing, promotion, after sales services etc. In some cases today, localization has become an integral part of product development and parallel product documentation. Given the fact that an ever-growing number of languages (or cultural communities) has to be considered, the cost factor (while improving quality) is the central issue for management.

(2) SME proved to be quite pragmatic and keen in solving their localization issues, e.g. by finding ways and coordination methods to “outsource” the problem to their foreign branch offices or subsidiaries. There are big differences depending on the company size, the degree of liability-sensitivity of the products or services offered, or just company tradition.

(3) Public or semi-public corporations, such as railway companies (maybe also higher education institutes), are among the slowest to recognize the benefits of localization.

Standards and certification:

(1) The fundamental problem of standardization with respect to localization is that several aspects of localization fall within the remits of different technical committees (TC), which poses a coordination and communication problem. In this connection it is encouraging to see that GILT-related standardization experts are cooperating across those boundaries with the intention of solving present and future standardization issues.

(2) Given the fact that standardization efforts are scattered over the scopes of a number of TC, certification schemes will increasingly be based not on one standard, but on a number of standards, sometimes from different TC domains and different standards organizations.

(3) The need for more standards (and the revision of existing standards or coordination of standardization activities across TC boundaries) is increasing.

(4) Multiple, sometimes overlapping standards are available from different international organizations including the W3C, the International Organization for Standardization (ISO), the Organization for the Advancement of Structured Information Standards (OASIS), the European Telecommunications Standards Institute (ETSI), the Unicode Consortium and the now defunct Localization Industry Standards Association (LISA). Often the gap is simply a lack of understanding of how the standards interplay. (see: http://www.cngl.ie/drupal/sites/default/files/papers3/fp26-filip.pdf)

(5) Certification, in terms of professional competences/skills certification, has made big progress in some countries. In Germany more than 7% of professionals of technical documentation are certified. (Baumert&Straub 2012)

Additional gaps:

(1) The emergence of global DVD markets will necessitate the convergence of screen translation (subtitling and dubbing) and localization in order to deliver multilingual digital multimedia content efficiently.

(2) A kind of cultural gap within the GILT industry seems to persist: there is not enough discussion between players involved in each of the G, I, L and T silos. As digital technology continues to produce new content, the GILT industry has to ensure that its accumulated collective knowledge is usefully applied. Without this step, GILT will suffer further fragmentation and lack of standardization. (see: http://www.albaglobal.com/article1114.html. Retrieved 2012-10-18)

(3) There is a need for more formal education and training in localization.

Recommendations:

The benefits of I18N include among others:

Quick use of software application (or content resp.) in multiple locales

Reduced time and cost, especially for localization because content is already ready to be localized

Higher revenues and profits from other markets

Single source code for all applications

Simpler maintenance

Improved quality and code architecture

Adherence to important standards

5.5 Desktop publishing (DTP) services (complementary to GILT services) Although word processing and other office automation (OA) software have evolved to include some, though by no means all, capabilities previously available only with professional printing or desktop publishing it needs experts for using DTP professionally. Under the impact of multilingualism many DTP systems have been enhanced by translation technology and text technology tools. Thus, for the sake of system integration and content interoperability, multilingual DTP services have become an important business. As a consequence, there is a trend among DTP services to venture into the field of the GILT services (viz. globalization, internationalization, localization and translation services). Vice versa, GILT services frequently also include professional DTP in their services. Graphicline (see: http://graphicline.co.za/dtp_overview retrieved 2012-10-18) has discontinued all DTP and printed material publishing sector activities as from 01 February 2012. Among the main reasons were: reduced demand for high quality reprographic services, competition from low-cost (low quality) services, and the ease advertisers, publishers and printers today have to produce their own material. In the past nearly all repro was undertaken by specialised agencies or at least by specialised persons. The industry has changed to the point very little of this work is outsourced today. At the same time costs have continued to rise – professional software applications (/vendor name/), consumables (high quality proofing paper, high resolution larger format printers and printing supplies), industry standard DTP systems (/vendor name/) are at an all-time high, while recoverable rates have


remained fairly static during the period. … There seemed little reason to upgrade out of date equipment and software in order to remain active in a shrinking market. … in order to continue with DTP we would have had to purchase new software, or hire new versions at high rates. … Graphicline historically provided DTP and Pre-Press services to a wide range of customers, including advertising agencies, printing firms, businesses and the public.

In stark contrast, a US business report (see: http://www.powerhomebiz.com/vol13/desktop.htm, retrieved 2012-10-18) takes a view that probably helps to explain why high-quality DTP services are going out of business. It says that DTP is a business that offers real opportunities – and can easily be set up in a home office. … According to recent estimates by business consulting firms this market has grown from roughly 3 million in annual sales in 1985 to almost 3 billion in 1991 ... Desktop publishing is a term used to connote a new way of publishing documents - using desktop computers as against the traditional method of mechanical, real metal type, and scissors and glue. … Today virtually any business uses desktop publishing for some purpose. Desktop publishers prepare graphic materials such as: brochures, flyers, full-page advertisements, business forms, Web pages, logos, CDs and cassette covers, catalogs, newsletters, books, proposals, and much more. Some desktop publishers also perform word processing services for their clients. While some desktop publishers prepare almost any kind of graphic material, many specialize in one or more, such as newsletters.

Development trends:

(1) In a world of converging technologies and technology-based services of

technical documentation/communication and other text creation, editing, re-purposing activities,

globalization, internationalization, localization and translation (GILT) integration,

desktop publishing (DTP),

the larger LSP in these fields must invest heavily in the respective technologies, management systems and human resources in order to stay in business. In the course of this development DTP services could lose their nature as a distinct business.

(2) Smaller DTP service providers can only lower price and quality or find new niche markets, such as high-quality product catalogues for trade fairs, whose content – in most cases very messy – comes from databases or all kinds of programs.

(3) Both can be found on the market of DTP services: sad stories of going out of business and success stories with positive perspectives.

Uptake in industry&business:

(1) DTP services are struggling, but the uptake in industry&business in general is positive. If high quality is required, not only large-scale enterprises, but even SMEs are increasingly outsourcing DTP jobs.

(2) While quality requirements may have fallen in general, for some end-products they have risen to a degree that traditional reprographic services and DTP could never cope with.

Standards and certification:

In order to ensure interoperability between common OA software, translation technology, text technologies and localization systems on the one hand and DTP systems on the other hand, standards-based methods and conversion tools are necessary.

Additional gaps:

Today, multilingual DTP competences are often required in translation services and even more so in localization services and technical documentation, which is largely due to the challenges posed by globalization, internationalization and localization with respect to efficiency, speed and guaranteed quality of the results of language services. In order to ensure interoperability between common OA software, translation technologies and


localization systems on the one hand and DTP systems on the other hand, standards-based conversion tools are necessary.

Recommendations:

Today, DTP systems help to create a professional-looking end result, with complex layout and design – often in connection with databases (of language and other content resources). DTP is used for publishing at all levels, from small-circulation documents such as local newsletters to books, magazines and newspapers. Its methods provide more control over design, layout, and typography than word processing does. Therefore, it allows individuals, businesses, and other organizations to self-publish a wide range of printed matter – from menus and local newsletters to books, magazines, and newspapers – without the sometimes-prohibitive expense of commercial printing. As professional use of DTP needs trained experts, DTP services are thriving, although more and more features of DTP systems are being integrated into office automation (OA) tools/systems. Multilingualism in publishing has an impact on DTP services, which venture into the field of GILT services.

5.6 Language teaching&training (LT&T)

LT&T outside the official educational system has developed tremendously over the last three decades and has become differentiated according to, among other things:

Enterprise-oriented language training,

Target groups of language teaching,

Types of language courses,

Skills/competences of teachers/trainers,

Skills/competences of learners,

Teaching of languages for special purposes (LSP),

Computer-based language learning.

There is an array of different kinds of organizations carrying out such LT&T, ranging from traditional non-profit vocational education organizations, such as Volkshochschulen (a kind of adult high school in Germany, Austria, and Switzerland) and adult education in the Scandinavian countries, via official cultural institutions such as the British Council, the Institut Français and the Goethe-Institut, to clubs and associations. Other informal/non-formal language training has emerged for special target groups, such as migrants. Private for-profit LT&T services have become a booming business. Some operate on a worldwide scale. Target groups and different needs have increased to such an extent that the overall demand cannot be met by traditional teaching methods and settings. From the information gathered it is difficult to judge to what extent language learning through

Public educational institutions

Private educational organizations

Enterprise strategies

Individuals’ initiative

is meeting today’s demands given changing lifestyles and technology developments. New kinds of LT&T services are emerging or spreading, such as language travel services. For the sake of transparency in this confusing variety of services on the market, there are calls for service standards and certification schemes. Much of the current USD 83 billion spent on language training is heavily fragmented and offered in a variety of forms. Within these arenas there are several methods and concepts that are presented as viable and promising ways to learn a language; however


there is no clearly accepted method to learn a second language. Although the language acquisition market may be significantly large at USD 83 billion, the current language training spending within the eLearning market is still marginal. Although the tertiary sector is beginning to show examples of opening its eLearning systems and content to the public, the following statement is also particularly relevant for language learning (see Blackhall 2011):

“The authoring, storage, delivery and reuse of educational content is rapidly becoming a significant problem in the tertiary education sector where significant content is generated for the plethora of courses delivered each year. Effectively being able to manage this authoring process (authoring, storage, delivery and reuse) will offer significant advantages for the tertiary education sector. The challenges being faced in the content authoring process in tertiary education sector can be summarised as follows:

Little or no archiving of content (each lecturer redevelops content).

Tools used are content developer specific.

Content types supported depend on the platform used by each developer.

Important standards are not necessarily supported (i.e. WCAG, SCORM, etc…).

Content is typically recreated for each delivery mode (i.e. PDF, PowerPoint slides, lecture notes, etc…).

Content cannot be updated easily.”

Development trends:

(1) Although LT&T services are a booming business, there is also a downside:

Teachers/trainers are faced with a multitude of skills/competence requirements of teaching more and more different target groups with different needs;

Society has developed in such a way that the daily time account available for spending on language learning in traditional settings is diminishing;

Nevertheless the demand (in terms of professional needs or individual interest) is increasing, which results in booming language learning (ICT-based) methods/tools for individuals and (web-based) language learning portals of all sorts.

(2) It looks as if mobile technologies will also revolutionize language teaching and learning. In the age of networking technologies and social media, the main trends are directed either at improving or at using current technologies for mLearning with smartphones, tablets and hybrids, as well as at taking advantage of Web 2.0 tools/apps, multimedia tools/apps, communication tools, augmented reality and 3D systems.

(3) Open Educational Resources (OERs) are becoming more widely available, and many universities are opening their content to the public. This also applies to language learning opportunities. Simultaneously, language eLearning solutions are aiming at providing teaching technology that is lightweight, dynamic, scalable and modern (see: https://atutorspaces.com/reasons.php).

Uptake in industry&business:

(1) In the past companies (and other organizations) had more or less only a few ways to raise the level of language competences in the organization:

Employ foreigners, migrants, people from bi-/multi-lingual family background,

Employ people having undergone formal foreign language training at public or private educational institutions,

Encourage or even organize formal foreign language training for their employees.

(2) Children now turn to social media by default. This makes social media a great – albeit currently underused – tool for language learning/teaching. Increasingly, enterprises can count on some basic foreign language skills among their prospective employees. This makes it easier for enterprises to require basic foreign language competences from their job applicants, while reducing efforts to organize language training.


(3) Computers, the Internet and ICT make the student as well as the teacher more mobile. For instance, it is possible to meet people who want to learn the same language, or to meet teachers of the respective language who do not necessarily have to live in the same country as the student (e.g. tutorials via Skype). Furthermore, there are platforms on the Internet (e.g. palabea.com, busuu.com) where language learners can register for formal courses or find partners / friends for practising their foreign languages.

Standards and certification:

(1) Given the broad range of different kinds of LT&T services and their increase in number and variety, new service standards are under development, which will become the basis for certification schemes.

(2) Standards for evaluating and testing language competence levels according to the Common European Framework of Reference for Languages (CEFR), taking into account different target groups, are lacking.

Additional gaps:

(1) Although computers and the Internet are increasingly available, the use of computers at home for learning purposes is still relatively low. ICT is widely promoted at a central level for teaching and learning, but a large implementation gap still remains – not to mention the non-interoperability between the plethora of LT&T systems.

(2) Developers in the educational field look for the most impactful use of technology worldwide in support of learning, but language learning is still undervalued in this connection.

(3) The Common European Framework of Reference for Languages (CEFR) needs further differentiation so that it can serve as a basis for a refined testing methodology (and the respective testing tools) to verify the “real language competence” of the person for given purposes.

Recommendations:

(1) Language competence levels change – increase, are maintained or decrease – over time. It could be worthwhile to devise an incentive strategy for employees to maintain or even develop their language competence levels and prove it – e.g. by self-evaluation through testing portals or by formal testing – so that enterprises get confirmed value for staff expenses.

(2) From a teaching and learning theory perspective it is important to separate the content creation process from the specific tool being used to deploy the content, not only to focus attention on the process of creating truly compelling and interactive learning objects, but also to ensure that the content can be easily shared and reused without being locked into a specific authoring tool or learning management system (LMS).
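The separation of content from deployment tool called for in recommendation (2) can be illustrated with a minimal sketch in Python. It assumes a hypothetical data model (`LearningObject`) and two hypothetical renderers (`to_plain_text`, `to_html`); none of these names stem from the CELAN material, from Blackhall (2011) or from any particular authoring tool or LMS. The point is only that the content is authored once and that each delivery mode is derived from it.

```python
from dataclasses import dataclass, field
from html import escape
from typing import List


@dataclass
class LearningObject:
    """Tool-independent unit of learning content (hypothetical model)."""
    title: str
    cefr_level: str                      # e.g. "A1"; CEFR level label
    sentences: List[str] = field(default_factory=list)


def to_plain_text(obj: LearningObject) -> str:
    """Render the content as plain text, e.g. for lecture notes."""
    lines = [f"{obj.title} ({obj.cefr_level})"]
    lines += [f"- {s}" for s in obj.sentences]
    return "\n".join(lines)


def to_html(obj: LearningObject) -> str:
    """Render the same content as an HTML fragment, e.g. for a web portal."""
    items = "".join(f"<li>{escape(s)}</li>" for s in obj.sentences)
    heading = f"<h1>{escape(obj.title)} ({escape(obj.cefr_level)})</h1>"
    return f"{heading}<ul>{items}</ul>"


# The content is created once ...
lesson = LearningObject("Greetings", "A1", ["Guten Tag!", "Auf Wiedersehen!"])

# ... and reused for several delivery modes without being re-authored.
print(to_plain_text(lesson))
print(to_html(lesson))
```

A further delivery mode (for instance an export to a standards-based package format) would, under this assumption, only require one more rendering function, while the authored content itself remains untouched.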

5.7 Language-related industry consultancy services

LI-related consultancy services can refer to all or any of

Overall language policy of an enterprise,

Introduction or improved use of language technology tools,

Most efficient use of language services and LSP,

Most effective use of terminology and other LCR,

Introduction or outsourcing of language learning/teaching, or to the

Pertinent standards and certification schemes.

From the interviews it could be gathered that there is a great need for “neutral and objective” (i.e. LTT vendor- or LSP-independent) advice and consultancy in view of:

The high complexity of the LI,

The confusing (non-transparent and sometimes exaggerated) offers of the LI,

The need for language policies/strategies.


The quality of the consultancy service does not always correspond to the bombastic offering when the service is contracted. Nevertheless, because of the fragmented nature of the LI and its rapid development, potential industry&business customers as well as stakeholders of the language industry need advice, ranging from market developments to the formulation of language policies/strategies. Documents such as the PIMLICO report (Hagen, 2011) and Rinsche/Portera-Zanotti (2009) provide convincing insights into the magnitude of the language industry and the kinds of services that play a major role in it. “A quick recovery and continued steady growth of the market was forecasted with an approximate value of the language industry of 16.5 billion € in 2015”. Equally, the need for consultancy services is expected to grow, but there are still few LI consultants on the market who are independent of any LTT vendor or LSP. By contrast, all large and most medium-sized LTT developers and LSP offer guidance on their homepages. To some extent this is also done through training and promotion activities.

Development trends:

(1) Consultancy services concerning language technologies, language and other content resources (LCR), language-related services, content management, standardization and certification, up to comprehensive language consultancy services, are gradually emerging. Only a few consultancy services also include aspects of communication with persons with disabilities (PwD).

(2) Given the increasing need for consultancy services also with respect to LI products and services, not to mention standards, certification and policy aspects, large consultancy companies are jumping on the bandwagon whether or not they have the know-how for such services in the field of the LI.

(3) Large LSP in particular are urged by the market to provide comprehensive consultancy services when working closely with their customers and in acquisition activities.

Uptake in industry&business:

(1) Consultancy services offered by LSP and/or LTT developers may comprise many useful things, such as: assessment in education, building human capacity, identifying standards, enhancing learners/teachers/customers’ prospects, improving language standards, policy making/adoption, certification, quality assurance, technology management, market research, development of examination systems.

(2) Big LSP – often also LTT developers – offer comprehensive service packages that may include complete LTT and language service solutions, such as translation, transcription, voice over, interpretation, editing and proof reading, content development and maintenance, localization (including software localization), content management and desktop publishing. They find their customers among those enterprises which already have considerable experience with LI products and services.

(3) As soon as industry&business recognizes a need for LI products and/or services, most enterprises immediately realize that they are at a loss without external help – i.e. a consultant. Large-scale enterprises can afford good external consultants or can even outsource intensive investigations. An SME in most cases will not have the time and means to spend on a larger investigation or on an external consultant.

Standards and certification:

(1) Given the fact that the EU recognizes the substantial contribution of management consultancy services to the European economy and wants to create a ‘borderless’ market for services, stakeholders of management consulting services saw good reasons to create a European Service Standard: EN 16114:2011 “Management Consultancy Services”. CEN/TC 381 (Project Committee “Management consultancy services”), which includes representatives of the management consulting services sector (such as International Council of Management Consulting Institutes – ICMCI, the European Federation of Management Consultancies – FEACO, etc.), started its activities in 2008 and published the standard in September 2011. EN 16114:2011 applies to all management consultancy services providers, whatever the area of


specialization or size of the business. The standard, however, does not prevent management consultancy providers from using their own methods and approaches, which will encourage innovation and differentiation. The latter are important values of the management consultancy service providers’ proposals. Besides, there exist handbooks or guidelines – sometimes at government level – concerning consultancy services in general.

(2) There are no formal standards and standards-based certification schemes in the field of LI-related consultancy services.

(3) The topic of language in connection with the translation of consultancy contracts appears in several formal standards.

Additional gaps:

(1) Many self-proclaimed consultants are affiliated, often non-transparently, with an LTT developer or LSP. Therefore, and because of the fear of confidential or even secret information slipping out of the enterprise, there is deep-rooted suspicion among potential customers concerning the trustworthiness of consultants and their unbiased advice.

(2) LI-related topics (including standardization) are not sufficiently taught/trained in the framework of ICT education. The same applies to topics related to assistive technologies (including standardization) and augmentative and alternative communication (AAC), which are surprisingly interrelated (see item 5.8).

Recommendations:

Even consultancy services – not least with the aim to help realize the “single market” in Europe – have become subject to standardization efforts. The European service standard EN 16114:2011 “Management Consultancy Services” (developed by CEN/TC 381 <Project Committee> “Management consultancy services” including representatives of the management consulting services sector <such as International Council of Management Consulting Institutes – ICMCI, the European Federation of Management Consultancies – FEACO, etc.>) applies to all management consultancy services providers. But there is no mention of multilinguality or diversity management in EN 16114:2011. Thus there are no formal standards and standards-based certification schemes in the field of consultancy services related to language policies/strategies, corporate language/culture etc. SA8000®: 2008 “Social accountability 8000” does not refer to this either – not to mention persons with disabilities (PwD). Therefore, efforts should be made:

To include multilingualism and diversity management in major management theories and training programmes,

To develop a – preferably international – standard on guidelines for formulating a language policy (e.g. at national level) and/or language strategies (e.g. in enterprises), taking as a starting point the International Standard ISO 29383:2010 Terminology policies – Development and implementation / Politiques terminologiques – Élaboration et mise en œuvre,

To develop a – preferably international – standard on guidelines for formulating a corporate language.

On the basis of such formal standards (i.e. de jure standards), vendor-independent standards-based certification schemes should be established.

5.8 Communication services for persons with disabilities (PwD)

Not least due to the emergence of assistive technologies and different forms of augmentative and alternative communication (AAC), new types of communication services for PwD are appearing. They refer to

The development of LTT (or the adaptation of existing LTT) for use by PwD,

The development of LCR (or the adaptation of existing LCR) for use by PwD,

The development of LS (or the adaptation of existing LS) for use by PwD.


Sign interpreting – in one language or between languages – can already be called “traditional” in this connection. Besides sign interpreting, there are new developments on the market, such as instant subtitling of PowerPoint presentations at conferences, dubbing of movies etc. Communication in this connection goes beyond language in the common sense. Blissymbolics, for instance, was conceived as an ideographic writing system called Semantography, consisting of several hundred basic symbols, each representing a concept, which can be combined to generate new symbols that represent new concepts. Blissymbols differ from most of the world's major writing systems in that the characters do not correspond at all to the sounds of any spoken language. There are several other communication means of this kind used with or without ICT assistance.

Under a broad perspective of “interoperability” and “localization” (L10N), mobility and accessibility impose many identical or similar requirements on software and content development. For instance, many requirements for mobility are prerequisites in ambient assisted living (AAL) for PwD or ageing persons (some of whom are disabled to a smaller or larger degree, sometimes even in multiple ways). Many requirements for accessibility are shared or could/should be shared with those for eLearning.

The importance of assistive technologies (for assisting people with special needs, viz. persons with disabilities – PwD) is rising on the radar of politics – as a growing societal, ethical and economic necessity. The application of principles derived from “Accessible Design”, “Universal Design”, “Design for All” and “Design for Society” is becoming increasingly mainstream. The usability of products, services and environments as perceived and experienced by end-users is a key driver in product development and is not confined to technical know-how alone. In this connection communication aspects (among people, between people and their devices and among the devices) will play an indispensable role, as recognized by national governments and the EU Commission.

Increasingly it is necessary to combine AAC with LCR or embed them in text. “Total Conversation” (http://hub.eaccessplus.eu/wiki/Total_Conversation) is a comparatively new concept showing the definite need to combine linguistic and non-linguistic items of structured content. This also applies to the ICT support for rendering communication services for PwD. Thus services for disabled people have become quite common in many countries, including in telecommunications: subtitling, sign language on TV and audio description (known as television access services) help people with hearing or visual impairments to understand and enjoy television. Others are developed for PwD who cannot use a spoken or sign language (e.g. Braille, Touch, Blissymbols) or cannot access voice telephony. User interface design geared towards PwD has also proved useful for other purposes – especially in eLearning.

Development trends:

(1) Assistive technologies – e.g. in the form of ambient assisted living (AAL) or augmentative and alternative communication (AAC) – are on the verge of developing into a true assistive industry.

(2) As of October 2012, the United Nations Convention on the Rights of Persons with Disabilities (UNCRPD) has 154 signatories and 125 parties, including the European Union (which ratified it to the extent that responsibilities of the member states were transferred to the European Union). This implies that the provisions of the Convention become legally binding at national and EU level. Within the framework of the European Disability Strategy 2010-2020 (COM 2010 636 final) the EU Commission committed to explore the merits of adopting regulatory measures to ensure accessibility of goods and services, including


measures to step up the use of public procurement, through a ‘European Accessibility Act’ – a business-friendly proposal that will substantially improve the proper functioning of the internal market for accessible goods and services.

Uptake in industry&business:

Employers who do not make their vacancies known to people with disabilities are missing out on a significant talent pool to fill their vacancies. In fact most employers do not employ PwD for several reasons: psychological reluctance, worries about social compatibility, lack of knowledge about the state-of-the-art of assistive technologies, labour laws and regulations, etc. Even large employers would rather pay fines than employ PwD. But in some countries there are already public and private job exchanges, job portals etc. helping not only the PwD, but also employers to develop accessibility policies and implement them in a positive way for the enterprises.

Standards and certification:

(1) Participants at the ICCHP 2010 (International Conference on Computers Helping People with Special Needs) confirmed that existing training and formal studies are not sufficient – even if certified under given certification/attestation systems – with respect to the skills and qualifications necessary for becoming familiar with the issues involved in global content interoperability and particularly in eAccessibility&eInclusion.

(2) Therefore, the “Recommendation on software and content development principles 2010” was formulated at ICCHP 2010 and thereafter endorsed by several technical committees in standardization (see D2.1 Appendix 3):

“Software should be developed and data models for content prepared in compliance with the basic requirements for the development of fundamental methodology standards concerning semantic interoperability to facilitate the adaptation to different languages and cultures (localization) or new applications (re-purposing), the personalization for different individual preferences or needs, including those of persons with disabilities. These requirements should also be referenced in all pertinent standards.”

(3) The increasingly high priority given to standards developed for or having an impact on persons with disabilities (PwD) is one of the few far-reaching new tendencies in standardization and certification.

Additional gaps:

(1) Many people with disabilities can see the advantages of the Internet and e-mail, but are unable to use them, mainly due to a lack of accessible information and training in this area.

(2) Assistive technology – and in particular AAC – related topics (including standardization) are not sufficiently taught/trained in the framework of ICT education. The same applies to LI-related topics, which are surprisingly interrelated (see item 5.7).

Recommendations:

Software should be developed and data models for content prepared in compliance with the basic requirements for the development of fundamental methodology standards concerning semantic interoperability to facilitate the adaptation to different languages and cultures (localization) or new applications (re-purposing), the personalization for different individual preferences or needs, including those of persons with disabilities. These requirements should also be referenced in all pertinent standards. (see Appendix 3: “Recommendation on software and content development principles 2010”)


6 Evaluation of LI sector-internal services

Sector-internal services are services rendered by and for LTT developers, LCR developers and LSP. They can be roughly subdivided into:

LTT development (and installation) and maintenance services,

Training services,

Advice and consultancy services.

LTT developers:

May take contracts from LSP or LCR developers for development on demand or joint development of LTT, or for leasing software,

Provide advice to LSP or LCR developers,

Train LSP or LCR developers.

LSP:

Are often doing software localization for LTT developers,

Provide advice to LTT or LCR developers,

Train other LSP.

Nearly all of them provide help to users, while increasingly trying to avoid direct user support and contact by implementing auto-repair functionalities, providing user help functions within the system or through their websites, offering online training or webinars, etc. Complex CMS development (incl. some learning CMS) is mostly done for one or a few clients and requires an intensive exchange of know-how and extensive training for different levels of users on the customer’s side.

Today in certain European countries, virtually all universities with translation studies train their students in the use of terminology management systems (TMS) – some even in CAT tools/systems – often in cooperation with the respective developers or their distributors. Many translators’ associations offer training in the application of CAT systems – again more often than not in cooperation with the respective LTT developers or their distributors. The latest information on the development of CAT systems can be obtained from the websites of translators’ associations as well as of associations for technical documentation or technical communication – not to mention a number of networks, such as forums or blogs of CAT users, etc.

Last but not least, the LTT developers increasingly see to it that their users find the information necessary for making good use of the tools/systems, for avoiding user mistakes, for performing maintenance to a certain degree themselves, etc. CMS and other complex LTT need intensive customer care and support, e.g. in the form of:

Online services,

Consulting Services: Receive professional support services for every stage of the implementation process,

Product Support: Support specialists take personal ownership of your software implementation,

Customer Service: Customer Care is ready to assist you with any request. The team is available by phone, e-mail, and online chat to deliver the highest level of customer care,

Community Extranet: Convenient access to software upgrades, discussions, product information and an online mechanism to post and review support tickets,

Forums, blogs, magazines, user groups, webinars etc.

Development trends:

It is a sign of maturity that sector-internal services of all sorts are emerging and thriving in the LI. At the same time – under cost pressure – the time, efforts and expenses spent on such services have to be reduced as much as possible. The latter is a factor to improve

tools/systems in such a way that the need for after-care services is reduced – e.g. by providing an array of state-of-the-art customer support services within the LI. Given the fact that “consultancy services” are a means of acquisition, it is to be expected that LTT developers and LSP will further develop and refine their “consultancy services”.

Uptake in industry&business: Various forms of cooperation between LTT developers on the one side, and LSP and LCR developers on the other side, have emerged and are further developing. Often this is done in cooperation with a partner from industry&business. Especially for large-scale enterprises which can afford to carry out thorough investigations and systematic planning, this often turns out to be a win-win situation for all partners involved. SME more often than not cannot afford to spend time and money on thorough investigation and planning. For lack of vendor-neutral consultants, LI enterprises, like industry&business at large, accept vendors’ consultancy services as inevitable. Many enterprises have become wary of this fact; others just tend to “trust”.

Standards and certification: For sector-internal services the existing LI-pertinent standards as well as standards-based certification schemes apply.

Additional gaps: A comparative lack of mutual trust among enterprises in the LI has become evident, among other things through the interviews. Given the fact that the LI in all its facets has developed and diversified so rapidly over the last years, there is a certain knowledge deficit among LI market players – especially on the side of small enterprises – concerning the state of the art of the LI.

Recommendations: As the LI matures, endeavours to improve the professionalization of LI product and service providers in terms of management, marketing, collaborative business, training, …, not least with a view to gaining more standing “against” general ICT, need to be stepped up.
Greater participation in pertinent standardization activities would certainly help to overcome deficits in the recognition of the LI as an industry sector of its own. There is a need for activities to train and qualify vendor-independent LI consultants, which would greatly improve “trust” in the field of the LI.

7 Evaluation of LI-related standards, guidelines and certification relevant to industry&business

The investigation of industry&business-relevant standards and guidelines in the fields of LI was subdivided into five aspects (see: CELAN D2.1 Annex 2):

General standardization framework relevant to CELAN,

Basic standards related to the ICT infrastructure with particular impact on the LI,

Specific standards pertaining to language technologies, resources, services and LI-related competences and skills (in this connection it was found, surprisingly, that language requirements for LTT have a lot in common with accessibility requirements in general),

Business-relevant language policies and strategies concerning language, standardization, certification and accessibility,

Business-relevant language policies and strategies have been taken out of the investigation of industry&business-relevant standards and guidelines and are evaluated in chapter 8. Under the topic “General standardization framework relevant to CELAN” the first task of that investigation was to identify the official standardizing bodies and other SDO which are developing international standards relevant to the CELAN project. In addition to ISO, IEC, ITU, ETSI and CEN the following SDO were found qualifying as developers of standards pertinent to the LI:

World Wide Web Consortium (W3C) and the Internet Engineering Task Force (IETF),

Institute of Electrical and Electronics Engineers (IEEE),

Organization for the Advancement of Structured Information Standards (OASIS),

ASTM International (formerly known as the American Society for Testing and Materials).

A list of other SDO (mostly industry consortia) was added to the investigation. (see: CELAN D2.1 Annex 2, Appendix 3) The second task was to analyze the kinds of documents falling under the term “standard”, which range from the standards of standardizing bodies (including basic standards, publicly available specifications, technical reports, codes of practice, etc.) via more or less normative guidelines, recommendations to best practices of all kinds of SDO, some of which are considered as quasi-standards. The results of standardizing work can be manifold:

Standards documents of different nature: from basic standards, publicly available specifications, technical reports, via codes of practice, to normative guidelines, recommendations, best practices;

Standards documents with different focus: methodology standards (probably already comprising more than 50% of all standards), terminology standards (or a part on terms and definitions in subject standards), product/process/service standards, interface standards, testing standards, standardized coding systems, data standards.

Many existing standards are a mixture of the above. (see: CELAN D2.1 Annex 2, item 1.2) The third task was to broadly categorize the identified standards and standardization activities into primary and secondary standards from the point of view of the LI. Secondary standards (such as general hardware and software standards) were not considered in this investigation. However, it is not always easy to clearly differentiate the categories. The following categories were considered as primary standards:

General standards for the ICT infrastructure (including LTT),

Basic and specific standards for the LI.

About 200 standards (and standards documents) have been identified and evaluated.

The investigation of the selection of standards is based on existing documents and information (including that of standardizing bodies and SDO from their websites) available on the Internet. The information was collected and classified

Following the objectives of the CELAN project and in particular the CELAN Typology of LI products and services developed for this purpose;

Bearing in mind the needs of small and medium-sized enterprises (SMEs) that want to globalize either now or in the near future. (see: CELAN D2.1 Annex 3)

The following documents set the background to identify relevant standards in the area and quote descriptions and features of individual standards after having determined the tasks described above:

(1) Monica Monachini e.a. The Standards’ Landscape Towards an Interoperability Framework. The FLaReNet proposal. Building on the CLARIN Standardisation Action Plan (July 2011) http://www.flarenet.eu/sites/default/files/FLaReNet_Standards_Landscape.pdf

(2) Kara Warburton. Standards and Guidelines for the Language Industry (2009) (see also: Kara Warburton. Standards and Guidelines for the Language Industry. Language Technologies Research Centre (March 2006/Revised Feb. 2007 http://www.crtl.ca/dl119&%3Bdisplay)

(3) Gerhard Budin. Identification of problems in the use of LR standards and of standardization needs (2009) (FLaReNet Deliverable D 4.1 16)

(4) Nuria Bel e.a. Standardization Action Plan for CLARIN, 2009 http://www.clarin.eu/node/2841

It was recognized that the standards referred to in these background documents are classified from different perspectives according to the objectives and purposes stated by different groups interested in their development, implementation and research or updating. Not every standards consortium or organization we know about could be included. However, in this identification process formal and de-facto standards, as well as guidelines and more or less mandatory recommendations were considered. Besides, best practices that have become a sort of model to be followed were taken into consideration, if they can help users to pave the way to success. Out of the general standards for the ICT infrastructure (including LTT) only the most important standards (and standardization activities) valid for the whole field of the ICT with a bearing on the LI will be mentioned in clause 7.1. The specific standards pertaining to language technologies, resources, services and LI related competences and skills will be further evaluated in clause 7.2.

It was recognized that several general and basic standardization activities having a bearing on the LI in various “vertical” fields of standardization – e.g. Health informatics (ISO/TC 215), Optics and photonics (ISO/TC 172), Assistive products for persons with disability (ISO/TC 173), Environmental management (ISO/TC 207) and others – are increasing. Often these activities do not sufficiently know or respect the rules and regulations at the level of the general standardization framework relevant to the LI.

Members of the World Trade Organization (WTO) are obliged by the Agreement on Technical Barriers to Trade (TBT, one of the legal texts of the WTO Agreement, sometimes referred to as the Standards Code) to ensure that technical regulations, voluntary standards and conformity assessment procedures do not create unnecessary obstacles to trade. Annex 3 of the TBT Agreement is the Code of Good Practice for the Preparation, Adoption and Application of Standards which is known as the WTO Code of Good Practice. Therefore, great efforts are undertaken to harmonize existing standards at national, regional and international levels so that they do not compete with or even contradict each other. In accepting the TBT Agreement, WTO Members agree to ensure that their central government standardizing bodies accept and comply with this Code of Good Practice and agree also to

take reasonable measures to ensure that local government, non-governmental and regional standardizing bodies do the same. Thus the TBT Agreement had a big influence on the harmonization of existing

Regulations governing standardization in general,

Regulations governing certification in general.

No wonder that the TBT has been recognized also in LI-related standardization activities, first of all in LTT development and language services related to globalization, internationalization, localization and translation (GILT), as a powerful driver for the harmonization of competing standards.

7.1 Basic standards related to the ICT infrastructure with particular impact on the LI

As already mentioned in the introduction, there would be no language industry (LI) if there were no language technologies (LT) – and the development of technologies necessitates standards. As the LI has reached a certain level of maturity, the general standardization framework has become as complex as the LI at large. But there are attempts to coordinate LI-related standardization at a strategic level, at least among several standardizing bodies at international level. These standardizing bodies and their member organizations are competing with many other standards developing organizations (SDO), most of them industry consortia, also claiming international “authority”.

Many standards are multipart documents covering a range of aspects. Some may have to be mentioned as general basic standards as well as LI-specific standards (possibly with certain parts only). Standardization in information technology and all fields of its application (eApplications) is not as coherent as would be desirable. On the other hand, the development of the ICT is fast and standardization activities can hardly keep up. From the point of view of language service providers (LSP) this multitude of standards – and, what is worse, mutually incompatible and non-interoperable implementations of some standards – poses big problems in their daily work. Smaller LSP can rarely cope with this situation; larger LSP need to invest heavily in LTT and ICT in general in order to overcome non-interoperability problems.

Under the category “Basic standards related to the ICT infrastructure with particular impact on the LI”, standards concerning the following issues were identified:

Standards concerning character (glyph) coding, etc. (incl. international standards, European specific character requirements, European culturally specific ICT requirements),

Standards related to the coding of names of countries, languages and scripts,

Standards related to the application of coding (incl. keyboard standards, ordering rules, optical character recognition (OCR), speech-to-written and written-to-speech conversion),

Standards related to data modeling (incl. generic standards concerning data modeling, standards about semantic structuring),

Standardized protocols, formats and schemas,

Standards related to the quality of data and information,

Information and Documentation (I&D) standards,

Standards related to mobility and accessibility,

Certification based on standards.
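The interoperability problems around character coding named in the list above can be illustrated with a minimal sketch. It assumes only Python's standard `unicodedata` module; the sample strings and the helper name `same_text` are purely illustrative and are not taken from any of the standards surveyed.

```python
import unicodedata


def same_text(a: str, b: str) -> bool:
    """Compare two strings after Unicode NFC normalization (illustrative helper)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# The same visible word, with "é" stored in two valid Unicode forms.
precomposed = "caf\u00e9"   # single code point U+00E9
decomposed = "cafe\u0301"   # "e" followed by combining acute accent U+0301

# A naive comparison treats the two strings as different ...
naive_equal = precomposed == decomposed

# ... while a normalization-aware comparison treats them as the same text.
normalized_equal = same_text(precomposed, decomposed)

# Decoding UTF-8 bytes with the wrong legacy code page yields garbled text.
garbled = "\u00e9".encode("utf-8").decode("latin-1")
```

Two tools that both claim Unicode support can therefore still disagree when matching terminology or translation memory entries, unless they agree on a normalization form and on the encoding used for exchange.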

Only after the pertinent legal and technical standards are widely accepted and also respected in the education and training of ICT, LTT and the respective technology application experts will certification acquire its beneficial and even decisive role in separating the wheat from the chaff on the market.

7.2 Specific standards pertaining to LTT, LCR, LS and LI-related competences and skills

Under the category “Specific standards pertaining to language technologies, resources, services and LI related competences and skills”, standards concerning the following issues were identified:

ISO/TC 46 standards about the conversion of written scripts which may apply to any of the above-mentioned categories

Standards concerning language technologies and language technology tools (LTT)

Standards concerning language and other content resources (LCR) (incl. structured content and unstructured content) covering

o Standards concerning LCR methodology and technology (incl. basic methodology standards, ISO/TC 37/SC 4 standards for syntactic and semantic annotation)

o Standards comprising standardized content (incl. collections of standardized structured content useful or even necessary to understand standardizing activities, standardized structured content called metadata, standardized structured content which attribute values to other items of structured content, LCR of standardized structured content per se)

Standards concerning the quality of language services and LSP

Standards concerning LI-related competences and skills

In general, it can be stated that there are many individual initiatives, such as OAXAL, which still require more maturity for their proper integration into the language industry sector:

"OAXAL is certainly useful already from a conceptual and strategic point of view, as it invites decision makers in industry not to take a look at each individual standard in an isolated way but rather to look at the whole model from a workflow and integration perspective. Then one can decide which building blocks or components are actually relevant for a particular implementation and application scenario." (FLARENET STANDARDS LANDSCAPE, page 17, pdf)

Overcoming fragmentation and achieving the capability for integration and interoperability have become the main objectives in the standardization world – especially in the fields of the ICT. Key aspects such as promotion, implementation, awareness, and dissemination to the target communities may contribute to overcoming the barriers and bring more dynamism to the language industry standardization processes. With respect to LTT requirements it was recognized that they have a lot in common with accessibility requirements. However, competences and skills related to LT and to assistive technologies are only marginally taught in the regular education/training of ICT experts. Therefore, there is a need for such education/training as well as for pertinent eCertification schemes.

7.2.1 Specific standards pertaining to LT

Standards falling under the LT category mainly refer to LI activities and services supported by or geared towards the development of tools/systems (for details see: CELAN D2.1 Annex 2):

Multilingual Web sites,

Single source publishing, inheritance, topic-based authoring and content reuse,

Semantic markup language specifying the meaning of the elements in a document,

XML-based file format for spreadsheets, charts, presentations and word processing documents,

Authoring Techniques for XHTML & HTML internationalization: Specifying the language of content 1.0,

UTS #35. LDML – Unicode Locale Data Markup Language,

Cascading Style Sheets (CSS 2.1),

XML Localization Interchange File Format (XLIFF) 1.2,

Mobile Web Best Practices 1.0. Basic Guidelines. W3C Recommendation 29 July 2008.
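To give an impression of what an exchange format such as XLIFF 1.2 from the list above looks like in practice, the following is a minimal sketch, assuming Python's standard `xml.etree.ElementTree` module. The sample document, the file name `hello.txt` and the helper `read_units` are hypothetical and serve only as illustration; real XLIFF files carry considerably more metadata.

```python
import xml.etree.ElementTree as ET

# Namespace of XLIFF 1.2 documents.
XLIFF_NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}

# Hypothetical two-segment XLIFF 1.2 file; the second segment is untranslated.
SAMPLE = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="hello.txt" source-language="en" target-language="de" datatype="plaintext">
    <body>
      <trans-unit id="1">
        <source>Hello</source>
        <target>Hallo</target>
      </trans-unit>
      <trans-unit id="2">
        <source>Goodbye</source>
      </trans-unit>
    </body>
  </file>
</xliff>"""


def read_units(xliff_text):
    """Return (id, source, target) triples; target is None if not yet translated."""
    root = ET.fromstring(xliff_text)
    units = []
    for unit in root.findall(".//x:trans-unit", XLIFF_NS):
        source = unit.findtext("x:source", namespaces=XLIFF_NS)
        target = unit.findtext("x:target", namespaces=XLIFF_NS)
        units.append((unit.get("id"), source, target))
    return units


units = read_units(SAMPLE)
# Segments still waiting for a translation.
missing = [uid for uid, _, target in units if target is None]
```

Because source text, target text and segment identifiers travel together in one standardized container, a customer, an LSP and an individual translator can in principle use different tools on the same file.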

In addition to various standards on individual aspects, there are more and more efforts geared towards the holistic conception of the GILT (globalization, internationalization, localization and translation) tools/systems and services, such as OAXAL (Open Architecture for XML Authoring and Localization Reference Model).

In June 2012 ISO/TC 37 and ETSI ISG LIS agreed that the translation/localization community would benefit from the adoption of industry standards as International Standards and undertook measures to renew the former LISA arrangements and include a number of standardizing activities or standards in the program of work of ISO/TC 37.

7.2.2 Specific standards pertaining to LCR

The exponentially growing volumes of structured and even more so unstructured content – more often than not in different language versions – pose a huge challenge to software and content developers as well as to LSP. Content resources

Are of many different kinds,

Are not confined to language resources,

May comprise or even consist of non-linguistic content (logos, formulas, icons, audiovisual content, etc.).

For the purpose of CELAN the LCR were broadly subdivided into LCR of unstructured content and LCR of structured content. The industry customers – especially SMEs – most probably are totally unaware of the quantitative and qualitative phenomena related to LCR.

Concerning texts, the Text Encoding Initiative (TEI) Consortium, which emerged out of a series of large-scale international projects, collectively develops and maintains a standard for the representation of texts in digital form, namely the TEI Guidelines for Electronic Text Encoding and Interchange. The TEI Guidelines (published under an open-source license http://www.tei-c.org/Guidelines/) define and document a markup language for representing the structural, text rendition, and conceptual features of texts by specifying encoding methods for machine-readable texts. Since 1994, the TEI Guidelines have been widely used by libraries, museums, publishers, and individual scholars to present texts for online research, teaching, and preservation. Since around 2005 some of the TEI Guidelines have been introduced into the standardization work programme of ISO/TC 37/SC 4 Language resource management.

Items of structured content (at the level of lexical semantics) may comprise linguistic and non-linguistic representations of concepts, which can be designative (such as designations in terminology: comprising terms, symbols and appellations) or descriptive (such as various kinds of definitions, explanations or non-verbal representations), or hybrid. Standards or guidelines in this category may refer to the methods of language resource management or may contain standardized (or otherwise regulated) content items (such as standardized vocabularies). For the purpose of CELAN the LCR-related standards have been differentiated into:

Standards concerning LCR methodology,

LCR containing standardized content. (see: CELAN D2.1 Annex 2 part 3.2)
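To make the notion of text encoding discussed in this clause more concrete, the following sketch reads a very small TEI-style document with Python's standard `xml.etree.ElementTree` module. The sample text and its title are invented for illustration; the sketch only shows how structural markup and language information become machine-readable and does not cover the TEI Guidelines in any depth.

```python
import xml.etree.ElementTree as ET

# Namespace used by TEI P5 documents, and the expanded name of xml:lang.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Invented minimal document: a header with a title, a body with two paragraphs.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Sample product note</title></titleStmt>
      <publicationStmt><p>Unpublished illustration.</p></publicationStmt>
      <sourceDesc><p>Written for this sketch.</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <p xml:lang="en">Switch the device off before cleaning.</p>
      <p xml:lang="de">Schalten Sie das Gerät vor der Reinigung aus.</p>
    </body>
  </text>
</TEI>"""

root = ET.fromstring(SAMPLE)

# The title is found through the structural path defined by the markup.
title = root.findtext(
    "tei:teiHeader/tei:fileDesc/tei:titleStmt/tei:title", namespaces=TEI_NS
)

# Each paragraph carries its language as metadata alongside the text itself.
paragraphs = [
    (p.get(XML_LANG), p.text)
    for p in root.findall("tei:text/tei:body/tei:p", TEI_NS)
]
```

The same principle, content plus explicit structural and language metadata, underlies many of the LCR methodology standards referred to above.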

7.2.3 Specific standards pertaining to language services

The Directive 98/37/EC of the European Parliament and of the Council of 22 June 1998 on the approximation of the laws of the Member States relating to machinery (and its revision Machinery Directive 2006/42/EC) refer to technical documentation and user manuals as part of the product which makes an enterprise potentially liable for faults in the documentation (in original language and all localized/translated versions). This triggered a great number of new standards in technical documentation/communication, and later also service quality in the fields of translation and localization, such as EN 15038:2006 Translation services – Service Requirements.

Localizers as LSP often deal with internationalization practice in content creation, the distribution of content to LSP and the onward distribution to individual translators. Therefore, the improved efficiency of this process requires technical integration through improved standards for the metadata that accompanies content through the resulting workflow. While the complex and fast changing nature of content itself presents a challenge, so does the fragmentation of standardization efforts in this area.

7.2.4 Specific standards on LI related competences and skills and certification schemes

“eCertification” (or ICT certification) refers to certification activities, mostly based on provider-specific training, that started as early as the 1980s. Today, there are deficits in the respective training and certification schemes with respect to aspects related to LI and assistive technologies. On the other hand, LI-related training and certification more and more includes eCertification aspects.

In the course of the investigation on LI-related standards a number of standards and standards-like guidelines have been identified (CELAN D2.1 Annex 2). Unfortunately, these standards seem to have relatively little influence on the proliferation of qualification and certification schemes on the market that in most cases are not standards-based. While it can be recognized that certification provides value in both the labour and product segments of the ICT market, the HARMONISE report [CEPIS 2007] describes over 600 often overlapping qualifications from over 60 providers as a "certification jungle", causing confusion to prospective users. The rapid growth in these industry qualifications has been driven by the market over recent years; indeed, this market barely existed 15 years ago. They usually relate to a more specific set of skills, including for specific products, and are generally more practical in their approach than traditional academic qualifications.

The Language Industry Certification System (LICS®), a joint venture between AS+Certification, a subsidiary of the Austrian Standards Institute, and the International Network for Terminology (TermNet), organizes independent, third-party certification services for the language industry. LICS® is also collaborating with the European Certification and Qualification Association (ECQA).

The Globalization and Localization Association (GALA) and equivalent associations at national or regional level (such as the German Professional Association for Technical Communication and Information-Development – tekom, for the German speaking communities) develop more and more sophisticated training and certification schemes for various skills and competences in the fields of technical documentation/technical communication, which – if more than one language is required – inevitably overlap with localization and translation.
This can in fact be considered a success story as more than 7% of professionals of technical documentation have been certified in terms of professional competences/skills certification in Germany alone. (Baumert&Straub 2012) Given the fact that many of the aspects dealt with in the preceding chapters and in the following chapter are not sufficiently taught during the education of computer scientists and

software engineers at higher-education institutions (HEI), extra-HEI training schemes as well as properly accredited certification of the skills/competences thus acquired are definitely needed.

7.3 LI-relevant certification

In this clause certification schemes concerning LTT, LCR and LSP are dealt with. Certification of skills and competences is covered under 7.2.4. The more “normative” standards are, the easier it is to establish certification schemes based on standards. Thus, “certification”, defined as a procedure by which a first, second or third party gives written assurance that a product, process or service conforms to specified requirements, has become a determining issue in connection with quality-related standards. Certification involves a number of documented processes, at the end of which there is a documented assessment result.

“Accreditation” – different from certification (or registration) – is a third-party attestation (i.e. a formal recognition) by an accreditation body that a conformity assessment body (i.e. certification body) is competent to carry out ISO 9001:2008 or ISO 14001:2004 certification in specified business sectors. The European co-operation for Accreditation (EA) is the network of the national accreditation bodies in Europe. In simple terms, accreditation is like certification of the certification body. Certificates issued by accredited certification bodies are perceived on the market as having increased credibility.

Given a situation where the proliferation of ICT-related certification schemes (in most cases self-certification of proprietary tools/systems or services) causes confusion, the LI’s needs and requirements are even less respected in vertical standardization activities carried out in the multitude of technical committees (TC), which are highly autonomous in their operation. Even if it is shortsighted in most cases, industry plays with this “fragmentation” aiming at short-term benefits. There are exceptions, such as the standards for video and multimedia, but the LI-related standards are certainly not in the limelight of industrial development – because they are too fundamental (i.e. not immediately visible and understandable) and often considered “philosophical”.

In general, standards-based certification in the fields of the LI may apply to

Systems (incl. software development and/or integration, adaptation, conversion etc.) with respect to technical interoperability and performance of LTT,

Data/content (incl. data models, data structures, exchange formats etc.) with respect to content interoperability and data quality of LCR,

(Services concerning the above – usually carried out specific to the respective developer or provider),

Language services with respect to the quality of the service carried out by LSP,

Skills and competences of experts,

Training of experts (e.g. for terminology management, legal translators and interpreters, product data managers, auditors, consultants, service providers, etc.) in order to acquire certain skills/competences,

The training service (possibly separately also for the training material) rendered.

As it was recognized that competences and skills related to LT and to assistive technologies have much in common, but are only marginally taught in the regular education/training of ICT experts, there is a need for such education/training as well as for pertinent eCertification schemes.

8 Evaluation of LI-related policies/strategies

Chapter 8 of this Deliverable aims at showing the interdependence of certain policies directly or indirectly influencing developments of more or less all fields of the language industry – up to the development of competences and skills for experts working in these fields. There are policies and strategies concerning language, standardization, certification and accessibility, which to some extent mutually refer to or sometimes depend on each other – although there may be gaps in this framework of mutual reference (as shown in Figure 2, repeated from above).

FIG. 2: Relations between the various aspects of the LI

LI related policies and strategies can be formulated at:

International level

European level

National (and sometimes subnational) level

Individual organization level.

In Canada for instance a number of roadmaps were prepared. Critical requirements for the language industry were identified to achieve a competitive advantage and maximize global market share. On the basis of these roadmaps a technology development strategy was defined to meet these requirements (including the respective standardization aspects). The capability of integration and interoperability of systems, content and services is one of the most prominent critical requirements. It also addresses the problem of fragmentation (including both LI and more specifically the LI-related standardization activities).

The Canadian example shows that LI and the respective standardization efforts are expected to result in concrete economic advantages (indirectly creating also new jobs). It would be advisable to mention the LI and LI-related standardization efforts more prominently in EU policy documents as well as in the policies at national level. Appropriate certification schemes would emerge on the basis of quality-oriented standards (developed through well-coordinated standardization efforts) and fill the deficits identified in the pertinent ICT education and training.

Although the LI in all its facets is booming, it is considered as “niche” market by mainstream ICT. However, in view of the needs of industry&business all software and content

development should occur under the perspective of capability for integration and interoperability. The ICCHP Recommendation 2010 (see Appendix 3) states:

“Software should be developed and data models for content prepared in compliance with the above-mentioned requirements to facilitate the adaptation to different languages and cultures (localization) or new applications (re-purposing), the personalization for different individual preferences or needs, including those of persons with disabilities. These requirements should also be referenced in all pertinent standards.”

Whenever assistive technologies concern communication (inter-human and human-machine etc.) they share common grounds with the language technologies. This is because the LI and assistive technologies (in particular AAC) have several aspects in common. It also looks as if they suffer from similar problems:

Multi-category (no single category “language industry”),

Difficult to explain (strongly academic driven),

Fragmentation,

Under-organized (in terms of associations, partnering etc.).

Among positive developments one can recognize that

The requirements for the capability of integration and interoperability are taken more and more seriously,

o in the LI because of customer demands especially in large industry which triggers a concentration process among the big LTT developers and LSP,

o in the assistive technologies where the concept of accessibility is moving to “Access for All” from selected “Disabilities”;

The big ICT developers have implemented, or are in the process of implementing, the basic ICT requirements for both the LI and the assistive technologies;

Legal frameworks exist or are under discussion.

Among negative aspects one can summarize that both LI and assistive technologies

Are not part of mainstream ICT (also in education and training),

Are spread over many technical committees at international level (and even more so in other SDO),

Are not known, not accepted, or not regarded as essential (especially by the SME and micro-enterprises in the LI).

This chapter deals with the evaluation of

Standardization, certification and language policy development considered as a service to the public/society at large (8.1),

Business-relevant language policies and strategies (8.2).

8.1 Standardization and certification as a service to the public/society at large

From a national or societal perspective, standardization, certification and language policy development can also be considered as a service to the public/society in general and to industry&business in particular:

LI related (particularly formal) standardization activities are a service to industry&business and to society at large and in particular to the language industry, as they are geared towards the most efficient use of LTT, LCR and language services by industry&business and individual users;

Similarly the development of (particularly standards-based) certification schemes for LTT, LCR and LSP as well as LI pertinent skills and competences is a service, as these standards support the pursuit of a high degree of quality, reducing costs and potential for conflict thus also helping to establish a high level of trust;

On the same grounds the formulation of an explicit language policy is a service, as it helps to establish a better understanding for national development strategies (e.g. if there is an official national language policy) or enterprise globalization/localization strategies (if there is an enterprise-specific general language policy/strategy).

Individual enterprise-specific language policy/strategy will be dealt with in part 8.2 of this Deliverable.

The rapidly growing market of the language industries has led to a differentiation both of demands on the customer side and of offers on the language industry side concerning LI products and services. This, together with the general demand for system integration and interoperability, has triggered the need for language policies, standardization and quality assessment systems (see: CELAN D2.1 Annex 2). This development has also had – or should have – a great impact on the competences and skills taught at higher educational institutions (HEI, including the respective academic certification systems), on language and other content resources and, last but not least, on language teaching and training services.

8.1.1 Standardization (as a service to the public/society)

Technical standardization – whether carried out in the framework of formal standardization organizations (such as ISO and IEC and their national member bodies) or by other SDOs – is also considered as a service to industry and society at large. The standardization of a “product, process or service” in a broad sense covers any material, component, equipment, system, interface, protocol, procedure, function, method or activity. Thus, in the language industries standardization can apply to all kinds of aspects of suitability of systems and tools, the methods and quality of services, the quality and interoperability of language and other content resources, as well as to the assessment and certification schemes based on standards.

The general framework for European standardization policy is provided by the following basic documents:

Directive 98/34/EC of the European Parliament and of the Council laying down a procedure for the provision of information in the field of technical standards and regulations,

Decision No 1673/2006/EC of the European Parliament and of the Council of 24 October 2006 on the financing of European standardization,

General guidelines for the co-operation with the European Standards Organizations,

which can be accessed under: http://ec.europa.eu/enterprise/policies/european-standards/documents/general-framework/index_en.htm

The general guidelines for the co-operation between CEN, CENELEC and ETSI and the European Commission and the European Free Trade Association, adopted and signed on 28 March 2003, are a purely political document. Therein, all the partners confirm their common understanding about the role of European standardization, about its principles such as openness, transparency and impartiality and about their willingness to cooperate, on the basis of these principles, in support of European policies. The role of “language” (in connection with LI products and services) as a non-technical barrier is not addressed in these policy documents.

The 2010-2013 Action Plan for European Standardisation (see: http://ec.europa.eu/enterprise/policies/european-standards/standardisation-policy/implementation-action-plan/index_en.htm), which defines the most important standardization initiatives and actions in the European Commission services in 2010-2013, only refers to language with respect to the translation of standards. The same applies to the Proposal for a Regulation of the European Parliament and of the Council on European Standardisation and amending Council Directives 89/686/EEC and 93/15/EEC and Directives 94/9/EC, 94/25/EC, 95/16/EC, 97/23/EC, 98/34/EC, 2004/22/EC, 2007/23/EC, 2009/105/EC and 2009/23/EC of the European Parliament and of the Council {SEC(2011) 671 final} {SEC(2011) 672 final}. The latter mentions “translation, where required, of European standards or European standardisation deliverables used in support of Union policies and legislation into the official Union languages other than the working languages of the European standardization bodies or, in duly justified cases into languages other than the official Union languages”. There is no mention of language in connection with LI products and services in the “Commission Staff Working Document. Annual European standardisation work programme 2012” {SWD(2012) 42 final}.

The 2006 ICT Standardisation Work Programme complemented the Action Plan for European Standardization of 2005 by dealing in more detail with ICT matters. It was followed by a number of annual ICT standardization work programmes. The consultation activities in conjunction with the WHITE PAPER Modernising ICT Standardisation in the EU – The Way Forward {COM (2009) 324 final} led to the recent ICT Standardization Work Programme 2010-2013 (see: http://ec.europa.eu/enterprise/sectors/ict/standards/work-programme/index_en.htm). No mention of language in connection with LI products and services can be found in these documents.

This means that LI products and services have not yet been recognized as a major subject of standardization policy in the EU. This contrasts with the fragmentation, especially of the language technology systems/tools (LTT) fields, whose centrifugal development tendencies run counter to the need for integration and interoperability on the users’ side. In fact, there is a clear need for

More standardization activities,

Harmonization of existing standards,

Activities to enforce standards (e.g. among others by means of certification).

(see: chapter 7 for details)

Furthermore, there is also considerable reluctance to apply standards in the ICT industry. At the seminar "Quality of Semantic Standards" held at the University of Twente (The Netherlands) on 2012-04-05, revealing statements came from ICT representatives:

Voicing opposition against FOSS (free and open source software);

Requesting more involvement of the ICT in the standardization activities;

Software vendors have every right to block standardization as long as there is no viable business model included;

Standards system is producing standards for its own sake (which could also be analogously said of many players in the ICT).

ICT enterprises and experts (including the LI) do not always care much about existing standards, guidelines and policies etc. referring to language. This is in line with a general attitude of decision makers in industry&business or in the public sector (i.e. their customers), who consider language “trivial”. These decision makers are even less aware of standards and policy guidelines etc. concerning language. Nevertheless, in large industry corporations standardization has become a strategic management issue.

A group of researchers and teachers of standardization has established a network which – in close cooperation with ISO – has launched a web-based series of lectures on “Standardisation in Companies and Markets”, using e-learning techniques and available worldwide on the Internet (for more information see: www.pro-norm.de). The lecture content on standardisation, provided in English, targets:

Universities and colleges of higher education

Standards bodies

Government and business organisations

Training institutes (vocational academies)

Global corporations

Individuals.

Obviously, it will take both soft power (in terms of convincing, standardization activities etc.) and hard power (in terms of legislation) to have the requirements of the LI and of accessibility implemented in software and content development in such a way that the fragmentation is overcome, software and content developers are motivated to follow reliable standards, and SMEs both in the LI and in industry&business at large can afford the LI products and services.

8.1.2 Certification (as a service to the public/society)

Quality is an important cost, image and market success factor. Certification is defined as a procedure by which a first, second or third party gives written assurance that a process, product, service, skill or competence conforms to specified requirements. If these requirements are specified in a standard pertaining to the language industries (LI), the certification process would assess the standards compliance of the respective LTT, language service or LSP, LCR or an individual’s skill/competence (up to the respective training and training material). Even the successful implementation of a language policy could be certified. Successful implementations of pertinent standards-based certification schemes in the LI are, for instance, LICS, the Language Industry Certification System (see: www.lics-certification.org) and the European Certification and Qualification Association (ECQA) (see: www.ecqa.org).

Based on Council Resolution 85/C 136/01 of 7 May 1985 on a new approach to technical harmonization and standards, a regulatory framework gradually emerged concerning

The recognition of organizations competent in industrial standardization and certification;

The harmonization of standards for the sake of a free flow of goods and services;

The exchange of information on standards as well as the coordination of standardizing activities.

Explicitly, “legislative harmonization” is limited to essential safety requirements (or other requirements in the general interest) with which products put on the market must conform, so that they can enjoy free movement throughout the European Union. This leaves standardization and certification activities to the recognized organizations. However, both

The Machinery Directive 98/37/EC (replaced by the “Revised Machinery Directive 2006/42/EC” which does not introduce any radical changes compared with the old but aims at consolidating the achievements of the Machinery Directive while improving its application) and

Council Directive 85/374/EEC of 25 July 1985 on the approximation of the laws, regulations and administrative provisions of the Member States concerning liability for defective products (amended in 1999)

recognized that the documentation of the product (or service) is an integral part of the product. This made manufacturers (and distributors) aware of the possible risk of liability if technical documentation contains faults. A product is deemed safe if it complies with the pertinent European standard established according to the procedures of the Directives. In the absence of such regulations or standards, the product's compliance is determined according to:

National standards

Codes of good practice as regards health and safety

Current state of the art

Consumers' safety expectations.

This affects the documentation of the product (or service) as an integral part of the product and thus also language services and, ultimately, the tools and content resources used by the language service providers (LSP). In connection with the LI, standards compliance may refer to

Language technology tools/systems (LTT),

Language and other content resources (LCR),

Language services and their provision by language service providers (LSP),

eCertification of LI experts,

Training of LI experts and the respective training material.

8.2 Language policy (as a service to the public/society)

A language policy can refer to any of the aspects mentioned above. This clause aims at identifying policies directly or indirectly influencing developments of more or less all fields of the language industry – up to the development of competences and skills for experts working in these fields.

In the beginning, language policies usually were rather defensive in nature. At a strategic level today, the positive potential of systematic language policies/strategies – e.g. in support of information, knowledge or innovation policies, as well as of educational strategies, etc. – has become somewhat more widely recognized. With this greater awareness, countries and language communities are increasingly feeling the need to formulate systematic language policies/strategies (comprising also terminology planning strategies) in order to improve their competitiveness. This trend coincides with the requirement that today’s accelerated globalization needs to be complemented by accelerated localization, i.e. adaptation of products and services to comply with local cultural and linguistic norms.

An ever-increasing body of empirical evidence indicates that there is a critical relationship between individuals’ opportunity to use their mother tongue in a full range of cultural, scientific and commercial areas, and the socio-economic well-being of their respective language communities. Estimates of the number of languages existing today vary between 6,000 and 7,000 (not counting a far larger number of dialects and local variants), of which about 50% are considered “endangered languages”. It is therefore no wonder that language issues at international level are mostly mentioned in connection with human rights, such as in

The Universal Declaration of Human Rights (1948),

The International Covenant on Civil and Political Rights (1966),

The International Covenant on Economic, Social and Cultural Rights (1966),

The Declaration on the Rights of Persons belonging to National, Ethnic, Religious and Linguistic Minorities (resolution 47/135 of 18 December 1992) of the United Nations (UN).

The European Charter for Regional or Minority Languages (1992) stresses the value of multiculturalism and multilingualism, and recognizes that the protection and encouragement of minority languages is quite compatible with maintaining the status of official languages (see: http://conventions.coe.int/treaty/en/Treaties/Html/148.htm). In countries or regions where two or more language communities co-exist and interact, a language policy should reflect this situation in order to find solutions to controversial issues.

In Europe multilingualism is a major cross-cutting theme encompassing social, cultural, economic and educational spheres. Linguistic diversity within Europe is considered an added value for the development of economic and cultural relations between the European Union and the rest of the world. Therefore, the European Commission has joined forces with Member States’ governments, the European Parliament, the European regions and social partners with the aim to:

Give citizens the chance of learning two languages in addition to their mother tongue from an early age;

Create friendlier societies, where different communities and individuals engage in dialogue with one another;

Strengthen the role of languages in improving employability and competitiveness.

In order to achieve its medium to long-term objectives the Commission promotes multilingualism throughout the whole range of its policies. The EU language policy and support measures are summarized in An Inventory of Community actions in the field of multilingualism – 2011 update, to be found on the EU Commission’s portal (http://ec.europa.eu/languages/pdf/inventory_en.pdf).

There are many efforts in Europe to assist SMEs in overcoming language barriers in business. At http://ec.europa.eu/languages/languages-mean-business/useful-links/index_en.htm information on languages, business, jobs and related topics is provided, up to a “Test your company's level” tool (see: http://ec.europa.eu/languages/languages-mean-business/test-your-level/index_en.htm).

In 2005, UNESCO commissioned Infoterm to formulate the Guidelines for Terminology Policies. Formulating and implementing terminology policy in language communities (CI-2005/WS/4), which largely draw on the experiences with language policies. These Guidelines were taken up as the basis for the international standard ISO 29383:2010 Terminology policies – Development and implementation.

8.3 Accessibility policies and strategies (as a service to the public/society)

Language- and LI-related standardization and certification have much in common with standardization related to accessibility. That is why the ICCHP Recommendation (MoU/MG 2012) states that “Software should be developed and data models for content prepared /in compliance with requirements/ to facilitate the adaptation to different languages and cultures (localization) or new applications (re-purposing), the personalization for different individual preferences or needs, including those of persons with disabilities.”

8.3.1 Accessibility policies at international and European level

Accessibility of ICT systems/tools and content resources is of growing concern in many countries of the world, in particular in Europe. The “United Nations Convention on the Rights of Persons with Disabilities” (UNCRPD), which was adopted in December 2006, is the basic international framework addressing the rights of persons with disabilities. By June 2012 it had been signed by 153 countries and ratified into national law by 115. The UNCRPD addresses the rights of persons with disabilities in general and contains, as article 9, a section about accessibility.

In the European Union, persons with disabilities (PwD) number 80 million (more than 15% of the population). These and other figures indicate that the rate of PwD might well double over the next decades. At EU level there are, among others, the following policy documents:

M376 (2005). Standardization mandate to CEN, CENELEC and ETSI in support of European Accessibility requirements for public procurement of products and services in the ICT domain,

M420 (2007). Standardization mandate to CEN, CENELEC and ETSI in support of European Accessibility requirements for public procurement in the built environment,

M473 (2010). Standardization mandate to CEN, CENELEC and ETSI to include “Design for All” in relevant standardization initiatives,

MeAc (Ed.) (2007). MeAC – Measuring Progress of eAccessibility in Europe. Assessment of the Status of eAccessibility in Europe. Bonn

The Proposal for a Directive of the European Parliament and of the Council on public procurement [COM (2011) 896 final 2011/0438 (COD)] cites the WHO/WB Report [WHO/WB 2011, p. XI] under Article 40 Technical specifications:

“to create enabling environments, develop rehabilitation and support services, ensure adequate social protection, create inclusive policies and programmes, and enforce new and existing standards and legislation, to the benefit of people with disabilities and the wider community.”

According to the 2010-2013 ICT Standardisation Work Programme accessibility is at the core of the European Disability Action Plan and the UN Convention on the Rights of Persons with Disabilities (UNCRPD, article 3). Furthermore, article 9 of the UN Convention provides that State Parties shall take appropriate measures to develop, promulgate and monitor the implementation of minimum standards and guidelines for the accessibility of facilities and services open or provided to the public. Among the areas to be covered are information and communications, including information and communication technologies and systems. To give an example: substantial restrictions for PwD to use services with electronic communication must be overcome in relation to chat and mobile telephony. “A special need for services within electronic communications and the postal sector has been identified for several groups within the areas of mobility and flexibility, positioning, the possibility of saving and remembering information, and the need for support with pictures”. Therefore, the EU Commission supports several large-scale R&D projects about different aspects of eAccessibility/eInclusion and Design for All (DfA). Sooner or later all signatories of the UNCRPD will have to take eAccessibility/eInclusion into account not only in legislation, but also in enforcing the legal provisions.

8.3.2 Accessibility-related standardization activities at European level

At European level there are the following standardization initiatives:

ETSI TR 102612: 2009 Human Factors (HF) – European accessibility requirements for public procurement of products and services in the ICT domain. Sophia Antipolis: ETSI.

ETSI EN 301 549 V0.0.34 (Draft 2011-11) Human Factors (HF) – public procurement of ICT products and services in Europe (under development).

The content of the latter so far is largely based on ETSI TR 102612: 2009. The “eAccess+” network (http://www.eaccessplus.eu/) intends to collect and provide guidance on the existing state of the art in eAccessibility and pertinent standards. The Network’s tool is the eAccessibility HUB (http://hub.eaccessplus.eu), a wiki which describes resources available, brings them into a bigger context and links to the original – not just another collection, but a bridge to the originals.

Concerning accessibility standards for the development of ICT systems/tools there are four interdependent aspects to be mentioned here:

Standards for programming,

Standards for the design of user interfaces,

Standards concerning accessible content,

Training of future system/tools developers and ICT service providers.

Standards for programming

One of the main developers of standards related to the quality, development project management, evaluation, testing etc. of ICT systems/tools is ISO/IEC-JTC 1/SC 7. Up to 10% of their around 150 standards published or under development refer to these meta-aspects. They should mention somewhere the needs for and requirements of multilinguality, multimodality and accessibility – but they rarely do. They do not even cross-reference accessibility standards. Some of the standards that can be mentioned here are:

ISO/IEC/IEEE 16326:2009 Systems and software engineering – Life cycle processes – Project management

ISO/IEC 15288:2008 Systems and software engineering – System life cycle processes

ISO/IEC 24748 (multipart) Systems and software engineering – Life cycle management

ISO/IEC 25000 (series) Software engineering – Software product Quality Requirements and Evaluation (SQuaRE)

ISO/IEC 25000:2005 Software Engineering – Software product Quality Requirements and Evaluation (SQuaRE) – Guide to SQuaRE

Standards for the design of user interfaces (usability standards)

Some of these standards also refer to user interface and usability. The Joint Technical Committee (JTC) Special Working Group “Accessibility” (JTC 1/SWG 1 or SWG-A) tries to identify the accessibility related standardization needs and coordinate the respective activities. International standards on the design of user interfaces are developed by several technical committees, such as ISO/TC 159/SC 4, and within the framework of the W3C such as the Guidelines of the Web Accessibility Initiative (W3C/WAI):

WCAG – Web Content Accessibility Guidelines 2.0

ATAG – Authoring Tool Accessibility Guidelines

UAAG – User Agent Accessibility Guidelines

WAI-ARIA – Accessible Rich Internet Applications

EARL – Evaluation and Report Language

Within the framework of ISO and IEC the output of standards related to usability in connection with eAccessibility/eInclusion is very scattered:

ISO 9241-171:2008 Ergonomics of human-system interaction – Part 171: Guidance on software accessibility,

ISO 9241-20:2008 Accessibility guidelines for information/communication technology (ICT) equipment and services,

ISO/IEC 13066-1:2011 Information Technology – Interoperability with Assistive Technology (AT) Part 1: Requirements and recommendations for interoperability,

ISO/IEC 24756:2009 Information technology – Framework for specifying a common access profile (CAP) of needs and capabilities of users, systems, and their environments,

ISO/IEC TR 29138 (multipart) Information technology – Accessibility considerations for people with disabilities.

In order to obtain an overview, the following document is particularly useful: ISO/IEC TR 29138-2:2009 Information technology – Accessibility considerations for people with disabilities – Part 2: Standards inventory [currently organized in 6 categories subdivided into 102 Accessibility focused and 191 Related standards].

Standards concerning accessible content

Probably the standards nearest to the existing needs are developed by ISO/IEC/JTC 1/SC 36 Information technology for learning, education and training. Three of them are under development:

ISO/IEC FCD 20016-1 Information technology for learning, education and training – Language accessibility and human interface equivalencies (HIEs) in e-learning applications – Part 1: Framework and reference model for semantic interoperability,

ISO/IEC TR 24725 (multipart) Information technology for learning, education and training (ITLET),

ISO/IEC NP 29188 Information technology – Individualized adaptability and accessibility in e-learning, education and training.

The following standards can also be mentioned here:

ISO/IEC 24751 (multipart) Information technology – Individualized adaptability and accessibility in e-learning, education and training

ISO/IEC 19766: 2007 Guidelines for the design of icons and symbols to be accessible to all users – Including the elderly and persons with disabilities.

Training of future system/tools developers and ICT service providers

The Recommendation on software and content development principles 2010 (see Appendix 5) was formulated in a special workshop at the 12th International Conference on Computers Helping People with Special Needs (ICCHP 2010, Vienna, Austria, July 2010) and thereafter endorsed by several technical committees in the field of standardization and finally by the Management Group (MoU/MG) of the ITU-ISO-IEC-UN/ECE Memorandum of Understanding concerning eBusiness. ICCHP 2010 participants confirmed that existing training and formal studies – even if certified under given certification/attestation systems – are not sufficient to provide the skills and qualifications needed to handle the issues involved in global content interoperability and particularly in eAccessibility&eInclusion. This means that there is a need for the training of skills/competences in the field of assistive technologies (incl. AAC), for pertinent standards and for certification schemes based on these standards.

8.4 Business-relevant language policies and strategies

Enterprises in Europe using a holistic approach to formulate and implement a language policy have proved to be significantly more successful than others. Such a holistic approach could cover principles and rules concerning, among others:

Language policies of the target markets to be taken into account,

The extent to which language technologies are useful/necessary,

The kinds of language services to be used in which way,

The kinds of language and other content resources to be used,

Whether – and if so, which – standards need to be applied,

Whether certification is considered essential/necessary,

Whether – and if so, which – language skills/competences of staff are important to the enterprise, as well as how they can be secured, e.g. by language teaching and training programmes,

Whether – and if so, which – communication needs of persons with disabilities (PwD) need to be taken into account.

However, the market situation with respect to LI products and services for industry&business that want to use them is very complex:

The size of an enterprise using LTT and LCR (and maybe outsourcing to LSP) has a significant impact on the budget available for LTT and language-related activities;

The nature of the enterprise: belonging to the producing, trading or service industries strongly influences the approach to use LTT, LCR and LSP;

Private industry is highly fragmented with lots of smaller or larger markets with special requirements concerning the use of LTT, LCR and LSP;

The sector of the LTT, LCR and LSP again in itself is complex;

Given their day-to-day needs, enterprises – in particular SMEs – might tend to look for quick solutions rather than for optimal solutions based on a systematic analysis. Thus, they often miss economic potential and opportunities.

Enterprise language policies/strategies can largely be subdivided into:

Those for the whole enterprise,

Those for enterprises belonging to the LI (i.e. LSP as well as LCR and LTT development).

8.4.1 Overall enterprise language policies/strategies

Most of the activities (and literature about such activities) concerning language strategies and policies stem from LSP as well as LCR and LTT developers – in particular from the point of view of translation and localization service providers (and other translation and localization technology users). Most studies or other papers on overall enterprise language policies/strategies view “language” in connection with:

Corporate language and corporate identity (and the respective policy),

Communication management (and the respective policy),

Information & knowledge management (and the respective policy),

Research and development management (and the respective policy),

Human resource management (and the respective policy),

Globalization strategies (and the respective policy),

Standardization and certification (and the respective policy), etc.

Some documents focus on specific aspects of a global language policy/strategy, such as on

Globalization&localization: Globalization Industry Primer (Lommel/Ray, 2007),

Terminology: Knowledge, Brands and Customer Loyalty – Terminology as a Critical Success Factor (RaDT, 2010),

Terminology: Successful terminology management in companies. Practical tips and guidelines: Basic principles, implementation, cost-benefit analysis and systems overview (Schmitz/Straub, 2010),

Terminology policy: Guidelines for Terminology Policies of UNESCO,

Interpretation and translation: Language services toolkit (Australia, s.a. – geared towards public agencies in the health sector),

Language learning: Business language for a global economy (APEC, 2010),

and even the international standard ISO 29383:2010 “Terminology policies – Development and implementation”, which covers an important aspect of language policy geared towards language communities as a whole or individual organizations.

Increasingly it is recognized that a terminology policy or strategy – whether at national or at enterprise level – can be highly beneficial for enterprises and other organizations. Therefore, in 2010 the International Standard ISO 29383:2010 “Terminology policies – Development and implementation / Politiques terminologiques – Élaboration et mise en oeuvre” was prepared by ISO/TC 37 “Terminology and other language and content resources”. It is based on the Guidelines for Terminology Policies. Formulating and implementing terminology policy in language communities (CI-2005/WS/4), which UNESCO commissioned Infoterm to formulate in 2005. While the UNESCO Guidelines largely draw on the experiences with language policies, ISO 29383:2010 places terminology policies in the broader context of institutional strategic management.

There exists a broad literature and all kinds of information on language policies/strategies for enterprises in the LSP as well as LCR and LTT development sector. Pertinent scientific or industry associations are making efforts to analyse how LSP services can be made more accessible, affordable and interoperable. Obviously the costs for translation and localization as well as LTT development/adaptation/lease are a heavy economic burden for many enterprises making use of them. The EU Commission has recognized that many R&D barriers concerning the development of LTT and LCR as well as related services relate not necessarily to genuine technology factors, but to deficiencies concerning:

Business models for certain LTT and LCR developers as well as LSP,

Market distortions and high costs of LTT, LCR and language services,

Integration of LTT, LCR and language services,

The lack or under-development of pertinent standards,

The lack of language competences in the ICT sector, etc.

There are national language policies or strategies which also comprise industry aspects and economic factors. Over the last years the EU Commission has been particularly keen on showing the relationship between language and business performance in industry (see: Internationalization of European SMEs. Final Report, Brussels: European Union, 2010).

8.4.2 Language policies/strategies for enterprises belonging to the LI

Sooner or later LI enterprises have to make decisions concerning the languages they are going to deal with. LT developers may decide to focus on LTT for certain languages/scripts. If localization (and the respective documentation) beyond this scope becomes necessary, this

Could be entrusted or outsourced to the respective customer or contracted to a specialized LT developer or LSP, or

May necessitate hiring the respective experts.

Even if they offer services in all languages, LSP usually carry out contracts in-house only in a limited number of languages. Therefore, they have to develop an outsourcing policy. This situation may change dramatically with the further emergence of translation marketplaces.

9 Indication of the current uptake of LI offers in the business community

The “Language industry mind map” (see: Appendix 2) and the “Format for preliminary consultations with LI experts” (see: D2.2 Appendix 1) were tested with more than 40 experts from LI-enterprises and academia. The resulting CELAN Typology of LI products and services (D2.1 Annex 3), after modifications stemming from enterprises’ feedback, was found to be a good means to guide further investigations as well as to prepare the interviews with enterprises.

In the course of the investigations within the framework of WP2.1, related reports and studies, such as PIMLICO (Hagen, 2011) and ELAN (Hagen, 2006), were consulted. However, the CELAN project started off from a much broader perspective, including also the whole range of the language industry.

According to the project proposal, WP2.1 “will seek to indicate the current uptake of the respective LI offers within the business community”. The very fact that the LI in all its facets is thriving proves that LI-enterprises meet customers’ language-related needs and demands. However, D2.1 can show that the resources offered by the LI are at different levels of complexity and sophistication and in fact address an array of customer needs/demands of different levels of complexity and sophistication, depending on, among others, the size of the enterprise, its degree of specialization, the industry sector it belongs to and the customer demands of the target markets it aims at.

The above-mentioned language-related needs can be broadly categorized into:

Inter-personal communication in industry&business environments,

Technical documentation/communication in written form (including non-linguistic representations),

Written or multimedia material for the promotion, advertising and publicizing the enterprise and its products or services,

Information gathering (incl. also business intelligence).

Under each category an array of individual needs can be found. In terms of number and degree of complexity, the needs may grow in line with the growth of the company, the respective industry, the markets, the degree of market integration and other factors. Growing language-related needs trigger higher demands for LI products (such as LTT and LCR) and services.

As was recognized from early on, large-scale enterprises (including multinationals) commercially active at a regional or world-wide scale have the largest, and fastest growing, needs. Therefore, their demands towards the LI are usually higher in terms of quantity and quality as well as of complexity and sophistication. These large-scale enterprises can also afford top-notch LI solutions, either by developing them or by outsourcing (or a combination of both). Therefore, the uptake of LI resources is highest in this sphere. However, not even at the level of large-scale enterprises are all kinds of LTT, LCR and LS necessary in each enterprise.

The smaller the enterprise, the more likely it is that the number of perceived needs as well as their degree of complexity and sophistication decreases. However, at the same time the potential number of combinations of needs (with different degrees of complexity and sophistication) increases all across industry&business. In view of the increasing number of globalizing enterprises, this fact offers one of the explanations for the exponential growth of the LI.

Given the fact that the Internet is driving globalization, it is evident that the uptake of LI-related products and services is among the highest in the ICT sector at large. Even within the LI sector there are various language-related needs and demands, which are met by a growing number of sector-internal services, as pointed out in this deliverable.

10 Conclusions

Task 2.1 was carried out in close coordination with Task 2.2 from the very beginning. Thus, desk research supported by preliminary consultations for the interviews resulted in a comprehensive overview (see: CELAN D2.1 Annex 1) of

Language technology (LT) and language technology tools/systems (LTT),

Language and other content resources (LCR),

Linguistic/language services (LS) and language service providers (LSP), incl. also language training and assessment not covered by formal education as well as consultancy services,

Guidelines and standards and standards-based certification schemes (see: D2.1 Annex 2),

Business-relevant language policies/strategies.

Since the number and variety of LTT and LCR as well as LSP is increasing virtually by the day, the “CELAN Typology of Language Industry Products and Services” was developed as a comprehensive meta-catalogue providing guidance through the jungle of fast-increasing LI products and services. Ample indications of the current uptake of the respective LI resources within the industry&business community were collected, bearing in mind the increasing language- and LI-related needs (and demands stemming from the needs) of industry&business triggered by accelerated globalization and the development of the Internet (and other global networks) as technological driving forces of globalization.

The results of the investigations in WP2 show that the LI has become quite complex over the last ten years. There are numerous and many different kinds of LI products (comprising LTT and LCR) as well as services (LS) and LSP on the market. Therefore, on the one hand, the demand for qualified language experts has definitely risen. On the other hand, the meaning of “qualification” in this connection has changed: today it refers to the right qualification mix for a given purpose. For instance, an ICT expert with low linguistic qualifications might have the right language competences/skills to take care of the internationalization and localization aspects during LT development. Qualified language experts will more than ever need a high qualification in using LTT, not to mention new kinds of content handling.

It became evident that the success of an SME on global markets was sometimes attributed to the use of LT (or LSP) rather than to language competences/skills. In other cases, it was directly attributed to the application of ICT, although LT deserved the credit. This shows that both language experts and the LI in general have an image problem.

References (documents):

Andreas Baumert and Daniela Straub. Über sieben Prozent der Mitglieder zertifiziert. Leistung soll sich lohnen. In: technische kommunikation, 34(2012)6, pp. 11-14

APEC. Business Language for a Global Economy. Asia-Pacific Economic Cooperation, 2010 (see: http://hrd.apec.org/index.php/Business_Language_for_a_Global_Economy)

Nuria Bel e.a. Standardization Action Plan for CLARIN, 2009. Retrieved 2011-09-10 from: http://www.clarin.eu/node/2841

Lachlan Blackhall. Educational Content Authoring Tools. A report written for the College of Engineering and Computer Science, The Australian National University. 2011 http://17dynamics.files.wordpress.com/2011/03/educational_content_authoring_tools_report_distribution.pdf

The Budapest Observatory. Publishing Translations in Europe – Trends 1990-2005. In: Making Literature Travel report series. Aberystwyth University (Wales): Literature Across Frontiers, 2010 (Based on analysis of the Index Translationum database)

Gerhard Budin. Identification of problems in the use of LR standards and of standardization needs (2009) (FLaReNet Deliverable D 4.1 16 Oct)

CEPIS (ed.). Survey of Certification Schemes for ICT Professionals across Europe towards Harmonization (HARMONISE). http://www.cepis-harmonise.org, September 2007, Project of CEPIS Council of European Professional Informatics Societies, final report.

Council for German Language Terminology (RaDT). Terminology: Knowledge, Brands and Customer Loyalty – Terminology as a Critical Success Factor. 2010 (RaDT brochure)

Deloitte. Enterprise Content Management (ECM). Rapid evolution. Deloitte Development LLC, 2012 http://www.deloitte.com/view/en_US/us/Services/consulting/technology-consulting/74d439dfb863c210VgnVCM1000001a56f00aRCRD.htm

European Commission. Communication from the Commission: E-Learning – Designing tomorrow’s education. Brussels: European Commission, 2000. COM (2000) 318 final

European Commission. Internationalization of European SMEs. Final Report. Brussels: European Commission, 2010

Council of Europe. European Charter for Regional or Minority Languages. Strasbourg: Council of Europe, 1992 http://conventions.coe.int/treaty/en/Treaties/Html/148.htm

European Commission. An Inventory of Community actions in the field of multilingualism (Commission staff working paper SEC 2011 926 final). Luxembourg: Publications Office of the European Union, 2/2011 (http://ec.europa.eu/languages/pdf/inventory_en.pdf)

European Commission. Studies on translation and multilingualism: Mapping Best Multilingual Business Practices in the EU. Brussels: Directorate-General for Translation, 2011 (see: http://bookshop.europa.eu/en/mapping-best-multilingual-business-practices-in-the-eu-pbHC3111018/)

European Commission. Communication from the Commission to the European Parliament, the Council, the European Economic and Social Committee and the Committee of the Regions. European Disability Strategy 2010-2020: A Renewed Commitment to a Barrier-Free Europe (COM 2010 636 final)

Holger Fock, Martin de Haan, Alena Lhotová. Comparative income of literary translators in Europe. European Council of Associations of Literary Translators (CEATL). CEATL, Brussels 2007/2008

Goulburn Valley Primary Care Partnership. Language services toolkit. Australia, s.a.

GSVadvisors. Education Sector Factbook 2012. See: http://gsvadvisors.com/wordpress/wp-content/uploads/2012/04/GSV-EDU-Factbook-Apr-13-2012.pdf

Stephen Hagen. Report on Language Management Strategies and Best Practice in European SMEs: The PIMLICO Project. Tipik Communication Agency (BE) and Semantica Ltd. (UK), April 2011

Stephen Hagen e.a. Effects on the European Economy of Shortages of Foreign Language Skills in Enterprise (ELAN Study) An analysis of the use of languages and the effect of linguistic deficiencies in European exporting businesses in 29 countries, Brussels: European Commission & London: National Centre for Languages (CILT), December 2006

Astrid Hager. How can companies prepare themselves for the changing market? The translation market in ten years’ time – a forecast. In: TCWorld 2008-11/12, pp.14-16

2010-2013 ICT Standardisation Work Programme for industrial innovation. European Commission Enterprise and Industry Directorate-General, Service Industries Key Enabling Technologies and ICT, 2nd update – 2012

International Information Centre for Terminology (Infoterm). Guidelines for Terminology Policies Formulating and implementing terminology policy in language communities. Paris: UNESCO, 2005 (CI-2005/WS/4)

Lionbridge ed. White Paper 2009. The Future of Localization. A Varied Spectrum of Service Models. See: http://de-de.lionbridge.com/Translation.aspx?pageid=1387, retrieved 2012-11-25

Arle Lommel and Rebecca Ray. Globalization Industry Primer. An introduction to preparing your business and products for success in international markets. Geneva: Localization Industry Standards Association (LISA), 2007

Learning Content Management Systems 2006-11-13. Retrieved 2012-10-12 from WikiEducator (see: http://wikieducator.org/Exemplary_Collection_of_tools_and_standards_for_producing_open_educational_content)

Ben Martin. Terminology Management Driving Content Management. Content enterprise-wide, multi-lingual content management solutions. In: Frieda Steurs ed. TAMA 2001. Terminology in Advanced Microcomputer Applications. Sharing Terminological Knowledge Terminology for Multilingual Content. Wien: TermNetPublisher, 2001 (on CD)

Wolfgang Maass and Tobias Kowatsch eds. Semantic technologies in content management systems. Trends, Applications and Evaluations. Springer Verlag, 2012

Monica Monachini e.a. The Standards’ Landscape Towards an Interoperability Framework. The FLaReNet proposal. Building on the CLARIN Standardization Action Plan (July 2011) Retrieved 2011-09-10 from: http://www.flarenet.eu/sites/default/files/FLaReNet_Standards_Landscape.pdf

MoU/MG. Recommendation on software and content development principles 2010 (Formulated at the ICCHP 2010 and endorsed by ISO/TC 37 and other technical committees). MoU/MG/12 N 476:2012 (rev.) (MoU/MG recommends standards developers concerned with software and content development to consider the “Recommendation on software and content development principles 2010”)

Paul Muljadi. Elearning. Overview and topics. Retrieved 2011-12-11 from: http://de.scribd.com/doc/75349828/E-Learning

David Myron. What's Fueling Speech Technology's Growth. Retrieved 2012-10-15 from: http://www.speechtechmag.com/Articles/Column/Editor's-Letter/Whats-Fueling-Speech-Technologys-Growth-83611.aspx (referring to Global Industry Analysts (2012))

Nagy, A. (2005). The Impact of E-Learning, in: Bruck, P.A.; Buchholz, A.; Karssen, Z.; Zerfass, A. (Eds). E-Content: Technologies and Perspectives for the European Market. Berlin: Springer-Verlag, pp. 79–96

OECD. E-Learning in Tertiary Education: Where Do We Stand? Paris: Organization for Economic Co-operation and Development OECD, 2005

Paratiritirio [παρατηρητηρίο]. Best Practices of the use of Information and Communication Technologies in the Public & Private Sector. Deliverable D6 Best Practices at International Level for the application of ICT at SMEs, 2007

Anthony Pym, François Grin, Claudio Sfreddo, Andy L. J. Chan. The Status of the Translation Profession in the European Union. Studies on translation and multilingualism. Final Report. DGT 7/2012

Francesca Riggio. Dubbing vs. subtitling. In: MultiLingual (Industry Focus), October/November 2010, p. 31-35 http://www.1stoptr.com/admin/UpImage/Dubbing_vs_Subtitling.pdf

Adriane Rinsche and Nadia Portera-Zanotti. Study on the size of the language industry in the EU. Study report to the Directorate General for Translation of the European Commission. Brussels: EU Commission, 2009 (DGT-ML-STUDIES 08)

Klaus-Dirk Schmitz and Daniela Straub. Successful terminology management in companies. Practical tips and guidelines: Basic principles, implementation, cost-benefit analysis and systems overview. TC and more GmbH, 2010

Clemens C. Steiner. SMEs go global. Global expansion: Strategic necessity for small and medium-sized enterprises (SMEs). Identification of strategic opportunity/necessity for globalization & evaluation of related critical success factors. Wien: Service GmbH der Wirtschaftskammer Österreich, 2003

Frieda Steurs ed. TAMA 2001. Terminology in Advanced Microcomputer Applications. Sharing Terminological Knowledge Terminology for Multilingual Content. Wien: TermNetPublisher, 2001 (on CD-ROM)

The T-Index Study: T-Index, the markets that matter on the web. See: www.translated.net/en/languages-that-matter

United Nations Convention on the Rights of Persons with Disabilities (UNCRPD), adopted on 13 December 2006, and opened for signature on 30 March 2007

U.S. International Trade Commission. Small and Medium-Sized Enterprises: U.S. and EU Export Activities, and Barriers and Opportunities Experienced by U.S. Firms. 7/2012

Kara Warburton. Standards and Guidelines for the Language Industry (2009) (see also: Kara Warburton. Standards and Guidelines for the Language Industry. Language Technologies Research Centre. March 2006/Revised Feb. 2007. http://www.crtl.ca/dl119&%3Bdisplay)

World Trade Organization (WTO). WTO Agreement on Technical Barriers to Trade (TBT). (with its present form entering into force with the establishment of the WTO at the beginning of 1995) Retrieved 2011-12-11 from http://www.wto.org/english/tratop_e/tbt_e/tbtagr_e.htm#Agreement

World Health Organization (WHO) and the World Bank (WB). World report on disability. Geneva: World Health Organization, 2011

References (standards and legislation):

Directive 98/34/EC of the European Parliament and of the Council laying down a procedure for the provision of information in the field of technical standards and regulations

Decision No 1673/2006/EC of the European Parliament and of the Council of 24 October 2006 on the financing of European standardization

Proposal for a Directive of the European Parliament and of the Council on public procurement. (Text with EEA relevance). (COM 2011 896 final) cites the WHO/WB Report (WHO/WB 2011, p. XI)

ISO/IEC Guide 2 (2004). Standardization and related activities – General vocabulary

Directive 2006/42/EC of the European Parliament and of the Council of 17 May 2006 on machinery, and amending Directive 95/16/EC (recast) (Text with EEA relevance) (revision of the Directive 98/37/EC of the European Parliament and of the Council of 22 June 1998 on the approximation of the laws of the Member States relating to machinery)

List of Annexes

Annex 1: Overview on language industry products and services

Annex 2: Business-relevant standards and guidelines in the fields of the language industry

Annex 3: CELAN Typology of language industry products and services

Appendixes:

1 Tables A and B showing the online market potential through multilingual websites

2 Language industry mind map

3 Recommendation on software and content development principles 2010

Appendix 1: Tables

Table A: Online market potential sorted by language

Table B: Online market potential sorted by country

CELAN D2.1 2012-05-27 fv0.1

Appendix 2: LI Mind map

Background

This Appendix 2 shows the conceptual scheme that made it possible to map the landscape of the development of human language technologies (LT), their applications and services. Most of the items found in the literature were organized from major categories to specific ones according to five broad areas. This categorization was tested with numerous stakeholders in the language industry (LI) and paved the way for establishing the CELAN Typology of LI products and services.

(1.1.1 ~ 1.1.3)

(1.2.1 ~ 1.4)

(2.1.1 ~ 2.2.6)

(2.3.1 ~ 2.4.4)

(3.1 ~ 5.4)

Appendix 3: Recommendation on software and content development principles 2010

MoU/MG/12 N 476 Rev.1

Date: 26 March 2012

Recommendation on software and content development principles 2010

Formulated at the ICCHP 2010 and endorsed by ISO/TC 37 and other technical committees

Purpose

This recommendation addresses decision makers in public as well as private frameworks, software developers, the content industry and developers of pertinent standards. Its purpose is to make aware that multilinguality, multimodality, eInclusion and eAccessibility need to be considered from the outset in software and content development, in order to avoid the need for additional or remedial engineering or redesign at the time of adaptation, which tend to be very costly and often prove to be impossible.

Background

In software development, globalization¹, localization² and internationalization³ have a particular meaning and application. In software localization they have been recognized as interdependent and of high importance from a strategic level down to the level of data modeling and content interoperability.

In 2005 the Management Group of the ITU-ISO-IEC-UN/ECE Memorandum of Understanding on eBusiness standardization adopted a statement (MoU/MG N0221), which defines as basic requirements for the development of fundamental methodology standards concerning semantic interoperability the fitness for

- multilinguality (covering also cultural diversity),
- multimodality and multimedia,
- eInclusion and eAccessibility,
- multi-channel presentations,

which have to be considered at the earliest stage of

- the software design process, and
- data modeling (including the definition of metadata),

and hereafter throughout all the iterative development cycles.

The above requirements are a prerequisite for global content integration and aggregation as well as content interoperability. Content interoperability is the capability of content to be combined with or embedded in other (types of) content items and to be extensively re-used as well as re-purposed for other kinds of eApplications. In order to achieve this capability, software must support these requirements from the outset. The same applies to the methods and tools of content management – including web content management.

Recommendation

Software should be developed and data models for content prepared in compliance with the above-mentioned requirements to facilitate the adaptation to different languages and cultures (localization) or new applications (re-purposing), the personalization for different individual preferences or needs, including those of persons with disabilities. These requirements should also be referenced in all pertinent standards.

1 Globalization (G11N) refers to all of the business decisions and activities required to make an organization truly international in scope and outlook. G11N is the transformation of business, processes and products to support customers around the world, in whatever language, country, or culture they require.

2 Localization is the process of modifying products or services to account for differences in distinct markets. Therefore, L10N is an integral part of G11N, and without it, other globalization efforts are likely to be ineffective. The interdependence of G11N and L10N has also been coined glocalization.

3 Internationalization is the process of enabling a product at a technical level for localization. An internationalized product does not require remedial engineering or redesign at the time of localization. Instead, it has been designed and built from the outset to be easily adapted for a specific application after the engineering phase.
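Editorial illustration (not part of the MoU/MG recommendation): the distinction between internationalization and localization described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions. The catalogue structure, the locale codes and the names CATALOGUE, fallback_chain and translate are hypothetical and are not taken from any standard or from the recommendation itself. The point shown is only that, once user-facing text is kept as data and looked up by locale, adding a language requires new data and no change to the program logic.

```python
from string import Template

# Hypothetical message catalogue: user-facing text is data, not code.
# Keys, locale codes and wording are illustrative assumptions.
CATALOGUE = {
    "en": {
        "greeting": Template("Welcome, $name!"),
        "farewell": Template("Goodbye, $name."),
    },
    "de": {
        "greeting": Template("Willkommen, $name!"),
    },
}
DEFAULT_LOCALE = "en"


def fallback_chain(locale):
    """Return the locales to try in order, e.g. 'de-AT' -> ['de-AT', 'de', 'en']."""
    chain = [locale]
    if "-" in locale:
        chain.append(locale.split("-", 1)[0])
    if DEFAULT_LOCALE not in chain:
        chain.append(DEFAULT_LOCALE)
    return chain


def translate(key, locale, **params):
    """Look up a message by key, falling back to less specific locales."""
    for candidate in fallback_chain(locale):
        message = CATALOGUE.get(candidate, {}).get(key)
        if message is not None:
            return message.substitute(**params)
    raise KeyError(key)


# Localization step: supporting French only adds data; translate() is unchanged.
CATALOGUE["fr"] = {"greeting": Template("Bienvenue, $name !")}

print(translate("greeting", "de-AT", name="Ana"))
print(translate("farewell", "de", name="Ana"))
print(translate("greeting", "fr", name="Ana"))
```

In this sketch the fallback chain stands in for the "designed from the outset" requirement: a missing translation degrades to the default language instead of requiring remedial engineering.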

_______