Post on 25-Oct-2020

Initialisms and specialized collocations in a LSP corpus from Environment, Human Genome, Medicine and Economics texts in Spanish and English

John Jairo GIRALDO University of Antioquia (Colombia)

Pedro PATIÑO FSK, NHH Norwegian School of Economics (Norway) / University of Antioquia (Colombia)

Definition of initialism

An initialism is a lexical reduction unit formed from the initial alphanumeric characters of a lexical unit of syntagmatic structure, forming a sequence whose pronunciation can be spelled or syllabic or both; e.g.: PCR; TS; TEP; Grb2.
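The definition above can be approximated with a rough surface heuristic for flagging initialism candidates in tokenized text. This is only a sketch: the length limits, the requirement of two capitals (or one capital plus a digit) and the function name are assumptions made for illustration, not the authors' procedure.

```python
import re

# Assumed heuristic (not from the talk): a candidate starts with an uppercase
# ASCII letter and is 2 to 8 alphanumeric characters long.
CANDIDATE = re.compile(r"^[A-Z][A-Za-z0-9]{1,7}$")


def is_initialism_candidate(token: str) -> bool:
    """Return True if the token looks like an initialism such as PCR or Grb2."""
    if not CANDIDATE.match(token):
        return False
    uppers = sum(ch.isupper() for ch in token)
    digits = sum(ch.isdigit() for ch in token)
    # Either at least two capitals (PCR, ACs) or a capital plus a digit (Grb2).
    return uppers >= 2 or (uppers >= 1 and digits >= 1)


if __name__ == "__main__":
    for tok in ["PCR", "TS", "TEP", "Grb2", "Estudios", "El"]:
        print(tok, is_initialism_candidate(tok))
```

A filter like this over-generates (it would accept any capitalized short string with a digit), so in practice the candidates would still need manual or dictionary-based validation.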

John Jairo GIRALDO - Pedro PATIÑO 19th European Symposium on LSP. Centre for Translation Studies, University of Vienna

July 2013, Vienna, Austria

Definition of specialized collocation

A multiword expression (MWE) composed of at least one term that serves as the base, plus its collocates, where the constituents:

•  are in direct syntactic relation to each other;

•  can be unpredictable and semi-compositional;

•  have an internal and statistical tendency of preference.
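The "statistical tendency of preference" in this definition can be quantified in several ways. The sketch below uses pointwise mutual information (PMI) as one common association measure; the talk does not say which measure, if any, was applied, so PMI, the function name and the toy data are all assumptions for illustration.

```python
import math
from collections import Counter


def pmi_scores(pairs):
    """Compute PMI for each (base, collocate) pair in a list of observed pairs.

    PMI(b, c) = log2( P(b, c) / (P(b) * P(c)) ), with probabilities estimated
    from relative frequencies over the list of pairs.
    """
    pair_counts = Counter(pairs)
    base_counts = Counter(base for base, _ in pairs)
    coll_counts = Counter(coll for _, coll in pairs)
    total = len(pairs)
    return {
        (base, coll): math.log2((n * total) / (base_counts[base] * coll_counts[coll]))
        for (base, coll), n in pair_counts.items()
    }


# Invented toy observations, NOT frequencies from the corpus described here.
toy_pairs = [
    ("DNA", "polimerasa"),
    ("DNA", "polimerasa"),
    ("DNA", "basura"),
    ("PCR", "tecnología"),
]
scores = pmi_scores(toy_pairs)
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 3))
```

PMI is known to favour rare pairs, so with real corpus counts a frequency threshold or a different measure (for example log-likelihood) may be preferable.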

Initialisms and specialized collocations

Corpus data

•  Comparable corpus of specialized texts in English and Spanish from the subject fields of economics and medicine.

Subject field    English       Spanish
Medicine         445,662       1,291,981
Economics        16,386,493    554,282

•  Data has been PoS-tagged and encoded with IMS CWB.

Corpus data (IULA)

Subject field    Spanish
Human genome     999,950
Environment      999,876

Initialisms and specialized collocations

Initialism + Noun (Spanish)

1. Human Genome

DNA basura

DNA polimerasa

DNA espaciador

2. Environment

---

Noun + Initialism (Spanish)

1. Human Genome

Estudios PCR

Tecnología PCR

2. Environment

Prueba DBO

Complejo EDTA

Análisis DPD
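Patterns such as Initialism + Noun or Noun + Initialism can be retrieved from PoS-tagged text by scanning adjacent token pairs. The following is a minimal sketch: the tag labels ("N", "ADJ", "V", "DET"), the uppercase-only initialism heuristic, the function name and the example sentence are all hypothetical and do not reproduce the tagset or queries actually used with the corpus.

```python
import re

# Assumed heuristic: two or more capitals, optionally followed by a plural "s".
INITIALISM = re.compile(r"^[A-Z]{2,}s?$")


def find_pattern(tagged, first, second):
    """Return adjacent word pairs matching two slots.

    `tagged` is a list of (word, tag) tuples. A slot is either the special
    label "INIT" (any initialism candidate) or a PoS tag; a PoS slot only
    matches words that are not themselves initialism candidates.
    """
    def fits(word, tag, slot):
        is_init = bool(INITIALISM.match(word))
        if slot == "INIT":
            return is_init
        return tag == slot and not is_init

    hits = []
    for (w1, t1), (w2, t2) in zip(tagged, tagged[1:]):
        if fits(w1, t1, first) and fits(w2, t2, second):
            hits.append((w1, w2))
    return hits


# Invented sentence built from examples on the slides, with hypothetical tags.
sentence = [
    ("La", "DET"), ("prueba", "N"), ("DBO", "N"), ("mide", "V"),
    ("el", "DET"), ("OD", "N"), ("inicial", "ADJ"),
]
noun_init = find_pattern(sentence, "N", "INIT")
init_adj = find_pattern(sentence, "INIT", "ADJ")
print(noun_init)
print(init_adj)
```

Adjacency is a simplification: it approximates "direct syntactic relation" only for contiguous constituents and would miss pairs separated by intervening words.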

Initialisms and specialized collocations

Initialism + Noun (English)

1. Medicine

Noun + Initialism (English)

1. Medicine

Initialisms and specialized collocations

Initialism + Noun in the Economics corpus

Initialisms and specialized collocations

Adj + Initialism in the Economics corpus

Initialisms and specialized collocations

Adj + Adj + Initialism in the Economics corpus

Initialisms and specialized collocations

Initialism + Adj (Spanish)

1. Human Genome

PCR arbitraria

ADN autosómico

ARN bacteriano

2. Environment

OD consumido

OD inicial

Adj + Initialism (Spanish)

1. Human Genome

----

2. Environment

----

Initialisms and specialized collocations

Initialism + Adj + N (English)

3. Medicine

Adj + Initialism (English)

3. Medicine

Initialisms and specialized collocations

Initialism + verb (Spanish)

1. Human Genome

PCR es una técnica

El PGH constituye

Los RFLP proporcionan

El VIH pertenece

2. Environment

La UE regula

la DPD produce

El ICONA presenta

Verb + Initialism (Spanish)

1. Human Genome

generar ATP

inducir ACs

sintetizar ARN mensajero

2. Environment

siguen DBO

siendo NASA el organismo

Initialisms and specialized collocations

Initialism + verb (English)

3. Medicine

4. Economics

Verb + Initialism (English)

3. Medicine

4. Economics

Initialisms and specialized collocations

Initialism + Adv (Spanish)

1. Human Genome

---

2. Environment

---

Adv + Initialism (Spanish)

1. Human Genome

ADN completamente equivocado

2. Environment

---

Concluding remarks

Due to their nominal nature, initialisms may be modified by adjectives, as the corpus data show.

The contrast between the two subject fields shows that the phenomenon is far more common in the Human Genome (HG) corpus, where 9 initialisms appear modified by an adjective, than in the Environment (Env) corpus, which has only 1 case.

Concluding remarks

These are the most common Spanish adjectives modifying initialisms in the Human Genome and Environment corpora:

humano, nuclear, bacteriano, genómico, celular, maduro, ribosómico, circular, clonado, codificado, cromosómico, eucariótico, extraño, híbrido, mensajero, minisatélite, proviral, quimérico, recombinante, repetitivo, superenrollado, unicatenario

Concluding remarks

Initialisms may combine with other grammatical categories:

In the sample from HG and Env, there are 22 cases with Adj (19 in HG and 3 in Env).

In decreasing order, the words that combine most frequently with initialisms are:

secuencia (7 occurrences), gen (5), locus/loci (6), virus and polimorfismo (4 each), marcador and genoma (3 each).

With 2 occurrences each: célula, complejo, familia, región, sistema, tecnología.

Concluding remarks

Regarding initialisms and verbs, we found 25 initialisms acting as the subject of a verb in HG and 15 in Env.

The following verbs also occur with initialisms as their object:

1) Phraseological verbs (agregar, construir, contener, desarrollar, emplear, fabricar, formar, introducir, producir, secuenciar, sintetizar, tener, transcribir);

2) Logical relationship verbs (generar, inducir, ser);

3) Discourse performative verbs (denominar, detectar, encontrar, expresar, realizar, seguir, utilizar).

Initialisms and their relations with other specialized lexical items might be useful to disambiguate the sense of an initialism, improve MT lexicons, etc.

Concluding remarks

The initialisms analyzed occur mostly as the object of the verbs “contener” and “denominar”, which appear with 4 and 3 initialisms, respectively.

The most productive collocation patterns involving initialisms are those formed with nouns, adjectives and verbs. Adverbs are less relevant.

Danke schön! Thank you!

Comments, questions, suggestions, criticism?

johnjairo.giraldo@gmail.com

pedro.patino@nhh.no
