Initialisms and specialized collocations in a LSP corpus from Environment, Human Genome, Medicine and Economics texts in Spanish and English

John Jairo GIRALDO, University of Antioquia (Colombia)

Pedro PATIÑO, FSK, NHH Norwegian School of Economics (Norway) / University of Antioquia (Colombia)

Definition of initialism

An initialism is a reduced lexical unit formed from the initial alphanumeric characters of a lexical unit with syntagmatic (multiword) structure. The resulting sequence may be pronounced letter by letter, as syllables, or both; e.g. PCR, TS, TEP, Grb2.
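The following minimal Python sketch makes the definition concrete. It assumes a strict reading in which each character of the initialism is the first alphanumeric character of one content word of the full form. The function names and the small function-word list are hypothetical and are not part of the authors' method.

```python
import unicodedata

# Hypothetical list of function words that contribute no letter
# (e.g. "de" in "demanda bioquímica de oxígeno" -> DBO). Not from the slides.
FUNCTION_WORDS = {"de", "del", "la", "el", "los", "las", "y", "of", "the", "and"}


def strip_accents(text: str) -> str:
    """Drop diacritics so that 'ácido' and 'acido' share the initial 'a'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def initials(expansion: str) -> str:
    """Return the first alphanumeric character of each content word."""
    letters = []
    for word in expansion.replace("-", " ").split():
        if word.lower() in FUNCTION_WORDS:
            continue
        first = next((ch for ch in strip_accents(word) if ch.isalnum()), "")
        letters.append(first)
    return "".join(letters)


def matches_expansion(initialism: str, expansion: str) -> bool:
    """Strict, case-insensitive check of an initialism against its full form."""
    return strip_accents(initialism).lower() == initials(expansion).lower()


if __name__ == "__main__":
    print(matches_expansion("PCR", "polymerase chain reaction"))      # True
    print(matches_expansion("DBO", "demanda bioquímica de oxígeno"))  # True
    print(matches_expansion("ADN", "ácido desoxirribonucleico"))      # False
```

Under these assumptions, a form such as ADN (ácido desoxirribonucleico) fails the test because its N comes from inside a word. Real data would therefore need a looser, subsequence-style matcher.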

John Jairo GIRALDO - Pedro PATIÑO 19th European Symposium on LSP. Centre for Translation Studies, University of Vienna

July 2013, Vienna, Austria

Definition of specialized collocation

A multiword expression (MWE) composed of at least one term that serves as the base, together with its collocates, where the constituents:

•  are in direct syntactic relation to each other;
•  can be unpredictable and semi-compositional;
•  show an internal, statistical tendency to prefer one another.
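The "statistical tendency" of collocates to prefer one another is usually measured with an association score computed from co-occurrence counts. The slides do not name the measure used. The sketch below therefore assumes pointwise mutual information (PMI), which is one common choice. The counts in the example are invented.

```python
import math


def pmi(pair_freq: int, base_freq: int, collocate_freq: int, corpus_size: int) -> float:
    """Pointwise mutual information: log2(P(base, collocate) / (P(base) * P(collocate))).

    With maximum-likelihood estimates this equals
    log2(pair_freq * corpus_size / (base_freq * collocate_freq)).
    """
    if min(pair_freq, base_freq, collocate_freq, corpus_size) <= 0:
        raise ValueError("all counts must be positive")
    return math.log2(pair_freq * corpus_size / (base_freq * collocate_freq))


if __name__ == "__main__":
    # Invented counts: base and collocate co-occur 20 times in a 1,000,000-token corpus.
    print(round(pmi(20, 100, 200, 1_000_000), 2))  # 9.97
    # Co-occurring exactly as often as chance predicts gives a score of 0.
    print(pmi(1, 1_000, 1_000, 1_000_000))  # 0.0
```

PMI is known to overrate rare pairs. For that reason, log-likelihood or similar measures are often preferred for small samples, and which measure suits these corpora remains a design choice.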

Initialisms and specialized collocations

Corpus data

•  Comparable corpus of specialized texts in English and Spanish from the subject fields of economics and medicine.

Subject field    English       Spanish
Medicine         445,662       1,291,981
Economics        16,386,493    554,282

•  The data have been PoS-tagged and encoded with IMS CWB.
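CWB indexes corpora from a "vertical" text file. The file has one token per line, with positional attributes (here word form, PoS tag and lemma) separated by tabs. Structural attributes such as sentences are written as XML-style tags on their own lines. The sketch below writes that format from tagged sentences. The tag values, the `text` element with an `id` attribute and the function name are assumptions, because the slides do not describe the tagger or the encoding setup.

```python
from typing import Iterable, List, Tuple

# (word form, PoS tag, lemma); the tag values used below are hypothetical.
Token = Tuple[str, str, str]


def to_vertical(sentences: Iterable[List[Token]], text_id: str) -> str:
    """Render tagged sentences as CWB-style vertical text: one token per line,
    tab-separated positional attributes, <text>/<s> tags on their own lines."""
    lines = [f'<text id="{text_id}">']
    for sentence in sentences:
        lines.append("<s>")
        for word, pos, lemma in sentence:
            lines.append(f"{word}\t{pos}\t{lemma}")
        lines.append("</s>")
    lines.append("</text>")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sentence = [("La", "DET", "el"), ("UE", "PROPN", "UE"), ("regula", "VERB", "regular")]
    print(to_vertical([sentence], "env01"), end="")
```

The resulting file would then be passed to the CWB encoding tools. Their exact options depend on the local setup and are not shown here.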

Corpus data (IULA)

Subject field    Spanish
Human Genome     999,950
Environment      999,876

Initialisms and specialized collocations

Initialism + Noun (Spanish)

1. Human Genome

DNA basura

DNA polimerasa

DNA espaciador

2. Environment

---

Noun + Initialism (Spanish)

1. Human Genome

Estudios PCR

Tecnología PCR

2. Environment

Prueba DBO

Complejo EDTA

Análisis DPD
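Patterns such as Noun + Initialism can be collected from the PoS-tagged data by scanning adjacent token pairs. The sketch below is hypothetical. It assumes Universal-Dependencies-style tags (NOUN, ADJ, VERB, ...) and recognizes initialisms with a surface heuristic, whereas the slides do not describe how the authors identified them.

```python
from typing import Dict, List, Tuple

Token = Tuple[str, str]  # (word form, PoS tag); UD-style tags are an assumption


def looks_like_initialism(form: str) -> bool:
    """Hypothetical heuristic: 2-8 characters, an uppercase first letter and at
    least two uppercase letters or digits (PCR, DBO, ACs, Grb2)."""
    if not 2 <= len(form) <= 8 or not form[0].isupper():
        return False
    return sum(ch.isupper() or ch.isdigit() for ch in form) >= 2


def bigram_patterns(tokens: List[Token], other_tag: str) -> Dict[str, List[str]]:
    """Collect adjacent 'Initialism + X' and 'X + Initialism' pairs, X tagged other_tag."""
    found: Dict[str, List[str]] = {"initialism_first": [], "initialism_second": []}
    for (w1, t1), (w2, t2) in zip(tokens, tokens[1:]):
        if looks_like_initialism(w1) and t2 == other_tag and not looks_like_initialism(w2):
            found["initialism_first"].append(f"{w1} {w2}")
        if t1 == other_tag and looks_like_initialism(w2) and not looks_like_initialism(w1):
            found["initialism_second"].append(f"{w1} {w2}")
    return found


if __name__ == "__main__":
    sentence = [("DNA", "PROPN"), ("polimerasa", "NOUN"), ("y", "CCONJ"),
                ("la", "DET"), ("prueba", "NOUN"), ("DBO", "PROPN")]
    print(bigram_patterns(sentence, "NOUN"))
    # {'initialism_first': ['DNA polimerasa'], 'initialism_second': ['prueba DBO']}
```

Passing "ADJ" or "VERB" as `other_tag` would yield the adjective and verb patterns of the following slides (e.g. ARN bacteriano), provided the tagger labels the words that way. Combinations with intervening words, such as ADN completamente equivocado, would need a wider window.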

Initialisms and specialized collocations

Initialism + Noun (English)

1. Medicine

Noun + Initialism (English)

1. Medicine

Initialisms and specialized collocations

Initialism + Noun in the Economics corpus

Initialisms and specialized collocations

Adj + Initialism in the Economics corpus

Initialisms and specialized collocations

Adj + Adj + Initialism in the Economics corpus

Initialisms and specialized collocations

Initialism + Adj (Spanish)

1. Human Genome

PCR arbitraria

ADN autosómico

ARN bacteriano

2. Environment

OD consumido

OD inicial

Adj + Initialism (Spanish)

1. Human Genome

----

2. Environment

----

Initialisms and specialized collocations

Initialism + Adj + N (English)

3. Medicine

Adj + Initialism (English)

3. Medicine

Initialisms and specialized collocations

Initialism + Verb (Spanish)

1. Human Genome

PCR es una técnica

El PGH constituye

Los RFLP proporcionan

El VIH pertenece

2. Environment

La UE regula

la DPD produce

El ICONA presenta

Verb + Initialism (Spanish)

1. Human Genome

generar ATP

inducir ACs

sintetizar ARN mensajero

2. Environment

siguen DBO

siendo NASA el organismo

Initialisms and specialized collocations

Initialism + Verb (English)

3. Medicine

4. Economics

Verb + Initialism (English)

3. Medicine

4. Economics

Initialisms and specialized collocations

Initialism + Adv (Spanish)

1. Human Genome

---

2. Environment

---

Adv + Initialism (Spanish)

1. Human Genome

ADN completamente equivocado

2. Environment

---

Concluding remarks

Because of their nominal nature, initialisms can be modified by adjectives, as the corpus data show.

A comparison of the two subject fields shows that the phenomenon is far more common in Human Genome (HG), where 9 initialisms are modified by an adjective, than in Environment (Env), where there is only 1 case.

Concluding remarks

The most common Spanish adjectives modifying initialisms in the Human Genome and Environment corpora are:

humano nuclear bacteriano genómico celular maduro ribosómico circular clonado codificado cromosómico

eucariótico extraño híbrido mensajero minisatélite proviral quimérico recombinante repetitivo superenrollado unicatenario

Concluding remarks

Initialisms may combine with other grammatical categories:

In the sample from HG and Env, there are 22 cases with Adj (19 in HG and 3 in Env).

In decreasing order, the words that combine most frequently with initialisms are: secuencia (7 occurrences), locus/loci (6), gen (5), virus and polimorfismo (4), marcador and genoma (3), and célula, complejo, familia, región, sistema and tecnología (2).
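A ranking like this one amounts to counting the words that co-occur with initialisms and grouping equal frequencies. The sketch below uses the counts reported on this slide for the top five frequency levels, and it reads each figure as a per-word count, which is an assumption. In practice the Counter would be filled from extracted pairs by a hypothetical extraction step.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def rank_with_ties(counts: Counter) -> List[Tuple[int, List[str]]]:
    """Group words by frequency, most frequent group first, words sorted alphabetically."""
    groups: Dict[int, List[str]] = defaultdict(list)
    for word, freq in counts.items():
        groups[freq].append(word)
    return [(freq, sorted(groups[freq])) for freq in sorted(groups, reverse=True)]


# Counts from the slide (HG and Env sample), read as per-word counts. A real
# pipeline might build this as Counter(noun for _initialism, noun in pairs).
slide_counts = Counter({"secuencia": 7, "locus/loci": 6, "gen": 5, "virus": 4,
                        "polimorfismo": 4, "marcador": 3, "genoma": 3})

if __name__ == "__main__":
    for freq, words in rank_with_ties(slide_counts):
        print(freq, ", ".join(words))
```

Grouping by frequency keeps tied words such as virus and polimorfismo together, which matches the way the slide reports them.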

Concluding remarks

As for verbs, we found 25 initialisms acting as the subject of a verb in HG and 15 in Env.

The following verbs also occur with initialisms as their object:

1) Phraseological verbs (agregar, construir, contener, desarrollar, emplear, fabricar, formar, introducir, producir, secuenciar, sintetizar, tener, transcribir);

2) Logical relationship verbs (generar, inducir, ser);

3) Discourse performative verbs (denominar, detectar, encontrar, expresar, realizar, seguir, utilizar).

The relations between initialisms and other specialized lexical items might be useful for disambiguating the sense of an initialism, improving MT (machine translation) lexicons, etc.
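One way collocational information could help disambiguate an initialism is to compare the words around it with the collocates known for each candidate sense, in the spirit of the simplified Lesk overlap method. The sketch below is hypothetical. It uses PCR because in Spanish PCR can stand for reacción en cadena de la polimerasa or for proteína C reactiva. The collocate sets are illustrative only: some come from the slides (tecnología, estudios, arbitraria) and the rest are invented.

```python
from typing import Dict, Iterable, Optional, Set

# Illustrative sense inventory for PCR; the collocate sets are not corpus results.
SENSES_PCR: Dict[str, Set[str]] = {
    "reacción en cadena de la polimerasa": {"tecnología", "estudios", "arbitraria"},
    "proteína C reactiva": {"niveles", "sérica", "elevada"},
}


def disambiguate(context: Iterable[str], senses: Dict[str, Set[str]]) -> Optional[str]:
    """Return the sense whose collocates overlap most with the context words.

    Ties go to the sense listed first; an empty inventory returns None."""
    words = {w.lower() for w in context}
    best_sense: Optional[str] = None
    best_overlap = -1
    for sense, collocates in senses.items():
        overlap = len(words & collocates)
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


if __name__ == "__main__":
    print(disambiguate("la tecnología PCR".split(), SENSES_PCR))
    # reacción en cadena de la polimerasa
    print(disambiguate("niveles de PCR elevada".split(), SENSES_PCR))
    # proteína C reactiva
```

A real system would derive the collocate sets from corpus data, for example from the patterns collected above, and would probably weight them by an association score rather than counting raw overlaps.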

Concluding remarks

The analyzed initialisms are most often the object of the verbs “contener” and “denominar”, which occur with 4 and 3 initialisms, respectively.

The most productive collocational patterns involving initialisms are those formed with nouns, adjectives and verbs; adverbs are less relevant.

Danke schön! Thank you!

Comments, questions, suggestions, criticism?

[email protected]

[email protected]
