Language Technology and AI: computational models, human languages. Jörg Tiedemann, Department of Modern Languages / Digital Humanities, University of Helsinki

Transcript of Language Technology and AI

Page 1:

Language Technology and AI

computational models

human languages

Jörg Tiedemann

Department of Modern Languages / Digital Humanities, University of Helsinki

Page 2:

Language Technology and AI

understanding

computational models

human languages

Page 3:

Language Technology and AI

understanding

meaning

computational models

human languages

Page 4:

Language Technology and AI

understanding

speaking / writing

meaning

computational models

human languages

Page 5:

Language Technology and AI

understanding

speaking / writing

meaning

computational models

human languages

human-computer interaction

communication

semantic reasoning

information mining

translation

knowledge aggregation

intelligent interactive systems

Page 6:

Found in Translation: Natural Language Understanding with cross-lingual grounding

meaning

source language

target language

understanding

speaking / writing

translations as semantic mirrors

Page 7:

Found in Translation: Natural Language Understanding with cross-lingual grounding

meaning

source language

target language

encoder

decoder

dense vector-based representation

neural MT

language data

Data-Driven Machine Translation

Hmm, every time he sees “banco”, he either types “bank” or “bench”, but if he sees “banco de”, he always types “bank”, never “bench”.

Man, this is so boring.

Translated documents

Learning Algorithm

human translations

understanding

speaking / writing

translations as semantic mirrors
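
The “banco” cartoon on this slide describes learning translation choices from counts over human translations. A minimal sketch of that idea follows. It is not the method used in the project: the observations are invented toy data, and the names OBSERVATIONS, translation_counts and translation_probs are hypothetical. It only shows how conditioning on the next source word can remove the ambiguity between “bank” and “bench”.

```python
from collections import Counter

# Invented toy data (an assumption, not from the slides): each entry is
# (source word, following source word, translation the human typed).
OBSERVATIONS = [
    ("banco", "de", "bank"),
    ("banco", "de", "bank"),
    ("banco", "de", "bank"),
    ("banco", "central", "bank"),
    ("banco", "verde", "bench"),
    ("banco", "roto", "bench"),
]


def translation_counts(observations, word, next_word=None):
    """Count typed translations of `word`, optionally only before `next_word`."""
    counts = Counter()
    for source, following, target in observations:
        if source != word:
            continue
        if next_word is not None and following != next_word:
            continue
        counts[target] += 1
    return counts


def translation_probs(observations, word, next_word=None):
    """Turn the counts into relative frequencies (empty dict if unseen)."""
    counts = translation_counts(observations, word, next_word)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {target: n / total for target, n in counts.items()}


if __name__ == "__main__":
    # Without context, both translations are observed.
    print(translation_probs(OBSERVATIONS, "banco"))
    # With the context word "de", only "bank" is observed in the toy data.
    print(translation_probs(OBSERVATIONS, "banco", "de"))
```

In the toy data, “banco” alone is split between the two translations, while “banco de” is always translated as “bank”, which mirrors the observation in the cartoon. Real systems learn such regularities from far larger collections of translated documents.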

Page 8:

Found in Translation: Natural Language Understanding with cross-lingual grounding

meaning

Learning Algorithm

encoder

decoder

neural MT

language data

Data-Driven Machine Translation

Hmm, every time he sees “banco”, he either types “bank” or “bench”, but if he sees “banco de”, he always types “bank”, never “bench”.

Man, this is so boring.

Translated documents

human translations

add significant linguistic diversity

dense vector-based representation

massively parallel corpora

source and target languages

Page 9:

Found in Translation: Natural Language Understanding with cross-lingual grounding

continuous semantic space

meaning

Learning Algorithm

neural MT

language data

Data-Driven Machine Translation

Hmm, every time he sees “banco”, he either types “bank” or “bench”, but if he sees “banco de”, he always types “bank”, never “bench”.

Man, this is so boring.

Translated documents

human translations

variation enforces stronger abstraction

meaning grounded in translation

source and target languages

massively parallel corpora

Page 10:

Found in Translation: Natural Language Understanding with cross-lingual grounding

continuous semantic space

meaning

Learning Algorithm

neural MT

language data

Data-Driven Machine Translation

Hmm, every time he sees “banco”, he either types “bank” or “bench”, but if he sees “banco de”, he always types “bank”, never “bench”.

Man, this is so boring.

Translated documents

human translations

variation enforces stronger abstraction

meaning grounded in translation

source and target languages

massively parallel corpora

multilingual MT

semantic reasoning

emerging representations

Page 11:

Found in Translation: Natural Language Understanding with cross-lingual grounding

Goals:

http://blogs.helsinki.fi/language-technology/

Page 12:

Found in Translation: Natural Language Understanding with cross-lingual grounding

Goals:

multilingual machine translation

http://blogs.helsinki.fi/language-technology/

Page 13:

Found in Translation: Natural Language Understanding with cross-lingual grounding

Goals:

learning and interpreting semantic sentence representations

multilingual machine translation

http://blogs.helsinki.fi/language-technology/

Page 14:

Found in Translation: Natural Language Understanding with cross-lingual grounding

Goals:

learning and interpreting semantic sentence representations

multilingual machine translation

abstract continuous meaning spaces of language

http://blogs.helsinki.fi/language-technology/

Page 15:

Found in Translation: Natural Language Understanding with cross-lingual grounding

Goals:

learning and interpreting semantic sentence representations

multilingual machine translation

human-like language understanding

abstract continuous meaning spaces of language

http://blogs.helsinki.fi/language-technology/