Timo Honkela: Digital Preservation and Computational Modeling of Language and Culture: Some...



A presentation in the symposium “Interfaces between Language, Literature and Culture: Research at Department of Modern Languages” at the University of Helsinki, 19th of May, 2014


Page 1: Timo Honkela: Digital Preservation and Computational Modeling of Language and Culture: Some Philosophical and Empirical Aspects

Timo Honkela

27 May 2014

Digital Preservation and Computational Modeling of Language and Culture:

Some Philosophical and Empirical Aspects


Symposium “Interfaces between Language, Literature and Culture:

Research at Department of Modern Languages”

Page 2

Background

Page 3

Natural language database interface with dependency-based compositional semantics

● H. Jäppinen, T. Honkela, H. Hyötyniemi & A. Lehtola (1988): A Multilevel Natural Language Processing Model. Nordic Journal of Linguistics 11:69-87.

What is the turnover of the ten largest stock exchange companies in forestry?

Morphological analysis

Dependency parsing

Logical analysis

Database query formation

Result from the SQL database
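The stages listed on this slide can be read as a pipeline from a question to a database result. The following is a minimal sketch under strong assumptions, not the system described by Jäppinen et al. (1988): every function name, the tiny lemma lexicon, the hand-written dependency rules, the `companies` table and its rows are hypothetical stand-ins, and "largest" is assumed to mean largest by turnover.

```python
import sqlite3

# Hypothetical toy lexicon mapping inflected forms to lemmas.
LEMMAS = {"companies": "company", "largest": "large", "is": "be"}

def morphological_analysis(question):
    """Lowercase, split on whitespace and map word forms to lemmas."""
    tokens = question.rstrip("?").lower().split()
    return [LEMMAS.get(token, token) for token in tokens]

def dependency_parsing(lemmas):
    """Return (head, dependent) pairs from hand-written toy rules."""
    deps = []
    if "turnover" in lemmas and "company" in lemmas:
        deps.append(("turnover", "company"))
    for modifier in ("large", "ten", "forestry"):
        if "company" in lemmas and modifier in lemmas:
            deps.append(("company", modifier))
    return deps

def logical_analysis(deps):
    """Turn dependency pairs into a small frame describing the request."""
    frame = {"select": None, "sector": None, "limit": None, "order_by": None}
    for head, dependent in deps:
        if head == "turnover":
            frame["select"] = "turnover"
        elif dependent == "large":
            frame["order_by"] = "turnover"  # assumption: largest by turnover
        elif dependent == "ten":
            frame["limit"] = 10
        elif dependent == "forestry":
            frame["sector"] = "forestry"
    return frame

def database_query_formation(frame):
    """Build an SQL string and its parameters from the frame."""
    sql = f"SELECT name, {frame['select']} FROM companies WHERE sector = ?"
    if frame["order_by"]:
        sql += f" ORDER BY {frame['order_by']} DESC"
    if frame["limit"]:
        sql += f" LIMIT {frame['limit']}"
    return sql, (frame["sector"],)

def demo_connection():
    """In-memory database with invented rows, for illustration only."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE companies (name TEXT, sector TEXT, turnover REAL)")
    conn.executemany(
        "INSERT INTO companies VALUES (?, ?, ?)",
        [("A", "forestry", 5.0), ("B", "forestry", 9.0), ("C", "banking", 7.0)],
    )
    return conn

def answer(question, connection):
    """Run all stages and return the rows from the SQL database."""
    lemmas = morphological_analysis(question)
    frame = logical_analysis(dependency_parsing(lemmas))
    sql, params = database_query_formation(frame)
    return connection.execute(sql, params).fetchall()

QUESTION = "What is the turnover of the ten largest stock exchange companies in forestry?"
print(answer(QUESTION, demo_connection()))
```

Each stage only consumes the output of the previous one, which is the point of the multilevel layout; a real system would replace the toy rules with proper morphological and syntactic analysis.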

Page 4

Classical example: Learning meaning from context:

Maps of words in Grimm fairy tales

Honkela, Pulkki & Kohonen 1995
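The idea of learning meaning from context can be illustrated with plain co-occurrence counts. This is a minimal sketch, assuming a window of one neighbouring word on each side and invented example sentences (they are not from the Grimm corpus). The cited work organized word contexts with a self-organizing map; the sketch stops at the context vectors and compares them with cosine similarity instead. The names `context_vectors` and `cosine` are illustrative.

```python
import math
from collections import Counter, defaultdict

def context_vectors(sentences, window=1):
    """For each word, count the words occurring within `window` positions."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for i, word in enumerate(words):
            start = max(0, i - window)
            stop = min(len(words), i + window + 1)
            for j in range(start, stop):
                if j != i:
                    vectors[word][words[j]] += 1
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[key] for key, count in a.items() if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Invented toy sentences, used only to make the example runnable.
SENTENCES = [
    "the king rode away",
    "the queen rode away",
    "the king slept well",
    "the queen slept well",
    "a wolf ate grandmother",
]

vectors = context_vectors(SENTENCES)
print(cosine(vectors["king"], vectors["queen"]))  # shared contexts
print(cosine(vectors["king"], vectors["wolf"]))   # no shared contexts
```

Words that appear in the same contexts end up with similar vectors, so a map built on such vectors places them close to each other.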

Page 5

Map of Finnish Science

Chemistry

Physics and engineering

Biosciences

Medicine

Culture and society

Page 6

WordICA

Timo Honkela, Aapo Hyvärinen, and Jaakko Väyrynen. WordICA - Emergence of linguistic representations for words by independent component analysis. Natural Language Engineering, 16(3):277–308, 2010.

Jaakko J. Väyrynen, Lasse Lindqvist, and Timo Honkela. Sparse distributed representations for words with thresholded independent component analysis. In Proceedings of IJCNN'07, pages 1031–1036, 2007.

Page 7

Learning taxonomies

Mari-Sanna Paukkeri, Alberto Pérez García-Plaza, Víctor Fresno, Raquel Martínez Unanue and Timo Honkela (2012). Learning a taxonomy from a set of text documents. Applied Soft Computing, 12(3), pp. 1138–1148.

Page 8

Central Interests:

Contextuality and Subjectivity

Page 9

Meaning is contextual

red wine, red skin, red shirt

Gärdenfors: Conceptual Spaces

Hardin: Color for Philosophers

Page 10

Meaning is contextual

SNOW -WHITE?

WHITE

Page 11

Meaning is contextual

● “Small”, “big”
● “White house”
● “Get”
● “Every” - “Every Swede is tall/blond”
● etc. etc.

Another comment:

Strict compositionality cannot be assumed

Fuzziness
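One way to make the contextuality and fuzziness of words such as “small” and “big” concrete is a graded membership function whose scale is set by the context. This is a minimal sketch, not a model proposed in the talk: the function names, the linear ramp between the smallest and largest size in a context, and the example sizes are all assumptions made for illustration.

```python
def bigness(size, context_sizes):
    """Degree (0..1) to which `size` counts as "big" within a context.

    The scale runs linearly from the smallest to the largest size seen in
    the context, so the same absolute size can be big in one context and
    small in another. Values outside the range are clamped.
    """
    lowest, highest = min(context_sizes), max(context_sizes)
    if highest == lowest:
        return 0.5  # no spread in the context: undecided
    degree = (size - lowest) / (highest - lowest)
    return max(0.0, min(1.0, degree))

def smallness(size, context_sizes):
    """Complement of `bigness` in the same context."""
    return 1.0 - bigness(size, context_sizes)

# Invented example sizes in centimetres, for illustration only.
MICE_CM = [6, 7, 8, 9, 10]
ELEPHANTS_CM = [200, 250, 300, 350, 400]

print(bigness(8, MICE_CM))          # a medium-sized mouse
print(bigness(100, MICE_CM))        # enormous for a mouse
print(bigness(100, ELEPHANTS_CM))   # tiny for an elephant
```

The same number, 100 cm, receives membership 1.0 in one context and 0.0 in the other, which is the kind of context dependence the slide points at.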

Page 12

Meaning is subjective

Page 13

Meaning is subjective

● Good
● Fair
● Useful
● Scientific
● Democratic
● Sustainable
● etc.

A proper theory of meaning has to take this into account

Page 14

Timo Honkela, Ville Könönen, Tiina Lindh-Knuutila, and Mari-Sanna Paukkeri. Simulating processes of concept formation and communication. Journal of Economic Methodology, 15(3):245–259, 2008.

Intermediate conclusion

● Languages, including formal languages, should be considered as tools for coordination, storing and sharing knowledge in a compressed form – approximate and relative to the point of view taken

● Constructing a language or symbol system is an investment, and spreading the language into use in a community is an even larger one

Page 15

Digital Humanities

Page 16

Digital humanities

● Research within the humanities with the help of computers
– Digital resources
– Computational models

● Basic motivation
– One can already fly to the moon and build sophisticated factory products
– The most important open questions in the world are related to the humanities and social sciences

Page 17

Digital Computational

Humanities

Content storage and transfer

Content analysis

Page 18

● Heinz von Foerster in “Responsibilities of Competence” (1972): “The hard sciences are successful because they deal with the soft problems; the soft sciences are struggling because they deal with the hard problems”

Page 19

Fields of science ordered by the relative share of English-language parts in the applications (*)

Mathematics 95.3
Pharmacy 94.1
Chemistry 93.7
Physics 93.4
Biochemistry, molecular biology, microbiology, genetics and biotechnology 93.4
Cell and developmental biology, physiology and ecophysiology 93.4
Computer science 93.0
Electrical engineering and electronics 92.8
Environmental engineering 92.7
Geosciences 92.1
Ecology, evolutionary research and systematics 92.1
Mechanical and manufacturing engineering 91.9
Forest sciences 91.4
Space sciences and astronomy 91.0
Process and materials engineering 90.8
Statistics 90.7
Other research on the environment and natural resources 90.1
Clinical medicine 89.6
Ecotoxicology, state of the environment and environmental impacts 89.5
Nutrition science 89.3
Psychology 89.0
Sport sciences 88.9
Nursing science 88.9
Veterinary medicine 88.5
Public health 88.1
Linguistics 87.6
Philosophy 87.3
Business economics, economic geography and industrial management 87.2
Dentistry 86.7
Economics 86.3
Construction and municipal engineering 85.9
Agricultural and food sciences 85.4
Environmental policy, economics and law 85.3
Geography 84.8
Architecture and industrial design 83.7
Communication and information sciences 83.1
Education 82.6
Political science and administrative science 82.2
Art research 81.6
Social sciences 80.4
Cultural studies 79.3
History and archaeology 78.1
Theology 77.0
Law 70.8

(*) In the corpus of applications submitted to the Academy of Finland

Page 20

(The table of fields of science from page 19 is shown again on this slide.)

Page 21

Accessing and analyzing digital resources

Page 22

Archives

Libraries

Universities

Citizens

Researchers

Media

DIGITAL RESOURCES

Museums

Teachers

Artists

Companies

Societies

Municipalities

State

Decision makers

Journalists

Information specialists

Page 23

Texts

Images

Videos

Computational models

Numerical data

DIGITAL RESOURCES

Speeches/conversations

Multimedia documents

Interactive systems

Computer software

Page 24

Resource metadata

DIGITAL RESOURCES

Page 25

Resources

Content and information professionals

Users of the contents (professionals and lay people)

Machine learning and pattern recognition systems

Formal metadata

Language technology resources and systems

Other forms of description

Page 26

Resources

Users of the contents (professionals and lay people)

Other forms of description

Crowdsourcing

Importance of openness

Page 27

Resources

Machine learning and pattern recognition systems

Formal metadata

Other forms of description

Classification

Clustering

Importance of the availability of data

Page 28

Challenge:

A tension between

the usability and standardization of content descriptions

and

richness and evolution of language and its interpretation, genre and style variation, and contextuality, subjectivity and cultural dependence

Page 29

Computational Methods and Tools

Page 30

Mainframe computers
Personal computers
Internet
Multimedia
Virtual reality
World Wide Web
Social media
MOOCs
Mobile devices
Cloud services
Games and gamification
3D printing
Big Data
Pattern recognition
Statistical machine learning
Robotics

Page 31

...
Statistics
Information theory
Probability theory
Dynamical systems theory
...

Page 32

Implications of machine learning

● Machines are no longer simply doing what they are programmed to do

● Machine learning algorithms are programs in the traditional sense, but they enable evolving “behaviors” of the system based on the “experience” that the system gathers after having been programmed

● This makes it possible for the systems to have a certain level of “conceptual autonomy”: they build their view on some phenomena based on the data/texts/etc. that are given to them
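The point that a fixed program can show evolving behavior may be easier to see in code. This is a minimal sketch, assuming a nearest-centroid classifier as a stand-in for "a machine learning algorithm"; the class name, the one-dimensional inputs and the labels are invented for illustration. Two instances run exactly the same code but are given different data, and they end up answering the same question differently.

```python
class NearestCentroid:
    """A fixed program whose answers depend on the examples it has seen."""

    def __init__(self):
        self.sums = {}    # label -> per-dimension sum of seen vectors
        self.counts = {}  # label -> number of seen vectors

    def learn(self, vector, label):
        """Update the running mean (centroid) of the given label."""
        if label not in self.sums:
            self.sums[label] = [0.0] * len(vector)
            self.counts[label] = 0
        self.sums[label] = [s + x for s, x in zip(self.sums[label], vector)]
        self.counts[label] += 1

    def predict(self, vector):
        """Return the label whose centroid is closest to `vector`."""
        def squared_distance(label):
            n = self.counts[label]
            return sum((x - s / n) ** 2
                       for x, s in zip(vector, self.sums[label]))
        return min(self.sums, key=squared_distance)

# Same code, different "experience".
learner_a = NearestCentroid()
learner_a.learn((1,), "small")
learner_a.learn((10,), "big")

learner_b = NearestCentroid()
learner_b.learn((1,), "small")
learner_b.learn((6,), "big")

print(learner_a.predict((5,)))  # closer to the centroid of "small"
print(learner_b.predict((5,)))  # closer to the centroid of "big"
```

Neither answer was written into the program; each follows from the data the instance was given, which is a very small version of the "conceptual autonomy" mentioned above.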

Page 33

Theories

Data

Models Hypotheses

Page 34

Conceptual systems

Page 35

Melissa Bowerman

Max Planck Institute for Psycholinguistics

Space under Construction

Language-Specific Spatial Categorization In First Language Acquisition

Lund University Cognitive Science, 2003

Page 36

DUTCH

IN, OP, AAN

Page 37

Categorization of 'opening' in English and Korean.

English OPEN: open box, open door, open bag, open envelope, open mouth, open clamshell, open pair of shutters, open latched drawer, open hand, open book, eyes open, open fan

Korean verbs in the figure:
YELTA 'remove barrier to interior space'
TTUTA 'tear away from base'
PELLITA 'separate two parts symmetrically'
PPAYTA 'unfit'
TTUTA 'rise'
PHYELCHITA 'spread out flat thing'

Other actions in the figure: take off wallpaper, unwrap package, spread legs apart, take off ring, take cassette out of case, sun rises, spread blanket out, peacock spreads tail

Page 38

(Pye 1995, 1996)

Objects: PLATE, STICK, ROPE, CLOTHES

ENGLISH: break; tear, rip

MANDARIN: può; duàn (long rigid thing)

K’ICHE’ MAYAN: -paxi:j (rock, glass, clay thing); -q’upi:j (other hard thing); -tóqopi’j (long, flexible thing); rach’aqij (“tear”)

http://www.mpi.nl/people/bowerman-melissa

http://www.mpi.nl/people/bowerman-melissa/publications

Page 39

Processing multimodal information

Page 40

Acknowledgements:

Finnish Broadcasting Company (YLE)

An example of automatic multimedia content analysis

Jorma Laaksonen: users.ics.aalto.fi/jorma/, scholar.google.com/citations?user=suHzeyIAAAAJ&hl=en

Mikko Kurimo: users.ics.aalto.fi/mikkok/, elec.aalto.fi/en/about/careers/professors/mikko_kurimo/

Page 41

Speaker recognition

Video analysis / scene classification

Speech recognition (speech to text)

Page 42

Video analysis / scene classification

Speaker recognition

Speech recognition (speech to text)

OCR

Page 43

Movement verbs

Page 44

David Bailey's thesis (1997):

Verbs related to hand movement

Page 45

Point of view from cognitive linguistics

● The meaning of linguistic symbols in the mind of the language users derives from the users' sensory perceptions, their actions with the world and with each other.

● For example: the meaning of the word 'walk' involves
– what walking looks like
– what it feels like to walk and after having walked
– how the world looks when walking (e.g. objects approach at a certain speed, etc.)
– ...

Page 46

Abstract vs concrete grounding

Ronald Langacker

Page 47

Multimodally Grounded Language Technology

A project funded by the Academy of Finland, 2011-2014

Timo Honkela as the Principal Investigator

A collaboration between the departments of

* Information and Computer Science, and

* Media Technology

Page 48

Consider how different languages divide the conceptual space in different ways (cf. e.g. Melissa Bowerman et al.)

Förger & Honkela 2013

Page 49

Analysis of subjectivity

Page 50

GICA: Grounded Intersubjective Concept Analysis

Page 51

Analysis of “health” in the State of the Union addresses

Subjects on objects in contexts: Using GICA method to quantify epistemological subjectivity. Timo Honkela, Juha Raitio, Krista Lagus, Ilari T. Nieminen, Nina Honkela, and Mika Pantzar. Proc. of IJCNN 2012.
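The phrase “subjects on objects in contexts” suggests data with three indices. The following is a minimal sketch of that layout only, not the published GICA procedure: it assumes a score for every (subject, object, context) triple, unfolds the three-way data into one context vector per subject-object pair, and compares subjects by Euclidean distance. The subject names, context words, scores and function names are all invented for illustration.

```python
import math

# Hypothetical scores: RATINGS[subject][obj][context] says how strongly
# the subject associates the object (a word) with the context.
RATINGS = {
    "subject_1": {"health": {"care": 0.9, "economy": 0.1, "exercise": 0.2}},
    "subject_2": {"health": {"care": 0.2, "economy": 0.8, "exercise": 0.1}},
    "subject_3": {"health": {"care": 0.8, "economy": 0.2, "exercise": 0.3}},
}
CONTEXTS = ["care", "economy", "exercise"]

def unfold(ratings, contexts):
    """Flatten the 3-way data into {(subject, obj): context vector}."""
    rows = {}
    for subject, objects in ratings.items():
        for obj, by_context in objects.items():
            rows[(subject, obj)] = [by_context.get(c, 0.0) for c in contexts]
    return rows

def distance(u, v):
    """Euclidean distance between two equally long vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def closest_subject(rows, subject, obj):
    """Subject whose use of `obj` is nearest to that of `subject`."""
    target = rows[(subject, obj)]
    candidates = [
        (distance(target, vector), other)
        for (other, other_obj), vector in rows.items()
        if other_obj == obj and other != subject
    ]
    return min(candidates)[1]

rows = unfold(RATINGS, CONTEXTS)
print(closest_subject(rows, "subject_1", "health"))
```

Keeping the subject as an explicit index is what lets differences between individuals show up at all; averaging over subjects would collapse the three rows for “health” into one.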

Page 52

Thank you for your attention!