Timo Honkela: Digital Preservation and Computational Modeling of Language and Culture: Some...
Timo Honkela
27 May 2014
Digital Preservation and Computational Modeling of Language and Culture:
Some Philosophical and Empirical Aspects
Symposium “Interfaces between Language, Literature and Culture:
Research at Department of Modern Languages”
Background
Natural language database interface with dependency-based compositional semantics
● H. Jäppinen, T. Honkela, H. Hyötyniemi & A. Lehtola (1988): A Multilevel Natural Language Processing Model. Nordic Journal of Linguistics 11:69-87.
What is the turnover of the ten largest stock exchange companies in forestry?
Morphological analysis
Dependency parsing
Logical analysis
Database query formation
Result from the SQL database
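The slide names the stages of the 1988 system but not their internals. The sketch below is a deliberately tiny, hypothetical stand-in for those stages, not the original system. A dictionary lookup plays the role of morphological analysis. Fixed slot-filling rules replace dependency parsing and logical analysis. The resulting query runs against an in-memory SQLite table with invented rows. The names `LEXICON`, `logical_form` and `to_sql`, the table schema, and the choice to rank "largest" by a `size` column are all illustrative assumptions. The "stock exchange" condition is ignored for brevity.

```python
import sqlite3

# Hypothetical toy lexicon standing in for morphological analysis:
# surface form -> (lemma, part-of-speech tag).
LEXICON = {
    "what": ("what", "PRON"), "is": ("be", "VERB"), "the": ("the", "DET"),
    "turnover": ("turnover", "NOUN"), "of": ("of", "ADP"),
    "ten": ("10", "NUM"), "largest": ("large", "ADJ_SUP"),
    "stock": ("stock", "NOUN"), "exchange": ("exchange", "NOUN"),
    "companies": ("company", "NOUN"), "in": ("in", "ADP"),
    "forestry": ("forestry", "NOUN"),
}
ALLOWED_COLUMNS = {"turnover"}  # whitelist for the attribute placed in the SQL


def analyze(question):
    """Morphological analysis: map each word to (lemma, tag) via the lexicon."""
    words = question.lower().rstrip("?").split()
    return [LEXICON[w] for w in words]


def logical_form(analysis):
    """Fixed slot-filling rules standing in for dependency parsing and
    logical analysis (the 'stock exchange' condition is ignored)."""
    lemmas = [lemma for lemma, _ in analysis]
    tags = [tag for _, tag in analysis]
    last_in = max(i for i, lemma in enumerate(lemmas) if lemma == "in")
    return {
        "attribute": lemmas[tags.index("NOUN")],  # first noun: "turnover"
        "limit": int(lemmas[tags.index("NUM")]),  # "ten" -> 10
        "rank_by_size": "ADJ_SUP" in tags,        # "largest"
        "sector": lemmas[last_in + 1],            # noun after the last "in"
    }


def to_sql(form):
    """Database query formation: a parameterized SQL statement."""
    if form["attribute"] not in ALLOWED_COLUMNS:
        raise ValueError(f"unknown attribute: {form['attribute']}")
    order = " ORDER BY size DESC" if form["rank_by_size"] else ""
    sql = (f"SELECT name, {form['attribute']} FROM companies "
           f"WHERE sector = ?{order} LIMIT ?")
    return sql, (form["sector"], form["limit"])


def demo_database():
    """In-memory table with invented rows (not real company data)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE companies "
                "(name TEXT, sector TEXT, size INTEGER, turnover INTEGER)")
    rows = [(f"Forest Co {i}", "forestry", i, 100 * i) for i in range(1, 13)]
    rows.append(("Metal Co", "metal", 99, 5000))
    con.executemany("INSERT INTO companies VALUES (?, ?, ?, ?)", rows)
    return con


question = ("What is the turnover of the ten largest stock exchange "
            "companies in forestry?")
sql, params = to_sql(logical_form(analyze(question)))
result = demo_database().execute(sql, params).fetchall()
```

The sector and limit are passed as SQL parameters. The column name cannot be parameterized, so it is checked against a whitelist instead.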
Classical example: Learning meaning from context:
Maps of words in Grimm fairy tales
Honkela, Pulkki & Kohonen 1995
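Honkela, Pulkki and Kohonen (1995) organized words on a self-organizing map (SOM) according to the contexts in which they occur. The following is a minimal pure-Python sketch of that general idea. It makes several assumptions: a tiny invented corpus in place of the Grimm tales, context vectors built only from the immediate left and right neighbours, and a 3 × 3 map with a simple decaying learning schedule. None of these settings is taken from the original study.

```python
import math
import random
from collections import Counter, defaultdict

# Tiny invented corpus (not the Grimm tales); "." marks sentence ends.
TEXT = ("the king walked to the castle . the queen walked to the castle . "
        "the king saw a wolf . the queen saw a wolf . "
        "a wolf ran into the forest . a fox ran into the forest .")


def context_vectors(tokens, targets):
    """Represent each target word by counts of its left and right neighbours,
    normalized to unit length."""
    vocab = sorted(set(tokens))
    left, right = defaultdict(Counter), defaultdict(Counter)
    for i, word in enumerate(tokens):
        if word not in targets:
            continue
        if i > 0:
            left[word][tokens[i - 1]] += 1
        if i + 1 < len(tokens):
            right[word][tokens[i + 1]] += 1
    vectors = {}
    for word in targets:
        v = [left[word][c] for c in vocab] + [right[word][c] for c in vocab]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vectors[word] = [x / norm for x in v]
    return vectors


def best_unit(units, x):
    """Best-matching unit: the map node whose weight vector is closest to x."""
    return min(units, key=lambda pos: sum((w - xi) ** 2
                                          for w, xi in zip(units[pos], x)))


def train_som(data, rows=3, cols=3, epochs=200, seed=0):
    """Online SOM training with a shrinking learning rate and a Gaussian
    neighbourhood around the best-matching unit."""
    rng = random.Random(seed)
    dim = len(data[0])
    units = {(r, c): [rng.random() for _ in range(dim)]
             for r in range(rows) for c in range(cols)}
    for t in range(epochs):
        frac = t / epochs
        rate = 0.5 * (1.0 - frac)
        radius = max(0.5, 1.5 * (1.0 - frac))
        for x in rng.sample(data, len(data)):
            bmu = best_unit(units, x)
            for pos, weights in units.items():
                d2 = (pos[0] - bmu[0]) ** 2 + (pos[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2.0 * radius ** 2))
                for k in range(dim):
                    weights[k] += rate * h * (x[k] - weights[k])
    return units


tokens = TEXT.split()
targets = ["king", "queen", "wolf", "fox", "castle", "forest"]
vectors = context_vectors(tokens, targets)
som = train_som([vectors[w] for w in targets])
placement = {w: best_unit(som, vectors[w]) for w in targets}
```

Words that occur in the same contexts, such as "king" and "queen" here, receive identical context vectors and therefore land on the same map node. Nothing in the code tells the map what the words mean.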
Map of Finnish Science
Regions of the map: Chemistry, Physics and engineering, Biosciences, Medicine, Culture and society
WordICA
Timo Honkela, Aapo Hyvärinen, and Jaakko Väyrynen. WordICA - Emergence of linguistic representations for words by independent component analysis. Natural Language Engineering, 16(3):277–308, 2010.
Jaakko J. Väyrynen, Lasse Lindqvist, and Timo Honkela. Sparse distributed representations for words with thresholded independent component analysis. In Proceedings of IJCNN'07, pages 1031–1036, 2007.
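In WordICA, independent component analysis produces a word-by-component matrix. Väyrynen et al. (2007) then threshold this matrix so that each word is described by only a few active features. The sketch below shows only a thresholding step of this kind, under stated assumptions. The component values and their interpretive labels are invented. The ICA itself is not run. A single absolute-value threshold is used, which may differ from the exact scheme in the paper.

```python
# Invented word-by-component values standing in for ICA output.
COMPONENT_LABELS = ["noun-like", "verb-like", "animal", "place"]
ICA_OUTPUT = {
    "dog":    [0.90, -0.10, 0.80, 0.05],
    "run":    [0.10, 0.95, 0.20, -0.05],
    "forest": [0.85, 0.00, 0.10, 0.70],
}


def threshold_representation(components, threshold):
    """Zero out weak component activations so each word keeps a few features."""
    return {word: [v if abs(v) >= threshold else 0.0 for v in row]
            for word, row in components.items()}


def active_features(row, labels):
    """Labels of the components that survived thresholding."""
    return [label for v, label in zip(row, labels) if v != 0.0]


def sparsity(representation):
    """Fraction of zero entries across all words."""
    values = [v for row in representation.values() for v in row]
    return sum(v == 0.0 for v in values) / len(values)


sparse = threshold_representation(ICA_OUTPUT, threshold=0.5)
for word, row in sparse.items():
    print(word, active_features(row, COMPONENT_LABELS))
```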
Learning taxonomies
Mari-Sanna Paukkeri, Alberto Pérez García-Plaza, Víctor Fresno, Raquel Martínez Unanue and Timo Honkela (2012). Learning a taxonomy from a set of text documents. Applied Soft Computing, 12(3), pp. 1138--1148.
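The slide does not describe how Paukkeri et al. (2012) build their taxonomy. As a generic illustration only, and not the paper's method, the sketch below applies average-linkage agglomerative clustering to terms represented by their document occurrence vectors, which yields a binary hierarchy of terms. The term vectors are invented. The internal nodes of the resulting tree are unlabeled.

```python
import math

# Invented term-by-document occurrence vectors (5 documents).
TERM_VECTORS = {
    "cat":   [1, 1, 0, 0, 0],
    "dog":   [1, 1, 1, 0, 0],
    "car":   [0, 0, 0, 1, 1],
    "truck": [0, 0, 0, 1, 1],
}


def cosine_distance(u, v):
    """1 - cosine similarity; terms that never co-occur get distance 1."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0


def agglomerate(vectors):
    """Average-linkage agglomerative clustering; returns a nested-tuple tree."""
    clusters = [(term, [term]) for term in vectors]  # (subtree, member terms)
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                members_i, members_j = clusters[i][1], clusters[j][1]
                d = sum(cosine_distance(vectors[a], vectors[b])
                        for a in members_i for b in members_j)
                d /= len(members_i) * len(members_j)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = ((clusters[i][0], clusters[j][0]),
                  clusters[i][1] + clusters[j][1])
        del clusters[j]  # delete the higher index first
        del clusters[i]
        clusters.append(merged)
    return clusters[0][0]


taxonomy = agglomerate(TERM_VECTORS)
```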
Central interests: contextuality and subjectivity
Meaning is contextual
red wine, red skin, red shirt
Gärdenfors: Conceptual Spaces
Hardin: Color for Philosophers
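In Gärdenfors's conceptual spaces, a concept corresponds to a region of a geometric space. Nearest-prototype categorization divides such a space into Voronoi cells. The sketch below shows how the noun context can shift what counts as "red". It assumes invented RGB prototypes and a different contrast set of colour terms for each context. The same colour stimulus then falls in the "red" cell among wine colours but in the "purple" cell among shirt colours.

```python
import math

# Invented RGB prototypes; each context has its own contrast set of colour terms.
PROTOTYPES = {
    "shirt": {
        "red": (220, 20, 30),
        "purple": (110, 30, 120),
        "pink": (250, 170, 190),
        "white": (250, 250, 250),
    },
    "wine": {
        "red": (110, 20, 40),
        "white": (240, 230, 150),
        "rosé": (240, 140, 150),
    },
}


def categorize(rgb, context):
    """Nearest-prototype (Voronoi) categorization within the given context."""
    prototypes = PROTOTYPES[context]
    return min(prototypes, key=lambda term: math.dist(rgb, prototypes[term]))


stimulus = (100, 25, 60)  # a dark, purplish red
print(categorize(stimulus, "wine"), categorize(stimulus, "shirt"))
```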
Meaning is contextual
SNOW-WHITE?
WHITE
Meaning is contextual
● “Small”, “big”
● “White house”
● “Get”
● “Every” - “Every Swede is tall/blond”
● etc. etc.
Another comment:
Strict compositionality cannot be assumed
Fuzziness
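One way to make the fuzziness of "small" concrete is a graded membership function whose reference point depends on the comparison class named by the noun. The sketch below is minimal and rests on three assumptions: a logistic membership function over z-scores, invented class statistics (mean and standard deviation of body mass), and "big" treated as the fuzzy complement of "small". It illustrates why "small elephant" cannot be computed by intersecting a fixed set of small things with the set of elephants.

```python
import math

# Invented comparison-class statistics: (mean, standard deviation) of mass in kg.
CLASS_STATS = {
    "mouse": (0.02, 0.005),
    "elephant": (5000.0, 1000.0),
}


def small(mass, noun, steepness=3.0):
    """Graded membership in 'small' relative to the class named by the noun:
    1 / (1 + exp(steepness * z)), where z is the z-score of the mass."""
    mean, sd = CLASS_STATS[noun]
    t = steepness * (mass - mean) / sd
    if t > 0:  # numerically stable form, avoids overflow for large t
        e = math.exp(-t)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(t))


def big(mass, noun, steepness=3.0):
    """'Big' as the fuzzy complement of 'small' (a simplifying assumption)."""
    return 1.0 - small(mass, noun, steepness)


print(small(3000.0, "elephant"), small(3000.0, "mouse"))
```

A 3000 kg elephant comes out as "small" with membership close to 1, while the same mass is not small at all relative to mice.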
Meaning is subjective
● Good
● Fair
● Useful
● Scientific
● Democratic
● Sustainable
● etc.
A proper theory of meaning has to take this into account
Timo Honkela, Ville Könönen, Tiina Lindh-Knuutila, and Mari-Sanna Paukkeri. Simulating processes of concept formation and communication. Journal of Economic Methodology, 15(3):245–259, 2008.
Intermediate conclusion
● Languages, including formal languages, should be considered as tools for coordinating, storing and sharing knowledge in a compressed form, approximate and relative to the point of view taken
● Constructing a language or symbol system is an investment, and spreading the language into use in a community is an even larger one
Digital humanities
● Research within the humanities with the help of computers
– Digital resources
– Computational models
● Basic motivation
– One can already fly to the moon and build sophisticated factory products
– The most important open questions in the world are related to the humanities and social sciences
Digital humanities: content storage and transfer
Computational humanities: content analysis
● Heinz von Foerster in “Responsibilities of Competence” (1972): “The hard sciences are successful because they deal with the soft problems; the soft sciences are struggling because they deal with the hard problems”
Fields of science ordered by the relative share of English-language content in applications (*)
Mathematics 95.3
Pharmacy 94.1
Chemistry 93.7
Physics 93.4
Biochemistry, molecular biology, microbiology, genetics and biotechnology 93.4
Cell and developmental biology, physiology and ecophysiology 93.4
Computer sciences 93.0
Electrical engineering and electronics 92.8
Environmental engineering 92.7
Geosciences 92.1
Ecology, evolutionary biology and systematics 92.1
Mechanical and manufacturing engineering 91.9
Forest sciences 91.4
Space sciences and astronomy 91.0
Process and materials engineering 90.8
Statistics 90.7
Other environmental and natural resources research 90.1
Clinical medicine 89.6
Ecotoxicology, state of the environment and environmental impacts 89.5
Nutrition science 89.3
Psychology 89.0
Sport sciences 88.9
Nursing science 88.9
Veterinary medicine 88.5
Public health science 88.1
Linguistics 87.6
Philosophy 87.3
Business studies, economic geography and industrial management 87.2
Dentistry 86.7
Economics 86.3
Civil and community engineering 85.9
Agricultural and food sciences 85.4
Environmental policy, economics and law 85.3
Geography 84.8
Architecture and industrial design 83.7
Communication and information sciences 83.1
Education 82.6
Political science and administrative science 82.2
Art research 81.6
Social sciences 80.4
Cultural studies 79.3
History and archaeology 78.1
Theology 77.0
Law 70.8
(*) In the corpus of applications submitted to the Academy of Finland
Accessing and analyzing digital resources
Diagram: digital resources and the actors around them: archives, libraries, museums, universities, citizens, researchers, teachers, artists, media, journalists, companies, societies, municipalities, state decision makers and information specialists.
Diagram: kinds of digital resources: texts, images, videos, speeches/conversations, multimedia documents, numerical data, computational models, interactive systems and computer software.
Resource metadata
Diagram: resources and their descriptions. Sources of description: content and information professionals; users of the contents (professionals and lay people); machine learning and pattern recognition systems; language technology resources and systems. Forms of description: formal metadata and other forms of description.
Crowdsourcing: importance of openness
Machine learning and pattern recognition systems (classification, clustering): importance of the availability of data
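The slide lists classification among the techniques through which machine learning and pattern recognition systems can contribute descriptions of resources. The sketch below shows machine-suggested metadata with a nearest-centroid classifier over bag-of-words vectors, which proposes a subject label for a new document. It is an illustrative assumption, not a system from the talk. The training texts and subject labels are invented. A usable system would need far more labelled data, which is the point about the availability of data.

```python
import math
from collections import Counter

# Invented training documents with hand-assigned subject labels (hypothetical).
TRAINING = [
    ("sonnet rhyme stanza poet", "poetry"),
    ("poet verse rhyme metaphor", "poetry"),
    ("parliament election vote party", "politics"),
    ("minister party policy election", "politics"),
]


def bag_of_words(text):
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity of two sparse count vectors (Counter gives 0 if missing)."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def train_centroids(examples):
    """Average the bag-of-words vectors of each subject label."""
    sums, counts = {}, Counter()
    for text, label in examples:
        sums.setdefault(label, Counter()).update(bag_of_words(text))
        counts[label] += 1
    return {label: Counter({t: c / counts[label] for t, c in vec.items()})
            for label, vec in sums.items()}


def suggest_subject(text, centroids):
    """Propose the subject label whose centroid is most similar to the text."""
    bow = bag_of_words(text)
    return max(centroids, key=lambda label: cosine(bow, centroids[label]))


centroids = train_centroids(TRAINING)
print(suggest_subject("the poet wrote a new sonnet", centroids))
```

Such a label is best treated as a suggestion for a professional or a user to confirm, alongside the other sources of description in the diagram.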
Challenge:
A tension between the usability and standardization of content descriptions and the richness and evolution of language and its interpretation, genre and style variation, and contextuality, subjectivity and cultural dependence
Computational methods and tools
Mainframe computers, personal computers, Internet, multimedia, virtual reality, World Wide Web, social media, MOOCs, mobile devices, cloud services, games and gamification, 3D printing, big data, pattern recognition, statistical machine learning, robotics, ...
Statistics, information theory, probability theory, dynamical systems theory, ...
Implications of machine learning
● Machines no longer simply do what they are programmed to do
● Machine learning algorithms are programs in the traditional sense, but they enable evolving “behaviors” of the system based on the “experience” that the system gathers after having been programmed
● This makes it possible for the systems to have a certain level of “conceptual autonomy”: they build their view of some phenomena based on the data/texts/etc. that are given to them
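A perceptron gives a small illustration of this point. Its learning rule is fixed code, yet the classifier's behavior is determined by the examples it receives. The sketch below is a textbook perceptron chosen for brevity, not a method from the talk. Training the same program on the AND and OR truth tables yields two systems that answer the same input differently.

```python
def predict(weights, bias, x):
    """Linear threshold unit: output 1 if w.x + b > 0, else 0."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0


def learn(examples, epochs=10, rate=1.0):
    """Perceptron learning rule: nudge the weights after every mistake."""
    weights, bias = [0.0] * len(examples[0][0]), 0.0
    for _ in range(epochs):
        for x, target in examples:
            error = target - predict(weights, bias, x)
            weights = [w + rate * error * xi for w, xi in zip(weights, x)]
            bias += rate * error
    return weights, bias


AND = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
OR = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]

and_model = learn(AND)  # same code ...
or_model = learn(OR)    # ... different experience, different behavior
print(predict(*and_model, [1, 0]), predict(*or_model, [1, 0]))
```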
Diagram: theories, data, models and hypotheses
Conceptual systems
Melissa Bowerman, Max Planck Institute for Psycholinguistics: Space under Construction: Language-Specific Spatial Categorization in First Language Acquisition. Lund University Cognitive Science, 2003
Figure: Dutch spatial terms IN, OP and AAN.
Figure: English OPEN covers open box, open door, open bag, open envelope, open mouth, open clamshell, open pair of shutters, open latched drawer, open hand, open book, eyes open and open fan.
Categorization of ‘opening’ in English and Korean.
Figure: Korean verbs YELTA ‘remove barrier to interior space’, PPAYTA ‘unfit’, PELLITA ‘separate two parts symmetrically’, PHYELCHITA ‘spread out flat thing’ and TTUTA ‘rise’, plus a category glossed ‘tear away from base’. Example events: take off wallpaper, unwrap package, spread legs apart, take off ring, take cassette out of case, sun rises, spread blanket out, peacock spreads tail.
Figure (Pye 1995, 1996): verbs for breaking a plate, a stick, a rope and clothes. English: break; tear, rip. Mandarin: può; duàn (long rigid thing). K’iche’ Mayan: -paxi:j (rock, glass, clay thing); -q’upi:j (other hard thing); -tóqopi’j (long, flexible thing); rach’aqij (“tear”).
http://www.mpi.nl/people/bowerman-melissa
http://www.mpi.nl/people/bowerman-melissa/publications
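One way to quantify how differently two languages partition the same set of events is the Rand index. It is the share of event pairs on which the two languages agree, either both using the same verb or both using different verbs. In the sketch below, the English labels follow the breaking figure above. "Language B" is hypothetical and simply assigns a different verb to each object. Neither the measure nor the labels are taken from Bowerman's or Pye's work.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Share of event pairs that two labelings treat alike (same vs. different)."""
    events = sorted(labels_a)
    pairs = list(combinations(events, 2))
    agree = sum((labels_a[x] == labels_a[y]) == (labels_b[x] == labels_b[y])
                for x, y in pairs)
    return agree / len(pairs)


ENGLISH = {"plate": "break", "stick": "break", "rope": "break", "clothes": "tear"}
# Hypothetical language that uses a distinct verb for every object.
LANGUAGE_B = {"plate": "verb1", "stick": "verb2", "rope": "verb3", "clothes": "verb4"}

print(rand_index(ENGLISH, LANGUAGE_B))
```

The two languages agree on the three pairs involving clothes and disagree on the three pairs among plate, stick and rope, giving 0.5.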
Processing multimodal information
Acknowledgements:
Finnish Broadcasting Company (YLE)
An example of automatic multimedia content analysis
Jorma Laaksonen: users.ics.aalto.fi/jorma/ and scholar.google.com/citations?user=suHzeyIAAAAJ&hl=en
Mikko Kurimo: users.ics.aalto.fi/mikkok/ and elec.aalto.fi/en/about/careers/professors/mikko_kurimo/
Components: speaker recognition; video analysis / scene classification; speech recognition (speech to text); OCR
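The slide names the analysis components but not how their outputs are combined. One simple, hypothetical approach makes them jointly searchable. Each component's output is stored as a time-stamped annotation, and queries run across modalities by time or by keyword. The `Annotation` type, the two functions and the sample annotations are invented for illustration and do not describe the system built with YLE.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    start: float   # seconds from the beginning of the programme
    end: float
    modality: str  # e.g. "speaker", "scene", "asr", "ocr"
    label: str


def annotations_at(annotations, t):
    """All annotations, across modalities, whose time span covers t."""
    return sorted((a for a in annotations if a.start <= t < a.end),
                  key=lambda a: a.modality)


def search(annotations, word):
    """Segments in any modality whose label contains the word."""
    word = word.lower()
    return [a for a in annotations if word in a.label.lower().split()]


# Invented example output of the four components.
ANNOTATIONS = [
    Annotation(0.0, 15.0, "scene", "studio"),
    Annotation(15.0, 40.0, "scene", "outdoor"),
    Annotation(0.0, 20.0, "speaker", "speaker_1"),
    Annotation(2.0, 9.5, "asr", "good evening here is the news"),
    Annotation(10.0, 14.0, "ocr", "Helsinki"),
]

for a in annotations_at(ANNOTATIONS, 12.0):
    print(a.modality, a.label)
```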
Movement verbs
David Bailey's thesis (1997):
Verbs related to hand movement
Point of view from cognitive linguistics
● The meaning of linguistic symbols in the minds of language users derives from the users' sensory perceptions and their interactions with the world and with each other.
● For example, the meaning of the word 'walk' involves
– what walking looks like
– what it feels like to walk and to have walked
– how the world looks when walking (e.g. objects approach at a certain speed)
– ...
Abstract vs concrete grounding
Ronald Langacker
Multimodally Grounded Language Technology
A project funded by the Academy of Finland, 2011-2014
Timo Honkela as the Principal Investigator
A collaboration between the departments of
* Information and Computer Science, and
* Media Technology
Consider how different languages divide the conceptual space in different ways (cf. e.g. Melissa Bowerman et al.)
Förger & Honkela 2013
Analysis of subjectivity
GICA: Grounded Intersubjective Concept Analysis
Analysis of “health” in the State of the Union addresses
Timo Honkela, Juha Raitio, Krista Lagus, Ilari T. Nieminen, Nina Honkela, and Mika Pantzar. Subjects on objects in contexts: Using GICA method to quantify epistemological subjectivity. In Proceedings of IJCNN 2012.
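As the title of the cited paper suggests, GICA starts from data about subjects (for example, speakers) using objects (words such as "health") in contexts. The following is a simplified, hypothetical sketch. It counts (subject, object, context) triples, slices the resulting tensor at one object to get a subject-by-context matrix, and compares subjects by cosine similarity. The sample triples are invented. Any further weighting, dimensionality reduction or visualization used in the paper is omitted.

```python
import math
from collections import defaultdict

# Invented (subject, object, context word) observations.
OBSERVATIONS = [
    ("speaker_A", "health", "insurance"), ("speaker_A", "health", "insurance"),
    ("speaker_A", "health", "cost"),
    ("speaker_B", "health", "insurance"), ("speaker_B", "health", "cost"),
    ("speaker_B", "health", "cost"),
    ("speaker_C", "health", "research"), ("speaker_C", "health", "children"),
]


def build_tensor(observations):
    """Sparse subject x object x context count tensor."""
    tensor = defaultdict(int)
    for subject, obj, context in observations:
        tensor[(subject, obj, context)] += 1
    return tensor


def subject_profiles(tensor, obj):
    """Slice the tensor at one object: subject-by-context rows, L2-normalized."""
    contexts = sorted({c for (_, o, c) in tensor if o == obj})
    subjects = sorted({s for (s, o, _) in tensor if o == obj})
    profiles = {}
    for s in subjects:
        row = [tensor.get((s, obj, c), 0) for c in contexts]
        norm = math.sqrt(sum(v * v for v in row))
        profiles[s] = [v / norm for v in row]
    return contexts, profiles


def similarity(profiles, a, b):
    """Cosine similarity of two subjects' (already normalized) profiles."""
    return sum(x * y for x, y in zip(profiles[a], profiles[b]))


contexts, profiles = subject_profiles(build_tensor(OBSERVATIONS), "health")
print(contexts, similarity(profiles, "speaker_A", "speaker_B"))
```

In this toy data, speakers A and B talk about "health" in overlapping contexts (insurance, cost), while speaker C uses it in different ones. This is the kind of difference in subjective usage the method is meant to expose.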
Thank you for your attention!