Timo Honkela: Semantic and pragmatic representations of large text corpora
Timo Honkela, FIN-CLARIN Jubilee seminar, 9.6.2016
Timo Honkela
FIN-CLARIN Jubilee Seminar and Nordic CLARIN Network Seminar, University of Helsinki, 9 Jun 2016
Semantic and pragmatic representations
of large text corpora
Agenda
● Digital humanities in Finland
● Strategic role of humanities and social sciences
● Research using text corpora
Digital humanities in Finland
● Research in humanities and social sciences is increasingly using digitally stored resources and computational analysis tools
Krister Lindén et al.
Varieng - Research Unit for the Study of Variation, Contacts and Change in English
Big Data, Rich Data, Uncharted Data
19–22 October 2015, Helsinki, Finland
Terttu Nevalainen
Irma Taavitsainen
Tanja Säily
et al.
http://www.helsinki.fi/varieng/
http://www.helsinki.fi/varieng/people/varieng_saily.html
Multilingual language technology
Jörg Tiedemann
Mathias Creutz et al.
Text mining historical newspapers
Mikko Tolonen
Kimmo Kettunen
Citizen Mindscapes
Analysis of large social media corpora in order to increase understanding of social and societal phenomena
Educational efforts: e.g. Digital Humanities Hackathon
In many such research efforts and educational activities, FIN-CLARIN serves as an essential resource and infrastructure.
Let's celebrate and have a moment of applause
http://375humanistia.helsinki.fi/en/humanists/kimmo-koskenniemi
Complexity associated with different areas of science
Biological phenomena
Physical phenomena
Cultural phenomena
Importance of humanities and social sciences
● As surprising as it may at first sound, one can claim that the humanities and social sciences are the most important disciplines
● These disciplines deal with topics like language and communication, social conditions, historical developments, economy, etc.
● Due to the complexity of these topics, research in these areas is challenging; generalizations that are commonplace in physics are rarely possible
Understanding the phenomena
Theory and knowledge formation
Qualitative / Quantitative
Open data: corpora
Open methods
Computational resources
Lars Borin
Linguistics has been the first e-science
Challenges:
“Language is BIG”
“Human INTERPRETATION is inherently involved”
Importance of language:
“Language is involved in most relevant human activities”
Example:
Complexity of Finnish at the level of word forms
Kimmo Koskenniemi (2013): Johdatus kieliteknologiaan, sen merkitykseen ja sovelluksiin (Introduction to language technology, its significance and applications)
https://helda.helsinki.fi/bitstream/handle/10138/38503/kt-johd.pdf?sequence=1
More than 6000 languages, many more dialects
Billions of people
A large number of different cultures
A vast number of ways to relate language, concepts and the world to each other
Sources: blogs.state.gov, en.wikipedia.org
Language as a system
● Considering natural language as a signal and dynamic system at cognitive and social levels (also in its written form) rather than a symbolic and logical system
● Importance of embodiment (cf. e.g. Harnad) and embeddedness (cf. e.g. Edelman)
● Learning and pattern recognition processes are essential (as opposed to the theories presented e.g. by Chomsky, Fodor, Pinker); much of the learning is bound to be unsupervised
Complexity of language regarding different areas and levels
Structure: morphology and syntax
Meaning: semantics and pragmatics
What are the nature, granularity, type, metadata involved, etc. for different research purposes in different areas of linguistics and other areas of humanities and social sciences?
Need to harmonize, build shared terminologies, theories, frameworks, etc.
Need to model contextuality, ambiguity, vagueness, history-dependence, change, etc.
The same medium, language, is the object of study as well as the basis for theory formation, representing the ideas and resources, etc.
Philosophy of science is essential to understand what is going on...
Data-driven, inductive mode
Hypothesis-driven, deductive mode
An old research example:
Data-driven emergence of implicit word categories that match human syntactic and semantic intuitions
Classical example: Learning meaning from context:
Maps of words in Grimm fairy tales
Honkela, Pulkki & Kohonen 1995
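The idea of learning meaning from context can be made concrete with the small sketch below. It is an illustration under stated assumptions, not the method of the cited work: the toy corpus, the function names `context_vectors` and `cosine`, and the use of plain neighbour counts compared by cosine similarity are all assumptions made here, and no map is trained. It only shows that words used in the same contexts end up with similar vectors.

```python
from collections import Counter, defaultdict
import math


def context_vectors(sentences, window=1):
    """Count, for every word, which words occur at which relative position nearby."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    # Feature = (relative position, neighbouring word)
                    vectors[word][(j - i, tokens[j])] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical toy corpus, invented for illustration only.
toy_corpus = [
    "the king went home",
    "the queen went home",
    "the king sat down",
    "the queen sat down",
    "a wolf ran away",
]

vectors = context_vectors(toy_corpus)
sim_king_queen = cosine(vectors["king"], vectors["queen"])
sim_king_went = cosine(vectors["king"], vectors["went"])
print(sim_king_queen, sim_king_went)
```

In this toy corpus `king` and `queen` share all their immediate neighbours, so their similarity is close to 1, while `king` and `went` share none. A clustering or mapping method could then be applied to such vectors to group words with similar usage.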
Research example:
Multimodally grounded models of meaning
Labeling movements: Associating high-dimensional kinesthetic time series with linguistic labels
Förger & Honkela 2014
RUNNING
WALKING
LIMPING
JOGGING
Förger & Honkela 2014
Research example:
Tensor-based analysis of the subjective aspect of interpreting linguistic expressions
GICA: Grounded Intersubjective Concept Analysis
Honkela et al. 2012
Analysis of the word 'health'
Honkela et al. 2012
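A tensor-based view of subjective interpretation can be sketched as follows, assuming a three-way array indexed by subject, item and context. The ratings, the labels and the helper names `unfold` and `euclidean` are hypothetical, invented for illustration, and this is not presented as the procedure of the cited work. The sketch only shows how such a tensor could be unfolded into one row per (subject, item) pair, so that differences between individuals' interpretations of the same word become measurable.

```python
import math

# Hypothetical labels and ratings, invented for illustration only.
subjects = ["s1", "s2"]
items = ["health", "illness"]
contexts = ["exercise", "hospital", "happiness"]

# ratings[s][i][c]: how strongly subject s associates item i with context c (0..1).
ratings = [
    [[0.9, 0.2, 0.8], [0.1, 0.9, 0.1]],  # subject s1
    [[0.3, 0.7, 0.6], [0.2, 0.8, 0.1]],  # subject s2
]


def unfold(tensor, subject_names, item_names):
    """Flatten the 3-way tensor into one context vector per (subject, item) pair."""
    rows = {}
    for s, subject in enumerate(subject_names):
        for i, item in enumerate(item_names):
            rows[(subject, item)] = list(tensor[s][i])
    return rows


def euclidean(a, b):
    """Euclidean distance between two equally long vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


rows = unfold(ratings, subjects, items)

# How differently do the two subjects interpret the same word?
d_health = euclidean(rows[("s1", "health")], rows[("s2", "health")])
d_illness = euclidean(rows[("s1", "illness")], rows[("s2", "illness")])
print(d_health, d_illness)
```

With these made-up ratings the two subjects disagree more about `health` than about `illness`. The unfolded rows could be passed to any standard clustering or projection method to visualize such differences.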
Ideas for building corpora
● Expansion of the contextual framework
● Enriching metadata
● Increasing multimodal data sources that associate linguistic data with other modalities
● Involving a large number of people in labeling data to model variation
● Collecting data in real world contexts