New developments in multi-modal corpus analysis
Dawn Knight, Louise Mullany, Svenja Adolphs, Kevin Harvey, Daniel Hunt, Catherine Smith and Sarah Atkins
www.nottingham.ac.uk/english/research/cral
Communication
§ Beyond language: communication as a ‘complex network’ of ‘semiotic channels’ (Brown, 1986: 409). These channels are multimodal.
§ There are many possible ‘different, independent, pragmatic and semantic functions’ of signs, making them specific to their Type, Function and Context of use (Argyle, 1975).
§ Effective communication relies upon the receiver successfully detecting, processing and understanding these interactive ‘signs’ in their given context of use.
Corpus and context
Key aims
§ To record multiple modes of communication in a natural context.
§ To record both the individual & synchronised patterns of speech / head movements simultaneously, within the same frame of reference.
§ Recordings to be accurate and able to be replayed & annotated in the future.
Corpora at Nottingham
§ Mono-modal corpora
  § CANCODE
  § CANBEC
  § The Health Communication Corpus
  § Teenage Health Freak
§ Multi-modal corpora
  § The Nottingham Multi-Modal Corpus
  § The Nottingham Learner Corpora
Corpora at Nottingham
§ Heterogeneous corpora
  § CANELC
  § Feasibility corpora
    § Thrill
    § British Art Show (BAS)
Introductions
§ DReSS I: Digital Records for eSocial Science
§ HeadTalk
§ NMMC: The Nottingham Multi-Modal Corpus, comprising 125,000 words of single-speaker data and 125,000 words of dyadic conversation
§ DRS: The Digital Replay System, a next generation Computer Aided Qualitative Data AnalysiS (CAQDAS) tool
§ DReSS II: Analysing heterogeneous datasets
DReSS I: NMMC
HeadTalk
DReSS I: Tracking Gestures
Replay: DRS
Shortcomings of MM corpora
§ Design: Limited to video, audio and textual records which meet a specific research need and/or answer particular questions.
§ Infrastructure: Strategies and conventions used to record, mark-up, code, annotate and interrogate multimodal corpora vary from one corpus to the next. No ‘standards’ exist.
§ Size: Multi-million word multimodal corpora do not exist as yet.
Shortcomings
§ Scope: Generally domain-specific, monolingual and/or of a specialist nature. Content is also often pre-planned or scripted, experimental and controlled.
§ Naturalness: The recording conditions, settings and obtrusive equipment used may compromise the spontaneous/‘naturalistic’ status of the data.
§ Availability and (re)usability: No widely available, large scale corpus has been published to date.
Proposed solutions
§ Capturing, as far as possible, discourse from multiple perspectives over time and location, to represent ‘a day in the life’ of a language user: discourse across different spaces, places and ‘modes’ of interaction rather than in fixed and static locations.
§ This will allow us to examine the interactivity between the various modes and how they collaboratively create meaning, providing the impetus for generating richer descriptions of behaviour.
Multimodal corpora types
‘Spectrum of observation scenarios ranging from highly controlled to truly ethological’ (based on Oertel et al., 2010: 28).
DReSS II: Ubiquitous Corpora
CANELC
§ CANELC – The Cambridge and Nottingham eLanguage Corpus
§ A 1,000,000-word eLanguage corpus drawn from the following digital resources:
  § Blogs – 250,000 words
  § Discussion Board content – 150,000 words
  § Emails – 250,000 words (125,000 personal and 125,000 business)
  § SMS Messages – 100,000 words
  § Tweets – 250,000 words
Themes and Topics
Thrill
§ A 55,000-word corpus of fairground discourse, comprising synchronised records of audio, video and sensory (i.e. heart rate) data.
§ 55 participants (mainly recorded in pairs)
  § 19 women, 26 men
  § Ages range from teens to late 50s
  § Over 11 hours of video
Thrill
§ Data has been transcribed and divided into 4 key phases:
  § Pre-ride phase
  § The elevation of the ride
  § Start of the ride
  § Ride terminus
§ Aims:
  § To examine whether any patterns emerge in specific language used within/across the phases.
  § To outline and test approaches for the analysis of ubiquitous data sets for linguistic enquiry.
Thrill
(Oh) my God
(Oh) my God is used 85 times by 21 different speakers. It occurs most often in phases 2 and 3 of the ride: ride elevation and movement.
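A count of this kind can be sketched in a few lines of Python. This is a minimal illustration only, not the procedure used in the Thrill study: the `(phase, speaker, text)` record layout, the `phrase_by_phase` helper and the sample utterances are all invented for the example.

```python
import re
from collections import Counter

# Matches "my god" with an optional preceding "oh", ignoring case.
PHRASE = re.compile(r"\b(?:oh\s+)?my\s+god\b", re.IGNORECASE)

def phrase_by_phase(utterances):
    """Count phrase hits per ride phase and collect the speakers who used it.

    `utterances` is an iterable of (phase, speaker, text) tuples; this layout
    is an assumption made for the sketch, not a corpus format.
    """
    per_phase = Counter()
    speakers = set()
    for phase, speaker, text in utterances:
        hits = len(PHRASE.findall(text))
        if hits:
            per_phase[phase] += hits
            speakers.add(speaker)
    return per_phase, speakers

# Invented sample data: phases 1-4 as in the four-phase division of the ride.
utterances = [
    (1, "A", "I'm not sure about this"),
    (2, "A", "Oh my God, it's so high"),
    (2, "B", "my god my god"),
    (3, "B", "OH MY GOD"),
    (4, "A", "That was brilliant"),
]

per_phase, speakers = phrase_by_phase(utterances)
for phase in sorted(per_phase):
    print(f"phase {phase}: {per_phase[phase]} occurrence(s)")
print(f"{len(speakers)} different speakers")
```

Keeping the per-phase tally and the speaker set separate mirrors the two figures reported: total occurrences and number of different speakers.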
Location based data
Early efforts: utilising separate recording devices to collect data ‘on the move’
Field Work Tracker
§ A bespoke mobile application which creates detailed location-based logs.
§ It was developed to support the capture of fieldwork data for qualitative analysis, providing a cheap and simple multi-function recorder which allows for automated synchronisation of data.
§ Studies can be tracked from the users’ perspective or the researchers’ perspective.
§ As well as [automatically] recording locations, users can take photographs, audio recordings or movies and make text-based notes.
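One way to picture the synchronisation of a location log with the other recordings is to attach to each timestamped item the most recent location fix. The sketch below is an assumption-laden illustration, not the application's actual log format or algorithm: the `locate` helper, the tuple layouts, the timestamps (seconds from session start) and the coordinates are all invented.

```python
from bisect import bisect_right

def locate(records, location_log):
    """Attach to each record the latest location fix at or before its timestamp.

    records:      list of (timestamp, label) tuples
    location_log: list of (timestamp, latitude, longitude) tuples, sorted by time
    Both layouts are hypothetical and chosen only for this sketch.
    """
    times = [t for t, _, _ in location_log]
    located = []
    for t, label in records:
        i = bisect_right(times, t) - 1          # index of last fix with time <= t
        position = location_log[i][1:] if i >= 0 else None
        located.append((t, label, position))
    return located

# Invented sample session.
location_log = [
    (0, 52.9530, -1.1500),
    (60, 52.9535, -1.1490),
    (120, 52.9540, -1.1480),
]
records = [(45, "photo"), (60, "audio note"), (130, "text note")]

for t, label, position in locate(records, location_log):
    print(t, label, position)
```

A record made before the first location fix gets `None` rather than a guessed position.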
DRS and Field Work Tracker
Fieldwork Tracker application
British Art Show
§ 10+ hours of transcribed audio data collected from 3 pairs of visitors (1 M-M, 1 M-F, 1 F-F), capturing:
  § Physical movements
  § Interactions focused on planning, logistics
  § Interactions focused on the socially negotiated goal of viewing art
  § How they plan, negotiate & find each other
  § Variation in language through changing contexts (home, street, gallery, friends & strangers)
British Art Show
§ Video clips were recorded by participants and researcher.
§ Photographs were also taken by participants.
§ The BAS study data was collected using the Fieldwork Tracker application, and thus has all the synchronisation necessary for DRS to import, with ‘one click’, all data from a Fieldwork Tracker session into a project in DRS.
British Art Show
Analysing data
§ DRS allows users to:
  § Generate word frequency lists.
  § Run concordance searches over multiple different data sources.
  § View specific concordance outputs on a map.
  § Add metadata codes to the map, allowing users to query data by searching for co-occurrences of codes and/or lexical items.
  § Tabulate coded features.
  § Use coded elements of the map as a means of drilling into the data.
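The first two operations, a word frequency list and a concordance search across several data sources, can be illustrated with a minimal Python sketch. It does not reproduce DRS's implementation or interface: the function names, the simple regex tokeniser and the two sample sources are assumptions made for the example.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase a text and split it into word tokens (apostrophes kept)."""
    return re.findall(r"[a-z0-9']+", text.lower())

def word_frequencies(sources):
    """Count tokens across all sources; `sources` maps a source name to its text."""
    counts = Counter()
    for text in sources.values():
        counts.update(tokenize(text))
    return counts

def concordance(sources, node, width=3):
    """Return (source, left context, node, right context) for every hit of `node`."""
    hits = []
    for name, text in sources.items():
        tokens = tokenize(text)
        for i, token in enumerate(tokens):
            if token == node:
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                hits.append((name, left, token, right))
    return hits

# Invented sample data standing in for two synchronised sources.
sources = {
    "audio_transcript": "Shall we go to the gallery first and then the cafe",
    "text_note": "Met at the gallery entrance, the queue was long",
}

freqs = word_frequencies(sources)
print(freqs.most_common(3))

for name, left, node, right in concordance(sources, "gallery"):
    print(f"{name:>16} | {left:>12} [{node}] {right}")
```

Each concordance line carries the name of its source, which is the hook a tool would need to link a hit back to a recording or, with location data attached, to a point on a map.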
Location data in DRS
Concluding remarks
§ Developing a ‘better, multifaceted picture of [language use in] context’ (Bazzanella, 2002: 239) is an on-going challenge.
§ This is crucial to the development of better descriptions of language-in-use and to the development of applications based on those descriptions.
§ The ability to generate more contextually sensitive descriptions of language in use will shed new light on the relationship between form and function.
Concluding remarks
§ Access to ubiquitous corpora inevitably requires us to rethink the notion of the unit of analysis in corpus linguistics research.
§ As we develop a better understanding of the nature of the co-dependencies between language and context, the focus of the unit of analysis may shift from the word or sequence of words, to a contextually defined episode of interaction which may include multiple modes of discourse and which is dynamic in nature.
Concluding remarks
§ Ongoing developments in this research space represent a departure from traditional corpus linguistic approaches, but they should strengthen the explanatory power of any results that emerge from the study of large, principled collections of text in context.