What can corpus phonetics tell us about ‘English’ phonology?

What can corpus phonetics tell us about ‘English’ phonology? Jane Stuart-Smith, Glasgow University Laboratory of Phonetics (GULP), The SPADE Consortium. English Corpus Phonetics and Phonology at ICAME (digital), 20th May 2020, hosted by Trier University.


Page 1: What can corpus phonetics tell us about ‘English’ phonology?

What can corpus phonetics tell us about ‘English’ phonology?

Jane Stuart-Smith

Glasgow University Laboratory of Phonetics (GULP)

The SPADE Consortium

English Corpus Phonetics and Phonology at ICAME (digital)

20th May 2020, hosted by Trier University

Page 2: What can corpus phonetics tell us about ‘English’ phonology?

phonetics

phonology

English corpus

Text over time and space

Page 3: What can corpus phonetics tell us about ‘English’ phonology?

Huge amounts of annotated speech exist…

Barriers:

• $$ ££

• Software

• Ethics

Corpus phonetics: overcome barriers and scale up scientific study of speech

Page 4: What can corpus phonetics tell us about ‘English’ phonology?

Huge amounts of annotated speech exist…

Scientific and/or professional user questions, e.g.

• How variable are ‘English’ sounds across space/time?

Barriers:

• $$ ££

• Software

• Ethics

Corpus phonetics: overcome barriers and scale up scientific study of speech

Page 5: What can corpus phonetics tell us about ‘English’ phonology?

August 2017 – July 2020

http://spade.glasgow.ac.uk/

Page 6: What can corpus phonetics tell us about ‘English’ phonology?

http://spade.glasgow.ac.uk/

Investigators

Page 7: What can corpus phonetics tell us about ‘English’ phonology?

http://spade.glasgow.ac.uk/

Postdocs & Doc

Michael McAuliffe: Software development

Rachel Macdonald: Project manager

James Tanner: Project PhD (submitted 11 May!)

Page 8: What can corpus phonetics tell us about ‘English’ phonology?

Arlie Coles (U. de Montréal)

Elias Stengel-Eskin (Johns Hopkins)

Vanna Willerton (McGill)

Michael Goodale, Sarah Mihuc (McGill)

Stacey Harkin, Kirsty McCahill, Mitchell McGee, Edward Marshall, Julia Moreno, Jo Pearce, Niamh Walker, Ewa Wanat

Jordan Holley, Peter Andrews, Kaylynn Gunter

and many more!

Page 9: What can corpus phonetics tell us about ‘English’ phonology?

Corpus phonetics in practice

• Software: large-scale speech analysis

• Data: from ~40 datasets, (socio)linguistic surveys

• Research: ‘English’ sounds over time and space?

Page 10: What can corpus phonetics tell us about ‘English’ phonology?

Integrated Speech Corpus ANalysis (ISCAN)

Datasets (speech corpora, lexicons) → import → Database (add measures & structure) → querying → Set of linguistic objects → export → Data file (CSV)

Implementation

• Python API

• Graphical User Interface

Michael McAuliffe

McAuliffe et al. Proc. ICPhS 2019
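
The import, enrich, query and export stages named on this slide can be illustrated with a toy, in-memory sketch. This is not the ISCAN or PolyglotDB API: the function names, the token fields and the example tokens are all hypothetical, and only the Python standard library is used.

```python
import csv
import io

# Hypothetical toy "dataset": (speaker, dialect, phone label, start s, end s).
RAW_TOKENS = [
    ("spk1", "Glasgow", "s", 0.10, 0.22),
    ("spk1", "Glasgow", "ʃ", 0.50, 0.63),
    ("spk2", "Raleigh", "s", 0.05, 0.15),
]


def import_tokens(raw):
    """Import: load raw annotations into an in-memory 'database' (list of dicts)."""
    keys = ("speaker", "dialect", "label", "start", "end")
    return [dict(zip(keys, row)) for row in raw]


def add_duration(database):
    """Enrich: add a derived measure (duration in seconds) to every token."""
    for token in database:
        token["duration"] = round(token["end"] - token["start"], 3)
    return database


def query(database, label):
    """Query: select the linguistic objects whose label matches."""
    return [token for token in database if token["label"] == label]


def export_csv(tokens, columns):
    """Export: serialise the selected tokens as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=columns, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(tokens)
    return buffer.getvalue()


database = add_duration(import_tokens(RAW_TOKENS))
csv_text = export_csv(query(database, "s"), ["speaker", "dialect", "label", "duration"])
print(csv_text)
```

The point of the design is that measures are added once to a shared database, so many different queries can be exported later without reprocessing the audio.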

Page 11: What can corpus phonetics tell us about ‘English’ phonology?

Datasets

https://spade.glasgow.ac.uk/the-spade-consortium/

US and Canada; UK and Ireland

• 40 collected: public/private, 4 countries, 115 years

• 25 processed: 30 dialects, ~4500 speakers, ~2060 hours

• 18 measured

Page 12: What can corpus phonetics tell us about ‘English’ phonology?

What can we learn about English phonology?

• Stops

• Liquids: Scottish rhotics

• Vowels: Scottish patterns

• Sibilants

• Vowels: formants

• Vowel duration: voicing effect

• Vowels: dynamics (Tanner PhD 2020)

Stuart-Smith et al. Proc. ICPhS 2019

Mielke et al. Proc. ICPhS 2019

Tanner et al. Toronto WP Ling 2019; Frontiers Comp. Slx 2020

https://spade.glasgow.ac.uk/news-outputs/

Page 13: What can corpus phonetics tell us about ‘English’ phonology?
Page 14: What can corpus phonetics tell us about ‘English’ phonology?

Stuart-Smith et al. Proc. ICPhS 2019

Updated analysis: https://osf.io/bknrg/

How does S-retraction vary across English dialects and speakers?

Page 15: What can corpus phonetics tell us about ‘English’ phonology?

Data

• stressed, word-initial /s str ʃ/, e.g. seat, street, sheet

• 420 speakers

• 5 corpora ~ 10 dialects

• 98,000 tokens

• spectral Centre of Gravity (CoG)
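
As a rough illustration of the measure, spectral centre of gravity can be computed as the frequency average of a spectrum weighted by its power. The slides do not say how CoG was extracted in the study, so the sketch below is an assumption: it uses a naive DFT over a whole signal with power weighting, and the function name `spectral_cog` and the test tones are hypothetical.

```python
import cmath
import math


def spectral_cog(samples, sample_rate, power=2.0):
    """Power-weighted mean frequency (Hz) of a real signal, via a naive DFT."""
    n = len(samples)
    weighted_sum = 0.0
    total_weight = 0.0
    # Only bins from 0 Hz up to the Nyquist frequency are needed for real input.
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples)
        )
        weight = abs(coeff) ** power
        weighted_sum += (k * sample_rate / n) * weight
        total_weight += weight
    if total_weight == 0.0:
        raise ValueError("signal has no energy")
    return weighted_sum / total_weight


SAMPLE_RATE = 8000  # Hz, chosen so the test tones fall exactly on DFT bins
N = 256

# A pure 1000 Hz tone: all energy sits at 1000 Hz.
tone = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(N)]
cog_tone = spectral_cog(tone, SAMPLE_RATE)

# Equal-amplitude tones at 1000 Hz and 3000 Hz: energy is split evenly.
mix = [
    math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE)
    + math.sin(2 * math.pi * 3000 * t / SAMPLE_RATE)
    for t in range(N)
]
cog_mix = spectral_cog(mix, SAMPLE_RATE)

print(round(cog_tone), round(cog_mix))
```

A sibilant with more high-frequency energy, such as /s/, yields a higher CoG than one with energy concentrated lower, such as /ʃ/, which is why CoG is used to separate seat from sheet.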

Page 16: What can corpus phonetics tell us about ‘English’ phonology?

/s ʃ/ differ by dialect

[Figure: spectral CoG (Hz, y-axis, 4000 to 6000, annotated "higher pitched") for seat and sheet by dialect (x-axis), in separate female and male panels. Dialects: Columbus, US: West, US: N. Cities, Raleigh, Canada, Glasgow, Scot: SW, Scot: E, Scot: Hi/Il, Scot: W, grouped as US, Canada and Scotland.]

420 speakers, ~75k tokens

Page 17: What can corpus phonetics tell us about ‘English’ phonology?

S-retraction differs by dialect

… on a continuum (not a dichotomy)

[Figure: retraction ratio for /str/ (y-axis, 0.6 to 1.0, annotated "more like ‘s’" and "more like ‘sh’") by corpus (x-axis). Corpora in plotted order: Raleigh, Columbus, Glasgow, Scot: E, Canada, Scot: W, Scot: SW, US: N. Cities, Scot: Hi/Il, US: West, coded as US, Scotland and Canada.]

420 speakers, ~77k tokens
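
The slides do not define the retraction ratio. One way to read it, offered here purely as an assumption, is to place a speaker's mean /str/ CoG on that speaker's own scale from /ʃ/ (0) to /s/ (1). The function name and the CoG values below are hypothetical.

```python
from statistics import mean


def retraction_ratio(cog_s, cog_sh, cog_str):
    """Position of mean /str/ CoG between mean /ʃ/ (0.0) and mean /s/ (1.0).

    Assumed definition, not taken from the slides:
        (mean_str - mean_sh) / (mean_s - mean_sh)
    """
    mean_s, mean_sh, mean_str = mean(cog_s), mean(cog_sh), mean(cog_str)
    if mean_s == mean_sh:
        raise ValueError("/s/ and /ʃ/ means coincide; ratio is undefined")
    return (mean_str - mean_sh) / (mean_s - mean_sh)


# Hypothetical per-token CoG values (Hz) for one speaker.
ratio = retraction_ratio(
    cog_s=[6000, 6200],
    cog_sh=[4000, 4200],
    cog_str=[5500, 5700],
)
print(ratio)
```

Under this reading a value near 1 means /str/ is produced much like /s/, a value near 0 means it is produced much like /ʃ/, and intermediate values capture the continuum the slide describes.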

Page 18: What can corpus phonetics tell us about ‘English’ phonology?

S-retraction differs more by speaker

[Figure: density of speakers (y-axis, 0 to 5) over retraction ratio for /str/, e.g. street (x-axis, −0.5 to 2.0), one panel per dialect: Columbus, US: West, US: N. Cities, Raleigh, Canada, Glasgow, Scot: SW, Scot: E, Scot: Hi/Il, Scot: W, coded as US, Canada and Scotland.]

420 speakers, ~77k tokens

Page 19: What can corpus phonetics tell us about ‘English’ phonology?

Tanner et al. Toronto WP Ling 2019; under review, Frontiers in Computational Sociolinguistics

Page 20: What can corpus phonetics tell us about ‘English’ phonology?

How robust is the ‘English’ Voicing Effect?

Tanner et al. Toronto WP Ling 2019

Tanner et al. Frontiers in Computational Sociolinguistics 2020

Page 21: What can corpus phonetics tell us about ‘English’ phonology?

Data

• Utterance-final CVC words, e.g. beat, bead

• 1964 speakers

• 15 corpora ~ 30 dialects

• ~230,000 tokens

• Vowel duration

James Tanner
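
To make the measure concrete, a simple descriptive version of the Voicing Effect is the ratio of a speaker's mean vowel duration before voiced codas (bead) to that before voiceless codas (beat). This is only a sketch under that assumption; it is not the statistical model behind the study's estimated effect sizes, and the function name and token values are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical tokens: (speaker, coda voicing, vowel duration in ms).
TOKENS = [
    ("spk1", "voiced", 180), ("spk1", "voiceless", 150),
    ("spk1", "voiced", 200), ("spk1", "voiceless", 170),
    ("spk2", "voiced", 120), ("spk2", "voiceless", 120),
]


def voicing_effect_by_speaker(tokens):
    """Ratio of mean vowel duration before voiced vs voiceless codas, per speaker."""
    durations = defaultdict(lambda: {"voiced": [], "voiceless": []})
    for speaker, voicing, duration_ms in tokens:
        durations[speaker][voicing].append(duration_ms)

    effects = {}
    for speaker, by_voicing in durations.items():
        # Skip speakers who lack tokens in either context.
        if by_voicing["voiced"] and by_voicing["voiceless"]:
            effects[speaker] = mean(by_voicing["voiced"]) / mean(by_voicing["voiceless"])
    return effects


effects = voicing_effect_by_speaker(TOKENS)
print(effects)
```

A ratio above 1 corresponds to "bead > beat" and a ratio of 1 to "bead = beat".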

Page 22: What can corpus phonetics tell us about ‘English’ phonology?

Voicing Effect differs by English dialect

[Figure: estimated Voicing Effect size (y-axis, with reference levels "bead > beat" and "bead = beat") by dialect (x-axis), grouped as North America and UK & Ireland.]

1964 speakers, ~230k tokens

Page 23: What can corpus phonetics tell us about ‘English’ phonology?

Voicing Effect differs by English dialect

… and is much smaller than in lab speech

[Figure: estimated Voicing Effect size (y-axis, with reference levels "bead > beat" and "bead = beat") by dialect (x-axis), grouped as North America and UK & Ireland.]

1964 speakers, ~230k tokens

Page 24: What can corpus phonetics tell us about ‘English’ phonology?

Voicing Effect differs by dialect

[Figure: estimated Voicing Effect size (y-axis, with reference levels "bead > beat" and "bead = beat") by dialect (x-axis), grouped as North America and UK & Ireland. Highlighted: Scottish (no Voicing Effect expected); African American Vernacular English (big Voicing Effect expected).]

1964 speakers, ~230k tokens

Page 25: What can corpus phonetics tell us about ‘English’ phonology?

Voicing Effect differs more by dialect than by speakers

[Figure: amount of speaker variability (y-axis) against amount of dialect variability (x-axis).]
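
One simple way to compare the two kinds of variability is to contrast the spread of dialect means with the typical spread of speakers within a dialect. The slides do not state how the two amounts were estimated, so this is a descriptive sketch under that assumption; the function name and effect values are hypothetical.

```python
from statistics import mean, pstdev

# Hypothetical per-speaker Voicing Effect ratios, grouped by dialect.
SPEAKER_EFFECTS = {
    "dialect_a": [1.10, 1.12, 1.08],
    "dialect_b": [1.30, 1.32, 1.28],
}


def variability(effects_by_dialect):
    """Return (between-dialect SD, mean within-dialect speaker SD)."""
    dialect_means = [mean(values) for values in effects_by_dialect.values()]
    dialect_sd = pstdev(dialect_means)
    speaker_sd = mean(pstdev(values) for values in effects_by_dialect.values())
    return dialect_sd, speaker_sd


dialect_sd, speaker_sd = variability(SPEAKER_EFFECTS)
print(dialect_sd > speaker_sd)
```

When the first number exceeds the second, as in this toy data, the feature varies more by dialect than by speaker; the reverse pattern would match the sibilant result.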

Page 26: What can corpus phonetics tell us about ‘English’ phonology?

English speech over time and space

http://152.1.64.33/spade/latest/

Try out our Shiny app!

Page 27: What can corpus phonetics tell us about ‘English’ phonology?

What do we learn about English phonology?

• confirm expected patterns from current-scale work

• identify new patterns of variability:

Certain features vary more by speaker, and less by dialect (e.g. sibilants)

Others vary more by dialect and less by speaker (e.g. vowel duration)

Why? e.g. Kleinschmidt (2018)

Page 28: What can corpus phonetics tell us about ‘English’ phonology?

Challenges

• Ethics (multiple countries, GDPR)

• Data (collection, processing)

• Software development

• Some measures elusive (stops, e.g. p t k)

Page 29: What can corpus phonetics tell us about ‘English’ phonology?

What next?

• analyse the sounds of ‘English’!

• prosody (intonation, voice quality, etc.)

• Expand ‘English’ to World/non-native Englishes

• Beyond English (ISCAN not language-specific)

Page 30: What can corpus phonetics tell us about ‘English’ phonology?

Thank you!

and to the organizers of this workshop

Page 31: What can corpus phonetics tell us about ‘English’ phonology?

Documentation

• GUI / server install: https://iscan.readthedocs.io/

• Can sign up as tutorial user

• Python API: https://polyglotdb.readthedocs.io/