What can corpus phonetics tell us about ‘English’ phonology?


Jane Stuart-Smith

Glasgow University Laboratory of Phonetics (GULP)

The SPADE Consortium

English Corpus Phonetics and Phonology at ICAME (digital)

20th May 2020, hosted by Trier University

phonetics

phonology

English corpus

Text over time and space

Huge amounts of annotated speech exist…

• $$ ££

• Software

• Ethics

barriers

corpus phonetics: overcome barriers and scale up scientific study of speech


Scientific and/or professional user questions, e.g.

• How variable are ‘English’ sounds across space/time?


August 2017 – July 2020

http://spade.glasgow.ac.uk/



Investigators



Postdocs & Doc

Michael McAuliffe: Software development

Rachel Macdonald: Project manager

James Tanner: Project PhD (submitted 11 May!)


Arlie Coles (U. de Montréal)

Elias Stengel-Eskin (Johns Hopkins)

Vanna Willerton (McGill)

Michael Goodale, Sarah Mihuc (McGill)

Stacey Harkin, Kirsty McCahill, Mitchell McGee, Edward Marshall, Julia Moreno, Jo Pearce, Niamh Walker, Ewa Wanat

Jordan Holley, Peter Andrews, Kaylynn Gunter

and many more!

Software: large-scale speech analysis

Data: from ~40 datasets, (socio)linguistic surveys

Corpus phonetics in practice

Research: ‘English’ sounds over time and space?

Datasets (speech corpora, lexicons) → import → Database (add measures & structure) → querying → Set of linguistic objects → export → Data file (CSV)

Implementation

• Python API

• Graphical User Interface

McAuliffe et al. Proc. ICPhS 2019

Michael McAuliffe

Integrated Speech Corpus ANalysis (ISCAN)
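The workflow on this slide (import, add measures and structure, query, export) can be illustrated with a minimal, self-contained sketch. It deliberately does not use the ISCAN or PolyglotDB API: the `Token` class, the `import_rows`, `enrich`, `query` and `export_csv` helpers, and the sample rows are all hypothetical stand-ins chosen only to show the shape of the pipeline.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class Token:
    speaker: str
    word: str
    phone: str
    begin: float  # seconds
    end: float    # seconds
    duration: float = 0.0  # filled in by enrich()


def import_rows(rows):
    """Import: turn raw annotation rows into Token objects (the 'database')."""
    return [Token(*row) for row in rows]


def enrich(tokens):
    """Add measures: here, only segment duration."""
    for tok in tokens:
        tok.duration = round(tok.end - tok.begin, 3)
    return tokens


def query(tokens, phones):
    """Query: select the set of linguistic objects of interest."""
    return [tok for tok in tokens if tok.phone in phones]


def export_csv(tokens):
    """Export: serialise the selected objects as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=[f.name for f in fields(Token)],
        lineterminator="\n",
    )
    writer.writeheader()
    for tok in tokens:
        writer.writerow(asdict(tok))
    return buffer.getvalue()


# Hypothetical annotation rows: speaker, word, phone, begin, end
RAW_ROWS = [
    ("spk1", "seat", "s", 0.10, 0.22),
    ("spk1", "seat", "i", 0.22, 0.35),
    ("spk2", "sheet", "ʃ", 1.00, 1.15),
]

database = enrich(import_rows(RAW_ROWS))
sibilants = query(database, {"s", "ʃ"})
csv_text = export_csv(sibilants)
```

Keeping the four stages as separate functions mirrors the slide: the same enriched database can serve many different queries, and each query yields one flat CSV for statistical analysis.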

US and Canada; UK and Ireland

• 40 collected: public/private, 4 countries, 115 years

• 25 processed: 30 dialects, ~4500 speakers, ~2060 hours

• 18 measured

Datasets: https://spade.glasgow.ac.uk/the-spade-consortium/

What can we learn about English phonology?

Stops

Liquids: Scottish rhotics

Vowels: Scottish patterns

Sibilants

Vowels: formants

Vowel duration: voicing effect

Stuart-Smith et al. Proc. ICPhS 2019

Mielke et al. Proc. ICPhS 2019

Tanner et al. Toronto WP Ling 2019; Frontiers Comp. Slx 2020

https://spade.glasgow.ac.uk/news-outputs/

Vowels: dynamics (Tanner PhD 2020)

Stuart-Smith et al. Proc. ICPhS 2019

Updated analysis: https://osf.io/bknrg/

How does S-retraction vary across English dialects and speakers?


Data

• stressed, word-initial /s str ʃ/, e.g. seat, street, sheet

• 420 speakers

• 5 corpora ~ 10 dialects

• 98,000 tokens

• spectral Centre of Gravity (CoG)
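For readers unfamiliar with the measure, spectral Centre of Gravity is the mean frequency of a spectrum, weighted by how much energy each frequency carries. The sketch below is one common formulation (weighting by the power spectrum) written with the standard library only. The slides do not give the window, frequency band or weighting used in the study, so the function name `spectral_cog` and all settings here are assumptions for illustration.

```python
import cmath
import math


def spectral_cog(samples, sample_rate):
    """Return the power-weighted mean frequency (Hz) of a real-valued signal."""
    n = len(samples)
    weighted_sum = 0.0
    total_power = 0.0
    # For a real signal, bins from 0 Hz up to the Nyquist frequency suffice.
    for k in range(n // 2 + 1):
        # Plain discrete Fourier transform coefficient for bin k.
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(samples)
        )
        power = abs(coeff) ** 2
        freq = k * sample_rate / n
        weighted_sum += freq * power
        total_power += power
    return weighted_sum / total_power if total_power else 0.0


# Synthetic check signal (an assumption, not corpus data): a 5000 Hz sine.
SAMPLE_RATE = 16000
tone = [math.sin(2 * math.pi * 5000 * t / SAMPLE_RATE) for t in range(256)]
cog = spectral_cog(tone, SAMPLE_RATE)
```

For the pure tone above the result is approximately 5000 Hz, since all the energy sits at one frequency. In sibilants, energy concentrated at higher frequencies gives a higher CoG, which is why the measure separates /s/ from /ʃ/.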

[Figure: spectral CoG (Hz; axis ticks 4000, 5000, 6000) by dialect for seat and sheet, in female and male panels. Dialects: columbus, US: West, US: N. Cities, raleigh, Canada, Glasgow, Scot: SW, Scot: E, Scot: Hi/Il, Scot: W, grouped as US, Canada, Scotland]

420 speakers, ~75k tokens

higher pitched

/s ʃ/ differ by dialect

S-retraction differs by dialect

[Figure: retraction ratio for /str/ (axis ticks 0.6 to 1.0) by corpus: raleigh, columbus, Glasgow, Scot: E, Canada, Scot: W, Scot: SW, US: N. Cities, Scot: Hi/Il, US: West]

… on a continuum (not a dichotomy)

more like ‘s’

more like ‘sh’

US Scotland Canada


[Figure: density of speakers by retraction ratio for /str/ (x-axis −0.5 to 2.0), one panel per dialect: Scot: Hi/Il, Scot: W, Canada, Glasgow, Scot: SW, Scot: E, columbus, US: West, US: N. Cities, raleigh]

S-retraction differs more by speaker


[Figure: density of speakers by retraction ratio for /str/ (e.g. street), grouped by US, Canada and Scotland]


How robust is the ‘English’ Voicing Effect?

Tanner et al. Toronto WP Ling 2019; Tanner et al. Frontiers in Computational Sociolinguistics 2020

Data

• Utterance-final CVC words, e.g. beat, bead

• 1964 speakers

• 15 corpora ~ 30 dialects

• ~230,000 tokens

• Vowel duration
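As a rough illustration of what a Voicing Effect measure captures, the sketch below computes, for each dialect, the ratio of mean vowel duration before voiced versus voiceless final consonants. A ratio of 1 corresponds to "bead = beat" and a ratio above 1 to "bead > beat". This is a simple descriptive ratio on invented numbers, not the procedure behind the estimated effect sizes shown in the talk; the function name and the data are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def voicing_effect_ratio(tokens):
    """Map each dialect to mean(duration before voiced) / mean(duration before voiceless)."""
    durations = defaultdict(lambda: {"voiced": [], "voiceless": []})
    for dialect, voicing, duration in tokens:
        durations[dialect][voicing].append(duration)
    # Skip dialects that lack tokens in either context.
    return {
        dialect: mean(groups["voiced"]) / mean(groups["voiceless"])
        for dialect, groups in durations.items()
        if groups["voiced"] and groups["voiceless"]
    }


# Invented tokens: (dialect, voicing of final consonant, vowel duration in seconds)
TOKENS = [
    ("dialect_a", "voiced", 0.24),
    ("dialect_a", "voiced", 0.26),
    ("dialect_a", "voiceless", 0.20),
    ("dialect_a", "voiceless", 0.20),
    ("dialect_b", "voiced", 0.20),
    ("dialect_b", "voiceless", 0.20),
]

ratios = voicing_effect_ratio(TOKENS)
```

With these invented values, `dialect_a` shows a lengthening effect and `dialect_b` shows none. Raw means like these ignore speaker, word and speech-rate differences, which is one reason large-corpus studies rely on statistical models rather than simple ratios.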

James Tanner

1964 speakers, ~230k tokens

Voicing Effect differs by English dialect

[Figure: estimated Voicing Effect size by dialect (North America; UK & Ireland), on a scale from “bead = beat” to “bead > beat”]


… and is much smaller than in lab speech


Scottish (no Voicing Effect expected)

African American Vernacular English (big Voicing Effect expected)

Voicing Effect differs by dialect

[Figure: estimated Voicing Effect size by dialect]

Voicing Effect differs more by dialect than by speakers

[Figure: amount of dialect variability (x-axis) against amount of speaker variability (y-axis)]
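One simple way to think about "dialect variability" versus "speaker variability" is to compare how much dialect averages differ from one another with how much speakers differ within a dialect. The slides do not say how the two amounts were estimated, so the sketch below is only an assumption-laden illustration using plain standard deviations on invented values; the function name and the data layout are hypothetical.

```python
from statistics import mean, stdev


def variability(values_by_dialect):
    """Return (SD of dialect means, mean within-dialect SD across speakers)."""
    per_dialect = [list(speakers.values()) for speakers in values_by_dialect.values()]
    dialect_sd = stdev([mean(values) for values in per_dialect])
    speaker_sd = mean([stdev(values) for values in per_dialect])
    return dialect_sd, speaker_sd


# Invented per-speaker values of some acoustic measure, grouped by dialect.
MEASURE = {
    "dialect_x": {"spk1": 1.0, "spk2": 1.2, "spk3": 1.4},
    "dialect_y": {"spk4": 1.1, "spk5": 1.3, "spk6": 1.5},
}

dialect_sd, speaker_sd = variability(MEASURE)
```

In this invented example the speakers within each dialect differ more than the dialect averages do, which is the pattern the talk reports for sibilants; the Voicing Effect shows the opposite pattern.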

English speech over time and space

Try out our Shiny app! http://152.1.64.33/spade/latest/

What do we learn about English phonology?

• confirm expected patterns from current-scale work

• identify new patterns of variability:

Certain features vary more by speaker, and less by dialect (e.g. sibilants)

Others vary more by dialect and less by speaker (e.g. vowel duration)

Why? e.g. Kleinschmidt (2018)

Challenges

• Ethics (multiple countries, GDPR)

• Data (collection, processing)

• Software development

• Some measures elusive (stops, e.g. p t k)

What next?

• analyse the sounds of ‘English’!

• prosody (intonation, voice quality, etc.)

• Expand ‘English’ to World/non-native Englishes

• Beyond English (ISCAN not language-specific)

Thank you!

and to the organizers of this workshop

Documentation

• GUI / server install: https://iscan.readthedocs.io/

• Can sign up as tutorial user

• Python API: https://polyglotdb.readthedocs.io/
