1. Perception of complex sounds Types of complexity: spectrum/time/space Main purpose: auditory...

1. Perception of complex sounds

• Types of complexity: spectrum/time/space
• Main purpose: auditory scene analysis
• Overlaid function: communication
• Examples:
  – Cocktail party effects
    • Blind source separation: ICA(1) ICA(2)
    • F0 as stream separator: (example)
    • Streaming: loss or gain of information?
    • Temporal induction (example)
  – Some acoustic decompositions
    • source/filter (demo) (another)
    • envelope/fine structure (demo)

http://www.cnl.salk.edu/~tewon/Blind/blind_audio.html

http://galilee.swan.ac.uk/home/eerichar/sfdemo/sfdemo.html

http://galilee.swan.ac.uk/home/eerichar/humoursynth/humoursynth.html

http://epl.meei.harvard.edu/%7Ebard/chimera/chimera_demos.html

2. Biology of acoustic communication

• Gibbons

• Humans
  – Laryngeal lowering
  – Torsional cerebral asymmetry
  – Categorical perception & vocabulary convergence

http://www.ling.umu.se/~rand/KatPer/index.eng.html

Gibbons

• Arboreal apes
  – tropical rain forests of Southeast Asia
  – 12 species in four (sub-)genera
    • the subgenera differ from one another somewhat more than humans differ from chimps
  – brachiation
  – monogamy
    • like 3% of mammal species
    • 90% of bird species

Gibbons and us: Primate Phylogeny

Among the apes, only gibbons and humans have pair bonding. Also, only gibbons and humans sing…

Gibbon duetting

All species of gibbons are known to produce elaborate, species-specific and sex-specific patterns of vocalisation often referred to as "songs" (Haimoff, 1984; Marshall & Marshall, 1976). Songs are loud and complex and are mainly uttered at specifically established times of day. In most species, mated pairs may characteristically combine their songs in a relatively rigid pattern to produce coordinated duet songs. Several functions have been attributed to gibbon songs, most of which emphasise a role in territorial advertisement, mate attraction and maintenance of pair and family bonds (Geissmann, 1999; Geissmann & Orgeldinger in press; Haimoff, 1984; Leighton, 1987).

The female “great call”

The most prominent song contribution of female gibbons consists of a loud, stereotyped phrase, the great call. Depending on species, great calls typically comprise 6-100 notes and have a duration of 6-30 s. The shape of individual great call notes and the intervals between the notes follow a species-specific pattern.

A female song bout is usually introduced by a variable but simple series of notes termed the introductory sequence; it is produced only once in a song bout. Thereafter, great calls are produced with an interval of about 2 min. In the intervals, [are] so-called interlude sequences consisting of shorter, more variable phrases … The typical female song bout hence follows the sequential course ABCBCBCBC…

Male duet contributions

As a rule, adult males do not produce great calls, but "male short phrases" only. Whereas female great calls remain essentially unchanged throughout a song bout, males gradually build up their phrases, beginning with single, simple notes. As less simple notes are introduced, these notes are combined to increasingly complex phrases, reaching the fully developed form only after several minutes of singing …

During duet songs, mated males and females combine their song contributions to produce complex, but relatively stereotyped vocal interactions… Both pair partners contribute to an introductory sequence at the beginning of the song bout (A). Thereafter, interlude sequences (B) and great call sequences (C) are produced in successive alternation…

During great call sequences the male becomes silent and does not resume calling until near or shortly after the end of the female's great call, when he will produce a coda.

Gibbon song samples

• Hylobates lar
  – white-handed gibbon
  – Female “great call” with male “coda”

• Hylobates muelleri
  – gray gibbon
  – Female “great call” with male “coda”

Hybrid Songs

[Song samples: H. lar; H. muelleri; hybrids H. muelleri x H. lar and H. lar x H. muelleri]

Phylogeny of singing in primates

Singing is rare in mammals. It occurs in members of 26 species in four primate genera: Indri, Tarsius, Callicebus, Hylobates. These are 11% of primate species and 4% of primate genera. Since the four singing genera are widely separated, they are thought to have evolved singing independently.

In all singing primates, both males and females sing, and duetting usually if not always occurs. All singing primates are monogamous.

Most bird species sing; often bird song is mostly male; duetting bird species are also usually monogamous.

Gular sac

Some gibbons have developed a large “gular sac” apparently involved with breath control and/or resonance. Gular sac size and song complexity seem to correlate across species.

Symphalangus syndactylus (siamang):

“the [siamang] duet is probably the most complicated opus sung by a land vertebrate other than man…”

--Marshall and Sugardjito (1986)

Pharyngeal changes in hominids

Sexual dimorphism in larynx size and position

AC anterior commissure

VP tip of vocal process

AnAC angle of bilateral vocal folds at AC

GWP glottic width at vocal process level

LEG length of entire glottis

LAG length of anterior glottis

LPG length of posterior glottis

LMF length of membranous vocal fold

Measure          Male    Female   Ratio M/F
AnAC (degrees)   16      25
LMF (mm)         15.4    9.8      1.57
GWP (mm)         4.3     4.2      1.02
LAG (mm)         15.1    9.5      1.59
LPG (mm)         9.5     6.8      1.40
LEG (mm)         24.5    16.3     1.50

(Data from Hirano et al. 1997)

Sex and the larynx

Sex and F0

Verbal ability and lateralization

Verbal ability vs. relative hand skill in 12,000 11-year-old children.

Verbal score = number of phonological, semantic, and logical word sequence completions.

Relative hand skill = (R-L)/(R+L)*100, where R and L are the number of squares checked per minute with the right and left hand.

Data from T.J. Crow et al. Neuropsychologia (1998)
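The relative hand skill index is simple arithmetic, and a minimal Python sketch may make the sign convention concrete. The function name and the example counts below are illustrative assumptions, not values from Crow et al.

```python
def relative_hand_skill(right, left):
    """Laterality index from squares checked per minute with each hand.

    Returns 100 * (R - L) / (R + L): positive when the right hand is
    faster, negative when the left hand is faster, 0 when they are equal.
    """
    total = right + left
    if total == 0:
        raise ValueError("need at least one checked square")
    return 100.0 * (right - left) / total


# Hypothetical counts, not data from the study.
print(relative_hand_skill(60, 40))  # right hand faster
print(relative_hand_skill(40, 60))  # left hand faster
```

The index is bounded by -100 and +100, so children with different overall speeds can be compared on the same scale.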

Torsional cerebral asymmetries

Evolutionary history

There are four major reorganizational changes that have occurred during hominid brain evolution, viz.: (1) reduction of the relative volume of primary visual striate cortex area, with a concomitant relative increase in the volume of posterior parietal cortex, which in humans contains Wernicke's area; (2) reorganization of the frontal lobe, mainly involving the third inferior frontal convolution, which in humans contains Broca's area; (3) the development of strong cerebral asymmetries of a torsional pattern consistent with human right-handedness (left-occipital and right-frontal in conjunction); and (4) refinements in cortical organization to a modern human pattern, most probably involving tertiary convolutions. (this last 'reorganization' is inferred; in fact, there is no direct palaeoneurological evidence for it.)

-Ralph Holloway, Evolution of the Human Brain (1996)

Encephalization

Progress?

For most relatively social adult fishes, birds and mammals, the range or repertoire size [of communicative displays] for different species varies from 15 to 35 displays.

-Encyclopedia Britannica, “Animal Communication”

After 450 million years…

Cephalopods: 15-35 distinct displays

Non-human primates: 15-35 distinct displays

Primates are “more evolved” than molluscs

• More complex bodies and brains

• More complex social structures

• More complex and flexible behavior

• Longer lived

• Better at learning and problem solving

• BUT no real change in “vocabulary size”

Spontaneous communication in non-human primates is:

• limited to a small repertoire of signals
• whose categories are built in
  – meanings change a bit according to the environment
• reference is immediate, not displaced
• “theory of mind” abilities are nonexistent
  – or at best very limited
• just like “lower” animals
  – including some invertebrates

Possibilities

• Something about hominid development
  – e.g. increased brain size caused a “phase transition”
• Something about evolution & communication
  – e.g. some aspects of language are evolutionarily inaccessible
    • symbolic behavior
    • large vocabulary
    • “theory of mind”

The problem of vocabulary consensus

• 10K-100K arbitrary pronunciations
• How is consensus established and maintained?

Genesis 2:19-20

And out of the ground the Lord God formed every beast of the field, and every fowl of the air; and brought them unto Adam to see what he would call them: and whatsoever Adam called every living creature, that was the name thereof. And Adam gave names to the cattle, and to the fowl of the air, and to every beast of the field...

Possible solutions

Initial naming authority (Adam)

Natural names (“ding-dong” etc.)

Explicit negotiation

????

Emergent structure

Buridan’s Ants make a decision

Percentage of Iridomyrmex humilis workers passing each (equal) arm of bridge per 3-minute period

Agent-based modeling

• AKA “individual-based modeling”

Ensembles of parameterized entities ("agents") interact in algorithmically-defined ways. Individual interactions depend (stochastically) on the current parameters of the agents involved; these parameters are in turn modified (stochastically) by the outcome of the interaction.

Key ideas of ABM

• Complex structure emerges from the interaction of simple agents

• Agents’ algorithms evolve in a context they create collectively

• Thus behavior is like organic form

BUT

• ABM is a form of programming, so just solving a problem via ABM has no scientific interest

• We must show a relevant general property of some wide class of models

• Paradigmatic example is Axelrod’s work on reciprocal altruism in the iterated prisoner’s dilemma

Emergence of shared pronunciations

• Definition of success:
  – Social convergence (“people are mostly the same”)
  – Lexical differentiation (“words are mostly different”)

• These two properties are required for successful communication

A simple sample model

• Individual belief about word pronunciation: vector of binary random variables
  – e.g. feature #1 is 1 with p=.9, 0 with p=.1; feature #2 is 1 with p=.3, 0 with p=.7

• (Instance of) word pronunciation: (random) binary vector
  – e.g. 1 0

• Initial conditions: random assignment of binary values to beliefs

• Channel effect: additive noise

• Perception: assign input feature-wise to nearest binary vector
  – i.e. categorical perception

• Conversational geometry: circle of errorless pairwise naming among N people

• Update method: linear combination of belief and perception
  – “leaky integration” of perceptions

It works!

• Channel noise = .4
• Update constant = .8
• 10 people (#1 and #4 shown)
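A minimal Python sketch of the sample model, for one word, may help make the pieces concrete. Several details are assumptions that the slides do not state: the noise is taken to be uniform on [-0.4, 0.4], the update constant .8 is read as the weight kept on the old belief, and each person in the circle speaks to the next person once per round. The function and parameter names are hypothetical.

```python
import random


def perceive_categorical(signal):
    """Snap each noisy feature value to the nearest binary value."""
    return [1.0 if s > 0.5 else 0.0 for s in signal]


def simulate(n_people=10, n_features=2, noise=0.4, memory=0.8,
             rounds=200, seed=0, beliefs=None):
    """Run the naming circle and return each person's final belief vector."""
    rng = random.Random(seed)
    if beliefs is None:
        # Initial conditions: random binary values assigned to beliefs.
        beliefs = [[float(rng.randint(0, 1)) for _ in range(n_features)]
                   for _ in range(n_people)]
    else:
        beliefs = [list(b) for b in beliefs]
    for _ in range(rounds):
        for speaker in range(n_people):
            listener = (speaker + 1) % n_people  # assumed circle geometry
            # Production: sample a binary vector from the speaker's belief.
            word = [1.0 if rng.random() < p else 0.0
                    for p in beliefs[speaker]]
            # Channel: additive noise (uniform distribution is an assumption).
            heard = [w + rng.uniform(-noise, noise) for w in word]
            percept = perceive_categorical(heard)
            # Leaky integration: memory * old belief + (1 - memory) * percept.
            beliefs[listener] = [b + (1.0 - memory) * (x - b)
                                 for b, x in zip(beliefs[listener], percept)]
    return beliefs


if __name__ == "__main__":
    for person, belief in enumerate(simulate(), start=1):
        print(person, [round(p, 2) for p in belief])
```

Under these assumptions a state in which everyone holds the same binary belief is absorbing: once reached, it never changes. How quickly a random start reaches such a state depends on the assumed noise distribution and geometry.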

Gradient output = faster convergence

Instead of saying 1 or 0 for each feature, speakers emit real numbers (plus noise) proportional to their belief about the feature.

Perception is still categorical.

Result is faster convergence, because better information is provided about the speaker’s internal state.

Gradient input = no convergence

If we make perception gradient, then (whether or not production is categorical) social convergence does not occur.
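The gradient variants can be explored with a small self-contained Python sketch in which output and input are switched between categorical and gradient independently. As assumptions not stated in the slides, the noise is uniform on [-0.4, 0.4], the constant .8 weights the old belief, and each person speaks to the next person in the circle. The names `run` and `spread` are hypothetical.

```python
import random


def run(gradient_output, gradient_input, start,
        noise=0.4, memory=0.8, rounds=200, seed=0):
    """Simulate the naming circle; return the final belief vectors."""
    rng = random.Random(seed)
    beliefs = [list(b) for b in start]
    n = len(beliefs)
    for _ in range(rounds):
        for speaker in range(n):
            listener = (speaker + 1) % n  # assumed circle geometry
            if gradient_output:
                # Emit the belief values themselves.
                signal = list(beliefs[speaker])
            else:
                # Emit a binary sample drawn from the belief.
                signal = [1.0 if rng.random() < p else 0.0
                          for p in beliefs[speaker]]
            heard = [s + rng.uniform(-noise, noise) for s in signal]
            if gradient_input:
                percept = heard  # keep the noisy real values
            else:
                percept = [1.0 if h > 0.5 else 0.0 for h in heard]
            beliefs[listener] = [b + (1.0 - memory) * (x - b)
                                 for b, x in zip(beliefs[listener], percept)]
    return beliefs


def spread(beliefs):
    """Largest per-feature gap between any two people's beliefs."""
    return max(max(col) - min(col) for col in zip(*beliefs))


if __name__ == "__main__":
    rng = random.Random(1)
    start = [[float(rng.randint(0, 1)) for _ in range(2)] for _ in range(10)]
    for g_out in (False, True):
        for g_in in (False, True):
            print(g_out, g_in, round(spread(run(g_out, g_in, start)), 3))
```

With categorical input, a population that already agrees on a binary pronunciation stays exactly there in this sketch, because the thresholding discards the noise. With gradient input the noise is integrated into each belief, so even a population that starts in full agreement drifts apart, which is one way to read the "attractor" argument.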

What’s going on?

• Input categorization creates “attractors” that trap beliefs despite channel noise

• Positive feedback creates social consensus

• Random effects generate lexical differentiation

• Assertion: any model of this general type needs categorical perception to achieve social consensus with lexical differentiation

Divergence with population size

With gradient perception, it is not just that pronunciation beliefs continue a random walk over time. They also diverge increasingly at a given time, as group size increases.

[Figure panels: 20 people; 40 people]

Pronunciation differentiation

• There is nothing in this model to keep words distinct

• But words tend to fill the space randomly (vertices of an N-dimensional hypercube)

• This is fine if the space is large enough

• Behavior is rather lifelike with word vectors of 19-20 bits

Homophony comparison

English is plotted with triangles (97K pronouncing dictionary).

Model vocabulary with 19 bits is X’s.

Model vocabulary with 20 bits is O’s.
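A rough back-of-envelope check on why 19-20 bits is the interesting range: if each word lands independently and uniformly at random on a hypercube vertex, the expected share of words that share a pronunciation with some other word has a closed form. The Python sketch below computes it for a 97,000-word vocabulary, the size of the pronouncing dictionary mentioned above. The independence and uniformity assumptions are an idealization, the function name is hypothetical, and this does not reproduce the plotted comparison.

```python
def homophone_fraction(n_words, n_bits):
    """Expected share of words that collide with at least one other word
    when each word picks one of 2**n_bits pronunciations uniformly at random."""
    n_slots = 2 ** n_bits
    # A given word is collision-free only if all other words miss its slot.
    return 1.0 - (1.0 - 1.0 / n_slots) ** (n_words - 1)


for bits in (19, 20):
    print(bits, round(homophone_fraction(97_000, bits), 3))
```

Under these assumptions roughly 17% of words would have a homophone at 19 bits and roughly 9% at 20 bits; each extra bit about halves the share once the space is much larger than the vocabulary.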

Conclusions

• As others have argued, categorical (digital) perception is crucial for a communication system with many well-differentiated words

• Previous arguments had mainly to do with individual perception

• There may be arguments of equal force in the area of collective phenomena
