1. Perception of complex sounds
• Types of complexity: spectrum/time/space
• Main purpose: auditory scene analysis
• Overlaid function: communication
• Examples:
  – Cocktail party effects
    • Blind source separation: ICA(1) ICA(2)
    • F0 as stream separator: (example)
    • Streaming: loss or gain of information?
    • Temporal induction (example)
  – Some acoustic decompositions
    • source/filter (demo) (another)
    • envelope/fine structure (demo)
2. Biology of acoustic communication
• Gibbons
• Humans
  – Laryngeal lowering
  – Torsional cerebral asymmetry
  – Categorical perception & vocabulary convergence
Gibbons
• Arboreal apes
  – tropical rain forests of Southeast Asia
  – 12 species in four (sub-)genera
    • the subgenera differ from one another somewhat more than humans differ from chimps
  – brachiation
  – monogamy
    • like 3% of mammal species
    • 90% of bird species
Gibbons and us: Primate Phylogeny
Among the apes, only gibbons and humans have pair bonding.
Also, only gibbons and humans sing…
Gibbon duetting
All species of gibbons are known to produce elaborate, species-specific and sex-specific patterns of vocalisation often referred to as "songs" (Haimoff, 1984; Marshall & Marshall, 1976). Songs are loud and complex and are mainly uttered at specifically established times of day. In most species, mated pairs may characteristically combine their songs in a relatively rigid pattern to produce coordinated duet songs. Several functions have been attributed to gibbon songs, most of which emphasise a role in territorial advertisement, mate attraction and maintenance of pair and family bonds (Geissmann, 1999; Geissmann & Orgeldinger in press; Haimoff, 1984; Leighton, 1987).
The female “great call”
The most prominent song contribution of female gibbons consists of a loud, stereotyped phrase, the great call. Depending on species, great calls typically comprise 6-100 notes and have a duration of 6-30 s. The shape of individual great call notes and the intervals between the notes follow a species-specific pattern.
A female song bout is usually introduced by a variable but simple series of notes termed the introductory sequence; it is produced only once in a song bout. Thereafter, great calls are produced with an interval of about 2 min. In the intervals, [are] so-called interlude sequences consisting of shorter, more variable phrases … The typical female song bout hence follows the sequential course ABCBCBCBC…
Male duet contributions
As a rule, adult males do not produce great calls, but "male short phrases" only. Whereas female great calls remain essentially unchanged throughout a song bout, males gradually build up their phrases, beginning with single, simple notes. As less simple notes are introduced, these notes are combined to increasingly complex phrases, reaching the fully developed form only after several minutes of singing …
During duet songs, mated males and females combine their song contributions to produce complex, but relatively stereotyped vocal interactions… Both pair partners contribute to an introductory sequence at the beginning of the song bout (A). Thereafter, interlude sequences (B) and great call sequences (C) are produced in successive alternation…
During great call sequences the male becomes silent and does not resume calling until near or shortly after the end of the female's great call, when he will produce a coda.
Gibbon song samples
• Hylobates lar
  – white-handed gibbon
  – Female “great call” with male “coda”
• Hylobates muelleri
  – gray gibbon
  – Female “great call” with male “coda”
Phylogeny of singing in primates
Singing is rare in mammals. It occurs in members of 26 species in four primate genera: Indri, Tarsius, Callicebus, Hylobates. These are 11% of primate species and 4% of primate genera. Since the four singing genera are widely separated, they are thought to have evolved singing independently.
In all singing primates, both males and females sing, and duetting usually if not always occurs. All singing primates are monogamous.
Most bird species sing; in many of them it is mostly the male that sings; duetting bird species are also usually monogamous.
Gular sac
Some gibbons have developed a large “gular sac” apparently involved with breath control and/or resonance. Gular sac size and song complexity seem to correlate across species.
Symphalangus syndactylus (siamang): “the [siamang] duet is probably the most complicated opus sung by a land vertebrate other than man…”
-- Marshall and Sugardjito (1986)
Sexual dimorphism in larynx size and position
AC anterior commissure
VP tip of vocal process
AnAC angle of bilateral vocal folds at AC
GWP glottic width at vocal process level
LEG length of entire glottis
LAG length of anterior glottis
LPG length of posterior glottis
LMF length of membranous vocal fold
Measure          Male    Female   Ratio M/F
AnAC (degrees)   16      25
LMF (mm)         15.4    9.8      1.57
GWP (mm)         4.3     4.2      1.02
LAG (mm)         15.1    9.5      1.59
LPG (mm)         9.5     6.8      1.40
LEG (mm)         24.5    16.3     1.50
(Data from Hirano et al. 1997)
Sex and the larynx
Verbal ability and lateralization
Verbal ability vs. relative hand skill in 12,000 11-year-old children.
Verbal score = number of phonological, semantic, logical word sequence completions.
Relative hand skill = (R-L)/(R+L)*100 [R, L = number of squares checked/minute]
Data from T.J. Crow et al. Neuropsychologia (1998)
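The relative hand skill index above is simple arithmetic, so a small worked sketch may help. The function name and the zero-score guard are illustrative assumptions, not part of the original study; R and L are taken to be the squares checked per minute with the right and left hand.

```python
def relative_hand_skill(right: float, left: float) -> float:
    """Laterality index (R - L) / (R + L) * 100.

    `right` and `left` are assumed to be squares checked per minute
    with each hand. Positive values mean a right-hand advantage.
    """
    total = right + left
    if total == 0:
        # Guard added for this sketch; the slide does not discuss this case.
        raise ValueError("at least one hand must have a nonzero score")
    return 100.0 * (right - left) / total


print(relative_hand_skill(60, 40))  # → 20.0  (right-hand advantage)
print(relative_hand_skill(40, 60))  # → -20.0 (left-hand advantage)
print(relative_hand_skill(50, 50))  # → 0.0   (no hand advantage)
```

The index is bounded between -100 (all squares checked with the left hand) and +100 (all with the right), which is what makes scores comparable across children who differ in overall speed.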
Evolutionary history
There are four major reorganizational changes that have occurred during hominid brain evolution, viz.: (1) reduction of the relative volume of primary visual striate cortex area, with a concomitant relative increase in the volume of posterior parietal cortex, which in humans contains Wernicke's area; (2) reorganization of the frontal lobe, mainly involving the third inferior frontal convolution, which in humans contains Broca's area; (3) the development of strong cerebral asymmetries of a torsional pattern consistent with human right-handedness (left-occipital and right-frontal in conjunction); and (4) refinements in cortical organization to a modern human pattern, most probably involving tertiary convolutions. (This last 'reorganization' is inferred; in fact, there is no direct palaeoneurological evidence for it.)
-Ralph Holloway, Evolution of the Human Brain (1996)
Progress?
For most relatively social adult fishes, birds and mammals, the range or repertoire size [of communicative displays] for different species varies from 15 to 35 displays.
-Encyclopedia Britannica, “Animal Communication”
After 450 million years…
Cephalopods: 15-35 distinct displays
Non-human primates: 15-35 distinct displays
Primates are “more evolved” than molluscs
• More complex bodies and brains
• More complex social structures
• More complex and flexible behavior
• Longer lived
• Better at learning and problem solving
• BUT no real change in “vocabulary size”
Spontaneous communication in non-human primates is:
• limited to a small repertoire of signals
• whose categories are built in
  – meanings change a bit according to the environment
• reference is immediate, not displaced
• “theory of mind” abilities are nonexistent
  – or at best very limited
• just like “lower” animals
  – including some invertebrates
Possibilities
• Something about hominid development
  – e.g. increased brain size caused a “phase transition”
• Something about evolution & communication
  – e.g. some aspects of language are evolutionarily inaccessible
    • symbolic behavior
    • large vocabulary
    • “theory of mind”
The problem of vocabulary consensus
• 10K-100K arbitrary pronunciations
• How is consensus established and maintained?
Genesis 2:19-20
And out of the ground the Lord God formed every beast of the field, and every fowl of the air; and brought them unto Adam to see what he would call them: and whatsoever Adam called every living creature, that was the name thereof. And Adam gave names to the cattle, and to the fowl of the air, and to every beast of the field...
Possible solutions
Initial naming authority (Adam)
Natural names (“ding-dong” etc.)
Explicit negotiation
????
Emergent structure
Buridan’s Ants make a decision
Percentage of Iridomyrmex humilis workers passing each (equal) arm of bridge per 3-minute period
Agent-based modeling
• AKA “individual-based modeling”
Ensembles of parameterized entities ("agents") interact in algorithmically-defined ways. Individual interactions depend (stochastically) on the current parameters of the agents involved; these parameters are in turn modified (stochastically) by the outcome of the interaction.
Key ideas of ABM
• Complex structure emerges from the interaction of simple agents
• Agents’ algorithms evolve in a context they create collectively
• Thus behavior is like organic form
BUT
• ABM is a form of programming, so just solving a problem via ABM has no scientific interest
• We must show a relevant general property of some wide class of models
• Paradigmatic example is Axelrod’s work on reciprocal altruism in the iterated prisoner’s dilemma
Emergence of shared pronunciations
• Definition of success:
  – Social convergence (“people are mostly the same”)
  – Lexical differentiation (“words are mostly different”)
• These two properties are required for successful communication
A simple sample model
• Individual belief about word pronunciation: vector of binary random variables
e.g. feature #1 is 1 with p=.9, 0 with p=.1
feature #2 is 1 with p=.3, 0 with p=.7
• (Instance of) word pronunciation: (random) binary vector
e.g. 1 0
• Initial conditions: random assignment of binary values to beliefs
• Channel effect: additive noise
• Perception: assign input feature-wise to nearest binary vector
i.e. categorical perception
• Conversational geometry: circle of errorless pairwise naming among N people
• Update method: linear combination of belief and perception
“leaky integration” of perceptions
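The ingredients listed above can be put together in a few lines. What follows is a minimal sketch under stated assumptions, not the model actually used for these slides: the Gaussian noise, the 0.5 threshold, the update rate, the default sizes, and all function names are illustrative choices, and "errorless naming" is taken to mean that the listener always knows which word is being named.

```python
import random


def produce(belief, rng):
    """Sample a binary pronunciation: feature i is 1 with probability belief[i]."""
    return [1.0 if rng.random() < p else 0.0 for p in belief]


def transmit(signal, noise_sd, rng):
    """Channel effect: add independent Gaussian noise to each feature (assumed noise type)."""
    return [x + rng.gauss(0.0, noise_sd) for x in signal]


def perceive(signal):
    """Categorical perception: snap each feature to the nearer of 0 and 1."""
    return [1.0 if x > 0.5 else 0.0 for x in signal]


def update(belief, percept, rate):
    """Leaky integration: linear combination of old belief and new percept."""
    return [(1.0 - rate) * b + rate * x for b, x in zip(belief, percept)]


def simulate(n_people=10, n_words=4, n_features=8, rate=0.2,
             noise_sd=0.2, n_rounds=500, seed=0):
    """Return beliefs[person][word][feature] after n_rounds trips around the circle."""
    rng = random.Random(seed)
    # Initial conditions: every belief starts at a random binary value.
    beliefs = [[[float(rng.randint(0, 1)) for _ in range(n_features)]
                for _ in range(n_words)]
               for _ in range(n_people)]
    for _ in range(n_rounds):
        for speaker in range(n_people):
            listener = (speaker + 1) % n_people   # circle of pairwise naming
            word = rng.randrange(n_words)         # listener knows which word is meant
            heard = transmit(produce(beliefs[speaker][word], rng), noise_sd, rng)
            beliefs[listener][word] = update(
                beliefs[listener][word], perceive(heard), rate)
    return beliefs


def social_spread(beliefs):
    """Mean (max - min) of a belief across people; 0 means full consensus."""
    n_words, n_features = len(beliefs[0]), len(beliefs[0][0])
    gaps = []
    for w in range(n_words):
        for f in range(n_features):
            column = [person[w][f] for person in beliefs]
            gaps.append(max(column) - min(column))
    return sum(gaps) / len(gaps)


def distinct_words(beliefs):
    """Number of distinct rounded pronunciations in the first person's lexicon."""
    return len({tuple(round(p) for p in word) for word in beliefs[0]})
```

Calling `beliefs = simulate()` and then printing `social_spread(beliefs)` and `distinct_words(beliefs)` gives one number for each of the two success criteria: a small spread indicates social convergence, and a distinct-word count close to `n_words` indicates lexical differentiation. The outcome depends on the noise level, rate, and number of rounds chosen.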
Gradient output = faster convergence
Instead of saying 1 or 0 for each feature, speakers emit real numbers (plus noise) proportional to their belief about the feature.
Perception is still categorical.
Result is faster convergence, because better information is provided about speaker’s internal state.
Gradient input = no convergence
If we make perception gradient, then (whether or not production is categorical) social convergence does not occur.
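The gradient variants differ from the basic model only in what the speaker emits and in what the listener does with the received value, so they can be compared with two switches. This is a hedged sketch for experimenting with the claims above, not the original simulation: the function names, the Gaussian noise, the clipping of gradient percepts to [0, 1], and the single-word setup are all assumptions made for illustration.

```python
import random


def clip01(x):
    """Keep a value inside [0, 1] so that it can still serve as a probability."""
    return min(1.0, max(0.0, x))


def emit(belief, gradient_output, rng):
    """Categorical output samples 0 or 1; gradient output emits the belief itself."""
    if gradient_output:
        return belief
    return 1.0 if rng.random() < belief else 0.0


def perceive(heard, categorical_input):
    """Categorical input thresholds at 0.5; gradient input keeps the value (clipped)."""
    if categorical_input:
        return 1.0 if heard > 0.5 else 0.0
    return clip01(heard)


def run(gradient_output, categorical_input, n_people=20, n_features=8,
        rate=0.2, noise_sd=0.2, n_rounds=500, seed=0):
    """Return beliefs[person][feature] for a single word passed around a circle."""
    rng = random.Random(seed)
    beliefs = [[float(rng.randint(0, 1)) for _ in range(n_features)]
               for _ in range(n_people)]
    for _ in range(n_rounds):
        for speaker in range(n_people):
            listener = (speaker + 1) % n_people
            for f in range(n_features):
                sent = emit(beliefs[speaker][f], gradient_output, rng)
                heard = sent + rng.gauss(0.0, noise_sd)      # additive channel noise
                percept = perceive(heard, categorical_input)
                # Leaky integration of the percept into the listener's belief.
                beliefs[listener][f] += rate * (percept - beliefs[listener][f])
    return beliefs


def spread(beliefs):
    """Mean (max - min) of each feature's belief across people; 0 means agreement."""
    n_features = len(beliefs[0])
    gaps = [max(p[f] for p in beliefs) - min(p[f] for p in beliefs)
            for f in range(n_features)]
    return sum(gaps) / n_features
```

One way to use it is to print `spread(run(gradient_output=g, categorical_input=c, n_people=n))` for each combination of `g`, `c`, and a few group sizes such as 20 and 40, and to compare the resulting spreads with the behavior described on these slides.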
What’s going on?
• Input categorization creates “attractors” that trap beliefs despite channel noise
• Positive feedback creates social consensus
• Random effects generate lexical differentiation
• Assertion: any model of this general type needs categorical perception to achieve social consensus with lexical differentiation
Divergence with population size
With gradient perception, it is not just that pronunciation beliefs continue a random walk over time. They also diverge increasingly at a given time, as group size increases.
[Figure panels: 20 people; 40 people]
Pronunciation differentiation
• There is nothing in this model to keep words distinct
• But words tend to fill the space randomly (vertices of an N-dimensional hypercube)
• This is fine if the space is large enough
• Behavior is rather lifelike with word vectors of 19-20 bits
Homophony comparison
English is plotted with triangles (97K pronouncing dictionary).
Model vocabulary with 19 bits is X’s.
Model vocabulary with 20 bits is O’s.
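The amount of homophony expected when words land on hypercube vertices at random can be estimated directly. The sketch below assumes that each word independently picks one of the 2^bits vertices uniformly at random, which idealizes the converged model; the function names are illustrative, and the numbers it produces are not the data behind the plot.

```python
import random
from collections import Counter


def expected_homophone_fraction(n_words, n_bits):
    """Expected fraction of words sharing a pronunciation with at least one other word.

    Assumes each word independently picks one of 2**n_bits vertices uniformly.
    """
    n_slots = 2 ** n_bits
    return 1.0 - (1.0 - 1.0 / n_slots) ** (n_words - 1)


def simulate_homophone_sets(n_words, n_bits, seed=0):
    """Return a Counter mapping homophone-set size to the number of such sets."""
    rng = random.Random(seed)
    words_per_pronunciation = Counter(
        rng.getrandbits(n_bits) for _ in range(n_words))
    return Counter(words_per_pronunciation.values())


if __name__ == "__main__":
    for bits in (19, 20):
        frac = expected_homophone_fraction(97_000, bits)
        sizes = simulate_homophone_sets(97_000, bits)
        print(bits, round(frac, 3), sorted(sizes.items()))
```

Under this uniform-random assumption, a 97,000-word vocabulary has roughly 17% of its words in homophone sets at 19 bits and roughly 9% at 20 bits, which is the kind of quantity the comparison with the English pronouncing dictionary is about.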