SYNTAX

Chapter 10: Perception of Speech

Perry C. Hanavan, Au.D.

Question

How do we perceive speech?

A. Individual sounds (phonemes)?
B. Syllables?
C. Words?
D. Sentences?
E. Listening to Mozart?

Speech Perception

• How do we perceive speech?
– Individual sounds (phonemes)?
– Syllables?
– Words?
– Sentences?

• How do we derive meaning from the ocean of sounds we hear?
– Speech is variable
– Speakers vary in speech
– Variant or invariant cues?

Question

What is Pattern Playback?

A. A music group from India

B. Talking machine built by Dr. Franklin S. Cooper and colleagues at Haskins Laboratories

C. A brain-based device for speech perception

Pattern Playback

Question

What is an invariant speech cue?

A. Phonemes coarticulated

B. A phoneme produced in isolation

C. Transition from one phoneme to the next

Excitation, Sensation, Cognition

Excitation: The pattern of neural responses elicited by a given stimulus.

Sensation: An internal representation of the stimulus.

Cognition: The interpretation of a sensation on the basis of stored knowledge.

Discrimination

The ability to distinguish between two levels of a stimulus parameter (e.g., different wavelengths of light).

Measured by the Just-Noticeable-Difference threshold.

Uses sensory representation.

Modality dependent

Recognition

The ability to categorize a stimulus as belonging to a particular class (e.g., colour or object type).

Uses cognitive representation: needs to refer to stored knowledge.

Representation dependent.

The relationship between discrimination and recognition

Recognition relies on discrimination… but does recognition also influence discrimination?

Discriminability seems to be affected by category structures – this is categorical perception.

Categorical perception:

Discrimination is more sensitive across category boundaries than within categories.

First example of categorical perception: the phoneme boundary effect.

Phonemes are the sounds that make up language: e.g., /b/ & /p/.

The phonemes /b/ and /p/ differ in voice onset time: the delay between the release of the stop and the onset of voicing.

Alvin Liberman (1917 – 2000)

Liberman and colleagues (1957) showed a phoneme boundary effect: a smaller change in delay was necessary to distinguish /b/ from /p/ than to distinguish two sounds within one of these categories.
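The boundary effect can be illustrated with a minimal sketch. It assumes a logistic labelling curve along the delay continuum and assumes that listeners discriminate only by comparing category labels. The boundary (25 ms), the slope, and the function names are hypothetical values chosen for illustration, not measurements from the study.

```python
import math

BOUNDARY_MS = 25.0  # hypothetical /b/-/p/ boundary on the delay continuum
SLOPE = 0.5         # hypothetical steepness of the labelling curve (per ms)


def p_label_p(delay_ms):
    """Probability of labelling a stimulus as /p/ (assumed logistic curve)."""
    return 1.0 / (1.0 + math.exp(-SLOPE * (delay_ms - BOUNDARY_MS)))


def predicted_abx(delay_a, delay_b):
    """Predicted ABX proportion correct if only category labels are compared.

    0.5 is chance; the score rises as the two stimuli are labelled differently.
    """
    diff = p_label_p(delay_a) - p_label_p(delay_b)
    return 0.5 + 0.5 * diff ** 2


# The same 10 ms step, placed within /b/, across the boundary, and within /p/.
within_b = predicted_abx(0.0, 10.0)
across = predicted_abx(20.0, 30.0)
within_p = predicted_abx(40.0, 50.0)

for name, score in [("within /b/", within_b),
                    ("across boundary", across),
                    ("within /p/", within_p)]:
    print(f"{name}: {score:.3f}")
```

Under these assumptions the same physical step is discriminated well only when it straddles the boundary; within a category the predicted score stays near chance.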

The phoneme boundary effect

Motor theory of speech perception: The phoneme boundary effect is caused by activation of the motor program required to produce a phoneme.

Question: Is the way we sense colour affected by the words for colours in our language?

Category boundary effects in the colour domain

Benjamin Lee Whorf (1897-1941)

The question about colour perception can be operationalized:

Color can be objectively measured in terms of its wavelength.

[Figure: visible spectrum along a wavelength axis, 400 nm to 700 nm]

The question about colour perception can be operationalized:

The number of basic color terms in a language can be measured. Basic color terms are:

• Single words.
• Not subsumed by another term.
• Not restricted to a particular class of objects.

Early research on color naming

Languages vary in the number of words they have for colour categories.

English: eleven basic color terms – white, black, grey, red, green, blue, yellow, orange, purple, pink, brown.

Dani (New Guinea): Two basic colour terms – mili (dark), mola (light).

Kay and Kempton (1984)

Compared English and Tarahumara speakers.

Tarahumara does not make a distinction between blue and green.

Kay and Kempton theorized that the perceptual distance between blue and green would be exaggerated in English speakers.


Three types of triplet:

• G G G: 3 green
• G G B: 2 green, 1 blue
• B B B: 3 blue


Tarahumara speakers were equally likely to choose either extreme for all three types of triplet.


English speakers behaved the same way when all three chips came from the same category.

When there was an odd one out, they were more likely to choose that one.

Perception of Vowels

• The vowel /a/ has the greatest intensity; the unvoiced /θ/ is the weakest consonant

• Front vowels perceived on basis of F1 frequency and average of F2 and F3, whereas back vowels are perceived on the basis of the average of F1 and F2, as well as F3

• So is it the absolute frequency values of the formants?

• Or the ratio of F2 to F1?

• Perhaps it is the invariant cues (frequency changes that occur with coarticulation)

[Figure: front vowels plotted by F1 against the F2/F3 average; back vowels plotted by the F1/F2 average against F3]
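The candidate vowel cues listed above can be computed side by side in a short sketch. The formant values are approximate adult-male averages of the kind found in classic published tables and are used here only for illustration; the function name and the front/back grouping are assumptions made for this example.

```python
# Approximate average formant frequencies (Hz), illustrative values only.
FORMANTS = {
    "i": (270, 2290, 3010),  # front vowel
    "a": (730, 1090, 2440),  # back vowel
    "u": (300, 870, 2240),   # back vowel
}
FRONT_VOWELS = {"i"}  # assumed grouping for this example


def vowel_cues(vowel):
    """Return the cue pair described on the slide plus the F2/F1 ratio."""
    f1, f2, f3 = FORMANTS[vowel]
    if vowel in FRONT_VOWELS:
        # Front vowels: F1 and the average of F2 and F3.
        cues = (f1, (f2 + f3) / 2)
    else:
        # Back vowels: the average of F1 and F2, and F3.
        cues = ((f1 + f2) / 2, f3)
    return {"cues": cues, "f2_f1_ratio": f2 / f1}


for v in FORMANTS:
    result = vowel_cues(v)
    print(v, result["cues"], round(result["f2_f1_ratio"], 2))
```

Absolute values like these shift from speaker to speaker, which is why the slide asks whether a ratio or a transition-based cue might be the more stable one.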

Invariant and Variant Cues

The onset formant transitions that perceptually define the consonant [d] differ depending on the identity of the following vowel.

(Formants highlighted by red dotted lines; transitions are the bending beginnings of the formant trajectories.)

[Spectrogram panels: /di/, /da/, /du/]

Perception of Diphthongs

• Perceived on the basis of formant transitions

• Salient feature: rapidity of transition

Consonant Perceptions

• Perception different for consonants than vowels

• Greater variety of consonant types than vowels

• Greater complexity for consonants

Question

Which of the following statements about categorical perception is TRUE?

A. Experience of percept invariances in sensory phenomena that can be varied along a continuum.

B. Can be inborn or can be induced by learning.

C. Related to how neural networks in our brains detect the features that allow us to sort the things in the world into separate categories

D. All the above are true

E. All the above are false

Categorical Perception

• Experience of percept invariances in sensory phenomena that can be varied along a continuum.

• Can be inborn or can be induced by learning.

• Related to how neural networks in our brains detect the features that allow us to sort the things in the world into separate categories

• An area in the left prefrontal cortex has been identified as the region responsible for phonetic categorical perception

Categorical Perception

CI Speech Coding Strategies

• ACE™: Unique to Cochlear’s Nucleus® 24 CI system. ACE optimizes detailed pitch and timing information of sound.

• SPEAK: (spectral peak) Increases the richness of important pitch information by stimulating electrodes across the entire electrode array.

• MPEAK: multipeak

• CIS: (Continuous-Interleaved Sampling) This high rate strategy uses a fixed set of electrodes. Emphasizes the detailed timing information of speech.

ACE Strategy

• Sound enters the speech processor through the microphone and is divided into a maximum of 22 frequency bands.

• Up to 20 narrow-band filters divide sound into corresponding frequency (pitch) ranges.

• Each frequency band stimulates a specific electrode along the electrode array.

• The electrode stimulated depends on the pitch of the sound. For example, in the word "show," the high pitch sound (sh) causes stimulation of electrodes placed near the entrance of the cochlea, where hearing nerve fibers respond to high pitch sounds. The low pitch sound (ow) stimulates electrodes further into the cochlea, where hearing nerve fibers respond to low pitch sounds.

• ACE varies the rate of stimulation of the electrodes with a total maximum stimulation rate of 14,400 pulses per second.
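To get a feel for the 14,400 pulses-per-second figure, the sketch below works out a per-electrode rate. It assumes the total rate is shared equally among the electrodes stimulated in each cycle; that equal split, the function name, and the example counts are assumptions for illustration, not a description of the actual processor.

```python
TOTAL_RATE_PPS = 14_400  # total maximum stimulation rate quoted on the slide


def per_electrode_rate(n_electrodes_per_cycle):
    """Rate per electrode if the total rate is split equally (assumption)."""
    if n_electrodes_per_cycle < 1:
        raise ValueError("at least one electrode must be stimulated per cycle")
    return TOTAL_RATE_PPS / n_electrodes_per_cycle


# Hypothetical settings: fewer electrodes per cycle leaves a higher rate for each.
for n in (6, 8, 12):
    print(n, "electrodes per cycle:", per_electrode_rate(n), "pulses per second each")
```

The trade-off shown is the general one: within a fixed total rate, stimulating more electrodes per cycle leaves a lower rate for each.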

SPEAK

• Sound enters the speech processor through the microphone and is divided into 20 frequency bands.

• SPEAK selects the six to ten frequency bands containing maximum speech information.

• Each frequency band stimulates a specific electrode along the electrode array.

• The electrode stimulated depends on the pitch of the sound. For example, in the word "show" the high pitch sound (sh) causes stimulation of electrodes placed near the entrance of the cochlea, where the hearing nerve fibers respond to high pitch sounds. The low pitch sound (ow) stimulates electrodes further into the cochlea, where the hearing nerve fibers respond to low pitch sounds.

• SPEAK's dynamic stimulation along 20 electrodes allows you to perceive the detailed pitch information of natural sound.
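The band-selection step can be sketched as picking the strongest bands in each analysis frame. This minimal example treats band energy as a stand-in for "maximum speech information", which is an assumption; the function name and the energy values are hypothetical.

```python
N_BANDS = 20  # number of frequency bands given on the slide


def select_bands(band_energies, n_select):
    """Return the indices of the n_select highest-energy bands, in band order."""
    if len(band_energies) != N_BANDS:
        raise ValueError("expected one energy value per band")
    if not 6 <= n_select <= 10:
        raise ValueError("the slide gives a range of six to ten bands")
    ranked = sorted(range(N_BANDS), key=lambda i: band_energies[i], reverse=True)
    return sorted(ranked[:n_select])


# Hypothetical frame: energy concentrated in two regions of the spectrum.
energies = [0.1] * N_BANDS
for band, value in [(2, 0.9), (3, 0.8), (4, 0.7), (15, 0.6), (16, 0.95), (17, 0.5)]:
    energies[band] = value

selected = select_bands(energies, 6)
print("bands (and electrodes) selected for this frame:", selected)
```

Because the selection is repeated for every frame, the set of stimulated electrodes moves along the array as the spectrum of the sound changes.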

CIS

• Sound enters the speech processor through the microphone.

• The sound is divided into 4, 6, 8 or 12 bands depending upon the number of channels used.

• Each band stimulates one specific electrode along the electrode array, sequentially.

• The same sites along the electrode are stimulated for every sound at a fast rate to deliver the rapid timing cues of speech.
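The sequential, fixed-site pattern described above can be sketched as a simple pulse schedule. The slot duration and the function name are hypothetical; the only things taken from the slide are the channel counts and the idea that each band's electrode is stimulated in turn, one at a time.

```python
ALLOWED_CHANNELS = (4, 6, 8, 12)  # band counts given on the slide


def cis_schedule(n_channels, n_cycles, pulse_slot_us):
    """Return (time_us, channel) pairs for a fixed, sequential pulse order.

    Every channel fires once per cycle, and no two pulses share a time slot.
    """
    if n_channels not in ALLOWED_CHANNELS:
        raise ValueError("the slide lists 4, 6, 8 or 12 bands")
    schedule = []
    for cycle in range(n_cycles):
        for channel in range(n_channels):
            time_us = (cycle * n_channels + channel) * pulse_slot_us
            schedule.append((time_us, channel))
    return schedule


# Hypothetical example: 4 channels, 2 cycles, 100 microsecond slots.
schedule = cis_schedule(4, 2, 100)
for time_us, channel in schedule:
    print(f"t = {time_us:4d} us  ->  channel {channel}")
```

With these assumed numbers each channel is revisited every 400 microseconds, so shortening the slot or using fewer channels raises the per-channel rate.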
