Some from Heim Chap 13 Sound you tube Learning outcomes Describe the basics of human hearing Explain...


Some from Heim Chap 13

Sound

YouTube

Learning outcomes
• Describe the basics of human hearing
• Explain the difference between visual and auditory interaction
• Describe the classes and subclasses of sound output and the attributes of each
• Describe the classes and subclasses of sound input and recognition and the attributes of each

the human 1


Hearing
• Provides information about the environment: distances, directions, objects etc.
• Physical apparatus:
  • outer ear – protects inner ear and amplifies sound
  • middle ear – transmits sound waves as vibrations to inner ear
  • inner ear – chemical transmitters are released and cause impulses in auditory nerve
• Sound
  • pitch – sound frequency
  • loudness – amplitude
  • timbre – type or quality

Sound is vibration

http://www.hsc.csu.edu.au/ipt/mm_systems/3288/digitising_sound_answers.htm

Timbre is harmonic structure
• A sine wave is all energy on the ‘first harmonic’ or ‘fundamental’ frequency (sounds like O)
• Other shapes of sound wave come from a distribution of energy into other multiples of the fundamental

http://hyperphysics.phy-astr.gsu.edu/hbase/audio/geowv.html
http://www.sfu.ca/sonic-studio/handbook/Triangle_Wave.html
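The idea that timbre is the spread of energy across harmonics can be sketched in a few lines of Python. This is only an illustration: the function names and the 440 Hz pitch are assumptions made for the example, and the square wave uses the standard Fourier series (odd harmonics with amplitude 4/(πn)).

```python
import math

def harmonic_wave(t, fundamental_hz, amplitudes):
    """Sample at time t (seconds) of a sum of sine harmonics.

    amplitudes[k] is the amplitude of harmonic k + 1, so amplitudes[0]
    belongs to the fundamental.
    """
    return sum(
        a * math.sin(2 * math.pi * (k + 1) * fundamental_hz * t)
        for k, a in enumerate(amplitudes)
    )

def sine_amplitudes():
    """All energy on the fundamental: a pure sine wave."""
    return [1.0]

def square_amplitudes(n_harmonics):
    """Square wave series: odd harmonics only, amplitude 4 / (pi * n)."""
    return [4 / (math.pi * n) if n % 2 == 1 else 0.0
            for n in range(1, n_harmonics + 1)]

# Same pitch (440 Hz, an assumed example value), different timbre.
quarter_period = 1 / (4 * 440)
pure = harmonic_wave(quarter_period, 440, sine_amplitudes())
square = harmonic_wave(quarter_period, 440, square_amplitudes(199))
```

Both waves have the same fundamental, so they are heard at the same pitch; only the list of harmonic amplitudes differs, and that list is what changes the timbre.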


Hearing (cont)
• Humans can hear frequencies from 20 Hz to 15 kHz
  • less accurate at distinguishing high frequencies than low
  • higher frequencies disappear as you get older
• Auditory system filters sounds
  • can attend to sounds over background noise
  • for example, the cocktail party phenomenon
  • hearing aids disrupt this filtering
• Hearing is involuntary
  • A sudden sound ‘grabs’ attention before we think
  • And some sounds are harder to ignore (e.g. baby crying)
• ‘Listening’ is voluntary (largely)
  • Whether we choose to process the meaning, especially if the sound is language (although something like hearing your name is pretty well involuntary)


What if…
• You are in a noisy environment
  • Night clubbing
  • Phone call/text message?
• Your hearing is below average
  • You are deaf

Sound versus Visual

Sound exists in time and over space, vision exists in space and over time. (Gaver, 1989)

- Sound is only there when it is playing/made
- Vision is there until it is replaced

Sound Interaction
• Computer Output/Generation (input to human)
  • Non speech
    • Music
    • Auditory Icons and Earcons
  • Speech
• Computer Input/Recognition
  • Speech
  • Non speech
    • Environmental
    • Music

Computer Output: Music
• Can be pre-recorded or generated
  • Movies
  • Games
• Immersive experiences
  • Activates your brain in a different way from language
  • Acts almost entirely independently from hand-to-eye processing

Generating music
• Exciting area for artists
• Everything from pseudo real to completely abstract
• There are Jazz music generators that only skilled people can differentiate from actual musicians.
• Serato – DJ software (www.serato.com)
  • Auckland company doing fantastic things
  • Several UOA grads there

Auditory Icons and Earcons
• The difference between these two is subtle
• Auditory icons: emphasis on ‘natural’ sounds and metaphor with real world
  • e.g. sound of filling a bottle with water to match moving a large file
• Earcons: ‘Artificial’ sounds (generated)
  • e.g. more abstract metaphorical relationship to action or purely a convention (like corporate colour schemes)
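Because earcons are generated rather than recorded, one can be produced directly in code. The sketch below assumes an earcon is just a short sequence of pure tones; the note frequencies, durations and function names are illustrative choices, not taken from the slides. It uses only Python's standard `wave` module to produce WAV data.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 44100  # samples per second (assumed CD-quality rate)

def tone(freq_hz, duration_s, volume=0.5):
    """Return samples in [-volume, volume] for a pure sine tone."""
    n = round(SAMPLE_RATE * duration_s)
    return [volume * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
            for i in range(n)]

def earcon(notes_hz, note_duration_s=0.12):
    """Concatenate short tones into one abstract, generated sound."""
    samples = []
    for freq in notes_hz:
        samples.extend(tone(freq, note_duration_s))
    return samples

def to_wav_bytes(samples):
    """Encode float samples as 16-bit mono PCM WAV data held in memory."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes = 16 bits per sample
        w.setframerate(SAMPLE_RATE)
        w.writeframes(b"".join(
            struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767))
            for s in samples
        ))
    return buf.getvalue()

# Hypothetical convention: rising pitch for "on", falling pitch for "off".
on_earcon = to_wav_bytes(earcon([440, 660]))
off_earcon = to_wav_bytes(earcon([660, 440]))
```

The rising/falling mapping is purely a convention that the user has to learn, which is the point of contrast with an auditory icon.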

[Sound clips: Windows hardware fail, insert, remove]

Auditory Icons and Earcons
• Redundant Encoding
  • It aids memory by adding additional associations.
  • Can alert without interrupting (well, at least leaves the visual field clear)
  • An alternative communications channel.
• Positive/Negative Feedback
  • Auditory alarms might be crucial to the safe operation of computer-operated machinery or mission-critical environments
  • Too many alarms
    • Annoying
    • Ignored

Using Sound in Interaction Design
• Learnability of the mapping between the icon and the object represented
  • “Oink” and “bow wow” have high articulatory directness (low distance between ‘appearance’ and function [or denotation])
  • A swishing sound accompanying a paintbrush tool also has high articulatory directness
  • A system beep, on the other hand, carries no information about what it denotes (but we may quickly learn to associate it with an error; and its square wave structure is somewhat unpleasant, so it’s better for an error than for feedback on success)

Can you remember earcons?
• How many?
• How often do you hear them?
• Can you intuitively tell what these mean?

[Sound clips: On, Sleep, Off, Misrecognized, Disambiguate]

Speech Output
• Eyes free operation
• Alternative output channel
• Good for checking your essays
• Navigation is hard
  • Back tracking
  • Finding location of a particular thing

Speech Output
• Recorded
  • Menu choices for telephone systems
  • Books or other multimedia experiences
• Generated (‘text-to-speech’, TTS)
  • Synthesizer built into Office
    • See http://office.microsoft.com/en-nz/powerpoint-help/using-the-speak-text-to-speech-feature-HA102066711.aspx
  • Google Translate has a nice one too (better, I think)
  • Can give pronunciation rules (the Google one sounds British to me, see also http://www.bell-labs.com/project/tts/sable.html)
  • Still sound a little artificial
    • Best synthesizers have a physical model of the tongue and breath to give natural flow between phonemes

Sound Input
• Speech
• Environmental
• Music

Speech Recognition
• Two distinct applications:
  • Transaction
  • Transcription
• Transaction
  • Telephone menu systems
    • Choose from a limited number of options, works ok
• Automatic speech recognition (ASR)
  • Built into operating systems
  • Siri (iPhone) and Android are more or less usable
    • This is a triumph of Artificial Intelligence
• Very difficult, ongoing research problem
  • Not just about recognizing phonemes but also finding the ‘right’ interpretation (helped e.g. by statistical word triple frequencies, but better if AI is ‘deeper’)
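How word triple (trigram) frequencies help pick the ‘right’ interpretation can be shown with a toy example. Everything here is an assumption made for illustration: the tiny corpus, the candidate transcriptions and the scoring rule (a plain sum of trigram counts). A real recognizer would use far more data and proper probabilities.

```python
from collections import Counter

def trigrams(words):
    """All consecutive word triples in a list of words."""
    return list(zip(words, words[1:], words[2:]))

def train(corpus_sentences):
    """Count how often each word triple occurs in the corpus."""
    counts = Counter()
    for sentence in corpus_sentences:
        counts.update(trigrams(sentence.lower().split()))
    return counts

def score(candidate, counts):
    """Toy score: total corpus count of the candidate's trigrams."""
    return sum(counts[t] for t in trigrams(candidate.lower().split()))

def best_interpretation(candidates, counts):
    """Pick the candidate whose word triples are most familiar."""
    return max(candidates, key=lambda c: score(c, counts))

# Hypothetical training text and two acoustically similar candidates.
corpus = [
    "it is hard to recognize speech",
    "computers recognize speech well",
    "we can recognize speech now",
]
counts = train(corpus)
candidates = [
    "it is hard to wreck a nice beach",
    "it is hard to recognize speech",
]
chosen = best_interpretation(candidates, counts)
```

Both candidates sound nearly alike, so phoneme recognition alone cannot separate them; the word statistics prefer the second because its triples have been seen before.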


Searching Speech and Audio
• Sound files do not afford easy opportunities for indexing and searching
• Speech recognition can be used to transcribe speech files and create transcripts that can be searched like any other text file
  • So long as recognition accuracy is ok, which it isn't at the moment
• Tune identification apps
  • Hum a bit of the tune and it tells you what it is! (e.g. SoundHound)
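Once speech has been transcribed, searching it is ordinary text search. The sketch below assumes the transcripts already exist as strings keyed by sound file name; the file names and text are made up for the example, and the recognition step itself is not shown.

```python
from collections import defaultdict

PUNCTUATION = ".,!?"

def build_index(transcripts):
    """Map each word to the set of sound files whose transcript contains it."""
    index = defaultdict(set)
    for filename, text in transcripts.items():
        for word in text.lower().split():
            index[word.strip(PUNCTUATION)].add(filename)
    return index

def search(index, query):
    """Return the files whose transcripts contain every word of the query."""
    words = [w.strip(PUNCTUATION) for w in query.lower().split()]
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

# Hypothetical transcripts produced by a speech recognizer.
transcripts = {
    "lecture1.wav": "Sound is vibration.",
    "lecture2.wav": "Timbre is harmonic structure",
}
index = build_index(transcripts)
hits = search(index, "harmonic timbre")
```

Any misrecognized word ends up in the index under the wrong spelling, which is why search quality depends directly on recognition accuracy.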


Summary
• Describe the basics of human hearing
• Explain the difference between visual and auditory interaction
  • Sound is transitory
• Describe the classes and subclasses of sound output and the attributes of each
  • Non speech
    • Music
    • Earcons
  • Speech
• Describe the classes and subclasses of sound input and recognition and the attributes of each
  • Speech
    • Transaction
    • Transcription