Dr. Werner Hemmert, CPR ST2003-12-02 Page 1
Neuro-IT Roadmap: Successful in the Physical World
• Robust perception
• Image processing
• Speech recognition
• Multimodal human machine interaction
• System integration
• Scene analysis and representation
Dr. Werner Hemmert, CPR ST2003-12-02 Page 2
Automotive: Overtake-Checker and Door-Opener Assistant
[Figure, panels a to f: processing stages]
• Lane-based transformation
• Contour extraction
• Motion estimation along contours
• Temporally stabilized motion segmentation
• Vehicle detection
Figure labels: Image, Lane, Vehicles, Temporal feedback
Dr. Axel Techmer, Infineon Technologies
Dr. Werner Hemmert, CPR ST2003-12-02 Page 3
Security: Face Detection & Recognition
Leading-edge approach to face detection (University of Bochum):
• Detection of face regions (a)
• Pre-selection of frontal faces (b)
• Face recognition (c, d)
Methods: elastic graph matching, Gabor wavelet transform
Ruhr University Bochum
Dr. Werner Hemmert, CPR ST2003-12-02 Page 4
Vision Instruction Processor (VIP)
Infineon Technologies, Corporate Research, Systems Technology
Dr. Werner Hemmert, CPR ST2003-12-02 Page 5
Vision Instruction Processor (VIP)
16 parallel processing elements, SIMD architecture
Prototype available since May 2001:
• 204 instructions
• 10 million logic transistors
• On-chip memory: 37 KB
• Technology: 0.35 µm
• Clock: 100 MHz
• Power consumption: 100 µW/MOPS
• Die size: 22 mm x 23 mm
• Peak performance: 53 GOPS
In 0.13 µm CMOS technology:
• Clock: 200 MHz
• Peak performance: 106 GOPS
• Die size: 70 mm²
• Power consumption: 700 mW
PCI board with VIP and camera submodules
Software tools for VIP: compiler, debugger, profiler
Software tools on host: MS Visual C++ with VPL++ library
Application demonstrators: car vision, face recognition, MPEG2, graphics
Infineon Technologies, Corporate Research, Systems Technology
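The two sets of VIP figures can be cross-checked with a little arithmetic. The sketch below uses only the numbers quoted on the slide; the derived quantities (operations per cycle, watts at peak) are not stated in the source, and the power estimate for the prototype assumes the µW/MOPS rating applies at peak throughput.

```python
PROCESSING_ELEMENTS = 16

# Values quoted on the slide.
prototype = {"clock_hz": 100e6, "peak_ops": 53e9, "uw_per_mops": 100.0}  # 0.35 µm
shrink = {"clock_hz": 200e6, "peak_ops": 106e9, "power_w": 0.7}          # 0.13 µm


def ops_per_cycle(chip):
    """Peak operations completed per clock cycle across the whole chip."""
    return chip["peak_ops"] / chip["clock_hz"]


def peak_power_w(chip):
    """Power at peak throughput implied by a µW/MOPS rating (assumption)."""
    mops = chip["peak_ops"] / 1e6
    return chip["uw_per_mops"] * mops / 1e6


def uw_per_mops(chip):
    """Efficiency implied by total power and peak throughput."""
    mops = chip["peak_ops"] / 1e6
    return chip["power_w"] * 1e6 / mops


print(ops_per_cycle(prototype))                        # ops per cycle, 0.35 µm
print(ops_per_cycle(prototype) / PROCESSING_ELEMENTS)  # ops per cycle per element
print(ops_per_cycle(shrink))                           # ops per cycle, 0.13 µm
print(peak_power_w(prototype))                         # watts at peak, 0.35 µm
print(round(uw_per_mops(shrink), 2))                   # µW/MOPS, 0.13 µm
```

Both process nodes work out to the same number of operations per cycle, so the doubling of peak performance follows directly from the doubling of the clock.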
Dr. Werner Hemmert, CPR ST2003-12-02 Page 6
Car Vision Components - Hardware
[Block diagram: other sensors, CPU, vehicle control]
Dr. Axel Techmer, Infineon Technologies
Dr. Werner Hemmert, CPR ST2003-12-02 Page 7
Neuro-IT Roadmap: Successful in the Physical World
Robust perception
Image processing
• Speech recognition
• Multimodal human machine interaction
• System integration
• Scene analysis and representation
Dr. Werner Hemmert, CPR ST2003-12-02 Page 8
Classical Sound Processing for Speech Recognition
Processing chain:
• Microphone
• Filter
• A/D: 8 kHz
• |FFT|: 25 ms window, every 10 ms (100 frequencies; axis labels 40 Hz, 80 Hz, 160 Hz, ..., 2 kHz, 4 kHz)
• Mel transformation (24 channels)
• LOG & threshold
• Smoothed cepstrum & loudness (12 components)
• Normalized components
• d/dt: first derivatives; d/dt: second derivatives
• Features (36 features)
• Hidden Markov Model
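The chain on this slide can be sketched in plain Python. The sampling rate, window, frame advance, and the 100 / 24 / 12 / 36 dimensions come from the slide; everything else is an assumption made for illustration: the Hamming window, the mel formula, the triangular filter shape, the log floor, treating DCT coefficient 0 as the loudness term, and mean subtraction as the normalization step. It is not the recognizer front end used by the author.

```python
import cmath
import math

SAMPLE_RATE = 8000   # Hz, A/D rate on the slide
WINDOW_MS = 25       # analysis window on the slide
HOP_MS = 10          # frame advance on the slide
N_MEL = 24           # mel channels on the slide
N_CEPS = 12          # cepstral components on the slide
LOG_FLOOR = 1e-6     # assumed threshold applied before the logarithm


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def magnitude_spectrum(frame):
    """|DFT| at bins 1..N/2; a 200-sample frame gives 100 bins, 40 Hz apart."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame)))
            for k in range(1, n // 2 + 1)]


def mel_filterbank(bin_freqs, n_filters=N_MEL):
    """Triangular filters with centres equally spaced on the mel scale."""
    lo, hi = hz_to_mel(bin_freqs[0]), hz_to_mel(bin_freqs[-1])
    edges = [mel_to_hz(lo + (hi - lo) * i / (n_filters + 1))
             for i in range(n_filters + 2)]
    return [[max(0.0, min((f - left) / (centre - left),
                          (right - f) / (right - centre)))
             for f in bin_freqs]
            for left, centre, right in zip(edges, edges[1:], edges[2:])]


def cepstrum(log_energies, n_ceps=N_CEPS):
    """DCT-II of the log mel energies; coefficient 0 follows overall loudness."""
    n = len(log_energies)
    return [sum(e * math.cos(math.pi * q * (j + 0.5) / n)
                for j, e in enumerate(log_energies))
            for q in range(n_ceps)]


def deltas(frames):
    """Central difference along time, clamped at the first and last frame."""
    last = len(frames) - 1
    return [[(nxt - prev) / 2.0
             for prev, nxt in zip(frames[max(t - 1, 0)], frames[min(t + 1, last)])]
            for t in range(len(frames))]


def features(signal):
    """Return one 36-dimensional feature vector per 10 ms frame."""
    win = SAMPLE_RATE * WINDOW_MS // 1000   # 200 samples
    hop = SAMPLE_RATE * HOP_MS // 1000      # 80 samples
    bin_freqs = [k * SAMPLE_RATE / win for k in range(1, win // 2 + 1)]
    bank = mel_filterbank(bin_freqs)
    hamming = [0.54 - 0.46 * math.cos(2 * math.pi * t / (win - 1))
               for t in range(win)]         # assumed window shape
    ceps = []
    for start in range(0, len(signal) - win + 1, hop):
        frame = [s * w for s, w in zip(signal[start:start + win], hamming)]
        spectrum = magnitude_spectrum(frame)
        mel = [sum(w * m for w, m in zip(filt, spectrum)) for filt in bank]
        ceps.append(cepstrum([math.log(max(e, LOG_FLOOR)) for e in mel]))
    # Assumed normalization: subtract each component's mean over the utterance.
    means = [sum(column) / len(ceps) for column in zip(*ceps)]
    ceps = [[c - m for c, m in zip(frame, means)] for frame in ceps]
    first = deltas(ceps)
    second = deltas(first)
    return [c + d1 + d2 for c, d1, d2 in zip(ceps, first, second)]


# Synthetic test tone (not data from the talk): 1000 samples at 440 Hz.
tone = [math.sin(2 * math.pi * 440 * t / SAMPLE_RATE) for t in range(1000)]
feats = features(tone)
print(len(feats), len(feats[0]))
```

Each frame passes only the magnitude of the spectrum onward, so the phase, and with it the timing of events inside the 25 ms window, is discarded before the features are formed.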
Dr. Werner Hemmert, CPR ST2003-12-02 Page 9
Speech production: time waveform
Dr. Werner Hemmert, CPR ST2003-12-02 Page 10
|FFT| resolves neither frequency nor temporal structure
20 ms window |FFT|:
• frequency resolution: 50 Hz
• temporal resolution: 20 ms
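The two numbers are tied together by the reciprocal relation between window length and bin spacing (bin spacing = 1 / window duration). A minimal sketch, with a hypothetical helper name, makes the trade-off explicit: shortening the window to gain temporal resolution coarsens the frequency resolution by the same factor.

```python
def frequency_resolution_hz(window_ms):
    """Spacing of DFT bins for a window of the given duration."""
    return 1000.0 / window_ms


for window_ms in (5, 10, 20, 25):
    print(f"{window_ms:>2} ms window -> "
          f"{frequency_resolution_hz(window_ms):.0f} Hz bins")
```

The 20 ms case gives the 50 Hz quoted on this slide, and the 25 ms window of the classical front end gives 40 Hz.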
Dr. Werner Hemmert, CPR ST2003-12-02 Page 11
Classical Sound Processing for Speech Recognition
[Diagram: the classical processing chain from Page 8, repeated]
The time structure of the speech signal (<20 ms) is lost in the magnitude spectrum (|FFT|)
Humans extract both temporal and spectral information for robust speech recognition
Dr. Werner Hemmert, CPR ST2003-12-02 Page 12
Auditory Sound Processing
sound signal → ear canal → middle ear
Dr. Werner Hemmert, CPR ST2003-12-02 Page 13
Auditory Sound Processing
sound signal → ear canal → middle ear → inner ear hydrodynamics
[Scale bar: 100 µm]
Dr. Werner Hemmert, CPR ST2003-12-02 Page 14
Dynamic Compression in the Inner Ear
Inner ear model responses to 1 kHz tones
[Figure: BM (basilar membrane) displacement (m), 10^-10 to 10^-6, versus cochlear location (mm), 0 to 35, basal to apical; one curve per level from 0 to 120 dB SPL in 20 dB steps; annotations: rate threshold, speech range, BW]
Dr. Werner Hemmert, CPR ST2003-12-02 Page 15
Auditory Sound Processing
sound signal → ear canal → middle ear → inner ear hydrodynamics → sensory cell → synaptic mechanisms
Dr. Werner Hemmert, CPR ST2003-12-02 Page 16
Coding of Sound into Action Potentials
[Figure: action potentials over time (ms), 0 to 100, versus cochlear location (mm), 5 to 30; frequency axis labeled low to high; F0, F1, F2 and F3 marked]
Regular firing pattern (t = 10 ms, f0 = 100 Hz)
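The relation between the regular firing pattern and the fundamental frequency is simply f0 = 1 / t. A minimal sketch, assuming a list of spike times in milliseconds (the function name and the synthetic spike train are illustrative, not data from the talk), recovers f0 from the median interspike interval:

```python
from statistics import median


def fundamental_from_spikes(spike_times_ms):
    """Estimate f0 in Hz from the median interval between successive spikes."""
    intervals = [b - a for a, b in zip(spike_times_ms, spike_times_ms[1:])]
    return 1000.0 / median(intervals)


# Synthetic spike train: one spike every 10 ms, starting at 3 ms.
spikes = [3.0 + 10.0 * i for i in range(10)]
print(fundamental_from_spikes(spikes))
```

A 10 ms period corresponds to 100 Hz, which is the pitch information a 20 ms magnitude spectrum averages away.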
Dr. Werner Hemmert, CPR ST2003-12-02 Page 17
Spectral- and Temporal Sound Processing in the Auditory Pathway
Dr. Werner Hemmert, CPR ST2003-12-02 Page 18
Neuro-IT Roadmap: Successful in the Physical World
Robust perception
Image processing
Speech recognition
• Multimodal human machine interaction
• System integration
• Scene analysis and representation
Dr. Werner Hemmert, CPR ST2003-12-02 Page 19
Audio-Visual Speech Recognition
Dr. Werner Hemmert, CPR ST2003-12-02 Page 20
Audio-Visual Speech Recognition
Tracking of lip motion with sub-pixel precision
Dr. Werner Hemmert, CPR ST2003-12-02 Page 21
Audio-Visual Speech Recognition
Tracking of lip motion with sub-pixel precision
“two - one - seven - three - five - nine - eight - zero - four - six”
Hidden-Markov Speech Recognizer
[Figure: variation of mouth width, mouth height, and nose-to-chin distance over time (s), 0 to 12; scale bar: 10 pixels]
Dr. Werner Hemmert, CPR ST2003-12-02 Page 22
Multi-modal: pointing, gaze, gestures, facial expressions, …
Dr. Axel Steinhage, Infineon Technologies AG
Dr. Werner Hemmert, CPR ST2003-12-02 Page 23
Neuro-IT Roadmap: Successful in the Physical World
Robust perception
Image processing
Speech recognition
Audio-visual speech recognition
Multimodal human machine interaction
• System integration
• Scene analysis and representation
Dr. Werner Hemmert, CPR ST2003-12-02 Page 24
Man-Machine-Interaction based on natural communication channels
Virtual Personal Assistant (VPA)
Natural channels: speech, lip motion, gestures, ...
Cheap sensors (webcam, microphone)
Items presented by VPA
Interactive communication between user and VPA
Dr. Axel Steinhage, Infineon Technologies
Dr. Werner Hemmert, CPR ST2003-12-02 Page 25
Man-Machine-Interaction based on
natural communication channels
Virtual Personal Assistant (VPA)
Human expert via Advanced Videophone (HHI)
Natural channels: speech, lip motion, gestures, ...
Cheap sensors (webcam, microphone)
Items presented by VPA
Interactive communication between user and VPA
Advanced Videophone
Dr. Axel Steinhage, Infineon Technologies
Dr. Werner Hemmert, CPR ST2003-12-02 Page 26
What do we gain from Neuro-IT?
• Sensitive Sensors
World knowledge “Constructed brain”
Robust processing “Tools for Neuroscience” “Successful in the Physical World”
“Conscious Machines”
• Robust perception
• Image processing
• Speech recognition
• Scene analysis and representation
• Intelligent human-machine interaction
• Natural feedback
• Intelligent virtual person
• Self-learning software
• Massively parallel processing hardware: digital and/or analog neuronal networks
“Factor 10”
Dr. Werner Hemmert, CPR ST2003-12-02 Page 27
Neuro-IT Roadmap: Successful in the Physical World
Prof. Dr. Dr. h.c. H.-P. Zenner
Prof. Dr. A.W. Gummer
Werner Hemmert, Infineon Technologies AG, CPR-ST
Prof. Dr. D.M. Freeman
Dr. M. Mermelstein, B. Tsai
U. Dürig, M. Despont, G. Genolet, U. Drechsler, P. Vettiger, G. Binnig
MIT Micromechanics Group
Prof. Dr. U. Ramacher
J.-P. de la Cruz-Guiterrez, M. Holmberg
Dr. A. Steinhage, Dr. A. Techmer
Explore the Future - Corporate Research