MPI FOR BIOLOGICAL CYBERNETICS

DAGM, August 30th

EU-funded "Cognitive Vision" projects at the Max Planck Institute for Biological Cybernetics, AG Bülthoff

Christian Wallraven, Heinrich H. Bülthoff

Martin Breidt, Douglas W. Cunningham, Cristobal Curio, Arnulf B.A. Graf, Markus Graf, Adrian Schwaninger

CogVis (Cognitive Vision)
WP1 (Recognition & Categorisation)
WP3 (Learning & Adaptation)

http://cogvis.nada.kth.se

Computational Vision and Active Perception, Stockholm – Robotics
Computer Science Department, Hamburg - Cognitive Systems Laboratory – Spatio-temporal reasoning
MPI for Biological Cybernetics, Tübingen – Human psychophysics
School of Computing Leeds University – Spatio-temporal learning and reasoning
DIST, Genova – Robotics
ETH Zurich, Dept of Computer Science – Computer Vision
University of Ljubljana Computer and Information Science – Computer Vision

The objective of this project is to provide the methods and techniques that enable construction of vision systems that can perform task oriented categorization and recognition of objects and events in the context of an embodied agent.

Workpackage 1: Recognition and Categorisation

Objectives:
• Based on cognitive research on how humans recognise and categorise objects and scenes, we will build a computational system that is capable of recognising and categorising objects and events in a natural environment (such as a living-room).

Description of work:
• Building a database of 3D objects and elementary gestures
• Cognitive basis for recognition and categorisation
• Dynamic multi-cue recognition
• Recognition of spatio-temporal structures and relations

Deliverables:
• DR.1.1 A database of solid 3D objects and gestures
• DR.1.2 Psychophysical results from experiments on recognition & categorisation
• DR.1.3 A recognition algorithm exploiting temporal continuity
• DR.1.4 A basis set of primitives and qualitative low-level structural relations suitable for object recognition
• DR.1.5 A computational recognition system grounded in cognitive research
• DR.1.6 Algorithms for robust subspace recognition
• DR.1.7 Algorithm for categorisation using subspace approach

Highlights of WP1

Cognitive Basis of Recognition & Categorization

Computer Vision systems

Modeling of cognitive studies

MPIK

CSL, ETH, KTH, MPIK, UOL

ETH, KTH, MPIK, UOL

Cognitive basis of recognition & categorization

Psychophysical experiments
• How are visual categories formed?
• What are the representations used by humans for recognition and categorization?
• How are categorization and recognition connected?
• What are the temporal aspects of recognition and categorization?
• Is there a top-down influence of scene context on categorization?

Cognitive basis for recognition and categorization

Computer Vision

Structured object representations
• For categorization using local features
• For recognition with spatio-temporal information

Multi-cue recognition on a robot

Subspace learning

Computer vision for modeling psychophysics

• CogVis Morphed Objects Database

• Psychophysical experiments:
  – Picture-word matching experiment → reaction times
  – Typicality task → typicality ratings

• Computer vision experiment:
  – Subspace-based categorisation
  – Typicality ratings as temporal weights
  → uncertainty of categorisation, reconstruction errors
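As a rough illustration of the computer vision experiment listed above, the sketch below fits one linear subspace per category and assigns a query to the category whose subspace reconstructs it with the smallest error. It is a minimal sketch under stated assumptions, not the CogVis implementation: the subspace is one-dimensional and found by power iteration, the typicality ratings are used as plain per-sample weights, and all function names and the toy data are hypothetical.

```python
import math


def fit_subspace(samples, weights, iterations=200):
    """Fit a weighted mean and a 1-D principal direction (assumed model).

    samples: list of equal-length feature vectors
    weights: one non-negative weight per sample, e.g. a typicality rating
    """
    dim = len(samples[0])
    total = sum(weights)
    mean = [sum(w * s[i] for s, w in zip(samples, weights)) / total
            for i in range(dim)]
    centred = [[s[i] - mean[i] for i in range(dim)] for s in samples]

    # Power iteration on the weighted covariance C = sum_k w_k x_k x_k^T.
    direction = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iterations):
        product = [0.0] * dim
        for x, w in zip(centred, weights):
            proj = sum(a * b for a, b in zip(x, direction))
            for i in range(dim):
                product[i] += w * proj * x[i]
        norm = math.sqrt(sum(p * p for p in product))
        if norm == 0.0:  # degenerate data: keep the current direction
            break
        direction = [p / norm for p in product]
    return mean, direction


def reconstruction_error(x, model):
    """Squared distance between x and its projection onto the subspace."""
    mean, direction = model
    centred = [a - m for a, m in zip(x, mean)]
    proj = sum(a * b for a, b in zip(centred, direction))
    residual = [a - proj * d for a, d in zip(centred, direction)]
    return sum(r * r for r in residual)


def categorise(x, models):
    """Return (best category, error per category) for a query vector."""
    errors = {name: reconstruction_error(x, m) for name, m in models.items()}
    return min(errors, key=errors.get), errors


# Toy 2-D data; feature values and typicality ratings are made up.
models = {
    "category_a": fit_subspace([[0, 0], [1, 0], [2, 0], [3, 0]], [8, 7, 6, 5]),
    "category_b": fit_subspace([[5, 0], [5, 1], [5, 2], [5, 3]], [5, 6, 7, 8]),
}
label, errors = categorise([1.5, 0.1], models)
```

The slides do not define how the uncertainty of categorisation was computed, so the sketch only returns the per-category reconstruction errors and leaves that measure out.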

Categorisation experiment - results

[Figure: five plots against morph transformation (1% to 100%). Psychophysical experiment: typicality rating (TR) and reaction time (RT). Computer vision experiment: weights, uncertainty, and reconstruction errors.]

Workpackage 3: Learning and Adaptation

Objectives:
• How is knowledge about objects and events acquired and maintained? A computational system able to acquire and maintain representations useful for recognition and categorisation as well as control of attention will be developed.

Description of work:
• Learning perception-action maps
• Learning event regularities
• Learning of efficient methods for categorisation of natural objects
• Statistical modelling of objects and events

Deliverables:
• DR.3.1 Set-up for experimenting action learning
• DR.3.2 Initial implementation of sensorimotor representation for learning and shift of attention learning
• DR.3.3 Quantitative analysis of the tradeoff between precision and number of classifiers for two different tasks
• DR.3.4 Software package for learning and applying models of interactive behaviour
• DR.3.5 A system capable of robustly categorising the objects from the database in a real-world environment
• DR.3.6 Framework for the integration of statistical and logic-based models of objects and events
• DR.3.7 Algorithms for robust learning of subspace representations
• DR.3.8 Framework for robust continuous learning

Highlights I

A system that learns simple games by observation
• Uses vision components to identify simple visual events (laying down a card)
• Using a reasoning engine (Progol), tries to find a rule-based representation explaining the observed state-space

Modeling categorization in humans
• Combination of machine learning and psychophysics
• Which classifier explains human behaviour best?
• Support Vector Machines seem the best candidate

Man or Woman?
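The slides do not say how "explains human behaviour best" was measured. One simple criterion, assumed here purely for illustration, is the fraction of stimuli on which a classifier gives the same answer as the human observers. The sketch below ranks candidate classifiers by that agreement; the function names, classifier names, and responses are hypothetical placeholders, not data from the project.

```python
def agreement(human_labels, model_labels):
    """Fraction of stimuli on which a classifier gives the human's answer."""
    if len(human_labels) != len(model_labels) or not human_labels:
        raise ValueError("label lists must be non-empty and of equal length")
    matches = sum(h == m for h, m in zip(human_labels, model_labels))
    return matches / len(human_labels)


def rank_classifiers(human_labels, predictions_by_model):
    """Sort candidate classifiers by agreement with humans, best first."""
    scores = {name: agreement(human_labels, preds)
              for name, preds in predictions_by_model.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Placeholder responses for six face stimuli ("m" = man, "w" = woman).
human = ["m", "w", "w", "m", "w", "m"]
candidates = {
    "classifier_a": ["m", "w", "m", "m", "w", "m"],  # differs on 1 stimulus
    "classifier_b": ["w", "w", "m", "m", "m", "m"],  # differs on 3 stimuli
}
ranking = rank_classifiers(human, candidates)
```

Richer comparisons, for example ones that relate classifier confidence to human reaction times, would need measurements that this transcript does not include.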

Highlights II

Multi-modal object representations
• Access to a robotic setup with arm and cameras allows us to explore questions of multiple modalities
• Idea: store matrix of transitions between all possible views, indexed by changes in the proprioceptive state
• Exhaustive action/perception map (predicting views given an action, and vice versa)
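The action/perception map described above can be pictured as a lookup table that works in both directions. The following is a minimal sketch, assuming that views are identified by string labels and that a change in proprioceptive state is a tuple of joint-angle deltas; the class and method names are hypothetical and are not taken from the project software.

```python
from collections import defaultdict


class ViewTransitionMap:
    """Stores observed (view, action) -> next view transitions."""

    def __init__(self):
        self._forward = {}                 # (view, action) -> next view
        self._inverse = defaultdict(list)  # (view, next view) -> actions

    def observe(self, view, action, next_view):
        """Record that performing `action` at `view` led to `next_view`."""
        self._forward[(view, action)] = next_view
        if action not in self._inverse[(view, next_view)]:
            self._inverse[(view, next_view)].append(action)

    def predict_view(self, view, action):
        """Predict the view after an action, or None if never observed."""
        return self._forward.get((view, action))

    def infer_actions(self, view, next_view):
        """Return all known actions that lead from one view to another."""
        return list(self._inverse.get((view, next_view), []))


# Usage with made-up views and joint-angle changes (in degrees).
view_map = ViewTransitionMap()
view_map.observe("front", (30, 0), "side")
view_map.observe("side", (30, 0), "back")
predicted = view_map.predict_view("front", (30, 0))
actions = view_map.infer_actions("side", "back")
```

A dictionary keeps the sketch short; for an exhaustive map over all view pairs, a dense matrix indexed by view would be the more literal reading of the slide.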

IST project COMIC
Conversational multi-modal interaction with computers

http://www.hcrc.ed.ac.uk/comic/

Max Planck Institute for Psycholinguistics, Nijmegen – Fundamental Cognitive Research
Max Planck Institute for Biological Cybernetics, Tübingen – Fundamental Cognitive Research
University of Nijmegen – ASR and AGR
University of Sheffield – Dialogue and Action
University of Edinburgh – Fission and Output
DFKI, Saarbrücken – Fusion and System Integration
ViSoft – Graphical part of Demonstrator

Multimodal interaction will only be accepted by non-expert users if fundamental cognitive interaction capabilities of human beings are properly taken into account.

Vision and approach of COMIC

Obtain fundamental knowledge on multimodal interaction
• use of speech, pen, and facial expressions

Develop new approaches for component technologies that are guided by human factor experiments

Obtain hands-on experience by building an integrated multimodal demonstrator for bathroom design that combines new approaches for:
• Automatic speech recognition
• Automatic pen gesture recognition
• Dialogue and Action management
• Output generation combining text and speech and facial expression
• System integration
• Cognitive knowledge

Fundamental Research on Facial Expressions

Faces do a lot in a conversation
• Lip motion for speaking
• Emotional Expression (pleasure, surprise, fear)
• Dialog flow (back-channeling: confusion, comprehension, agreement)
• Co-expression (emphasis and word/topic stress)

We aim to broaden the capabilities of Avatars, allowing for more sophisticated self expression and more subtle dialog control.

To this end, we use psychophysical knowledge and procedures as a basis for synthesizing human conversational expressions.

Real, manipulated and virtual expressions

Real expressions:
• We recorded a variety of conversational expressions from several individuals.
• Psychophysical experiments on identification and believability

Manipulated expressions:
• Using computer vision techniques, we manipulated these expressions to freeze selected parts of the face.
• Psychophysical experiment on relative importance of each of these parts for recognition.

Virtual expressions:
• We designed and constructed a conversational avatar, capable of producing realistic-looking facial expressions
• Suitable for human-computer interaction
• Perfect tool for fully-controllable cognitive research on perception of facial expressions

The four faces of thought

The conversational avatar

IST project JAST
Joint-Action Science & Technology

Nijmegen Institute for Cognition and Information – Human behaviour
F.C. Donders Centre for Cognitive Neuroimaging – Imaging of human behaviour
MPI for Psycholinguistics – Human dialogue behaviour
MPI for Biological Cybernetics – Human behaviour
Dept. of Computer Science, TU München – Robotics
Institute of Communication and Computer Systems – Modeling
University of Edinburgh, Human Communication Research Centre – Modeling
Dept. of Industrial Electronics, Universidade do Minho – Robotics
Dept. of Mathematics for Science and Technology, Universidade do Minho – Modeling and Robotics

Objectives

build jointly-acting autonomous systems that communicate and work intelligently on mutual tasks

ensure that the functionality of future technologies includes inherent concepts of cooperative behaviour

Milestones

The construction of two fully functional autonomous agents that in cooperative configurations of two, three or more will allow, in principle, the completion of complex real-world assembly and construction tasks.

The development of perceptual modules for object recognition and recognition of gestures and actions of the partner (human or robot) and the implementation of biologically inspired sensory-motor control schemes for the co-ordinated action of multiple cognitive systems.

The development of cognitive control architectures for artificial agents based on neurocognitive experimental findings and the implementation of verbal and non-verbal communication structures on the basis of findings from psycholinguistic studies focusing on the role of dialogue in joint action.

The implementation of goal-directed learning processes and sophisticated error monitoring, recognition, and repair strategies to produce a real-world assembly robot scenario that will be capable of partially self-organizing towards stable solutions, taking into account not only its own behaviour (e.g., self-generated errors) but also the behaviour of others (e.g., errors generated by a human or robot partner).

Other EU-funded projects at AG Bülthoff

Touch HapSys – haptic systems: next generation haptic interfaces, visuo-haptic integration

POEMS – perceptually-oriented ego-motion simulation: how to use audio and visual cues to generate ego-motion perception (VR)

PRA (Network) – Perception for Recognition and Action

ECVision (Network) – European Computer Vision network

Enactive (Network) – multi-modal HCI interfaces