cdiLect2 - inf.ed.ac.uk: Models of emotion and drives for a sociable robot (lecture slide transcript)



Case Studies in Design Informatics 1
Jon Oberlander

Lecture 2: Designing a Sociable Robot – Kismet

Slides quote or paraphrase Breazeal 2003

http://www.inf.ed.ac.uk/teaching/courses/cdi1/


Course Timetable

Week  Topic  Mon                        Wed         Thu                        Submit 16:00 Thu
1     HRI    Intro (JO)                             Designing a robot (JO)
2     HRI    State-of-the-art (JO)      Tutorial    Towards JAMES (JO)
3     HRI    JAMES (RP)                 Tutorial    JAMES (RP)
4     HDI    Quantified selves (JO)     Tutorial    <COLLIDER>                 A1
5     HDI    Quantified problems (JO)   Tutorial    LPE (JM)
6     HDI    LPE (AV)                   Tutorial    Strategic behaviour (SR)   A2-draft
7     HDI    Actigraphy (MW)            Tutorial*   Life logging (TBC)
8     HDI    Light logging (TBC)        Tutorial    <COLLIDER>                 A2
9            Reflection (JO)            Tutorial    Reflection (JO)
10                                      Tutorial
11                                      (Tutorial)                             A3


Structure of lecture

Breazeal’s Kismet project:
1. Video introduction
2. Emotion model
3. Emotional expression
4. Lessons



Video introduction to Kismet

- Hardware introduction (4:00): http://www.ai.mit.edu/projects/sociable/movies/hardware-narrative.mov
- Visual attention (1:00): http://www.ai.mit.edu/projects/sociable/movies/attention-narrative.mov
- Affect and feedback (2:30): http://www.ai.mit.edu/projects/sociable/movies/affective-intent-narrative.mov
- Emotional drives (4:00): http://www.ai.mit.edu/projects/sociable/movies/emotion-narrative.mov
- Turn taking (1:30): http://www.ai.mit.edu/projects/sociable/movies/conversation-narrative.mov


Cynthia Breazeal 2003

- Emotion and sociable humanoid robots
- Int. J. Human-Computer Studies, 59, 119–155
- Reeves and Nass 1996: "a social interface may be a truly universal interface"
- The ability for people to naturally communicate with such machines is important.
- However, for suitably complex environments and tasks, the ability for people to intuitively teach these robots will also be important.
- Ideally, the robot could engage in various forms of social learning (imitation, emulation, tutelage, etc.), so that one could teach the robot just as one would teach another person.


Kismet’s design

- Kismet engages people in natural and expressive face-to-face interaction:
  - to eventually learn from them, reminiscent of parent–infant exchanges
- perceives a variety of natural social cues from visual and auditory channels, and
- delivers social signals to the human through gaze direction, facial expression, body posture, and vocal babble
- Actuators, with 21 degrees of freedom (DoF):
  - 3 DoF direct the robot’s gaze,
  - 3 control the orientation of its head, and
  - 15 move its facial features (e.g., eyelids, eyebrows, lips, and ears).
- Sensors:
  - 4 color CCD cameras: one narrow field of view camera behind each pupil, with the remaining two wide field of view cameras mounted between the robot’s eyes
  - 2 small microphones, one mounted on each ear
  - 1 lavalier microphone worn by the person, used to process their vocalizations



Emotions

- Basic emotions, e.g. anger, disgust, fear, joy, sorrow, and surprise, are:
  - endowed by evolution because of their proven ability to facilitate adaptive responses,
  - important reinforcers for learning new behavior, and
  - a relevance-detection and response-preparation system.
- Scherer:
  - People affectively appraise events with respect to novelty, intrinsic pleasantness, goal/need significance, coping, and norm/self compatibility.
  - These appraisals (along with other factors such as pain, hormone levels, drives, etc.) evoke a particular emotion that recruits response tendencies within multiple systems:
    - physiological changes (such as modulating arousal level via the autonomic nervous system),
    - adjustments in subjective experience,
    - elicitation of behavioral response (such as approach, attack, escape, etc.), and
    - displaying expression.
- Emotions establish a desired relation between the organism and the environment that
  - pulls the creature toward certain stimuli and events, and
  - pushes it away from others.
- Expressive characteristics of emotion in voice, face, gesture, and posture serve an important function in communicating emotional state to others [and influencing them].


Kismet emotion processing

- Kismet is to socially engage people and ultimately to learn from them.
- Kismet’s emotion and drive processes are designed such that the robot is in an alert and mildly positively valenced state when it is interacting well with people, and when the interactions are neither overwhelming nor under-stimulating.
- This corresponds to an environment that affords high learning potential, as the interactions slightly challenge the robot yet also allow Kismet to perform well.


Drives – and the function of emotional responses

- Drive to engage people:
  - motivates the robot to be in the presence of people and to interact with them.
- Drive to engage toys:
  - motivates the robot to interact with things, such as colorful toys.
- Drive to occasionally rest:
  - allows the robot to shut out the external world, instead of trying to regulate its interaction with it.
- The robot’s emotional responses:
  - mirror those of biological systems, and therefore should seem plausible to a human;
  - bias the robot’s behavior to bring it into contact with desired stimuli (orientation or exploration), or to avoid poor quality or dangerous stimuli (protection or rejection); and
  - [provide] a social signal to the human, who responds in a way to further promote the robot’s ‘‘well-being.’’ (A sketch of these drives as homeostatic processes follows below.)
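To make the drive mechanism concrete, here is a minimal sketch of drives as homeostatic processes, in the spirit of the description above: each drive drifts toward under-stimulation unless its satiating stimulus is present, and the regime that favors learning is the bounded middle band. All names, constants, and the update rule are illustrative assumptions, not Kismet’s actual implementation.

```python
# Illustrative sketch of homeostatic drives, loosely following Breazeal (2003).
# Names, constants, and the update rule are assumptions for exposition.

class Drive:
    def __init__(self, name, satiating_stimulus, drift=0.02, restore=0.1):
        self.name = name
        self.satiating_stimulus = satiating_stimulus
        self.level = 0.0          # 0.0 = homeostatic balance
        self.drift = drift        # drift toward under-stimulation per tick
        self.restore = restore    # correction when the right stimulus is present

    def update(self, stimuli):
        """stimuli: dict mapping stimulus name -> intensity in [0, 1]."""
        intensity = stimuli.get(self.satiating_stimulus, 0.0)
        if intensity > 0.0:
            # The satiating stimulus pushes the drive back toward balance;
            # a persistent, intense stimulus overshoots into over-stimulation.
            self.level += self.restore * intensity
        else:
            self.level -= self.drift
        self.level = max(-1.0, min(1.0, self.level))

    def status(self):
        if self.level < -0.3:
            return "under-stimulated"  # motivates seeking the stimulus
        if self.level > 0.3:
            return "over-stimulated"   # motivates avoidance or rest
        return "homeostatic"           # the regime that favors learning

drives = [Drive("social", "face"), Drive("stimulation", "toy"),
          Drive("fatigue", "rest")]
for d in drives:
    d.update({"face": 0.6})            # a person's face is in view
    print(d.name, round(d.level, 2), d.status())
```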


Emotion processing

- An ‘‘emotional’’ reaction for Kismet consists of:
  - a precipitating event;
  - an affective appraisal of that event;
  - a characteristic expression (face, voice, posture);
  - action tendencies that motivate a behavioral response.
- High level perception invokes releaser processes: simple ‘‘cognitive’’ assessments that combine
  - lower-level perceptual features with
  - measures of … internal [robot] state into
  - behaviorally significant perceptual categories. (A sketch of one such releaser follows below.)
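As an illustration, here is a hedged sketch of a single releaser that combines a perceptual feature with internal state to yield a behaviorally significant category. It uses the soothing-speech example from the next slide; the function and field names are invented for exposition, not taken from Kismet’s code.

```python
# Sketch of a releaser: a simple "cognitive" assessment combining
# perceptual features with internal robot state. Names are illustrative.

def soothing_speech_releaser(percept, affective_state):
    """Active only when soothing prosody is heard AND the robot is
    distressed, mirroring how releasers are conditioned on affect."""
    is_soothing = percept.get("prosody") == "soothing"
    is_distressed = affective_state.get("valence", 0.0) < -0.3
    return is_soothing and is_distressed

print(soothing_speech_releaser({"prosody": "soothing"}, {"valence": -0.5}))  # True
print(soothing_speech_releaser({"prosody": "soothing"}, {"valence": 0.2}))   # False
```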



Releasers’ activation and evaluation

- Three factors decide whether a releaser is activated:
  - Drives: whichever is current decides whether a stimulus is (un)desired.
  - Affective state: e.g. the soothing-speech releaser is only active if the state is distressed.
  - Active behaviours: e.g. the no-face condition has a different effect, depending on how long/whether a face search is under way.
- Activated releasers are then evaluated in terms of:
  - Arousal (how stimulating is the stimulus)
  - Valence (how positive is the stimulus)
  - Stance (how approachable is the stimulus)
- Four factors contribute to this evaluation (see the sketch below):
  - Intensity of stimulus (close, fast, big = high arousal)
  - Relevance (current goal = positive valence, stance)
  - Intrinsic affect (praise = positive valence, arousal; scolding = negative valence, arousal; threats = negative valence, high arousal, withdraw stance)
  - Goal directedness (progress towards success vs delay)
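A minimal sketch of this appraisal, assuming the four factors contribute additively to the arousal/valence/stance (AVS) values. The weights and the additive combination are illustrative assumptions; the slide only specifies the direction of each contribution.

```python
# Sketch: evaluating an activated releaser along arousal, valence, and
# stance (AVS). Additive combination and all constants are assumptions.

def clamp(x):
    return max(-1.0, min(1.0, x))

def appraise(stimulus):
    arousal = valence = stance = 0.0
    # 1. Intensity: close, fast, big stimuli are highly arousing.
    arousal += stimulus.get("intensity", 0.0)
    # 2. Relevance: stimuli serving the current goal gain valence and stance.
    if stimulus.get("relevant_to_goal"):
        valence += 0.3
        stance += 0.3
    # 3. Intrinsic affect: praise is positive and arousing; scolding is
    #    negative and arousing; threats also push stance toward withdrawal.
    intrinsic = {"praise":   (0.3,  0.4,  0.0),
                 "scolding": (0.3, -0.4,  0.0),
                 "threat":   (0.6, -0.4, -0.5)}
    da, dv, ds = intrinsic.get(stimulus.get("kind"), (0.0, 0.0, 0.0))
    arousal, valence, stance = arousal + da, valence + dv, stance + ds
    # 4. Goal directedness: progress pleases; delay frustrates.
    valence += 0.2 if stimulus.get("progress") else -0.2
    return clamp(arousal), clamp(valence), clamp(stance)

# A big, fast, threatening stimulus with no goal progress:
print(appraise({"intensity": 0.8, "kind": "threat", "progress": False}))
```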


Emotions, expressions, and behaviour

- AVS values then activate multiple emotions (to varying degrees).
- Expression of those emotions proceeds if their activation exceeds a threshold level.
- A winner-take-all arbitration picks the top emotion, which [when it influences behaviour] will then help pull the robot system back towards equilibrium. (See the sketch below.)
- Expression precedes behaviour, allowing people to predict what Kismet will do before it happens.
  - (e.g. shake toy too close -> fear expression, then escape movement)
- AVS values map onto facial/vocal effector states.
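The threshold-plus-arbitration step can be sketched as follows; the threshold value and the example activations are illustrative assumptions.

```python
# Sketch: threshold gating plus winner-take-all arbitration over emotion
# activations. Threshold and example values are assumptions.

THRESHOLD = 0.4

def arbitrate(activations):
    """activations: dict mapping emotion name -> activation in [0, 1].
    Returns the single emotion allowed to drive expression and behaviour,
    or None if nothing clears the threshold."""
    above = {e: a for e, a in activations.items() if a > THRESHOLD}
    return max(above, key=above.get) if above else None

# Shaking a toy too close to the robot: fear wins the arbitration, so the
# fear expression is shown first and the escape behaviour follows.
print(arbitrate({"fear": 0.8, "surprise": 0.5, "joy": 0.1}))  # -> fear
```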


… expressions, as long as those expressions share a family resemblance. The resemblance exists because the expressions share common facial action units. The facial action units characterize how each facial muscle (or combination of facial muscles) adjusts the skin and facial features to produce human expressions and facial movements (Ekman and Friesen, 1982). It is also known that different expressions for different emotions share some of the same face action components (the raised brows of fear and surprise, for instance). It is hypothesized by Smith and Scott (1997) that those features held in common assign a shared affective meaning to each facial expression. The raised brows, for instance, convey attentional activity for both fear and surprise.

5. Models of emotion and drives for a sociable robot

Kismet’s motivations (i.e., its ‘‘drives’’ and ‘‘emotions’’) establish its nature by defining its ‘‘needs’’ and influencing how and when it acts to satisfy them. As a convention, we use a different font to distinguish parts of the architecture of this particular system from the general uses of this word. For instance, emotion refers to the particular set of computational processes that are active in the system. When the …

[Figure residue removed. The figure shows Russell’s pleasure-arousal space: arousal on the vertical axis (sleep to excitement), displeasure-pleasure on the horizontal, with emotion labels such as afraid, frustrated, stressed, sad, bored, sleepy, calm, relaxed, content, happy, elated, and surprised located in the space.]

Fig. 2. Russell’s pleasure-arousal space for facial expression. Adapted from Russell (1997). (C. Breazeal / Int. J. Human-Computer Studies 59 (2003) 119–155, p. 126)


[The positive] valence prototype, P_positive, maps to a content expression. The negative valence prototype, P_negative, resembles an unhappy expression. The closed stance prototype, P_closed, resembles a stern expression, and the open stance prototype, P_open, resembles an accepting expression.

The three affect dimensions also map to affective postures. There are six prototype postures defined, which span the space. High arousal corresponds to an erect posture with a slight upward chin. Low arousal corresponds to a slouching posture where the neck lean and head tilt are lowered. The posture remains neutral over the valence dimension. An open stance corresponds to a forward lean movement, which suggests strong interest toward the stimuli the robot is leaning toward. A closed stance corresponds to withdraw, reminiscent of shrinking away from whatever the robot is looking at.

The remaining three facial prototypes are used to strongly distinguish the expressions for disgust, anger, and fear. Recall that four of the six primary emotions are characterized by negative valence. Whereas the primary six basis postures (presented above) can generate a range of negative expressions from distress to sorrow, the expressions for intense anger (rage), intense fear (terror), and intense disgust have some uniquely distinguishing features. For instance, the prototype for disgust, P_disgust, is unique in its asymmetry (typical of this expression, such as the curling of one side of the lip). The prototypes for anger, P_anger, and fear, P_fear, each have a distinct configuration for the lips (furious lips form a snarl, terrified lips form a grimace).

[Figure residue removed. The diagram spans three axes: low-high arousal, negative-positive valence, and closed-open stance. Basis postures (e.g. surprise, unhappy, tired, anger, fear, disgust, and content, plus the accepting and stern stance prototypes) are located in this affect space.]

Fig. 7. This diagram illustrates where the basis postures are located in affect space. (C. Breazeal / Int. J. Human-Computer Studies 59 (2003) 119–155, p. 141)



A range of sample expressions generated with this technique is shown in Fig. 8, although the system can generate a much broader range. The procedure runs in real time, which is critical for social interaction.

Given this 3-D affect space, this approach resonates well with the work of Smith and Scott (1997). They posit a 3-D space of pleasure–displeasure (maps to valence here), attentional activity (maps to arousal here), and personal agency, control (roughly maps to stance here). Table 1 summarizes their proposed mapping of facial actions to these dimensions. They posit a fourth dimension that relates to the intensity of the expression. For Kismet, the expressions become more intense as the affect state moves to more extreme values in the affect space. As positive valence increases, Kismet’s lips turn upward, the mouth opens, and the eyebrows relax. However, as valence decreases, the brows furrow, the jaw closes, and the lips turn downward. Along the arousal dimension, the ears perk, the eyes widen, and the mouth opens as arousal increases. Along the stance dimension, increasing positive values cause the eyebrows to arc outwards, the mouth to open, the ears to open, and the eyes to widen. These face actions roughly correspond to a decrease in personal agency/control in Smith and Scott’s framework. For Kismet, it engenders an expression that looks more eager and accepting (or more uncertain for negative emotions). Although Kismet’s dimensions do not map exactly to those hypothesized by Smith and Scott (1997), the idea of combining meaningful face action units in a principled manner to span the space of facial expressions, and to also relate them in a consistent way to emotion categories, holds strong.
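The blending scheme this passage describes can be sketched as interpolation over basis postures: the current (arousal, valence, stance) point weights the prototype postures, and the weighted mix sets the face actuators, so expressions intensify smoothly toward the extremes of the space. The prototype locations, actuator names, and inverse-distance weighting below are illustrative assumptions, not Kismet’s actual implementation.

```python
# Sketch: blending basis facial postures by distance in AVS affect space.
# Prototype locations, joint names, and the weighting rule are assumptions.

import math

# Each prototype: a location in (arousal, valence, stance) space and a
# posture given as target positions for a few face actuators.
PROTOTYPES = {
    "content": (( 0.0,  0.8,  0.0), {"lips":  0.6, "brows":  0.0, "ears":  0.2}),
    "unhappy": (( 0.0, -0.8,  0.0), {"lips": -0.6, "brows": -0.4, "ears": -0.2}),
    "high":    (( 0.8,  0.0,  0.0), {"lips":  0.2, "brows":  0.5, "ears":  0.8}),
    "low":     ((-0.8,  0.0,  0.0), {"lips": -0.1, "brows": -0.2, "ears": -0.6}),
    "open":    (( 0.0,  0.0,  0.8), {"lips":  0.3, "brows":  0.3, "ears":  0.5}),
    "stern":   (( 0.0,  0.0, -0.8), {"lips": -0.3, "brows": -0.5, "ears": -0.3}),
}

def blend(avs):
    """Inverse-distance weighting of prototype postures around the AVS point."""
    weights, posture = {}, {"lips": 0.0, "brows": 0.0, "ears": 0.0}
    for name, (loc, _) in PROTOTYPES.items():
        weights[name] = 1.0 / (math.dist(avs, loc) + 1e-6)
    total = sum(weights.values())
    for name, (_, targets) in PROTOTYPES.items():
        w = weights[name] / total
        for joint, value in targets.items():
            posture[joint] += w * value
    return posture

# A mildly aroused, positively valenced, open state: the blend leans toward
# the "content" and "open" prototypes, and intensifies toward the extremes.
print(blend((0.3, 0.5, 0.2)))
```

Inverse-distance weighting is just one plausible choice; any scheme whose weights vary continuously with the AVS point would preserve the real-time, continuous character described above.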

[Figure residue removed. The figure shows sample expressions labelled anger, calm, fear, content, disgust, interest, sorrow, surprise, and tired.]

Fig. 8. Kismet is capable of generating a continuous range of expressions of various intensities by blending the basis facial postures. Facial movements correspond to affect dimensions in a principled way. A sample is shown here. (C. Breazeal / Int. J. Human-Computer Studies 59 (2003) 119–155, p. 143)


Emotional expression evaluation

… motors. The wide eyes, elevated brows, and elevated ears suggest high arousal. This may account for the confusion with ‘‘surprise.’’

The still image studies were useful in understanding how people read Kismet’s facial expressions, but it says very little about expressive posturing. Humans and animals not only express with their face, but also with their entire body. To explore this issue for Kismet, we showed a small group of subjects a set of video clips.

There were seven people who filled out a second questionnaire. Six were children of age 12, four boys and two girls. One was an adult female. In each clip Kismet performs a coordinated expression using face and body posture. There were seven videos for the expressions of anger, disgust, fear, joy, interest, sorrow, and surprise. Using a forced-choice paradigm, for each video the subject was asked to select a word that best described the robot’s expression (anger, disgust, fear, joy, interest, sorrow, or surprise). On a ten-point scale, the subjects were also asked to rate the intensity of the robot’s expression and the certainty of their answer. They were also asked to write down any comments they had. The results are compiled in Table 4. Random chance is 14%.

The subjects performed significantly above chance, with overall stronger recognition performance than on the still images alone. The video segments of ‘‘anger,’’ ‘‘disgust,’’ ‘‘fear,’’ and ‘‘sorrow’’ were correctly classified with a higher percentage than the still images. However, there were substantially fewer subjects who participated in the video evaluation than the still image evaluation. The recognition of ‘‘joy’’ most likely dipped from the still-image counterpart because it was sometimes confused with the expression of interest in the video study. The perked ears, attentive eyes, and smile give the robot a sense of expectation that could be interpreted as interest.

Misclassifications are strongly correlated with expressions having similar facial or postural components. ‘‘Surprise’’ was sometimes confused for ‘‘fear;’’ both have a quick withdraw postural shift (the fearful withdraw is more of a cowering movement whereas the surprise posture has more of an erect quality) with wide eyes and elevated ears. ‘‘Surprise’’ was sometimes confused with ‘‘interest’’ as well. Both have an alert and attentive quality, but interest is an approaching movement whereas surprise is more of a startled movement. ‘‘Sorrow’’ was sometimes confused with …

Table 4. This table summarizes the results of the video evaluation. Forced-choice percentage (random = 14%).

            Anger  Disgust  Fear  Joy  Interest  Sorrow  Surprise  % Correct
Anger         86      0       0    14      0        0        0        86
Disgust        0     86       0     0      0       14        0        86
Fear           0      0      86     0      0        0       14        86
Joy            0      0       0    57     28        0       15        57
Interest       0      0       0     0     71        0       29        71
Sorrow        14      0       0     0      0       86        0        86
Surprise       0      0      29     0      0        0       71        71

(C. Breazeal / Int. J. Human-Computer Studies 59 (2003) 119–155, p. 145)


(Human) emotion input

- Kismet recognized four affective intents (i.e. praise, prohibition, attentional bids, and soothing) from a person’s vocal prosody.
  - A recognizer was designed to categorize these.
- When interfaced with Kismet’s emotion system, the person could manipulate the robot’s affective state through tone of voice, causing the robot to become:
  - more positive through praising tones,
  - more aroused through alerting tones,
  - more ‘‘sad’’ through scolding tones, and
  - moderately aroused through soothing tones. (A sketch of this mapping follows below.)
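A minimal sketch of how a recognized affective intent could shift the robot’s affect state, assuming each intent maps to a fixed (arousal, valence) delta. The deltas and the dictionary interface are illustrative assumptions; the actual coupling into Kismet’s emotion system is richer.

```python
# Sketch: mapping recognized vocal affective intents to shifts in the
# robot's (arousal, valence) state. All deltas are assumptions.

INTENT_TO_AFFECT = {
    # intent:          (d_arousal, d_valence)
    "praise":          ( 0.1,  0.4),  # more positive
    "attentional_bid": ( 0.4,  0.0),  # more aroused
    "prohibition":     (-0.1, -0.4),  # more "sad"
    "soothing":        (-0.3,  0.1),  # settles toward moderate arousal
}

def apply_intent(state, intent):
    """state: dict with 'arousal' and 'valence' in [-1, 1]."""
    da, dv = INTENT_TO_AFFECT.get(intent, (0.0, 0.0))
    state["arousal"] = max(-1.0, min(1.0, state["arousal"] + da))
    state["valence"] = max(-1.0, min(1.0, state["valence"] + dv))
    return state

print(apply_intent({"arousal": 0.0, "valence": 0.0}, "praise"))
```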


Lessons from Kismet: Timing

- Note the ability to engage people in
  - face-to-face,
  - rich,
  - dynamic,
  - mutually regulated, and
  - closely coupled affective interactions.
- Engaging because …
  - expressive behavior is timely and … synchronized with the human’s behavior ([timings] less than a second) … a natural flow and rhythm to … interaction … stimulating for the robot, … compelling for the person.



Lessons from Kismet: HRI is not just HCI

- Robots not only have to carry out their tasks, they also have to survive in the human environment.
- The ability of robots to adapt and learn in such an environment is fundamental.
- For robots, social and emotive qualities serve not only
  - to ‘‘lubricate’’ the interface between humans and robots, but also
  - to play a pragmatic role in promoting
    - survival,
    - self maintenance,
    - learning,
    - decision-making,
    - attention, and more.
- [One way to capture this is to say that social qualities are not just “for the interface”.]


Lessons from Kismet: Emotions as (HCI) affordances?

- [W]hen designing robots that interact with humans in the real world, the issue is not so much
  - whether robots should have social and emotive characteristics,
  - but of what kind.
- … The robot’s observable behavior, and the manner in which it responds and reacts to people, profoundly shapes the interaction and the mental model people have for the robot.
- … it is important that the robot not only does the right thing, but also at the right time and in the right manner.
