Tangible User Interfaces and Reinforcement Learning (Smart Toys) An honours thesis presentation...

Tangible User Interfaces and Reinforcement Learning

(Smart Toys)

An honours thesis presentation by…

Trent Apted <[email protected]>

Supervised by A/Prof Bob Kummerfeld

Smart Internet Technology Research Group

Tangible User Interfaces

• Not just a mouse
– Although he can advance my slides

• Facilitate a more intimate interaction with the user
– Mainly targeted towards children
– Huggable, cute and cuddly
– Develop a relationship with the user
– Play games

Toys - Motivation

• Plush (soft and furry) toys account for around 25% of toy store sales

• Over 17 million Furby toys were sold between October 1998 and December 1999
– They had primitive learning capabilities
– Mostly robot-like in appearance
– They were also relatively cheap (unlike Sony’s Aibo ~$2,000+)

Toys - Challenges

• Want to (cheaply) make a Smart Toy, derived from a plush doll

• Don’t want to adversely affect the original function
– Namely, being soft, cute and cuddly

• Also want to be able to detect the usual ‘plush toy’ interactions
– E.g. squeeze, carry, lie down with

• I am not an engineer…

Reinforcement Learning

• Like training a dog with a ‘clicker’
• Need to associate the reward (click) with behaviour in a nearby temporal window
– How to represent the behaviour
– How to determine the window

• Apply learning that attempts to maximise all future possible rewards

• Many techniques
– Q-learning, TD(λ), Bayesian models, Markov models, neural networks, actor-critic, hierarchical
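The clicker idea above, crediting a delayed reward to behaviour inside a nearby temporal window, can be sketched with tabular Q-learning. This is only an illustration under stated assumptions, not the learner from the thesis: the class name `WindowedQLearner`, the fixed window length, the age-based decay of the reward, and the example state and action labels are all hypothetical.

```python
from collections import defaultdict, deque


class WindowedQLearner:
    """Tabular Q-learning where one delayed reward is shared over recent steps.

    Hypothetical sketch: the window length and the age-based decay are
    assumptions made for illustration.
    """

    def __init__(self, actions, alpha=0.5, gamma=0.9, window=3):
        self.actions = list(actions)
        self.alpha = alpha                  # learning rate
        self.gamma = gamma                  # discount factor
        self.q = defaultdict(float)         # (state, action) -> estimated value
        self.recent = deque(maxlen=window)  # the temporal window of recent steps

    def record(self, state, action, next_state):
        """Remember a step so that a later click can still be credited to it."""
        self.recent.append((state, action, next_state))

    def reward(self, r):
        """Credit reward r to every step in the window, oldest first.

        Older steps receive a smaller share of the reward (gamma ** age).
        """
        steps = list(self.recent)
        for i, (s, a, s2) in enumerate(steps):
            age = len(steps) - 1 - i
            best_next = max(self.q[(s2, b)] for b in self.actions)
            target = r * (self.gamma ** age) + self.gamma * best_next
            self.q[(s, a)] += self.alpha * (target - self.q[(s, a)])
        self.recent.clear()

    def best_action(self, state):
        """Return the action with the highest estimated value in this state."""
        return max(self.actions, key=lambda a: self.q[(state, a)])


learner = WindowedQLearner(["purr", "wave"], window=2)
learner.record("idle", "wave", "squeezed")
learner.record("squeezed", "purr", "idle")
learner.reward(1.0)  # the user 'clicks' shortly after the toy purrs
```

After the click, the most recent step ("squeezed", "purr") ends up with a higher value than the older one, which is the behaviour the temporal window is meant to capture.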

Reinforcement Learning - Challenges

• Not all techniques can be applied to this scenario
– Infinite: no end to training examples
– Interactive: need to wait for the user to determine the reward
– Discrete: few training examples
– Future use: a (cheap) toy cannot hold a lot of state
– Sensors are unsophisticated (Boolean)

• Also needs to be fun
– Non-determinism
– Anticipate possible actions without stimuli

• It may also not be possible to punish the model
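One common way to get the non-determinism mentioned above is to choose actions at random in proportion to their learned values, rather than always taking the best one. The sketch below uses softmax (Boltzmann) selection; it is an assumed technique for illustration, not necessarily what the toy used, and `softmax_choice` and `temperature` are hypothetical names.

```python
import math
import random


def softmax_choice(values, temperature=1.0, rng=random):
    """Pick an index at random, favouring entries with higher values.

    A low temperature makes the choice nearly greedy; a high temperature
    makes it nearly uniform. Assumes temperature > 0.
    """
    top = max(values)  # subtract the maximum for numerical stability
    weights = [math.exp((v - top) / temperature) for v in values]
    return rng.choices(range(len(values)), weights=weights, k=1)[0]


# Example: three candidate actions with learned values.
action_values = [0.1, 0.9, 0.2]
picked = softmax_choice(action_values, temperature=0.5)
```

With Boolean sensors the table of values stays small: n sensors and k actions need at most (2 ** n) * k entries, which suits a toy that cannot hold much state.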

My Contributions – Hardware / Systems

• Design and implementation of the circuitry and sensors

• Integration into a plush toy
• A hardware-software interface (via parallel port) and event model

• Many lessons learnt
– E.g. limitations of high-level hardware (PDA)
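An event model over Boolean sensors can be sketched as decoding a status byte into named flags and reporting only the changes between two readings. This is a minimal illustration, not the interface built for the thesis: the sensor names, the bit positions in `SENSOR_BITS`, and the idea that the port delivers one status byte per reading are all assumptions, and no real parallel-port API is shown.

```python
# Hypothetical mapping of sensor names to bit positions in a status byte.
SENSOR_BITS = {"squeeze": 0, "carry": 1, "lie_down": 2}


def decode(status_byte):
    """Turn a raw status byte into a dict of Boolean sensor readings."""
    return {name: bool((status_byte >> bit) & 1) for name, bit in SENSOR_BITS.items()}


def events_between(previous, current):
    """Return a (sensor, new_value) pair for every sensor that changed."""
    return [
        (name, current[name])
        for name in SENSOR_BITS
        if previous[name] != current[name]
    ]


# Example: the toy was being squeezed, then is released and laid down.
before = decode(0b001)
after = decode(0b100)
changes = events_between(before, after)
```

Reporting changes instead of raw readings keeps the learner's input small, since it only has to react when something actually happens to the toy.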

My Contributions – Software

• Reinforcement learning in the context of a Smart Toy

• Flexible learning architecture for further research and exploration (in other contexts)

• Evaluation of the reinforcement learning techniques implemented

• Implementation of a number of simple games to motivate learning of the toy (fun?)

Some Results and Analysis

• Increasing the state space and re-presenting examples does not help interactive learning

• ‘Snapshot’ environments perform poorly and do not benefit from increasing the learner complexity

• Q-Learning combined with Markov models performs well

Future Work

• Improve the abilities of the toy
– There are spare wires, so a speaker would be easy to add
– Speech recognition would be harder

• Wireless
– Remove the tether for more natural interaction
– Power source and increased expense

• Collaboration
– ‘Talking’ to other Smart Toys, collaborating in games
– Collaborative learning

• Examine more learning models
• Psychological / Sociological aspects
