SteerBench: a benchmark suite for evaluating steering behaviors
Authors: Singh, Kapadia, Faloutsos, Reinman
Presented by: Jessica Siewert
Content of presentation
• Introduction
• Previous work
• The Method
• Assessment
Introduction – Context and motivation
– Steering of agents
– Objective comparison
– Standard?
– Test cases and scoring, user evaluation
– Metric scoring
– Demonstration
Introduction – Previous work
• Nothing comparable existed yet (as of Nov ‘08)
Introduction – Promises
• Evaluate objectively
• Help researchers
• Working towards a standard for evaluation
• Take into account:
  – Cognitive decisions
  – Situation-specific aspects
The test cases
– Simple validation scenarios
– Basic one-on-one interactions
– Agent interactions including obstacles
– Group interactions
– Large-scale scenarios
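To make these categories concrete, here is a minimal sketch of how one such scenario could be encoded (hypothetical Python representation; SteerBench defines its own test-case specification, which this code does not reproduce):

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    """Initial conditions for one agent: the test cases specify little
    more than a position, a facing direction, and a goal."""
    position: tuple[float, float]
    direction: tuple[float, float]
    goal: tuple[float, float]

@dataclass
class TestCase:
    """One benchmark scenario: agents plus optional obstacles."""
    name: str
    agents: list[Agent]
    # Axis-aligned boxes as (xmin, ymin, xmax, ymax); an assumption here.
    obstacles: list[tuple[float, float, float, float]] = field(default_factory=list)

# A basic one-on-one interaction: two agents walking head-on.
head_on = TestCase(
    name="oncoming-agents",
    agents=[
        Agent(position=(-5.0, 0.0), direction=(1.0, 0.0), goal=(5.0, 0.0)),
        Agent(position=(5.0, 0.0), direction=(-1.0, 0.0), goal=(-5.0, 0.0)),
    ],
)
```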
The user’s opinion
• Rank on overall score across test cases (for comparison)
• Rank algorithms based on
  – a single case, or
  – one agent’s behavior
• Pass/fail
• Visually inspect results
• Examine detailed metrics of the performance
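As an illustration of the first two ranking options, a sketch in Python (the function names and the higher-is-better convention are assumptions, not taken from the paper):

```python
def rank_overall(scores: dict[str, dict[str, float]]) -> list[tuple[str, float]]:
    """Rank algorithms by mean score across all test cases (higher = better)."""
    means = {algo: sum(per_case.values()) / len(per_case)
             for algo, per_case in scores.items()}
    return sorted(means.items(), key=lambda kv: kv[1], reverse=True)

def rank_on_case(scores: dict[str, dict[str, float]], case: str) -> list[tuple[str, float]]:
    """Rank algorithms on a single test case only."""
    return sorted(((algo, per_case[case]) for algo, per_case in scores.items()),
                  key=lambda kv: kv[1], reverse=True)

# Made-up scores for two algorithms on two cases:
scores = {"A": {"oncoming": 0.9, "crossing": 0.4},
          "B": {"oncoming": 0.7, "crossing": 0.7}}
print(rank_overall(scores))              # B first (mean 0.70), then A (mean 0.65)
print(rank_on_case(scores, "oncoming"))  # A first (0.9), then B (0.7)
```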
The metric
• Number of collisions
• Time efficiency
• Effort efficiency
• Penalties?
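A hedged sketch of how these metrics might be combined into a single per-agent score (illustrative Python; the weights, the kinetic-energy proxy for effort, and the penalty handling are assumptions the paper leaves open):

```python
def score_agent(trajectory, dt, collisions,
                w_coll=50.0, w_time=1.0, w_effort=1.0):
    """Composite score for one agent on one test case (lower = better).
    `trajectory` is a list of (x, y) positions sampled every `dt` seconds;
    `collisions` is the observed collision count. Weights are illustrative."""
    total_time = (len(trajectory) - 1) * dt  # time efficiency
    # Effort efficiency approximated by integrated squared speed,
    # a proxy for the kinetic energy spent along the path.
    effort = 0.0
    for (x0, y0), (x1, y1) in zip(trajectory, trajectory[1:]):
        vx, vy = (x1 - x0) / dt, (y1 - y0) / dt
        effort += (vx * vx + vy * vy) * dt
    return w_coll * collisions + w_time * total_time + w_effort * effort
```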
Movies…
Developments since then
• Ioannis Karamouzas, Peter Heil, Pascal van Beek, Mark H. Overmars. A Predictive Collision Avoidance Model for Pedestrian Simulation. Proceedings of the 2nd International Workshop on Motion in Games, November 21-24, 2009, Zeist, The Netherlands.
• Shawn Singh, Mubbasir Kapadia, Billy Hewlett, Glenn Reinman, Petros Faloutsos. A modular framework for adaptive agent-based steering. Symposium on Interactive 3D Graphics and Games, February 18-20, 2011, San Francisco, California.
• Suiping Zhou, Dan Chen, Wentong Cai, Linbo Luo, Malcolm Yoke Hean Low, Feng Tian, Victor Su-Han Tay, Darren Wee Sze Ong, Benjamin D. Hamilton. Crowd modeling and simulation technologies. ACM Transactions on Modeling and Computer Simulation (TOMACS), v.20 n.4, p.1-35, October 2010.
Experiments – Claim recall
• Evaluate objectively
• Help researchers
• Working towards a standard for evaluation
Assessment – good things
• All the measured variables seem logical (too logical?)
• Extensive variable set, with the option to expand
• Customized evaluation
• Cheating not allowed:
  – collision penalties
  – fail constraint
  – goal constraint
• Layered set of test cases
Assessment
• The measurements all seem to be approximately the same
• Does the user test make the difference?
• Who are these users?
• “Examine”, “inspect”: all vague terms
• What about the objective of objectivity?
Assessment
• How good is it to be general?
• How general/specific is this method?
• Time efficiency vs. effort efficiency
• Should it be blind to the algorithm itself?
• Penalties, fail and goal constraints are not specified!
Assessment – scoring (1/2)
• The test cases are clearly specified, but HOW a good agent SHOULD react is not, even though the authors say such a specification exists
• How can you get cognitive decisions out of only position, direction and a goal?
Assessment – scoring (2/2)
• “Scoring not intended to be a proof of an algorithm’s effectiveness.”
• How do you interpret scores, and who wins?
  – “B is slightly better on average, but A has the highest scores.”
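A tiny worked example (made-up numbers) of how the choice of aggregation decides the winner:

```python
a = [0.95, 0.40, 0.45]  # algorithm A: one outstanding case, two weak ones
b = [0.70, 0.70, 0.70]  # algorithm B: consistently decent

print(sum(a) / len(a), sum(b) / len(b))  # means ~0.60 vs 0.70: B wins on average
print(max(a), max(b))                    # bests 0.95 vs 0.70: A has the highest score
```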
Assessment – final questions
• Can this method become a standard?
• What if someone claims to be so innovative that this standard does not apply to them?
• A nice first try, though!
Image credits: Getty Images