
RADAR EVALUATION: Goals, Targets, Review & Discussion

Jaime Carbonell

& soon Full SRI/CMU/IET RADAR Team

1-February-2005

School of Computer Science

Supported By DARPA IPTO

PAL Program: “Personalized Assistant That Learns”

Carnegie Mellon University 2

Outline: Radar Evaluation

• Brief Review of Radar Challenge Task

• Evaluation Objectives: Obligation and Desiderata

• Evaluation Components: Radar Tasks

• Radar Metrics: Tasks → Meaningful Measures

• Putting it all together: Tin-man formula proposal


The original plan has been disrupted. Conference wing A is no longer available. Other rooms may be affected.

The resolver needs to replan: gather information, commandeer other rooms, change schedules, post to websites, inform participants.

Test: Radar will assist a conference planner in a crisis situation.

The test will be evaluated on quality and completeness of the new plan and on the successful completion of related tasks.

[Diagram labels: Crisis Resolver; RADAR (NLP, Planning & Scheduling, E-Mail Handler, Learning, Knowledge Base); Conference Participants; Website; Conference Organizers; Wing A; Wing B]


Conference Re-planning Tasks

• Situation Assessment

– Which resources have become unavailable

– What alternative resources exist and at what price

• Tentative re-planning of conference schedule

– Elicit and satisfy as many preferences as possible

• Validating conference schedule & resource allocation

– Securing buy-in from key stakeholders (requires meeting)

– Awaiting external confirmations (or default assumptions)

– Modifying plan as/when needed

• Informing all stakeholders

– Briefings to VIPs, update website for participants

• Cope with background tasks (time permitting)


Scoring Criteria (Adapted from Garvey)

• Task Realism

– Must reflect RADAR challenge performance

• Sensitive to Learning

– Must allow headroom beyond Y2 (no low ceiling)

– Must include measurement of learning effects

• Auditable with Pride

– Objective, Simple, Clear, Transparent, Statistically Sound, Replicable, …

• Comprehensive & Research-Useful

– All RADAR modules included, albeit differentially

– Responsive to RADAR scientific objectives


Evaluation Components

• All RADAR Modules (Sched quality)

– Time-Space Planning (TSP): Schedule quality

– Meeting Scheduling (CMRadar): Meetings, bumps

– Webmaster + Briefing Assistant (VIO)

– Email + NLP: Other tasks completed: background

• Additional Learning Targets (?)

– Relevant facts & preferences acquired

– Strategic knowledge (when/how to apply K)

• Combination Function (Utility-like)

– Linear weighted sum with +/- terms


Example: Schedule Quality Metric

$$\mathrm{Score}_{conf} = \sum_{s_j \in \mathrm{sessions}} w(s_j)\,\max\Bigl(0,\; 1 - \sum_{f_k \in \mathrm{factors}(s_j)} p(f_k)\Bigr)$$

w = weight = importance of the session (e.g. keynote > posters)

p = penalty for distance from ideal (e.g. room smaller than target), linear or step fn

f = factors of sessions (e.g. room size, duration, equipment, …)

r = resource (e.g. ballroom at Flagstaff)

$$\mathrm{Score}_{cost} = \sum_{r \in \mathrm{schedNew},\; r \notin \mathrm{schedOld}} \$(r)$$
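A minimal Python sketch of this metric may help make the pieces concrete. It assumes the formula reads as a weighted sum over sessions of max(0, 1 − total penalty), and that the cost score sums the price of resources used by the new schedule but not the old one. All names here (`Session`, `linear_penalty`, `step_penalty`, `schedule_score`, `added_cost`) and all numbers are hypothetical illustrations, not part of RADAR.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


def linear_penalty(actual: float, target: float) -> float:
    """Linear penalty: fractional shortfall below the target, 0 if the target is met."""
    if target <= 0:
        raise ValueError("target must be positive")
    return max(0.0, (target - actual) / target)


def step_penalty(actual: float, target: float, amount: float = 1.0) -> float:
    """Step penalty: a fixed amount whenever the target is missed."""
    return amount if actual < target else 0.0


@dataclass
class Session:
    name: str
    weight: float  # w(s): importance of the session
    penalties: Dict[str, float] = field(default_factory=dict)  # p(f) per factor f


def session_quality(session: Session) -> float:
    """Quality of one session: 1 minus its summed penalties, floored at 0."""
    return max(0.0, 1.0 - sum(session.penalties.values()))


def schedule_score(sessions: List[Session]) -> float:
    """Weighted sum of per-session quality (assumed reading of the conference score)."""
    return sum(s.weight * session_quality(s) for s in sessions)


def added_cost(old: Set[str], new: Set[str], price: Dict[str, float]) -> float:
    """Total price of resources in the new schedule that the old one did not use."""
    return sum(price[r] for r in new - old)


# Hypothetical example: the keynote room is 20% too small; the poster
# session is missing required equipment (step penalty of 1.0).
keynote = Session("keynote", 3.0, {"room_size": linear_penalty(400, 500)})
posters = Session("posters", 1.0, {"equipment": step_penalty(0, 1)})
score = schedule_score([keynote, posters])

cost = added_cost(
    old={"wing_a_ballroom", "room_b1"},
    new={"room_b1", "hotel_ballroom"},
    price={"room_b1": 200.0, "hotel_ballroom": 1500.0},
)
```

In this example the keynote contributes its weight times 0.8, while the poster session is floored at zero, so a single badly served session cannot drive the total negative.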


Putting It All Together

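Normalizing components:

$$\hat{S}_i = \frac{S_{i,RADAR} - S_{i,MIN}}{S_{i,MAX} - S_{i,MIN}}$$

Summing:

$$\hat{S}_{total} = \frac{\sum_{i \in C} w_i \hat{S}_i}{\sum_{i \in C} w_i} \times 100\%$$

or

$$\hat{S}_{total} = \sum_{i \in C} w_i \hat{S}_i$$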
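The normalizing and summing steps can be sketched in Python as follows. This is an illustration only: it assumes each component score is rescaled between a minimum and maximum reference score, then combined either as a weight-normalized percentage or as a plain weighted sum. The function names, component names, weights, and bounds are hypothetical.

```python
from typing import Dict


def normalize(s_radar: float, s_min: float, s_max: float) -> float:
    """Rescale a raw component score so that s_min maps to 0 and s_max maps to 1."""
    if s_max == s_min:
        raise ValueError("s_max and s_min must differ")
    return (s_radar - s_min) / (s_max - s_min)


def combine(normalized: Dict[str, float], weights: Dict[str, float],
            as_percent: bool = True) -> float:
    """Weighted sum of normalized component scores.

    With as_percent=True the sum is divided by the total weight and scaled
    to 0..100; otherwise the plain weighted sum is returned.
    """
    weighted = sum(weights[c] * normalized[c] for c in normalized)
    if as_percent:
        return 100.0 * weighted / sum(weights[c] for c in normalized)
    return weighted


# Hypothetical raw scores and (min, max) reference bounds per component.
raw = {"schedule": 2.4, "meetings": 7.0}
bounds = {"schedule": (0.0, 4.0), "meetings": (2.0, 12.0)}
weights = {"schedule": 2.0, "meetings": 1.0}

normalized = {c: normalize(raw[c], *bounds[c]) for c in raw}
total_percent = combine(normalized, weights)                # weight-normalized, in percent
total_plain = combine(normalized, weights, as_percent=False)  # plain weighted sum
```

The percentage form keeps the total comparable when weights change, whereas the plain weighted sum leaves the scale open-ended.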


Next Steps for Evaluation Metrics

• Metrics for Other components

• Metrics for Learning Boost

• Discuss/Refine/Redo Combination

– True open-ended scale?

– Something other than weighted sum?

– Quality metric w/o penalties (+'s only)

• Test in a full walk-through scenario

– Refine the details

– Don't lose sight of objectives