Page 1

Is This Conversation on Track?

Utterance Level Confidence Annotation in the CMU Communicator spoken dialog system

Presented by: Dan Bohus (dbohus@cs.cmu.edu)

Work by: Paul Carpenter, Chun Jin, Daniel Wilson, Rong Zhang, Dan Bohus, Alex Rudnicky. Carnegie Mellon University – 2001

Page 2

Outline

- The Problem
- The Approach
- Training Data and Features
- Experiments and Results
- Conclusion
- Future Work

Page 3

The Problem

Systems often misunderstand, take the misunderstanding as fact, and continue to act on invalid information. Consequences:
- Repair costs
- Increased dialog length
- User frustration

Confidence annotation provides critical information for effective confirmation and clarification in dialog systems.

Page 4

The Approach

Treat the problem as a data-driven classification task. Objective: accurately label misunderstood utterances.

Steps:
- Collect a training corpus
- Identify useful features
- Train a classifier, identifying the best-performing one for this task
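As a rough sketch of this framing (the container type and field names below are hypothetical, not from the paper), each utterance becomes a fixed-length feature vector with a binary OK/BAD label:

```python
# Minimal sketch of the classification framing; LabeledUtterance is a
# hypothetical container, only the OK/BAD labeling follows the slides.
from dataclasses import dataclass

@dataclass
class LabeledUtterance:
    features: list[float]  # the 12 decoder/parse/dialog features (later slides)
    ok: bool               # True = correctly understood, False = misunderstood

def to_xy(corpus):
    """Turn a labeled corpus into a feature matrix and label vector."""
    X = [u.features for u in corpus]
    y = [int(u.ok) for u in corpus]
    return X, y
```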

Page 5

Data

Communicator logs & transcripts:
- Collected over 2 months (Oct–Nov 1999)
- Eliminated conversations with < 5 turns
- Manually labeled OK (67%) / BAD (33%); BAD covers RecogBAD / ParseBAD / OOD / NONSpeech
- Discarded mixed-label utterances (6%)
- Cleaned corpus: 4550 utterances / 311 dialogs
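A sketch of these preparation steps, assuming hypothetical Dialog/Turn containers (the slides do not show the original log format):

```python
# Hypothetical corpus preparation mirroring the slide above.
def prepare_corpus(dialogs):
    corpus = []
    for dialog in dialogs:
        if len(dialog.turns) < 5:           # eliminate conversations with < 5 turns
            continue
        for turn in dialog.turns:
            labels = set(turn.labels)       # OK/BAD annotations for this utterance
            if len(labels) != 1:            # discard mixed-label utterances (~6%)
                continue
            corpus.append((turn, labels.pop()))
    return corpus                           # yields ~4550 utterances / 311 dialogs
```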

Page 6

Feature Extraction

12 features from various levels:
- Decoder features: Word Number, Unconfident Percentage
- Parsing features: Uncovered Percentage, Fragment Transitions, Gap Number, Slot Number, Slot Bigram
- Dialog features: Dialog State, State Duration, Turn Number, Expected Slots
- Garble: the handcrafted heuristic currently used by the CMU Communicator
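Assumed definitions for two of these features, as an illustration (the paper's exact formulas may differ):

```python
def unconfident_percentage(word_confidences, threshold=0.5):
    """Decoder feature: fraction of decoded words below a confidence
    threshold. The threshold value is an assumption, not from the paper."""
    if not word_confidences:
        return 0.0
    return sum(c < threshold for c in word_confidences) / len(word_confidences)

def uncovered_percentage(num_words, num_parsed_words):
    """Parsing feature: fraction of the utterance left uncovered by the parse."""
    if num_words == 0:
        return 0.0
    return 1.0 - num_parsed_words / num_words
```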

Page 7

Experiments with 6 different classifiers:
- Decision Tree
- Artificial Neural Network
- Naïve Bayes
- Bayesian Network (several network structures attempted)
- AdaBoost (individual feature-based binning estimators as weak learners, 750 boosting stages)
- Support Vector Machines (Dot, Polynomial, Radial, Neural, Anova kernels)
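A modern re-creation of this comparison is sketched below using scikit-learn; this is a tooling assumption (the 2001 experiments predate scikit-learn), the Bayesian network is omitted for lack of a direct equivalent, and AdaBoost here uses default decision stumps rather than the paper's binning estimators:

```python
# Sketch of the classifier comparison with 10-fold cross-validation.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import AdaBoostClassifier
from sklearn.svm import SVC

# Stand-in data shaped like the corpus: 4550 utterances x 12 features.
X, y = make_classification(n_samples=4550, n_features=12, random_state=0)

classifiers = {
    "Decision Tree":  DecisionTreeClassifier(),
    "Neural Network": MLPClassifier(max_iter=1000),
    "Naive Bayes":    GaussianNB(),
    "AdaBoost":       AdaBoostClassifier(n_estimators=750),
    "SVM (RBF)":      SVC(kernel="rbf"),
}

for name, clf in classifiers.items():
    accuracy = cross_val_score(clf, X, y, cv=10).mean()  # 10-fold CV, as in the paper
    print(f"{name}: mean error rate {1 - accuracy:.2%}")
```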

Page 8

Evaluating Performance

Metrics:
- Classification Error Rate (FP + FN)
- CDR = 1 - Fallout = 1 - FP / N_BAD

The cost of misunderstanding in dialog systems depends on:
- Error type (FP vs. FN)
- Domain
- Dialog state

Ideally, build a cost function for each type of error and optimize for that; a sketch follows.
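A minimal sketch of such a cost function; the state names and weights are purely illustrative assumptions:

```python
# Illustrative cost model: weight false acceptances (FP) and false
# rejections (FN) differently, optionally per dialog state.
STATE_COSTS = {                       # (c_fp, c_fn); hypothetical values
    "confirm_itinerary": (3.0, 0.5),  # a false acceptance here forces a costly repair
    "default":           (1.0, 0.5),
}

def misunderstanding_cost(fp, fn, state="default"):
    """Scalar cost of a classifier's errors in a given dialog state."""
    c_fp, c_fn = STATE_COSTS.get(state, STATE_COSTS["default"])
    return c_fp * fp + c_fn * fn
```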

Page 9

Results – Individual Features

Features (top 8)         Mean Err. Rate
Uncovered Percentage     19.93%
Expected Slot            20.97%
Gap Number               23.01%
Bigram Score             23.14%
Garble                   25.32%
Slot Number              25.69%
Unconfident Percentage   27.34%
Dialog State             31.03%

Baseline error: 32.84% (when predicting the majority class).
All experiments used 10-fold cross-validation.

Page 10

Results – Classifiers

Classifier         Mean Err. Rate   F/P Rate   F/N Rate
AdaBoost           16.59%           11.43%     5.16%
Decision Tree      17.32%           11.82%     5.49%
Bayesian Network   17.82%            9.41%     8.42%
SVM                18.40%           15.01%     3.39%
Neural Network     18.90%           15.08%     3.82%
Naïve Bayes        21.65%           14.24%     7.41%

A t-test showed no statistically significant difference between the classifiers, except for Naïve Bayes. Explanation: its assumption of independence between features is violated.

Baseline error: 25.32% (GARBLE)

Page 11

Future Work

Improve the classifiers Additional features

Develop a cost model for understanding errors in dialog systems. Study/optimize tradeoffs between F/P and F/N;

Integrate value and confidence information to guide clarification in dialog systems

Page 12

Confusion Matrix

                   Label OK   Label BAD
System says OK     TP         FP
System says BAD    FN         TN

FP = false acceptance; FN = false detection/rejection
Fallout = FP / (FP + TN) = FP / N_BAD
CDR = 1 - Fallout = 1 - FP / N_BAD
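The same metrics computed directly from confusion-matrix counts; the counts in the example are hypothetical:

```python
def fallout(fp, tn):
    """Fraction of BAD utterances wrongly accepted as OK: FP / N_BAD."""
    return fp / (fp + tn)

def cdr(fp, tn):
    """Correct Detection Rate for BAD utterances: 1 - fallout."""
    return 1.0 - fallout(fp, tn)

# Hypothetical counts: of 1000 BAD utterances, 120 were falsely accepted.
print(cdr(fp=120, tn=880))  # 0.88 -> 88% of BAD utterances detected
```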