1 Shaping in Speech Graffiti: results from the initial user study Stefanie Tomko Dialogs on Dialogs meeting 10 February 2006

Big picture (i.e. thesis statement)

A system of shaping and adaptivity can be used to induce more efficient user interactions with spoken dialog systems.

This strategy can increase efficiency by increasing the amount of user input that is actually understood by the system, leading to increased task completion rates and higher user satisfaction.

This strategy can also reduce upfront training time, thus accelerating the process of reaching optimally efficient interaction.

This study

[Flowchart: user input → Speech Graffiti? (target): yes → result; no → shapeable? (expanded): yes → shaping prompt; no → {confsig}]

My approach, graphically

[Flowchart: user input → Speech Graffiti? (target): yes → result; no → shapeable? (expanded): yes → shaping prompt; no → intelligent shaping help]

Speech Graffiti

Standardized framework of syntax, keywords, and principles
Domain-specific vocabulary

User: Theater is Showcase North Theater
System: Showcase Cinemas Pittsburgh North
User: Genre is drama
System: Drama
User: What movies are playing?
System: {confsig} [an error beep, since the previous utterance is not in the grammar]
User: WHERE WAS I?
System: Theater is Showcase Cinemas Pittsburgh North, genre is drama
User: OPTIONS
System: You can specify or ask about title, show time, rating, {ellsig} [a 3-beep list continuation signal]
User: What is title?
System: 2 matches: Dark Water, War of the Worlds
User: START OVER
System: Starting over
User: Theater is Northway Mall Cinemas Eight
System: Northway Mall Cinemas 8
User: What is address?
System: 1 match: 8000 McKnight Road in Pittsburgh
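To make the target-grammar conventions in the exchange above concrete, here is a minimal sketch of a single-phrase matcher. It assumes a hypothetical slot list and keyword set drawn only from this example, handles just `<slot> is <value>`, `what is <slot>`, and the keywords shown, and is not the actual Speech Graffiti parser.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical keyword set, taken from the example exchange only.
KEYWORDS = {"where was i", "options", "start over", "more"}

# Hypothetical slot inventory for the movie domain.
SLOTS = {"theater", "genre", "title", "show time", "rating", "address"}


@dataclass
class Parse:
    kind: str                    # "keyword", "specify", or "query"
    slot: Optional[str] = None
    value: Optional[str] = None


def parse_target(utterance: str) -> Optional[Parse]:
    """Return a Parse if one phrase fits the sketched target grammar, else None."""
    text = utterance.lower().strip(" ?.!")
    if text in KEYWORDS:
        return Parse("keyword", value=text)
    # Query form: what is <slot>
    m = re.fullmatch(r"what is (.+)", text)
    if m and m.group(1) in SLOTS:
        return Parse("query", slot=m.group(1))
    # Specification form: <slot> is <value>
    m = re.fullmatch(r"(.+?) is (.+)", text)
    if m and m.group(1) in SLOTS:
        return Parse("specify", slot=m.group(1), value=m.group(2))
    return None  # out of grammar: the system would answer with {confsig}
```

Under these assumptions, "What movies are playing?" returns None, matching the error beep in the exchange.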

Expanded grammar

Exploit the fact that knowing one is speaking to a limited-language system restricts user input
Create a grammar that accepts more natural input than SG does
This grammar is opaque to users
Why have two grammars?
  Lower-perplexity LMs give lower error rates
  Some applications may be SG-only
Restriction: linear mapping from EXP input to its TGT equivalent
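One way to read the linear-mapping restriction is that every expanded-grammar phrasing rewrites to exactly one target-grammar utterance. The sketch below assumes that reading; the rule table `EXP_RULES` and the function `exp_to_tgt` are hypothetical illustrations, not the study's expanded grammar.

```python
import re
from typing import Optional

# Hypothetical expanded-grammar rules: each regex covers a more natural
# phrasing and maps to exactly one target-grammar (TGT) equivalent.
EXP_RULES = [
    (re.compile(r"what (?:movies|films) are (?:playing|showing)"), "what is title"),
    (re.compile(r"where is (?:it|the theater)"), "what is address"),
    (re.compile(r"(?:i want|show me) (?:a |an )?(\w+) movies?"), r"genre is \1"),
    (re.compile(r"(?:at|go to) (?:the )?(.+?) theater"), r"theater is \1"),
]


def exp_to_tgt(utterance: str) -> Optional[str]:
    """Map an expanded-grammar utterance to its single TGT equivalent, or None."""
    text = utterance.lower().strip(" ?.!")
    for pattern, template in EXP_RULES:
        m = pattern.fullmatch(text)
        if m:
            # Substitute captured values (e.g. the genre) into the TGT template.
            return m.expand(template)
    return None
```

Keeping the mapping one-to-one means the system always has a single TGT form to show the user when it confirms an expanded-grammar input.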

Shaping strategy

Handle user input accepted by the expanded grammar but not the target grammar
Balance current task success with future interaction efficiency
Baseline strategy (this study):
  Confirm expanded-grammar input with full, explicit slot+value confirmation
  Give the result if appropriate for the query
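The baseline strategy can be sketched as a three-way routing decision. The toy grammars `TGT` and `EXP`, the `RESULTS` table, and the reply wording are all hypothetical; only the overall behavior (terse reply for target input, full slot+value confirmation plus any query result for expanded input, {confsig} otherwise) follows the slides.

```python
CONFSIG = "{confsig}"  # stands in for the error beep

# Hypothetical toy grammars: TGT is a set of accepted utterances; the
# expanded grammar maps each extra phrasing to one TGT equivalent.
TGT = {"genre is drama", "what is title"}
EXP = {"i want a drama": "genre is drama",
       "what movies are playing": "what is title"}

# Hypothetical back end: only queries have results to report.
RESULTS = {"what is title": "2 matches: Dark Water, War of the Worlds"}


def normalize(utterance: str) -> str:
    return utterance.lower().strip(" ?.!")


def respond(utterance: str) -> str:
    """Route one utterance according to the baseline shaping strategy."""
    text = normalize(utterance)
    if text in TGT:
        # Target-grammar input: answer a query, or echo a specified value.
        if text in RESULTS:
            return RESULTS[text]
        return text.split(" is ", 1)[1].capitalize()
    tgt = EXP.get(text)
    if tgt is not None:
        # Expanded-grammar input: confirm the full TGT slot+value form,
        # then give the result if the input was a query.
        confirmation = tgt[0].upper() + tgt[1:]
        if tgt in RESULTS:
            return f"{confirmation}? {RESULTS[tgt]}"
        return f"{confirmation}."
    return CONFSIG  # accepted by neither grammar
```

For example, "I want a drama" is answered with "Genre is drama.", which models the target syntax while still completing the step.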

Study participants

“Normal” adults, i.e. not CMU students
15 males, 14 females (29 total), aged 23–54
Native speakers of American English
Little or no computer programming experience
New to Speech Graffiti

Study design

Between-subjects, 3 conditions:
  non-shaping + tutorial (BT)
  shaping + tutorial (ST)
  shaping + no tutorial (SN)
Tutorial: 9-slide .ppt presentation, 5 minutes

Study tasks

15 tasks, 4 difficulty levels
  Difficulty set by the number of slots to be specified/queried
Sessions ended after 40 minutes or when all tasks were completed
  Only one user did not get to attempt all 15 tasks in 40 minutes
Afterwards: SASSI questionnaire

Results

In short, the baseline shaping strategy had no significant effect

Efficiency

[Charts: turns to completion, time to completion (seconds), and completed tasks, each comparing non-shaping vs. shaping subjects]

Mean results for shaping subjects are only slightly better; the differences are not significant

User satisfaction

Again, no significant differences
No differences on individual SASSI factors
No efficiency/satisfaction differences between the tutorial and no-tutorial conditions, either

[Chart: user satisfaction (mean of means), non-shaping vs. shaping]

Grammaticality

How often did users speak within the Target SG grammar?

[Chart: target-grammar usage in Q1 vs. Q4, non-shaping vs. shaping]

From Q1 to Q4, both groups showed significant increases in TGT grammaticality
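The slides do not say how Q1 and Q4 were defined. Assuming they are the first and last quarters of each user's utterances in session order, the per-quarter grammaticality rate could be computed as below; `quarter_rates` is a hypothetical helper.

```python
def quarter_rates(in_grammar: list[bool]) -> list[float]:
    """Split one user's utterances (in session order) into four quarters and
    return the percentage of each quarter that was within the target grammar."""
    n = len(in_grammar)
    if n < 4:
        raise ValueError("need at least four utterances")
    rates = []
    for q in range(4):
        # Integer boundaries give every quarter at least one utterance when n >= 4.
        chunk = in_grammar[q * n // 4:(q + 1) * n // 4]
        rates.append(100.0 * sum(chunk) / len(chunk))
    return rates
```

Comparing element 0 with element 3 across users in each condition would give a Q1-vs-Q4 contrast of the kind the chart shows.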

Error rates – WER

For non-shaping:
  39.9% WER overall
  30.3% WER for grammatical utterances
  38.3% utterance-level concept error

For shaping: a bit harder to figure, because of 2-pass ASR
  Each shaping input generated a TGT hyp and an EXP hyp
  Selection was based on AM/LM score and a few simple heuristics
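The slides only say that selection used AM/LM scores plus a few simple heuristics. The sketch below assumes a weighted AM+LM total and one made-up heuristic (a fixed bonus for the TGT hypothesis); `select`, `lm_weight`, and `tgt_bonus` are hypothetical. Because the two passes use different LMs, their LM scores are not directly comparable, which is one reason a raw score comparison alone may not be enough.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    grammar: str      # "TGT" or "EXP"
    text: str
    am_score: float   # acoustic-model log score (higher is better)
    lm_score: float   # language-model log score (higher is better)


def select(tgt: Hypothesis, exp: Hypothesis,
           lm_weight: float = 1.0, tgt_bonus: float = 5.0) -> Hypothesis:
    """Choose between the TGT-pass and EXP-pass hypotheses.
    lm_weight and tgt_bonus are made-up tuning parameters."""
    def total(h: Hypothesis) -> float:
        return h.am_score + lm_weight * h.lm_score

    # Hypothetical heuristic: keep the target-grammar hypothesis unless the
    # EXP hypothesis scores better by more than tgt_bonus.
    if total(tgt) + tgt_bonus >= total(exp):
        return tgt
    return exp
```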

Error rates – WER

Shaping:
  Selected hypothesis: 37.3% WER
  All TGT hyps: 40.9% WER
  All EXP hyps: 64.2% WER
  25.6% utterance-level concept error
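For reference, WER is the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. The sketch below computes corpus WER that way, plus an utterance-level concept error rate under the assumption that an utterance counts as wrong when its decoded slot/value set differs from the reference set; the slides do not define concept error precisely.

```python
def word_errors(ref: str, hyp: str) -> int:
    """Minimum substitutions + deletions + insertions (word-level edit distance)."""
    r, h = ref.split(), hyp.split()
    prev = list(range(len(h) + 1))
    for i in range(1, len(r) + 1):
        cur = [i] + [0] * len(h)
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # deletion
                         cur[j - 1] + 1,      # insertion
                         prev[j - 1] + cost)  # substitution or match
        prev = cur
    return prev[-1]


def wer(pairs: list[tuple[str, str]]) -> float:
    """Corpus WER in percent: total word errors / total reference words."""
    errors = sum(word_errors(ref, hyp) for ref, hyp in pairs)
    words = sum(len(ref.split()) for ref, _ in pairs)
    return 100.0 * errors / words


def concept_error_rate(pairs: list[tuple[set, set]]) -> float:
    """Percent of utterances whose decoded concept set differs from the reference."""
    wrong = sum(1 for ref, hyp in pairs if ref != hyp)
    return 100.0 * wrong / len(pairs)
```

Because a single misrecognized value can leave most words intact, concept error and WER can move differently, as the shaping figures above illustrate.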

So – what happened?

Shaping users had success with NL-ish input, and shaping prompts were not strong enough to change behavior.

Biggest problem

Users using NL or slot-only query formats
My theory: the <slot> is <value> specification format is very structured. What is <slot> sounds structured to me, but to users it sounds like <just ask a question!>

In new versions, the query format will be list <slot>
  Users don’t seem to have much trouble adapting to a structure, but the structure needs to be clear.

Will also shape more explicitly by confirming with “I think you meant, ‘list movies’”

The same explicit shaping will be used for specifications
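A sketch of the planned explicit shaping for queries, assuming a hypothetical word-to-slot table (`SLOT_WORDS`): NL-ish or slot-only queries are rewritten to `list <slot>` and echoed back with an "I think you meant" prompt, while input already in `list <slot>` form passes through unchanged.

```python
import re

# Hypothetical table from words users might say to the slot names the
# "list <slot>" format would accept; order matters ("show times" before "times").
SLOT_WORDS = {
    "movies": "movies", "films": "movies", "titles": "movies",
    "theaters": "theaters", "show times": "show times", "times": "show times",
}


def shape_query(utterance: str) -> tuple:
    """Return (target query, shaping prompt); either may be None."""
    text = utterance.lower().strip(" ?.!")
    if text.startswith("list "):
        return text, None  # already in the target format: no shaping needed
    for word, slot in SLOT_WORDS.items():
        if re.search(rf"\b{re.escape(word)}\b", text):
            target = f"list {slot}"
            return target, f"I think you meant, '{target}'"
    return None, None  # not recognizable as a query
```

Both the NL form ("What movies are playing?") and the slot-only form ("movies") would receive the same prompt, so users hear one clear structure.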

Other problems
  Not using start over to clear context
  Confusion about the semantics of location
  Long utterances
  Using next instead of more
  Pacing

These will be addressed via targeted help messages
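Two of the listed problems lend themselves to simple surface checks. The sketch below is hypothetical throughout: the detectors, the `MAX_WORDS` threshold, and the help wording are illustrations, not the planned messages.

```python
from typing import Optional

# Hypothetical surface checks for two of the problems listed above.
MAX_WORDS = 12  # assumed threshold for a "long utterance"


def targeted_help(utterance: str) -> Optional[str]:
    """Return an illustrative help message, or None if no problem is detected."""
    words = utterance.lower().strip(" ?.!").split()
    if words and words[0] == "next":
        # User said "next" where the SG keyword is "more".
        return "To hear more items in a list, say 'more'."
    if len(words) > MAX_WORDS:
        return "Try saying one or two things at a time."
    return None
```

Problems such as not using start over or confusion about location would need dialog context rather than a single utterance, so they are left out of this sketch.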

Current hang-up

Can we improve WER?
  LM improvements?
  COTS recognizer?

Dragon: using, results, issues

A little bit about trying DNS

Dragon NaturallySpeaking 8
  Distribution from Jahanzeb
  Set up for dictation, i.e. mic input, so no telephone models

To compare with Sphinx:
  Test set of utterances from this study
  Rerecorded with a head-mounted mic (so, read speech) at 16 kHz
  Downsampled to 8 kHz for Sphinx
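Downsampling 16 kHz audio to 8 kHz means low-pass filtering below the new 4 kHz Nyquist frequency and then keeping every second sample. The slides do not say which tool was used; the following is a generic sketch using a Hamming-windowed sinc FIR filter, with an arbitrary tap count.

```python
import math


def lowpass_taps(num_taps: int = 31, cutoff: float = 0.25) -> list[float]:
    """Hamming-windowed sinc low-pass filter. cutoff is a fraction of the input
    sample rate (0.25 of 16 kHz = 4 kHz, the Nyquist frequency at 8 kHz)."""
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        # Ideal low-pass impulse response sin(2*pi*fc*x) / (pi*x), with 2*fc at x = 0.
        sinc = 2 * cutoff if x == 0 else math.sin(2 * math.pi * cutoff * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(sinc * window)
    total = sum(taps)
    return [t / total for t in taps]  # normalize to unity gain at DC


def downsample_by_2(samples: list[float]) -> list[float]:
    """Low-pass filter, then keep every second sample (16 kHz -> 8 kHz)."""
    taps = lowpass_taps()
    half = len(taps) // 2
    out = []
    for i in range(0, len(samples), 2):  # only compute the samples we keep
        acc = 0.0
        for k, t in enumerate(taps):
            j = i + k - half
            if 0 <= j < len(samples):  # treat samples outside the signal as zero
                acc += t * samples[j]
        out.append(acc)
    return out
```

Skipping the filter and just dropping samples would fold energy above 4 kHz back into the telephone band as aliasing.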

More Dragon stuff

Two groups

TGT
  Sphinx mean WER: 56.4% (worse than the 8 kHz telephone model?)
  Dragon mean WER: 35.9%
  Mean diff: Dragon 18.8 pts less (not significant)

EXP
  Sphinx mean WER: 68.5%
  Dragon mean WER: 45.4%
  Mean diff: Dragon 22.3 pts less (significant)

More Dragon stuff

But Dragon rates are not that different from the original Sphinx WERs
  Sphinx WER in this test might be fishy

Setup seems tricky: can I still do 2-pass decoding?
  Would need to change to a mic setup
  Black-box LM stuff
  Mysterious adaptation? Not good for user studies!

So, sticking with Sphinx.