
Dialogue State Induction Using Neural Latent Variable Models

Qingkai Min 1,2, Libo Qin 3, Zhiyang Teng 1,2, Xiao Liu 4 and Yue Zhang 1,2

1 School of Engineering, Westlake University
2 Institute of Advanced Technology, Westlake Institute for Advanced Study
3 Research Center for Social Computing and Information Retrieval, Harbin Institute of Technology
4 School of Computer Science and Technology, Beijing Institute of Technology

Contents

Motivation

Method

Experiments

Conclusion

Motivation

CHAPTER 1 What is a task-oriented dialogue system?

Assist the user in solving a task.

Typical modular architecture

End-to-end architecture

Pic from: Gao J, Galley M, Li L. Neural approaches to conversational AI. In: The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval. 2018: 1371-1374.

CHAPTER 1 What is dialogue state tracking (DST)?

The dialogue state represents what the user is looking for at the current turn of the conversation.

User: I want an expensive restaurant that serves Turkish food.
System: Anatolia serves Turkish food.
User: What is the area?

Turn 1: inform(price=expensive, food=Turkish)
Turn 2: inform(price=expensive, food=Turkish); request(area)

CHAPTER 1 Current DST scenarios

Traditional DST: utterance → NLU → DST → dialogue state (uncertainty propagates between the components)

End-to-end DST: utterance → a single DST component → dialogue state

CHAPTER 1 End-to-end DST

User: I want an expensive restaurant that serves Turkish food.
Manual labeling: inform(price=expensive, food=Turkish)

System: Anatolia serves Turkish food.
User: What is the area?
Manual labeling: inform(price=expensive, food=Turkish); request(area)

Ontology (optional):
price: [cheap, expensive, moderate, ...]
food: [Turkish, Italian, Polish, ...]
area: [south, north, center, ...]

DST output:
Turn 1: inform(price=expensive, food=Turkish)
Turn 2: inform(price=expensive, food=Turkish); request(area)

CHAPTER 1 Limitation of end-to-end DST

End-to-end DST paradigm: corpus → manual labeling → train DST model → test

Limitations of end-to-end DST:

• Costly and slow: 8,438 dialogues annotated by 1,249 workers in the MultiWOZ 2.0 dataset

• Error-prone: annotation errors in around 40% of MultiWOZ 2.0 [Eric et al., 2019] and over 30% of MultiWOZ 2.1 [Zhang et al., 2019] (now updated to MultiWOZ 2.2)

[Eric et al., 2019] Mihail Eric, Rahul Goel, Shachi Paul, Abhishek Sethi, Sanchit Agarwal, Shuyang Gao, and Dilek Hakkani-Tur. MultiWOZ 2.1: Multi-domain dialogue state corrections and state tracking baselines. arXiv, 2019.
[Zhang et al., 2019] Jian-Guo Zhang, Kazuma Hashimoto, Chien-Sheng Wu, Yao Wan, Philip S. Yu, Richard Socher, and Caiming Xiong. Find or classify? Dual strategy for slot-value predictions on multi-domain dialog state tracking. arXiv, 2019.

CHAPTER 1 What is the problem?

Successful in narrow domains with large annotated datasets.

Limited to the domains they are trained on; they do not generalize to new domains.

CHAPTER 1 Dialogue State Induction (DSI)

Definition:

What is given?
A set of customer service records without annotation.

User: I want an expensive restaurant that serves Turkish food.
System: Anatolia serves Turkish food.
User: What is the area?

What is the target?
Automatically discover the information that the user is looking for at each turn.

inform(price=expensive, food=Turkish)
inform(price=expensive, food=Turkish); request(area)

CHAPTER 1 Dialogue State Induction vs DST

DST training data: labeled dialogues, plus an optional ontology.

User: I want an expensive restaurant that serves Turkish food.
Manual labeling: inform(price=expensive, food=Turkish)

System: Anatolia serves Turkish food.
User: What is the area?
Manual labeling: inform(price=expensive, food=Turkish); request(area)

Ontology (optional):
price: [cheap, expensive, moderate, ...]
food: [Turkish, Italian, Polish, ...]
area: [south, north, center, ...]

DSI input: the same dialogues, without manual labeling or an ontology.

User: I want an expensive restaurant that serves Turkish food.
System: Anatolia serves Turkish food.
User: What is the area?

Both settings target the same per-turn states:
Turn 1: inform(price=expensive, food=Turkish)
Turn 2: inform(price=expensive, food=Turkish); request(area)

CHAPTER 1 DSI vs zero-shot DST

Zero-shot DST: support unseen domains (services).

Motivation: different domains (services) have similar schemas, i.e., slots along with their natural language descriptions.

Example from the SGD dataset [Rastogi et al., 2019]:

service_name: "Flights"    description: "Search for cheap flights across multiple providers"
  slot "origin"            description: "City of origin for the flight"
  slot "destination"       description: "City of destination for the flight"

service_name: "Trains"     description: "Service to find train journeys between cities"
  slot "from"              description: "Starting city for train journey"
  slot "to"                description: "Ending city for the train journey"

[Rastogi et al., 2019] Rastogi A, Zang X, Sunkara S, et al. Towards scalable multi-domain conversational agents: The schema-guided dialogue dataset. arXiv preprint arXiv:1909.05855, 2019.

Zero-shot DST limitations:

• Requires highly qualified (consistent) human annotation: the slot descriptions need to be written consistently across services.

• Transfer to a distant domain (service) is difficult, e.g. from "Trains" to a "Financial" service:

service_name: "Financial"  description: "Search for financial products across multiple providers"
  slot "deposit_rate"      description: "the standard for calculating deposit interest"
  slot "loan_rate"         description: "the interest rate that banks charge to borrowers"

DSI features:
• Releases the human annotation burden
• Data-driven: automatically discovers dialogue states

Method

CHAPTER 2 How do we solve DSI?

Two steps. Utterance: I need to take a train out of Chicago, I will be leaving Dallas on Wednesday.

• Candidate (value) extraction (POS tagging, NER, coreference)
  → train, Chicago, Dallas, Wednesday

• Slot assignment: two neural latent variable models (DSI-base and DSI-GM)
  → inform{train-departure=Chicago, train-destination=Dallas, train-leave at=Wednesday}; train=None
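To make the first step concrete, here is a minimal candidate-extraction sketch using spaCy's NER and noun chunks. It is illustrative only and is not the authors' exact extraction rules (coreference handling is omitted, and the model name is an assumption).

```python
# Minimal candidate-extraction sketch (illustrative, not the paper's exact rules):
# collect named-entity spans plus common-noun heads as candidate values.
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes this spaCy model is installed

def extract_candidates(utterance):
    doc = nlp(utterance)
    candidates = [ent.text for ent in doc.ents]            # NER spans, e.g. Chicago, Dallas, Wednesday
    candidates += [chunk.root.text for chunk in doc.noun_chunks
                   if chunk.root.pos_ == "NOUN"]            # common nouns, e.g. "train"
    seen, result = set(), []
    for c in candidates:                                    # deduplicate while keeping order
        if c.lower() not in seen:
            seen.add(c.lower())
            result.append(c)
    return result

print(extract_candidates("I need to take a train out of Chicago, "
                         "I will be leaving Dallas on Wednesday."))
# e.g. ['Chicago', 'Dallas', 'Wednesday', 'train']
```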

CHAPTER 2 What is a VAE?

AutoEncoder vs. Variational AutoEncoder

Pics from https://www.jianshu.com/p/ffd493e10751
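To make the AutoEncoder / Variational AutoEncoder contrast concrete, here is a minimal PyTorch sketch (not from the paper) of the part that differs: a VAE encoder outputs a distribution (μ, log σ²) instead of a single point, and training samples from it with the reparameterization trick. Dimensions and layer choices are assumptions.

```python
import torch
import torch.nn as nn

class TinyVAE(nn.Module):
    """Minimal VAE skeleton: encode to (mu, log_var), sample z, decode."""
    def __init__(self, x_dim=100, h_dim=64, z_dim=16):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim, h_dim), nn.Softplus())
        self.mu = nn.Linear(h_dim, z_dim)       # posterior mean
        self.log_var = nn.Linear(h_dim, z_dim)  # posterior log sigma^2
        self.dec = nn.Linear(z_dim, x_dim)

    def forward(self, x):
        h = self.enc(x)
        mu, log_var = self.mu(h), self.log_var(h)
        # Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I)
        z = mu + torch.exp(0.5 * log_var) * torch.randn_like(mu)
        return self.dec(z), mu, log_var
```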

CHAPTER 2 Model: generative DSI-base

I need to take a train out of Chicago, I will be leaving Dallas on Wednesday.

Encoder:

• one-hot vector oh over the vocabulary of all candidates: 0 1 0 1 … 0 0 1 1
  → LT + SoftPlus + Dropout → encoded oh

• pre-trained ELMo: contextualized embeddings of train, Chicago, Dallas, Wednesday
  → AvgPooling + LT + SoftPlus + Dropout → encoded ce

• encoded oh and encoded ce → LT + BatchNorm → posterior μ; LT + BatchNorm → posterior log σ²
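Below is a minimal PyTorch sketch of this encoder. Layer sizes, the dropout rate, and the concatenation of the two encoded branches are assumptions; the ELMo vectors are treated as precomputed inputs.

```python
import torch
import torch.nn as nn

class DSIBaseEncoder(nn.Module):
    """Sketch of the DSI-base encoder (sizes and the branch combination are assumptions)."""
    def __init__(self, vocab_len, elmo_dim=1024, hidden=256, latent=128, dropout=0.2):
        super().__init__()
        # bag-of-candidates branch: LT + SoftPlus + Dropout
        self.oh_branch = nn.Sequential(nn.Linear(vocab_len, hidden), nn.Softplus(), nn.Dropout(dropout))
        # contextualized-embedding branch: AvgPooling happens in forward(), then LT + SoftPlus + Dropout
        self.ce_branch = nn.Sequential(nn.Linear(elmo_dim, hidden), nn.Softplus(), nn.Dropout(dropout))
        # posterior parameters: LT + BatchNorm
        self.to_mu = nn.Sequential(nn.Linear(2 * hidden, latent), nn.BatchNorm1d(latent))
        self.to_log_var = nn.Sequential(nn.Linear(2 * hidden, latent), nn.BatchNorm1d(latent))

    def forward(self, oh, ce):
        # oh: [batch, vocab_len] multi-hot over the candidates in the turn
        # ce: [batch, num_candidates, elmo_dim] precomputed ELMo vectors of the candidates
        encoded_oh = self.oh_branch(oh)
        encoded_ce = self.ce_branch(ce.mean(dim=1))        # AvgPooling over the candidates
        h = torch.cat([encoded_oh, encoded_ce], dim=-1)    # combining by concatenation is an assumption
        return self.to_mu(h), self.to_log_var(h)           # posterior mu, posterior log sigma^2
```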

CHAPTER 2 Model: generative DSI-base

Sampling (reparameterization trick): z = μ + σ ∗ ε, ε ~ N(0, I)
  → latent state vector z

Decoder:

• z → LT + Softmax → latent slot vector s (dimension: slot_num)

• s → LT + BatchNorm → reconstructed oh: 0.01 0.2 0.01 0.08 … 0.03 0.02 0.3 0.4 (vocab length, all candidates)

• s → LT + BatchNorm → recon_μ (contextualized feature dimension)
  s → LT + BatchNorm → recon_log σ² (contextualized feature dimension)
  → reconstructed parameters for ce
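Continuing the encoder sketch above, a minimal decoder in the same spirit: sample z with the reparameterization trick, map z to the slot vector s with LT + Softmax, and from s reconstruct the candidate scores and the Gaussian parameters for the contextualized embeddings. Layer sizes are assumptions, and the softmax over the reconstructed oh shown on the slide is deferred to the loss.

```python
import torch
import torch.nn as nn

class DSIBaseDecoder(nn.Module):
    """Sketch of the DSI-base decoder (sizes are illustrative)."""
    def __init__(self, latent=128, slot_num=50, vocab_len=10000, feature_dim=1024):
        super().__init__()
        self.to_slot = nn.Sequential(nn.Linear(latent, slot_num), nn.Softmax(dim=-1))  # LT + Softmax
        self.to_oh = nn.Sequential(nn.Linear(slot_num, vocab_len), nn.BatchNorm1d(vocab_len))
        self.to_recon_mu = nn.Sequential(nn.Linear(slot_num, feature_dim), nn.BatchNorm1d(feature_dim))
        self.to_recon_log_var = nn.Sequential(nn.Linear(slot_num, feature_dim), nn.BatchNorm1d(feature_dim))

    def forward(self, mu, log_var):
        # Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I)
        z = mu + torch.exp(0.5 * log_var) * torch.randn_like(mu)
        s = self.to_slot(z)                        # latent slot vector, dimension slot_num
        recon_oh = self.to_oh(s)                   # scores over all candidates (vocab length)
        recon_mu = self.to_recon_mu(s)             # Gaussian mean for ce
        recon_log_var = self.to_recon_log_var(s)   # Gaussian log-variance for ce
        return s, recon_oh, recon_mu, recon_log_var
```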

CHAPTER 2 Loss

Loss = reconstruction term + regularization term

Reconstruction term:

• Cross entropy between the reconstructed oh (0.01 0.2 0.01 0.08 … 0.03 0.02 0.3 0.4) and the one-hot vector oh (0 1 0 1 … 0 0 1 1), both of vocab length (all candidates).

• recon_μ and recon_log σ² define a multivariate Gaussian distribution over the contextualized feature dimension; for each candidate in {train, Chicago, Dallas, Wednesday}, add -dist.log_prob(ce[candidate]) and sum.

Regularization term:

• KL divergence between the posterior and the prior: KL(N(μ, σ) ‖ N(0, I))

Pic from https://www.datalearner.com/blog/1051485590815771
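A sketch of this loss, following the three terms above. The reduction over the batch, the absence of term weights, and the omission of candidate padding/masking are simplifying assumptions.

```python
import torch
import torch.distributions as D
import torch.nn.functional as F

def dsi_base_loss(oh, ce, recon_oh, recon_mu, recon_log_var, post_mu, post_log_var):
    # (1) cross-entropy style reconstruction of the multi-hot candidate vector
    ce_loss = -(oh * F.log_softmax(recon_oh, dim=-1)).sum(dim=-1)

    # (2) Gaussian log-likelihood of each candidate's contextualized embedding
    #     ce: [batch, num_candidates, feature_dim]
    dist = D.Normal(recon_mu.unsqueeze(1), torch.exp(0.5 * recon_log_var).unsqueeze(1))
    nll = -dist.log_prob(ce).sum(dim=-1).sum(dim=-1)   # sum over feature dim, then over candidates

    # (3) KL( N(post_mu, post_sigma^2) || N(0, I) ), closed form for diagonal Gaussians
    kl = -0.5 * (1 + post_log_var - post_mu.pow(2) - post_log_var.exp()).sum(dim=-1)

    return (ce_loss + nll + kl).mean()
```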

CHAPTER 2 DSI-base inference

I need to take a train out of Chicago, I will be leaving Dallas on Wednesday.

Encoder → posterior μ, posterior log σ² → Sampling → latent state vector z → LT + Softmax → s → p(s), shape [slot_num, 1]

For each candidate in {train, Chicago, Dallas, Wednesday}, p(s) is combined with the candidate-specific terms described next.

CHAPTER 2 What does the model learn?

Decoder: s (slot vector, e.g. 0.7 0.1 0.01 0.08 …, dimension slot_num) × W (parameter matrix, [slot_num, vocab_len]) = reconstructed oh (0.01 0.2 0.01 0.8 … 0.03 0.02 0.3 0.4), trained with a cross-entropy loss against oh (0 1 0 1 … 0 0 1 1).

So W encodes slot_num different slot-candidate distributions: p(c | sᵢ), i = 1, 2, …, slot_num.
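Because the reconstructed oh comes from a single linear layer, the rows of that layer's weight matrix (after a softmax over candidates) can be read off as these slot-candidate distributions. A tiny snippet, reusing the hypothetical DSIBaseDecoder sketch from above:

```python
import torch

decoder = DSIBaseDecoder(latent=128, slot_num=50, vocab_len=10000, feature_dim=1024)  # sketch from above
W = decoder.to_oh[0].weight                    # nn.Linear stores [out_features, in_features] = [vocab_len, slot_num]
p_c_given_s = torch.softmax(W.t(), dim=-1)     # [slot_num, vocab_len]: one candidate distribution per induced slot
top5 = p_c_given_s.topk(5, dim=-1).indices     # indices of the 5 most probable candidates for each slot
```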

CHAPTER 2 DSI-base inference

For each candidate in {train, Chicago, Dallas, Wednesday}, e.g. Chicago:

• one-hot vector oh of the candidate: 0 0 0 1 … 0 0 0 0
  W [slot_num, vocab_len] × oh → p(oh | s), shape [slot_num, 1]

• contextualized embedding ce of Chicago (feature_dim)

CHAPTER 2 DSI-base inference

Decoder: the latent slot vector s (dimension: slot_num) is mapped by LT + BatchNorm to the reconstructed oh (vocab length, all candidates), and by two further LT + BatchNorm layers to recon_μ and recon_log σ², whose parameter matrices have shape [slot_num, feature_dim]; these are the reconstructed parameters used to generate ce.

CHAPTER 2 What does the model learn?

s (slot vector, dimension slot_num) × W (parameter matrix, [slot_num, feature_dim]) = recon_μ, and likewise another parameter matrix gives recon_log σ².

Row by row this yields μ₁ … μ_S and σ₁ … σ_S, i.e. slot_num different multivariate Gaussian distributions (μᵢ, σᵢ), i = 1, 2, …, slot_num.

CHAPTER 2 Multivariate Gaussian probability density

Each slot's (μ, σ), over the contextualized feature dimension, defines a multivariate Gaussian distribution.

Pic from https://www.datalearner.com/blog/1051485590815771

For each candidate: -dist.log_prob(ce[train]), -dist.log_prob(ce[Chicago]), -dist.log_prob(ce[Dallas]), -dist.log_prob(ce[Wednesday])

CHAPTER 2 DSI-base inference

I need to take a train out of Chicago, I will be leaving Dallas on Wednesday.

Encoder + Sampling → latent state vector z → LT + Softmax → latent slot vector s (dimension: slot_num) → p(s), shape [slot_num, 1]

For each candidate in {train, Chicago, Dallas, Wednesday}, e.g. Chicago:

• one-hot vector oh: 0 0 0 1 … 0 0 0 0; with W [slot_num, vocab_len] → p(oh | s), shape [slot_num, 1]

• contextualized embedding ce of Chicago; with the per-slot Gaussians (μᵢ, σᵢ), i = 1, 2, …, slot_num → p(ce | sᵢ), shape [slot_num, 1]

p(s | Chicago) ∝ p(s) ∗ p(oh | s) ∗ p(ce | s), shape [slot_num, 1]
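Putting the pieces together, a sketch of slot assignment at inference time: for each candidate, combine p(s), p(oh | s) read from the one-hot decoder weights, and p(ce | sᵢ) from the per-slot Gaussians, then take the argmax. It reuses the hypothetical encoder/decoder sketches above; ignoring the BatchNorm layers when reading off per-slot parameters, and omitting the None slot handling, are simplifications.

```python
import torch
import torch.distributions as D

@torch.no_grad()
def assign_slots(encoder, decoder, oh, ce, candidate_ids):
    """oh: [1, vocab_len]; ce: [1, num_cands, feature_dim]; candidate_ids: vocab indices of the candidates.
    Assumes encoder and decoder (sketches above) are in eval() mode."""
    mu, log_var = encoder(oh, ce)
    p_s, recon_oh, recon_mu, recon_log_var = decoder(mu, log_var)     # p_s: [1, slot_num]

    # p(oh | s): per-slot candidate distribution from the one-hot decoder weights
    W = decoder.to_oh[0].weight                                       # [vocab_len, slot_num]
    p_oh_given_s = torch.softmax(W.t(), dim=-1)                       # [slot_num, vocab_len]

    # p(ce | s_i): density of the candidate embedding under each slot's Gaussian
    dist = D.Normal(decoder.to_recon_mu[0].weight.t(),                # [slot_num, feature_dim] means
                    torch.exp(0.5 * decoder.to_recon_log_var[0].weight.t()))

    assignments = []
    for j, cand in enumerate(candidate_ids):
        log_p = (torch.log(p_s[0] + 1e-12)
                 + torch.log(p_oh_given_s[:, cand] + 1e-12)
                 + dist.log_prob(ce[0, j]).sum(dim=-1))               # [slot_num]
        assignments.append(log_p.argmax().item())                     # most probable induced slot index
    return assignments
```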

CHAPTER 2 Post-processing: slot mapping

Mapping from induced slot indexes to labels?

predictions                      references
Chicago           → slot 0       Chicago           → train-departure
Houston           → slot 0       Houston           → train-departure
Washington        → slot 0       Washington        → train-departure
Boston            → slot 0       Boston            → train-departure   (4 times)
Magdalene College → slot 0       Magdalene College → attraction-name   (1 time)

Majority vote: {0: train-departure}

After mapping, Magdalene College (slot 0) is also labeled train-departure.
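A small self-contained sketch of this majority-vote mapping; the dictionary names are illustrative.

```python
from collections import Counter, defaultdict

def map_slots_by_majority_vote(predictions, references):
    """predictions: {value: induced_slot_index}; references: {value: gold_slot_name}."""
    votes = defaultdict(Counter)
    for value, slot_idx in predictions.items():
        if value in references:
            votes[slot_idx][references[value]] += 1
    # each induced slot index gets its most frequent gold label
    return {slot_idx: counter.most_common(1)[0][0] for slot_idx, counter in votes.items()}

predictions = {"Chicago": 0, "Houston": 0, "Washington": 0, "Boston": 0, "Magdalene College": 0}
references = {"Chicago": "train-departure", "Houston": "train-departure",
              "Washington": "train-departure", "Boston": "train-departure",
              "Magdalene College": "attraction-name"}
print(map_slots_by_majority_vote(predictions, references))   # {0: 'train-departure'}
```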

CHAPTER 2 What is the problem with DSI-base?

Multiple domains with similar slots:
  train: departure, destination, day
  taxi: leave at, arrive by

Similar contexts: "I need to arrive by 19:45 and depart from Cambridge."

CHAPTER 2 How does DSI-GM work?

A single Gaussian prior (μ, σ) over z: slots from different domains (train-arrive by, train-leave at, train-day, train-destination, train-departure, taxi-arrive by, taxi-leave at, taxi-destination, taxi-departure) are mixed together in one latent space.

A Mixture-of-Gaussians prior over z: mixture weights π and components (μ₁, σ₁) … (μ_K, σ_K), one per domain, so train-* slots and taxi-* slots are separated by component.
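A sketch of how the Mixture-of-Gaussians prior lets the model pick a domain for a turn: compute each component's responsibility p(k | z) and take the argmax as the predicted domain. Parameter names and the use of diagonal-covariance components are assumptions, not the paper's exact formulation.

```python
import torch
import torch.distributions as D

def domain_responsibilities(z, pi, mus, log_vars):
    """z: [batch, latent]; pi: [K] mixture weights; mus, log_vars: [K, latent] per-domain Gaussians."""
    comp = D.Normal(mus, torch.exp(0.5 * log_vars))                  # K diagonal Gaussians
    log_p_z_given_k = comp.log_prob(z.unsqueeze(1)).sum(dim=-1)      # [batch, K]
    log_joint = torch.log(pi + 1e-12) + log_p_z_given_k              # log pi_k + log N(z; mu_k, sigma_k)
    return torch.softmax(log_joint, dim=-1)                          # p(k | z): domain distribution per turn

# e.g. predicted_domain = domain_responsibilities(z, pi, mus, log_vars).argmax(dim=-1)
```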

CHAPTER 2 How does DSI-GM work?

I need to take a train out of Chicago, I will be leaving Dallas on Wednesday.

DSI-base: Encoder + Sampling → z → Prediction → Chicago: taxi-departure (wrong domain)

DSI-GM: Encoder + Sampling → z → the Mixture-of-Gaussians component identifies Domain: train → Prediction → Chicago: departure → Chicago: train-departure

Experiments

CHAPTER 3 DSI results

CHAPTER 3 DSI-Based Response Generation

[Chen et al., 2019] Wenhu Chen, Jianshu Chen, Pengda Qin, Xifeng Yan, and William Yang Wang. Semantically conditioned dialog response generation via hierarchical disentangled self-attention. In ACL, 2019.

CHAPTER 3 Analysis

Domain level comparison of the latent representation z.

Conclusion

CHAPTER 4 Conclusion

• Dialogue state induction: a novel task to automatically identify dialogue states

• DSI-base/DSI-GM: two neural generative models with latent variables

• Challenging and promising: the unsupervised setting is very practical

• IJCAI review: this problem is important and interesting, and this area should attract more attention; this work has great potential to motivate follow-up research.

Contact: minqingkai@westlake.edu.cn

Paper: https://www.ijcai.org/Proceedings/2020/0532.pdf

GitHub: https://github.com/taolusi/dialogue-state-induction
