Data Science Popup Austin: Using lda and Structural Topic Modeling to Explore Trending Topics in a...

DATA SCIENCE POP UP AUSTIN: Using LDA and Structural Topic Modeling to Explore Trending Topics in a Call Center. Jordana Heller, Data Scientist, Mattersight (jheller)

Page 1: Data Science Popup Austin: Using lda and Structural Topic Modeling to Explore Trending Topics in a Call Center

DATA SCIENCE POP UP

AUSTIN

Using LDA and Structural Topic Modeling to Explore Trending Topics in a Call Center

Jordana Heller
Data Scientist, Mattersight

jheller

Page 3: Data Science Popup Austin: Using lda and Structural Topic Modeling to Explore Trending Topics in a Call Center

DATA SCIENCE POP UP

AUSTIN

#datapopupaustin

April 13, 2016
Galvanize, Austin Campus

Page 4: Data Science Popup Austin: Using lda and Structural Topic Modeling to Explore Trending Topics in a Call Center

Lightning Talk: Using LDA and Structural Topic Modeling to Explore Trending Topics in a Call Center
Jordana Heller @jheller
Data Science Pop-up Austin, April 13, 2016

Page 5: Data Science Popup Austin: Using lda and Structural Topic Modeling to Explore Trending Topics in a Call Center

©2016 Mattersight Corporation. Mattersight Restricted Confidential Information.

What We Do

Page 6: Data Science Popup Austin: Using lda and Structural Topic Modeling to Explore Trending Topics in a Call Center

Our goal: Topic Trends

[Chart: time axis with monthly ticks from 3/31/2016 to 7/31/2016]

Identifying contents and prevalence of multiword topics present in conversation in an unsupervised way

• Unexpected Prevalence
• Critical Spikes
• Escalating Frequency

Page 7: Data Science Popup Austin: Using lda and Structural Topic Modeling to Explore Trending Topics in a Call Center

Our goals, continued

Manageable number of topics

Track expected and unexpected topics

Go deep: Contextualize topic usage

Page 8: Data Science Popup Austin: Using lda and Structural Topic Modeling to Explore Trending Topics in a Call Center

Short text: Keywords, hashtags, ngrams

Page 9: Data Science Popup Austin: Using lda and Structural Topic Modeling to Explore Trending Topics in a Call Center

Long text: Could use predetermined topics

Image credit: IBM Watson Concept Insights

Page 10: Data Science Popup Austin: Using lda and Structural Topic Modeling to Explore Trending Topics in a Call Center

Long text: Or discover themes

Image credit: Blei, 2012, Communications of the ACM

Latent Dirichlet Allocation (LDA) (Blei et al., 2003)

Page 11: Data Science Popup Austin: Using lda and Structural Topic Modeling to Explore Trending Topics in a Call Center

Great! How about contextualizing trends?

• Where are topics trending?
• Structural Topic Modeling (Roberts et al., 2013)
  – Instead of relying on post-hoc comparisons, includes covariates in LDA model
    • Specifies priors as GLMs
    • Word distribution determined by topic, covariates, topic-covariate interaction
  – Authors' implementation: R package stm (available via CRAN; all code on GitHub!)
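
As a reference for the bullets above, the generative model can be written roughly as below. This is a paraphrase of the notation used in the stm package documentation, not something shown on the slide: X_d is the covariate row for document d, y_d is the content covariate level, m is a baseline log word frequency, and the kappa terms are the topic, covariate, and topic-covariate interaction deviations.

```latex
\begin{align*}
\theta_d \mid X_d\gamma, \Sigma &\sim \mathrm{LogisticNormal}(\mu = X_d\gamma,\ \Sigma) \\
\beta_{d,k} &\propto \exp\!\big(m + \kappa^{(t)}_{k} + \kappa^{(c)}_{y_d} + \kappa^{(i)}_{y_d,k}\big) \\
z_{d,n} \mid \theta_d &\sim \mathrm{Multinomial}(\theta_d) \\
w_{d,n} \mid z_{d,n}, \beta_{d,k=z_{d,n}} &\sim \mathrm{Multinomial}(\beta_{d,k=z_{d,n}})
\end{align*}
```

The first line is the "priors as GLMs" point: topic proportions depend on covariates through a linear predictor. The second line is the word distribution built from topic, covariate, and interaction terms.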

Page 12: Data Science Popup Austin: Using lda and Structural Topic Modeling to Explore Trending Topics in a Call Center

Ready to talk pipeline!

Page 13: Data Science Popup Austin: Using lda and Structural Topic Modeling to Explore Trending Topics in a Call Center

Data Collection and Preprocessing

Read Transcripts

Add Call-level Covariates

Preprocess text

• Collocations
• Remove stop words
• Stem/completion
• Remove low-frequency terms

Create Term-Document Matrix
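
The preprocessing steps above can be sketched in plain Python. This is a minimal illustration, not the pipeline used in the talk: the stop word list, the collocation table, the suffix-stripping "stemmer", and the `min_doc_freq` cutoff are all hypothetical stand-ins, and the sample calls are invented.

```python
from collections import Counter

# Toy resources (assumptions for illustration only).
STOP_WORDS = {"the", "a", "to", "i", "is", "for", "my", "and"}
COLLOCATIONS = {("credit", "card"): "credit_card"}


def crude_stem(token):
    """Very rough suffix stripping; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def merge_collocations(tokens):
    """Join known adjacent word pairs into single tokens."""
    merged, i = [], 0
    while i < len(tokens):
        pair = tuple(tokens[i : i + 2])
        if pair in COLLOCATIONS:
            merged.append(COLLOCATIONS[pair])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


def preprocess(text):
    """Lowercase, merge collocations, drop stop words, then stem."""
    tokens = merge_collocations(text.lower().split())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


def term_document_matrix(docs, min_doc_freq=2):
    """Return (vocab, matrix): one row per term, one column per document.

    Terms appearing in fewer than `min_doc_freq` documents are dropped.
    """
    tokenized = [preprocess(d) for d in docs]
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    vocab = sorted(t for t, n in doc_freq.items() if n >= min_doc_freq)
    counts = [Counter(toks) for toks in tokenized]
    matrix = [[c[term] for c in counts] for term in vocab]
    return vocab, matrix


calls = [
    "I need to cancel my booking",
    "cancel the booking and refund my credit card",
    "my credit card was charged for the booking",
]
vocab, tdm = term_document_matrix(calls)
```

Call-level covariates (for example, whether a booking was made) would be kept alongside each column of the matrix so they can be passed to the topic model.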

Page 14: Data Science Popup Austin: Using lda and Structural Topic Modeling to Explore Trending Topics in a Call Center

Topic Model Creation

Retrieve last topic model

• For comparison

Create current topic model

• Detect number of topics, or specify

Create topic labels
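
One simple way to create topic labels is to take each topic's highest-probability words. The sketch below assumes that approach; the slide does not say how labels were generated, and the function name, vocabulary, and probabilities are made up for illustration.

```python
def label_topics(topic_word_probs, vocab, n_words=3):
    """Label each topic with its `n_words` most probable words."""
    labels = []
    for probs in topic_word_probs:
        # Indices of vocabulary terms, most probable first.
        ranked = sorted(range(len(vocab)), key=lambda j: probs[j], reverse=True)
        labels.append(", ".join(vocab[j] for j in ranked[:n_words]))
    return labels


# Hypothetical two-topic model over a six-word vocabulary.
vocab = ["beach", "ocean", "view", "email", "confirm", "arrival"]
topics = [
    [0.40, 0.30, 0.20, 0.05, 0.03, 0.02],
    [0.02, 0.03, 0.05, 0.30, 0.40, 0.20],
]
labels = label_topics(topics, vocab)
```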

Page 15: Data Science Popup Austin: Using lda and Structural Topic Modeling to Explore Trending Topics in a Call Center

Topic Model Comparison

Inspect overall topic prevalence

Compare overall topic prevalence across periods

• Topics change! Measure the change in word probability distributions for each new topic with respect to each old topic

• Match each new topic to its closest previous topic if the change is below a threshold (otherwise treat it as a new topic)

• Evaluate trends!
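
The matching step above can be sketched as follows. The slide does not name the change measure, so this example assumes Jensen-Shannon divergence between topic-word distributions over a shared vocabulary; the threshold value and the toy distributions are hypothetical.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with zero probability in x contribute nothing.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def match_topics(new_topics, old_topics, threshold=0.15):
    """Map each new topic index to its closest old topic index.

    A new topic whose closest old topic is not below `threshold`
    is mapped to None, meaning it is treated as a new topic.
    """
    matches = {}
    for i, new in enumerate(new_topics):
        dists = [js_divergence(new, old) for old in old_topics]
        best = min(range(len(old_topics)), key=lambda j: dists[j])
        matches[i] = best if dists[best] < threshold else None
    return matches


# Toy word distributions over a four-word vocabulary.
old_topics = [[0.7, 0.2, 0.1, 0.0], [0.0, 0.1, 0.2, 0.7]]
new_topics = [[0.6, 0.3, 0.1, 0.0], [0.25, 0.25, 0.25, 0.25]]
matches = match_topics(new_topics, old_topics)
```

Here the first new topic is a small shift of the first old topic and is matched to it, while the second is far from both old topics and is flagged as new.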

Estimate and inspect effects of covariates

Compare effects of covariates across periods

• Output can be interpreted similarly to regression
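
To illustrate the regression-style reading, the sketch below regresses a topic's per-call proportion on one binary covariate. With a single binary regressor, the least-squares slope equals the difference in group means, which is what the code computes. This is a simplification under that assumption, not the estimation procedure of the stm package, and the data are invented.

```python
from statistics import mean


def covariate_effect(topic_proportions, covariate):
    """Return (intercept, slope) for proportion ~ binary covariate.

    intercept: mean topic proportion when the covariate is 0.
    slope: change in mean topic proportion when the covariate is 1.
    """
    with_cov = [p for p, x in zip(topic_proportions, covariate) if x == 1]
    without = [p for p, x in zip(topic_proportions, covariate) if x == 0]
    intercept = mean(without)
    slope = mean(with_cov) - intercept
    return intercept, slope


# Hypothetical per-call proportions for one topic, and a booking flag.
proportions = [0.10, 0.30, 0.20, 0.40, 0.05, 0.15]
booking = [0, 1, 0, 1, 0, 1]
intercept, slope = covariate_effect(proportions, booking)
```

A positive slope reads like a regression coefficient: calls with a booking devote a larger share of their words to this topic. Comparing slopes from two periods shows whether that relationship is changing.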

Page 16: Data Science Popup Austin: Using lda and Structural Topic Modeling to Explore Trending Topics in a Call Center

Example results: Hotel reservations
Covariates: booking, caller distress

Page 17: Data Science Popup Austin: Using lda and Structural Topic Modeling to Explore Trending Topics in a Call Center

Trend Contextualization: Booking

New: convention, center, mind, worry, philadelphia, inventory

Legend: New, Decreasing, Increasing

Hit: > 1% of words on call assigned to a given topic
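
The hit definition above can be expressed in a few lines. This sketch assumes each word on a call already carries a single topic assignment from the fitted model; the function name and the sample call are hypothetical.

```python
from collections import Counter


def topic_hits(word_topic_assignments, threshold=0.01):
    """Return the topics assigned to more than `threshold` of a call's words."""
    total = len(word_topic_assignments)
    if total == 0:
        return set()
    counts = Counter(word_topic_assignments)
    return {topic for topic, n in counts.items() if n / total > threshold}


# A 200-word call: topic 2 covers only 0.5% of words, so it is not a hit.
call = [0] * 150 + [1] * 49 + [2] * 1
hits = topic_hits(call)
```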

Page 18: Data Science Popup Austin: Using lda and Structural Topic Modeling to Explore Trending Topics in a Call Center

Trend Contextualization: Booking

New: school, college, graduate, medical, clinic

Legend: New, Decreasing, Increasing

Hit: > 1% of words on call assigned to a given topic

Page 19: Data Science Popup Austin: Using lda and Structural Topic Modeling to Explore Trending Topics in a Call Center

Trend Contextualization: Booking

Increasing 30%: beach, balcony, ocean, view

Legend: New, Decreasing, Increasing

Hit: > 1% of words on call assigned to a given topic

Page 20: Data Science Popup Austin: Using lda and Structural Topic Modeling to Explore Trending Topics in a Call Center

Trend Contextualization: Booking

Decreasing 10%: back, next, receive, listen, cash future

Legend: New, Decreasing, Increasing

Hit: > 1% of words on call assigned to a given topic

Page 21: Data Science Popup Austin: Using lda and Structural Topic Modeling to Explore Trending Topics in a Call Center

Trend Contextualization: Booking

New: back, minute, system, run, inconvenience

Legend: New, Decreasing, Increasing

Hit: > 1% of words on call assigned to a given topic

Page 22: Data Science Popup Austin: Using lda and Structural Topic Modeling to Explore Trending Topics in a Call Center

Trend Contextualization: Booking

Increasing 42%: confirm, email, arrival, local

Legend: New, Decreasing, Increasing

Hit: > 1% of words on call assigned to a given topic

Page 23: Data Science Popup Austin: Using lda and Structural Topic Modeling to Explore Trending Topics in a Call Center

Trend Contextualization: Caller Distress

Legend: New, Decreasing, Increasing

Distress: > 30 seconds of linguistically-identified dissatisfaction or negative emotion

Page 24: Data Science Popup Austin: Using lda and Structural Topic Modeling to Explore Trending Topics in a Call Center

Trend Contextualization: Caller Distress

New: square, city, price, hotel, manhattan, central

Legend: New, Decreasing, Increasing

Distress: > 30 seconds of linguistically-identified dissatisfaction or negative emotion

Page 25: Data Science Popup Austin: Using lda and Structural Topic Modeling to Explore Trending Topics in a Call Center

Trend Contextualization: Caller Distress

Decreasing 12%: online, website, cancel, purchase, advance

Legend: New, Decreasing, Increasing

Distress: > 30 seconds of linguistically-identified dissatisfaction or negative emotion

Page 26: Data Science Popup Austin: Using lda and Structural Topic Modeling to Explore Trending Topics in a Call Center

Nice!

Page 27: Data Science Popup Austin: Using lda and Structural Topic Modeling to Explore Trending Topics in a Call Center

Our goals, revisited

Manageable number of topics

Track expected and unexpected topics

Go deep: Contextualize topic usage

Page 28: Data Science Popup Austin: Using lda and Structural Topic Modeling to Explore Trending Topics in a Call Center

Topic trends using structural topic models

Thank you!

Page 29: Data Science Popup Austin: Using lda and Structural Topic Modeling to Explore Trending Topics in a Call Center

DATA SCIENCE POP UP

AUSTIN

@datapopup #datapopupaustin