Turning Text Into Insights: An Introduction to Topic Models
![Page 1: Turning Text Into Insights: An Introduction to Topic Models](https://reader034.fdocuments.in/reader034/viewer/2022052514/58a8584f1a28ab210b8b710b/html5/thumbnails/1.jpg)
Turning text into insight:
AN INTRODUCTION TO TOPIC MODELING
![Page 2: Turning Text Into Insights: An Introduction to Topic Models](https://reader034.fdocuments.in/reader034/viewer/2022052514/58a8584f1a28ab210b8b710b/html5/thumbnails/2.jpg)
Handling Raw, Unlabeled Text
- Common Datasets:
  - Product/Customer Reviews
  - Call Center Transcripts
  - Newspaper Articles
  - Legal Documents
- Common Tasks:
  - Find documents we're interested in?
  - Categorize documents?
  - Retrieve information?
![Page 3: Turning Text Into Insights: An Introduction to Topic Models](https://reader034.fdocuments.in/reader034/viewer/2022052514/58a8584f1a28ab210b8b710b/html5/thumbnails/3.jpg)
Handling Raw, Unlabeled Text
- The Challenge:
  - Normal quantitative approaches don’t work with text.
  - Datasets are large, complicated, sparse, and unwieldy.
  - Data is often unlabeled.
![Page 4: Turning Text Into Insights: An Introduction to Topic Models](https://reader034.fdocuments.in/reader034/viewer/2022052514/58a8584f1a28ab210b8b710b/html5/thumbnails/4.jpg)
Example: Understanding Customer Reviews
- Mon Ami Gabi is a restaurant in the Paris Las Vegas Hotel and Casino.
- Thousands of customer reviews for the restaurant over the last 8 years.
What are customers saying?
- "Excellent breakfast menu. They just need to hire more staff to have a better service."
- "Great place for brunch!"
- "Highly recommend the steak and fries and sitting outside."
- "Had a great meal with a great atmosphere"
- "Food was ok… What it has going for it is the view from the outside terrace."
![Page 5: Turning Text Into Insights: An Introduction to Topic Models](https://reader034.fdocuments.in/reader034/viewer/2022052514/58a8584f1a28ab210b8b710b/html5/thumbnails/5.jpg)
Topic Modeling: Framework
Documents → Topics → Words and Phrases
- Document: "Excellent breakfast menu. They just need to hire more staff to have a better service"
- Topics: Breakfast, Quality of Service
- Words and Phrases: breakfast, better, service, staff
![Page 6: Turning Text Into Insights: An Introduction to Topic Models](https://reader034.fdocuments.in/reader034/viewer/2022052514/58a8584f1a28ab210b8b710b/html5/thumbnails/6.jpg)
Topic Modeling: Preprocessing
- Tokenize: Extract meaningful units from sentences
  - Example: "I ordered a french toast" → {I, ordered, a, french toast}
  - Regular expression cleanup, end-of-line hyphenation, contraction, and sentence-initial capitalization rules.
- Stemming Algorithm: Consolidate feature space into word stems or lemmas
  - Example: {I, ordered, a, french toast} → {I, order, a, french toast}
  - Suffix stripping, part-of-speech tagging
- Matrix Factorization: Convert text into a data structure for learning algorithms.
  - Word-document matrices often have 1,000,000,000,000+ values. Special compression algorithms are needed to keep the data manageable.
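The three preprocessing steps above can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's pipeline: the phrase list `PHRASES`, the naive suffix rules in `stem`, and the dictionary-based sparse matrix are all assumptions made for the example, and the tokenizer lowercases its output where the slide keeps "I" capitalized.

```python
import re
from collections import Counter

# Hypothetical phrase list: multi-word units to keep as a single token.
PHRASES = ("french toast",)

def tokenize(text, phrases=PHRASES):
    """Lowercase the text and split it into tokens, keeping known phrases whole."""
    text = text.lower()
    for phrase in phrases:
        text = text.replace(phrase, phrase.replace(" ", "_"))
    return [tok.replace("_", " ") for tok in re.findall(r"[a-z_']+", text)]

def stem(token):
    """Naive suffix stripping; a stand-in for a real stemmer or lemmatizer."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def build_sparse_matrix(docs):
    """Return (vocab, counts), where counts maps (doc_id, word_id) to a count.

    Only nonzero cells are stored, which is what keeps a very large
    word-document matrix manageable.
    """
    vocab, counts = {}, {}
    for d, text in enumerate(docs):
        for tok, n in Counter(stem(t) for t in tokenize(text)).items():
            w = vocab.setdefault(tok, len(vocab))
            counts[(d, w)] = n
    return vocab, counts

tokens = tokenize("I ordered a french toast")
stems = [stem(t) for t in tokens]
vocab, counts = build_sparse_matrix(
    ["I ordered a french toast", "The french toast was great"]
)
```

In practice a library stemmer and a compressed sparse matrix format would replace the hand-rolled pieces; the dictionary here only shows the idea of storing nonzero entries.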
![Page 7: Turning Text Into Insights: An Introduction to Topic Models](https://reader034.fdocuments.in/reader034/viewer/2022052514/58a8584f1a28ab210b8b710b/html5/thumbnails/7.jpg)
Topic Modeling: Estimation with Gibbs Sampler
- Use Markov Chain Monte Carlo methods to simulate our document-topic and topic-word probability distributions.
- Results:

Topic-Word
- Topic "Breakfast": Breakfast: 0.31, Eggs: 0.27, Coffee: 0.24
- Topic "Service": Service: 0.28, Staff: 0.24, Friendly: 0.21

Document-Topic
- "The french toast was great": French Toast: 0.71, Breakfast: 0.25, Service: 0.03
- "The staff was great, but the outdoor patio was a bit noisy.": Service: 0.51, Environment: 0.44, Breakfast: 0.02
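The slides do not name the underlying model. Assuming it is Latent Dirichlet Allocation, a collapsed Gibbs sampler for the two distributions might look like the sketch below. The function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, and the toy corpus are all illustrative assumptions, and the estimate is taken from the final sample only rather than averaged over many samples.

```python
import random

def lda_gibbs(docs, n_topics, n_iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA (illustrative sketch).

    docs: list of documents, each a non-empty list of integer word ids.
    Returns (theta, phi): document-topic and topic-word probability tables.
    """
    rng = random.Random(seed)
    n_words = max(w for doc in docs for w in doc) + 1

    doc_topic = [[0] * n_topics for _ in docs]             # count of topic k in doc d
    topic_word = [[0] * n_words for _ in range(n_topics)]  # count of word w in topic k
    topic_total = [0] * n_topics                           # tokens assigned to topic k

    # Start from a random topic assignment for every token.
    assign = []
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(n_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(zs)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove this token's current assignment from the counts.
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1

                # Unnormalized conditional probability of each topic.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]

                # Record the newly sampled assignment.
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    # Smooth and normalize the final counts into probabilities.
    theta = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in row]
        for row, doc in zip(doc_topic, docs)
    ]
    phi = [
        [(c + beta) / (topic_total[k] + n_words * beta) for c in topic_word[k]]
        for k in range(n_topics)
    ]
    return theta, phi

# Toy corpus: word ids index into this hypothetical vocabulary.
vocab = ["breakfast", "eggs", "coffee", "service", "staff", "friendly"]
docs = [[0, 1, 2, 0, 1], [3, 4, 5, 3, 4], [0, 2, 1, 2], [4, 5, 3, 5]]
theta, phi = lda_gibbs(docs, n_topics=2)
```

Each row of `theta` is one document's topic mixture and each row of `phi` is one topic's word distribution, matching the Document-Topic and Topic-Word results on the slide. Topics come out as unlabeled indices; names such as "Breakfast" are assigned by a person reading the top words.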
![Page 8: Turning Text Into Insights: An Introduction to Topic Models](https://reader034.fdocuments.in/reader034/viewer/2022052514/58a8584f1a28ab210b8b710b/html5/thumbnails/8.jpg)
Harnessing the Model: Topic Frequency
What are my customers talking about?
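The chart from this slide did not survive extraction. As a rough sketch of one plausible way to compute topic frequency, the example below averages each topic's weight over all documents. The function `topic_frequency` and this definition of frequency are assumptions; the input reuses the two document-topic results from the Gibbs sampler slide.

```python
from collections import defaultdict

def topic_frequency(doc_topics):
    """Average each topic's weight across documents, largest first."""
    totals = defaultdict(float)
    for dist in doc_topics:
        for topic, weight in dist.items():
            totals[topic] += weight
    n = len(doc_topics)
    return sorted(((t, s / n) for t, s in totals.items()), key=lambda x: -x[1])

# Document-topic weights for the two example reviews.
doc_topics = [
    {"French Toast": 0.71, "Breakfast": 0.25, "Service": 0.03},
    {"Service": 0.51, "Environment": 0.44, "Breakfast": 0.02},
]
ranking = topic_frequency(doc_topics)
```

Here `ranking` lists French Toast first, followed by Service, Environment, and Breakfast. Counting only each document's dominant topic would be another reasonable definition.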
![Page 9: Turning Text Into Insights: An Introduction to Topic Models](https://reader034.fdocuments.in/reader034/viewer/2022052514/58a8584f1a28ab210b8b710b/html5/thumbnails/9.jpg)
Harnessing the Model: Evaluate Products and Verticals
How do customers feel about my products?
![Page 10: Turning Text Into Insights: An Introduction to Topic Models](https://reader034.fdocuments.in/reader034/viewer/2022052514/58a8584f1a28ab210b8b710b/html5/thumbnails/10.jpg)
Harnessing the Model: Temporal Insights
How has customer sentiment evolved among my product lines over time?
![Page 11: Turning Text Into Insights: An Introduction to Topic Models](https://reader034.fdocuments.in/reader034/viewer/2022052514/58a8584f1a28ab210b8b710b/html5/thumbnails/11.jpg)
Harnessing the Model: Deep Product Insights
Which properties of French Toast drive satisfaction (or dissatisfaction)?
![Page 12: Turning Text Into Insights: An Introduction to Topic Models](https://reader034.fdocuments.in/reader034/viewer/2022052514/58a8584f1a28ab210b8b710b/html5/thumbnails/12.jpg)
Thank you.