Twitter Sentiment Prediction.pptx
Transcript of Twitter Sentiment Prediction.pptx
![Page 1: Twitter Sentiment Prediction.pptx](https://reader034.fdocuments.in/reader034/viewer/2022051504/58edc5ba1a28ab7e4c8b466b/html5/thumbnails/1.jpg)
INST 737 – Twitter Sentiment Prediction on #Windows10 release
Anuj Sharma, Krishnesh Pujari and Rajesh Gnanasekaran
12/03/15
![Page 2: Twitter Sentiment Prediction.pptx](https://reader034.fdocuments.in/reader034/viewer/2022051504/58edc5ba1a28ab7e4c8b466b/html5/thumbnails/2.jpg)
Objective
• Twitter has recently come to rival other social media platforms such as Facebook, Google+ and Myspace in creating waves of sentiment on issues around the world.
• Perform a Twitter sentiment analysis and sentiment prediction on Microsoft’s Windows 10 release, which took place on July 29th, 2015.
• Follow a semi-supervised learning technique to create the target variable and use it in the classification models.
• Analyze and interpret the results and provide recommendations to Microsoft.
![Page 3: Twitter Sentiment Prediction.pptx](https://reader034.fdocuments.in/reader034/viewer/2022051504/58edc5ba1a28ab7e4c8b466b/html5/thumbnails/3.jpg)
About the Data
• Imported using NodeXL from the Twitter Search Network
• Original dataset had 9000+ observations on the hashtag ‘#Windows10’ for the period from July 28th, 2015 to August 5th, 2015
• After cleaning (missing values, duplicates, other languages), ended up with 4646 observations with 28 original factors and 19 derived features
• Performed feature engineering to derive these additional features, as we felt they might better predict the target factor, i.e., “Polarity”
• Types of variables: Categorical, Continuous
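The cleaning step (dropping missing, duplicate and non-English rows) might look roughly like the sketch below. This is only an illustration in Python: the field names `tweet` and `language` are assumptions, not the actual NodeXL column names, and the sample rows are made up.

```python
# Hypothetical field names ("tweet", "language"); the real NodeXL export may differ.
def clean_tweets(rows):
    """Drop rows with missing text, non-English rows, and duplicate tweets."""
    seen = set()
    cleaned = []
    for row in rows:
        text = (row.get("tweet") or "").strip()
        if not text:                       # missing tweet text
            continue
        if row.get("language") != "en":    # other languages
            continue
        key = text.lower()
        if key in seen:                    # duplicate tweet
            continue
        seen.add(key)
        cleaned.append(row)
    return cleaned


# Made-up sample rows, for illustration only.
rows = [
    {"tweet": "Loving #Windows10", "language": "en"},
    {"tweet": "Loving #Windows10", "language": "en"},
    {"tweet": None, "language": "en"},
    {"tweet": "J'adore #Windows10", "language": "fr"},
]
cleaned = clean_tweets(rows)
print(len(cleaned))
```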
![Page 4: Twitter Sentiment Prediction.pptx](https://reader034.fdocuments.in/reader034/viewer/2022051504/58edc5ba1a28ab7e4c8b466b/html5/thumbnails/4.jpg)
Sentiment Analysis
● Tweet text cleaning: removed filler words and ignored words that are not in English
● Used customized R code for text mining, which parsed tweets and classified the words into positive, negative or neutral polarities
● The code compared the words in the tweets with a dictionary and mapped the resulting polarity to the tweet
● Cross-checked that the code worked correctly by creating about 100 tweets and manually checking their polarity
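The dictionary lookup described above can be sketched as follows. The project used custom R code that is not shown in the slides; this Python version is only an illustration, and the word lists, the filler-word list and the scoring rule (count of positive words minus count of negative words) are all assumptions.

```python
import re

# Tiny illustrative word lists; the project's actual dictionary is not given.
POSITIVE = {"love", "great", "fast", "smooth"}
NEGATIVE = {"hate", "slow", "crash", "broken"}
FILLER = {"the", "a", "is", "my", "and", "on", "it"}


def tweet_polarity(text):
    """Return 'positive', 'negative' or 'neutral' from dictionary word counts."""
    # Lowercase, keep alphabetic tokens only, then drop filler words.
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in FILLER]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(tweet_polarity("I love the smooth #Windows10 upgrade"))
```

A manual check like the one in the last bullet amounts to running such a function over hand-labelled tweets and comparing the outputs.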
![Page 5: Twitter Sentiment Prediction.pptx](https://reader034.fdocuments.in/reader034/viewer/2022051504/58edc5ba1a28ab7e4c8b466b/html5/thumbnails/5.jpg)
Exploring the Data
● Created histograms and box plots to identify any unusual behavior among the variables. Found some interesting patterns.
![Page 6: Twitter Sentiment Prediction.pptx](https://reader034.fdocuments.in/reader034/viewer/2022051504/58edc5ba1a28ab7e4c8b466b/html5/thumbnails/6.jpg)
Continued...
● Tested the variables using Pearson’s correlation; found significant correlation between factors such as Tweets and Followed, so we made sure not to include both of these variables together in the logistic regression.
● Momentum of tweets shifted from positive/neutral to negative toward the end of the sample period; almost 80% of negative tweets on 08/05
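The correlation check can be illustrated with a small Python sketch that computes Pearson's r for each pair of variables and flags pairs that should not enter the regression together. The column values and the 0.8 cutoff are made-up assumptions; the slides do not state the threshold that was used.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlated_pairs(columns, threshold=0.8):
    """Return variable pairs whose absolute correlation meets the threshold."""
    names = sorted(columns)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if abs(pearson(columns[a], columns[b])) >= threshold
    ]


# Made-up values, for illustration only.
columns = {
    "tweets": [10, 200, 3000, 45000, 500],
    "followed": [12, 180, 2500, 40000, 650],
    "tweet_length": [140, 20, 95, 60, 120],
}
print(correlated_pairs(columns))
```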
![Page 7: Twitter Sentiment Prediction.pptx](https://reader034.fdocuments.in/reader034/viewer/2022051504/58edc5ba1a28ab7e4c8b466b/html5/thumbnails/7.jpg)
Feature Engineering
● Tweet timestamp was broken into Tweet date and Tweet time
● Current Date
● Days difference = Tweet date minus upgrade date
● Number of weeks since joined Twitter
● Number of months since joined Twitter
● Log of number of months since joined Twitter
● Log of number of followers
● Log of number of people followed by the user
● Log of number of favorites
● Log of number of tweets
● Length of Tweet
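A few of the derived features listed above can be sketched in Python as follows. The function and argument names are hypothetical, the use of `log1p` (log of 1 + x, to cope with zero counts) is an assumption, and measuring account age against the current date is a guess at how the "Current Date" feature was used.

```python
import math
from datetime import date, datetime

UPGRADE_DATE = date(2015, 7, 29)  # Windows 10 release date given in the slides


def derive_features(tweet_timestamp, joined_date, followers, tweet_text, current_date):
    """Compute a subset of the derived features for one tweet."""
    tweet_date = tweet_timestamp.date()
    days_on_twitter = (current_date - joined_date).days
    return {
        "tweet_date": tweet_date,
        "tweet_time": tweet_timestamp.time(),
        "days_difference": (tweet_date - UPGRADE_DATE).days,
        "weeks_since_joined": days_on_twitter // 7,
        # Assumption: log(1 + x) so that users with zero followers are defined.
        "log_followers": math.log1p(followers),
        "tweet_length": len(tweet_text),
    }


# Made-up example values.
features = derive_features(
    tweet_timestamp=datetime(2015, 8, 5, 14, 30),
    joined_date=date(2014, 8, 5),
    followers=0,
    tweet_text="Upgraded to #Windows10",
    current_date=date(2015, 12, 3),
)
print(features["days_difference"], features["weeks_since_joined"])
```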
![Page 8: Twitter Sentiment Prediction.pptx](https://reader034.fdocuments.in/reader034/viewer/2022051504/58edc5ba1a28ab7e4c8b466b/html5/thumbnails/8.jpg)
Multinomial Logistic Regression and Interpretation
● Multinomial over binomial: the target variable has more than two values.
● Used to check which factors affect the tweet polarity.
● Interpreted using log odds to see the variation.
● Variables of importance: Relationship, No. of followers, Tweet length, No. of weeks since joined Twitter
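To make the log-odds interpretation concrete: in a multinomial logistic model each non-baseline class has a linear equation for its log odds relative to a baseline class, and class probabilities follow from those log odds. The Python sketch below assumes "neutral" as the baseline and uses made-up coefficients; neither comes from the slides.

```python
import math


def class_probabilities(log_odds_vs_baseline, baseline="neutral"):
    """Convert log odds relative to a baseline class into class probabilities."""
    exp_terms = {label: math.exp(v) for label, v in log_odds_vs_baseline.items()}
    denominator = 1.0 + sum(exp_terms.values())
    probabilities = {label: t / denominator for label, t in exp_terms.items()}
    probabilities[baseline] = 1.0 / denominator
    return probabilities


# Made-up coefficients, for illustration only (not the fitted model).
COEFFICIENTS = {
    "positive": {"intercept": 0.3, "tweet_length": -0.002},
    "negative": {"intercept": -0.4, "tweet_length": 0.004},
}


def log_odds(tweet_length):
    """Log odds of each non-baseline class versus the baseline."""
    return {
        label: c["intercept"] + c["tweet_length"] * tweet_length
        for label, c in COEFFICIENTS.items()
    }


probs = class_probabilities(log_odds(tweet_length=100))
print(probs)
```

With these illustrative numbers, a positive coefficient on tweet length for the negative class means that each extra character raises the log odds of "negative" versus "neutral" by that amount.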
![Page 9: Twitter Sentiment Prediction.pptx](https://reader034.fdocuments.in/reader034/viewer/2022051504/58edc5ba1a28ab7e4c8b466b/html5/thumbnails/9.jpg)
Results
![Page 10: Twitter Sentiment Prediction.pptx](https://reader034.fdocuments.in/reader034/viewer/2022051504/58edc5ba1a28ab7e4c8b466b/html5/thumbnails/10.jpg)
Decision Trees Classification and Interpretation
● Decision trees are an alternative to logistic regression
● The CART (Classification and Regression Trees) method is used to recursively partition the data in order to classify the target variable
● Variables of importance: Tweet date, Days difference and length of the tweet
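CART builds a tree by repeatedly choosing the split that leaves the purest child nodes, commonly measured by Gini impurity. The Python sketch below shows a single such split on one numeric variable; the data are made up, and a real tree would repeat this search over all variables and recurse into each child.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, weighted Gini) of the best split 'value <= threshold'."""
    best = None
    for threshold in sorted(set(values))[:-1]:
        left = [lab for v, lab in zip(values, labels) if v <= threshold]
        right = [lab for v, lab in zip(values, labels) if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or score < best[1]:
            best = (threshold, score)
    return best


# Made-up sample: days since release and tweet polarity.
days_difference = [-1, 0, 1, 2, 5, 6, 7]
polarity = ["positive", "positive", "neutral", "positive",
            "negative", "negative", "negative"]
threshold, score = best_threshold(days_difference, polarity)
print(threshold, round(score, 4))
```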
![Page 11: Twitter Sentiment Prediction.pptx](https://reader034.fdocuments.in/reader034/viewer/2022051504/58edc5ba1a28ab7e4c8b466b/html5/thumbnails/11.jpg)
Results
![Page 12: Twitter Sentiment Prediction.pptx](https://reader034.fdocuments.in/reader034/viewer/2022051504/58edc5ba1a28ab7e4c8b466b/html5/thumbnails/12.jpg)
Random Forest Classification and Interpretation
● Random Forest is an ensemble of decision trees, which helps in better prediction of polarity
● Implemented 501 decision trees to identify important predictors of polarity
● Variables of importance: Tweet date, Days difference and length of tweet
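Two ingredients of a random forest are that each tree is trained on a bootstrap sample of the data and that the forest predicts by majority vote over its trees. The Python sketch below shows only those two steps; the tree predictions are made up rather than produced by fitted trees, and tree training and random feature selection are left out.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement, as done once per tree."""
    return [rng.choice(rows) for _ in rows]


def majority_vote(predictions):
    """Return the class predicted by the largest number of trees."""
    return Counter(predictions).most_common(1)[0][0]


rng = random.Random(0)
rows = list(range(10))
sample = bootstrap_sample(rows, rng)

# Made-up predictions from 501 hypothetical trees for a single tweet.
tree_predictions = ["negative"] * 300 + ["positive"] * 150 + ["neutral"] * 51
print(majority_vote(tree_predictions))
```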
![Page 13: Twitter Sentiment Prediction.pptx](https://reader034.fdocuments.in/reader034/viewer/2022051504/58edc5ba1a28ab7e4c8b466b/html5/thumbnails/13.jpg)
Results
![Page 14: Twitter Sentiment Prediction.pptx](https://reader034.fdocuments.in/reader034/viewer/2022051504/58edc5ba1a28ab7e4c8b466b/html5/thumbnails/14.jpg)
Limitations
● The dataset covered a short span of time, between 07/28/15 and 08/05/15; with a bigger dataset sample, results may differ.
● We limited the scope of this project to tweets in English only.
● We were not able to take advantage of the geo-spatial coordinates, as most of the records had an n/a value.
Recommendations
● As negative sentiment starts to prevail in the latter half of the week after release, Microsoft should not stop its positive branding even post release.
● As a tweet coming from a seasoned Twitter user is more likely to be negative, Microsoft should target those influential accounts to spread positive word.
![Page 15: Twitter Sentiment Prediction.pptx](https://reader034.fdocuments.in/reader034/viewer/2022051504/58edc5ba1a28ab7e4c8b466b/html5/thumbnails/15.jpg)
Thank You !