  • The spread of information on Twitter based on sentiment

    Haley Knox

    Eastern Connecticut State University

    Mentors: Dr. Garrett Dancik and Dr. Megan Heenehan

    January 26, 2019

    Haley Knox (Eastern CT State University) Twitter sentiment analysis January 26, 2019 1 / 22

  • Twitter

    A tweet can be spread by ‘liking’, replying, quoting, or retweeting. Some stories spread across the world almost instantaneously: when One Direction star Harry Styles tweeted about his band’s breakup, the tweet moved from the United States to just about every corner of the planet in fifteen seconds [1]. We aim to discover which features make information spread on Twitter.

  • Sentiment

    We determine whether sentiment influences a user’s decision to retweet. Are positive or negative tweets more likely to be retweeted? Is the number of retweets correlated with sentiment?

    We classify the sentiment of a tweet as positive, negative, or neutral and analyze how these tweets spread. We use the R package sentimentr to calculate the sentiment of tweets.
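
    sentimentr is an R package, and the following is not its implementation. As we understand its documentation, sentimentr scores a sentence by summing the polarities of its words (adjusted by nearby valence shifters such as negators and amplifiers) and dividing by the square root of the word count. A minimal Python sketch of just that core step, assuming a tiny hypothetical lexicon (`TOY_LEXICON`) and ignoring valence shifters, with each tweet labeled by the sign of its score:

```python
import math
import re

# Hypothetical toy lexicon for illustration only (NOT sentimentr's lexicon).
TOY_LEXICON = {
    "nice": 0.75, "brilliant": 0.75, "love": 0.75,
    "lazy": -0.5, "foolish": -0.75, "fight": -0.5,
}


def toy_polarity(text: str) -> float:
    """Sum word polarities and divide by sqrt(word count).

    Simplification: valence shifters (e.g. "not", "very") are ignored.
    """
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    total = sum(TOY_LEXICON.get(word, 0.0) for word in words)
    return total / math.sqrt(len(words))


def label(score: float) -> str:
    """Classify a score as positive, negative, or neutral by its sign."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for tweet in ["So nice to meet the brilliant actor", "They are lazy", "It is Monday"]:
    score = toy_polarity(tweet)
    print(f"{score:+.3f} {label(score)}")
```

    Because the sum is divided by the square root of the word count rather than clipped, a score of this kind is not confined to [-1, 1], which is consistent with the example scores below.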

  • Examples

    Negative sentiment: “Because unlike you lazy lot they won’t sigh and go “it is well” after being owed salaries. They’ll fight. Throw chairs. Threaten impeachment. Scheme. They are not lazy nor foolish like we think them. They know their strengths and apply it accordingly.”

    Sentiment score: -1.668267

    Positive sentiment: “So nice getting to meet the brilliant Eddie Redmayne! Absolutely love Fantastic Beasts: The Crimes Of Grindelwald! #FantasticBeasts #ProtectTheSecrets #BeFantastic”

    Sentiment score: 1.4546

  • Network analysis

    We build networks for 20 positive and 20 negative tweets. We use graph theory measures to see whether there are any measurable differences between the networks of positive and negative tweets.

  • How we build our networks

    Twitter does not accurately record whose retweet a user actually retweeted. According to Twitter, everyone retweets the original author of the tweet, so the retweet network is a star graph.

    Retweet network based on Twitter’s information. This is a star graph.
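
    To make the star structure concrete, here is a minimal Python sketch that builds the network implied by Twitter’s data, where every retweeter is linked only to the original author. The user names are hypothetical, the graph is stored as undirected adjacency sets, and no graph library is assumed.

```python
def star_retweet_network(author: str, retweeters: list[str]) -> dict[str, set[str]]:
    """Adjacency sets of the network Twitter's data imply: each retweeter
    is linked to the original author and to no one else."""
    adj: dict[str, set[str]] = {author: set()}
    for user in retweeters:
        if user == author:
            continue  # an author retweeting their own tweet adds no edge
        adj[author].add(user)
        adj.setdefault(user, set()).add(author)
    return adj


def is_star(adj: dict[str, set[str]]) -> bool:
    """True if one vertex is adjacent to all n - 1 others and every other vertex has degree 1."""
    n = len(adj)
    degrees = sorted(len(neighbors) for neighbors in adj.values())
    return n >= 2 and degrees[-1] == n - 1 and all(d == 1 for d in degrees[:-1])


net = star_retweet_network("author", ["user_a", "user_b", "user_c"])
print(len(net["author"]), is_star(net))  # prints: 3 True
```

    Whatever the true flow of the tweet was, this construction always yields a star, which is why it cannot show who saw the tweet from whom.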

  • Our networks

    1 We consider all of the retweeters of a tweet.
    2 We get the ‘friends’ (the accounts a user follows) of all the retweeters.
    3 If someone is a friend of a retweeter and is also a retweeter, then we form an edge between them.

    This shows where a user saw the tweet and whom they actually retweeted.
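
    Steps 1 to 3 can be sketched in Python as follows. The input format is an assumption: `friends` maps each user to the set of accounts that user follows (in practice this would come from Twitter). We also assume the original author is a vertex, since the network example for @AubreyKMiller gives the author a degree. An edge is added whenever one vertex follows another, and edges are treated as undirected.

```python
def retweet_network(author: str, retweeters: list[str],
                    friends: dict[str, set[str]]) -> set[frozenset[str]]:
    """Undirected edges among the retweeters (plus the author).

    Step 1: the vertices are the retweeters (and, by assumption, the author).
    Step 2: friends[user] is the set of accounts `user` follows.
    Step 3: join two vertices when one is a friend of the other.
    """
    vertices = set(retweeters) | {author}
    edges: set[frozenset[str]] = set()
    for user in vertices:
        for friend in friends.get(user, set()):
            if friend in vertices and friend != user:
                edges.add(frozenset((user, friend)))
    return edges


# Hypothetical data: user_a follows the author, user_b follows user_a.
follows = {"user_a": {"author", "someone_else"}, "user_b": {"user_a"}, "user_c": set()}
edges = retweet_network("author", ["user_a", "user_b", "user_c"], follows)
print(sorted(sorted(e) for e in edges))  # [['author', 'user_a'], ['user_a', 'user_b']]
```

    Here user_b is linked to user_a rather than to the author, which is exactly the information a star graph loses.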

  • The data

    We collected 12,000 random tweets at 12 pm and 10 pm every day for two weeks. From each sample, we built networks for one positive and one negative tweet that had 50 to 100 retweets and a sentiment score with absolute value greater than 0.9.
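
    The selection rule can be written as a filter. In this Python sketch the `Tweet` record is hypothetical, the 50 to 100 range is assumed to be inclusive, and because the slides do not say how one tweet was chosen when several qualified, it picks one uniformly at random with a seeded generator.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class Tweet:
    tweet_id: str
    sentiment: float  # polarity score, e.g. from sentimentr
    retweets: int


def select_for_networks(sample: list[Tweet], rng: random.Random,
                        lo: int = 50, hi: int = 100, min_abs: float = 0.9
                        ) -> tuple[Optional[Tweet], Optional[Tweet]]:
    """Pick one positive and one negative tweet with lo..hi retweets and |sentiment| > min_abs."""
    eligible = [t for t in sample
                if lo <= t.retweets <= hi and abs(t.sentiment) > min_abs]
    positive = [t for t in eligible if t.sentiment > 0]
    negative = [t for t in eligible if t.sentiment < 0]
    pos = rng.choice(positive) if positive else None
    neg = rng.choice(negative) if negative else None
    return pos, neg


sample = [Tweet("1", 1.2, 60), Tweet("2", -1.0, 75),
          Tweet("3", 0.95, 150), Tweet("4", -0.5, 80)]
pos, neg = select_for_networks(sample, random.Random(0))
print(pos.tweet_id, neg.tweet_id)  # prints: 1 2
```

    Tweet 3 is excluded for having too many retweets and tweet 4 for having too weak a sentiment score.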

    [Figure: Tweets sampled each day. Count of tweets (0 to 15,000) for each day of the week, Sunday through Saturday, split by sentiment (negative, positive).]

  • Histogram for sentiment of all data

    [Figure: number of tweets (0 to 60,000) by sentiment score (-1 to 1).]

    The distribution of sentiment for all of our samples.

    Positive    Negative    Neutral
    98,038      61,960      49,039

    The number of tweets in our data set by sentiment.

  • Retweet count by sentiment for each day

    [Figure: one panel per day of the week (Sunday through Saturday), each plotting log(retweet count + 1) against sentiment.]

    The number of retweets on the log scale by sentiment for each day of the week. We include tweets both with and without retweets.

  • Retweet count by sentiment

    [Figure: log(retweet count + 1) against sentiment for the entire data set.]

    The number of retweets on the log scale by sentiment for our entire data set. We include tweets both with and without retweets.

  • Likelihood of retweet based on sentiment

    We previously found that the greater the polarity of a tweet, the more retweets it tends to receive. Now we ask whether a tweet is more likely to be retweeted depending on whether it is positive or negative. Recall that our data set contains more positive tweets than negative tweets, so we compare proportions rather than raw counts.
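
    Comparing proportions rather than raw counts can be sketched in Python as below. The labels and retweet counts are hypothetical, and the threshold is strict, matching "more than 1000 times".

```python
from collections import Counter


def proportion_above(labels: list[str], retweet_counts: list[int],
                     threshold: int = 1000) -> dict[str, float]:
    """For each sentiment label, the fraction of tweets retweeted more than `threshold` times."""
    if len(labels) != len(retweet_counts):
        raise ValueError("labels and retweet_counts must have the same length")
    totals = Counter(labels)
    above = Counter(lab for lab, count in zip(labels, retweet_counts) if count > threshold)
    return {lab: above[lab] / totals[lab] for lab in totals}


labels = ["positive", "positive", "negative", "negative", "negative", "neutral"]
counts = [2000, 5, 1500, 1200, 10, 0]
print(proportion_above(labels, counts))
```

    Dividing by each group's own total is what makes a group with fewer tweets (here, or in our data, the negative tweets) comparable with a larger one.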

  • Retweet count > 1000 by sentiment

    [Figure: for negative, neutral, and positive tweets, the proportion (0 to 1) with retweet count > 1000 (yes) and without (no).]

    The proportion of negative, neutral, and positive tweets that are retweeted more than 1000 times.

  • Results

    15% of negative tweets are retweeted more than 1000 times, compared with only 12.3% of positive tweets. This difference is statistically significant (Fisher’s exact test, p-value = 0.0004997). Therefore, negative tweets are more likely than positive tweets to be retweeted more than 1000 times. The same is true for retweet count > 1 and retweet count > 100.
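
    For a 2 × 2 table (negative or positive, by retweeted more than 1000 times or not), Fisher’s exact test can be sketched in Python using only the standard library. This is the usual two-sided version, which adds up the probabilities of all tables with the same margins that are no more likely than the observed one. The counts in the example are hypothetical, since the slides report only percentages; in practice one would call a library routine such as R’s `fisher.test`.

```python
from math import comb


def fisher_exact_2x2(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    r1, r2, c1 = a + b, c + d, a + c  # first-row, second-row, first-column totals
    n = r1 + r2

    def weight(x: int) -> int:
        # comb(n, c1) * P(top-left cell = x | fixed margins), kept as an exact integer
        return comb(r1, x) * comb(r2, c1 - x)

    observed = weight(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    # Integer weights make the "no more likely than observed" comparison exact.
    extreme = sum(w for w in map(weight, range(lo, hi + 1)) if w <= observed)
    return extreme / comb(n, c1)


# Hypothetical counts: rows = negative / positive, columns = (> 1000 retweets, not).
print(fisher_exact_2x2(1, 9, 11, 3))
```

    For this toy table the p-value is about 0.0028. Conditioning on the margins is what makes the test exact: under the null hypothesis the top-left count follows a hypergeometric distribution.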

  • Network analysis

    We calculate the Pearson correlation between sentiment score and each of the following: group betweenness, group degree, modularity, group closeness, the number of communities, density, and average clustering coefficient.
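
    Three of these measures are easy to compute directly. The slides do not define "group degree"; the Python sketch below assumes it means Freeman’s degree centralization, which is 1 for a star graph and 0 when all vertices have the same degree. Density is the fraction of possible edges that are present, and the average clustering coefficient here counts vertices of degree below 2 as 0 (one common convention). Group betweenness, group closeness, and the community-based measures are omitted; a graph library would normally be used for those.

```python
from itertools import combinations

Graph = dict[str, set[str]]  # undirected adjacency sets


def density(adj: Graph) -> float:
    """Edges present divided by the n(n - 1)/2 possible edges."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    return 2 * m / (n * (n - 1)) if n > 1 else 0.0


def avg_clustering(adj: Graph) -> float:
    """Mean local clustering coefficient; vertices of degree < 2 contribute 0."""
    if not adj:
        return 0.0
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for u, w in combinations(nbrs, 2) if w in adj[u])
        total += 2 * links / (k * (k - 1))
    return total / len(adj)


def degree_centralization(adj: Graph) -> float:
    """Freeman degree centralization: sum(max_deg - deg) / ((n - 1)(n - 2))."""
    n = len(adj)
    if n < 3:
        return 0.0
    degrees = [len(nbrs) for nbrs in adj.values()]
    top = max(degrees)
    return sum(top - d for d in degrees) / ((n - 1) * (n - 2))


star = {"c": {"a", "b", "d"}, "a": {"c"}, "b": {"c"}, "d": {"c"}}
triangle = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
for name, g in [("star", star), ("triangle", triangle)]:
    print(name, density(g), avg_clustering(g), degree_centralization(g))
```

    The star scores 1 on degree centralization and 0 on clustering, while the triangle is the reverse, which is why these measures can separate star-like spreading from more interconnected spreading.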

  • Correlation of network measures with sentiment

    Measure            r(pos)    p(pos)    r(neg)    p(neg)    r(pos) - r(neg)    p(diff)
    Betweenness        0.1771    0.4821    0.0797    0.761     0.0973             0.7898
    Degree             0.0374    0.883     0.3379    0.1846    -0.3006            0.3976
    Modularity         0.1368    0.5884    0.3366    0.01186   -0.1998            0.5672
    Closeness          -0.1364   0.5894    -0.5944   0.1865    0.458              0.1409
    Communities        0.1995    0.4273    -0.3865   0.1254    0.5861             0.1007
    Density            -0.1042   0.6807    0.2824    0.2721    -0.3866            0.288
    Avg. clustering    -0.2399   0.3375    0.3732    0.1401    -0.6132            0.0866

    Pearson correlation of graph theoretic measures with sentiment score, for the networks of positive tweets (pos) and negative tweets (neg). The difference column is r(pos) - r(neg).
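
    Each r in the table is an ordinary Pearson correlation between a network measure and the sentiment scores of the same tweets. A minimal Python sketch follows, together with the t statistic on n - 2 degrees of freedom from which the usual two-sided p-value is obtained. Turning t into a p-value needs a t-distribution CDF, so in practice one would use a library routine such as R’s `cor.test` or `scipy.stats.pearsonr`. The data in the example are hypothetical.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two equal-length samples with at least 3 values")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)  # undefined if either sample is constant


def t_statistic(r: float, n: int) -> float:
    """t = r * sqrt(n - 2) / sqrt(1 - r^2); compare with a t distribution on n - 2 df."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)  # undefined for |r| = 1


# Hypothetical values: one network measure and one sentiment score per tweet.
measure = [1.0, 2.0, 3.0, 4.0]
sentiment = [1.0, 3.0, 2.0, 4.0]
r = pearson_r(measure, sentiment)
print(round(r, 6), round(t_statistic(r, len(measure)), 6))
```

    With only about 20 networks per sentiment, |r| must be fairly large before t becomes significant, which is consistent with most p-values in the table being large.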

  • Network measures by sentiment

    [Figure: six panels comparing negative and positive tweets on group betweenness, modularity, number of communities, group degree, average clustering coefficient, and group closeness.]

    Comparing positive and negative sentiments among our chosen network measures.

  • Network example

    Network for a tweet by @AubreyKMiller. The size of the vertices represents the number of followers of that user. The lighter the color, the more friends that user has.

  • Network example

    The tweet is by @AubreyKMiller: "So nice getting to meet the brilliant Eddie Redmayne! Absolutely love Fantastic Beasts: The Crimes Of Grindelwald! #FantasticBeasts #ProtectTheSecrets #BeFantastic." Fantastic Beasts is a movie starring actor Eddie Redmayne. While Aubrey is the original author of the tweet, she is not the most important vertex in its network: @AubreyKMiller has a degree of only 11, whereas @FantasticBeasts has a degree of 56. Our network is therefore very different from a star graph with @AubreyKMiller as the center vertex.
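
    The point that the author need not be the best-connected vertex can be checked by counting degrees. In this Python sketch the edge list is a hypothetical toy network, not the @AubreyKMiller data; in a star graph with the author at the center, the author would have the largest degree.

```python
from collections import Counter


def degrees(edges: list[tuple[str, str]]) -> Counter:
    """Degree of every vertex in an undirected edge list (no duplicate edges assumed)."""
    deg: Counter = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


# Hypothetical network: "hub" plays the role of a heavily followed account.
edges = [("author", "r1"), ("hub", "r1"), ("hub", "r2"), ("hub", "r3"), ("hub", "author")]
deg = degrees(edges)
print(deg["author"], deg.most_common(1))  # prints: 2 [('hub', 4)]
```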

  • Conclusion

    There are more positive tweets than negative tweets. Negative tweets are slightly more likely to be retweeted than positive tweets. The larger the absolute value of the sentiment score (i.e., the greater the emotional impact), the more retweets the tweet is likely to have.

  • Acknowledgments

    I would like to thank my mentors, Dr. Dancik and Dr. Heenehan, for their help and constant support throughout this research project, and the NCUWM committee for putting together this wonderful conference.

  • References

    [1] Beres, D. (2016) Watch The Amazing Way Information Spreads On Twitter. Huffington Post, Mar 20, 2016. https://www.huffingtonpost.com/entry/how-twitter-works_us_56ec6480e4b084c672204d74

    [2] Jose, A. K., Bhatia, N., and Krishna, S. (2010) “Twitter Sentiment Analysis”. National Institute of Technology Calicut.

    [3] Sarlan, A., Nadam, C., and Basri, S. (2014) Twitter sentiment analysis. pp. 212-216. doi:10.1109/ICIMU.2014.7066632.

    [4] https://www.rdocumentation.org/packages/sentimentr. pp. 745-750.

    [5] Barabási, A.-L. and Pósfai, M. (2016) Network science. Cambridge: Cambridge University Press.
