BIG Data, Social Data: Targeted Harnessing of Transient Micro-Blogging Data



by Sreejata Chatterjee, Social Media Lab, Dalhousie University, Halifax, Canada


Sreejata Chatterjee ([email protected])
Faculty of Computer Science, Dalhousie University, Halifax, Canada

Introduction

Huge amounts of real-time social media data are created every moment. For example, ~230 million tweets are posted daily by Twitter’s 200 million users [1]. If harnessed, these data can provide a wealth of insight into what people are thinking about and what they like or dislike. Twitter data have already proven useful in a number of different contexts, from monitoring elections [2] to predicting stock market trends [3] to conducting brand monitoring and PR campaigns [4]. However, social media data tend to be noisy and ephemeral. Furthermore, social media companies often limit the amount of data one can access automatically at any given time, making this rich source of transient data difficult to collect.

Research Objectives

This work focuses on designing and developing automated methods and a web-based infrastructure that can help other researchers and developers collect and process raw social media data by:

(1) Creating a Data Collector and Repository Tool for collecting and storing public Twitter data for a specified group of online users in an effective and efficient manner,

(2) Connecting open APIs via Web Services that process Twitter data to add value and richness to the data in our database, such as geo-coding or assigning “influence” scores to Tweeters,

(3) Creating an NLP (Natural Language Processing) Module that can conduct sentiment analysis on social media data,

(4) Providing a robust API that other developers can use to create and test innovative web applications with the data collected.

Acknowledgements

I would like to thank Dr. Anatoliy Gruzd, Director of the Social Media Lab, for supervising this research. Additionally, I would like to thank Philip Mai, Research Manager at the Social Media Lab, for his valuable feedback.

Footnotes

[1] Mashable Social Media: http://mashable.com/2011/09/08/twitter-has-100-million-active
[2] Social Media Lab: http://socialmedialab.ca/?p=1952
[3] Wired.com: http://www.wired.com/wiredscience/2010/10/twitter-crystal-ball
[4] Radian6: Social Media Monitoring and Engagement, Social CRM

System Architecture for Handling Social Media Data

API calls are made via HTTP requests, and the output is formatted in JSON (JavaScript Object Notation). The following calls are available:

getAllTweets - Returns all the tweets by all the users
getUserTweets - Returns tweets posted by a specified user
getTimedUserTweets - Returns tweets within a time interval
getUserProfilePicUrl - Returns user’s profile picture
getUserDetails - Returns detailed user information
getUserTimeLineInfo - Returns basic user information

Sample API Calls

1) Gets all tweets posted between February 14 and April 14, 2012, by all of the users who follow “asist2011” and “asist_org”:

http://URL_BASE/tweetApiCalls.php?call=getAllTweets&seedUserList=asist2011,asist_org&startTime=2012-02-14&endTime=2012-04-14

2) Returns details about dalprof’s profile, such as profile info, followers, friends, Klout score (an influence score), and geocoded location (for easy and universal location identification):

http://URL_BASE/tweetApiCalls.php?call=getUserDetails&user=dalprof
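The request URLs for these API calls can be assembled programmatically. The following is a minimal Python sketch, not the project's own client code. It assumes only what the poster shows: the `tweetApiCalls.php` endpoint, the `call` parameter, and the `seedUserList`, `startTime`, `endTime`, and `user` parameters. `URL_BASE` is the poster's placeholder rather than a real host, and the helper names `build_api_url` and `fetch_json` are hypothetical. Because the poster does not document the JSON response schema, the sketch only decodes the response and does not interpret any fields.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Placeholder taken from the poster; the real host name is not given.
URL_BASE = "http://URL_BASE"


def build_api_url(call, **params):
    """Build a tweetApiCalls.php request URL for the named call (hypothetical helper)."""
    query = {"call": call}
    query.update(params)
    # safe="," leaves comma-separated user lists unescaped, matching the poster's URLs.
    return f"{URL_BASE}/tweetApiCalls.php?{urlencode(query, safe=',')}"


def fetch_json(url, timeout=30):
    """Send the HTTP request and decode the JSON body (schema not documented in the poster)."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Sample call 1: tweets by followers of two seed accounts within a date range.
print(build_api_url(
    "getAllTweets",
    seedUserList="asist2011,asist_org",
    startTime="2012-02-14",
    endTime="2012-04-14",
))

# Sample call 2: detailed profile information for one user.
print(build_api_url("getUserDetails", user="dalprof"))
```

`fetch_json` is deliberately not called here, since `URL_BASE` must first be replaced with the address of an actual deployment.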

GRAND Projects:

• DINS - Digital Infrastructures: Access and Use in the Network Society
• NAVEL - Network Assessment and Validation for Effective Leadership

Case Studies #2: Netlytic.org

Netlytic is a system for automated discovery, analysis and visualization of information about online communities, being developed by Dr. Gruzd at the Dalhousie University Social Media Lab.

As a proof of concept, the new NLP Module, based on the Natural Language Toolkit (NLTK), has been added to Netlytic, an existing web tool, giving it the ability to provide sentiment analysis.

Example 1: A Visual Representation of the Sentiment Analysis made possible by the new NLP Module now available in Netlytic

Sentiment Analysis of >70K Tweets about #OccupyWallStreet

Conclusion: Overall, tweets about the Occupy Wall Street movement were more positive than negative.

Example 2: Tag Cloud of Top 30 Topics derived from Positive (left) and Negative (right) Tweets about #OccupyWallStreet
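To give a feel for the kind of tally behind a "more positive than negative" result, here is a minimal Python sketch of sentiment counting. The poster does not describe how the NLTK-based NLP Module classifies tweets, so this is a deliberately simple word-list stand-in written in plain Python: it does not use NLTK and is not the module's actual method. The word lists, the function names, and the sample tweets are all invented for illustration.

```python
import re
from collections import Counter

# Hypothetical toy lexicons, for illustration only.
POSITIVE_WORDS = {"good", "great", "love", "support", "hope", "peaceful"}
NEGATIVE_WORDS = {"bad", "hate", "angry", "violent", "corrupt", "fail"}


def classify_tweet(text):
    """Label a tweet positive, negative, or neutral by counting lexicon hits."""
    words = re.findall(r"[a-z']+", text.lower())
    positive_hits = sum(word in POSITIVE_WORDS for word in words)
    negative_hits = sum(word in NEGATIVE_WORDS for word in words)
    if positive_hits > negative_hits:
        return "positive"
    if negative_hits > positive_hits:
        return "negative"
    return "neutral"


def summarize_sentiment(tweets):
    """Count how many tweets receive each label."""
    return Counter(classify_tweet(tweet) for tweet in tweets)


# Invented sample tweets, not taken from the collected data set.
sample_tweets = [
    "I love the peaceful protest #OccupyWallStreet",
    "Corrupt banks make me angry #OccupyWallStreet",
    "Heading downtown now #OccupyWallStreet",
    "Great turnout today, so much hope #OccupyWallStreet",
]

print(summarize_sentiment(sample_tweets))
```

A trained classifier would replace `classify_tweet` in practice; the aggregation step in `summarize_sentiment` would stay the same.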


Case Studies #1: AcademiaMap.com

The API developed as part of this project is currently used in several applications for AcademiaMap, an Online Influence Assessment App designed for scholars.

AcademiaMap-Dashboard App

AcademiaMap helps scholars filter the “noise” from their Twitter streams using various "influence" metrics and gives them an easy way to identify trending topics and interesting voices to follow on Twitter. (Lead developer: Melissa Anez)

AcademiaMap-GeoVisualizer App

A geo-based visualization system that displays communication connections between scholarly users of Twitter from across the globe. (Lead developer: Jamiur Rahman)

AcademiaMap - Twitter App

A Twitter app that automatically posts tweets about trending topics and re-posts tweets that are popular within a group of scholarly Twitter users. (Lead developer: Sreejata Chatterjee)
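The two ideas behind these apps, filtering a stream by an "influence" metric and picking out tweets that are popular within a group, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the AcademiaMap implementation: the poster does not say which metrics or thresholds the apps use, so the `Tweet` record, its `influence` and `retweets` fields, the threshold value, and the sample data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Tweet:
    user: str
    text: str
    influence: float  # hypothetical author influence score (for example, a Klout-style value)
    retweets: int     # hypothetical number of re-posts within the scholarly group


def filter_by_influence(tweets, min_influence):
    """Keep only tweets whose author meets the influence threshold."""
    return [tweet for tweet in tweets if tweet.influence >= min_influence]


def most_popular(tweets, top_n=1):
    """Return the top_n tweets ordered by re-post count, highest first."""
    return sorted(tweets, key=lambda tweet: tweet.retweets, reverse=True)[:top_n]


# Invented sample stream for illustration.
stream = [
    Tweet("scholar_a", "New paper on social media methods", 62.0, 14),
    Tweet("scholar_b", "Lunch was fine", 18.5, 0),
    Tweet("scholar_c", "Call for papers closes Friday", 47.0, 9),
]

for tweet in filter_by_influence(stream, min_influence=40.0):
    print(tweet.user, tweet.text)

for tweet in most_popular(stream, top_n=2):
    print(tweet.user, tweet.retweets)
```

Keeping the filter and the ranking as separate functions lets either criterion be swapped for a different metric without touching the other.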