BIG Data, Social Data: Targeted Harnessing of Transient Micro-Blogging Data


Introduction

Footnotes

Acknowledgements

Sreejata Chatterjee (sreejata@cs.dal.ca)

Faculty of Computer Science, Dalhousie University, Halifax, Canada

[1] Mashable Social Media: http://mashable.com/2011/09/08/twitter-has-100-million-active

[2] Social Media Lab: http://socialmedialab.ca/?p=1952

[3] Wired.com: http://www.wired.com/wiredscience/2010/10/twitter-crystal-ball

[4] Radian6: Social Media Monitoring and Engagement, Social CRM

Huge amounts of real-time social media data are created every moment. For example, ~230 million tweets are posted daily by Twitter’s 200 million users [1]. If harnessed, these data can provide a wealth of insight into what people are thinking about and what they like or dislike. Twitter data has already proven useful in a number of different contexts, from monitoring elections [2] to predicting stock market trends [3] to conducting brand monitoring and PR campaigns [4].

However, social media data tend to be noisy and ephemeral. Furthermore, social media companies often limit the amount of data one can access automatically at any point in time, making this rich source of transient data difficult to collect.

This work focuses on designing and developing automated methods and a web-based infrastructure that can help other researchers and developers collect and process raw social media data by:

(1) Creating a Data Collector and Repository Tool for collecting and storing public Twitter data for a specified group of online users in an effective and efficient manner,

(2) Connecting open APIs via Web Services that process the Twitter data in our database to add value and richness, such as geo-coding or assigning “influence” scores to Tweeters,

(3) Creating an NLP (Natural Language Processing) Module that can conduct sentiment analysis on social media data,

(4) Providing a robust API that other developers can use to create and test innovative web applications with the data collected.

I would like to thank Dr. Anatoliy Gruzd, Director of the Social Media Lab, for supervising this research. Additionally, I would like to thank Philip Mai, Research Manager at the Social Media Lab, for his valuable feedback.

System Architecture for Handling Social Media Data

getAllTweets - Returns all the tweets by all the users

getUserTweets - Returns tweets posted by a specified user

getTimedUserTweets - Returns tweets within a time interval

getUserProfilePicUrl - Returns user’s profile picture

getUserDetails - Returns detailed user information

getUserTimeLineInfo - Returns basic user information

API calls are made via HTTP requests (see below). The output is formatted in JSON (JavaScript Object Notation).

1) Gets all tweets posted between Feb 14 and April 14, 2012, by all of the users who follow “asist2011” and “asist_org”:

http://URL_BASE/tweetApiCalls.php?call=getAllTweets&seedUserList=asist2011,asist_org&startTime=2012-02-14&endTime=2012-04-14

2) Returns details about dalprof’s profile, such as profile info, followers, friends, Klout score (influence score), and geocoded location (for easy and universal location identification):

http://URL_BASE/tweetApiCalls.php?call=getUserDetails&user=dalprof
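The two sample calls can also be assembled programmatically. The following is a minimal Python sketch, not part of the project's code: the helper names build_call_url and fetch_json are hypothetical, URL_BASE is the placeholder used above (the real host is not given here), and the structure of the JSON response is not documented on the poster, so the sketch only decodes it without assuming any fields.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Placeholder taken from the sample calls; the real host is not given in the source.
URL_BASE = "http://URL_BASE"
ENDPOINT = "tweetApiCalls.php"


def build_call_url(call, **params):
    """Return the request URL for one API call (hypothetical helper).

    `call` is one of the listed call names, e.g. "getAllTweets";
    `params` are passed through as query-string arguments.
    """
    query = {"call": call, **params}
    # safe="," keeps comma-separated lists such as seedUserList unescaped,
    # matching the form of the sample calls.
    return f"{URL_BASE}/{ENDPOINT}?{urlencode(query, safe=',')}"


def fetch_json(url):
    """Send the HTTP request and decode the JSON output (hypothetical helper).

    Not called here, because URL_BASE is only a placeholder.
    """
    with urlopen(url) as response:
        return json.load(response)


all_tweets_url = build_call_url(
    "getAllTweets",
    seedUserList="asist2011,asist_org",
    startTime="2012-02-14",
    endTime="2012-04-14",
)
user_details_url = build_call_url("getUserDetails", user="dalprof")

print(all_tweets_url)
print(user_details_url)
```

With a real host substituted for URL_BASE, fetch_json(all_tweets_url) would return the decoded JSON output as Python objects.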

GRAND Projects:

• DINS - Digital Infrastructures: Access and Use in the Network Society

• NAVEL - Network Assessment and Validation for Effective Leadership

Netlytic – a system for automated discovery, analysis and visualization of information about online communities, being developed by Dr. Gruzd at the Dalhousie University Social Media Lab.

Example 1: A Visual Representation of the Sentiment Analysis made possible by the new NLP Module now available in Netlytic

Example 2: Tag Cloud of Top 30 Topics derived from Positive (left) and Negative (right) Tweets about #OccupyWallStreet

As a proof of concept, the new NLP Module, based on the Natural Language ToolKit (NLTK), has been added to an existing web tool called Netlytic, giving it the ability to provide sentiment analysis.
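The internals of the NLP Module are not described here. As a rough illustration of what per-tweet sentiment labelling and tallying involves, the Python sketch below uses a simple word-list approach. This is an assumption made for illustration, not the NLTK-based method used in the module; the word lists, function names, and sample tweets are all made up.

```python
from collections import Counter

# Tiny illustrative word lists (hypothetical; not the lexicon or model
# used by the NLP Module).
POSITIVE_WORDS = {"good", "great", "love", "support", "hope"}
NEGATIVE_WORDS = {"bad", "hate", "angry", "fail", "corrupt"}


def label_tweet(text):
    """Label a tweet 'positive', 'negative', or 'neutral' by counting word-list hits."""
    # Lowercase each token and strip common punctuation, '#' and '@'.
    words = [w.strip(".,!?#@\"'").lower() for w in text.split()]
    score = sum(w in POSITIVE_WORDS for w in words) - sum(
        w in NEGATIVE_WORDS for w in words
    )
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def tally_sentiment(tweets):
    """Count how many tweets fall under each label."""
    return Counter(label_tweet(t) for t in tweets)


# Made-up sample tweets, for illustration only.
sample_tweets = [
    "I love the energy at #OccupyWallStreet, great turnout!",
    "This is a bad idea and it will fail.",
    "Heading downtown now.",
]
counts = tally_sentiment(sample_tweets)
print(counts)
```

Comparing the positive and negative counts over a whole collection is the kind of aggregate that an overall "more positive than negative" reading rests on.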

Sentiment Analysis of >70K Tweets about #OccupyWallStreet

Conclusion: Overall, tweets about the Occupy Wall Street movement were more positive than negative.

Case Study #2: Netlytic.org

Sample API Calls

Research Objectives

Case Study #1: AcademiaMap.com

AcademiaMap-Dashboard App

AcademiaMap-GeoVisualizer App

AcademiaMap helps scholars filter the “noise” from their Twitter streams using various “influence” metrics and provides them with an easy way to identify trending topics and interesting voices to follow on Twitter.

(Lead developer: Melissa Anez)

A geo-based visualization system that displays communication connections between scholarly users of Twitter from across the globe.

(Lead developer: Jamiur Rahman)

AcademiaMap - Twitter App

The API developed as part of this project is currently being used in a few different applications for a system called AcademiaMap, an Online Influence Assessment App designed for scholars.

A Twitter app that automatically posts tweets about trending topics and re-posts tweets that are popular within a group of scholarly Twitter users.

(Lead developer: Sreejata Chatterjee)
