BIG Data, Social Data: Targeted Harnessing of Transient Micro-Blogging Data

Post on 10-May-2015


by Sreejata Chatterjee, Social Media Lab, Dalhousie University, Halifax, Canada


Introduction

Footnotes

Acknowledgements

Sreejata Chatterjee (sreejata@cs.dal.ca)

Faculty of Computer Science, Dalhousie University, Halifax, Canada

[1] Mashable Social Media: http://mashable.com/2011/09/08/twitter-has-100-million-active

[2] Social Media Lab: http://socialmedialab.ca/?p=1952

[3] Wired.com: http://www.wired.com/wiredscience/2010/10/twitter-crystal-ball

[4] Radian6: Social Media Monitoring and Engagement, Social CRM

Huge amounts of real-time social media data are created every moment. For example, ~230 million tweets are posted daily by Twitter’s 200 million users [1]. If harnessed, these data can provide a wealth of insight into what people are thinking about and what they like or dislike. Twitter data have already proven useful in a number of different contexts, from monitoring elections [2] to predicting stock market trends [3] to conducting brand monitoring and PR campaigns [4].

However, social media data tend to be noisy and ephemeral. Furthermore, social media companies often limit the amount of data one can access automatically at any point in time, making this rich source of transient data difficult to collect.

This work focuses on designing and developing automated methods and a web-based infrastructure that can help other researchers and developers collect and process raw social media data by:

(1) Creating a Data Collector and Repository Tool for collecting and storing public Twitter data for a specified group of online users in an effective and efficient manner,

(2) Connecting open APIs via Web Services that process Twitter data to add value and richness to the Twitter data in our database, such as geo-coding or assigning “influence” scores to Tweeters,

(3) Creating an NLP (Natural Language Processing) Module that can conduct sentiment analysis on social media data,

(4) Providing a robust API that other developers can use to create and test innovative web applications with the data collected.
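To illustrate objective (1) under the access limits mentioned in the introduction, here is a minimal Python sketch of a collection loop that stays within a request budget. The class `RateLimiter`, the function `collect`, the `fetch` callback, and any budget values are hypothetical names chosen for illustration; the poster does not describe how the actual Data Collector is implemented.

```python
import time
from collections import deque
from typing import Callable, Deque, Dict, Iterable, List


class RateLimiter:
    """Allow at most `max_calls` calls in any sliding window of `period` seconds."""

    def __init__(self, max_calls: int, period: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self.sleep = sleep
        self.calls: Deque[float] = deque()  # timestamps of recent calls

    def wait(self) -> None:
        """Block until one more call fits in the window, then record it."""
        now = self.clock()
        # Forget calls that have already left the window.
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()
        if len(self.calls) >= self.max_calls:
            # Budget used up: sleep until the oldest recorded call expires.
            self.sleep(self.period - (now - self.calls[0]))
            now = self.clock()
            self.calls.popleft()
        self.calls.append(now)


def collect(users: Iterable[str],
            fetch: Callable[[str], List[dict]],
            limiter: RateLimiter) -> Dict[str, List[dict]]:
    """Fetch tweets for each user, pausing whenever the budget is exhausted.

    `fetch` is a hypothetical callback that returns one user's public tweets.
    """
    repository: Dict[str, List[dict]] = {}
    for user in users:
        limiter.wait()
        repository[user] = fetch(user)
    return repository
```

Passing the clock and sleep functions as parameters keeps the limiter easy to test without real waiting; in normal use the defaults apply.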

I would like to thank Dr. Anatoliy Gruzd, Director of the Social Media Lab, for supervising this research. Additionally, I would like to thank Philip Mai, Research Manager at the Social Media Lab, for his valuable feedback.

System Architecture for Handling Social Media Data

getAllTweets - Returns all the tweets by all the users
getUserTweets - Returns tweets posted by a specified user
getTimedUserTweets - Returns tweets within a time interval
getUserProfilePicUrl - Returns the user’s profile picture URL
getUserDetails - Returns detailed user information
getUserTimeLineInfo - Returns basic user information

API calls are made via HTTP requests (see below). The output is formatted in JSON (JavaScript Object Notation).

1) Get all tweets posted between Feb 14 and April 14, 2012, by all of the users who follow “asist2011” and “asist_org”:

http://URL_BASE/tweetApiCalls.php?call=getAllTweets&seedUserList=asist2011,asist_org&startTime=2012-02-14&endTime=2012-04-14

2) Return details about dalprof’s profile, such as profile info, followers, friends, Klout score (an influence score), and geocoded location (for easy and universal location identification):

http://URL_BASE/tweetApiCalls.php?call=getUserDetails&user=dalprof
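The two sample calls can be assembled and issued from a client script. The following is a minimal Python sketch using only the standard library; `URL_BASE` is the poster's own placeholder and must be replaced with a real host, and the helper names `build_call_url` and `run_call` are hypothetical. The shape of the returned JSON is not documented on the poster, so the sketch only decodes it.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Placeholder taken from the poster; the real host name is not given.
URL_BASE = "URL_BASE"


def build_call_url(call: str, **params: str) -> str:
    """Build a request URL for tweetApiCalls.php.

    `call` is one of the API call names (e.g. "getAllTweets"); the other
    keyword arguments become query parameters. Commas are left unescaped
    so that lists such as seedUserList match the poster's examples.
    """
    query = urlencode({"call": call, **params}, safe=",")
    return f"http://{URL_BASE}/tweetApiCalls.php?{query}"


def run_call(call: str, **params: str):
    """Issue the HTTP request and decode the JSON response (hypothetical helper)."""
    with urlopen(build_call_url(call, **params)) as response:
        return json.load(response)


all_tweets_url = build_call_url(
    "getAllTweets",
    seedUserList="asist2011,asist_org",
    startTime="2012-02-14",
    endTime="2012-04-14",
)
user_details_url = build_call_url("getUserDetails", user="dalprof")
```

Building the query string with `urlencode` rather than by string concatenation ensures that user names or dates containing special characters are escaped correctly.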

GRAND Projects:

• DINS - Digital Infrastructures: Access and Use in the Network Society
• NAVEL - Network Assessment and Validation for Effective Leadership

Case Studies #2: Netlytic.org

As a proof of concept, the new NLP Module, based on the Natural Language Toolkit (NLTK), has been added to an existing web tool called Netlytic, giving it the ability to provide sentiment analysis.

Netlytic is a system for automated discovery, analysis and visualization of information about online communities, being developed by Dr. Gruzd at the Dalhousie University Social Media Lab.

Example 1: A Visual Representation of the Sentiment Analysis made possible by the new NLP Module now available in Netlytic

Example 2: Tag Cloud of Top 30 Topics derived from Positive (left) and Negative (right) Tweets about #OccupyWallStreet

Sentiment Analysis of >70K Tweets about #OccupyWallStreet

Conclusion: Overall, tweets about the Occupy Wall Street movement were more positive than negative.
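The kind of output shown for the #OccupyWallStreet tweets (a positive/negative tally plus the top terms per polarity) could be produced along the following lines. This is a deliberately simplified, standard-library Python sketch using a tiny hand-made word list; it is not the NLTK-based NLP Module itself, whose internals the poster does not describe. The word lists, the function names, and the minimum term length are all assumptions for illustration.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Tiny illustrative word lists (assumption); a real module would rely on a
# trained classifier or a full sentiment lexicon.
POSITIVE = {"good", "great", "love", "support", "hope"}
NEGATIVE = {"bad", "hate", "angry", "corrupt", "fail"}


def tokenize(text: str) -> List[str]:
    """Lower-case the text and split it into words and hashtags."""
    return re.findall(r"#?\w+", text.lower())


def classify(text: str) -> str:
    """Label a tweet by comparing its counts of positive and negative words."""
    tokens = tokenize(text)
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def summarize(tweets: Iterable[str], top_n: int = 30
              ) -> Tuple[Counter, Dict[str, List[Tuple[str, int]]]]:
    """Return the label tally and the `top_n` most frequent terms per polarity."""
    counts: Counter = Counter()
    terms = {"positive": Counter(), "negative": Counter()}
    for text in tweets:
        label = classify(text)
        counts[label] += 1
        if label in terms:
            # Skip very short tokens as a crude stand-in for stop-word removal.
            terms[label].update(t for t in tokenize(text) if len(t) > 3)
    return counts, {label: c.most_common(top_n) for label, c in terms.items()}
```

Comparing `counts["positive"]` with `counts["negative"]` gives the overall balance, and the per-polarity term lists are the raw material for a tag cloud.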

Sample API Calls

Research Objectives

Case Studies #1: AcademiaMap.com

The API developed as part of this project is currently being used in a few different applications for a system called AcademiaMap, an Online Influence Assessment App designed for scholars.

AcademiaMap - Dashboard App

AcademiaMap helps scholars to filter the “noise” from their Twitter streams using various “influence” metrics and provides them with an easy way to identify trending topics and interesting voices to follow on Twitter. (Lead developer: Melissa Anez)

AcademiaMap - GeoVisualizer App

A Geo-based Visualization system that displays communication connections between scholarly users of Twitter from across the globe. (Lead developer: Jamiur Rahman)

AcademiaMap - Twitter App

A Twitter app that automatically posts tweets about trending topics and re-posts tweets that are popular within a group of scholarly Twitter users. (Lead developer: Sreejata Chatterjee)
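As a rough illustration of the Dashboard idea of filtering “noise” with an influence metric and then surfacing trending topics, consider the Python sketch below. The tweet fields (`user`, `text`), the `scores` mapping, the threshold, and both function names are hypothetical; the poster does not specify which metrics or data structures AcademiaMap actually uses.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple


def filter_by_influence(tweets: List[dict], scores: Dict[str, float],
                        min_score: float) -> List[dict]:
    """Keep only tweets whose author has an influence score of at least `min_score`.

    Authors missing from `scores` are treated as having a score of 0.
    """
    return [t for t in tweets if scores.get(t["user"], 0.0) >= min_score]


def trending_hashtags(tweets: List[dict], top_n: int = 5) -> List[Tuple[str, int]]:
    """Count hashtags (case-insensitively) and return the `top_n` most frequent."""
    tags: Counter = Counter()
    for tweet in tweets:
        tags.update(tag.lower() for tag in re.findall(r"#\w+", tweet["text"]))
    return tags.most_common(top_n)
```

A caller would first apply `filter_by_influence` with scores such as those returned by an influence service, then pass the surviving tweets to `trending_hashtags`.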