GSAT (General Sentiment Analysis Tool) Design Review By Asaf Bruner.

Problem Description

Big Data & Sentiment Analysis

- Let’s start with a short video: http://www.youtube.com/watch?v=ij5yC-moPCM

- Textual information is either fact or opinion.

- Until recently, very little research had been done on processing opinions. Yet opinions are so important that whenever we need to make a decision, we want to hear what others think.

The specific problem I will be dealing with

- Currently, there is no unified solution to the problem described above.

- I will design and build a system that:
  - Automatically collects talkbacks from websites
  - Analyzes the data using NLP tools
  - Draws conclusions from the gathered information
  - Displays the results in an easy-to-understand way
  - Answers some very interesting and important questions

Where else can we use GSAT?

- Individuals making purchasing decisions.

- Organizations can use this tool to replace opinion polls, surveys, and focus groups.

- Trend analysis.

General scheme of the proposed solution

The Data

- I am using crawler4j, an open-source, Java-based web crawler, to collect my data.

- Using regular expressions and DOM analysis, I extract the main text and the talkbacks from each article, removing advertisements and unrelated text.

- The list of sites I am crawling is defined in advance.
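The crawling and extraction steps above can be sketched in plain Java. This is a minimal, hypothetical sketch and not GSAT's actual code. The class name `TalkbackExtractor`, the host whitelist, and the assumption that each talkback sits in a `<div class="talkback">` element are all illustrative. In a crawler4j-based crawler, a check like `isAllowed` would typically be called from the `shouldVisit` override of `WebCrawler`, and the extraction would run on each page's HTML in `visit`.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TalkbackExtractor {

    // Hypothetical whitelist: the sites are defined in advance.
    private static final Set<String> ALLOWED_HOSTS = Set.of("ynetnews.com", "haaretz.com");

    // Removes <script> and <style> blocks, which never contain reader text.
    private static final Pattern NOISE = Pattern.compile(
            "<(script|style)[^>]*>.*?</\\1>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Assumed markup: each talkback is wrapped in <div class="talkback">...</div>.
    private static final Pattern TALKBACK = Pattern.compile(
            "<div class=\"talkback\">(.*?)</div>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    private static final Pattern TAG = Pattern.compile("<[^>]+>");

    /** Returns true if the URL's host is a whitelisted site or one of its subdomains. */
    public static boolean isAllowed(String url) {
        String host;
        try {
            host = URI.create(url).getHost();
        } catch (IllegalArgumentException e) {
            return false; // malformed URL
        }
        if (host == null) {
            return false;
        }
        host = host.toLowerCase();
        for (String allowed : ALLOWED_HOSTS) {
            if (host.equals(allowed) || host.endsWith("." + allowed)) {
                return true;
            }
        }
        return false;
    }

    /** Extracts the plain text of every talkback on a page, skipping scripts, styles and ads. */
    public static List<String> extractTalkbacks(String html) {
        String cleaned = NOISE.matcher(html).replaceAll("");
        List<String> talkbacks = new ArrayList<>();
        Matcher m = TALKBACK.matcher(cleaned);
        while (m.find()) {
            // Strip inner tags, then collapse whitespace.
            String text = TAG.matcher(m.group(1)).replaceAll(" ").replaceAll("\\s+", " ").trim();
            if (!text.isEmpty()) {
                talkbacks.add(text);
            }
        }
        return talkbacks;
    }

    public static void main(String[] args) {
        String page = "<html><script>var ad = 1;</script>"
                + "<div class=\"talkback\"><p>Great article</p></div>"
                + "<div class=\"ad\">Buy now</div>"
                + "<div class=\"talkback\">I disagree.</div></html>";
        System.out.println(isAllowed("https://www.ynetnews.com/articles/1"));
        System.out.println(extractTalkbacks(page));
    }
}
```

Regular expressions are only reliable here because the sketch assumes flat, predictable markup. Real article pages with nested elements are where the DOM analysis mentioned above becomes necessary.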

The algorithm – Design review

Integrate with a crawler and extract articles and their talkbacks

Integrate with NLP code and analyze the articles and talkbacks for their entities and emotions

Build a database to hold this information

Design and build an algorithm to answer the questions mentioned above

Build an easy-to-interpret GUI to display the data and conclusions
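As a rough illustration of the step that answers questions, the following sketch aggregates NLP output into a per-entity summary: how many times each entity was mentioned and its average sentiment. The `Mention` input type, the assumed score range of -1.0 to 1.0, and plain averaging are all assumptions. The actual GSAT algorithm may weight or combine mentions differently.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class EntitySentimentAggregator {

    /** Hypothetical NLP output: one entity mention with a sentiment score (assumed -1.0 to 1.0). */
    public static final class Mention {
        final String entity;
        final double score;

        public Mention(String entity, double score) {
            this.entity = entity;
            this.score = score;
        }
    }

    /** Running totals for one entity. */
    public static final class Summary {
        private int mentions;
        private double totalScore;

        public int mentions() {
            return mentions;
        }

        public double averageScore() {
            return mentions == 0 ? 0.0 : totalScore / mentions;
        }
    }

    /** Groups mentions by entity name, counting them and summing their scores. */
    public static Map<String, Summary> aggregate(List<Mention> mentions) {
        Map<String, Summary> byEntity = new TreeMap<>(); // sorted by name for stable output
        for (Mention m : mentions) {
            Summary s = byEntity.computeIfAbsent(m.entity, k -> new Summary());
            s.mentions++;
            s.totalScore += m.score;
        }
        return byEntity;
    }

    public static void main(String[] args) {
        List<Mention> mentions = List.of(
                new Mention("Politician A", 0.5),
                new Mention("Politician A", -0.25),
                new Mention("Politician B", -0.5));
        for (Map.Entry<String, Summary> e : aggregate(mentions).entrySet()) {
            System.out.printf(Locale.ROOT, "%s: %d mention(s), average sentiment %.3f%n",
                    e.getKey(), e.getValue().mentions(), e.getValue().averageScore());
        }
    }
}
```

In the design above, the per-entity summaries would be read from the database rather than built from an in-memory list.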

The tools and infrastructure I am using

- The program is written in Java (Eclipse IDE).
- Crawling uses crawler4j.
- NLP and sentiment analysis use AlchemyAPI.
- The database is MySQL.
- The GUI uses the Google Visualization API.
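To connect the stored results to the Google Visualization GUI, the Java side can emit a DataTable in the JSON literal form that `google.visualization.DataTable` accepts: `cols` entries with `id`, `label` and `type`, and `rows` made of `c` cell arrays holding `v` values. This sketch assumes one string column (entity) and one numeric column (average sentiment). The class and column names are hypothetical. The minimal escaping handles only quotes and backslashes.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class ChartDataWriter {

    /** Escapes backslashes and double quotes so the value is a valid JSON string literal. */
    static String jsonString(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Builds a two-column DataTable (entity name, average sentiment) in JSON literal form. */
    public static String toDataTableJson(Map<String, Double> averages) {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"cols\":[")
          .append("{\"id\":\"entity\",\"label\":\"Entity\",\"type\":\"string\"},")
          .append("{\"id\":\"sentiment\",\"label\":\"Average sentiment\",\"type\":\"number\"}")
          .append("],\"rows\":[");
        boolean first = true;
        for (Map.Entry<String, Double> e : averages.entrySet()) {
            if (!first) {
                sb.append(',');
            }
            first = false;
            // One row: a string cell for the entity and a number cell for its score.
            sb.append("{\"c\":[{\"v\":").append(jsonString(e.getKey())).append("},")
              .append("{\"v\":").append(String.format(Locale.ROOT, "%.3f", e.getValue()))
              .append("}]}");
        }
        sb.append("]}");
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, Double> averages = new LinkedHashMap<>();
        averages.put("Politician A", 0.125);
        averages.put("Politician B", -0.5);
        System.out.println(toDataTableJson(averages));
    }
}
```

In the browser, the resulting string could be parsed with `JSON.parse` and passed to the `google.visualization.DataTable` constructor before drawing a chart.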

Expected deliverables

What is actually going to be delivered and how it can be used

- I am going to present a specific use case: analyzing ynetnews.com and haaretz.com for political entities and the sentiment expressed toward them.

- Beyond that, this will be a fully functional program, so only slight changes will be needed to generalize it beyond this use case.

Potential intellectual property that could come out of the project

- Integration between several tools
- The algorithm

Competing solutions

Well…

- Currently, no free, open-source tool offers what GSAT does!

Other ways the problem can be solved

- Currently, 30 US-based companies offer paid sentiment analysis. None of them offers the combination of data mining and text analysis for free.

Characterization of the users

Initial group of users and the most general group

- Anyone who wants to know what is being written and thought about entities they are interested in.

- Anyone interested in analyzing trends.

How do you think one could make money from your product?

- The advertising market (campaign evaluation).

- Product comparison (retail companies).
- Trend analysis.
- And many more…