BlogWall at Kent Ridge MRT Station Janaka Prasad 02/07/2008.

Page 1: BlogWall at Kent Ridge MRT Station Janaka Prasad 02/07/2008.

BlogWall at Kent Ridge MRT Station

Janaka Prasad, 02/07/2008

Page 2: BlogWall at Kent Ridge MRT Station Janaka Prasad 02/07/2008.

Project Plan

Page 3: BlogWall at Kent Ridge MRT Station Janaka Prasad 02/07/2008.

SMS reading and writing to database/ Testing

Task 10 – completed

SMS read by the SMSReader are written to 2 tables, “sms” and “sms_log”

“sms_log” keeps a log of all the SMS received by the system

Status in “sms” indicates which application processes the SMS next
• Status = 0: SMS Processor application
• Status = 1: Display application

Valid in “sms” indicates the validity of the SMS
• Valid = 0: Invalid SMS
• Valid = 1: Valid SMS
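The status/valid hand-over described on this slide can be sketched as follows. This is only an illustration, not the project's actual code: it uses an in-memory SQLite database, the column names are a reading of the schema slide, and the function names (`store_incoming`, `pending_for_processor`, `pending_for_display`, `hand_over`) are hypothetical. It also assumes that new rows start as valid and that the Display application only picks up valid rows, which the slides do not state.

```python
import sqlite3

# Hypothetical minimal schema: only the columns needed to show the routing.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sms (id INTEGER PRIMARY KEY, sms_text TEXT, "
    "status INTEGER DEFAULT 0, valid INTEGER DEFAULT 1)"
)
conn.execute("CREATE TABLE sms_log (id INTEGER PRIMARY KEY, sms_text TEXT)")


def store_incoming(conn, text):
    """Write a received SMS to both tables, as the SMSReader is described to do."""
    conn.execute("INSERT INTO sms_log (sms_text) VALUES (?)", (text,))
    # Assumption: a new SMS starts with status = 0 and valid = 1.
    conn.execute("INSERT INTO sms (sms_text, status, valid) VALUES (?, 0, 1)", (text,))


def pending_for_processor(conn):
    """Rows waiting for the SMS Processor application (status = 0)."""
    return conn.execute(
        "SELECT id, sms_text FROM sms WHERE status = 0 ORDER BY id"
    ).fetchall()


def pending_for_display(conn):
    """Rows handed to the Display application (status = 1); assumed valid only."""
    return conn.execute(
        "SELECT id, sms_text FROM sms WHERE status = 1 AND valid = 1 ORDER BY id"
    ).fetchall()


def hand_over(conn, sms_id, valid):
    """Mark a row as processed and record whether it passed validation."""
    conn.execute(
        "UPDATE sms SET status = 1, valid = ? WHERE id = ?", (int(valid), sms_id)
    )
```

Keeping the routing in a single status column means both applications can poll the same table without talking to each other directly.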

Page 4: BlogWall at Kent Ridge MRT Station Janaka Prasad 02/07/2008.

SMS reading and writing to database/ Testing

Depending on the traffic of the system, the “sms_log” table may grow very rapidly

It is necessary to clear the contents of this table from time to time

Developed a tool to do this task
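The slides do not describe how the clean-up tool works. One possible shape for such a tool, assuming the log table has a timestamp column (called `received_at` here, a hypothetical name) stored as `YYYY-MM-DD HH:MM:SS` text, is to delete rows older than a retention window:

```python
import sqlite3
from datetime import datetime, timedelta


def purge_sms_log(conn, keep_days, now=None):
    """Delete sms_log rows older than keep_days and return how many were removed.

    Assumes a hypothetical 'received_at' column holding 'YYYY-MM-DD HH:MM:SS'
    text, so that string comparison matches chronological order.
    """
    now = now or datetime.now()
    cutoff = (now - timedelta(days=keep_days)).isoformat(sep=" ", timespec="seconds")
    cur = conn.execute("DELETE FROM sms_log WHERE received_at < ?", (cutoff,))
    conn.commit()
    return cur.rowcount
```

A retention window is only one option; clearing the whole table on demand would satisfy the slide equally well.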

Page 5: BlogWall at Kent Ridge MRT Station Janaka Prasad 02/07/2008.

SMS reading and writing to database/ Testing

Page 6: BlogWall at Kent Ridge MRT Station Janaka Prasad 02/07/2008.

Processing SMS from the database/ Testing

Task 12 – completed
• Reading configuration file
• Polling
• Banned words
• Invalid characters
• POSTagger
• Finding poetry

Page 7: BlogWall at Kent Ridge MRT Station Janaka Prasad 02/07/2008.

Reading configuration file

The behaviour of the application is set using the configuration file

Located at Data\settings.cfg

The data read from the config file are:

No. of keywords selected from the SMS • Default 3

No. of synonyms selected for each keyword • Default 1

Connect to Internet to generate synonyms? • 1 YES • 0 NO

Remove banned words? • 1 YES • 0 NO

Maximum length of the SMS • Default 100

Polling enabled? • 1 YES • 0 NO
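The slides list the settings but not the file syntax. As a rough sketch, assuming a simple `key = value` format with `#` comments, the settings could be read as below. The key names and the `Settings` class are hypothetical, and the defaults for the three YES/NO switches are guesses, since the slides only give defaults for the numeric values.

```python
from dataclasses import dataclass


@dataclass
class Settings:
    num_keywords: int = 3          # default given on the slide
    num_synonyms: int = 1          # default given on the slide
    use_internet: bool = True      # assumed default, not stated on the slide
    remove_banned: bool = True     # assumed default, not stated on the slide
    max_sms_length: int = 100      # default given on the slide
    polling_enabled: bool = False  # assumed default, not stated on the slide


def parse_settings(text):
    """Parse 'key = value' lines; unknown keys and malformed lines are ignored."""
    settings = Settings()
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if not hasattr(settings, key):
            continue
        # bool must be tested before int, because bool is a subclass of int.
        if isinstance(getattr(settings, key), bool):
            setattr(settings, key, value == "1")  # 1 = YES, anything else = NO
        else:
            setattr(settings, key, int(value))
    return settings
```

In use, the text would come from the file named on the slide, for example `parse_settings(open(r"Data\settings.cfg").read())`.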

Page 8: BlogWall at Kent Ridge MRT Station Janaka Prasad 02/07/2008.

Polling

Polling data is held in “poll” and “poll_answers” tables

Page 9: BlogWall at Kent Ridge MRT Station Janaka Prasad 02/07/2008.

Polling

Page 10: BlogWall at Kent Ridge MRT Station Janaka Prasad 02/07/2008.

Polling

Page 11: BlogWall at Kent Ridge MRT Station Janaka Prasad 02/07/2008.

Banned words

If the SMS contains banned words, we cannot display it

The “swearwords” table holds all the banned words

When the system initializes, all the words in that table are loaded into a list to compare against the words in the SMS
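A minimal sketch of this check, assuming the rows come from a query such as `SELECT word FROM swearwords` (the column name is a guess) and that matching is case-insensitive on whole words, which the slides do not specify. A set is used instead of a list so that each lookup is constant time.

```python
import re


def load_banned_words(rows):
    """Build the in-memory lookup from rows of one-column tuples."""
    return {word.strip().lower() for (word,) in rows if word and word.strip()}


def contains_banned(sms_text, banned):
    """True if any whole word of the SMS is in the banned set."""
    words = re.findall(r"[a-z']+", sms_text.lower())
    return any(word in banned for word in words)
```

Whole-word matching avoids flagging innocent words that merely contain a banned word as a substring.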

Page 12: BlogWall at Kent Ridge MRT Station Janaka Prasad 02/07/2008.

Invalid characters

If the SMS contains characters that the POSTagger cannot process, the POSTagger will generate an error

All the chars that can be processed by the system are loaded into a list when the system initializes

The SMS is checked to see whether it contains any chars other than those in the list
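The whitelist check can be sketched as below. The actual set of characters the POSTagger accepts is not given in the slides, so `ALLOWED` here is a placeholder assumption.

```python
import string

# Placeholder whitelist; the real list is loaded by the system at start-up.
ALLOWED = set(string.ascii_letters + string.digits + " .,!?'-")


def invalid_characters(sms_text, allowed=ALLOWED):
    """Return the sorted distinct characters of the SMS that are not allowed."""
    return sorted({ch for ch in sms_text if ch not in allowed})
```

An empty result means the SMS can be passed to the POSTagger; otherwise the SMS would be marked with valid = 0.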

Page 13: BlogWall at Kent Ridge MRT Station Janaka Prasad 02/07/2008.

Finding the poetry

The application runs in an infinite loop and checks the “sms” table for entries with status == 0

Check the length of the SMS • If > max length, error

Check whether the SMS is a poll answer • Update the poll answers table

Check for invalid chars • Set valid = 0

Process the message with POSTagger • If the returned text length == 0, error

Page 14: BlogWall at Kent Ridge MRT Station Janaka Prasad 02/07/2008.

Finding the poetry

Check the no. of words in the SMS • Fewer than 3 unique words: valid = 0

Check the length of each word in the SMS • If > 40, possible malicious attack: valid = 0

Check for banned words • Error

Calculate the emotional weight of the SMS

Identify the tag ids in the output string generated by POSTagger

Retrieve the tf-idf weight of each word from the database

Select the maximum weighted words

Store the results in “sms_text_word”
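The purely textual checks from the last two slides can be combined into one function, sketched below. The poll-answer and POSTagger steps are left out because they depend on components not shown in the slides, and the sketch treats every failure the same way, although the slides distinguish between "error" and "valid = 0". The function name and the reason strings are hypothetical.

```python
def validate_sms(text, max_length=100, allowed=None, banned=frozenset()):
    """Run the checks in slide order; return (is_valid, reason)."""
    if len(text) > max_length:
        return False, "too long"
    if allowed is not None and any(ch not in allowed for ch in text):
        return False, "invalid characters"
    words = text.lower().split()
    if len(set(words)) < 3:
        return False, "fewer than 3 unique words"
    if any(len(word) > 40 for word in words):
        return False, "word longer than 40 characters"
    if any(word in banned for word in words):
        return False, "banned word"
    return True, "ok"
```

Running the cheap checks first means that oversized or malformed messages are rejected before any tagging or database lookups are needed.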

Page 15: BlogWall at Kent Ridge MRT Station Janaka Prasad 02/07/2008.

Finding the poetry

keywords
PK id
word
idf

sms
PK id
sms_text
status
weight_x
weight_y
valid

sms_text_word
PK, FK2 id
FK1 sms_id
sms_word
word_weight
word_type
word_selected

sms_synonym
FK1 sms_word_id
FK2 keyword_id
selected
synonym
word_weight

sms_poem_line
PK id
tf
weight_x
weight_y
final
poem_line

pos_tags
PK id
type
description

Page 16: BlogWall at Kent Ridge MRT Station Janaka Prasad 02/07/2008.

Finding the poetry

Post each of the selected words to the Free Dictionary website (http://www.thefreedictionary.com)

Analyze the HTML response from the website to find the synonyms for each word

Retrieve the tf-idf weight of each synonym from the database

Select the maximum weighted synonyms

Store the result in the “sms_synonym” table
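The "select maximum weighted" step, used for both keywords and synonyms, can be sketched as a top-N selection over the stored tf-idf weights. The function name and the in-memory `weights` dictionary are hypothetical stand-ins for the database lookup, and dropping candidates that have no stored weight is an assumption.

```python
def select_top_weighted(candidates, weights, count):
    """Return up to `count` distinct candidates with the highest weights.

    Candidates without a stored weight are skipped; ties keep input order.
    """
    known = [word for word in dict.fromkeys(candidates) if word in weights]
    return sorted(known, key=lambda word: weights[word], reverse=True)[:count]
```

With the defaults from the configuration slide, `count` would be 3 when selecting keywords and 1 when selecting synonyms for each keyword.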

Page 17: BlogWall at Kent Ridge MRT Station Janaka Prasad 02/07/2008.

Finding the poetry

Find poetry lines in the database where the selected synonym is used in the same context as in the SMS

Select the final poetry line, which maximizes the tf weight and minimizes the emotional weight difference to the user's SMS
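The slides do not say how the two criteria are combined. One simple possibility, shown purely as an assumption, is a single score that rewards tf and subtracts the Euclidean distance between the emotional weights (weight_x, weight_y) of the poem line and of the SMS. The `PoemLine` type, the `penalty` parameter, and the scoring formula are all hypothetical.

```python
import math
from typing import NamedTuple


class PoemLine(NamedTuple):
    text: str
    tf: float
    weight_x: float
    weight_y: float


def pick_final_line(lines, sms_x, sms_y, penalty=1.0):
    """Pick the line with the best trade-off of high tf and emotional closeness."""

    def score(line):
        distance = math.hypot(line.weight_x - sms_x, line.weight_y - sms_y)
        return line.tf - penalty * distance

    return max(lines, key=score, default=None)
```

Setting `penalty` to 0 reduces this to choosing the line with the highest tf; larger values favour lines whose emotional weight is closer to that of the SMS.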

Page 18: BlogWall at Kent Ridge MRT Station Janaka Prasad 02/07/2008.

Data entry

234 poems added to the database

Page 19: BlogWall at Kent Ridge MRT Station Janaka Prasad 02/07/2008.

Important points …

Testing is still in progress

Shinsuke will come up with the first visuals by this week • OpenGL, FreeType