Discovering Important Bloggers based on Analyzing Blog Threads by Nakajima et al

Post on 14-Jan-2016





Thomas van der Elsen, Richard Lawrence, Jumi Oladimeji, Alastair Smith

Introduction

People increasingly publish their reactions to public events using a blog

  A tool that enables this info to be published quickly

  A journal that is available on the web

Need for effective data-mining techniques specific to blogs and similar tools (e.g. the Semantic Web)

“Our goal is to develop a method of capturing hot conversations by automating readers’ processes for characterizing and monitoring blogs.”

Overview

Data-mining techniques

  Creation of blog link structure

  Analysing link structure

Types of important bloggers

  Agitators

  Summarisers

Applications, analysis and conclusions

  Real-world applications and extensions

  Pros and cons of the paper

Crawling blogs

Extracting hyperlinks

Extracting blog threads

Crawling blogs

System crawls through the RSS list, registering for each entry:

  Title

  Permalink

  List entry date

Aggregator: gathers RSS feeds from multiple sources and organises them

OPML: file format used to share RSS feed lists

RSS: A format for distributing content on the web
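The registration step described above can be sketched roughly as follows. This is a minimal illustration, not the system from the paper: it assumes RSS 2.0 feeds, uses only Python's standard library, and parses an inline sample feed instead of downloading anything. The function name `register_entries`, the sample URLs, and the dictionary keys are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed. A real crawler would download every feed
# named in the RSS list instead of using an inline string.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example blog</title>
    <item>
      <title>First post</title>
      <link>http://blog.example/2005/01/first</link>
      <pubDate>Mon, 03 Jan 2005 10:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Second post</title>
      <link>http://blog.example/2005/01/second</link>
      <pubDate>Tue, 04 Jan 2005 09:30:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def register_entries(rss_text):
    """Return the title, permalink and date of every <item> in an RSS 2.0 feed."""
    root = ET.fromstring(rss_text)
    entries = []
    for item in root.iter("item"):
        entries.append({
            "title": item.findtext("title", default=""),
            # Assumption: the <link> element is used as the permalink.
            "permalink": item.findtext("link", default=""),
            "date": item.findtext("pubDate", default=""),
        })
    return entries


entries = register_entries(SAMPLE_RSS)
for entry in entries:
    print(entry["date"], entry["permalink"], entry["title"])
```

Feeds in other formats (for example Atom) use different element names, so a fuller crawler would need one such extractor per format.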


[Diagram labels: RSS list, RSS feeds]


Extracting hyperlinks

Problem: Different tag structures per server

[Diagram labels: RSS feed from list, Blog entries, Hyperlink list]
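A rough sketch of turning one blog entry into a hyperlink list is shown below. It sidesteps the "different tag structures per server" problem by collecting every anchor in the HTML it is given, so it assumes the entry body has already been isolated from the rest of the page. The class `LinkCollector`, the function `extract_hyperlinks`, and the sample URLs are hypothetical names for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the entry's permalink.
                self.links.append(urljoin(self.base_url, href))


def extract_hyperlinks(entry_html, permalink):
    """Return (departure URL, destination URL) pairs for one blog entry."""
    collector = LinkCollector(permalink)
    collector.feed(entry_html)
    return [(permalink, destination) for destination in collector.links]


# Hypothetical entry body; real servers wrap it in differing tag structures.
sample_html = (
    '<div class="post"><p>See <a href="http://other.example/2005/01/reply">'
    'this reply</a> and the <a href="/about">about page</a>.</p></div>'
)
for pair in extract_hyperlinks(sample_html, "http://blog.example/2005/01/first"):
    print(pair)
```

Handling each server's markup would mean adding a per-server rule for locating the entry body before this step runs.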

Extracting blog threads

[Flowchart; recoverable labels in slide order:]

Hyperlink

If sourceLink / If replyLink

Check links exist in thread data

Check departure URL exists in thread data

Check destination URL points to entry on list

Add dest entry to thread

Add destination entry to entry list and add to thread

Add departure entry to thread

0 / 1 (branch labels)

Create new thread
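The exact branching of the flowchart is not recoverable from the transcript, so the following is only one simplified reading of it, not the procedure from the paper. It assumes that a replyLink is a hyperlink whose destination is a known blog entry, that any other hyperlink is a sourceLink and is skipped, and that entries joined by replyLinks belong to the same thread. The function name `extract_threads` and the input shapes are hypothetical.

```python
def extract_threads(known_entries, hyperlinks):
    """Group entries into threads by following replyLinks.

    known_entries: set of permalinks of registered blog entries.
    hyperlinks: iterable of (departure URL, destination URL) pairs,
                where each departure URL is a registered entry.
    """
    thread_of = {}  # permalink -> thread id
    threads = {}    # thread id -> set of permalinks
    next_id = 0
    for departure, destination in hyperlinks:
        if destination not in known_entries:
            continue  # assumption: not a blog entry, so treated as a sourceLink
        dep_thread = thread_of.get(departure)
        dst_thread = thread_of.get(destination)
        if dep_thread is None and dst_thread is None:
            # Neither entry is in the thread data yet: create a new thread.
            threads[next_id] = {departure, destination}
            thread_of[departure] = thread_of[destination] = next_id
            next_id += 1
        elif dep_thread is None:
            # Add the departure entry to the destination's thread.
            threads[dst_thread].add(departure)
            thread_of[departure] = dst_thread
        elif dst_thread is None:
            # Add the destination entry to the departure's thread.
            threads[dep_thread].add(destination)
            thread_of[destination] = dep_thread
        elif dep_thread != dst_thread:
            # The link joins two existing threads: merge them.
            for entry in threads[dst_thread]:
                thread_of[entry] = dep_thread
            threads[dep_thread] |= threads.pop(dst_thread)
    return list(threads.values())


known = {"a", "b", "c", "d", "e"}
links = [("b", "a"), ("c", "b"), ("e", "d"), ("a", "http://news.example/story")]
print(extract_threads(known, links))
```

Merging two threads when a link connects them is a design choice of this sketch; the slide does not say how that case is handled.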


Example Results

Agitators

Summarisers

Joe Bloggs

Agitators

Discussion stimulator

Threads often grow after an agitator’s entry

Three discriminants for an agitator:

  Link (Agi1)

  Popularity (Agi2)

  Topic (Agi3)

The three discriminants can be weighted using the following formula:

Link-based discriminant

$e_x$ is an agitator if

$k_x > \theta_1$

$e_x$ = a blog entry

$k_x$ = number of entries in thread$_i$ with a replyLink to $e_x$
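The link-based test amounts to counting incoming replyLinks and comparing the count with a threshold. A minimal sketch follows, assuming a thread is represented as a list of (source entry, destination entry) replyLink pairs; that representation, the function names, and the threshold value used in the example are illustrative assumptions.

```python
def count_reply_links_to(entry, thread_links):
    """k_x: number of distinct entries in the thread with a replyLink to `entry`."""
    return len({src for src, dst in thread_links if dst == entry and src != entry})


def is_agitator_by_links(entry, thread_links, theta1):
    """Link-based discriminant: True when k_x exceeds the threshold theta1."""
    return count_reply_links_to(entry, thread_links) > theta1


# Hypothetical thread: entries b, c and d all reply to a; d also replies to b.
reply_links = [("b", "a"), ("c", "a"), ("d", "a"), ("d", "b")]
print(is_agitator_by_links("a", reply_links, theta1=2))
print(is_agitator_by_links("b", reply_links, theta1=2))
```

Counting distinct source entries, instead of raw links, means an entry that links to the same post twice is counted once; the slide's wording ("number of entries") suggests this, but it remains an interpretation.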

Popularity-based discriminant

$e_x$ is an agitator if

$l_x / m_x > \theta_2$

$e_x$ = a blog entry

$l_x$ = number of entries in thread$_i$ published t days after $e_x$

$m_x$ = number of entries in thread$_i$ published t days before $e_x$
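The popularity ratio can be sketched as below. Two points are assumptions of this sketch and are not stated on the slide: "t days after/before" is read as "within a window of t days after/before the entry", and a thread with no earlier entries in the window (m_x = 0) is given an infinite ratio when at least one later entry exists. The function names and sample dates are hypothetical.

```python
from datetime import datetime, timedelta


def popularity_ratio(entry_date, other_dates, t_days):
    """Return l_x / m_x for an entry, given the dates of the thread's other entries."""
    window = timedelta(days=t_days)
    # l_x: entries published within t days after the entry.
    l_x = sum(1 for d in other_dates if entry_date < d <= entry_date + window)
    # m_x: entries published within t days before the entry.
    m_x = sum(1 for d in other_dates if entry_date - window <= d < entry_date)
    if m_x == 0:
        # Assumption: avoid division by zero by treating the ratio as unbounded.
        return float("inf") if l_x > 0 else 0.0
    return l_x / m_x


def is_agitator_by_popularity(entry_date, other_dates, t_days, theta2):
    """Popularity-based discriminant: True when l_x / m_x exceeds theta2."""
    return popularity_ratio(entry_date, other_dates, t_days) > theta2


entry_date = datetime(2005, 1, 10)
other_dates = [
    datetime(2005, 1, 8),   # before the entry
    datetime(2005, 1, 11),  # after
    datetime(2005, 1, 12),  # after
    datetime(2005, 1, 13),  # after, on the edge of a 3-day window
    datetime(2005, 1, 20),  # outside the window
]
print(popularity_ratio(entry_date, other_dates, t_days=3))
```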

Topic-based discriminant

$e_x$ is an agitator if

$e_x$ = a blog entry

$n$ = number of entries

Summarizers

Publish entries that collate and compact previous posts

Provide a convenient way of digesting an entire thread

The discriminant for summarizers is link-based:

$e_x$ is a summarizer if

$p_x > \theta_4$

$e_x$ = a blog entry

$p_x$ = number of entries in thread$_i$ that have a replyLink from $e_x$
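The summarizer test mirrors the link-based agitator test but counts outgoing replyLinks. A minimal sketch, again assuming a thread is given as a list of (source entry, destination entry) replyLink pairs; the function names and the threshold value in the example are illustrative assumptions.

```python
def count_reply_links_from(entry, thread_links):
    """p_x: number of distinct entries in the thread that `entry` links to."""
    return len({dst for src, dst in thread_links if src == entry and dst != entry})


def is_summarizer(entry, thread_links, theta4):
    """Link-based discriminant for summarizers: True when p_x exceeds theta4."""
    return count_reply_links_from(entry, thread_links) > theta4


# Hypothetical thread: entry s links back to a, b and c; b links to a.
summary_links = [("s", "a"), ("s", "b"), ("s", "c"), ("b", "a")]
print(is_summarizer("s", summary_links, theta4=2))
print(is_summarizer("b", summary_links, theta4=2))
```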

Applications

Pros and Cons

Conclusions

Applications

Supplementary info, e.g. TV, news sites, etc.

Home and Away – who shot Josh West? (Agitator)

Sports, etc. – used by studios and media to highlight points of interest in a match (Summariser)

Analysis – Pros

Basis for future research – a brief intro to the subject

Multiple thread analysis

Identification of areas of bloggers’ expertise

Highly effective in certain specific areas

News and reviews

Implementation of theory (feature vector)

Analysis – Cons

Only 25 sites used in sample (but 1000s of blogs)

Does not take context into consideration

E.g., an agitator may be posting offensive entries

No measurement of summary success

Comments are not analysed

Inappropriate for certain areas

MySpace, Bebo, et al. (due to target audience)

Conclusions

Created a data-mining framework for future research

May instigate research into further work

Nice idea and potentially useful but needs to be extended

Thank you for your time