New CHAPTER 6 EXPERIMENTAL EVALUATION -...


CHAPTER 6

EXPERIMENTAL EVALUATION

6.1 EXPERIMENTAL SETUP

The datasets used in the experiments of this chapter are collected from Folksonomy-oriented bookmarking sites. The experiments aim to find the tags that are relevant to a resource so that effective tag recommendations can be provided for it. The Folksonomy datasets are selected on the basis of blog content. Common metrics such as precision, recall and F-measure are discussed briefly, and the results of the experiments are presented. For each test item, tags are extracted from blogs using a keyword extraction method, and interest scores are computed for the keywords to form the input for recommendation. The BibSonomy and Delicious datasets are examples of collections of such extracted tags.

6.2 DATASETS

The training data set consists of tags, titles and categories from Wikipedia and semantic relationships from WordNet. The main objective of the training phase is to construct a topic ontology related to the tags. Existing blog tags are used as the test data set. To evaluate the proposed recommendation approach, datasets have been chosen from two different Folksonomy systems, namely Delicious and BibSonomy. Both are popular social networking systems that have been used widely in research, and both provide services that produce interesting recommendations for the community. These services permit users to describe resources in their own words.

6.2.1 Delicious

Delicious is one of the popular collaborative tagging sites for bookmarking. It permits users to tag blogs and web pages, to bookmark a URL and to share the bookmark with others. The Delicious dataset used here covers a limited period of time. Figure 6.1 shows a snapshot of Delicious.

Figure 6.1 Snapshot of Delicious


Table 6.1 Most Frequent Domains in the Delicious Corpus

S. No.  Domain                      Bookmarks  Users
1       en.wikipedia.org            937,785    305,739
2       www.flickr.com              892,157    262,963
3       www.youtube.com             890,769    256,126
4       www.google.com              772,460    176,890
5       www.nytimes.com             613,676    121,575
6       www.amazon.com              541,314    94,093
7       news.bbc.co.uk              416,878    85,910
8       lifehacker.com              369,078    80,728
9       community.livejournal.com   320,021    39,755
10      www.microsoft.com           310,701    131,847

Table 6.1 shows the most frequent domains in the Delicious corpus. Tags can be added to a user's bookmarks to describe, search, share and classify them. The most recent bookmarks and their corresponding tags are listed on the Delicious front page, and a separate popular page shows the same information for the most popular URLs. A set of 10 tags has been considered for this research work. For these tags, 23,701 URLs are retrieved, and the most popular tags and those that occur with high frequency are obtained. Finally, 201,711 tags are retrieved with 89 % of tags per URL. Since many users tag the same content, it is easy to find the relevant topics. Table 6.2 and Figure 6.2 show the popular URLs in the Delicious corpus.


Table 6.2 Top 10 Popular URLs in the Delicious Corpus

S. No.  URL                Bookmarks
1       www.flickr.com     35,732
2       www.pandora.com    35,531
3       script.aculo.us    31,643
4       www.netvibes.com   30,782
5       en.wikipedia.org   27,672
6       www.youtube.com    26,183
7       slashdot.org       25,630
8       www.last.fm        23,957
9       oswd.org           21,530
10      www.alvit.de       21,130

Figure 6.2 Popular URLs in the Delicious Corpus


6.2.2 BibSonomy

BibSonomy is possibly the best-investigated Folksonomy to date. It allows users to collect and annotate URLs as well as publications. The BibSonomy dataset has been employed for the tag recommendation challenge. Users, resources and tags (keywords) are taken as the dataset; all other data are disregarded. A set of 10 tags has been chosen randomly from the tag list, and the bookmark content has been retrieved for each of these tags together with its relevant tags. Figure 6.3 shows a snapshot of BibSonomy.

Figure 6.3 Snapshot of BibSonomy


6.3 CHOICE OF LANGUAGE FOR IMPLEMENTATION

6.3.1 Java

Java is a simple, portable, object-oriented, distributed, secure, interpreted, robust, architecture-neutral, multithreaded and dynamic programming language. These properties give Java significant advantages over other languages and environments for this programming task and make it a language of choice for implementing worldwide internet solutions.

The IDE used is NetBeans 6.0. The front end is initially designed with the Macromedia Dreamweaver 8 tool. Article and relation extraction is performed using JSP and Core Java.

6.3.2 MS-Access

MS-Access has been used to create multiple relational tables and to store the data. It allows the user to create relationships between similar fields across different tables or queries.

MS-Access is used as the back end for storage and retrieval. User details are saved for authentication purposes and are updated dynamically each time a new user enters. Keywords are extracted from existing blogs and assigned interest scores for the recommendation process. These scores are updated dynamically whenever a new keyword occurs. Tags are suggested on the basis of the most highly activated scores, and the results are represented graphically using MATLAB 7.5.0.342 (R2007b).
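The score bookkeeping described in this section can be sketched as follows. This is only a minimal illustration in Python, not the JSP and MS-Access implementation used in this work. The class name `InterestScores`, the fixed increment of 1.0 per keyword occurrence and the method names are assumptions made for the example.

```python
from collections import defaultdict


class InterestScores:
    """Hypothetical in-memory stand-in for the keyword score table."""

    def __init__(self, increment=1.0):
        # Assumed update rule: each occurrence adds a fixed increment.
        self.increment = increment
        self.scores = defaultdict(float)

    def observe(self, keyword):
        """Update the score whenever a keyword occurs in a blog."""
        self.scores[keyword.lower()] += self.increment

    def suggest(self, k=10):
        """Return the k highest-scored keywords as tag suggestions."""
        # Sort by descending score, then alphabetically to break ties.
        ranked = sorted(self.scores.items(), key=lambda kv: (-kv[1], kv[0]))
        return [tag for tag, _ in ranked[:k]]


store = InterestScores()
for word in ["java", "ontology", "java", "tagging", "ontology", "java"]:
    store.observe(word)
top_two = store.suggest(k=2)
```

In a database-backed setting, `observe` would correspond to an update of the keyword's row and `suggest` to a query ordered by score.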


6.3.3 MATLAB (Matrix Laboratory)

With huge quantities of information circulating on the web, various samples of the data set need to be considered for effective tag recommendation. MATLAB is a simulation tool that can be used for applications with custom graphical interfaces. In this approach, the MATLAB environment allows programs written in Java to be used and supports the development of algorithms and applications to evaluate the performance.

6.4 PERFORMANCE EVALUATION METRICS

The performance of the tag recommendation is evaluated using the following standard metrics.

6.4.1 Precision

In Information Retrieval (IR), Precision is the fraction of retrieved instances that are relevant; it measures the quality of the recommended tags.

\[ \text{Precision} = \frac{|\{\text{relevant tags}\} \cap \{\text{retrieved tags}\}|}{|\{\text{retrieved tags}\}|} \tag{6.1} \]

6.4.2 Recall

In IR, Recall is the fraction of relevant instances that are retrieved; it measures the completeness of the recommended tags.

\[ \text{Recall} = \frac{|\{\text{relevant tags}\} \cap \{\text{retrieved tags}\}|}{|\{\text{relevant tags}\}|} \tag{6.2} \]


6.4.3 F-Measure

F-measure combines recall and precision into one measure and is

defined as

\[ \text{F-Measure} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \tag{6.3} \]

It is also called F1 measure, because precision and recall are

weighted equally.
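As an illustration of Equations (6.1) to (6.3), the three metrics can be computed for a single resource from the set of recommended (retrieved) tags and the set of relevant tags. The sketch below is a minimal Python example. The function name `precision_recall_f1` and the sample tags are hypothetical and are not taken from the experiments.

```python
def precision_recall_f1(recommended, relevant):
    """Compute Precision (6.1), Recall (6.2) and F-Measure (6.3)."""
    recommended, relevant = set(recommended), set(relevant)
    hits = len(recommended & relevant)  # relevant tags that were retrieved
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: 4 recommended tags, 3 relevant tags, 2 in common.
p, r, f = precision_recall_f1(
    recommended=["java", "ontology", "web", "blog"],
    relevant=["java", "ontology", "semantic"],
)
```

Here two of the four recommended tags are relevant, so precision is 2/4, recall is 2/3, and the F-measure is their harmonic mean, 4/7.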

6.5 EVALUATION AND COMPARISON

This approach is validated using data from Delicious and BibSonomy. A sample set of 50,000 blogs has been taken into consideration, of which 50 blogs have been set aside for testing. The training set is used to build the topic ontology in order to recommend tags for the resources in the test set, and the increase in interest for the tags in the test set is computed. Precision, Recall and F-Measure are computed to evaluate the effectiveness of the approach. The greater the precision, the more precise the suggested tags are; the greater the recall, the more probable it is that the user will use the suggested tags. Not all the tags in the test set are recommended. The experimental results demonstrate efficient tag recommendation based on the weights of the tags (the interest scores assigned to the tags) and the semantic relationships in the topic ontology. The approach retrieves the highest-scored tags that are related to the user, and the scores are updated each time a new tag appears. Figure 6.4 presents the interest scores for a number of tags in a blog. It is evident from the figure that the tags used by a large number of users, and hence with an increased interest score, have been identified.

[Plot: interest score (vertical axis, 10 to 100) against number of sample tags (horizontal axis, 1 to 10)]

Figure 6.4 Interest Score for the Tags

The following results are obtained from the test set. Precision and recall of the recommendation results have been computed for both the Delicious and BibSonomy datasets. The tags are then taken from the recommendation list and used to suggest user interest in a particular concept. The interest scores of the topic ontology tags are initialized to one; such a recommendation set represents the condition in which no initial user interest is available. Once the topic ontology is constructed, the spreading activation algorithm is applied to update the interest scores, and the precision and recall values are calculated from the recommended results for comparison with the existing AutoTag approach. The increase in interest scores for the tags, illustrated in Figure 6.4, has been calculated as a percentage. In this setting, recall involves presenting input data to the trained set and receiving its response. Table 6.3 shows the precision and recall calculated for 10 tags of the BibSonomy and Delicious datasets. The results indicate that the proposed topic ontology performs well for tag recommendation on these datasets. When a user posts a bookmark to the system, it recommends a suitable set of tags to the user.
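The details of the spreading activation algorithm are not restated in this chapter, so the following Python sketch shows only a generic form of the technique under stated assumptions: every tag in the ontology starts with an interest score of one, and each activated tag passes a decayed, edge-weighted share of its activation to its neighbours. The graph, the edge weights, the `decay` factor and the number of iterations are all hypothetical.

```python
def spread_activation(graph, seeds, decay=0.5, iterations=2):
    """Generic spreading activation over a weighted tag graph.

    graph: {tag: {neighbour: edge_weight}}
    seeds: tags activated by the user (for example, tags of a new post).
    """
    nodes = set(graph)
    for neighbours in graph.values():
        nodes.update(neighbours)
    # Every ontology tag starts with an interest score of one.
    scores = {tag: 1.0 for tag in nodes}
    frontier = {tag: 1.0 for tag in seeds}  # activation still to propagate
    for _ in range(iterations):
        next_frontier = {}
        for tag, activation in frontier.items():
            for neighbour, weight in graph.get(tag, {}).items():
                passed = activation * weight * decay
                scores[neighbour] += passed
                next_frontier[neighbour] = next_frontier.get(neighbour, 0.0) + passed
        frontier = next_frontier
    return scores


# Hypothetical fragment of a topic ontology.
ontology = {
    "java": {"programming": 1.0, "jsp": 0.5},
    "programming": {"software": 1.0},
}
scores = spread_activation(ontology, seeds=["java"])
```

With these example weights, tags closer to the seed and connected by stronger edges end with higher interest scores, which is the ranking signal used to pick recommended tags.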

Table 6.3 Precision and Recall for both BibSonomy and Delicious Datasets

Number     Precision               Recall
of Tags    BibSonomy   Delicious   BibSonomy   Delicious
1          0.20        0.21        0.124       0.128
2          0.263       0.265       0.163       0.182
3          0.301       0.310       0.217       0.228
4          0.331       0.368       0.261       0.268
5          0.361       0.393       0.310       0.298
6          0.382       0.421       0.347       0.319
7          0.418       0.440       0.385       0.337
8          0.438       0.463       0.409       0.352
9          0.463       0.481       0.428       0.370
10         0.482       0.492       0.449       0.393
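As a worked check of Equation (6.3), F-measure values can be derived from the precision and recall entries of Table 6.3. The sketch below is illustrative Python. The derived values are computed here from the rounded table entries and are not quoted from the thesis, so they may differ slightly from the values in its own F-measure tables.

```python
# Precision and recall copied from Table 6.3 (rows: 1 to 10 tags).
precision = {
    "BibSonomy": [0.20, 0.263, 0.301, 0.331, 0.361, 0.382, 0.418, 0.438, 0.463, 0.482],
    "Delicious": [0.21, 0.265, 0.310, 0.368, 0.393, 0.421, 0.440, 0.463, 0.481, 0.492],
}
recall = {
    "BibSonomy": [0.124, 0.163, 0.217, 0.261, 0.310, 0.347, 0.385, 0.409, 0.428, 0.449],
    "Delicious": [0.128, 0.182, 0.228, 0.268, 0.298, 0.319, 0.337, 0.352, 0.370, 0.393],
}


def f_measure(p, r):
    """Harmonic mean of precision and recall, Equation (6.3)."""
    return 2 * p * r / (p + r) if p + r else 0.0


f1 = {
    name: [f_measure(p, r) for p, r in zip(precision[name], recall[name])]
    for name in precision
}

for name, values in f1.items():
    print(name, [round(v, 3) for v in values])
```

For example, the last BibSonomy row gives 2 × 0.482 × 0.449 / (0.482 + 0.449), which is approximately 0.465.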

6.5.1 Comparison

After the performance of the proposed approach has been evaluated on the data collected from the two Folksonomy systems, it is compared with the existing AutoTag mechanism. A Folksonomy is a social and decentralized approach that is formed by individuals or groups. The existing AutoTag mechanism does not recommend a newly added tag when it is already used in a blog.

Table 6.4 Precision, Recall and F-measure for BibSonomy Datasets

Table 6.4 shows the precision, recall and F-measure on the BibSonomy datasets for both the existing AutoTag mechanism and the proposed Topic Ontology with Spreading Activation algorithm.

Precision is the percentage of correctly recommended tags among all tags recommended by the algorithm. The proposed method does not explicitly focus on frequently used tags, which creates a potential area of improvement. If the system failed to recommend frequent tags with high accuracy, its results could be combined with the results of a system that focuses explicitly on these tags. To test whether such an extension is needed, the results of the system are re-evaluated by considering only the top N tags, with N in the range [1, 10000], sorted by frequency of occurrence in all posts. Posts that contain no tags from the set of most frequent tags are removed from the evaluation. It is important to note that the list of recommended tags is not pruned by removing the low-frequency tags. Although pruning would certainly improve the accuracy of the system, it would defeat one of the purposes of the experiment, which is to determine whether the system needs an additional module to increase the rank of frequently used tags among all recommended tags.
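The re-evaluation procedure can be sketched as follows. This is a minimal Python illustration under assumptions: the function name `precision_on_top_n`, the representation of a post as a pair of tag lists, and the averaging of per-post precision are choices made for the example and are not specified in the text.

```python
from collections import Counter


def precision_on_top_n(posts, n):
    """Mean precision when only the n most frequent tags count as relevant.

    posts: list of (true_tags, recommended_tags) pairs.
    """
    # Rank tags by frequency of occurrence over all posts.
    frequency = Counter(tag for true_tags, _ in posts for tag in true_tags)
    top = {tag for tag, _ in frequency.most_common(n)}
    precisions = []
    for true_tags, recommended in posts:
        relevant = set(true_tags) & top
        if not relevant or not recommended:
            continue  # posts without any top-n tag are removed
        # The recommendation list is deliberately left unpruned.
        hits = len(set(recommended) & relevant)
        precisions.append(hits / len(set(recommended)))
    return sum(precisions) / len(precisions) if precisions else 0.0


# Hypothetical posts: (tags assigned by users, tags recommended).
posts = [
    (["web", "java"], ["web", "java", "code"]),
    (["web", "design"], ["web", "css"]),
    (["rare"], ["rare", "web"]),
]
score_top1 = precision_on_top_n(posts, n=1)
```

In this example "web" is the single most frequent tag, so the third post is dropped and the mean precision is taken over the first two posts, giving (1/3 + 1/2) / 2.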

The results of the experiment show that the system achieves a much higher precision when only the most frequent tags are considered than when it is evaluated on all tags. In most cases the largest improvement is observed for the top few tags. The accuracy of recommendation decreases as the size of the most frequent tag set increases, which is expected, given that less frequent tags are harder to recommend. The same pattern can be observed for user-relevant tags, which shows that spreading activation does not impair the quality of recommendation for high-frequency tags.

[Plot: precision (0.2 to 0.5) against number of recommended tags (1 to 10) for BibSonomy; series: Topic ontology with SA, AutoTag mechanism]

Figure 6.5 Precision for Bibsonomy Datasets


Figures 6.5, 6.6 and 6.7 show the precision, recall and F-measure for the BibSonomy datasets, respectively. Each measure increases gradually as the number of recommended tags increases. The proposed algorithm achieves better performance than the existing AutoTag mechanism on these tags, although it is harder to hit the resources specific to the most popular tags.

[Plot for BibSonomy: vertical axis 0.1 to 0.5, horizontal axis 1 to 10]


Figure 6.6 Recall for BibSonomy Datasets

[Plot: F-Measure for BibSonomy datasets (0.1 to 0.5) against number of recommended tags (1 to 10); series: AutoTag Mechanism, Topic Ontology with SA]

Figure 6.7 F-Measure for Bibsonomy Datasets


Table 6.5 Precision, Recall and F-Measure for Delicious Datasets

Table 6.5 shows the Precision, Recall and F-Measure on the Delicious datasets for both the existing AutoTag mechanism and the proposed Topic Ontology with Spreading Activation algorithm.


[Plot for Delicious: vertical axis 0.2 to 0.5, horizontal axis 1 to 10]


Figure 6.8 Precision for Delicious Datasets

[Plot for Delicious: vertical axis 0.1 to 0.45, horizontal axis 1 to 10]


Figure 6.9 Recall for Delicious Datasets


[Plot: F-Measure for Delicious datasets (0.1 to 0.5) against number of recommended tags (1 to 10); series: AutoTag Mechanism, Topic Ontology with SA]

Figure 6.10 F-Measure for Delicious Datasets

Figures 6.8, 6.9 and 6.10 illustrate the Precision, Recall and F-Measure for the Delicious datasets, respectively. Each measure increases gradually as more tags are recommended. The proposed algorithm achieves better performance than the existing AutoTag mechanism on these tags, although it is much harder to hit the resources specific to the most popular tags. Although the proposed approach identifies the semantics of tags and resources, its way of discovering semantics differs from that of the AutoTag mechanism. The detailed dataset holds the essential metrics and plots, and so it provides better results.


6.6 RESULTS AND DISCUSSION

This research work has calculated the Precision, Recall and F-Measure values for 10 tags. The proposed tag recommendation approach achieves higher precision than the existing AutoTag recommendation approach, and recall grows with the number of recommended tags. The proposed topic ontology with spreading activation is demonstrated experimentally to reach 92.35% of the best promising performance when tags are recommended, which is much higher than the existing approach. Figures 6.11, 6.12 and 6.13 show the performance comparison of Precision, Recall and F-Measure for the BibSonomy datasets.

Figure 6.11 Comparison of Precision for BibSonomy datasets


Figure 6.12 Comparison of Recall for BibSonomy datasets

Figure 6.13 Comparison of F-Measure for BibSonomy datasets


Figures 6.14, 6.15 and 6.16 show the performance comparison of Precision, Recall and F-Measure for the Delicious datasets.

Figure 6.14 Comparison of Precision for Delicious datasets

Figure 6.15 Comparison of Recall for Delicious datasets


Figure 6.16 Comparison of F-Measure for Delicious datasets

6.7 CONCLUSION

In this chapter, experiments demonstrated that tag occurrences can be utilized to present more closely related tag recommendations to users. Experiments conducted on real-world datasets showed that the topic ontology with spreading activation outperforms the existing AutoTag mechanism. The conclusions drawn from the experiments are as follows. The use of a topic ontology in tag recommendation yields a major advantage. The most popular tags attained reasonable Precision, Recall and F-Measure on the Delicious and BibSonomy datasets. The topic ontology with the spreading activation approach yields high Precision, Recall and F-Measure for both Delicious and BibSonomy.
