
2011 3rd International Conference on Electronics Computer Technology (ICECT), Kanyakumari, India, 8-10 April 2011

Generation of Ontology Based User Profiles for Personalized Web Search

Jayanthi J.
Assistant Professor, Dept. of Computer Science
Sona College of Technology
Salem, India

Dr. K. S. Jayakumar
Associate Professor, Dept. of Mechanical Engineering
SSN College of Engineering
Chennai, India

Sruthi Surendran
Dept. of Computer Science
Sona College of Technology
Salem, India

Abstract— As the number of Internet users and accessible Web pages grows, it is becoming increasingly difficult for users to find documents that are relevant to their particular needs. Users must either browse through a large hierarchy of concepts to find the information they are looking for, or submit a query to a publicly available search engine and wade through hundreds of results, most of them irrelevant. The core problem is that, whether the user is browsing or searching and whether they are a novice or an expert, the same results are retrieved and presented to everyone. The goal of Web search personalization is to tailor search results to a particular user based on that user’s interests and preferences, allowing more efficient and relevant information access. Personalization of information retrieval involves two major challenges: identifying the user context and organizing it in a way that improves search precision. This motivates representing user profiles as a hierarchical structure, namely an ontology. The main goal of this paper is to propose a technique that implicitly builds ontology-based user profiles. We build and update profiles dynamically by monitoring and storing the user’s browsing habits.

Keywords- Personalization, information retrieval, ontology, precision, user profiles.

I. INTRODUCTION

The huge amount of information available on the Internet is widely shared, primarily due to the ability of Web search engines to find useful information for users. However, present-day search engines are far from perfect. They return results based on simple keyword matches, without regard for the user’s information needs during the search process. The query “Mercury” returns the same results to people looking for the planet, the chemical element, or the figure from mythology, even though each is searching in a different context. Similarly, the query “space” returns the same results to a person searching for industrial or residential space and to a person searching for space stations. Search engines lack a personalization mechanism that would understand the user’s information needs at a particular moment and return custom results.

Personalization broadly involves gathering user-specific information during interaction with the user, which is then used to deliver appropriate content and services tailored to the user’s needs [18].

When applied to search, personalization involves the following steps [12]:

• Collecting and representing information about the user, to understand the user’s interests.

• Using this information either to filter or re-rank the results returned from the initial retrieval process, or to include it directly in the search process itself to select personalized results.

Thus, the problem of search engine personalization has two broad dimensions:

• How can accurate information about the user’s interests be collected and represented with minimal user intervention?

• How can this information about the user be used to deliver personalized search results?

In this paper we analyze various approaches to personalizing Web search engines using ontology-based contextual user profiles. An accurate representation of a user’s interests, generally stored in some form of user profile, is crucial to the performance of personalized search or browsing agents. The issues to be addressed are how to build an accurate profile, in particular how to distinguish major from minor concepts of interest to the user, and how to represent the profile. Profiles may be built explicitly, by asking users questions, or implicitly, by observing their activity. Because user interests change over time, we focus on implicit methods for constructing the profile, which have the potential to adapt as those interests change. User profiles are often represented as keyword or concept vectors; we instead build a weighted concept hierarchy. The main objective of our research is to create an accurate, ontology-based user profile without user interaction. The algorithms we evaluate here construct an ontology-based user profile, rank the concepts in the profile by their importance and, ultimately, improve the profile by removing unimportant concepts.

The adaptive Web alleviates the burden of information overload by tailoring the information presented to an individual user’s needs. One of the key factors for accurate personalized information access is the user context: access to the user context makes it possible to infer the user’s interests. A system that does not know who is asking for information, and for what purpose, will never be able to provide more than very general answers.

PG Student


978-1-4244-8679-3/11/$26.00 ©2011 IEEE


While many factors may contribute to the delineation of the user context, here we consider three essential elements that collectively play a critical role in personalized Web information access. These three independent but related elements are the user’s short-term information need, such as a query or the localized context of the current activity; semantic knowledge about the domain being investigated; and the user’s profile, which captures long-term interests. Since users’ interests change over time, we focus on implicit methods for incrementally creating an ontological representation of user profiles. Annotations such as an interest score have proven successful for evaluating personal ontologies. After each concept is annotated with a weight based on an accumulated similarity score, a user profile is created consisting of all concepts with nonzero weights.
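The weighting step just described can be illustrated with a minimal Python sketch. It is not the authors’ implementation: it assumes that both concepts and visited documents are bag-of-words term-count vectors and that the “similarity score” is cosine similarity. The names `cosine` and `update_profile`, and the toy concept vectors, are hypothetical.

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def update_profile(weights: dict, concepts: dict, document: Counter) -> dict:
    """Accumulate each concept's similarity to a visited document into its weight."""
    weights = dict(weights)  # work on a copy so the caller's dict is untouched
    for name, vector in concepts.items():
        weights[name] = weights.get(name, 0.0) + cosine(vector, document)
    # The profile keeps only the concepts with a nonzero accumulated weight.
    return {name: w for name, w in weights.items() if w > 0}

concepts = {  # hypothetical concept vectors
    "astronomy": Counter({"planet": 3, "orbit": 2, "telescope": 1}),
    "chemistry": Counter({"element": 3, "metal": 2, "atom": 1}),
}
profile = update_profile({}, concepts, Counter({"planet": 2, "orbit": 1}))
print(profile)  # only "astronomy" has a nonzero weight
```

Calling `update_profile` once per visited page, feeding back the previous result, gives the incremental accumulation the text describes.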

An ontology is an explicit specification of concepts and the relationships that can exist between them. One increasingly popular method of mediating information access is through the use of ontologies. Researchers have attempted to use ontologies to improve navigation effectiveness as well as personalized Web search and browsing, particularly when combined with the automatic generation of semantically enriched, ontology-based user profiles. Since semantic knowledge is an essential part of the user context, we use a domain ontology as the fundamental source of semantic knowledge in our framework. An ontological approach to user profiling has proven successful in addressing the cold-start problem in recommender systems, where no initial information is available on which to base recommendations. When first learning user interests, systems perform poorly until enough information has been collected for user profiling. Using an ontology as the basis of the profile allows the initial user behaviour to be matched with existing concepts in the domain ontology and the relationships between these concepts. Our experimental results show that the rate of increase in interest scores stabilizes over incremental updates. Initially, the interest scores for the concepts in the profile continue to change; once enough information has been processed for profiling, the amount of change in interest scores should decrease, and the scores can then be used to represent long-term interests. In recent years, personalized search has attracted interest in the research community as a means to reduce search ambiguity and return results that are more likely to interest a particular user, thus providing more effective and efficient information access.
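One simple way to check that interest scores stabilize over incremental updates is to measure how much the scores move between successive profile snapshots. The sketch below assumes a relative L1 change and a hypothetical tolerance of 0.05; neither the measure nor the threshold comes from the paper, and the snapshot values are invented.

```python
def relative_change(old: dict, new: dict) -> float:
    """Total absolute change in interest scores, relative to the new total."""
    concepts = set(old) | set(new)
    delta = sum(abs(new.get(c, 0.0) - old.get(c, 0.0)) for c in concepts)
    total = sum(abs(v) for v in new.values())
    return delta / total if total else 0.0

def is_stable(history: list, tolerance: float = 0.05) -> bool:
    """True when the latest incremental update moved scores by less than `tolerance`."""
    if len(history) < 2:
        return False
    return relative_change(history[-2], history[-1]) < tolerance

snapshots = [  # hypothetical profile snapshots after successive updates
    {"web mining": 1.0},
    {"web mining": 2.0, "games": 1.0},
    {"web mining": 2.05, "games": 1.02},
]
print(is_stable(snapshots[:2]), is_stable(snapshots))  # False True
```

Once `is_stable` holds for several consecutive updates, the scores could be treated as long-term interests.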

Our experimental results also show that re-ranking the search results based on the interest scores and the semantic evidence in an ontological user profile successfully provides the user with a personalized view of the search results, bringing results closer to the top when they are most relevant to the user.
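A re-ranking step of this kind might look like the following sketch. It assumes each result carries its original engine rank and the ontology concepts it was classified under. It then blends a reciprocal-rank score with the user’s normalized interest through a hypothetical mixing weight `beta`. The paper does not give this formula, and the URLs are placeholders.

```python
from dataclasses import dataclass

@dataclass
class Result:
    url: str
    rank: int        # 1-based position in the engine's original ranking
    concepts: list   # ontology concepts the result was classified under

def rerank(results, profile, beta=0.5):
    """Sort results by a blend of original rank and profile interest (illustrative)."""
    max_interest = max(profile.values(), default=0.0) or 1.0

    def score(r):
        original = 1.0 / r.rank
        interest = max((profile.get(c, 0.0) for c in r.concepts), default=0.0)
        return (1 - beta) * original + beta * interest / max_interest

    return sorted(results, key=score, reverse=True)

profile = {"astronomy": 3.0, "chemistry": 0.5}  # hypothetical interest scores
results = [
    Result("https://example.org/mercury-element", 1, ["chemistry"]),
    Result("https://example.org/mercury-planet", 2, ["astronomy"]),
]
print([r.url for r in rerank(results, profile)])  # the planet page moves to the top
```

With `beta=0` the original engine order is kept. Larger values let the profile override it.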

II. RELATED WORK

A. Ontology

An ontology is a formal representation of knowledge as a set of concepts within a domain and the relationships between those concepts [14]. It has been defined as a “formal, explicit specification of a shared conceptualization” [13]. The basic building blocks of ontology design include:

• Classes or concepts.

• Properties of each concept describing various features and attributes of the concept, called slots (sometimes called roles or properties).

• Restrictions on slots, called facets (sometimes called role restrictions).
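The three building blocks can be made concrete with Python dataclasses, where a facet restricts a slot’s value type and cardinality. The sketch below is an illustrative data structure under these assumptions and is not a representation taken from [14]. The class names and example concepts are hypothetical, and it does not model slot inheritance from parent classes.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Facet:
    """A restriction on a slot: allowed value type and cardinality bounds."""
    value_type: type
    min_cardinality: int = 0
    max_cardinality: Optional[int] = None  # None means unbounded

@dataclass
class Slot:
    """A property (role) describing a feature or attribute of a concept."""
    name: str
    facet: Facet

@dataclass
class Concept:
    """A class in the ontology, with its own slots and a more general parent."""
    name: str
    parent: Optional["Concept"] = None
    slots: list = field(default_factory=list)

    def validate(self, slot_name: str, values: list) -> bool:
        """Check a list of slot values against that slot's facet."""
        for slot in self.slots:
            if slot.name == slot_name:
                f = slot.facet
                if len(values) < f.min_cardinality:
                    return False
                if f.max_cardinality is not None and len(values) > f.max_cardinality:
                    return False
                return all(isinstance(v, f.value_type) for v in values)
        return False  # unknown slot

publication = Concept("Publication", slots=[Slot("title", Facet(str, 1, 1))])
paper = Concept("ConferencePaper", parent=publication,
                slots=[Slot("pages", Facet(int, 0, 2))])
print(publication.validate("title", ["Ontology Development 101"]))  # True
```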

Much research has been done, and is still ongoing, in this field. One such system is OntoSeek [4], which is designed for content-based information retrieval from online yellow pages and product catalogs. OntoSeek uses simple conceptual graphs to represent queries and resource descriptions. The system uses the Sensus ontology [5], which comprises a simple taxonomic structure of approximately 70,000 nodes. The system presented in [6] uses Yahoo! as an ontology: it semantically annotates Web pages using Yahoo! categories as descriptors of their content. The system uses Telltale [3] as its classifier. Telltale computes the similarity between documents using n-grams as index terms. The ontologies used in the above examples use simple structured links between concepts. A richer and more powerful representation is provided by SHOE [21], a set of Simple HTML Ontology Extensions that allow WWW authors to annotate their pages with semantic content expressed in terms of an ontology.
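The n-gram idea attributed to Telltale can be illustrated with a simplified sketch. Each document becomes a vector of overlapping character n-gram counts, and documents are compared with cosine similarity. This simplification rests on several assumptions and does not reproduce Telltale’s actual weighting or indexing. The choice of n = 3 is arbitrary, and the function names are hypothetical.

```python
import math
from collections import Counter

def ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams in lower-cased, whitespace-normalized text."""
    text = " ".join(text.lower().split())
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def ngram_similarity(doc_a: str, doc_b: str, n: int = 3) -> float:
    """Cosine similarity between the n-gram count vectors of two documents."""
    a, b = ngrams(doc_a, n), ngrams(doc_b, n)
    dot = sum(count * b[g] for g, count in a.items())  # missing n-grams count as 0
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

close = ngram_similarity("personalized web search", "personalised web searching")
far = ngram_similarity("personalized web search", "mechanical engineering")
print(close > far)  # True
```

Character n-grams make the comparison tolerant of spelling variants and inflections, such as “personalized” vs. “personalised”, without a stemmer.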

B. Personalization

Personalization is a broad field of active research. Applications include personalized access to online information, such as personalized “portals” to the Web, filtering/rating systems for electronic newspapers [2], Usenet news filtering, and recommendation services for browsing, navigation, and search. Usenet news filtering systems include PSUN [10] and NewT [9], while SiteIF [11] aims to provide personalized search and navigation support. Many personalization projects have focused on navigation. Syskill & Webert [19] recommends interesting Web pages using explicit feedback: if the user rates some links on a page, Syskill & Webert can recommend other links on the page in which they might be interested. In addition, the system can construct a Lycos query and retrieve pages that might match a user’s interests. Personal Web Watcher [20] is designed for an individual user. It “watches over the user’s shoulder,” but it avoids involving the user in its learning process, since it does not ask the user for keywords or opinions about pages.

Similar to our work, [15] uses concept hierarchies for user profiles. In contrast, however, these hierarchies are quite small (40-600 nodes), and weight adjustments are made using data that explicitly describes document contents. It is doubtful that hand-made hierarchical content annotation of data will be done on a large scale. In order to build a user profile, some source of information about the user must be collected. Commercial systems such as My Yahoo explicitly ask users to provide information about themselves, which is simply stored to create a profile.


Explicit profile creation is not recommended because it places an additional burden on the user, the user may not accurately report their own interests, and the profile remains static whereas the user’s interests may change over time. Thus, most recent projects use implicit profile creation based on observations of the user’s actions. [Chan 00] describes the types of information available. His model considers the frequency of visits to a page, the amount of time spent on the page, how recently the page was visited, and whether or not the page was bookmarked. Similar to our research, Letizia [7, 8], Personal Web Watcher [20], and WBI [1] use the user’s surfing behavior to create user profiles. Our user profiling technique differs from other approaches in its focus on automatically creating user profiles based on ontologies. In our use of ontologies, we overlap with initiatives aimed at creating a Semantic Web.
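The four signals in Chan’s model can be combined into a single page-interest score in many ways. The sketch below uses a hypothetical weighted sum in which each signal is first squashed into [0, 1]. The weights, the saturation constants (5 visits, 120 seconds) and the 7-day recency half-life are illustrative assumptions, not values from [Chan 00].

```python
import math
from dataclasses import dataclass

@dataclass
class PageVisits:
    visits: int              # how often the page was visited
    seconds: float           # total time spent on the page
    days_since_last: float   # how recently the page was visited
    bookmarked: bool         # whether the page was bookmarked

def interest(p: PageVisits, w_freq=0.3, w_time=0.3, w_recency=0.2,
             w_bookmark=0.2, half_life_days=7.0) -> float:
    """Weighted sum of four signals, each scaled to [0, 1]; all constants are illustrative."""
    freq = 1 - math.exp(-p.visits / 5)       # saturates for frequently visited pages
    dwell = 1 - math.exp(-p.seconds / 120)   # saturates after a few minutes
    recency = 0.5 ** (p.days_since_last / half_life_days)  # halves every half-life
    bookmark = 1.0 if p.bookmarked else 0.0
    return w_freq * freq + w_time * dwell + w_recency * recency + w_bookmark * bookmark

hot = PageVisits(visits=10, seconds=600, days_since_last=0, bookmarked=True)
cold = PageVisits(visits=1, seconds=10, days_since_last=60, bookmarked=False)
print(interest(hot) > interest(cold))  # True
```

Because the default weights sum to 1, the score stays in [0, 1] and can be compared across pages.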

III. PROPOSED SYSTEM

A. System Description

The proposal presented in this paper includes several stages of data processing to automatically generate ontology-based user profiles from documents previously selected from the user’s browsing history. The websites the user browses are spidered and classified using a reference ontology derived from search engines such as Google, or from any other hierarchically arranged website. Here, all sites are browsed using the user’s own ontology rather than a system-supplied reference ontology. To create a personal ontology, several factors are considered: the terms, term frequency, URLs visited, downloads made from a page, and the time spent on a page. A profile construction algorithm is also used. The system then finds a mapping from the reference ontology concepts to concepts in the personal ontology, based on query weights and file sizes. Using this mapping, the user can browse any site that has been characterized by his or her personal ontology without reclassifying the documents. Since the system characterizes every site in the same manner, and each user’s personal ontology reflects their view of the world, users are able to browse Web pages in a personalized, consistent manner.

For the reference ontology, documents were collected from the general search results of a search engine such as Google. For the personal ontology, the sample documents are provided by the user, i.e., taken from the browsing history. The personal browsing system needs to map each reference ontology concept to the best-matching concept in the personal ontology. To do this, it calculates the match value between each concept in the reference ontology and the concepts in the personal ontology. The goal of the mapping phase is to map every concept in the reference ontology to a concept in the personal ontology, and also to map related websites, in order to give more personalized search results.
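The mapping phase can be sketched as a best-match search: for every reference concept, the sketch picks the personal concept with the highest match value. Here the match value is assumed to be the cosine similarity between term-count vectors built from each concept’s documents. The sketch does not model the query weights and file sizes mentioned in the system description, and the concept names are hypothetical.

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def map_concepts(reference: dict, personal: dict) -> dict:
    """Map each reference concept to (best personal concept, match value)."""
    mapping = {}
    for ref_name, ref_vec in reference.items():
        best = max(personal, key=lambda p: cosine(ref_vec, personal[p]))
        mapping[ref_name] = (best, cosine(ref_vec, personal[best]))
    return mapping

reference = {  # hypothetical reference-ontology concepts
    "Computers/Hardware": Counter({"cpu": 3, "memory": 2}),
    "Recreation/Games": Counter({"game": 3, "player": 2}),
}
personal = {  # hypothetical personal-ontology concepts
    "my gadgets": Counter({"cpu": 1, "memory": 1, "laptop": 2}),
    "fun stuff": Counter({"game": 2, "puzzle": 1}),
}
for ref, (target, score) in map_concepts(reference, personal).items():
    print(f"{ref} -> {target} ({score:.2f})")
```

Once computed, the mapping can be reused, so documents classified under the reference ontology need not be reclassified for each user.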

Fig 1. System Architecture.

IV. PROFILE CONSTRUCTION ALGORITHM

A. Automatic Construction of User Profiles

In our system, the user profile is created automatically and implicitly while the user browses. The user profile is essentially a reference ontology in which each concept has a weight indicating the perceived user interest in that concept. Profiles are generated by analyzing the surfing behavior of the user, specifically the content, length, downloads and time spent on each page they visit; no user feedback is necessary. The documents for each concept were merged to create a collection D containing one super-document per concept. The super-documents were pre-processed to remove high-frequency function words (stop words) and HTML tags. Finally, the Porter stemmer was used to reduce each word to its root, thereby decreasing the dimensionality of the vectors used to represent each concept. We also investigate how the other factors (duration of the visit, page length, frequency of terms, and downloads) influence concept weight calculation. Intuitively, if a user spends a long time on a page, their interest value for that page should be increased. However, if the page is long, the time factor should be decreased, since the increased time may be due to the amount of information presented rather than the level of interest.
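The following sketch implements this preprocessing pipeline: it merges a concept’s pages into a super-document, strips HTML tags, removes stop words, and stems each word. It also includes one possible length-normalized time factor. To keep the example dependency-free, it makes two substitutions. First, the stop-word list is a tiny hypothetical sample. Second, `crude_stem` is a simple suffix stripper standing in for the Porter stemmer the paper uses; a full implementation such as NLTK’s `PorterStemmer` would replace it. The time factor divides time on page by page length in words, which is only one hypothetical reading of “the time factor should be decreased” for long pages.

```python
import re
from collections import Counter

STOP_WORDS = {"a", "an", "and", "are", "as", "for", "in", "is", "of", "on", "the", "to", "with"}
SUFFIXES = ("ing", "ed", "es", "s")  # stand-in for Porter's much richer rule set

def strip_html(page: str) -> str:
    """Remove tags with a regex; adequate for a sketch, not for arbitrary HTML."""
    return re.sub(r"<[^>]+>", " ", page)

def crude_stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least four letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 4:
            return word[: -len(suffix)]
    return word

def super_document(pages: list) -> Counter:
    """Merge a concept's pages into one term-count vector (its super-document)."""
    counts = Counter()
    for page in pages:
        words = re.findall(r"[a-z]+", strip_html(page).lower())
        counts.update(crude_stem(w) for w in words if w not in STOP_WORDS)
    return counts

def time_factor(seconds: float, page_words: int) -> float:
    """Seconds spent per word of text, so long pages dampen raw reading time."""
    return seconds / max(page_words, 1)

vec = super_document(["<p>Mining the web</p>", "<b>Web mining</b> is searching pages"])
print(vec)
```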

The algorithm we use for creating user profiles consists of three phases. In the initialization phase, the terms are extracted and an empty ontology is initialized. The first stage creates a full ontology from narrower-term relations. In the second stage, the full ontology is pruned by eliminating unnecessary relations.


Here, the profile term relevance (PTR) is calculated using (1), where

tf(i, j) = frequency of term i in profile j,
profile(i) = number of profiles in which term i appears,
|P| = total number of profiles,
|Term| = total number of terms in profile j,
Term(i) = number of usages of term i in profile j.
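Equation (1) itself did not survive extraction, so its exact form cannot be recovered here. Because tf(i, j), profile(i) and |P| are the ingredients of the familiar tf-idf weight, the sketch below uses tf-idf over profiles purely as an illustrative stand-in. It is an assumption, not a reconstruction of (1), and it omits |Term| and Term(i), whose role in (1) is unknown. The function name and sample profiles are hypothetical.

```python
import math

def ptr_tfidf(term: str, profile_id: str, profiles: dict) -> float:
    """Stand-in relevance tf(i, j) * log(|P| / profile(i)); NOT the paper's Eq. (1)."""
    tf = profiles[profile_id].get(term, 0)                                # tf(i, j)
    containing = sum(1 for p in profiles.values() if p.get(term, 0) > 0)  # profile(i)
    if tf == 0:
        return 0.0  # also guarantees containing >= 1 below
    return tf * math.log(len(profiles) / containing)                     # len(profiles) = |P|

profiles = {  # hypothetical term counts per profile
    "u1": {"ontology": 4, "web": 2},
    "u2": {"web": 5},
    "u3": {"games": 3, "web": 1},
}
print(ptr_tfidf("ontology", "u1", profiles), ptr_tfidf("web", "u1", profiles))
```

A term that appears in every profile, like `web` here, gets zero weight because it does not distinguish one user from another.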

To find the relatedness degree, we consider three factors: the frequency of occurrence of a term, the frequency of visits to a link, and the downloads made from a page. The occurrence frequency is

$$\mathrm{Occur\text{-}f}(t_i, p) = \frac{f(t_i, p)}{f_{\max}(p)} \tag{2}$$

where f(ti, p) is the frequency of term ti in user profile p, and fmax(p) is the number of occurrences of the most frequent term in the profile.

The relatedness degree of two terms is calculated using (3).
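Reading (2) as the ratio of a term’s frequency to that of the profile’s most frequent term, as its two defined quantities f(ti, p) and fmax(p) suggest, the occurrence frequency can be computed as below. The function name and the sample profile are hypothetical.

```python
from collections import Counter

def occurrence_frequency(profile_terms: Counter) -> dict:
    """Occur-f(t, p) = f(t, p) / fmax(p) for every term t in profile p."""
    if not profile_terms:
        return {}
    f_max = max(profile_terms.values())  # fmax(p): count of the most frequent term
    return {term: count / f_max for term, count in profile_terms.items()}

occur_f = occurrence_frequency(Counter({"ontology": 8, "profile": 4, "search": 2}))
print(occur_f)  # {'ontology': 1.0, 'profile': 0.5, 'search': 0.25}
```

Normalizing by fmax(p) keeps every value in (0, 1], so short and long profiles are comparable.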

The algorithm is as follows:

a) Definition and Initialization

• T = {t1, t2, …, tm} is a list of distinct terms extracted from WordNet.
• Occur-f is the occurrence frequency of a term in a profile.
• f-Usage is the usage frequency of a term by a user.
• f-visit is the frequency of visits to page i.
• In link(i) is the number of in-links to page i.
• Out link(i) is the number of out-links from page i.
• Dw(i) is the number of downloads from page i.
• T(i) is the time spent on page i.
• 0 ≤ α ≤ 1 is the verge limit.
• Ontology = {} is an empty ontology description.

b) First Stage: for each ti, tj ∈ T with ti ≠ tj:

• Calculate RD(ti, tj) and RD(tj, ti) using (3).
• Select NT(tp, tq) such that RD(tp, tq) = arg max {RD(ti, tj), RD(tj, ti)}.
• If RD(tp, tq) ≥ α:
  - add {NT(tp, tq), RD(tp, tq)} into Ontology;
  - add f-visit, In link(i), Out link(i) and Dw(i) to the Ontology.

c) Second Stage: for each NT(ti, tj) ∈ Ontology:

• Find a path P = {NT(ti, tm1), NT(tm1, tm2), …, NT(tmn, tj)}.
• Let NT(tp, tq) = arg min {RD(tm, tn) : NT(tm, tn) ∈ P}.
• If RD(tj, ti) < RD(tp, tq), remove NT(ti, tj) and the related f-visit, In link(i), Out link(i) and Dw(i) from the Ontology.
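The two stages can be sketched in Python as shown below. Because the relatedness degree formula (3) did not survive extraction, the sketch takes RD values from a precomputed dictionary. Several details are assumptions:

- The first stage keeps a relation when its RD is at least α.
- Among alternative paths from ti to tj, the second stage uses the one whose weakest link is strongest (a bottleneck path).
- It removes NT(ti, tj) when the relation’s own stored RD is below that weakest link.

For brevity, the sketch omits the page-level attributes (f-visit, In link, Out link, Dw). All names and values are hypothetical.

```python
from itertools import combinations

def build_ontology(terms, rd, alpha):
    """First stage: for each term pair keep the stronger NT direction if RD >= alpha."""
    ontology = {}  # maps (broader, narrower) -> RD
    for ti, tj in combinations(terms, 2):
        forward = rd.get((ti, tj), 0.0)
        backward = rd.get((tj, ti), 0.0)
        edge = (ti, tj) if forward >= backward else (tj, ti)
        strength = max(forward, backward)
        if strength > 0 and strength >= alpha:
            ontology[edge] = strength
    return ontology

def bottleneck(edges, start, goal):
    """Largest possible weakest-link RD over directed paths start -> goal (0.0 if none)."""
    best = {start: float("inf")}
    frontier = [start]
    while frontier:  # label-correcting search; adequate for small ontologies
        node = frontier.pop()
        for (a, b), weight in edges.items():
            if a != node:
                continue
            candidate = min(best[node], weight)
            if candidate > best.get(b, 0.0):
                best[b] = candidate
                frontier.append(b)
    return best.get(goal, 0.0)

def prune(ontology):
    """Second stage: drop NT(ti, tj) if an indirect path's weakest link is stronger."""
    pruned = dict(ontology)
    for edge, weight in ontology.items():
        others = {e: w for e, w in pruned.items() if e != edge}
        if weight < bottleneck(others, *edge):
            del pruned[edge]
    return pruned

rd = {  # hypothetical relatedness degrees
    ("computer", "hardware"): 0.9, ("hardware", "computer"): 0.2,
    ("hardware", "cpu"): 0.8, ("cpu", "hardware"): 0.3,
    ("computer", "cpu"): 0.75, ("cpu", "computer"): 0.1,
}
full = build_ontology(["computer", "hardware", "cpu"], rd, alpha=0.7)
print(prune(full))  # the direct computer -> cpu relation is removed
```

The pruning keeps the hierarchy computer → hardware → cpu. The direct computer → cpu link is dropped because it is weaker than the path through hardware.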

V. RESULTS AND DISCUSSION

This section presents the experimental study used to represent user profiles. The ontological representation of each user profile is based on documents extracted from the user’s browsing history. In this experiment, profiles were built only for users who provided a sufficiently large number of documents, because profiles generated from smaller contributions are not representative of specific user preferences. To choose the best method for measuring the relevance of a concept in a user context, the relevant concept set was analyzed in terms of precision and recall. The registered user whose concepts are listed in Table I reported that the concepts selected by the algorithm using the relatedness measure in (3) include the majority of the relevant terms in this trial.

TABLE I. Relevant concepts for a registered user based on the relatedness degree

Concept        Relatedness degree
Web mining     3.25
Games          2.77
Hardware       2.64

A verge limit of 0.7 was used as the threshold for eliminating unimportant concepts from the user ontology. Concepts with a higher relatedness degree come first in the user ontology.
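The thresholding and ordering just described amount to a filter followed by a sort. The sketch applies them to the Table I values. The extra concept `sports` with degree 0.4 is invented purely to show a concept being removed, and treating the limit as inclusive is an assumption.

```python
relatedness = {"web mining": 3.25, "games": 2.77, "hardware": 2.64}  # values from Table I
VERGE_LIMIT = 0.7

def select_concepts(scores: dict, limit: float) -> list:
    """Keep concepts whose relatedness degree reaches the limit, highest first."""
    kept = [(concept, score) for concept, score in scores.items() if score >= limit]
    return sorted(kept, key=lambda item: item[1], reverse=True)

# "sports" is a hypothetical extra concept added only to show one being removed.
ranked = select_concepts({**relatedness, "sports": 0.4}, VERGE_LIMIT)
print(ranked)  # [('web mining', 3.25), ('games', 2.77), ('hardware', 2.64)]
```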

An alternative, widely used technique for evaluating ontologies is to compare them with another ontology that is deemed to be a benchmark. We analyzed the extracted user ontologies against WordNet, comparing all relationships from the ontologies generated for the most relevant users with the WordNet semantic relationships: synonyms, coordinate terms, hyponyms and hypernyms.
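Given a set of extracted relations and a set of benchmark relations, precision and recall reduce to simple set arithmetic, as in the following sketch. The relations are assumed to be (term, term) pairs. The benchmark set is hypothetical toy data, not real WordNet output. In practice, one would derive it from WordNet’s synonym, coordinate-term, hyponym and hypernym relations, for example through NLTK’s `nltk.corpus.wordnet` interface.

```python
def precision_recall(extracted: set, benchmark: set) -> tuple:
    """Precision and recall of extracted relations against a benchmark relation set."""
    true_positives = len(extracted & benchmark)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(benchmark) if benchmark else 0.0
    return precision, recall

extracted = {("computer", "hardware"), ("hardware", "cpu"), ("games", "hardware")}
benchmark = {("computer", "hardware"), ("hardware", "cpu"),   # hypothetical stand-in
             ("cpu", "processor"), ("game", "play")}         # for WordNet relations
print(precision_recall(extracted, benchmark))  # (0.6666666666666666, 0.5)
```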


Fig 2. A Sample User Profile.

The extracted relationships achieved 1% to 2% precision and less than 1% recall. We also performed some tests using standard measures implemented in WordNet::Similarity, which were used to evaluate the ontology relationships and the results.

VI. CONCLUSION

The flexible nature of ontologies may support a wide range of approaches to information retrieval and filtering problems. This work proposes using ontologies to represent user profiles, and the approach has been applied to general Web search. The users confirmed that the automatically obtained ontologies largely represent their interests. Future work includes working with a larger number of users and a larger set of documents, which will also allow experiments with the update procedure. Further research is directed towards improving user profile quality, using a pruning process to discard concepts that have no significance. It is also necessary to consider the information provided a priori by the user.

REFERENCES

[1] R. Barrett, P. Maglio and D. Kellem, “How to Personalize the Web,” in Proceedings of ACM CHI’97, Atlanta, USA, 1997.

[2] P. Chesnais, M. Mucklo and J. Sheena, “The Fishwrap Personalized News System,” in Proceedings of the IEEE 2nd International Workshop on Community Networking: Integrating Multimedia Services to the Home, Princeton, NJ, June 1995.

[3] G. Chower and C. Nicholas, “Resource Selection in Café: an Architecture for Networked Information Retrieval,” in Proceedings of the SIGIR’96 Workshop on Networked Information Retrieval, Zurich, 1996.

[4] N. Guarino, C. Masolo and G. Vetere, “OntoSeek: Content-Based Access to the Web,” IEEE Intelligent Systems, 14(3), May 1999, pp. 70-80.

[5] K. Knight and S. Luk, “Building a Large Knowledge Base for Machine Translation,” in Proceedings of the American Association of Artificial Intelligence Conference (AAAI), 1999, pp. 773-778.

[6] Y. Labrou and T. Finin, “Yahoo! as an Ontology: Using Yahoo! Categories to Describe Documents,” in Proceedings of the 8th International Conference on Information and Knowledge Management (CIKM), 1999, pp. 180-187.

[7] H. Lieberman, “Letizia: An Agent That Assists Web Browsing,” in Proceedings of the 14th International Joint Conference on Artificial Intelligence, 1995, pp. 924-929.

[8] H. Lieberman, “Autonomous Interface Agents,” in Proceedings of the ACM Conference on Computers and Human Interaction (CHI’97), May 1997.

[9] B. Sheth, “A Learning Approach to Personalized Information Filtering,” Master’s thesis, Massachusetts Institute of Technology, February 1994.

[10] H. Sorensen and M. McElligott, “PSUN: A Profiling System for Usenet News,” in Proceedings of the CIKM’95 Workshop on Intelligent Information Agents, December 1995.

[11] A. Stefani and C. Strappavara, “Personalizing Access to Web Sites: The SiteIF Project,” in Proceedings of the 2nd Workshop on Adaptive Hypertext and Hypermedia, HYPERTEXT’98, June 1998.

[12] S. Gauch, J. Chaffee and A. Pretschner, “Ontology-Based User Profiles for Search and Browsing,” The OBIWAN Project, 2003.

[13] Y. Ding and S. Foo, “Ontology Research and Development. Part 1: A Review of Ontology Generation,” Journal of Information Science, 2002.

[14] N. F. Noy and D. L. McGuinness, “Ontology Development 101: A Guide to Creating Your First Ontology,” 2001.

[15] M. Ferreira-Satler, F. P. Romero, V. H. Menendez, A. Zapata and M. E. Prieto, “A Fuzzy Ontology Approach to Represent User Profiles in E-Learning Environments,” IEEE, 2010.

[16] M. Reformat and S. K. Golmohammadi, “Updating User Profile Using Ontology-Based Semantic Similarity,” IEEE, 2010.

[17] H. M. Shirazi, M. M. Shirazi and N. Fardroo, “Discovering User Interest by Ontology-Based User Profile,” International Journal of Intelligent Information Technology Application, 2(1), 2009, pp. 19-24.

[18] A. Sieg, B. Mobasher and R. Burke, “Learning Ontology-Based User Profiles: A Semantic Approach to Personalized Web Search,” IEEE Intelligent Informatics Bulletin, 8(1), November 2007.

[19] M. Pazzani, J. Muramatsu and D. Billsus, “Syskill & Webert: Identifying Interesting Web Sites,” in Proceedings of the 13th National Conference on Artificial Intelligence, 1996.

[20] D. Mladenic, “Personal Web Watcher: Design and Implementation,” Report IJS-DP-7472, J. Stefan Institute, Department for Intelligent Systems, Ljubljana, Slovenia, 1998.

[21] S. Luke, L. Spector, D. Rager and J. Hendler, “Ontology-Based Web Agents,” in Proceedings of the First International Conference on Autonomous Agents (AA’97), 1997.
