[IEEE 2013 International Conference on Information Technology and Electrical Engineering (ICITEE) -...



News Recommendation in Indonesian Language

Based on User Click Behavior

Diandra Mayang Desyaputri 1, Alva Erwin 1, Maulahikmah Galinium 1, Didi Nugrahadi 2

1 Faculty of Engineering and Information Technology, Department of Information Technology

Swiss German University, BSD, Tangerang, Indonesia {diandra.desyaputri@student., alva.erwin@, maulahikmah.galinium@}sgu.ac.id

2 Beritagar, Jakarta, Indonesia

didinugrahadi@gmail.com

Abstract-Recommendation systems have been proposed for years as a solution to the information-era problem. This research strives to develop an intelligent recommendation system based on user click behavior on news websites. We extracted frequent item sets and association rules from the web server log of a news website, performed a pre-computation of similarity between news articles, and then proposed a three-level recommendation system based on association rule discovery, news articles in the same category, and similarity between news articles. By combining a collaborative filtering approach with content-based filtering, experimental results show that the technique produces reliable news recommendations.

Keywords- web usage mining, association rules, similarity, news recommendation

I. INTRODUCTION

Supported by the fast-paced growth of technology that enables users to easily access worldwide news, online news websites around the world have become very popular. The enormous amount of information available has forced users to face information overload, such as overwhelming volumes of articles. Users have to wade through tremendous volumes of information to find their desired articles. In some cases, articles are effectively inaccessible to users because they are buried under the great load of information.

Faced with these large sources of data and information, users like to ask for recommendations from friends, family, partners, the community, trusted people, or specialists to obtain the desired items. However, their own knowledge about particular items is limited, while new articles, websites, blogs, and items emerge as the clock ticks. Automatic recommendation by an intelligent system has been proposed for years as the solution to this information-era problem.

Although the concept of recommendation was introduced more than ten years ago, the supporting technologies have developed greatly in recent years. Because of their great impact in industry, primarily in marketing, recommender systems have spread widely on the Internet. Not only do e-commerce websites use such systems, but it has also become common for news sites and online news providers to adopt a recommendation system that suggests news articles to keep users reading.

An interesting point about user behavior on news websites that may affect the recommendation system is that users access news websites with a 'show me something interesting' mindset, rather than knowing beforehand what they want to read. If recommendations for a particular user are generated only from his/her past news preferences, the result will be biased toward those topics. Moreover, new users have no transaction logs yet, so the system cannot recommend articles based on their previous activities. This is called the 'cold start problem'. It is also the responsibility of the system to balance the wants and needs of users against the fast-changing daily news.

A "related news" column is often found on news websites; it links one article to others to keep users reading. This is usually done by an editor who manually selects interesting news from a number of articles that keeps growing every day. Moreover, news changes so quickly that doing this task manually can be overwhelming.

Given the problems above, this research strives to develop an intelligent recommendation system based on user click behavior on news websites. The proposed system will be implemented on an online Indonesian news provider. The web usage log is therefore provided by that Indonesian news provider, and the recommended articles are generated with clickstream data as the basis.

This paper is organized into five sections. Section I introduces the study and describes the background, the general idea, and the problems of today's information landscape. Section II reviews related work on this topic. Section III presents the proposed system and explains its steps in detail. Section IV presents the research findings, and Section V gives the conclusion.

978-1-4799-0425-9/13/$31.00 ©2013 IEEE

II. RELATED WORK

As users browse the Internet, their actions and movements are recorded in web server logs. From these logs, patterns can be extracted to create a user model based on the articles that users have read in the past. By analyzing the navigational pattern of user behavior, for example on a news website, we can discover the preferences of a particular user and later suggest news articles to that user.

Web Usage Mining [1] is a technique for extracting knowledge from patterns of users' behavior, such as user access data collected from web log files.

However, the data collected from various sources, including the web server, is not always complete and ready to be analyzed. On the contrary, raw data is usually inaccurate, erroneous, and incomplete, which may lead to misleading and incorrect information. For that reason, data preprocessing tasks must be performed. In recent years, researchers [2] [3] [4] have continued to find intelligent ways to preprocess and prepare web log files.

Numerous studies in the data mining field also address the use of association rules in recommendation systems and web usage mining. A study by Peng [5] discovered association rules from web usage data by using the FP-Growth algorithm. He combined the interest measure and website users to view the topology of the best access time of the user in order to optimize the interesting association rules. Mobasher et al. [6] proposed a scalable framework for Web personalization based on association rule discovery. They presented a data structure to store discovered frequent item sets and used a fixed-size sliding window to capture the current session of a user. As a result, this framework is able to produce recommendations efficiently in real time. Another recommendation system based on association rule mining was proposed by Lin et al. [7]; it requires no minimum support to be specified in advance. They proposed an algorithm that automatically adjusts the minimum support so that the number of rules generated lies within a specified range.

In another study [8], association rules are applied to extract potentially useful knowledge from web usage data. The authors use Weka, a data mining software package, to discover association rules in web log data. A recent study of recommendation systems based on association rules showed improved performance and efficiency by combining association rule mining with user clustering [9].

In another study [10], the generation of association rules is done online. Due to the size of the log files and the high processing time, the first two phases are performed offline in a batch process, while the recommendation phase is usually performed online and in real time. Once a session has been processed and its rules extracted and inserted, it is deleted from the sessions table. As a result, the system is a scalable model for recommendation, since it can work with large datasets in real time.

Recommendation aims to provide suitable and valuable information in a timely manner according to users' demands, and such information is used as a reference to support decision-making [11]. There are several common and widely used technologies for personalization and recommendation systems. One of them is the content-based (CB) method. In content-based filtering, the user is recommended items similar to the ones that the user preferred in the past. In collaborative filtering (CF), by contrast, the user is recommended items that similar people preferred in the past [12].

Many studies have proposed a hybrid method, that is, a combination of the content-based method and collaborative filtering. This method has been tested in various studies with various sources of data and has proven to be more effective, since it provides satisfying results and improves the quality of recommendations, as shown in [13].

A scalable two-stage personalized news recommendation approach with a two-level representation [14] considers the exclusive characteristics of news items when performing recommendation. The first level contains various topics relevant to users' preferences, and the second level includes specific news articles. The authors also presented a principled framework for news selection based on user interest, with a good balance between the novelty and diversity of the recommended result.

Various methods to measure similarity between news articles for a news recommender system have been researched by Tintarev and Masthoff [15]. Commonly, news recommendation is performed using a combination of the TF-IDF technique and the cosine similarity measure [16]. However, they proposed two new approaches: SF-IDF (Synset Frequency - Inverse Document Frequency), which is similar to TF-IDF but uses WordNet synonym sets, and Semantic Similarity, which combines five semantic similarity measures. The cosine similarity approach for content-based news recommendation has also been researched by Kompan and Bielikova [17].

In this research, we combined methods that have been studied before and developed a news recommendation system for the Indonesian language. The recommended news is generated from the discovery of association rules in web log files, which is a collaborative filtering approach, and from the pre-computed similarity between news articles, which is a content-based approach. Furthermore, we analyze the most popular news articles under the associated hashtag and append them to generate more solid news recommendations.

III. PROPOSED SYSTEM

A web recommendation system is generally composed of three phases: data preparation and transformation, pattern discovery, and recommendation [6]. Figure 1 depicts the process of the proposed system.

Data is retrieved from the web server, in form of web server access logs that contain every transaction performed on the server. Then, data cleaning is performed to improve the accuracy of the recommendation result. Data filtering is then performed to make sure that the data processed is indeed necessary.

Session identification is executed to complete the data preparation process. After that, the pattern discovery phase may begin, which has three levels: association rule discovery, news discovery from the associated #hashtag, and discovery of the similarity measure of news articles. Based on those three levels, the recommender system generates the recommended articles.

[Figure: Web Server Log, Data Collection, then Association Rule Discovery, Associated #hashtag News Discovery, and Similarity of News, leading to the Recommendation Result]

Fig. 1. Proposed System

A. Data Preprocessing

There are three steps in data preprocessing used in this study: data cleaning, data filtering, and session identification.

Every time a user browses a website, the web browser downloads an HTML document. The resources embedded in HTML files, such as images and style files (javascript files, css files), are also downloaded automatically without the user explicitly requesting them. They might be advertisements that have no relationship with the content of the web page. These actions are recorded in the web log file, making it extremely congested and filled with information unrelated to the content of the news articles.

Therefore, all data in the web server log that is not needed for processing is cleaned off, such as image files, javascript files, multimedia files, style files, spider requests, error requests, and web robot requests. Records with HTTP status codes over 299 or under 200 are also removed, since only status codes between 200 and 299 indicate a successful response. Records with the POST or HEAD method are removed as well, since page requests made by the user are the only relevant information to be processed.
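The cleaning rules above can be sketched as a single filter function. This is a minimal illustration, not the authors' implementation: the Combined Log Format, the list of static file suffixes, and the user-agent markers used to spot robots are all assumptions, since the paper does not state them.

```python
import re

# Assumption: the server writes Apache/Nginx "combined" log lines.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+)[^"]*" (?P<status>\d{3}) \S+'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?'
)
# Hypothetical lists; a real deployment would tune these.
STATIC_SUFFIXES = (".jpg", ".jpeg", ".png", ".gif", ".ico", ".js", ".css", ".swf", ".mp4")
BOT_MARKERS = ("bot", "spider", "crawler")


def keep_record(line):
    """Return the parsed record if it survives cleaning, else None."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None  # malformed line
    rec = match.groupdict()
    if rec["method"] in ("POST", "HEAD"):
        return None  # only page requests made by the user are kept
    if not 200 <= int(rec["status"]) <= 299:
        return None  # keep successful responses only
    path = rec["url"].split("?", 1)[0].lower()
    if path.endswith(STATIC_SUFFIXES) or path == "/robots.txt":
        return None  # embedded resources and robot probes
    agent = (rec["agent"] or "").lower()
    if any(marker in agent for marker in BOT_MARKERS):
        return None  # spider and web robot requests
    return rec
```

Applying `keep_record` to every line and discarding the `None` results yields the cleaned log that the later steps consume.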

Data filtering keeps only the data needed in this study. Since we need only the titles of news articles from the web log files, we removed other unnecessary records, such as category URLs, the homepage URL, and other pages such as the login and registration pages.

The next step in data preparation is session identification. Session identification is done by using Rapidminer, by transforming the records of the web log files into a set of sessions. Rapidminer is an open-source data mining and data analytics tool which provides data mining and machine learning procedures.
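The paper performs session identification in Rapidminer and does not describe the rule it uses. As a rough sketch of what such a step typically does, the code below groups requests by IP address and starts a new session after a period of inactivity. The 30-minute timeout and the function names are assumptions made for illustration.

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)  # assumed threshold, not from the paper


def parse_time(text):
    """Parse a log timestamp such as '07/Mar/2013:00:13:14 +0700'."""
    return datetime.strptime(text, "%d/%b/%Y:%H:%M:%S %z")


def identify_sessions(records):
    """Group (ip, timestamp, url) records into sessions of distinct URLs.

    A new session starts when the same IP has been idle longer than
    SESSION_TIMEOUT. Sessions are returned in order of their first request.
    """
    sessions = []
    open_sessions = {}  # ip -> (time of last request, list of urls)
    for ip, stamp, url in sorted(records, key=lambda r: r[1]):
        last = open_sessions.get(ip)
        if last is None or stamp - last[0] > SESSION_TIMEOUT:
            urls = []
            sessions.append(urls)
        else:
            urls = last[1]
        if url not in urls:
            urls.append(url)
        open_sessions[ip] = (stamp, urls)
    return sessions
```

Identifying visitors by IP address alone is a simplification; requests from different readers behind one address would be merged into the same session.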

B. Pattern Discovery

After the web logs are cleaned and sessions are identified, the process of pattern discovery can be performed. Frequent item sets and association rules are generated from each segmented web log file, and the results are then combined.

Association rule mining, which is one of the data mining techniques, discovers unordered correlations between items found in a database of transactions [18]. These interesting relationships can be represented in the form of association rules or frequent item sets [19]. First introduced by [20], association rule mining algorithms discover all item associations or rules in the data that satisfy the user-specified constraints: minimum support, called minsup, and minimum confidence, called minconf [21].

Frequent item sets are defined as "group of items occurring frequently together in many transactions" [22]. We use FP-growth to generate the frequent item sets. The implementation is also done in Rapidminer. Because the FP-growth implementation in Rapidminer requires the data to be in binomial form, further data preprocessing is necessary after session identification.
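The binomial form mentioned above is a table with one row per session and one true/false column per article. A minimal sketch of that conversion follows; the function name and the in-memory representation are illustrative assumptions and do not reflect Rapidminer's own operators.

```python
def to_binomial(sessions):
    """Turn sessions (lists of article URLs) into a true/false table.

    Returns the sorted column names and one row of booleans per session,
    where True means the article was read in that session.
    """
    columns = sorted({url for session in sessions for url in session})
    rows = []
    for session in sessions:
        seen = set(session)
        rows.append([url in seen for url in columns])
    return columns, rows
```

For example, `to_binomial([["/p/b", "/p/a"], ["/p/a"]])` produces the columns `["/p/a", "/p/b"]` with the rows `[True, True]` and `[True, False]`.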

Let $I = \{i_1, i_2, \ldots, i_k\}$ be a set of items, where each item $i$ is a news article, and let $T = \{t_1, t_2, \ldots, t_n\}$ be the set of all transactions, where each transaction $t$ is a set of items such that $t \subseteq I$. Association rules, in contrast to frequent item sets, capture relationships among pageviews based on the navigational patterns of users [6]. An association rule is a rule of the form $X \rightarrow Y$, where $X$ and $Y$ are subsets of $I$ and $X \cap Y = \emptyset$. Association rules are mainly defined by two metrics: support and confidence. Support is the percentage of transactions that contain both $X$ and $Y$ among all transactions in the data set, while confidence is the percentage of transactions that contain $Y$ among the transactions that contain $X$ [7].
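The two metrics can be computed directly from their definitions. The sketch below is only an illustration of the formulas on a list of transactions; the function names are hypothetical, and it assumes the antecedent occurs in at least one transaction.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    hits = sum(1 for t in transactions if items <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Fraction of transactions containing X that also contain Y.

    Assumes the antecedent X appears in at least one transaction.
    """
    both = support(transactions, set(antecedent) | set(consequent))
    return both / support(transactions, antecedent)
```

With the toy transactions `[{"a", "b"}, {"a"}, {"a", "b"}, {"c"}]`, the rule a → b has support 2/4 and confidence 2/3, because two of the three transactions that contain a also contain b.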

An example of association rule is as follows:

Example 1. [/p/bingkisan-ulang-tahun-dari-google-play] → [/p/perubahan-desain-google-dahului-facebook] (support: 0.003, confidence: 0.750)

This rule says that 0.3% of transactions contain these two news articles, and that 75% of the transactions that contain the news article "Bingkisan Ulang Tahun dari Google Play" (Birthday present from Google Play) also contain the news article "Perubahan Desain Google Dahului Facebook" (Google+ preceded Facebook in design changes). This means that users who read the former article also read the latter article 75% of the time.

C. News Recommendation System

In this phase, a recommendation system will generate recommended news articles as related news for every news article automatically. The recommended news is generated in three stages.

1) Recommended News from Association Rule Discovery: First, the recommended news is generated from frequent item sets. Sets of recommended news are generated based on the mined associative patterns.

However, there are problems with this approach. For recently developed news provider websites, the lack of a "related news" section in every news article has resulted in a high bounce rate. As a result, there are few news articles in each session. Few association rules can be generated, and not every news article has association rules that lead to other news articles.

To generate accurate association rules, each session ideally contains around 5-7 news articles. Yet, based on our study, each session at this Indonesian news provider contains only around 2-3 news articles. In order to increase the session length, we use the second level of the proposed news recommendation system.
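The first level described above amounts to looking up the mined rules whose antecedent is the article being read. The sketch below shows one way to do that lookup; the rule tuple layout, the ranking by confidence, and the `limit` parameter are assumptions for illustration.

```python
def recommend_from_rules(article, rules, limit=5):
    """Return consequent articles of rules whose antecedent is `article`.

    `rules` is a list of (antecedent_urls, consequent_urls, confidence)
    tuples. Rules are visited from highest to lowest confidence, and each
    recommended article is listed once.
    """
    ranked = sorted(
        (rule for rule in rules if set(rule[0]) == {article}),
        key=lambda rule: rule[2],
        reverse=True,
    )
    picks = []
    for _, consequent, _ in ranked:
        for url in consequent:
            if url != article and url not in picks:
                picks.append(url)
    return picks[:limit]
```

An article with no matching rule gets an empty list, which is the gap the second and third levels are meant to fill.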

2) Recommended News from Associated #hashtag: It is now common to have at least one #hashtag in every news article. A hashtag is considered as one category or subcategory. It may represent the broader level of topics, such as Politics, Music, or Technology; and it may also represent the more specific level topics such as Special News, Kpop, and Gadget.

A news article may have more than one hashtag, but it must have at least one. These hashtags are useful for delivering related articles as recommendations, since they cluster the news articles by category.

To increase users' involvement in the news provider website and to reduce the bounce rate, recommended news articles in the "related news" section are generated by identifying the hashtags of a news article and then suggesting the five most popular articles in the category represented by each hashtag. Since a news article may have more than one hashtag, there might be more than five recommended articles. The number of recommended news articles is selected based on the number of hashtags a news article has, and is designed to fit the related news section of each news article.
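The second level can be sketched as a popularity ranking within each hashtag. The data structures below (a dictionary of hashtags per article and a counter of page views as the popularity measure) are assumptions, since the paper does not say how popularity is stored or measured.

```python
from collections import Counter


def recommend_from_hashtags(article, hashtags_of, views, per_tag=5):
    """Suggest the most viewed articles for each hashtag of `article`.

    hashtags_of: dict mapping article URL -> list of hashtags
    views: Counter mapping article URL -> number of page views
    """
    picks = []
    for tag in hashtags_of.get(article, []):
        candidates = [url for url, tags in hashtags_of.items()
                      if tag in tags and url != article]
        # Most viewed first; ties are broken alphabetically for stability.
        candidates.sort(key=lambda url: (-views[url], url))
        for url in candidates[:per_tag]:
            if url not in picks:
                picks.append(url)
    return picks
```

Because up to `per_tag` articles are taken for every hashtag, an article with two hashtags can yield up to ten candidates, which matches the remark that more than five recommendations may result.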

3) Recommended News from Similar News: The third level of the recommendation system generates recommended news articles based on the similarity between them. Similarities between the titles of news articles are pre-computed by using semantic analysis, specifically cosine similarity. Several algorithms for computing sentence similarity have been researched by [23], and semantic analysis was chosen in this paper because of its robust performance and fast computation. The similarity between two article titles computed with semantic analysis gives a higher rating than the similarity computed with a statistical approach.
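For reference, cosine similarity between two titles can be sketched with plain word-count vectors. This is a deliberately simplified stand-in: the paper computes cosine similarity on top of a semantic analysis that it does not detail, so the bag-of-words vectors below are an assumption and will not reproduce the paper's scores.

```python
import math
from collections import Counter


def cosine_similarity(title_a, title_b):
    """Cosine similarity between two titles using raw word counts."""
    vec_a = Counter(title_a.lower().split())
    vec_b = Counter(title_b.lower().split())
    dot = sum(vec_a[word] * vec_b[word] for word in vec_a)
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty title is similar to nothing
    return dot / (norm_a * norm_b)
```

With word counts, titles that share no words score exactly 0, whereas a semantic representation can still assign a high score to unrelated-looking titles.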

IV. RESEARCH FINDINGS

We analyzed the web log files of an online Indonesian news provider from 1 October 2012 to 26 March 2013. Data cleaning reduced the raw web log files from 980.1 MB to 93.3 MB. However, processing the reduced file still took unreasonably long, and the computer used for this process ran out of memory before the pattern extraction phase finished. Therefore, the cleaned web log files were segmented by month.

After segmentation, the monthly log files vary in size, depending on the number of transactions recorded in that month. Each monthly file is therefore segmented again by day, producing files of around 400 KB to 1.5 MB that are easy and quick to process. A sample of the web log files after data cleaning is shown in Table I.
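The per-day segmentation can be sketched as grouping records by the date part of their timestamp. The record layout (a dictionary with a `time` field in the log's own timestamp format) is an assumption for illustration.

```python
from collections import defaultdict


def segment_by_day(records):
    """Group cleaned log records by calendar day.

    Each record is a dict with a 'time' field such as
    '07/Mar/2013:00:06:16 +0700'; the text before the first colon is the day.
    """
    by_day = defaultdict(list)
    for record in records:
        day = record["time"].split(":", 1)[0]
        by_day[day].append(record)
    return dict(by_day)
```

Each resulting group can then be written to its own file and mined separately before the results are combined.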

TABLE I. THE WEB LOG FILES AFTER DATA CLEANING

| IP Address | Date Time | URL | Referrer |
|---|---|---|---|
| 91.201.64.24 | 07/Mar/2013:00:06:16 +0700 | /p/spam-bisa-berbuah-trending-topic | - |
| 65.52.0.95 | 07/Mar/2013:00:08:57 +0700 | /p/ceo-apple-tertarik-teknologi-beats | - |
| 223.255.226.10 | 07/Mar/2013:00:13:14 +0700 | /p/jokowi-masyarakat-tak-suka-pejabat-eksklusif | http://beritagar.com |
| 223.255.226.10 | 07/Mar/2013:00:13:41 +0700 | /p/diet-yang-tepat-untuk-pria | http://beritagar.com |

Samples of the generated association rules are presented below. These association rules are generated from the frequent item sets, samples of which are presented in Table II.

TABLE II. SAMPLE OF THE GENERATED FREQUENT ITEMSETS

| Support | Item 1 | Item 2 | Item 3 |
|---|---|---|---|
| 0.003 | /p/konflik-sabah-paksa-tki-pulang | /p/bisnis-batu-bata-lesu-akibat-hujan | |
| 0.003 | /p/pesepeda-bawa-instagram-keliling-dunia | /p/bingkisan-ulang-tahun-dari-google-play | /p/7-maret-gaya-kotakkotak-mondriaan-dan-mondrian |
| 0.003 | /p/pesepeda-bawa-instagram-keliling-dunia | /p/bingkisan-ulang-tahun-dari-google-play | /p/perubahan-desain-google-dahului-facebook |
| 0.003 | /p/bingkisan-ulang-tahun-dari-google-play | /p/perubahan-desain-google-dahului-facebook | /p/7-maret-lagu-amal-we-are-the-world |
| 0.003 | /p/film-berwarna-pertama-buatan-indonesia | /p/7-maret-mochtar-lubis-dan-mac | /p/bisnis-batu-bata-lesu-akibat-hujan |

• [/p/bingkisan-ulang-tahun-dari-google-play] → [/p/perubahan-desain-google-dahului-facebook] (confidence: 0.750)

• [/p/bingkisan-ulang-tahun-dari-google-play] → [/p/pesepeda-bawa-instagram-keliling-dunia, /p/perubahan-desain-google-dahului-facebook] (confidence: 0.778)

• [/p/bayi-sehat-tidak-butuh-suplemen-selain-asi] → [/p/bahaya-air-putih-untuk-bayi, /p/pengaruh-makananminuman-ibu-pada-asi, /p/relasi-asi-dan-berat-badan-anak] (confidence: 0.857)

• [/p/facebook-juga-akan-pakai-hashtag] → [/p/chris-messina-si-penemu-hashtag, /p/akun-dong-hae-diretas-elf-bereaksi] (confidence: 0.857)

The English translations of the association rules above are:

• [birthday present from google play] → [google+ preceded facebook in design changes] (confidence: 0.750)

• [birthday present from google play] → [cyclist carries instagram around the world; google+ preceded facebook in design changes] (confidence: 0.778)

• [healthy baby doesn't need any supplement besides mother's breast milk] → [drinking water hazards for baby; impact of mother's meal to her baby; relationship between mother's breast milk and child's weight] (confidence: 0.857)

• [facebook will use hashtag too] → [chris messina, the founder of hashtag; dong hae's account is hacked, elf reacted] (confidence: 0.857)

We performed an experiment that varies the minimum support and minimum confidence supplied to the FP-growth algorithm when generating association rules. The experiment is done to select suitable values of support and confidence such that the association rules generated are optimal. We used a sample dataset that contains 2572 lines (records) of log files. The result of the experiment is presented in Table III.

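TABLE III. NUMBER OF ASSOCIATION RULES GENERATED

| Support \ Confidence | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 | 1 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.01 | 127 | 84 | 47 | 35 | 31 | 23 | 19 | 16 | 6 | 6 |
| 0.02 | 20 | 14 | 6 | 4 | 2 | 1 | - | - | - | - |
| 0.03 | 2 | 1 | - | - | - | - | - | - | - | - |
| 0.04 | - | - | - | - | - | - | - | - | - | - |
| 0.05 | - | - | - | - | - | - | - | - | - | - |
| 0.06 | - | - | - | - | - | - | - | - | - | - |
| 0.07 | - | - | - | - | - | - | - | - | - | - |
| 0.08 | - | - | - | - | - | - | - | - | - | - |
| 0.09 | - | - | - | - | - | - | - | - | - | - |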
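A grid like Table III can be produced by counting the rules that pass each pair of thresholds. The sketch below does this with a brute-force enumeration of small item sets rather than FP-growth, so it is only practical for small samples; the function names and the `max_size` cap are assumptions for illustration.

```python
from itertools import combinations


def count_rules(transactions, min_support, min_confidence, max_size=3):
    """Count association rules X -> Y meeting both thresholds (brute force)."""
    transactions = [frozenset(t) for t in transactions]
    total = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / total

    items = sorted(set().union(*transactions))
    rules = 0
    for size in range(2, max_size + 1):
        for combo in combinations(items, size):
            sup = support(frozenset(combo))
            if sup == 0 or sup < min_support:
                continue  # item set never occurs or is not frequent enough
            # Every non-empty proper subset is a candidate antecedent X.
            for k in range(1, size):
                for antecedent in combinations(combo, k):
                    if sup / support(frozenset(antecedent)) >= min_confidence:
                        rules += 1
    return rules


def sweep(transactions, supports, confidences):
    """Build a {(min_support, min_confidence): rule_count} grid."""
    return {(s, c): count_rules(transactions, s, c)
            for s in supports for c in confidences}
```

As in Table III, raising either threshold can only keep the count the same or shrink it, which is why the counts fall toward the lower right of the grid.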

To recommend news articles in the related news section of "Bingkisan ulang tahun dari Google Play" (Birthday present from Google Play) based on the popularity of news articles under the associated #hashtag, we identify the #hashtags associated with that news article and the date it was created. Since the #hashtags are #tekno and #gadget, we find the most popular news articles in those two #hashtags or categories.

The similarity between news articles is computed by comparing a particular news article to all news articles currently available. Table IV presents the similarity of sample news articles to "Bingkisan ulang tahun dari Google Play" (Birthday present from Google Play), sorted by similarity score.

TABLE IV. SIMILARITY OF SAMPLE NEWS ARTICLES

| News Title | Similarity Result |
|---|---|
| perubahan-desain-google-dahului-facebook | 0.8613972559064530 |
| google-luncurkan-sistem-peringatan-tsunami-di-jepang | 0.8376697156768023 |
| picasa-dileburkan-ke-dalam-google | 0.8347221919284370 |
| bahaya-air-putih-untuk-bayi | 0.8308834468966150 |
| akun-dong-hae-diretas-elf-bereaksi | 0.8276263019935370 |
| jokowi-masyarakat-tak-suka-pejabat-eksklusif | 0.8090434224276720 |
| bayi-sehat-tidak-butuh-suplemen-selain-asi | 0.7979124279015620 |
| tips-memilih-celana-denim | 0.7804312953296150 |
| arti-penting-foto-jurnalistik | 0.7794151494702410 |
| facebook-juga-akan-pakai-hashtag | 0.7726957209327540 |
| keseharian-warga-permukiman-kumuh | 0.7291831881157720 |

Recommended news articles in the "related news" section: For a news article titled "Bingkisan ulang tahun dari Google Play" (Birthday present from Google Play), the recommended news articles are:

• Perubahan desain Google+ dahului Facebook

• Pesepeda bawa Instagram keliling dunia

• Google luncurkan sistem peringatan tsunami di Jepang

• Picasa, dileburkan ke dalam Google+

The English translations of the recommended news articles above are:

• Google+ preceded Facebook in design changes

• Cyclist carries Instagram around the world

• Google launched a tsunami warning system in Japan

• Picasa, merged with Google+
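One way to combine the three levels into a single "related news" list is a simple cascade that fills the available slots level by level. The paper does not spell out the merging logic, so the priority order (rules, then hashtags, then similarity) and the number of slots are assumptions in the sketch below.

```python
def related_news(article, rule_picks, hashtag_picks, similar_picks, slots=4):
    """Fill the related-news section level by level, skipping duplicates.

    Each *_picks argument is an ordered list of candidate article URLs
    produced by one level of the recommender.
    """
    chosen = []
    for level in (rule_picks, hashtag_picks, similar_picks):
        for url in level:
            if url != article and url not in chosen:
                chosen.append(url)
            if len(chosen) >= slots:
                return chosen
    return chosen
```

Feeding this function the two rule consequents for "Bingkisan ulang tahun dari Google Play", no hashtag candidates, and the top rows of Table IV yields the same four articles listed above, although that agreement alone does not confirm the authors used this exact procedure.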

V. CONCLUSION

This study proposed a news recommendation system which is implemented on one of Indonesia's online news websites. We propose a three-level news recommendation system based on association rule discovery, the most popular news articles in the same category, and similarity between news items.

This research shows that the proposed three-level approach gives promising results as recommended news. However, tuning the system is necessary, such as adjusting the support and confidence to optimize the association rule mining algorithm. By combining a collaborative filtering approach with content-based filtering, experimental results show that the technique produces reliable news recommendations in the Indonesian language.

ACKNOWLEDGMENT

The authors would like to thank Fakhrizal Lukman, a front-end web and mobile developer at the Indonesian news provider, who has supported us in our research.

REFERENCES

[ I ] Cooley, R . , Mobasher, B . , Srivastava, J . , "Web mining : Information and pattern discovery on the world wide web," in International Conference on Tools with Artificial Intelligence, pp. 558-567, IEEE, 1997.

[2] V. Chitraa and A. S. Davamani, "A survey on preprocessing methods for web usage data," International Journal of Computer Science and Information Security, vol. 7, No. 3 , 20 10 .

[3] G. T . Raju and P. S . Satyanarayana, "Knowledge discovery from web usage data: Com- plete preprocessing methodology," International Journal of Computer Science and Network Security, vol. 8 No. 1 , 2008 .

[4] M. Helmy, A. Wahab, M. Norzali, H. Mohd, H. F. Hanafi, M. Farhan, and M. Mohsin, "Data pre-processing on web server logs for general ized association rules mining algorithm," December 2008 .

[5] H. Peng, "Discovery of interesting association rules based on web usage mining," in 2010 International Conference on Multimedia Communica­tions (Mediacom), pp. 272-275, IEEE, 20 10 .

[6] B. Mobasher, H. Dai, T. Luo, and M. Nakagawa, "Effective personalization based on association rule discovery from web usage data," in 3rd International Workshop on Web Information and Data Management, WIDM '01, (Atlanta, Georgia, USA), pp. 9-15, ACM, November 9, 2001.

[7] W. Lin, S. A. Alvarez, and C. Ruiz, "Collaborative recommendation via adaptive association rule mining," Data Mining and Knowledge Discovery, 2000.

[8] M. Dimitrijevic and Z. Bosnjak, "Discovering interesting association rules in the web log usage data," Interdisciplinary Journal of Information, Knowledge, and Management, vol. 5, pp. 191-207, 2010.

[9] H. Liu and X. Liu, "A personalized recommendation system combining user clustering and association rules with multiple minimum supports," in 2nd International Conference on Future Computers in Education, 2012.

[10] D. Mican and N. Tomai, "Association-rules-based recommender system for personalization in adaptive web-based applications," in 10th International Conference on Web Engineering, ICWE '10, (Vienna, Austria), pp. 85-90, Springer-Verlag, July 2010.

[11] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. McNee, J. A. Konstan, and J. Riedl, "Getting to know you: Learning new user preferences in recommender systems," in International Conference on Intelligent User Interfaces, pp. 127-134, 2002.

[12] L. M. de Campos, J. M. Fernandez-Luna, J. F. Huete, and M. A. Rueda-Morales, "Combining content-based and collaborative recommendations: A hybrid approach based on Bayesian networks," International Journal of Approximate Reasoning, vol. 51, pp. 785-799, 2010.

[13] J. Liu, P. Dolan, and E. R. Pedersen, "Personalized news recommendation based on click behavior," in 15th International Conference on Intelligent User Interfaces, IUI '10, (Hong Kong, China), pp. 31-40, ACM, February 7-10, 2010.

[14] L. Li, D. Wang, T. Li, D. Knox, and B. Padmanabhan, "Scene: a scalable two-stage personalized news recommendation system," in Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval, SIGIR '11, (New York, NY, USA), pp. 125-134, ACM, 2011.

[15] N. Tintarev and J. Masthoff, "Similarity for news recommender systems," in Proceedings of the AH'06 Workshop on Recommender Systems and Intelligent User Interfaces, 2006.

[16] M. Capelle, F. Frasincar, M. Moerland, and F. Hogenboom, "Semantics-based news recommendation," in Proceedings of the 2nd International Conference on Web Intelligence, Mining and Semantics, WIMS '12, (New York, NY, USA), pp. 27:1-27:9, ACM, 2012.

[17] M. Kompan and M. Bielikova, "Content-based news recommendation," in 11th International Conference, EC-Web, (Bilbao, Spain), Springer Berlin Heidelberg, September 1-3, 2010.

[18] R. Cooley, B. Mobasher, and J. Srivastava, "Data preparation for mining world wide web browsing patterns," Knowledge and Information Systems, pp. 5-32, 1999.

[19] P.-N. Tan, M. Steinbach, and V. Kumar, Introduction to Data Mining. Pearson, 2006.

[20] R. Agrawal, T. Imielinski, and A. Swami, "Mining association rules between sets of items in large databases," SIGMOD, pp. 402-411, 1993.

[21] B. Liu, W. Hsu, and Y. Ma, "Mining association rules with multiple minimum supports," in SIGKDD International Conference on Knowledge Discovery & Data Mining, ACM, August 15-18, 1999.

[22] R. Agrawal and R. Srikant, "Fast algorithms for mining association rules in large databases," in 20th International Conference on Very Large Data Bases, VLDB '94, (San Francisco, CA, USA), pp. 487-499, Morgan Kaufmann Publishers Inc., 1994.

[23] P. P. Tardan, A. Erwin, K. I. Eng, and W. Muliady, "Automatic text summarization based on semantic analysis approach for documents in Bahasa Indonesia," under review for International Conference of Information Technology and Electrical Engineering, 2013.