Applying web mining application for user behavior understanding


Page 1: Applying web mining application for user behavior understanding

APPLYING WEB MINING APPLICATION FOR USER BEHAVIOR UNDERSTANDING

Dr. Zakaria Suliman Zubi

Associate Professor

Computer Science Department

Faculty Of Science

Sirte University, Libya

Page 2: Applying web mining application for user behavior understanding

Contents

Page 3: Applying web mining application for user behavior understanding

Abstract

Web usage mining (WUM) focuses on discovering potential knowledge from the browsing patterns of users, which leads us to find the correlation between pages in the analysis stage.

The primary data source used in web usage mining is the server log-files (web-logs).

When a user browses web pages, a lot of information is left in the log-file. Analyzing this log-file information helps us understand the behavior of the user.

The web log is an essential input for web mining: it is used to extract usage patterns and to study the visiting characteristics of users.

Our paper focuses on the use of web mining techniques to classify web page types according to user visits.

This classification helps us to understand the user behavior.

We also use some classification and association rule techniques for discovering potential knowledge from the browsing patterns.

Page 4: Applying web mining application for user behavior understanding

Contents

Page 5: Applying web mining application for user behavior understanding

INTRODUCTION

The Internet offers a huge, global information center for news, advertising, consumer information, financial management, education, government, and e-commerce.

The aim of using web mining techniques for understanding user behavior is to profile user characteristics.

Web mining can be organized into three main categories: web content mining, web structure mining, and web usage mining.

Page 6: Applying web mining application for user behavior understanding

INTRODUCTION Cont..

1- Web content mining analyzes web content such as text, multimedia data, and structured data (within web pages or linked across web pages).

2- Web structure mining is the process of using graph and network mining theory and methods to analyze the nodes and connection structures on the Web.

3- Web usage mining is a special type of web mining, which discovers the knowledge hidden in browsing patterns and analyzes the visiting characteristics of the users.

Figure: Web Mining and its three categories (Web Usage Mining, Web Content Mining, Web Structure Mining).

Page 7: Applying web mining application for user behavior understanding

INTRODUCTION Cont..

The Primary Data of Web Usage Mining

1- Web server logs.
2- Data about visitors of the sites.
3- Registration forms.

Fig 2: Portion of a typical server log

A standard log-file has the following format:

remotehost; logname; username; date; request; status; bytes

where:
remotehost: is the remote hostname or its IP address;
logname: is the remote log name of the user;
username: is the username with which the user has authenticated himself;
date: is the date and time of the request;
request: is the exact request line as it came from the client;
status: is the HTTP status code returned to the client; and
bytes: is the content-length of the document transferred.
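The seven fields listed for the log-file match the widely used Common Log Format, so one entry can be parsed roughly as follows. This is only a sketch in Python (the application in the slides was written in C#). The sample line, the function name `parse_log_line`, and the assumption that the date is bracketed and the request quoted are illustrative and not taken from the slides.

```python
import re
from datetime import datetime

# Assumed layout (Common Log Format):
# remotehost logname username [date] "request" status bytes
LOG_PATTERN = re.compile(
    r'(?P<remotehost>\S+) (?P<logname>\S+) (?P<username>\S+) '
    r'\[(?P<date>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)

def parse_log_line(line):
    """Return a dict with the seven log fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["date"] = datetime.strptime(entry["date"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    # A "-" in the bytes field means no content was transferred.
    entry["bytes"] = 0 if entry["bytes"] == "-" else int(entry["bytes"])
    return entry

# Hypothetical log line, made up for illustration.
sample = ('192.168.1.10 - alice [10/Oct/2012:13:55:36 +0200] '
          '"GET /sport/index.html HTTP/1.1" 200 2326')
entry = parse_log_line(sample)
print(entry["remotehost"], entry["status"], entry["request"])
```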

Page 8: Applying web mining application for user behavior understanding

Contents

Page 9: Applying web mining application for user behavior understanding

THE PHASES OF WEB USAGE MINING

Web usage mining is a complete process that covers the stages of the data mining cycle: Data Preprocessing, Pattern Discovery, and Pattern Analysis.

Initially, at the data preprocessing stage, the web log is cleaned, integrated, and transformed into a common log.

In the pattern discovery stage, data mining techniques are applied to discover the interesting characteristics in the hidden patterns.

Pattern analysis is the final stage of web usage mining. It validates the interesting patterns in the output of pattern discovery, which can then be used to predict user behavior.

Page 10: Applying web mining application for user behavior understanding

THE PHASES OF WEB USAGE MINING

Data Preprocessing Process

Data Cleaning: The log-file is first examined to remove irrelevant entries, such as those that represent multimedia data and scripts, and uninteresting entries, such as those that belong to top/bottom frames.
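A minimal sketch of this cleaning step in Python, assuming that irrelevant entries can be recognized by the file extension of the requested resource. The extension list and the name `is_relevant` are illustrative assumptions; the slides do not say which filter rules were used.

```python
import os
from urllib.parse import urlparse

# Assumed list of extensions for multimedia and script resources.
IRRELEVANT_EXTENSIONS = {
    ".gif", ".jpg", ".jpeg", ".png", ".ico",
    ".css", ".js", ".swf", ".mp3", ".mp4",
}

def is_relevant(request_line):
    """Keep a request only if it does not target a multimedia or script file."""
    parts = request_line.split()
    if len(parts) < 2:
        return False  # malformed request line
    path = urlparse(parts[1]).path  # drops any query string
    extension = os.path.splitext(path)[1].lower()
    return extension not in IRRELEVANT_EXTENSIONS

# Hypothetical request lines, made up for illustration.
requests = [
    "GET /news/today.html HTTP/1.1",
    "GET /img/logo.PNG HTTP/1.1",
    "GET /static/app.js?v=3 HTTP/1.1",
]
cleaned = [r for r in requests if is_relevant(r)]
print(cleaned)
```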

Pageview Identification: Identification of pageviews is heavily dependent on the intra-page structure of the site, as well as on the page contents and the underlying site domain knowledge. Each pageview can be viewed as a collection of Web objects or resources representing a specific "user event".

Figure: Data preprocessing steps (Data Cleaning, Pageview Identification, User Identification, Session Identification).

Page 11: Applying web mining application for user behavior understanding

THE PHASES OF WEB USAGE MINING

Data Preprocessing Process

User Identification: Since several users may share a single machine name, certain heuristics are used to identify users. We use the phrase user activity record to refer to the sequence of logged activities belonging to the same user.

Session Identification: Aims to split the page accesses of each user into separate sessions. It determines the number of times the user has accessed a web page, and a timeout sets a time limit on page accesses: if more than 30 minutes elapse, the accesses are divided into more than one session.

Sample of user and sessions identification
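The user and session identification steps can be sketched in Python as follows. Two assumptions are made here that the slides do not confirm: users are identified by a single key such as the remote host, and the 30-minute timeout is applied to the gap between two consecutive requests of the same user. The name `identify_sessions` and the sample data are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)

def identify_sessions(entries, timeout=SESSION_TIMEOUT):
    """Group (user_key, timestamp, page) tuples into per-user lists of sessions."""
    by_user = defaultdict(list)
    for user_key, timestamp, page in entries:
        by_user[user_key].append((timestamp, page))

    sessions = {}
    for user_key, hits in by_user.items():
        hits.sort(key=lambda hit: hit[0])
        user_sessions = []
        last_timestamp = None
        for timestamp, page in hits:
            # Start a new session on the first hit or after a long gap.
            if last_timestamp is None or timestamp - last_timestamp > timeout:
                user_sessions.append([])
            user_sessions[-1].append(page)
            last_timestamp = timestamp
        sessions[user_key] = user_sessions
    return sessions

# Hypothetical entries, made up for illustration.
entries = [
    ("10.0.0.1", datetime(2012, 10, 10, 9, 0), "/news"),
    ("10.0.0.2", datetime(2012, 10, 10, 9, 5), "/news"),
    ("10.0.0.1", datetime(2012, 10, 10, 9, 10), "/sport"),
    ("10.0.0.1", datetime(2012, 10, 10, 10, 0), "/social"),
]
sessions = identify_sessions(entries)
print(sessions)
```

Here the third request of user 10.0.0.1 arrives 50 minutes after the second one, so it opens a second session for that user.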

Page 12: Applying web mining application for user behavior understanding

THE PHASES OF WEB USAGE MINING

Pattern Discovery Process: Discovering user access patterns from the user access log files is the main purpose of web usage mining.

Association Rule Mining: Association rule discovery and statistical correlation analysis can find groups of web page types that are commonly accessed together, that is, the correlation between page types found in a web log. The technique is applied to the identified users and sessions, where each session consists of items and every item represents a page type. We also use the Apriori algorithm to find the correlation between pages based on support and confidence values.

What is the set of page types frequently accessed together by the web users? e.g. (Sport, News, Social)

What page type will be fetched next? e.g. Entertainment
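The idea behind Apriori can be sketched in Python as below. This is a compact textbook-style version, not the implementation used in the slides. Support is taken as the fraction of sessions containing an itemset, and confidence as support(X ∪ Y) / support(X). The sessions and thresholds are made up for illustration.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    total = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / total

    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items}
    current = {c for c in current if support(c) >= min_support}
    frequent = {}
    k = 1
    while current:
        for itemset in current:
            frequent[itemset] = support(itemset)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return frequent

def association_rules(frequent, min_confidence):
    """Return (antecedent, consequent, support, confidence) tuples."""
    rules = []
    for itemset, itemset_support in frequent.items():
        for size in range(1, len(itemset)):
            for antecedent in combinations(itemset, size):
                antecedent = frozenset(antecedent)
                confidence = itemset_support / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent,
                                  itemset_support, confidence))
    return rules

# Hypothetical sessions; every item is a page type.
sessions = [
    {"Sport", "News", "Social"},
    {"Sport", "News"},
    {"News", "Social"},
    {"Sport", "News", "Entertainment"},
    {"Sport", "Social"},
]
frequent = apriori(sessions, min_support=0.3)
rules = association_rules(frequent, min_confidence=0.7)
for antecedent, consequent, sup, conf in rules:
    print(sorted(antecedent), "->", sorted(consequent), round(sup, 2), round(conf, 2))
```

With these sample sessions, only the rules Sport -> News and News -> Sport pass the 0.7 confidence threshold.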

Page 13: Applying web mining application for user behavior understanding

THE PHASES OF WEB USAGE MINING

Classification

Classification techniques play an important role in Web analytics applications for modeling the users according to various predefined metrics.

In the Web domain, we are interested in developing a profile of users belonging to a particular class or category. This requires extraction and selection of features that best describe the properties of a given class or category.

We will also focus on k-nearest neighbor (K-NN), which is considered a predictive technique for classification models, where k represents the number of similar cases or the number of items in the group.
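A minimal K-NN sketch in Python, assuming that each item is described by a numeric feature vector and that similarity is measured by Euclidean distance. The slides do not specify the features or the distance measure, so the two-dimensional features, the labels, and the name `knn_classify` are purely illustrative.

```python
import math
from collections import Counter

def knn_classify(query, labeled_points, k=3):
    """Return the majority label among the k points nearest to query."""
    neighbors = sorted(labeled_points,
                       key=lambda point: math.dist(query, point[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical training data: (feature vector, class label).
training = [
    ((0.9, 0.1), "sport"),
    ((0.8, 0.2), "sport"),
    ((0.7, 0.1), "sport"),
    ((0.1, 0.9), "politics"),
    ((0.2, 0.8), "politics"),
    ((0.1, 0.7), "politics"),
]
print(knn_classify((0.85, 0.15), training, k=3))
```

An odd k is used so that a vote between two classes cannot end in a tie.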

Page 14: Applying web mining application for user behavior understanding

THE PHASES OF WEB USAGE MINING

Pattern Analysis Process: In this stage, the discovered patterns are further processed and filtered, possibly resulting in aggregate user models that can be used as visualization tools. The next figure summarizes the whole process:

Page 15: Applying web mining application for user behavior understanding

Contents

Page 16: Applying web mining application for user behavior understanding

RESULTS OF USING ASSOCIATION RULES

Log-file in a flat file format.

Import log-file database to our implemented application.

Page 17: Applying web mining application for user behavior understanding

RESULTS OF USING ASSOCIATION RULES

Extract the transactional database of the web server log for every user, where every transaction represents a session.

Find the association rules of user behavior after applying the Apriori algorithm to the transactional database of the user.
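One way to build such a transactional database is sketched below in Python: each session becomes one transaction whose items are the page types visited in it. How the original application assigns a page type to a URL is not described in the slides, so the prefix table `TYPE_BY_PREFIX` and the function names are hypothetical.

```python
# Hypothetical mapping from URL prefix to page type.
TYPE_BY_PREFIX = {
    "/sport/": "Sport",
    "/news/": "News",
    "/social/": "Social",
}

def page_type(path, type_by_prefix=TYPE_BY_PREFIX):
    """Return the page type for a path, or None if no prefix matches."""
    for prefix, type_name in type_by_prefix.items():
        if path.startswith(prefix):
            return type_name
    return None

def sessions_to_transactions(sessions, type_by_prefix=TYPE_BY_PREFIX):
    """Turn each session (a list of paths) into a set of page types."""
    transactions = []
    for pages in sessions:
        types = {page_type(path, type_by_prefix) for path in pages}
        types.discard(None)  # ignore pages with an unknown type
        if types:
            transactions.append(types)
    return transactions

# Hypothetical sessions of one user.
user_sessions = [
    ["/sport/a.html", "/news/b.html", "/sport/c.html"],
    ["/about.html"],
    ["/social/x.html"],
]
transactions = sessions_to_transactions(user_sessions)
print(transactions)
```

Sessions that contain no page of a known type are dropped, so the second sample session does not produce a transaction.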

Page 18: Applying web mining application for user behavior understanding

Contents

Page 19: Applying web mining application for user behavior understanding

CONCLUSION

We used web data containing the information that the user leaves behind when accessing web pages. This data is called web logs (or server logs).

Statistical methods such as classification, association rule discovery, and statistical correlation analysis, which can find groups of web page types that are commonly accessed together, were applied as well.

Classification is used to map a data item into one of several predefined classes. Each class belongs to one category, such as sport, politics, or education. We also use the k-nearest neighbor (K-NN) algorithm as a common classification method to select the best class.

Association rule mining was used to discover correlation between site types found in a web log.

The implemented application program was designed in the C# programming language.

Page 20: Applying web mining application for user behavior understanding

Any Questions?