WebKDD 2001 Aristotle University of Thessaloniki 1 Effective Prediction of Web-user Accesses: A Data...

WebKDD 2001 Aristotle University of Thessaloniki 1

Effective Prediction of Web-user Accesses: A Data Mining Approach

Alexandros Nanopoulos, Dimitrios Katsaros, Yannis Manolopoulos
Aristotle Univ. of Thessaloniki, Greece

Presentation: Spyros Papadimitriou, Carnegie Mellon Univ.


Introduction (1/2)

• Web Prefetching: Deducing forthcoming user accesses based on log information

• Focus on:
  – Predictive prefetching (use of history)
  – Server-initiated prefetching (the server makes predictions and piggybacks them to the clients)


Introduction (2/2)
• Within a site, users navigate by following links [5]
• For server-initiated predictive prefetching, the interest is in access patterns that reflect this behavior


Presentation Outline
• Motivation & Related work
• Proposed method
• Comparative performance evaluation
• Conclusions


Requirements
• Site structure and contents impose:
  1. The order of dependencies (first or higher) among the documents
  2. The interleaving of documents belonging to patterns with random visits (noise)
• Discovered patterns should respect these factors


Related work
• Dependency graph (DG) [9]
  – A graph maintains pairwise accesses
• Prediction by Partial Match (PPM) [10]
  – A trie maintains sequences of consecutive accesses
• LBOT [6]
  – Special form of association rules of length 2
• Others (variations of the above) [3,11]
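As a rough illustration of the pairwise idea behind DG, the sketch below counts how often one document follows another within a lookahead window and predicts the followers whose confidence exceeds a threshold. It is a minimal sketch, not the algorithm of [9]: the function names, the `window` and `min_conf` parameters, and the sample sessions are assumptions made for illustration.

```python
from collections import defaultdict


def build_dependency_graph(sessions, window=2):
    """Count node visits and pairwise (a -> b) co-occurrences.

    An arc a -> b is counted once per visit of a when b appears
    among the next `window` requests (hypothetical parameter).
    """
    node_count = defaultdict(int)
    arc_count = defaultdict(int)
    for session in sessions:
        for i, a in enumerate(session):
            node_count[a] += 1
            followers = set(session[i + 1 : i + 1 + window])
            followers.discard(a)  # ignore self-loops
            for b in followers:
                arc_count[(a, b)] += 1
    return node_count, arc_count


def predict(node_count, arc_count, current, min_conf=0.5):
    """Return documents b with count(current -> b) / count(current) >= min_conf."""
    return sorted(
        b
        for (a, b), c in arc_count.items()
        if a == current and c / node_count[a] >= min_conf
    )


# Illustrative sessions (made up); X is a random visit between A and B.
sessions = [["A", "X", "B"], ["A", "B"], ["A", "C"]]
nodes, arcs = build_dependency_graph(sessions, window=2)
print(predict(nodes, arcs, "A"))
```

With a window larger than 1 the pair A, B is still counted when a random visit sits between them, which is why a pairwise scheme can tolerate noise while capturing only first-order dependencies.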


Motivation

Method     Order (1st Req.)   Noise (2nd Req.)
DG         No                 Yes
PPM        Yes                No
LBOT       No                 No
Proposed   Yes                Yes


Proposed Method (1)
• Novel Web log mining algorithm (WMo)
  – Apriori-like
  – Effective
    • Immune to noise
    • Considers high-order dependencies
  – Efficient
    • Significant reduction in the number of candidates


Proposed Method (2)
• Session (or transaction): A sequence of requests that occur within a specified time interval of each other [2]
• Containment relationship addresses the 2nd requirement (avoiding noise)
• Example:
  T = A, X, B, Y, C (X, Y are noise)
  S = A, B, C (the pattern)
  S is contained in T
• Comment: With contiguous subsequences and a support-only criterion, S (the pattern) will be missed.
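The containment test on this slide can be sketched as an order-preserving subsequence check. The code below is a minimal illustration, assuming sessions are plain lists of page identifiers; the names `contains`, `contains_contiguous`, and `support` are hypothetical and not taken from the paper.

```python
def contains(transaction, pattern):
    """True if `pattern` occurs in `transaction` in the same order,
    with other (noise) requests allowed in between."""
    it = iter(transaction)
    # `item in it` advances the iterator, so order is respected.
    return all(item in it for item in pattern)


def contains_contiguous(transaction, pattern):
    """True only if `pattern` occurs as a run of consecutive requests."""
    n, m = len(transaction), len(pattern)
    return any(
        list(transaction[i : i + m]) == list(pattern) for i in range(n - m + 1)
    )


def support(transactions, pattern, match=contains):
    """Fraction of transactions that match `pattern`."""
    return sum(match(t, pattern) for t in transactions) / len(transactions)


T = ["A", "X", "B", "Y", "C"]  # X and Y are noise
S = ["A", "B", "C"]            # the pattern

print(contains(T, S))             # subsequence containment
print(contains_contiguous(T, S))  # contiguous matching
```

Under contiguous matching, T does not count toward the support of S, so a pattern that is always interleaved with random visits may never reach the support threshold.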


Proposed Method (3)
• Candidate generation respects the ordering of accesses in transactions.
• Example: A,B ≠ B,A
• Respecting order causes a dramatic increase in the number of candidates
• WMo exploits the site structure for pruning [7,8]


Proposed Method (4)

Algorithm genCandidates(Lk, G)
// Lk: the set of large k-paths; G: the site graph
begin
  foreach L = <l1, …, lk>, L ∈ Lk {
    N+(lk) = {v | arc lk → v ∈ G}
    foreach v ∈ N+(lk) {
      // apply modified apriori pruning
      if v ∉ L and L' = <l2, …, lk, v> ∈ Lk {
        C = <l1, …, lk, v>
        if (∀ S ⊂ C, S ≠ L' ⇒ S ∈ Lk)
          insert C in the candidate-trie
      }
    }
  }
end
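A possible Python rendering of genCandidates is sketched below. It assumes the site graph is a dict mapping each node to its set of successors and that large k-paths are tuples; a plain set stands in for the candidate-trie. It further assumes that the pruning step only tests length-k subsequences that are themselves paths in G, which the slide does not state explicitly.

```python
from itertools import combinations


def is_path(seq, graph):
    """True if every consecutive pair in `seq` is an arc of `graph`."""
    return all(b in graph.get(a, ()) for a, b in zip(seq, seq[1:]))


def gen_candidates(large_k, graph):
    """Generate candidate (k+1)-paths from the set of large k-paths.

    large_k: set of tuples, all of length k
    graph:   dict node -> set of successor nodes (the site structure)
    """
    candidates = set()  # stands in for the candidate-trie
    for path in large_k:
        k = len(path)
        # Extend only along arcs leaving the last node of the path.
        for v in graph.get(path[-1], ()):
            if v in path:
                continue
            suffix = path[1:] + (v,)
            if suffix not in large_k:
                continue
            cand = path + (v,)
            # Modified apriori pruning (assumption: only subsequences
            # that are valid paths in the graph need to be large).
            pruned = any(
                sub != path
                and sub != suffix
                and is_path(sub, graph)
                and sub not in large_k
                for sub in combinations(cand, k)
            )
            if not pruned:
                candidates.add(cand)
    return candidates


# Illustrative site graph and large 2-paths (made up).
graph = {"A": {"B"}, "B": {"C"}, "C": set()}
print(gen_candidates({("A", "B"), ("B", "C")}, graph))
```

Because extensions are drawn only from N+(lk), pairs of documents that are not linked in the site never form candidates, which is where the reduction in the number of candidates comes from.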


Discussion
• Sequential patterns [1]
  – Reduction when “customer-sequence” = “user-session”
  – Suffers from a large number of candidates (by not considering the site structure)
• Path Fragments [4] (containment relationship is performed with regular expressions and the “*” label)
  – Focus on semantics (recommendation systems)
• Prefetching: patterns are for system and not for human consumption
• WMo focuses on efficiency/effectiveness rather than on expressiveness (semantics)


Methodology
• Synthetic (sample site with 1000 nodes)
  – Synthetic data generator (see the paper)
    • Modeling site nodes, site linkage, size of documents
• Real data sets (see the paper)
• Examine the impact of:
  – noise
  – order
  – client cache (see the paper)
  – efficiency


Accuracy w.r.t. noise
[Figure: accuracy (y-axis) vs. mean noise (x-axis) for DG, PPM, WM, WMo, and LBOT]


Usefulness w.r.t. noise
[Figure: usefulness (y-axis) vs. mean noise (x-axis) for DG, PPM, WM, WMo, and LBOT]


Traffic w.r.t. noise
[Figure: network traffic (y-axis) vs. mean noise (x-axis) for DG, PPM, WM, WMo, and LBOT]


Accuracy w.r.t. order
[Figure: accuracy (y-axis) vs. higher order percentage (x-axis) for DG, PPM, WM, WMo, and LBOT]


Usefulness w.r.t. order
[Figure: usefulness (y-axis) vs. higher order percentage (x-axis) for DG, PPM, WM, WMo, and LBOT]


Traffic w.r.t. order
[Figure: network traffic (y-axis) vs. higher order percentage (x-axis) for DG, PPM, WM, WMo, and LBOT]


Efficiency (see also [7,8])
[Figure: number of candidates (y-axis) vs. support threshold in percent (x-axis) for WM, WMo/wp, and WMo]


Conclusions
• Factors that influence Web prefetching:
  – Noise
  – Order
• A new algorithm, WMo, based on data mining was presented
• It compares favorably with previously proposed algorithms
• WMo is an effective and efficient Web prefetching algorithm


References
1. R. Agrawal, R. Srikant, Mining Sequential Patterns, ICDE 1995.
2. R. Cooley, B. Mobasher, J. Srivastava, Data Preparation for Mining World Wide Web Browsing Patterns, KAIS, 1(1), pp. 5-32, 1999.
3. M. Deshpande, G. Karypis, Selective Markov Models for Predicting Web-page Accesses, SIAM Data Mining, 2001.
4. W. Gaul, L. T. Schmidt-Thieme, Mining Web Navigation Path Fragments, WebKDD 2000.
5. B. A. Huberman, P. Pirolli, J. Pitkow, R. J. Lukose, Strong Regularities in World Wide Web Surfing, Science, 280, pp. 95-97, 1998.
6. B. Lan, S. Bressan, B. C. Ooi, Y. Tay, Making Web Servers Pushier, WebKDD 1999.
7. A. Nanopoulos, Y. Manolopoulos, Finding Generalized Path Patterns for Web Log Data Mining, ADBIS-DASFAA 2000.
8. A. Nanopoulos, Y. Manolopoulos, Mining Patterns from Graph Traversals, DKE, 37(3), pp. 243-266, 2001.
9. V. Padmanabhan, J. Mogul, Using Predictive Prefetching to Improve World Wide Web Latency, ACM SIGCOMM Computer Communication Review, 26(3), 1996.
10. T. Palpanas, A. Mendelzon, Web Prefetching Using Partial Match Prediction, WCW 1999.
11. J. Pitkow, P. Pirolli, Mining Longest Repeating Subsequences to Predict World Wide Web Surfing, USITS 1999.
12. L. T. Schmidt-Thieme, W. Gaul, Recommender Systems Based on Navigation Path Features, WebKDD 2001.
