WebKDD 2001 Aristotle University of Thessaloniki 1

Effective Prediction of Web-user Accesses: A Data Mining Approach

Alexandros Nanopoulos, Dimitrios Katsaros, Yannis Manolopoulos
Aristotle Univ. of Thessaloniki, Greece

Presentation: Spyros Papadimitriou, Carnegie Mellon Univ.

Introduction (1/2)

• Web prefetching: predicting forthcoming user accesses from log information
• Focus on:
  – Predictive prefetching (use of access history)
  – Server-initiated prefetching (the server makes the predictions and piggybacks them on its responses to the clients)

Introduction (2/2)

• Within a site, users navigate by following links [5]
• For server-initiated predictive prefetching, the interest lies in access patterns that reflect this behavior

Outline

• Motivation & related work
• Proposed method
• Comparative performance evaluation
• Conclusions


Requirements

• Site structure and contents impose:
  1. The order of dependencies (first or higher) among the documents
  2. The interleaving of documents that belong to patterns with random visits (noise)
• Discovered patterns should respect both factors
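A toy illustration of requirement 1 (the sessions are invented for illustration, and `next_page_counts` is a hypothetical helper, not part of WMo): when the page that follows B depends on the page visited before B, a first-order model conditioned only on B cannot tell the two cases apart, while a second-order model can.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, List, Tuple


def next_page_counts(sessions: List[List[str]], order: int) -> DefaultDict[Tuple[str, ...], Counter]:
    """Count which page follows each context of `order` consecutive pages."""
    table: DefaultDict[Tuple[str, ...], Counter] = defaultdict(Counter)
    for s in sessions:
        for i in range(order, len(s)):
            table[tuple(s[i - order:i])][s[i]] += 1
    return table


# Invented sessions: what follows B depends on the page visited before B.
sessions = [["A", "B", "C"]] * 3 + [["X", "B", "D"]] * 3

first = next_page_counts(sessions, order=1)
second = next_page_counts(sessions, order=2)
print(first[("B",)])       # Counter({'C': 3, 'D': 3}): ambiguous
print(second[("A", "B")])  # Counter({'C': 3})
print(second[("X", "B")])  # Counter({'D': 3})
```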

Related work

• Dependency graph (DG) [9]
  – A graph maintains pairwise accesses
• Prediction by Partial Match (PPM) [10]
  – A trie maintains sequences of consecutive accesses
• LBOT [6]
  – Special form of association rules of length 2
• Others (variations of the above) [3,11]
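The two main baselines can be sketched as follows. This is a minimal, hedged reading of DG and PPM as summarized on this slide, not the implementations of [9] or [10]: the class names, the lookahead `window`, the confidence `threshold`, the dictionary standing in for PPM's trie, and the rule of predicting from the longest matching context only are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple


class DependencyGraph:
    """DG sketch: arc (a, b) counts how often b follows a within `window` accesses."""

    def __init__(self, window: int = 2) -> None:
        self.window = window
        self.node: Counter = Counter()
        self.arc: Dict[str, Counter] = defaultdict(Counter)

    def train(self, sessions: List[List[str]]) -> None:
        for s in sessions:
            for i, a in enumerate(s):
                self.node[a] += 1
                # Pages within the lookahead window; interleaved pages do not break the pair.
                for b in set(s[i + 1:i + 1 + self.window]):
                    self.arc[a][b] += 1

    def predict(self, last: str, threshold: float) -> List[str]:
        n = self.node[last]
        if n == 0:
            return []
        return sorted(b for b, c in self.arc[last].items() if c / n >= threshold)


class PPM:
    """PPM-style sketch: counts the page that follows each context of 1..max_order consecutive pages."""

    def __init__(self, max_order: int = 2) -> None:
        self.max_order = max_order
        self.next: Dict[Tuple[str, ...], Counter] = defaultdict(Counter)

    def train(self, sessions: List[List[str]]) -> None:
        for s in sessions:
            for i in range(1, len(s)):
                for k in range(1, min(self.max_order, i) + 1):
                    self.next[tuple(s[i - k:i])][s[i]] += 1

    def predict(self, session: Sequence[str], threshold: float) -> List[str]:
        # Use the longest context seen in training (a simplification of PPM's blending of orders).
        for k in range(min(self.max_order, len(session)), 0, -1):
            counts = self.next.get(tuple(session[-k:]))
            if counts:
                total = sum(counts.values())
                return sorted(p for p, c in counts.items() if c / total >= threshold)
        return []


sessions = [["A", "B", "C"], ["A", "B", "C"], ["X", "B", "D"]]
dg, ppm = DependencyGraph(window=2), PPM(max_order=2)
dg.train(sessions)
ppm.train(sessions)
print(dg.predict("B", 0.5))          # ['C']
print(ppm.predict(["X", "B"], 0.5))  # ['D']
```

With `window=2`, DG also records the non-consecutive pair X → D, which is what makes it tolerant of interleaved pages, but it conditions on a single page only. PPM conditions on the context ⟨X, B⟩ and therefore predicts D rather than C, but it learns only from consecutive accesses.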

Motivation

Method     Order (1st req.)   Noise (2nd req.)
DG         No                 Yes
PPM        Yes                No
LBOT       No                 No
Proposed   Yes                Yes


Proposed Method (1)

• Novel Web log mining algorithm (WMo)
  – Apriori-like
  – Effective
    • Immune to noise
    • Considers high-order dependencies
  – Efficient
    • Significant reduction in the number of candidates

Proposed Method (2)

• Session (or transaction): a sequence of requests that occur within a specified time interval of each other [2]
• The containment relationship addresses the 2nd requirement (avoiding noise)
• Example: T = ⟨A, X, B, Y, C⟩, where X and Y are noise; S = ⟨A, B, C⟩ is the pattern; S is contained in T
• Comment: if only contiguous subsequences counted toward support, T would not support S, and the pattern S would be missed
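The session and containment definitions can be made concrete with a small sketch. It assumes that the "specified time interval" is a maximum gap between consecutive requests of the same user; the 1800-second timeout and the function names are hypothetical choices for illustration.

```python
from typing import List, Optional, Sequence, Tuple


def sessionize(requests: List[Tuple[float, str]], timeout: float) -> List[List[str]]:
    """Split one user's time-ordered (timestamp, page) requests into sessions.

    A new session starts when the gap to the previous request exceeds `timeout`.
    """
    sessions: List[List[str]] = []
    last_t: Optional[float] = None
    for t, page in requests:
        if last_t is None or t - last_t > timeout:
            sessions.append([])
        sessions[-1].append(page)
        last_t = t
    return sessions


def is_contained(pattern: Sequence[str], transaction: Sequence[str]) -> bool:
    """Order-preserving subsequence test: other pages (noise) may be interleaved."""
    remaining = iter(transaction)
    # `p in remaining` consumes the iterator up to the first match, preserving order.
    return all(p in remaining for p in pattern)


def is_contiguous(pattern: Sequence[str], transaction: Sequence[str]) -> bool:
    """Contiguous test: pattern pages must appear back to back."""
    k = len(pattern)
    return any(tuple(transaction[i:i + k]) == tuple(pattern)
               for i in range(len(transaction) - k + 1))


T = ["A", "X", "B", "Y", "C"]
S = ["A", "B", "C"]
print(is_contained(S, T), is_contiguous(S, T))  # True False
print(sessionize([(0, "A"), (10, "B"), (2000, "C")], timeout=1800))  # [['A', 'B'], ['C']]
```

On the example T = ⟨A, X, B, Y, C⟩, `is_contained` accepts S = ⟨A, B, C⟩ while the contiguous test rejects it, which is exactly why contiguous matching would miss the pattern.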

Proposed Method (3)

• Candidate generation respects the ordering of accesses in transactions
• Example: ⟨A, B⟩ and ⟨B, A⟩ are distinct candidates
• This causes a dramatic increase in the number of candidates
• WMo exploits the site structure for pruning [7,8]
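A rough count shows why respecting order inflates the candidate set. Assuming candidates are sequences of distinct pages, and taking n = 1000 (the size of the synthetic site used in the evaluation) purely as an illustrative value:

```latex
\begin{aligned}
\text{unordered (itemsets):}\quad & \binom{n}{k} \\
\text{ordered, distinct pages:}\quad & \frac{n!}{(n-k)!} = k!\binom{n}{k} \\[4pt]
n = 1000,\ k = 2:\quad & \binom{1000}{2} = 499{,}500 \quad\text{vs.}\quad 1000 \cdot 999 = 999{,}000 \\
n = 1000,\ k = 3:\quad & \binom{1000}{3} = 166{,}167{,}000 \quad\text{vs.}\quad 1000 \cdot 999 \cdot 998 = 997{,}002{,}000
\end{aligned}
```

When the site structure is exploited, a length-2 candidate must be an arc l1 → l2 of G, so there are at most |E| of them, which can be far fewer than 999,000 when the site is sparsely linked.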

Proposed Method (4)

Algorithm genCandidates(Lk, G)
// Lk: the set of large k-paths; G: the site graph
begin
  foreach L = ⟨l1, …, lk⟩, L ∈ Lk {
    N+(lk) = {v | arc lk → v ∈ G}
    foreach v ∈ N+(lk) {
      // apply modified Apriori pruning
      if v ∉ L and L’ = ⟨l2, …, lk, v⟩ ∈ Lk {
        C = ⟨l1, …, lk, v⟩
        if (∀ S ⊂ C, S ≠ L’ ⇒ S ∈ Lk)
          insert C in the candidate-trie
      }
    }
  }
end
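A runnable Python reading of genCandidates follows. It is a sketch under stated assumptions, not the authors' code: large k-paths are tuples, G is an adjacency dictionary, a plain set stands in for the candidate-trie, `v ∉ L` is read as "no repeated pages", and the pruning test over S ⊂ C is interpreted as ranging over the one-page deletions of C that are still paths in G (a subsequence that is not a path in G could never have been generated, so it cannot be in Lk). The helper names are hypothetical.

```python
from typing import Dict, Iterator, Set, Tuple

Path = Tuple[str, ...]
Graph = Dict[str, Set[str]]  # page -> pages it links to (arcs of G)


def is_path(seq: Path, graph: Graph) -> bool:
    """True if every consecutive pair in seq is an arc of the site graph."""
    return all(b in graph.get(a, set()) for a, b in zip(seq, seq[1:]))


def one_deletions(cand: Path) -> Iterator[Path]:
    """All subsequences of cand with exactly one page removed."""
    return (cand[:i] + cand[i + 1:] for i in range(len(cand)))


def gen_candidates(large_k: Set[Path], graph: Graph) -> Set[Path]:
    """Extend each large k-path by one outgoing arc of its last page."""
    candidates: Set[Path] = set()  # stands in for the candidate-trie
    for path in large_k:                          # L = <l1, ..., lk>
        for v in graph.get(path[-1], set()):      # v in N+(lk)
            if v in path:                         # assumed reading of "v not in L"
                continue
            shifted = path[1:] + (v,)             # L' = <l2, ..., lk, v>
            if shifted not in large_k:
                continue
            cand = path + (v,)                    # C = <l1, ..., lk, v>
            # Modified Apriori pruning (our reading): every one-page deletion of C,
            # other than L' (already checked), that is still a path in G must be large.
            if all(sub in large_k for sub in one_deletions(cand)
                   if sub != shifted and is_path(sub, graph)):
                candidates.add(cand)
    return candidates


G = {"A": {"B", "C"}, "B": {"C"}, "C": {"D"}}
print(gen_candidates({("A", "B"), ("B", "C"), ("A", "C")}, G))  # {('A', 'B', 'C')}
```

Removing ⟨A, C⟩ from L2 in this example prunes ⟨A, B, C⟩, because ⟨A, C⟩ is a path in G but is not large; if G had no arc A → C, the deletion ⟨A, C⟩ would be skipped and the candidate kept.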

Discussion

• Sequential patterns [1]
  – The problem reduces to sequential pattern mining when “customer sequence” = “user session”
  – Suffers from a large number of candidates (it does not consider the site structure)
• Path fragments [4] (containment is expressed with regular expressions and the “*” label)
  – Focus on semantics (recommendation systems)
• Prefetching: patterns are for system, not human, consumption
• WMo focuses on efficiency/effectiveness rather than on expressiveness (semantics)


Methodology

• Synthetic data (sample site with 1000 nodes)
  – Synthetic data generator (see the paper)
    • Models site nodes, site linkage, and document sizes
• Real data sets (see the paper)
• Examine the impact of:
  – noise
  – order
  – client cache (see the paper)
  – efficiency

Accuracy w.r.t. noise

[Plot: accuracy (y-axis, 0.1 to 0.4) vs. mean noise (x-axis, 1.6 to 3); series: DG, PPM, WM, WMo, LBOT]

Usefulness w.r.t. noise

[Plot: usefulness (y-axis, 0.04 to 0.2) vs. mean noise (x-axis, 1.6 to 3); series: DG, PPM, WM, WMo, LBOT]

Traffic w.r.t. noise

[Plot: network traffic (y-axis, 1.25 to 1.7) vs. mean noise (x-axis, 1.6 to 3); series: DG, PPM, WM, WMo, LBOT]
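The plots report accuracy, usefulness, and network traffic, which these slides do not define (see the paper). The sketch below assumes the definitions commonly used in the Web-prefetching literature, which may differ in detail from the paper's: accuracy is the fraction of prefetched documents that are later requested, usefulness is the fraction of requested documents that had been prefetched, and traffic is the ratio of bytes transferred with prefetching to bytes transferred without it. The function name and the single-session, unlimited-cache setting are hypothetical simplifications.

```python
from typing import Dict, List, Set, Tuple


def prefetch_metrics(requested: List[str], prefetched: Set[str],
                     size: Dict[str, int]) -> Tuple[float, float, float]:
    """Assumed metric definitions for one session with an unlimited client cache."""
    req = set(requested)
    hits = req & prefetched                       # prefetched and later requested
    accuracy = len(hits) / len(prefetched) if prefetched else 0.0
    usefulness = len(hits) / len(req) if req else 0.0
    # Each document is transferred once: requested ones, plus any prefetched extras.
    base = sum(size[p] for p in req)
    with_prefetch = sum(size[p] for p in req | prefetched)
    traffic = with_prefetch / base if base else 0.0
    return accuracy, usefulness, traffic


sizes = {"A": 10, "B": 10, "C": 10, "D": 10, "E": 20}
print(prefetch_metrics(["A", "B", "C", "D"], {"B", "E"}, sizes))  # (0.5, 0.25, 1.5)
```

Under these assumptions, a traffic value above 1 (as on the plots) means prefetching transfers more bytes than plain on-demand fetching, because some prefetched documents are never requested.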

Accuracy w.r.t. order

[Plot: accuracy (y-axis, 0.1 to 0.4) vs. higher-order percentage (x-axis, 0.1 to 0.9); series: DG, PPM, WM, WMo, LBOT]

Usefulness w.r.t. order

[Plot: usefulness (y-axis, 0.04 to 0.18) vs. higher-order percentage (x-axis, 0.1 to 0.9); series: DG, PPM, WM, WMo, LBOT]

Traffic w.r.t. order

[Plot: network traffic (y-axis, 1.35 to 1.65) vs. higher-order percentage (x-axis, 0.1 to 0.9); series: DG, PPM, WM, WMo, LBOT]

Efficiency (see also [7,8])

[Plot: number of candidates (y-axis, 0 to 1,100,000) vs. support threshold in percent (x-axis, 0.08 to 0.26); series: WM, WMo/wp, WMo]


Conclusions

• Factors that influence Web prefetching:
  – Noise
  – Order
• A new algorithm, WMo, based on data mining, was presented
• It compares favorably with previously proposed algorithms
• WMo is an effective and efficient Web prefetching algorithm

References

1. R. Agrawal, R. Srikant, Mining Sequential Patterns, ICDE 1995.
2. R. Cooley, B. Mobasher, J. Srivastava, Data Preparation for Mining World Wide Web Browsing Patterns, KAIS, 1(1), pp. 5-32, 1999.
3. M. Deshpande, G. Karypis, Selective Markov Models for Predicting Web-page Accesses, SIAM Data Mining, 2001.
4. W. Gaul, L. T. Schmidt-Thieme, Mining Web Navigation Path Fragments, WebKDD 2000.
5. B. A. Huberman, P. Pirolli, J. Pitkow, R. J. Lukose, Strong Regularities in World Wide Web Surfing, Science, 280, pp. 95-97, 1998.
6. B. Lan, S. Bressan, B. C. Ooi, Y. Tay, Making Web Servers Pushier, WebKDD 1999.
7. A. Nanopoulos, Y. Manolopoulos, Finding Generalized Path Patterns for Web Log Data Mining, ADBIS-DASFAA 2000.
8. A. Nanopoulos, Y. Manolopoulos, Mining Patterns from Graph Traversals, DKE, 37(3), pp. 243-266, 2001.
9. V. Padmanabhan, J. Mogul, Using Predictive Prefetching to Improve World Wide Web Latency, ACM SIGCOMM Computer Communication Review, 26(3), 1996.
10. T. Palpanas, A. Mendelzon, Web Prefetching Using Partial Match Prediction, WCW 1999.
11. J. Pitkow, P. Pirolli, Mining Longest Repeating Subsequences to Predict World Wide Web Surfing, USITS 1999.
12. L. T. Schmidt-Thieme, W. Gaul, Recommender Systems Based on Navigation Path Features, WebKDD 2001.