Web Usage Mining - Postpost.ir/_itcenter/documents/webusagemining_20120708... · 2012-07-08 ·...



Web Usage Mining



(Markov and Russell, 2009)

(Arotaritei and Mitra, 2004)

(Arotaritei and Mitra, 2004)

(Arotaritei and Mitra, 2004)

1 Query 2 Web content mining 3 Web structure mining 4 Web usage mining 5 Pattern




6 Log files 7 Web usage mining 8 Click stream 9 Profile 10 Web mining 11 Web usage mining 12 Web content mining 13 Web structure mining 14 Preprocessing 15 Pattern discovery 16 Pattern analysis


[3-PP2]

[6-PP14]

17 HTML 18 XML 19 IP Address 20 Page references 21 Log files 22 Users 23 Page View 24 Click Stream 25 Server Sessions 26 Data mining


M. Spiliopoulou

[5-PP4]

HTMLHTML

[5-PP5]

[5-PP6]

M. Spiliopoulou

[5-PP7]

27 User Modeling 28 Meta data 29 Hyper link


[3-PP4]

[3-PP4]

[3-PP5-part1]

[3-PP5]

www.umn.edu/script?ID=8376&loc=Store&dest=page1

www.umn.edu/script?ID=8376&loc=Store&dest=prod&prod=item1

www.umn.edu/script?ID=4596&loc=Store&dest=page1

www.umn.edu/script?ID=9432&loc=Store&dest=prod&prod=item2

.*dest=prod&prod=\(.*\)

.*dest=\(.*\)

page1

item1

page1

item2
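
The mapping from the four request URIs to the labels page1, item1, page1, item2 can be sketched in Python as follows. This is only an illustrative sketch: it assumes that the stray `D` characters in the extracted URIs and patterns stand for `=`, that the more specific product pattern is tried before the general one, and the function name `page_view_label` is hypothetical rather than part of any tool cited here.

```python
import re

# Request URIs from the example, assuming the garbled "D" stands for "=".
uris = [
    "www.umn.edu/script?ID=8376&loc=Store&dest=page1",
    "www.umn.edu/script?ID=8376&loc=Store&dest=prod&prod=item1",
    "www.umn.edu/script?ID=4596&loc=Store&dest=page1",
    "www.umn.edu/script?ID=9432&loc=Store&dest=prod&prod=item2",
]

# Ordered patterns (assumption): the product pattern is tried first,
# because the general "dest=" pattern would also match product URIs.
PATTERNS = [
    re.compile(r".*dest=prod&prod=(.*)"),
    re.compile(r".*dest=(.*)"),
]


def page_view_label(uri):
    """Return the page view label captured by the first matching pattern."""
    for pattern in PATTERNS:
        match = pattern.fullmatch(uri)
        if match:
            return match.group(1)
    return None  # URI does not follow the script?...&dest=... scheme


labels = [page_view_label(u) for u in uris]
print(labels)
```

The session ID and location parameters are discarded, so requests from different visitors for the same content collapse onto the same page view label.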

30 URI 31 Page view identification 32 One to one 33 Session identification 34 WebSift


[3-PP5-part2]

XY

ZX&Y&Z

[3-PP6]

M = [<F1; …; Fn>]

F = {hf, L1; …; Lm}

L = <r, (h1, g1) | … | (hp, gp)>

M is the site map, hi is an HTML file, r is the link type, and gi is the link target.

GET

post, hidden post, frame, ftp, mail [3-PP6]

35 Content hierarchies 36 Page classifications 37 Page clusters 38 Tags 39 Targets 40 Page File


[3-PP7]

[3-PP7]

M= [{index, (frame, 1, left|frame, A, main)};

{1, (get, A, main), (get, B, main), (get, C, main)};

{2, (get, E, main), (get, F, main), (get, G, main)};

{3, (get, E1, main); …; (get, E7, main)};

{4, (get, F1, main); …; (get, F4, main)};

{5, (get, G1, main); …; (get, G8, main)};

{A, (get, D, top)};

{B, (get, 2, left|E, main), (get, 2, left|F, main), (get, 2, left|G, main)};

{C, (get, H, top), (get, I, top), (get, J, top)};

{D};

{E, (get, 3, left|E1, main); …; (get, 3, left|E7, main)};

{F, (get, 4, left|F1, main); …; (get, 4, left|F4, main)};

{G, (get, 5, left|G1, main); …; (get, 5, left|G8, main)};

{E1}; …; {E7}; {F1}; …; {F4}; {G1}; …; {G8};

{H}; {I}; {J}]
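
One way to read the site map above is as a table from which page views (the set of files visible in the frames at one moment) can be derived. The sketch below encodes a small part of the map in Python and follows a few links. The encoding, the function names `initial_view` and `follow`, and the rule that a `top` target replaces the whole window are assumptions made for illustration, not the representation used in [3].

```python
# Hypothetical encoding of part of the site map:
# file -> list of links; each link is (link_type, [(file, target_frame), ...]).
SITE_MAP = {
    "index": [("frame", [("1", "left"), ("A", "main")])],
    "1": [("get", [("A", "main")]),
          ("get", [("B", "main")]),
          ("get", [("C", "main")])],
    "A": [("get", [("D", "top")])],
    "B": [("get", [("2", "left"), ("E", "main")]),
          ("get", [("2", "left"), ("F", "main")]),
          ("get", [("2", "left"), ("G", "main")])],
    "D": [],
}


def initial_view(site_map, entry="index"):
    """Build the first page view from the frame links of the entry file."""
    view = {}
    for link_type, loads in site_map[entry]:
        if link_type == "frame":
            for file_name, target in loads:
                view[target] = file_name
    return view


def follow(page_view, link):
    """Return the page view (frame -> file) after following one link.

    Assumption: a "top" target replaces the whole window, while any
    other target only replaces the file shown in the named frame.
    """
    _link_type, loads = link
    new_view = dict(page_view)
    for file_name, target in loads:
        if target == "top":
            new_view = {"top": file_name}
        else:
            new_view[target] = file_name
    return new_view


start = initial_view(SITE_MAP)               # files 1 and A
after_b = follow(start, SITE_MAP["1"][1])    # link to B from file 1
after_e = follow(after_b, SITE_MAP["B"][0])  # link to 2|E from file B
after_d = follow(start, SITE_MAP["A"][0])    # link to D with target top
print(start, after_b, after_e, after_d)
```

Under these assumptions a single click can change one frame (1-A to 1-B), two frames at once (1-B to 2-E), or the whole window (1-A to D), which is why a page view cannot be identified from a single requested file alone.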


Homepage1-A

index.htm

index.htm, home.htm

[3-PP8]

main body [3-PP9]

41 Page views 42 Page files


[5-PP8]

[8-PP3]

[11-PP3]

[3-PP1]

SIFT

[3-PP2]

43 Web usage mining 44 CGI 45 Proxy server log 46 Browser log 47 User profile 48 Record Data 49 User session 50 User Transaction 51 User query 52 Book marks 53 Data mining 54 Click stream


[3-PP2]

[3-PP10]

55 Web site information filter system 56 Preprocessing 57 Pattern Discovery 58 Pattern Analysis 59 Site design


[3-PP10]

[2-PP2]

60 Business Marketing Decision Support 61 Personalization 62 Usability Studies 63 Security 64 Network Traffic Analysis 65 Data mining 66 Click streams


[2-PP2]

[2-PP3]

[2-PP3]

67 Preprocessing 68 Fusion 69 Data cleaning 70 User identification 71 Session identification 72 Page view identification


[2-PP7]

[2-PP7]

[2-PP7]

[2-PP8]

73 Synchronization 74 Multiple log files 75 Data cleaning 76 Page view identification 77 User identification 78 Session identification 79 Episode identification 80 Click stream 81 Data Cleaning 82 Load 83 Data fusion 84 Referrer 85 Log server 86 Remove 87 Style 88 Page view Identification 89 Template 90 User identification 91 visit


[2-PP8,9]

IP+Agent[2-PP9]
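
The IP + Agent heuristic cited above treats each distinct combination of client IP address and user-agent string as a separate user when no login or cookie is available. A minimal sketch, assuming a simplified log of `(ip, user_agent, page)` records; the sample records and the function name `identify_users` are hypothetical:

```python
from collections import defaultdict

# Hypothetical, simplified log records: (ip, user_agent, requested_page).
log = [
    ("192.0.2.1", "Mozilla/5.0 (Windows)", "A"),
    ("192.0.2.1", "Mozilla/5.0 (Windows)", "B"),
    ("192.0.2.1", "Mozilla/5.0 (Linux)", "A"),
    ("192.0.2.2", "Mozilla/5.0 (Windows)", "C"),
    ("192.0.2.1", "Mozilla/5.0 (Windows)", "D"),
]


def identify_users(records):
    """Group requests by (IP, agent); each distinct pair counts as one user."""
    users = defaultdict(list)
    for ip, agent, page in records:
        users[(ip, agent)].append(page)
    return dict(users)


users = identify_users(log)
for key, pages in users.items():
    print(key, pages)
```

The heuristic is only approximate: several people behind one proxy with the same browser are merged into one user, and one person using two browsers is split into two.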

[2-PP10]

A

92 User activity record 93 User authentication 94 Cookies 95 IP 96 Agent 97 Sessionization 98 Click stream 99 Path completion 100 Caching 101 Proxy


A

[2-PP12]

[2-PP13]

SIFT

[5-PP11]

[2-PP18]

102 Data integration 103 Statistics 104 Association rules 105 Clustering 106 Classification 107 Sequential pattern 108 Dependency model 109 Sessions 110 Visitors


[2-PP19]

[2-PP20]

[2-PP15]

111 OLAP 112 Clustering 113 Segmentation


[2-PP20]

Apriori

[2-PP24]

[2-PP28]

Naive

114 Page view 115 Correlation 116 Association 117 Frequent itemsets 118 Sequential 119 Navigational 120 Inter session 121 Classification 122 Prediction


Bayesian, K-Nearest

[2-PP32]

SQL

OLAP

[5-PP14]

[9-PP1]

123 Query 124 On-line analytical processing


[8-PP6]

[8-PP7]

125 Transaction 126 Integration 127 Data cleaning 128 Log servers 129 Clean log 130 Dividing 131 Merging 132 Transaction data 133 Mining 134 Data integration


[8-PP6]

135 Registration data 136 Usage attributes 137 Integrated data 138 Transformation 139 Pattern discovery 140 Pattern analysis


WebSIFT

WebSIFT

NCSA

[5-PP15]

WebSIFT [5-PP16]

141 Web Site Information Filter System 142 National Center for Supercomputing Applications


WebSIFT

HTML

OLAP

[5-PP17]

143 Access logs 144 Referrer Logs 145 Agent Logs


[1-PP1895]

146 Web browsing 147 IBD: Intentional Browsing Data 148 Browsing 149 Open File


[1-PP1896]

Open File

[1-PP1896]

150 Web usage mining 151 Tool bar


[1-PP1896]

IP

ftp

A-D

AB

A

B

C

D

AB[1-PP1897]


[1-PP1897]

ID

IP

[1-PP1897]

152 Session identification 153 Hyper link


[1-PP1897]


[4-PP1]

WUM[4-PP8]

154 Data collections 155 Visitor 156 Web usage mining


WUM_prep

WUM_agService

MINT

WUM_gseqm

WUM_visualizer

WUM

[4-PP8-part1]

[4-PP8-part2]

[4-PP10]

WUM_visualizer

g-sequence, navigation pattern

[4-PP11]

157 Data set 158 Aggregated Log


[7-PP1].

[7-PP2].

159 Web usage mining 160 Personalization 161 Overload 162 Web personalization 163 Mach 164 Decision-support knowledge 165 Static 166 Operational knowledge 167 Dynamic 168 Personalization function 169 Value added 170 Browsing


[7-PP3-Part1].

[7-PP3-Part2].

[7-PP4-Part1].

[7-PP4-Part2].

[7-PP5].

[7-PP6].

171 Memorization 172 Guidance 173 Customization 174 Task performance support 175 Overload 176 Load 177 Domain Specification 178 User Identification 179 Efficient Acquisition of User Data 180 Flexible Data Elaboration 181 Efficient Construction of User Models 182 Practical and Legal Considerations


[7-PP9].

[7-PP10]

[7-PP9].

[7-PP15].

[7-PP20].

183 Data collection 184 Preprocessing 185 Pattern discovery 186 Knowledge discovery 187 Proxy servers 188 Browsing


[7-PP34].

[7-PP36]

189 Clustering 190 Classification 191 Association discovery 192 Sequence pattern discovery 193 User modeling


[7-PP37]

[7-PP37]

[7-PP39]

SETA, Tellim, Schwarzkopf, Oracle9iAS Personalization, Netmind, and Reaction

Mobasher et al. (2000b), Yan et al. (1996), and Kamdar and Joshi (2000)

SiteHelper (Ngu and Wu, 1997), WUM (Spiliopoulou et al., 1999b)

194 Single-user/Multi-user 195 Static/Dynamic 196 Context-Sensitive/Context-Insensitive 197 Explanatory/Non-Explanatory 198 Proactive/Conservative 199 Converging/Diverging


[10-PP1]

.

[10-PP1]

[10-PP3]

.

[10-PP4]

[10-PP5,6]

200 Cluster 201 Session


WebSIFT

202 Web usage mining 203 Data mining 204 Web mining 205 Mining 206 Web Site Information Filter System 207 Browsing


[1] Yu-Hui Tao, Tzung-Pei Hong, Yu-Ming Su; “Web usage mining with intentional browsing data”, Expert Systems with Applications 34, 1893–1904, http://www.sciencedirect.com/science/article/pii/S0957417407000668, April 2008.

[2] B. Mobasher; “Chapter 12: Web Usage Mining”, http://maya.cs.depaul.edu/~mobasher/papers/12-web-usage-mining.pdf, 2005.

[3] Robert Cooley; “The use of web structure and content to identify subjectively interesting web usage patterns”, ACM Transactions on Internet Technology, Vol. 3, No. 2, pages 93–116, May 2003.

[4] Berendt, Myra Spiliopoulou; “Analysis of navigation behavior in web site”, The VLDB Journal 9, 56–75, 2000.

[5] Yan Wang; “Web Mining and Knowledge Discovery of Usage Patterns”, http://softbase.uwaterloo.ca/~tozsu/courses/cs748t/surveys/wang-slides.pdf, February 2000.

[6] Daniel T. Larose; “Discovering Knowledge in Data: An Introduction to Data Mining”, Wiley-Interscience, John Wiley & Sons, Inc., Hoboken, New Jersey, 2005.

[7] Dimitrios Pierrakos, Georgios Paliouras; “Web Personalization based on web usage mining”, User Modeling and User-Adapted Interaction 13, 311–372, April 2003.

[8] R. Cooley, B. Mobasher, J. Srivastava; “Web Mining: Information and Pattern Discovery on the World Wide Web”, 1997.

[9] Yao-Te Wang, Anthony J.T. Lee; “Mining Web Navigation Patterns With a Path Traversal Graph”, Expert Systems with Applications 38, 7112–7122, www.elsevier.com/locate/eswa, 2011.

][

][