Web Usage Mining - Postpost.ir/_itcenter/documents/webusagemining_20120708... · 2012-07-08 ·...
Web Usage Mining
(Markov and Russell, 2009)
(Arotaritei and Mitra, 2004)
1 Query 2 Web content mining 3 Web structure mining 4 Web usage mining 5 Pattern
6 Log files 7 Web usage mining 8 Click stream 9 Profile 10 Web mining 11 Web usage mining 12 Web content mining 13 Web structure mining 14 Preprocessing 15 Pattern discovery 16 Pattern analysis
[3-PP2]
[6-PP14]
17 HTML 18 XML 19 IP Address 20 Page references 21 Log files 22 Users 23 Page View 24 Click Stream 25 Server Sessions 26 Data mining
M. Spiliopoulou
[5-PP4]
HTMLHTML
[5-PP5]
[5-PP6]
M. Spiliopoulou
[5-PP7]
27 User Modeling 28 Meta data 29 Hyper link
[3-PP4]
[3-PP4]
[3-PP5-part1]
[3-PP5]
www.umn.edu/script?ID=8376&loc=Store&dest=page1
www.umn.edu/script?ID=8376&loc=Store&dest=prod&prod=item1
www.umn.edu/script?ID=4596&loc=Store&dest=page1
www.umn.edu/script?ID=9432&loc=Store&dest=prod&prod=item2
.*dest=prod&prod=\(.*\)
.*dest=\(.*\)
page1
item1
page1
item2
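The mapping from request URIs to page view labels listed above can be sketched in Python. This is only an illustration under stated assumptions: the query strings are taken to use `=` between keys and values, the product pattern is tried before the general one, and the function name `page_view_label` is hypothetical rather than part of any tool cited here.

```python
import re

# Patterns are tried in order; the product pattern is more specific, so it goes first.
PATTERNS = [
    re.compile(r".*dest=prod&prod=(.*)"),
    re.compile(r".*dest=(.*)"),
]

def page_view_label(uri):
    """Return the label captured by the first matching pattern, or None."""
    for pattern in PATTERNS:
        match = pattern.match(uri)
        if match:
            return match.group(1)
    return None

uris = [
    "www.umn.edu/script?ID=8376&loc=Store&dest=page1",
    "www.umn.edu/script?ID=8376&loc=Store&dest=prod&prod=item1",
    "www.umn.edu/script?ID=4596&loc=Store&dest=page1",
    "www.umn.edu/script?ID=9432&loc=Store&dest=prod&prod=item2",
]
labels = [page_view_label(u) for u in uris]
print(labels)
```

Requests from different users (different ID values) collapse onto the same label, which is the point of page view identification.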
30 URI 31 Page view identification 32 One to one 33 Session identification 34 WebSIFT
[3-PP5-part2]
XY
ZX&Y&Z
[3-PP6]
(1) M = [<F1; …; Fn>]
(2) F = {hf, L1; …; Lm}
(3) L = <r, (h1, g1)| … | (hp, gp)>
where M is the site map, hi is an HTML file, r is the link type, and gi is the link target.
Link types: get, post, hidden post, frame, ftp, mail [3-PP6]
35 Content hierarchies 36 Page classifications 37 Page clusters 38 Tags 39 Targets 40 Page File
[3-PP7]
[3-PP7]
M= [{index, (frame, 1, left|frame, A, main)};
{1, (get, A, main), (get, B, main), (get, C, main)};
{2, (get, E, main), (get, F, main), (get, G, main)};
{3, (get, E1, main); …; (get, E7, main)};
{4, (get, F1, main); …; (get, F4, main)};
{5, (get, G1, main); …; (get, G8, main)};
{A, (get, D, top)};
{B, (get, 2, left|E, main), (get, 2, left|F, main), (get, 2, left|G, main)};
{C, (get, H, top), (get, I, top), (get, J, top)};
{D};
{E, (get, 3, left|E1, main); …; (get, 3, left|E7, main)};
{F, (get, 4, left|F1, main); …; (get, 4, left|F4, main)};
{G, (get, 5, left|G1, main); …; (get, 5, left|G8, main)};
{E1}; …; {E7}; {F1}; …; {F4}; {G1}; …; {G8};
{H}; {I}; {J}]
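One way to hold a site map of this shape in memory is sketched below. The class names `Frame` and `Link`, the helper `reachable_files`, and the choice to model only a fragment of the map are all assumptions made for illustration; they are not taken from [3].

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class Link:
    link_type: str   # r, e.g. "get" or "frame"
    targets: list    # [(h_i, g_i), ...]: HTML file and target frame pairs

@dataclass
class Frame:
    html_file: str                              # h_f
    links: list = field(default_factory=list)   # L_1 ... L_m

# A fragment of the example map; files without an entry here have no outgoing links.
site_map = [
    Frame("index", [Link("frame", [("1", "left"), ("A", "main")])]),
    Frame("1", [Link("get", [(p, "main")]) for p in ("A", "B", "C")]),
    Frame("A", [Link("get", [("D", "top")])]),
    Frame("B", [Link("get", [("2", "left"), (p, "main")]) for p in ("E", "F", "G")]),
    Frame("D"),
]

def reachable_files(site_map, start):
    """Return the set of files reachable from `start` by following links breadth-first."""
    by_file = {frame.html_file: frame for frame in site_map}
    seen = {start}
    queue = deque([start])
    while queue:
        frame = by_file.get(queue.popleft())
        if frame is None:
            continue  # file appears only as a link target in this fragment
        for link in frame.links:
            for html_file, _target in link.targets:
                if html_file not in seen:
                    seen.add(html_file)
                    queue.append(html_file)
    return seen

print(sorted(reachable_files(site_map, "index")))
```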
Homepage1-A
index.htm
index.htm, home.htm
[3-PP8]
main body[3-PP9]
41 Page views 42 Page files
[5-PP8]
[8-PP3]
[11-PP3]
[3-PP1]
SIFT
[3-PP2]
43 Web usage mining 44 CGI 45 Proxy server log 46 Browser log 47 User profile 48 Record Data 49 User session 50 User Transaction 51 User query 52 Book marks 53 Data mining 54 Click stream
[3-PP2]
[3-PP10]
55 Web site information filter system 56 Preprocessing 57 Pattern Discovery 58 Pattern Analysis 59 Site design
[3-PP10]
[2-PP2]
60 Business Marketing Decision Support 61 Personalization 62 Usability Studies 63 Security 64 Network Traffic Analysis 65 Data mining 66 Click streams
[2-PP2]
[2-PP3]
[2-PP3]
67 Preprocessing 68 Fusion 69 Data cleaning 70 User identification 71 Session identification 72 Page view identification
[2-PP7]
[2-PP7]
[2-PP7]
[2-PP8]
73 Synchronization 74 Multiple log files 75 Data cleaning 76 Page view identification 77 User identification 78 Session identification 79 Episode identification 80 Click stream 81 Data Cleaning 82 Load 83 Data fusion 84 Referrer 85 Log server 86 Remove 87 Style 88 Page view Identification 89 Template 90 User identification 91 visit
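The data cleaning step named in the footnotes above can be sketched as follows. This is a minimal example, assuming the server writes the widely used Combined Log Format and that requests for images, style sheets and scripts are the entries to drop; the suffix list and the function name `clean_log` are illustrative choices, not taken from [2].

```python
import re

# Combined Log Format; the referrer and agent fields are treated as optional.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) \S+'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?'
)

# Assumed set of embedded-object suffixes that do not count as page views.
IGNORED_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico")

def clean_log(lines):
    """Parse log lines and keep only requests that look like page views."""
    records = []
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip malformed lines
        record = match.groupdict()
        path = record["url"].split("?", 1)[0].lower()
        if path.endswith(IGNORED_SUFFIXES):
            continue  # embedded object, not a page view
        records.append(record)
    return records

sample = [
    '10.0.0.1 - - [08/Jul/2012:10:00:00 +0000] "GET /index.htm HTTP/1.1" 200 1043 "-" "Mozilla/5.0"',
    '10.0.0.1 - - [08/Jul/2012:10:00:01 +0000] "GET /logo.gif HTTP/1.1" 200 2310 "/index.htm" "Mozilla/5.0"',
    '10.0.0.1 - - [08/Jul/2012:10:00:01 +0000] "GET /style.css HTTP/1.1" 200 512 "/index.htm" "Mozilla/5.0"',
    '10.0.0.1 - - [08/Jul/2012:10:00:09 +0000] "GET /products.htm?id=7 HTTP/1.1" 200 880 "/index.htm" "Mozilla/5.0"',
]
cleaned = clean_log(sample)
print([record["url"] for record in cleaned])
```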
[2-PP8,9]
IP+Agent[2-PP9]
[2-PP10]
A
92 User activity record 93 User authentication 94 Cookies 95 IP 96 Agent 97 Sessionization 98 Click stream 99 Path completion 100 Caching 101 Proxy
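User identification by the IP+Agent pair, followed by sessionization, might look like the sketch below. The 30-minute gap between consecutive requests is an assumed threshold, and the record layout `(ip, agent, timestamp_seconds, url)` and the function name `sessionize` are hypothetical.

```python
from collections import defaultdict

SESSION_TIMEOUT = 30 * 60  # seconds; an assumed threshold, tune per site

def sessionize(records, timeout=SESSION_TIMEOUT):
    """Group (ip, agent, timestamp_seconds, url) records into per-user sessions.

    A user is approximated by the (ip, agent) pair; a new session starts when
    the gap between two consecutive requests of that user exceeds `timeout`.
    """
    by_user = defaultdict(list)
    for ip, agent, timestamp, url in records:
        by_user[(ip, agent)].append((timestamp, url))

    sessions = []
    for user, hits in by_user.items():
        hits.sort()
        current = [hits[0][1]]
        for (prev_time, _), (time, url) in zip(hits, hits[1:]):
            if time - prev_time > timeout:
                sessions.append((user, current))
                current = []
            current.append(url)
        sessions.append((user, current))
    return sessions

records = [
    ("1.2.3.4", "Mozilla", 0, "A"),
    ("1.2.3.4", "Mozilla", 600, "B"),
    ("1.2.3.4", "Opera", 700, "A"),      # same IP, different agent: a different user
    ("1.2.3.4", "Mozilla", 4000, "C"),   # more than 30 minutes after B: a new session
]
sessions = sessionize(records)
print([pages for _user, pages in sessions])
```

The pair is only a heuristic: several people behind one proxy with the same browser are merged into one user, which is why cookies or authentication are preferred when available.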
A
[2-PP12]
[2-PP13]
SIFT
[5-PP11]
[2-PP18]
102 Data integration 103 Statistics 104 Association rules 105 Clustering 106 Classification 107 Sequential pattern 108 Dependency model 109 Sessions 110 Visitors
[2-PP19]
[2-PP20]
[2-PP15]
111 OLAP 112 Clustering 113 Segmentation
[2-PP20]
Apriori
[2-PP24]
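A compact version of the Apriori idea, applied to sessions treated as sets of page views, is sketched here. It is a teaching example only, assuming an absolute support count as the threshold; the function name `frequent_itemsets` is hypothetical and the code is not tuned for large logs.

```python
from itertools import combinations

def frequent_itemsets(sessions, min_support):
    """Return {itemset: count} for page sets that occur in at least min_support sessions."""
    transactions = [frozenset(s) for s in sessions]
    # Level 1 candidates: every single page seen in any session.
    candidates = {frozenset([page]) for t in transactions for page in t}
    frequent = {}
    k = 1
    while candidates:
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that have exactly k pages.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = {
            c for c in joined
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        }
    return frequent

sessions = [["A", "B", "C"], ["A", "B"], ["A", "C"], ["B", "D"]]
frequent = frequent_itemsets(sessions, min_support=2)
for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

Association rules such as "visitors of A also visit B" are then derived from these itemsets by comparing the support of a set with the support of its subsets.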
[2-PP28]
Naive
114 Page view 115 Correlation 116 Association 117 Frequent itemsets 118 Sequential 119 Navigational 120 Inter-session 121 Classification 122 Prediction
Bayesian, K-Nearest
[2-PP32]
SQL
OLAP
[5-PP14]
[9-PP1]
123 Query 124 On-line analytical processing
[8-PP6]
[8-PP7]
125 Transaction 126 Integration 127 Data cleaning 128 Log servers 129 Clean log 130 Dividing 131 Merging 132 Transaction data 133 Mining 134 Data integration
[8-PP6]
135 Registration data 136 Usage attributes 137 Integrated data 138 Transformation 139 Pattern discovery 140 Pattern analysis
WebSIFT
WebSIFT
NCSA
[5-PP15]
WebSIFT [5-PP16]
141 Web Site Information Filter System 142 National Center for Supercomputing Applications
WebSIFT
HTML
OLAP
[5-PP17]
143 Access logs 144 Referrer Logs 145 Agent Logs
[1-PP1895]
146 Web browsing 147 IBD: Intentional Browsing Data 148 Browsing 149 Open File
[1-PP1896]
Open File
[1-PP1896]
150 Web usage mining 151 Tool bar
[1-PP1896]
IP
ftp
A-D
AB
A
B
C
D
AB[1-PP1897]
[1-PP1897]
ID
IP
[1-PP1897]
152 Session identification 153 Hyper link
[1-PP1897]
[4-PP1]
WUM[4-PP8]
154 Data collections 155 Visitor 156 Web usage mining
WUM_prep
WUM_agService
MINT
WUM_gseqm
WUM_visualizer
WUM
[4-PP8-part1]
[4-PP8-part2]
[4-PP10]
WUM_visualizer
g-sequence, navigation pattern
[4-PP11]
157 Data set 158 Aggregated Log
[7-PP1].
[7-PP2].
159 Web usage mining 160 Personalization 161 Overload 162 Web personalization 163 Match 164 Decision-support knowledge 165 Static 166 Operational knowledge 167 Dynamic 168 Personalization function 169 Value added 170 Browsing
[7-PP3-Part1].
[7-PP3-Part2].
[7-PP4-Part1].
[7-PP4-Part2].
[7-PP5].
[7-PP6].
171 Memorization 172 Guidance 173 Customization 174 Task performance support 175 Overload 176 Load 177 Domain Specification 178 User Identification 179 Efficient Acquisition of User Data 180 Flexible Data Elaboration 181 Efficient Construction of User Models 182 Practical and Legal Considerations
[7-PP9].
[7-PP10]
[7-PP9].
[7-PP15].
[7-PP20].
183 Data collection 184 Preprocessing 185 Pattern discovery 186 Knowledge discovery 187 Proxy servers 188 Browsing
[7-PP34].
[7-PP36]
189 Clustering 190 Classification 191 Association discovery 192 Sequence pattern discovery 193 User modeling
[7-PP37]
[7-PP37]
[7-PP39]
SETA, Tellim, Schwarzkopf, Oracle9iAS Personalization, Netmind, and Reaction
Mobasher et al. (2000b), Yan et al. (1996), and Kamdar and Joshi (2000),
SiteHelper (Ngu and Wu, 1997), WUM (Spiliopoulou et al., 1999b)
194 Single-user/Multi-user 195 Static/Dynamic 196 Context-Sensitive/Context-Insensitive 197 Explanatory/Non-Explanatory 198 Proactive/Conservative 199 Converging/Diverging
[10-PP1]
[10-PP1]
[10-PP3]
[10-PP4]
[10-PP5,6]
200 Cluster 201 Session
WebSIFT
202 Web usage mining 203 Data mining 204 Web mining 205 Mining 206 Web Site Information Filter System 207 Browsing
[1] Yu-Hui Tao, Tzung-Pei Hong, Yu-Ming Su; “Web usage mining with intentional browsing data”, Expert Systems with Applications 34, http://www.sciencedirect.com/science/article/pii/S0957417407000668, 1893–1904, April 2008.
[2] B. Mobasher; “Chapter 12: Web Usage Mining”, http://maya.cs.depaul.edu/~mobasher/papers/12-web-usage-mining.pdf, 2005.
[3] Robert Cooley; “The use of web structure and content to identify subjectively interesting web usage patterns”, ACM Transactions on Internet Technology, Vol. 3, No. 2, pages 93–116, May 2003.
[4] Bettina Berendt, Myra Spiliopoulou; “Analysis of navigation behaviour in web sites integrating multiple information systems”, The VLDB Journal 9, 56–75, 2000.
[5] Yan Wang; “Web Mining and Knowledge Discovery of Usage Patterns”, http://softbase.uwaterloo.ca/~tozsu/courses/cs748t/surveys/wang-slides.pdf, February 2000.
[6] Daniel T. Larose; “Discovering Knowledge in Data: An Introduction to Data Mining”, Wiley-Interscience, John Wiley & Sons, Inc., Hoboken, New Jersey, 2005.
[7] Dimitrios Pierrakos, Georgios Paliouras; “Web Personalization based on web usage mining”, User Modeling and User-Adapted Interaction 13: 311–372, April 2003.
[8] R. Cooley, B. Mobasher, J. Srivastava; “Web Mining: Information and Pattern Discovery on the World Wide Web”, 1997.
[9] Yao-Te Wang, Anthony J. T. Lee; “Mining Web Navigation Patterns With a Path Traversal Graph”, Expert Systems with Applications 38, 7112–7122, www.elsevier.com/locate/eswa, 2011.
][
][