Mining Web Logs to improve Website Organization Ramakrishnan Srikant and Yinghui Yang Professor...


Mining Web Logs to Improve Website Organization

Ramakrishnan Srikant and Yinghui Yang

Professor: Wan-Shiou Yang

An algorithm to automatically find pages in a website whose location is different from where visitors expect to find them

Introduction

The key insight is that visitors will backtrack if they don’t find the page where they expect it:

• The point from where they backtrack is the expected location for this page.

• Expected locations with a significant number of hits are presented to the website administrator for adding navigation links from the expected location to the target page.

Model of Visitor Search Pattern

Identifying Target Pages

Analysis of well-known present-day websites

• Amazon: There is a clear separation between content pages and index pages, such as Reference itemsets.

• Yahoo: Lists websites on the internal nodes of its hierarchy, not just on the leaf nodes.

Website & Search Pattern

Test: Find Expected Location

For i := 2 to n-2 begin
    If ((Pi-1 = Pi+1) or (no link from Pi to Pi+1))
        Add Pi to B

When

i=2: P1=P3 or no link P2 -> P3, P2 =>X B

i=3: P2=P4 or no link P3 -> P4, P3 => B

i=4: P3=P5 or no link P4 -> P5, P4 =>X B

i=5: P4=P6 or no link P5 -> P6, P5 => B

i=6: P5=P7 or no link P6 -> P7, P6 => B

Algorithm: Find Expected Location
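The backtrack test from the slides can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `find_backtrack_points`, the list representation of the path, and the set of `(source, destination)` link pairs are all hypothetical, and the loop bound simply follows the slide's "i := 2 to n-2".

```python
def find_backtrack_points(path, links):
    """Return the backtrack points B for one visitor path.

    path  -- pages P1..Pn in visit order (assumed: the last page is the target)
    links -- set of (source, destination) pairs for hyperlinks on the site
             (hypothetical representation, not from the slides)
    """
    backtrack_points = []
    n = len(path)
    # The slide's 1-based loop "i := 2 to n-2" maps to 0-based k = 1 .. n-3.
    for k in range(1, n - 2):
        prev_page, page, next_page = path[k - 1], path[k], path[k + 1]
        # Pi is a backtrack point if the visitor returned to the previous page
        # (Pi-1 = Pi+1) or moved to a page that Pi does not link to.
        if prev_page == next_page or (page, next_page) not in links:
            backtrack_points.append(page)
    return backtrack_points


if __name__ == "__main__":
    site_links = {("A", "B"), ("A", "C"), ("B", "D"), ("C", "T")}
    # The visitor goes A -> B, returns to A, then reaches target T through C.
    print(find_backtrack_points(["A", "B", "A", "C", "T"], site_links))
```

In this toy path, page B is where the visitor gave up and went back, so B would be reported as an expected location for T.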

Limitations

• When the website doesn’t have a clear separation between content and index pages, it can be hard to distinguish target pages from other pages.

• Another limitation is that only people who successfully find a target page will generate an expected location for that page.

Optimizing the Set of Navigation Links

We consider three approaches for recommending additional links to the website administrator:

1. FirstOnly: Easy and simple.

2. OptimizeBenefit: Order and all elements.

3. OptimizeTime: Reduce time for both.

FirstOnly

The algorithm recommends the frequent first expected locations (the pages that occur frequently in E1) to the website administrator, ignoring any subsequent expected locations the visitor may have considered.

• Disadvantage: It only covers the information that a small number of visitors needed.

Example: FirstOnly

Algorithm: FirstOnly
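A minimal sketch of the FirstOnly idea, assuming each record is the ordered list of expected locations (E1, E2, ...) that one visitor generated for a single target page. The function name `first_only` and the `min_support` threshold parameter are hypothetical names chosen for this illustration.

```python
from collections import Counter


def first_only(records, min_support):
    """Count how often each page appears as the first expected location (E1).

    records     -- list of lists; each inner list holds one visitor's expected
                   locations for the same target page, in order (assumed layout)
    min_support -- minimum count for a page to be recommended (hypothetical)
    """
    # Only E1 is counted; later expected locations are ignored on purpose.
    counts = Counter(rec[0] for rec in records if rec)
    return [(page, count) for page, count in counts.most_common()
            if count >= min_support]


if __name__ == "__main__":
    visitor_records = [["A", "B"], ["A"], ["C", "A"], ["A", "C"], ["C"]]
    print(first_only(visitor_records, min_support=2))
```

Because only E1 is counted, a page that many visitors tried second never shows up here, which is the disadvantage noted above.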

OptimizeBenefit

This is a greedy algorithm that attempts to maximize the benefit to the website of adding additional links.

• In each pass, it finds the page with the maximum benefit.

• It adds this page to the set of recommendations.

• It nulls out all instances of this page and the succeeding pages, and recomputes the benefit.

Example: OptimizeBenefit

Algorithm: OptimizeBenefit
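The greedy passes described for OptimizeBenefit might look like the sketch below. The slides do not give the benefit formula, so the per-position weights in `benefit_weights` (for example, 1.0 for E1 and 0.5 for E2) are an assumption made for illustration, as are the function name and the `min_benefit` stopping threshold.

```python
from collections import defaultdict


def optimize_benefit(records, benefit_weights, min_benefit):
    """Greedily pick pages by total benefit, nulling out covered entries.

    records         -- list of ordered expected-location lists (assumed layout)
    benefit_weights -- assumed benefit of a link at E1, E2, ...; the last
                       weight is reused for any later positions
    min_benefit     -- hypothetical threshold below which the search stops
    """
    records = [list(rec) for rec in records]  # work on a copy
    recommendations = []
    while True:
        benefit = defaultdict(float)
        for rec in records:
            for j, page in enumerate(rec):
                if page is not None:
                    weight = benefit_weights[min(j, len(benefit_weights) - 1)]
                    benefit[page] += weight
        if not benefit:
            break
        best = max(benefit, key=benefit.get)
        if benefit[best] < min_benefit:
            break
        recommendations.append(best)
        # Null out the chosen page and every expected location after it.
        for rec in records:
            if best in rec:
                for j in range(rec.index(best), len(rec)):
                    rec[j] = None
    return recommendations


if __name__ == "__main__":
    visitor_records = [["A", "B"], ["A", "C"], ["B", "C"], ["B"]]
    print(optimize_benefit(visitor_records, [1.0, 0.5], min_benefit=1.0))
```

Unlike FirstOnly, a page that appears often as a second expected location can still accumulate enough benefit to be recommended.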

OptimizeTime

• The goal of the algorithm is to minimize the number of backtracks the visitor has to make.

• Saving time for each record (person) improves the website's performance.

• The algorithm is also a greedy search, and is quite similar to OptimizeBenefit.

Example: OptimizeTime

Algorithm: OptimizeTime
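A sketch of a greedy search in the spirit of OptimizeTime follows. The slides do not define how time saved is measured, so this example assumes that a link at the j-th expected location of a record saves the visitor every backtrack from that point onward. The function name, the `min_saved` threshold, and the record layout are likewise hypothetical.

```python
from collections import defaultdict


def optimize_time(records, min_saved):
    """Greedily pick the pages that save the most backtracks.

    records   -- list of ordered expected-location lists (assumed layout)
    min_saved -- hypothetical threshold below which the search stops
    """
    records = [list(rec) for rec in records]  # work on a copy
    recommendations = []
    while True:
        saved = defaultdict(int)
        for rec in records:
            for j, page in enumerate(rec):
                # Assumed measure: a link at position j avoids this backtrack
                # and every later one in the same record.
                saved[page] += len(rec) - j
        if not saved:
            break
        best = max(saved, key=saved.get)
        if saved[best] < min_saved:
            break
        recommendations.append(best)
        # Truncating at the chosen page has the same effect as nulling out
        # that page and all expected locations after it.
        records = [rec[:rec.index(best)] if best in rec else rec
                   for rec in records]
    return recommendations


if __name__ == "__main__":
    visitor_records = [["A", "B", "C"], ["B", "C"], ["C"]]
    print(optimize_time(visitor_records, min_saved=2))
```

The structure mirrors the OptimizeBenefit passes; only the scoring of each page changes.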

Algorithm: OptimizeTime&Profit

• We can add Pi_num, a weight that emphasizes pages specially recommended from the web designer's point of view.

• P := page with highest support from TimeSaved(Pi) * Pi_num

• We can get a list of recommendations with a web-designer focus.
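The weighted selection step, P := page with highest TimeSaved(Pi) * Pi_num, can be illustrated with a small sketch. The dictionaries `time_saved` and `designer_weight` and the default weight of 1 for pages the designer has not weighted are assumptions for this example, not part of the slides.

```python
def pick_weighted_page(time_saved, designer_weight):
    """Return the page maximizing TimeSaved(Pi) * Pi_num, plus all scores.

    time_saved      -- dict mapping page -> time saved (assumed precomputed)
    designer_weight -- dict mapping page -> Pi_num; pages missing from it
                       get weight 1 (an assumption for this sketch)
    """
    scores = {page: saved * designer_weight.get(page, 1)
              for page, saved in time_saved.items()}
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    saved = {"A": 3, "B": 4, "C": 3}
    weights = {"A": 2}  # the designer wants to promote page A
    print(pick_weighted_page(saved, weights))
```

With these toy numbers the designer's weight lifts page A above page B, even though B saves more time on its own.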