BA 572 - J. Galván 2
An Introduction to Web Traffic Measurement
What is log file analysis?
Commercial Tools
Definitions
Examples of Analysis
Pitfalls
Other issues/topics
Applications and Marketing Issues
Wrap Up
DEFINITION
Web traffic analysis is the process of measuring the extent and character of user
activity on a web site, interpreting the measurements, and applying the conclusions.
LOG FILE ANALYSIS - BIG PICTURE
[Diagram labels: Web Server; Admin Console; Analysis Server & Analysis Package; Data Store (Oracle DB); Browser; Power user GUI/SQL; Special Reports; "Canned" Reports; Log files; Nightly]
LOG FILE EXAMPLE (Combined Log Format)
limestone.uoregon.edu andred - [19/Jun/1999:00:49:41 -0500] "GET /service/contracts.gif HTTP/1.0" 200 1341 "http://www.netgen.com/" "Mozilla/2.0 (compatible; MSIE 4.0; AOL 4.0; Windows 95)"
Fields, in order:
Hostname or IP address
Registered user name (usually blank)
Date and time of request
Object requested
Status code
Bytes transferred
Referral information
Browser information
Courtesy netGenesis Corp.
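As a rough illustration of how an analysis package might read such a line, here is a minimal Python sketch. The regular expression, the function name parse_line, and the field names are assumptions made for this example, not part of any tool named in these slides. The sample is a cleaned-up copy of the line above. In the standard layout the second and third fields are the identity and the authenticated user name, so the sketch simply keeps both.

```python
import re
from datetime import datetime

# Assumed pattern for the Combined Log Format:
# host ident user [time] "request" status bytes "referrer" "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return a dict of fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    record["bytes"] = 0 if record["bytes"] == "-" else int(record["bytes"])
    # The request field looks like: METHOD RESOURCE PROTOCOL
    parts = record["request"].split()
    record["resource"] = parts[1] if len(parts) >= 2 else None
    return record

sample = (
    'limestone.uoregon.edu andred - [19/Jun/1999:00:49:41 -0500] '
    '"GET /service/contracts.gif HTTP/1.0" 200 1341 '
    '"http://www.netgen.com/" '
    '"Mozilla/2.0 (compatible; MSIE 4.0; AOL 4.0; Windows 95)"'
)
record = parse_line(sample)
print(record["host"], record["resource"], record["status"], record["bytes"])
```

Lines that do not match the pattern return None, so a caller can count and inspect them rather than silently dropping them.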
COMMERCIAL WEB TRAFFIC ANALYSIS TOOLS
Tool        Method                   URL                    Comments
netGenesis  log file analysis        www.netgenesis.com     Highly capable system.
Wusage      log file analysis        www.boutel.com/wusage  Inexpensive, easy to maintain. Limited capability.
Accrue      log file analysis        www.accrue.com         Troubles with earlier versions.
NetAcumen   FTP log files to vendor  www.netacumen.com      Privacy issues. Website for reports. $1K/mo for 10K visits/mo. $2.5K set up charge.
Hitbox      Client side scripting    www.hitbox.com         Vend out admin. Good for simple sites. 250K pvs/mo = $700/mo. 1M pvs/mo = $1.5K/mo.
ARIA        TCP/IP packet sniffing   www.macromedia.com     Does not use log files.
Courtesy L. Johnson, Sun Microsystems
TERMS
Resource - Any file on a server available to be downloaded to a client.
Request - An instruction made to a web server to download a resource. (Sometimes called a "hit".)
Page - An HTML document, usually containing text and references to images and other objects. A page has its own URL.
Page View - A request for a document on a web site.
Page views: .html, .pl, .txt, .shtml, .exe, .cgi, .bat, ...
Not page views: .gif, .jpeg, .movie, .tcl, .tif, .wav, ...
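The extension rule above can be sketched in Python as follows. The function name is_page_view is hypothetical, and two choices here are assumptions rather than anything stated on the slide: requests ending in "/" are counted as page views, and any extension not on the page-view list (including the image and media types listed above) is not counted.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

# Extensions the slide lists as page views.
PAGE_VIEW_EXTENSIONS = {".html", ".pl", ".txt", ".shtml", ".exe", ".cgi", ".bat"}

def is_page_view(resource):
    """Classify a requested resource as a page view by its file extension."""
    path = urlsplit(resource).path  # drops any query string
    if path == "" or path.endswith("/"):
        # Assumption: a directory request is answered with an index page.
        return True
    extension = PurePosixPath(path).suffix.lower()
    return extension in PAGE_VIEW_EXTENSIONS

for resource in ["/index.html", "/service/contracts.gif", "/", "/search/index.cgi?q=java"]:
    print(resource, is_page_view(resource))
```

A real analysis package would let the administrator edit this list, since what counts as a page differs from site to site.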
TERMS (CONT.)
Visit - A session at a web site that ends when the user makes no further requests within a defined time period, usually 30 minutes.
User (Visitor) - A person or agent who makes requests to a web site.
Daily Unique User (duUser) - A unique user who visited your web site on a given day.
Weekly Unique User (wuUser) - A unique user who visited your site in a given week.
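A minimal Python sketch of how visits and daily unique users could be counted from these definitions. It assumes each request has already been reduced to a (user_id, timestamp) pair, and that user_id (for example a hostname) really does identify one user, which later slides show is often not true. The function names are made up for this example.

```python
from collections import defaultdict
from datetime import datetime, timedelta

VISIT_TIMEOUT = timedelta(minutes=30)  # the usual cutoff named in the definition

def count_visits(requests, timeout=VISIT_TIMEOUT):
    """Count visits: a user's request starts a new visit if the gap
    since that user's previous request exceeds the timeout."""
    last_seen = {}
    visits = 0
    for user, timestamp in sorted(requests, key=lambda r: r[1]):
        previous = last_seen.get(user)
        if previous is None or timestamp - previous > timeout:
            visits += 1
        last_seen[user] = timestamp
    return visits

def daily_unique_users(requests):
    """Map each calendar day to the number of distinct users seen that day."""
    users_by_day = defaultdict(set)
    for user, timestamp in requests:
        users_by_day[timestamp.date()].add(user)
    return {day: len(users) for day, users in users_by_day.items()}

requests = [
    ("hostA", datetime(1999, 6, 19, 0, 49)),
    ("hostA", datetime(1999, 6, 19, 0, 55)),  # same visit (6 minute gap)
    ("hostA", datetime(1999, 6, 19, 2, 0)),   # new visit (gap over 30 minutes)
    ("hostB", datetime(1999, 6, 19, 1, 0)),
    ("hostB", datetime(1999, 6, 20, 1, 0)),   # next day, new visit
]
print(count_visits(requests))
print(daily_unique_users(requests))
```

Weekly unique users could be counted the same way by grouping on the week instead of the date.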
TOP PAGES EXAMPLE
Rank  HTTP Resource                    # of Page Views  % of Total  Cum %
1     /                                      2,136,650        38.5   38.5
2     /bigadmin/downloads/                     228,679         4.1   42.7
3     /MySun/                                  198,430         3.6   46.2
4     /bigadmin/docs/                          131,694         2.4   48.6
5     /search/index.cgi/                       103,248         1.9   50.5
6     /staroffice/                              65,347         1.2   51.6
7     /products-n-solutions                     65,038         1.2   52.8
8     /corp_emp/scripts/showjob.cgi/            63,601         1.1   54.0
9     /products/staroffice/get.cgi/             60,260         1.1   55.0
10    /forte/ffj/overview.html/                 58,103         1.0   56.1
11    Other                                  2,434,788        43.9  100
(Altered data.)
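A report of this shape can be produced from a list of page-view resources with a few lines of Python. This is only a sketch: the function name top_pages and the tiny input list are invented for illustration and are not the data behind the table above.

```python
from collections import Counter

def top_pages(resources, n=10):
    """Return rows of (resource, page views, % of total, cumulative %)."""
    counts = Counter(resources)
    total = sum(counts.values())
    rows = []
    cumulative = 0
    for resource, views in counts.most_common(n):
        cumulative += views
        rows.append((resource, views, 100 * views / total, 100 * cumulative / total))
    other = total - cumulative
    if other:
        # Everything outside the top n is lumped into one "Other" row.
        rows.append(("Other", other, 100 * other / total, 100.0))
    return rows

# Hypothetical input: one entry per page view.
views = ["/"] * 5 + ["/MySun/"] * 3 + ["/staroffice/"] * 2
rows = top_pages(views, n=2)
for resource, count, percent, cumulative_percent in rows:
    print(f"{resource:15} {count:5} {percent:6.1f} {cumulative_percent:6.1f}")
```

The same function works for top hostnames or top referrers if it is fed those fields instead of the requested resource.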
Drill Down, Top Hostnames Example
Top Hostnames for /corp_emp/scripts/showjob.cgi, for time period in previous report:
Rank  Hostname             # Page Views  % of Total  Cum %
1     serv3.hwka.com             17,161        27.0   27.0
2     209.67.186.119              5,998         9.4   36.4
3     216.34.97.92                5,736         9.0   45.4
4     ip22.digibahn.net           1,103         1.6   47.0
5     areil.sun.com                 501         0.8   47.8
6     mailgate.cwhkt.com            363         0.6   48.4
7     pix89.pgexch.com              249         0.4   52.2
      other (5152)               30,115        47.2  100
(Altered data.)
Top Referrers Example
Rank  Referring Web Site             # Visits   % of Total
1     No Referral Information Sent   3,655,598  65.3%
2     www.sun.com                      525,225   9.4%
3     java.sun.com                     161,909   2.9%
4     www.google.com                   110,019   2.0%
5     www.slashdot.org                  40,280   0.7%
6     slashdot.org                      35,263   0.6%
7     www.javasoft.com                  31,128   0.6%
8     web.icq.com                       28,401   0.5%
9     google.yahoo.com                  27,622   0.5%
10    www.java.sun.com                  24,400   0.4%
      other                            955,485  17%
(Altered data.)
CLICKSTREAM EXAMPLE
Page                Number of Visits  % of Total
First page                     11507  100.00%
  Second page                   9096   79.00%
    Third page                  7000   61.00%
    Third page                  1500   13.00%
    Third page                   500    4.30%
    Third page                    96    0.80%
  Second page                   1214   10.60%
    Third page                   577    5.00%
    Third page                   394    3.40%
    Third page                   134    1.20%
    Third page                   109    0.90%
  Second page                   1137    9.90%
    Third page                   800    7.00%
    Third page                   200    1.70%
    Third page                   100    0.90%
    Third page                    37    0.30%
  Second page                     20    0.20%
    Third page                    10    0.10%
    Third page                     5    0.05%
    Third page                     3    0.03%
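A clickstream tree like this one can be built by counting how many visits share each path prefix. The Python sketch below assumes every visit has already been reduced to its ordered list of page views; the function name clickstream_counts and the sample visits are hypothetical.

```python
from collections import Counter

def clickstream_counts(visits, depth=3):
    """Count visits for every path prefix up to the given depth.

    visits: list of page lists, each in request order.
    Returns a Counter keyed by tuples such as ("/", "/products/").
    """
    counts = Counter()
    for pages in visits:
        for length in range(1, min(depth, len(pages)) + 1):
            counts[tuple(pages[:length])] += 1
    return counts

# Hypothetical visits, all entering at the home page.
visits = [
    ["/", "/products/", "/products/staroffice/"],
    ["/", "/products/", "/search/"],
    ["/", "/search/"],
    ["/"],
]
counts = clickstream_counts(visits)
entry_total = counts[("/",)]
for path, n in sorted(counts.items()):
    indent = "  " * (len(path) - 1)
    print(f"{indent}{path[-1]}: {n} visits, {100 * n / entry_total:.1f}% of total")
```

Each percentage is relative to the visits that began on the first page, which matches how the "% of Total" column in the example is scaled.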
PITFALLS
What do you know?
Date and time of the request.
What file was requested.
Internet address of the host.
Usually, what page referred the visitor to you.
Usually, the make and model of the browser.
In other words, you know what is in the log file.
The rest you are assuming, calculating, estimating, or believing.
PITFALLS (CONT.)
Caching:
  Browser caching.
  ISP caching (AOL).
  National caching.
  Affects traffic quantity (views, visits, users).
  Affects apparent behavior (e.g. click streams).
Proxy Servers:
  Many real users might look like one user.
  Distorts the number of users, visits, and click streams.
  Merged visits.
Robots:
  Inflate page views.
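One common way to reduce robot inflation is to drop traffic from hosts that look automated. The Python sketch below is a heuristic only, resting on two assumptions that are not from the slides: robots tend to request /robots.txt, and many announce themselves with words such as "bot" or "spider" in the browser field. The hint list, function names, and sample records are all hypothetical, and a robot that hides its identity will pass through.

```python
# Hypothetical, deliberately incomplete list of user-agent hints.
ROBOT_HINTS = ("bot", "crawler", "spider")

def robot_hosts(records):
    """Return the set of hosts that look like robots in these records."""
    hosts = set()
    for record in records:
        agent = record["agent"].lower()
        if record["resource"] == "/robots.txt" or any(h in agent for h in ROBOT_HINTS):
            hosts.add(record["host"])
    return hosts

def without_robots(records):
    """Drop every request made by a host flagged as a probable robot."""
    robots = robot_hosts(records)
    return [r for r in records if r["host"] not in robots]

records = [
    {"host": "crawl1.example.com", "resource": "/robots.txt", "agent": "ExampleSpider/1.0"},
    {"host": "crawl1.example.com", "resource": "/index.html", "agent": "ExampleSpider/1.0"},
    {"host": "quiet.example.net", "resource": "/robots.txt", "agent": "Mozilla/4.0"},
    {"host": "quiet.example.net", "resource": "/MySun/", "agent": "Mozilla/4.0"},
    {"host": "pc7.example.org", "resource": "/index.html", "agent": "Mozilla/4.0 (compatible; MSIE 4.0)"},
]
kept = without_robots(records)
print(len(records), "requests before,", len(kept), "after filtering")
```

Filtering by host has its own cost: if a robot and real users share a proxy server, the real users are dropped too.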
PITFALLS (CONT.)
Internal vs. External Traffic.
Complicated web sites:
  Multiple servers.
  Need all log files in the same data store.
  Changing web site design.
Load balancing:
  Front end server can shuffle traffic between different backend servers.
  Where are your log files actually coming from?
OTHER ISSUES/TOPICS
• Cookies:
  • Partial solution to the Unique User problem caused by proxy servers. Improves user and visit count accuracy; untangles clickstreams.
  • Privacy issues must be dealt with.
  • Authenticated User data has highest confidence.
• Query Strings:
  https://sun.com/service/Router?country=US&feature=SoftwareUpdate
  https://sun.com/service/Router?country=JP&feature=ServiceRequest
  RESOURCE?KEY=value&KEY=value
  Analysis packages must be configured to handle these.
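To show what "handling" a query string can mean, here is a small Python sketch that splits a URL into its resource and its KEY=value pairs using the standard library. The function name split_resource and the idea of reporting by the feature key are assumptions for illustration; which keys matter depends on the site.

```python
from urllib.parse import parse_qs, urlsplit

def split_resource(url):
    """Split a URL into (resource path, dict of query keys to first value)."""
    parts = urlsplit(url)
    params = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return parts.path, params

urls = [
    "https://sun.com/service/Router?country=US&feature=SoftwareUpdate",
    "https://sun.com/service/Router?country=JP&feature=ServiceRequest",
]
for url in urls:
    path, params = split_resource(url)
    # Assumed reporting choice: treat each feature as a separate "page".
    print(path, params.get("country"), params.get("feature"))
```

Without this step both URLs above would be counted as the same resource, /service/Router, even though they lead to different content.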
OTHER ISSUES/TOPICS
• Dynamic Content:
  • Web pages which are generated on the fly by pulling data from a database.
  • URLs can be very cryptic.
  • Measurement tool must be specially configured.
• Transactions and Other Metrics:
  • Purchases
  • Submittals
  • Linkages to backend servers and databases.
  • Telephone data.
  • Traditional order channels.
  • Financial impact.
  • Return on investment (ROI).
HOW IS WEB TRAFFIC ANALYSIS USED?
Customer Web Site Financials
Web Traffic is a link between financial performance and customer behaviour.
Want this! Use this! Understand this!
STAGES OF CUSTOMER UNDERSTANDING
Machine. Basic stats.
Persistent User Identifier. Retention, frequency, recency.
Anonymous User Profile. One-to-few demographic.
Discrete User Identity. One-to-one targeting.
CONVERSION
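Store -> Catalog -> Add to Cart -> Check Out -> Receipt
Conversion at each step: 40%, 7%, 30%, 40%
Cumulative share of visitors reaching Add to Cart, Check Out, Receipt: 3%, 1%, 0.4%
What web traffic metrics would you use to improve this?
How might the user interface affect loss at each step?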
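The relationship between the step percentages and the small cumulative figures can be checked with a short Python sketch. It assumes the four larger percentages are the conversion rates between consecutive stages (Store to Catalog, Catalog to Add to Cart, and so on); the function name is made up for this example.

```python
# Assumed reading of the slide: rate of moving from the previous stage to this one.
STEP_RATES = [
    ("Catalog", 0.40),
    ("Add to Cart", 0.07),
    ("Check Out", 0.30),
    ("Receipt", 0.40),
]

def cumulative_conversion(step_rates):
    """Multiply step rates to get the share of all store visitors reaching each stage."""
    share = 1.0
    result = []
    for stage, rate in step_rates:
        share *= rate
        result.append((stage, share))
    return result

funnel = cumulative_conversion(STEP_RATES)
for stage, share in funnel:
    print(f"{stage:12} {100 * share:.2f}% of store visitors")
```

Without intermediate rounding the shares are about 2.8%, 0.84%, and 0.34%. The slide's 3%, 1%, and 0.4% are what you get if each figure is rounded before the next step rate is applied. Either way, the biggest single loss is at the Add to Cart step.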
SHIPPING COMPANY EXAMPLE. SIX TYPES OF USERS
Segment 1: Trackers - 37% Tracking past shipments. Characterized by low duration.
Segment 2: Reservers - 3% Complete online reservations. Low duration per page view.
Segment 3: Uncommitted - 10% Characterized by long duration. Fail to complete transaction.
Segment 4: Info Gatherers - 4% Concentrated in information areas. Rarely reach transaction areas.
Segment 5: Single-clickers - 32% Visit homepage only. Not qualified customers or prospects.
Segment 6: Wanderers - 15% Very few, very random pages. Few hits, but long duration per page view.
Courtesy netGenesis
What strategy would you use to help each segment? Would you change the user interface per segment?
SUMMARY
Server log files can be used to record web traffic.
Page views, visits, users (at various levels of uniqueness), top pages, referrers, and clickstreams are used to describe web traffic.
Pitfalls for accurate data include caching, proxy servers, robots, and complicated architecture.
Web traffic is just part of the picture.
Traffic data needs to be interpreted in a broader context to better serve the customer, to steer user interface decisions, and ultimately to help the company's bottom line.