Automating Assessment of Web Site Usability
Marti Hearst
Melody Ivory
Rashmi Sinha
University of California, Berkeley
ASIS IA Summit, Feb 2001
The Usability Gap
Most sites have inadequate usability [Forrester, Spool, Hurst]
  (users can’t find what they want 39-66% of the time)
196M new Web sites in the next 5 years [Nielsen99]
A shortage of user interface professionals (~20,000) [Nielsen99]
The Problem
NON-professionals need to create websites
Guidelines are helpful, but
  Sometimes imprecise
  Sometimes conflict
  Usually not empirically founded
Ultimate Goal: Tools to Help Non-Professional Designers
Examples:
  A “grammar checker” to assess guideline conformance
    Imperfect
    Only suggestions, not dogma
  Automatic comparison to highly usable pages/sites
  Automatic template suggestions
A View of Web Site Structure (Newman et al. 00)
Information design: structure, categories of information
Navigation design: interaction with information structure
Graphic design: visual presentation of information and navigation (color, typography, etc.)
Courtesy of Mark Newman
A View of Web Site Design (Newman et al. 00)
Information Architecture includes management and more responsibility for content
User Interface Design includes testing and evaluation
Courtesy of Mark Newman
The Goal
Eventually want to assess navigation structure and graphic design at the page and site level.
Farther down the line: information design and scent
Note: we are NOT suggesting we can characterize:
  Aesthetics
  Subjective preferences
The Investigation
Can we place web design guidelines onto an empirical foundation?
Can we build models of good design by looking at existing designs?
Example Empirical Investigation
Is it all about the content?
Webby Awards 2000
6 criteria
27 categories
  We used finance, education, community, living, health, services
100 judges
  International Academy of Digital Arts & Sciences
  3 rounds of judging
2000 sites initially
Webby Awards 2000
6 criteria
  1. Content
  2. Structure & navigation
  3. Visual design
  4. Functionality
  5. Interactivity
  6. Overall experience
Scale: 1-10 (highest)
  Nearly normally distributed across judged sites
What are Webby judgements about?
Webby Awards 2000
The best predictor of the overall score is the score for content
The worst predictor is visual design
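One way to check which criterion best predicts the overall score is to correlate each criterion's scores with the overall score and rank by strength. The sketch below does this with a plain Pearson correlation. The slides do not say which statistic was used, so the choice of Pearson, the function names, and the five-site score table are all assumptions made for illustration; the numbers are invented, not Webby data.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def rank_predictors(scores, target="overall"):
    """Rank every criterion by |correlation| with the target, strongest first."""
    r = {name: pearson(vals, scores[target])
         for name, vals in scores.items() if name != target}
    return sorted(r.items(), key=lambda kv: abs(kv[1]), reverse=True)

# HYPOTHETICAL judge scores (1-10) for five sites; not real Webby data.
scores = {
    "content":       [8, 6, 9, 4, 7],
    "visual_design": [7, 7, 6, 6, 8],
    "overall":       [8, 6, 9, 5, 7],
}

ranking = rank_predictors(scores)
for name, r in ranking:
    print(f"{name}: r = {r:.2f}")
```

With real data, all six criteria would appear as keys, and the first and last entries of `ranking` would be the best and worst predictors.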
So … Webbys focus on content!
Comparing Two Categories
news
arts
Guidelines
There are MANY usability guidelines
  A survey of 21 sets of web guidelines found little overlap (Ratner et al. 96)
  Why?
Our hypothesis: not empirically validated
So … let’s figure out what works!
Web Page Metrics
Web metric analysis tools report on what is easy to measure
  Predicted download time
  Depth/breadth of site
We want to worry about
  Content
  User goals/tasks
We also want to compare alternative designs.
Another Empirical Study:
Which features distinguish well-designed web pages?
Quantitative Metrics
Identified 42 attributes from the literature
Roughly characterized:
  Page Composition (e.g., words, links, images)
  Page Formatting (e.g., fonts, lists, colors)
  Overall Page Characteristics (e.g., information & layout quality, download speed)
Metrics Used in Study
Word Count
Body Text Percentage
Emphasized Body Text Percentage
Text Positioning Count
Text Cluster Count
Link Count
Page Size
Graphic Percentage
Graphics Count
Color Count
Font Count
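To make a few of these metrics concrete, here is a minimal sketch that computes rough versions of Word Count, Link Count, Graphics Count, and Font Count from an HTML string using Python's standard `html.parser`. The slides do not define how each metric was computed, so every definition here is an assumption (for example, counting distinct `face` values on `<font>` tags as the font count), and the class and function names are hypothetical. This is not the authors' tool.

```python
from html.parser import HTMLParser

class PageMetrics(HTMLParser):
    """Collects rough word, link, image, and font-face counts for one page."""

    def __init__(self):
        super().__init__()
        self.word_count = 0
        self.link_count = 0
        self.graphics_count = 0
        self.font_faces = set()
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a" and "href" in attrs:
            self.link_count += 1          # assumed: one link per <a href>
        elif tag == "img":
            self.graphics_count += 1      # assumed: one graphic per <img>
        elif tag == "font" and attrs.get("face"):
            self.font_faces.add(attrs["face"].lower())

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Count whitespace-separated tokens in visible text only.
        if not self._skip_depth:
            self.word_count += len(data.split())

def measure(html):
    """Return a dict of the assumed metric definitions for one HTML string."""
    parser = PageMetrics()
    parser.feed(html)
    parser.close()
    return {
        "word_count": parser.word_count,
        "link_count": parser.link_count,
        "graphics_count": parser.graphics_count,
        "font_count": len(parser.font_faces),
    }

sample = (
    "<html><body><h1>Health Tips</h1>"
    '<p>Eat <a href="/veg">more vegetables</a> daily.</p>'
    '<img src="a.gif"><font face="Arial">Stay active</font>'
    "</body></html>"
)
metrics = measure(sample)
print(metrics)
```

Percentage metrics such as Body Text Percentage would need a further assumption about what counts as body text versus header or link text, so they are left out of this sketch.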
Data Collection
Collected data for 1898 pages from 163 sites
  Attempted to collect from 3 levels within each site
Six Webby categories
  Health, Living, Community, Education, Finance, Services
Data constraints
  At least 30 words
  No pages with forms
  Exhibit high self-containment (i.e., no style sheets, scripts, applets, etc.)
Method
Collect metrics from sites evaluated for Webby Awards 2000
Two comparisons
  Top 33% of sites vs. the rest (using the overall Webby score)
  Top 33% of sites vs. bottom 33% (using the Webby factor)
Goal: see if we can use the metrics to predict membership in top vs. other groups.
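The two comparisons amount to labelling sites by rank. The sketch below shows one way to form the groups from a score table. The function name, the rounding rule for the group size, and the six site scores are assumptions for illustration. It also uses a single score for both comparisons, whereas the slides use the overall Webby score for one and the Webby factor for the other.

```python
def split_by_score(scores, fraction=1 / 3):
    """Split site names into (top, rest, bottom) groups by descending score.

    `top` and `bottom` each hold roughly `fraction` of the sites;
    `rest` is everything that is not in `top`.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    k = max(1, round(len(ranked) * fraction))
    top = set(ranked[:k])
    rest = set(ranked[k:])
    bottom = set(ranked[-k:])
    return top, rest, bottom

# HYPOTHETICAL overall scores for six sites; not real Webby data.
site_scores = {"a": 9.1, "b": 7.4, "c": 6.8, "d": 5.5, "e": 4.9, "f": 3.2}

top, rest, bottom = split_by_score(site_scores)
print("top vs. rest:  ", sorted(top), "vs.", sorted(rest))
print("top vs. bottom:", sorted(top), "vs.", sorted(bottom))
```

Every page would then inherit the group label of its site before any classifier is trained on the page metrics.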
Questions:
Can we use the metrics to predict membership in top vs. other groups?
Do we see a difference in how the metrics behave in different content categories?
Findings
We can accurately classify web pages
  Linear discriminant analysis
For top vs. rest
  67% correct for overall
  73% correct when taking categories into account
For top vs. bottom
  65% correct for overall
  80% correct using categories
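For readers unfamiliar with linear discriminant analysis, the following is a minimal two-class, two-feature sketch in pure Python: it pools the within-class scatter, projects onto the direction that best separates the class means, and thresholds at the midpoint (equal priors assumed). It is not the study's analysis, which used more metrics and real pages. The feature pairs and all function names are hypothetical, and the toy data is deliberately separable, so its accuracy says nothing about the percentages reported above.

```python
def mean(points):
    """Component-wise mean of a list of (x, y) pairs."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def scatter(points, m):
    """Within-class scatter matrix [[sxx, sxy], [sxy, syy]] as a 3-tuple."""
    sxx = sum((p[0] - m[0]) ** 2 for p in points)
    sxy = sum((p[0] - m[0]) * (p[1] - m[1]) for p in points)
    syy = sum((p[1] - m[1]) ** 2 for p in points)
    return sxx, sxy, syy

def fit_lda(top, other):
    """Return (w, threshold) for a two-class Fisher linear discriminant."""
    m1, m0 = mean(top), mean(other)
    a1, b1, c1 = scatter(top, m1)
    a0, b0, c0 = scatter(other, m0)
    a, b, c = a1 + a0, b1 + b0, c1 + c0      # pooled scatter [[a, b], [b, c]]
    det = a * c - b * b                       # must be non-zero (non-singular)
    dx, dy = m1[0] - m0[0], m1[1] - m0[1]
    # w = S^-1 (m1 - m0), using the closed-form inverse of a 2x2 matrix.
    w = ((c * dx - b * dy) / det, (-b * dx + a * dy) / det)
    mid = ((m1[0] + m0[0]) / 2, (m1[1] + m0[1]) / 2)
    threshold = w[0] * mid[0] + w[1] * mid[1]
    return w, threshold

def is_top(w, threshold, page):
    """Classify a page as 'top' if its projection exceeds the threshold."""
    return w[0] * page[0] + w[1] * page[1] > threshold

# HYPOTHETICAL (metric_1, metric_2) pairs per page; invented toy values.
top_pages = [(2, 3), (3, 3), (3, 4), (4, 4)]
other_pages = [(6, 1), (7, 2), (7, 1), (8, 2)]

w, threshold = fit_lda(top_pages, other_pages)
correct = sum(is_top(w, threshold, p) for p in top_pages)
correct += sum(not is_top(w, threshold, p) for p in other_pages)
accuracy = correct / (len(top_pages) + len(other_pages))
print(f"w = {w}, threshold = {threshold}, training accuracy = {accuracy:.0%}")
```

A per-category analysis like the one on the slide would fit one such discriminant for each Webby category rather than a single one over all pages.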
Why does this work?
Content is the most important predictor of overall score
BUT there is some predictive power in the visual design / navigation criteria
Also, it may just be that good design is good design all over
  Film making analogy
  This happens in other domains (automatic essay grading, for one)
Deeper Analysis
Which metrics matter?
  All played a role
To get more insight:
  We noticed that small, medium, and large pages behave differently
  We subdivided pages according to size and category to find out which metrics matter and if they should have high or low values
Small pages (66 words on average)
Good pages have slightly more content, smaller page sizes, and fewer graphics, and employ more font variations.
The smaller page sizes and graphics counts suggest faster download times for these pages (corroborated by a download time metric, not discussed in detail here).
Correlations between font count and body text suggest that good pages vary the fonts used between header and body text.
Medium pages (230 words on average)
Good pages emphasize less of the body text
Text positioning and text cluster count indicate that good medium-sized pages organize text into clusters (e.g., lists and shaded table areas).
Negative correlations between body text and color count suggest that good medium-sized pages use colors to distinguish headers.
Large pages (827 words on average)
Good pages have less body text and more colors (suggesting pages have more headers and text links)
Good pages are larger but have fewer graphics
Future work
Distinguish according to page role
  Home page vs. content vs. index …
Better metrics
  Separate info design, nav design, graphic design
Site level as well as page level
Compare against results of live user studies
Future work
Category-based profiles
  Can use clustering to create profiles of good and poor sites for each category
  These can be used to suggest alternative designs
More information: CHI 2001 paper
Ramifications
It is remarkable that such simple metrics predict so well
  Perhaps good design is good overall
  There may be other factors
A foundation for a new methodology
  Empirical, bottom up
But, there is no one path to good design!
In Summary
Automated Usability Assessment should help close the Web Usability Gap
We can empirically distinguish between highly rated web pages and other pages
  Empirical validation of design guidelines
  Can build profiles of good vs. poor sites
Are validating expert judgements with usability assessments via a user study
Eventually want to build tools to help end-users assess their designs
More information:
  http://webtango.berkeley.edu
  http://www.sims.berkeley.edu/~hearst