Technical Developments Related to Quality Issues
Brian Kelly
UK Web Focus
UKOLN, University of Bath
Bath, BA2 [email protected]
http://www.ukoln.ac.uk/
UKOLN is funded by the British Library Research and Innovation Centre, the Joint Information Systems Committee of the Higher Education Funding Councils, as well as by project funding from the JISC’s Electronic Libraries Programme and the European Union. UKOLN also receives support from the University of Bath where it is based.
Contents
• Application-based Developments
• Protocol Developments
• Conclusions
2
Application-Based Solutions
Sophisticated search engines are being developed:
Google
• Large-scale search engine for the research community (now commercial)
Clever
• IBM research project
Direct Hit!
• Records how users make use of search engines
Alexa
• Allows end users to vote on resources
3
Google
Google uses a "PageRank" technique: important resources are pointed to from many sites and from important sites (e.g. Yahoo).
See <URL: http://www.google.com/>
[Screenshots: search for "Digital Libraries"; following the link to the first hit]
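The link-based idea can be illustrated with a minimal sketch. It assumes a simplified PageRank-style iteration with a damping factor of 0.85; the page names and the link graph are invented for illustration, and this is not a description of Google's actual implementation.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank. `links` maps each page to the pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, split evenly over its links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: "library" is linked to from every other page.
web = {
    "yahoo": ["library", "news"],
    "news": ["library"],
    "blog": ["library", "yahoo"],
    "library": ["yahoo"],
}
ranks = pagerank(web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(f"{page}: {ranks[page]:.3f}")
```

In this toy graph "library" and "yahoo" finish well ahead of "blog", which nobody links to: a page scores highly when many pages, and highly ranked pages, point to it.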
4
Clever
Aims to find a small set of documents containing the most authoritative information on the requested subject.
• Uses a standard search engine to gather a "root set" of pages matching the query. Next, adds all pages pointing to or pointed to by the root set. Thereafter, it uses only the links between these pages to distill the best authorities and hubs.
See <URL: http://www.almaden.ibm.com/cs/k53/clever.html>
AltaVista results include sites selling medical services.
Distinct pages found using Clever
Clever finds the key Baseball sites.
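The root-set expansion and the hub/authority distillation can be sketched as below. This is a minimal illustration in the spirit of the hubs-and-authorities (HITS) iteration, not IBM's code; the function names, page names and link graph are assumptions made up for the example.

```python
import math


def expand_root_set(root, links):
    """Add every page pointing to, or pointed to by, the root set."""
    base = set(root)
    for src, targets in links.items():
        if src in root:
            base.update(targets)      # pages the root set points to
        if any(t in root for t in targets):
            base.add(src)             # pages pointing into the root set
    return base


def hubs_and_authorities(links, iterations=30):
    """Iteratively score pages as hubs and as authorities using links only."""
    pages = set(links) | {t for ts in links.values() for t in ts}
    hub = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # A good authority is pointed to by good hubs.
        auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for t in targets:
                auth[t] += hub[src]
        # A good hub points to good authorities.
        hub = {p: sum(auth[t] for t in links.get(p, [])) for p in pages}
        # Normalise both score vectors so the values stay bounded.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return hub, auth


# Hypothetical link graph; "mlb" and "shop" stand in for the search hits.
links = {
    "fan_list": ["mlb", "stats"],
    "portal": ["mlb", "stats", "shop"],
    "mlb": ["stats"],
    "stats": [],
    "shop": [],
    "unrelated": ["elsewhere"],
}
root = {"mlb", "shop"}
base = expand_root_set(root, links)
sub = {p: [t for t in links.get(p, []) if t in base] for p in base}
hub, auth = hubs_and_authorities(sub)
print("best authority:", max(auth, key=auth.get))
print("best hub:", max(hub, key=hub.get))
```

Note that "stats" never matched the query itself; it enters through the expansion step and then wins on links alone, which is how this approach can surface key sites that a text match would miss.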
5
Direct Hit
Direct Hit:
• Integrated with search engines such as Yahoo
• Ranks results based on clicking profile from other users of the search service
http://www.directhit.com/
Users searching for Dublin Core typically click on links related to metadata. Therefore put these at the top of the search results.
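A rough sketch of click-based re-ranking follows. The class, its methods and the example URLs are hypothetical and only illustrate the idea of promoting results that earlier users clicked on; nothing here reflects Direct Hit's real interface.

```python
from collections import Counter, defaultdict


class ClickRanker:
    """Re-orders search results using clicks recorded for the same query."""

    def __init__(self):
        # query -> Counter mapping each URL to its number of clicks
        self.clicks = defaultdict(Counter)

    def record_click(self, query, url):
        self.clicks[query.lower()][url] += 1

    def rerank(self, query, results):
        counts = self.clicks[query.lower()]
        # sorted() is stable, so unclicked results keep the engine's order.
        return sorted(results, key=lambda url: -counts[url])


ranker = ClickRanker()
for _ in range(3):
    ranker.record_click("Dublin Core", "http://metadata.example/dublin-core")
ranker.record_click("Dublin Core", "http://tourism.example/dublin")

engine_results = [
    "http://tourism.example/dublin",
    "http://metadata.example/dublin-core",
    "http://pubs.example/dublin",
]
reranked = ranker.rerank("dublin core", engine_results)
print(reranked)
```

The metadata page moves to the top because most earlier searchers for "Dublin Core" chose it.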
6
Alexa
Alexa:
• Enables end users to "rate" sites when surfing
• Includes access to related links
• Based on central archive of the web (see <URL: http://www.archive.org/>)
See also Netscape's What's Related facility
http://www.alexa.com/
Possibilities:
• Signed votes
• Use Alexa model with UK database of resources
7
Summary
Good News
• A new generation of experimental search engines is being developed
• Algorithms include:
– Making use of link information
– Making use of end-user input
– Collaborative bookmarks (cf FireFly - You like "Sex" and "Drugs". So does he, and he also likes "Rock'n'Roll")
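The collaborative idea in the last bullet can be sketched as a tiny recommender. This assumes a plain overlap count as the similarity measure; the function name and the user names are invented, and FireFly's actual algorithm is not described in the source.

```python
def recommend(likes, user):
    """Suggest items liked by users whose tastes overlap with `user`."""
    mine = likes[user]
    scores = {}
    for other, theirs in likes.items():
        if other == user:
            continue
        overlap = len(mine & theirs)
        if overlap == 0:
            continue  # no shared taste, so ignore this user's preferences
        for item in theirs - mine:
            # Weight each suggestion by how much the two users have in common.
            scores[item] = scores.get(item, 0) + overlap
    return sorted(scores, key=lambda item: (-scores[item], item))


likes = {
    "you": {"Sex", "Drugs"},
    "he": {"Sex", "Drugs", "Rock'n'Roll"},
    "she": {"Opera"},
}
suggestions = recommend(likes, "you")
print(suggestions)
```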
But such techniques make use of a "brute strength" approach
Is there a more elegant solution?
8
We Need Metadata!
Web originally based on 3 architectural components:
• Addressing: URL
• Data format: HTML
• Transport: HTTP
Metadata is the missing component:
• Metadata: RDF (PICS, IPR, MCF, DSig, DC,...)
The W3C is developing a machine-understandable metadata framework which can automate a variety of tasks (resource discovery, content filtering, etc.)
9
RDF
RDF (Resource Description Framework):
• Provides a metadata framework ("machine understandable metadata for the web")
• Based on ideas from content rating (PICS), resource discovery (Dublin Core), etc.
• Based on a formal data model (directed labelled graphs)
• Applications include:
– cataloging resources
– resource discovery
– intellectual property rights
– content rating
– digital signatures
– privacy
[Diagram: RDF Data Model, showing Resource, PropertyType, Value and Property]
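The data model can be pictured as a set of (resource, property type, value) statements. The sketch below is only an in-memory illustration, not RDF syntax or a real RDF library; the example.org URLs and the values are hypothetical, while dc:title and dc:creator are Dublin Core element names.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    """One arc of the graph: a resource has a property type with a value."""
    resource: str
    property_type: str
    value: str


statements = [
    Statement("http://example.org/thesis", "dc:title", "Web Quality Issues"),
    Statement("http://example.org/thesis", "dc:creator", "A. Student"),
    Statement("http://example.org/prospectus", "dc:creator", "Press Office"),
]


def describe(resource, statements):
    """Collect everything asserted about one resource."""
    return {s.property_type: s.value for s in statements if s.resource == resource}


def find(property_type, value, statements):
    """Resource discovery: which resources carry this property value?"""
    return [
        s.resource
        for s in statements
        if s.property_type == property_type and s.value == value
    ]


print(describe("http://example.org/thesis", statements))
print(find("dc:creator", "Press Office", statements))
```

Because every statement has the same three-part shape, software can query descriptions from different vocabularies (cataloguing, rating, rights) in a uniform way.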
10
Certificates
Certificates can be provided for:
• Services
• Users
• Code (Java, ActiveX)
Certificate Authorities (CAs) can distribute certificates:
• Global CAs (Verisign, Thawte)
• National CAs (Post Office, central University body, British Library, etc.)
Government legislation this session related to digital signatures
11
Certificates Within An Organisation
Digital signatures will enable publishers (e.g. Universities) to give an authoritative stamp to digital resources.
[Diagram labels: PhD Thesis, MSc, University, Research Office, Press Office, Prospectus, Admissions]
Within the University, the Research Office and PR Office can allocate legally-binding signatures to authorised publications.
Staff and students can be given a certificate which is used for authentication.
The CVCP could give certificates to Universities, who would then be authorised to distribute certificates within the university
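The delegation described above (CVCP to University to office to publication) can be modelled as a chain of issuers. The sketch below shows only the chain-of-trust bookkeeping and performs no cryptography; a real system would also verify each signature, for example with X.509 certificates. All class, function and subject names are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Certificate:
    subject: str
    issuer: Optional["Certificate"]  # None marks a self-asserted root


def chain(cert):
    """List the subjects from this certificate up to its root."""
    names = []
    while cert is not None:
        names.append(cert.subject)
        cert = cert.issuer
    return names


def is_trusted(cert, trusted_roots):
    """Trusted only if the chain ends at a root we already accept.

    NOTE: no signatures are checked here; this models delegation only.
    """
    return chain(cert)[-1] in trusted_roots


cvcp = Certificate("CVCP", None)
university = Certificate("University", cvcp)
research_office = Certificate("Research Office", university)
thesis = Certificate("PhD Thesis", research_office)
rogue = Certificate("Unknown Publisher", None)

print(" <- ".join(chain(thesis)))
print(is_trusted(thesis, {"CVCP"}), is_trusted(rogue, {"CVCP"}))
```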
12
Developments for Gateways
Quality information gateways:
• Can make use of signed resources to help cataloguing
• Can provide input to sophisticated search engines (similar to Google)
[Diagram labels: Information Gateway, Signed PhD Thesis, Quality Resources, Advanced search engine, Signed Gateway, Unsigned Gateway]
A central organisation could give certificates to approved information gateways
Signed gateway: this gateway follows xx quality conventions
13
Conclusions
Automated Indexing
• AltaVista approach
– Comprehensive
– Junk indexed
– Too many hits
Manual Indexing
• Subject Gateway approach
– Quality
– Value-added services
– Incomplete
– Expensive
A Third Way
• Combination of automated and manual approaches
• Involvement from SBIG, author and end user
– Exciting possibilities
– Uncertainty of timescales and success
– Coordination required - political issues (ownership of metadata, selling ads, etc.)