Technical Developments Related to Quality Issues



Page 1: Technical Developments Related to Quality Issues

Technical Developments Related to Quality Issues

Brian Kelly
UK Web Focus
UKOLN
University of Bath
Bath, BA2 7AY
[email protected]
http://www.ukoln.ac.uk/

UKOLN is funded by the British Library Research and Innovation Centre, the Joint Information Systems Committee of the Higher Education Funding Councils, as well as by project funding from the JISC’s Electronic Libraries Programme and the European Union. UKOLN also receives support from the University of Bath where it is based.

Contents
• Application-based Developments
• Protocol Developments
• Conclusions

Page 2: Technical Developments Related to Quality Issues

Application-Based Solutions

Sophisticated search engines are being developed:

Google
• Large-scale search engine for the research community (now commercial)

Clever
• IBM research project

Direct Hit!
• Records how users make use of search engines

Alexa
• Allows end users to vote on resources

Page 3: Technical Developments Related to Quality Issues

Google

Google uses a "PageRank" technique: important resources are pointed to from many sites and from important sites (e.g. Yahoo).

See <URL: http://www.google.com/>

[Screenshots]
• Search for "Digital Libraries"
• Following the link to the first hit
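
The PageRank idea on this slide can be illustrated with a short sketch. This is not Google's implementation: it is the textbook power-iteration form of the algorithm, and the function name, the damping factor of 0.85, the iteration count, and the toy link graph are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank scores by power iteration.

    links maps each page to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives a small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page shares its score equally among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical link graph: three sites all point at "digital-library".
toy_links = {
    "yahoo": ["digital-library", "site-a"],
    "site-a": ["digital-library"],
    "site-b": ["digital-library"],
    "digital-library": ["yahoo"],
}
scores = pagerank(toy_links)
best = max(scores, key=scores.get)
print(best, round(scores[best], 3))
```

In this toy graph the page with the most incoming links, one of them from a well-linked site, ends up with the highest score, which is the behaviour the slide describes.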

Page 4: Technical Developments Related to Quality Issues

Clever

Aims to find a small set of documents containing the most authoritative information on the requested subject.
• Uses a standard search engine to gather a "root set" of pages matching the query. Next, adds all pages pointing to or pointed to by the root set. Thereafter, it uses only the links between these pages to distill the best authorities and hubs.

See <URL: http://www.almaden.ibm.com/cs/k53/clever.html>

[Screenshots]
• AltaVista results include sites selling medical services.
• Distinct pages found using Clever
• Clever finds the key Baseball sites.
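
The last step described above, distilling authorities and hubs from the links alone, can be sketched as follows. This is a minimal version of the hub/authority iteration commonly known as HITS, not IBM's code; it assumes the root set has already been gathered and expanded, and the page names in the toy graph are hypothetical.

```python
import math


def hits(links, iterations=50):
    """Compute hub and authority scores for a small link graph.

    links maps each page to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    hub = {page: 1.0 for page in pages}
    auth = {page: 1.0 for page in pages}

    for _ in range(iterations):
        # A good authority is pointed to by good hubs.
        auth = {page: 0.0 for page in pages}
        for source, targets in links.items():
            for target in targets:
                auth[target] += hub[source]
        # A good hub points to good authorities.
        hub = {page: sum(auth[t] for t in links.get(page, []))
               for page in pages}
        # Normalise so the scores stay bounded.
        auth_norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        hub_norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        auth = {p: v / auth_norm for p, v in auth.items()}
        hub = {p: v / hub_norm for p, v in hub.items()}
    return hub, auth


# Hypothetical expanded root set for a baseball query.
toy_links = {
    "list-1": ["league", "news"],
    "list-2": ["league", "news", "fan-page"],
    "fan-page": ["league"],
}
hub_scores, auth_scores = hits(toy_links)
top_authority = max(auth_scores, key=auth_scores.get)
top_hub = max(hub_scores, key=hub_scores.get)
print(top_authority, top_hub)
```

Here the page that every other page links to becomes the top authority, and the link list that points to the most authorities becomes the top hub.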

Page 5: Technical Developments Related to Quality Issues

Direct Hit

Direct Hit:
• Integrated with search engines such as Yahoo
• Ranks results based on clicking profile from other users of the search service

http://www.directhit.com/

Users searching for Dublin Core typically click on links related to metadata. Therefore put these at the top of the search results.
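
The Dublin Core example can be sketched as a simple re-ranking step. How Direct Hit actually weighted clicks is not described on the slide, so this sketch assumes the simplest possible rule (count earlier clicks per query); the function name, the click log format, and the example.org URLs are hypothetical.

```python
from collections import Counter


def rerank_by_clicks(results, click_log, query):
    """Order results for a query by how often earlier users clicked them.

    results   -- URLs in the order the underlying search engine returned them
    click_log -- iterable of (query, clicked_url) pairs from earlier sessions
    """
    clicks = Counter(url for q, url in click_log if q == query)
    # sorted() is stable, so results with equal click counts keep the
    # engine's original order.
    return sorted(results, key=lambda url: -clicks[url])


# Hypothetical search results and click history.
results = [
    "http://example.org/dublin-tourism",
    "http://example.org/dc-metadata",
    "http://example.org/dublin-core-elements",
]
click_log = [
    ("dublin core", "http://example.org/dc-metadata"),
    ("dublin core", "http://example.org/dc-metadata"),
    ("dublin core", "http://example.org/dublin-core-elements"),
    ("dublin", "http://example.org/dublin-tourism"),
]
ranked = rerank_by_clicks(results, click_log, "dublin core")
print(ranked)
```

Clicks recorded for other queries are ignored, so the tourism page is not promoted for the "dublin core" query even though it was clicked for "dublin".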

Page 6: Technical Developments Related to Quality Issues

Alexa

Alexa:
• Enables end users to "rate" sites when surfing
• Includes access to related links
• Based on central archive of the web (see <URL: http://www.archive.org/>)

See also Netscape's What's Related facility

http://www.alexa.com/

Possibilities:
• Signed votes
• Use Alexa model with UK database of resources

Page 7: Technical Developments Related to Quality Issues

Summary

Good News
• A new generation of experimental search engines is being developed
• Algorithms include:
  – Making use of link information
  – Making use of end users' input
  – Collaborative bookmarks (cf. FireFly: You like "Sex" and "Drugs". So does he, and he also likes "Rock'n'Roll")

But such techniques make use of a "brute strength" approach

Is there a more elegant solution?

Page 8: Technical Developments Related to Quality Issues

We Need Metadata!

The Web was originally based on 3 architectural components. Metadata is the missing component.

• Addressing: URL
• Data format: HTML
• Transport: HTTP
• Metadata / RDF: PICS, IPR, MCF, DSig, DC, ...

The W3C is developing a machine-understandable metadata framework which can automate a variety of tasks (resource discovery, content filtering, etc.)

Page 9: Technical Developments Related to Quality Issues

RDF

RDF (Resource Description Framework):
• Provides a metadata framework ("machine understandable metadata for the web")
• Based on ideas from content rating (PICS), resource discovery (Dublin Core), etc.
• Based on a formal data model (directed labelled graphs)
• Applications include:
  – cataloging resources
  – resource discovery
  – intellectual property rights
  – content rating
  – digital signatures
  – privacy

[Diagram: RDF Data Model, with the labels Resource, PropertyType, Value and Property]
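
The data model in the diagram can be illustrated with a small sketch that stores each statement as a (resource, property type, value) triple. This is not RDF syntax and not a W3C API: the `Triple` type, the helper functions, the example.org URL, and the "DC.*" and "Rating.*" property names are all assumptions made for illustration.

```python
from collections import namedtuple

# One statement in the graph: a resource, a property type, and a value.
Triple = namedtuple("Triple", ["resource", "property_type", "value"])

# Hypothetical descriptions; the URL and property names are illustrative only.
graph = [
    Triple("http://example.org/thesis.html", "DC.Creator", "A. Student"),
    Triple("http://example.org/thesis.html", "DC.Title", "An Example Thesis"),
    Triple("http://example.org/thesis.html", "Rating.Quality", "peer-reviewed"),
    Triple("http://example.org/advert.html", "DC.Title", "Buy Now"),
]


def values_of(graph, resource, property_type):
    """Return every value recorded for one property type of one resource."""
    return [t.value for t in graph
            if t.resource == resource and t.property_type == property_type]


def resources_with(graph, property_type, value):
    """Return the resources that carry a given property type and value."""
    return [t.resource for t in graph
            if t.property_type == property_type and t.value == value]


print(values_of(graph, "http://example.org/thesis.html", "DC.Creator"))
print(resources_with(graph, "Rating.Quality", "peer-reviewed"))
```

Because every statement has the same three-part shape, the same store can hold cataloguing, rating, and rights information side by side, and a single query function can serve resource discovery and content filtering alike.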

Page 10: Technical Developments Related to Quality Issues

Certificates

Certificates can be provided for:
• Services
• Users
• Code (Java, ActiveX)

Certificate Authorities (CAs) can distribute certificates:
• Global CAs (Verisign, Thawte)
• National CAs (Post Office, central University body, British Library, etc.)

Government legislation this session related to digital signatures

Page 11: Technical Developments Related to Quality Issues

Certificates Within An Organisation

Digital signatures will enable publishers (e.g. Universities) to give an authoritative stamp to digital resources

[Diagram labels: University; Research Office; Press Office; Admissions; PhD Thesis; MSc; Prospectus]

Within the University, the Research Office and PR Office can allocate legally-binding signatures to authorised publications

Staff and students can be given a certificate which is used for authentication

The CVCP could give certificates to Universities, who would then be authorised to distribute certificates within the university
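
The delegation described on this slide (CVCP to University, University to its offices, offices to publications) can be sketched as a chain-of-issuers check. This sketch models only who issued what; it performs no cryptography at all, whereas a real system would verify a digital signature at every step. The `Certificate` type, the function name, and the example entries are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Certificate:
    subject: str  # who or what the certificate was issued to
    issuer: str   # who issued it


def is_trusted(subject, certificates, trusted_roots):
    """Return True if a chain of issuers leads from subject to a trusted root.

    Assumption: issuer names are taken at face value. No signatures are
    checked, so this shows the delegation structure only.
    """
    issued_by = {c.subject: c.issuer for c in certificates}
    seen = set()
    current = subject
    while current not in trusted_roots:
        # Stop on a loop or when nobody vouches for the current name.
        if current in seen or current not in issued_by:
            return False
        seen.add(current)
        current = issued_by[current]
    return True


# Hypothetical certificates following the slide's example.
certificates = [
    Certificate("University of Bath", "CVCP"),
    Certificate("Research Office", "University of Bath"),
    Certificate("PhD Thesis", "Research Office"),
    Certificate("Unofficial Page", "Unknown Issuer"),
]
print(is_trusted("PhD Thesis", certificates, {"CVCP"}))
print(is_trusted("Unofficial Page", certificates, {"CVCP"}))
```

The thesis is accepted because its chain of issuers ends at the trusted root, while the unofficial page is rejected because its issuer cannot be traced back to one.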

Page 12: Technical Developments Related to Quality Issues

Developments for Gateways

Quality information gateways:
• Can make use of signed resources to help cataloguing
• Can provide input to sophisticated search engines (similar to Google)

[Diagram labels: Signed PhD Thesis; Information Gateway; Quality Resources; Advanced search engine; Signed Gateway; Unsigned Gateway]

A central organisation could give certificates to approved information gateways

Signed gateway: this gateway follows xx quality conventions

Page 13: Technical Developments Related to Quality Issues

Conclusions

Automated Indexing
• AltaVista approach
  – Comprehensive
  – Junk indexed
  – Too many hits

Manual Indexing
• Subject Gateway approach
  – Quality
  – Value-added services
  – Incomplete
  – Expensive

A Third Way
• Combination of automated and manual approaches
• Involvement from SBIG, author and end user
  – Exciting possibilities
  – Uncertainty of timescales and success
  – Coordination required - political issues (ownership of metadata, selling ads, etc.)