Edward A. Fox Virginia Tech, Blacksburg, VA 24061 USA fox@vt fox.cs.vt/talks


Page 1

Digital Libraries of the Future: Integration through the 5S Framework

Sixth National Russian Research Conference, Sep. 28 - Oct. 1, 2004

Edward A. Fox, Virginia Tech, Blacksburg, VA 24061 USA, http://fox.cs.vt.edu/talks

Page 2

Acknowledgements: Sponsors
• Conference Organizers and Staff
• NSF Grants: CITIDEL: DUE-0121679; DL-in-a-box: DUE-0136690; ETANA: ITR-0325579; GetSmart: DUE-0121741; OAD: IIS-0086227
• Others: AOL, Capes (Brazilian funding agency), ASOR, CWRU, ETANA, Vanderbilt U., ACM, Adobe, CONACyT, DFG, IBM, Microsoft, NASA, NDLTD, NLM, OCLC, SUN, US Dept. of Ed. (FIPSE)

Page 3

Acknowledgements: Faculty, Staff
Lillian Cassel, Debra Dudley, Roger Ehrich, Joanne Eustis, Weiguo Fan, James Flanagan, C. Lee Giles, Eberhard Hilf, Douglas Knight, Deborah Knox, John Impagliazzo, Gail McMillan, Manuel Perez, Naren Ramakrishnan, Layne Watson, …

Page 4

Acknowledgements: Students
Yuxin Chen, Fernando Das Neves, Shahrooz Feizabadi, Marcos Goncalves, Nithiwat Kampanya, S.H. Kim, Aaron Krowne, Bing Liu, Ming Luo, Paul Mather, Unni. Ravindranathan, Ryan Richardson, Rao Shen, Ohm Sornil, Hussein Suleman, Ricardo Torres, Wensi Xi, Baoping Zhang, …

Page 5

Outline
• Vision of the Future: Chatham Report
• Integration
• 5S Framework
• DL Taxonomy
• Minimal DL
• DL Ontology
• Applications of Framework: Language (5SL), Design (5SGraph), Generation (5SGen), Logging
• Quality DLs

Page 11

As data, information, and knowledge play increasingly central roles … digital library research should focus on:

Increasing the scope and scale of information resources and services;

Employing context at the individual, community, and societal levels to improve performance;

Developing algorithms and strategies for transforming data into actionable information;

Demonstrating the integration of information spaces into everyday life; and

Improving availability, accessibility, and, thereby, productivity.

Page 12

An appropriate infrastructure program will provide sustainability of digital knowledge resources among five dimensions:

Acquisition of new information resources;

Effective access mechanisms that span media type, mode, and language;

Facilities to leverage the utilization of humankind’s knowledge resources;

Assured stewardship over humanity’s scholarly and cultural legacy; and

Efficient and accountable management of systems, services, and resources.

Page 14

Integration: Rationale
• We can read any paper book (ignoring limitations of language, vision, …).
• Scholarship requires access, analysis, and synthesis spanning disciplines and sources.
• New theories, systems, and services build upon our past accomplishments.
• Our “Small World” and the “Internet Age” demand that we, and our computers, work together and interoperate.

Page 15

Integration: Urgency, Longevity
• If we collect, capture, acquire, or produce information, will it be usable in 100 years?
• NSF Digital Archiving Program
• Library of Congress National Digital Information Infrastructure and Preservation Program

Page 16

Integration: Standards
• Standards don’t exist in many areas.
• Standards that do exist create a jumble:
  • Conversion between standards (without loss?)
  • Bridging gaps (Z39.50 -> OAI)
  • Managing legacy content and systems
• Standards in DLs have focused on:
  • Metadata (e.g., Dublin Core)
  • Architecture (e.g., handles, repositories)

Page 17

Integration: Challenges
• The “Semantic Web” is a vision, not a reality.
• How can we integrate without a theory?
• How can we interoperate without a common framework?
• How can we have a science of DLs if we lack agreement on definitions (so we can reason and discuss) and measures of quality (so we can compare and improve)?

Page 19

Motivation
• DLs are not benefiting from formal theories as have other CS fields (DB, IR, PL, etc.).
• DL construction is difficult and ad hoc, and lacks support for tailoring/customization.
• Conceptual modeling, requirements analysis, and methodological approaches are rarely supported in DL development.
• Specific DL models, formalisms, and languages are lacking.

Page 21

DL Services/Activities Taxonomy
• Information Satisfaction Services: Browsing, Collaborating, Customizing, Filtering, Providing access, Recommending, Requesting, Searching, Visualizing
• Infrastructure Services
  • Repository-Building
    • Creational: Acquiring, Cataloging, Crawling (focused), Describing, Digitizing, Federating, Harvesting, Purchasing, Submitting
    • Preservational: Conserving, Converting, Copying/Replicating, Emulating, Renewing, Translating (format)
  • Add Value: Annotating, Classifying, Clustering, Evaluating, Extracting, Indexing, Measuring, Publicizing, Rating, Reviewing (peer), Surveying, Translating (language)

Page 23

5S Model – Informally

Digital libraries are complex information systems that:
• help satisfy info needs of users (societies)
• provide info services (scenarios)
• organize info in usable ways (structures)
• present info in usable ways (spaces)
• communicate info with users (streams)

Page 24

5S in Archaeology - Structures

Figure: the 5S models (Streams, Structures, Spaces, Scenarios, Societies) applied to archaeology; Structures example: Regions (Madaba Plains).

Page 25

5Ss: Models, Examples, Objectives
• Streams. Examples: text, video, audio, image. Objective: describes properties of the DL content such as encoding and language for textual material or particular forms of multimedia data.
• Structures. Examples: collection, catalog, hypertext, document, metadata, organization tools. Objective: specifies organizational aspects of the DL content.
• Spaces. Examples: measure, measurable, topological, vector, probabilistic. Objective: defines logical and presentational views of several DL components.
• Scenarios. Examples: searching, browsing, recommending. Objective: details the behavior of DL services.
• Societies. Examples: service managers, learners, teachers, etc. Objective: defines managers, responsible for running DL services; actors, that use those services; and relationships among them.

Page 26

Background: The 5S Model

Figure: the five models (Streams, Structures, Spaces, Scenarios, Societies), arranged from Static/Passive to Dynamic/Active.

Page 27

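Background

Figure: the 5S models with their main concepts: Streams (text, audio, image, video), Structures, Spaces (Measurable, Measure, Pr, Metric, Top, Vec), Scenarios, and Societies, annotated with the minimal-DL symbols do, ms, mss, R, C, DMC, Ic, Se, Sc, e, op, SM, Ac.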

Page 28

Background: 5S and DL formal definitions and compositions (April 2004 TOIS)

Figure: dependency graph of the formal definitions, composing basic definitions into the 5S definitions and then into the minimal DL:
• Basic definitions: relation (d.1), function (d.2), sequence (d.3), tuple (d.4)*, language (d.5), graph (d.6), grammar (d.7)
• 5S definitions: streams (d.9), structures (d.10), spaces (d.18), scenarios (d.21), societies (d.24); measurable (d.12), measure (d.13), probability (d.14), vector (d.15), and topological (d.16) spaces; event (d.10), state (d.18), services (d.22), transmission (d.23)
• Minimal DL definitions: structural metadata specification (d.25), descriptive metadata specification (d.26), structured stream (d.29), digital object (d.30), collection (d.31), metadata catalog (d.32), repository (d.33), indexing service (d.34), searching service (d.35), hypertext (d.36), browsing service (d.37), digital library (minimal) (d.38)

Page 29

Glossary: Concepts in the Minimal DL and Representing Symbols
• Digital object: do
• Metadata specification: ms
• Set of metadata specifications: mss
• Collection: C
• Catalog: DMC
• Repository: R
• Event: e
• Scenario: Sc
• Services: Se
• Actor: Ac
• Service Manager: SM
• Operation: op
• Society: Soc

Page 30

The 5S Formal Model

A digital library is a 10-tuple (Streams, Structs, Sps, Scs, St2, Coll, Cat, Rep, Serv, Soc) in which:
• Streams is a set of streams, which are sequences of arbitrary types (e.g., bits, characters, pixels, frames);
• Structs is a set of structures, which are tuples $(G, \mu)$, where $G = (V, E)$ is a directed graph and $\mu: (V \cup E) \rightarrow L$ is a labeling function;
• Sps is a set of spaces, each of which can be a measurable, measure, probability, topological, metric, or vector space.
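A structure is just a directed graph plus a labeling function over its nodes and edges. The sketch below encodes (G, μ) in Python under our own assumptions: nodes are hashable identifiers, edges are (source, target) pairs, and μ is a plain dictionary. The class, method, and label names are hypothetical illustrations, not part of 5S or 5SL.

```python
from dataclasses import dataclass, field


@dataclass
class Structure:
    """A 5S structure (G, mu): directed graph G = (V, E) plus labeling mu: V ∪ E -> L."""
    nodes: set = field(default_factory=set)     # V
    edges: set = field(default_factory=set)     # E, as (source, target) pairs
    labels: dict = field(default_factory=dict)  # mu, keyed by node or by edge pair

    def add_node(self, v, label):
        self.nodes.add(v)
        self.labels[v] = label

    def add_edge(self, u, v, label):
        # E must be a subset of V x V, so both endpoints have to exist first
        if u not in self.nodes or v not in self.nodes:
            raise ValueError("both endpoints must already be nodes of G")
        self.edges.add((u, v))
        self.labels[(u, v)] = label

    def mu(self, element):
        return self.labels[element]

    def is_well_formed(self):
        # mu must be defined on every element of V ∪ E
        return all(x in self.labels for x in self.nodes | self.edges)

    def children(self, v):
        return sorted(t for (s, t) in self.edges if s == v)


# Usage: a thesis whose two chapters hang off the root node
etd = Structure()
etd.add_node("etd", "thesis")
etd.add_node("ch1", "chapter")
etd.add_node("ch2", "chapter")
etd.add_edge("etd", "ch1", "hasPart")
etd.add_edge("etd", "ch2", "hasPart")
print(etd.mu(("etd", "ch1")))  # hasPart
print(etd.children("etd"))     # ['ch1', 'ch2']
```

Keeping μ as a single dictionary over V ∪ E makes the "labeling function is total" requirement a one-line check.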

Page 31

The 5S Formal Model
• $Scs = \{sc_1, sc_2, \ldots, sc_d\}$ is a set of scenarios, where each $sc_k = \langle e_1^k(\{p_1^k\}), e_2^k(\{p_2^k\}), \ldots, e_{d_k}^k(\{p_{d_k}^k\}) \rangle$ is a sequence of events that also can have a number of parameters $\{p_i^k\}$. Events represent changes in computational states; parameters represent specific locations in a state and respective values.
• St2 is a set of functions $\phi: V \rightarrow Streams \times (\mathbb{N} \times \mathbb{N})$ that associate nodes of a structure with a pair of natural numbers (a, b) corresponding to a portion (span/segment) of a stream.
• $Coll = \{C_1, C_2, \ldots, C_f\}$ is a set of DL collections, where each DL collection $C_k = \{do_1^k, do_2^k, \ldots, do_{f_k}^k\}$ is a set of digital objects. Each digital object $do_k = (h_k, Stm1_k, Stt2_k, \phi_k)$ is a tuple where $Stm1_k \subseteq Streams$, $Stt2_k \subseteq Structs$, $\phi_k \subseteq St2$, and $h_k$ is a handle, which represents a unique identifier for the object.
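The digital object definition ties a handle to streams, structures, and St2 functions that map structure nodes onto stream spans. A minimal sketch follows. It assumes (a, b) are half-open start/end offsets into the stream (the definition only says "a pair of natural numbers"), simplifies each structure to a list of node ids, and uses a hypothetical handle string and field names.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalObject:
    """do = (h, Stm, Stt, phi) with phi: node -> (stream id, a, b)."""
    handle: str                                     # h: unique identifier
    streams: dict = field(default_factory=dict)     # stream id -> sequence (str, bytes, list)
    structures: dict = field(default_factory=dict)  # structure id -> list of node ids
    st2: dict = field(default_factory=dict)         # phi: node id -> (stream id, a, b)

    def bind(self, node, stream_id, a, b):
        # Only nodes of this object's structures may be mapped onto its streams
        if not any(node in nodes for nodes in self.structures.values()):
            raise KeyError(f"{node!r} is not a node of any structure of {self.handle}")
        if not 0 <= a <= b <= len(self.streams[stream_id]):
            raise ValueError(f"span ({a}, {b}) lies outside stream {stream_id!r}")
        self.st2[node] = (stream_id, a, b)

    def segment(self, node):
        # Assumption: (a, b) is read as the half-open interval [a, b)
        stream_id, a, b = self.st2[node]
        return self.streams[stream_id][a:b]


doc = DigitalObject(
    handle="hdl:example/etd-001",  # hypothetical handle
    streams={"body": "Abstract: 5S model. Chapter 1: Streams."},
    structures={"toc": ["abstract", "ch1"]},
)
doc.bind("abstract", "body", 0, 19)
doc.bind("ch1", "body", 20, 39)
print(doc.segment("ch1"))  # Chapter 1: Streams.
```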

Page 32

The 5S Formal Model

• $Cat = \{DM_{C_1}, DM_{C_2}, \ldots, DM_{C_f}\}$ is a set of metadata catalogs for Coll, where each metadata catalog $DM_{C_k} = \{(h, mss_{hk})\}$ and $mss_{hk} = \{ms_{hk1}, ms_{hk2}, \ldots, ms_{hkn_{hk}}\}$ is a set of descriptive metadata specifications. Each descriptive metadata specification $ms_{hki}$ is a structure with atomic values (e.g., numbers, dates, strings) associated with nodes.

• A repository $Rep = \{(C_i, DM_{C_i})\}_{i=1}^{f}$ is a set of pairs (collection, metadata catalog); it is assumed that operations exist to manipulate them (e.g., get, store, delete).
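The repository definition only assumes that get, store, and delete operations exist over (collection, catalog) pairs. As a sketch under our own assumptions (in-memory dictionaries, metadata specifications as dicts of atomic values, and hypothetical method signatures), keeping each collection and its catalog in step could look like this:

```python
class Repository:
    """Rep = {(C_i, DM_C_i)}: collections paired with their metadata catalogs.

    get/store/delete are the manipulation operations the model assumes;
    their signatures here are hypothetical.
    """

    def __init__(self):
        # collection id -> (collection: handle -> object, catalog: handle -> [metadata specs])
        self._pairs = {}

    def store(self, cid, handle, obj, metadata_specs=()):
        collection, catalog = self._pairs.setdefault(cid, ({}, {}))
        collection[handle] = obj
        catalog.setdefault(handle, []).extend(metadata_specs)

    def get(self, cid, handle):
        collection, catalog = self._pairs[cid]
        return collection[handle], list(catalog.get(handle, []))

    def delete(self, cid, handle):
        # Removing an object also removes its catalog entry, keeping the pair in step
        collection, catalog = self._pairs[cid]
        del collection[handle]
        catalog.pop(handle, None)


rep = Repository()
rep.store("etds", "h1", b"thesis bytes", [{"dc:title": "5S", "dc:date": "2004-09-28"}])
obj, mss = rep.get("etds", "h1")
print(mss[0]["dc:title"])  # 5S
rep.delete("etds", "h1")
```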

Page 33

The 5S Formal Model
• $Serv = \{Se_1, Se_2, \ldots, Se_s\}$ is a set of services, where each service $Se_k = \{sc_1^k, \ldots, sc_{s_k}^k\}$ is described by a set of related scenarios.
• $Soc = (C, R)$, where C is a set of communities and R is a set of relationships among communities. $SM = \{sm_1, sm_2, \ldots, sm_j\}$ and $Ac = \{ac_1, ac_2, \ldots, ac_r\}$ are two such communities, where the former is a set of service managers responsible for running DL services and the latter is a set of actors that use those services. Being basically an electronic entity, a member $sm_k$ of SM distinguishes itself from actors by defining or implementing a set of operations $\{op_{1k}, op_{2k}, \ldots, op_{nk}\}$. Each operation $op_{ik}$ of $sm_k$ is characterized by a triple $(n_{ik}, sig_{ik}, imp_{ik})$, where $n_{ik}$ is the operation’s name, $sig_{ik}$ is the operation’s signature (which includes the operation’s input parameters and output), and $imp_{ik}$ is the operation’s implementation. These operations define the capabilities of a service manager $sm_k$.

Page 35

Motivation
• Previous definitions emphasize syntactic aspects, i.e., how digital library concepts are composed or built from previously defined concepts.
• Complete a formal DL theory by:
  • Making explicit the implicit relationships that exist among the DL formal concepts defined in [Gonc04]
  • Providing a set of axiomatic rules that precisely define and constrain the semantics of the relationships
  • Categorizing and classifying DL services on the basis of the ontology
• Research questions:
  • How should DL services be built from the other DL components?
  • Which are the fundamental and elementary DL services?
  • How can services be built/composed from other DL services?
• We will explore semantic relations and rules of the DL domain by using ontologies.

Page 36

Digital Library Formal Ontology
• An ontology is a tuple (Ontol_Concepts, Ontol_Rels), where Ontol_Concepts is a family of ontological concepts and Ontol_Rels is a family of relations.
• Relations in Ontol_Rels are operationally realized by one or more rules (e.g., first-order logic axioms), which intensionally specify or constrain which elements of a concept can participate in a relation.
• Ontol_Rules is a family of rules of a particular ontology.

Page 37

Digital Library Formal Ontology: Relationships
• Intra-model:
  • Video contains Audio (MM)
  • Metadata Catalog describes Collection (LIS)
  • Probabilistic Space is_a Measure Space
  • Service extends Service (reuse)
  • Service Manager inherits_from Service Manager (OO)
• Inter-model:
  • Event executes Operation
  • Actor participates_in Scenario
  • Service Manager runs Service
  • Service employs/produces Streams, Structures, Spaces

Page 38

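Digital Library Formal Ontology

Concepts: {Se, Sc, e}. Key: Se = service; Sc = scenario; e = event.

Relations:
• $contains \subseteq Sc \times e$
  Symbolic Rule: $\forall x, y\,(x \text{ contains } y \rightarrow Sc(x) \wedge e(y) \wedge \exists j\,(j \in x.Dom \wedge y = x(j)))$
• $precedes \subseteq e \times e \times Sc$; $happens\_before \subseteq e \times e \times Sc$
  Symbolic Rule 1: $\forall x, y, z\,(x \text{ precedes}_z\ y \rightarrow e(x) \wedge e(y) \wedge Sc(z) \wedge \exists i, j\,(z \text{ contains } x \wedge z \text{ contains } y \wedge x = z(i) \wedge y = z(j) \wedge i + 1 = j))$
  Symbolic Rule 2: $\forall x, y, z\,(x \text{ happens\_before}_z\ y \rightarrow e(x) \wedge e(y) \wedge Sc(z) \wedge \exists i, j\,(z \text{ contains } x \wedge z \text{ contains } y \wedge x = z(i) \wedge y = z(j) \wedge i < j))$
• $includes \subseteq (Se \times Se) \cup (Sc \times Sc)$; $extends \subseteq (Se \times Se) \cup (Sc \times Sc)$
  Symbolic Rule 1: $\forall x, y\,(x \text{ includes } y \rightarrow Sc(x) \wedge Sc(y) \wedge \forall z\,(e(z) \wedge y \text{ contains } z \rightarrow x \text{ contains } z) \wedge \forall p, q\,(e(p) \wedge e(q) \wedge p \text{ precedes}_y\ q \rightarrow p \text{ precedes}_x\ q))$

  Symbolic Rule 2: $\forall x, y\,(x \text{ extends } y \rightarrow Sc(x) \wedge Sc(y) \wedge \forall z\,(e(z) \wedge y \text{ contains } z \rightarrow x \text{ contains } z) \wedge \forall p, q\,(e(p) \wedge e(q) \wedge p \text{ happens\_before}_y\ q \rightarrow p \text{ happens\_before}_x\ q))$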

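  Symbolic Rule 3: $\forall x, y\,(x \text{ extends } y \rightarrow Se(x) \wedge Se(y) \wedge (y \subset x \vee (x \neq y \wedge \exists p, q\,(Sc(p) \wedge Sc(q) \wedge p \in x \wedge q \in y \wedge p \text{ extends } q))))$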
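The scenario-level rules can be checked mechanically if a scenario is modeled as a sequence of event names. The sketch below assumes exactly that (plain tuples of strings, with hypothetical event names) and tests the conditions the rules state as necessary: `includes` requires every event of y to appear in x with each immediate `precedes` pair preserved, while `extends` only requires each `happens_before` pair to be preserved, so x may insert new events between them. The service-level Rule 3 is not covered here.

```python
def contains(sc, e):
    return e in sc


def precedes(sc, x, y):
    # exists i with sc[i] == x and sc[i + 1] == y
    return any(sc[i] == x and sc[i + 1] == y for i in range(len(sc) - 1))


def happens_before(sc, x, y):
    # exists i < j with sc[i] == x and sc[j] == y
    return any(sc[i] == x and y in sc[i + 1:] for i in range(len(sc)))


def _related_pairs(sc, rel):
    # all event pairs (p, q) of sc for which rel holds within sc
    events = set(sc)
    return [(p, q) for p in events for q in events if rel(sc, p, q)]


def includes(x, y):
    return (all(contains(x, z) for z in y)
            and all(precedes(x, p, q) for p, q in _related_pairs(y, precedes)))


def extends(x, y):
    return (all(contains(x, z) for z in y)
            and all(happens_before(x, p, q) for p, q in _related_pairs(y, happens_before)))


search = ("enter_query", "run_query", "show_results")
with_feedback = search + ("mark_relevant", "run_query", "show_results")
with_expansion = ("enter_query", "expand_query", "run_query", "show_results")

print(includes(with_feedback, search))   # True
print(includes(with_expansion, search))  # False: expand_query breaks enter_query -> run_query
print(extends(with_expansion, search))   # True
```

The distinction mirrors the two rules: `precedes` is about adjacency, `happens_before` only about order.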

Page 39

Digital Library Formal Ontology

Figure: the complete ontology, linking the concepts of the five models (text, audio, image, video; do, ms, mss, R, C, DMC, Ic; Se, Sc, e, op; SM, Ac; Measurable, Measure, Pr, Metric, Top, Vec) through relations such as contains, describes, stores, is_version_of, belongs_to, is_a, extends, reuses, redefines, invokes, precedes, happens_before, executes, participates_in, recipient, runs, inherits_from/includes, association, uses, and employs/produces.

Page 40

Digital Library Formal Ontology: Consistency Rules
• Catalog-Collection
  • A complete catalog has at least one set of metadata specifications for each digital object in the collection it describes (surjective partial function).
  • In a consistent catalog, each set of metadata specifications describes (exactly) one digital object in the related collection (total function).
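Both catalog rules reduce to set comparisons once a catalog is represented as a list of (handle, metadata set) pairs, which is our assumption here. Because each pair names exactly one handle, "describes exactly one object" becomes "its handle is in the collection". A minimal sketch with made-up handles:

```python
def catalog_report(collection_handles, catalog):
    """Check a catalog DM_C = [(h, mss_h), ...] against its collection.

    complete:   every object in the collection has at least one metadata set
    consistent: every metadata set describes an object that is in the collection
    """
    handles = set(collection_handles)
    described = {h for h, _mss in catalog}
    missing = handles - described   # objects nobody describes -> incomplete
    orphans = described - handles   # entries pointing outside the collection -> inconsistent
    return {"complete": not missing, "consistent": not orphans,
            "missing": sorted(missing), "orphans": sorted(orphans)}


collection = {"h1", "h2", "h3"}
catalog = [
    ("h1", [{"dc:title": "Thesis A"}]),
    ("h2", [{"dc:title": "Thesis B"}]),
    ("h9", [{"dc:title": "Withdrawn thesis"}]),
]
print(catalog_report(collection, catalog))
# {'complete': False, 'consistent': False, 'missing': ['h3'], 'orphans': ['h9']}
```

Returning the offending handles, not just booleans, tells a curator what to fix.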

• Scenarios-Society
  • A scenario x is consistent with regard to a set of service managers Y if each operation executed by each event in the scenario is defined in some service manager $y \in Y$.
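The Scenarios-Society rule can be sketched by giving each service manager its operation triples (name, signature, implementation) and checking every event's operation against them. Assumptions, all ours: a scenario is a list of (event, operation name) pairs, operations are matched by name only, and `SearchManager`, `search`, and `rate` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Operation:
    name: str         # n_ik
    signature: tuple  # sig_ik: (input parameter names, output name)
    impl: Callable    # imp_ik


@dataclass
class ServiceManager:
    name: str
    operations: dict = field(default_factory=dict)  # operation name -> Operation

    def define(self, op):
        self.operations[op.name] = op


def undefined_operations(scenario, managers):
    """scenario: sequence of (event, operation name) pairs."""
    defined = {name for sm in managers for name in sm.operations}
    return [op for _event, op in scenario if op not in defined]


def scenario_is_consistent(scenario, managers):
    return not undefined_operations(scenario, managers)


searcher = ServiceManager("SearchManager")
searcher.define(Operation("search", (("q", "C_i"), "{do_k}"),
                          lambda q, coll: [d for d in coll if q in d]))
scenario = [("e1", "search"), ("e2", "rate")]
print(undefined_operations(scenario, [searcher]))  # ['rate']
```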

Page 41

Digital Library Formal Ontology: Characterizing employs/produces Relationships
• In the table, each service is characterized by the parameters (input, output) of the initial and final events of the scenarios that compose it.
• All other previous definitions and keys apply here.
• That set is complemented with the following definitions:

Page 42

Services Related Definitions
• A query q is the representation of a user interest or information need.
• Hyptxt is a hypertext; an anchor is a node of Hyptxt.
• A log_entry is a descriptive metadata specification about an event of a scenario.
• Let $\{do_i\} = \{do_{i1}, do_{i2}, \ldots, do_{in}\}$ be a set of digital objects and $Ct = \{c_1, c_2, \ldots, c_n\}$ be a set of labels for categories. A classifier $class_{Ct}: \{do_i\} \rightarrow 2^{Ct}$ is a function that maps a digital object to a set of categories.
• A cluster $clu_k = \{do_{1k}, do_{2k}, \ldots, do_{nk}\}$ is a subset of a set of digital objects.
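The classifier definition only requires a function from digital objects to subsets of Ct. As one hypothetical instance, the sketch below uses simple keyword matching (our stand-in technique, not one the slides prescribe) and then forms one subset of {do_i} per category label. That grouping is a plain illustration of "a subset of digital objects", not a clustering algorithm.

```python
def make_keyword_classifier(rules):
    """Build class_Ct: do -> subset of Ct from {category label: keywords}."""
    def classify(text):
        words = set(text.lower().split())
        return frozenset(c for c, keywords in rules.items() if words & keywords)
    return classify


def group_by_category(docs, classify):
    """Form one subset of {do_i} per category label (a simple grouping, not clustering)."""
    groups = {}
    for handle, text in docs.items():
        for c in classify(text):
            groups.setdefault(c, set()).add(handle)
    return groups


rules = {"DL": {"library", "metadata"}, "Bio": {"protein"}}  # hypothetical categories
docs = {
    "do1": "Digital library metadata harvesting",
    "do2": "Protein folding simulation",
    "do3": "Metadata for protein databases",
}
classify = make_keyword_classifier(rules)
print(sorted(classify(docs["do3"])))  # ['Bio', 'DL']
print(group_by_category(docs, classify)["DL"] == {"do1", "do3"})  # True
```

Returning a frozenset makes the codomain 2^Ct explicit: one object may get several labels, or none.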

Page 43

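Service | User input | Other service input | Output
Acquiring | $\{do_i\}$ | $C_i$ | $C_j$
Browsing | anchor | $Hyptxt_k$ | $\{do_i\}$
Cataloging | $do_i, ms_{i_k}$ | $(h_i, mss_{i_m})$ | $(h_i, mss_{i_{(m+k)}})$
Classifying | $do_i$ | $class_{Ct}$ | $(do_i, \{c_{k_i}\})$
Clustering | $\{do_i\}$ | X | $\{clu_{k_i}\}$
Expanding (query) | $\{do_i\}$ | $I_{C_i}, q_i$ | $q_j$
Indexing | $C_i$ | none | $I_{C_i}$
Linking | $do_i$ | $Hyptxt_k$ | $Hyptxt_{ik}$
Logging | none | $e_i(\{p_i\})$ | $log\_entry_i$
Rating | $do_i, ac_j$ | none | $\{(do_i, ac_j, r_k)\}$
Searching | $q, C_i$ | $I_{C_i}$ | $\{do_k\}$
Visualizing | $\{do_i\}$ | $tfr_k$ | $sp_{ik}$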

Page 44

Applications: A Taxonomy of DL Services
• Infrastructure Services: dealing with basic concepts such as collections and catalogs
  • Repository-Building: create collections (digital objects) and/or catalogs (metadata specifications)
  • Preservational: generate instances by copying collections (digital objects) or transforming (converting/translating) objects into different formats for preservation purposes
  • Add_Value: either aggregate value/information to collections (digital objects) or connect objects together
• Information Satisfaction: dealing with higher-level societal requirements

KEY for the next slide:
• Fundamental: the minimal set of services, or services essential to the existence of a DL
• Composite DL service: takes input from some other service; otherwise, the service is called elementary
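The composite/elementary distinction can be read off the "Other service input" column of the service table: a row with "none" there is elementary. The sketch encodes three rows of that table (Indexing, Searching, Rating) under our own naming (`ServiceSpec`, `kind`, `build_order` are hypothetical helpers) and also derives a build order in which each service follows the services that produce its inputs. `graphlib` requires Python 3.9 or later.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter  # Python 3.9+


@dataclass(frozen=True)
class ServiceSpec:
    name: str
    user_input: tuple
    other_service_input: tuple  # empty tuple stands for "none" in the table
    output: tuple


def kind(spec):
    # Composite: takes input from some other service; otherwise elementary
    return "composite" if spec.other_service_input else "elementary"


def build_order(specs):
    """Order services so that each comes after the services producing its inputs."""
    producers = {o: s.name for s in specs for o in s.output}
    graph = {s.name: {producers[p] for p in s.other_service_input if p in producers}
             for s in specs}
    return list(TopologicalSorter(graph).static_order())


specs = [
    ServiceSpec("Indexing", ("C_i",), (), ("I_C_i",)),
    ServiceSpec("Searching", ("q", "C_i"), ("I_C_i",), ("{do_k}",)),
    ServiceSpec("Rating", ("do_i", "ac_j"), (), ("{(do_i, ac_j, r_k)}",)),
]
print({s.name: kind(s) for s in specs})
# {'Indexing': 'elementary', 'Searching': 'composite', 'Rating': 'elementary'}
order = build_order(specs)
print(order.index("Indexing") < order.index("Searching"))  # True
```

Inputs whose producer is not among the modeled services are simply left out of the dependency graph in this sketch.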

Page 45

Applications: A Taxonomy of DL Services

Figure: fundamental services, grouped into Infrastructure Services (fundamental) and Information Satisfaction Services (fundamental). Services shown: Acquiring, Describing/Cataloguing, Indexing, Linking, Submitting, Authoring, Searching, Browsing. Other elements: Universal Collection, C_i, DMC_i, Ic, Hypertext, {do_i}, do_k, ms_kj, and a Society actor supplying user interests/needs (query, anchor, criteria, sortOrder).

Page 46

Applications: A Taxonomy of DL Services

Figure: composite services shown alongside the fundamental Searching and Browsing services. Information Satisfaction Services: Recommending, Filtering, Binding, Visualizing, Expanding query. Also shown: Infrastructure Services (Add_Value), Rating/Reviewing (peer), Training. Labeled inputs/outputs include query, anchor, criteria, sortOrder, C_k, {do_i}, user model/expr, Classifier/expr, {do_j}, {do_R}, {do_F}, b_i, sp_V, and query’.

Page 48

5S Model / 5S Language
Model: formal definition; objective within 5SL
• Streams: sequences of arbitrary types. Objective: describe properties of the DL content such as encoding and language for textual material or particular forms of multimedia data.
• Structures: labeled directed graphs. Objective: specify organizational aspects of the DL (e.g., structural/descriptive metadata, hypertexts, taxonomies, classification schemes).
• Spaces: sets of objects and operations on those objects that obey specific constraints. Objective: define logical and presentational views of several components.
• Scenarios: sequences of events that modify states of a computation in order to accomplish some functional requirement. Objective: detail the behavior of the DL services.
• Societies: sets of communities and relationships (relations) among them. Objective: define managers, responsible for running DL services; actors, that use those services; and relationships among them.

Page 49

5SL Primitives and Implementation
Model: primitives; 5SL implementation
• Streams Model: text, video, audio, picture, software program; MIME types
• Structures Model: collection, catalog, hypertext, document, metadata, organization tools; XML and RDF schemas, XML Topic Maps (XTM)
• Spaces Model: user interface, index, retrieval model; MathML, UIML, XSL
• Scenarios Model: service, event, condition, action; extended UML sequence diagrams, XML serialization
• Societies Model: community, service managers, actors, relationships, attributes, operations; XML serialization
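The table pairs the Streams model with MIME types and several other models with XML serialization. The following is only a hedged sketch of that idea: it assigns a MIME type to each stream file and serializes the result as XML. The `streamsModel`/`stream`/`mimeType` names and the extension table are our assumptions, not the actual 5SL schema.

```python
import mimetypes
import os
import xml.etree.ElementTree as ET

# Assumed MIME types for this DL's stream kinds; mimetypes is only a fallback
KNOWN = {".pdf": "application/pdf", ".txt": "text/plain",
         ".mp3": "audio/mpeg", ".jpg": "image/jpeg", ".mp4": "video/mp4"}


def mime_for(filename):
    ext = os.path.splitext(filename)[1].lower()
    return KNOWN.get(ext) or mimetypes.guess_type(filename)[0] or "application/octet-stream"


def streams_model(files):
    # Element and attribute names are hypothetical, not the 5SL schema
    root = ET.Element("streamsModel")
    for stream_id, filename in files.items():
        ET.SubElement(root, "stream", id=stream_id, mimeType=mime_for(filename))
    return ET.tostring(root, encoding="unicode")


xml_text = streams_model({"thesis": "etd-001.pdf", "defense": "defense.mp4"})
print(xml_text)
```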

Page 50

Challenges with Approach
• The designer should know the 5S theory very well and be very familiar with the syntax and semantics of 5SL to be able to write correct 5SL files.
• It is difficult to get the big picture of a digital library just from a textual 5SL file.

Page 51

5SGraph: A DL Modeling Tool
• Overall objective of 5SGraph: help users model their own instances of a digital library (DL) in the 5S language (5SL).
• A simple modeling process that enables rapid generation of digital libraries is needed. It should:
  • Support non-expert users.
  • Speed up the development process.
  • Increase the quality of the final product.

Page 52

Goals of 5SGraph
• To help digital library designers understand the 5S model quickly and easily
• To help digital library designers build their own digital libraries without difficulty
• To help digital library designers transform their models into 5SL files automatically
• To help digital library designers understand, maintain, and upgrade existing digital library models conveniently

Page 53

5SGraph: How does 5SGraph work?
• 5SGraph loads and displays a metamodel in a structured toolbox.
• The structured editor of 5SGraph provides a top-down visual environment for the DL designer.
• 5SGraph produces correct 5SL files according to the visual model built by the designer.

Page 54

Overview of 5SGraph

Figure: the 5SGraph interface, with the structured toolbox (metamodel) and the workspace (instance model).

Page 58

Visualization Features
• The structured toolbox
• Visualization of the metamodel
• Visual components that can be added
• Truncated display of trees
• Node-link representation
• Deep-node problem
• Icons
• Type/Instance relationship
• Cardinality

Page 59

Component Reuse
• Components can be loaded/saved: load and save sub-trees.
• Component reuse saves time and effort:
  • Full reuse from the component pool
  • Partial reuse: adapting components

Page 60

Semantic Constraints
• There are inherent semantic constraints in the hierarchical structure of the 5S model.
• 5SGraph maintains these constraints and enforces them over the instance model to ensure correctness.
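One way to picture this kind of enforcement is a recursive check of an instance tree against per-type cardinalities. The sketch below is not 5SGraph's implementation: the metamodel fragment, its (min, max) bounds, and the (type, children) tree encoding are all hypothetical, chosen only to show how disallowed children and cardinality violations can be reported.

```python
MANY = None  # no upper bound

# Hypothetical fragment of a metamodel: parent type -> {child type: (min, max)}
METAMODEL = {
    "DigitalLibrary": {"Streams": (1, 1), "Structures": (1, 1), "Spaces": (1, 1),
                       "Scenarios": (1, 1), "Societies": (1, 1)},
    "Societies": {"Community": (1, MANY)},
    "Community": {"Actor": (0, MANY), "ServiceManager": (0, MANY)},
}


def violations(node, metamodel, path=""):
    """node = (type, [children]); returns a list of constraint violations."""
    kind, children = node
    here = f"{path}/{kind}"
    allowed = metamodel.get(kind, {})
    problems, counts = [], {}
    for child_kind, _ in children:
        counts[child_kind] = counts.get(child_kind, 0) + 1
    for child_kind in counts:
        if child_kind not in allowed:
            problems.append(f"{here}: {child_kind} is not allowed here")
    for child_kind, (lo, hi) in allowed.items():
        n = counts.get(child_kind, 0)
        if n < lo or (hi is not None and n > hi):
            bound = "*" if hi is None else hi
            problems.append(f"{here}: expected {lo}..{bound} {child_kind}, found {n}")
    for child in children:
        problems.extend(violations(child, metamodel, here))
    return problems


good = ("DigitalLibrary", [("Streams", []), ("Structures", []), ("Spaces", []),
                           ("Scenarios", []),
                           ("Societies", [("Community", [("Actor", []), ("Actor", [])])])])
bad = ("DigitalLibrary", [("Streams", []), ("Streams", []), ("Societies", [])])
print(violations(good, METAMODEL))  # []
for problem in violations(bad, METAMODEL):
    print(problem)
```

An editor could run such a check after each drag-and-drop and refuse edits that introduce violations.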

Page 61

The Preliminary Test of 5SGraph

Research questions:
• Does the tool help users understand and use the 5S model to build their own digital libraries?
• Does the tool help users efficiently describe digital library models in the 5SL language?
• Are users satisfied with the tool?

Page 62

Test Results | Task 1 | Task 2 | Task 3
Completion Rate (%) | 100 | 100 | 100
Mean Task Time (min) | 11.3 | 11.4 | 15.1
Mean Closeness to Expertise | 0.483 | 0.752 | 0.712
Mean Goal Achievement (%) | 97.4 | 97.4 | 98.2

Page 63

Satisfaction and Usefulness
• The average rating of user satisfaction is 91%.
• The average rating of the usefulness of the tool is 92%.
• Statistical analysis shows that users’ mean understanding of the 5S model after using the tool is significantly greater than their mean understanding before.

Page 64

5SLGen: Automatic DL Generation

Figure: the generation process across Requirements (1), Analysis (2), Design (3), and Implementation (4), involving the 5S metamodel, 5SGraph, the 5SL DL model, and 5SLGen, which uses a component pool (ODLSearch, ODLBrowse, ODLRate, ODLReview, …) to produce tailored DL services; roles shown: DL Expert, DL Designer, Practitioner, Researcher, Teacher.

Page 65

The XML Log Standard for DLs

1. Marcos André Gonçalves, Ganesh Panchanathan, Unnikrishnan Ravindranathan, Aaron Krowne, Edward A. Fox, Filip Jagodzinski, and Lillian Cassel. The XML Log Standard for Digital Libraries: Analysis, Evolution, and Deployment. Proc. JCDL'2003, Third ACM/IEEE-CS Joint Conference on Digital Libraries, May 27-31, 2003, Houston, pp. 312-314.

2. Marcos André Gonçalves, Ming Luo, Rao Shen, Mir Farooq Ali, and Edward A. Fox. An XML Log Standard and Tool for Digital Library Logging Analysis. In "Research and Advanced Technology for Digital Libraries, 6th European Conference, ECDL 2002, Rome, Italy, September 16-18, 2002, Proceedings", eds. Maristella Agosti and Constantino Thanos, LNCS 2458, Springer, pp. 129-143.

Page 66

The XML Log Standard for DLs

DL Concept | Dimensions of Quality (can the log be used to measure it?)
Digital object | Accessibility (No), Pertinence (Yes), Preservability (No), Relevance (Yes), Similarity (No), Significance (No), Timeliness (No)
Metadata specification | Accuracy (No), Completeness (No), Conformance (No)
Collection | Completeness (No), Impact Factor (No)
Catalog | Completeness (No), Consistency (No)
Repository | Completeness (No), Consistency (No)
Services | Composability (No), Efficiency (Yes), Effectiveness (Yes), Extensibility (No), Reusability (No), Reliability (Yes)
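The table marks service efficiency and reliability among the dimensions a log can measure. The sketch below is illustrative only. It parses a made-up XML log whose `event` elements carry `service`, `start`, `end`, and `status` attributes (our invented layout, not the XML Log Standard's actual schema; the sample entries are fabricated for the example) and derives a mean response time and a success rate per service.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from datetime import datetime

# Invented sample log for illustration; not the XML Log Standard's schema
LOG = """<log>
  <event service="searching" start="2004-09-28T10:00:00" end="2004-09-28T10:00:02" status="ok"/>
  <event service="searching" start="2004-09-28T10:01:00" end="2004-09-28T10:01:04" status="ok"/>
  <event service="browsing" start="2004-09-28T10:02:00" end="2004-09-28T10:02:01" status="error"/>
</log>"""


def service_stats(xml_text):
    durations = defaultdict(list)  # service -> response times in seconds (efficiency)
    ok = defaultdict(int)          # service -> successful events (reliability)
    for ev in ET.fromstring(xml_text).iter("event"):
        svc = ev.get("service")
        start = datetime.fromisoformat(ev.get("start"))
        end = datetime.fromisoformat(ev.get("end"))
        durations[svc].append((end - start).total_seconds())
        ok[svc] += ev.get("status") == "ok"
    return {svc: {"mean_seconds": sum(d) / len(d), "success_rate": ok[svc] / len(d)}
            for svc, d in durations.items()}


stats = service_stats(LOG)
print(stats["searching"])  # {'mean_seconds': 3.0, 'success_rate': 1.0}
```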

Page 68

Defining Quality in Digital Libraries
• What’s a “good” digital library?
• Central concept: Quality!
• Hypothesis of this work: formal theory can help to define “what’s a good digital library” by:
  • Proposing and formalizing new quality measures for DLs
  • Formalizing traditional measures within our 5S framework
  • Contextualizing these measures within the Information Life Cycle

Page 69

Information Life Cycle

Page 70

Quality and the Information Life Cycle

Figure: quality dimensions placed on the information life cycle. Stages and activities shown: Creation, Distribution, Utilization, Authoring, Modifying, Describing, Organizing, Indexing, Storing, Archiving, Networking, Accessing, Filtering, Seeking (Searching, Browsing, Recommending), Retention, Mining, Discard; states: Active, Semi-Active, Inactive. Quality dimensions shown: Accuracy, Completeness, Conformance, Significance, Similarity, Pertinence, Relevance, Timeliness, Accessibility, Believability, Preservability.

Page 71

Defining Quality in Digital Libraries

DL Concept | Dimensions of Quality
Digital object | Accessibility, Pertinence, Preservability, Relevance, Similarity, Significance, Timeliness
Metadata specification | Accuracy, Completeness, Conformance
Collection | Completeness, Impact Factor
Catalog | Completeness, Consistency
Repository | Completeness, Consistency
Services | Composability, Efficiency, Effectiveness, Extensibility, Reusability, Reliability

Page 72

Defining Quality in Digital Libraries

Structure of presentation, for each quality metric:
• Discussion about the metric (meaning, use, etc.)
• Definition of a numerical measure
• Example of use

Page 73

Digital Objects: Accessibility

A digital object is accessible by a DL actor or patron if it exists in the collections of the DL, the repository is able to retrieve the object, and:

1) no overly restrictive rights-management property of a metadata specification exists for that object; or

2) if such a property exists, it does not restrict access for the particular society to which the actor belongs, or for that actor in particular.

Page 74

Digital Objects: Accessibility

The accessibility $acc(do_x, ac_y)$ of digital object $do_x$ to actor $ac_y$ is

$$acc(do_x, ac_y) = \begin{cases} 0, & \text{if there is no collection } C \text{ in the DL such that } do_x \in C,\\ \dfrac{\sum_{z \in \mathit{struct\_streams}(do_x)} r_z(ac_y)}{|\mathit{struct\_streams}(do_x)|}, & \text{otherwise,} \end{cases}$$

where $r_z(ac_y)$ is a rights-management rule defined as the indicator function

$$r_z(ac_y) = \begin{cases} 1, & \text{if } z \text{ has no access constraints, or } z \text{ has access constraints and } ac_y \in cm_z,\\ 0, & \text{otherwise,} \end{cases}$$

and $cm_z \in Soc$ is a community that has the right to access $z$.
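The accessibility formula can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: representing each structured stream's access constraint as an optional set of permitted actor ids, and returning 0.0 for an object with no streams (where the formula is undefined), are assumptions of this sketch; the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Set


@dataclass(frozen=True)
class StructStream:
    """One structured stream z of a digital object (hypothetical representation).

    allowed is None when z has no access constraints; otherwise it holds the
    ids of the actors in the community cm_z that may access z.
    """
    name: str
    allowed: Optional[FrozenSet[str]] = None


def rights_rule(z: StructStream, actor: str) -> int:
    """Indicator r_z(ac_y): 1 if z is unconstrained or the actor is in cm_z."""
    if z.allowed is None:
        return 1
    return 1 if actor in z.allowed else 0


def accessibility(obj_id: str, streams: List[StructStream], actor: str,
                  collections: List[Set[str]]) -> float:
    """acc(do_x, ac_y): 0 if do_x is in no collection, else the mean of r_z."""
    if not any(obj_id in c for c in collections):
        return 0.0
    if not streams:
        # Assumption: the formula is undefined for an object with no streams.
        return 0.0
    return sum(rights_rule(z, actor) for z in streams) / len(streams)


# Usage: an ETD with two open and two VT-only chapters.
vt_community = frozenset({"alice"})
etd_streams = [
    StructStream("abstract"),
    StructStream("chapter1"),
    StructStream("chapter2", vt_community),
    StructStream("chapter3", vt_community),
]
collections = [{"etd-1"}]
print(accessibility("etd-1", etd_streams, "bob", collections))    # 0.5
print(accessibility("etd-1", etd_streams, "alice", collections))  # 1.0
```

A "mixed" ETD, as in the VT table, is one where only some streams are constrained, which is why its degree of accessibility for outsiders falls strictly between 0 and 1.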

Page 75

Digital Objects: Accessibility, VT ETD Collection

First letter of author’s name | Unrestricted | Restricted | Mixed | Degree of accessibility for users not in the VT community
A | 164 | 50 | 5 | mix(0.5, 0.5, 0.167, 0.1875, 0.6)
B | 286 | 102 | 3 | mix(0.5, 0.5, 0.13)
C | 231 | 108 | 7 | mix(0.11, 0.5, 0.5, 0.5, 0.33, 0.09, 0.33)
D | 159 | 54 | 2 | mix(0.875, 0.666)
E | 67 | 26 | 1 | mix(0.5)
F | 88 | 39 | 2 | mix(0.375, 0.09)
G | 166 | 72 | 2 | mix(0.666, 0.5)
H | 225 | 91 | 3 | mix(0.66, 0.5, 0.235)
I | 20 | 8 | 1 | mix(0.5)
J | 84 | 36 | 2 | mix(0.5, 0.6)
K | 166 | 69 | 2 | mix(0.5, 0.5)
L | 189 | 68 | 6 | mix(0.153, 0.33, 0.5, 0.5, 0.94)
M | 299 | 115 | 9 | mix(0.5, 0.5, 0.5, 0.041, 0.5, 0.5, 0.5, 0.117, 0.5)

Page 76

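Digital Objects: Accessibility (continued)

First letter of author’s name | Unrestricted | Restricted | Mixed | Degree of accessibility for users not in the VT community
N | 74 | 16 | 1 | mix(0.8)
O | 45 | 19 | 2 | mix(0.5, 0.125)
P | 172 | 71 | 3 | mix(0, 0, 0.33)
Q | 13 | 6 | 0 | none
R | 158 | 71 | 3 | mix(0.66, 0.5, 0.5)
S | 398 | 159 | 8 | mix(0.66, 0.5, 0.5, 0.6, 0.33, 0.66, 0.33, 0.6)
T | 111 | 49 | 1 | mix(0.13)
U | 9 | 7 | 0 | none
V | 63 | 20 | 0 | none
W | 191 | 81 | 5 | mix(0.5, 0.22, 0.38, 0.875, 0.5)
X | 11 | 5 | 0 | none
Y | 38 | 9 | 3 | mix(0.5, 0.5, 0.125)
Z | 47 | 17 | 2 | mix(0.5, 0.5)
All | 3474 | 1368 | 73 |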

Page 77

Digital Objects: Pertinence

Let $Inf(do_i)$ represent the "information" (not physical) carried by a digital object or any of its (metadata) descriptions, let $IN(ac_j)$ be the information need of an actor $ac_j$, and let $Context_{j,k}$ be an amalgam of societal factors that can affect the judgment of pertinence by $ac_j$ at time $k$. These include, among others, time, place, the actor's history of interaction, the task at hand, and a range of other factors that are not given explicitly but are implicit in the interaction and the ambient environment.

Page 78

Digital Objects: Pertinence

Let us define two sub-communities of actors, $users \subseteq Ac$ and $external\text{-}judges \subseteq Ac$:
• users: the set of actors with an information need who use DL services to try to fulfill that need;
• external-judges: the set of actors responsible for determining the relevance of a document to a query.

We also require that a member of external-judges cannot judge the relevance of a document to a query representing her own information need, i.e., at any given point in time, $users \cap external\text{-}judges = \emptyset$.

Page 79

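Digital Objects: Pertinence

The pertinence of a digital object $do_i$ to a user $ac_j$ is an indicator function $Pertinence(do_i, ac_j): Inf(do_i) \times IN(ac_j) \times Context_{j,k} \rightarrow \{0, 1\}$, defined as

$$Pertinence(do_i, ac_j) = \begin{cases} 1, & \text{if } Inf(do_i) \text{ is judged by } ac_j \text{ to be informative with regard to } IN(ac_j) \text{ in context } Context_{j,k},\\ 0, & \text{otherwise.} \end{cases}$$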

Page 80

Digital Objects: Preservability (Factors in Preservability)

• Preservability depends on fidelity and obsolescence.
• Fidelity depends on the process, the source format, and the target format.
• Obsolescence depends on cost: software, hardware, evaluation, storage, identification, training, …

Page 81

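Digital Objects: Preservability

$$Preservability(do_i, dl) = \big(fidelity(do_i, format_x, format_y),\ obsolescence(do_i, dl)\big)$$

where the fidelity of migrating $do_i$ from $format_x$ to $format_y$ is

$$fidelity(do_i, format_x, format_y) = \frac{1}{distortion(p(format_x, format_y))},$$

with $p(format_x, format_y)$ the migration process, and $obsolescence(do_i, dl)$ is the cost of converting/migrating the object within the context of the specific DL $dl$.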

Page 82

Digital Objects: Relevance

$$Relevance(do_i, q) = \begin{cases} 1, & \text{if } do_i \text{ is judged by an external judge to be relevant to } q,\\ 0, & \text{otherwise.} \end{cases}$$

Relevance estimate:

$$Rel(do_i, q) = \frac{\vec{do_i} \cdot \vec{q}}{|\vec{do_i}|\,|\vec{q}|}$$

Relevance is an objective, public, social notion, established by general consensus in the field, not a subjective, private judgment by an actor with an information need.

Page 83

Digital Objects: Similarity

Similarity reflects the relatedness between two or more digital objects. It is used in many services (e.g., classification, "find similar", etc.).

Page 84

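Digital Objects: Similarity Metrics

Content-based:
• $Cosine(d_i, d_j) = \dfrac{\vec{d_i} \cdot \vec{d_j}}{|\vec{d_i}|\,|\vec{d_j}|}$
• $BagOfWords(d_i, d_j) = \dfrac{|W(d_i) \cap W(d_j)|}{|W(d_i)|}$, where $W(d)$ is the set of words in $d$
• $Okapi(d_i, d_j)$ (see draft)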
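A minimal Python sketch of the two content-based measures follows; the cosine form is the same one used for the relevance estimate $Rel(do_i, q)$. It assumes whitespace tokenization and raw term-frequency weights, since the slides do not say how documents are vectorized. Okapi is omitted because its details are only referenced to the draft.

```python
import math
from collections import Counter


def term_vector(text: str) -> Counter:
    """Raw term-frequency vector (assumed weighting; tf-idf is a common alternative)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine(d_i, d_j) = (d_i . d_j) / (|d_i| |d_j|)."""
    dot = sum(weight * b[term] for term, weight in a.items())  # missing terms count as 0
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def bag_of_words(d_i: str, d_j: str) -> float:
    """|W(d_i) ∩ W(d_j)| / |W(d_i)|, with W(d) the set of words in d."""
    w_i = set(d_i.lower().split())
    w_j = set(d_j.lower().split())
    if not w_i:
        return 0.0
    return len(w_i & w_j) / len(w_i)


print(cosine(term_vector("digital library"), term_vector("digital archive")))
print(bag_of_words("digital library quality", "digital library"))
print(bag_of_words("digital library", "digital library quality"))
```

Note that the bag-of-words measure is asymmetric: it is normalized by the word set of $d_i$ only, so swapping the arguments generally changes the score.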

Page 85

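Digital Objects: Similarity Metrics

Citation-based, where $P_d$ is the set of documents that cite $d$ and $C_d$ is the set of documents cited by $d$:
• Co-citation: $cocit(d_i, d_j) = \dfrac{|P_{d_i} \cap P_{d_j}|}{\max P}$
• Bibliographic coupling: $bibcoup(d_i, d_j) = \dfrac{|C_{d_i} \cap C_{d_j}|}{\max C_d}$
• Amsler: $Amsler(d_i, d_j) = \dfrac{|(P_{d_i} \cup C_{d_i}) \cap (P_{d_j} \cup C_{d_j})|}{\max (P \cup C_d)}$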
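The three citation-based measures can be sketched together, since each is a normalized set overlap. This Python sketch reads $P_d$ as the documents citing $d$ and $C_d$ as the documents cited by $d$. The slides do not define the normalizers ($\max P$, $\max C_d$) precisely, so the sketch assumes they are the largest raw overlap observed over all document pairs; the function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]


def citing_sets(cites: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Invert 'd cites X' into P_d, the set of documents that cite d."""
    parents: Dict[str, Set[str]] = {d: set() for d in cites}
    for citing, cited in cites.items():
        for target in cited:
            parents.setdefault(target, set()).add(citing)
    return parents


def normalized_overlap(sets: Dict[str, Set[str]]) -> Dict[Pair, float]:
    """|S_i ∩ S_j| for every pair, divided by the largest overlap (assumed normalizer)."""
    raw = {(a, b): len(sets[a] & sets[b]) for a, b in combinations(sorted(sets), 2)}
    top = max(raw.values(), default=0)
    return {pair: (count / top if top else 0.0) for pair, count in raw.items()}


def citation_similarities(cites: Dict[str, Set[str]]):
    """Return (co-citation, bibliographic coupling, Amsler) scores for all pairs."""
    parents = citing_sets(cites)                                # P_d
    children = {d: set(cites.get(d, set())) for d in parents}   # C_d
    both = {d: parents[d] | children[d] for d in parents}       # P_d ∪ C_d
    return (normalized_overlap(parents),
            normalized_overlap(children),
            normalized_overlap(both))


# Usage: x and y both cite a and b; z cites only a.
cites = {"x": {"a", "b"}, "y": {"a", "b"}, "z": {"a"}}
cocit, bibcoup, amsler = citation_similarities(cites)
print(cocit[("a", "b")], bibcoup[("x", "z")], amsler[("x", "y")])  # 1.0 0.5 1.0
```

Pairs are keyed in sorted order, so look up `("a", "b")` rather than `("b", "a")`.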

Page 86

Digital Objects: Similarity (highest degree of co-citation)

Title | Publication | Year
A unified lattice model for static analysis of programs by construction or approximation of fixpoints | 4th ACM SIGACT-SIGPLAN | 1977
Active messages: a mechanism for integrated communication and computation | 19th annual int. symposium on Computer architecture | 1992
Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers | 17th annual international symposium on Computer Architecture | 1990
Computer programming as an art | CACM | 1974
The SPLASH-2 programs: characterization and methodological considerations | 22nd annual international symposium on Computer architecture | 1995
ATOM: a system for building customized program analysis tools | ACM SIGPLAN '94 | 1994
Analysis of pointers and structures | Proceedings of the conference on Programming language design and implementation | 1990
Revised report on the algorithmic language scheme | ACM SIGPLAN Notices (Issue) | 1986
The directory-based cache coherence protocol for the DASH multiprocessor | 17th annual international symposium on Computer Architecture | 1990

Page 87

Digital Objects: Similarity (highest degree of bibliographic coupling)

Title | Publication | Date
Query evaluation techniques for large databases | CSUR | 1993
Compiler transformations for high-performance computing | CSUR | 1994
On randomization in sequential and distributed algorithms | CSUR | 1994
External memory algorithms and data structures: dealing with massive data | CSUR | 2001
A schema for interprocedural modification side-effect analysis with pointer aliasing | TOPLAS | 2001
Complexity and expressive power of logic programming | CSUR | 2001
Computational geometry: a retrospective | ACM symposium on Theory of computing | 1994
Research directions in object-oriented database systems | ACM SIGACT-SIGMOD-SIGART symposium |
Cache coherence in large-scale shared-memory multiprocessors: issues and comparisons | CSUR | 1993

Page 88

Digital Objects: Similarity Distributions

[Figures 3(a) and 3(b): similarity distributions]

Page 89

Digital Objects: Similarity Application: Automatic classification with kNN

Evidence | Macro F1 (30%)
Abstract_BagOfWords | 0.195
Abstract_Cosine | 0.343
Abstract_Okapi | 0.339
Bib_Coup | 0.347
Amsler | 0.412
Co-citation | 0.273
Title_BagOfWords | 0.492
Title_Cosine | 0.525
Title_Okapi | 0.525
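The slides do not detail the kNN configuration, so the Python sketch below is a generic kNN classifier that takes any of the similarity measures above as a pluggable function, plus the macro-averaged F1 used to score it. The similarity-weighted vote, k = 3, and the toy length-based similarity are assumptions for illustration only.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

Similarity = Callable[[str, str], float]


def knn_predict(doc: str, train: List[Tuple[str, str]], sim: Similarity, k: int = 3) -> str:
    """Label doc by a similarity-weighted vote of its k most similar training docs."""
    neighbors = sorted(train, key=lambda item: sim(doc, item[0]), reverse=True)[:k]
    votes: Dict[str, float] = defaultdict(float)
    for other, label in neighbors:
        votes[label] += sim(doc, other)
    return max(votes, key=votes.get)


def macro_f1(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 scores."""
    labels = set(gold) | set(pred)
    scores = []
    for c in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == c and p == c)
        fp = sum(1 for g, p in zip(gold, pred) if g != c and p == c)
        fn = sum(1 for g, p in zip(gold, pred) if g == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * precision * recall / (precision + recall) if precision + recall else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


def length_sim(a: str, b: str) -> float:
    """Toy similarity for illustration only; swap in cosine, Amsler, etc."""
    return 1.0 / (1 + abs(len(a) - len(b)))


train = [("aa", "short"), ("ab", "short"), ("abcdefg", "long"), ("abcdefgh", "long")]
predicted = [knn_predict(d, train, length_sim) for d in ["ac", "abcdefgx"]]
print(predicted, macro_f1(["short", "long"], predicted))  # ['short', 'long'] 1.0
```

Because the similarity is a parameter, the same classifier can be rerun once per evidence type (title cosine, Amsler, and so on) to produce a comparison like the table above.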

Page 90

Digital Objects: Timeliness

Timeliness is measured by the age of a digital object $do_i$:

$$age(do_i) = \begin{cases} t_{now} - t_{\text{latest citation}}, & \text{if } do_i \text{ has ever been cited},\\ t_{now} - t_{\text{creation/publication}}, & \text{if } do_i \text{ has never been cited}, \end{cases}$$

where $t_{now}$ is the current time or the time of last freshening, and the time of last freshening is the time of creation/publication of the most recent object in the collection to which $do_i$ belongs.
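A small Python sketch of the age computation follows. Measuring age in years (days divided by 365.25) and using the time of last freshening rather than the current wall-clock time are assumptions of this sketch; the definition allows either reference time, and the function names are hypothetical.

```python
from datetime import date
from typing import List

DAYS_PER_YEAR = 365.25  # assumed unit conversion; the slides do not state a unit


def last_freshening(collection_dates: List[date]) -> date:
    """Creation/publication date of the most recent object in the collection."""
    return max(collection_dates)


def age_in_years(created: date, citation_dates: List[date], reference: date) -> float:
    """Reference time minus the latest citation, or minus the creation/publication
    date if the object was never cited."""
    anchor = max(citation_dates) if citation_dates else created
    return (reference - anchor).days / DAYS_PER_YEAR


collection = [date(2003, 5, 1), date(2004, 1, 1)]
ref = last_freshening(collection)
print(age_in_years(date(1990, 1, 1), [date(1995, 1, 1), date(2000, 1, 1)], ref))  # 4.0
print(age_in_years(date(2000, 1, 1), [], ref))  # 4.0
```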

Page 91

Digital Objects: Timeliness, ACM Digital Library

[Chart: number of documents (y-axis) by timeliness value (x-axis)]

Timeliness | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
No. of Documents | 5165 | 7264 | 5162 | 4209 | 2716 | 2120 | 1698 | 1554 | 1372 | 1357 | 1019

Page 92

Metadata Specifications and Metadata Format: Completeness

Completeness refers to the degree to which values are present in a description, according to a metadata standard. As far as an individual property is concerned, only two situations are possible: either a value is assigned to the property in question, or it is not.

Metric:

$$Completeness(ms_x) = 1 - \frac{\text{no. of missing attributes in } ms_x}{\text{total attributes of the schema to which } ms_x \text{ conforms}}$$
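The completeness metric maps directly onto a record-versus-schema check. In this minimal Python sketch, the four-attribute schema is hypothetical (it is not ETD-MS), and treating absent, `None`, or blank values as missing is an assumption.

```python
from typing import Dict, List, Optional


def is_missing(value: Optional[str]) -> bool:
    """Treat absent, None, or blank values as missing (an assumption of this sketch)."""
    return value is None or not value.strip()


def metadata_completeness(record: Dict[str, Optional[str]], schema: List[str]) -> float:
    """Completeness(ms_x) = 1 - missing attributes / total attributes of the schema."""
    if not schema:
        raise ValueError("schema must list at least one attribute")
    missing = sum(1 for attr in schema if is_missing(record.get(attr)))
    return 1 - missing / len(schema)


# Hypothetical four-attribute schema, for illustration only.
schema = ["title", "creator", "subject", "date"]
record = {"title": "Quality in Digital Libraries", "creator": "Doe, J.", "subject": " "}
print(metadata_completeness(record, schema))  # 0.5 (subject blank, date absent)
```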

Page 93

Metadata Specifications and Metadata Format: Completeness, OCLC NDLTD Union Catalog

[Chart: metadata completeness (y-axis, 0 to 1) per contributor (x-axis): GWUD, LSU, VTETD, MIT, UBC, PHYSNET, VTINDIV, VANDERBILT, NCSU, USASK, PITT, HKU, HUMBOLT, OCLC, BGMYU, DRESDEN, VIENNA, GATECH, ETSU, USF, MUENCHEN, UTENN, CCSD, WATERLOO, NSYSU, LAVAL, UPSALLA, CALTECH, UCL, WagUniv]

Page 94

Metadata Specifications and Metadata Format: Conformance

An attribute $att_{xy}$ of a metadata specification $ms_x$ is conformant to a metadata format/standard if:
1) its value is from the domain defined for $att_{xy}$;
2) it appears at least once, if $att_{xy}$ is marked as mandatory; and
3) it does not appear more than once, if it is not marked as repeatable.

Metric:

$$Conformance(ms_x) = \frac{\sum_{att_{xy} \in ms_x} \text{degree of conformance of } att_{xy}}{\text{total attributes}}$$
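The slides do not say how the "degree of conformance" of a single attribute is computed. This Python sketch assumes it is the fraction of the three conditions above that the attribute satisfies, and sums over every attribute defined by the format, so that a missing mandatory attribute is penalized. The `AttributeRule` type and the three-attribute schema are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class AttributeRule:
    """Hypothetical description of one attribute in a metadata format."""
    mandatory: bool
    repeatable: bool
    in_domain: Callable[[str], bool]


def attribute_degree(values: List[str], rule: AttributeRule) -> float:
    """Fraction of the three conformance conditions met (assumed 'degree')."""
    checks = [
        all(rule.in_domain(v) for v in values),   # 1) values come from the domain
        (not rule.mandatory) or len(values) >= 1,  # 2) mandatory => appears at least once
        rule.repeatable or len(values) <= 1,       # 3) non-repeatable => at most once
    ]
    return sum(checks) / len(checks)


def conformance(record: Dict[str, List[str]], schema: Dict[str, AttributeRule]) -> float:
    """Mean degree of conformance over the attributes defined by the format."""
    if not schema:
        raise ValueError("schema must define at least one attribute")
    total = sum(attribute_degree(record.get(name, []), rule) for name, rule in schema.items())
    return total / len(schema)


def is_iso_date(v: str) -> bool:
    """YYYY-MM-DD shape check (illustrative domain rule)."""
    return len(v) == 10 and v[4] == "-" and v[7] == "-" and v.replace("-", "").isdigit()


schema = {
    "title": AttributeRule(mandatory=True, repeatable=False, in_domain=lambda v: bool(v.strip())),
    "date": AttributeRule(mandatory=True, repeatable=False, in_domain=is_iso_date),
    "subject": AttributeRule(mandatory=False, repeatable=True, in_domain=lambda v: True),
}
record = {"title": ["A title", "A second title"], "date": ["2004-09-28"]}
print(conformance(record, schema))  # ≈ 0.889: title 2/3 (repeated), date 1, subject 1
```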

Page 95

Metadata Specifications and Metadata Format: Conformance Based on ETD-MS

[Chart: conformance to ETD-MS (y-axis, 0.75 to 1) per contributor (x-axis): GWUD, LSU, VTETD, MIT, UBC, PHYSNET, VTINDIV, VANDERBILT, NCSU, USASK, PITT, HKU, HUMBOLT, OCLC, BGMYU, DRESDEN, VIENNA, GATECH, ETSU, USF, MUENCHEN, UTENN, CCSD, WATERLOO, NSYSU, LAVAL, UPSALLA, CALTECH, UCL, WagUniv]

Page 96

Collection, Metadata Catalog, and Repository: Collection Completeness

A complete DL collection is one that contains all the pertinent existing digital objects.

Metric:

$$completeness(C_x) = \frac{|C_x|}{|\text{ideal collection}|}$$

Page 97

Collection, Metadata Catalog, and Repository: Collection Completeness

Collection | Degree of Completeness
ACM Guide | 1
DBLP | 0.652
CITIDEL (DBLP + ACM + NCSTRL + NDLTD-CS) | 0.467
IEEE-DL | 0.168
ACM-DL | 0.146

ACM Guide contents | Count
Journal (articles) | 256527
Proceeding (papers) | 299850
Book (chapters) | 107870
Thesis | 46098
Tech. Reports | 25081
Bibliography | 2
Play | 1
Total | 735429

Page 98

Catalog Completeness/Consistency

$$Completeness(DMC) = 1 - \frac{\text{no. of digital objects without a metadata specification}}{\text{size of the described collection}}$$

$$Consistency(DMC) = \begin{cases} 0, & \text{if at least one set of metadata specifications is assigned to more than one digital object},\\ 1, & \text{otherwise.} \end{cases}$$
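Both catalog measures can be sketched in a few lines of Python. The sketch represents a metadata catalog as a hypothetical mapping from a metadata-specification-set id to the set of digital-object ids it is assigned to; that representation is an assumption, not the 5S formalization itself.

```python
from typing import Dict, Set


def catalog_completeness(collection: Set[str], catalog: Dict[str, Set[str]]) -> float:
    """1 - (digital objects without a metadata specification / collection size)."""
    if not collection:
        raise ValueError("the described collection is empty")
    described = set().union(*catalog.values())
    return 1 - len(collection - described) / len(collection)


def catalog_consistency(catalog: Dict[str, Set[str]]) -> int:
    """0 if any metadata-specification set is assigned to more than one object, else 1."""
    return 0 if any(len(objects) > 1 for objects in catalog.values()) else 1


collection = {"d1", "d2", "d3", "d4"}
catalog = {"m1": {"d1"}, "m2": {"d2"}, "m3": {"d3"}}
print(catalog_completeness(collection, catalog), catalog_consistency(catalog))  # 0.75 1
catalog["m4"] = {"d1", "d4"}  # one specification set now describes two objects
print(catalog_completeness(collection, catalog), catalog_consistency(catalog))  # 1.0 0
```

The second call shows that the two measures are independent: covering the last object raises completeness, but sharing one specification set between two objects makes the catalog inconsistent.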

Page 99

Repository Completeness and Consistency

$$Completeness(Rep) = \frac{\text{number of collections in the repository}}{\text{ideal number of collections}}$$

$$Consistency(Rep) = \begin{cases} 1, & \text{if the consistency of every catalog in the repository with respect to its described collection is } 1,\\ 0, & \text{otherwise.} \end{cases}$$

Page 100

Services: Efficiency/Effectiveness

Effectiveness
• Very common measures: precision, recall, F1, 10-precision, R-precision.
• Other services may have different measures (e.g., recommending).

Efficiency
Let $t(e)$ be the time of an event $e$, and let $e_{i_x}$ and $e_{f_x}$ be the first and the last event of service $se_x$. The efficiency of service $se_x$ is defined as:

$$Efficiency(se_x) = t(e_{f_x}) - t(e_{i_x})$$
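The effectiveness measures listed above are standard IR measures. The Python sketch below shows set-based precision, recall, and F1, 10-precision (precision at rank 10), and R-precision, alongside the efficiency definition applied to a list of `(timestamp, event)` log entries. The log representation and the function names are assumptions for illustration.

```python
from typing import List, Sequence, Set, Tuple


def precision_recall_f1(retrieved: Set[str], relevant: Set[str]) -> Tuple[float, float, float]:
    """Set-based precision, recall, and their harmonic mean F1."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def precision_at_k(ranking: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Precision over the top k results (k = 10 gives 10-precision)."""
    if k <= 0:
        return 0.0
    return sum(1 for doc in ranking[:k] if doc in relevant) / k


def r_precision(ranking: Sequence[str], relevant: Set[str]) -> float:
    """Precision at rank R, where R is the number of relevant documents."""
    return precision_at_k(ranking, relevant, len(relevant))


def efficiency(events: List[Tuple[float, str]]) -> float:
    """t(e_f) - t(e_i): elapsed time between the first and last logged event."""
    if not events:
        raise ValueError("no events logged for this service")
    times = sorted(t for t, _ in events)
    return times[-1] - times[0]


ranking = ["d1", "d2", "d3", "d4"]
relevant = {"d1", "d3", "d5"}
print(precision_recall_f1(set(ranking), relevant))
print(precision_at_k(ranking, relevant), r_precision(ranking, relevant))
print(efficiency([(10.0, "query submitted"), (12.5, "search run"), (13.25, "results shown")]))  # 3.25
```

Under this definition a smaller value means a faster, and hence more efficient, service run.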

Page 101

Services: Extensibility and Reusability

A service Y reuses a service X if the behavior of Y incorporates the behavior of X. A service Y extends a service X if it subsumes the behavior of X and potentially includes additional subflows of events.

Metrics:

$$\text{Macro-Reusability}(Serv) = \frac{\sum_{se_i \in Serv} reused(se_i)}{|Serv|},$$

where $reused$ is an indicator function defined as $reused(se_i) = 1$ if there exist $sm_j$ and $se_j$ such that $se_j$ reuses $se_i$, and 0 otherwise.

$$\text{Micro-Reusability}(Serv) = \frac{\sum_{sm_x \in SM,\ se_i \in Serv,\ sm_x \text{ runs } se_i} LOC(sm_x) \cdot reused(se_i)}{\sum_{sm \in SM} LOC(sm)},$$

where $LOC$ is the number of lines of code of a service manager.

Page 102

Services: Extensibility and Reusability

Service | Component-based | LOC for implementing service | LOC reused from component | Total LOC
Searching (back-end) | Yes | - | 1650 | 1650
Search Wrapping | No | 100 | - | 100
Recommending | Yes | - | 700 | 700
Recommend Wrapping | No | 200 | - | 200
Annotating (back-end) | Yes | 50 | 600 | 600
Annotate Wrapping | No | 50 | - | 50
Union Catalog | Yes | - | 680 | 680
User Interface Service | No | 1800 | - | 1600
Browsing | No | 1390 | - | 1390
Comparing (objects) | No | 650 | - | 650
Marking Items | No | 550 | - | 550
Items of Interest | No | 480 | - | 480
Recent Searches/Discussions | No | 230 | - | 230
Collections Description | No | 250 | - | 250
User Management | No | 600 | - | 600
Framework Code | No | 2000 | - | 2000
Total | | 8280 | 3630 | 11910

Macro-Reusability = 3/16 = 0.1875
Micro-Reusability = 3630/11910 ≈ 0.305
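A minimal Python sketch of the two reusability metrics follows. It assumes a flat inventory in which each service is run by one service manager whose LOC is known and carries a precomputed `reused(se_i)` flag; the `ServiceEntry` type and the three-service inventory are hypothetical and are not the CITIDEL/ETANA figures above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ServiceEntry:
    """One service and the LOC of the service manager that runs it (hypothetical)."""
    name: str
    reused: bool  # reused(se_i), assumed to be known in advance
    loc: int      # LOC(sm_x) of the service manager running this service


def macro_reusability(services: List[ServiceEntry]) -> float:
    """Sum of reused(se_i) over all services, divided by |Serv|."""
    if not services:
        raise ValueError("no services given")
    return sum(s.reused for s in services) / len(services)


def micro_reusability(services: List[ServiceEntry]) -> float:
    """LOC of service managers running reused services, over total LOC."""
    total = sum(s.loc for s in services)
    if total == 0:
        raise ValueError("total LOC is zero")
    return sum(s.loc for s in services if s.reused) / total


# Toy inventory, for illustration only.
services = [
    ServiceEntry("searching back-end", reused=True, loc=1650),
    ServiceEntry("search wrapping", reused=False, loc=100),
    ServiceEntry("browsing", reused=False, loc=250),
]
print(macro_reusability(services), micro_reusability(services))
```

The macro measure counts services, while the micro measure weights each one by code size, so a few large reused components can yield a high micro value even when few services are reused.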

Page 103

Services: Reliability

Definition:

$$Reliability = 1 - \frac{\text{no. of failures}}{\text{no. of accesses}}$$

A failure is an event that:
• was supposed to happen in a scenario but did not;
• did happen, but did not execute some of its operations; or
• did happen and its operations were executed, but the results were not the expected ones.
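The reliability definition reduces to counting failed accesses. In this Python sketch, classifying each logged access into one of the three failure cases is an assumption: in practice, detecting the first case (an expected event that never happened) requires comparing the log against the scenario. The `Outcome` enum is hypothetical.

```python
from enum import Enum
from typing import List


class Outcome(Enum):
    """Outcome of one access, following the three failure cases above."""
    OK = "ok"
    DID_NOT_HAPPEN = "did_not_happen"         # expected event never occurred
    INCOMPLETE = "incomplete"                 # event occurred but skipped operations
    UNEXPECTED_RESULT = "unexpected_result"   # operations ran, results not as expected


def reliability_from_counts(failures: int, accesses: int) -> float:
    """1 - failures / accesses."""
    if accesses <= 0:
        raise ValueError("need at least one access")
    return 1 - failures / accesses


def reliability(outcomes: List[Outcome]) -> float:
    """Any outcome other than OK counts as a failure."""
    failures = sum(1 for o in outcomes if o is not Outcome.OK)
    return reliability_from_counts(failures, len(outcomes))


print(reliability([Outcome.OK, Outcome.OK, Outcome.OK, Outcome.INCOMPLETE]))  # 0.75
print(round(reliability_from_counts(4130, 153369), 3))  # 0.973 (CITIDEL browsing counts)
```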

Page 104

Services: Reliability, CITIDEL

CITIDEL service | No. of failures/no. of accesses | Reliability
searching | 73/14370 | 0.994
browsing | 4130/153369 | 0.973
requesting (getobject) | 1569/318036 | 0.995
structured search | 214/752 | 0.66
contributing | 0/980 | 1

Page 105

Quality in DLs: Conclusions and Future Work

• Development of more “usage”-oriented measures; current measures are very much “system-oriented”.
• Development of a Quality Toolkit (5SQual) for DL managers, with the following features:
  • a mapping tool to map a local log format to the standard XML log format;
  • components to implement all measures;
  • visualization;
  • division into several logical pieces to be used in the different phases of the information life cycle.


Page 107

ETANA-DL

• Apply 5S to the archaeological domain
• Identified requirements for future versions of the system
• Extensible and componentized approach for handling heterogeneous archaeological data from disparate sources
• Rapidly generated prototype archaeological DL
• Making primary archaeological data available without significant delay

Page 108

Modeling ETANA-DL: An Archaeological DL Meta-model

[Figure: archaeological DL meta-model organized by Stream model, Structure model, Space model, Scenario model, and Society model. Labels shown: Text, Video, Audio, Drawing, Photo, 3D; *Site, *Partition, *Sub-partition, *Locus, *Container, *Artifact, Region; Metadata; Taxonomies (Temporal, Artifact-specific); Geographic space, Metric space, User interface, Spatial; Archaeologist, General public; Service Manager; Services (Information Satisfaction, Value added, Repository building, Domain specific).]

Page 109

Modeling ETANA-DL: The ETANA-DL model

[Figure: the ETANA-DL model, instantiating the Stream, Structure, Space, Scenario, and Society models. Labels shown: site structures for Umayri (Jordan: *Field, *Locus, *Pail, *Bone), Nimrin (Jordan Valley: *Square, *Quadrant, *Locus, *Bag), and Halif (Southern Israel: *Field, *Area, *Locus, *Basket, *Seed), plus *Square and *Figurine; Taxonomies (Archaeological periods, Bone type, Seed species); Field record, locus sheet; Figurine image (photo), Site/field plan (drawing), Preliminary/Final Report (application/pdf); Site-specific coordinate system, Web interface, Vector space, Spatial; Archaeologist, Generic public; ETANA-DL Service Manager; Services (Searching, Browsing; Annotation, binding; Harvesting, Converting; Object comparison, marking item for analysis).]

Page 110

Conclusions
• Vision
• Integration
• 5S
• DL Curricula, Courses, Books

Questions?