Object-Relational Database Applications -- The UC Berkeley Environmental Digital Library

11/21/2000 Database Management -- Spring 1998 -- R. Larson Object-Relational Database Applications -- The UC Berkeley Environmental Digital Library University of California, Berkeley School of Information Management and Systems SIMS 257: Database Management


Page 1: Object-Relational Database Applications -- The UC Berkeley Environmental Digital Library

11/21/2000 Database Management -- Spring 1998 -- R. Larson

Object-Relational Database Applications -- The UC Berkeley Environmental Digital Library

University of California, Berkeley
School of Information Management and Systems

SIMS 257: Database Management

Page 2: Object-Relational Database Applications -- The UC Berkeley Environmental Digital Library

Today

• Object Relational Database Applications
  – The Berkeley Digital Library Project
• Slides from RRL and Robert Wilensky, EECS
  – Use of DBMS in DL project.

Page 3: Object-Relational Database Applications -- The UC Berkeley Environmental Digital Library

Final Presentations and Reports

• Specifications for final report are on the Web Site under assignments

• Presentations (one on Nov. 28; others on Nov. 30, Dec. 5, and Dec. 7 (Full))

Page 4: Object-Relational Database Applications -- The UC Berkeley Environmental Digital Library

Today

• Object Relational Applications

• The UCB Digital Library

Page 5: Object-Relational Database Applications -- The UC Berkeley Environmental Digital Library

Overview

• What is a Digital Library?

• Overview of Ongoing Research on Information Access in Digital Libraries

Page 6: Object-Relational Database Applications -- The UC Berkeley Environmental Digital Library

Digital Libraries Are Like Traditional Libraries...

• Involve large repositories of information (storage, preservation, and access)

• Provide information organization and retrieval facilities (categorization, indexing)

• Provide access for communities of users (communities may be as large as the general public or as small as the employees of a particular organization)

Page 7: Object-Relational Database Applications -- The UC Berkeley Environmental Digital Library

Traditional Library System

[Diagram labels: Originators, Libraries, Users]

Page 8: Object-Relational Database Applications -- The UC Berkeley Environmental Digital Library

But Digital Libraries Are Different From Libraries...

• Not a physical location with local copies; objects held closer to originators

• Decoupling of storage, organization, access

• Enhanced Authoring (origination, annotation, support for work groups)

• Subscription, pay-per-view supported in addition to “free” browsing.

• Integration into user tasks.

Page 9: Object-Relational Database Applications -- The UC Berkeley Environmental Digital Library

A Digital Library Infrastructure Model

[Diagram labels: Originators, Repositories, Index Services, Network, Users]

Page 10: Object-Relational Database Applications -- The UC Berkeley Environmental Digital Library

UC Berkeley Digital Library Project

• Focus: Work-centered digital information services

• Testbed: Digital Library for the California Environment

• Research: Technical agenda supporting user-oriented access to large distributed collections of diverse data types.

• Part of the NSF/NASA/DARPA Digital Library Initiative (Phases 1 and 2)

Page 11: Object-Relational Database Applications -- The UC Berkeley Environmental Digital Library

UCB Digital Library Project: Research Organizations

• UC Berkeley EECS, SIMS, CED, IS&T
• UCOP
• Xerox PARC’s Document Image Decoding group and Work Practices group
• Hewlett-Packard
• NEC
• SUN Microsystems
• IBM Almaden
• Microsoft
• Ricoh California Research
• Philips Research

Page 12: Object-Relational Database Applications -- The UC Berkeley Environmental Digital Library

Testbed: An Environmental Digital Library

• Collection: Diverse material relevant to California’s key habitats.
• Users: A consortium of state agencies, development corporations, private corporations, regional government alliances, educational institutions, and libraries.
• Potential: Impact on state-wide environmental system (CERES)

Page 13: Object-Relational Database Applications -- The UC Berkeley Environmental Digital Library

The Environmental Library - Users/Contributors

• California Resources Agency, California Environmental Resources Evaluation System (CERES)
• California Department of Water Resources
• The California Department of Fish & Game
• SANDAG
• UC Water Resources Center Archives
• New Partners: CDL and SDSC

Page 14: Object-Relational Database Applications -- The UC Berkeley Environmental Digital Library

The Environmental Library - Contents

• Environmental technical reports, bulletins, etc.
• County general plans
• Aerial and ground photography
• USGS topographic maps
• Land use and other special purpose maps
• Sensor data
• “Derived” information
• Collection data bases for the classification and distribution of the California biota (e.g., SMASCH)
• Supporting 3-D, economic, traffic, etc. models
• Videos collected by the California Resources Agency

Page 15: Object-Relational Database Applications -- The UC Berkeley Environmental Digital Library

The Environmental Library - Contents

• As of late 2000, the collection represents about one terabyte of data, including over 165,000 digital images, about 300,000 pages of environmental documents, and nearly 2 million records in geographical and botanical databases.

Page 16: Object-Relational Database Applications -- The UC Berkeley Environmental Digital Library

Botanical Data:

The CalFlora Database contains taxonomical and distribution information for more than 8000 native California plants. The Occurrence Database includes over 600,000 records of California plant sightings from many federal, state, and private sources. The botanical databases are linked to our CalPhotos collection of California plants, and are also linked to external collections of data, maps, and photos.

Page 17: Object-Relational Database Applications -- The UC Berkeley Environmental Digital Library

Geographical Data:

Much of the geographical data in our collection is being used to develop our web-based GIS Viewer. The Street Finder uses 500,000 TIGER records of S.F. Bay Area streets along with 70,000 records from the USGS GNIS database. California Dams is a database of information about the 1395 dams under state jurisdiction. An additional 11 GB of geographical data represents maps and imagery that have been processed for inclusion as layers in our GIS Viewer. This includes Digital Ortho Quads and DRG maps for the S.F. Bay Area.

Page 18: Object-Relational Database Applications -- The UC Berkeley Environmental Digital Library

Documents:

Most of the 300,000 pages of digital documents are environmental reports and plans that were provided by California state agencies. This collection includes documents, maps, articles, and reports on the California environment including Environmental Impact Reports (EIRs), educational pamphlets, water usage bulletins, and county plans. Documents in this collection come from the California Department of Water Resources (DWR), California Department of Fish and Game (DFG), San Diego Association of Governments (SANDAG), and many other agencies. Among the most frequently accessed documents are County General Plans for every California county and a survey of 125 Sacramento Delta fish species.

Page 19: Object-Relational Database Applications -- The UC Berkeley Environmental Digital Library

Documents - cont.

The collection also includes about 20 MB of full-text (HTML) documents from the World Conservation Digital Library. In addition to providing online access to important environmental documents, the document collection is the testbed for our Multivalent Document research.

Page 20: Object-Relational Database Applications -- The UC Berkeley Environmental Digital Library

Testbed Success Stories

• LUPIN: CERES’ Land Use Planning Information Network
  – California County General Plans and other environmental documents.
  – Enter at Resources Agency Server, documents stored at and retrieved from UCB DLIB server.
• California flood relief efforts
  – High demand for some data sets only available on our server (created by document recognition).
• CalFlora: Creation and interoperation of repositories pertaining to plant biology.
• Cloning of services at Cal State Library, FBI

Page 21: Object-Relational Database Applications -- The UC Berkeley Environmental Digital Library

Research Highlights

• Documents
  – Multivalent Document prototype
    • Page images, structured documents, GIS data, photographs
• Intelligent Access to Content
  – Document recognition
  – Vision-based Image Retrieval: stuff, thing, scene retrieval
  – Natural Language Processing: categorizing the web, Cheshire II, TileBar Interfaces

Page 22: Object-Relational Database Applications -- The UC Berkeley Environmental Digital Library

Multivalent Documents

• MVD Model
  – radically distributed, open, extensible
  – “behaviors” and “layers”
    • behaviors conform to a protocol suite
    • inter-operation via “IDEG”
• Applied to “enlivening legacy documents”
  – various nice behaviors, e.g., lenses

Page 23: Object-Relational Database Applications -- The UC Berkeley Environmental Digital Library

Document Presentation

• Problem: Digital libraries must deliver digital documents -- but in what form?
• Different forms have advantages for particular purposes
  – Retrieval
  – Reuse
  – Content Analysis
  – Storage and archiving
• Combining forms (Multivalent documents)

Page 24: Object-Relational Database Applications -- The UC Berkeley Environmental Digital Library

Spectrum of Digital Document Representations

Adapted from Fox, E.A., et al. “Users, User Interfaces and Objects: Envision, an Electronic Library”, JASIS 44(8), 1993

Page 25: Object-Relational Database Applications -- The UC Berkeley Environmental Digital Library

Document Representation: Multivalent Documents

• Primary user interface/document model for UCB Digital Library (Wilensky & Phelps)

• Goal: An approach to new document representations and their authoring.

• Supports active, distributed, composable transformations of multimedia documents.

• Enables sophisticated annotations, intelligent result handling, user-modifiable interface, composite documents.

Page 26: Object-Relational Database Applications -- The UC Berkeley Environmental Digital Library

Multivalent Documents

[Figure labels: Cheshire Layer, OCR Layer, OCR Mapping Layer, GIS Layer, Table Layer, Scanned Page Image, Network Protocols & Resources; sample page titled “History of The Classical World”]

Valence: 2: The relative capacity to unite, react, or interact (as with antigens or a biological substrate).
Webster’s 7th Collegiate Dictionary

Page 27: Object-Relational Database Applications -- The UC Berkeley Environmental Digital Library

Page 28: Object-Relational Database Applications -- The UC Berkeley Environmental Digital Library

Page 29: Object-Relational Database Applications -- The UC Berkeley Environmental Digital Library

MVD Third Party Work

• Japanese support by NEC; application to office document management

• Printing, support for other OCR formats, by HP

• Chinese character and multilingual lens by UCB Instructional Support staff (Owen McGrath)

• Automatic enlivening of documents via Transcend proxy.

Page 30: Object-Relational Database Applications -- The UC Berkeley Environmental Digital Library

MVD Forthcoming

• Support for XML + style sheets
• More robust parsing
• Saving where you want
• Media adaptors for
  – Continuous media
  – Near image formats, word proc. formats
• Improve authoring tools
• Interoperation with paper
• Application versus applet?
• Release to community, get feedback, iterate.

Page 31: Object-Relational Database Applications -- The UC Berkeley Environmental Digital Library

GIS in the MVD Framework

• Layers are georeferenced data sets.
• Behaviors are
  – display semi-transparently
  – pan
  – zoom
  – issue query
  – display context
  – “spatial hyperlinks”
  – annotations
• Written in Java (to be merged with MVD-1 code line?)

Page 32: Object-Relational Database Applications -- The UC Berkeley Environmental Digital Library

GIS Viewer: Recent Developments

• Annotation and saving
  – points, rectangles (w. labels and links), vectors
  – saving of annotations as separate layer
• Integration with address, street finding, gazetteer services
• Application to image viewing: tilePix
• Castanet client

Page 33: Object-Relational Database Applications -- The UC Berkeley Environmental Digital Library

Page 34: Object-Relational Database Applications -- The UC Berkeley Environmental Digital Library

Page 35: Object-Relational Database Applications -- The UC Berkeley Environmental Digital Library

Page 36: Object-Relational Database Applications -- The UC Berkeley Environmental Digital Library

GIS Viewer Example http://elib.cs.berkeley.edu/annotations/gis/buildings.html

Page 37: Object-Relational Database Applications -- The UC Berkeley Environmental Digital Library

Geographic Information: Plans and Ideas

• More annotations, flexible saving
• Support for large vector data sets
• Interoperability
  – On-the-fly
    • conversion of formats
    • generation of “catalogs”
  – Via OGDI/GLTP
  – Experimenting with various CERES servers

Page 38: Object-Relational Database Applications -- The UC Berkeley Environmental Digital Library

Documents: Information from scanned documents

• Built document recognizers for some important documents, e.g. “Bulletin 17”, “TR-9”.
• Recognized document structure, with order-of-magnitude better OCR.
• Automatically generated a 1395-item relational database of dams.
• Enabled access via forms, map interfaces.
• Enable interoperation with image DB.

Page 39: Object-Relational Database Applications -- The UC Berkeley Environmental Digital Library
Page 40: Object-Relational Database Applications -- The UC Berkeley Environmental Digital Library
Page 41: Object-Relational Database Applications -- The UC Berkeley Environmental Digital Library
Page 42: Object-Relational Database Applications -- The UC Berkeley Environmental Digital Library

Document Recognition: Future Plans

• Document recognizers: for about a dozen document types
• Development and integration of mathematical OCR and recognition.
• Eventually produce a document recognizer generator, i.e., make it easier to write recognizers.

Page 43: Object-Relational Database Applications -- The UC Berkeley Environmental Digital Library

Vision-Based Image Retrieval

• Stuff-based queries: “blobs”
  – Basic blobs: colors, sizes, variable number
    • demonstrated utility for interesting queries
  – “Blob world”: Above plus texture, applied to
    • retrieving similar images
    • successful learning scene classifier
• Thing-finding: Successfully deployed detectors adding body plans (adding shape, geometry and kinematic constraints)

Find objects by grouping coherent low-level properties

Page 44: Object-Relational Database Applications -- The UC Berkeley Environmental Digital Library

Image Retrieval Research

• Finding “Stuff” vs “Things”

• BlobWorld

• Other Vision Research

Page 45: Object-Relational Database Applications -- The UC Berkeley Environmental Digital Library

(Old “stuff”-based image retrieval: Query)

Page 46: Object-Relational Database Applications -- The UC Berkeley Environmental Digital Library

(Old “stuff”-based image retrieval: Result)

Page 47: Object-Relational Database Applications -- The UC Berkeley Environmental Digital Library

Blobworld: use regions for retrieval

• We want to find general objects
• Represent images based on coherent regions

Page 48: Object-Relational Database Applications -- The UC Berkeley Environmental Digital Library
Page 49: Object-Relational Database Applications -- The UC Berkeley Environmental Digital Library
Page 50: Object-Relational Database Applications -- The UC Berkeley Environmental Digital Library

(“Thing”-based image retrieval using “body plans”: Result)

Page 51: Object-Relational Database Applications -- The UC Berkeley Environmental Digital Library

Natural Language Processing: Automatic Topic Assignment

• Developed automatic categorization/disambiguation method to the point where topic assignment (but not disambiguation) appears feasible.
• Ran controlled experiment:
  – Took Yahoo as ground truth.
  – Chose 9 overlapping categories; took 1000 web pages from Yahoo as input.
  – Result: 84% precision; 48% recall (using top 5 of 1073 categories)
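
The precision and recall figures reported on this slide follow the usual definitions. As a minimal sketch (the function name and the toy data are illustrative assumptions, not the project's evaluation code), category assignments can be scored against ground-truth labels like this:

```python
def precision_recall(assigned, truth):
    """Micro-averaged precision and recall over all pages.

    assigned: dict page -> set of categories proposed by the classifier
    truth:    dict page -> set of ground-truth categories
    """
    # Count proposed categories that also appear in the ground truth.
    true_pos = sum(len(cats & truth.get(page, set()))
                   for page, cats in assigned.items())
    n_assigned = sum(len(cats) for cats in assigned.values())
    n_truth = sum(len(cats) for cats in truth.values())
    precision = true_pos / n_assigned if n_assigned else 0.0
    recall = true_pos / n_truth if n_truth else 0.0
    return precision, recall


# Toy data, invented for illustration only.
assigned = {"page1": {"music", "film"}, "page2": {"environment"}}
truth = {"page1": {"music"}, "page2": {"environment", "water"}}
precision, recall = precision_recall(assigned, truth)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Keeping only the top few categories per page, as in the experiment above, tends to favor precision over recall, which is consistent with the reported 84% / 48% split.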

Page 52: Object-Relational Database Applications -- The UC Berkeley Environmental Digital Library

(Isaac’s Automatically Generated Ontology)
IAGO (0.1)! = Yahoo - labor + NLP

• We categorized (part of) the Web:
  – 1073 categories; 8000 web pages
  – ~80% precision for good categories
    • E.g., “motion pictures”, “the environment”, “music”
• IAGO 1.0 in the works:
  – Eliminate pages with little text.
  – Eliminate proper nouns.
  – Retrained with MS Encarta - Improved performance dramatically (perhaps enough to disambiguate the web)!
  – Need to compute word sense priors using the web.
  – [Recode implementation to keep up with web crawler.]

Page 53: Object-Relational Database Applications -- The UC Berkeley Environmental Digital Library

Cheshire II: Cross-Domain Resource Discovery: Integrated Discovery and Use of Textual, Numeric and Spatial Data

Ray R. Larson, PI
Kirby Zhang – Yonghui Zhang
School of Information Management & Systems
University of California, [email protected]

Paul Watry, Co-PI
Robert Sanderson
University of Liverpool
Archives and Special Collections
[email protected]

Page 54: Object-Relational Database Applications -- The UC Berkeley Environmental Digital Library

Overview

• Goals are
  – Practical application of existing DL technologies to some large-scale cross-domain collections
  – Theoretical examination and evaluation of next-generation designs for systems architecture and distributed cross-domain searching for DLs

Page 55: Object-Relational Database Applications -- The UC Berkeley Environmental Digital Library

Current Usage of Cheshire II

• Web clients for:
  – Berkeley NSF/NASA/ARPA Digital Library
  – World Conservation Digital Library
  – SunSite (UC Berkeley Science Libraries)
  – University of Liverpool
  – De Montfort University (MASTER)
  – Higher Education Archives Hub
    • Glasgow, Edinburgh, Bath, Liverpool, King’s College London, University College London, Nottingham, Durham, School of Oriental and African Studies, Manchester, Southampton, Warwick and others (to be expanded)
  – University of Essex, HDS (part of AHDS)
  – Oxford Text Archive (test only)
  – California Sheet Music Project
  – Cha-Cha (Berkeley Intranet Search Engine)
  – Berkeley Metadata project cross-language demo
  – Univ. of Virginia (test implementations)
  – Use in NESSTAR (NEtworked Social Science Tools and Resources)
  – Cheshire ranking algorithm is basis for Inktomi

Page 56: Object-Relational Database Applications -- The UC Berkeley Environmental Digital Library

The Participants

• NSF/JISC International Digital Library Grant
  Berkeley working with
  – University of Liverpool/Manchester Computing
  – De Montfort University (MASTER)
  – Arts and Humanities Data Service (http://ahds.ac.uk/)
    • OTA (Oxford), HDS (Essex), PADS (Glasgow), ADS (York), VADS (Surrey & Northumbria)
  – Consortium of University Research Libraries (CURL)
  – UC Berkeley Library
    • Making of America II
    • Online Archive of California

Page 57: Object-Relational Database Applications -- The UC Berkeley Environmental Digital Library

Approach

• For the first goal, we are implementing a distributed search system for cross-domain searching, based on international standards (Z39.50 and SGML/XML) and existing Cheshire II technology. Databases include:
  – HE Archives hub
  – Arts and Humanities Data Service (AHDS)
  – MASTER
  – CURL (Consortium of University Research Libraries)
  – Online Archive of California (OAC)
  – Making of America II (MOA2)

Page 58: Object-Relational Database Applications -- The UC Berkeley Environmental Digital Library

Approach

• The second goal will be addressed through the design, development, and evaluation of the distributed information retrieval system architecture and its client-side systems, which aid the user in exploiting distributed resources, and through the design and evaluation of protocols for efficient and effective retrieval in an internationally distributed multi-database environment. (Cheshire III?)

Page 59: Object-Relational Database Applications -- The UC Berkeley Environmental Digital Library

Research Issues

• Appropriate system architecture for information retrieval in distributed network environment (distributed object architecture)
• Management of vocabulary control in a Cross-Domain context
• Distributed access to existing metadata resources
• Navigating Collections
• Support for Cross-Domain resource clumps to facilitate resource discovery

Page 60: Object-Relational Database Applications -- The UC Berkeley Environmental Digital Library

Architecture Overview

Page 61: Object-Relational Database Applications -- The UC Berkeley Environmental Digital Library

Architecture Overview

• Focus on high performance N.O.W. style operations: A scalable, extensible platform for IR

• Current design uses JavaSpaces – a high-level coordination mechanism for distributed systems using a light-weight publish/subscribe distributed programming model

Page 62: Object-Relational Database Applications -- The UC Berkeley Environmental Digital Library

Current Design

• A single operational model for Cheshire that encompasses single node installations, uniformly administered clusters, as well as independently administered federations.
  – every operation is a distributed operation
  – an operation is applied over a set of collections

Page 63: Object-Relational Database Applications -- The UC Berkeley Environmental Digital Library

Collections:

• Single node or cluster
  – can be partitions of other collections
• Federation
  – can be partitions or subsets of other collections. In other words, collections in a loosely coupled federation may have overlapping records
• Virtual Collections

Page 64: Object-Relational Database Applications -- The UC Berkeley Environmental Digital Library

Virtual Collections

• The external interface to collections
  – A VC may only present part of the underlying real collection in its interface
  – A VC may grow or shrink dynamically within the bounds of the real collection. A search only needs to be done over documents in the VC, not all documents in the collection
  – Ability to logically partition a collection across a number of machines for performance increase, with built-in redundancy in the case of node failures.
  – When a node fails, its VC is simply distributed (logically) to other nodes in the cluster.
  – Cheshire servers can be organized into server groups. A server group can be thought of as an administrative unit.
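
The failover behavior described on this slide can be pictured with a small sketch. Everything here is an assumption made for illustration: the node names, the notion of numbered partitions, and the round-robin policy are hypothetical and are not taken from the Cheshire design.

```python
def redistribute(assignments, failed_node):
    """Logically reassign a failed node's partitions to the surviving nodes.

    assignments: dict node name -> list of partition ids it currently serves
    Returns a new dict; the input is left unmodified.
    """
    survivors = sorted(n for n in assignments if n != failed_node)
    if not survivors:
        raise ValueError("no surviving nodes to take over the partitions")
    result = {n: list(assignments[n]) for n in survivors}
    # Round-robin is an arbitrary policy chosen for this sketch.
    for i, part in enumerate(assignments.get(failed_node, [])):
        result[survivors[i % len(survivors)]].append(part)
    return result


# Hypothetical three-node cluster, two partitions per node.
cluster = {"node-a": [0, 1], "node-b": [2, 3], "node-c": [4, 5]}
print(redistribute(cluster, "node-b"))
```

Only the logical assignment changes in this sketch; it presumes the surviving nodes can already reach the underlying records, which is what the built-in redundancy mentioned above would have to provide.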

Page 65: Object-Relational Database Applications -- The UC Berkeley Environmental Digital Library

Distributed Access to Existing Metadata Resources

• Use of current (Z39.50) and new (SDLIP) protocols for access to other metadata systems
  – Support for common semantics (e.g. Dublin Core mappings for disparate systems)
  – Cross-system use of EVMs

Page 66: Object-Relational Database Applications -- The UC Berkeley Environmental Digital Library

Navigating Collections

• Support for “drilling down” from broad Collection-level descriptions, to sub-collection descriptions to individual digital objects.
  – Primary test bases will be EAD collection descriptions linked to digital objects as in MOA2.

Page 67: Object-Relational Database Applications -- The UC Berkeley Environmental Digital Library

Cross-Domain Resource Discovery

• Initially -- Use of Z39.50 Cross-domain element set for search (Dublin Core based)

• Support for new protocols and semantics (such as SDLIP)

• Research into a metaprotocol for communicating information about databases, search elements and collections between systems
  – Initially based on Z39.50 Explain

Page 68: Object-Relational Database Applications -- The UC Berkeley Environmental Digital Library

Meta-Search for Cross-Domain Resource Discovery

• Hundreds or Thousands of servers with databases ranging widely in content, topic, format
  – Broadcast search is expensive in terms of bandwidth and in processing too many irrelevant results
  – How to select the “best” ones to search?
    • What to search first
    • Which to search next
  – Topical/domain constraints on the search selections (EVMs for databases?)

Page 69: Object-Relational Database Applications -- The UC Berkeley Environmental Digital Library

Cross-Domain Resource Discovery

• Meta-Search
  – New approach to building metasearch based on Z39.50
  – Instead of using broadcast search we will explore
    • Extraction of GlOSS-like indexes using Z39.50 SCAN
    • GIPSY2 extraction of place coverages from index data
  – We will also investigate
    • How to choose databases using the index
    • How to merge search results from multiple sources
    • Hierarchies of servers (general/meta-topical/individual)
  – Other methods
    • Treating database contents as distributed objects
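
One simple way to "choose databases using the index" is to score each database by how often the query terms occur in it, in the spirit of GlOSS-style summaries. The sketch below is an assumption-laden illustration, not the project's method: the index layout, the database names, and the frequencies are all invented.

```python
from collections import defaultdict


def rank_databases(meta_index, query_terms):
    """Score each database by the summed frequency of the query terms.

    meta_index: dict term -> dict database name -> term frequency
    Returns database names sorted from highest to lowest score.
    """
    scores = defaultdict(int)
    for term in query_terms:
        for db, freq in meta_index.get(term, {}).items():
            scores[db] += freq
    # Break ties alphabetically so the ordering is deterministic.
    return sorted(scores, key=lambda db: (-scores[db], db))


# Hypothetical meta-index with invented databases and counts.
meta_index = {
    "dam": {"water_reports": 340, "county_plans": 12},
    "delta": {"water_reports": 85, "fish_survey": 120},
}
print(rank_databases(meta_index, ["dam", "delta"]))
```

A ranking like this answers "what to search first" and "which to search next" without broadcasting the query; summed raw frequency is only the crudest possible score.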

Page 70: Object-Relational Database Applications -- The UC Berkeley Environmental Digital Library

Distributed Metadata Servers

[Diagram labels: Replicated servers, Meta-Topical Servers, General Servers, Database Servers]

Page 71: Object-Relational Database Applications -- The UC Berkeley Environmental Digital Library

Meta-Search Server Index Creation

• For all servers, or a topical subset…
  – Get Explain information (especially DC mappings)
  – For each index (or each DC index)
    • Use SCAN to extract terms and frequency
    • Add term + freq + source index + database to the meta-search index
  – Post-Process indexes (especially Geo Names, etc) for special types of data
    • e.g. create “geographical coverage” indexes
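
The index-creation loop on this slide can be sketched as follows. This is a hedged illustration only: the input is assumed to be term/frequency pairs already extracted with SCAN, and the database and index names are hypothetical. No real Z39.50 client call is made.

```python
def build_meta_index(scan_results):
    """Collect term + freq + source index + database postings, keyed by term.

    scan_results: iterable of (database, index_name, [(term, freq), ...])
    """
    meta_index = {}
    for database, index_name, terms in scan_results:
        for term, freq in terms:
            meta_index.setdefault(term, []).append(
                {"freq": freq, "index": index_name, "database": database}
            )
    return meta_index


# Hypothetical SCAN output from two databases (invented values).
scan_results = [
    ("db_one", "title", [("catalan", 19), ("catch", 173)]),
    ("db_two", "title", [("catch", 4)]),
]
index = build_meta_index(scan_results)
print(index["catch"])
```

Post-processing steps such as building "geographical coverage" indexes would run over the resulting structure; they are not shown here.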

Page 72: Object-Relational Database Applications -- The UC Berkeley Environmental Digital Library

Z39.50 SCAN Results

% zscan title cat 1 20 1
{SCAN {Status 0} {Terms 20} {StepSize 1} {Position 1}} {cat 27} {cat-fight 1} {catalan 19} {catalogu 37} {catalonia 8} {catalyt 2} {catania 1} {cataract 1} {catch 173} {catch-all 3} {catch-up 2} …

% zscan topic cat 1 20 1
{SCAN {Status 0} {Terms 20} {StepSize 1} {Position 1}} {cat 706} {cat-and-mouse 19} {cat-burglar 1} {cat-carrying 1} {cat-egory 1} {cat-fight 1} {cat-gut 1} {cat-litter 1} {cat-lovers 2} {cat-pee 1} {cat-run 1} {cat-scanners 1} …
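
A response in the brace-delimited form shown on this slide can be split into a header and a list of term/frequency pairs. The grammar used below is inferred from the sample output alone, so treat it as an assumption; `parse_scan` is a hypothetical helper, not part of the Cheshire client.

```python
import re


def parse_scan(output):
    """Split a brace-delimited SCAN response into a header dict and term list."""
    header_match = re.match(r"\{SCAN ((?:\{\w+ \d+\}\s*)+)\}", output)
    if header_match is None:
        raise ValueError("not a SCAN response")
    # Header fields such as {Status 0} and {Terms 20} become a dict of ints.
    header = {key: int(value)
              for key, value in re.findall(r"\{(\w+) (\d+)\}", header_match.group(1))}
    # Everything after the header is a run of {term frequency} pairs.
    rest = output[header_match.end():]
    terms = [(term, int(freq)) for term, freq in re.findall(r"\{(\S+) (\d+)\}", rest)]
    return header, terms


# First few entries of the title-index response shown above.
sample = "{SCAN {Status 0}{Terms 20}{StepSize 1}{Position 1}}{cat 27}{cat-fight 1}{catalan 19}"
header, terms = parse_scan(sample)
print(header["Terms"], terms)
```

The resulting pairs are the raw material for a meta-search index: each term and frequency would be recorded together with the index and database it came from.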

Page 73: Object-Relational Database Applications -- The UC Berkeley Environmental Digital Library

Conclusions

• A lot of interesting work to be done
  – Redesign and development of the Cheshire II system
  – Evaluating new meta-indexing methods
  – Developing and evaluating methods for merging cross-domain results (or, perhaps, when to keep them separate)
  – Developing, testing and evaluating GIPSY2
  – User interface development and testing for distributed resource and object access

Page 74: Object-Relational Database Applications -- The UC Berkeley Environmental Digital Library

Further Information

• Berkeley DL web site http://elib.cs.berkeley.edu
• Full Cheshire II client and server source is available ftp://cheshire.berkeley.edu/pub/cheshire/
  – Includes HTML documentation
• Project Web Site http://cheshire.berkeley.edu/