The Costs of Containment: Frameworks and the Web. Peter Brantley, University of California, California Digital Library. BL | London | 2006



1

University of California

California Digital Library

The Costs of Containment:

Frameworks and the Web

Peter Brantley

BL | London | 2006


2

Huh? What the heck?

Outline:

1. SoA as accommodation to a changed world

2. CDL's Common Framework as example of SoA

3. Some pitfalls with SoA found in practice

4. Wriggling out of new straitjackets


3

Transformations

It’s not about libraries any more:

• Scholarly work and communication are being transformed by new innovations and practices.

• Users seek not just content, but the ability to create and annotate, and a way to contribute within their own community.

• The optimum place for ‘social software’ in IR domains is undiscovered, but of great interest.


4

MySpace


5

YouTube


6

Social apps popular

“With nearly 60 million registered users, 15 billion page views per month, and more than 150,000 new users signing up every day, MySpace is that rare social networking contagion that keeps spreading and growing.”

– Robert Young, in “Om Malik’s Blog,” 26 Feb 2006


7

YouTube as a model

“… YouTube has captured the hearts and minds of the people as the place they go to post videos and find videos. …

Content owners should pay attention to what consumers want to do with their content and find ways to satisfy these desires that can fit into a business model.”

– Fred Wilson, in “A VC,” 20 Feb 2006


8

Where in the Market?

Libraries need to:

• Assert the worth of digital assets and pervasive services to institutional stakeholders.

• Expand the roles of libraries within the scholarly enterprise, and enter new realms of business.

• Find new ways to make library services available and relevant for users (“flattening” the library).

• Design and deploy service-oriented frameworks for flexibility and manageability.


9

CDL Common Framework

• The CDL is building an open, service-oriented technical architecture that we call the Common Framework (CF).

• The CF provides an integration framework for DL services …

• And supports the integration of local and third-party tools/services via “plug-in” functions.


10

Services are layered

The CF is a layered architecture, separating:

– Front-end tools from …

– Back-end services from …

– Underlying data storage.

The CF presents itself via both machine interfaces (web services through SOAP & Java APIs) and human interfaces (command line & browser tools).
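To make the layering concrete, here is a minimal Python sketch, not CDL code: the names (InMemoryStorage, IngestService, AccessService, command_line_tool) are hypothetical, and the real CF exposes its services through SOAP and Java APIs rather than Python objects. What it shows is that each layer depends only on the interface of the layer below, so the storage can be replaced without touching the front-end tool.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class StorageLayer(ABC):
    """Bottom layer: where bytes actually live."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...


class InMemoryStorage(StorageLayer):
    """Hypothetical stand-in for a real store (file system, SRB, database)."""

    def __init__(self) -> None:
        self._items: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._items[key] = data

    def get(self, key: str) -> bytes:
        return self._items[key]


class IngestService:
    """Middle layer: a back-end service that sees only the storage interface."""

    def __init__(self, storage: StorageLayer) -> None:
        self._storage = storage

    def ingest(self, object_id: str, content: bytes) -> str:
        self._storage.put(object_id, content)
        return object_id


class AccessService:
    """Middle layer: read access, again through the storage interface only."""

    def __init__(self, storage: StorageLayer) -> None:
        self._storage = storage

    def fetch(self, object_id: str) -> bytes:
        return self._storage.get(object_id)


def command_line_tool(args: List[str], ingest: IngestService, access: AccessService) -> str:
    """Top layer: a front-end tool that talks to services, never to storage."""
    verb, object_id, *rest = args
    if verb == "ingest":
        return ingest.ingest(object_id, " ".join(rest).encode("utf-8"))
    if verb == "fetch":
        return access.fetch(object_id).decode("utf-8")
    raise ValueError(f"unknown command: {verb}")


if __name__ == "__main__":
    store = InMemoryStorage()
    ingest, access = IngestService(store), AccessService(store)
    command_line_tool(["ingest", "obj-1", "hello", "world"], ingest, access)
    print(command_line_tool(["fetch", "obj-1"], ingest, access))  # hello world
```

A browser tool or a web-service endpoint would sit at the same top layer as command_line_tool and reuse the same two services.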


11

CF in Flavors

The CF supports several DL data models:

– Archival (objects stored locally)

– Metadata only (MD stored locally)

– Portal (no data stored locally)

Examples:

– UC Digital Preservation Repository (Archival)

– American West (MD only)

– MetaSearch Infrastructure (Portal)
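One way to picture the three data models is as a policy for what a collection keeps locally. The sketch below assumes a hypothetical Record type and what_to_store function; it illustrates the distinction drawn on this slide, not how the CF actually implements it.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class DataModel(Enum):
    ARCHIVAL = "archival"            # objects and metadata held locally
    METADATA_ONLY = "metadata_only"  # only metadata held locally
    PORTAL = "portal"                # nothing held locally


@dataclass
class Record:
    identifier: str
    metadata: Dict[str, str]
    content: Optional[bytes] = None  # the digital object itself, if available


def what_to_store(model: DataModel, record: Record) -> dict:
    """Return the parts of a record that a collection keeps locally under `model`."""
    if model is DataModel.ARCHIVAL:
        return {"id": record.identifier, "metadata": record.metadata, "content": record.content}
    if model is DataModel.METADATA_ONLY:
        return {"id": record.identifier, "metadata": record.metadata}
    # Portal: nothing is kept; searches are passed to remote sources at query time.
    return {}
```

In these terms a preservation repository runs under ARCHIVAL, a metadata aggregation under METADATA_ONLY, and a metasearch front end under PORTAL.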


12

Bundle of Concepts

The CF is:

– A philosophy governing software development …

– A conceptual design for services …

– A specific technical architecture …

– A set of “on the wire” services …

– A growing number of applications …


13

CF: Philosophy

• Composite, modular, lightweight are good.

• Design and implement services quickly.

• Reduce the need for application-specific tweaking and twiddling.

• Make replacement and enhancement easy.

• Integration trumps re-invention.


14

CF: Design

• Applications are independent of services.

• Design atomic services to enable the easy construction or rebuilding of applications.

• New applications reuse existing services (or minor modifications of them).

• Build for scale.
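The design principles above amount to composition: small, independent services chained into applications, so that a new application is mostly a new arrangement of existing parts. A hedged Python sketch, in which the metadata-cleanup services and the compose helper are hypothetical stand-ins for real CF services:

```python
from typing import Callable, Dict

Record = Dict[str, str]
Service = Callable[[Record], Record]


def strip_whitespace(record: Record) -> Record:
    """Atomic service: trim leading/trailing spaces from every field."""
    return {key: value.strip() for key, value in record.items()}


def lowercase_subject(record: Record) -> Record:
    """Atomic service: normalize the subject field, if there is one."""
    if "subject" not in record:
        return record
    return {**record, "subject": record["subject"].lower()}


def add_index_key(record: Record) -> Record:
    """Atomic service: derive a sort/index key from the title."""
    return {**record, "index_key": record.get("title", "").lower()}


def compose(*services: Service) -> Service:
    """Assemble an 'application' as an ordered chain of independent services."""
    def application(record: Record) -> Record:
        for service in services:
            record = service(record)
        return record
    return application


# Two applications built from the same parts: the second reuses the first
# application's services and adds one step instead of re-implementing them.
catalog_loader = compose(strip_whitespace, lowercase_subject)
search_indexer = compose(strip_whitespace, lowercase_subject, add_index_key)
```

Because each service returns a new record rather than mutating its input, services can be reordered or shared between applications without side effects on one another.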


15

CF: Schematic


16

CF: Services > Apps


17

CF: Defined Services

CF Services: Available now -

Ingest, Indexing, Access, Admin and Account mgmt.

In development -

Search and Browse, Harvest and Capture, Rights mgmt, Collection mgmt, and MetaSearch


18

CF: Plug-Ins

Local -

• NOID (Nice Opaque IDentifier, for the generation of ARK persistent identifiers)

• XTF (eXtensible Text Framework, for text indexing, searching, and browsing)

• Metadata Normalization and Enrichment (currently, primarily Date normalization)
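NOID-minted identifiers can end in a check character that catches common transcription errors. The sketch below follows the check-character scheme as we understand it from the NOID documentation (a 29-character alphabet of digits and consonants; each character's alphabet index is weighted by its 1-based position, and the weighted sum is taken modulo 29). Treat it as illustrative, not as a replacement for the NOID tool; the function names and the sample identifier are hypothetical.

```python
# Hypothetical re-implementation of a NOID-style check character.
XDIGITS = "0123456789bcdfghjkmnpqrstvwxz"  # 29 characters: no vowels, no 'l' or 'y'


def check_char(identifier: str) -> str:
    """Compute a check character for `identifier`."""
    total = 0
    for position, ch in enumerate(identifier, start=1):
        value = XDIGITS.find(ch)           # -1 for characters outside the alphabet
        total += position * max(value, 0)  # ...which contribute nothing, e.g. '/'
    return XDIGITS[total % len(XDIGITS)]


def with_check_char(identifier: str) -> str:
    """Append the check character to an identifier."""
    return identifier + check_char(identifier)


def is_valid(identifier: str) -> bool:
    """True if the last character is the correct check character for the rest."""
    return len(identifier) > 1 and check_char(identifier[:-1]) == identifier[-1]


print(with_check_char("13030/xf93gt2"))  # 13030/xf93gt2q
```

Because 29 is prime, this weighting detects any single-character substitution and any swap of two adjacent, different characters, provided the characters involved are in the alphabet and the identifier is shorter than 29 characters.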


19

CF: Ext Plug-Ins

Third-Party -

JHOVE (for validation and technical MD)

SRB (for archival storage of bitstreams)

MySQL (for MD and admin data storage)

Heritrix (for web crawling)

Ex Libris’ MetaLib (for metasearch)
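Much of the value of plug-ins comes from the framework calling a role ("validate", "store", "crawl") rather than a specific tool, so one implementation can replace another. A minimal sketch of such a registry follows; register, call and fake_validator are hypothetical names, and fake_validator does not run JHOVE, it only stands in for a wrapper around it.

```python
from typing import Any, Callable, Dict

# Plug-ins registered under the role they play, not under the tool's name.
_PLUGINS: Dict[str, Callable[..., Any]] = {}


def register(role: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Decorator: make a function the current implementation of `role`."""
    def wrap(func: Callable[..., Any]) -> Callable[..., Any]:
        _PLUGINS[role] = func  # re-registering a role replaces the old plug-in
        return func
    return wrap


def call(role: str, *args: Any, **kwargs: Any) -> Any:
    """The framework invokes plug-ins only by role."""
    if role not in _PLUGINS:
        raise LookupError(f"no plug-in registered for role {role!r}")
    return _PLUGINS[role](*args, **kwargs)


@register("validate")
def fake_validator(filename: str) -> Dict[str, Any]:
    # Stand-in for a wrapper around an external validator such as JHOVE;
    # it only guesses a format from the file extension.
    ext = filename.rsplit(".", 1)[1].lower() if "." in filename else ""
    return {"file": filename, "format": ext or "unknown", "valid": bool(ext)}
```

Swapping in a different validator is then a single `register("validate")` call; code that calls `call("validate", ...)` does not change.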


20

There be dragons!


21

No Greenfields

• Our SoA did not arise on an empty prairie.

• Existing applications and services must be rebuilt, or integrated through abstractions.

• Impacts on service delivery: “Should we implement with old tech immediately, or wait for the new Framework version?”

• Planning costs are (sometimes very) high.


22

It takes a while


23

SoA drawbacks

SoA implementations are thorny roses:

• SoA internalizes enterprise software development.

• Development focuses on specification.

• Project interdependency can lead to resource contention and gridlock.

• The technical barrier for software “re-sellers” is high and usually must be arbitrated.


24

Building for whom?

• CDL is struggling to define acceptable support commitments for various CF bundled services.

• Internally: how much do we encourage distributed adoption vs. our current centralized hosted model?

• Externally: how much support do we provide within an OSS environment?

• How much interoperation do we bake in, both among CF installs and with “foreign” services?


25

We are not alone

• *.edu is not the only service provider for the academic community any more.

• Silicon Valley is busy building services for everyone. Who are DLs building for?

• Insular frameworks are inexcusable.

• We work in a services ecosystem.


26

Virtual architecture

Hypothetical BL Google-PAC:

• Indexes BL’s local library catalog,

• and a UK OpenCourseWare site and IRs …

• Stores metadata structures in Google Base …

• Integrates with Google Books and the OCA …

• Links to journals in Google Scholar via SFX …

• Queries OCLC WorldCat for branch locations.

. . . It’s possible now.
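The core of such a virtual architecture is fan-out and merge: send one query to several independent services and combine the answers. A Python sketch, assuming two hypothetical stand-in sources (local_catalog and repository, with made-up identifiers) in place of the real catalog, repository, and web services named above:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Result = Dict[str, str]
Source = Callable[[str], List[Result]]


def local_catalog(query: str) -> List[Result]:
    # Hypothetical stand-in for a library catalog search service.
    return [{"id": "isbn:0001", "title": f"{query} handbook", "source": "catalog"}]


def repository(query: str) -> List[Result]:
    # Hypothetical stand-in for an institutional repository search.
    return [
        {"id": "hdl:42", "title": f"Working paper on {query}", "source": "repository"},
        {"id": "isbn:0001", "title": f"{query} handbook", "source": "repository"},
    ]


def federated_search(query: str, sources: List[Source]) -> List[Result]:
    """Query every source concurrently, then merge, keeping the first hit per id."""
    if not sources:
        return []
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        # map() yields results in the order of `sources`, so earlier sources win ties.
        batches = list(pool.map(lambda source: source(query), sources))
    merged: Dict[str, Result] = {}
    for batch in batches:
        for result in batch:
            merged.setdefault(result["id"], result)
    return list(merged.values())
```

Deduplicating on a shared identifier is the simplest merge policy; a real aggregator would also need to reconcile records that describe the same item under different identifiers.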


27

Rapid dev … on foot

• Integration is fast; SoA development is not.

• It is difficult (although not impossible) to prototype rapidly within SoA architectures.

• The pace of new feature accretion in SoA is inherently slower than in silo app development.

• New SoA services require staff and resource coordination.

• SoA is like building a CBD (central business district), vs. Quonset huts.


28

The Lessons for CDL

• Work hard at defining the Framework core, but leave it a bit porous at the boundary.

• Push on the edge with rapid dev, internal to the CF when possible, external when not.

• When prototyping is possible: “Throw the first one away” and then integrate into CF.

• “Loosely-couple” external services.
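One way to “loosely couple” an external service is to wrap every call so that a failure degrades one feature instead of breaking the application, and so that a service that keeps failing is left alone for a while. The sketch below is a simple circuit breaker; the LooseCoupling class and its parameters are hypothetical, and the thresholds would need tuning for any real service.

```python
import time
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class LooseCoupling(Generic[T]):
    """Call an external service, falling back to a default on failure.

    After `max_failures` consecutive failures the "circuit" opens and the
    service is not called at all until `cooldown` seconds have passed.
    """

    def __init__(self, call: Callable[[], T], fallback: T, max_failures: int = 3,
                 cooldown: float = 30.0, clock: Callable[[], float] = time.monotonic) -> None:
        self._call = call
        self._fallback = fallback
        self._max_failures = max_failures
        self._cooldown = cooldown
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def __call__(self) -> T:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._cooldown:
                return self._fallback  # circuit open: skip the remote call
            # Cooldown over: allow one trial call; a single failure re-opens.
            self._opened_at = None
            self._failures = self._max_failures - 1
        try:
            result = self._call()
        except Exception:
            self._failures += 1
            if self._failures >= self._max_failures:
                self._opened_at = self._clock()
            return self._fallback
        self._failures = 0
        return result
```

The injectable clock keeps the breaker testable; in production the default `time.monotonic` is used.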


29

Fast dev example

CDL “Relvyl” Recommender system -

• A Mellon-funded project to explore relevance ranking and recommending for the library OPAC.

• Built on top of standalone XTF (no CF).

• An XML-file-based system created by merging multiple source extracts.

• New features added to XTF for extra crunchy relevance and recommending goodness.
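The slide does not describe Relvyl's ranking or recommending algorithms, so the following is only a generic illustration of one common recommending technique, item-to-item co-occurrence (“people who used this also used…”), over hypothetical usage sessions. It is not Relvyl's method, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, List, Set


def build_cooccurrence(sessions: Iterable[Set[str]]) -> Dict[str, Counter]:
    """Count how often each pair of items appears in the same session."""
    co: Dict[str, Counter] = {}
    for items in sessions:
        for a, b in combinations(sorted(items), 2):
            co.setdefault(a, Counter())[b] += 1
            co.setdefault(b, Counter())[a] += 1
    return co


def recommend(item: str, co: Dict[str, Counter], k: int = 3) -> List[str]:
    """Return up to k items most often seen alongside `item`, ties broken by id."""
    counts = co.get(item, Counter())
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [other for other, _ in ranked[:k]]
```

Raw co-occurrence counts favor popular items; production recommenders usually normalize them (for example by item frequency) before ranking.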


30

Relvyl search


31

Results by Relevance


32

Recommended Results


33

Framework accretion

• Relvyl is designed to be calved off as a separate set of services, i.e., ranking and recommending.

• Easy to incorporate into the Common Framework.

• Could work with a range of backend data inputs, or be integrated into external applications.

• SoA allows us to scope narrowly at first (bibliographic services) and then expand utilization over time.


34

An integrative service

Hypothetical service:

• Take Google Book Search, and add “Relvyl”-type recommendations.

• All Google Book Search users get “expert” (i.e., research university) recommendations.

• Users authenticated as UC users could get extra toppings, such as inline pointers to catalog records for recommended items, perhaps via SFX linking.
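For the “extra toppings,” a common way to point from a recommended item to a local catalog record is an OpenURL handed to a link resolver such as SFX. A sketch, assuming a hypothetical resolver address (RESOLVER_BASE) and hypothetical helper names; the key/encoded-value fields shown follow the OpenURL 1.0 (Z39.88-2004) book format.

```python
from typing import Dict, List
from urllib.parse import urlencode

RESOLVER_BASE = "https://resolver.example.edu/openurl"  # hypothetical link-resolver address


def openurl_for_book(isbn: str, title: str) -> str:
    """Build an OpenURL 1.0 (Z39.88-2004) key/encoded-value link for a book."""
    params = {
        "url_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:book",
        "rft.isbn": isbn,
        "rft.btitle": title,
    }
    return f"{RESOLVER_BASE}?{urlencode(params)}"


def decorate(recommendations: List[Dict[str, str]], authenticated: bool) -> List[Dict[str, str]]:
    """Everyone gets the recommendations; authenticated users also get resolver links."""
    if not authenticated:
        return [dict(r) for r in recommendations]
    return [{**r, "link": openurl_for_book(r["isbn"], r["title"])} for r in recommendations]
```

Keeping the link-building separate from the recommendation service means the same recommendations can be served to anonymous users unchanged, with the institution-specific layer added only after authentication.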


35

Future for services

• Polygamous recombination may be the most likely future for library services.

• Be open to integration with diverse actors, both .ac/.edu and among .com information and service providers.

• Portal to info services can be anywhere.

• Libraries: The Data of Choice™.