Distributed Databases and Applications Presentation

DiGIR 1 Distributed Databases and Applications John Wieczorek Museum of Vertebrate Zoology, UC Berkeley


Transcript of Distributed Databases and Applications Presentation

Page 1: Distributed Databases and Applications Presentation

DiGIR 1

Distributed Databases and Applications

John Wieczorek

Museum of Vertebrate Zoology, UC Berkeley

Page 2: Distributed Databases and Applications Presentation

DiGIR 2

Distributed Databases – Discipline-specific

The Species Analyst (TSA)
The Integrated Taxonomic Information System (ITIS)
FishNet
The Mammal Networked Information System (MaNIS)
HerpNET
The Ornithological Information System (ORNIS)
…

Page 3: Distributed Databases and Applications Presentation

DiGIR 3

Distributed Databases – International

European Natural History Specimen Information Network (ENHSIN)
Biological Collection Access Service for Europe (BioCASE)
Australia Virtual Herbarium (AVH)
Red Mundial de Información Sobre Biodiversidad, Comisión Nacional para el Conocimiento y Uso de la Biodiversidad (REMIB, CONABIO)

Page 4: Distributed Databases and Applications Presentation

DiGIR 4

Distributed Databases – Regional

Mountain and Plains Spatio-Temporal Database-Informatics (MaPSTeDI)
Ocean Biogeographic Information System (OBIS)
Pacific Basin Information Node, National Biological Information Infrastructure (PBIN, NBII)
Species Link, Centro de Referência em Informação Ambiental (Species Link, CRIA)
A Virtual Herbarium of the Chicago Region (vPlants)
Spatial Analysis of Local Vegetation Inventories Across Scales (SALVIAS)
…

Page 5: Distributed Databases and Applications Presentation

DiGIR 5

Distributed Databases – Intra-institutional

Berkeley Natural History Museums (BNHM)
Association of Biological Collections, UC Davis
…

Page 6: Distributed Databases and Applications Presentation

DiGIR 6

Distributed Databases – “Nodes”

LifeMapper
National Biological Information Infrastructure (NBII)
Global Biodiversity Information Facility (GBIF)

Page 7: Distributed Databases and Applications Presentation

DiGIR 7

GBIF Work Programmes

NODES
ECAT – Electronic Catalogue of Names of Known Organisms
DIGIT – Digitisation of Natural History Collections
OCB – Outreach and Capacity Building
DADI – Data Access and Database Interoperability

Page 8: Distributed Databases and Applications Presentation

DiGIR 8

Taxonomic Databases Working Group

Standards development and maintenance
Access to Biological Collections Data (ABCD)
Darwin Core Version 2 (DwC2)
Structure of Descriptive Data (SDD)
DiGIR
Others…

Page 9: Distributed Databases and Applications Presentation

DiGIR 9

DiGIR: Distributed Generic Information Retrieval

John Wieczorek, Stan Blum, Dave Vieglais, P.J. Schwartz

Page 10: Distributed Databases and Applications Presentation

DiGIR 10

Information Retrieval

Distributed - a protocol for retrieving structured data from multiple, heterogeneous databases across the Internet.
Generic - a protocol independent of the data retrieved and of the software used to retrieve it.

Page 11: Distributed Databases and Applications Presentation

DiGIR 11

Project Rationale

Avoid multiple incongruous development efforts
Pool resources and create a support community of experts
Solve scalability problems

Page 12: Distributed Databases and Applications Presentation

DiGIR 12

Design Goals

Use open protocols and standards, such as HTTP and XML
Decouple the protocol, software and semantics
Make new data provider installations as easy as possible
Develop open source software with GNU General Public Licensing (It’s free).

Page 13: Distributed Databases and Applications Presentation

DiGIR 13

DiGIR Component Summary

Page 14: Distributed Databases and Applications Presentation

DiGIR 14

DiGIR Architecture

[Diagram: Provider]

Page 15: Distributed Databases and Applications Presentation

DiGIR 15

Provider

Receives requests
Retrieves data from database
Sends results to requestor
Supplies metadata to describe content, contacts, and capabilities
Logs requests
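
The provider responsibilities listed above can be sketched in a few lines. This is only an illustration under stated assumptions, not the DiGIR provider software: the `Provider` class, the `specimens` table, its columns, the contact address, and the sample rows are all made up for the example, and an in-memory SQLite database stands in for an institution's real database.

```python
import logging
import sqlite3

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("provider")

# Hypothetical schema: the column names are assumptions for this sketch.
COLUMNS = ("catalog_number", "genus", "country")
SEARCHABLE = ("genus", "country")


class Provider:
    """Hypothetical provider wrapping one local database."""

    def __init__(self, name, conn, contact):
        self.name = name
        self.conn = conn
        self.contact = contact

    def metadata(self):
        # Supplies metadata describing content, contacts, and capabilities.
        count = self.conn.execute("SELECT COUNT(*) FROM specimens").fetchone()[0]
        return {"name": self.name, "contact": self.contact,
                "record_count": count, "searchable_fields": list(SEARCHABLE)}

    def search(self, field, value):
        # Logs the request, retrieves data, and returns results to the requestor.
        log.info("%s: search %s=%s", self.name, field, value)
        if field not in SEARCHABLE:
            raise ValueError(f"unsupported field: {field}")
        # The field name is checked against SEARCHABLE above; the value is bound.
        rows = self.conn.execute(
            f"SELECT catalog_number, genus, country FROM specimens WHERE {field} = ?",
            (value,),
        ).fetchall()
        return [dict(zip(COLUMNS, row)) for row in rows]


def demo_provider():
    """Build a provider over an in-memory database with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE specimens (catalog_number TEXT, genus TEXT, country TEXT)")
    conn.executemany("INSERT INTO specimens VALUES (?, ?, ?)",
                     [("DEMO-1", "Manis", "A"), ("DEMO-2", "Peromyscus", "B")])
    return Provider("demo", conn, "curator@example.org")
```

Calling `demo_provider().search("genus", "Manis")` returns the one matching sample record as a dictionary; `metadata()` reports the record count and the fields that can be searched.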

Page 16: Distributed Databases and Applications Presentation

DiGIR 16

DiGIR Architecture

[Diagram: Portal Engine]

Page 17: Distributed Databases and Applications Presentation

DiGIR 17

Portal Engine

The entry point for an application
Can query a registry to discover potential providers
Can determine, based on provider metadata, whether a provider should be queried
Can send requests to multiple providers

Page 18: Distributed Databases and Applications Presentation

DiGIR 18

Portal Engine, continued

Assembles responses from providers
Returns packaged results to the requesting application
Communicates via protocol-compliant messaging only
Logs activity
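
As a rough sketch of the portal engine's job (discover providers, decide which ones to query, fan the request out, assemble one result), the following may help. It assumes a registry that is simply a list of provider objects; `StubProvider` and `portal_search` are hypothetical names for this example, and the stubs answer in-process where a real portal would exchange protocol messages over HTTP.

```python
from concurrent.futures import ThreadPoolExecutor


class StubProvider:
    """Stand-in for a remote provider (hypothetical, in-process)."""

    def __init__(self, name, fields, records):
        self.name = name
        self.fields = fields
        self.records = records

    def metadata(self):
        return {"name": self.name, "searchable_fields": self.fields}

    def search(self, field, value):
        return [r for r in self.records if r.get(field) == value]


def portal_search(registry, field, value):
    """Query every suitable provider in the registry and merge the results."""
    # Decide from provider metadata whether a provider should be queried.
    targets = [p for p in registry if field in p.metadata()["searchable_fields"]]
    # Send the request to all selected providers concurrently.
    with ThreadPoolExecutor(max_workers=max(1, len(targets))) as pool:
        responses = list(pool.map(lambda p: p.search(field, value), targets))
    # Assemble one packaged result, tagging each record with its source.
    return {
        "queried": [p.name for p in targets],
        "records": [dict(record, provider=p.name)
                    for p, records in zip(targets, responses)
                    for record in records],
    }
```

Tagging each record with its provider is a design choice of this sketch, made so that an application can still tell where a record came from after the responses are merged.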

Page 19: Distributed Databases and Applications Presentation

DiGIR 19

Registry

Provides a “yellow pages” to advertise the existence and capabilities of a provider
Provides a means to discover potential providers of interest
May be public or private
Need not be a part of the architecture

Page 20: Distributed Databases and Applications Presentation

DiGIR 20

DiGIR Architecture

[Diagram: Provider; Registry (register)]

Page 21: Distributed Databases and Applications Presentation

DiGIR 21

DiGIR Architecture

[Diagram: Portal Engine; Registry (discover)]

Page 22: Distributed Databases and Applications Presentation

DiGIR 22

DiGIR Protocol

Defines request and response message formats for communication between provider, portal engine, and applications
  Metadata requests
  Search requests
  Inventory requests
Remains unfettered by the structure of the data it transfers
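
To make the idea of a data-independent message format concrete, here is a minimal sketch of an XML request envelope covering the three request types named above. The element names (`request`, `header`, `type`, `destination`) are invented for illustration and are not the actual DiGIR schema; the point is only that the envelope never inspects the payload it carries.

```python
import xml.etree.ElementTree as ET

REQUEST_TYPES = ("metadata", "search", "inventory")


def build_request(request_type, destination, payload=None):
    """Wrap an arbitrary payload in a request envelope (illustrative names)."""
    if request_type not in REQUEST_TYPES:
        raise ValueError(f"unknown request type: {request_type}")
    root = ET.Element("request")
    header = ET.SubElement(root, "header")
    ET.SubElement(header, "type").text = request_type
    ET.SubElement(header, "destination").text = destination
    body = ET.SubElement(root, request_type)
    # The envelope does not interpret the payload, so any flat record fits.
    for key, value in (payload or {}).items():
        ET.SubElement(body, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


def parse_request(document):
    """Return (request type, payload dict) from a request document."""
    root = ET.fromstring(document)
    request_type = root.findtext("header/type")
    body = root.find(request_type)
    return request_type, {child.tag: child.text for child in body}
```

A search for a genus and a search for a mineral name would use the same envelope; only the payload keys differ, which is one way to keep the protocol separate from the semantics of the data.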

Page 23: Distributed Databases and Applications Presentation

DiGIR 23

DiGIR Architecture

[Diagram: Application]

Page 24: Distributed Databases and Applications Presentation

DiGIR 24

DiGIR Architecture

[Diagram: Application; Protocol (request); Portal Engine]

Page 25: Distributed Databases and Applications Presentation

DiGIR 25

DiGIR Architecture

[Diagram: Application; Protocol (request); Portal Engine; Protocol (request); Provider]

Page 26: Distributed Databases and Applications Presentation

DiGIR 26

DiGIR Architecture

[Diagram: Application; Protocol (request); Portal Engine; Protocol (response); Provider]

Page 27: Distributed Databases and Applications Presentation

DiGIR 27

DiGIR Architecture

[Diagram: Application; Protocol (response); Portal Engine]

Page 28: Distributed Databases and Applications Presentation

DiGIR 28

Applications

Must be able to assemble and send a request document to a portal
Must be able to receive and interpret a response document from the portal
Must do something incredibly useful and interesting with the data

This is where the real fun is!
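
A toy application along these lines might look as follows. It is a sketch only: the XML element names are invented rather than taken from the DiGIR schema, and `fake_portal` is a stand-in that returns a canned response so the example runs without a network. The "useful" step here is deliberately small: counting records per provider.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def make_search_document(field, value):
    """Assemble a request document (illustrative element names)."""
    root = ET.Element("request")
    search = ET.SubElement(root, "search")
    ET.SubElement(search, field).text = value
    return ET.tostring(root, encoding="unicode")


def fake_portal(request_document):
    """Stand-in for a portal; returns a canned response for any valid request."""
    ET.fromstring(request_document)  # a real portal would parse and route this
    return ("<response>"
            "<record provider='a'><genus>Manis</genus></record>"
            "<record provider='b'><genus>Manis</genus></record>"
            "<record provider='b'><genus>Manis</genus></record>"
            "</response>")


def records_per_provider(send, field, value):
    """Send a request through `send`, then summarize the response by provider."""
    response = ET.fromstring(send(make_search_document(field, value)))
    return Counter(record.get("provider") for record in response.iter("record"))
```

Passing the transport in as the `send` argument keeps the application logic independent of how the portal is reached, so the same function could be pointed at a real HTTP client instead of `fake_portal`.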

Page 29: Distributed Databases and Applications Presentation

DiGIR 29

Hot topics – Interesting problems

Persistent unique identifiers
Web services (converters, translators, calculators, transformation services)
Data validation (taxonomic and geographic thesauri, ecological niche modeling, expedition analysis, outlier detection, …)
Spatial query interfaces
Data quality feedback mechanisms
Automated georeferencing, event gazetteers
Concept libraries, schema extensions, schema libraries, federated ontologies

Page 30: Distributed Databases and Applications Presentation

DiGIR 30

MaNIS – The Mammal Networked Information System

It’s more than just a pangolin…

Page 31: Distributed Databases and Applications Presentation

DiGIR 31

MaNIS Network Configuration

[Diagram: five collection databases (LACM MS Access, MVZ Sybase, LSUMZ 4D-Mac, UAM Oracle, CAS SQL Server), three online MS Access databases, five DiGIR providers, three MaNIS DiGIR portals, and three presentation layers (MVZ-MaNIS, UMNH-MaNIS, UWBM-MaNIS).]

Page 50: Distributed Databases and Applications Presentation

DiGIR 50

MaNIS Network Configuration

[Diagram: five collection databases (LACM MS Access, MVZ Sybase, LSUMZ 4D-Mac, UAM Oracle, CAS SQL Server), three online MS Access databases, five DiGIR providers, three MaNIS DiGIR portals, and three presentation layers (MVZ-MaNIS, UMNH-MaNIS, UWBM-MaNIS), followed by “…” markers.]

Page 51: Distributed Databases and Applications Presentation

DiGIR 51

“CalNet” Network Configuration

[Diagram: five collection databases (LACM MS Access, MVZ Sybase, LSUMZ 4D-Mac, UAM Oracle, CAS SQL Server), three online MS Access databases, five DiGIR providers, and one CalNet DiGIR portal.]

Page 52: Distributed Databases and Applications Presentation

DiGIR 52

Event Gazetteer Network Configuration

[Diagram: five collection databases (LACM MS Access, MVZ Sybase, LSUMZ 4D-Mac, UAM Oracle, CAS SQL Server), three online MS Access databases, five DiGIR providers, one CalNet DiGIR portal, and the BioGeomancer web service.]

Page 53: Distributed Databases and Applications Presentation

DiGIR 53

NBII Network Configuration

[Diagram: five collection databases (LACM MS Access, MVZ Sybase, LSUMZ 4D-Mac, UAM Oracle, CAS SQL Server), three online MS Access databases, five DiGIR providers, and one NBII DiGIR portal.]

Page 54: Distributed Databases and Applications Presentation

DiGIR 54

GBIF Network Configuration

[Diagram: five collection databases (LACM MS Access, MVZ Sybase, LSUMZ 4D-Mac, UAM Oracle, CAS SQL Server), three online MS Access databases, five DiGIR providers, one NBII DiGIR portal, and the GBIF presentation layers.]

Page 55: Distributed Databases and Applications Presentation

DiGIR 55

Intra-Network Configuration (BNHM)

[Diagram: working databases for PHMA, UCBG, UCJEPS, UCMP (4 databases), and Essig; five online databases; one DiGIR provider; one BNHM DiGIR portal; and the BNHM presentation layer.]

Page 56: Distributed Databases and Applications Presentation

DiGIR 56

Other Network Configurations

[Diagram: generic arrangements of working databases (five), online databases (four), DiGIR providers (four), and DiGIR portals (three).]

Page 62: Distributed Databases and Applications Presentation

DiGIR 62

Project Information

DiGIR is a collaborative open source development project on SourceForge (https://sourceforge.net/projects/digir).

Software and documentation are available on the DiGIR web site (http://digir.net).

MaNIS is an international network collaboration among mammal specimen collections (http://elib.cs.berkeley.edu/manis).
