Distributed Software Systems: Cyberinfrastructure and Geoinformatics Chaitan Baru
![Page 1: 1 Distributed Software Systems: Cyberinfrastructure and Geoinformatics Chaitan Baru San Diego Supercomputer Center.](https://reader034.fdocuments.in/reader034/viewer/2022051621/56649e0d5503460f94af6d30/html5/thumbnails/1.jpg)
1
Distributed Software Systems: Cyberinfrastructure and
Geoinformatics
Chaitan Baru
San Diego Supercomputer Center
![Page 2: 1 Distributed Software Systems: Cyberinfrastructure and Geoinformatics Chaitan Baru San Diego Supercomputer Center.](https://reader034.fdocuments.in/reader034/viewer/2022051621/56649e0d5503460f94af6d30/html5/thumbnails/2.jpg)
2
Integrated Cyberinfrastructure System
Source: Dr. Deborah Crawford, Chair, NSF CI Working Committee
• Applications: Geosciences, Environmental Sciences, Neurosciences, High Energy Physics, …
• Domain-specific Cybertools (software)
• Shared Cybertools (software)
• Development Tools & Libraries
• Middleware Services
• Distributed Resources (computation, storage, communication, etc.)
• Hardware
• Education and Training
• Discovery & Innovation
![Page 3: 1 Distributed Software Systems: Cyberinfrastructure and Geoinformatics Chaitan Baru San Diego Supercomputer Center.](https://reader034.fdocuments.in/reader034/viewer/2022051621/56649e0d5503460f94af6d30/html5/thumbnails/3.jpg)
3
Community Cyberinfrastructure Projects
Adapted from: Prof. Mark Ellisman, UC San Diego
• Science Domains: Biomedical Informatics (BIRN), High Energy Physics (GriPhyN), Geosciences (GEON), Ecological Observatories (NEON), Earthquake Engineering (NEES), Ocean Observing (ORION)
• Your Specific Tools & User Apps.
• Shared Tools
• Friendly Work-Facilitating Portals: Authentication - Authorization - Auditing - Workflows - Visualization - Analysis
• Development Tools & Libraries
• Middleware Services
• Distributed Computing, Instruments and Data Resources
• Hardware
![Page 4: 1 Distributed Software Systems: Cyberinfrastructure and Geoinformatics Chaitan Baru San Diego Supercomputer Center.](https://reader034.fdocuments.in/reader034/viewer/2022051621/56649e0d5503460f94af6d30/html5/thumbnails/4.jpg)
4
Data, Tools, & Computation
• Data
– Field observations
– Laboratory analyses
– Sensor-based data (land, airborne, satellite)
• Tools
– QA/QC, simple transformations and analyses
– Complex models
• Computation
– Community codes
– Access to high-performance computing
– Data-intensive computing
![Page 5: 1 Distributed Software Systems: Cyberinfrastructure and Geoinformatics Chaitan Baru San Diego Supercomputer Center.](https://reader034.fdocuments.in/reader034/viewer/2022051621/56649e0d5503460f94af6d30/html5/thumbnails/5.jpg)
5
Variety of Geoinformatics Efforts
• Data collection
– Digital data collection in the field
– When does it become “cyberinfrastructure”?
• Database curation
– E.g. EarthChem, Paleobiology, MorphoBank, Paleo Pollen, etc.
– When does it become “tools” and “community codes”?
• Software Development
– Tools: gravity and magnetics, paleogeography, geochemistry, seismic data products, …
– Community codes: SCEC-CME, CIG, …
![Page 6: 1 Distributed Software Systems: Cyberinfrastructure and Geoinformatics Chaitan Baru San Diego Supercomputer Center.](https://reader034.fdocuments.in/reader034/viewer/2022051621/56649e0d5503460f94af6d30/html5/thumbnails/6.jpg)
6
Variety of Geoinformatics Efforts
• High Performance Computing
– LiDAR data management
– Seismic analyses
– Petascale initiative
• Data Integration
– E.g. CUAHSI HIS
– Also a pressing need in projects like EarthScope
![Page 7: 1 Distributed Software Systems: Cyberinfrastructure and Geoinformatics Chaitan Baru San Diego Supercomputer Center.](https://reader034.fdocuments.in/reader034/viewer/2022051621/56649e0d5503460f94af6d30/html5/thumbnails/7.jpg)
7
Cyberinfrastructure
To provide access to all of these “resources” and support “interoperability” among them
Cyberinfrastructure: The Common Platform Across Distributed Projects
Data Collection
Data Management and Curation
Tool Development
Modeling and Integration
![Page 8: 1 Distributed Software Systems: Cyberinfrastructure and Geoinformatics Chaitan Baru San Diego Supercomputer Center.](https://reader034.fdocuments.in/reader034/viewer/2022051621/56649e0d5503460f94af6d30/html5/thumbnails/8.jpg)
8
Example: USArray Data Flow
• Deploy field sensor arrays
– Across US
• Collect data from sensor arrays and perform QA/QC
– One of the sites is SIO, San Diego
• Archive data for community access
– IRIS, Seattle
EarthScope/USArray: Single project, multiple participants.
![Page 9: 1 Distributed Software Systems: Cyberinfrastructure and Geoinformatics Chaitan Baru San Diego Supercomputer Center.](https://reader034.fdocuments.in/reader034/viewer/2022051621/56649e0d5503460f94af6d30/html5/thumbnails/9.jpg)
9
Example: LiDAR Workflow
Courtesy: Chris Crosby, ASU
• Survey (image: D. Harding, NASA)
• Point Cloud: x, y, z, …
• Interpolate / Grid
• Analyze / “Do Science”
Single goal: Multiple projects, multiple participants, e.g. NCALM, GEON, ASU, NASA, USGS, …
![Page 10: 1 Distributed Software Systems: Cyberinfrastructure and Geoinformatics Chaitan Baru San Diego Supercomputer Center.](https://reader034.fdocuments.in/reader034/viewer/2022051621/56649e0d5503460f94af6d30/html5/thumbnails/10.jpg)
10
GEON Cyberinfrastructure
• Funded by NSF IT Research program
• Multi-institution collaboration between IT and Earth Science researchers
• GEON Cyberinfrastructure provides:
– Authenticated access to data and Web services
– Registration of data sets, tools, and services with metadata
– Search for data, tools, and services, using ontologies
– Scientific workflow environment and access to HPC
– Data and map integration capability
– Scientific data visualization and GIS mapping
![Page 11: 1 Distributed Software Systems: Cyberinfrastructure and Geoinformatics Chaitan Baru San Diego Supercomputer Center.](https://reader034.fdocuments.in/reader034/viewer/2022051621/56649e0d5503460f94af6d30/html5/thumbnails/11.jpg)
11
Key Informatics Areas
• Portals
– Authenticated, role-based access to cyber resources: data, tools, models, model outputs, collaboration spaces, …
• Data Integration
– Search, discovery and integration of data from heterogeneous information sources (“mediation” and “semantic integration”)
• Use of workflow systems, and access to HPC
– Ability to “program” at a higher level of abstraction
– Sharing of models, along with “provenance” information
– Gateways to HPC environments
• Management of Geospatial Information
– Using GIS capabilities, map services, geospatial data integration
• Visualization of 3D, 4D geospatial data and information
![Page 12: 1 Distributed Software Systems: Cyberinfrastructure and Geoinformatics Chaitan Baru San Diego Supercomputer Center.](https://reader034.fdocuments.in/reader034/viewer/2022051621/56649e0d5503460f94af6d30/html5/thumbnails/12.jpg)
12
Distributed System Definition
• A Distributed System is
– one in which the hardware and software components in networked computers communicate and coordinate their activities only by passing messages, e.g. the Internet
• A Distributed Database System is
– one in which data is stored at several sites, each managed by a database system (DBMS) that can run independently
![Page 13: 1 Distributed Software Systems: Cyberinfrastructure and Geoinformatics Chaitan Baru San Diego Supercomputer Center.](https://reader034.fdocuments.in/reader034/viewer/2022051621/56649e0d5503460f94af6d30/html5/thumbnails/13.jpg)
13
Distributed System Models
• Client – Server
– Clients (Client A, Client B, Client C) send an invocation over the network to Server 1, which returns a response
• Peer to Peer
– Processes (Process 1, Process 2, Process 3) communicate with one another over the network
![Page 14: 1 Distributed Software Systems: Cyberinfrastructure and Geoinformatics Chaitan Baru San Diego Supercomputer Center.](https://reader034.fdocuments.in/reader034/viewer/2022051621/56649e0d5503460f94af6d30/html5/thumbnails/14.jpg)
14
Remote Service Invocation
• TCP/IP
– Basic Internet protocol for computer communications
– Platform for building a number of other open or proprietary, “higher-level” communications protocols
• Communication at a higher level of abstraction
• HTTP
– Open protocol based on TCP/IP for the Web
– Fixed set of “verbs” (actions) used to transfer HTML documents
• CORBA, Java RMI
– Protocols based on an object model
![Page 15: 1 Distributed Software Systems: Cyberinfrastructure and Geoinformatics Chaitan Baru San Diego Supercomputer Center.](https://reader034.fdocuments.in/reader034/viewer/2022051621/56649e0d5503460f94af6d30/html5/thumbnails/15.jpg)
15
SDSC Storage Resource Broker (SRB): “Virtualizing” storage
http://www.sdsc.edu/srb
• Storage resources behind SRB
– Archives: HPSS, ADSM, UniTree, DMF
– Databases: DB2, Oracle, Sybase
– File Systems: Unix, NT, Mac OSX
• MCAT metadata: Dublin Core; Resource, Mthd, User; User Defined; Application Meta-data
• Other components: Remote Proxies, DataCutter, Metadata Extraction
• Client interfaces: C, C++, Linux I/O, Unix Shell, Java, NT Browsers, Web, Prolog Predicate
![Page 16: 1 Distributed Software Systems: Cyberinfrastructure and Geoinformatics Chaitan Baru San Diego Supercomputer Center.](https://reader034.fdocuments.in/reader034/viewer/2022051621/56649e0d5503460f94af6d30/html5/thumbnails/16.jpg)
16
SRB Client/Server Model
• An SRB Client connects over the network to an SRB Server
• SRB Servers (e.g. SRB Server B) communicate with each other using the SRB peer-to-peer protocol
• SRB Servers reach the underlying storage systems over the network (Oracle Client to Oracle Server, HPSS Client to HPSS server)
• Data are requested using an SRB ID and a “file abstraction” (open, close, read, write)
![Page 17: 1 Distributed Software Systems: Cyberinfrastructure and Geoinformatics Chaitan Baru San Diego Supercomputer Center.](https://reader034.fdocuments.in/reader034/viewer/2022051621/56649e0d5503460f94af6d30/html5/thumbnails/17.jpg)
17
OpenDAP
• Client/Server model
– OpenDAP Clients connect over the network to OpenDAP Servers
![Page 18: 1 Distributed Software Systems: Cyberinfrastructure and Geoinformatics Chaitan Baru San Diego Supercomputer Center.](https://reader034.fdocuments.in/reader034/viewer/2022051621/56649e0d5503460f94af6d30/html5/thumbnails/18.jpg)
18
OpenDAP
From: Peter Cornillon & Jim Gallagher, http://www.opendap.org/support/stennis_tutorial.html
• Servers (one per data format): netCDF, HDF4, Matlab, DSP, JGOFS, FreeForm, JDBC, FITS, CDF, CEDAR, CODAR, ESML, General
– Underlying data include Tables, SQL, Flat Binary
• Clients: netCDF C, netCDF Java, Matlab Client, IDL Client
– Applications: IDV, Ferret, GrADS, VisAD, ncBrowse, Matlab, Excel, IDL, Access
![Page 19: 1 Distributed Software Systems: Cyberinfrastructure and Geoinformatics Chaitan Baru San Diego Supercomputer Center.](https://reader034.fdocuments.in/reader034/viewer/2022051621/56649e0d5503460f94af6d30/html5/thumbnails/19.jpg)
19
OpenDAP Data Request
• Data are requested with a URL:
http://www.cdc.noaa.gov/cgi-bin/nph-nc/datasets/Reynolds_sst?sst[10:10][0:90][0:180]
• URL components: Protocol (http), Machine name (www.cdc.noaa.gov), OPeNDAP server (cgi-bin/nph-nc), Directory (datasets), File name (Reynolds_sst), Constraint (?sst[10:10][0:90][0:180])
• User can impose a constraint on the data to be acquired from a data set by appending a constraint expression to the end of the URL
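The request URL on this slide can be assembled and taken apart programmatically. The following is a minimal sketch, not part of the original slides: the helper names `build_opendap_url` and `split_opendap_url` are hypothetical, and no network request is made.

```python
from urllib.parse import urlsplit


def build_opendap_url(dataset_url, variable, index_ranges):
    """Append an array-subset constraint expression to a dataset URL.

    index_ranges is a list of (start, stop) index pairs, one per dimension.
    """
    hyperslab = "".join(f"[{start}:{stop}]" for start, stop in index_ranges)
    return f"{dataset_url}?{variable}{hyperslab}"


def split_opendap_url(url):
    """Split a request URL into the parts named on the slide."""
    parts = urlsplit(url)
    directory, _, file_name = parts.path.rpartition("/")
    return {
        "protocol": parts.scheme,
        "machine": parts.netloc,
        "server_and_directory": directory,
        "file_name": file_name,
        "constraint": parts.query,
    }


# The dataset URL and constraint are the ones shown on the slide.
url = build_opendap_url(
    "http://www.cdc.noaa.gov/cgi-bin/nph-nc/datasets/Reynolds_sst",
    "sst",
    [(10, 10), (0, 90), (0, 180)],
)
components = split_opendap_url(url)
```

Here `url` reproduces the slide's request, and `components["constraint"]` holds the expression that restricts the `sst` array to the requested index ranges.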
![Page 20: 1 Distributed Software Systems: Cyberinfrastructure and Geoinformatics Chaitan Baru San Diego Supercomputer Center.](https://reader034.fdocuments.in/reader034/viewer/2022051621/56649e0d5503460f94af6d30/html5/thumbnails/20.jpg)
20
Remote Service Invocation with Web Services
• A Web Service is a simple protocol for invoking remote services on the Web. It is:
– A network “endpoint”, i.e. server, that implements one or more “ports”.
• Each port is defined by the message types that it accepts and the messages it returns.
– Specified by a “Web Services Description Language” (WSDL) XML document.
• Given the WSDL for a web service you know all you need to interact with it.
• Web Service standards also exist for security, policy, reliability, addressing, notification, choreography and workflow.
– It is the basis for MS .NET, IBM WebSphere, Sun, Oracle, BEA, HP, …
– It is the basis for the new Grid standards like WSRF and OGSA.
![Page 21: 1 Distributed Software Systems: Cyberinfrastructure and Geoinformatics Chaitan Baru San Diego Supercomputer Center.](https://reader034.fdocuments.in/reader034/viewer/2022051621/56649e0d5503460f94af6d30/html5/thumbnails/21.jpg)
21
Web Site vs Web Service
From: “Building Grid Applications and Portals, An Approach Based on Components, Web Services and Workflow Tools,” Gannon et al, Euro-Par 2004
• Web Site
– Designed to pass HTTP get/post/put requests between a browser and a web server.
– Google has a web site.
• Web Service
– Designed for services to talk to other services by exchanging XML messages
– Google also provides a web service so Google may be used in distributed apps
Diagram: a client’s browser talks to a web server; web services talk to other web services.
![Page 22: 1 Distributed Software Systems: Cyberinfrastructure and Geoinformatics Chaitan Baru San Diego Supercomputer Center.](https://reader034.fdocuments.in/reader034/viewer/2022051621/56649e0d5503460f94af6d30/html5/thumbnails/22.jpg)
22
Grid Services
From: “Building Grid Applications and Portals, An Approach Based on Components, Web Services and Workflow Tools,” Gannon et al, Euro-Par 2004
• Grid: A distributed, heterogeneous set of resources
– Integrated by a pervasive layer of services
– Goal: allow users to view it as a single system
• More than the Internet (which forms part of the resource layer)
• Builds on the Web by building on web services
Layers shown in the diagram:
• Open Grid Service Architecture Layer: Security, Data Management Service, Accounting Service, Logging, Event Service, Policy, Administration & Monitoring, Grid Orchestration, Registries and Name binding, Reservations and Scheduling
• Web Services Resource Framework – Web Services Notification
• Physical Resource Layer
![Page 23: 1 Distributed Software Systems: Cyberinfrastructure and Geoinformatics Chaitan Baru San Diego Supercomputer Center.](https://reader034.fdocuments.in/reader034/viewer/2022051621/56649e0d5503460f94af6d30/html5/thumbnails/23.jpg)
23
Access Interfaces and Levels of Access
• Web service, native application program interface, ODBC/JDBC, filesystem
– filesystem: mount remote filesystems
– DBMS: expose ODBC/JDBC interface (and full SQL)
– Web Server “stack”: URLs and HTTP
– SOAP server stack: WSDL and SOAP
– Application Program: can also be “wrapped” as a Web Service
– SRB, OpenDAP, etc.
![Page 24: 1 Distributed Software Systems: Cyberinfrastructure and Geoinformatics Chaitan Baru San Diego Supercomputer Center.](https://reader034.fdocuments.in/reader034/viewer/2022051621/56649e0d5503460f94af6d30/html5/thumbnails/24.jpg)
24
Authentication
• Client – Server models
– User works through Client A, which connects over the network to Server 1
– Client-side authentication
– Server-side authentication
– Server 2, Server 3: ? ?
![Page 25: 1 Distributed Software Systems: Cyberinfrastructure and Geoinformatics Chaitan Baru San Diego Supercomputer Center.](https://reader034.fdocuments.in/reader034/viewer/2022051621/56649e0d5503460f94af6d30/html5/thumbnails/25.jpg)
25
Common Authentication
• Client obtains credentials from the Certificate Authority
• Client invokes Server 1 with the credentials
• Credentials are verified
• Server 2, Server 3
![Page 26: 1 Distributed Software Systems: Cyberinfrastructure and Geoinformatics Chaitan Baru San Diego Supercomputer Center.](https://reader034.fdocuments.in/reader034/viewer/2022051621/56649e0d5503460f94af6d30/html5/thumbnails/26.jpg)
26
Grid Account Management Architecture (GAMA): Single sign-on in GEON (also used in a number of other projects)
Karan Bhatia, Kurt Mueller, Choonhan Youn, Sandeep Chandra
• Portal server 1 (servlet container): GridSphere, gridportlets, DB, Java keystore
• Portal server 2
• GAMA server (servlet container): gama, CACL, Myproxy, CAS, …, OGSA Grid services wrapper, DB, Java keystore
• Stand-alone applications
• Interactions with the GAMA server: create user, import user, retrieve credential
![Page 27: 1 Distributed Software Systems: Cyberinfrastructure and Geoinformatics Chaitan Baru San Diego Supercomputer Center.](https://reader034.fdocuments.in/reader034/viewer/2022051621/56649e0d5503460f94af6d30/html5/thumbnails/27.jpg)
27
Systems Issues
• Load Balancing, Failover, Replication
Client
Server 1
Server 2
Server 3
Multiple servers for load balancing, failover
Data replication
![Page 28: 1 Distributed Software Systems: Cyberinfrastructure and Geoinformatics Chaitan Baru San Diego Supercomputer Center.](https://reader034.fdocuments.in/reader034/viewer/2022051621/56649e0d5503460f94af6d30/html5/thumbnails/28.jpg)
28
Distributed Data Access
• What is the issue?
• Ability to access data stored in multiple, different databases using a single request, e.g.
– Get geologic information from multiple geologic databases
– Get employee information from all branches
• Ability to update data stored in multiple databases, e.g.
– Transfer salary amount from University to my bank account
– Transfer funds from Visa account to vendor’s account
![Page 29: 1 Distributed Software Systems: Cyberinfrastructure and Geoinformatics Chaitan Baru San Diego Supercomputer Center.](https://reader034.fdocuments.in/reader034/viewer/2022051621/56649e0d5503460f94af6d30/html5/thumbnails/29.jpg)
29
Distributed data access
Client
Database 1, Database 2, Database 3
• Homogeneous: mySQL, mySQL, mySQL
• Heterogeneous: mySQL, Oracle, DB2
• Or: mySQL, Excel, ASCII flat file
• How about creating a “cached” local copy?
• Sources may be data repositories or metadata catalogs
![Page 30: 1 Distributed Software Systems: Cyberinfrastructure and Geoinformatics Chaitan Baru San Diego Supercomputer Center.](https://reader034.fdocuments.in/reader034/viewer/2022051621/56649e0d5503460f94af6d30/html5/thumbnails/30.jpg)
30
Data Warehousing
Client
Data Warehouse (common schema)
Data Source 1, Data Source 2, Data Source 3
ETL (Extract, Transform, Load): one ETL process per data source
1. Load data from sources to warehouse
2. Query processing interaction only between client and warehouse
But warehouse data could be “stale”, i.e. out of sync with source data…
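The warehousing flow can be illustrated with a small in-memory example. This is a sketch under stated assumptions, not the deck's implementation: the two source record layouts, the `samples` table, and all function names are hypothetical, and SQLite stands in for the warehouse.

```python
import sqlite3

# Hypothetical source records; each source uses its own field names.
SOURCE_1 = [{"SampleID": "S1-001", "Rock_Type": "granite"}]
SOURCE_2 = [{"id": "S2-001", "lithology": "Basalt"}]


def extract(source):
    """Extract: take a snapshot of the source records."""
    return list(source)


def transform(record, id_field, rock_field):
    """Transform: rename fields and normalise case to fit the common schema."""
    return (record[id_field], record[rock_field].lower())


def load(conn, rows):
    """Load: insert transformed rows into the warehouse table."""
    conn.executemany(
        "INSERT INTO samples (sample_id, rock_type) VALUES (?, ?)", rows
    )


def build_warehouse():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE samples (sample_id TEXT PRIMARY KEY, rock_type TEXT)"
    )
    load(conn, [transform(r, "SampleID", "Rock_Type") for r in extract(SOURCE_1)])
    load(conn, [transform(r, "id", "lithology") for r in extract(SOURCE_2)])
    conn.commit()
    return conn


warehouse = build_warehouse()

# Queries touch only the warehouse, never the sources.
rock_types = [
    row[0]
    for row in warehouse.execute(
        "SELECT DISTINCT rock_type FROM samples ORDER BY rock_type"
    )
]

# A later change at a source is invisible until the next ETL run ("stale" data).
SOURCE_2.append({"id": "S2-002", "lithology": "Shale"})
stale_count = warehouse.execute("SELECT COUNT(*) FROM samples").fetchone()[0]
```

After the source is updated, `stale_count` still reflects the earlier load, which is the staleness trade-off the slide points out.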
![Page 31: 1 Distributed Software Systems: Cyberinfrastructure and Geoinformatics Chaitan Baru San Diego Supercomputer Center.](https://reader034.fdocuments.in/reader034/viewer/2022051621/56649e0d5503460f94af6d30/html5/thumbnails/31.jpg)
31
Data integration via middleware
Client
Data integration Middleware (aka Mediator)
Database 1, Database 2, Database 3
1. Each client request goes to sources, via middleware
2. Result collected by middleware and returned to client
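The two numbered steps can be sketched as a mediator that fans a request out to its sources and merges the answers. Everything here is illustrative: the `Mediator` class, the `make_wrapper` helper, the field mappings, and the sample rows are assumptions, with in-memory lists standing in for the databases.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]
Wrapper = Callable[[str], List[Row]]


def make_wrapper(rows: List[Row], field_map: Dict[str, str]) -> Wrapper:
    """Build a wrapper that exports a source's rows under the mediator's field names."""

    def wrapper(rock_type: str) -> List[Row]:
        exported = [
            {field_map[key]: value for key, value in row.items() if key in field_map}
            for row in rows
        ]
        return [row for row in exported if row["rock_type"] == rock_type]

    return wrapper


class Mediator:
    """Sends each client request to every source and collects the results."""

    def __init__(self, wrappers: Dict[str, Wrapper]):
        self.wrappers = wrappers

    def query(self, rock_type: str) -> List[Row]:
        results: List[Row] = []
        for source_name, wrapper in self.wrappers.items():
            for row in wrapper(rock_type):
                # Tag each row with its origin so the client can trace it.
                results.append({**row, "source": source_name})
        return results


# Hypothetical sources with different local field names.
db1 = [{"SampleID": "A1", "Rock_Type": "granite"},
       {"SampleID": "A2", "Rock_Type": "basalt"}]
db2 = [{"id": "B7", "lith": "granite"}]

mediator = Mediator({
    "db1": make_wrapper(db1, {"SampleID": "sample_id", "Rock_Type": "rock_type"}),
    "db2": make_wrapper(db2, {"id": "sample_id", "lith": "rock_type"}),
})
granite_rows = mediator.query("granite")
```

Unlike the warehouse approach, no data is copied ahead of time, so every query sees the current state of each source.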
![Page 32: 1 Distributed Software Systems: Cyberinfrastructure and Geoinformatics Chaitan Baru San Diego Supercomputer Center.](https://reader034.fdocuments.in/reader034/viewer/2022051621/56649e0d5503460f94af6d30/html5/thumbnails/32.jpg)
32
Warehousing vs Mediation
• Warehousing: Use ETL to “massage” local data to fit into a common, global warehouse schema
• Mediation: Modify user query to match schemas exported by each source
– But, which schema does the user query?
– The Integrated View Schema
– Sources “export” a view (the export schema)
• Federated databases
– Local sources belong to different “administrative domains”, i.e. different owners.
– Local autonomy
![Page 33: 1 Distributed Software Systems: Cyberinfrastructure and Geoinformatics Chaitan Baru San Diego Supercomputer Center.](https://reader034.fdocuments.in/reader034/viewer/2022051621/56649e0d5503460f94af6d30/html5/thumbnails/33.jpg)
33
The Canonical Mediator / Wrapper Architecture
• Client Application sends query Q1 to the Mediator
• Mediator (integrated view in mediator data model, e.g. relational, XML); may hold cached data
• Mediator sends sub-queries Q11, Q12, Q13, Q14 to the Wrappers, one per data source
• Each Wrapper maps between the export view (in mediator data model) and the local view (in local data model), e.g. Q14 becomes local query q14
• Data sources 1 to 4, each with its own local schema
• Wrapper processes could execute at sources, at mediator, or elsewhere
![Page 34: 1 Distributed Software Systems: Cyberinfrastructure and Geoinformatics Chaitan Baru San Diego Supercomputer Center.](https://reader034.fdocuments.in/reader034/viewer/2022051621/56649e0d5503460f94af6d30/html5/thumbnails/34.jpg)
34
Example: A Relational Mediator
Client Application
Mediator (Relational data model)
Wrapper: Relational DBMS, e.g. PostGIS
Wrapper: Shape file
![Page 35: 1 Distributed Software Systems: Cyberinfrastructure and Geoinformatics Chaitan Baru San Diego Supercomputer Center.](https://reader034.fdocuments.in/reader034/viewer/2022051621/56649e0d5503460f94af6d30/html5/thumbnails/35.jpg)
35
Example: A Shape-file Based Mediator
Client Application
Mediator (Shape file-based data model)
Wrapper: Relational DBMS, e.g. PostGIS
Wrapper: Shape file
![Page 36: 1 Distributed Software Systems: Cyberinfrastructure and Geoinformatics Chaitan Baru San Diego Supercomputer Center.](https://reader034.fdocuments.in/reader034/viewer/2022051621/56649e0d5503460f94af6d30/html5/thumbnails/36.jpg)
36
Example: An XML Mediator
User / Applications
Mediator (XML-based data model, e.g. GML)
Wrapper: Relational DBMS, e.g. PostGIS
Wrapper: Shape file
Wrapper: XML file, e.g. ArcXML
![Page 37: 1 Distributed Software Systems: Cyberinfrastructure and Geoinformatics Chaitan Baru San Diego Supercomputer Center.](https://reader034.fdocuments.in/reader034/viewer/2022051621/56649e0d5503460f94af6d30/html5/thumbnails/37.jpg)
37
User Authentication and Access Control
Client Application, Mediator, Wrappers, Data source 1, Data source 2
1. User authenticates to system
2. User connects to mediator (passes credentials to mediator)
3. Mediator connects to sources
a) Using original user credentials
b) Or, mapped credentials (role-based access)
4. Need to define users or roles in sources
How about using GAMA for authentication?
![Page 38: 1 Distributed Software Systems: Cyberinfrastructure and Geoinformatics Chaitan Baru San Diego Supercomputer Center.](https://reader034.fdocuments.in/reader034/viewer/2022051621/56649e0d5503460f94af6d30/html5/thumbnails/38.jpg)
38
Different types of heterogeneity in data integration
• Platform heterogeneity: different OS platforms
• DBMS heterogeneity: different database systems, e.g. SQLServer, mySQL, DB2
• Data type heterogeneity
• Schema heterogeneity
• Heterogeneity in units, accuracy, resolution
• Semantic heterogeneity
![Page 39: 1 Distributed Software Systems: Cyberinfrastructure and Geoinformatics Chaitan Baru San Diego Supercomputer Center.](https://reader034.fdocuments.in/reader034/viewer/2022051621/56649e0d5503460f94af6d30/html5/thumbnails/39.jpg)
39
Schema Integration
• A long-standing Computer Science problem
• Simple case
– Source 1 Table: (Sample ID varchar, Rock type varchar, Age int, …)
– Source 2 Table: (Sample ID varchar, Rock type varchar, Age varchar, …)
– Mediator View: (SampleID varchar, Rock_Type varchar, Age int)
– In Source2 Table, map Age to int
– Wrapper: convert between int and varchar for Age
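The wrapper's job in this simple case is a type conversion on one column. A minimal sketch follows, assuming Source 2 stores Age as text; the `MediatorRow` type and the function names are hypothetical, and mapping unparseable text to `None` is a design choice made for this example only.

```python
from typing import NamedTuple, Optional


class MediatorRow(NamedTuple):
    """One row of the mediator view: (SampleID varchar, Rock_Type varchar, Age int)."""
    sample_id: str
    rock_type: str
    age: Optional[int]


def wrap_source2_row(sample_id: str, rock_type: str, age_text: str) -> MediatorRow:
    """Convert a Source 2 row (Age stored as varchar) into the mediator view."""
    try:
        age = int(age_text.strip())
    except ValueError:
        # Assumption: text that is not a plain integer becomes a missing value.
        age = None
    return MediatorRow(sample_id, rock_type, age)


def unwrap_age(age: Optional[int]) -> str:
    """Convert a mediator Age back to the varchar form used by Source 2."""
    return "" if age is None else str(age)


row = wrap_source2_row("S2-1", "granite", " 150 ")
```

The reverse direction (`unwrap_age`) matters when the mediator pushes a predicate on Age down to Source 2, since the comparison value must match the source's column type.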
![Page 40: 1 Distributed Software Systems: Cyberinfrastructure and Geoinformatics Chaitan Baru San Diego Supercomputer Center.](https://reader034.fdocuments.in/reader034/viewer/2022051621/56649e0d5503460f94af6d30/html5/thumbnails/40.jpg)
40
Another integration scenario
– Mediator View: (SampleID varchar, Rock_Type varchar, Age varchar, Era varchar, Period varchar)
– In Source 2 Table, parse Age to obtain sub-components of the field
– Source 1 Table: (Sample ID varchar, Rock type varchar, Eon varchar, Era varchar, Period varchar), e.g. Eon = Phanerozoic, Era = Mesozoic, Period = Jurassic
– Source 2 Table: (Sample ID varchar, Rock type varchar, Age varchar), e.g. Age = “Phanerozoic/mesozoic;jur”
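Parsing the composite Age field might look like the sketch below. The delimiter convention (eon, then "/", era, then ";", period) is inferred from the single example value on the slide, and the abbreviation table `PERIOD_NAMES` is a hypothetical lookup; a real source would need its own rules.

```python
import re

# Hypothetical abbreviation table; only "jur" appears on the slide.
PERIOD_NAMES = {"tri": "Triassic", "jur": "Jurassic", "cret": "Cretaceous"}


def parse_age(age_text: str) -> dict:
    """Split a Source 2 Age string into Eon, Era and Period components."""
    parts = [part.strip() for part in re.split(r"[/;]", age_text)]
    parts += [""] * (3 - len(parts))  # pad when components are missing
    eon, era, period = parts[:3]
    return {
        "Eon": eon.capitalize(),
        "Era": era.capitalize(),
        "Period": PERIOD_NAMES.get(period.lower(), period.capitalize()),
    }


parsed = parse_age("Phanerozoic/mesozoic;jur")
```

For the slide's example value, `parsed` carries the same three components that Source 1 stores in separate columns.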
![Page 41: 1 Distributed Software Systems: Cyberinfrastructure and Geoinformatics Chaitan Baru San Diego Supercomputer Center.](https://reader034.fdocuments.in/reader034/viewer/2022051621/56649e0d5503460f94af6d30/html5/thumbnails/41.jpg)
41
A more advanced integration scenario
• Mediator View: (SampleID varchar, Rock_Type varchar, Eon varchar, Era varchar, Period varchar)
– Same as Source1 table schema
• Query: Get rock types for all rocks from the Jurassic period
• Source 1 Table: (Sample ID varchar, Rock type varchar, Eon varchar, Era varchar, Period varchar), e.g. Phanerozoic, Mesozoic, Jurassic
• Source 2 Table: (Sample ID varchar, Rock type varchar, Age int), e.g. Age = 150
![Page 42: 1 Distributed Software Systems: Cyberinfrastructure and Geoinformatics Chaitan Baru San Diego Supercomputer Center.](https://reader034.fdocuments.in/reader034/viewer/2022051621/56649e0d5503460f94af6d30/html5/thumbnails/42.jpg)
42
Doing the integration
• Query sent to mediator:
SELECT DISTINCT(Rock_Type) FROM Mediator_View WHERE Period='Jurassic'
• Query to Source 1:
SELECT DISTINCT(Rock_Type) FROM Source1_Table WHERE Period='Jurassic'
• For Source2, need to map Period='Jurassic' to Age values
– Source 2 Table: (Sample ID varchar, Rock type varchar, Age int)
– Geologic_Time Table: (Eon varchar, Era varchar, Period varchar, Min int, Max int)
![Page 43: 1 Distributed Software Systems: Cyberinfrastructure and Geoinformatics Chaitan Baru San Diego Supercomputer Center.](https://reader034.fdocuments.in/reader034/viewer/2022051621/56649e0d5503460f94af6d30/html5/thumbnails/43.jpg)
43
Query “fragment” sent to Source 2
• SELECT DISTINCT (S2.Rock_Type)
FROM
Source2_Table S2,
Geologic_Time_Table GT
WHERE
GT.Period = 'Jurassic' AND
(S2.Age >= GT.Min) AND
(S2.Age <= GT.Max)
Where is the Geologic_Time table stored?
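To see the query fragment run end to end, the sketch below loads both tables into one in-memory SQLite database, which sidesteps the slide's question of where the geologic time table lives. The sample rows are invented, Age is assumed to be in millions of years, and the period boundaries are rounded, illustrative values rather than an authoritative time scale.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "Min" and "Max" are double-quoted because they are also SQL function names.
conn.executescript("""
CREATE TABLE Source2_Table (SampleID TEXT, Rock_Type TEXT, Age INTEGER);
CREATE TABLE Geologic_Time_Table (
    Eon TEXT, Era TEXT, Period TEXT, "Min" INTEGER, "Max" INTEGER
);
INSERT INTO Source2_Table VALUES
    ('S2-1', 'sandstone', 150),
    ('S2-2', 'shale', 160),
    ('S2-3', 'sandstone', 180),
    ('S2-4', 'granite', 90);
INSERT INTO Geologic_Time_Table VALUES
    ('Phanerozoic', 'Mesozoic', 'Triassic', 201, 252),
    ('Phanerozoic', 'Mesozoic', 'Jurassic', 145, 201),
    ('Phanerozoic', 'Mesozoic', 'Cretaceous', 66, 145);
""")

# Same join as the slide's fragment, with ORDER BY added for a stable result.
query = """
SELECT DISTINCT S2.Rock_Type
FROM Source2_Table S2, Geologic_Time_Table GT
WHERE GT.Period = ?
  AND S2.Age >= GT."Min"
  AND S2.Age <= GT."Max"
ORDER BY S2.Rock_Type
"""
jurassic_rock_types = [row[0] for row in conn.execute(query, ("Jurassic",))]
```

If the time table is kept at the mediator instead of at the source, the mediator would first look up the Min and Max values and then send Source 2 a plain range predicate on Age.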
![Page 44: 1 Distributed Software Systems: Cyberinfrastructure and Geoinformatics Chaitan Baru San Diego Supercomputer Center.](https://reader034.fdocuments.in/reader034/viewer/2022051621/56649e0d5503460f94af6d30/html5/thumbnails/44.jpg)
44
Data Integration Carts™
• Integrating data sets without explicitly creating views
• An example request: Plot all gravity data points that fall within the spatial extent of rocks of a given type, in the Rocky Mountain testbed region
– Use GEONsearch to find all gravity and geologic data using bounding box for “Rocky Mountain testbed region”
• Need gazetteer / spatial ontology to determine Rocky Mountain region
• Need to know classification of datasets (as gravity and geology)
• Intersect extent of gravity and geologic datasets (from metadata) with extent of Rocky Mountain region
– Plot gravity point data that fall within polygons of rocks of given type
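The two spatial steps in this request (intersecting dataset extents with the region, then keeping points that fall inside rock polygons) can be sketched in plain Python. The coordinates below are made up, the function names are hypothetical, and the point-in-polygon test is a standard ray-casting routine swapped in for whatever GIS engine a real system would use.

```python
def bbox_intersects(a, b):
    """Return True if two (min_x, min_y, max_x, max_y) boxes overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge cross the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical extents, as (min_lon, min_lat, max_lon, max_lat).
region = (-110.0, 35.0, -100.0, 45.0)
dataset_extents = {
    "gravity_A": (-105.0, 30.0, -95.0, 40.0),
    "gravity_B": (-90.0, 30.0, -80.0, 40.0),
}
candidates = sorted(
    name for name, extent in dataset_extents.items()
    if bbox_intersects(region, extent)
)

# Hypothetical rock polygon and gravity points.
rock_polygon = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
gravity_points = [(2.0, 2.0), (5.0, 2.0)]
points_to_plot = [p for p in gravity_points if point_in_polygon(p[0], p[1], rock_polygon)]
```

The bounding-box check is a cheap metadata-level filter; the polygon test is applied only to datasets that survive it.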
![Page 45: 1 Distributed Software Systems: Cyberinfrastructure and Geoinformatics Chaitan Baru San Diego Supercomputer Center.](https://reader034.fdocuments.in/reader034/viewer/2022051621/56649e0d5503460f94af6d30/html5/thumbnails/45.jpg)
45
Ad hoc integration
• GEONsearch: search the Metadata Catalog for “Geologic and gravity data in Rocky Mountains”
• Data Integration Cart™: Query
• Plot map: Map
![Page 46: 1 Distributed Software Systems: Cyberinfrastructure and Geoinformatics Chaitan Baru San Diego Supercomputer Center.](https://reader034.fdocuments.in/reader034/viewer/2022051621/56649e0d5503460f94af6d30/html5/thumbnails/46.jpg)
46
Data Registration
• Item Registration (Schema registration)
– Gravity dataset: (X, Y), Metadata
– Geologic dataset: Lat, Long, RockType, Metadata
• Item Detail Registration
• Rock Classification Ontology: Igneous; Granite, Quartz monzonite
• Spatial Ontology: Location; Latitude, Longitude; Point, Polygon
![Page 47: 1 Distributed Software Systems: Cyberinfrastructure and Geoinformatics Chaitan Baru San Diego Supercomputer Center.](https://reader034.fdocuments.in/reader034/viewer/2022051621/56649e0d5503460f94af6d30/html5/thumbnails/47.jpg)
47
![Page 48: 1 Distributed Software Systems: Cyberinfrastructure and Geoinformatics Chaitan Baru San Diego Supercomputer Center.](https://reader034.fdocuments.in/reader034/viewer/2022051621/56649e0d5503460f94af6d30/html5/thumbnails/48.jpg)
48
Another complex query
• Query: Get rock types for all rocks from the Mesozoic era
– Easy to do for Source 1: Era = “Mesozoic”
– For Source 2:
• Need to find numeric age range for Mesozoic
– Find age range across all subclasses of Mesozoic (Cretaceous, Jurassic, Triassic)
• Select all Source 2 Table records whose Age value falls within the Mesozoic age range
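The Source 2 steps can be sketched as a walk over a small ontology. The subclass table, the rounded age ranges (assumed to be in millions of years), and the sample records are all illustrative assumptions.

```python
# Hypothetical ontology fragment: era -> periods that are its subclasses.
SUBCLASSES = {"Mesozoic": ["Triassic", "Jurassic", "Cretaceous"]}

# Approximate, rounded (min, max) ages for the leaf terms.
AGE_RANGE_MA = {
    "Triassic": (201, 252),
    "Jurassic": (145, 201),
    "Cretaceous": (66, 145),
}


def age_range(term):
    """Return the (min, max) age covered by a term, merging its subclasses."""
    if term in AGE_RANGE_MA:
        return AGE_RANGE_MA[term]
    ranges = [age_range(child) for child in SUBCLASSES.get(term, [])]
    if not ranges:
        raise KeyError(f"no age range known for {term!r}")
    return (min(low for low, _ in ranges), max(high for _, high in ranges))


def rock_types_for(records, term):
    """Select rock types of records whose Age falls within the term's range."""
    low, high = age_range(term)
    return sorted({rock for _, rock, age in records if low <= age <= high})


# Hypothetical Source 2 records: (Sample ID, Rock type, Age).
source2 = [
    ("S2-1", "sandstone", 150),
    ("S2-2", "granite", 90),
    ("S2-3", "basalt", 300),
    ("S2-4", "shale", 240),
]
mesozoic_range = age_range("Mesozoic")
mesozoic_rock_types = rock_types_for(source2, "Mesozoic")
```

Because the era's range is derived from its subclasses, the same code answers period-level queries such as the earlier Jurassic example without any change.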