CWIC Data Partner’s Guide May 10, 2017 -...

Post on 23-Jul-2020

9 views 0 download

Transcript of CWIC Data Partner’s Guide May 10, 2017 -...

CEOS CWIC Project

CWIC Data Partner’s Guide May 10, 2017

Document version 1.2

CWIC Data Partner’s Guide Version: 1.2

Page I

CWICImplementationTeamArchieWarnock,A/WWWEnterprises(warnock@awcubed.com)LiLin,GeorgeMasonUniversity(llin2@gmu.edu)EugeneG.Yu,GeorgeMasonUniversity(gyu@gmu.edu)

Approvals ApprovedBy Signature Date

Yonsook K. Enloe

RevisionHistoryDate Version Brief Description Author

1.0 CWIC Data Partner’s Guide Archie Warnock 12 March 2012 1.1 CWIC data partner guide Archie Warnock

Yuanzheng Shao 10 May 2017 1.2 Revised CWIC CSW data partner

guide Li Lin, Archie

Warnock, Eugene G. Yu, Lingjun Kang

CWIC Data Partner’s Guide Version: 1.2

Page 2

TableofContentsExecutiveSummary.........................................................................................................................4

RecommendationsforCWICDataPartners................................................................................4

1.BeforeYouBegin.........................................................................................................................5

1.1CWICBackground..................................................................................................................5

1.2CWICConceptandDesign.....................................................................................................5

1.3CWICArchitecture.................................................................................................................6

1.4CWICTermsandDefinitions..................................................................................................7

1.5CWICSystems........................................................................................................................8

2.CSWQueryInterface...................................................................................................................9

2.1Introduction...........................................................................................................................9

2.2GetRecordsOperation...........................................................................................................9

2.3GetRecordByIdOperation.....................................................................................................9

3.CWICMetadataModel..............................................................................................................10

3.1CSWCoreMetadataModel.................................................................................................10

3.2ISO19115-2MetadataModel.............................................................................................10

3.3WGISSSearchCriteria............................................................Error!Bookmarknotdefined.

4.PartnerGuidelines.....................................................................................................................11

4.1Metadata&SemanticMapping..........................................................................................11

4.1.1 HTTPaccess...........................................................................................................11

4.1.2 Spatialsearch........................................................................................................11

4.1.3 Temporalsearch....................................................................................................11

4.1.4 Keywordsearch........................................................Error!Bookmarknotdefined.

4.1.5 Othersearchparameters.........................................Error!Bookmarknotdefined.

4.1.6 UniquegranuleIDs................................................................................................12

CWIC Data Partner’s Guide Version: 1.2

Page 3

4.1.7 RequestforgranulebyID......................................................................................12

4.1.8 Recordcounts........................................................................................................12

4.1.9 Searchstatus&Errorresponses............................................................................12

4.2Interaction&ServicesModel..............................................................................................12

4.2.1 UniquegranuleID..................................................................................................12

4.2.2 Contactinformation..............................................................................................12

4.2.3 BrowseURL............................................................................................................13

4.2.4 OrderURL..............................................................................................................13

4.3ErrorHandling.....................................................................................................................13

CWIC Data Partner’s Guide Version: 1.2

Page 4

ExecutiveSummary

RecommendationsforCWICDataPartnersWhileCWICcanserveasaCSWproxyforinternet-accessibleinventorysearchsystemforDataPartners,thereareasmallnumberofrecommendationsforDataPartnersthatwillmakethejobvastlyeasier.

1. RegisterdatasetintheIDN2. ProvideasearchinterfaceaccessibleviaasimpleURL(i.e.HTTPGET),ideallyincluding

parametersforstartingrecordnumberandnumberofrecordsdesiredintheresponse3. Supportsearchingonspatialboundingbox4. Supportsearchingontemporalextent,atleastobservationstartandenddates5. Providesearchresponsesinwell-structuredtext(XML,JSON,etc.)returningmatching

datagranules6. Identifyeachreturneddatagranulebyanidentifierthatisuniquewithintheinventory

system7. Provideacapabilityforusingthegranuleidentifiertoretrievemetadataaboutthe

granule8. ReturnURLsforbrowsedataanddirectlinktogranule-leveldata(ortoadataordering

system)inthesearchresponse

Ingeneral,thesearecommonandwidelyimplementedcapabilitiesinalmostanygranulesearchsystemandshouldnotrepresentanimpedimenttojoiningCWICasaDataPartner.Ifanyofthesecapabilitiesarenotimplemented,itisstillpossibletobecomeaCWICDataPartner–contactanyoftheCWICteamfordetails.

CWIC Data Partner’s Guide Version: 1.2

Page 5

1.BeforeYouBegin

1.1CWICBackgroundForscientistswhoconductmulti-disciplinaryresearch,theremaybeaneedtosearchmultiplecatalogsinordertofindthedatatheyneed.Suchworkisverytime-consumingandtedious,especiallywhenthecatalogsmayusedifferentmetadatamodelsandcataloginterfaceprotocols.Itwouldbedesirable,therefore,forthosecatalogstobeintegratedintoacatalogfederation,whichwillpresentawell-knownanddocumentedmetadatamodelandinterfaceprotocoltousersandhidethecomplexityanddiversityoftheaffiliatedcatalogsbehindtheinterface.Withsuchafederation,usersonlyneedtoworkwiththefederatedcatalogthroughthepublicinterfaceorAPItofindthedatatheyneedinsteadofworkingwithvariouscatalogsindividually.

CommitteeonEarthObservationSatellite(CEOS)addressescoordinationofthesatelliteEarthObservation(EO)programsoftheworld'sgovernmentagencies,alongwithagenciesthatreceiveandprocessdataacquiredremotelyfromspace.WorkingGrouponInformationSystemsandServices(WGISS)isasubgroupofCEOS,whichaimstopromotecollaborationinthedevelopmentofsystemsandservicesthatmanageandsupplyEOdatatousersworld-wide.TorealizeafederatedcataloguefordatadiscoveryfrommultipleEOdatacenters,CEOSWGISSIntegratedcatalog(CWIC)wasimplemented.CWICwasexpectedtoprovideinventorysearchtoWGISSagencycatalogsystemsforEOdata.

1.2CWICConceptandDesignThemediator-wrapperarchitecturehasbeenwidelyadoptedtorealizetheintegratedaccesstoheterogeneous,autonomousdatasources.AsdepictedinFig.1,thedatasourcearchivesdataanddisseminateitthroughtheInternet.Thewrapperontopofthedatasourceprovidesauniversalqueryinterfacebyencapsulatingheterogeneousdatamodels,queryprotocols,andaccessmethods.Themediatorinteractswiththewrapperandprovidestheuserwithanintegratedaccessthroughtheglobalinformationschema.

Wrappersofferqueryinterfaceshidingtheparticulardatamodel,accesspath,andinterfacetechnologyofthepartnercatalogsystems.Wrappersareaccessedbyamediator,whichoffersusersafront-endintegratedaccessthroughitsglobalschema.Theuserposesqueriesagainsttheglobalschemaofthemediator;themediatorthendistributesthequerytotheindividualsystemsusingtheappropriatewrappers.Thewrapperstransformthequeriessotheyareunderstandableandexecutablebythepartnercatalogsystemstheywrap,collecttheresults,andreturnthemtothemediator.Finally,themediatorintegratestheresultsasauserresponse.

CWIC Data Partner’s Guide Version: 1.2

Page 6

Fig. 1 The Mediator-Wrapper Architecture

ThedataprovidersconnectedbyCWICincludeNASA,USGS,NOAAGHRSST,BrazilINPE,CanadaCCMEO,EUMETSATandIndiaISRO(MOSDACandNRSC).Additionally,theCWICconnectorconnectingAustraliaNCIandChinaAOEareunderdevelopmentinCWICdevelopmentserver.Fig.2illustratesthesystemarchitectureofCWIC.Differentwrapperswereimplementedfordifferentdataproviders.Thewrapperisresponsiblefortranslatinganddispatchingtherequesttodifferentdatainventories.

Fig. 2 The System Architecture of CWIC

1.3CWICArchitectureAtitscore,CWICpresentstoEndUsersandClientsastandards-basedCSWserver.ToDataPartners,itisappearstobeaweb-basedclient.Itconnectsthetwo(EndUsersandDataPartners)throughtheMediatoronthefrontend–servingastheCSWservertoendusersanda

CWIC Data Partner’s Guide Version: 1.2

Page 7

CSWclienttotheConnectors.TheConnectorsarecustom-writtenproxiesforthedatagranuleinventorysearchsystemsattheindividualDataPartners,acceptingCSWsearchrequestsfromtheMediator,translatingthemintovalidsearchrequestsforthetargetdataset,thenparsingtheresultsfromtheinventorysearchsystemandtranslatingthoseintoCSWsearchresponseswhicharepassedbacktotheMediator.

Inthisway,outsideclientsand,forthemostpart,theMediatoritselfneedtohavenospecificknowledgeoftheparticularpartnerdatasystemsandcommunicateonlyviaCSW.EachDataPartnerwillgenerallybeaccessedbyadedicatedConnectorcalledbytheMediator.TheConnectorhandlesallofthedetailsuniquetoindividualdatapartnerinventorysystemandallofthecommunicationswiththepartner’sinventorysystemismanagedexclusivelybytheconnector.

1.4CWICTermsandDefinitionsForthepurposesofthisdocument,thefollowingtermsanddefinitionsapply:

1) client

Asoftwarecomponentthatcaninvokeanoperationfromaserver

2) dataclearinghouse

Thecollectionofinstitutionsprovidingdigitaldata,whichcanbesearchedthroughasingleinterfaceusingacommonmetadatastandard

3) identifier

Acharacterstringthatmaybecomposedofnumbersandcharactersthatisexchangedbetweentheclientandtheserverwithrespecttoaspecificidentityofaresource

4) IDNdatasetID

UniquedatasetidentifierinIDN,returnedfromtheIDNinresponsetotheOSDDrequest.ThisidentifierisassignedbytheIDNCMRdatabase.

5) nativeID

DatasetidentifierusedbyCWICtoretrievegranulemetadatathroughdataproviderAPI.Thisidentifierisassignedbythedataprovider.

6) catalogID

Identifiersofdataprovidercatalogsorconnectionsservinggranulemetadata.

7) operation

Thespecificationofatransformationorquerythatanobjectmaybecalledtoexecute.

8) profile

Asetofoneormorebasestandardsand-whereapplicable-theidentificationofchosenclauses,classes,subsets,optionsandparametersofthosebasestandardsthatarenecessaryforaccomplishingaparticularfunction

CWIC Data Partner’s Guide Version: 1.2

Page 8

9) request

TheinvocationofanoperationbyaCWICclient

10) response

Theresultofanoperation,returnedfromCWICservertoCWICclient

11) collection

Agroupingofgranulesthatallcomefromthesamesource,suchasamodelinggrouporinstitution.Collectionshaveinformationthatiscommonacrossallthegranulesthey"own"andatemplatefordescribingadditionalattributesnotalreadypartofthemetadatamodel.

12) dataset

Hasthesamemeaningascollection,see(11)

13) granule

Thesmallestaggregationofdatathatcanbeindependentlymanaged(described,inventoried,andretrieved).Granuleshavetheirownmetadatamodelandsupportvaluesassociatedwiththeadditionalattributesdefinedbytheowningcollection.

14) IDN

TheCEOSInternationalDirectoryNetwork,aGatewaytotheworldofEarthSciencedataandservices

1.5CWICSystemsTherearetwooperationalCWICsystemstowhichend-usershaveaccess.

§ CWICOperations.ThisisthecurrentoperationalsystemforCWCIandisavailabletoallusers.Endpoint:http://cwic.wgiss.ceos.org/

§ CWICPartnerTest.ThisisatestsystemareausedbypartnersandCWICdeveloperstotestbeforechangestotheCWICsystemgooperational.Endpoint:http://cwictest.wgiss.ceos.org/

CWIC Data Partner’s Guide Version: 1.2

Page 9

2.CSWQueryInterface

2.1IntroductionTheCSWprotocolisacatalogservicesearchspecificationandisusedbyCWICtosearchandreturnmetadatarelatedtogranule-levelinventorydata.CSWisnotdesigned,norisitusedforreturningobservationaldatafromtheinventorysystems,althoughthemetadatareturnedmightincludelinksdirectlytodatagranulesortoadataorderingsystem.CWICisintendedtotaketheenduserasclosetoactualdataaspossiblewithintheconstraintsofthedatapartnerinventorysystemsandthelimitsoftheCSWprotocolitself.

TheCWICConnectorshavethetaskofreturningvalidresponsestoCSWGetRecordsandGetRecordByIdrequeststotheMediator.Thesearegeneratedon-the-flybysubmittingsearchrequeststotheDataPartnerinventorysystemfortherequesteddataset,retrievingtheresultsandtranslatingthemintosyntacticallyvalidandsemanticallymeaningfulCSWresponses.TheConnectorimplementerwillworkwiththeDataPartner’ssupportteamtodefinethemappingsbetweenquantitiescontainedintheinventorysystemresponseandtheassociatedelementsintheCSWresponses.

2.2GetRecordsOperationTheCSWGetRecordsoperationcanbeusedforgeospatialcatalogsearchesonthetargetsystemwithawiderangeofparameters.ThesearchparameterssupportedbyCWICincludedatasetidentifier,spatial(boundingbox)andtemporalsearch(start/enddateandtime).

GetRecordsrequestscanspecifyoneoftwotypesofresponses–“hits”or“results.”The“hits”requestreturnsonlyacountofresults,noresultsareactuallyreturned.Itturnsoutthatnotallinventorysystemscaneasilypredictthenumberofresponsestoaquerywithoutactuallyprocessingthequeryandbuildingthefullresultsetinordertocounttherecords.ThiscanbequitecostlyintermsofCPUusageandbandwidth,sotheCWICteamdiscouragestheuseofthisrequest.The“results”requestreturnsactualresults,butalsoincludesthetotalnumberofmatchingrecords,aswellasthestartingrecordnumberandcountoftherecordsreturned.

GetRecordsrequestsalsocanspecifytheresultsetortypeofresultsreturned,i.e.,howmuchinformationtoincludeintheresponseforeachrecord.

2.3GetRecordByIdOperationTheCSWGetRecordByIdrequestisintendedtoallowtheusertorequestasinglespecificrecordfromthetargetsystem,generallyasafollow-uptoabroaderGetRecordsrequest.Nosearchfilterisspecified–onlytheuniqueidentifierforthespecificrecordisrequired.TheresponseisidenticaltotheGetRecordsresponse,exceptthatonlyasinglerecordwillbereturned.

CWIC Data Partner’s Guide Version: 1.2

Page 10

3.CWICMetadataModel

3.1CSWCoreMetadataModelTheCSWCoremetadataisasmallsetofmetadataelements,essentiallytheDublinCoremetadata,intendedtoprovideaminimalsetofinteroperabilityforCSWserversandclients.Table1&2oftheCWICClientGuideprovidetheminimallistofsupportedsearchandresponseelements.ForCWICpurposes,thecoremetadataspecificationprovidesdefinitionsforgranuleidentifier,spatialandtemporalcomponentsaswellasthebasicrequiredelementsforCSWrequestsandresponses(i.e.,responsetype,elementset,attributesforresultsetpaging,etc.)andXMLrepresentationofthemodel.

3.2ISO19115-2MetadataModelTheISO19115part2metadataisamoreextensivesetofmetadataelementswithmorecompleteresponsemodels.ItistheprimarymetadataschemacurrentlysupportedbyCWIC.Table3oftheCWICClientGuideshowstheadditionalelements,inadditiontothoseintheCSWCore,availabletobereturnedbyCWICinsearchresponses.ManyofthesemayalreadybeincludedintheresponsesfromtheDataPartner’sinventorysearchsystem,althoughCWICwillomitanyoptionalelementswhichcannotbepopulatedfromtheinventoryresponseandwillreturnemptyelementsforanymandatoryelementswhichcannotbepopulatedunlessinformationisavailablefromsomeothersource(e.g.contactinformation).

CWIC Data Partner’s Guide Version: 1.2

Page 11

4.PartnerGuidelines

4.1Metadata&SemanticMapping

4.1.1 HTTPaccessAlthoughCWICwillattempttouseanymechanismavailableforconnectingtoDataPartners’datamanagementsystemsinordertoaccesstheavailableinventorysearch,thereareafewspecificswhichmaketheprocesssimplerandmorerobust.

TheuseofHTTPforaccessingtheinventorysearchengineisstronglypreferred.Thisiswidelyusedalready,aswebbrowsersarenearlyuniversalandprovideaneffectiveuserinterfaceforbothhumanandautomatedaccess.Whileotherprotocolsmaybeused(Z39.50,forexample),HTTPisthepreferredmechanismfortheCWICconnectors.

Similarlyforresults,CWICwillattempttoextracttherelevantresultsfromanyresponsesthepartnerdatasystemreturns.However,structuredtextofsomesort–XML,forexample–isstronglypreferred.Theabilitytoeasilyanddefinitivelyparseresultsmakestheprocessofmappingthemetadatareturnedinthesearchresponsesimplerandlesserror-prone.Otherstructuredformatslikecomma-delimitedtablesorJSONareacceptable.

4.1.2 SpatialsearchAllCWICdatapartnersareexpectedtosupportsomelevelofspatialsearchsincealloftheinventorydataareanticipatedtohaveaspatialcomponent.Simpleboundingbox,withtheboundingcoordinatesindividuallyidentifiedistheminimumrequired,althoughmorecomplexspatialfootprintgeometriesarepossibleinthefuture.

ItisdesirabletohavetheAPIalsosupportadynamiccalltoreturnthelimitsofthespatialsearch,althoughnotnecessary.ThepresenceofsuchaservicecanhelpCWICavoidinvalidorinappropriatesearchrequests,suchasthoseoutsidethespatialboundariesforspecificdatacollections.

4.1.3 TemporalsearchSimilartospatialsearchdescribedintheprevioussection,allCWICdatapartnersareexpectedtosupportsomeleveloftemporalsearchsincealloftheinventorydataisanticipatedtohaveatemporalcomponent.Simpletemporalextent,withthestartandendtimesindividuallyidentifiedistheminimumrequired,althoughmorecomplextemporalrelationsareanticipatedinthefuture.ItisbesttosupportsomeminimalsubsetoftheISO8601timespecificationforsyntax–YYYY-MM-DD,atleast.

ItisdesirabletohavetheAPIalsosupportadynamiccalltoreturnthelimitsofthetemporalextentsearch,althoughnotnecessary.ThepresenceofsuchaservicecanhelpCWICavoid

CWIC Data Partner’s Guide Version: 1.2

Page 12

invalidorinappropriatesearchrequests,suchasthoseoutsidetheexistingtemporalextentforspecificdatacollections.

4.1.4 UniquegranuleIDsEachdatagranulereturnedinasearchresponseshouldhaveanidentifierassociatedwithitwhichisuniquewithinthedataset.Itisimportantthatthesearchresponseincludeauniqueidentifierforeachgranulesothatthefulldataonindividualgranulesmayberetrievedwithoutre-executinga(potentiallytime-consuming)search.

4.1.5 RequestforgranulebyIDCWICsupportstheCSWGetRecordByIdrequestandsoConnectorsexpecttobeabletosubmittothesearchsystemarequesttoreturninformationonasinglegranulespecifiedbyitsuniqueidentifier.Generally,thiswillbesothattheConnectorcanreturntotheMediatorthefullmetadatarecordforthatdatagranule,includinglinkstobrowsedataandtothedatagranulefordownloadororder.

4.1.6 RecordcountsAspartofthesearchresponsefromtheinventorysystem,itishighlydesirabletohavethetotalcountofmatchinggranulesreturned,evenifthemetadataforthegranulesisnotcontainedinthesearchresponse.Thisparameter,coupledwiththeabilitytospecifythestartingrecordnumberandnumberofdesiredrecordsfromtheinventorysystem,willallowclientstoimplementresultspagingandreducingtheloadonboththeCWICsystemandonthedatapartners.

4.1.7 Searchstatus&ErrorresponsesUsefulstatusanderrormessageshelptheConnectormanageclientsessionseffectively.Anylimitationsonsubmittedsearchrequeststotheinventorysystemsshouldbenotedintheresponse(e.g.,“toomanyrecordsrequested”,“searchtimedout”)sothatpredictableerror-handlingcanbemanagedbytheConnector.

4.2Interaction&ServicesModel

4.2.1 UniquegranuleIDAsdescribedabove,eachdatagranuleshouldhaveauniqueidentifierwhicha)ispassedbacktotheclientaspartofthesearchresponseandb)canbeusedasakeywithwhichtoretrievethatspecificgranule.TheCWICcomponentswillmanagethetaskofassociatingtheidentifierwiththecorrectdatasetanddatacenter.

4.2.2 ContactinformationTheCSWGetRecordsandGetRecordByIdresponsesincludeseveralblocksofcontactinformation–fordistributor,pointofcontactandmetadatacontact.Theseareusuallythesameforalldatagranules,andfrequentlythesamewithinasingledatacenter.

Thereisnoneedforthisinformationtobereturnedwitheachsearchresponseoreachdatagranule,althoughitmightbe.TheCWICConnectorcancachethisinformationintheCWIC

CWIC Data Partner’s Guide Version: 1.2

Page 13

runtimeenvironment,socoordinationwiththeCWICdevelopmentteamtoensuretheaccuracyandcurrencyofthecontactinformationisessential.

4.2.3 BrowseURLIfbrowseimagesofthedatagranuleareavailable,avalidURLtodisplaythebrowseimageshouldbeincludedinthesearchresponseforeachgranulesothattheclientcandisplayitasalink.WhileitispossiblefortheCWICconnectortobuildtheURLbasedonsomepre-defined,fixedpattern,thismechanismisnotrecommendedbecauseitremovescontrolovertheformoftheURLfromtheDataPartnerandchangesmayrequiremodificationstotheConnectorsourcecode.ThiscanleadtodelaysinthedeploymentofthecorrectURLwhenchangesareimplementedbythedatacenter.

4.2.4 OrderURLTheCWICteamrecommendsthat,whenthegranuledatacanbedownloadedfromthedatacenterdirectly,avalidURLtoretrievethedatabeincludedinthesearchresponseforthegranule.

Alternatively,thesearchresponsemaycontainaURLdirectingtheusertoawebsitefororderingthedataifthisistheonlyoptionpermittedbythedatacenter.Thisisoftennecessaryevenforfreelyavailabledataif,forexample,datacenterpoliciesrequireuserregistrationbeforedownloadsaremadeavailable.Insuchcases,theCWICteamstronglyrecommendsthatthegranuleIDrequestedbecachedattheorderingsystemsothatwheneverthedatacenterrequirementsfordownloadingdataaremet,theuserwillbeabletoretrievethedatawithoutre-enteringthegranuleID.

4.3ErrorHandlingTheCSWprotocolitselfhasrelativelylimitedcapabilitiesfordocumentingerrorswhichmayariseduringatransaction.TheCWICdevelopmentteamisinvestigatingwaystoenhancethisfunctionalitytoprovidebetterinformationtotheenduserorclient.Inordertosupportthiseventuality,itwouldbeusefulfortheinventorysearchsystemtoattempttoreturnsensibleandrelevanthttpstatuscodes(whereapplicable)ifsomethinggoeswrongwiththesearchor,perhapsevenbetter,asmall,descriptiveresponsedocument(inXMLorJSONorwhateverthedefaultformatmightbe)providingerrorcodesanderrortext.Inthisway,theCWICconnectorscandistinguishthetypeoferrorarisingattheinventorysystemfromthosearisingelsewhereandtakeappropriateaction.TherearenospecificrecommendationsatthistimebutthisshouldbepartofongoingdiscussionsbetweentheConnectordevelopersandtheDataPartner’ssupportstaff.