CEOS CWIC Project
CWIC Data Partner’s Guide May 10, 2017
Document version 1.2
CWIC Data Partner’s Guide Version: 1.2
CWIC Implementation Team
Archie Warnock, A/WWW Enterprises ([email protected])
Li Lin, George Mason University ([email protected])
Eugene G. Yu, George Mason University ([email protected])
Approvals
Approved By    Signature    Date
Yonsook K. Enloe
Revision History
Date             Version   Brief Description                      Author
(not recorded)   1.0       CWIC Data Partner’s Guide              Archie Warnock
12 March 2012    1.1       CWIC data partner guide                Archie Warnock, Yuanzheng Shao
10 May 2017      1.2       Revised CWIC CSW data partner guide    Li Lin, Archie Warnock, Eugene G. Yu, Lingjun Kang
Table of Contents
Executive Summary
    Recommendations for CWIC Data Partners
1. Before You Begin
    1.1 CWIC Background
    1.2 CWIC Concept and Design
    1.3 CWIC Architecture
    1.4 CWIC Terms and Definitions
    1.5 CWIC Systems
2. CSW Query Interface
    2.1 Introduction
    2.2 GetRecords Operation
    2.3 GetRecordById Operation
3. CWIC Metadata Model
    3.1 CSW Core Metadata Model
    3.2 ISO 19115-2 Metadata Model
4. Partner Guidelines
    4.1 Metadata & Semantic Mapping
        4.1.1 HTTP access
        4.1.2 Spatial search
        4.1.3 Temporal search
        4.1.4 Unique granule IDs
        4.1.5 Request for granule by ID
        4.1.6 Record counts
        4.1.7 Search status & Error responses
    4.2 Interaction & Services Model
        4.2.1 Unique granule ID
        4.2.2 Contact information
        4.2.3 Browse URL
        4.2.4 Order URL
    4.3 Error Handling
Executive Summary
Recommendations for CWIC Data Partners
While CWIC can serve as a CSW proxy for a Data Partner’s internet-accessible inventory search system, a small number of recommendations for Data Partners will make the job vastly easier.
 1. Register the dataset in the IDN
 2. Provide a search interface accessible via a simple URL (i.e., HTTP GET), ideally including parameters for the starting record number and the number of records desired in the response
 3. Support searching on a spatial bounding box
 4. Support searching on temporal extent, at least observation start and end dates
 5. Provide search responses in well-structured text (XML, JSON, etc.) returning matching data granules
 6. Identify each returned data granule by an identifier that is unique within the inventory system
 7. Provide a capability for using the granule identifier to retrieve metadata about the granule
 8. Return URLs for browse data and a direct link to granule-level data (or to a data ordering system) in the search response
In general, these are common and widely implemented capabilities in almost any granule search system and should not represent an impediment to joining CWIC as a Data Partner. If any of these capabilities is not implemented, it is still possible to become a CWIC Data Partner; contact any member of the CWIC team for details.
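As an illustration of recommendations 2 through 4, the sketch below shows how a client such as a Connector might assemble a simple HTTP GET search request. The endpoint and every parameter name (`bbox`, `start_date`, `end_date`, `start_index`, `count`) are hypothetical; each Data Partner defines its own.

```python
from urllib.parse import urlencode


def build_search_url(base_url, bbox, start_date, end_date, start_index=1, count=10):
    """Build a GET search URL for a hypothetical partner inventory API.

    bbox is (west, south, east, north) in decimal degrees; dates are
    ISO 8601 strings such as "2016-01-01".
    """
    west, south, east, north = bbox
    params = {
        "bbox": f"{west},{south},{east},{north}",  # hypothetical parameter name
        "start_date": start_date,                  # hypothetical parameter name
        "end_date": end_date,                      # hypothetical parameter name
        "start_index": start_index,                # first record to return (1-based)
        "count": count,                            # number of records per response
    }
    return f"{base_url}?{urlencode(params)}"


url = build_search_url(
    "https://inventory.example.org/search",  # placeholder endpoint
    (-10.0, 35.0, 5.0, 45.0),
    "2016-01-01",
    "2016-01-31",
)
print(url)
```

Keeping every search criterion in the query string means the same request can be issued from a browser, a script or a Connector without any session state.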
1. Before You Begin
1.1 CWIC Background
Scientists who conduct multi-disciplinary research may need to search multiple catalogs in order to find the data they need. Such work is time-consuming and tedious, especially when the catalogs use different metadata models and catalog interface protocols. It is therefore desirable to integrate those catalogs into a catalog federation, which presents a well-known and documented metadata model and interface protocol to users and hides the complexity and diversity of the affiliated catalogs behind the interface. With such a federation, users only need to work with the federated catalog through the public interface or API to find the data they need, instead of working with the various catalogs individually.
The Committee on Earth Observation Satellites (CEOS) coordinates the satellite Earth Observation (EO) programs of the world's government agencies, along with agencies that receive and process data acquired remotely from space. The Working Group on Information Systems and Services (WGISS) is a subgroup of CEOS that aims to promote collaboration in the development of systems and services that manage and supply EO data to users worldwide. The CEOS WGISS Integrated Catalog (CWIC) was implemented to realize a federated catalog for data discovery from multiple EO data centers. CWIC is intended to provide inventory search of WGISS agency catalog systems for EO data.
1.2 CWIC Concept and Design
The mediator-wrapper architecture has been widely adopted to provide integrated access to heterogeneous, autonomous data sources. As depicted in Fig. 1, the data source archives data and disseminates it through the Internet. The wrapper on top of the data source provides a universal query interface by encapsulating heterogeneous data models, query protocols, and access methods. The mediator interacts with the wrapper and provides the user with integrated access through the global information schema.
Wrappers offer query interfaces that hide the particular data model, access path, and interface technology of the partner catalog systems. Wrappers are accessed by a mediator, which offers users front-end integrated access through its global schema. The user poses queries against the global schema of the mediator; the mediator then distributes the query to the individual systems using the appropriate wrappers. The wrappers transform the queries so that they are understandable and executable by the partner catalog systems they wrap, collect the results, and return them to the mediator. Finally, the mediator integrates the results into a single response for the user.
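The query flow described above can be sketched in a few lines. This is a toy illustration of the mediator-wrapper pattern, not CWIC's implementation: the class names, the in-memory catalogs and all field names are invented for the example.

```python
class DictCatalogWrapper:
    """Wraps a toy in-memory catalog whose native field names differ
    from the mediator's global schema."""

    def __init__(self, records, field_map):
        self.records = records
        self.field_map = field_map  # global field name -> native field name

    def search(self, query):
        # Translate the global-schema query into the catalog's native terms.
        native_query = {self.field_map[k]: v for k, v in query.items()}
        hits = [
            r for r in self.records
            if all(r.get(k) == v for k, v in native_query.items())
        ]
        # Translate native results back into the global schema.
        inverse = {native: glob for glob, native in self.field_map.items()}
        return [{inverse[k]: v for k, v in r.items() if k in inverse} for r in hits]


class Mediator:
    """Distributes one query to every wrapper and merges the results."""

    def __init__(self, wrappers):
        self.wrappers = wrappers  # source name -> wrapper

    def search(self, query):
        merged = []
        for name, wrapper in self.wrappers.items():
            for record in wrapper.search(query):
                merged.append({"source": name, **record})
        return merged


mediator = Mediator({
    "partner_a": DictCatalogWrapper(
        [{"gid": "A1", "plat": "Landsat"}],
        {"id": "gid", "platform": "plat"},
    ),
    "partner_b": DictCatalogWrapper(
        [{"granule": "B7", "sat": "Landsat"}, {"granule": "B8", "sat": "Terra"}],
        {"id": "granule", "platform": "sat"},
    ),
})
results = mediator.search({"platform": "Landsat"})
print(results)
```

The caller only ever sees the global field names (`id`, `platform`); each wrapper owns the mapping to and from its catalog's native names.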
Fig. 1 The Mediator-Wrapper Architecture
The data providers connected by CWIC include NASA, USGS, NOAA GHRSST, Brazil INPE, Canada CCMEO, EUMETSAT and India ISRO (MOSDAC and NRSC). Additionally, CWIC connectors for Australia NCI and China AOE are under development on the CWIC development server. Fig. 2 illustrates the system architecture of CWIC. A different wrapper is implemented for each data provider. Each wrapper is responsible for translating the request and dispatching it to the corresponding data inventory.
Fig. 2 The System Architecture of CWIC
1.3 CWIC Architecture
At its core, CWIC presents a standards-based CSW server to End Users and Clients. To Data Partners, it appears to be a web-based client. It connects the two (End Users and Data Partners) through the Mediator on the front end, which serves as the CSW server to end users and as a CSW client to the Connectors. The Connectors are custom-written proxies for the data granule inventory search systems at the individual Data Partners. Each Connector accepts CSW search requests from the Mediator, translates them into valid search requests for the target dataset, then parses the results from the inventory search system and translates those into CSW search responses, which are passed back to the Mediator.
In this way, outside clients and, for the most part, the Mediator itself need no specific knowledge of the particular partner data systems and communicate only via CSW. Each Data Partner will generally be accessed by a dedicated Connector called by the Mediator. The Connector handles all of the details unique to the individual data partner’s inventory system, and all communication with the partner’s inventory system is managed exclusively by the Connector.
1.4 CWIC Terms and Definitions
For the purposes of this document, the following terms and definitions apply:
1) client
A software component that can invoke an operation from a server
2) data clearinghouse
The collection of institutions providing digital data, which can be searched through a single interface using a common metadata standard
3) identifier
A character string, which may be composed of numbers and characters, that is exchanged between the client and the server with respect to a specific identity of a resource
4) IDN dataset ID
Unique dataset identifier in the IDN, returned from the IDN in response to the OSDD request. This identifier is assigned by the IDN CMR database.
5) native ID
Dataset identifier used by CWIC to retrieve granule metadata through the data provider API. This identifier is assigned by the data provider.
6) catalog ID
Identifiers of data provider catalogs or connections serving granule metadata.
7) operation
The specification of a transformation or query that an object may be called to execute.
8) profile
A set of one or more base standards and, where applicable, the identification of chosen clauses, classes, subsets, options and parameters of those base standards that are necessary for accomplishing a particular function
9) request
The invocation of an operation by a CWIC client
10) response
The result of an operation, returned from the CWIC server to the CWIC client
11) collection
A grouping of granules that all come from the same source, such as a modeling group or institution. Collections have information that is common across all the granules they "own" and a template for describing additional attributes not already part of the metadata model.
12) dataset
Has the same meaning as collection; see (11)
13) granule
The smallest aggregation of data that can be independently managed (described, inventoried, and retrieved). Granules have their own metadata model and support values associated with the additional attributes defined by the owning collection.
14) IDN
The CEOS International Directory Network, a gateway to the world of Earth Science data and services
1.5 CWIC Systems
There are two operational CWIC systems to which end users have access.
§ CWIC Operations. This is the current operational system for CWIC and is available to all users. Endpoint: http://cwic.wgiss.ceos.org/
§ CWIC Partner Test. This is a test system used by partners and CWIC developers to test changes to the CWIC system before they go operational. Endpoint: http://cwictest.wgiss.ceos.org/
2. CSW Query Interface
2.1 Introduction
The CSW protocol is a catalog service search specification and is used by CWIC to search and return metadata related to granule-level inventory data. CSW is not designed for, nor is it used for, returning observational data from the inventory systems, although the metadata returned might include links directly to data granules or to a data ordering system. CWIC is intended to take the end user as close to actual data as possible within the constraints of the data partner inventory systems and the limits of the CSW protocol itself.
The CWIC Connectors have the task of returning valid responses to CSW GetRecords and GetRecordById requests to the Mediator. These are generated on the fly by submitting search requests to the Data Partner inventory system for the requested dataset, retrieving the results and translating them into syntactically valid and semantically meaningful CSW responses. The Connector implementer will work with the Data Partner’s support team to define the mappings between quantities contained in the inventory system response and the associated elements in the CSW responses.
2.2 GetRecords Operation
The CSW GetRecords operation can be used for geospatial catalog searches on the target system with a wide range of parameters. The search parameters supported by CWIC include dataset identifier, spatial extent (bounding box) and temporal extent (start/end date and time).
GetRecords requests can specify one of two types of responses: “hits” or “results.” The “hits” request returns only a count of matching records; no records are actually returned. Not all inventory systems can easily predict the number of responses to a query without actually processing the query and building the full result set in order to count the records. This can be quite costly in terms of CPU usage and bandwidth, so the CWIC team discourages the use of this request. The “results” request returns actual records and also includes the total number of matching records, as well as the starting record number and the count of the records returned.
GetRecords requests can also specify the element set (the type of results returned), i.e., how much information to include in the response for each record.
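To make the request parameters concrete, the following sketch builds the outer shell of a GetRecords request, assuming the CSW 2.0.2 XML encoding. The search filter (dataset identifier, bounding box, time range) is deliberately omitted, since its exact form should be taken from the CWIC Client Guide; the function name and defaults are illustrative only.

```python
import xml.etree.ElementTree as ET

CSW_NS = "http://www.opengis.net/cat/csw/2.0.2"  # assumes CSW 2.0.2


def build_get_records(result_type="results", start_position=1,
                      max_records=10, element_set="full"):
    """Return a GetRecords request document (without a filter) as a string."""
    # This sketch only accepts the two response types described in this guide.
    if result_type not in ("hits", "results"):
        raise ValueError("result_type must be 'hits' or 'results'")
    ET.register_namespace("csw", CSW_NS)
    root = ET.Element(f"{{{CSW_NS}}}GetRecords", {
        "service": "CSW",
        "version": "2.0.2",
        "resultType": result_type,
        "startPosition": str(start_position),  # first record to return (1-based)
        "maxRecords": str(max_records),        # page size
    })
    query = ET.SubElement(root, f"{{{CSW_NS}}}Query", {"typeNames": "csw:Record"})
    # ElementSetName controls how much detail each returned record carries.
    ET.SubElement(query, f"{{{CSW_NS}}}ElementSetName").text = element_set
    return ET.tostring(root, encoding="unicode")


print(build_get_records(max_records=5))
```

A "results" request with a small `maxRecords` still reports the total number of matches, which is why it is usually a better choice than a separate "hits" request.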
2.3 GetRecordById Operation
The CSW GetRecordById request is intended to allow the user to request a single specific record from the target system, generally as a follow-up to a broader GetRecords request. No search filter is specified; only the unique identifier for the specific record is required. The response is identical to the GetRecords response, except that only a single record will be returned.
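A GetRecordById request can be sketched as a key-value-pair URL, again assuming CSW 2.0.2. The endpoint path and the granule identifier below are placeholders, not actual CWIC values.

```python
from urllib.parse import urlencode


def build_get_record_by_id_url(endpoint, record_id, element_set="full"):
    """Build a CSW GetRecordById URL in key-value-pair form."""
    params = {
        "service": "CSW",
        "version": "2.0.2",            # assumes CSW 2.0.2
        "request": "GetRecordById",
        "Id": record_id,               # unique identifier of the record
        "ElementSetName": element_set, # level of detail in the response
    }
    return f"{endpoint}?{urlencode(params)}"


by_id_url = build_get_record_by_id_url(
    "https://csw.example.org/discovery",  # placeholder endpoint
    "EXAMPLE_GRANULE_001",                # placeholder identifier
)
print(by_id_url)
```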
3. CWIC Metadata Model
3.1 CSW Core Metadata Model
The CSW Core metadata is a small set of metadata elements, essentially the Dublin Core metadata, intended to provide a minimal level of interoperability for CSW servers and clients. Tables 1 and 2 of the CWIC Client Guide provide the minimal list of supported search and response elements. For CWIC purposes, the core metadata specification provides definitions for granule identifier, spatial and temporal components as well as the basic required elements for CSW requests and responses (i.e., response type, element set, attributes for result set paging, etc.) and an XML representation of the model.
3.2 ISO 19115-2 Metadata Model
The ISO 19115 part 2 metadata is a more extensive set of metadata elements with more complete response models. It is the primary metadata schema currently supported by CWIC. Table 3 of the CWIC Client Guide shows the elements, in addition to those in the CSW Core, available to be returned by CWIC in search responses. Many of these may already be included in the responses from the Data Partner’s inventory search system. CWIC will omit any optional elements that cannot be populated from the inventory response, and will return empty elements for any mandatory elements that cannot be populated unless the information is available from some other source (e.g., contact information).
4. Partner Guidelines
4.1 Metadata & Semantic Mapping
4.1.1 HTTP access
Although CWIC will attempt to use any mechanism available for connecting to Data Partners’ data management systems in order to access the available inventory search, there are a few specifics which make the process simpler and more robust.
The use of HTTP for accessing the inventory search engine is strongly preferred. This is widely used already, as web browsers are nearly universal and provide an effective user interface for both human and automated access. While other protocols may be used (Z39.50, for example), HTTP is the preferred mechanism for the CWIC connectors.
Similarly for results, CWIC will attempt to extract the relevant results from any responses the partner data system returns. However, structured text of some sort (XML, for example) is strongly preferred. The ability to easily and definitively parse results makes the process of mapping the metadata returned in the search response simpler and less error-prone. Other structured formats like comma-delimited tables or JSON are acceptable.
4.1.2 Spatial search
All CWIC data partners are expected to support some level of spatial search, since all of the inventory data are anticipated to have a spatial component. A simple bounding box, with the bounding coordinates individually identified, is the minimum required, although more complex spatial footprint geometries are possible in the future.
It is desirable, although not necessary, for the API also to support a dynamic call that returns the limits of the spatial search. The presence of such a service can help CWIC avoid invalid or inappropriate search requests, such as those outside the spatial boundaries of specific data collections.
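As a rough illustration of how spatial limits could be used to reject inappropriate requests before they reach the inventory system, the sketch below validates a bounding box and tests it against a dataset's spatial limits. The function names and the sample limits are invented for the example, and boxes that cross the antimeridian are not handled.

```python
def validate_bbox(west, south, east, north):
    """Check a bounding box given in decimal degrees and return it as a tuple."""
    if not (-180 <= west <= 180 and -180 <= east <= 180):
        raise ValueError("longitude must be within [-180, 180]")
    if not (-90 <= south <= north <= 90):
        raise ValueError("latitude must be within [-90, 90] with south <= north")
    if west > east:
        # Simplification of this sketch: antimeridian-crossing boxes are rejected.
        raise ValueError("west must not exceed east")
    return (west, south, east, north)


def bbox_intersects(a, b):
    """Return True if boxes a and b, each (west, south, east, north), overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


dataset_limits = (0.0, 40.0, 20.0, 60.0)  # hypothetical limits reported by a partner
request = validate_bbox(-10.0, 35.0, 5.0, 45.0)
print(bbox_intersects(request, dataset_limits))
```

A request whose box does not intersect the dataset limits can be answered with an empty result set without contacting the partner system at all.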
4.1.3 Temporal search
As with the spatial search described in the previous section, all CWIC data partners are expected to support some level of temporal search, since all of the inventory data are anticipated to have a temporal component. A simple temporal extent, with the start and end times individually identified, is the minimum required, although more complex temporal relations are anticipated in the future. It is best to support some minimal subset of the ISO 8601 time specification for syntax: YYYY-MM-DD, at least.
It is desirable, although not necessary, for the API also to support a dynamic call that returns the limits of the temporal extent search. The presence of such a service can help CWIC avoid invalid or inappropriate search requests, such as those outside the existing temporal extent of specific data collections.
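A minimal sketch of temporal-extent handling follows, assuming a small ISO 8601 subset (date only, or date and time with an optional `Z` or numeric offset). Treating values without a time zone as UTC is a choice made for this example, not something the guide prescribes.

```python
from datetime import datetime, timezone


def parse_iso8601(value):
    """Parse 'YYYY-MM-DD' or 'YYYY-MM-DDTHH:MM:SS[Z|+HH:MM]' into an aware datetime."""
    text = value.strip()
    if text.endswith("Z"):
        # Older Python versions do not accept the 'Z' suffix in fromisoformat.
        text = text[:-1] + "+00:00"
    parsed = datetime.fromisoformat(text)
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    return parsed


def temporal_extent(start, end):
    """Return (start, end) as datetimes, rejecting reversed ranges."""
    begin, finish = parse_iso8601(start), parse_iso8601(end)
    if begin > finish:
        raise ValueError("start must not be later than end")
    return begin, finish


begin, finish = temporal_extent("2016-01-01", "2016-01-31T23:59:59Z")
print(begin.isoformat(), finish.isoformat())
```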
4.1.4 Unique granule IDs
Each data granule returned in a search response should have an identifier associated with it which is unique within the dataset. It is important that the search response include a unique identifier for each granule so that the full data on individual granules may be retrieved without re-executing a (potentially time-consuming) search.
4.1.5 Request for granule by ID
CWIC supports the CSW GetRecordById request, so Connectors expect to be able to submit to the search system a request to return information on a single granule specified by its unique identifier. Generally, this will be so that the Connector can return to the Mediator the full metadata record for that data granule, including links to browse data and to the data granule for download or order.
4.1.6 Record counts
As part of the search response from the inventory system, it is highly desirable to have the total count of matching granules returned, even if the metadata for the granules is not contained in the search response. This parameter, coupled with the ability to specify the starting record number and number of desired records from the inventory system, will allow clients to implement results paging and reduce the load on both the CWIC system and the data partners.
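The paging arithmetic that a total count makes possible can be sketched as follows. Record positions are 1-based here, matching the "starting record number" wording above; the function name is illustrative.

```python
def page_ranges(total, page_size, start=1):
    """Yield (start_record, count) pairs covering records start..total."""
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    position = start
    while position <= total:
        # The last page may be shorter than page_size.
        yield position, min(page_size, total - position + 1)
        position += page_size


pages = list(page_ranges(total=25, page_size=10))
print(pages)
```

Without the total count, a client has to keep requesting pages until an empty one comes back, which costs at least one extra request per search.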
4.1.7 Search status & Error responses
Useful status and error messages help the Connector manage client sessions effectively. Any limitations on submitted search requests to the inventory systems should be noted in the response (e.g., “too many records requested”, “search timed out”) so that predictable error handling can be managed by the Connector.
4.2 Interaction & Services Model
4.2.1 Unique granule ID
As described above, each data granule should have a unique identifier which a) is passed back to the client as part of the search response and b) can be used as a key with which to retrieve that specific granule. The CWIC components will manage the task of associating the identifier with the correct dataset and data center.
4.2.2 Contact information
The CSW GetRecords and GetRecordById responses include several blocks of contact information: for distributor, point of contact and metadata contact. These are usually the same for all data granules, and frequently the same within a single data center.
There is no need for this information to be returned with each search response or each data granule, although it might be. The CWIC Connector can cache this information in the CWIC runtime environment, so coordination with the CWIC development team to ensure the accuracy and currency of the contact information is essential.
4.2.3 Browse URL
If browse images of the data granule are available, a valid URL to display the browse image should be included in the search response for each granule so that the client can display it as a link. While it is possible for the CWIC connector to build the URL based on some pre-defined, fixed pattern, this mechanism is not recommended because it removes control over the form of the URL from the Data Partner, and changes may require modifications to the Connector source code. This can lead to delays in the deployment of the correct URL when changes are implemented by the data center.
4.2.4 Order URL
The CWIC team recommends that, when the granule data can be downloaded from the data center directly, a valid URL to retrieve the data be included in the search response for the granule.
Alternatively, the search response may contain a URL directing the user to a website for ordering the data if this is the only option permitted by the data center. This is often necessary even for freely available data if, for example, data center policies require user registration before downloads are made available. In such cases, the CWIC team strongly recommends that the granule ID requested be cached at the ordering system so that, once the data center requirements for downloading data are met, the user will be able to retrieve the data without re-entering the granule ID.
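The guidance in this section amounts to a small mapping step inside a Connector, sketched below. All field names in the partner record (`granule_id`, `browse_url`, `download_url`, `order_url`) and in the output record are hypothetical; the real mapping is agreed between the Connector implementer and the Data Partner.

```python
def map_granule(native):
    """Map one partner granule record (a dict) to a simplified common record."""
    record = {"identifier": native["granule_id"]}  # the unique ID is mandatory
    # Use the browse URL exactly as supplied rather than constructing one.
    if native.get("browse_url"):
        record["browse_url"] = native["browse_url"]
    # Prefer a direct download link; fall back to the ordering system.
    if native.get("download_url"):
        record["data_link"] = native["download_url"]
        record["link_type"] = "download"
    elif native.get("order_url"):
        record["data_link"] = native["order_url"]
        record["link_type"] = "order"
    return record


mapped = map_granule({
    "granule_id": "G1",
    "browse_url": "https://data.example.org/browse/G1.png",
    "order_url": "https://data.example.org/order?granule=G1",
})
print(mapped)
```

Note that the order URL in the example carries the granule ID, so the ordering system can remember which granule was requested.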
4.3 Error Handling
The CSW protocol itself has relatively limited capabilities for documenting errors which may arise during a transaction. The CWIC development team is investigating ways to enhance this functionality to provide better information to the end user or client. In order to support this eventuality, it would be useful for the inventory search system to attempt to return sensible and relevant HTTP status codes (where applicable) if something goes wrong with the search or, perhaps even better, a small, descriptive response document (in XML or JSON or whatever the default format might be) providing error codes and error text. In this way, the CWIC connectors can distinguish the type of error arising at the inventory system from those arising elsewhere and take appropriate action. There are no specific recommendations at this time, but this should be part of ongoing discussions between the Connector developers and the Data Partner’s support staff.
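Since no format is prescribed, the following is only one possible shape for such handling: a sketch that combines the HTTP status code with an optional JSON error document. The `error_code` and `error_text` keys and the "request" versus "inventory" labels are assumptions made for illustration.

```python
import json


def classify_partner_error(status_code, body):
    """Summarize a partner response as a small dict describing any error."""
    detail = {}
    try:
        parsed = json.loads(body) if body else {}
        if isinstance(parsed, dict):
            detail = parsed
    except json.JSONDecodeError:
        pass  # body is not JSON; fall back to the status code alone

    if 200 <= status_code < 300 and "error_code" not in detail:
        return {"ok": True, "source": None, "code": None, "message": ""}

    # 4xx suggests a problem with the submitted request; anything else
    # is attributed to the inventory system itself.
    source = "request" if 400 <= status_code < 500 else "inventory"
    return {
        "ok": False,
        "source": source,
        "code": detail.get("error_code"),
        "message": detail.get("error_text", f"HTTP {status_code}"),
    }


print(classify_partner_error(
    504, '{"error_code": "TIMEOUT", "error_text": "search timed out"}'
))
```

Even a two-field error document like this one lets a Connector pass a meaningful message on to the client instead of a generic failure.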