Best practices for sensor networks and sensor data...

Post on 14-Jun-2020

2 views 0 download

Transcript of Best practices for sensor networks and sensor data...

Best practices for sensor networks

and sensor data management

Citation

ESIP EnviroSensing Cluster (2014). Community Wiki Document, “Best practices for sensor

networks and sensor data management”, Federation of Earth Science Information Partners.

http://wiki.esipfed.org/index.php/EnviroSensing_Cluster. (wiki document accessed 12-1-2014).

Contributors (as of December 2014)

Each chapter has a lead editor who is responsible for periodically compiling comments and

contributions into stable versions of this document which will be archived as PDF versions and

can be found here. If you contribute to this document by editing or adding text, images or

comments you agree to the use of that material in the regularly published PDF versions of this

document. Please add your name to the list of contributors if you feel you made a significant

contribution.

Corinna Gries, University of Wisconsin-Madison, North Temperate Lakes LTER

Don Henshaw, USFS Pacific Northwest Research Station, Andrews Forest LTER

Scotty Strachan, University of Nevada, Reno

Renee F. Brown, University of New Mexico, Sevilleta LTER & UNM Sevilleta Field Station

Christopher Jones, National Center for Ecological Analysis and Synthesis

Christine Laney, University of Texas at El Paso, Jornada Basin LTER

Branko Zdravkovic, University of Saskatchewan

Richard Cary, University of Georgia, Coweeta LTER

Jason Downing, University of Alaska Fairbanks, Bonanza Creek LTER

Adam Kennedy, Oregon State University, Andrews Forest LTER

Mary Martin, University of New Hampshire, Hubbard Brook LTER

Jennifer Morse, University of Colorado Boulder, Niwot Ridge LTER

Fox Peterson, Oregon State University, Andrews Forest LTER

John Porter, University of Virginia, Virginia Coast Reserve LTER

Jordan Read, US Geological Survey

Andrew Rettig, University of Cincinnati

Wade Sheldon, University of Georgia,, Georgia Coastal Ecosystems LTER

Scope

This document on best practices for sensor networks and sensor data management provides

information for establishing and managing a fixed environmental sensor network for on or near

surface point measurements with the purpose of long-term or “permanent” environmental data

acquisition. It does not cover remotely sensed data (satellite imagery, aerial photography, etc.),

although a few marginal cases where this distinction is not entirely clear are discussed, e.g.,

phenology and animal behavior webcams. The best practices covered in this document may not

all apply to temporary or transitory sensing efforts such as distributed “citizen science”

initiatives, which do not focus on building infrastructure. Furthermore, it is assumed that the

scientific goals for establishing a sensor network are thought out and discussed with all

members of the team responsible for establishing and maintaining the sensor network. i.e.,

appropriateness of certain sensors or installations to answer specific questions is not discussed.

Information is provided here for various stages of establishing and maintaining an environmental

sensor network: planning a completely new system, upgrading an existing system, improving

streaming data management, and archiving data.

Below are chapters of a living document to which contributions can be made by anybody

interested in this subject. Please post questions, answers, experiences with particular

software/hardware/setup, comments, additions, edits, resources, and publications. Please use

common online etiquette. If conflicting views arise they should be discussed in the

EnviroSensing e-mail list.

backtoEnviroSensingClustermainpage

Contents1PlanningProcess2ImplementationFeasibility3AssemblingtheTeam4OverviewofChapters

PlanningProcessThroughoutthisdocumentitisemphasizedthattheinitialplanningisextremelyimportant,aswellastheinclusionofexpertiseinmanydifferentareas,scientificandtechnical,intheearlydiscussionandplanningphasebeforeaproposaliswritten.Ifallfieldsofexpertisearenotconsulted/incorporatedpriortomakinglocation,budget,deployment,andtimelinedecisions,criticalinterdependenciesarelikelytobeoverlooked(e.g.powerrequirements,topographicconstraints,constructiontoolsrequired,etc.).

Although,thediscussionhereisgearedtowardmaintainingsensornetworksoveranextendedperiodoftime,planningisequallyimportantforshortterminstallations.Experiencehasshownthatmanyshortterminstallationshavebecomelongtermevenifthatwasnotintendedinitiallyandmanysmallinstallationshavebeenexpandedtocovermoreareaormeasuremoreparameters.

Clearly,sensornetworkdeploymentsaredrivenbyambitioussciencequestions.However,goodplanningcanhelpanticipatelimitationsandpreventtimeissuesfrombecomingthedrivingforce.Focusingontheoverarchingimperativesofgooddesign,properplacement,organizeddataflow,andawelltrainedandmotivatedteam,willresultinsuccessfulimplementationandcontinuedmaintenance.Compromisedinstallationsdiminishtheimpactsoftheoriginalstudy,candrainoperatingbudgetsunnecessarily,andinhibitleveragingofthescienceforfutureworkandfunding.

ImplementationFeasibilityDuringtheexperimentorprojectdesignphase,definingtheprimarymeasurementobjectiveisthefirststeptoplanninganobservationsiteandplatform.Answeringthesegeneralquestionsishelpfulbeforeaddressingspecifictechnicalissues:

Whereisthegeographicareaofinterest?Whatarethemeasurementsofinterest?Whatisthedesiredaccuracyandfrequencyofmeasurement?Howcriticalarethesensormeasurements?Candatagapsbetolerated?Issensorredundancynecessary?Whattypeofexperimentalmanipulationisdesired(ifany)?Whattypesoflocalizedtopographyarelikelytoyield“representative”measurementsatthetimefrequencyofinterest?Whatisthetotalfundingamountforpersonnel,travel,tools/equipment,fees,andscienceinstruments?Whatistheexpectedscope/lifetimeofthedeployment?Willitbeexpandedinthefuture?Considerscalingpossibility(moresites,moresensors)evenifitisnottheimmediategoal.Evaluatecommercialturnkeyinstallationsvs.systemsdevelopedfromcommercialoropensourcecomponents.Considerations:cost,skills,maintenance,longevityofthecompanyprovidingthewholesystemoreachcomponent,functionality,interoperability,accesstocontinuedsupport.

AssemblingtheTeamSeveralverydifferentareasofexpertisearerequiredtosuccessfullyplan,install,andmaintainsensingsystems.Someoftheseroles/skillsetscancertainlybeprovidedbyasingleindividualorindividuals.

Roleswithinateamestablishingandmaintaininganenvironmentalsensornetwork:

Scientist-determinesthetypeofdataandsamplingfrequencyneededtoanswerthescientificquestionswithinbudgetlimitations.

Sensorsystemexpert-knowsthetypesofsensorsandplatforms,theirinstallationandprogrammingneededtoanswerscientificquestions.IsfamiliarwithspecificclimateandterrainissuesandQA/QCapproachesinthefield.

Fieldlogisticsexpert(formajorsiteconstruction)-familiaritywithtransport,construction,weather,tools,andsuppliesforconstruction

Fieldconstructionandfabricationexpert -understandsconcrete,metalstructure,towerdesign,fencing,underwateranchors,floatingdevices,loadestimates

Fieldworkers/assistants-manypeopleareneededforremoteconstructiontasks,sensorwiring,initialsitesetup,cablemanagement

Fieldtechnician-familiarwithmaintenancetasksincludingminorrepairs,maintainingacalibrationschedule,otherregularsensormaintenancetasks.Fieldtechniciansneedtohaveagoodunderstandingofthescienceapplicationandtheenduser,theyneedtobecomfortablewithtechnology,andapplyingknowledgefromoneareatoanother,havecreativeproblemsolvingandcriticalthinkingskillsandpayattentiontodetail.Theyshouldhavebasicelectricalandmechanicalknowledge(e.g.,multimeteruse,basicequipmentinstallation,repairandprogramming).Dependingonsiteconditionstheyalsoneedtobecertifiedintower/rock/treeclimbing,boathandling,SCUBAdiving,respectivesafetytraining,andenjoyskiing,hiking,off-roaddrivingetc.plusneedtobeskilledinGPSorienteering,navigation,andbasicmapmaking.

Communications/datatransportexpert/LicensedCommercialradiooperator(ideal,butnotrequired)-needstobefamiliarwithmovingdigitaldataoverwiredorwirelessnetworksfromremotepointstoprojectserversandshouldhavebasicknowledgeofradiocommunication(e.g.,technician-levelamateurradiolicense,basicantennatheory,IPnetworking)

Networkadministrator/Systemadministrator-isresponsiblefornetworkarchitecture,redundancyofsystemsfromdatacentertofieldsites,backup,datasecurity

Softwaredeveloper-skillsinpreferredprogramminglanguage

Datamanager-needstobefamiliarwithmeansofdocumentingproceduresformaintainingcommunicationbetweenallrolesinvolved,specifically,meansfordocumentingfieldeventsandtheirramificationforthedataquality.Needstoknowapproaches/softwareformanaginghighfrequencystreamingdata,standardQA/QCroutinesforsuchdata,approachestodocumentingdataprovenanceanddataarchiving(spacerequirements,backup,storageofdifferentQ/Clevels)andhavedatabase/softwarepackageprogramming/configurationexpertise

Datatechnician-needstobethoroughandreliableduringtaskslike‘eyeon’qualitycontrol,manualdataentryetc.

OverviewofChapters

Thefollowingchapterscontainedinthisguidearestructuredtoprovideageneraloverviewofthespecificsubject,anintroductiontomethodsused,andalistofbestpracticerecommendationsbasedonthepreviousdiscussions.Casestudiesprovidespecificexamplesofimplementationsatcertainsites.

SensorSiteandPlatformSelectionconsidersenvironmentalissues,siteaccessibility,systemspecifications,sitelayout,andcommonpointsoffailure.DataAcquisitionandTransmissionisconcernedwiththeacquisitionofsensordatafromthefield,whileensuringtheintegrityofthosedata.Also,remotecontrolofthesystem.SensorManagementTrackingandDocumentationoutlinestheimportanceofcommunicationbetweenfieldanddatamanagementpersonnelasfieldeventsmayalterthedatastreamsandneedtobedocumented.StreamingDataManagementMiddlewarediscussessoftwarefeaturesformanagingstreamingsensordata.SensorDataQualitydiscussesdifferentwayssensordatamaybecompromised,howtoautomaticallycontrolforitinthedatastream..SensorDataArchivingintroducesdifferentapproachesandrepositoriesforarchivingandpublishingdatasetsofsensordata.

Sensor Site and Platform Selection

Considers environmental issues, site accessibility, system

specifications, site layout, and common points of failure.

backtoEnviroSensingClustermainpage

Fig.1Typicalenvironmentalsensordeploymentwithscience,support,andcommunicationsystems.Photo2013ScottyStrachan,NevCANSheepRangeBlackbrushstation

Contents1Contacts2Reviewers3Overview4Introduction5Methods

5.1Environmentalconcerns5.2Siteaccessibility5.3Scienceplatformselection5.4Supportsystemspecification5.5Sitelayout

6BestPractices6.1CommonPointsofFailure

7CaseStudies

ContactsTheleadeditorsforthispagemaybecontactedforquestions,comments,orhelpwithcontentadditions.

ScottyStrachan-DepartmentofGeography,UniversityofNevada,Reno-scottyatdayhike.netAdamKennedy-AndrewsForestLTER-adam.kennedyatoregonstate.edu

ReviewersThispagewasreviewedby:

JasonDowning,BonanzaCreekLTERInformationManager,on5/2/2014RichardJasoni,AssociateResearchEcologist,DesertResearchInstitute,on6/30/2014

Overview

Selectionofexactlywhereandhowtoacquiredataviain-situsensingeffortsisacrucialpointinthescienceprocesswhereenvironmentalresearchisconcerned.Decisionsmadewhenchoosingsites,sensorpackages,andsupportinfrastructureinturnplaceboundariesonwhatthefinalsciencedeliverablescanbe.Datatypes,quantity,andqualityaremoreorlesssetinstoneduringthisprocess.Initialcosts,timeframes,andsustainabilityarealsodeterminedbythesechoices.Selectionsneedtobemadebasedonthedesiredscienceproducts,butalsoinconsiderationofawidearrayofvariablesincludinglandownership,access,equipmentbudget,long-termmaintenancecapability,previousresearch,andconstruction/deconstructionlogistics.

Settingupterrestrialsensingsystemsisamajorinfrastructure/personnelcommitmentwithbudgetaryandenvironmentalconcerns,andeveryefforttowardsmaintainingarobust,low-impact,andlong-termdatastreamshouldbemade.Becauseeachregionpossessesuniquegeography,thereisno“onesizefitsall”solution.Instead,aseriesofdecisionsneedstobemade,withthegoalsandcapabilitiesoftheresearchteamdefinedinthecontextofclearly-articulatedsciencequestionsandobjectives.

Introduction

Fig.2Progressionofworkinselectingasiteanddesigningasciencedeployment.

Identifyingboththedeploymentstrategy(site,process)aswellasthephysicalhardware(sensorplatformsandsupportinfrastructure)forenvironmentalsensingisusuallyadauntingtask.Akeyobjectiveoftheresearchteamshouldbetokeepthesciencecontextinviewduringthisprocess,aslogisticrealitieswilloftenclashwith"ideal"scientificconditions.Veryoftenthedecisiontreeforchoosingexactlocationsanddeploymentschemesisdependentoninteractingfactors(suchaspermitting/geography/access;Fig.2).Thereisalsoavastarrayofpossiblesensor/hardwarepackagesavailableforamultitudeofscienceapplications.

ItiscriticalthatPrincipalInvestigators(PI’s),logisticaltechs,andsensorspecialistsworktogethertodevelopspecificdeploymentplansandalternatives,ideallyinthepre-proposalstage.Planningtopicsmustincludescienceobjectives,operatingbudgets,proposedlocations,seasonalweatherpatterns,powersources,communicationsoptions,landownership,distancefrommanaginginstitutions,availablepersonnel/expertise,andpotentialexpansion/future-proofing.Allofthesecategoriesareequallycriticalfordiscussionasproposedinstrumentationprojectsmovetowardsimplementation.

MethodsSitevisits,permit/agreementnegotiations,equipmentspecifications,anddeploymenttimelinesneedtobe

initiatedconcurrentlybecauseallphasesofdeploymentareinterdependent(Fig.2).TheP.I.,togetherwiththetechnicalpersonnel,shouldidentifysitesforsensorandequipmentdeploymentbasedonscienceneeds,localtopography,permit/agreementavailability,logisticalaccess,andavailabilityofservices(suchaspowerandcommunications).Portionsoftheplan(suchassomepurchasingdecisions)shouldremainflexibleuntiltheprecisesites,permits/agreements,anddataflowplanhavebeenpositivelydetermined.

Environmentalconcerns

Environmentalconditionshaveconsiderablebearingonscienceapplication,platformdesign,constructionlogistics,accessrestrictions,equipmentreliability,andmaintenancecost/longevity.Conditionsforinsitusensingcanvarytremendouslyfromregiontoregion;therefore,siteandequipmentselectionmustbeconcideredonacase-by-casebasis.

Localtopographicvariablesinclude:northernversussouthernexposure,whichcanaffecthoursofdirectsunlightandsnowpersistence;andvalley/sinkversusridgelinesettings,whichcanaffectdailytemperaturecycleandwindcharacteristics.Thedifferencesinairflow,windexposure,coldsinks,snowdrifts,skyexposureforsolarpanels,andpossibleradio/communicationspathwaysareallimportantvariableswhenselectingasiteandwhattypeofequipmentwillbedeployed.

Dominantvegetationconditionsandpotentiallong-termgrowthcanaltersensorreadingsviashadingeffects,affectingtemperature,radiation,andsnow-relatedmeasurements.Radiocommunicationsarealsoaffectedbyvegetation,withmostmicrowavefrequenciesusedbyhigh-speeddataradiosbeingstronglyattenuatedbytreesandbrush.Vegetationcanalsobealong-termhazardintheformsoffirefuelsanddeadfalls.

Visibilityandthevisualimpactofdeploymentsshouldbeconsideredforbothsecurityandaestheticconsiderations.Sometimesreductionofvisualimpactisrequiredbylandowners,butingeneralitissimplygoodpractice.Metalstructurescanbecamouflagedwithpainttoreducevisibility,structureheightsmaybereducedtoblendwithvegetation,andgrounddisturbancecanbekepttoaminimumtoavoidbiasingcertaintypesofmeasurementsanderosion.

Dominantweatherconditionsdeterminewhatlevelsofseasonalaccessareavailable,whatstructuraldesignsshouldbeused,andwhatsortofequipmentshouldbepurchased.Extremetemperatures,tropicalstorms,lightning,snowdepth,riming/ice,UVexposure,highhumidity,windspeeds,saltwaterexposure,flooding,andstreamdepthvariationareallexamplesofconditionswhichwillinfluencedesignanddeploymentplans.

Wildlifecanprovidehazardconsiderationsorbeaffectedbyproposeddeployments.Birdperchingandflightpaths,cattle,soilinvertebrates,rodents,andlargemammalscanalldisturborbeaffectedbysensorsandequipmentinstalledinthefield.Landownerswillhaveregulationsorpreferencesconcerningthesefactors,andproactivestepsarenecessaryonthepartofthescienceteamtominimizethesehazards.

Sensitivitytolocalpoliticalandsocialissuesneedtobeconsidered,asobjectivesciencedatashouldconstructivelyservethelocalpopulationsaswellasthescientistsandfundingagencies.

Sitesecurityisaprimaryconcernwhenplanningtodeploysensorsandequipmentintothefield.Humantheft/vandalismisapotentialcauseofsensordisturbanceorfailure.Whileremotedeploymentsarenearlyimpossibletosecurephysically,measuressuchascamouflaging,informativesigns,fencing,andlockboxesmaybeemployedtomitigatehostileorirresponsiblepassers-by.

Hazardstosensorsincludenaturaldisturbance/disasterssuchaswildfire,flooding,extremewinds,andmasswasting.Plannersshouldbeawareofallthesepossibilitiesandatleastexaminethelikelihoodsoftheseeventatsiteswhichhavebeenevaluatedfromthescientificpointofview.

Siteaccessibility

Fig.3Seasonalaccessmayvaryhighlydependingonlocation,limitingthetypesofmaintenancepossibleatanygiventime.

Locationsforin-situsensingmustbeaccessedfordatacollection,survey,construction,andmaintenanceoverthelifeoftheproject.Seasonalconditions,roads,andtopographydeterminewhattypesofaccessmaybeusedduringdifferenttimesoftheyear.Categoricalconsiderationsinclude:

Vehicularaccess.Commercialvehicle/equipment,2WDauto,4x4truck,ATV,snowmachine,boat,helicopter.Non-motorizedaccess.Hiking,skiing,packanimals,snowshoeing.Accessimprovements.Roadbuilding,trailbuilding,traildemarcation,safetyrails,harnessanchorpoints.Seasonalaccess.Defineaccessbyspring/summer/fall/winterseasons.Thisisdirectlyrelatedtolocalweather/topographicalconditions.Constructionaccess.Heavyequipment,specialequipment,heavyloads,andheavyfoottrafficarealllikelypossibilitiesdependingonmonitoringdesign.Minimalimpactconsiderations.Cantraffic/accessbedirectedinawaytominimizeenvironmentalimpact(e.g.erosion,vegetation)?Solutionsincludeboardwalks,bridges,raisedsteps,delineatedpathways.

Scienceplatformselection

Fig.4Scienceinstrumentationspecificationmustbedrivenbysciencequestionsandenvironmental/logisticalconstraints.

Oncethesciencequestionshavebeenestablishedandsiteconditionsareknown,anitemizedlistofsensorandsupportsystemplatforms/hardwaremaybeassembledthatbestfitstheapplicationandbudget.Primaryconsiderationsincludereliability,comparabilitywithothersimilarfieldsystems,technological

(e.g.programming)requirements,budget,andsystemflexibility(e.g.upgrades,expansion,telemetryoptions).Accuracy,precision,andexpectedperiodofusepriortocalibrationorreplacementmayalsobeaconsideration.Insomefieldsofstudy,thereareonlyoneortwoalternativestochoosefromintermsofscientificinstrumentation,whereasinotherstherecanbemanychoices.Optionscanbenarrowedbyresearchingwhatequipment/standardsareusedbyexistinginstallationstowhichcomparabilityisdesired.Onceadataacquisitionplatformandsensorarrayischosen,remainingsupportsystemsarethendesignedaroundthiscoreequipment.

Supportsystemspecification

Thesubsystemsofinfrastructure,electricalpowersupply,anddatacommunicationsshouldallbedesignedtobestsupportthescienceplatformsinallseasonsoverthelongterm.Whilesomevendorsoffer“all-in-one”packagessuppliedwithstandardinstrumentation,itisbestfortheresearchteamtoassesswhetherthesesolutionsareadequatefortheirchosensiteandobjective.Quiteoftenseveralsciencequestionsarebeingaddressedinlargerdeployments,andmultiplehardwaresolutionsfromseveralvendorsmustbecombinedintoonedeployment.Thesupportsystemsshouldbespecifiedandscaledappropriately.

Physicalinfrastructures–thesearethebuilding-blocksofanyremotedataacquisitionsite,includingtripods,towers,poles,buoys,solarpanelracks,storageboxes,fencing,concretepads,andthelike.Quiteoftenasingletripodortowerdoesnothaveadequatespaceorstructuralintegritytosupportallofthesensors,antennas,solarpanels,batteries,andotheritems,soatypicalsitedesignincorporatesmultiplestructuralcomponents.

Powergenerationandstorage–forsustaininglong-termreliabledatastreams,powerindependenceiscritical.Stationsshouldbecapableofgeneratingandstoringtheirownpowerlocally,aswellastakingadvantageofanygridorotheravailablepowerthatiswithinbudgetanddesigncriteria.BecausethemajorityofrelatedelectronicsareultimatelypoweredbyDCvoltage,havingapowergenerationsystemandDCbatterybankforeverysite(andsometimesdiscretesubsystems)isrecommendedtominimizethelossofpowerandtheresultingdatagaps.IndependentgenerationsourcesaremostcommonlyPVarrays(solar),wind,orwaterturbines.Forreasonsofcost,reliability,andmaintenanceissues,PV(solar)isrecommendedastheprimaryon-sitegenerationsourceifenvironmentalconditionsallow.Incorporatingsimplicity,redundancy,andexcesscapacityisimportantforlong-termreliability.

Datacommunications–Useofreal-timecommunication(inadditiontolocalstoragecapacity)isdesirableinordertotransmitdata,monitorsystemhealthperformance,troubleshootproblems,andminimizedatagaps.Thisisusuallyperformedusingradiocommunications(whethervendor-specificorbuildingageneral-purposefieldIPnetwork).Communicationssystemsneedtoberobust,secure,andshouldhavelowpowerrequirements(referto“DataAcquisitionandTransmission”BestPracticesforfurtherdetail).

Constructiondetails–Whenselectinganddesigningthesensorandsupportsystems,manydetailsneedtobeconsideredwhengeneratingspecificationsandpurchasinghardware.Wiresshouldbeprotectedinconduitandstorageenclosurestoavoidexposuretodamageandseasonaldegradation.Wirelengths,enclosuresizes,andmountinglocationsshouldbeplannedforaccordingly.Anchoringforsupportstructureshouldbedesignedtowithstandworst-caseweather/environmentalconditions.Useofcorrosion-resistantmetalsforstructureandhardwaresuchasgalvanizedsteelandaluminumwillgreatlyreducefailureorongoingmaintenanceproblems.

Sitelayout

Fig.5Carefullyplanningasitelayoutinadvancecanpreventsurprisesandsetbacksduringinstallation.

Sitelayoutatfirstmightseemtrivial,butisveryimportantwhenconsideringinteractionsofthevarioussubsystemsthatcaninfluencesensor/equipmentreliabilityanddataquality.Sciencequestions/objectivesshoulddrivetheplacement/separationofsensorstooptimizemeasurementquality,followedbyplacementofsupportsystemsandadditionalstructure.Solararraysneedtobeangledforsunexposure,minimalshading,andsnowshedding.Theimpactsofsitestructureonmeasurementssuchaswindeddies,incoming/outgoingradiation,cameraviewsheds,orprecipitationcatchzonesneedtobeconsideredaswellasaestheticimpactsiflocatedinaregionthatisfrequentlyvisitedbythepublic.Poweranddatacablerunsshouldbeprotectedandkeptasshortaspossible;voltagedropoverlongrunscanbeaconsiderationinlayoutanddesign.Stipulationsinsitepermitsmaybedriversofsitelayoutandconstruction.Oncethesitelayoutisdesignedandmapped,specificationofconstructionmaterials,sensorcables,andothersuppliesmaybeoptimized.

BestPractices

Fig.6Theapproachtodeploymentshouldbeasdurable,reliable,andflexibleaspossibletoaccommodateunforeseenconditionsandchangingsciencequestionsortechnologyimprovementsoverthelongterm.

Selectionofdeploymentsites,sensorpackages,andsupportsystemsareinteractingprocesseswhichcanrequiresomeiterationbeforearrivingatthefinaldeterminations.Unlessthesciencequestionsareextremelynarroworexceptionalinnature,itisunlikelythatanyoneofthesedecisionscanbemadeinavacuumwithoutconsideringtheothers.Withthisinmind,thefollowingoverarchingrecommendationsshouldbeemphasized:

P.I.consultationwithsystem/hardware/constructionspecialistswhileintheproposalphasewillminimizebudgetsurprisesorplatformcompromiselaterintheprocess.

Dataqualityandlongevityshouldbetheultimategoalswhendesigningthedeployment.Makingchoicesformorerobustandwidely-usedcoresystemsandsensorswillensurethatdatacomparabilityismaximizedandhardwareproblemscorruptingdataorcreatinggapsareminimized.Purchaseofreliableandknownequipmentisnotasexpensiveasrepairing/replacingequipmenthalfwaythroughthestudyorlosingvaluabledata.

Whendataqualityandcontinuityisparamount,useofreplicatesensorsorstationsmayberequired.

Planningforreal-timeconnectivityiscrucialforreducingfieldmaintenancetimeanddatagaps.

Optimalsiteselectiontoanswersciencequestionscanoftenbeimpededbypermitrequirementsandlandownerpreferences.Startingtheconversationwithlandownersearlyintheprocessmayimprovethechanceofgettingthelocations/deploymenttypesthataredesired.

Standardizingsensorandsupporthardware,software/programming,andstructuraldesignsacrossmultiplesitesminimizesmaintenanceissuesaswellasconstructioncostsanddesigntime.

Assessingaccesscapabilitiestothesiteswillallowforplanningofemergencymaintenanceaccess,procedures,andcosts.

Overbuildingstructure,powercapacity,andsiteinfrastructure(e.g.cabling,networking)willpreventproblemsinthecaseofunforeseeneventsorsiteexpansion.

CommonPointsofFailure

Powerproblemsareoneofthemostfrequentcausesoftotalsystemfailure.Batteryfatigue,looseconnections,andelectricalshortsneedtobeanticipatedandpreventedwherepossible.Powersystemsneedtobeprotected,over-engineered,andreplicatedwhereverpossible.

Temperatureextremesofheatorcoldcancauseelectronicormechanicalfailureofindividualsensorsandsystems.Insulatingenclosures,ventilatingenclosures(activeorpassive),andplacementofequipmentinshelteredzonescanhelpalleviatetheseproblems.

Humidityandcondensationcanbeaseriousissueforelectronicslongevityandcircuitperformance(includingaccuracy).Inzonesofhighaveragehumidity,sealingenclosuresandprovidingsomemeansofreducinghumidity(e.g.desiccantpackets)isdesirable.

Sensorscanbedisruptedbywildlife.Hardeningofsensorsystems(e.g.,armoringcables,fences)canhelpwithsomeproblems.Near-realtimedatafeedsallowrapiddetectionofproblemsthatwilloccur.

Lightningstrikesornear-missesareacommonproblematexposedormountainoussites.Extensivegrounding(e.g.exposedcopperwirenetwork)anduseofsurgeprotectionthroughoutthepowersystemandatendsoflongpoweranddatacablerunswillcompartmentalizethesiteelectricallyandprotectasmanycomponentsaspossible.

Lackofdatastoragereplicationcancauselossofdata.Incorporatinghighcapacitystorageon-site(datalogger)aswellasoff-site(database),thisproblemcanbemitigated.

Personnelturnovercoupledwithlackofprocessandhardwaredocumentationcanleadtodatadiscontinuityorequipmentfailure(seeSensorManagementandTrackingforadditionaldetails).

CaseStudies

NevCANTransectsorWalkerBasin(Scotty)---Tobecompleted,willincludeastationdesignandsystems,maintenance/accessplan,dataflow,andsomephotos/diagrams.AndrewsResearchSites(Adam)---TobecompletedSevilleta-Reneetocompletewithmultiplecasestudyexamples

Data Acquisition and Transmission

Concerns the acquisition of sensor data from the field and

remote control of the system, while ensuring the integrity

of those data.

backtoEnviroSensingClustermainpage

Contents1Overview2Introduction

2.1Considerations2.1.1CollectionFrequency2.1.2Bandwidth2.1.3Protocols

2.1.3.1Hardware2.1.3.2Network

2.1.4Line-of-sight2.1.5Power2.1.6Security

2.1.6.1PhysicalSecurity2.1.6.2NetworkSecurity

2.1.7ReliabilityandRedundancy2.1.8Expertise2.1.9Budget

3Methods3.1Manual3.2Unidirectional

3.2.1GeostationaryOperationalEnvironmentalSatellite(GOES)3.2.2MeteorBurstRadio3.2.3IridiumSatelliteservice

3.3Bidirectional3.3.1ISMbandradionetwork3.3.2Cellular3.3.3Vendor-specificradionetwork3.3.4Satelliteinternet3.3.5Licensedradio3.3.6MeshNetworks3.3.7Wired

4BestPractices5CaseStudies6Resources

6.1GOES7References

OverviewTraditionally,environmentalsensordatafromremotefieldsitesweremanuallyretrievedduringinfrequentsitevisits.However,withtoday'stechnology,thesedatacannowbeacquiredinreal-time.Indeed,thereareseveralmethodsofautomatingdataacquisitionfromremotesites,butthereisinsufficientknowledgeamongtheenvironmentalsensorcommunityabouttheiravailabilityandfunctionality.Moreover,thereareseveralfactorsthatshouldbetakenintoconsiderationwhenchoosingaremotedataacquisitionmethod,includingdesireddatacollectionfrequency,bandwidthrequirements,hardwareandnetworkprotocols,line-of-sight,powerconsumption,security,reliabilityandredundancy,expertise,andbudget.Here,weprovideanoverviewofthesemethodsandrecommendbestpracticesfortheirimplementation.

IntroductionTheclassicmethodofacquiringenvironmentalsensordatafromremotefieldsitesinvolvesroutinetechniciansitevisits,inwhichs/heconnectsalaptoptoadatalogger,anelectronicdevicethatrecordssensordataovertime,andmanuallydownloadsdatarecordedsincethelastsitevisit.Oncethetechnicianreturnstothelab,s/heisthenresponsibleformanuallyuploadingthesedatatoaserverforlaterprocessingandarchival.

Whilemanualacquisitionmethodsaregenerallyeffective,therearemanyreasonstoautomateenvironmentalsensordataacquisition.Forinstance,ifthesiteisnotvisitedfrequentlyenough,thedataloggermemorycanbecomefullanddependingonhowthedataloggerisprogrammed,sensordatawilleitheroverwriteitselforstoprecordingentirely.Thissituationoftenoccursatremotesitesthatbecomeperiodicallyinaccessibleduetoenvironmentalconditions,suchasheavywintersnowpack.Second,theburdenofresponsibilityfornotonlythesuccessfulretrievalofthesensordata,butalsothesubsequentuploadtoaserverforsafekeeping,liessolelyonthetechnician.Moreover,withanyinstrumentedsite,thereistheinherentpotentialforsensororpowerfailure.Automateddataacquisitionsystemsallowtechnicianstolearnofsuchissuespriortovisitingthefieldsite,reducingthepotentialfordataloss.Finally,automateddataacquisitionmethodssavehundredsofpersonhoursandvehiclemilesthatwouldhaveotherwisebeenspentmanuallyacquiringdataortroubleshootingunanticipatedproblems,thusimprovingtheoverallqualityofthedata.

Bidirectionalcommunicationmethodshavetheadditionaladvantagesofallowingtechnicianstoremotelychangesystemsettings,testconfigurations,andtroubleshootproblems.Thesemethodsalsoopenthefieldtoawidevarietyofdevicesthatmaybedeployedataremotefieldsite,suchascontrollablecameras,on-sitewirelesshotspots,andIP-enabledcontrolorautomationequipment.

Considerations

Thedecisionofwhichsensordataacquisitionmethodtouseatagivensiterequiresthecarefulconsiderationofmanyfactors,forwhichweprovideanoverviewhere.

CollectionFrequency

Whatisthedesiredcollectionfrequency?Howimportantisreal-timeaccessibility?Forinstance,thedatacouldberetrievedinnearreal-time(everyfewminutestoeveryfewhours)orjustonceortwiceperday.Highfrequencydatasetsorimagesshouldbecollectedmorefrequently.

Bandwidth

Bandwidthcanbeanimportantconsideration,particularlywhenhighfrequencydataarebeingcollected.Willcamerasbeutilizedatthesite?Whereisbroadbandpointofpresence(POP)located?Doesequipmentworkwithrequiredbandwidth?Morefrequentcollectionintervalsrequirelessbandwidthpertransmissionarearerecommendedforhighfrequencydatasetsorforimages.

Protocols

Hardware

Manydataloggersonlyhaveserial(RS232)ports,thereforerequiringaserial-to-ethernetconvertertointerfacewithautomatedacquisitioninstrumentation.USB.

Network

PublicIPnetworksareadvantageousoverprivateIPnetworksinmanycasesbecausetheycanbemanagedfromanywherethereisaconnectiontotheInternet.RemoteaccesstoprivateIPnetworksrequiresadvancednetworkexpertisetoprovisionportforwardinginfirewallsand/orVPN.

Line-of-sight

Fig.Anexampleofanear-Line-of-Sight(nLoS)condition,whereinterveningterrainand/orvegetationcaninterferewiththeradiosignal.Inthiscase,theantennaheightsatbothendsareactuallyatthe8mlevel,mitigatingtheeffectsomewhat.Thelinkisoperational,albeitwithareducedReceivedSignalStrengthIndication(RSSI)duetothepresenceofanobstructioninthelinksFresnelzones.

Evaluationofenvironment,topography,andvegetation.CanbeinitiallydeterminedusingLOScalculators,whichuseDEMmodels,butmustbegroundtruthed.Oftenrequiresarepeaterinfrastructure.Choosingrepeaterlocationsinvolvesmanyofthesameconsiderationsforchoosingsiteselection.Distancetorepeaterisafactor.AutomatedsensordataacquisitionmethodsrequiremanyofthesamesiteselectionconsiderationsdiscussedinSensorSiteandPlatformSelection,particularlywhenselectingrepeatersites.

Power

Howimportantisreal-timeaccessibility?(e.g.,whatisdesiredcollectionfrequency?).Whatarethetransmissiontypepowerrequirements,onsitebuffersize.Redundancyispreferred,especiallyinveryremotesites.Ifpowerisdisrupted,willsystemresumeoperations?

Security

PhysicalSecurity

Forphysicalsecurityconsiderations,refertoSensorSiteandPlatformSelection.

NetworkSecurity

Itisrecommendedthatencryptionkeys,suchasWPA2encryption,beconfiguredtopreventunauthorizedaccessofdataacquisitionequipmentorsensordata.AprivateIPnetworkcanfurtherhelptopreventunwantedaccess,butalsopreventseasyremotemanagementbynetworkadministratorsunlessaVPNisinstalled.

ReliabilityandRedundancy

oftransmissionmodeandofequipment.Also,networkinfrastructure.

Expertise

Someacquisitionmethodsareplug-n-playwithsubstantialvendorand/orcommunitysupport,whileothersrequireafairamountofhardwareandnetworkexpertisetoconfigureandmaintain.AllacquisitionmethodsrequirefundamentalknowledgeofIPnetworkingalongwithbasicelectronics,radio,andantennatheory.

Budget

Costsofimplementingadataacquisitionandtransmissionmethoddependonexistinginfrastructure,initialsetupcostsincludingpersonnel,personnelcosts,specificallytechnicianmaintenance,andrecurringcosts,suchasmonthlyrecurringcostswithcellulartransmission.

MethodsTherearethreegeneralcategoriesofremotesensordataacquisitionmethods:manual,unidirectionaltelemetry,andbidirectionaltelemetry.Eachhasadvantagesanddisadvantagesintermsofinfrastructure,cost,reliability,requiredexpertise,andpowerconsumption.

Manual

Thismethodinvolvesscheduledvisitstothesitebyafieldtechnician,whousesaserial-to-computerconnectionand/orflashmemorytransferofenvironmentalsensordatatotheirlaptoporsimilardevice.Uponreturningfromthefield,thetechnicianisresponsibleformanuallyuploadingthesedatatoaserver.Thisacquisitionmethodissimpleandmaybetheonlyoptionwhensiteinstrumentationgenerateslargedatafiles.However,thismethodprovidesnoreal-timedataaccessandtherefore,noknowledgeofinstrumentationfailures.Moreover,thereliabilityofthismethodiscompletelydependentonthetechnician.

Unidirectional

Unidirectionalsensordataacquisitionmethodsinvolveregularlyscheduledwirelessdatatransmissionfromaremotesitetoaserver,withnooffsiteabilitytocontrolorchangesensorsettings.Theseinclude...

GeostationaryOperationalEnvironmentalSatellite(GOES)

Fig.Atypicalcircular-polarizedGOESantennaforone-waybursttransmissionoflimiteddata

Thismethodispreferredinveryremoteandpotentiallyruggedareaswhereotherautomatedtransmissionmethodswouldnotwork.Whileitdoesnotrequireline-of-sitetoarepeaterlikemostothertransmissionmethods,itdoesrequireaviewtothesouthernsky.Additionally,theGOESmethodhasalowpowerrequirement.However,GOEShasseveraldisadvantages,includingahighinitialinvestment(<$5K)andrequirestrainingandlicensing.Moreover,lessthan100valuescanbetransferredperhour,makingitdisadvantageousforsitesthatsampleathighfrequencies.

DatatransferspeedforGOESsystemsistypicallylimitedto1200bitspersecondwith10secondtransferassignmentsoccurringonceeveryhour.Duringeach10secondperiod,onecantransferupto1500bytesofdata(12,000bits/8)includingthe53byteGOESheaderstring.Inotherwords,maximum1447byteswithtimestampsandmeasuredvaluescanbetransferredtothesatelliteduringonetransmissioninterval.Mostoften,GOESmessagesareorganizedinatimeorderedformatsimilartothefollowingexample:

0105E59013190131824G30+1NN196WXW00517 0 13:00:00 23.7,43,5,245,-55.1,5,245,23.7,23.7,12.8 1 12:30:00 23.7,43,-55.1,204,1011.09,0.000,0.0,24.7,0.270,-0.456,-0.997,-0.416,-2.687,23.5,0.00,214.81,0.00,5,245 1 12:45:00 23.7,43,-55.1,204,1011.11,0.000,0.0,24.7,0.249,-0.468,-0.994,-0.436,-2.650,23.5,0.00,214.82,0.00,5,245

Here,firstlinerepresentstheGOESheaderstringthatincludestheaddress,dateandUTCtimeofthetransfer(13:18:24),signalinformation,satelliteinformation,messagelengthandsomeothercharacters.Intheexampleabove,thelinesthatfollowcarrythetimestampandvalueinformationfromthesensorsets0and1.Asthelengthofeachcharacterinthesensorsetstringis1byte,wecanseethatourGOESmessagehasapproximately280bytesusedfrom1447bytesthatareatheoreticalmaximumforthetransfer.However,inordertoaccommodatethepossibledifferencesbetweenthestationsendingtime,decoders,andscheduledreceptiontime,weneverwanttoreachthisvalue.

ProspectiveusersoftheGOESsystemmustfillouttheSystemUseAgreement(SUA)formand,uponapproval,receiveandsigntheMemorandumofAgreement(MOA)fromtheNOAA'sSatelliteandInformationService(NESDIS).AftertheMOAisapproved,NESDISwillissueachannelassignmentandanIDaddresscodetotheapplyingorganization.Non-U.S.governmentandresearchorganizationsmustbesponsoredbyaU.S.governmentagencyinordertoapplyforthispermission.Uponapproval,allusersmustpurchaseequipmentthathasbeencertifiedtobecompatiblewiththeGOESDataCollectionSystem.AsofMay2013,GOEStransmittersmustconformtothecertificationstandardsversion2(alsoknownasCS2).ThischangewasimplementedtodoublethenumberofGOESchannelsonthesamebandwidth.Asaresult,oldGOEStransmittersthatareonlycompatiblewiththeCS1standardcannotbeusedfornewNESDISassignments.ForassignmentsobtainedpriortoMay2012,CS1transmitterswillbesupporteduntil2023.IfyouconsiderbuyingtheusedequipmentforGEOStransmission,makesurethetransmittersarecompliantwiththeCS2standard.

MeteorBurstRadio

LikeGOES,thismethoddoesnotrequireline-of-sightandhasalowpowerrequirement.However,itrequiresalargeantenna,arrangementofservice,andhasaveryslowtransmissionrate.ItworksbyreflectingVHFradiosignalsatasteepangleoffthebandofionizedmeteoritesthatexist50to75milesabovetheEarth.SeeSNOTELandITUCaseStudiesformoreinformation.

IridiumSatelliteservice

Iridiumprovidestheonlycompleteglobalsatellitecoverage.ThenewIridiumPilotisavailableuntil2016.ThenextgenerationofIridiumisexpectedtobeimplementedaroundthattimeframe.ThePilotisveryeasytoinstallandmaintainwithawaterproofbodyandUSBinterface.Withthissimpleinterfacealaptopcanbeconnectedandsurfingthewebwithinminutes.Recently,thecosthasbecomemoreaffordablewithaperdatausagecoststructure.SinceIridiumoperatesintheLbanditisnearlyimpervioustoweather.Iridiumisusedprimarilyformarinecommunication.

Bidirectional

Thismethodinvolvesbidirectional(andtypicallywireless)transmissionofdatafromaremotesitetoaserver,withtheabilitytomodifydataloggerprogramsand/orsensorsettingsremotely.Thesemethodsgenerallyrequireline-of-sightandsecurityconsiderations(bothnetworkandphysical).Sometimes,canbepurchasedfromanInternetServiceProvider(ISP)ifthereiscommercialcoverageinthearea,orcanbemanuallyinstalledinremoteareas.Often,connectivitycanbeextendedtocomputersonsite.Combinationofseveralmethodsmayberequiredincertainsituations.

ISMbandradionetwork

Fig.Threedifferentantennatypesusedforbi-directionalmicrowavebandcommunication:a)5.xGHz24"dual-polaritydish,30dBigain;b)2.4GHzsingle-polaritygrid,24dBigain;andc)5.xdual-polaritypanel,23dBigain.Thehigherthegainvalue,themorenarrowtheantennadirectivity,increasingsignalstrengthinthedesireddirectionandrejectingadjacent

interference.Eachofthesedesignshasprosandcons,dependingontheapplication.

(unlicensed,900MHz,2.4GHz,5.xGHz):TheISMbandradiosarecommonlyreferredtoas"WiFi"radios(eventhoughthesearegenerallyusedasbackhaulsandnotwide-areaaccesspoints)andcomeinavarietyoffrequencies.Thismethodhasmanyadvantagesinthatitisnonproprietary,hasnorecurringcosts,usespublicradiofrequencies,allowstransmissionoflargedatasets,utilizesinexpensivehardware,isnotrestrictedtoasinglevendorordevicetype,andhasincreasingcompatibilitywithmanydevices.However,itrequiresline-of-site(LoS)ornear-line-of-sight(nLoS),anetworkinterfaceonloggersanddevices,andbasictoadvancednetworkadministrationskills.Theseradiosarealsosubjecttointerference,particularlyinmorepopulatedareas,andcanhavehigherpowerrequirementsthanothertransmissionmethods.

Cellular

Thismethodhasprolificcoverageandminimalongoingmaintenance.However,itrequiresareliablecellularnetworkbepresentandcomeswithmonthlyrecurringcosts.Occasionallyacontractmayberequiredunlesscanbenegotiatedthroughuniversityororganization.

Vendor-specificradionetwork

Vendorspecificradionetworksuseproprietaryprotocolsandaretypicallymoreexpensivethansomeotheracquisitionmethods,buthavetheadvantageofbeingrelativelyeasytosetupandmaintain.Forexample,Freewave

Satelliteinternet

Thismethodcangetlimited2-wayconnectivityintoaremotesite,albeitathighmonetarycostsandsignificantpowerconsumption.Ithasslowuplinkspeeds,highlatency,requiresasubscription,andon-sitevendorsetupisrequired.

Licensedradio

Thismethodisexpensiveandrequiresapurchaseofalicensedfrequency.

MeshNetworks

Ameshnetworkisanetworktopologyinwhicheachnoderelaysdatathroughoutthenetwork.Ameshnetworkwhosenodesareallconnectedtoeachotherisafullyconnectednetwork.Duetotheinherentredundancyinmeshnetworkdesign,meshnetworksaretypicallyquitereliable,asthereisoftenmorethanonepathbetweenasourceandadestinationinthenetwork.Meshnetworksaretypicallywireless,butcanbewired.Meshnetworksarenotverycommon,especiallyatlargespatialscales,sinceeverydevicemustbeconnectedtoeveryotherdevice.Theinitialinvestmenttobuildsuchanetworkisconsiderablyhigherthanotheracquisitionmethods.Meshnetworks,eitherpartiallyorfullyconnected,aremostcommonlyusedindistributedsensornetworks.

Wired

Whileallmethodsdiscussedutilizewirelesstransmissionprotocols,wiredbidirectionaltransmissionmaybepossibleviain-groundoraerialcopperorfiberoptics.

BestPracticesThinkaboutdataacquisitionaspartofsitedesign.Itismoreexpensivetoaddtelemetrytoapreexistingsitethantointegratewithinitialsiteconstruction.Makesuretoincludeacquisitionmethodpowerconsumptioninthesitepowerbudget,oraseparatepowersystemwillberequired.UsesoftwaretoolswithradioorahandheldspectrumanalyzertosurveyRFconditionson-site.Forinstance,urbanareasaretypicallynoisierwithrespecttoRFinterference,andforWi-Fitransmissionmethods,5Ghzfrequenciesarepreferred.Useabidirectionaltransmissionmethodtoprovidemorecontrolandflexibility.Over-engineerpowersystem,especiallywhenpoweringrepeatersandothersitesinhardtoreachareas.Useequipmentthatcanconservepower(sleepmode)Provideadequatelocalstoragefordisruptedtransmissions.Adequate“offlogger”localstorageisrecommendedtoavoidlosingdatawhen/ifloggerisreset.Provideredundancy,suchthatwhenonelinkgoesdown,thesiteisstillremotelyaccessible.Thisisrelatedtonetworkarchitectureplanning-multiplegeographic/hardwarepathsalongbackhaulroutestofieldhubsishighlydesirable.Examplesinclude:parallelbackhauls,multipleinternetpointsofaccess,"failover"paths.Havinga"backdoor"intothenetwork,evenoverreducedspeedlinks,canallowatechtoremotelytroubleshootproblemsonthemainlinks.Standardizetransmissionprotocolacrossallsitestoprovideeasiernetworkmanagement.Matchradioband,power,antenna,andbandwidthtoapplication.Forinstance,whenasitegenerateshighfrequencysensordata,highbandwidthandhighdatacollectionfrequencyarerecommended.UseanarrowbandwidthforyourRFdevices/coordinatefrequenciesbetweenradiosystemsThoroughlydocumentallsitecoordinates,IPaddresses,maps,radioazimuth,zenith.WhenusinganIPbasedacquisitionmethod,usepublicIPaddressesforeasierremotemanagementofdevices.

CaseStudies

NevCAN:NevadaClimate-ecohydrologicalAssessmentNetwork-UniversityofNevada,Reno(UNR);DesertResearchInstitute(DRI),UniversityofNevada,LasVegas(UNLV)SevilletaWirelessNetwork-SevilletaLongTermEcologicalResearch(LTER)ProgramandSevilletaFieldStation;DepartmentofBiology;UniversityofNewMexico(UNM),Albuquerque,NewMexico,USAVirginiaCoastReserveLTERWirelessNetwork-VirginiaCoastReserveLongTermEcologicalResearch(LTER)ProgramSNOTELITU

Resources

GOES

NewNESDISAssignmentsCS2StandardCompliance

References

Sensor Management Tracking and

Documentation

Outlines the importance of communication between field

and data management personnel as field events may alter

the data streams and need to be documented.

backtoEnviroSensingClustermainpage

Fig.Documentationofsensorinstallation,maintenance,andrelatedsystemsiscriticaltolong-termdatausability.

Contents1Overview2Introduction3Methods

3.1Whatshouldbetracked3.1.1Documentationatsetuptime3.1.2Infrastructureeventstotrack3.1.3Siteleveleventstotrack3.1.4Sub-componenteventstotrack

3.2Howtotracktheinformation4BestPractices

4.1Documentspecificinformationduringnormaloperations4.2Maintainingtherecordsandlinkingtoaffecteddatastreams4.3Managingsensorconfigurations

5CaseStudies

OverviewAutomatedobservationsystemsneedtobemanagedforoptimalperformance.Maintenanceoftheoverallsensorsystemincludeanythingfromrepairs,replacements,changestothegeneralinfrastructure,todeploymentandoperationofindividualsensors,andseasonaloreventdrivensitecleanupactivities.Anyoftheseactivitiesinthefieldmayaffectthedatabeingcollected.Therefore,consistentanduniformrecordsofmaintenance,service,andchangestofieldinstrumentationandsupportinginfrastructureserveasmetadataforlongtermqualitycontrolandevaluationofthesensordata.

Inthischapter,wedescribethetypesofmanagementrecordsthatshouldbekeptandthevariousmethodsforcollecting,maintaining,communicating,andconnectingthisinformationtothedata.Itisimportanttocreatetrackinganddocumentationprotocolsearlyonbecausetheseprotocolswillsupportandguidecommunicationsandworkbetweenfieldanddatamanagementpersonnel.

Realtimemonitoringofsystemhealthandalertingsystemsarediscussedinthemiddleware,qualitycontrol,andtransmissionsectionsofthisdocument.Althoughsomeoftheseparametersdonotaffecttheactualdataquality,trackingofthesesystemperformancediagnosticdatamaybehelpfultodetectpatterns

andpreventfuturedataloss,interveneremotely,andschedulesitevisitsmoreeffectively.Calibrationproceduresandschedules,maintenanceactivities,andreplacementschedulesarehardwarespecificandwillnotbecoveredhereindetail.

IntroductionDataarecollectedtodetectchangesintheenvironment,effectsoftreatments,disturbancesetc.,andinalldatacollectiongreatcareistakentonotmaskthesignatureofeventsofinterestwithimpactsfromunavoidable,samplingrelateddisturbances.Fieldnotesareusuallyassociatedwiththerawdatatobeabletodiscernanaturaleventofinterestfromamanagementevent.Datacollectionapproachesusingautomatedsensingnetworksarebecomingmorecomplexwithmanypeopleinvolvedinthedatagathering,management,andinterpretationactivities,andcommunicationamongallinvolvedpartiesisbecomingmoreimportantandmorechallenging.Fieldnotescanbeausefulvehicleforthiscommunication.Everyoneusingolderlong-termdataknowsthevalueoffieldnotebookstohelpunderstandandinterpretadataset.Fieldnotesareequallyvaluabletofutureusersforasensordatastream,particularlyifthenotesareinterpretedsuchthatinformationisintegratedwiththedataviadataqualifyingflagsandmethoddescriptioncodes.

Currentlytherearenostandardsforflagcodesetsorfordefiningwhicheventsshouldbeflaggedandhowtoefficientlycommunicatewithdatausers.Hereweattempttopresentalistofeventsthatareusefultotrackandthathavebeenhelpfulinthepasttoguidedatausersintheinterpretationandevaluationofthedata.Tomanagethisinformationtheconceptsofa‘logicalsensor’,a‘physicalsensor’,a‘method’and‘eventcodes’haveprovenuseful.

A‘logicalsensor’orasensordatastreamcanbedefinedbyalocation,height/depth,andmeasurementparameter,regardlessofwhatexactphysicalsensororhardwareisusedtologmeasurements.Anexamplewouldbe‘airtemperatureat3mabovethegroundatsiteA’.However,overtimethe‘physicalsensor’willhavetobecalibrated,eventuallyreplaced,andanewtypeofsensormaybechosentoprovidemoreaccuratemeasurements.Ifhardwareisswappedoutfortechnicalreasons,thedatastreamstillrepresentsthesitelocationforthatmeasurement,andthenotionofa‘logicalsensor’allowsidentificationofaconsistentdatastreamovertime.

Changesinthetypeofsensoror‘method’mightbetrackedwithamethodcodeassociatedwiththelogicalsensor.Ofcourseshouldareplacementsensorbesignificantlydifferentsuchthatthepastandnewdatastreamarenotcomparable,thenanewlogicalsensorstreamshouldbeinitiated.Eventssuchasroutinecalibrationmightbeflaggedwithan‘eventcode’ratherthanachangein‘method’,evenifthiseventhaslastingeffectsonthedata,i.e.,moreaccuratedata.Aneventcodemayserveasameanstolinktoindividualfieldnotesfortheevent.‘Physicalsensors’shouldalsobeindividuallyidentifiablebylocationandtrackedthroughacalibrationorreplacementschedule.

MethodsWhatshouldbetrackedBasicinformationonthesiteandhardwareconfigurationneedtoberecordedatinstallationtime.Duringnormaloperationseventtrackingcanbedoneatseverallevelsofgranularitywithrespecttoaresearchprogram.Forexample,itmaybedoneattheleveloftheentireinfrastructure,atasite,oratasub-componentofasite.Theinformationabouteacheventneedstobepropagatedorconnectedtoallrelevantdatastreams.Followingareexamplesofwhatshouldbetrackedateachoftheabovelevels,intermsofimpactontherecordeddata:

Documentationatsetuptime

Locationlat,long,elevation(and/ordepth),direction(e.g.camerafacingnorth),Locationfromacertainreferencepoint(e.g.towerbase)SitedescriptionSitephotoswithmetadata,photosofprocedures(howdoyouchange...),photoofsensor(sootherscaneasilyrecognize)ManufacturersspecsandIDofinstruments(make,model,serialnumber,range,precision,detectionlimit,calibrationcoefficient)Instrumentation(e.g.datalogger,multiplexer,sensor)wiringdiagrams(thisshouldbepartoftheloggerprogramcomments,aheadersectionwiththewiringdescriptionchannelbychannel)Powerwiringdiagrams(e.g.howmanysolarpanels,aretheyhookedupinseriesorparallel,etc.)NetworktopologyandIPaddressesSoftwareusedforcalculatingmeasurements(otherthandatalogger)Instrumentationdeploymentdate(the“golive”date)

Infrastructureeventstotrack

Changestodataloggers,multiplexers,ordataloggerprograms(dataloggerprogramsmaybearchived)Powerproblems,includingbatteryvoltageEnclosuretemperatureandhumidityPlatformmaintenance(e.g.,towerinspection,tramlineleveling,etc.)Samplingprotocolchanges(e.g.,timing,routinechangingorupgradingofsensorparts,instrumentchangeorreplacement)RF/networkperformancedegradation(preventssome/alldatafrombeingtransmitted;trackhealth/statusofIPnetworkdevicesusingSNMPstreamstoNagios,etc.)

Siteleveleventstotrack

Sitedisturbance(e.g.,animal,human,weathercaused)Sitevisits(presenceofpeoplemaychangemeasurements)Sitemaintenance(e.g.,cuttingbrush,cuttingtrees,etc.)Changestosensornetworkdesign,includingadditionsordeletionsofsensors

Sub-componenteventstotrack

Here,weincludecomponentslikeindividualtelemetry,powersystems,instruments,sensorcomponents,etc.Whileeachcomponentdoesn’taffectthewholesystem,theystillmayinfluencetheinterpretationofthemeasurements.TotrackindividualcomponentsasystemofIDsmaybedevelopedforallcomponentsandsupportedbyBarcodes,Geo-LocationTagsandMicrochipEncodedSensors.

SensorfailuresSensorcalibrationsSensorremovalSub-sensoraddition,removal,orchange(pluggablesub-sensorpositionswithinthemainsensorneedtobenotedandkeptconsistent)Sensorinstallation(replacement)Sensormaintenance(cleaning,changeofparts)SensorfirmwareupgradesEnclosuretemperatureandhumidityRepositioningofsensor(e.g.,moveupduringwintertobeabovesnowlineNormal(nonextreme)disturbancesastheyarenotedandremoved(e.g.,sticksinweirs)Methodologychanges(e.g.,temperatureradiationshieldchange)

Howtotracktheinformation

Minimallydocumentingorloggingsiteeventsorproblemsmightbeinatablestructuresuchas:

SiteID DataloggerID SensorID datetimebegin

datetimeend category notes person

controlledvocabulary

However,usuallyalotmoreisrecordedateachsitevisit-seeusecases.Acontrolledvocabularyisveryimportanttocategorisetheeventforlaterinterpretationandflagginginthedatasetandshouldbeestablishedasearlyaspossiblewithprojectspecificterms.Severaldatabasestructurestomaintainthisinformationandconnecttotheactualdataarecurrentlybeingproposedanddiscussedbelowinusecases.

BestPracticesEstablishanddocumentproceduresandprotocolsforsitevisits,installationofnewsensors,maintenanceactivities,calibrations,communicationbetweenfieldanddatapersonnel.Suchprotocolsmayincludepre-designedfieldsheetsorsoftwareapplicationsonfielddataentrydevices,bothofwhichshouldbesynchronizedwithacentralstoragesystemtowhichallpartieshaveaccess.Observationsinthefieldmayalsobemadeandrecordedbyresearchersandfieldpersonnelnotdirectlyinvolvedinthesensorsystemmaintenance,andprovisionsshouldbemadetocapturethatinformationandcommunicateittoresponsiblestaffmembers.

Inadditiontocapturingthefieldeventsmentionedaboveitisgoodpracticeforthedatamanagementstafftoregularlymonitorthedataandconferwiththefieldcrewwhenanomaliesarenoticed.Thisfrequentlywillbringupadditionalinformationthatneedstoberecordedinthefield.Itisalsogoodpracticetohavethedatamanagementstaffvisitthesite,periodicallyassistwithfieldmaintenanceactivitiestobetterunderstandandinterpretfieldnotesandgenerallyinteractwiththefieldstaff.

Allphysicalsensorsshouldbeuniquelyidentifiable.Thismaybeachievedbyrecordingaserialnumber,attachingabarcode,usingintelligentsensorswhicharecapableofstoringtheirownmetadataandwhichcanbeaccesseduponconnection.Thisisparticularlyimportantforsensorsthataremovedaroundorarepulledformasscalibrationandredeployed.SensorlocationandcalibrationschedulesshouldbetrackedbyeachsensorwithID.

Documentspecificinformationduringnormaloperations

Eitherapre-designedfieldsheetoradataentryapponafielddevice(tablet,laptop,etc.)helpsremembereverydetailtorecord.Itisalsohelpfultodefinealistoftermstodescribethemostcommonproblemsinaconsistentwayforlateranalysis.DocumentsiteID,date,time,person(s),siteconditions,tasksperformedeverytimeasiteisvisited.Whenupdatingdataloggerprograms,useanewprogramnameforeverychange.Itisadvisabletosaveolddataloggerprograms.Useachangelogsectioninadataloggerprogramcommentheadertonotedate,author,anddescriptionofdifferencesfromlastdataloggerprogram.i.e.versioning/revisioncontrolForsensorspecificeventsnotethesensorID(BarCodes,Geo-LocationTags,MicrochipEncodedSensors(NEON'Grape'),orintelligentsensorsthatstoreandprovidetheirownmetadatauponconnection).

Maintainingtherecordsandlinkingtoaffecteddatastreams

Asmentionedearlier,thisrecordkeepingisaneffortincommunicationbetweenfieldanddatapersonnelaswellascommunicatingeventstofuturedatausers.Henceagoodpracticeistopermanentlylinkthisinformationtothedataset.Thismaybeachievedondifferentlevels-adescriptioninametadatadocument,anindicatorofamethodforadataseriesoreachdatavalue,aflagindicatingaonetimeevent

atacertaindatavalue.Asaminimumaffecteddatashouldbeflaggedinadifferentcolumnwithinthedatatable.

Followingtheconceptofalogicalsensor,certaineventsshouldtriggerthestartofanew‘method’descriptionwhenthedatastreamisaffectedmorethanregularcorrectionscanaccommodate(e.g.,newsensorusingadifferentmethodsofmeasurement).Inthiscaseitisgoodpracticetoruntheoldandthenewsensorsidebysideforawhiletocompare.Nohardandfastguidelinesareavailablefordecidingwhenamethodchangeoccursandwhenawholenewlogicalsensorstream(i.e.,differentdatasetordatatable)shouldbestarted.TheseconceptsarewellimplementedintheCUAHSIODM,pleaseseethosedocumentsforfurtherdiscussion.

Mostevents,however,canbehandledbywelldocumentedflags(sensorcalibration,sitemaintenanceactivities,disturbances,etc.).Fordocumentation,flagsinthedatafileshouldlinktoadatabasewithmoreextensiveexplanationsoftheevents.

Managingsensorconfigurations

Anumberofsensorsprovidecoremeasurements,butwillalsoprovidetheabilitytoexpandthesensorviaoneormorepluggableports.Whenasub-sensorisconnected,thedatafromthesub-sensorareusuallyaddedtothemaindatastreamasavoltagemeasurementthatgetsconvertedtothemeasurementparameterunitspost-transmission.Trackboththenumberofsub-sensorsandtheirportpositions,sinceachangetoeithermaycauseproblemsinprocessingthedatastreaminmiddlewareapplications.Forinstance,awatersamplerlikeaCTDmayprovideportstoconnectsubsensorsfordissolvedoxygenorturbiditymeasurements.NotethattheDOsub-sensorshouldalwaysbeconnectedto,say,voltageport1,andtheturbiditysensorisalwaysconnectedtovoltageport2,andvoltageport3isempty.

SeealsomiddlewarecapabilitiesandQA/QCproceduredocumentationinthoserespectivesections.

CaseStudies

Casestudy:DatamodelfortrackingsensorsandsensormaintenanceattheUtahWaterResearchLaboratory(J.Horsburgh,September2013)

ThedatabasedesigndiagramdepictsthedatamodelasitisusedattheUtahWaterResearchLaboratory,UtahStateUniversity.ItwasdevelopedbyJ.Horsburghandhisresearchteam.CurrentlyeffortsareunderwaytoextendtheCUAHSIODMtostorethiskindofmetadatabasedontheexperiencewiththisdatamodel.

Casestudy:TwoexamplefieldsheetsfromtheHJAndrewsExperimentalForestinOregon.

HJAndrewsstreamgagechecksheetHJAndrewswatershedchecksheet

Streaming Data Management Middleware

Discusses software features for managing streaming

sensor data.

backtoEnviroSensingClustermainpage

Fig.1Thepositionofmiddlewareinagenericsensordatamanagementsystem.

Contents1Overview2Introduction

2.1ResearchAgenda2.2TechnologicalRequirements2.3PersonnelSkills

3MiddlewareClassifications3.1Classificationbyfunctionality3.2Classificationbyproprietyandtype

4BestPractices5CaseStudies

5.1MarmotCreekResearchSite,RockyMountains,Canada5.2UniversityofTexasatElPaso'sSystemEcologyLab,Jornadaresearchsite,NM

6MiddlewarePackageDescriptions7References

OverviewMiddlewarearesoftwarepackagesandproceduresthatresidevirtuallybetweendatacollectors,suchasautomatedsensors,anddata‘consumers’,suchasdatarepositories,websites,orothersoftwareapplications.Middlewarecanbeusedtoperformtaskssuchasstreamingdatafromdataloggerstoservers,archivingdata,analyzingdata,orgeneratingvisualizations.

Manymiddlewarepackagesareavailablefordevelopingacomprehensive,reliable,andcost-effectiveenvironmentalinformationmanagementsystem.Eachmiddlewareoptioncanhaveauniquesetofrequirementsorcapabilities,andcostscanvarywidely.Asinglemiddlewarepackagemaybeusedifitincludesalloftheuserrequirements,ormultiplemiddlewaremaybebundledintoadatamanagementsystemiftheyarecompatibleorinteroperablewitheachotherandtherestofthedatacollectionandmanagementsystem.

Thissectiondescribesmultiplemiddlewarepackagesthatarecurrentlyavailable,andprovidesexamplesofhowdifferentsoftwareandproceduresarebeingusedtocollect,analyze,visualize,anddisseminatesensor-suppliedenvironmentaldata.

IntroductionTherearemultiplefactorsthatmayaffectthechoice,use,andperformanceofmiddleware.Thesefactorsmaybeclassifiedaccordingtoagroup’sresearchagenda,technologicalrequirements,andpersonnelskillsets.

ResearchAgenda

Theresearchagendaofagroupisamajordeterminantofthetypeofmiddlewaresystemneeded.Agroupfocusedononlyoneorafewnarrowlyfocusedresearchquestionsmayneedfewertypesofsensorsandconsequently,fewersoftwaremodulesmaybeadequatetostreamlinedataprocessingfromcollectiontotheendgoal.Ateamthatinvestigatesmultiplequestionsspanningmultipleresearchdomainsislikelytousemorediverseand/orlargersetsofsensors.Theremaynotbeasinglemiddlewarepackagethatcanmeetalloftheneedsofaresearchgroup.Inthiscase,multiplepackageswillneedtobelinkedintoaworkflow.

TechnologicalRequirements

Thetechnologicalrequirementsofaresearchprogrammayvaryfromsimpletocomplex.Iftheresearchcanbedonewithsensorsfromasingle,well-managedcompany,theproprietarysoftwarepackagedwiththepurchasedsensornetworkmaybeadequateforatleastamajorportionoftheinformationmanagementsystem.Forexample,forCampbellScientificdataloggers,their“LoggerNet”softwareintegratescommunication,datadownload,displayandgraphicsfunctions.However,somedataloggersandsensors(particularlyinnovativeones,custom-built),mayneedcustom-writtensoftware.Itisimportanttoplantimeandbudgetsforrequiredsoftwareupgrades,licensing,additionalpackages,support,andmaintenance.Systemsthatcostlessintheoutsetmaynotalwaysbecheaperoverthelongrun.Itisalsoimportanttoconsiderhowtobestmeetinfrastructureandbandwidthrequirements,whiledeployingmiddlewareonavarietyofserversorlaptopcomputersinthefieldorlabsetting.Dependingonthedataandhardwareinfrastructurecharacteristics,eachmiddlewareoptioncanintroducebenefitsordrawbackstotheoverallsystemfunctionality.

PersonnelSkills

Anotherkeyfactortoconsideristheskillsetofthepersonnel.Acomplexdatamanagementsystemmayrequiremultiplepeople,eachwithauniqueskillsetsuchasdatabasedesign,systemarchitecture,webprogramming,etc.Itisimportanttocorrectlyidentifyeachperson’sskillsetandroleindatamanagementtasks.Itmayalsobenecessarytoplanforadditionalhiresorjob-trainingtoaddressesvariousscenariosandsolutions,toidentifyappropriatesalaries,andtobudgetenoughtimeforsoftwaredevelopmentandsystemadministration.Moredetailsaboutthepersonnelrolesandskillscanbefoundinthe“Rolesandrequiredskillsets“section.

MiddlewareClassifications

Classificationbyfunctionality

Middlewarecanbeclassifiedwithrespecttothefunctionalitytheyprovide,suchas:

Controllinginstrumentationanddatacollection:Modulesmaybeusedtocontrolsamplingintervals,managetheevent-triggered(burst)orcontinuoussamplingregimes,communicateandtransferdatabetweentheinstrumentationandothersystemcomponents.

Datamonitoring,processing,andanalysis:Modulesmayprovidealarmmanagement,performautomatedQA/QCondatastreams,orrunderivativecalculationsincludingaverages,aggregationandaccumulation,datashiftingandtransformation,filteringoftimeseriesrecordswithrespecttothedates,valuerange,location,station/variabletype,orothercriteria.

Exportandpublishingofdata:Modulesmayprovidefunctionalitytoexportsensordatatodifferent

formats(e.g.,ASCII,binary,orxml),differentarchives,makedatadiscoverablethroughgeospatialcatalogues,orpublishthedatathroughwebservices.

Datavisualization:Modulesmayprovidevisualization(e.g.,tables,graphs,sonograms)ofgeospatialand/ortimeseriesdatafromsensorarraysorworkflowstructures.

Documentation:Modulesmaybeusedtodocumentfieldeventsthroughpaperlesscollectionoffielddata,integratesensordataanddocumentation(seesensortracking&documentationsection),orhandlesensorcalibrationrecords.

Othersupportedfunctionality:Modulesmaybeusedtoprovideaccesstoexternaldata(e.g.,ODBC,JDBC,OLEDB),toconnectorchainothermiddlewarecomponents,ortoimplementmobileapplications.

Classificationbyproprietyandtype

Middlewarecanalsobeclassifiedbysoftwareproprietaryrightsandwhethertheyareconsideredapplicationsorplatforms.Accordingly,wecanidentifydifferentgroupsofmiddleware:

ProprietarydatamanagementapplicationsandplatformsProprietaryresearchapplicationsLimitedopensourceapplications(freepackagesthatcanbeusedwithproprietarysolutions)OpensourcedatamanagementapplicationsandplatformsOpensourceresearchapplicationsandprogramminglanguages

Someoftheapplicationsandplatformslistedaboveareoftenidentifiedasasoftwareofchoiceformanydifferentorganizations.Moredetailsabouteachofthesecomponentsareprovidedinthenextsectionofthisdocument.

BestPracticesChoosingthemiddlewarecomponentsthatwillbestfitthetasksandworkenvironmentcanbechallenging.Inadditiontothepersonnelrolesandskills,budget,andinfrastructureconsiderationsalreadydiscussedintheIntroductionandotherchaptersofthisbestpracticesguide,itisimportanttobeawareofthewholesensormanagementprocessinordertoidentifythesuitablemiddlewarecomponents.Insomecases,aproprietarymiddlewaresoftwarewillrequiredaspartoftheinformationmanagementsystemiftheinstrumentationonlyoutputsdatainaproprietaryformat.Inothercases,multipleopensourcesoftwarepackagesmaybesuitableforchainingintoacomprehensivesystemthatmanagesdatafromcollectiontofinalarchivingandsharing.

Somestepsinselectingmiddlewareare:

Fig2.Sensormanagementworkflow.Simplesensormanagementconfigurationispresentedinblue;optionalsystemcomponentsareshowningrey.

1. Identifyyourobjectives.Whatdoyouwantthemiddlewaretodo?2. Assemblealistofcandidatesoftware.3. Ratethecandidatesbasedoncapabilities,cost(keepinginmindthatasimple-to-usebutexpensive

packagemaycutcostsinthelong-term),stability,andeaseofusewithrespecttothepersonnelskillsavailableonyourteam.

4. Ifnosinglesoftwareproductcanmeetalltheobjectives,testtoseehowwelldifferentcandidatesoftwareintegratewithoneanothertoperformtheneededfunctions.

Duringthisplanningstage,considerthefollowingrecommendations:

Identifyworkflowcomponentsanddescribetheirfunctionalrequirementsfromtheinstrumentationtothearchiveleveloforganization(seeFigure2).Somecomponentscanbeoptionalorpartofthemorecomplexsolutions.Planforrobustexecutionandchoosesoftwareandhardwarecomponentsthatcanhandlethelossofconnectivity,power,orotherfailuresrelatedtoharshenvironmentaloroperationalconditions.Choosereusable/sharablecomponents.Keepfielddeploymentofmiddlewareassimpleaspossible(keepoutoffieldifpossible).Useasfewmiddlewarecomponentsaspossiblebasedonresearchgrouprequirements.Documentanddiagramtheentireworkflowandupdateasneeded.

CaseStudiesWepresentseveralrealworldcasestudiesinthatvarywidelyinthetypesofecosystemsthatsensorsaredeployedinandincomplexityoftheinformationmanagementsystem.Somecasestudiesincludeproprietarysoftwareonly,someincludefreeoropen-sourcesoftware,andsomeincludeboth.

MarmotCreekResearchSite,RockyMountains,Canada

Fig.3.MarmotCreekresearchsite'sPacBusNetworkwithmixeddataloggersandRaventoRF401Base

IntroductionMarmotCreekresearchsiteislocatedontheeasternslopesofRockyMountainsinAlberta,Canada.Thesiteisdominatedbytheneedleleafvegetationandpoorlydevelopedmountainsoils.Precipitation,snowdepth,soilmoisture,soiltemperature,shortandlongwaveradiation,airtemperature,humidity,windspeed,andturbulentfluxesofheatandwatervapourdatasetsarecollectedandusedforthehydrologicalmodellingoftheMarmotCreekBasin.TimeseriesrecordsareobtainedatHayMeadow,UpperClearing,VistaView,FiseraRidge,andCentennialRidgehydro-meteorologicalstationsequippedwithdifferentsensorconfigurationsandCampbellScientificdataloggers.

CommunicationequipmentandmethodsThetelemetrynetworkconsistsofoneRavenCDMAcellularmodemandRF401spreadspectrumradiomodemlocatedattheUpperClearingbasestation,fouradditionalRF401modemslocatedateachoftheMeteorologicalstationsservicedbytelemetry,andthedesktopcomputerlocatedattheUniversityofSaskatchewan.Theradiosconnectedtothedataloggersateachofthemeteorologicalstationstalktothebasestationonanongoingbasis.AllofthedataloggersandRF401radioshavePacBusaddressesandtheyoperateasPacBusNodes.Also,dataloggersaresettooperateasroutersenablingroutinginsidethisnetworkthroughthevariouspaths.ThetelemetrynetworkconfigurationispresentedinFigure3.

DatacollectionandprocessingAttheintervalsprescribedwithintheLoggerNetapplicationrunningonadesktopcomputer,dataiscollectedfromthemeteorologicalstations.TheRavenCDMAtransfersdatautilizingadynamicIPaddressanditsstaticaliasassociatedthroughtheAirlinkIPmanagersoftware.TheuniquePakBusaddressisassignedtoeachofthedataloggersinthistelemetrynetwork.Inmostcases,loggerdatafilesattheoff-sitelocationwillbeappendedonadaily,four-hourlyandhourlybasis.Inadditiontothescheduledintervals,fielddatacanbedownloadedondemandthroughtheLoggerNetapplication.

LoggerNet“TaskMaster”utilityisusedtoexecutecustomprogramsaftereachsuccessfulcollectionofthefielddata.Also,theutilitycanbeusedtostartscheduledexecutionsofdifferentprogramsandoperations.ForMarmotCreekrecords,TaskMasterisusedtorenamethecollecteddataloggerfilesanduploadthemtotheFTPserver.

DatapublishingFielddatadownloadedtotheoff-sitecomputerareaccessedbytheRTMCPROLoggerNetutility.Lastmeasuredvaluesaremappedtothespecifiedlocationsonawebpage.ThewebserverhostsdifferentRTMCfilesfordailysummaryinformation,stationdatatables,alarms,andotherrecords.Themaininterfacecontainsindividualwindowsforthemainscreenwebpageaswellasthescreensforindividualstations,weeklydatagraphs,andsiteinformation.RTMCfilesinterfacewiththewebpageviatheRTMCWebServerdesktoputility.

ReferenceCentreforHydrology,UniversityofSaskatchewan.UniversityofSaskatchewanHydrologyFieldDataRetrievalandManagementManual.2009.PDFfile.

UniversityofTexasatElPaso'sSystemEcologyLab,Jornadaresearchsite,NM

Middlewareused:Hobolink,MySQL,R,ArcGIS,HTML5/Javascriptwebsite

IntroductionTheSystemsEcologyLab(SEL)attheUniversityofTexas,ElPasostudiespatternsandcontrolsofland-atmosphericwater,energy,andcarbonfluxesinbotharcticanddesertbiomes.AttheUSDA-ARSJornadaExperimentalRangeinsouthernNewMexico,SEL’sresearchsitecollectsdatausing>100automatedsensors(madebyCampbell,Onset,Decagon,PPSystems,andothers),andmanualfieldobservations.Sensorsaremountedonaneddycovariancetower,eightconnectedmini-towers(whichtogetherformawirelesssensornetwork),andacartmountedona110mlongtramline.>4GBofdataiscollectedperweekfrommicromet,hyperspectral,andgasfluxsensors,aswellascameras(detectchangesinphenology).Thisresearchsiteisalsousedtohelpdevelopandtestnewcyberinfrastructureandinformationmanagementconceptsandtools.Forthiscasestudy,wefocussolelyonmeasurementsmadebythe8-nodewirelesssensornetwork.

Fig.4.SEL-Jornadaresearchinformationsystemframeworkintermsofdataflow(symbolizedbyarrows).Webservicesareusedbyweb-basedapplicationstoquerydatafromdatabases.

DataCollectionandProcessingThe8-nodewirelesssensornetworkiscomposedexclusivelyofOnset’sHobodataloggers(8)andsensors(62).Eachdataloggerispoweredbyitsownsolarpanel.Sensorsmeasureprecipitation,leafwetness,PAR,solarradiation,andsoilmoisture.DataarerelayedtotheJornadaHeadquartersandsenttoOnset.ThedataareavailableforvisualizationanddownloadingviaHobolink(http://www.hobolink.com).Theonlineserviceallowsuserstosetupalertsforsystemmalfunctionsandautomatedreportingofdata.Ourteamdevelopedadatabaseschema(usingMySQLforimplementation)thatusesacorecommonconceptamongalldatasets-ameasurementonafocalentitybyanobserverataspecificlocationandtime-toorganizedata.Aroundthiscore,thereareothertablestostoremetadatasuchasprojectinformation,maintenancerecords,andrelatedfiles.ThewirelesssensornetworkisimportedintothedatabaseviacustomSQLscripts.Withinthedatabase,scheduledqueriescandobasicdataquality-checkingandflagging,andgeneratetablesofsummarizeddata(e.g.,dailymeans)thatcanbeaccessedshortlyaftertherawdatawereimported.

DataPublishingOurteamwrotewebservicesinPythontoquerythesummarizeddataanddelivertheresults(inJSONformat)toaJavaScript/HTML5websiteinwhichAmCharts’JavaScriptChartslibrary(http://www.amcharts.com/;freefornon-commercialuse)areusedtoplotthedataandgeneratereportsincludingthedataandpertinentmetadata.ThiswebsitealsorendersamapofthesiteandsensorsviaanESRIArcSDEgeodatabaseandArcGISServer.

MiddlewarePackageDescriptionsInthissection,severalmiddlewarepackagesaredescribedasabasicintroductiontothepackages.Nospecificendorsementorcriticismisimplied,anditisimportanttokeepinmindthatsoftwarepackagesareoftenrevised.

AquaticInformaticsAquarius:AQUARIUSisasoftwareforwatertimeseriesdatamanagementthatprovidesfunctionalitytocorrectandqualitycontroltimeseriesdata,buildratingcurves,transformandvisualizethehydrologicaldata,aswellastopublishthefielddatainreal-time.Toaccomplishthesetasks,AQUARIUSusesthreemaincomponents.DataAcquisitionSystemenablesaccessingtherealtimedatafromthefieldinstrumentationeitherastheextensiontotheEnviroSCADAorthroughthehotfolders.AquariusServerisawebcontrolleddatamanagementplatformthatenablescentralizedaccess

tothedatabasestoreddatasets.Also,theserverisusedtopublishthedataeitherthroughthepublicwebportalorasaRepresentationalStateTransfer(REST)webservicethatsupportstheWaterMLrepresentationofthedata.Finally,AQUARIUSWorkstationprovidesasetofdataprocessingtoolstoimportandprocessthedata,createratingcurves,andapplyQA/QCprocedurestothetimeseriesrecords.

AquariussystemcomponentsAquariusmodelingapproaches

CampbellScientificLoggerNet:LoggerNetisthemainCampbellScientificsoftwareapplicationusedtosetup,operate,andmanageasensornetworkthatusesCampbellScientificequipment.LoggerNetusesserialports,telephonydrivers,andEthernethardwaretocommunicatewithdataloggersviacellularandphonemodems,RFdevices,andotherperipherals.LoggerNetalsoincludesasuiteoftoolssuchastexteditorsforcreatingCampbellScientificdataloggerprograms,andmethodsforreal-timemonitoring,automateddataretrieval,datapost-processing,datavisualizationandmonitoringofretrievedinformation,anddatapublishingoptions.Moreadvancedfeatures,suchasexporttotoMySQLorSQLServerdatabases,arealsoofferedthroughadditionalLoggerNetapplicationsnotincludedinthestandardversion(LoggerNetDatabase,LoggerNetAdmin,LoggerNetRemoteetc.).

LoggerNet4.1InstructionManual

CUAHSIHIS:TheConsortiumofUniversitiesfortheAdvancementofHydrologicScience,Inc.'sHydrologicInformationSystem(CUAHSIHIS)isanadvancedwebservicebasedsystemcreatedtosharethehydrologicdata.ThesystemiscomprisedofhydrologicdatabasesrunningtheCUAHSIObservationsDataModel(ODM)andservershostedbydifferentorganizationsthatareconnectedthroughthewebservices.CentralizedHISmodulesareusedfordatapublication,access,anddiscovery,whilelocal(andcentral)modulesprovidetoolsfordataanalysisandvisualization.Overall,CUAHSIHISisusedtostoretheobservationdatainarelationaldatamodel(ODM),accessthedatathroughinternet-basedWaterDataServicesthatpublishtheobservationsandmetadatausingaconsistentWaterMarkupLanguage(WaterML),indexthedatathroughaNationalWaterMetadataCatalog,andprovideadiscoveryofdatathroughamapandkeywordsearchsystem.

CUAHSIHIScomponentsCUAHSIHISlistofpublicationsDevelopmentofaCommunityHydrologicInformationSystem

DataTurbineInitiative:DataTurbineisarealtimestreamingdataenginethatactsasablackboxtowhichdataproviders(sources)senddataandconsumers(sinks)receivedatafrom.DataTurbineisimplementedasamulti-tierjavaapplicationwithserversacceptingandservingupthedata,sourcesloadingthedataontotheservers,andsinkspullingthedataforvisualizationandanalysispurposes.Eachofthesecomponentscanbelocatedonthesamemachineordifferentcomputersandcancommunicatewitheachotherovertheinternet.Dataisheterogeneousandthesinkscouldaccessanytypeofdataseamlessly.Whilenewdataisloadedtotheserver(s),olddataisbeingerasedinordertofreethereceivingbuffers.

DataTurbine–SensorNetworksWorkshopUnderstandingDataTurbine

GCEDataToolbox:TheGeorgiaCoastalEcosystems(GCE)DataToolboxisasoftwarelibraryformetadata-basedprocessing,qualitycontrol,andanalysisofenvironmentaldata.ItisdesignedandmaintainedbyWadeM.Sheldon,Jr.oftheGeorgiaCoastalEcosystemsLTERandisavailablefreeofcharge,butdoesrequireaMATLABlicense.TheToolboxcanbeusedforawidevarietyofenvironmentaldatamanagementtaskssuchas:importingrawdatafromenvironmentalsensorsforpost-processingandanalysis;performingqualitycontrolanalysisusingrule-basedandinteractiveflaggingtools;gap-fillingandcorrectingdatausinggatedinterpolation,driftcorrectionandcustomalgorithms/models;visualizingdatausingfrequencyhistograms,line/scatterplotsandmapplots;summarizingandre-samplingdatasetsusingaggregation,binning,anddate/timescalingtools;synthesizingdatabycombiningmultipledatasetsusingjoinandmergetools;miningnear-real-timeorhistoricdatafromtheUSGSNWIS,NOAANCDC,NOAAHADSorLTERClimDBservers;

harvestingandintegratingchanneldatafromDataTurbineservers.Thissoftwareishighlymodularandcanbeusedasacomplete,lightweightsolutionforenvironmentaldataandmetadatamanagement,orinconjunctionwithothercyberinfrastructure.Forexample,newlyacquireddatacanberetrievedfromaDataTurbineorCampbellLoggerNetDatabaseserversforqualitycontrolandprocessing,thentransformedtoCUAHSIObservationsDataModelformatanduploadedtoaHydroServerfordistributionthroughtheCUAHSIHydrologicInformationSystem.

GCEToolboxoverview(GeorgiaCoastalEcosystemsLTER)

KistersWISKI:WISKIsoftwarepackageisatoolforhydrologicaldatamanagement.WISKIisaWindowsbasedclient/serversystemhostedthroughtheMSSQLorOracledatabases.Thesoftwarecombinesdatamanagementfeatureswithtoolstocollect,store,analyze,visualize,andpublishtheobservationdata.Typicaldatainputsourcesareremotedatacollectedfromthefielddataloggers,dataimportedfromthirdpartiesviainputfilesindifferentformats,recordsobtainedfromdigitizationofgraphicalcharts,ormanualinputs.MainWISKImoduleincorporatesthedatamanagementfunctionalityaswellasthedischargeandratingcurvetoolsthatworkcloselywithotherKisterssoftwarecomponentsincludingKiWQM(waterquality),KiWIS(datapublishingthroughwebservices),SODA(telemetryhardwaremoduleforremotedatacollection),KiDSM(taskscheduler),Modelingapps(Link-and-Nodeandstatisticalforecast),ArcGISextensions,WebPublicandWebPro(webserverpublishingapplications).

WISKIsystemoverviewWISKImodules

NexSensiChart:NexSensiChartisaWindows-baseddataacquisitionpackagedesignedforenvironmentalmonitoringapplications.iChartsupportsinterfacingbothlocally(directconnect)andremotely(throughtelemetry)withmanypopularenvironmentalproductssuchasYSI,OTT,andISCOsensors.AdditionallyitcaninterfacewithaNexSensiSICandsubmersibledataloggers.Thesoftwaresimplifiesandautomatesmanyofthetasksassociatedwithacquiring,processing,andpublishingenvironmentaldata.

[http://nexsens.com/pdf/nexsens_wqdata_spec.pdfNexSensWQDataandiChartsoftwareoverviewiChartsoftwareproductspotlight(LakeScientist)NexSensdatawebsite(BucknellUniversity)iChartquickstartguide

OnsetHobolinkandHoboware:OnsethastwomainsoftwareapplicationstosupportitsHobodataloggersandsensors..Hobolinkisanonlineservicesthatprovides5-minutedatafromitsdataloggers,multiplegraphsofdatastreams,customizeableinterface,settingsforautomatedalertsforsensormalfunction,andcustomizeabledatareportingfeatures.Hobowareisadownloadablepackagethatprovidesmorefunctionality,suchaslinechartsformorethanonedatastream,chartingtypesthatareunavailableinHobolink,etc.

HOBOwareProvs.HOBOwareLiteListoffeaturesHOBOware®User’sGuide(Datavisualizationandanalysis)HOBOlink®User’sGuide(DataaccessandcontrolofHOBOdevices)

VistaDataVision:VDVisadatamanagementsystemwithtoolstostoreandorganizedatacollectedfromavarietyofdatalogger“dat”files.Thesoftwareoffersdifferentvisualization,alarming,reporting,andwebpublishingfeatures.Dataloggerfilesareparsed,imported,andstoredintotheMySQLrelationaldatabasefromwherethedatacanbecustomqueriedandexportedorpublishedonawebserver.NumerousaccesscontroloptionsareavailablesoVDVuserscanhavecustomizedaccesstospecificstationorsensordata.

VistaDataVisionbrochuresandmanualsVistaDataVisionversioncomparisonVistaDataVisionReview(LTER)

YSIEcoNet:EcoNetsoftwareworkswithYSImonitoringinstrumentation.ThesoftwareoffersdeliveryofdatafromthefielddirectlytotheYSIwebserver.Nodesktopapplicationsareusedandalldataare

storedontheremoteYSIcomputer.Systemuserscanaccessvisualization,reports,alarms,andemailnotificationtoolsdirectlyontheYSIserver.

EcoNetsystemoverviewEmbeddingEcoNetdata

Thefollowingtablesdescribefeaturesofmiddlewarepackagesknowntotheauthorsofthewiki.Thesetablesdonotimplyendorsementorcriticismofanygivenproduct,andmayreflectolderversionsofproductsthancurrentlyexist.

Basic:Thesoftwarehasbuilt-inbutbasicfeaturescomparedtotheoverallmarket.Standard:Thesoftwarehasbuilt-infeaturesthatarestandardwithcomparisontotheoverallmarket.Advanced:Thesoftwarehasbuiltinadvancedfeaturescomparedtotheoverallmarket.Custom:Thesoftwaredoesn'thavebuilt-infeatures,butaprogrammercandevelopthem.None:Thesoftwaredoesn'thavethefeature,anditcannotbecustom-developed.Has:Thesoftwarehasbuiltinfeatures,butthelevelcomparedwiththeoverallmarketisunknowncurrently.Unknown:Thecapacityofthesoftwareisunknown.

Table1.Middlewarebasicfeatures:licensing,cost,inputandexportdataformats,andrequiredlevelofprogrammingexpertise.

Program Licensing Cost Inputdataformat Exportdataformat

Neededprogramming

expertiseAntelopeOrb Proprietary Pay ASCII,Binary ASCII,

Binary Advanced

Aquarius Proprietary Pay Advanced

ArcGIS Proprietary Pay ASCII,shapefiles ASCII,shapefiles Advanced

B3 Opensource Free ASCII ASCII NonetoBasicBigSenseandLtSense Opensource Free Binary CSV,JSON,

TXT,XML Advanced

CosmCUAHSIHIS Opensource Free ASCII XML,

WaterML Standard

DataTurbine Opensource Free ASCII,Binary ASCII,Binary Advanced

EddyPro Proprietary Pay Binary ASCII,Binary Standard

GCEToolbox

MATLABisproprietary,Toolboxisopensource

MATLABispay,Toolboxisfree

ASCII,Binary,database

ASCII,Binary,.mat,database

ToolboxisStandard,MATLABisAdvanced

Hobolink(Onset) Proprietary Free Proprietary ASCII,

Proprietary None

Hoboware(Onset) Proprietary Pay Proprietary ASCII,

Proprietary None

Kepler Opensource Free ASCII,Binary ASCII,Binary BasictoAdvanced

LakeAnalyzer

Proprietary/Opensource Free ASCII ASCII Basic

LoggerNet(Campbell) Proprietary Pay Proprietary

ASCII,database Standard

Nexsen'sTechnology Proprietary Pay Unknown Unknown Unknown

PandasPythonisFree,PandasisFreeandOpenSource

Binary,encoded,np.array,database,markup

Binary,encoded,np.array,database,markup

AdvancedandCustom

Pegasus Unknown Unknown Unknown Unknown Unknown

R Opensource Free ASCII,Binary,database

ASCII,Binary,database

StandardtoAdvanced

SAS Proprietary Pay ASCII,Binary,database

ASCII,Binary,database

StandardtoAdvanced

Taverna Opensource Free Unknown Unknown StandardtoAdvanced

VistaDataVision Proprietary Pay ASCII ASCII Unknown

VizTrails Opensource Free ASCII ASCII BasictoAdvancedWaterMLsupport Unknown Unknown Unknown Unknown Unknown

WISKI Proprietary Pay ASCII ASCII AdvancedYSIEcoNet Proprietary Pay Unknown Unknown Unknown

Table2.Middlewaredatahandlingfeatures:hardwarecommunication,abilitytodoqualityassuranceandcontrol(QA/QC),abilitytostreamdatatoarchives,datavisualization,datatransformationandanalysis,andabilitytogeneratecustomSQLqueriesorotherscripting.

Program Hardwarecommunication

QA/QCcapacity

Capacitytostreamtoarchive

Datatransformationandanalysis

Datavisualization

CustomSQLqueries/Scripting

AntelopeOrb Custom Custom Custom Custom Custom CustomAquarius Has Advanced Advanced Advanced Advanced Advanced

ArcGIS Unknown Advanced Unknown Advanced Advanced StandardtoAdvanced

B3 None Advanced None Has Has NoneBigSenseandLtSense Custom Custom Has Has Unknown Unknown

Cosm Unknown Unknown Unknown Unknown Unknown Unknown

CUAHSIHIS Custom Advanced Advanced(ODM)

Advanced(HydroDesktop,ODMTools,TSA)

Advanced(HydroServerTSA,HydroDesktop,externalprograms)

Advanced(HydroDesktop,ODMTools)

DataTurbine Custom Custom Custom Basic(NEESRDV)

Basic(NEESRDV) Has

DataFrames.jl ThroughCandPythonlibs

Has,stats.jl,numpy

Hasthroughcode.native

HasthroughGadfly,Matplotlib,D3,orWinston

Has Has

EddyPro Unknown Has Unknown Has Has Unknown

GCEMatlab Custom Advanced StandardtoAdvanced

Advanced(withMatlab) Advanced Advanced

Hobolink(Onset)

Basic None Basic Standard None None

Hoboware(Onset) Advanced Has Unknown Advanced Standard None

Kepler Custom Custom Custom Custom Custom CustomLakeAnalyzer None Basic None Has Has None

LoggerNet(Campbell) Advanced Basic Basic Basic None None

Nexsen'sTechnology Has None Basic Basic None None

Pegasus Unknown Unknown Unknown Unknown Unknown Unknown

RCustomwithopen-sourcetech

Custom Custom Advanced/Custom Advanced/Custom Custom

SAS None Custom Custom Advanced Advanced CustomTaverna Unknown Custom Custom Custom Custom CustomVistaDataVision None Basic Standard Standard Standard Basic

VizTrails Unknown Custom Unknown Custom Custom CustomWaterMLsupport Unknown Unknown Unknown Unknown Unknown Unknown

WISKI Has Advanced Advanced Advanced Advanced AdvancedYSIEcoNet Has None Basic Basic None None

Table3.Middlewareotherfeatures:Taskautomation,capacityformulti-tierarchitecture,websitepublishing,streamingthroughwebservices,supportformodeling.

Program Taskautomation

Multi-tierarchitecture Websitepublishing

Streamingthroughweb

service

Supportformodeling

AntelopeOrb Has Standard Custom Custom Unknown

Aquarius None Advanced Advanced Unknown UnknownArcGIS Advanced Unknown Advanced Advanced AdvancedB3 Unknown None None None HasBigSenseandLtSense Has Has Has(viaRESTfulservices) Has Unknown

Cosm Unknown Unknown Unknown Unknown UnknownCUAHSIHIS

Advanced(ODMSDL) Advanced Advanced(HydroServer,

Website,HydroSeek) Advanced Has(throughexternalprograms)

DataTurbine Has Standard Basic Standard NoneEddyPro Has Unknown Unknown Unknown Unknown

GCEMatlab Advanced None Advanced None Advanced(withMatlab)

Hobolink(Onset) Basic None Advanced Has None

Hoboware(Onset) Basic None None None None

Kepler Custom Unknown Custom Custom Advanced/Custom

LakeAnalyzer

Custom(Matlab)

None None None Basic(linktoGLMmodel)

LoggerNet(Campbell) Standard Basic Basic None None

Nexsen'sTechnology Basic Basic Basic None None

Pegasus Unknown Unknown Unknown Unknown UnknownR Custom None Custom(shinypackage) Custom Advanced/CustomSAS Custom None Custom Custom Advanced/CustomTaverna Custom None Unknown Custom Advanced/CustomVistaDataVision None Standard Advanced None None

VizTrails Custom None Custom Custom Advanced/CustomWaterMLsupport Unknown Unknown Unknown Unknown Unknown

WISKI Advanced Advanced Advanced Advanced HasYSIEcoNet Basic None basic None None

References“OGCWaterMLStandardRecommendedforAdoptionasJointWMO/ISOStandard.”OpenGeospatialConsortium,10Dec.2012.Web.14May2013.

Sensor Data Quality

Discusses different ways sensor data may be

compromised and how to control for it in the data stream.

ReturntoEnviroSensingClustermainpage

Contents1Contacts2Overview3Introduction4Methods

4.1SensorQualityAssurance(QA)4.2QualityControl(QC)ondatastreams

4.2.1Dataqualifiers(dataflags)4.2.2Dataqualitylevel4.2.3Datacollectioninterval

4.3DataManagement5BestPractices6CaseStudies7References8Resources

ContactsTheprimaryeditorsforthispagemaybecontactedforquestions,comments,orhelpwithcontentadditions.

DonHenshaw–U.S.ForestServiceResearch,PacificNorthwestResearchStation–don.henshawatoregonstate.eduMaryMartin–HubbardBrookLTER,UniversityofNewHampshire–mary.martinatunh.edu

OverviewAnewgenerationofenvironmentalsensorsandrecentmajortechnologicaladvancementsintheacquisitionandreal-timetransmissionofcontinuouslymonitoredenvironmentaldataprovidesamajorchallengeinprovidingqualityassurance(QA)andqualitycontrol(QC)forhigh-throughputdatastreams.Deploymentsofsensornetworksarebecomingincreasinglycommonatenvironmentalresearchlocations,andthereisagrowingneedtoaccesstheselargevolumesofdatainnearreal-time.However,thedirectreleaseofstreamingsensordataraisesthelikelihoodthatincorrectormisleadingdatawillbemadeavailable.Additionally,asresearchapplicationsbegintorelyonreal-timedatastreams,thecontinualandconsistentdeliveryofthisinformationwillbeessential.Thisincreasingaccessanduseofenvironmentalsensordatademandsthedevelopmentofstrategiestoassuredataquality,theimmediateapplicationofqualitycontrolmethods,andadescriptionofanyQA/QCproceduresappliedtothedata.

TraditionalQCsystemstendtooperateonfile-basedcollectionsofenvironmentaldatafromfieldsheets,fieldrecordersorcomputers,ordownloadeddataloggerfiles.Manuallyappliedtoolsandtechniquessuchasgraphicalcomparisonsareusedtoprovidedatavalidation.Documentationistypicallynotwell-organizedandnotdirectlyassociatedwithdatavalues.Theapplicationofthesesystemsmustbalancetheneedforreleasewithoutmonthsoryearsofdelayversusthedeliveryofwell-documented,highqualitydata.However,withincreasingdeploymentofsensornetworks,theseoldersystemsfailtoscaleorkeeppacewithuserneedsassociatedwithhighvolumesofstreamingdata.ComprehensiveandresponsiveQCsystemsareneededthataredesignedtoreducepotentialproblemsandcanmorequicklyproducehighqualitydataandmetadata.MethodsdescribedhereforbuildingaQCsystemwillincludeidentificationof:

preventativemeasurestobetakeninthefieldqualitychecksthatcanbeperformedinnearreal-timenecessarydatamanagementpractices

IntroductionAteamapproachisnecessarytobuildaQCsystemandmultipleskillsandpersonnelareneeded.TheQCsystemwillbeginwithsystemdesignandpreventativemeasurestakeninthefieldandcontinuethroughdataqualitycheckinganddatapublishing.Aleadscientistwillproposeresearchquestionsanddescribethetypesofdataandnecessaryquality.Expertiseinfieldlogistics,sensorsystemsandwirelesscommunicationswillplayaroleinsitedesignandconstruction.Asensorsystemexpertwillprovideknowledgeofspecificsensorsandprogrammingskillstoestablishqualitycontrolchecking.Fieldtechnicianswithstrongknowledgeoftheoverallscientificgoalsandcommunicationskillscanhelptoarticulateissuesanddiscoversolutions.Adatamanagerwillbeneededtoguidedeliveryandarchivalofdocumenteddataproducts.Communicationamongallpartiesisnecessaryforthemosttimelydeliveryofwell-documentedandhighqualitydata.

AllteammemberswillbeneededtodefineaQCworkflowthatisusefulindescribingproceduresandpersonnelresponsibilitiesasthedataflowsfromfieldsensorstopublisheddatastreams.AQCsystemmustallowforaniterative,qualitymanagementcycletoaccommodatefeedbacktopolicies,procedures,andsystemdesignasdatacollectionscontinueovertime.Asystemwilldependoncommunicationamongteammemberstoassurethatnotedsensordatacollectionandtransportissuesandproblemsareaddressedquicklyanddocumentedinthedatastream.Anactive,well-documentedQCsystemwillhelptoestablishuser-confidenceindataproducts.

Automatedorsemi-automatedQCsystemsareneededthatcanadequatelyreviewandscreensourcedataandstillprovideforitstimelyrelease.Automatedqualitycontrolprocessessuchasrangecheckingcanbeperformedinnearreal-timeandasystemcanassigndataqualifiercodes,orflags,foranysensorvaluewhenproblemsoruncertaintyoccursinthedatastream.However,theseprocessescanoftenonlyindicatepotentialproblemsinthedatastreamthatstillrequiremanualreview.AcomprehensiveQCsystemisonlyachievableasahybridsystemdemandingbothautomatedQCchecksandmanualinterventiontoassurehighestdataquality.

Forthischapterwewilldefinequalityassurance(QA)asthosepreventativeprocessesorstepstakentoreduceproblemsandinaccuraciesinthestreamingdata.Thesewillincludesensornetworkdesign,protocoldevelopmentforroutinemaintenanceandsensorcalibration,andbestpracticeproceduresforfieldactivitiesanddatamanagement.Qualitycontrol(QC)primarilyreferstothetestsprovidedtocheckdataqualityandtheassignmentofdataflagsandothernotationstoqualifyissuesanddescribeproblems.QCsystemreferstothiscompletesetofQA/QCpreventativeandproduct-orientedprocesses.

Methods

SensorQualityAssurance(QA)

Qualityassurance(QA)referstopreventativemeasuresandactivitiesusedtominimizeinaccuraciesinthedata.Forexample,schedulingregularsitevisitsandmaintenanceprocedures,orcontinuouslymonitoringandevaluatingsitesensorbehaviorcanpreventsensorfailuresorleadtoearlydetectionofproblems.Designingnetworkswithredundantsensormeasurementsprovidesanadditionalmeanstoqualitychecksensordataandassurecontinuityofmeasurement.Ofcourse,thetimeandexpensetoconducthigh-levelmaintenanceproceduresorimplementefficientandredundantdesignsmaybelimitedbyprojectbudgets,butmaybewarrantedbytheimportanceofthedata.HerewedescribeQAmeasurescategorizedbydesign,maintenance,andpractices:

1. Designa. Designforreplicatesensors.Co-locatedsensorsindependentofthedataloggerandincludedinthedataflowcanbeusefulchecks.Forexample,checktemperaturemeasurementsmightbemadealongsideaCampbellthermistorwithaHOBOpendant,SDI-12temperaturesensor,oranalogthermocouple.Ideally,threereplicatesensorsareusedsothatsensordriftcanbe

detected(withtwosensorsitmaynotbeobviouswhichsensorisdrifting).b. Assureanadequatepowersupply.Powerconsiderationsmightincludeaddingalowvoltagecutoff(LVD)topreventlogger“brown-out”,oraddingpoweraccessorieswithswitchedpowersupply(e.g.CSIlogger,IPrelay)toprogrammaticallycontroloptionaldevices(radios,power-cycleloggers).

c. ProtectallinstrumentationandwiringfromUVlight,animals,humandisturbance,etc.suchaswithflexconduitorenclosures.

d. Implementanautomatedalertsystemtowarnaboutpotentialsensornetworkissuesorcertainevents,e.g.,extremestorms.Forexample,automatedalertsmightsignallowbatterypower,indicatesensorcalibrationisneeded,orindicatehighwindsorprecipitation.

e. Addon-sitecamerasorwebcams.Webcamscanbeusedtorecordweatherorsiteconditions,animaldisturbanceorhumanaccess.

2. Maintenancea. Scheduleroutinesensormaintenance.Routinesitevisitsfollowingstandardprotocolscanassurepropermaintenanceactivities.

b. Standardizefieldnotebooks,checksheetsorfieldcomputerapplicationstoleadfieldtechniciansthroughastandardsetofproceduresandassurethatallnecessarytasksareconducted.Thesenotebooksorapplicationscanserveasanentrypointfortechnicalobservationsregardingpotentialproblemsorsensorfailures.

c. Scheduleroutinecalibrationofinstrumentsandsensorsbasedonmanufacturerspecifications.Maintainingadditionalcalibratedsensorsofthesamemake/modelcanallowimmediatereplacementofsensorsremovedforcalibrationtoavoiddataloss.Otherwise,sensorcalibrationscanbescheduledatnon-criticaltimesorstaggeredsuchthatanearbysensorcanbeusedasaproxytofillgaps.

d. Anticipatecommonrepairsandmaintaininventoryreplacementparts.Sensorscanbereplacedbeforefailurewheresensorlifetimesareknownorcanbeestimated.

e. Assureproperinstallationofsensors(correctorientation,cleanwiring,solidconnectionsandmounting,etc.).Protocolsforinstallingnewsensorswillalsoassurethatkeyinformationisloggedregardingasensor’sestablishment(SeeManagementsection).

3. Practicesa. Maintainanappropriatelevelofhumaninspection.Developthecapabilitytoeasilyviewreal-timedataandexamineregularly(daily/weekly).Regularinspectioncanhelpidentifysensorproblemsquicklyandmightallowforfewersitevisitations.Certainproblemssuchasvisibleextremespikes,intermittentvalues,orrepetitivevaluescanbeeasilyviewedinrawdataplots.

b. Spotcheckmeasurementswithareferencesensorcanberoutinelyusedforsomemeasurements,i.e.temperature,snowdepth,etc.toverifytheperformanceofinsitusensors.

c. Aportableinstrumentpackagethatcanberotatedamongsensorsitescanbeusefulinidentifyingproblems.Theportablepackagemightrunalongsideinstalledsensorsoverafixedperiod(dailyorlongercycle)toinspectfordriftingorfailingsensors.Thistypeofco-locationmightbedonetoauditsensorperformanceonanannualorperiodicbasis.

d. Recordthedateandtimeofknowneventsthatmayimpactmeasurements(seeManagementsection).Ideally,thesenotescanbeenteredorcapturedforautomatedaccess.Forexample,sensorsareknowntodemonstratealternativebehaviorduringsitevisitsormaintenanceactivities,andlightortripsensorsmightbeusedinrecordingsensoraccess.

e. RoutinelysynchronizethetimeclockondataloggerswiththepublicNetworkTimeProtocol(NTP)server(http://www.ntp.org/).

f. Provideareferencetimezoneandavoidchangingdataloggertimestampsfordaylightsavingstime.ManywouldarguethebestpracticeistooutputdatainCoordinatedUniversalTime(UTC),whichisparticularlyusefulwhendataspansmultipletimezones.However,mostlocalusersofthedatapreferseeingoutputinlocalstandardtimebecauseitcorrespondstolocalecologicalconditions,i.e.,oceantidesorsolarnoon,andmayeasetroubleshootingorfield-basedchecking.AnotherstrategyistoprovidethelocaloffsetfromUTCwithinthedatastreamtoallowsimpleconversiontoUTC,orallowuserstoquerythedataandchoosewhatevertimezonetheywouldliketoreceivethedatain.ISO8601(http://www.iso.org/iso/home/standards/iso8601.htm)isaninternationalstandardcoveringtheexchangeofdateandtime-relateddataandprovidestimezonesupport.Forexample,2013-09-

17T07:56:32-0500providestheoffsetfromanESTtimezone,however,lackofsupportinmanyinstrumentsandsoftwarepackagesisadrawbacktoitsuse.Recently,RESTservicesareconstructedtoallowthereturnofdatetimevalueswithanimplicittimezoneoffsetenablingconvenientsharingofdatawithtimestampflexibility.

g. Ensurethatfilesstoredontheloggeraretransmittederror-freetothedatacenterforimport(useerror-correctedprotocolslikeFTP,YmodemandHTTP).Schedulemanualfiledownloadandpost-importchecksifnon-error-correctedprotocolsareusedasaninterimmeasure.

QualityControl(QC)ondatastreams

QualityControlofdatastreamsinvolvesautomatedorsemi-automatedprocesseswherebyvaluesandassociatedtimestampsarecross-checkedagainstpredeterminedstandardsandseparateconcurrently-collecteddatastreams.QCtakesplacepost-collectionduringthestreamingprocessorafterdataisassimilatedintoacentraldatabase.Someprocessescanbeperformedin“nearreal-time”,oratthetimethedatastreamsarebroughtintothedatabase,anddatacanbereleasedas“provisional”afterthisinitialinspectiontosatisfyimmediateuserneeds.Otherprocessesmayrequiresomedelaysuchastrendanalysisforsensordriftdetection.Resultsofthesetestsaretypicallyaccountedforinadataqualifierflagforeachvalue.Manualinspectionandresolutionofsuspectorproblemdataisalsoanecessarystepbeforedataisreleasedwith“provisional”tagsremoved.Revisedorcorrecteddataversionscanbepublishedatalaterdate,anditisimportanttoprovidedocumentationonthetypesofqualitychecksconductedwitheachreleaseofthesedata.

Threecategoriesofautomatedorsemi-automatedQCprocessescanbedescribed:

1. independentevaluation,wherebyasingledatapointischeckedagainstpredeterminedstandards(suchasrangechecks)

2. point-to-pointevaluation,wherebyasingledatapointiscomparedtootherconcurrently-observeddatapoints(suchasreplicatesensors)

3. many-point,ortrendanalysis,wheresometimeframeofobservationsareexaminedstatisticallyoragainstotherdatatrends.Thefirsttwoareessentiallynearreal-timechecks,whereasthethirdcaninvolvetimeframesseveralordersofmagnitudelongerthanthemeasurementinterval.

Nearreal-timeprocessinginvolvesautomatedcheckingofeachdatapointanditsassociateddateandtime.Dataqualifiercodes,ordataflags,willbeassignedbasedonthesechecks.Theseautomatedchecksandflagassignmentsareessentialinprocessingthemassvolumesofdatastreamingfromsensornetworks,butarenotsufficient.Humaninspectionofdataiscriticalandparticularlymightfocusondatapointsthatareflaggedbyanautomatedsystem.ThefollowingterminologycorrespondswithqualitycontroltestslistedinCampbelletal.2013.

Themostcommonandsimplestcheckstoimplement

1. Timestampintegritychecks–ensuresthateachdate-timepairissequential.Withfixedintervaldataitispossibletocross-checktherecordedandexpectedtimestamp.

2. Rangechecks-ensuresthatallvaluesfallwithinestablishedupperandlowerbounds.Boundscanbeestablishedbasedonthespecificsensorlimitations,orcanbebasedonhistoricalseasonalorfinertime-scalerangesdeterminedforthatlocation.Separateflagsmightbeassignedtoqualifyimpossiblevalues(basedonsensorcharacteristics)versusextremevaluesthatareoutsideofthehistoricnormsbutwithinthesensoroperatingrange.

Othercheckscanbeemployedfornearreal-timeorinpost-streamingQC

1. Persistence-checksforrepeated,unchangingvaluesinmeasureswhereconstantchangeisexpected.

2. Spikedetection-checksforsharpincreasesordecreasesfromtheexpectedvalueinashorttimeintervalsuchasaspikeorstepfunction.Thesetestsoftenemploystatisticalmeasuressuchasthestandarddeviationoftheprecedingvaluesindetectingoutliersorspikesthatexceed2-3sigma(standarddeviations)fromwhatisexpected.Analternativealgorithmistochecktoseethatthe

medianvalueofpointst,t+1andt-1isnotmorethanafixedmagnitudefrompointt.3. Internalconsistency–plausibilitychecksforconsistencybetweenrelatedmeasurementssuchasthatthemaximumvalueisgreaterthantheminimumvalue,orthatsnowdepthisgreaterthanitssnowwaterequivalence.Thesechecksmayalsoexaminevaluesthatarenotpossibleunderknownconditionssuchasincomingsolarradiationrecordedduringnighttime.

4. Spatialconsistency–checksforsensordriftorfailurebasedonintersitecomparisonsofnearbyidenticalsensors.Theintegrationofseveraldatastreamsmaybepossibleinpost-processinganddriftingmaybedetectedbasedonknowncorrelationsorpriorconditioningwithredundantornearbysensors.

Dataqualifiers(dataflags)

TheQCsystemmustbeabletoassignoneormorecodestoeachdatapointbasedontheresultofQCtestsorotheravailableinformation.DataflagsmaybeassignedduringtheinitialQCteststhatareintendedtoguidelocalreviewinidentifyingerroneousorproblematicdata(e.g.,invalidvaluesoutofrangeorbelowdetectionlevel),ormightbeflagsthatindicatesite-specificevents(e.g.,lowbatteryvoltage,anicingorothereventorsitecondition,ornotificationofaduedateforsensorcalibration).Theseinternalflagsmayusearichervocabularyoffine-grainedflagsthanwhatisnecessarytosharepublicly.Reviewinginternalflagsisnecessarytoresolveissuesthatmaybeevidentinthedatabeforethesedataaremadeavailableinfinalpublishedversions.Somesystemsmightemploya“rejected”flagasameansofpreservinganoriginalvaluebutallowcapabilitytowithholdthatvaluefrompublicuse.

Externalflagsprovidedinpublisheddatawilllikelybeamoregeneral,simplersuiteofflagsbettersuitedforpublicconsumption.Multipleinternalflagswouldbemappedintothismoregeneralflagset.Whilemanyvocabulariesareinuse,anexamplesuiteofexternalflagsfollows:

A:AcceptedE:EstimatedM:MissingQ:QuestionableSpecificationofuncertainty

The“Accepted”flagshouldbeassignedtovalueswherenoapparentproblemsarediscovered,buttheQCteststhatwereappliedshouldbedescribed.The“Accepted”flagislikelylesscommonlyusedthansimplyleavingtheflagblank.Iftheblankflagisuseditshouldbeincludedinthelistofflagsanddefined,e.g.,“noQCtestswereapplied”or“norecognizableproblems”or“provisionaldata”.Ablankflagcanbeincludedinanenumeratedlistingofvalidflagsbutmaynotbethebestpracticewithinsomemetadatastandards.A“Provisional”flagisnotlistedherebutmaybeappropriate.Alternatively,“provisional”datamightbeindicatedwithina“qualitylevel”attributeontherecordlevelorfilelevelratherthanassociatedwithanindividualmeasurement(SeeDataQualityLevelsectionbelow).

ExamplesofQualityFlagSets(listedcodesmayonlyrepresentasubsetofeachflagset)

AndrewsLTER WISKI(Univ.ofSaskatchewan) HFRLTER VCRLTER SeaDataNet

A-Accepted 10-Rejected M-Missing Blank-OK 0-noQC

E-Estimated 15-Disregard E-Estimated

Q-Questionable

1-Goodvalue

M-Missing 20-Manuallyedited Q-Questionable M-Missing

2-Probablygood

Q-Questionable 25-Simulated R-RangeError 4-Badvalue

Measurementspecific,e.g.,B-Belowdetection 30-Filled S-Data

Spike6-BelowDetection

Theevaluationofextremevaluesmaybenefitfrom“expertinspection”thatcanbebuiltintotheQC

system.Historicalrangescanbedevelopedforsiteswithlong-termsensormeasurementsatannual,seasonalorfinertimescales.Forremotesitesthataredatasparsetheserangesmaybeaprimarytoolforascertainingdataquality,and,forexample,aQCsystemmayflagvaluesthatfalloutsideoftwostandarddeviationsoflong-termmeans.Whereothernearbyinsitumeasurementsareavailableorwherenationalsurfacestationnetworksareavailable,qualitychecksmaybeimprovedthroughcomparisonofvalues.Accesstomultipleclimateelementsmayprovidetheabilitytocreaterelationshipsamongstationsandallowspecificationofuncertaintyforallvalues.EvaluationofaQCsystem’sperformanceindetermininguncertaintyorinestimatingvalueswillbeimportantinmakingsystemimprovementsandpotentiallyallowingaretrospectivere-applicationofqualitycontrol(Dalyetal.2005).

Wherespecificationsofuncertaintycannotbedetermined,valuesmaybedeemed“Questionable”byanautomatedsystem.Ultimately,manualevaluationmayberequiredandadecisionmadeastowhetheradatapointcanbereleasedas“Accepted”versusremovingfromthedatastreamandlistingas“Missing”versusleavingthevalueflaggedas“Questionable”.AsDalyetal.2005pointsout,“intheend,thefundamentaldilemmawithnearlyallqualitycontrolisatensionbetweentherelativemeritsandcostsofaccidentallyrejectinggooddata,oraccidentallyacceptingbaddata,andatradeoffisusuallyinvolved”.

Wheredataaremissing,anoptionmightbetofillgapswith“Estimated”data.FromCampbelletal.2013,“fillingthesegapsmayenhancethedata’sfitnessforusebutcanpossiblyleadtomisinterpretationorinappropriateuse,andcanbeacomplexendeavor.Thedecisionaboutwhethertofillgapsandtheselectionofthemethodwithwhichtodosoaresubjectiveanddependonfactorssuchasthelengthofthegap,thelevelofconfidenceintheestimatedvalue,andhowthedataarebeingused”.

Dataqualitylevel

ThelevelofQCtestingappliedtoasetofdatashouldbewell-describedandtransparenttothedatauser.Publishingofdataisindependentofdataquality,andusersneedtobeabletoquicklyidentifyitsqualitylevel,forexample,todiscernwhetherthedataisunchecked,rawdatavs.thoroughlyinspectedandreviewed.GroupssuchasNEONandCUAHSIhaveassignedaqualityleveltodataproductsincludingoriginalrawdata,initiallyinspectedandflaggedrawdata,publishedrawdata,andestimated,gap-filledorothersyntheticproductsinvolvingmodel-basedorscientificinterpretation(Seereferencesindata_quality_level.pdf).Whilethesegroupsdonotnecessarilyagreeontheactuallevelassignment,therearesomegeneralconceptsofqualitylevelthatcanbeagreeduponandarerepresentedhere:

Level0(raw)-Unfiltered,rawdata,withnoQCtestsappliedandnodataqualifiers(flags)applied-Typically,theseareoriginaldatastreamsthatarenotpublishedbutthatshouldbepreserved.Dataqualityflagsarenotassigned.Conversionofrawmeasurementvaluestomoremeaningfulunitsmaybeacceptable,e.g.,thermocoupletableconversionsofmillivoltstodegreesC.

Level1(provisional)-Provisionaldatareleasedinnearreal-timewithinitialQCtestingapplied-PreliminaryQCtestsordatacalibrationareapplied,potentiallyinnearreal-timethroughautomatedscripts.Dataqualifiersareassignedandmaybeforinternaluseintendedtoguidefurtherreviewofthedata(SeeDataqualifierssubsection).Alldataqualifiersshouldbewell-defined.Rangeanddate-timecheckingarecommonlyappliedtothisprovisionallevel.TheQCtestsappliedshouldbewell-described.

Level1(published)-Publisheddatawithadelayedreleaseafterautomatedandmanualreview-QCtestingiscompleteandsuspectdatahasbeeninspectedandflaggedappropriately.Eachvalueisassignedadataqualifierandthesetofflagsmaybeamoresimplesetdevisedforpublicuseofthedata.Impossibleormissingvalueswouldbeassignedanappropriatemissingvaluecodeandadataflagof“Missing”.Datawouldnolongerbeconsideredprovisionalandwouldbeunlikelytochange.

Level2(gap-filled)-Gap-filledorestimateddatainvolvinginterpretation-Thisisqualityenhanceddatawherecarefulattentionhasbeenappliedtoestimateorfillgapsindataortootherwisebuildderiveddatatoaccommodatedatauserneeds,forexampleestimategapsinasensorstreamusinganearbysensor.Asgap-fillingtypicallyinvolvesinterpretationandmayemploymultiplemodelsoralgorithms,otherversionsoflevel2datamaybeusedinpractice.Methodsemployedingap-fillingorderivingdatashouldbewell-described.

Aggregatingdatafromonetime-steptoanother,e.g.,creatingdailysummarydatafrom10minutedata,thatdoesnotinvolveanyinterpretationinthatsimplemeans,maximum,andminimumsaredeterminedwouldnotnecessarilyalterthequalitylevel.Thatis,meandailytemperaturedeterminedfromlevel1(published)datawouldstillretainaqualitylevel1.However,interpretationmaybeinvolvedwhendetermininganappropriatequalifierflagforthedailymean.Forexample,ifsomeofthe10minuteobservationsaremissingatwhatpointdoesthedailymeanalsobecomemissing(e.g.,morethan20%aremissing)orbecomequestionable(e.g.,morethan5%aremissing).ThistypeofprocessingmayyielddailymeanvaluesthatarebestdescribedasLevel2asinterpretationisinvolved.

Datacollectioninterval

Dataloggersofferthecapabilitytoeasilyoutputmeandatavaluesatmultipletimesteps,e.g.,10minutes,hourly,daily.SavingvaluesatmultipletimestepsmaypresentanextracomplicationintheQCprocessasseparatetablesareusuallystoredforeachtimestep.Whenasinglesensormeasurementisreportedatseparatetimesteps,conflictingQCresultsmayoccurifbothstreamsareQC’dindependently.Onestrategytosimplifythisproblemistooutputmostoralldataintheshortestcommontimestepandusepost-processingtostatisticallyaggregatethedataatlongertimesteps.Forexample,asystemmightQCandoutputthe10minutedataandthenaggregatehourlyanddailyvaluesfromthisfinerresolution10minutedatastream.Dataloggersmighttypicallycalculateandoutputdaily(24-hour)datastreams,butaccurateQCmaybeimpossibleastheexactvaluesusedinthisaggregationareunknown,andtheaggregationmaybeonlyrepresentingasubsetofvalues,e.g.,iftherewasapowerdiscontinuitytothelogger.However,theremaybecaseswheretheoutputofdailyvaluesbytheloggerareimportant.Forexample,aninstantaneousmaximumorminimumvaluebasedonasingleloggersamplewouldnotbecapturedthroughthisaggregation,andadailyminimumormaximumbasedona10minuteorhourlymeanoutputmaydiffersignificantlyfromtheinstantaneousvalue.

DataManagement

TimingofQCsystemprocesses

AutomatedQCsystemproceduresprovidethemosttimelyandefficientprocessingofstreamingdata.Theuseofsystemproceduresprovidesconsistentassignmentofdataflagsandremovesmuchofthesubjectivityinherentinmanualassignment.Ideally,theQCsystemwillbeemployedeverytimedataisacquired,e.g.,every10minutes,andsecondarilyoperateonhourlyordailytimeperiods.Morecomprehensivevisualorprogrammaticchecksortheassignmentofuncertaintyusingnearbyorotherrelatedsitesmightoccuratalatertime.Thefrequencyandtimingofamanualorvisualreviewprocesseswilldependonthedataflowatthesite,softwarestack,anddataprocessingcapabilities.Thenecessarytimeframefordatadeliveryofprovisionalversusfullyprocesseddatashouldbeconsidered.

DocumentationoftheQCprocesses

ThedocumentationofQCprocessesshouldidentifythenearreal-timestreamingQCmethodsincludingassumptionsandthresholds,andadditionalalgorithmsorvisualmethodsapplied.IfnoQCisappliedthatshouldbemadeapparent.DescriptionsofdataprocessingandQCworkflowsarealsousefulindescribingdataprovenanceandallworkflowversionsshouldberetained(Seeexampleworkflow).Datameasurementattributesandqualifierflagsshouldbedefined.

Fig.1ExampleofageneralmonitoringdataflowmodelfromNevadaResearchDataCenter,ScottyStrachan,UniversityofNevada,2014

TheapplicationoftheQCtestsemployedoranyalgorithmsappliedtoaggregate,estimateorgap-filldatashouldbedescribedforalldatalevels,anddatalevelscanpotentiallybedefinedinconjunctionwithadatareleasepolicy.Ideally,dataateachlevelshouldbelocallyarchived.Level0rawdatashouldberetainedlocallyinitsoriginal,unmanipulatedstate.Level1(published)orlevel2datamaybethebestcandidatesformoreformalarchiving.Datasetsshouldbetransparentlytaggedwithadataqualitylevelasdataarereleased.

Sensordatadocumentation

Developanduseacommonvocabularyandsyntaxforsensormeasurementattributenamesandfilenamingconventions.Researchorganizationswithmultiplesensorsitesmeasuringcommonsetsofparameterscangreatlyimproveefficiencyandmoreeasilyemployautomatedmethodswhenacommonvocabularyisemployed.Thesenamingconventionsshouldbeplannedfromtheoutsetintodataloggerprogramsandothersoftwareemployedwithinthedataflow.

Dataqualifierflagsprovidedocumentationforeachmeasuredvalueandshouldbeplacedalongsidethevalueasdatafilesareproducedforarchivalstorage.Anadditionalattributeormethodcodemayalsobeaddedtonoteshiftsinmethodorinstrumentationorotherkeychangesincollectionprocedures.Inclusionofamethodcodedirectlywithinthedatafileplaceskeydocumentationclosetothedatavalueandismorevisibletothedatauser.Inlong-termdatastreamswherethequalitylevelmaychangeovertime,e.g.,periodsoftimewheregap-fillingisemployed,adataqualityattributemightbeusedtoassigndataqualityattherecordormeasurementlevel.

BestPracticesReorganizedfrom:Campbellet.al.2013.

SensorQualityAssurance(QA)

MaintainanappropriatelevelofhumaninspectionReplicatesensors,n=3isoptimalSchedulemaintenanceandrepairstominimizedatalossHavereadyaccesstoreplacementpartsRecordthedate,time,andtimezoneofknowneventsthatmayimpactmeasurementsImplementanautomatedalertsystemtowarnaboutpotentialsensornetworkissues

QualityControl(QC)ondatastreams

EnsurethatdataarecollectedsequentiallyPerformrangechecksonnumericaldataPerformdomainchecksoncategoricaldataPerformslopeandpersistencechecksoncontinuousdataComparedatawithdatafromrelatedsensorsUseflagstoconveyinformationaboutthedataEstimateuncertaintyinthevalue,iffeasibleCorrectdataorfillgapsifitisprudent

Datamanagement

AutomateQA/QCproceduresRetaintheoriginalunmanipulateddataIndicatedataqualitylevelwitheachreleaseofthedataProvidecompletemetadataDocumentallQA/QCproceduresthatwereappliedandindicatedataqualitylevelDocumentalldataprocessing(e.g.,correctionforsensordrift)Retainallversionsofworkflowsandmetadata(dataprovenance).

CaseStudiesWearelookingforcasestudiesthatwilldescribesomecompleteQCsystems,QCprocessingandgeneralsetup(e.g.,numberandtypeofsensors,dataloggers,telemetry,etc.)ExamplesusingGCEToolbox,VistaDataVision,R,etc.wouldbeusefulGeneralworkflowexamplefromNevadaResearchDataCenter

ReferencesCampbell,JL,Rustad,LE,Porter,JH,Taylor,JR,Dereszynski,EW,Shanley,JB,Gries,C,Henshaw,DL,Martin,ME,Sheldon,WM,Boose,ER.2013.Quantityisnothingwithoutquality:AutomatedQA/QCforstreamingsensornetworks.BioScience.63(7):574-585.http://www.treesearch.fs.fed.us/pubs/43678

Taylor,JRandLoescher,HL.2013.Automatedqualitycontrolmethodsforsensordata:anovelobservatoryapproach,Biogeosciences,10,4957-4971doi

Daly,C,Redmond,K,Gibson,W,Doggett,M,Smith,J,Taylor,G,Pasteris,P,Johnson,G.15thAMSConf.onAppliedClimatology,AmericanMeteorologicalSoc.Savannah,GA,June20-23,2005.pdf

ResourcesQCResources

Campbelletal.2013Biosciencehttp://www.treesearch.fs.fed.us/pubs/43678TaylorandLoescher2013.Biogeoscienceshttp://www.biogeosciences.net/10/4957/2013/bg-10-4957-2013.pdfdoiDalyetal.2005.15thAMSConf.onAppliedClimatology,Amer.MeteorologicalSoc.http://ams.confex.com/ams/pdfpapers/94199.pdfCUAHSIhttp://wdc.cuahsi.org/wdc/Docs/ODM1.1DesignSpecifications.pdfNOAASatelliteandInformationService(NationalClimateDataCenter)http://www.ncdc.noaa.gov/oa/climate/ghcn-daily/Carbondioxideinformationanalysiscenterhttp://cdiac.ornl.gov/epubs/ndp/ushcn/daily_doc.htmlSeaDataNethttp://www.seadatanet.org/Standards-Software/Data-Quality-ControlDataQualityAssessment:StatisticalMethodsforPractitionershttp://www.epa.gov/quality/qs-docs/g9s-final.pdf

Flagsetexamples

NOAANationalClimaticDataCenterhttp://www.ncdc.noaa.gov/oa/hofn/coop/coop-flags.htmlCampbelletal.2013Bioscience(Seep.580)http://www.treesearch.fs.fed.us/pubs/43678

Dataqualitylevel

NEONhttp://www.neoninc.org/documents/513CUAHSIhttp://his.cuahsi.org/documents/ODM1.1DesignSpecifications.pdf,pp.19-20,57-58Amerifluxhttp://public.ornl.gov/ameriflux/available.shtmlEarthScienceReferenceHandbookhttp://eospso.gsfc.nasa.gov/sites/default/files/publications/2006ReferenceHandbook.pdf(p.31)ILRSDataproducts:(CODMAC-CommitteeonDataManagement,ArchivingandComputing)http://ilrs.gsfc.nasa.gov/about/reports/9809_attach7b.html

Sensor Data Archiving

Introduces different approaches and repositories for

archiving and publishing data sets of sensor data.

ReturntoEnviroSensingClustermainpage

Contents1Overview2Introduction3Methods

3.1PublishingofSnapshots3.2PersistentIdentifiers3.3Versioning3.4DataStorageFormats3.5DataStorageStrategies

4BestPractices5CaseStudies

5.11.LTERNIS5.22.KNB5.33.GIWS,UniversityofSaskatchewanWISKIdataarchive

6Resources7References

OverviewArchivingdatasnapshotsandusingappropriatemetadataandpackagingstandardscanincreasethelongevityanddiscoveryofdataimmensely.However,theselocalcurationtechniquesarestillsusceptibletothreatstotheprojectsorinstitutionsthatmaintainthelocalarchive.Peopleincriticaltechnologypositionsthatmaintainarchivesmaychangecareersorretire,projectscanlosefunding,andinstitutionsthatseemsolidcandissolveduetochangesinpoliticalclimate.Forthesereasons,partneringacrossinstitutionstoprovidearchivalservicesofdatacangreatlyincreasetheprobabilitythatdatawillremainaccessiblefordecadesorintothenextcentury.

Inthischapter,wediscusstechniquesandissuesinvolvedwitharchivingdataonamulti-decadalscale.Forsensordata,wepromotetheuseofperiodicdatasnapshots,persistentidentifiers,versioningofdataandmetadata,anddatastorageformatsandstrategiesthatcanincreasethelikelihoodthatdatawillnotonlybeaccessibleintothefuture,butwillalsobeunderstandabletofutureresearchers.

IntroductionAdataarchiveisalocationthathasareasonableassurancethatdataandthecontextualinformationneededtointerpretthedatacanberecoveredandaccessedaftersignificantevents,andultimatelyafterdecades.Dataarchivesshouldbemaintainedthroughbackupstrategiessuchasredundancyandoffsitebackup,inmultiplelocationsandthroughinstitutionalpartnerships.Archivingactivitiesshouldhaveinstitutionalcommitment,andideallycross-institutionalcommitment.Archivesmaybelocallymaintained,maybepartofanationalornetwork-widearchiveinitiative,orboth.Forrawdata,anarchivecanbealocalorregionalfacility,whereasqualitycontrolled,‘published’datashouldbearchivedinacommunity-supportednetworkarchiveandavailableonline.

Environmentalresearchscientistsareinneedofaccessingstreamingdatafromsensornetworksbothprovisionallyinnearreal-time,afterQA/QCprocessing,andinfinalformforlong-termstudies.Withoutappropriatearchivingstrategies,dataareatgreatriskoftotallossovertimeduetoinstitutionalmemoryloss,institutionalfundingloss,naturaldisasters,andotheraccidents.Thesetypicallyincludenear-termaccidentsandlong-termdataentropyduetocareerandlifechangesfortheoriginalinvestigator(s)[Michener1997].Data,andthemethodsusedtogenerateandprocessthem,areofteninsufficientlydocumented,whichmayresultinmisinterpretationofthedataormayrenderthedataunusableinlaterresearch.Likewise,lackofversioncontroloruseofpersistentidentifiersforallfilescausesdownstream

confusion,andhindersreproduciblescience.

Datamanagersareincreasinglyaskedtobothpreserverawdatastreamsandtoadditionallyprovideautomated,nearreal-timequalitycontrolandaccesstoprovisionaldatafromthesesensors.Typically,theseprovisionaldatastreamsundergofurthervisualandotherqualitycheckingandfinaldatasetsarepublished.Commonly,furtherinterpretationoccurswheresomemissingdataaregap-filledthroughimputationprocedures,orfaultydataareremoved.Thereisastrongneedtoarchivethesedatastreamsandprovidecontinuedaccess,whichultimatelysafeguardstheinvestmentofbothtimeandmoneydedicatedtocollectthedatainthefirstplace.Thereareanumberoforganizational,storage,formatting,anddeliveryissuestoconsider.However,fourmainarchivingstrategiesshouldbeused:creatingwelldocumenteddatasnapshots,assigningunique,persistentidentifiers,maintainingdataandmetadataversioning,andstoringdataintext-basedformats.Thesepractices,describedbelow,willincreasethelongevityandinteroperabilityofthedata,andwillpromotetheirusefulnesstocurrentandfutureresearchers.

Methods

PublishingofSnapshots

Generatingperiodicsnapshotsofnearreal-timesensorstreamsallowsthedatatobestoredanddescribedinadeterministicmanner.Theratethatsnapshotsareproduceddependsontheneedsofthecommunityusingthedata,buttypicallysnapshotfilesareorganizedusinghourly,daily,weekly,monthly,orannualdatasets.Italsodependsonthesamplerateandsamplesize.Producingthousandsoftinydatafiles,oronefilewithgigabytesofdata,woulddecreasetheusefulnessofthedatafromatransferandhandlingperspective.Makeiteasyontheresearchersusingthedata,andsizethesnapshotsappropriately.

Withoutdetaileddocumentationofthecontextualinformationneededtointerpretindividualmeasurements,evenwell-archiveddatawillberenderedunusable.Developmetadatafilestoaccompanythedatausingamachine-readablemetadatastandardappropriatetothecommunityusingthedata.CommonstandardsincludetheISO19115GeographicInformationMetadata[ISO/TC211,2003],theContentStandardforDigitalGeospatialMetadata(CSDGM)[FGDC,1998],theBiologicalProfileoftheCSDGM(FGDC,1999],andtheEcologicalMetadataLanguage[Fegrausetal.,2001].AlsoconsiderdocumentingsensordetaileddeploymentsettingsandprocesseswithSensorML[OGC,2000].

Likewise,snapshotsofdatathatrepresentatime-seriesshouldbedocumentedandpackagedappropriatelysuchthattherelationshipsamongfilesareclear.Manyoftheabovemetadatastandardshavetheirownmeansoflinkingdatawithmetadata,howevertheyareallimplementeddifferently.FederatedarchivingeffortssuchasDataONEhaveadopted‘resourcemaps’[Lagoze,2008]todescriberelationshipsbetweenmetadataanddatafilesinalanguage-agnosticmanner.(SeeDataONEpackagingintheResourcessection,andtheOpenArchivesInitiativeOREprimer).Considerpublishingresourcemapsofyourdataandmetadatarelationshipstoimproveinteroperabilityacrossarchiverepositories.

Oncedatacollectionsaresufficientlydescribed,deliverycanalsobeachallenge.Whileprovidingresolvablelinksdirectlytothemetadataanddatafilesisagoodpractice,scientistsoftenwouldliketobeabletodownloadfullcollections.Providingaservicethatpackagesfilesintoadownloadablezipfileiscommonplace,butrelationshipsbetweendataandmetadatacanbelost.ConsiderusingtheBagItspecification(seeBagItintheResourcessection)[Boyko,2009],whichprovidessimpleadditionstozipfilessuchasamanifestfilethatmaintainsthemachine-readablerelationshipsbetweentheitemsinthecollection,whilestillallowingresearcherstodownloaddatapackagesdirectlytotheirdesktop.

PersistentIdentifiers

Theabovesnapshotarchivingstrategieshingeontheabilitytouniquelyidentifyeachfileorcomponentofapackageinanunambiguousmanner.Filenamescanoftencollide,particularlyacrossunrelatedprojects.So,assigningunique,persistentidentifierstoeachfile,andtheoriginatingsensorstream,isparamounttosuccessfularchiving.Apersistentidentifierisusuallyatext-basedstringthatrepresentsanunchangingsetofbytesstoredonacomputer.Persistentidentifiersshouldbeassignedtosciencedataobjects,science

metadataobjects,andotherfilesthatassociatethedataandmetadatatogether,suchasresourcemaps.Opaqueidentifierstendtobebestforpersistenceanduniqueness(likeUUIDs),butcanbelessmemorable.SystemssuchastheDigitalObjectIdentifierservice(DOI)andEZIDcanhelpinmaintainingunique,resolvableidentifiers(seeUUIDs,DOIs,andEZIDintheResourcesSection).Eachversionofafileorproductsderivedfromfiles(seeversioningbelow)shouldalsohaveapersistentidentifier.Ifsnapshotsofdataarebeingextendedwithnewdata,anewversionofthedatasetneedstobepublished.Shorteridentifiersarebest,andavoidusingspacesandotherspecialcharactersinidentifierstoincreasecompatibilityinfilesystemsandURLs.Ultimately,theuseofpersistentidentifiersallowsassociatedmetadatatotracktheprovenanceofcleaned,qualityassureddataorotherderivedproducts,andpromotesreproduciblescienceandcitabledata.

Versioning

Datafromsensorstreamsareusuallyconsidered‘provisional’untiltheyhavebeenprocessedforqualitycontrol,andmultipleversionsofthedatamaybegenerated.However,provisionaldataareoftenusedinpublicationsandarecitedassuch.Thatsaid,inordertosupportreproduciblescienceusingsensordata,eachversionshouldbemaintainedwithit’sowncitableidentifier.Overwritingfilesordatabaserecordswithnewvaluesorwithannotatedflagswillultimatelychangetheunderlyingbytes,andeffectivelybreakthe‘persistence’oftheidentifierpointingtothedata.Thisappliestometadataorpackagingversionsaswell,andsocaremustbetakentoplaninversioningwithinyourstoragesystem.Yourversioningstrategiesofrawdatawillbedependentonyoursnapshotstrategies(e.g.appendingtohourlyfiles,thensnapshottingandupdatingmetadatafiles,oralternatively,say,producingdaily,weekly,monthly,orannualpackagesthatincludedatafilesandmetadatafilesforthetimeperiodofcovered).However,bymakingcitableversions,researcherswillbeabletoaccesstheexactbytesthatwereusedinajournalpublication,andpeerreviewofstudiesinvolvingsensordatastreamswillbemorerobustanddeterministic.

DataStorageFormats

Sensordatamaybestoredindifferentstructures,eachwithitsownadvantagesanddisadvantages.Asuiteofvariablesfromonestationandcollectedatthesametemporalresolutionmaybestoredwithinonewidetablewithacolumnforeachvariable,eachtimebeingonerecordofseveralvariables.Alternativesmightbeatableforeachvariableoronetableoftheformatof[time,location,variable,value].Thislattersystemmaybevaluecentricwithmetadataattachedtoeachvalueorseriescentricwithmetadataattachedtoacertaintimeintervalforonevariable(e.g.,atimeseriesofairtemperaturebetweencalibrations).Nomatterhowyouorganizeyourdata,long-term,archivalstoragefileformatsneedtobeconsidered.Inthedigitalage,thousandsoffileformatsexistthatarereadablebycurrentsoftwareapplications.However,someformatswillbemorereadableintothefuturethanothers.Asanexample,MicrosoftExcel1.0files(circa1985),arenotreadablebyMicrosoftExcel2012sincethebinaryformathaschangedovertimeinabackward-incompatiblemanner.Therefore,unlessthesefilesarecontinuallyupdatedyearafteryear,theywillberenderedunusable.Thesameistruefordatabasesystemfiles(.dbf)thatholdtherelationaltablestructuresincommonlyuseddatabasessuchasMicrosoftSQLServer,Oracle,PostgreSQL,andMySQL.Databasefilesmustbeupgradedwitheverynewdatabaseversionsotheydonotbecomeobsolete.Agoodruleofthumbistoarchivedatainformatsthatareubiquitous,andarenottiedtoagivencompany’ssoftware.ArchivedatainASCII(orUTF-8)textfilespreferably,sincethisformatisuniversallyreadableacrossoperatingsystemsandsoftwareapplications.IfASCIIencodingwouldcausemajorincreasesinfilesizesformassivedatasets,considerusingabinaryformatthatiscommunity-supportedsuchasNetCDF.NetCDFisanopenbinaryspecificationdevelopedattheUniversityCooperativeforAtmosphericResearch(UCAR),andprovidesopenprogramminginterfacesinmultiplelanguages(C,Java,Python,etc.)thataresupportedbymanyscientificanalysispackages(Matlab,IDL,R,etc.).Bychoosinganarchivestorageformatthatisn’ttiedtoaspecificvendor,datafileswillbereadableindecadestocomeevenwheninstitutionalsupportformaintainingmorecomplexdatabasesystemsfallsshort.

DataStorageStrategies

Forlocalarchives,themostcommonstoragestrategyistojustdirectlystorefilesinahierarchicalmanner

onafilesystem.Manydatamanagersuseamixoflocation,instrument,andtime-basedhierarchytostorefilesintofolders(e.g./Data/LakeOneida/CTD01/2013/06/22/file.txt).Thisisaverysimple,reliablestrategy,andmaybeemployedbygroupswithlittleresourceforinstallingandmanagingdatabasesoftware.However,filesystem-basedarchivesmaybedifficulttomanagewhenvolumesarelarge,orthenumberofinstrumentsorvariablesaregrowinganddon’tfitastraighthierarchicalmodel.

Manygroupsuserelationaldatabasestomanagesensordataastheystreamintoasite’sacquisitionsystem.Wellknownvendor-basedsolutionslikeOracleDatabaseandMicrosoftSQLServerareoftenused,aswellasopensourcesolutionssuchMySQLandPostgreSQL.Thesesystemsprovideameansofdataorganizationthatcanpromotegoodqualitycontrolandfastsearchingandsubsettingbasedonmanyfactors,beyondthetypicallocation/instrument/datehierarchydescribedabove.Databasesalsoprovidestandardizedprogramminginterfacesinordertoaccessthestoreddatainstandardwaysacrossmultipleprogramminglanguages.Fromanarchiveperspective,theuseofdatabasesmayhelpinmanagingdatalocally,butshouldbeseenasonecomponentofaworkflowtogetdataintoarchivalformats.

Localdatabasesusedformanagingdatashouldbebacked-upregularly,ideallytoanoffsitelocation,andtheunderlyingbinarydatabasefileformatsshouldregularlybeupgradedtothenewest,supportedversionsofthedatabasesoftware.Ultimately,datastoredindatabasesshouldbeperiodicallysnapshottedandstoredinarchivalfileformats,describedabove,withcompletemetadatadescriptionstoenhancetheirlongevity.

Althoughlocalfilesystemsanddatabasesarethemostpracticalmeansofmanagingdata,theyareoftenatriskofbeingdestroyedorunmaintainedoverdecadalscales.Naturaldisasters,computerfailure,staffturnover,lackofcontinuedprogramfunding,andotherrisksshouldbeaddressedwhendecidinghowtoarchivedataforthelongterm.Oneofthestrategiesforbestdataprotectioniscross-institutioncollaborationsthatprovidestorageservicesfortheirparticipants.Thesesortsofarrangementscanguardagainstinstitutionalorprogramdisollution,lackoffunding,etc.ConsiderpartneringwithcommunitysupportedarchivessuchastheLTERNIS,orfederatedarchiveinitiativessuchasDataONEtoarchivesnapshotsofstreamingsensordata(seebothintheResourceSection).

BestPracticesThefollowinglistofbestpracticesaretakenfromtheaboverecommendations,aswellasadditionalconsiderationswhenarchivingsensor-deriveddata.

Developandmaintainanarchivaldatamanagementplansuchthatpersonnelchangesdon’tcompromiseaccesstoorinterpretationofdataarchives(potentiallythroughUniversityLibraryprograms)

Employasounddatabackupplan.Archiveddatashouldbebackeduptoatleasttwospatiallydifferentlocations,farenoughapartthattheywon’tbeaffectedbythesamedestructiveevents(naturaldisasters,powerorinfrastructureissues).Performdailyincrementalbackupsandweeklycompletebackupsthatmaybereplacedperiodically,andannualbackupsthatwon’tchange.(Crashplan,acronis)

Generateperiodicsnapshotsofnearreal-timesensorstreams(acronis)

Developmetadatafilestoaccompanythedatausingamachine-readablemetadatastandard

Assignpersistentidentifierstosciencedataobjects,sciencemetadataobjects,andotherfilesthatassociatethedataandmetadatatogether

Maintainversionedfileswiththeirowncitableidentifier

PreferablyarchivedatainASCII(orUTF-8)fortextfiles,orcommunitysupportedformatslikeNetCDFforbinaryformat

Archiveallrawdata,butallrawdatadonotnecessarilyneedtobeavailableonline.However,assignapersistentidentifiertoeachrawdatafiletobeabletodocumentprovenanceofthepublished,qualitycontrolleddata.

Partneracrossinstitutionstoprovidearchivalservicestomitigateprogrammaticlosses

PreferablymakedatapubliclyavailablethathaveappropriateQA/QCproceduresapplied.

AssignadifferentpersistentidentifierforpublisheddatasetsofdifferentQClevelsinanarchive.Inthemethodsmetadata,documenttheprovenanceandqualitycontrolproceduresapplied.

Documentcontextualinformationforeachdatapoint.i.e.,inadditiontoassigningaqualityflag,assignamethodsflagwhichdocumentsfieldeventslikecalibrations,smallchanges,sensormaintenance,sensorchangesetc.Includenotesthathandleunusualfieldevents(e.g.,animaldisturbanceetc.)Encodemetadataforsensor-deriveddatausingcommunityandornationallyacceptedstandards.

Ensurethetimezoneforalltimestampsiscaptured.Dataloggerarebeingmanuallysettoacertaintime.Considerdaylightsavings.

Establishthemeaningfulnamingconventionsforyourvariablestakingintoaccountthetypeofobservationthatisarchived,adjectivesdescribingthelocation,instrumenttype,andothernecessaryvariabledeterminants.

Determinetheprecisionforyourobservationvaluesinadvance

Preferablyfollowanamingconventionorcontrolledvocabularyforvariables(SeetheResourcesSection)

Avoidusingdatabasesforarchivalstorage,butusethemformanagementandqualitycontrol.However,ifdatabasesareusedformanagingsensordatathenperiodicsnapshotsintoASCIIoropenbinarydataformatsarerecommended.

Trackchangestodatafileswithinmetadatafilestomaintainanaudittrail

CaseStudies

1.LTERNIS

TheNSFLong-TermEcologicalResearchNetworkInformationSystem(LTERNIS)isthecentraldataarchiveforalldatageneratedbyLTERresearchandrelatedprojects.AlldataincludingsensordataaresubmittedwithmetadataintheEcologicalMetadataLanguage(EML).DataarepubliclyavailablethroughthisportalandthroughDataONE,ofwhichtheLTERNISisamember.Specificapproachestoarchivestreamingsensordataarefollowingthebestpracticerecommendationsgiveninthisdocument:Datasetsaresubmittedassnapshotsintimeanditisuptothesiteinformationmanagerstodecidethelengthoftimeineachsnapshot,i.e.,howfrequentlyanewdatasetissubmitted.Minimallyqualitycontrolleddatasetsaresubmitted,whiletherawdataarearchivedateachsite.AsofthiswritingnostandardsforqualitycontrollevelsordataflagginghavebeenadoptedbytheLTERcommunity.

FeaturesoftheLTERNISinclude:

PublicavailabilityofdataandmetadataCongruencecheck-qualitycheckofhowwellthemetadatadescribethestructureofthedataUseofpersistentidentifiersStrongversioningofmetadataanddatafilesinthesystemMemberNodeofDataONESupportofLTERandrelatedprojectsdatastorageusingaccesscontrolrules

Replicationofdataandmetadataacrossgeographicallydispersedservers

2.KNB

TheKnowledgeNetworkforBiocomplexity(KNB)isaninternationalnetworkthatfacilitatesecologicalandenvironmentalresearchonbiocomplexity.Forscientists,theKNBisanefficientwaytodiscover,access,interpret,integrateandanalyzecomplexecologicaldatafromahighly-distributedsetoffieldstations,laboratories,researchsites,andindividualresearchers.TheKNBrepositoryhasbeenstoringandservingdataforoveradecade,andstoresover25,000datasets.

FeaturesoftheKNBinclude:

PublicavailabilityMetacat,anopensourcedatamanagementsystemMorpho,anopensource,desktopmetadataeditorSupportforanyXML-basedmetadatalanguage,butoptimizedfortheEcologicalMetadataLanguageUseofpersistentidentifiersStrongversioningofallfilesinthesystemSupportforcross-metadatapackagingusingresourcemapsCross-institutionalpartneringwiththeLTERandDataONESupportforbothpublicandprivatedatastorageusingaccesscontrolrulesReplicationofdataandmetadataacrossgeographicallydispersedserversInternationalparticipation,andsupportformulti-languagemetadatadescriptions

RecentdevelopmentsoftheKNBincludesupportfortheDataONEprogramminginterface(API)inboththeMetacatandMorphosoftwareproducts.ThisAPIpromotesinteroperabilityofarchivalrepositories,andenablesfederatedaccesstoenvironmentaldata.SincetheKNBproductssupportthisopenAPI,anyonecancreatetheirownwebordesktopapplicationsthatareoptimizedfortheirresearchcommunity.

3.GIWS,UniversityofSaskatchewanWISKIdataarchive

GlobalInstituteforWaterSecurity(GIWS)attheUniversityofSaskatchewanisdirectlyinvolvedinthecollectionofthefielddatafromdifferentresearchareasincludingRockyMountains,BorealForest,Prairie,andothers.

Inadditiontothe“in-house”manageddata,GIWSusesexternaldatasetsfromorganizationssuchasEnvironmentCanadaandAlbertaEnvironment.DatamanagementplatformonwhichGIWScurrentlyoperatesistheWaterInformationSystemKisters(WISKI).ThissystemisusedtogetherwithCampbellScientificLoggerNetsoftwareandcustom.NETmodulesinautomatedtasksthathandledatacollection,centralizeddataprocessing,storing,andreporting.Afterprocessing,theenvironmentaldatasetsarepublishedandmadeavailabletospecificgroupsofusersthroughtheKistersWISKIWebProwebinterfaceandKiWISwebservice.Bothapplicationscanquerythecentralizeddatabaseandreturndataintheformatsthatareusedforvisualizationorfurtherprocessinganddisseminationpurposes.

FeaturesoftheGIWSsystemincludepublicavailability,useofpersistentidentifiers,supportforcross-institutionalpartnering,dataaccesscontrolfordifferentgroupsofusers,supportforOGCWaterML2dataformat.[SeeGIWSintheResourcesSection]

ResourcesBagItZipfileformat:https://wiki.ucop.edu/display/Curation/BagIt

DataONE:http://www.dataone.org/what-dataoneandhttp://www.dataone.org/participate

DataONEPackaging:http://mule1.dataone.org/ArchitectureDocs-current/design/DataPackage.html

DigitalObjectIdentifier(DOI)System:http://doi.org

EcologicalMetadataLanguage:http://knb.ecoinformatics.org/software/eml/

FGDCMetadataStandards:http://www.fgdc.gov/metadata/geospatial-metadata-standards

GIWS:http://giws.usask.ca/documentation/system/GIWS_WISKI.pdf

ISO19115metadataStandard-GeographicInformationhttp://www.iso.org/iso/home/store/catalogue_ics/catalogue_detail_ics.htm?csnumber=53798

EZIDIdentifierService:http://n2t.net/ezid

LTERNetworkInformationSystem:http://nis.lternet.edu

OpenArchivesIntiativeObjectReuseandExchange(ResourceMaps)Primer:http://www.openarchives.org/ore/1.0/primer.html

UniversallyUniqueIdentifiers(UUID):http://en.wikipedia.org/wiki/Universally_unique_identifier

CFMetadatahttp://cf-convention.github.io/

ReferencesBiologicalDataWorkingGroup,FederalGeographicDataCommitteeandUSGSBiologicalResourcesDivision.1999.CONTENTSTANDARDFORDIGITALGEOSPATIALMETADATA,PART1:BIOLOGICALDATAPROFILE.FGDC-STD-001.1-1999https://www.fgdc.gov/standards/projects/FGDC-standards-projects/metadata/biometadata/biodatap.pdf

Boyko,A.,Kunze,J.,Littman,J.,Madden,L.,Vargas,B.(2009).TheBagItFilePackagingFormat(V0.96).RetrievedMay8,2013,fromhttp://www.ietf.org/id/draft-kunze-bagit-09.txt

Fegraus,E.H.,Andelman,S.,Jones,M.B.,&Schildhauer,M.(2005).Maximizingthevalueofecologicaldatawithstructuredmetadata:anintroductiontoEcologicalMetadataLanguage(EML)andprinciplesformetadatacreation.BulletinoftheEcologicalSocietyofAmerica,86(3),158-168.http://dx.doi.org/10.1890/0012-9623(2005)86%5B158:MTVOED%5D2.0.CO;2

Lagoze,Carl;VandeSompel,Herbert;Nelson,MichaelL.;Warner,Simeon;Sanderson,Robert;Johnston,Pete(2008-04-14),"ObjectRe-Use&Exchange:AResource-CentricApproach",arXiv:0804.2273v1[cs.DL],arXiv:0804.2273

MetadataAdHocWorkingGroup,FederalGeographicDataCommittee.1998.CONTENTSTANDARDFORDIGITALGEOSPATIALMETADATA.FGDC-STD-001-1998.http://www.fgdc.gov/standards/projects/FGDC-standards-projects/metadata/base-metadata/v2_0698.pdf

Michener,WilliamK.,JamesW.Brunt,JohnJ.Helly,ThomasB.Kirchner,andSusanG.Stafford.1997.NONGEOSPATIALMETADATAFORTHEECOLOGICALSCIENCES.EcologicalApplications7:330–342.http://dx.doi.org/10.1890/1051-0761(1997)007[0330:NMFTES]2.0.CO;2

OpenGeospatialConsortium,Inc.(OGC).Url:http://www.opengeospatial.org(2010).