Responsive Storage: Home Automation for Research Data Management
Ryan Chard, Postdoc Fellow, Argonne National Laboratory
The Problem
- Data generation rates are exploding
- Complex analytics processes
- The data lifecycle often involves multiple organisations, machines, and people
This creates a significant strain on researchers
- Best management practices (cataloguing, sharing, purging, etc.) can be overlooked
- Useful data may be lost, siloed, and forgotten
RIPPLE: A prototype responsive storage solution
Transform static data graveyards into active, responsive storage devices
• Automate data management processes and enforce best practices
• Event-driven: actions are performed in response to data events
• Users define simple if-trigger-then-action recipes
• Combine recipes into flows that control end-to-end data transformations
• Passively waits for filesystem events (very little overhead)
• Filesystem agnostic – works on both edge and leadership platforms
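The recipe and flow ideas above can be illustrated with a minimal in-memory sketch. This is not RIPPLE's code: the `Event` and `Recipe` classes, the `run_flow` function, and the two example recipes are hypothetical names chosen for illustration. The sketch assumes a flow arises when one recipe's action emits a new event that another recipe is waiting for.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Event:
    kind: str   # e.g. "created"
    path: str   # file the event refers to


@dataclass
class Recipe:
    trigger: Callable[[Event], bool]             # the "if" part
    action: Callable[[Event], Optional[Event]]   # the "then" part; may emit a follow-up event


def run_flow(recipes: List[Recipe], first: Event) -> List[str]:
    """Process events until no recipe emits a new one; return a log of handled events."""
    log = []
    queue = deque([first])
    while queue:
        event = queue.popleft()
        for recipe in recipes:
            if recipe.trigger(event):
                log.append(f"{event.kind}:{event.path}")
                follow_up = recipe.action(event)
                if follow_up is not None:
                    queue.append(follow_up)
    return log


# Hypothetical two-step flow: a new .h5 file is "compressed", and the
# resulting .gz file then triggers a (pretend) transfer.
compress = Recipe(
    trigger=lambda e: e.kind == "created" and e.path.endswith(".h5"),
    action=lambda e: Event("created", e.path + ".gz"),
)
transfer = Recipe(
    trigger=lambda e: e.kind == "created" and e.path.endswith(".gz"),
    action=lambda e: None,
)

flow_log = run_flow([compress, transfer], Event("created", "/data/scan.h5"))
```

Here `flow_log` records two handled events: the original `.h5` creation and the `.gz` creation emitted by the first recipe. Neither recipe knows about the other; the chain exists only through the events they produce and consume.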
RIPPLE Architecture
Agent:
- Sits locally on the machine
- Detects & filters filesystem events
- Facilitates execution of actions
- Can receive new recipes
Service:
- Serverless architecture
- Lambda functions process events
- Orchestrates execution of actions
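As a rough illustration of the service side, the sketch below shows a function in the shape AWS Lambda expects when it is triggered by an SNS topic. It assumes the agent publishes each filtered event to SNS as a JSON string, which the slides do not spell out. The message fields (`event`, `path`) and the `route` rule are hypothetical. Only the `Records[].Sns.Message` envelope is the standard SNS-to-Lambda format.

```python
import json


def route(message: dict) -> str:
    """Pick an action name for one filesystem event (hypothetical routing rule)."""
    if message.get("event") == "created" and message.get("path", "").endswith(".h5"):
        return "transfer"
    return "ignore"


def lambda_handler(event, context):
    """Entry point in the shape AWS Lambda uses for SNS-triggered functions."""
    decisions = []
    for record in event.get("Records", []):
        # SNS delivers the published payload as a string under Sns.Message.
        message = json.loads(record["Sns"]["Message"])
        decisions.append({"path": message["path"], "action": route(message)})
    return decisions


# Local dry run with a hand-built SNS-style event (no AWS account needed).
sample_event = {
    "Records": [
        {"Sns": {"Message": json.dumps({"event": "created", "path": "/data/run1.h5"})}},
        {"Sns": {"Message": json.dumps({"event": "deleted", "path": "/data/old.h5"})}},
    ]
}
decisions = lambda_handler(sample_event, None)
```

The handler only decides what should happen. In a real deployment the chosen action would be carried out by another function or sent back to the agent.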
[Figure: RIPPLE architecture diagram. Labels: Ripple Agent, SQLite, Filesystem, Docker, PBS, SLURM, …, Process Monitor, Observers, SNS Topics, Lambda Functions, External Services]
RIPPLE Agent
Python Watchdog observers listen for events
- inotify, polling, for filesystem events (create, delete, etc.)
- Globus Transfer API for events (transfer, create, delete)
Recipes are stored locally in a SQLite database
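A minimal sketch of how recipes might be kept in SQLite and looked up when an event arrives. The table layout, column names, and the `add_recipe` / `matching_actions` helpers are assumptions made for illustration; the slides do not describe RIPPLE's actual schema. Only the standard-library `sqlite3`, `json`, and `re` modules are used.

```python
import json
import re
import sqlite3

conn = sqlite3.connect(":memory:")  # an agent would use a file on disk instead
conn.execute(
    "CREATE TABLE recipes ("
    " id INTEGER PRIMARY KEY,"
    " event_type TEXT NOT NULL,"
    " path_regex TEXT NOT NULL,"
    " action TEXT NOT NULL)"  # the action is stored as a JSON string
)


def add_recipe(event_type, path_regex, action):
    """Store one recipe; `action` is any JSON-serialisable dict."""
    conn.execute(
        "INSERT INTO recipes (event_type, path_regex, action) VALUES (?, ?, ?)",
        (event_type, path_regex, json.dumps(action)),
    )
    conn.commit()


def matching_actions(event_type, path):
    """Return the actions of every recipe whose trigger matches this event."""
    rows = conn.execute(
        "SELECT path_regex, action FROM recipes WHERE event_type = ?", (event_type,)
    )
    return [json.loads(action) for regex, action in rows if re.fullmatch(regex, path)]


add_recipe("created", r"/data/.*\.h5", {"service": "transfer", "dest": "archive"})
add_recipe("deleted", r"/data/.*", {"service": "restore"})

found = matching_actions("created", "/data/scan_001.h5")
```

Filtering on the event type in SQL and on the path pattern in Python keeps the query simple. A Watchdog event handler could call `matching_actions` from its callback.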
Local and cloud-based actions
- Docker containers and subprocesses act on local files (metadata extraction, dispatch jobs, etc.)
- AWS Lambda performs other tasks (Globus transfers, create shared endpoints, send emails, invoke other Lambda functions, etc.)
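A local action can be pictured as a command run against the file that triggered it. The sketch below is one possible shape, not RIPPLE's implementation. `run_local_action` is a hypothetical helper, and the "extractor" is a stand-in one-liner rather than a real metadata tool.

```python
import subprocess
import sys


def run_local_action(command, path, timeout=30):
    """Run `command` with the triggering file path appended; return (exit code, stdout)."""
    result = subprocess.run(
        command + [path],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.returncode, result.stdout.strip()


# Stand-in for a metadata extractor: prints the length of the path it was given.
extractor = [sys.executable, "-c", "import sys; print(len(sys.argv[1]))"]
code, output = run_local_action(extractor, "/data/scan.h5")
```

Because the action is just a command list, a container-based action could in principle be expressed the same way, with the list starting with `docker run` and an image name.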
[Figure: Ripple Agent diagram. Labels: Ripple Agent, SQLite, Filesystem, Docker, PBS, SLURM, …, Process Monitor, Observers]
RIPPLE Recipes
IFTTT-inspired programming model:
Triggers describe where the event is coming from (filesystem create events) and the conditions to match (/path/to/monitor/.*.h5).
Actions describe what service to use (e.g., Globus transfer) and arguments for processing (source/dest endpoints).
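To make the trigger/action split concrete, here is a sketch of what a recipe could look like as plain data, together with a function that turns a matching event into a planned action. The field names, the `SRC`/`DEST` placeholders, and the `triggered` and `plan` helpers are illustrative assumptions, not RIPPLE's recipe schema. The path pattern is the one shown on the slide.

```python
import re

# Hypothetical recipe layout; the field names are illustrative only.
recipe = {
    "trigger": {
        "source": "filesystem",
        "event": "create",
        "pattern": r"/path/to/monitor/.*.h5",
    },
    "action": {
        "service": "globus",
        "type": "transfer",
        "args": {"source_endpoint": "SRC", "dest_endpoint": "DEST"},
    },
}


def triggered(recipe, source, event, path):
    """True when the event source, event type, and path all satisfy the trigger."""
    trig = recipe["trigger"]
    return (
        trig["source"] == source
        and trig["event"] == event
        and re.fullmatch(trig["pattern"], path) is not None
    )


def plan(recipe, source, event, path):
    """Describe the action to run, or return None if the trigger does not match."""
    if not triggered(recipe, source, event, path):
        return None
    action = recipe["action"]
    return {"service": action["service"], "type": action["type"], "file": path, **action["args"]}


hit = plan(recipe, "filesystem", "create", "/path/to/monitor/run42.h5")
miss = plan(recipe, "filesystem", "delete", "/path/to/monitor/run42.h5")
```

Read as a regular expression, the unescaped `.` before `h5` in the slide's pattern matches any character. Writing `\.h5` would restrict the match to a literal file extension.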
Scenario: Large Synoptic Survey Telescope
Developed a representative testbed of the LSST storage requirements
• Automatically propagate data between storage tiers and facilities
• Invoke Docker containers to extract metadata and maintain a file catalog
• Compress and archive files
• Recover deleted/corrupted files when delete and modification events occur
[Figure: LSST testbed diagram with numbered steps. Labels: Custodial Store (Chile), Archive: ANL's Sparrow, Archiver, Landing, Magnetic, Forwarder, File Catalog, Custodial Store (NCSA), Landing, Magnetic, Archive, File Catalog; actions: metadata, minid, gzip, catalog, …]
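The compress-and-recover behaviour in this scenario can be sketched with the standard library alone. This is a simplified stand-in for the testbed, assuming a create event triggers archiving and a delete event triggers recovery. The `archive` and `recover` helpers, the directory layout, and the `exposure.fits` file name are hypothetical.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def archive(path: Path, archive_dir: Path) -> Path:
    """Write a gzip-compressed copy of `path` into `archive_dir`."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    target = archive_dir / (path.name + ".gz")
    with open(path, "rb") as src, gzip.open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target


def recover(path: Path, archive_dir: Path) -> bool:
    """Restore `path` from its archived copy; return False if no copy exists."""
    source = archive_dir / (path.name + ".gz")
    if not source.exists():
        return False
    with gzip.open(source, "rb") as src, open(path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return True


# Simulate the two events: create -> archive, delete -> recover.
root = Path(tempfile.mkdtemp())
image = root / "exposure.fits"
image.write_bytes(b"pixels" * 100)
archive(image, root / "archive")             # action for the create event
image.unlink()                               # the file is deleted
restored = recover(image, root / "archive")  # action for the delete event
```

Handling corruption on modification events would additionally need a way to tell a bad change from a legitimate one, for example a stored checksum. That part is left out here.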
Scenario: Advanced Light Source
Deployed Ripple on an ALS and NERSC machine to automate data analysis
• At ALS: Detect new heartbeat beamline data and initiate transfer to NERSC
• At NERSC: Extract metadata, create batch file, dispatch analysis job to Edison queue, detect result and transfer back to ALS
• At ALS: Create a shared endpoint, notify collaborators of result via email
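The "create batch file" step at NERSC could look roughly like the sketch below, which renders the text of a batch script for one newly arrived data file. It assumes a SLURM scheduler, one of the schedulers named in the architecture diagram. The slides do not say which one the Edison job used. The `render_batch_script` helper, the job name prefix, the file path, and the `python analyse.py` command are all hypothetical.

```python
from pathlib import PurePosixPath


def render_batch_script(data_file: str, analysis_cmd: str, minutes: int = 30) -> str:
    """Build the text of a SLURM batch script that analyses one data file."""
    name = PurePosixPath(data_file).stem
    hours, mins = divmod(minutes, 60)
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name=ripple-{name}",
        "#SBATCH --nodes=1",
        f"#SBATCH --time={hours:02d}:{mins:02d}:00",
        "",
        f"{analysis_cmd} {data_file}",
    ]
    return "\n".join(lines) + "\n"


script = render_batch_script("/scratch/als/heartbeat_001.h5", "python analyse.py")
```

Writing `script` to a file and submitting it with `sbatch` would be the dispatch step. A further recipe watching the output directory could then detect the result and start the transfer back to ALS.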