NASA Advisory Council Ad Hoc Big Data Task Force, February 16, 2016

Ad Hoc Big Data Task Force of the

NASA Advisory Council Science Committee

Meeting Minutes

Inaugural Meeting February 16, 2016

NASA Headquarters Glennan Conference Room, 1Q39

_____________________________________________________________ Charles P. Holmes, Chair

____________________________________________________________ Erin C. Smith, Executive Secretary

Report prepared by Joan M. Zimmermann, Ingenicomm, Inc.

Table of Contents

Introduction
Charter/Science Committee and Subcommittee Feedback
Legacy from NAC ITIC
Discussion
HPD Big Data
Science Committee Greetings
Big Data and Earth Science
Supercomputing and Big Data
APD and Big Data
Public comment
Other Federal Big Data Initiatives
Planetary Science Big Data
Discussion/wrap-up
Appendix A: Attendees
Appendix B: Membership roster
Appendix C: Presentations
Appendix D: Agenda

Introduction

Dr. Erin Smith, Executive Secretary of the NASA Advisory Council (NAC) Ad Hoc Big Data Task Force (BDTF), called the membership to order and made some administrative announcements. Dr. Charles Holmes, Chair of the BDTF, opened the inaugural meeting of the BDTF. Introductions were made around the table.

Charter/Subcommittee Feedback

Dr. Smith presented an overview of the Task Force. It was created in response to a number of White House directives on the Big Data concept. These directives relate to the purviews of NASA’s Heliophysics and Earth Science Divisions (HPD and ESD), which engage in the study of solar activity and solar storms, and in weather forecasting. The administration also expressed a great deal of interest in the interoperability of datasets and related uses of Big Data. Successful applications of science in these areas will require two things. First, subdiscipline stovepipes must be broken down. Second, NASA datasets must become interoperable with those of the National Oceanic and Atmospheric Administration (NOAA) and the US Geological Survey (USGS), so that data are available to numerous end users such as emergency response and disaster relief agencies. Big Data may also enable the identification of actionable science information, making data useful for unforeseen applications. Big Data also means different things to different users, with implications for specific data-handling tools, data formats, and the creation of data standards. Applications vary across the Astrophysics (supernova models), Planetary (identifying exoplanets, galaxy formation), and Heliophysics divisions (one target/many missions, coronal mass ejections, radiation environment for human exploration). NASA’s Earth Science Division has been managing and exploiting Big Data for many years, both in creating climate models and for societal applications such as drought forecasting and disaster response. Many NASA spaceborne measurements are currently being used to improve air quality decision support systems in Texas and to produce accurate cloud formation models. HPD data and engineering data are being fed into an Integrated Radiation Protection System. This system helps determine how to reach acceptable risk figures for radiation exposure in human exploration.

The terms of reference (TOR) for the BDTF form a broad charter. It can be described as examining what the community as a whole is doing in Big Data, as well as what other agencies are doing, and identifying what can be done better. The intent is to catalogue best practices in NASA and other federal agencies, as well as in private industry, research institutions, and academia. One of the final products may be a white paper reporting out findings and recommendations. A major challenge for the Task Force will be to define what the term ‘big data’ means to the various communities. To an astronomer it is an archive issue; to HPD and ESD, it is a matter of interoperability and engineering. Other challenges will be to determine the most useful and efficient architectures, storage modes, data accessibility, data rates, data security, and intellectual property requirements. How do we communicate what datasets are saying, and how do we train people in the use of datasets? It is a dynamic area. To date, the BDTF has
completed its ethics training and is in the process of signing on its last two members to round out the committee. The NAC Science Committee has provided feedback to the BDTF. It asked the BDTF to acquire more representation from commercial entities and other non-NASA sciences, and to consider ground-based sciences that may have produced scientific data. The feedback also asked the BDTF to look at data visualization, data permanence, and data usage. The Science Committee has asked that the BDTF act as a go-between for the community, and find links and leverage points with existing efforts on big data. The Science Committee also recommended that the BDTF invite people from the NASA archives, NASA Ames Research Center, simulation experts, modelers, and industry partners. Within disciplines, practitioners should be able to understand their own subfields, and the effort should allow for cross-pollination between subfields. The BDTF has also been asked to find the best way to gather feedback so that the Science Committee and its subcommittees can benefit from this effort (e.g., surveys of industry members, town halls). The NAC Science subcommittees would like the BDTF to address the following:

- data usability, management, and access
- utilization (including real-time)
- analysis and data mining of large datasets
- algorithm and statistics development
- data curation
- archiving tools and technology
- visualization (such as the hyperwall)
- the use of state-of-the-art information technology (IT) systems and tools

Other questions to address: What opportunities are there in big data? Which subject matter experts (SMEs) should be consulted? What kinds of products are desirable?

Dr. Holmes noted that, given the extensive shopping list, he wished to devise a work plan to use the limited time available and distill the Task Force output into something valuable. As to the term “interoperability,” he challenged Dr. Smith to fine-tune its definition, as it is a wide-open topic. He believed that innovation comes from the bottom up. He worried that “interoperable” raises red flags for the creation of top-down management. Dr. Clayton Tino worried about “needs for future use,” which would require a fundamental understanding of data formats; making data understandable to all communities is a nearly unsolvable problem. Dr. James Kinter commented that interoperability tends to become a catch-all phrase for simulation and modeling, best practices, and interoperability between discipline scientists (including metadata and documentation). Dr. Reta Beebe noted that “data mining” connotes something magical and is a major question. Externally, people think that data mining is magically done. However, datasets are so different, particularly in Planetary Science, that data mining becomes a major problem. Dr. Holmes reiterated his belief in the bottom-up approach, and in allowing successes from this approach to replicate in other scientific areas.

Legacy from NAC IT Infrastructure Committee

Dr. Holmes gave an overview of the BDTF’s history, having served as vice chair of the NAC Information Technology Infrastructure Committee (ITIC), which existed from 2010 to 2013. Its main affiliation was with the NASA Chief Information Officer (CIO), but it also had ties across NASA, in areas such as cybersecurity. The NAC recommended that both the ITIC and the Science Committee explore an approach to improve access to
NASA science data repositories, with that exploration to include best practices; these elements have been carried over into the present TOR for the BDTF. In Fall 2013, the NAC advisory committee structure was revamped. Cybersecurity was put under the aegis of a new committee, and the work of the former ITIC now continues with the current Big Data Task Force, reporting to the Science Committee. One of the first recommendations of the former ITIC was that NASA should take advantage of assets in the Federal government, such as GPU clusters, cloud computing under the National Science Foundation (NSF), and other sponsorship. ITIC also recommended that NASA improve the cyberinfrastructure that supports Agency science. One ITIC finding notes that NASA science data do not sit in one place but are distributed across NASA centers, USGS, industry, and universities. NASA data centers are discipline-focused and are managed that way. The number of science publications coming out of these centers is growing dramatically. Education and Public Outreach continues to tap into these data stores, sometimes directly and sometimes through a group that processes the data for the general public. The Department of Energy (DOE) has set up a backbone throughout the country with many nodes not far from the NASA centers. It would be good to leverage this pipeline, as well as a 10-Gbps research network that links research and innovation laboratories. Use of NASA supercomputers at both Goddard Space Flight Center (GSFC) and Ames Research Center (ARC) is growing. The Earth Observing System Data and Information System (EOSDIS) is also growing in its data product distribution. Web services to support disaster applications, such as those of the Short-term Prediction Research and Transition (SPoRT) Center at Marshall, are transitioning research data to the operational weather community. The Solar Dynamics Observatory (SDO) is revolutionizing the way we understand the Sun. It is collecting roughly a petabyte of data per year, with 5 petabytes per year worth of processing. This represents a two-order-of-magnitude jump over what solar physics had been ingesting previously from older missions such as Hinode. NASA’s Multimission Archive at Space Telescope (MAST) is showing almost exponential growth, and will grow even more when future telescope missions come online. A search on “NASA” in the Apple App Store returns more than 200 apps. Many of these apps are in high demand from the public, and pull processed results out of NASA’s data stores. More than 250,000 people have taken part in NASA’s Galaxy Zoo program.

In 2012, the Office of Science and Technology Policy (OSTP) sent out a memo to the public announcing a Big Data Initiative, earmarking $200M to be spent on improving access to the government’s big data stores. In 2013, more memos and Executive Orders came out on this issue, but NASA was missing from the list of recipients (DOE, Department of Defense, and others). This raises the question: where did NASA miss the boat? Dr. Holmes noted an ITIC finding from November 2012 that NASA acquire fiber-optic pathways to support current and future data, and a recommendation that NASA buy rather than own these pathways.

Discussion

The committee discussed a draft work plan to determine how the BDTF would move forward. Dr. Holmes felt that the BDTF shouldn’t address the areas of data searchability
and availability, proprietary periods, long-term archiving, and other frequent requests made of NASA’s data stores, feeling that processes for these are already in place at NASA. The BDTF should instead break new ground: survey the community, choose 3 to 4 topics, and produce deliverables. The BDTF should form a concise problem statement, conduct research, organize and develop positions, form a consensus, and draft and present results in a white paper (4-6 pp) accompanied by a slide presentation. Because the BDTF expires in December 2017, only 4-5 more face-to-face meetings remain in which to develop findings and recommendations to take to the Science Committee. Each of these meetings is held in advance of a Science Committee meeting. To this end, the Task Force should also hold teleconferences as appropriate. Dr. Holmes described his duties as Chair as primarily serving as the representative to the Science Committee, and closed with the thought: “Do good, work hard, NASA needs us.”

Dr. Ray Walker agreed that data availability/searchability did not require a hard look. He noted, however, that as data volumes get larger, it will be necessary to figure out which pieces we want to use; in this sense the issue is still important to consider. Dr. Holmes invited Dr. Walker to write up an actionable recommendation on the issue and send it to Dr. Smith. Dr. Tino commented that there are model-level, internal, and external use domains; what is it that we are actually trying to do? He agreed to write up an item on this question. Dr. Kinter said that, by definition, Big Data seems to mean the biggest and baddest datasets. In that respect, we typically see accessibility as a way to aggregate and analyze data from an entire dataset (petabytes), yet very few users will have the resources to operate on datasets of such magnitude. The Task Force should also think about facilitating the analysis of datasets that are too big to move and too big to analyze in situ. Dr. Holmes agreed to revise the work plan to incorporate the written contributions, and to look at areas that can be extended beyond the current state of work; the BDTF needs to look at benchmarks regarding this issue.

HPD Big Data

Dr. Jeffrey Hayes presented areas of concern for the Heliophysics Division (HPD) in terms of Big Data needs. HPD studies the Sun’s variability, the response of geospace, and the Sun-Earth system’s impacts on humanity. To do this, HPD engages in the science of space weather and seeks to understand the interconnections between the Sun and Earth. It also develops knowledge to improve the prediction of extreme events such as major coronal mass ejections (CMEs). The mission portfolio includes a research and analysis (R&A) line and an Explorers mission line, along with Living with a Star, Solar Terrestrial Probes, and the sounding rockets program. Mission investment is guided by the Decadal Surveys and NASA’s advisory bodies. The Heliophysics System Observatory includes numerous satellites such as IRIS, Wind, STEREO, the Van Allen Probes, and the Interstellar Boundary Explorer (IBEX). Within the current mission and operations budgets, there is a certain amount of funding for data archiving, for the creation of standards, and for accessibility. Dr. Hayes felt that most missions were able to respond quickly to decisions on data archiving and curation. Senior Reviews address the scientific merits of HPD missions every two years. They take into account the accessibility, usability, and utility of data, including archiving after the mission is complete. As a result, the data pipeline is doing very well.
About 70-80% of HPD data come from extended mission phases. The Sun varies in a roughly 22-year cycle, and having all of these HPD missions operating simultaneously is beginning to enable the understanding of a very complex system. The average cost of operating a Heliophysics satellite is $2.9M annually. The Solar Data Analysis Center (SDAC) and Space Physics Data Facility (SPDF) are the active archives for HPD and run at about $3.3M per year. There is also a ROSES element amounting to about $1M a year. Thus, the total to curate the data is about $4.5M per year, plus some money in the mission lines themselves. Dr. Hayes noted that “Scientists want all the data all the time, forever.”

In the early 2000s, the Decadal Survey set a priority for a Virtual Observatory. The idea was to collect all the data (both Astrophysics and Heliophysics) and make it universally accessible through common standards. At the time, Astrophysics had one standard and Heliophysics had multiple standards. Over the last 20 years, NASA has been trying to bring these standards into line, and Dr. Hayes felt that good progress was occurring in this area. Heliophysics has an explicit policy establishing its standards: FITS, CDF, and NetCDF. NASA is in a much better place than it was 10 years ago in terms of standardization. HPD has also restored a large fraction of data from its older missions. It has been systematically examining old archives and restoring data archives and datasets of scientific interest. For any metadata, it is necessary to get everyone to agree on keywords. HPD has gotten good buy-in. Users can now use the Space Physics Archive Search and Extract (SPASE) metadata wrappers to take inventory, search by date or event, and so on, in support of system science. The process has gotten a lot better and appears to be going faster. HPD’s three most recent missions are successfully using the SPASE metadata wrappers. The first data from Magnetospheric Multiscale (MMS), for example, will be available on SPDF on March 1. HPD is starting to get terabytes of data, which is a new experience. There are 800 TB from SDO to date, and the volume is growing. HPD is now looking at storing 1 PB in the SDAC; this data volume will probably triple or quadruple as future missions come online. Stanford University will not always support SDAC, and at some point the data will have to be brought back to NASA. Dr. Hayes felt that putting data in the cloud was still an iffy prospect, and cited a recent accidental deletion of stored data as one of its potential drawbacks. Solar project data volumes will continue to grow, in terms of both lifetime data volume and data rate. The questions are where the data will be stored, who will store them, and how they will be moved around. HPD can’t throw data away because Heliophysics science needs the context. Data policy is working well. HPD has a registry and inventory of the data, and is constantly updating them. Legacy datasets have largely completed their extractions. Now HPD is concentrating on standards. A future challenge is how to use the SPASE metadata, how to use the data, and how to make it accessible to the non-expert user. Remote-sensing and in-situ measurements are very different, and these differences must be taken into account. For modeling, how do we archive useful, powerful comparisons? At this point, models do not have a standard; we are working toward it. As we move away from
the Virtual Observatory concept to a more consolidated way of getting data out, we must focus on metadata and links to generic access methods, and avoid stovepiping. The interdisciplinary aspects of data will be addressed by a larger group. Dr. Hayes noted that the Virtual Observatory concept did not fail, but the technology has since moved on.

Dr. Holmes asked Dr. Hayes to identify HPD needs from the BDTF standpoint. Dr. Hayes replied that one useful finding would acknowledge the value of standards. His other concern was the unfunded mandate to keep versions of data in perpetuity. There is a NASA policy, issued in response to OSTP, on public accessibility and publications. The worrisome issue, however, is whether the reference data in a paper have a pedigree that may or may not be preserved in the archive. Who owns the final data? Which version of the software? There is never enough disk space. Another useful finding would be a statement that having data active and online is a good thing. Data, especially taxpayer-funded data, shouldn’t be buried in someone’s desk drawer. NASA tends to get pushback from principal investigators on this issue, as they feel their data are proprietary. Dr. Hayes agreed to write up an item for Dr. Smith.

Dr. Kinter commented that there is no data standard for models, and that this is a challenge for the future. He wondered how much interaction there is between the Heliophysics community and the tropospheric and weather communities. Dr. Hayes felt there was not much interaction, certainly not at the tropospheric level. There are ongoing meetings, however, and HPD would be open to using anything the other communities have. The variables may be different, but it is something that could be explored. Dr. Walker mentioned that the National Science Foundation (NSF) is looking into data assimilation. Dr. Holmes noted that the community had looked at compatibility between Earth Science and Heliophysics data ten years ago, and stopped because of data sparseness. Dr. Neal Hurlburt agreed that the effort was still at the case-study level; IRIS is a good example of where we were forced to use models. Dr. Kinter noted that ocean data assimilation has a similar problem with data sparseness. The tropospheric problem has progressed well during the last decade, and can accommodate data sparseness a little better; GSFC has some expertise here. Dr. Holmes asked Dr. Kinter to provide points of contact (POCs) at Goddard. Dr. Walker mentioned that the Planetary Data System (PDS) has begun a study of archiving models, as has the Community Coordinated Modeling Center (CCMC). There is also European work in both Heliophysics and Planetary Science at the University of Paris. These can provide useful Lessons Learned.

Science Committee Greetings

Science Committee Chair Dr. Bradley Peterson addressed the committee, thanking members for their important contributions. He noted that time was a pressing issue. He urged the BDTF to focus on finding commonalities and best practices across the subdisciplines, building on the existing infrastructure only where it is useful. He asked the membership to regard the NASA budget as a zero-sum game: NASA will buy into recommendations only if they are affordable, or worth giving up something else for. Eating into the budget for missions and research would be an undesirable outcome. Dr. Peterson suggested that the BDTF consult with subcommittee
chairs when useful, in order to iterate ideas across the Science Committee, subcommittees, and BDTF.

Big Data and Earth Science

Dr. Kevin Murphy presented an overview of the Earth Science Data Systems program. He stated that, regardless of varying definitions of big data, Earth Science has it, as well as a large user base. Objective 2.2 of the 2014 NASA Strategic Plan guides the use of Earth Science data to form a view of Earth that can be used across disciplines (ocean, atmosphere, cryosphere, etc.) and their interactions. The Earth Observing System Data and Information System (EOSDIS) is the largest component of the Earth Science data system. It is associated with two competitively selected programs: Making Earth System data records for Use in Research Environments (MEaSUREs) and Advancing Collaborative Connections for Earth System Science (ACCESS). EOSDIS works internationally and among the federal agencies to get data to the public. It processes data from Level 0 to higher-level products that are made available to users. EOSDIS was initiated in 1990. In 1994 it incorporated heritage datasets from satellites, aircraft, and in-situ sensors (e.g., flux towers). It was designed to handle a terabyte of data per day. EOSDIS reprocesses data quite often, as instruments deteriorate or as better signal processing methods become available. About 15 petabytes (PB) of data are currently available, all of which interoperate with other agencies and archives through established standards. EOSDIS has a distributed framework, and has had an open data policy since 1997. The system generates biophysical products, geolocates them, and distributes them to end users. EOSDIS holds an extensive volume of data in over 9200 data types, which range over human dimensions, land, atmosphere, ocean dynamics, and the cryosphere. The system works closely with missions in formulation and development in order to prepare data plans.

EOSDIS is spread out across the US. Mission data are processed by Science Investigator-led Processing Systems (SIPS) and then passed along to the Distributed Active Archive Centers (DAACs) to support the user base. DAACs are located at host organizations that are widely recognized by the community. Each DAAC has a working group that helps to direct how it works. There is also a Program Scientist within each DAAC, roughly aligned with each subdiscipline. The DAACs are overseen primarily by two components: Headquarters for management and the Goddard Space Flight Center (GSFC) for implementation. The Earth Science Data and Information System (ESDIS) coordinates EOSDIS activities to avoid duplication of effort. ESDIS continually takes input through weekly teleconferences and through annual meetings with DAAC managers and DAAC systems engineers; roughly 160-180 people attend the annual meetings. The EOSDIS infrastructure also ties together users and DAACs through several services:

- earthdata.nasa.gov
- the Common Metadata Repository (CMR)
- Global Imagery Browse Services (GIBS)
- the EOSDIS Metrics System (EMS)
- various user support tools

EOSDIS performs an annual customer satisfaction survey, and also has DAAC User Working
Groups, from which it receives regular feedback. EOSDIS metrics from 2015 show 9462 unique data products and 2.6M distinct users of EOSDIS data and services. EOSDIS distributes about twice as much data as it ingests. In 2015, the system received an American Customer Satisfaction Index (ACSI) score of 77, considered very good. Product delivery is trending upward. EOSDIS converts high-value products into imagery, served through tools such as the NASA Worldview website. Worldview uses data from the Moderate Resolution Imaging Spectroradiometer (MODIS) on the Aqua and Terra satellites and from NOAA’s Visible Infrared Imaging Radiometer Suite (VIIRS). Worldview works much like Google Earth; users can zoom in and go back in time. Users can also overlay data, such as the SO2 cloud over an erupting volcano, and find specific data such as fire hotspots. EOSDIS holds Senior Reviews to evaluate the performance and scientific merit of its various subsystems.

Dr. Walker noted the many highly derived data products, and asked how EOSDIS kept up with evolving algorithms. Dr. Murphy explained that standard products are produced in collections; EOSDIS is currently going from MODIS Collection 5 to Collection 6, reprocessing data. Collection 5 will be maintained until Collection 6 is complete, and the science teams will determine when the new collection is done. Dr. Holmes asked what the BDTF could do for Earth Science. Dr. Murphy felt that NASA received little recognition for this important work, as it is generally not well understood. The ramp-up of data products is currently limited by the need to adapt to input from new instruments. EOSDIS has to put algorithms closer to the data in a way that allows unimpeded access to products; how to do this is still an open question. NASA may also need to learn how to work with commercial high-performance computing groups. Dr. Hurlburt asked how many of the 2.9M distinct users were part of the active (science) community. Dr. Murphy replied that people who use a lot of the data will frequently use all of it (operational users who use Level 1 data); the numbers of graduate students and similar users are hard to estimate. Dr. Kinter asked how EOSDIS dealt with budget realities. Dr. Murphy noted that EOSDIS recognizes the need to develop or adopt sufficiently standardized components to allow people to develop their own tools, a strategy that saves both time and effort. NASA doesn’t want to be the first adopter or the last; the strategy depends on the community. EOSDIS adheres to the principles of open application programming interfaces (APIs) and open access, and the community is well aware of the data policy. Dr. Walker asked about the extent to which NASA provides interoperability in its joint work with NOAA. Dr. Murphy explained that NASA works with NOAA at the catalogue level, uses open-source software, and shares observations. It also works closely with NOAA on the Climate Initiative and in the airborne program.

Supercomputing and Big Data

Dr. Tsengdar Lee, Program Manager of the Earth Science Division Supercomputing Program, presented an overview of the program and the NASA vision for future computing services. NASA has two supercomputing centers: one at Ames Research Center (ARC), which serves the entire agency, and one at GSFC, which serves primarily Earth Science. ARC supports agency-wide activities, from launch vehicles to general relativity.
In August 2015, NASA’s flagship supercomputer, Pleiades, reached half a billion Standard Billing Units (SBUs, a normalized measure of computing time) delivered cumulatively since 2008. This translates to nearly $300M of services, at a cost of roughly 26 cents per SBU in 2015. NASA continues to grow the system, relying on Moore’s law going forward, though Dr. Lee noted that some argue the law has reached its end. Scientific and engineering demand will continue to grow and the system has become oversubscribed, so NASA will have to develop a user policy. The ROSES selection process is now being tightly coupled to the availability of computing time. For Earth Science imaging and modeling, the system can currently push the resolution down to 1.5 km; the holy grail of atmospheric science is 0.5 km. The workload is changing, shifting into data processing. For example, the Kepler mission is using Pleiades to support validation of new exoplanets, which has become the primary avenue for producing discoveries in that area. Data assimilation systems are being used to create physically consistent long-term datasets, from 1979 to the present. They are also being used for downscaling to higher-resolution data for climate studies. The Orbiting Carbon Observatory-2 (OCO-2) is presenting data processing challenges. NASA is conducting a data reprocessing campaign with new algorithms, with about 60% of this work being done on the supercomputer and 40% on the Amazon cloud. High-End Computing Capability (HECC) resources are being used to clear a five-year backlog of unmanned aerial vehicle synthetic aperture radar (UAVSAR) data processing, to reduce latency.

Processing is moving into the big data area, pitting high-performance computing against large-scale Internet computing. Can high-performance computing (HPC) be used as a private cloud? How do we put together an architecture to process, analyze, and mine data? Currently, data storage and data management are the core of the business, with the data in the middle and all the services and processing surrounding the dataset. A Science Cloud architecture ideally provides an agile, high level of support. In such an architecture, the system owns the data and uses a data management system, data analytics services, OpenStack, etc. NASA is constantly looking at new technologies: cloud and virtualization, high-performance object stores, and SciDB (the latter heavily supported by DARPA). A science cloud has helped to validate many types of measurements, such as global fires. Coupling HPC and cloud computing can create a best-of-breed computing service environment. HECC’s path to growth is constrained at present. NASA has maxed out the infrastructure in terms of facilities, building, water, and electricity, and is engaged in a study on how to build next-generation data centers.

Drs. Holmes, Walker, and Hurlburt expressed concerns about user constraints, given that 70-80% of the program’s workload requires tightly coupled processing. Dr. Lee agreed to write a statement on this situation for use by the BDTF. He added that certain types of workloads could be cloud-computed, and NASA is exploring those options as well. Dr. Tino asked if Dr. Lee had any sense of the capacity the program was losing due to mixed-mode services. Dr. Lee replied that NASA was running the mixed workload because of the demand. Some of the projects didn’t plan for their HPC use, and need to do a better job of such planning in the future.

Astrophysics and Big Data

Dr. Paul Hertz, Director of the Astrophysics Division (APD), presented Big Data needs as viewed by the Astrophysics community. Astrophysics addresses the evolution of the
universe, the origin of galaxies and stars, and the question of whether we are alone in the universe. APD is driven by the Decadal Surveys, science roadmaps, and implementation plans to support its ability to handle large data questions. Sixty percent of the budget supports the development of space missions and 20% supports operations; another 5-10% is dedicated to research and development. Data archives are funded as an infrastructure investment. APD’s current suite of missions ranges from many small missions, such as the Neutron star Interior Composition Explorer (NICER), to the large space telescopes, Hubble and the future James Webb Space Telescope (JWST). The next large flagship after JWST is the Wide-Field Infrared Survey Telescope (WFIRST), whose prime science goal is to understand dark energy and dark matter. This can only be done by measuring the small impact they have had over the history of the universe. Doing so requires surveying large swaths of the universe, i.e., looking at large amounts of data to detect small perturbations. Thus WFIRST will be computationally intensive. WFIRST will also look at millions of galaxies and search for evidence of microlensing, which is also computationally intensive. Euclid, a European mission with similarities to WFIRST, will also create large datasets, as will another future observatory, the ground-based Large Synoptic Survey Telescope (LSST). All three of these projects will be combining their data in pixel-by-pixel analysis. The various agencies are studying the best way to carry out this data processing, a decade in advance of the need. A white paper on this topic (Jain et al., “The Whole is Greater than the Sum of the Parts”) can be found at arxiv.org/abs/1501.07897.

All NASA Astrophysics science data are open to the community, and all data centers go through the Senior Review process every two years. All astrophysics archives share a set of common protocols and standards, allowing the user community to combine data from multiple ground and space observatories. The NASA Astrophysics Virtual Observatory (NAVO) manages the protocols, while NSF funds the tools; the three Astrophysics archives manage the NAVO backbone. APD recently held a Senior Review of the archives. The review recommended that the archives become more proactive and aggressive about evolving into the future, for example by increasing bandwidth, keeping up with technological advances, and preparing for large volumes of data. Some types of computing might be more expensive in the cloud, and it must be determined which ones these are. NASA and NSF are currently funding Theoretical and Computational Astrophysics Networks (TCAN). Dr. Hertz was not aware of any issues thus far with getting time on NSF supercomputers. (Dr. Lee noted that NASA civil servants typically can’t get time on NSF supercomputers, but university Principal Investigators can.) Another computationally intensive area is laboratory astrophysics: interpreting X-ray data from Chandra, far-infrared data from Herschel, and visible-to-ultraviolet spectral lines from Hubble. These atomic line calculations are needed for creating line catalogues. Dr. Tino asked whether underestimation of computing time was a theme in APD. Dr. Hertz explained that processing Kepler data has been more computationally intensive than was appreciated at the beginning of the mission. A new mission with a data product similar to Kepler’s, the Transiting Exoplanet Survey Satellite (TESS), had therefore planned accordingly, applying Lessons Learned on the need to anticipate computing time. Dr. Lee noted that NASA is also making tighter connections between HPC and the budget-planning process. In terms of
recommendations, Dr. Hertz noted that Astrophysics was a minority user of HPC. It was interested in areas where it could leverage existing assets, or in commercial or other research that can improve Astrophysics science. APD has partnered with DOE in the past when DOE was interested in the science problem. DOE is not interested in exoplanets, but it is interested in dark energy and dark matter, so APD will be working with DOE on joint WFIRST-Euclid-LSST analysis.

Public comment period

No comments were noted from the online audience. At NASA Headquarters, Tripp Corbett made some comments from the vendor perspective. He noted something of a disconnect: tools are available at NSSC that should be more widely circulated. At a recent NASA meeting, he had heard a briefing on working with the cloud-computing community in a budget-conscious way, and he agreed to send more specific information to the BDTF.

Other Federal Big Data Initiatives (NSF)

The NSF Big Data Hubs Program director, Dr. Fen Zhao, briefed the BDTF by phone on her program, which is funded at about $20M a year. Related programs at NSF look at Big Data infrastructure, pilot and implementation efforts, and education-related activities such as the Big Data Work Force ($30M a year, looking at traineeships). The Big Data Hubs program looks at the complex relationships between data projects, end users, and commercial entities. It involves cross-disciplinary efforts and data sharing across the research ecosystem. The inspiration for BD Hubs came from OSTP’s 2012 Big Data Initiative. Under that initiative, a Big Data Partnerships Workshop resulted in 29 new partnerships, with 90 organizations participating, representing areas such as energy, healthcare, and finance. The initiative identified issues such as climate change and personalized healthcare, and NSF initiated the BD Hubs effort to allow these partnerships to gel. BD Hubs was launched in March 2015, with four hubs in four regions of the US, and made awards in September 2015:

- Northeast: Columbia University
- South: Georgia Tech and x
- Midwest: UIUC
- West: University of SD, UC Berkeley, and the University of Washington

The Hubs are differently constructed consortia; the current phase is allowing the Hubs to start up their activities. Their projects, called BD Spokes, represent specific activities within each topical area, such as a platform for sharing neuroscience data. The spokes are funded at $1M over three years, and are meant to leverage existing efforts. The Hubs are currently organizing drafts for each spoke, and full proposals are due this month. A large number of ideas came in on smart cities and the Internet of Things; the food/energy/water nexus; and human healthcare. NSF intends to fund these proposals this fiscal year. There are also latent projects waiting in the wings that can help transition some of these ideas to practice. NSF hopes to do this again next year. Dr. Holmes offered kudos to NSF for setting up this open-ended effort. Dr. Zhao noted that there is an end goal of sorts, as each Hub is responsible for generating 29 projects at the end of three years. This idea is not completely novel at NSF. The Foundation hopes to fund each spoke for a second three-year period, so that the spokes become self-sustaining. A similar effort was undertaken under US-Ignite, to support networking. The
NASAAdvisoryCouncilAdHocBigDataTaskForce,February16,2016

15

ideaistolookfortheunknowns,asinterestingthingscanhappenintheselarge,multiplecollaborations.Everyonebringstheirownphysicalinfrastructure,andalsotriestoidentifyserviceproviders.Dr.HolmesnotedthatmostoftheHubsweregeographicallyclosetoNASAPIs.Drs.HolmesandZhaoagreedthataclosercollaborationwouldbeideal.PlanetaryScienceBigDataDr.MichaelNew,ProgramScientistforthePlanetaryDataSystem(PDS),presentedtheneedsofBigDatafromtheplanetaryperspective.MostplanetarydataworkisbasedatGSFC.PlanetaryScienceDivision(PSD)datapoliciesstatethatallsciencedatareturnedfromplanetarymissionsbelongstothepublicdomain.Anyexclusivedataaccesscannotexceedsixmonths.Infundedscienceresearch,anydatanecessarytoreplicatepublishedresearchresults,thatarealsotheproductofaNASAaward,mustbemadeimmediatelyavailabletothepublic.TheplanetarydataenvironmentincludesPDS,thePlanetaryCartographyProgram(PCP;USGS),MinorPlanetsCenter(MPC;Harvard)andtheAstromaterialsCurationFacility(ACF;JohnsonSpaceCenter).Datarangesfromground-basedassets,individualinvestigators,mapping,dataanalysis(e.g.,trajectories),samplereturns,ANSMET(Antarcticmeteorites),toatmosphericdust.TheoutputofthePDSisprimarilytotaxpayers,educatorsandtalentedamateurs.AttheACF,NASAstoresspace-exposedhardware,lunarsamples,cosmicdustsamples,andHayabusa(comet)samples.NASAiscurrentlyre-engineeringitssamplecataloguetomakethesesamplesavailableonline.TheMPCisresponsibleforsmallbodies,andtheorbitsofminorplanetsandcomets.ThePCPmaintainsthecartographiccapabilityformappingtheplanetsandtheMoon,anddevelopsandmaintainstheIntegratedSystemforImagersandSpectrometers(ISIS),whichenablesthingslikespectrographicmapsofIo.ISISispreparingtoincorporateanopen-sourcevisualizationtool,theSPICE-basedCosmographia.(“SPICE”isaNASAinformationsystemanditsuseextendsfrommissionconceptthroughpost-missiondataanalysis,andithelpstocorrelateindividualinstrumentdatasetswiththosefromotherinstrumentsonthesameoronotherspacecraft.)PDSisafederatedarchive,withdatadistributedacrossthecountry;itsdisciplinenodeswererec
entlyre-competed.Managementofthesystemasawholeisalsobasedonafederatedmodel.PlanetarydataaremanagedbyplanetarySMEs.Dataisphysicallystoredatthenodes,andthedeeparchiveismaintainedattheNASASpaceScienceDataCoordinatedArchive(NSSDCA).TheNavigationandAncillaryInformationFacility(NAIF)implementsstandardsandtoolsthatareneededtounderstandthemotionofcelestialobjects.Inplanetarydatasets,everythingismovingrelativetoeverythingelse:spacecraft,instrument,Earth,andSun,allofwhichneedtimeconversionstandards.ThecollectionofthesevariablesiscalledObservationGeometry(OG).ThecurrentPDSisdistributedacrosssixnodes,whichafterarecentcompetitionarenowintheirfirstyearofa5-yearCooperativeAgreement.ThePIsateachnodecollectivelyformamanagementcouncil,andprovideinputaboutstandardsanddecision-making.PDS-4hasjustrecentlybeenrolledout.ItisanXML-based,model-driven,service-orientedmodel,andamoderntechnicalfoundationforplanetarysciencedata.ExistingPDS-3

NASAAdvisoryCouncilAdHocBigDataTaskForce,February16,2016

16

productswillbeconvertedtoPDS4whenpracticalandsensible.TheEuropeanSpaceAgencyandJAXA’planetarydatasystemsarebothadoptingPDS-4standards.ThetotalvolumeofPDSisabout1PB.Almostallcomputationsareperformedonindividualworkstations.PDShasjuststarteditsnext10-yearroadmap,andwillbeannouncinganopportunitytoself-nominateinearlyMarch.Areasofimprovementtobeaddressedintheroadmaparetoinclude:simplifyingandimprovingthepipeline;improvingsearchcapability;developingmoreusefulmetrics;improvingtoolsforarchivingsmalldatasets;andimprovingarchivepreparationanddocumentation,especiallyfornon-missiondataproviders.Relevantwebsitesare:naif.jpl.nasa.govandpds.nasa.govDr.HurlburtaskedaboutPDSmetrics.Dr.Newadmittedtohavingpoormetricsofusageandusers,andnotedthattheroadmapeffortwouldhelptoidentifythemetricsPDSwants,andtoadaptthesystemtoprovidethem.Dr.BeebecommentedthattheinternationalplanetarydataallianceacceptedSPICEastheirdatatoolattheirlastmeeting,afavorableindicator.Dr.New,whenaskedaboutBigDataneeds,allowedthattherewerenotmanyspecificareasinplanetary,withtheexceptionofmagnetosphericandplasmadata,orwhengeneratingveryhigh-fidelitygravitymodels.Thelunargravitationalmappingmission,GRAIL,iscurrentlyworkingonagravityfieldmodelontheHPC.Hehadn’theardaboutanyissueswithpipelineassociatedwiththeGRAILwork.Dr.NewfelttheBDTFcoulddirectaquestiontotheAgencyastohowitwouldliketohandlethestorageofgrantdata.PSDneedsacleardirectstatementonthisissue,whichneedstobeinformedattheAgencylevelbecauseitwillbearesponsetoanOSTPdirective.Thereare1500granteesinPSD;itwouldtakealabor-intensiveefforttostorealltheirdata.AnotherquestioniswhatkindofdataPDSisexpectedtoarchive.Dr.Holmesnotedthatthedirectiveappliestotheotherdisciplinesaswell,andinstructedDr.Smithtonotethisasanissue.Ameetingparticipantnotedthatthegrantdispositionquestionwasbeingaddressedintheroadmappingtask,entailingacommunity-basedreappraisalofthesubjectoverthenext6-9months.DiscussionDr.HolmesfollowedupbrieflywithDr.LeeonHPC,andaskedwhatvisibilityexistedfortheprogram,andwh
atthechancesforcollaborationwithDOEExascalemightbe.Dr.leeidentifiedhimselfasChairoftheHigh-EndComputingInteragencyWorkingGroup(HECIWG),butnotedthattheExascalecomputingfacilityisunderNationalStrategicComputingInitiative,adifferentgovernance.TheHECIWGismeetingmonthlyatthemoment,andDr.Leefelthecouldstartvectoringthediscussionintheirdirection.HenotedthatDOEsetsupaprocessforeligibility;ataskneedstohaveacertainprofile,andxnumberofcores.ThegateforeligibilitytogetontheDOE’sleadershipcomputingsystems,however,ishigherthanNASA’sentiresystem.NASAisfarbehindNSFandDOEinthesupercomputingarena.NASA’sleadingsystemislessthan5Tflops.Dr.HolmesconsideredthatBDTFmakeafindingonthematter,asNASAisworkingonprojectsofnationalsignificance.Dr.TinoaskedifExascalewasspecificallydesignedtosolveDOEproblems,withspecificallyimplementedarchitecture.Dr.LeereportedthatDOEhasaco-designconcept,andtheybringinanapplicationthatworksontheexascalesystem.
They are considering climate change as a co-designed system. DOE does not have the interoperability requirement. Dr. Walker commented that DOE has specific problems, while NASA's scope is broader. Dr. Holmes noted that DOE is addressing both astronomy and climate, and that while some of the scales are different, the physics are similar. Dr. Tino felt that NASA should either focus on products and services, or accept generality. Dr. Holmes suggested that NASA managers address utilization models at future meetings. Dr. Kinter asked what HPC would use Big Iron for after its nominal 3 years of operation. Dr. Lee said that NASA plans to repurpose Big Iron after 3 years, back into a generalized cluster. NASA is still limited by its facilities with regard to power and cooling. Dr. Holmes asked Drs. Tino and Kinter to write a talking point on the facilities issue.

BDTF members raised some general topics for further exploration. Dr. Tino noted that each of the presenters had adopted some form of standard, illustrating that people recognize that standards do matter. From a management standpoint, however, the subdisciplines had inconsistent metrics on users, and he questioned why archives had to be maintained in the absence of usage. Dr. Walker explained that some data have extremely long lives; every time there is a new mission to Jupiter, for instance, Voyager and Pioneer datasets are in demand again. It is critical that some of these datasets be safeguarded. Dr. Holmes noted that the Senior Review might be a vehicle for determining which data should be kept. Dr. Hurlburt suggested that user metrics inform these sorts of judgments. Dr. Tino felt user surveys were not always effective, and that metrics on actual use would be more useful in deciding what data to store. Dr. Holmes asked Dr. Tino et al. to flesh out this thought and do more research in advance of the next meeting. Dr. Beebe added that one also needs to consider the intrinsic sizes of communities and their stability; communities also tend to move around when major missions arise. Dr. Holmes was surprised at the lack of a clear vision for the future and asked Dr. Hurlburt to write a finding on this topic.

Dr. Holmes asked Dr. Smith to sound out the Science Mission Directorate to determine the level of concern over grant data storage. Dr. Beebe reported that it was a major concern that had already reached the top level of the administration, which had established workshops for people preparing for federal grants. Dr. Holmes gave an action to Dr. Smith to clarify Dr. Murphy's statement on the use of open-source software, and asked BDTF members to examine the NSF nodes of the BD Hub effort, to determine how close they are to co-located NASA PIs. Dr. Holmes asked that the next BDTF meeting take place at GSFC for 2.5 days in the April-May time frame, and that the BDTF perhaps consider a future site visit to ARC, to include some interaction with Silicon Valley. Dr. Smith reported that she would be working on an extension of the TOR off-line. Dr. Holmes adjourned the meeting at 4:59 pm.

Appendix A Attendees

Ad Hoc Big Data Task Force Members
Charles P. Holmes, Chair, Big Data Task Force
Reta Beebe, New Mexico State University (via telecon/Webex)
Neal Hurlburt, Lockheed Martin
James L. Kinter, George Mason University (via telecon/Webex)
Clayton Tino, Virtustream, Inc.
Ray Walker, University of California at Los Angeles
Erin Smith, Executive Secretary, NASA HQ

NASA Attendees
Louis Barbieri, NASA
Dan Crichton, NASA JPL
Elaine Denning, NASA HQ
Deborah Diaz, OCIO NASA
John Evans, NASA
T. Jens Feeley, NASA HQ
Navid Golpayegani, NASA
Jeffrey Hayes, NASA HQ
Paul Hertz, NASA HQ
Tsengdar Lee, NASA HQ
Edward Masuoka, NASA
Duane McMahon, NASA
Tom Morgan, NASA HQ
Kevin Murphy, NASA HQ
Michael New, NASA HQ
Herbert Schilling, NASA
Grif Schilly, NASA
John Sprague, NASA OCIO
Elizabeth Yoseph, NASA

Non-NASA Attendees
Joseph Bredenkamp, NASA retired
Terry Blankenship, Booz Allen Hamilton
Jung Byun, Booz Allen Hamilton
Chiehsan Cheng, Global Science and Technology
Tripp Corbett, ESRI
Joseph Dohry, Booz Allen Hamilton
Alex Duner, Medill News, Inc.
Grace Hu, OMB
Eric Feigelson, Penn State University
Robert Kohon, Novetta
Bradley Peterson, OSU, Chair, NAC Science Committee
Amy Reis, Ingenicomm, Inc.
Alyssa Retski, Lobbyit.com
Marcia Smith, SpacePolicyOnline
Connie Spittler, Global Science and Technology
Geordan Tilley, Medill News, Inc.
Joan Zimmermann, Ingenicomm, Inc.

Appendix B Membership

Dr. Charles P. Holmes, Chair, NASA HQ (Retired)
Dr. Reta F. Beebe, New Mexico State University
Dr. Neal E. Hurlburt, Lockheed Martin Space Systems Company
Dr. James L. Kinter, George Mason University
Dr. Clayton P. Tino, Virtustream Incorporated
Dr. Raymond J. Walker, University of California, Los Angeles
Dr. Erin Smith, Executive Secretary, NASA Headquarters

Appendix C Presentations

1. Big Data Task Force Charter/Subcommittee Feedback; Erin Smith
2. Legacy for the NAC Information Technology Infrastructure Committee; Charles Holmes
3. Heliophysics Division Big Data Needs; Jeffrey Hayes
4. Big Data and Earth Science; Kevin Murphy
5. Supercomputing and Big Data at NASA; Tsengdar Lee
6. Astrophysics Division Big Data Needs; Paul Hertz
7. Other Federal Big Data Initiatives (NSF); Fen Zhao
8. Planetary Science Big Data Needs; Michael New

Appendix D Agenda

Ad Hoc Big Data Task Force

of the NASA Advisory Council Science Committee

Inaugural Meeting February 16, 2016

NASA Headquarters

Glennan Conference Room, 1Q39

Agenda (Eastern Standard Time)

Tuesday, February 16

8:00 – 8:30 Opening Remarks / Introduction of Members, Dr. Erin Smith, Dr. Charles Holmes
8:30 – 9:15 Big Data Task Force Charter / Subcommittee Feedback, Dr. Erin Smith
9:15 – 9:30 BREAK
9:30 – 10:15 Legacy from NAC IT Infrastructure Committee, Dr. Charles Holmes
10:15 – 10:30 Discussion
10:30 – 10:45 BREAK
10:45 – 11:15 Planetary Science Big Data, Dr. Michael New
11:15 – 11:45 Heliophysics Big Data, Dr. Jeffrey Hayes
11:45 – 12:45 LUNCH
12:45 – 1:00 Greetings from the Science Committee, Dr. Bradley Peterson
1:00 – 1:30 Earth Science Big Data, Dr. Kevin Murphy
1:30 – 2:00 Supercomputing Big Data, Dr. Tsengdar Lee
2:00 – 2:30 Astrophysics Big Data, Dr. Paul Hertz
2:30 – 2:45 Public Comment
2:45 – 3:00 Other Federal Big Data Initiatives (NSF), Dr. Fen Zhao
3:00 – 3:10 BREAK
3:10 – 3:30 Work Plan and Future Meetings
3:30 – 5:00 Discussion / Findings / Recommendations
5:00 ADJOURN

Dial-In and WebEx Information

For entire meeting February 16, 2016

Dial-In (audio): Dial the USA toll-free conference call number 1-800-988-9663 or toll number 1-517-308-9427 and then enter the numeric participant passcode: 4718658. You must use a touch-tone phone to participate in this meeting.

WebEx (view presentations online): The web link is https://nasa.webex.com, the meeting number is 999765122, and the password is BigD@T@16.

* All times are Eastern Standard Time *