Post on 17-Feb-2019
Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000. SAND NO. 2011-XXXXP
LDMSVersion3Tutorialhttps://github.com/ovis-hpc/ovis
JimBrandt,TomTucker,AnnGentile,Nichamon Nasksinehaboon,Narate TaeratOpenGridComputing,Inc.
SandiaNationalLaboratories04/2017
OGC | Open Grid Computing, Austin, TX
Sandia National Laboratories is a multi-mission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC., a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA-0003525.
SAND2017-5153 O
Aboutthisdocument
• Thisisasub-selectionofmaterialsfromanLDMStutorial.ThefulltutorialincludesVM’swithanLDMSinstallation.TheVMisnothere,howevertherunscriptsfromtheexercisesareincluded.
• IfyouinstallLDMSonyoursystem,youcanthenusethesescriptsasmodelsandworkthroughtheexercises.
Note:VM’snotinthereleasematerials.Additional configuration scriptsintheassociatedtarball
Resources
• Documentation(Building,Using)• https://github.com/ovis-hpc/ovis/wiki
• SourceCode• https://github.com/ovis-hpc/ovis• git clonehttps://github.com/ovis-hpc/ovis.git
• Publications:• https://ovis.ca.sandia.gov
TutorialFormatOverviewoftheLightweightDistributedMetricService(LDMS)
• IntroductiontoHPCmonitoring• OverviewoftheLDMSframework
• LDMSarchitecturedescriptionSetup
• Environmentsetupdescription andverification• Introductiontosupport programsandhelperscriptsforuseinlabwork
Hands-onlabsInstructorwalkthroughandfacilitatedstudentexploration• Lab1:Samplers
• Samplerstartupandlocalandremoteverification• Lab2:Aggregators
• Aggregationstartupandverificationusingsampler• Aggregationofallotherattendees’samplers
• Lab3:Dynamicconfigurationsandresilience• Lab4:StoringdatainCSVstores• Lab5:CalculatingderiveddataandsavingtoaCSVstore• Lab6:StoringthedatainanSOSdatabase• Lab7:ExploringdatainanSOSdatabase• Lab8:Dataanalysis andVisualizationfromanSOSdatabase
Note:VM’snotinthereleasematerials.Additional configuration scriptsintheassociatedtarball
IntroductiontoHPCMonitoring• CanonicalMonitoringGoal:Real-timetroubleshooting(e.g.,nodesdown,outofmemory,resourcecongestion)
• HPCmonitoringconcerns:• Impactonrunningapplications• Howtoaggregatedatafromdifferentsourcesforanalysis.
• Network,filesystem,CPUutilization,memoryutilization• Whatanalyseswouldbemeaningful.
• e.g.,Whatrawandderiveddatawouldindicateperformance-impactingnetworkcongestion.• Howtoprocesslargeamountsofdatainreal-time
• Asaresult,canonicalsystemmonitoring:• Typicallyperformedatintervalsofminutes• Analyseslargelyconsistsofdetectingmonitoringvaluesexceedingpre-definedthresholds
• Dataisunsuitableforgainingsignificantinsightsintoapplicationperformanceproblems
MonitoringCanEnableResource-AwareComputingLightweighthigh-frequencycontinuousrun-timemonitoring,analysis,andfeedbackcouldenable:
• Fasterproblemdetection,includingcomponent-specificissuesbasedonaparticularcomponent’sknownbehaviorsandenvironment(e.g.,thermalvariations)
• Insightintoalarge-scaleapplication’suseofresourcesunderproductionconditions,includingcontentionfromotherapplications
• Dynamicapplication-to-resourcemappingbasedonapplicationneedsandsystemstate
• Co-schedulingofapplicationsbasedoncontentionforsharedresources• Dynamicsystemoperationsbasedonadatacenter’spowerdemands,temperatureetc.
LDMSOverview• WhatistheLightweightDistributedMetricSystem(LDMS)?
• Collectnumericdata• Moveandaggregatedata• Storedata• Analyzedata
• Troubleshooting• Optimization• Informfuturedesigns
• Typicalusecasedescriptions• Supportedtechnologies
• LinuxonallbutIBMBlueGeneplatforms• Sourcesofcode,information,andsupport
LightweightDistributedMetricService(LDMS)HighLevelOverview
*Onlythecurrentdataisretainedon-node
LDMSPluginArchitecture
Memory
LDMSAPI(libldms)
SamplerPlug-in Interface
TransportDriverInterface
MemorySampler
HSNSampler
RDMATransport
SocketTransport
LDMSDaem
on
MetricSet
MetricSet
MetricSet
MetricSet
StoragePlug-inInterface
LDMSAPI(libldms)
Storage
SOS
MySQL
CSVCSVStore
OtherStore
DataFlow
Supportedplatformsandnetworks
• Platforms• Rhel 6and7• SLES11&12• Ubuntu• CrayXE6,XKandXC
• Transports• Socket• Crayugni
• Aries• Gemini
• RDMA• Infiniband• iWarp
Builddependencies
• Typicalcomputenodeenvironment• Autoconf >=2.3,automake,autotool• Libevent2-devel>=2.0.31• OpenSSH-devel
• Endusehosts(monitorcluster,specialaggregationhosts,etc.)• Python
• 2.6withtheargparsemodule• 2.7
• Swig• Doxygen fordocumentation
LDMSInstallationmethods
• Manuallyinstallusingautoconf andautomake• DeploymentusingRPMNote:Forthisdemo,LDMSispre-installedonstudentVMsin/opt/ovis.
Note:VM’snotinthereleasematerials.Additional configuration scriptsintheassociatedtarball
Gettingstarted:Loginandsetupyourenvironmentssh –Yovis_public@XXXXXXX
$ovis_public@XXXXXX's password:*******
ovis_public@ovis-demo-login ~[sshd:]
$ssh –Yovis_public@ovis-demo-01
Note:“/home/ovis_public/demo/ldmsd/env/ldms-env.sh” isusedtosetupLDMSenvironment
Note:VM’snotinthereleasematerials.Additional configuration scriptsintheassociatedtarball
VMdirectorystructure• VMsincludesourcecode,scriptsandconfigurationfilesforeveryexercise,helpermini-applicationsforuseintheexercises,andsupportingvisualizationtools(e.g.,gnuplot).
• Directorystructure:• source-code/
• ldms/sourcecodeofLDMSlatestreleaseversion• util/utilitycodesforuseintheexamples
• data/Pre-collectednumericdataandlogmessagedata• ldms-data/ReleasednumericdatafromNCSABlueWaters
• csvAsubsetofBlueWatersdataintheCSVformat• demo/
• ldmsd/• conf/ConfigurationfilesusedintheLDMSdemo• data/Placeholdersfortheto-be-storedLDMSdata• env/Scriptstosetupenvironmentvariables• scripts/HelperscriptstodeployingLDMSdaemons
Note:VM’snotinthereleasematerials.Additional configuration scriptsintheassociatedtarball
Gettingstarted:SetupandverifyyourEnvironment• Systemenv.var.
• PATH= ${OVIS_HOME}/bin/:${OVIS_HOME}/sbin/:${PATH}• LD_LIBRARY_PATH= ${OVIS_HOME}/lib/:${LD_LIBRARY_PATH}• PYTHONPATH= ${OVIS_HOME}/lib/python2.7/site-packages/:${PYTHONPATH}
• LDMSenv.var.• ZAP_LIBPATH= ${OVIS_HOME}/lib/ovis-lib• LDMSD_PLUGIN_LIBPATH= ${OVIS_HOME}/lib/ovis-ldms
• LDMSauthentication• LDMS_AUTH_FILE= <pathtofilewithyoursharedsecret>
• Permissions600• Format:secretword=<8ormorecharacters>(e.g.secretword=mylittlesecret)
NOTE:${OVIS_HOME}=/opt/ovis inthisexample
Note:VM’snotinthereleasematerials.Additional configurationscriptsintheassociatedtarball
Testcode:memeater.c• Memeater codewhichrepeatedlyallocs mem.RunwithLDMStoseechangesinmemoryutilizationvaluesreportedin/proc/meminfo.
• Locatedat/home/ovis_public/source-code/util/memeater.c.Compilewithcc.
./a.outPeriodicallyincreasememoryallocated
Sleepbetweenalloc.Changethiswrtsampling frequency.
…
Sleepbeforereleasingmemory
…
LabExercises
Note:VM’snotinthereleasematerials.Additional configuration scriptsintheassociatedtarball
LAB1: Samplers
Note:VM’snotinthereleasematerials.Additional configuration scriptsintheassociatedtarball
StartandconfigureaLDMSdaemonLabGoals:• BasicLDMSdaemonstartupandconfigurationflags/args
• Manualandrun-timeconfigurationoptions• Outputoptions
• Logfilesandloglevels• Debuginformation
• manpages• man/opt/ovis/share/man/man8/ldmsd.8– opensldmsdmanpages• man/opt/ovis/share/man/man8/ldmsd_controller.8– opens“ldmsd_controller”manpages
• Useofldms_ls utilityasadiagnostictool• manpages
• man/opt/ovis/share/man/man8/ldms_ls.8– opensldms_lsmanpages
LDMSPluginArchitecture
Memory
LDMSAPI(libldms)
SamplerPlug-in Interface
TransportDriverInterface
MemorySampler
HSNSampler
RDMATransport
SocketTransport
LDMSDaem
on
MetricSet
MetricSet
MetricSet
MetricSet
StoragePlug-inInterface
LDMSAPI(libldms)
Storage
SOS
MySQL
CSVCSVStore
OtherStore
StartaLDMSdaemon
• Startldmsdldmsd –x sock:10001 –l sampled.log –S sampled.sock –r sampled.pid –p 20001
• -x: Transport: listeningport• -l: Specifythelogfilepathandname• -S: SpecifytheUnixdomainsocketforcommunicationwithldmsctl orldmsd_controller• -r: Specifywheretowritethepid file• -p: SpecifythelistenerportforremoteconfigurationNote:ThelogandUnixdomainsocketnamesarejuststrings.Weuse“samplerd”heretodenotethosebeingusedbyaldmsd thatwillberunning“samplers”asopposedtoperformingaggregation.
Checktoseeifldmsd isrunning
• Usingpsps auxw |grepldmsd |grep–vgrep
• Returnssomethinglike:“ovis_pu+35820.00.14016042204?Ssl12:510:00ldmsd -xsock:10001-Ssamplerd.sock“if running
• Returns:blanklineifnotrunning• Usingldms_lsldms_ls –hlocalhost–xsock–p10001
• Returns:“Connectionfailed/rejected.”ifldmsd specifieddoesnotexist• Returns:blanklineiftheldmsd specifiedexistsbuthasnometricsetsconfigured
Exercise:Runldmsd
Note:VM’snotinthereleasematerials.Additional configuration scriptsintheassociatedtarball
Manuallyloadandconfigureasamplerplugin
LabGoals:• Basicsamplerpluginoperation
• Manualdynamicconfigurationusingthe“ldmsd_controller”utility• Staticconfigurationusingaconfigurationfile• manpages
• man/opt/ovis/share/man/man7/Plugin_meminfo.7– opensmeminfo pluginmanpages• man/opt/ovis/share/man/man7/Plugin_vmstat.7– opensvmstat pluginmanpages
• Useofldms_ls utilityasadiagnostictool• manpages
• man/opt/ovis/share/man/man8/ldms_ls.8– opensldms_lsmanpages
ConfigureLDMSdaemonSamplerPlugin(s)
• Loadthe“meminfo”samplerplugin• Configureloaded“meminfo”samplerplugin
• Givethesetname(instance)• Givethenodename(producer)• GivethecomponentID• Plugin-specificarguments
• Startsamplerpluginwithaparticularsamplingintervalandoffset
optional
Connectldmsd_controller toanldmsd
• Setup“ldmsd_controller”connectiontotheaggregatoroversocket$ldmsd_controller --host localhost --port 20001--auth_file ~/.ldmsauth.conf
Welcome to the LDMSD control processor
localhost:20001>
Exercise:Connecttoldmsd withldmsd_controller
Note:VM’snotinthereleasematerials.Additional configuration scriptsintheassociatedtarball
LDMSPluginArchitecture
Memory
LDMSAPI(libldms)
SamplerPlug-in Interface
TransportDriverInterface
MemorySampler
HSNSampler
RDMATransport
SocketTransport
LDMSDaem
on
MetricSet
MetricSet
MetricSet
MetricSet
StoragePlug-inInterface
LDMSAPI(libldms)
Storage
SOS
MySQL
CSVCSVStore
OtherStore
InteractiveConfigurationusingtheldmsd_controller
• Loadthe“meminfo”samplerlocalhost:20001> load name=meminfo
• Configurethe“meminfo”samplerlocalhost:20001> config name=meminfo
producer=<$HOSTNAME> instance=<$HOSTNAME>/meminfocomponent_id=<host number>
QuerycurrentsetsonanLDMSDaemonusing“ldms_ls”• Useldms_ls toquerythecurrentsetsavailableonanLDMSdaemon
$ ldms_ls –h localhost -x sock -p 10001
ovis-demo-01/meminfo
$
Getthesetinformationbeforestartingthe“meminfo”sampler
$ ldms_ls –h localhost -x sock -p 10001 –v ovis-demo-01/meminfo
ovis-demo-01/meminfo: inconsistent, last update: Wed Dec 31 18:00:00 1969 [0us]
METADATA --------
Producer Name : ovis-demo-01
Instance Name : ovis-demo-01/meminfo
Schema Name : meminfo
Size : 1904
Metric Count : 45
GN : 2
DATA ------------
Timestamp : Wed Dec 31 18:00:00 1969 [0us]
Duration : [0.000000s]
Consistent : FALSE
Size : 400
GN : 1
-----------------
Querycurrentmetricvaluesbeforestartingthe“meminfo”sampler$ldms_ls -x sock -p 10001 -l ovis-demo-01/meminfo
ovis-demo-01/meminfo: inconsistent, last update: Wed Dec 31 18:00:00 1969 [0us]
M u64 component_id 1
D u64 job_id 0
D u64 MemTotal 0
D u64 MemFree 0
D u64 MemAvailable 0
D u64 Buffers 0
D u64 Cached 0
D u64 SwapCached 0
D u64 Active 0
D u64 Inactive 0
…
Startthe“meminfo”sampler
• Startthe“meminfo”samplerlocalhost:20001> start name=meminfo interval=1000000 offset=0
• Thisstartsthesamplerupdatingthemetricvaluesevery1second
Getthesetinformation$ ldms_ls -x sock -p 10001 –v ovis-demo-01/meminfo
ovis-demo-01/meminfo: consistent, last update: Fri Feb 10 12:46:55 2017 [3486us]
METADATA --------
Producer Name : ovis-demo-01
Instance Name : ovis-demo-01/meminfo
Schema Name : meminfo
Size : 1904
Metric Count : 45
GN : 2
DATA ------------
Timestamp : Fri Feb 10 12:46:55 2017 [3486us]
Duration : [0.000068s]
Consistent : TRUE
Size : 400
GN : 259
-----------------
Querycurrentmetricvalues
$ldms_ls -x sock -p 10001 -l ovis-demo-01/meminfo
ovis-demo-01/meminfo: consistent, last update: Fri Feb 10 12:50:25 2017[4156us]
M u64 component_id 1
D u64 job_id 0
D u64 MemTotal 1884188
D u64 MemFree 828244
D u64 MemAvailable 1639232
D u64 Buffers 948
D u64 Cached 915992
D u64 SwapCached 0
D u64 Active 84336
D u64 Inactive 891196
…
Checksourceforreference$cat/proc/meminfoMemTotal:1884188kBMemFree:828420kBMemAvailable:1639912kBBuffers:948kBCached:916396kBSwapCached:0kBActive:85144kBInactive:890212kBActive(anon): 58272kBInactive(anon):8372kBActive(file): 26872kBInactive(file):881840kB
Exercise:Manualsamplerconfiguration
Note:VM’snotinthereleasematerials.Additional configuration scriptsintheassociatedtarball
• Killallofyourldmsd inpreparationforthenextsection$pkill ldmsd
• Killaparticularldmsd• ps auxw |grepldmsd |grep–vgrepovis_pu+35820.00.14016042204?Ssl 12:510:00ldmsd -xsock:10001-Ssamplerd.sock• kill3582
• Checktomakesureitisdead$ps auxw |grepldmsd |grep–vgrep
Startldmsd andsamplerpluginusingaconfigurationfile• ldmsd canbestartedusingaconfigurationfile
• Syntaxisidenticaltothatusedformanualconfiguration• CanbeusedtorunandconfigureBOTHsamplerandaggregatorldmsd
• Sampleconfigurationfileformeminfo example:$cat /home/ovis_public/demo/ldmsd/conf/simple_sampler.conf
load name=meminfo
config name=meminfo producer=<$HOSTNAME> instance=<$HOSTNAME>/meminfocomponent_id=<host number>
start name=meminfo interval=1000000
• Runldmsd usingthisconfigurationfile$ldmsd -x sock:10001 -l samplerd.log -S samplerd.sock –c /home/ovis_public/demo/ldmsd/conf/simple_sampler.conf
Querycurrentmetricvalues
$ldms_ls -x sock -p 10001 -l ovis-demo-01/meminfo
ovis-demo-01/meminfo: consistent, last update: Fri Feb 10 12:50:25 2017 [4156us]
M u64 component_id 1
D u64 job_id 0
D u64 MemTotal 1884188
D u64 MemFree 828244
D u64 MemAvailable 1639232
D u64 Buffers 948
D u64 Cached 915992
D u64 SwapCached 0
D u64 Active 84336
D u64 Inactive 891196
…
Exercise:Staticsamplerconfiguration
Note:VM’snotinthereleasematerials.Additional configuration scriptsintheassociatedtarball
ConfigurationToolsSummary
Dynamic/manualconfiguration(remoteorlocal)• ldmsd_controller – Pythonscriptthatcanconnecttoaldmsd viaaconfigurednetworksocketor alocalUnixDomainSocket
Staticconfiguration(local)• Configurationfile– loadedatldmsd runtime
Configurationoptionandtool.
• CMDlineconfiguration–c• ldmsctl
• CinterfacetoconfigureLDMSD.• Onlyforsamplerdaemon
• ldmsd_controller• PythoninterfacetoconfigureLDMSD.• ConnecttoanLDMSDusingUNIXdomainsocket(local)orsocket(remote).• Auto-completion• Commandhelp
• Moredetailscanbefoundathttps://www.opengridcomputing.com/wordpress/index.php/ovis-3-3-user-guide/#ldmsd-config
Startldmsd_controller
• ConnectwithUNIXdomainsocketldmsd_controller --sockname samplerd.sock
• Connectwithsocketldmsd_controller --host localhost --port 20001 --auth_file ~/.ldmsauth.conf
ldmsd_controller:Getcommandlist
samplerd.sock> help
Documented commands (type help <topic>):
========================================
EOF prdcr_del stop udata version
add prdcr_start store udata_regex
config prdcr_start_regex strgp_add updtr_add
env prdcr_stop strgp_del updtr_del
help prdcr_stop_regex strgp_metric_add updtr_match_add
include quit strgp_metric_del updtr_match_del
info say strgp_prdcr_add updtr_prdcr_add
load shell strgp_prdcr_del updtr_prdcr_del
loglevel source strgp_start updtr_start
logrotate standby strgp_stop updtr_stop
prdcr_add start term usage
DefinitelyuseforsamplerdDefinitelyuseforaggregatorsUsetoloadandconfigpluginGethelpanddaemonstatus
ldmsd_controller:commandhelp
samplerd.sock> help prdcr_add
Add an LDMS Producer to the Aggregator
Parameters:
name= A unique name for this Producer
xprt= The transport name [sock, rdma, ugni]
host= The hostname of the host
port= The port number on which the LDMS is listening
type= The connection type [active, passive]
interval= The connection retry interval (us)
LAB2: Aggregators
Note:VM’snotinthereleasematerials.Additional configuration scriptsintheassociatedtarball
LDMSPluginArchitecture
Memory
LDMSAPI(libldms)
SamplerPlug-in Interface
TransportDriverInterface
MemorySampler
HSNSampler
RDMATransport
SocketTransport
LDMSDaem
on
MetricSet
MetricSet
MetricSet
MetricSet
StoragePlug-inInterface
LDMSAPI(libldms)
Storage
SOS
MySQL
CSVCSVStore
OtherStore
ConfigureaLDMSdaemon(ldmsd)toAggregatemetricset(s)
Goals:• Addlistofconnectionstosamplerldmsd’s• Starttheconnections• CreateanUpdatepolicy
• Howoftentogetametricset’supdate• Fromwhichsamplerldmsd’s toaggregate
• StarttheUpdatepolicy
Startanldmsd thatwillbeusedforaggregation
• StartLDMSD
ldmsd -x sock:10002 –m 10M -l aggd.log –S aggd.sock –p 20002
• -x: transport:listenerport• -m: Allocatesetmemoryforaggregatedmetricsets(default:512K)• -l: Specifythelogfilepath• -S: Specify“UnixDomainSocket”nameusedforlocalconfiguration• -p: Specifythelistenerportforremoteconfiguration
Interactiveaggregatorconfiguration
• Setup“ldmsd_controller”connectiontotheaggregatoroversocket$ldmsd_controller --host localhost --port 20002--auth_file ~/.ldmsauth.conf
Welcome to the LDMSD control processor
localhost:20002>
SimpleAggregatorConfiguration
• Configuretheaggregatortoaggregatethe“meminfo”setfromthesamplerdaemonlocalhost:20002> prdcr_add name=bar host=$HOSTNAME port=10001 xprt=sock type=active interval=20000000
localhost:20002> prdcr_start name=bar
• name:policytag• host:hostnameofthesamplerdaemon• port:Listenerportofthesamplerdaemon• xprt:Transportthesamplerdaemonlistenson• type:Always“active”• interval:Re-connectinterval
Pluginstatus(onagg afterstartedprdcr butbeforeupdtr)localhost:20002> status
Querycurrentmetricvaluesontheaggregator
$ldms_ls –h localhost -x sock -p 10002 -l
ovis-demo-01/meminfo: inconsistent, last update: Wed Dec 31 18:00:00 1969 [0us]
M u64 component_id 1
D u64 job_id 0
D u64 MemTotal 0
D u64 MemFree 0
D u64 MemAvailable 0
D u64 Buffers 0
D u64 Cached 0
D u64 SwapCached 0
D u64 Active 0
D u64 Inactive 0
…
SimpleAggregatorConfiguration
• Configuretheaggregatortoupdate the“meminfo”setlocalhost:20002> updtr_add name=foo interval=1000000 offset=200000
localhost:20002> updtr_prdcr_add name=foo regex=.*
localhost:20002> updtr_start name=foo
• name: policy tag
• interval: update interval (in usec)• Example: interval=1000000 means aggregate every 1 seconds
• offset: Target (in us) from <epoc sec>.000000
• Example: offset=10000 means aggregate every <interval> seconds at 10ms into the second.
• regex: regular expression to match the target producers tag(s)
Pluginstatus(onaggregatorafterstartedprdcr andupdtr)
localhost:20002> status
Querycurrentmetricvaluesontheaggregator
$ldms_ls -h locahost -x sock -p 10002 -l ovis-demo-01/meminfo
ovis-demo-01/meminfo: consistent, last update: Fri Feb 10 12:50:25 2017 [4156us]
M u64 component_id 1
D u64 job_id 0
D u64 MemTotal 1884188
D u64 MemFree 828244
D u64 MemAvailable 1639232
D u64 Buffers 948
D u64 Cached 915992
D u64 SwapCached 0
D u64 Active 84336
D u64 Inactive 891196
…
Exercise:Validatemanualconfigurationandaggregationfromsampler
Note:VM’snotinthereleasematerials.Additional configuration scriptsintheassociatedtarball
Startldmsd andaggregationusingaconfigurationfile• ldmsd canbestartedusingaconfigurationfile
• Syntaxisidenticaltothatusedformanualconfiguration• CanbeusedtorunandconfigureBOTHsamplerandaggregatorldmsd
• Sampleconfigurationfileformeminfo example:$cat /home/ovis_public/demo/ldmsd/conf/simple_aggregator.conf
prdcr_add name=localhost host=$HOSTNAME port=10001 xprt=sock type=active interval=20000000
prdcr_start name=localhost
updtr_add name=foo interval=1000000 offset=200000
updtr_prdcr_add name=foo regex=.*
updtr_start name=foo
• Runldmsd usingthisconfigurationfile$ldmsd -x sock:10002 -l aggd.log -S aggd.sock –c /home/ovis_public/demo/ldmsd/conf/simple_aggregator.conf
Querycurrentmetricvalues
$ldms_ls -x sock -p 10002 -l ovis-demo-01/meminfo
ovis-demo-01/meminfo: consistent, last update: Fri Feb 10 12:50:25 2017 [4156us]
M u64 component_id 1
D u64 job_id 0
D u64 MemTotal 1884188
D u64 MemFree 828244
D u64 MemAvailable 1639232
D u64 Buffers 948
D u64 Cached 915992
D u64 SwapCached 0
D u64 Active 84336
D u64 Inactive 891196
…
Exercise:Validatestaticaggregatorconfigurationandaggregationfromsampler
Note:VM’snotinthereleasematerials.Additional configuration scriptsintheassociatedtarball
AggregatefromstudentVMs
• Killaggregatorldmsd• Restartldmsd using“-cstudents_all_aggregator.conf”• Killaggregatorldmsd• Restartldmsd using“-cstudents_subset_aggregator.conf”
Pluginstatus(onaggregator fromallstudents)
localhost:20002> status
…
Exercise:Validatestaticaggregatorconfigurationandaggregationfromsampler
Note:VM’snotinthereleasematerials.Additional configuration scriptsintheassociatedtarball
LAB3: DynamicChangesandResilience
Note:VM’snotinthereleasematerials.Additional configuration scriptsintheassociatedtarball
DynamicConfigurationChanges
• Dynamicconfiguration• Samplerdaemons
• stopsamplerplugins• startwithdifferentintervals
• Aggregatordaemons• stopprdcr/updtr/strgp• removeprdcr/updtr/strgp• changeinterval
DynamicChangesandRobustness• On-the-flyadditionsofsamplerswillbediscoveredbytheaggregatingldmsd
• Exercise– onestudentwilladdthevmstat samplervialdmsd_controllertohisrunningldmsd.Allotherswillseeitappearintheiraggregatorswhicharecollectingfromthatsampler.
• Exercise – onestudentwillstophismeminfo samplervialdmsd_controller inhisrunningldmsd.Allotherswillseeinldms_ls timestampoutputthatthatstudent’smetricsetceasestoupdate.
• Exercise – thesamestudentwillrestarthismeminfo samplervialdmsd_controller inhisrunningldmsd.Allotherswillseeinldms_ls timestampoutputthatthatstudent’smetricsetresumesupdating.
• SamplersandAggregatorscanbestartedinanyorder• LDMScollectionandtransporttopologiesarerobusttoSamplersandAggregatorsbeingkilledandrestarted
• Exercise – onestudentwillkillhisldmsd sampler.Allotherstudentswillseeinldms_ls timestampoutputthatthatstudent’smetricsetceasestoupdate
• Exercise – thesamestudentwillrestarthisldmsd sampler.Allotherstudentswillseeinldms_ls timestampoutputthatthatstudent’smetricsetresumesupdating.
LAB4: StoringdatainCSVstores
Note:VM’snotinthereleasematerials.Additional configuration scriptsintheassociatedtarball
LDMSPluginArchitecture
Memory
LDMSAPI(libldms)
SamplerPlug-in Interface
TransportDriverInterface
MemorySampler
HSNSampler
RDMATransport
SocketTransport
LDMSDaem
on
MetricSet
MetricSet
MetricSet
MetricSet
StoragePlug-inInterface
LDMSAPI(libldms)
Storage
SOS
MySQL
CSVCSVStore
OtherStore
Storingdatatocsvfile(s)• Goals:
• Configureacsvstorewithldmsd_controller• Configureacsvstorewithconfiguration file• Storeoptions
• Exampleoutput:#Time,Time_usec,ProducerName,component_id,job_id,MemTotal,MemFree,MemAvailable,Buffers,Cached,SwapCached,Active,Inactive,Active(anon),Inactive(anon),Active(file),Inactive(file),Unevictable,Mlocked,SwapTotal,SwapFree,Dirty,Writeback,AnonPages,Mapped,Shmem,Slab,SReclaimable,SUnreclaim,KernelStack,PageTables,NFS_Unstable,Bounce,WritebackTmp,CommitLimit,Committed_AS,VmallocTotal,VmallocUsed,VmallocChunk,HardwareCorrupted,AnonHugePages,HugePages_Total,HugePages_Free,HugePages_Rsvd,HugePages_Surp,Hugepagesize,DirectMap4k,DirectMap2M1487105964.002482,2482,ovis-demo-09,9,0,1884188,571028,1688632,0,1212004,6108,104536,1122496,8276,8580,96260,1113916,0,0,839676,793956,420,0,10552,24812,1796,52124,40104,12020,1792,3280,0,0,0,1781768,387984,34359738367,7216,34359728128,0,2048,0,0,0,0,2048,47040,20500481487105963.002583,2583,ovis-demo-02,2,0,1884188,1665280,1671132,948,107512,0,71540,80920,44128,8308,27412,72612,0,0,839676,839676,0,0,44000,22264,8436,35680,24304,11376,1600,2940,0,0,0,1781768,296444,34359738367,7216,34359728128,0,6144,0,0,0,0,2048,34752,20623361487105963.001964,1964,ovis-demo-08,8,0,1884188,1623168,1644996,948,129700,0,89312,101956,60788,8332,28524,93624,0,0,839676,839676,0,0,60620,23912,8500,36456,24608,11848,1872,4364,0,0,0,1781768,403252,34359738367,7216,34359728128,0,16384,0,0,0,0,2048,44992,2052096
AggregatorConfigurationtostoremetricsetdatausingCSVstore
• Configuretheaggregatortostorethe“meminfo”settoacsvfile usingldmsd_controller
• Loadthestore_csv plugin• Configuretheplugin
$ldmsd_controller --host localhost --port 20002 --auth_file ~/.ldmsauth.conf
localhost:20002> load name=store_csv
localhost:20002> config name=store_csv path=/home/ovis_public/demo/ldmsd/data action=init buffer=0
• name:pluginname• path:Pathtothebasedirectoryforthecsvfilecontainer.Thisdirectorymustpre-exist.• action:‘init’toinitializetheplugin(otheractionswillnotbedescribedinthistutorial)• buffer:‘0’todisablebuffering• manpage:
• man/opt/ovis/share/man/man7/Plugin_store_csv.7– opensstore_csv pluginmanpages
AggregatorConfigurationtostoremetricsetdatausingCSVstore
• Configuretheaggregatortostorethe“meminfo”settoacsvfile.localhost:20002> strgp_add name=meminfo_store_csvplugin=store_csv container=csv schema=meminfolocalhost:20002> strgp_start name=meminfo_store_csv
• name:storagepolicytag• plugin:storepluginusedforstoringmetricsetdata• container:thestoragebackendcontainername.Forcsv,thisisthedirectorywheretheoutputfilewillgo.Thiswillbecreated.
• schema:metricsetschematobestored
PluginStatus(storeinfoonly)localhost:20002> status
ExaminingtheCSVfile• Thedataissavedin:/home/ovis_public/demo/ldmsd/data/csv/meminfo
1.Checkingthecsvfile$tail–f/home/ovis_public/demo/ldmsd/data/csv/meminfo• Ifaggregatingfromothers’vm’s,seemultiplehostsintheoutput
2.Datachanges:• Runthememeater executable$./a.out
• Comparethelivememeater outputwiththetail–fvalues
Exercise:StoreCSV
Note:VM’snotinthereleasematerials.Additional configuration scriptsintheassociatedtarball
Startcsvstorewithaconfigurationfilewithadvancedconfigurationoptions
• Aggregatorconfigurationfileat:/home/ovis_public/demo/ldmsd/conf/agg.confloadname=store_csvconfig name=store_csvpath=/home/ovis_public/demo/ldmsd/dataaction=initbuffer=0
rollover=120rolltype=1altheader=1strgp_add name=meminfo_store_csv schema=meminfo plugin=store_csv container=csvstrgp_startname=meminfo_store_csv
• Newconfigurationoptions:• Rolloverbytimeorsize:
• rollover=120 rolltype=1 – rollsoverevery120sec.Outputfileispostpended withepochtimestamp(meminfo.12345)
• Headerinaseparatefile:• altheader=1
Startcsvstorewithaconfigurationfilewithadvancedconfigurationoptions
• Uncommentthelinesforstore_csv only(notstore_function_csv)• Killcurrentaggregator(notthesampler)andRestartaggregator:
ldmsd -x sock:10002 -l agg.log -p 20002
-c /home/ovis_public/demo/ldmsd/conf/agg.conf
• Notethefilerolloverandalternateheader
Exercise:CSVstorewithaconfigurationfileandadvancedconfigurationoptions
Note:VM’snotinthereleasematerials.Additional configuration scriptsintheassociatedtarball
LAB5: CalculatingderiveddataandsavingtoaCSVstore
Note:VM’snotinthereleasematerials.Additional configuration scriptsintheassociatedtarball
Storingdatatostorefunctioncsvfile(s)Goals:• Configureafunctioncsvstorewithldmsd_controller• Configureafunctioncsvstorewithaconfigurationfile• Functionoptions
Exampleoutput:#Time,Time_usec,DT,DT_usec,ProducerName,component_id,job_id,RAW_ACTIVE,RAW_ACTIVE.Flag,RAW_MEMTOTAL,RAW_MEMTOTAL.Flag,RATIO100,RATIO100.Flag, TimeFlag1487107627.002486,2486,0.999712,999712,ovis-demo-i03,103,0,828068,0,1884188,0,43,0,01487107628.002425,2425,0.999939,999939,ovis-demo-i03,103,0,975536,0,1884188,0,51,0,01487107629.002402,2402,0.999977,999977,ovis-demo-i03,103,0,975528,0,1884188,0,51,0,01487107630.018970,18970,1.016568,16568,ovis-demo-i03,103,0,980228,0,1884188,0,52,0,01487107631.002405,2405,0.983435,983435,ovis-demo-i03,103,0,1122996,0,1884188,0,59,0,0
Active/Memtotal ratioincreasingwhilememeater runs
Store_function_csv configurationfileConfigurationFileat/home/ovis_public/demo/ldmsd/conf/fct.conf#SCHEMANEW_METRICNAMEFUNCTIONN_MET<METS_CSV>SCALE|THRESHWRITEOUTmeminfo RAW_ACTIVERAW1Active11meminfo RAW_MEMTOTAL RAW1MemTotal 11meminfo RATIO100DIV_AB2RAW_ACTIVE,RAW_MEMTOTAL 100 1
• Functions:RAW(rawvalue),ScalarandVectoradd/subtract/multiply/divide,thresholdchecks,min/max
• manpage• man/opt/ovis/share/man/man7/Plugin_store_function_csv.7 – opensstore_function_csv pluginmanpages
• Chainvariablesforacomplexcomputation• V3Limitations(addressedinfutureversions):
• u64castatallsteps.Canusescaletokeepprecision.• Functionsareonlyperinstanceofametricset(e.g.,cannotcombinedatafrommeminfo andvmstat,cannotcombineinfofromdifferentcomponents)
• Outputflags:Flagforinvalidforeverycomputationandforageusec
AggregatorConfigurationtostoremetricsetdatausingstore_function_csv
• Configuretheaggregatortostorederiveddatafromthe“meminfo”settoacsvfile.
$ldmsd_controller --host localhost --port 20002 --auth_file ~/.ldmsauth.conf
localhost:20002> load name=store_function_csv
localhost:20002> config name=store_function_csv
path=/home/ovis_public/demo/ldmsd/data buffer=0 ageusec=2000000
derivedconf=/home/ovis_public/demo/ldmsd/conf/fct.conf
• action:‘init’toinitializetheplugin• derived_conf: derivedconfigurationfile(cantakemultiples:csv)• ageusec: flagwhentheDTbetweendatapointsisgreaterthanthisvalue
AggregatorConfigurationtostoremetricsetdatausingstore_function_csv
•Configuretheaggregatortostorederiveddatafromthe“meminfo”settoacsvfile.
localhost:20002> strgp_add name=mem_fplugin=store_function_csv container=csv_fctschema=meminfo
localhost:20002> strgp_start name=mem_f
PluginStatus(storeinfoonlyshown)localhost:20002> status
StoringderiveddatatoafunctionstoreCSVfile• Thedataissavedat/home/ovis_public/demo/ldmsd/data/csv_fct/meminfo
• Checkingthecsv_fct file:tail-f/home/ovis_public/demo/ldmsd/data/csv_fct/meminfo
Exercise:Store_function_csv
Note:VM’snotinthereleasematerials.Additional configuration scriptsintheassociatedtarball
StoringderiveddatatoafunctionstoreCSVfileusingtheldmsd configurationfile
• Uncommentthelinesforstore_function_csv (store_csv linesarestilluncommented)
• Killcurrentaggregator(notthesampler)andRestartaggregator:ldmsd -x sock:10002 -l agg.log -p 20002
-c /home/ovis_public/demo/ldmsd/conf/agg.conf
• Checkingthecsv_fct filetail -f /home/ovis_public/demo/ldmsd/data/csv_fct/meminfo
• Runthememeater codeatsametimeasstoringdata:./a.out #thememeater executable
comparethelivememeater outputwiththetail–fvalues
Exercise:Store_function_csv withconfigurationfileandmemeater
Note:VM’snotinthereleasematerials.Additional configuration scriptsintheassociatedtarball
LAB6: StoringthedatainanSOSdatabase
Note:VM’snotinthereleasematerials.Additional configuration scriptsintheassociatedtarball
LDMSPluginArchitecture
Memory
LDMSAPI(libldms)
SamplerPlug-in Interface
TransportDriverInterface
MemorySampler
HSNSampler
RDMATransport
SocketTransport
LDMSDaem
on
MetricSet
MetricSet
MetricSet
MetricSet
StoragePlug-inInterface
LDMSAPI(libldms)
Storage
SOS
MySQL
CSVCSVStore
OtherStore
Configuretheaggregator’sSOSstoreplugin
• Steps:• Loadthestore_sos plugin• Configuretheplugin
localhost:20002> load name=store_soslocalhost:20002> config name=store_sospath=/home/ovis_public/demo/ldmsd/data/sos
• name:pluginname• path:PathtothedirectorytocontaintheSOSdatabase
AddastoragepolicytosavethememinfodatatotheSOSstore
•Configuretheaggregatortostorethe“meminfo”settoaSOSdatabase.
localhost:20002> strgp_add name=meminfo_sos plugin=store_sos container=meminfo schema=meminfolocalhost:20002> strgp_start name=meminfo_sos
• name:storagepolicytag• plugin:storepluginusedforstoringmetricsetdata• container:thestoragebackendcontainername• schema:metricsetschematobestored
Useaconfigurationfiletoconfigurethestorageback-end• Edittheconfigurationfileat~/demo/ldmsd/conf/agg.conf
• Uncommentthestore_sos configurationlines• Killcurrentaggregator(notthesampler)• Restarttheaggregator
ldmsd -x sock:10002 -l agg.log -p 20003 \–c ~/demo/ldmsd/conf/agg.conf
LAB7: ExploringdatainanSOSdatabase
Note:VM’snotinthereleasematerials.Additional configuration scriptsintheassociatedtarball
Exercise:UsetheSOStoolstoexplorethedatabase• sos_cmd
• Createcontainers• Createandqueryschema• Importandquerydata
• lmq• PlotdatastoredintheSOSdatabase
• DatavisualizationonGrafana
Queryavailableschemasinyourdatabase
$ sos_cmd -C /home/ovis_public/demo/ldmsd/data/sos/meminfo/ -l
-attribute : MemTotaltype : UINT64idx : 5indexed : 0offset : 48
-attribute : MemFreetype : UINT64idx : 6indexed : 0offset : 56
Containernamegivenatstrgp_add
schema :name : meminfoschema_sz : 4504obj_sz : 408id : 129-attribute : timestamp
type : TIMESTAMPidx : 0indexed : 1offset : 8
QuerydataintheSOSdatabase
timestamp component_id MemFree Active-------------------------------- ------------------ ------------------ ------------------
1487100290.607418 0 1636160 80120 1487100300.609416 0 1636160 80120 1487100310.611474 0 1642688 76016
. . . 1487114607.002163 103 1628516 90320 1487114608.002077 103 1628516 90320
-------------------------------- ------------------ ------------------ ------------------Records 887636/887636.
sos_cmd -C/home/ovis_public/demo/ldmsd/data/sos/meminfo \-q-Smeminfo -Xcomp_time -Vtimestamp–Vcomponent_id -VMemFree -VActive|less
-qQuerythedatabase-SSchemaname-Xindexusedtoorderdata-Vonceforcolumnintheoutput
OutputthedataasaCSVfile
#timestamp,component_id,MemFree,Active1487100290.607418,0,1636160,801201487100300.609416,0,1636160,801201487100310.611474,0,1642688,76016. . . 1487114606.002196,103,1628548,903201487114607.002163,103,1628516,903201487114608.002077,103,1628516,90320#Records889483/889483.-------------------------------- ------------------ ------------------ ------------------Records 887636/887636.
sos_cmd -C/home/ovis_public/demo/ldmsd/data/sos/meminfo \-q-Smeminfo -Xcomp_time-Vtimestamp–Vcomponent_id -VMemFree -VActive -fcsv|less
-qQuerythedatabase-SSchemaname-Xindexusedtoorderdata-Vonceforcolumnintheoutput-fcsvformattheoutputasCSV
OutputthedataasaJSONfile
{"data":[{"timestamp":"1487100290.607418","component_id" :"0","MemFree":"1636160","Active":"80120"},{"timestamp":"1487100300.609416","component_id" :"0","MemFree":"1636160","Active":"80120"},{"timestamp":"1487100310.611474","component_id" :"0","MemFree":"1642688","Active":"76016"},{"timestamp":"1487100320.613736","component_id" :"0","MemFree":"1641272","Active":"77292"},. . .{"timestamp":"1487114606.002196","component_id" :"103","MemFree":"1628548","Active":"90320"},{"timestamp":"1487114607.002163","component_id" :"103","MemFree":"1628516","Active":"90320"},{"timestamp":"1487114608.002077","component_id" :"103","MemFree":"1628516","Active":"90320"}],"totalRecords":890414,"recordCount":890414}
sos_cmd -C/home/ovis_public/demo/ldmsd/data/sos/meminfo \-q-Smeminfo -Xcomp_time-Vtimestamp–Vcomponent_id -VMemFree -VActive -fjson |less
-qQuerythedatabase-SSchemaname-Xindexusedtoorderdata-Vonceforcolumnintheoutput-fcsvformattheoutputasJSon
LAB8: DataAnalysisandVisualizationfromanSOSdatabase
Note:VM’snotinthereleasematerials.Additional configuration scriptsintheassociatedtarball
lmqLDMStooltoplottime-seriesgraphs
Queryrangeofdatesavailableinthedatabaselmq --path /home/ovis_public/demo/data/sos/meminfo \
--query dates --schema meminfo
There are data available from 02/13/17 14:47:44 (1487018864.002345) through 02/15/17 21:12:21 (1487214741.002282)
--path Thepathtothecontainer--query Whatisbeingqueried--schema Theschematoquery
Exercise:Plottime-seriesgraphofametric
$lmq --path ~/demo/ldmsd/data/sos/meminfo --query data --schema meminfo \
--metric_name MemFree --component_id 2
--path Thepathtothecontainer--query Whatisbeingqueried--schema Theschematoquery--metric_name Themetricdatatoplot--component_id Thecomponentdatatoplot
lmq plotofMemFee ofcomponent2
Exercise:Plotagraphshowingwindowedaverage,andrunningwindowedvariance
lmq --path~/demo/ldmsd/data/sos/meminfo --querydata--schemameminfo \--metric_name current_freemem --component_id2--bollinger
--path Thepathtothecontainer--query Whatisbeingqueried--schema Theschematoquery--metric_name Themetricdatatoplot--component_id Thecomponentdatatoplot--bollinger PlotBollingerbandsandoutliers
lmq plotofMemFree ofcomponent2