Red Hat OpenStack Platform 16.0
Service Telemetry Framework
Installing and deploying Service Telemetry Framework
Last Updated: 2020-10-21
Red Hat OpenStack Platform 16.0 Service Telemetry Framework
Installing and deploying Service Telemetry Framework
OpenStack Team <rhos-docs@redhat.com>
Legal Notice
Copyright © 2020 Red Hat, Inc.

The text of and illustrations in this document are licensed by Red Hat under a Creative Commons Attribution–Share Alike 3.0 Unported license (CC-BY-SA). An explanation of CC-BY-SA is available at http://creativecommons.org/licenses/by-sa/3.0/. In accordance with CC-BY-SA, if you distribute this document or an adaptation of it, you must provide the URL for the original version.

Red Hat, as the licensor of this document, waives the right to enforce, and agrees not to assert, Section 4d of CC-BY-SA to the fullest extent permitted by applicable law.

Red Hat, Red Hat Enterprise Linux, the Shadowman logo, the Red Hat logo, JBoss, OpenShift, Fedora, the Infinity logo, and RHCE are trademarks of Red Hat, Inc., registered in the United States and other countries.

Linux® is the registered trademark of Linus Torvalds in the United States and other countries.

Java® is a registered trademark of Oracle and/or its affiliates.

XFS® is a trademark of Silicon Graphics International Corp. or its subsidiaries in the United States and/or other countries.

MySQL® is a registered trademark of MySQL AB in the United States, the European Union, and other countries.

Node.js® is an official trademark of Joyent. Red Hat is not formally related to or endorsed by the official Joyent Node.js open source or commercial project.

The OpenStack® Word Mark and OpenStack logo are either registered trademarks/service marks or trademarks/service marks of the OpenStack Foundation, in the United States and other countries, and are used with the OpenStack Foundation's permission. We are not affiliated with, endorsed or sponsored by the OpenStack Foundation, or the OpenStack community.

All other trademarks are the property of their respective owners.
Abstract
This guide contains information about installing the core components and deploying Service Telemetry Framework.
Table of Contents

CHAPTER 1. INTRODUCTION TO SERVICE TELEMETRY FRAMEWORK
    1.1. SERVICE TELEMETRY FRAMEWORK ARCHITECTURE
    1.2. INSTALLATION SIZE

CHAPTER 2. INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK
    2.1. THE CORE COMPONENTS OF STF
    2.2. PREPARING YOUR OCP ENVIRONMENT FOR STF
        2.2.1. Persistent volumes
            2.2.1.1. Using ephemeral storage
        2.2.2. Resource allocation
        2.2.3. Node tuning operator
    2.3. DEPLOYING STF TO THE OCP ENVIRONMENT
        2.3.1. Deploying STF to the OCP environment with ElasticSearch
        2.3.2. Deploying STF to the OCP environment without ElasticSearch
        2.3.3. Creating a namespace
        2.3.4. Creating an OperatorGroup
        2.3.5. Enabling the OperatorHub.io Community Catalog Source
        2.3.6. Enabling Red Hat STF Operator Source
        2.3.7. Subscribing to the AMQ Certificate Manager Operator
        2.3.8. Subscribing to the Elastic Cloud on Kubernetes Operator
        2.3.9. Subscribing to the Service Telemetry Operator
        2.3.10. Creating a ServiceTelemetry object in OCP
    2.4. REMOVING STF FROM THE OCP ENVIRONMENT
        2.4.1. Deleting the namespace
        2.4.2. Removing the OperatorSource

CHAPTER 3. COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION
    3.1. CONNECTING RED HAT OPENSTACK PLATFORM TO SERVICE TELEMETRY FRAMEWORK
    3.2. DEPLOYING TO NON-STANDARD NETWORK TOPOLOGIES
    3.3. CONFIGURING RED HAT OPENSTACK PLATFORM OVERCLOUD FOR SERVICE TELEMETRY FRAMEWORK
        3.3.1. Retrieving the AMQ Interconnect route address
        3.3.2. Configuring the STF connection for the overcloud
        3.3.3. Validating client-side installation

CHAPTER 4. ADVANCED FEATURES
    4.1. CUSTOMIZING THE DEPLOYMENT
        4.1.1. Manifest override parameters
        4.1.2. Overriding a managed manifest
    4.2. ALERTS
        4.2.1. Creating an alert rule in Prometheus
        4.2.2. Configuring custom alerts
        4.2.3. Creating an alert route in Alertmanager
    4.3. HIGH AVAILABILITY
        4.3.1. Configuring high availability
    4.4. DASHBOARDS
        4.4.1. Setting up Grafana to host the dashboard
            4.4.1.1. Viewing and editing queries
        4.4.2. The Grafana infrastructure dashboard
            4.4.2.1. Top panels
            4.4.2.2. Networking panels
            4.4.2.3. CPU panels
            4.4.2.4. Memory panels
            4.4.2.5. Disk/file system
    4.5. CONFIGURING MULTIPLE CLOUDS
        4.5.1. Planning AMQP address prefixes
        4.5.2. Deploying Smart Gateways
            4.5.2.1. Example manifests
        4.5.3. Creating the OpenStack environment file
        4.5.4. Querying metrics data from multiple clouds
    4.6. EPHEMERAL STORAGE
        4.6.1. Configuring ephemeral storage

APPENDIX A. COLLECTD PLUG-INS
CHAPTER 1. INTRODUCTION TO SERVICE TELEMETRY FRAMEWORK
Service Telemetry Framework (STF) provides automated collection of measurements and data from remote clients - Red Hat OpenStack Platform or third-party nodes - and transmission of that information to a centralized, receiving Red Hat OpenShift Container Platform (OCP) deployment for storage, retrieval, and monitoring. The data can be either of two types:
Metric
a numeric measurement of an application or system
Event
irregular and discrete occurrences that happen in a system
The collection components that are required on the clients are lightweight. The multicast message bus that is shared by all clients and the deployment provides fast and reliable data transport. Other modular components for receiving and storing data are deployed in containers on OCP.

STF provides access to monitoring functions such as alert generation, visualization through dashboards, and single source of truth telemetry analysis to support orchestration.
1.1. SERVICE TELEMETRY FRAMEWORK ARCHITECTURE
Service Telemetry Framework (STF) uses the components described in Table 1.1, "STF components".
Table 1.1. STF components

Client | Component | Server (OCP)
yes | An AMQP 1.x compatible messaging bus to shuttle the metrics to STF for storage in Prometheus | yes
no | Smart Gateway to pick metrics and events from the AMQP 1.x bus and to deliver events to ElasticSearch or to provide metrics to Prometheus | yes
no | Prometheus as time-series data storage | yes
no | ElasticSearch as events data storage | yes
yes | collectd to collect infrastructure metrics and events | no
yes | Ceilometer to collect Red Hat OpenStack Platform metrics and events | no
Figure 1.1. Service Telemetry Framework architecture overview
NOTE
The Service Telemetry Framework data collection components, collectd and Ceilometer, and the transport components, AMQ Interconnect and Smart Gateway, are fully supported. The data storage components, Prometheus and ElasticSearch, including the Operator artifacts, and the visualization component Grafana are community-supported and are not officially supported.
For metrics, on the client side, collectd collects high-resolution metrics. collectd delivers the data to Prometheus by using the AMQP1 plugin, which places the data onto the message bus. On the server side, a Golang application called the Smart Gateway takes the data stream from the bus and exposes it as a local scrape endpoint for Prometheus.
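On Red Hat OpenStack Platform nodes, director generates the AMQP1 plugin configuration for you, but it can help to see roughly what that configuration looks like. The following is a minimal sketch only; the host and port reuse the AMQ Interconnect listener address shown in Section 3.3.3, and the transport and instance names are illustrative assumptions:

<Plugin amqp1>
  <Transport "interconnect">
    # Local AMQ Interconnect router that relays data to STF
    Host "172.17.1.44"
    Port "5666"
    # AMQP address prefix; STF consumes collectd telemetry under this prefix
    Address "collectd"
    <Instance "telemetry">
      Format "JSON"
    </Instance>
  </Transport>
</Plugin>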
If you plan to collect and store events, collectd or Ceilometer delivers event data to the server side by using the AMQP1 plugin, which places the data onto the message bus. Another Smart Gateway writes the data to the ElasticSearch datastore.
Server-side STF monitoring infrastructure consists of the following layers:

Service Telemetry Framework 1.x (STF)
Red Hat OpenShift Container Platform (OCP)
Infrastructure platform
Figure 1.2. Server-side STF monitoring infrastructure
For more information about how to deploy Red Hat OpenShift Container Platform, see the OCP product documentation. You can install OCP on cloud platforms or on bare metal. For more information about STF performance and scaling, see https://access.redhat.com/articles/4907241.
NOTE
Do not install OCP on the same infrastructure that you want to monitor.
1.2. INSTALLATION SIZE
The size of your Red Hat OpenShift Container Platform installation depends on the following factors:
The number of nodes you want to monitor
The number of metrics you want to collect
The resolution of metrics
The length of time that you want to store the data
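For example, a rough back-of-the-envelope estimate of the Prometheus storage that these factors imply, assuming Prometheus's typical on-disk cost of roughly 1 to 2 bytes per sample after compression; the node count, metric count, and interval below are illustrative, not recommendations:

10 nodes x 2,000 metrics per node, collected every 30 seconds
    = 20,000 samples / 30 s ≈ 667 samples/s
667 samples/s x 2 bytes/sample x 2,592,000 s (30 days of retention)
    ≈ 3.5 GB of time-series storage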
Installation of Service Telemetry Framework (STF) depends on the existing Red Hat OpenShift Container Platform environment. Ensure that you install monitoring for Red Hat OpenStack Platform on a platform separate from your Red Hat OpenStack Platform environment. You can install Red Hat OpenShift Container Platform (OCP) on bare metal or other supported cloud platforms. For more information about installing OCP, see OpenShift Container Platform 4.3 Documentation.
The size of your OCP environment depends on the infrastructure you select. For more information about minimum resource requirements when installing OCP on bare metal, see Minimum resource requirements in the Installing a cluster on bare metal guide. For installation requirements of the various public and private cloud platforms which you can install, see the corresponding installation documentation for your cloud platform of choice.
CHAPTER 2. INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK
Before you install Service Telemetry Framework (STF), ensure that Red Hat OpenShift Container Platform (OCP) version 4.x is running and that you understand the core components of the framework. As part of the OCP installation planning process, ensure that the administrator provides persistent storage and enough resources to run the STF components on top of the OCP environment.
WARNING

Red Hat OpenShift Container Platform version 4.3 or later is currently required for a successful installation of STF.
2.1. THE CORE COMPONENTS OF STF
The following STF core components are managed by Operators:
Prometheus and AlertManager
ElasticSearch
Smart Gateway
AMQ Interconnect
Each component has a corresponding Operator that you can use to load the various application components and objects.
Additional resources
For more information about Operators, see the Understanding Operators guide.
2.2. PREPARING YOUR OCP ENVIRONMENT FOR STF
As you prepare your OCP environment for STF, you must plan for persistent storage, adequate resources, and event storage:
Ensure that persistent storage is available in your Red Hat OpenShift Container Platform cluster to permit a production-grade deployment. For more information, see Section 2.2.1, "Persistent volumes".

Ensure that enough resources are available to run the Operators and the application containers. For more information, see Section 2.2.2, "Resource allocation".

To install ElasticSearch, you must use a community catalog source. If you do not want to use a community catalog or if you do not want to store events, see Section 2.3, "Deploying STF to the OCP environment".
STF uses ElasticSearch to store events, which requires a larger than normal vm.max_map_count. The vm.max_map_count value is set by default in Red Hat OpenShift Container Platform. For more information about how to edit the value of vm.max_map_count, see Section 2.2.3, "Node tuning operator".
2.2.1. Persistent volumes
STF uses persistent storage in OCP to instantiate the volumes dynamically so that Prometheus and ElasticSearch can store metrics and events.
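Dynamic provisioning requires that a storage class is available in the cluster. One quick way to confirm this is the standard OCP command below; the command is generic OCP tooling, not STF-specific:

$ oc get storageclass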
Additional resources
For more information about configuring persistent storage for OCP, see Understanding persistent storage.
2.2.1.1. Using ephemeral storage
WARNING

You can use ephemeral storage with STF. However, if you use ephemeral storage, you might experience data loss if a pod is restarted, updated, or rescheduled onto another node. Use ephemeral storage only for development or testing, and not production environments.
Procedure
To enable ephemeral storage for STF, set storageEphemeralEnabled: true in your ServiceTelemetry manifest.
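A minimal sketch of such a manifest, assembled from the ServiceTelemetry examples and core parameters shown in Section 2.3.10, "Creating a ServiceTelemetry object in OCP":

apiVersion: infra.watch/v1alpha1
kind: ServiceTelemetry
metadata:
  name: stf-default
  namespace: service-telemetry
spec:
  metricsEnabled: true
  storageEphemeralEnabled: true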
Additional resources
For more information about enabling ephemeral storage for STF, see Section 4.6.1, "Configuring ephemeral storage".
2.2.2. Resource allocation
To enable the scheduling of pods within the OCP infrastructure, you need resources for the components that are running. If you do not allocate enough resources, pods remain in a Pending state because they cannot be scheduled.

The amount of resources that you require to run STF depends on your environment and the number of nodes and clouds that you want to monitor.
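If pods do remain in a Pending state, the standard OCP commands below can help identify the missing resource; the Events section of the pod description reports why scheduling failed. These are generic OCP commands, not STF-specific tooling:

$ oc get pods --field-selector=status.phase=Pending --namespace service-telemetry
$ oc describe pod <pod-name>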
Additional resources
For recommendations about sizing for metrics collection, see https://access.redhat.com/articles/4907241.

For information about sizing requirements for ElasticSearch, see https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-managing-compute-resources.html.
2.2.3. Node tuning operator
STF uses ElasticSearch to store events, which requires a larger than normal vm.max_map_count. The vm.max_map_count value is set by default in Red Hat OpenShift Container Platform.
If you want to edit the value of vm.max_map_count, you cannot apply node tuning manually using the sysctl command, because Red Hat OpenShift Container Platform manages nodes directly. To configure values and apply them to the infrastructure, you must use the node tuning operator. For more information, see Using the Node Tuning Operator.
In an OCP deployment, the default node tuning operator specification provides the required profiles for ElasticSearch workloads or pods scheduled on nodes. To view the default cluster node tuning specification, run the following command:

oc get Tuned/default -o yaml -n openshift-cluster-node-tuning-operator
The output of the default specification is documented at Default profiles set on a cluster. The assignment of profiles is managed in the recommend section, where profiles are applied to a node when certain conditions are met. When scheduling ElasticSearch to a node in STF, one of the following profiles is applied:
openshift-control-plane-es
openshift-node-es
When scheduling an ElasticSearch pod, there must be a label present that matches tuned.openshift.io/elasticsearch. If the label is present, one of the two profiles is assigned to the pod. No action is required by the administrator if you use the recommended Operator for ElasticSearch. If you use a custom-deployed ElasticSearch with STF, ensure that you add the tuned.openshift.io/elasticsearch label to all scheduled pods, as in the sketch that follows.
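A minimal sketch of the pod metadata for a custom-deployed ElasticSearch; the profile match is made on the tuned.openshift.io/elasticsearch label key, and the empty value shown here is an assumption, not a requirement stated in this guide:

metadata:
  labels:
    # Presence of this key causes the node tuning operator to apply an
    # ElasticSearch profile to the node running the pod
    tuned.openshift.io/elasticsearch: ""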
Additional resources
For more information about virtual memory usage by ElasticSearch, see https://www.elastic.co/guide/en/elasticsearch/reference/current/vm-max-map-count.html.

For more information about how the profiles are applied to nodes, see Custom tuning specification.
2.3. DEPLOYING STF TO THE OCP ENVIRONMENT
You can deploy STF to the OCP environment in one of two ways:
Deploy STF and store events with ElasticSearch. For more information, see Section 2.3.1, "Deploying STF to the OCP environment with ElasticSearch".

Deploy STF without ElasticSearch and disable events support. For more information, see Section 2.3.2, "Deploying STF to the OCP environment without ElasticSearch".
2.3.1. Deploying STF to the OCP environment with ElasticSearch
Complete the following tasks:

1. Section 2.3.3, "Creating a namespace"

2. Section 2.3.4, "Creating an OperatorGroup"

3. Section 2.3.5, "Enabling the OperatorHub.io Community Catalog Source"

4. Section 2.3.6, "Enabling Red Hat STF Operator Source"

5. Section 2.3.7, "Subscribing to the AMQ Certificate Manager Operator"

6. Section 2.3.8, "Subscribing to the Elastic Cloud on Kubernetes Operator"

7. Section 2.3.9, "Subscribing to the Service Telemetry Operator"

8. Section 2.3.10, "Creating a ServiceTelemetry object in OCP"
2.3.2. Deploying STF to the OCP environment without ElasticSearch
Complete the following tasks:

1. Section 2.3.3, "Creating a namespace"

2. Section 2.3.4, "Creating an OperatorGroup"

3. Section 2.3.6, "Enabling Red Hat STF Operator Source"

4. Section 2.3.7, "Subscribing to the AMQ Certificate Manager Operator"

5. Section 2.3.9, "Subscribing to the Service Telemetry Operator"

6. Section 2.3.10, "Creating a ServiceTelemetry object in OCP"
2.3.3. Creating a namespace
Create a namespace to hold the STF components. The service-telemetry namespace is used throughout the documentation.
Procedure
Enter the following command:

oc new-project service-telemetry
2.3.4. Creating an OperatorGroup

Create an OperatorGroup in the namespace so that you can schedule the Operator pods.
Procedure
Enter the following command:

oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: service-telemetry-operator-group
  namespace: service-telemetry
spec:
  targetNamespaces:
  - service-telemetry
EOF
Additional resources
For more information, see OperatorGroups.
2.3.5. Enabling the OperatorHub.io Community Catalog Source
Before you install ElasticSearch, you must have access to the resources on the OperatorHub.io Community Catalog Source.
Procedure
Enter the following command:

oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: operatorhubio-operators
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/operator-framework/upstream-community-operators:latest
  displayName: OperatorHub.io Operators
  publisher: OperatorHub.io
EOF
2.3.6. Enabling Red Hat STF Operator Source
Before you deploy STF on Red Hat OpenShift Container Platform, you must enable the operator source.
Procedure
1. Install an OperatorSource that contains the Service Telemetry Operator and the Smart Gateway Operator:

oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1
kind: OperatorSource
metadata:
  labels:
    opsrc-provider: redhat-operators-stf
  name: redhat-operators-stf
  namespace: openshift-marketplace
spec:
  authorizationToken: {}
  displayName: Red Hat STF Operators
  endpoint: https://quay.io/cnr
  publisher: Red Hat
  registryNamespace: redhat-operators-stf
  type: appregistry
EOF
2. To validate the creation of your OperatorSource, use the oc get operatorsources command. A successful import results in the MESSAGE field returning a result of "The object has been successfully reconciled":

$ oc get -nopenshift-marketplace operatorsource redhat-operators-stf

NAME                   TYPE          ENDPOINT              REGISTRY               DISPLAYNAME             PUBLISHER   STATUS      MESSAGE
redhat-operators-stf   appregistry   https://quay.io/cnr   redhat-operators-stf   Red Hat STF Operators   Red Hat     Succeeded   The object has been successfully reconciled

3. To validate that the Operators are available from the catalog, use the oc get packagemanifest command:

$ oc get packagemanifests | grep "Red Hat STF"

smartgateway-operator                        Red Hat STF Operators   2m50s
servicetelemetry-operator                    Red Hat STF Operators   2m50s

2.3.7. Subscribing to the AMQ Certificate Manager Operator

You must subscribe to the AMQ Certificate Manager Operator before you deploy the other STF components, because the AMQ Certificate Manager Operator runs globally-scoped and is not compatible with the dependency management of Operator Lifecycle Manager when used with other namespace-scoped operators.

Procedure

1. Subscribe to the AMQ Certificate Manager Operator, create the subscription, and validate the AMQ7 Certificate Manager:

oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: amq7-cert-manager
  namespace: openshift-operators
spec:
  channel: alpha
  installPlanApproval: Automatic
  name: amq7-cert-manager
  source: redhat-operators
  sourceNamespace: openshift-marketplace
EOF

NOTE

The AMQ Certificate Manager is installed globally for all namespaces, so the namespace value provided is openshift-operators. You might not see your amq7-cert-manager.v1.0.0 ClusterServiceVersion in the service-telemetry namespace for a few minutes until the processing executes against the namespace.

2. To validate your ClusterServiceVersion, use the oc get csv command. Ensure that amq7-cert-manager.v1.0.0 has a phase Succeeded:

$ oc get --namespace openshift-operators csv

NAME                       DISPLAY                                         VERSION   REPLACES   PHASE
amq7-cert-manager.v1.0.0   Red Hat Integration - AMQ Certificate Manager   1.0.0                Succeeded
2.3.8. Subscribing to the Elastic Cloud on Kubernetes Operator

Before you install the Service Telemetry Operator, and if you plan to store events in ElasticSearch, you must enable the Elastic Cloud on Kubernetes Operator.

Procedure

1. Apply the following manifest to your OCP environment to enable the Elastic Cloud on Kubernetes Operator:

oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: elastic-cloud-eck
  namespace: service-telemetry
spec:
  channel: stable
  installPlanApproval: Automatic
  name: elastic-cloud-eck
  source: operatorhubio-operators
  sourceNamespace: openshift-marketplace
EOF

2. To verify that the ClusterServiceVersion for ElasticSearch Cloud on Kubernetes succeeded, enter the oc get csv command:

$ oc get csv

NAME                       DISPLAY                       VERSION   REPLACES                   PHASE
elastic-cloud-eck.v1.1.0   Elastic Cloud on Kubernetes   1.1.0     elastic-cloud-eck.v1.0.1   Succeeded

2.3.9. Subscribing to the Service Telemetry Operator

To instantiate an STF instance, create the ServiceTelemetry object to allow the Service Telemetry Operator to create the environment.

Procedure

1. To create the Service Telemetry Operator subscription, enter the oc apply -f command:

oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: servicetelemetry-operator
  namespace: service-telemetry
spec:
  channel: stable
  installPlanApproval: Automatic
  name: servicetelemetry-operator
  source: redhat-operators-stf
  sourceNamespace: openshift-marketplace
EOF

2. To validate the Service Telemetry Operator and the dependent operators, enter the following command:

$ oc get csv --namespace service-telemetry

NAME                                 DISPLAY                                         VERSION   REPLACES                             PHASE
amq7-cert-manager.v1.0.0             Red Hat Integration - AMQ Certificate Manager   1.0.0                                          Succeeded
amq7-interconnect-operator.v1.2.0    Red Hat Integration - AMQ Interconnect          1.2.0                                          Succeeded
elastic-cloud-eck.v1.1.0             Elastic Cloud on Kubernetes                     1.1.0     elastic-cloud-eck.v1.0.1             Succeeded
prometheusoperator.0.37.0            Prometheus Operator                             0.37.0    prometheusoperator.0.32.0            Succeeded
service-telemetry-operator.v1.0.2    Service Telemetry Operator                      1.0.2     service-telemetry-operator.v1.0.1    Succeeded
smart-gateway-operator.v1.0.1        Smart Gateway Operator                          1.0.1     smart-gateway-operator.v1.0.0        Succeeded
2.3.10. Creating a ServiceTelemetry object in OCP

To deploy the Service Telemetry Framework, you must create an instance of ServiceTelemetry in OCP. By default, eventsEnabled is set to false. If you do not want to store events in ElasticSearch, ensure that eventsEnabled is set to false. For more information, see Section 2.3.2, "Deploying STF to the OCP environment without ElasticSearch".

The following core parameters are available for a ServiceTelemetry manifest:

Table 2.1. Core parameters for a ServiceTelemetry manifest

eventsEnabled
    Description: Enable events support in STF. Requires prerequisite steps to ensure ElasticSearch can be started. For more information, see Section 2.3.8, "Subscribing to the Elastic Cloud on Kubernetes Operator".
    Default value: false

metricsEnabled
    Description: Enable metrics support in STF.
    Default value: true

highAvailabilityEnabled
    Description: Enable high availability in STF. For more information, see Section 4.3, "High availability".
    Default value: false

storageEphemeralEnabled
    Description: Enable ephemeral storage support in STF. For more information, see Section 4.6, "Ephemeral storage".
    Default value: false

Procedure

1. To store events in ElasticSearch, set eventsEnabled to true during deployment:

oc apply -f - <<EOF
apiVersion: infra.watch/v1alpha1
kind: ServiceTelemetry
metadata:
  name: stf-default
  namespace: service-telemetry
spec:
  eventsEnabled: true
  metricsEnabled: true
EOF

2. To view the STF deployment logs in the Service Telemetry Operator, use the oc logs command:

oc logs $(oc get pod --selector=name=service-telemetry-operator -oname) -c ansible

PLAY RECAP ***
localhost : ok=37 changed=0 unreachable=0 failed=0 skipped=1 rescued=0 ignored=0

3. View the pods and the status of each pod to determine that all workloads are operating nominally:

NOTE

If you set eventsEnabled: true, the notification Smart Gateways will Error and CrashLoopBackOff for a period of time before ElasticSearch starts.

$ oc get pods

NAME                                                              READY   STATUS    RESTARTS   AGE
alertmanager-stf-default-0                                        2/2     Running   0          26m
elastic-operator-645dc8b8ff-jwnzt                                 1/1     Running   0          88m
elasticsearch-es-default-0                                        1/1     Running   0          26m
interconnect-operator-6fd49d9fb9-4bl92                            1/1     Running   0          46m
prometheus-operator-bf7d97fb9-kwnlx                               1/1     Running   0          46m
prometheus-stf-default-0                                          3/3     Running   0          26m
service-telemetry-operator-54f4c99d9b-k7ll6                       2/2     Running   0          46m
smart-gateway-operator-7ff58bcf94-66rvx                           2/2     Running   0          46m
stf-default-ceilometer-notification-smartgateway-6675df547q4lbj   1/1     Running   0          26m
stf-default-collectd-notification-smartgateway-698c87fbb7-xj528   1/1     Running   0          26m
stf-default-collectd-telemetry-smartgateway-79c967c8f7-9hsqn      1/1     Running   0          26m
stf-default-interconnect-7458fd4d69-nqbfs                         1/1     Running   0          26m
2.4. REMOVING STF FROM THE OCP ENVIRONMENT

Remove STF from an OCP environment if you no longer require the STF functionality.

Complete the following tasks:

1. Section 2.4.1, "Deleting the namespace"

2. Section 2.4.2, "Removing the OperatorSource"

2.4.1. Deleting the namespace

To remove the operational resources for STF from OCP, delete the namespace.

Procedure

1. Run the oc delete command:

oc delete project service-telemetry

2. Verify that the resources have been deleted from the namespace:

$ oc get all
No resources found.

2.4.2. Removing the OperatorSource

If you do not expect to install Service Telemetry Framework again, delete the OperatorSource. When you remove the OperatorSource, PackageManifests related to STF are removed from the Operator Lifecycle Manager catalog.

Procedure

1. Delete the OperatorSource:

$ oc delete --namespace=openshift-marketplace operatorsource redhat-operators-stf
operatorsource.operators.coreos.com "redhat-operators-stf" deleted

2. Verify that the STF PackageManifests are removed from the platform. If successful, the following command returns no result:

$ oc get packagemanifests | grep "Red Hat STF"

3. If you enabled the OperatorHub.io Community Catalog Source during the installation process, and you no longer need this catalog source, delete it:

$ oc delete --namespace=openshift-marketplace catalogsource operatorhubio-operators
catalogsource.operators.coreos.com "operatorhubio-operators" deleted

Additional resources

For more information about the OperatorHub.io Community Catalog Source, see Section 2.3, "Deploying STF to the OCP environment".
CHAPTER 3. COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION
3.1. CONNECTING RED HAT OPENSTACK PLATFORM TO SERVICE TELEMETRY FRAMEWORK
To collect metrics, events, or both, and to send them to the Service Telemetry Framework (STF) storage domain, you must configure the Red Hat OpenStack Platform overcloud to enable data collection and transport.
To deploy data collection and transport to STF on Red Hat OpenStack Platform cloud nodes that employ routed L3 domains, such as distributed compute node (DCN) or spine-leaf, see Section 3.2, "Deploying to non-standard network topologies".
3.2. DEPLOYING TO NON-STANDARD NETWORK TOPOLOGIES
If your nodes are on a separate network from the default InternalApi network, you must make configuration adjustments so that AMQ Interconnect can transport data to the Service Telemetry Framework (STF) server instance. This scenario is typical in a spine-leaf or a DCN topology. For more information about DCN configuration, see the Spine Leaf Networking guide.
If you use STF with Red Hat OpenStack Platform 16.0 and plan to monitor your Ceph, Block, or Object storage nodes, you must make configuration changes that are similar to the configuration changes that you make to the spine-leaf and DCN network configuration. To monitor Ceph nodes, use the CephStorageExtraConfig parameter to define which network interface to load into the AMQ Interconnect and collectd configuration files:

CephStorageExtraConfig:
  tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('storage')}"
  tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('storage')}"
  tripleo::profile::base::ceilometer::agent::notification::notifier_host_addr: "%{hiera('storage')}"
Similarly, you must specify BlockStorageExtraConfig and ObjectStorageExtraConfig parameters if your environment uses Block and Object storage roles, as in the sketch that follows.
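For example, a sketch of the Block Storage variant, mirroring the CephStorageExtraConfig example above; the hiera key assumes the default Storage network for these roles, as described in the next paragraph:

BlockStorageExtraConfig:
  tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('storage')}"
  tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('storage')}"
  tripleo::profile::base::ceilometer::agent::notification::notifier_host_addr: "%{hiera('storage')}"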
The deployment of a spine-leaf topology involves creating roles and networks, then assigning those networks to the available roles. When you configure data collection and transport for STF for a Red Hat OpenStack Platform deployment, the default network for roles is InternalApi. For Ceph Block and Object storage roles, the default network is Storage. Because a spine-leaf configuration can result in different networks being assigned to different Leaf groupings, and those names are typically unique, additional configuration is required in the parameter_defaults section of the Red Hat OpenStack Platform environment files.
Procedure
1. Document which networks are available for each of the Leaf roles. For examples of network name definitions, see Creating a network data file in the Spine Leaf Networking guide. For more information about the creation of the Leaf groupings (roles) and assignment of the networks to those groupings, see Creating a roles data file in the Spine Leaf Networking guide.
2. Add the following configuration example to the ExtraConfig section for each of the leaf roles. In this example, internal_api_subnet is the value defined in the name_lower parameter of your network definition (with _subnet appended to the name for Leaf 0), and is the network to which the ComputeLeaf0 leaf role is connected. In this case, the network identification of 0 corresponds to the Compute role for leaf 0, and represents a value that is different from the default internal API network name.

For the ComputeLeaf0 leaf role, specify extra configuration to perform a hiera lookup to determine which network interface for a particular network to assign to the collectd AMQP host parameter. Perform the same configuration for the AMQ Interconnect listener address parameter:

ComputeLeaf0ExtraConfig:
  tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('internal_api_subnet')}"
  tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('internal_api_subnet')}"

Additional leaf roles typically replace _subnet with _leafN, where N represents a unique identifier for the leaf:

ComputeLeaf1ExtraConfig:
  tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('internal_api_leaf1')}"
  tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('internal_api_leaf1')}"

This example configuration is on a CephStorage leaf role:

CephStorageLeaf0ExtraConfig:
  tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('storage_subnet')}"
  tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('storage_subnet')}"
3.3. CONFIGURING RED HAT OPENSTACK PLATFORM OVERCLOUD FOR SERVICE TELEMETRY FRAMEWORK
To configure the Red Hat OpenStack Platform overcloud, you must configure the data collection applications and the data transport to STF, and deploy the overcloud.

To configure the Red Hat OpenStack Platform overcloud, complete the following tasks:

1. Section 3.3.1, "Retrieving the AMQ Interconnect route address"

2. Section 3.3.2, "Configuring the STF connection for the overcloud"

3. Section 3.3.3, "Validating client-side installation"
3.3.1. Retrieving the AMQ Interconnect route address
When you configure the Red Hat OpenStack Platform overcloud for STF, you must provide the AMQ Interconnect route address in the STF connection file.
Procedure
1. Log in to your Red Hat OpenShift Container Platform (OCP) environment.

2. In the service-telemetry project, retrieve the AMQ Interconnect route address:

$ oc get routes -ogo-template='{{ range .items }}{{printf "%s\n" .spec.host }}{{ end }}' | grep "\-5671"
stf-default-interconnect-5671-service-telemetry.apps.infra.watch
NOTE
If your STF installation differs from the documentation, ensure that you retrieve the correct AMQ Interconnect route address.
3.3.2. Configuring the STF connection for the overcloud
To configure the STF connection, you must create a file that contains the connection configuration of the AMQ Interconnect for the overcloud to the STF deployment. Enable the collection of events and storage of the events in STF, and deploy the overcloud.
Procedure
1. Log in to the Red Hat OpenStack Platform undercloud as the stack user.

2. Create a configuration file called stf-connectors.yaml in the /home/stack directory.
IMPORTANT
The Service Telemetry Operator simplifies the deployment of all data ingestion and data storage components for single cloud deployments. To share the data storage domain with multiple clouds, see Section 4.5, "Configuring multiple clouds".
3. In the stf-connectors.yaml file, configure the MetricsQdrConnectors address to connect the AMQ Interconnect on the overcloud to the STF deployment.

Add the CeilometerQdrPublishEvents: true parameter to enable collection and transport of Ceilometer events to STF.

Replace the host parameter with the value of HOST/PORT that you retrieved in Section 3.3.1, "Retrieving the AMQ Interconnect route address":

parameter_defaults:
    EventPipelinePublishers: []
    CeilometerQdrPublishEvents: true
    MetricsQdrConnectors:
        - host: stf-default-interconnect-5671-service-telemetry.apps.infra.watch
          port: 443
          role: edge
          sslProfile: sslProfile
          verifyHostname: false

4. Add the following files to your Red Hat OpenStack Platform director deployment to set up collectd and AMQ Interconnect:

the stf-connectors.yaml environment file

the enable-stf.yaml file that ensures that the environment is being used during the overcloud deployment

the ceilometer-write-qdr.yaml file that ensures that Ceilometer telemetry is sent to STF
5. Deploy the Red Hat OpenStack Platform overcloud:

openstack overcloud deploy <other arguments>
  --templates /usr/share/openstack-tripleo-heat-templates \
  --environment-file <...other-environment-files...> \
  --environment-file /usr/share/openstack-tripleo-heat-templates/environments/metrics/ceilometer-write-qdr.yaml \
  --environment-file /usr/share/openstack-tripleo-heat-templates/environments/enable-stf.yaml \
  --environment-file /home/stack/stf-connectors.yaml
3.3.3. Validating client-side installation
To validate data collection from the STF storage domain, query the data sources for delivered data. To validate individual nodes in the Red Hat OpenStack Platform deployment, connect to the console using SSH.
Procedure
1. Log in to an overcloud node, for example, controller-0.

2. Ensure that the metrics_qdr container is running on the node:

$ sudo podman container inspect --format '{{.State.Status}}' metrics_qdr
running

3. Return the internal network address on which AMQ Interconnect is running, for example, 172.17.1.44 listening on port 5666:

$ sudo podman exec -it metrics_qdr cat /etc/qpid-dispatch/qdrouterd.conf

listener {
    host: 172.17.1.44
    port: 5666
    authenticatePeer: no
    saslMechanisms: ANONYMOUS
}

4. Return a list of connections to the local AMQ Interconnect:

$ sudo podman exec -it metrics_qdr qdstat --bus=172.17.1.44:5666 --connections

Connections
  id   host                                                                   container                                                                                                  role    dir  security                            authentication  tenant
  ===================================================================================================================================================================================================================================
  1    stf-default-interconnect-5671-service-telemetry.apps.infra.watch:443   stf-default-interconnect-7458fd4d69-bgzfb                                                                  edge    out  TLSv1.2(DHE-RSA-AES256-GCM-SHA384)  anonymous-user
  12   172.17.1.44:60290                                                      openstack.org/om/container/controller-0/ceilometer-agent-notification/25/5c02cee550f143ec9ea030db5cccba14  normal  in   no-security                         no-auth
  16   172.17.1.44:36408                                                      metrics                                                                                                    normal  in   no-security                         anonymous-user
  899  172.17.1.44:39500                                                      10a2e99d-1b8a-4329-b48c-4335e5f75c84                                                                       normal  in   no-security                         no-auth

There are four connections:

Outbound connection to STF

Inbound connection from collectd

Inbound connection from ceilometer

Inbound connection from our qdstat client

The outbound STF connection is provided to the MetricsQdrConnectors host parameter and is the route for the STF storage domain. The other hosts are internal network addresses of the client connections to this AMQ Interconnect.

5. To ensure that messages are being delivered, list the links, and view the _edge address in the deliv column for delivery of messages:

$ sudo podman exec -it metrics_qdr qdstat --bus=172.17.1.44:5666 --links
Router Links
  type      dir  conn id  id    peer  class   addr                  phs  cap  pri  undel  unsett  deliv    presett  psdrop  acc  rej  rel  mod  delay  rate
  ===========================================================================================================================================================
  endpoint  out  1        5           local   _edge                      250  0    0      0       2979926  2979924  0       0    0    2    0    0      0
  endpoint  in   1        6                                              250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  in   1        7                                              250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  out  1        8                                              250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  in   1        9                                              250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  out  1        10                                             250  0    0      0       911      911      0       0    0    0    0    911    0
  endpoint  in   1        11                                             250  0    0      0       0        911      0       0    0    0    0    0      0
  endpoint  out  12       32          local   temp.lSY6Mcicol4J2Kp       250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  in   16       41                                             250  0    0      0       2979924  2979924  0       0    0    0    0    0      0
  endpoint  in   912      1834        mobile  $management           0    250  0    0      0       1        0        0       1    0    0    0    0      0
  endpoint  out  912      1835        local   temp.9Ok2resI9tmt+CT       250  0    0      0       0        0        0       0    0    0    0    0      0
6. To list the addresses from Red Hat OpenStack Platform nodes to STF, connect to OCP to get the AMQ Interconnect pod name and list the connections. List the available AMQ Interconnect pods:

$ oc get pods -l application=stf-default-interconnect

NAME                                        READY   STATUS    RESTARTS   AGE
stf-default-interconnect-7458fd4d69-bgzfb   1/1     Running   0          6d21h

7. Connect to the pod and run the qdstat --connections command to list the known connections:

$ oc exec -it stf-default-interconnect-7458fd4d69-bgzfb -- qdstat --connections

2020-04-21 18:25:47.243852 UTC
stf-default-interconnect-7458fd4d69-bgzfb

Connections
  id  host                 container                                                               role    dir  security                                authentication  tenant  last dlv      uptime
  =====================================================================================================================================================================================================
  1   10.129.2.21:43062    rcv[stf-default-collectd-telemetry-smartgateway-79c967c8f7-kq4qv]       normal  in   no-security                             anonymous-user          000:00:00:00  006:21:50:25
  2   10.130.0.52:55754    rcv[stf-default-ceilometer-notification-smartgateway-6675df547mbjk5]    normal  in   no-security                             anonymous-user          000:21:25:57  006:21:49:36
  3   10.130.0.51:43110    rcv[stf-default-collectd-notification-smartgateway-698c87fbb7-f28v6]    normal  in   no-security                             anonymous-user          000:21:36:53  006:21:49:09
  22  10.128.0.1:51948     Router.ceph-0.redhat.local                                              edge    in   TLSv1/SSLv3(DHE-RSA-AES256-GCM-SHA384)  anonymous-user          000:00:00:03  000:22:08:43
  23  10.128.0.1:51950     Router.compute-0.redhat.local                                           edge    in   TLSv1/SSLv3(DHE-RSA-AES256-GCM-SHA384)  anonymous-user          000:00:00:03  000:22:08:43
  24  10.128.0.1:52082     Router.controller-0.redhat.local                                        edge    in   TLSv1/SSLv3(DHE-RSA-AES256-GCM-SHA384)  anonymous-user          000:00:00:00  000:22:08:34
  27  127.0.0.1:42202      c2f541c1-4c97-4b37-a189-a396c08fb079                                    normal  in   no-security                             no-auth                 000:00:00:00  000:00:00:00

In this example, there are three edge connections from the Red Hat OpenStack Platform nodes, with connection id 22, 23, and 24.

8. To view the number of messages delivered by the network, use each address with the oc exec command:

$ oc exec -it stf-default-interconnect-7458fd4d69-bgzfb -- qdstat --address

2020-04-21 18:20:10.293258 UTC
stf-default-interconnect-7458fd4d69-bgzfb

Router Addresses
  class   addr                              phs  distrib    pri  local  remote  in       out      thru  fallback
  ==================================================================================================================
  mobile  anycast/ceilometer/event.sample   0    balanced   -    1      0       1553     1553     0     0
  mobile  collectd/notify                   0    multicast  -    1      0       10       10       0     0
  mobile  collectd/telemetry                0    multicast  -    1      0       7798049  7798049  0     0
CHAPTER 4. ADVANCED FEATURES

The following optional features can provide additional functionality to the Service Telemetry Framework (STF):
Customizing the deployment. For more information, see Section 4.1, "Customizing the deployment".

Alerts. For more information, see Section 4.2, "Alerts".

High availability. For more information, see Section 4.3, "High availability".

Dashboards. For more information, see Section 4.4, "Dashboards".

Multiple clouds. For more information, see Section 4.5, "Configuring multiple clouds".

Ephemeral storage. For more information, see Section 4.6, "Ephemeral storage".
4.1. CUSTOMIZING THE DEPLOYMENT
The Service Telemetry Operator watches for a ServiceTelemetry manifest to load into Red Hat OpenShift Container Platform (OCP). The Operator then creates other objects in memory, which results in the dependent Operators creating the workloads they are responsible for managing.
WARNING

When you override the manifest, you must provide the entire manifest contents, including object names or namespaces. There is no dynamic parameter substitution when you override a manifest.
To override a manifest successfully with Service Telemetry Framework (STF), deploy a default environment using the core options only. For more information about the core options, see Section 2.3.10, "Creating a ServiceTelemetry object in OCP". When you deploy STF, use the oc get command to retrieve the default deployed manifest. When you use a manifest that was originally generated by Service Telemetry Operator, the manifest is compatible with the other objects that are managed by the Operators.
For example, when the metricsEnabled: true parameter is configured in the ServiceTelemetry manifest, the Service Telemetry Operator requests components for metrics retrieval and storage using the default manifests. In some cases, you might want to override the default manifest. For more information, see Section 4.1.1, "Manifest override parameters".
4.1.1. Manifest override parameters
This table describes the available parameters that you can use to override a manifest, along with the corresponding retrieval commands.

Table 4.1. Manifest override parameters
alertmanagerManifest
    Description: Override the Alertmanager object creation. The Prometheus Operator watches for Alertmanager objects.
    Retrieval command: oc get alertmanager stf-default -oyaml

alertmanagerConfigManifest
    Description: Override the Secret that contains the Alertmanager configuration. The Prometheus Operator uses a secret named alertmanager-{{alertmanager-name}}, for example, stf-default, to provide the alertmanager.yaml configuration to Alertmanager.
    Retrieval command: oc get secret alertmanager-stf-default -oyaml

elasticsearchManifest
    Description: Override the ElasticSearch object creation. The Elastic Cloud on Kubernetes Operator watches for ElasticSearch objects.
    Retrieval command: oc get elasticsearch elasticsearch -oyaml

interconnectManifest
    Description: Override the Interconnect object creation. The AMQ Interconnect Operator watches for Interconnect objects.
    Retrieval command: oc get interconnect stf-default-interconnect -oyaml

prometheusManifest
    Description: Override the Prometheus object creation. The Prometheus Operator watches for Prometheus objects.
    Retrieval command: oc get prometheus stf-default -oyaml

servicemonitorManifest
    Description: Override the ServiceMonitor object creation. The Prometheus Operator watches for ServiceMonitor objects.
    Retrieval command: oc get servicemonitor stf-default -oyaml

smartgatewayCollectdMetricsManifest
    Description: Override the SmartGateway object creation for collectd metrics. The Smart Gateway Operator watches for SmartGateway objects.
    Retrieval command: oc get smartgateway stf-default-collectd-telemetry -oyaml

smartgatewayCollectdEventsManifest
    Description: Override the SmartGateway object creation for collectd events. The Smart Gateway Operator watches for SmartGateway objects.
    Retrieval command: oc get smartgateway stf-default-collectd-notification -oyaml

smartgatewayCeilometerEventsManifest
    Description: Override the SmartGateway object creation for Ceilometer events. The Smart Gateway Operator watches for SmartGateway objects.
    Retrieval command: oc get smartgateway stf-default-ceilometer-notification -oyaml
4.1.2. Overriding a managed manifest
Edit the ServiceTelemetry object and provide a parameter and manifest. For a list of available manifest override parameters, see Section 4.1.1, "Manifest override parameters". The default ServiceTelemetry object is stf-default. Use oc get servicetelemetry to list the available STF deployments.
TIP

The oc edit command loads the default system editor. To override the default editor, pass or set the environment variable EDITOR to the preferred editor, for example, EDITOR=nano oc edit servicetelemetry stf-default.
Procedure

1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

oc project service-telemetry

3. Load the ServiceTelemetry object into an editor:

oc edit servicetelemetry stf-default

4. To modify the ServiceTelemetry object, provide a manifest override parameter and the contents of the manifest to write to OCP instead of the defaults provided by STF.

NOTE

The trailing pipe (|) after entering the manifest override parameter indicates that the value provided is multi-line.

$ oc edit servicetelemetry stf-default

apiVersion: infra.watch/v1alpha1
kind: ServiceTelemetry
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"infra.watch/v1alpha1","kind":"ServiceTelemetry","metadata":{"annotations":{},"name":"stf-default","namespace":"service-telemetry"},"spec":{"metricsEnabled":true}}
  creationTimestamp: "2020-04-14T20:29:42Z"
  generation: 1
  name: stf-default
  namespace: service-telemetry
  resourceVersion: "1949423"
  selfLink: /apis/infra.watch/v1alpha1/namespaces/service-telemetry/servicetelemetrys/stf-default
  uid: d058bc41-1bb0-49f5-9a8b-642f4b8adb95
spec:
  metricsEnabled: true
  smartgatewayCollectdMetricsManifest: | 1
    apiVersion: smartgateway.infra.watch/v2alpha1
    kind: SmartGateway
    metadata:
      name: stf-default-collectd-telemetry
      namespace: service-telemetry
    spec:
      amqpUrl: amqp://stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/telemetry
      debug: true
      prefetch: 15000
      serviceType: metrics
      size: 1
      useTimestamp: true 2
status:
  conditions:
  - ansibleResult:
      changed: 0
      completion: 2020-04-14T20:32:19.079508
      failures: 0
      ok: 52
      skipped: 1
    lastTransitionTime: "2020-04-14T20:29:59Z"
    message: Awaiting next reconciliation
    reason: Successful
    status: "True"
    type: Running

1 Manifest override parameter is defined in the spec of the ServiceTelemetry object.

2 End of the manifest override content.

5. Save and close.
4.2. ALERTS

You create alert rules in Prometheus and alert routes in Alertmanager. Alert rules in Prometheus servers send alerts to an Alertmanager, which manages the alerts. Alertmanager can silence, inhibit, or aggregate alerts, and send notifications using email, on-call notification systems, or chat platforms.

To create an alert, complete the following tasks:

1. Create an alert rule in Prometheus. For more information, see Section 4.2.1, "Creating an alert rule in Prometheus".

2. Create an alert route in Alertmanager. For more information, see Section 4.2.3, "Creating an alert route in Alertmanager".
Additional resources

For more information about alerts or notifications with Prometheus and Alertmanager, see https://prometheus.io/docs/alerting/overview/

To view an example set of alerts that you can use with Service Telemetry Framework (STF), see https://github.com/infrawatch/service-telemetry-operator/tree/master/deploy/alerts
4.2.1. Creating an alert rule in Prometheus
Prometheus evaluates alert rules to trigger notifications. If the rule condition returns an empty result set, the condition is false. Otherwise, the rule is true, and it triggers an alert. For example, the expression collectd_qpid_router_status < 1 returns a time series, and therefore fires, only while a router reports a status below 1.
Procedure

1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

oc project service-telemetry

3. Create a PrometheusRule object that contains the alert rule. The Prometheus Operator loads the rule into Prometheus:

oc apply -f - <<EOF
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  creationTimestamp: null
  labels:
    prometheus: stf-default
    role: alert-rules
  name: prometheus-alarm-rules
  namespace: service-telemetry
spec:
  groups:
    - name: ./openstack.rules
      rules:
        - alert: Metric Listener down
          expr: collectd_qpid_router_status < 1 # To change the rule, edit the value of the expr parameter.
EOF

4. To verify that the rules have been loaded into Prometheus by the Operator, create a pod with access to curl:

oc run curl --generator=run-pod/v1 --image=radial/busyboxplus:curl -i --tty

5. Run curl to access the prometheus-operated service to return the rules loaded into memory:

[ root@curl:/ ]$ curl prometheus-operated:9090/api/v1/rules
{"status":"success","data":{"groups":[{"name":"./openstack.rules","file":"/etc/prometheus/rules/prometheus-stf-default-rulefiles-0/service-telemetry-prometheus-alarm-rules.yaml","rules":[{"name":"Metric Listener down","query":"collectd_qpid_router_status \u003c 1","duration":0,"labels":{},"annotations":{},"alerts":[],"health":"ok","type":"alerting"}],"interval":30}]}}

6. To verify that the output shows the rules loaded into the PrometheusRule object, for example the output contains the defined ./openstack.rules, exit from the pod:

[ root@curl:/ ]$ exit

7. Clean up the environment by deleting the curl pod:

$ oc delete pod curl

pod "curl" deleted
Additional resources

For more information on alerting, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md
4.2.2. Configuring custom alerts
You can add custom alerts to the PrometheusRule object that you created in Section 4.2.1, "Creating an alert rule in Prometheus".
Procedure
1. Use the oc edit command:

oc edit prometheusrules prometheus-alarm-rules
2. Edit the PrometheusRules manifest.
3. Save and close.
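For example, a custom rule might extend the "Metric Listener down" rule from Section 4.2.1 with the standard PrometheusRule fields for, labels, and annotations; the severity value and summary text in this sketch are illustrative assumptions:

- alert: Metric Listener down
  expr: collectd_qpid_router_status < 1
  # Only fire after the condition has held for 5 minutes
  for: 5m
  labels:
    severity: critical
  annotations:
    summary: "AMQ Interconnect router on {{ $labels.instance }} is reporting down"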
Additional resources

For more information about configuring alerting rules, see https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/

For more information about PrometheusRules objects, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md
4.2.3. Creating an alert route in Alertmanager

Use Alertmanager to deliver alerts to an external system, such as email, IRC, or other notification channel. The Prometheus Operator manages the Alertmanager configuration as a Red Hat OpenShift Container Platform (OCP) secret. STF by default deploys a basic configuration that results in no receivers:

alertmanager.yaml: |-
  global:
    resolve_timeout: 5m
  route:
    group_by: ['job']
    group_wait: 30s
    group_interval: 5m
    repeat_interval: 12h
    receiver: 'null'
  receivers:
  - name: 'null'

To deploy a custom Alertmanager route with STF, an alertmanagerConfigManifest parameter must be passed to the Service Telemetry Operator that results in an updated secret, managed by the Prometheus Operator. For more information, see Section 4.1.2, "Overriding a managed manifest".

Procedure

1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

oc project service-telemetry

3. Edit the ServiceTelemetry object for your STF deployment:

oc edit servicetelemetry stf-default

4. Add the new parameter alertmanagerConfigManifest and the Secret object contents to define the alertmanager.yaml configuration for Alertmanager:

apiVersion: infra.watch/v1alpha1
kind: ServiceTelemetry
metadata:
  name: stf-default
  namespace: service-telemetry
spec:
  metricsEnabled: true
  alertmanagerConfigManifest: |
    apiVersion: v1
    kind: Secret
    metadata:
      name: 'alertmanager-stf-default'
      namespace: 'service-telemetry'
    type: Opaque
    stringData:
      alertmanager.yaml: |-
        global:
          resolve_timeout: 10m
        route:
          group_by: ['job']
          group_wait: 30s
          group_interval: 5m
          repeat_interval: 12h
          receiver: 'null'
        receivers:
        - name: 'null'

NOTE

This loads the default template that is already managed by the Service Telemetry Operator. To validate that the changes are populating correctly, change a value, return the alertmanager-stf-default secret, and verify that the new value is loaded into memory, for example, changing the value global.resolve_timeout from 5m to 10m.

5. Verify that the configuration was applied to the secret:

$ oc get secret alertmanager-stf-default -o go-template='{{index .data "alertmanager.yaml" | base64decode }}'

global:
  resolve_timeout: 10m
route:
  group_by: ['job']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h
  receiver: 'null'
receivers:
- name: 'null'

6. To verify the configuration has been loaded into Alertmanager, create a pod with access to curl:

oc run curl --generator=run-pod/v1 --image=radial/busyboxplus:curl -i --tty

7. Run curl against the alertmanager-operated service to retrieve the status and configYAML contents, and review the supplied configuration matches the configuration loaded into Alertmanager:

[ root@curl:/ ]$ curl alertmanager-operated:9093/api/v1/status

{"status":"success","data":{"configYAML":"global:\n  resolve_timeout: 10m\n  http_config: {}\n  smtp_hello: localhost\n  smtp_require_tls: true\n  pagerduty_url: https://events.pagerduty.com/v2/enqueue\n  hipchat_api_url: https://api.hipchat.com/\n  opsgenie_api_url: https://api.opsgenie.com/\n  wechat_api_url: https://qyapi.weixin.qq.com/cgi-bin/\n  victorops_api_url: https://alert.victorops.com/integrations/generic/20131114/alert/\nroute:\n  receiver: \"null\"\n  group_by:\n  - job\n  group_wait: 30s\n  group_interval: 5m\n  repeat_interval: 12h\nreceivers:\n- name: \"null\"\ntemplates: []\n",...}}

8. Verify that the configYAML field contains the expected changes. Exit from the pod:

[ root@curl:/ ]$ exit

9. To clean up the environment, delete the curl pod:

$ oc delete pod curl

pod "curl" deleted
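The 'null' receiver discards all alerts. To actually deliver alerts, replace it with a real receiver in the alertmanagerConfigManifest. The following sketch uses Alertmanager's standard email_configs receiver; the addresses and SMTP host are illustrative assumptions, not values from this guide:

stringData:
  alertmanager.yaml: |-
    global:
      resolve_timeout: 10m
    route:
      group_by: ['job']
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 12h
      receiver: 'email'
    receivers:
    - name: 'email'
      email_configs:
      # Replace these example values with your own mail infrastructure
      - to: 'stf-alerts@example.com'
        from: 'alertmanager@example.com'
        smarthost: 'smtp.example.com:587'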
Additional resources

For more information about the Red Hat OpenShift Container Platform secret and the Prometheus operator, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md
4.3. HIGH AVAILABILITY
High availability is the ability of Service Telemetry Framework (STF) to rapidly recover from failures in its component services. Although Red Hat OpenShift Container Platform (OCP) restarts a failed pod if nodes are available to schedule the workload, this recovery process might take more than one minute, during which time events and metrics are lost. A high availability configuration includes multiple copies of STF components, reducing recovery time to approximately 2 seconds. To protect against failure of an OCP node, deploy STF to an OCP cluster with three or more nodes.
NOTE

STF is not yet a fully fault tolerant system. Delivery of metrics and events during the recovery period is not guaranteed.
Enabling high availability has the following effects:

Two AMQ Interconnect pods run instead of the default 1.

Three ElasticSearch pods run instead of the default 1.

Recovery time from a lost pod in either of these services reduces to approximately 2 seconds.
4.3.1. Configuring high availability
To configure STF for high availability, add highAvailabilityEnabled: true to the ServiceTelemetry object in OCP. You can set this parameter at installation time or, if you already deployed STF, complete the following steps:
Procedure

1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

oc project service-telemetry

3. Use the oc command to edit the ServiceTelemetry object:

$ oc edit ServiceTelemetry

4. Add highAvailabilityEnabled: true to the spec section:

spec:
  eventsEnabled: true
  metricsEnabled: true
  highAvailabilityEnabled: true

5. Save your changes and close the object.
4.4. DASHBOARDS
Use third-party application Grafana to visualize system-level metrics gathered by collectd for each individual host node. For more information about configuring collectd, see Section 3.3, "Configuring Red Hat OpenStack Platform overcloud for Service Telemetry Framework".
4.4.1. Setting up Grafana to host the dashboard
Grafana is not included in the default Service Telemetry Framework (STF) deployment, so you must deploy the Grafana Operator from OperatorHub.io.
Procedure
1 Log in to Red Hat OpenShift Container Platform
2 Change to the service-telemetry namespace
3 Clone the dashboard repository
4 Deploy the Grafana operator
5 To verify that the operator launched successfully run the oc get csv command If the valueof the PHASE column is Succeeded the operator launched successfully
6 Launch a Grafana instance
7 Verify that the Grafana instance deployed
8 Create the datasource and dashboard resources
oc project service-telemetry
$ git clone https://github.com/infrawatch/dashboards
$ cd dashboards

$ oc create -f deploy/subscription.yaml

$ oc get csv

NAME                      DISPLAY            VERSION   REPLACES   PHASE
grafana-operator.v3.2.0   Grafana Operator   3.2.0                Succeeded

$ oc create -f deploy/grafana.yaml

$ oc get pod -l app=grafana

NAME                                  READY   STATUS    RESTARTS   AGE
grafana-deployment-7fc7848b56-sbkhv   1/1     Running   0          1m
9 Verify that the resources installed correctly

10 Navigate to https://<grafana-route-address> in a web browser. Use the oc get routes command to retrieve the Grafana route address

11 To view the dashboard, click Dashboards and Manage
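To script the lookup from step 10, a jsonpath query can print only the route hosts; a sketch, assuming the route created by the Grafana Operator contains grafana in its name:

$ oc get routes -o jsonpath='{range .items[*]}{.spec.host}{"\n"}{end}' | grep grafana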
Additional resources

For more information about enabling the OperatorHub.io catalog source, see Section 2.3.5, "Enabling the OperatorHub.io Community Catalog Source".
4.4.1.1. Viewing and editing queries

Procedure

1 Log in to Red Hat OpenShift Container Platform. To view and edit queries, log in as the admin user

2 Change to the service-telemetry namespace

3 To retrieve the default username and password, describe the Grafana object using oc describe
oc create -f deploy/datasource.yaml -f deploy/rhos-dashboard.yaml

$ oc get grafanadashboards

NAME             AGE
rhos-dashboard   7d21h

$ oc get grafanadatasources

NAME                                  AGE
service-telemetry-grafanadatasource   1m

oc get routes

oc project service-telemetry

oc describe grafana service-telemetry-grafana

4.4.2. The Grafana infrastructure dashboard

The infrastructure dashboard shows metrics for a single node at a time. Select a node from the upper left corner of the dashboard.

4.4.2.1. Top panels

Title                  Unit          Description
Current Global Alerts   -             Current alerts fired by Prometheus
Recent Global Alerts    -             Recently fired alerts in 5m time steps
Status Panel            -             Node status: up, down, unavailable
Uptime                  s/m/h/d/M/Y   Total operational time of node
CPU Cores               cores         Total number of cores
Memory                  bytes         Total memory
Disk Size               bytes         Total storage size
Processes               processes     Total number of processes listed by type
Load Average            processes     Load average represents the average number of running and uninterruptible processes residing in the kernel execution queue
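These panels are backed by collectd metrics stored in Prometheus. For example, the Uptime panel corresponds to a query similar to the following sketch, assuming the default single-cloud Smart Gateway naming (the cloud-specific variant is shown in Section 4.5.4):

collectd_uptime{service="stf-default-collectd-telemetry-smartgateway"}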
4.4.2.2. Networking panels

Panels that display the network interfaces of the node.

Panel                                     Unit       Description

Physical Interfaces Ingress Errors        errors     Total errors with incoming data
Physical Interfaces Egress Errors         errors     Total errors with outgoing data
Physical Interfaces Ingress Error Rates   errors/s   Rate of incoming data errors
Physical Interfaces Egress Error Rates    errors/s   Rate of outgoing data errors
Physical Interfaces Packets Ingress       pps        Incoming packets per second
Physical Interfaces Packets Egress        pps        Outgoing packets per second
Physical Interfaces Data Ingress          bytes/s    Incoming data rates
Physical Interfaces Data Egress           bytes/s    Outgoing data rates
Physical Interfaces Drop Rate Ingress     pps        Incoming packets drop rate
Physical Interfaces Drop Rate Egress      pps        Outgoing packets drop rate
4.4.2.3. CPU panels

Panels that display CPU usage of the node.

Panel                     Unit      Description

Current CPU Usage         percent   Instantaneous usage at the time of the last query
Aggregate CPU Usage       percent   Average non-idle CPU activity of all cores on a node
Aggr. CPU Usage by Type   percent   Shows time spent for each type of thread averaged across all cores
4.4.2.4. Memory panels

Panels that display memory usage on the node.

Panel             Unit        Description

Memory Used       percent     Amount of memory being used at time of last query
Huge Pages Used   hugepages   Number of hugepages being used
4.4.2.5. Disk/file system

Panels that display space used on disk.

Panel                        Unit      Description                                Notes

Disk Space Usage             percent   Total disk use at time of last query
Inode Usage                  percent   Total inode use at time of last query
Aggregate Disk Space Usage   bytes     Total disk space used and reserved         Because this query relies on the df plugin, temporary file systems that do not necessarily use disk space are included in the results. The query tries to filter out most of these, but it might not be exhaustive.
Disk Traffic                 bytes/s   Shows rates for both reading and writing
Disk Load                    percent   Approximate percentage of total disk bandwidth being used. The weighted I/O time series includes the backlog that might be accumulating. For more information, see the collectd disk plugin docs.
Operations/s                 ops/s     Operations done per second
Average I/O Operation Time   seconds   Average time each I/O operation took to complete. This average is not accurate; see the collectd disk plugin docs.
4.5. CONFIGURING MULTIPLE CLOUDS

You can configure multiple Red Hat OpenStack Platform clouds to target a single instance of Service Telemetry Framework (STF):

1 Plan the AMQP address prefixes that you want to use for each cloud. For more information, see Section 4.5.1, "Planning AMQP address prefixes".

2 Deploy metrics and events consumer Smart Gateways for each cloud to listen on the corresponding address prefixes. For more information, see Section 4.5.2, "Deploying Smart Gateways".

3 Configure each cloud to send its metrics and events to STF on the correct address. For more information, see Section 4.5.3, "Creating the OpenStack environment file".
Figure 4.1. Two Red Hat OpenStack Platform clouds connect to STF
4.5.1. Planning AMQP address prefixes

By default, Red Hat OpenStack Platform nodes send data through two data collectors: collectd and Ceilometer. These components send telemetry data or notifications to the respective AMQP addresses, for example collectd/telemetry, and STF Smart Gateways listen on those addresses for monitoring data.

To support multiple clouds, and to identify which cloud generated the monitoring data, configure each cloud to send data to a unique address. Prefix a cloud identifier to the second part of the address. The following list shows some example addresses and identifiers:
collectd/cloud1-telemetry

collectd/cloud1-notify

anycast/ceilometer/cloud1-event.sample

collectd/cloud2-telemetry

collectd/cloud2-notify

anycast/ceilometer/cloud2-event.sample

collectd/us-east-1-telemetry

collectd/us-west-3-telemetry
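After the clouds start sending data, one way to verify which addresses are active on the bus is the qdstat tool from the qpid-dispatch packages; a sketch, assuming qdstat is installed on a host that can reach the Interconnect service on port 5672:

$ qdstat -b stf-default-interconnect.service-telemetry.svc.cluster.local:5672 -a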
4.5.2. Deploying Smart Gateways

You must deploy a Smart Gateway for each of the data collection types for each cloud: one for collectd metrics, one for collectd events, and one for Ceilometer events. Configure each of the Smart Gateways to listen on the AMQP address that you define for the corresponding cloud.

When you deploy STF for the first time, Smart Gateway manifests are created that define the initial Smart Gateways for a single cloud. When deploying Smart Gateways for multiple cloud support, you deploy multiple Smart Gateways for each of the data collection types that handle the metrics and the events data for each cloud. The initial Smart Gateways act as a template to create additional Smart Gateways, along with any authentication information required to connect to the data stores.
Procedure
1 Log in to Red Hat OpenShift Container Platform
2 Change to the service-telemetry namespace
oc project service-telemetry
3 Use the initially deployed Smart Gateways as a template for additional Smart Gateways. List the currently deployed Smart Gateways with the oc get smartgateways command. For example, if you deployed STF with metricsEnabled: true and eventsEnabled: true, the following Smart Gateways are displayed in the output:

$ oc get smartgateways

NAME                                  AGE
stf-default-ceilometer-notification   14d
stf-default-collectd-notification     14d
stf-default-collectd-telemetry        14d

4 Retrieve the manifests for each Smart Gateway and store the contents in a temporary file, which you can modify later and use to create the new set of Smart Gateways:

truncate --size 0 /tmp/cloud1-smartgateways.yaml && for sg in $(oc get smartgateways -oname)
do
  echo "---" >> /tmp/cloud1-smartgateways.yaml
  oc get $sg -oyaml --export >> /tmp/cloud1-smartgateways.yaml
done

5 Modify the Smart Gateway manifests in the /tmp/cloud1-smartgateways.yaml file. Adjust the metadata.name and spec.amqpUrl fields to include the cloud identifier from your schema. For more information, see Section 4.5.1, "Planning AMQP address prefixes". To view example Smart Gateway manifests, see Section 4.5.2.1, "Example manifests".
6 Deploy your new Smart Gateways
oc apply -f /tmp/cloud1-smartgateways.yaml
7 Verify that each Smart Gateway is running. This can take several minutes, depending on the number of Smart Gateways:
oc get po -l app=smart-gateway
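If the pods are still starting, appending the watch flag streams status changes until you interrupt the command; a minimal sketch:

oc get po -l app=smart-gateway -w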
4.5.2.1. Example manifests

IMPORTANT

The content in the following examples might be different from the file content in your deployment. Copy the manifests in your deployment.

Ensure that the name and amqpUrl parameters of each Smart Gateway match the names that you want to use for your clouds. For more information, see Section 4.5.1, "Planning AMQP address prefixes".

NOTE

Your output might have some additional metadata parameters that you can remove from the manifests that you load into OCP.
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-ceilometer-notification-cloud1 1
spec:
  amqpDataSource: ceilometer
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/anycast/ceilometer/cloud1-event.sample 2
  debug: false
  elasticPass: fkzfhghw
  elasticUrl: https://elasticsearch-es-http.service-telemetry.svc.cluster.local:9200
  elasticUser: elastic
  resetIndex: false
  serviceType: events
  size: 1
  tlsCaCert: /config/certs/ca.crt
  tlsClientCert: /config/certs/tls.crt
  tlsClientKey: /config/certs/tls.key
  tlsServerName: elasticsearch-es-http.service-telemetry.svc.cluster.local
  useBasicAuth: true
  useTls: true
---
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-collectd-notification-cloud1 3
spec:
  amqpDataSource: collectd
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/cloud1-notify 4
  debug: false
  elasticPass: fkzfhghw
  elasticUrl: https://elasticsearch-es-http.service-telemetry.svc.cluster.local:9200
  elasticUser: elastic
  resetIndex: false
  serviceType: events
  size: 1
  tlsCaCert: /config/certs/ca.crt
  tlsClientCert: /config/certs/tls.crt
  tlsClientKey: /config/certs/tls.key
  tlsServerName: elasticsearch-es-http.service-telemetry.svc.cluster.local
  useBasicAuth: true
  useTls: true
---
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-collectd-telemetry-cloud1 5
spec:
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/cloud1-telemetry 6
  debug: false
  prefetch: 15000
  serviceType: metrics
  size: 1
  useTimestamp: true

1 Name for Ceilometer notifications for cloud1

2 AMQP address for Ceilometer notifications for cloud1

3 Name for collectd notifications for cloud1

4 AMQP address for collectd notifications for cloud1

5 Name for collectd telemetry for cloud1

6 AMQP address for collectd telemetry for cloud1

4.5.3. Creating the OpenStack environment file

To label traffic according to the cloud of origin, you must create a configuration with cloud-specific instance names. Create an stf-connectors.yaml file and adjust the values of CeilometerQdrEventsConfig and CollectdAmqpInstances to match the AMQP address prefix scheme. For more information, see Section 4.5.1, "Planning AMQP address prefixes".
WARNING
Remove the enable-stf.yaml and ceilometer-write-qdr.yaml environment file references from your overcloud deployment. This configuration is redundant and results in duplicate information being sent from each cloud node.
Procedure
1 Create the stf-connectors.yaml file and modify it to match the AMQP address that you want for this cloud deployment:
resource_registry:
  OS::TripleO::Services::Collectd: /usr/share/openstack-tripleo-heat-templates/deployment/metrics/collectd-container-puppet.yaml
  OS::TripleO::Services::MetricsQdr: /usr/share/openstack-tripleo-heat-templates/deployment/metrics/qdr-container-puppet.yaml
  OS::TripleO::Services::CeilometerAgentCentral: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-central-container-puppet.yaml
  OS::TripleO::Services::CeilometerAgentNotification: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-notification-container-puppet.yaml
  OS::TripleO::Services::CeilometerAgentIpmi: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-ipmi-container-puppet.yaml
  OS::TripleO::Services::ComputeCeilometerAgent: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-compute-container-puppet.yaml
  OS::TripleO::Services::Redis: /usr/share/openstack-tripleo-heat-templates/deployment/database/redis-pacemaker-puppet.yaml

parameter_defaults:
  EnableSTF: true

  EventPipelinePublishers: []
  CeilometerEnablePanko: false
  CeilometerQdrPublishEvents: true
  CeilometerQdrEventsConfig:
    driver: amqp
    topic: cloud1-event 1

  CollectdConnectionType: amqp1
  CollectdAmqpInterval: 5
  CollectdDefaultPollingInterval: 5

  CollectdAmqpInstances:
    cloud1-notify: 2
      notify: true
      format: JSON
      presettle: false
    cloud1-telemetry: 3
      format: JSON
      presettle: true

  MetricsQdrAddresses:
    - prefix: collectd
      distribution: multicast
    - prefix: anycast/ceilometer
      distribution: multicast

  MetricsQdrSSLProfiles:
    - name: sslProfile

  MetricsQdrConnectors:
    - host: stf-default-interconnect-5671-service-telemetry.apps.infra.watch 4
      port: 443
      role: edge
      verifyHostname: false
      sslProfile: sslProfile

1 Define the topic for Ceilometer events. This value is the address format of anycast/ceilometer/cloud1-event.sample. Ensure that the naming convention in the stf-connectors.yaml file aligns with the spec.amqpUrl field in the Smart Gateway configuration; for example, configure the CeilometerQdrEventsConfig.topic field to a value of cloud1-event.

2 Define the topic for collectd events. This value is the format of collectd/cloud1-notify.

3 Define the topic for collectd metrics. This value is the format of collectd/cloud1-telemetry.

4 Adjust the MetricsQdrConnectors host to the address of the STF route.

2 Save the file in a directory for custom environment files, for example /home/stack/custom_templates/

3 Source the authentication file:

[stack@undercloud-0 ~]$ source stackrc

(undercloud) [stack@undercloud-0 ~]$

4 Include the stf-connectors.yaml file in the overcloud deployment command, along with any other environment files relevant to your environment:

(undercloud) [stack@undercloud-0 ~]$ openstack overcloud deploy --templates /usr/share/openstack-tripleo-heat-templates -e /home/stack/custom_templates/stf-connectors.yaml

Additional resources

For information about validating the deployment, see Section 3.3.3, "Validating client-side installation".

4.5.4. Querying metrics data from multiple clouds

Data stored in Prometheus has a service label attached according to the Smart Gateway it was scraped from. You can use this label to query data from a specific cloud.

To query data from a specific cloud, use a Prometheus promql query that matches the associated service label, for example: collectd_uptime{service="stf-default-collectd-telemetry-cloud1-smartgateway"}
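You can also run the same query against the Prometheus HTTP API; a sketch, assuming <prometheus-route-address> is the Prometheus route host reported by oc get routes:

$ curl -G 'https://<prometheus-route-address>/api/v1/query' --data-urlencode 'query=collectd_uptime{service="stf-default-collectd-telemetry-cloud1-smartgateway"}'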
4.6. EPHEMERAL STORAGE

Use ephemeral storage to run Service Telemetry Framework (STF) without persistently storing data in your Red Hat OpenShift Container Platform (OCP) cluster. Ephemeral storage is not recommended in a production environment due to the volatility of the data in the platform, even when it operates correctly and as designed. For example, restarting a pod or rescheduling the workload to another node results in the loss of any local data written since the pod started.

If you enable ephemeral storage in STF, the Service Telemetry Operator does not add the relevant storage sections to the data storage components' manifests.

4.6.1. Configuring ephemeral storage

To configure STF for ephemeral storage, add storageEphemeralEnabled: true to the ServiceTelemetry object in OCP. You can add storageEphemeralEnabled: true at installation time or, if you already deployed STF, complete the following steps:
Procedure
1 Log in to Red Hat OpenShift Container Platform
2 Change to the service-telemetry namespace
3 Edit the ServiceTelemetry object
4 Add the storageEphemeralEnabled: true parameter to the spec section
5 Save your changes and close the object
oc project service-telemetry
$ oc edit ServiceTelemetry stf-default
spec:
  eventsEnabled: true
  metricsEnabled: true
  storageEphemeralEnabled: true
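On a fresh deployment, one way to confirm that the data storage components are running without persistent volumes is to list the persistent volume claims in the namespace; a minimal sketch (pre-existing claims from an earlier persistent deployment would still appear):

$ oc get pvc -n service-telemetry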
APPENDIX A. COLLECTD PLUG-INS

This section contains a complete list of collectd plug-ins and configurations for Red Hat OpenStack Platform 16.0.
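The parameters listed below are hiera keys that you can set from a TripleO environment file, using the standard ExtraConfig mechanism shown in Chapter 3; a minimal sketch with hypothetical values:

parameter_defaults:
  ExtraConfig:
    # Report CPU usage as percentages, polled every 10 seconds (hypothetical values)
    collectd::plugin::cpu::valuespercentage: true
    collectd::plugin::cpu::interval: 10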
collectd-aggregation

collectd::plugin::aggregation::aggregators
collectd::plugin::aggregation::interval

collectd-amqp1

collectd-apache

collectd::plugin::apache::instances (ex: localhost => url => "http://localhost/mod_status?auto")
collectd::plugin::apache::interval

collectd-apcups

collectd-battery

collectd::plugin::battery::values_percentage
collectd::plugin::battery::report_degraded
collectd::plugin::battery::query_state_fs
collectd::plugin::battery::interval

collectd-ceph

collectd::plugin::ceph::daemons
collectd::plugin::ceph::longrunavglatency
collectd::plugin::ceph::convertspecialmetrictypes

collectd-cgroups

collectd::plugin::cgroups::ignore_selected
collectd::plugin::cgroups::interval

collectd-conntrack

None

collectd-contextswitch

collectd::plugin::contextswitch::interval

collectd-cpu
collectd::plugin::cpu::reportbystate
collectd::plugin::cpu::reportbycpu
collectd::plugin::cpu::valuespercentage
collectd::plugin::cpu::reportnumcpu
collectd::plugin::cpu::reportgueststate
collectd::plugin::cpu::subtractgueststate
collectd::plugin::cpu::interval

collectd-cpufreq

None

collectd-cpusleep

collectd-csv

collectd::plugin::csv::datadir
collectd::plugin::csv::storerates
collectd::plugin::csv::interval

collectd-df

collectd::plugin::df::devices
collectd::plugin::df::fstypes
collectd::plugin::df::ignoreselected
collectd::plugin::df::mountpoints
collectd::plugin::df::reportbydevice
collectd::plugin::df::reportinodes
collectd::plugin::df::reportreserved
collectd::plugin::df::valuesabsolute
collectd::plugin::df::valuespercentage
collectd::plugin::df::interval

collectd-disk

collectd::plugin::disk::disks
collectd::plugin::disk::ignoreselected
collectd::plugin::disk::udevnameattr
collectd::plugin::disk::interval

collectd-entropy

collectd::plugin::entropy::interval

collectd-ethstat

collectd::plugin::ethstat::interfaces
collectd::plugin::ethstat::maps
collectd::plugin::ethstat::mappedonly
collectd::plugin::ethstat::interval

collectd-exec

collectd::plugin::exec::commands
collectd::plugin::exec::commands_defaults
collectd::plugin::exec::globals
collectd::plugin::exec::interval

collectd-fhcount

collectd::plugin::fhcount::valuesabsolute
collectd::plugin::fhcount::valuespercentage
collectd::plugin::fhcount::interval

collectd-filecount

collectd::plugin::filecount::directories
collectd::plugin::filecount::interval

collectd-fscache

None

collectd-hddtemp

collectd::plugin::hddtemp::host
collectd::plugin::hddtemp::port
collectd::plugin::hddtemp::interval

collectd-hugepages

collectd::plugin::hugepages::report_per_node_hp
collectd::plugin::hugepages::report_root_hp
collectd::plugin::hugepages::values_pages
collectd::plugin::hugepages::values_bytes
collectd::plugin::hugepages::values_percentage
collectd::plugin::hugepages::interval

collectd-intel_rdt

collectd-interface

collectd::plugin::interface::interfaces
collectd::plugin::interface::ignoreselected
collectd::plugin::interface::reportinactive
collectd::plugin::interface::interval

collectd-ipc

None

collectd-ipmi

collectd::plugin::ipmi::ignore_selected
collectd::plugin::ipmi::notify_sensor_add
collectd::plugin::ipmi::notify_sensor_remove
collectd::plugin::ipmi::notify_sensor_not_present
collectd::plugin::ipmi::sensors
collectd::plugin::ipmi::interval

collectd-irq

collectd::plugin::irq::irqs
collectd::plugin::irq::ignoreselected
collectd::plugin::irq::interval

collectd-load

collectd::plugin::load::report_relative
collectd::plugin::load::interval

collectd-logfile

collectd::plugin::logfile::log_level
collectd::plugin::logfile::log_file
collectd::plugin::logfile::log_timestamp
collectd::plugin::logfile::print_severity
collectd::plugin::logfile::interval

collectd-madwifi

collectd-mbmon

collectd-md

collectd-memcached

collectd::plugin::memcached::instances
collectd::plugin::memcached::interval

collectd-memory

collectd::plugin::memory::valuesabsolute
collectd::plugin::memory::valuespercentage
collectd::plugin::memory::interval

collectd-multimeter

collectd-mysql

collectd::plugin::mysql::interval

collectd-netlink

collectd::plugin::netlink::interfaces
collectd::plugin::netlink::verboseinterfaces
collectd::plugin::netlink::qdiscs
collectd::plugin::netlink::classes
collectd::plugin::netlink::filters
collectd::plugin::netlink::ignoreselected
collectd::plugin::netlink::interval

collectd-network

collectd::plugin::network::timetolive
collectd::plugin::network::maxpacketsize
collectd::plugin::network::forward
collectd::plugin::network::reportstats
collectd::plugin::network::listeners
collectd::plugin::network::servers
collectd::plugin::network::interval

collectd-nfs

collectd::plugin::nfs::interval

collectd-ntpd

collectd::plugin::ntpd::host
collectd::plugin::ntpd::port
collectd::plugin::ntpd::reverselookups
collectd::plugin::ntpd::includeunitid
collectd::plugin::ntpd::interval

collectd-numa

None

collectd-olsrd

collectd-openvpn

collectd::plugin::openvpn::statusfile
collectd::plugin::openvpn::improvednamingschema
collectd::plugin::openvpn::collectcompression
collectd::plugin::openvpn::collectindividualusers
collectd::plugin::openvpn::collectusercount
collectd::plugin::openvpn::interval

collectd-ovs_events

collectd::plugin::ovs_events::address
collectd::plugin::ovs_events::dispatch
collectd::plugin::ovs_events::interfaces
collectd::plugin::ovs_events::send_notification
collectd::plugin::ovs_events::$port
collectd::plugin::ovs_events::socket

collectd-ovs_stats

collectd::plugin::ovs_stats::address
collectd::plugin::ovs_stats::bridges
collectd::plugin::ovs_stats::port
collectd::plugin::ovs_stats::socket

collectd-ping

collectd::plugin::ping::hosts
collectd::plugin::ping::timeout
collectd::plugin::ping::ttl
collectd::plugin::ping::source_address
collectd::plugin::ping::device
collectd::plugin::ping::max_missed
collectd::plugin::ping::size
collectd::plugin::ping::interval

collectd-powerdns

collectd::plugin::powerdns::interval
collectd::plugin::powerdns::servers
collectd::plugin::powerdns::recursors
collectd::plugin::powerdns::local_socket
collectd::plugin::powerdns::interval

collectd-processes

collectd::plugin::processes::processes
collectd::plugin::processes::process_matches
collectd::plugin::processes::collect_context_switch
collectd::plugin::processes::collect_file_descriptor
collectd::plugin::processes::collect_memory_maps
collectd::plugin::powerdns::interval
collectd-protocols

collectd::plugin::protocols::ignoreselected
collectd::plugin::protocols::values

collectd-python

collectd-serial

collectd-smart

collectd::plugin::smart::disks
collectd::plugin::smart::ignoreselected
collectd::plugin::smart::interval

collectd-snmp_agent

collectd-statsd

collectd::plugin::statsd::host
collectd::plugin::statsd::port
collectd::plugin::statsd::deletecounters
collectd::plugin::statsd::deletetimers
collectd::plugin::statsd::deletegauges
collectd::plugin::statsd::deletesets
collectd::plugin::statsd::countersum
collectd::plugin::statsd::timerpercentile
collectd::plugin::statsd::timerlower
collectd::plugin::statsd::timerupper
collectd::plugin::statsd::timersum
collectd::plugin::statsd::timercount
collectd::plugin::statsd::interval

collectd-swap

collectd::plugin::swap::reportbydevice
collectd::plugin::swap::reportbytes
collectd::plugin::swap::valuesabsolute
collectd::plugin::swap::valuespercentage
collectd::plugin::swap::reportio
collectd::plugin::swap::interval

collectd-syslog

collectd::plugin::syslog::log_level
collectd::plugin::syslog::notify_level
collectd::plugin::syslog::interval

collectd-table

collectd::plugin::table::tables
collectd::plugin::table::interval

collectd-tail

collectd::plugin::tail::files
collectd::plugin::tail::interval

collectd-tail_csv

collectd::plugin::tail_csv::metrics
collectd::plugin::tail_csv::files

collectd-tcpconns

collectd::plugin::tcpconns::localports
collectd::plugin::tcpconns::remoteports
collectd::plugin::tcpconns::listening
collectd::plugin::tcpconns::allportssummary
collectd::plugin::tcpconns::interval

collectd-ted

collectd-thermal

collectd::plugin::thermal::devices
collectd::plugin::thermal::ignoreselected
collectd::plugin::thermal::interval

collectd-threshold
collectd::plugin::threshold::types
collectd::plugin::threshold::plugins
collectd::plugin::threshold::hosts
collectd::plugin::threshold::interval

collectd-turbostat

collectd::plugin::turbostat::core_c_states
collectd::plugin::turbostat::package_c_states
collectd::plugin::turbostat::system_management_interrupt
collectd::plugin::turbostat::digital_temperature_sensor
collectd::plugin::turbostat::tcc_activation_temp
collectd::plugin::turbostat::running_average_power_limit
collectd::plugin::turbostat::logical_core_names

collectd-unixsock

collectd-uptime

collectd::plugin::uptime::interval

collectd-users

collectd::plugin::users::interval

collectd-uuid

collectd::plugin::uuid::uuid_file
collectd::plugin::uuid::interval

collectd-virt

collectd::plugin::virt::connection
collectd::plugin::virt::refresh_interval
collectd::plugin::virt::domain
collectd::plugin::virt::block_device
collectd::plugin::virt::interface_device
collectd::plugin::virt::ignore_selected
collectd::plugin::virt::hostname_format
collectd::plugin::virt::interface_format
collectd::plugin::virt::extra_stats
collectd::plugin::virt::interval

collectd-vmem

collectd::plugin::vmem::verbose
collectd::plugin::vmem::interval

collectd-vserver

collectd-wireless

collectd-write_graphite

collectd::plugin::write_graphite::carbons
collectd::plugin::write_graphite::carbon_defaults
collectd::plugin::write_graphite::globals

collectd-write_kafka

collectd::plugin::write_kafka::kafka_host
collectd::plugin::write_kafka::kafka_port
collectd::plugin::write_kafka::kafka_hosts
collectd::plugin::write_kafka::topics

collectd-write_log

collectd::plugin::write_log::format

collectd-zfs_arc

None
CHAPTER 1 INTRODUCTION TOSERVICE TELEMETRY FRAMEWORK
Service Telemetry Framework (STF) provides automated collection of measurements and data fromremote clients - Red Hat OpenStack Platform or third-party nodes - and transmission of thatinformation to a centralized receiving Red Hat OpenShift Container Platform (OCP) deployment forstorage retrieval and monitoring The data can be either of two types
Metric
a numeric measurement of an application or system
Event
irregular and discrete occurrences that happen in a system
The collection components that are required on the clients are lightweight The multicast message busthat is shared by all clients and the deployment provides fast and reliable data transport Other modularcomponents for receiving and storing data are deployed in containers on OCP
STF provides access to monitoring functions such as alert generation visualization through dashboardsand single source of truth telemetry analysis to support orchestration
11 SERVICE TELEMETRY FRAMEWORK ARCHITECTURE
Service Telemetry Framework (STF) uses the components described in Table 11 ldquoSTF componentsrdquo
Table 11 STF components
Client Component Server(OCP)
yes An AMQP 1x compatible messaging bus to shuttle the metrics to STF forstorage in Prometheus
yes
no Smart Gateway to pick metrics and events from the AMQP 1x bus and todeliver events to ElasticSearch or to provide metrics to Prometheus
yes
no Prometheus as time-series data storage yes
no ElasticSearch as events data storage yes
yes collectd to collect infrastructure metrics and events no
yes Ceilometer to collect Red Hat OpenStack Platform metrics and events no
Figure 11 Service Telemetry Framework architecture overview
Red Hat OpenStack Platform 160 Service Telemetry Framework
4
Figure 11 Service Telemetry Framework architecture overview
NOTE
The Service Telemetry Framework data collection components collectd and Ceilometerand the transport components AMQ Interconnect and Smart Gateway are fullysupported The data storage components Prometheus and ElasticSearch including theOperator artifacts and visualization component Grafana are community-supported andare not officially supported
For metrics on the client side collectd collects high-resolution metrics collectd delivers the data toPrometheus by using the AMQP1 plugin which places the data onto the message bus On the serverside a Golang application called the Smart Gateway takes the data stream from the bus and exposes itas a local scrape endpoint for Prometheus
If you plan to collect and store events collectd or Ceilometer delivers event data to the server side byusing the AMQP1 plugin which places the data onto the message bus Another Smart Gateway writesthe data to the ElasticSearch datastore
Server-side STF monitoring infrastructure consists of the following layers
Service Telemetry Framework 1x (STF)
Red Hat OpenShift Container Platform (OCP)
CHAPTER 1 INTRODUCTION TO SERVICE TELEMETRY FRAMEWORK
5
Infrastructure platform
Figure 12 Server-side STF monitoring infrastructure
For more information about how to deploy Red Hat OpenShift Container Platform see the OCPproduct documentation You can install OCP on cloud platforms or on bare metal For more informationabout STF performance and scaling see httpsaccessredhatcomarticles4907241
NOTE
Do not install OCP on the same infrastructure that you want to monitor
12 INSTALLATION SIZE
The size of your Red Hat OpenShift Container Platform installation depends on the following factors
The number of nodes you want to monitor
The number of metrics you want to collect
The resolution of metrics
The length of time that you want to store the data
Installation of Service Telemetry Framework (STF) depends on the existingRed Hat OpenShift Container Platform environment Ensure that you install monitoring for
Red Hat OpenStack Platform 160 Service Telemetry Framework
6
Red Hat OpenStack Platform on a platform separate from your Red Hat OpenStack Platformenvironment You can install Red Hat OpenShift Container Platform (OCP) on baremetal or othersupported cloud platforms For more information about installing OCP see OpenShift ContainerPlatform 43 Documentation
The size of your OCP environment depends on the infrastructure you select For more informationabout minimum resources requirements when installing OCP on baremetal see Minimum resourcerequirements in the Installing a cluster on bare metal guide For installation requirements of the variouspublic and private cloud platforms which you can install see the corresponding installationdocumentation for your cloud platform of choice
CHAPTER 1 INTRODUCTION TO SERVICE TELEMETRY FRAMEWORK
7
CHAPTER 2 INSTALLING THE CORE COMPONENTS OFSERVICE TELEMETRY FRAMEWORK
Before you install Service Telemetry Framework (STF) ensure thatRed Hat OpenShift Container Platform (OCP) version 4x is running and that you understand the corecomponents of the framework As part of the OCP installation planning process ensure that theadministrator provides persistent storage and enough resources to run the STF components on top ofthe OCP environment
WARNING
Red Hat OpenShift Container Platform version 43 or later is currently required for asuccessful installation of STF
21 THE CORE COMPONENTS OF STF
The following STF core components are managed by Operators
Prometheus and AlertManager
ElasticSearch
Smart Gateway
AMQ Interconnect
Each component has a corresponding Operator that you can use to load the various applicationcomponents and objects
Additional resources
For more information about Operators see the Understanding Operators guide
22 PREPARING YOUR OCP ENVIRONMENT FOR STF
As you prepare your OCP environment for STF you must plan for persistent storage adequateresources and event storage
Ensure that persistent storage is available in your Red Hat OpenShift Container Platform clusterto permit a production grade deployment For more information see Section 221 ldquoPersistentvolumesrdquo
Ensure that enough resources are available to run the Operators and the application containersFor more information see Section 222 ldquoResource allocationrdquo
To install ElasticSearch you must use a community catalog source If you do not want to use acommunity catalog or if you do not want to store events see Section 23 ldquoDeploying STF to theOCP environmentrdquo
STF uses ElasticSearch to store events which requires a larger than normal
Red Hat OpenStack Platform 160 Service Telemetry Framework
8
vmmax_map_count The vmmax_map_count value is set by default inRed Hat OpenShift Container Platform For more information about how to edit the value of vmmax_map_count see Section 223 ldquoNode tuning operatorrdquo
221 Persistent volumes
STF uses persistent storage in OCP to instantiate the volumes dynamically so that Prometheus andElasticSearch can store metrics and events
Additional resources
For more information about configuring persistent storage for OCP see Understanding persistentstorage
2211 Using ephemeral storage
WARNING
You can use ephemeral storage with STF However if you use ephemeral storageyou might experience data loss if a pod is restarted updated or rescheduled ontoanother node Use ephemeral storage only for development or testing and notproduction environments
Procedure
To enable ephemeral storage for STF set storageEphemeralEnabled true in your ServiceTelemetry manifest
Additional resources
For more information about enabling ephemeral storage for STF see Section 461 ldquoConfiguringephemeral storagerdquo
222 Resource allocation
To enable the scheduling of pods within the OCP infrastructure you need resources for thecomponents that are running If you do not allocate enough resources pods remain in a Pending statebecause they cannot be scheduled
The amount of resources that you require to run STF depends on your environment and the number ofnodes and clouds that you want to monitor
Additional resources
For recommendations about sizing for metrics collection seehttpsaccessredhatcomarticles4907241
For information about sizing requirements for ElasticSearch seehttpswwwelasticcoguideencloud-on-k8scurrentk8s-managing-compute-resourceshtml
223 Node tuning operator
CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK
9
STF uses ElasticSearch to store events which requires a larger than normal vmmax_map_count The vmmax_map_count value is set by default in Red Hat OpenShift Container Platform
If you want to edit the value of vmmax_map_count you cannot apply node tuning manually using the sysctl command because Red Hat OpenShift Container Platform manages nodes directly To configurevalues and apply them to the infrastructure you must use the node tuning operator For moreinformation see Using the Node Tuning Operator
In an OCP deployment the default node tuning operator specification provides the required profiles forElasticSearch workloads or pods scheduled on nodes To view the default cluster node tuningspecification run the following command
The output of the default specification is documented at Default profiles set on a cluster Theassignment of profiles is managed in the recommend section where profiles are applied to a node whencertain conditions are met When scheduling ElasticSearch to a node in STF one of the following profilesis applied
openshift-control-plane-es
openshift-node-es
When scheduling an ElasticSearch pod there must be a label present that matches tunedopenshiftioelasticsearch If the label is present one of the two profiles is assigned to the podNo action is required by the administrator if you use the recommended Operator for ElasticSearch Ifyou use a custom-deployed ElasticSearch with STF ensure that you add the tunedopenshiftioelasticsearch label to all scheduled pods
Additional resources
For more information about virtual memory usage by ElasticSearch seehttpswwwelasticcoguideenelasticsearchreferencecurrentvm-max-map-counthtml
For more information about how the profiles are applied to nodes see Custom tuning specification
23 DEPLOYING STF TO THE OCP ENVIRONMENT
You can deploy STF to the OCP environment in one of two ways
Deploy STF and store events with ElasticSearch For more information see Section 231ldquoDeploying STF to the OCP environment with ElasticSearchrdquo
Deploy STF without ElasticSearch and disable events support For more information seeSection 232 ldquoDeploying STF to the OCP environment without ElasticSearchrdquo
231 Deploying STF to the OCP environment with ElasticSearch
Complete the following tasks
1 Section 233 ldquoCreating a namespacerdquo
2 Section 234 ldquoCreating an OperatorGrouprdquo
3 Section 235 ldquoEnabling the OperatorHubio Community Catalog Sourcerdquo
oc get Tuneddefault -o yaml -n openshift-cluster-node-tuning-operator
Red Hat OpenStack Platform 160 Service Telemetry Framework
10
4 Section 236 ldquoEnabling Red Hat STF Operator Sourcerdquo
5 Section 237 ldquoSubscribing to the AMQ Certificate Manager Operatorrdquo
6 Section 238 ldquoSubscribing to the Elastic Cloud on Kubernetes Operatorrdquo
7 Section 239 ldquoSubscribing to the Service Telemetry Operatorrdquo
8 Section 2310 ldquoCreating a ServiceTelemetry object in OCPrdquo
232 Deploying STF to the OCP environment without ElasticSearch
Complete the following tasks
1 Section 233 ldquoCreating a namespacerdquo
2 Section 234 ldquoCreating an OperatorGrouprdquo
3 Section 236 ldquoEnabling Red Hat STF Operator Sourcerdquo
4 Section 237 ldquoSubscribing to the AMQ Certificate Manager Operatorrdquo
5 Section 239 ldquoSubscribing to the Service Telemetry Operatorrdquo
6 Section 2310 ldquoCreating a ServiceTelemetry object in OCPrdquo
233 Creating a namespace
Create a namespace to hold the STF components The service-telemetry namespace is usedthroughout the documentation
Procedure
Enter the following command
234 Creating an OperatorGroup
Create an OperatorGroup in the namespace so that you can schedule the Operator pods
Procedure
Enter the following command
oc new-project service-telemetry
oc apply -f - ltltEOFapiVersion operatorscoreoscomv1kind OperatorGroupmetadata name service-telemetry-operator-group namespace service-telemetryspec targetNamespaces - service-telemetryEOF
CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK
11
Additional resources
For more information see OperatorGroups
235 Enabling the OperatorHubio Community Catalog Source
Before you install ElasticSearch you must have access to the resources on the OperatorHubioCommunity Catalog Source
Procedure
Enter the following command
236 Enabling Red Hat STF Operator Source
Before you deploy STF on Red Hat OpenShift Container Platform you must enable the operatorsource
Procedure
1 Install an OperatorSource that contains the Service Telemetry Operator and the SmartGateway Operator
2 To validate the creation of your OperatorSource use the oc get operatorsources command A
oc apply -f - ltltEOFapiVersion operatorscoreoscomv1alpha1kind CatalogSourcemetadata name operatorhubio-operators namespace openshift-marketplacespec sourceType grpc image quayiooperator-frameworkupstream-community-operatorslatest displayName OperatorHubio Operators publisher OperatorHubioEOF
oc apply -f - ltltEOFapiVersion operatorscoreoscomv1kind OperatorSourcemetadata labels opsrc-provider redhat-operators-stf name redhat-operators-stf namespace openshift-marketplacespec authorizationToken displayName Red Hat STF Operators endpoint httpsquayiocnr publisher Red Hat registryNamespace redhat-operators-stf type appregistryEOF
Red Hat OpenStack Platform 160 Service Telemetry Framework
12
2 To validate the creation of your OperatorSource use the oc get operatorsources command Asuccessful import results in the MESSAGE field returning a result of The object has been successfully reconciled
3 To validate that the Operators are available from the catalog use the oc get packagemanifestcommand
237 Subscribing to the AMQ Certificate Manager Operator
You must subscribe to the AMQ Certificate Manager Operator before you deploy the other STFcomponents because the AMQ Certificate Manager Operator runs globally-scoped and is notcompatible with the dependency management of Operator Lifecycle Manager when used with othernamespace-scoped operators
Procedure
1 Subscribe to the AMQ Certificate Manager Operator create the subscription and validate theAMQ7 Certificate Manager
NOTE
The AMQ Certificate Manager is installed globally for all namespaces so the namespace value provided is openshift-operators You might not see your amq7-cert-managerv100 ClusterServiceVersion in the service-telemetrynamespace for a few minutes until the processing executes against thenamespace
2 To validate your ClusterServiceVersion use the oc get csv command Ensure that amq7-cert-
$ oc get -nopenshift-marketplace operatorsource redhat-operators-stf
NAME TYPE ENDPOINT REGISTRY DISPLAYNAME PUBLISHER STATUS MESSAGEredhat-operators-stf appregistry httpsquayiocnr redhat-operators-stf Red Hat STF Operators Red Hat Succeeded The object has been successfully reconciled
$ oc get packagemanifests | grep Red Hat STF
smartgateway-operator Red Hat STF Operators 2m50sservicetelemetry-operator Red Hat STF Operators 2m50s
oc apply -f - ltltEOFapiVersion operatorscoreoscomv1alpha1kind Subscriptionmetadata name amq7-cert-manager namespace openshift-operatorsspec channel alpha installPlanApproval Automatic name amq7-cert-manager source redhat-operators sourceNamespace openshift-marketplaceEOF
CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK
13
2 To validate your ClusterServiceVersion use the oc get csv command Ensure that amq7-cert-managerv100 has a phase Succeeded
238 Subscribing to the Elastic Cloud on Kubernetes Operator
Before you install the Service Telemetry Operator and if you plan to store events in ElasticSearch youmust enable the Elastic Cloud Kubernetes Operator
Procedure
1 Apply the following manifest to your OCP environment to enable the Elastic Cloud onKubernetes Operator
2 To verify that the ClusterServiceVersion for ElasticSearch Cloud on Kubernetes succeededenter the oc get csv command
239 Subscribing to the Service Telemetry Operator
To instantiate an STF instance create the ServiceTelemetry object to allow the Service TelemetryOperator to create the environment
Procedure
1 To create the Service Telemetry Operator subscription enter the oc apply -f command
$ oc get --namespace openshift-operators csv
NAME DISPLAY VERSION REPLACES PHASEamq7-cert-managerv100 Red Hat Integration - AMQ Certificate Manager 100 Succeeded
oc apply -f - ltltEOFapiVersion operatorscoreoscomv1alpha1kind Subscriptionmetadata name elastic-cloud-eck namespace service-telemetryspec channel stable installPlanApproval Automatic name elastic-cloud-eck source operatorhubio-operators sourceNamespace openshift-marketplaceEOF
$ oc get csv
NAME DISPLAY VERSION REPLACES PHASEelastic-cloud-eckv110 Elastic Cloud on Kubernetes 110 elastic-cloud-eckv101 Succeeded
oc apply -f - ltltEOFapiVersion operatorscoreoscomv1alpha1
Red Hat OpenStack Platform 160 Service Telemetry Framework
14
2 To validate the Service Telemetry Operator and the dependent operators enter the followingcommand
2310 Creating a ServiceTelemetry object in OCP
To deploy the Service Telemetry Framework you must create an instance of ServiceTelemetry in OCPBy default eventsEnabled is set to false If you do not want to store events in ElasticSearch ensurethat eventsEnabled is set to false For more information see Section 232 ldquoDeploying STF to the OCPenvironment without ElasticSearchrdquo
The following core parameters are available for a ServiceTelemetry manifest
Table 21 Core parameters for a ServiceTelemetry manifest
Parameter Description Default Value
eventsEnabled Enable events support in STFRequires prerequisite steps toensure ElasticSearch can bestarted For more information seeSection 238 ldquoSubscribing to theElastic Cloud on KubernetesOperatorrdquo
false
metricsEnabled Enable metrics support in STF true
kind Subscriptionmetadata name servicetelemetry-operator namespace service-telemetryspec channel stable installPlanApproval Automatic name servicetelemetry-operator source redhat-operators-stf sourceNamespace openshift-marketplaceEOF
$ oc get csv --namespace service-telemetryNAME DISPLAY VERSION REPLACES PHASEamq7-cert-managerv100 Red Hat Integration - AMQ Certificate Manager 100 Succeededamq7-interconnect-operatorv120 Red Hat Integration - AMQ Interconnect 120 Succeededelastic-cloud-eckv110 Elastic Cloud on Kubernetes 110 elastic-cloud-eckv101 Succeededprometheusoperator0370 Prometheus Operator 0370 prometheusoperator0320 Succeededservice-telemetry-operatorv102 Service Telemetry Operator 102 service-telemetry-operatorv101 Succeededsmart-gateway-operatorv101 Smart Gateway Operator 101 smart-gateway-operatorv100 Succeeded
CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK
15
highAvailabilityEnabled Enable high availability in STF Formore information see Section 43ldquoHigh availabilityrdquo
false
storageEphemeralEnabled Enable ephemeral storagesupport in STF For moreinformation see Section 46ldquoEphemeral storagerdquo
false
Parameter Description Default Value
Procedure
1 To store events in ElasticSearch set eventsEnabled to true during deployment
2 To view the STF deployment logs in the Service Telemetry Operator use the oc logscommand
PLAY RECAP localhost ok=37 changed=0 unreachable=0 failed=0 skipped=1 rescued=0 ignored=0
3 View the pods and the status of each pod to determine that all workloads are operatingnominally
NOTE
If you set eventsEnabled true the notification Smart Gateways will Error andCrashLoopBackOff for a period of time before ElasticSearch starts
oc apply -f - ltltEOFapiVersion infrawatchv1alpha1kind ServiceTelemetrymetadata name stf-default namespace service-telemetryspec eventsEnabled true metricsEnabled trueEOF
oc logs $(oc get pod --selector=name=service-telemetry-operator -oname) -c ansible
$ oc get pods
NAME READY STATUS RESTARTS AGEalertmanager-stf-default-0 22 Running 0 26melastic-operator-645dc8b8ff-jwnzt 11 Running 0 88melasticsearch-es-default-0 11 Running 0 26minterconnect-operator-6fd49d9fb9-4bl92 11 Running 0
Red Hat OpenStack Platform 160 Service Telemetry Framework
16
24 REMOVING STF FROM THE OCP ENVIRONMENT
Remove STF from an OCP environment if you no longer require the STF functionality
Complete the following tasks
1 Section 241 ldquoDeleting the namespacerdquo
2 Section 242 ldquoRemoving the OperatorSourcerdquo
241 Deleting the namespace
To remove the operational resources for STF from OCP delete the namespace
Procedure
1 Run the oc delete command
2 Verify that the resources have been deleted from the namespace
242 Removing the OperatorSource
If you do not expect to install Service Telemetry Framework again delete the OperatorSourceWhen you remove the OperatorSource PackageManifests related to STF are removed from theOperator Lifecycle Manager catalog
Procedure
1 Delete the OperatorSource
46mprometheus-operator-bf7d97fb9-kwnlx 11 Running 0 46mprometheus-stf-default-0 33 Running 0 26mservice-telemetry-operator-54f4c99d9b-k7ll6 22 Running 0 46msmart-gateway-operator-7ff58bcf94-66rvx 22 Running 0 46mstf-default-ceilometer-notification-smartgateway-6675df547q4lbj 11 Running 0 26mstf-default-collectd-notification-smartgateway-698c87fbb7-xj528 11 Running 0 26mstf-default-collectd-telemetry-smartgateway-79c967c8f7-9hsqn 11 Running 0 26mstf-default-interconnect-7458fd4d69-nqbfs 11 Running 0 26m
oc delete project service-telemetry
$ oc get allNo resources found
CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK
17
2 Verify that the STF PackageManifests are removed from the platform If successful thefollowing command returns no result
3 If you enabled the OperatorHubio Community Catalog Source during the installationprocess and you no longer need this catalog source delete it
Additional resources
For more information about the OperatorHubio Community Catalog Source see Section 23ldquoDeploying STF to the OCP environmentrdquo
$ oc delete --namespace=openshift-marketplace operatorsource redhat-operators-stfoperatorsourceoperatorscoreoscom redhat-operators-stf deleted
$ oc get packagemanifests | grep Red Hat STF
$ oc delete --namespace=openshift-marketplace catalogsource operatorhubio-operatorscatalogsourceoperatorscoreoscom operatorhubio-operators deleted
Red Hat OpenStack Platform 160 Service Telemetry Framework
18
CHAPTER 3 COMPLETING THESERVICE TELEMETRY FRAMEWORK CONFIGURATION
31 CONNECTING RED HAT OPENSTACK PLATFORM TOSERVICE TELEMETRY FRAMEWORK
To collect metrics events or both and to send them to the Service Telemetry Framework (STF)storage domain you must configure the Red Hat OpenStack Platform overcloud to enable datacollection and transport
To deploy data collection and transport to STF on Red Hat OpenStack Platform cloud nodes thatemploy routed L3 domains such as distributed compute node (DCN) or spine-leaf see Section 32ldquoDeploying to non-standard network topologiesrdquo
32 DEPLOYING TO NON-STANDARD NETWORK TOPOLOGIES
If your nodes are on a separate network from the default InternalApi network you must makeconfiguration adjustments so that AMQ Interconnect can transport data to theService Telemetry Framework (STF) server instance This scenario is typical in a spine-leaf or aDCN topology For more information about DCN configuration see the Spine Leaf Networkingguide
If you use STF with Red Hat OpenStack Platform 160 and plan to monitor your Ceph Block orObject storage nodes you must make configuration changes that are similar to the configurationchanges that you make to the spine-leaf and DCN network configuration To monitor Ceph nodesuse the CephStorageExtraConfig parameter to define which network interface to load into theAMQ Interconnect and collectd configuration files
Similarly you must specify BlockStorageExtraConfig and ObjectStorageExtraConfig parameters ifyour environment uses Block and Object storage roles
The deployment of a spine-leaf topology involves creating roles and networks then assigningthose networks to the available roles When you configure data collection and transport for STF foran Red Hat OpenStack Platform deployment the default network for roles is InternalApi For CephBlock and Object storage roles the default network is Storage Because a spine-leaf configurationcan result in different networks being assigned to different Leaf groupings and those names aretypically unique additional configuration is required in the parameter_defaults section of theRed Hat OpenStack Platform environment files
Procedure
1. Document which networks are available for each of the Leaf roles. For examples of network name definitions, see Creating a network data file in the Spine Leaf Networking guide. For more information about the creation of the Leaf groupings (roles) and assignment of the networks to those groupings, see Creating a roles data file in the Spine Leaf Networking guide.
CephStorageExtraConfig:
  tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('storage')}"
  tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('storage')}"
  tripleo::profile::base::ceilometer::agent::notification::notifier_host_addr: "%{hiera('storage')}"
2. Add the following configuration example to the ExtraConfig section for each of the leaf roles. In this example, internal_api_subnet is the value defined in the name_lower parameter of your network definition (with _subnet appended to the name for Leaf 0), and is the network to which the ComputeLeaf0 leaf role is connected. In this case, the network identification of 0 corresponds to the Compute role for leaf 0, and represents a value that is different from the default internal API network name.

For the ComputeLeaf0 leaf role, specify extra configuration to perform a hiera lookup to determine which network interface for a particular network to assign to the collectd AMQP host parameter. Perform the same configuration for the AMQ Interconnect listener address parameter.

Additional leaf roles typically replace _subnet with _leafN, where N represents a unique identifier for the leaf.

This example configuration is on a CephStorage leaf role:
3.3. CONFIGURING RED HAT OPENSTACK PLATFORM OVERCLOUD FOR SERVICE TELEMETRY FRAMEWORK
To configure the Red Hat OpenStack Platform overcloud, you must configure the data collection applications and the data transport to STF, and deploy the overcloud.

To configure the Red Hat OpenStack Platform overcloud, complete the following tasks:

1. Section 3.3.1, "Retrieving the AMQ Interconnect route address"

2. Section 3.3.2, "Configuring the STF connection for the overcloud"

3. Section 3.3.3, "Validating client-side installation"
3.3.1. Retrieving the AMQ Interconnect route address
When you configure the Red Hat OpenStack Platform overcloud for STF, you must provide the AMQ Interconnect route address in the STF connection file.
Procedure
1. Log in to your Red Hat OpenShift Container Platform (OCP) environment.

2. In the service-telemetry project, retrieve the AMQ Interconnect route address:
ComputeLeaf0ExtraConfig:
  tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('internal_api_subnet')}"
  tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('internal_api_subnet')}"

ComputeLeaf1ExtraConfig:
  tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('internal_api_leaf1')}"
  tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('internal_api_leaf1')}"

CephStorageLeaf0ExtraConfig:
  tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('storage_subnet')}"
  tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('storage_subnet')}"
NOTE
If your STF installation differs from the documentation, ensure that you retrieve the correct AMQ Interconnect route address.
3.3.2. Configuring the STF connection for the overcloud
To configure the STF connection, you must create a file that contains the connection configuration of the AMQ Interconnect for the overcloud to the STF deployment. Enable the collection of events and storage of the events in STF, and deploy the overcloud.
Procedure
1. Log in to the Red Hat OpenStack Platform undercloud as the stack user.

2. Create a configuration file called stf-connectors.yaml in the /home/stack directory.
IMPORTANT
The Service Telemetry Operator simplifies the deployment of all data ingestion and data storage components for single cloud deployments. To share the data storage domain with multiple clouds, see Section 4.5, "Configuring multiple clouds".
3. In the stf-connectors.yaml file, configure the MetricsQdrConnectors address to connect the AMQ Interconnect on the overcloud to the STF deployment.

Add the CeilometerQdrPublishEvents: true parameter to enable collection and transport of Ceilometer events to STF.

Replace the host parameter with the value of HOST/PORT that you retrieved in Section 3.3.1, "Retrieving the AMQ Interconnect route address".

4. Add the following files to your Red Hat OpenStack Platform director deployment to set up collectd and AMQ Interconnect:

the stf-connectors.yaml environment file

the enable-stf.yaml file that ensures that the environment is being used during the overcloud deployment
$ oc get routes -ogo-template='{{ range .items }}{{printf "%s\n" .spec.host }}{{ end }}' | grep "\-5671"
stf-default-interconnect-5671-service-telemetry.apps.infra.watch
parameter_defaults:
  EventPipelinePublishers: []
  CeilometerQdrPublishEvents: true
  MetricsQdrConnectors:
    - host: stf-default-interconnect-5671-service-telemetry.apps.infra.watch
      port: 443
      role: edge
      sslProfile: sslProfile
      verifyHostname: false
the ceilometer-write-qdr.yaml file that ensures that Ceilometer telemetry is sent to STF

5. Deploy the Red Hat OpenStack Platform overcloud.
3.3.3. Validating client-side installation
To validate data collection from the STF storage domain, query the data sources for delivered data. To validate individual nodes in the Red Hat OpenStack Platform deployment, connect to the console using SSH.
Procedure
1. Log in to an overcloud node, for example, controller-0.

2. Ensure that the metrics_qdr container is running on the node:

3. Return the internal network address on which AMQ Interconnect is running, for example, 172.17.1.44 listening on port 5666:

4. Return a list of connections to the local AMQ Interconnect:
openstack overcloud deploy <other arguments>
  --templates /usr/share/openstack-tripleo-heat-templates \
  --environment-file <other-environment-files> \
  --environment-file /usr/share/openstack-tripleo-heat-templates/environments/metrics/ceilometer-write-qdr.yaml \
  --environment-file /usr/share/openstack-tripleo-heat-templates/environments/enable-stf.yaml \
  --environment-file /home/stack/stf-connectors.yaml
$ sudo podman container inspect --format '{{.State.Status}}' metrics_qdr
running
$ sudo podman exec -it metrics_qdr cat /etc/qpid-dispatch/qdrouterd.conf
listener {
    host: 172.17.1.44
    port: 5666
    authenticatePeer: no
    saslMechanisms: ANONYMOUS
}
$ sudo podman exec -it metrics_qdr qdstat --bus=172.17.1.44:5666 --connections
Connections
  id   host                                                                   container                                                                                                  role    dir  security                            authentication  tenant
  ============================================================================================================================================================
  1    stf-default-interconnect-5671-service-telemetry.apps.infra.watch:443   stf-default-interconnect-7458fd4d69-bgzfb                                                                  edge    out  TLSv1.2(DHE-RSA-AES256-GCM-SHA384)  anonymous-user
  12   172.17.1.44:60290                                                      openstack.org/om/container/controller-0/ceilometer-agent-notification/25/5c02cee550f143ec9ea030db5cccba14  normal  in   no-security                         no-auth
  16   172.17.1.44:36408                                                      metrics                                                                                                    normal  in   no-security                         anonymous-user
  899  172.17.1.44:39500                                                      10a2e99d-1b8a-4329-b48c-4335e5f75c84                                                                       normal  in   no-security                         no-auth
There are four connections:

Outbound connection to STF

Inbound connection from collectd

Inbound connection from Ceilometer

Inbound connection from our qdstat client

The outbound STF connection is provided to the MetricsQdrConnectors host parameter and is the route for the STF storage domain. The other hosts are internal network addresses of the client connections to this AMQ Interconnect.

5. To ensure that messages are being delivered, list the links, and view the _edge address in the deliv column for delivery of messages:
$ sudo podman exec -it metrics_qdr qdstat --bus=172.17.1.44:5666 --links
Router Links
  type      dir  conn id  id    peer  class   addr                  phs  cap  pri  undel  unsett  deliv    presett  psdrop  acc  rej  rel  mod  delay  rate
  ===========================================================================================================================================================
  endpoint  out  1        5           local   _edge                      250  0    0      0       2979926  2979924  0       0    0    2    0    0      0
  endpoint  in   1        6                                              250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  in   1        7                                              250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  out  1        8                                              250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  in   1        9                                              250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  out  1        10                                             250  0    0      0       911      911      0       0    0    0    0    911    0
  endpoint  in   1        11                                             250  0    0      0       0        911      0       0    0    0    0    0      0
  endpoint  out  12       32          local   temp.lSY6Mcicol4J2Kp       250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  in   16       41                                             250  0    0      0       2979924  2979924  0       0    0    0    0    0      0
  endpoint  in   912      1834        mobile  $management           0    250  0    0      0       1        0        0       1    0    0    0    0      0
  endpoint  out  912      1835        local   temp.9Ok2resI9tmt+CT       250  0    0      0       0        0        0       0    0    0    0    0      0
6. To list the addresses from Red Hat OpenStack Platform nodes to STF, connect to OCP to get the AMQ Interconnect pod name and list the connections. List the available AMQ Interconnect pods:

7. Connect to the pod and run the qdstat --connections command to list the known connections:

In this example, there are three edge connections from the Red Hat OpenStack Platform nodes, with connection id 22, 23, and 24.

8. To view the number of messages delivered by the network, use each address with the oc exec command:
$ oc get pods -l application=stf-default-interconnect
NAME                                        READY   STATUS    RESTARTS   AGE
stf-default-interconnect-7458fd4d69-bgzfb   1/1     Running   0          6d21h
$ oc exec -it stf-default-interconnect-7458fd4d69-bgzfb -- qdstat --connections
2020-04-21 18:25:47.243852 UTC
stf-default-interconnect-7458fd4d69-bgzfb
Connections
  id  host                container                                                               role    dir  security                                 authentication  tenant  last dlv      uptime
  ======================================================================================================================================================================================================
  1   10.129.2.21:43062   rcv[stf-default-collectd-telemetry-smartgateway-79c967c8f7-kq4qv]       normal  in   no-security                              anonymous-user          000:00:00:00  006:21:50:25
  2   10.130.0.52:55754   rcv[stf-default-ceilometer-notification-smartgateway-6675df547-mbjk5]   normal  in   no-security                              anonymous-user          000:21:25:57  006:21:49:36
  3   10.130.0.51:43110   rcv[stf-default-collectd-notification-smartgateway-698c87fbb7-f28v6]    normal  in   no-security                              anonymous-user          000:21:36:53  006:21:49:09
  22  10.128.0.1:51948    Router.ceph-0.redhat.local                                              edge    in   TLSv1/SSLv3(DHE-RSA-AES256-GCM-SHA384)   anonymous-user          000:00:00:03  000:22:08:43
  23  10.128.0.1:51950    Router.compute-0.redhat.local                                           edge    in   TLSv1/SSLv3(DHE-RSA-AES256-GCM-SHA384)   anonymous-user          000:00:00:03  000:22:08:43
  24  10.128.0.1:52082    Router.controller-0.redhat.local                                        edge    in   TLSv1/SSLv3(DHE-RSA-AES256-GCM-SHA384)   anonymous-user          000:00:00:00  000:22:08:34
  27  127.0.0.1:42202     c2f541c1-4c97-4b37-a189-a396c08fb079                                    normal  in   no-security                              no-auth                 000:00:00:00  000:00:00:00
$ oc exec -it stf-default-interconnect-7458fd4d69-bgzfb -- qdstat --address
2020-04-21 18:20:10.293258 UTC
stf-default-interconnect-7458fd4d69-bgzfb
Router Addresses
  class   addr                              phs  distrib    pri  local  remote  in       out      thru  fallback
  ====================================================================================================================
  mobile  anycast/ceilometer/event.sample   0    balanced   -    1      0       1553     1553     0     0
  mobile  collectd/notify                   0    multicast  -    1      0       10       10       0     0
  mobile  collectd/telemetry                0    multicast  -    1      0       7798049  7798049  0     0
CHAPTER 4. ADVANCED FEATURES

The following optional features can provide additional functionality to the Service Telemetry Framework (STF):
Customizing the deployment. For more information, see Section 4.1, "Customizing the deployment".

Alerts. For more information, see Section 4.2, "Alerts".

High availability. For more information, see Section 4.3, "High availability".

Dashboards. For more information, see Section 4.4, "Dashboards".

Multiple clouds. For more information, see Section 4.5, "Configuring multiple clouds".

Ephemeral storage. For more information, see Section 4.6, "Ephemeral storage".
4.1. CUSTOMIZING THE DEPLOYMENT
The Service Telemetry Operator watches for a ServiceTelemetry manifest to load into Red Hat OpenShift Container Platform (OCP). The Operator then creates other objects in memory, which results in the dependent Operators creating the workloads they are responsible for managing.
WARNING
When you override the manifest, you must provide the entire manifest contents, including object names or namespaces. There is no dynamic parameter substitution when you override a manifest.
To override a manifest successfully with Service Telemetry Framework (STF), deploy a default environment using the core options only. For more information about the core options, see Section 2.3.10, "Creating a ServiceTelemetry object in OCP". When you deploy STF, use the oc get command to retrieve the default deployed manifest. When you use a manifest that was originally generated by the Service Telemetry Operator, the manifest is compatible with the other objects that are managed by the Operators.

For example, when the metricsEnabled: true parameter is configured in the ServiceTelemetry manifest, the Service Telemetry Operator requests components for metrics retrieval and storage using the default manifests. In some cases, you might want to override the default manifest. For more information, see Section 4.1.1, "Manifest override parameters".
4.1.1. Manifest override parameters
This table describes the available parameters that you can use to override a manifest, along with the corresponding retrieval commands.

Table 4.1. Manifest override parameters
Override parameter | Description | Retrieval command

alertmanagerManifest | Override the Alertmanager object creation. The Prometheus Operator watches for Alertmanager objects. | oc get alertmanager stf-default -oyaml

alertmanagerConfigManifest | Override the Secret that contains the Alertmanager configuration. The Prometheus Operator uses a secret named alertmanager-{alertmanager-name}, for example, stf-default, to provide the alertmanager.yaml configuration to Alertmanager. | oc get secret alertmanager-stf-default -oyaml

elasticsearchManifest | Override the ElasticSearch object creation. The Elastic Cloud on Kubernetes Operator watches for ElasticSearch objects. | oc get elasticsearch elasticsearch -oyaml

interconnectManifest | Override the Interconnect object creation. The AMQ Interconnect Operator watches for Interconnect objects. | oc get interconnect stf-default-interconnect -oyaml

prometheusManifest | Override the Prometheus object creation. The Prometheus Operator watches for Prometheus objects. | oc get prometheus stf-default -oyaml

servicemonitorManifest | Override the ServiceMonitor object creation. The Prometheus Operator watches for ServiceMonitor objects. | oc get servicemonitor stf-default -oyaml

smartgatewayCollectdMetricsManifest | Override the SmartGateway object creation for collectd metrics. The Smart Gateway Operator watches for SmartGateway objects. | oc get smartgateway stf-default-collectd-telemetry -oyaml

smartgatewayCollectdEventsManifest | Override the SmartGateway object creation for collectd events. The Smart Gateway Operator watches for SmartGateway objects. | oc get smartgateway stf-default-collectd-notification -oyaml

smartgatewayCeilometerEventsManifest | Override the SmartGateway object creation for Ceilometer events. The Smart Gateway Operator watches for SmartGateway objects. | oc get smartgateway stf-default-ceilometer-notification -oyaml
4.1.2. Overriding a managed manifest
Edit the ServiceTelemetry object and provide a parameter and manifest. For a list of available manifest override parameters, see Section 4.1, "Customizing the deployment". The default ServiceTelemetry object is stf-default. Use oc get servicetelemetry to list the available STF deployments.
TIP
The oc edit command loads the default system editor. To override the default editor, pass or set the environment variable EDITOR to the preferred editor. For example, EDITOR=nano oc edit servicetelemetry stf-default.
Procedure
1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

3. Load the ServiceTelemetry object into an editor:

4. To modify the ServiceTelemetry object, provide a manifest override parameter and the contents of the manifest to write to OCP instead of the defaults provided by STF.
NOTE
The trailing pipe (|) after entering the manifest override parameter indicates that the value provided is multi-line.
oc project service-telemetry
oc edit servicetelemetry stf-default
$ oc edit servicetelemetry stf-default
apiVersion: infra.watch/v1alpha1
kind: ServiceTelemetry
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"infra.watch/v1alpha1","kind":"ServiceTelemetry","metadata":{"annotations":{},"name":"stf-default","namespace":"service-telemetry"},"spec":{"metricsEnabled":true}}
  creationTimestamp: "2020-04-14T20:29:42Z"
1. Manifest override parameter is defined in the spec of the ServiceTelemetry object.

2. End of the manifest override content.

5. Save and close.
4.2. ALERTS
You create alert rules in Prometheus and alert routes in Alertmanager. Alert rules in Prometheus servers send alerts to an Alertmanager, which manages the alerts. Alertmanager can silence, inhibit, or aggregate alerts, and send notifications using email, on-call notification systems, or chat platforms.

To create an alert, complete the following tasks:

1. Create an alert rule in Prometheus. For more information, see Section 4.2.1, "Creating an alert rule in Prometheus".
  generation: 1
  name: stf-default
  namespace: service-telemetry
  resourceVersion: "1949423"
  selfLink: /apis/infra.watch/v1alpha1/namespaces/service-telemetry/servicetelemetrys/stf-default
  uid: d058bc41-1bb0-49f5-9a8b-642f4b8adb95
spec:
  metricsEnabled: true
  smartgatewayCollectdMetricsManifest: | 1
    apiVersion: smartgateway.infra.watch/v2alpha1
    kind: SmartGateway
    metadata:
      name: stf-default-collectd-telemetry
      namespace: service-telemetry
    spec:
      amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/telemetry
      debug: true
      prefetch: 15000
      serviceType: metrics
      size: 1
      useTimestamp: true 2
status:
  conditions:
  - ansibleResult:
      changed: 0
      completion: 2020-04-14T20:32:19.079508
      failures: 0
      ok: 52
      skipped: 1
    lastTransitionTime: "2020-04-14T20:29:59Z"
    message: Awaiting next reconciliation
    reason: Successful
    status: "True"
    type: Running
2. Create an alert route in Alertmanager. For more information, see Section 4.2.3, "Creating an alert route in Alertmanager".
Additional resources

For more information about alerts or notifications with Prometheus and Alertmanager, see https://prometheus.io/docs/alerting/overview/

To view an example set of alerts that you can use with Service Telemetry Framework (STF), see https://github.com/infrawatch/service-telemetry-operator/tree/master/deploy/alerts
4.2.1. Creating an alert rule in Prometheus
Prometheus evaluates alert rules to trigger notifications. If the rule condition returns an empty result set, the condition is false. Otherwise, the rule is true and it triggers an alert.
Procedure
1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

3. Create a PrometheusRule object that contains the alert rule. The Prometheus Operator loads the rule into Prometheus:

4. To verify that the rules have been loaded into Prometheus by the Operator, create a pod with access to curl:

5. Run curl to access the prometheus-operated service to return the rules loaded into memory:
oc project service-telemetry
oc apply -f - <<EOF
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  creationTimestamp: null
  labels:
    prometheus: stf-default
    role: alert-rules
  name: prometheus-alarm-rules
  namespace: service-telemetry
spec:
  groups:
    - name: ./openstack.rules
      rules:
        - alert: Metric Listener down
          expr: collectd_qpid_router_status < 1 # To change the rule, edit the value of the expr parameter.
EOF
oc run curl --generator=run-pod/v1 --image=radial/busyboxplus:curl -i --tty
[ root@curl:/ ]$ curl prometheus-operated:9090/api/v1/rules
{"status":"success","data":{"groups":
6. To verify that the output shows the rules loaded into the PrometheusRule object, for example, that the output contains the defined ./openstack.rules, exit from the pod:

7. Clean up the environment by deleting the curl pod:
Additional resources

For more information on alerting, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md
4.2.2. Configuring custom alerts
You can add custom alerts to the PrometheusRule object that you created in Section 4.2.1, "Creating an alert rule in Prometheus".
Procedure
1. Use the oc edit command:

2. Edit the PrometheusRules manifest.

3. Save and close.
Additional resources

For more information about configuring alerting rules, see https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/

For more information about PrometheusRules objects, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md
4.2.3. Creating an alert route in Alertmanager
Use Alertmanager to deliver alerts to an external system, such as email, IRC, or other notification channel. The Prometheus Operator manages the Alertmanager configuration as a Red Hat OpenShift Container Platform (OCP) secret. STF by default deploys a basic configuration that results in no receivers:
{"name":"./openstack.rules","file":"/etc/prometheus/rules/prometheus-stf-default-rulefiles-0/service-telemetry-prometheus-alarm-rules.yaml","rules":[{"name":"Metric Listener down","query":"collectd_qpid_router_status \u003c 1","duration":0,"labels":{},"annotations":{},"alerts":[],"health":"ok","type":"alerting"}],"interval":30}]}}
[ root@curl:/ ]$ exit
$ oc delete pod curl
pod "curl" deleted
oc edit prometheusrules prometheus-alarm-rules
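The manifest opened by this command contains the rule that you created in Section 4.2.1, "Creating an alert rule in Prometheus". As an illustration only, the following sketch appends a second, hypothetical rule to the same group; the alert name and expression are assumptions, not part of any default rule set:

spec:
  groups:
    - name: ./openstack.rules
      rules:
        - alert: Metric Listener down
          expr: collectd_qpid_router_status < 1
        - alert: Collectd uptime metrics missing # hypothetical custom alert
          expr: absent(collectd_uptime)          # fires when no collectd_uptime series exists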
alertmanager.yaml: |-
  global:
    resolve_timeout: 5m
  route:
To deploy a custom Alertmanager route with STF, you must pass an alertmanagerConfigManifest parameter to the Service Telemetry Operator, which results in an updated secret managed by the Prometheus Operator. For more information, see Section 4.1.2, "Overriding a managed manifest".
Procedure
1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

3. Edit the ServiceTelemetry object for your STF deployment:

4. Add a new parameter, alertmanagerConfigManifest, and the Secret object contents to define the alertmanager.yaml configuration for Alertmanager.
NOTE
This loads the default template that is already managed by the Service Telemetry Operator. To validate that the changes are populating correctly, change a value, return the alertmanager-stf-default secret, and verify that the new value is loaded into memory, for example, changing the value global: resolve_timeout from 5m to 10m.
    group_by: ['job']
    group_wait: 30s
    group_interval: 5m
    repeat_interval: 12h
    receiver: 'null'
  receivers:
  - name: 'null'
oc project service-telemetry
oc edit servicetelemetry stf-default
apiVersion: infra.watch/v1alpha1
kind: ServiceTelemetry
metadata:
  name: stf-default
  namespace: service-telemetry
spec:
  metricsEnabled: true
  alertmanagerConfigManifest: |
    apiVersion: v1
    kind: Secret
    metadata:
      name: alertmanager-stf-default
      namespace: service-telemetry
    type: Opaque
    stringData:
      alertmanager.yaml: |-
        global:
          resolve_timeout: 10m
        route:
5. Verify that the configuration was applied to the secret:

6. To verify that the configuration has been loaded into Alertmanager, create a pod with access to curl:

7. Run curl against the alertmanager-operated service to retrieve the status and configYAML contents, and review whether the supplied configuration matches the configuration loaded into Alertmanager:

8. Verify that the configYAML field contains the expected changes. Exit from the pod:

9. To clean up the environment, delete the curl pod:
          group_by: ['job']
          group_wait: 30s
          group_interval: 5m
          repeat_interval: 12h
          receiver: 'null'
        receivers:
        - name: 'null'
$ oc get secret alertmanager-stf-default -o go-template='{{index .data "alertmanager.yaml" | base64decode }}'
global:
  resolve_timeout: 10m
route:
  group_by: ['job']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h
  receiver: 'null'
receivers:
- name: 'null'
oc run curl --generator=run-pod/v1 --image=radial/busyboxplus:curl -i --tty
[ root@curl:/ ]$ curl alertmanager-operated:9093/api/v1/status
{"status":"success","data":{"configYAML":"global:\n  resolve_timeout: 10m\n  http_config: {}\n  smtp_hello: localhost\n  smtp_require_tls: true\n  pagerduty_url: https://events.pagerduty.com/v2/enqueue\n  hipchat_api_url: https://api.hipchat.com/\n  opsgenie_api_url: https://api.opsgenie.com/\n  wechat_api_url: https://qyapi.weixin.qq.com/cgi-bin/\n  victorops_api_url: https://alert.victorops.com/integrations/generic/20131114/alert/\nroute:\n  receiver: \"null\"\n  group_by:\n  - job\n  group_wait: 30s\n  group_interval: 5m\n  repeat_interval: 12h\nreceivers:\n- name: \"null\"\ntemplates: []\n
[ root@curl:/ ]$ exit
$ oc delete pod curl
pod "curl" deleted
Additional resources

For more information about the Red Hat OpenShift Container Platform secret and the Prometheus operator, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md
4.3. HIGH AVAILABILITY
High availability is the ability of Service Telemetry Framework (STF) to rapidly recover from failures in its component services. Although Red Hat OpenShift Container Platform (OCP) restarts a failed pod if nodes are available to schedule the workload, this recovery process might take more than one minute, during which time events and metrics are lost. A high availability configuration includes multiple copies of STF components, reducing recovery time to approximately 2 seconds. To protect against failure of an OCP node, deploy STF to an OCP cluster with three or more nodes.
NOTE
STF is not yet a fully fault tolerant system. Delivery of metrics and events during the recovery period is not guaranteed.
Enabling high availability has the following effects:

Two AMQ Interconnect pods run instead of the default 1.

Three ElasticSearch pods run instead of the default 1.

Recovery time from a lost pod in either of these services reduces to approximately 2 seconds.
4.3.1. Configuring high availability
To configure STF for high availability, add highAvailabilityEnabled: true to the ServiceTelemetry object in OCP. You can set this parameter at installation time or, if you already deployed STF, complete the following steps:
Procedure
1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

3. Use the oc command to edit the ServiceTelemetry object:

4. Add highAvailabilityEnabled: true to the spec section:

5. Save your changes and close the object.
oc project service-telemetry
$ oc edit ServiceTelemetry
spec:
  eventsEnabled: true
  metricsEnabled: true
  highAvailabilityEnabled: true
4.4. DASHBOARDS
Use the third-party application Grafana to visualize system-level metrics gathered by collectd for each individual host node. For more information about configuring collectd, see Section 3.3, "Configuring Red Hat OpenStack Platform overcloud for Service Telemetry Framework".
4.4.1. Setting up Grafana to host the dashboard
Grafana is not included in the default Service Telemetry Framework (STF) deployment, so you must deploy the Grafana Operator from OperatorHub.io.
Procedure
1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

3. Clone the dashboard repository:

4. Deploy the Grafana operator:

5. To verify that the operator launched successfully, run the oc get csv command. If the value of the PHASE column is Succeeded, the operator launched successfully:

6. Launch a Grafana instance:

7. Verify that the Grafana instance deployed:

8. Create the datasource and dashboard resources:
oc project service-telemetry
git clone https://github.com/infrawatch/dashboards
cd dashboards
oc create -f deploy/subscription.yaml
$ oc get csv
NAME                      DISPLAY            VERSION   REPLACES   PHASE
grafana-operator.v3.2.0   Grafana Operator   3.2.0                Succeeded
$ oc create -f deploy/grafana.yaml
$ oc get pod -l app=grafana
NAME                                  READY   STATUS    RESTARTS   AGE
grafana-deployment-7fc7848b56-sbkhv   1/1     Running   0          1m
9. Verify that the resources installed correctly:

10. Navigate to https://<grafana-route-address> in a web browser. Use the oc get routes command to retrieve the Grafana route address:

11. To view the dashboard, click Dashboards and Manage.
Additional resources
For more information about enabling the OperatorHub.io catalog source, see Section 2.3.5, "Enabling the OperatorHub.io Community Catalog Source".
4.4.1.1. Viewing and editing queries
Procedure
1. Log in to Red Hat OpenShift Container Platform. To view and edit queries, log in as the admin user.

2. Change to the service-telemetry namespace:

3. To retrieve the default username and password, describe the Grafana object using oc describe:
4.4.2. The Grafana infrastructure dashboard
The infrastructure dashboard shows metrics for a single node at a time. Select a node from the upper left corner of the dashboard.
4.4.2.1. Top panels
Title | Unit | Description
oc create -f deploy/datasource.yaml -f deploy/rhos-dashboard.yaml
$ oc get grafanadashboards
NAME             AGE
rhos-dashboard   7d21h

$ oc get grafanadatasources
NAME                                  AGE
service-telemetry-grafanadatasource   1m
oc get routes
oc project service-telemetry
oc describe grafana service-telemetry-grafana
Current Global Alerts | - | Current alerts fired by Prometheus

Recent Global Alerts | - | Recently fired alerts in 5m time steps

Status Panel | - | Node status: up, down, unavailable

Uptime | s/m/h/d/M/Y | Total operational time of node

CPU Cores | cores | Total number of cores

Memory | bytes | Total memory

Disk Size | bytes | Total storage size

Processes | processes | Total number of processes listed by type

Load Average | processes | Load average represents the average number of running and uninterruptible processes residing in the kernel execution queue
4.4.2.2. Networking panels
Panels that display the network interfaces of the node.
Panel | Unit | Description
Physical Interfaces Ingress Errors | errors | Total errors with incoming data

Physical Interfaces Egress Errors | errors | Total errors with outgoing data

Physical Interfaces Ingress Error Rates | errors/s | Rate of incoming data errors

Physical Interfaces Egress Error Rates | errors/s | Rate of outgoing data errors

Physical Interfaces Packets Ingress | pps | Incoming packets per second

Physical Interfaces Packets Egress | pps | Outgoing packets per second

Physical Interfaces Data Ingress | bytes/s | Incoming data rates

Physical Interfaces Data Egress | bytes/s | Outgoing data rates

Physical Interfaces Drop Rate Ingress | pps | Incoming packets drop rate

Physical Interfaces Drop Rate Egress | pps | Outgoing packets drop rate
4.4.2.3. CPU panels
Panels that display CPU usage of the node.

Panel | Unit | Description

Current CPU Usage | percent | Instantaneous usage at the time of the last query

Aggregate CPU Usage | percent | Average non-idle CPU activity of all cores on a node

Aggr CPU Usage by Type | percent | Shows time spent for each type of thread, averaged across all cores
4.4.2.4. Memory panels
Panels that display memory usage on the node.

Panel | Unit | Description

Memory Used | percent | Amount of memory being used at time of last query

Huge Pages Used | hugepages | Number of hugepages being used

Memory
4.4.2.5. Disk/file system
Panels that display space used on disk.

Panel | Unit | Description | Notes

Disk Space Usage | percent | Total disk use at time of last query

Inode Usage | percent | Total inode use at time of last query

Aggregate Disk Space Usage | bytes | Total disk space used and reserved | Because this query relies on the df plugin, temporary file systems that do not necessarily use disk space are included in the results. The query tries to filter out most of these, but it might not be exhaustive.

Disk Traffic | bytes/s | Shows rates for both reading and writing

Disk Load | percent | Approximate percentage of total disk bandwidth being used. The weighted I/O time series includes the backlog that might be accumulating. For more information, see the collectd disk plugin docs.

Operations/s | ops/s | Operations done per second

Average I/O Operation Time | seconds | Average time each I/O operation took to complete. This average is not accurate; see the collectd disk plugin docs.
4.5. CONFIGURING MULTIPLE CLOUDS
You can configure multiple Red Hat OpenStack Platform clouds to target a single instance of Service Telemetry Framework (STF):

1. Plan the AMQP address prefixes that you want to use for each cloud. For more information, see Section 4.5.1, "Planning AMQP address prefixes".

2. Deploy metrics and events consumer Smart Gateways for each cloud to listen on the corresponding address prefixes. For more information, see Section 4.5.2, "Deploying Smart Gateways".

3. Configure each cloud to send its metrics and events to STF on the correct address. For more information, see Section 4.5.3, "Creating the OpenStack environment file".
Figure 4.1. Two Red Hat OpenStack Platform clouds connect to STF
4.5.1. Planning AMQP address prefixes
By default, Red Hat OpenStack Platform nodes get data through two data collectors: collectd and Ceilometer. These components send telemetry data, or notifications, to the respective AMQP addresses, for example, collectd/telemetry, where STF Smart Gateways listen on those addresses for monitoring data.

To support multiple clouds, and to identify which cloud generated the monitoring data, configure each cloud to send data to a unique address. Prefix a cloud identifier to the second part of the address. The following list shows some example addresses and identifiers:
collectd/cloud1-telemetry

collectd/cloud1-notify

anycast/ceilometer/cloud1-event.sample

collectd/cloud2-telemetry

collectd/cloud2-notify

anycast/ceilometer/cloud2-event.sample

collectd/us-east-1-telemetry

collectd/us-west-3-telemetry
4.5.2. Deploying Smart Gateways
You must deploy a Smart Gateway for each of the data collection types for each cloud: one for collectd metrics, one for collectd events, and one for Ceilometer events. Configure each of the Smart Gateways to listen on the AMQP address that you define for the corresponding cloud.

When you deploy STF for the first time, Smart Gateway manifests are created that define the initial Smart Gateways for a single cloud. When deploying Smart Gateways for multiple cloud support, you deploy multiple Smart Gateways for each of the data collection types that handle the metrics and the events data for each cloud. The initial Smart Gateways act as a template to create additional Smart Gateways, along with any authentication information required to connect to the data stores.
Procedure
1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:
oc project service-telemetry
3. Use the initially deployed Smart Gateways as a template for additional Smart Gateways. List the currently deployed Smart Gateways with the oc get smartgateways command. For example, if you deployed STF with metricsEnabled: true and eventsEnabled: true, the following Smart Gateways are displayed in the output:
$ oc get smartgateways
NAME                                  AGE
stf-default-ceilometer-notification   14d
stf-default-collectd-notification     14d
stf-default-collectd-telemetry        14d
4. Retrieve the manifests for each Smart Gateway and store the contents in a temporary file, which you can modify later and use to create the new set of Smart Gateways:
truncate --size 0 /tmp/cloud1-smartgateways.yaml && for sg in $(oc get smartgateways -oname)
do
  echo "---" >> /tmp/cloud1-smartgateways.yaml
  oc get $sg -oyaml --export >> /tmp/cloud1-smartgateways.yaml
done
5. Modify the Smart Gateway manifests in the /tmp/cloud1-smartgateways.yaml file. Adjust the metadata.name and spec.amqpUrl fields to include the cloud identifier from your schema. For more information, see Section 4.5.1, "Planning AMQP address prefixes". To view example Smart Gateway manifests, see Section 4.5.2.1, "Example manifests".

6. Deploy your new Smart Gateways:
oc apply -f /tmp/cloud1-smartgateways.yaml
7. Verify that each Smart Gateway is running. This can take several minutes, depending on the number of Smart Gateways:
oc get po -l app=smart-gateway
4.5.2.1. Example manifests
IMPORTANT
The content in the following examples might be different to the file content in your deployment. Copy the manifests in your deployment.
Ensure that the name and amqpUrl parameters of each Smart Gateway match the names that you want to use for your clouds. For more information, see Section 4.5.1, "Planning AMQP address prefixes".
NOTE
Your output might have some additional metadata parameters that you can remove from the manifests that you load into OCP.
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-ceilometer-notification-cloud1 1
spec:
  amqpDataSource: ceilometer
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/anycast/ceilometer/cloud1-event.sample 2
  debug: false
  elasticPass: fkzfhghw
  elasticUrl: https://elasticsearch-es-http.service-telemetry.svc.cluster.local:9200
  elasticUser: elastic
  resetIndex: false
  serviceType: events
  size: 1
  tlsCaCert: /config/certs/ca.crt
  tlsClientCert: /config/certs/tls.crt
  tlsClientKey: /config/certs/tls.key
  tlsServerName: elasticsearch-es-http.service-telemetry.svc.cluster.local
  useBasicAuth: true
  useTls: true
---
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-collectd-notification-cloud1 3
spec:
  amqpDataSource: collectd
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/cloud1-notify 4
  debug: false
  elasticPass: fkzfhghw
  elasticUrl: https://elasticsearch-es-http.service-telemetry.svc.cluster.local:9200
  elasticUser: elastic
  resetIndex: false
  serviceType: events
  size: 1
  tlsCaCert: /config/certs/ca.crt
  tlsClientCert: /config/certs/tls.crt
  tlsClientKey: /config/certs/tls.key
  tlsServerName: elasticsearch-es-http.service-telemetry.svc.cluster.local
  useBasicAuth: true
  useTls: true
---
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-collectd-telemetry-cloud1 5
spec:
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/cloud1-telemetry 6
  debug: false
  prefetch: 15000
  serviceType: metrics
  size: 1
  useTimestamp: true
1. Name for Ceilometer notifications for cloud1

2. AMQP address for Ceilometer notifications for cloud1

3. Name for collectd notifications for cloud1

4. AMQP address for collectd notifications for cloud1

5. Name for collectd telemetry for cloud1

6. AMQP address for collectd telemetry for cloud1
4.5.3. Creating the OpenStack environment file
To label traffic according to the cloud of origin, you must create a configuration with cloud-specific instance names. Create an stf-connectors.yaml file and adjust the values of CeilometerQdrEventsConfig and CollectdAmqpInstances to match the AMQP address prefix scheme. For more information, see Section 4.5.1, "Planning AMQP address prefixes".
WARNING
Remove the enable-stf.yaml and ceilometer-write-qdr.yaml environment file references from your overcloud deployment. This configuration is redundant and results in duplicate information being sent from each cloud node.
Procedure
1. Create the stf-connectors.yaml file, and modify it to match the AMQP address that you want for this cloud deployment:
resource_registry:
  OS::TripleO::Services::Collectd: /usr/share/openstack-tripleo-heat-templates/deployment/metrics/collectd-container-puppet.yaml
  OS::TripleO::Services::MetricsQdr: /usr/share/openstack-tripleo-heat-templates/deployment/metrics/qdr-container-puppet.yaml
  OS::TripleO::Services::CeilometerAgentCentral: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-central-container-puppet.yaml
  OS::TripleO::Services::CeilometerAgentNotification: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-notification-container-puppet.yaml
  OS::TripleO::Services::CeilometerAgentIpmi: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-ipmi-container-puppet.yaml
  OS::TripleO::Services::ComputeCeilometerAgent: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-compute-container-puppet.yaml
  OS::TripleO::Services::Redis: /usr/share/openstack-tripleo-heat-templates/deployment/database/redis-pacemaker-puppet.yaml
parameter_defaults:
  EnableSTF: true

  EventPipelinePublishers: []
  CeilometerEnablePanko: false
  CeilometerQdrPublishEvents: true
  CeilometerQdrEventsConfig:
    driver: amqp
    topic: cloud1-event 1

  CollectdConnectionType: amqp1
  CollectdAmqpInterval: 5
  CollectdDefaultPollingInterval: 5

  CollectdAmqpInstances:
    cloud1-notify: 2
      notify: true
      format: JSON
      presettle: false
    cloud1-telemetry: 3
      format: JSON
      presettle: true

  MetricsQdrAddresses:
    - prefix: collectd
      distribution: multicast
    - prefix: anycast/ceilometer
      distribution: multicast

  MetricsQdrSSLProfiles:
    - name: sslProfile

  MetricsQdrConnectors:
    - host: stf-default-interconnect-5671-service-telemetry.apps.infra.watch 4
      port: 443
      role: edge
      verifyHostname: false
      sslProfile: sslProfile
1. Define the topic for Ceilometer events. This value is the address format of anycast/ceilometer/cloud1-event.sample.

2. Define the topic for collectd events. This value is the format of collectd/cloud1-notify.

3. Define the topic for collectd metrics. This value is the format of collectd/cloud1-telemetry.

4. Adjust the MetricsQdrConnectors host to the address of the STF route.
2. Ensure that the naming convention in the stf-connectors.yaml file aligns with the spec.amqpUrl field in the Smart Gateway configuration. For example, configure the CeilometerQdrEventsConfig.topic field to a value of cloud1-event, as shown in the sketch after this step.
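One way to cross-check the alignment, assuming that the Smart Gateways from Section 4.5.2, "Deploying Smart Gateways" are already applied and that you save the environment file under /home/stack/custom_templates/, is to compare the addresses on both sides:

# Addresses that the Smart Gateways listen on
$ oc get smartgateways -oyaml | grep amqpUrl

# Topics that the cloud publishes to
$ grep -E 'topic|cloud1-' /home/stack/custom_templates/stf-connectors.yaml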
3. Save the file in a directory for custom environment files, for example, /home/stack/custom_templates/.
4. Source the authentication file:

[stack@undercloud-0 ~]$ source stackrc

(undercloud) [stack@undercloud-0 ~]$
5. Include the stf-connectors.yaml file in the overcloud deployment command, along with any other environment files relevant to your environment:

(undercloud) [stack@undercloud-0 ~]$ openstack overcloud deploy --templates /usr/share/openstack-tripleo-heat-templates \
  -e /home/stack/custom_templates/stf-connectors.yaml
Additional resources

For information about validating the deployment, see Section 3.3.3, "Validating client-side installation".
4.5.4. Querying metrics data from multiple clouds
Data stored in Prometheus has a service label attached according to the Smart Gateway it was scraped from. You can use this label to query data from a specific cloud.

To query data from a specific cloud, use a Prometheus promql query that matches the associated service label, for example: collectd_uptime{service="stf-default-collectd-telemetry-cloud1-smartgateway"}.
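For example, from a pod with access to curl (see Section 4.2.1, "Creating an alert rule in Prometheus"), you can run the same query against the prometheus-operated service. The label value is illustrative and must match the name of your cloud's Smart Gateway:

[ root@curl:/ ]$ curl -s 'http://prometheus-operated:9090/api/v1/query' \
  --data-urlencode 'query=collectd_uptime{service="stf-default-collectd-telemetry-cloud1-smartgateway"}'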
4.6. EPHEMERAL STORAGE
Use ephemeral storage to run Service Telemetry Framework (STF) without persistently storing data in your Red Hat OpenShift Container Platform (OCP) cluster. Ephemeral storage is not recommended in a production environment due to the volatility of the data in the platform when operating correctly and as designed. For example, restarting a pod or rescheduling the workload to another node results in the loss of any local data written since the pod started.

If you enable ephemeral storage in STF, the Service Telemetry Operator does not add the relevant storage sections to the data storage components' manifests.
4.6.1. Configuring ephemeral storage
To configure STF for ephemeral storage, add storageEphemeralEnabled: true to the ServiceTelemetry object in OCP. You can add storageEphemeralEnabled: true at installation time or, if you already deployed STF, complete the following steps:
Procedure
1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

3. Edit the ServiceTelemetry object:

4. Add the storageEphemeralEnabled: true parameter to the spec section:

5. Save your changes and close the object.
oc project service-telemetry
$ oc edit ServiceTelemetry stf-default
spec:
  eventsEnabled: true
  metricsEnabled: true
  storageEphemeralEnabled: true
APPENDIX A. COLLECTD PLUG-INS

This section contains a complete list of collectd plug-ins and configurations for Red Hat OpenStack Platform 16.0.
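These parameter names are hiera keys that you can set from director, typically through ExtraConfig in a custom environment file. The following sketch is illustrative only; the chosen plug-in and values are assumptions, not defaults:

parameter_defaults:
  CollectdExtraPlugins:
    - df                                        # enable the collectd-df plug-in
  ExtraConfig:
    collectd::plugin::df::ignoreselected: false
    collectd::plugin::df::mountpoints:
      - /var                                    # illustrative mount point to report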
collectd-aggregation

collectd::plugin::aggregation::aggregators

collectd::plugin::aggregation::interval

collectd-amqp1

collectd-apache

collectd::plugin::apache::instances (ex: localhost ⇒ url ⇒ http://localhost/mod_status?auto)

collectd::plugin::apache::interval

collectd-apcups

collectd-battery

collectd::plugin::battery::values_percentage

collectd::plugin::battery::report_degraded

collectd::plugin::battery::query_state_fs

collectd::plugin::battery::interval

collectd-ceph

collectd::plugin::ceph::daemons

collectd::plugin::ceph::longrunavglatency

collectd::plugin::ceph::convertspecialmetrictypes

collectd-cgroups

collectd::plugin::cgroups::ignore_selected

collectd::plugin::cgroups::interval

collectd-conntrack

None

collectd-contextswitch

collectd::plugin::contextswitch::interval

collectd-cpu
collectd::plugin::cpu::reportbystate

collectd::plugin::cpu::reportbycpu

collectd::plugin::cpu::valuespercentage

collectd::plugin::cpu::reportnumcpu

collectd::plugin::cpu::reportgueststate

collectd::plugin::cpu::subtractgueststate

collectd::plugin::cpu::interval

collectd-cpufreq

None

collectd-cpusleep

collectd-csv

collectd::plugin::csv::datadir

collectd::plugin::csv::storerates

collectd::plugin::csv::interval

collectd-df

collectd::plugin::df::devices

collectd::plugin::df::fstypes

collectd::plugin::df::ignoreselected

collectd::plugin::df::mountpoints

collectd::plugin::df::reportbydevice

collectd::plugin::df::reportinodes

collectd::plugin::df::reportreserved

collectd::plugin::df::valuesabsolute

collectd::plugin::df::valuespercentage

collectd::plugin::df::interval

collectd-disk

collectd::plugin::disk::disks

collectd::plugin::disk::ignoreselected

collectd::plugin::disk::udevnameattr

collectd::plugin::disk::interval
collectd-entropy

collectd::plugin::entropy::interval

collectd-ethstat

collectd::plugin::ethstat::interfaces

collectd::plugin::ethstat::maps

collectd::plugin::ethstat::mappedonly

collectd::plugin::ethstat::interval

collectd-exec

collectd::plugin::exec::commands

collectd::plugin::exec::commands_defaults

collectd::plugin::exec::globals

collectd::plugin::exec::interval

collectd-fhcount

collectd::plugin::fhcount::valuesabsolute

collectd::plugin::fhcount::valuespercentage

collectd::plugin::fhcount::interval

collectd-filecount

collectd::plugin::filecount::directories

collectd::plugin::filecount::interval

collectd-fscache

None

collectd-hddtemp

collectd::plugin::hddtemp::host

collectd::plugin::hddtemp::port

collectd::plugin::hddtemp::interval

collectd-hugepages

collectd::plugin::hugepages::report_per_node_hp

collectd::plugin::hugepages::report_root_hp

collectd::plugin::hugepages::values_pages

collectd::plugin::hugepages::values_bytes

collectd::plugin::hugepages::values_percentage

collectd::plugin::hugepages::interval
collectd-intel_rdt

collectd-interface

collectd::plugin::interface::interfaces

collectd::plugin::interface::ignoreselected

collectd::plugin::interface::reportinactive

collectd::plugin::interface::interval

collectd-ipc

None

collectd-ipmi

collectd::plugin::ipmi::ignore_selected

collectd::plugin::ipmi::notify_sensor_add

collectd::plugin::ipmi::notify_sensor_remove

collectd::plugin::ipmi::notify_sensor_not_present

collectd::plugin::ipmi::sensors

collectd::plugin::ipmi::interval

collectd-irq

collectd::plugin::irq::irqs

collectd::plugin::irq::ignoreselected

collectd::plugin::irq::interval

collectd-load

collectd::plugin::load::report_relative

collectd::plugin::load::interval

collectd-logfile

collectd::plugin::logfile::log_level

collectd::plugin::logfile::log_file

collectd::plugin::logfile::log_timestamp

collectd::plugin::logfile::print_severity

collectd::plugin::logfile::interval
collectd-madwifi

collectd-mbmon

collectd-md

collectd-memcached

collectd::plugin::memcached::instances

collectd::plugin::memcached::interval

collectd-memory

collectd::plugin::memory::valuesabsolute

collectd::plugin::memory::valuespercentage

collectd::plugin::memory::interval

collectd-multimeter

collectd-mysql

collectd::plugin::mysql::interval

collectd-netlink

collectd::plugin::netlink::interfaces

collectd::plugin::netlink::verboseinterfaces

collectd::plugin::netlink::qdiscs

collectd::plugin::netlink::classes

collectd::plugin::netlink::filters

collectd::plugin::netlink::ignoreselected

collectd::plugin::netlink::interval

collectd-network

collectd::plugin::network::timetolive

collectd::plugin::network::maxpacketsize

collectd::plugin::network::forward

collectd::plugin::network::reportstats

collectd::plugin::network::listeners

collectd::plugin::network::servers

collectd::plugin::network::interval
collectd-nfs

collectd::plugin::nfs::interval

collectd-ntpd

collectd::plugin::ntpd::host

collectd::plugin::ntpd::port

collectd::plugin::ntpd::reverselookups

collectd::plugin::ntpd::includeunitid

collectd::plugin::ntpd::interval

collectd-numa

None

collectd-olsrd

collectd-openvpn

collectd::plugin::openvpn::statusfile

collectd::plugin::openvpn::improvednamingschema

collectd::plugin::openvpn::collectcompression

collectd::plugin::openvpn::collectindividualusers

collectd::plugin::openvpn::collectusercount

collectd::plugin::openvpn::interval

collectd-ovs_events

collectd::plugin::ovs_events::address

collectd::plugin::ovs_events::dispatch

collectd::plugin::ovs_events::interfaces

collectd::plugin::ovs_events::send_notification

collectd::plugin::ovs_events::$port

collectd::plugin::ovs_events::socket
collectd-ovs_stats

collectd::plugin::ovs_stats::address

collectd::plugin::ovs_stats::bridges

collectd::plugin::ovs_stats::port

collectd::plugin::ovs_stats::socket

collectd-ping

collectd::plugin::ping::hosts

collectd::plugin::ping::timeout

collectd::plugin::ping::ttl

collectd::plugin::ping::source_address

collectd::plugin::ping::device

collectd::plugin::ping::max_missed

collectd::plugin::ping::size

collectd::plugin::ping::interval

collectd-powerdns

collectd::plugin::powerdns::interval

collectd::plugin::powerdns::servers

collectd::plugin::powerdns::recursors

collectd::plugin::powerdns::local_socket

collectd::plugin::powerdns::interval

collectd-processes

collectd::plugin::processes::processes

collectd::plugin::processes::process_matches

collectd::plugin::processes::collect_context_switch

collectd::plugin::processes::collect_file_descriptor

collectd::plugin::processes::collect_memory_maps

collectd::plugin::powerdns::interval
collectd-protocols

collectd::plugin::protocols::ignoreselected

collectd::plugin::protocols::values

collectd-python

collectd-serial

collectd-smart

collectd::plugin::smart::disks

collectd::plugin::smart::ignoreselected

collectd::plugin::smart::interval

collectd-snmp_agent

collectd-statsd

collectd::plugin::statsd::host

collectd::plugin::statsd::port

collectd::plugin::statsd::deletecounters

collectd::plugin::statsd::deletetimers

collectd::plugin::statsd::deletegauges

collectd::plugin::statsd::deletesets

collectd::plugin::statsd::countersum

collectd::plugin::statsd::timerpercentile

collectd::plugin::statsd::timerlower

collectd::plugin::statsd::timerupper

collectd::plugin::statsd::timersum

collectd::plugin::statsd::timercount

collectd::plugin::statsd::interval

collectd-swap

collectd::plugin::swap::reportbydevice

collectd::plugin::swap::reportbytes

collectd::plugin::swap::valuesabsolute

collectd::plugin::swap::valuespercentage

collectd::plugin::swap::reportio

collectd::plugin::swap::interval
collectd-syslog

collectd::plugin::syslog::log_level

collectd::plugin::syslog::notify_level

collectd::plugin::syslog::interval

collectd-table

collectd::plugin::table::tables

collectd::plugin::table::interval

collectd-tail

collectd::plugin::tail::files

collectd::plugin::tail::interval

collectd-tail_csv

collectd::plugin::tail_csv::metrics

collectd::plugin::tail_csv::files

collectd-tcpconns

collectd::plugin::tcpconns::localports

collectd::plugin::tcpconns::remoteports

collectd::plugin::tcpconns::listening

collectd::plugin::tcpconns::allportssummary

collectd::plugin::tcpconns::interval

collectd-ted

collectd-thermal

collectd::plugin::thermal::devices

collectd::plugin::thermal::ignoreselected

collectd::plugin::thermal::interval

collectd-threshold

collectd::plugin::threshold::types

collectd::plugin::threshold::plugins

collectd::plugin::threshold::hosts

collectd::plugin::threshold::interval
collectd-turbostat

collectd::plugin::turbostat::core_c_states

collectd::plugin::turbostat::package_c_states

collectd::plugin::turbostat::system_management_interrupt

collectd::plugin::turbostat::digital_temperature_sensor

collectd::plugin::turbostat::tcc_activation_temp

collectd::plugin::turbostat::running_average_power_limit

collectd::plugin::turbostat::logical_core_names

collectd-unixsock

collectd-uptime

collectd::plugin::uptime::interval

collectd-users

collectd::plugin::users::interval

collectd-uuid

collectd::plugin::uuid::uuid_file

collectd::plugin::uuid::interval

collectd-virt

collectd::plugin::virt::connection

collectd::plugin::virt::refresh_interval

collectd::plugin::virt::domain

collectd::plugin::virt::block_device

collectd::plugin::virt::interface_device

collectd::plugin::virt::ignore_selected

collectd::plugin::virt::hostname_format

collectd::plugin::virt::interface_format

collectd::plugin::virt::extra_stats

collectd::plugin::virt::interval
collectd-vmem

collectd::plugin::vmem::verbose

collectd::plugin::vmem::interval

collectd-vserver

collectd-wireless

collectd-write_graphite

collectd::plugin::write_graphite::carbons

collectd::plugin::write_graphite::carbon_defaults

collectd::plugin::write_graphite::globals

collectd-write_kafka

collectd::plugin::write_kafka::kafka_host

collectd::plugin::write_kafka::kafka_port

collectd::plugin::write_kafka::kafka_hosts

collectd::plugin::write_kafka::topics

collectd-write_log

collectd::plugin::write_log::format

collectd-zfs_arc

None
- Table of Contents
- CHAPTER 1 INTRODUCTION TO SERVICE TELEMETRY FRAMEWORK
-
- 11 SERVICE TELEMETRY FRAMEWORK ARCHITECTURE
- 12 INSTALLATION SIZE
-
- CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK
-
- 21 THE CORE COMPONENTS OF STF
- 22 PREPARING YOUR OCP ENVIRONMENT FOR STF
-
- 221 Persistent volumes
-
- 2211 Using ephemeral storage
-
- 222 Resource allocation
- 223 Node tuning operator
-
- 23 DEPLOYING STF TO THE OCP ENVIRONMENT
-
- 231 Deploying STF to the OCP environment with ElasticSearch
- 232 Deploying STF to the OCP environment without ElasticSearch
- 233 Creating a namespace
- 234 Creating an OperatorGroup
- 235 Enabling the OperatorHubio Community Catalog Source
- 236 Enabling Red Hat STF Operator Source
- 237 Subscribing to the AMQ Certificate Manager Operator
- 238 Subscribing to the Elastic Cloud on Kubernetes Operator
- 239 Subscribing to the Service Telemetry Operator
- 2310 Creating a ServiceTelemetry object in OCP
-
- 24 REMOVING STF FROM THE OCP ENVIRONMENT
-
- 241 Deleting the namespace
- 242 Removing the OperatorSource
-
- CHAPTER 3 COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION
-
- 31 CONNECTING RED HAT OPENSTACK PLATFORM TO SERVICE TELEMETRY FRAMEWORK
- 32 DEPLOYING TO NON-STANDARD NETWORK TOPOLOGIES
- 33 CONFIGURING RED HAT OPENSTACK PLATFORM OVERCLOUD FOR SERVICE TELEMETRY FRAMEWORK
-
- 331 Retrieving the AMQ Interconnect route address
- 332 Configuring the STF connection for the overcloud
- 333 Validating client-side installation
-
- CHAPTER 4 ADVANCED FEATURES
-
- 41 CUSTOMIZING THE DEPLOYMENT
-
- 411 Manifest override parameters
- 412 Overriding a managed manifest
-
- Procedure
-
- 42 ALERTS
-
- Additional resources
- 421 Creating an alert rule in Prometheus
-
- Procedure
- Additional resources
-
- 422 Configuring custom alerts
-
- Procedure
- Additional resources
-
- 423 Creating an alert route in Alertmanager
-
- Procedure
- Additional resources
-
- 43 HIGH AVAILABILITY
-
- 431 Configuring high availability
-
- Procedure
-
- 44 DASHBOARDS
-
- 441 Setting up Grafana to host the dashboard
-
- 4411 Viewing and editing queries
-
- 442 The Grafana infrastructure dashboard
-
- 4421 Top panels
- 4422 Networking panels
- 4423 CPU panels
- 4424 Memory panels
- 4425 Diskfile system
-
- 45 CONFIGURING MULTIPLE CLOUDS
-
- 451 Planning AMQP address prefixes
- 452 Deploying Smart Gateways
-
- Procedure
- 4521 Example manifests
-
- 453 Creating the OpenStack environment file
-
- Procedure
- Additional resources
-
- 454 Querying metrics data from multiple clouds
-
- 46 EPHEMERAL STORAGE
-
- 461 Configuring ephemeral storage
-
- APPENDIX A COLLECTD PLUG-INS
-
CHAPTER 1. INTRODUCTION TO SERVICE TELEMETRY FRAMEWORK
Service Telemetry Framework (STF) provides automated collection of measurements and data from remote clients - Red Hat OpenStack Platform or third-party nodes - and transmission of that information to a centralized, receiving Red Hat OpenShift Container Platform (OCP) deployment for storage, retrieval, and monitoring. The data can be either of two types:
Metric
A numeric measurement of an application or system.
Event
Irregular and discrete occurrences that happen in a system.
The collection components that are required on the clients are lightweight. The multicast message bus that is shared by all clients and the deployment provides fast and reliable data transport. Other modular components for receiving and storing data are deployed in containers on OCP.
STF provides access to monitoring functions such as alert generation, visualization through dashboards, and single source of truth telemetry analysis to support orchestration.
1.1. SERVICE TELEMETRY FRAMEWORK ARCHITECTURE
Service Telemetry Framework (STF) uses the components described in Table 1.1, "STF components".
Table 1.1. STF components

Client | Component | Server (OCP)
yes | An AMQP 1.x compatible messaging bus to shuttle the metrics to STF for storage in Prometheus | yes
no | Smart Gateway to pick metrics and events from the AMQP 1.x bus and to deliver events to ElasticSearch or to provide metrics to Prometheus | yes
no | Prometheus as time-series data storage | yes
no | ElasticSearch as events data storage | yes
yes | collectd to collect infrastructure metrics and events | no
yes | Ceilometer to collect Red Hat OpenStack Platform metrics and events | no
Figure 1.1. Service Telemetry Framework architecture overview
NOTE
The Service Telemetry Framework data collection components, collectd and Ceilometer, and the transport components, AMQ Interconnect and Smart Gateway, are fully supported. The data storage components, Prometheus and ElasticSearch, including the Operator artifacts, and the visualization component, Grafana, are community-supported and are not officially supported.
For metrics, on the client side, collectd collects high-resolution metrics. collectd delivers the data to Prometheus by using the AMQP1 plugin, which places the data onto the message bus. On the server side, a Golang application called the Smart Gateway takes the data stream from the bus and exposes it as a local scrape endpoint for Prometheus.
If you plan to collect and store events, collectd or Ceilometer delivers event data to the server side by using the AMQP1 plugin, which places the data onto the message bus. Another Smart Gateway writes the data to the ElasticSearch datastore.
Server-side STF monitoring infrastructure consists of the following layers:
Service Telemetry Framework 1x (STF)
Red Hat OpenShift Container Platform (OCP)
Infrastructure platform
Figure 1.2. Server-side STF monitoring infrastructure
For more information about how to deploy Red Hat OpenShift Container Platform, see the OCP product documentation. You can install OCP on cloud platforms or on bare metal. For more information about STF performance and scaling, see https://access.redhat.com/articles/4907241.
NOTE
Do not install OCP on the same infrastructure that you want to monitor
1.2. INSTALLATION SIZE
The size of your Red Hat OpenShift Container Platform installation depends on the following factors:
The number of nodes you want to monitor
The number of metrics you want to collect
The resolution of metrics
The length of time that you want to store the data
Installation of Service Telemetry Framework (STF) depends on the existing Red Hat OpenShift Container Platform environment. Ensure that you install monitoring for Red Hat OpenStack Platform on a platform separate from your Red Hat OpenStack Platform environment. You can install Red Hat OpenShift Container Platform (OCP) on bare metal or other supported cloud platforms. For more information about installing OCP, see the OpenShift Container Platform 4.3 Documentation.
The size of your OCP environment depends on the infrastructure you select. For more information about minimum resource requirements when you install OCP on bare metal, see Minimum resource requirements in the Installing a cluster on bare metal guide. For installation requirements of the various public and private cloud platforms, see the corresponding installation documentation for your cloud platform of choice.
CHAPTER 2. INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK
Before you install Service Telemetry Framework (STF), ensure that Red Hat OpenShift Container Platform (OCP) version 4.x is running and that you understand the core components of the framework. As part of the OCP installation planning process, ensure that the administrator provides persistent storage and enough resources to run the STF components on top of the OCP environment.
WARNING
Red Hat OpenShift Container Platform version 4.3 or later is currently required for a successful installation of STF.
2.1. THE CORE COMPONENTS OF STF
The following STF core components are managed by Operators:
Prometheus and AlertManager
ElasticSearch
Smart Gateway
AMQ Interconnect
Each component has a corresponding Operator that you can use to load the various application components and objects.
Additional resources
For more information about Operators, see the Understanding Operators guide.
2.2. PREPARING YOUR OCP ENVIRONMENT FOR STF
As you prepare your OCP environment for STF, you must plan for persistent storage, adequate resources, and event storage.
Ensure that persistent storage is available in your Red Hat OpenShift Container Platform cluster to permit a production-grade deployment. For more information, see Section 2.2.1, "Persistent volumes".
Ensure that enough resources are available to run the Operators and the application containers. For more information, see Section 2.2.2, "Resource allocation".
To install ElasticSearch, you must use a community catalog source. If you do not want to use a community catalog or if you do not want to store events, see Section 2.3, "Deploying STF to the OCP environment".
STF uses ElasticSearch to store events, which requires a larger than normal vm.max_map_count. The vm.max_map_count value is set by default in Red Hat OpenShift Container Platform. For more information about how to edit the value of vm.max_map_count, see Section 2.2.3, "Node tuning operator".
2.2.1. Persistent volumes
STF uses persistent storage in OCP to instantiate the volumes dynamically so that Prometheus and ElasticSearch can store metrics and events.
Additional resources
For more information about configuring persistent storage for OCP, see Understanding persistent storage.

2.2.1.1. Using ephemeral storage
WARNING
You can use ephemeral storage with STF. However, if you use ephemeral storage, you might experience data loss if a pod is restarted, updated, or rescheduled onto another node. Use ephemeral storage only for development or testing, and not production environments.
Procedure
To enable ephemeral storage for STF, set storageEphemeralEnabled: true in your ServiceTelemetry manifest.
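For reference, a minimal sketch of a ServiceTelemetry manifest with ephemeral storage enabled follows; it reuses the stf-default object name and service-telemetry namespace used throughout this guide:

apiVersion: infra.watch/v1alpha1
kind: ServiceTelemetry
metadata:
  name: stf-default
  namespace: service-telemetry
spec:
  metricsEnabled: true
  storageEphemeralEnabled: true  # volumes become ephemeral; data is lost if a pod restarts or is rescheduled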
Additional resources
For more information about enabling ephemeral storage for STF, see Section 4.6.1, "Configuring ephemeral storage".
2.2.2. Resource allocation
To enable the scheduling of pods within the OCP infrastructure, you need resources for the components that are running. If you do not allocate enough resources, pods remain in a Pending state because they cannot be scheduled.
The amount of resources that you require to run STF depends on your environment and the number of nodes and clouds that you want to monitor.
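As a quick check for under-allocation, the following command lists any pods stuck in the Pending state; this sketch assumes the service-telemetry namespace used throughout this guide:

oc get pods --namespace service-telemetry --field-selector=status.phase=Pending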
Additional resources
For recommendations about sizing for metrics collection, see https://access.redhat.com/articles/4907241.
For information about sizing requirements for ElasticSearch, see https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-managing-compute-resources.html.

2.2.3. Node tuning operator
STF uses ElasticSearch to store events, which requires a larger than normal vm.max_map_count. The vm.max_map_count value is set by default in Red Hat OpenShift Container Platform.
If you want to edit the value of vm.max_map_count, you cannot apply node tuning manually by using the sysctl command, because Red Hat OpenShift Container Platform manages nodes directly. To configure values and apply them to the infrastructure, you must use the node tuning operator. For more information, see Using the Node Tuning Operator.
In an OCP deployment, the default node tuning operator specification provides the required profiles for ElasticSearch workloads or pods scheduled on nodes. To view the default cluster node tuning specification, run the following command:

oc get Tuned/default -o yaml -n openshift-cluster-node-tuning-operator

The output of the default specification is documented at Default profiles set on a cluster. The assignment of profiles is managed in the recommend section, where profiles are applied to a node when certain conditions are met. When scheduling ElasticSearch to a node in STF, one of the following profiles is applied:
openshift-control-plane-es
openshift-node-es
When scheduling an ElasticSearch pod, there must be a label present that matches tuned.openshift.io/elasticsearch. If the label is present, one of the two profiles is assigned to the pod. No action is required by the administrator if you use the recommended Operator for ElasticSearch. If you use a custom-deployed ElasticSearch with STF, ensure that you add the tuned.openshift.io/elasticsearch label to all scheduled pods.
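For a custom-deployed ElasticSearch, a minimal sketch of the required label in a pod definition follows; the pod name and image are illustrative placeholders, not values mandated by STF:

apiVersion: v1
kind: Pod
metadata:
  name: custom-elasticsearch-0                # hypothetical pod name
  labels:
    tuned.openshift.io/elasticsearch: ""      # label that the tuning profiles match on
spec:
  containers:
  - name: elasticsearch
    image: registry.example.com/elasticsearch:7.6.0  # illustrative image reference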
Additional resources
For more information about virtual memory usage by ElasticSearch, see https://www.elastic.co/guide/en/elasticsearch/reference/current/vm-max-map-count.html.
For more information about how the profiles are applied to nodes, see Custom tuning specification.

2.3. DEPLOYING STF TO THE OCP ENVIRONMENT
You can deploy STF to the OCP environment in one of two ways:
Deploy STF and store events with ElasticSearch. For more information, see Section 2.3.1, "Deploying STF to the OCP environment with ElasticSearch".
Deploy STF without ElasticSearch and disable events support. For more information, see Section 2.3.2, "Deploying STF to the OCP environment without ElasticSearch".

2.3.1. Deploying STF to the OCP environment with ElasticSearch
Complete the following tasks:
1. Section 2.3.3, "Creating a namespace"
2. Section 2.3.4, "Creating an OperatorGroup"
3. Section 2.3.5, "Enabling the OperatorHub.io Community Catalog Source"
4. Section 2.3.6, "Enabling Red Hat STF Operator Source"
5. Section 2.3.7, "Subscribing to the AMQ Certificate Manager Operator"
6. Section 2.3.8, "Subscribing to the Elastic Cloud on Kubernetes Operator"
7. Section 2.3.9, "Subscribing to the Service Telemetry Operator"
8. Section 2.3.10, "Creating a ServiceTelemetry object in OCP"

2.3.2. Deploying STF to the OCP environment without ElasticSearch
Complete the following tasks:
1. Section 2.3.3, "Creating a namespace"
2. Section 2.3.4, "Creating an OperatorGroup"
3. Section 2.3.6, "Enabling Red Hat STF Operator Source"
4. Section 2.3.7, "Subscribing to the AMQ Certificate Manager Operator"
5. Section 2.3.9, "Subscribing to the Service Telemetry Operator"
6. Section 2.3.10, "Creating a ServiceTelemetry object in OCP"
2.3.3. Creating a namespace
Create a namespace to hold the STF components. The service-telemetry namespace is used throughout the documentation.
Procedure
Enter the following command:

oc new-project service-telemetry
2.3.4. Creating an OperatorGroup
Create an OperatorGroup in the namespace so that you can schedule the Operator pods.
Procedure
Enter the following command:

oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: service-telemetry-operator-group
  namespace: service-telemetry
spec:
  targetNamespaces:
  - service-telemetry
EOF
Additional resources
For more information, see OperatorGroups.

2.3.5. Enabling the OperatorHub.io Community Catalog Source
Before you install ElasticSearch, you must have access to the resources on the OperatorHub.io Community Catalog Source.
Procedure
Enter the following command:

oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: operatorhubio-operators
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/operator-framework/upstream-community-operators:latest
  displayName: OperatorHub.io Operators
  publisher: OperatorHub.io
EOF
2.3.6. Enabling Red Hat STF Operator Source
Before you deploy STF on Red Hat OpenShift Container Platform, you must enable the operator source.
Procedure
1. Install an OperatorSource that contains the Service Telemetry Operator and the Smart Gateway Operator:

oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1
kind: OperatorSource
metadata:
  labels:
    opsrc-provider: redhat-operators-stf
  name: redhat-operators-stf
  namespace: openshift-marketplace
spec:
  authorizationToken: {}
  displayName: Red Hat STF Operators
  endpoint: https://quay.io/cnr
  publisher: Red Hat
  registryNamespace: redhat-operators-stf
  type: appregistry
EOF
2. To validate the creation of your OperatorSource, use the oc get operatorsources command. A successful import results in the MESSAGE field returning a result of The object has been successfully reconciled:

$ oc get -nopenshift-marketplace operatorsource redhat-operators-stf

NAME                   TYPE          ENDPOINT              REGISTRY               DISPLAYNAME             PUBLISHER   STATUS      MESSAGE
redhat-operators-stf   appregistry   https://quay.io/cnr   redhat-operators-stf   Red Hat STF Operators   Red Hat     Succeeded   The object has been successfully reconciled

3. To validate that the Operators are available from the catalog, use the oc get packagemanifest command:

$ oc get packagemanifests | grep "Red Hat STF"

smartgateway-operator       Red Hat STF Operators   2m50s
servicetelemetry-operator   Red Hat STF Operators   2m50s
2.3.7. Subscribing to the AMQ Certificate Manager Operator
You must subscribe to the AMQ Certificate Manager Operator before you deploy the other STF components, because the AMQ Certificate Manager Operator runs globally-scoped and is not compatible with the dependency management of Operator Lifecycle Manager when used with other namespace-scoped operators.
Procedure
1. Subscribe to the AMQ Certificate Manager Operator, create the subscription, and validate the AMQ7 Certificate Manager:

oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: amq7-cert-manager
  namespace: openshift-operators
spec:
  channel: alpha
  installPlanApproval: Automatic
  name: amq7-cert-manager
  source: redhat-operators
  sourceNamespace: openshift-marketplace
EOF
NOTE
The AMQ Certificate Manager is installed globally for all namespaces, so the namespace value provided is openshift-operators. You might not see your amq7-cert-manager.v1.0.0 ClusterServiceVersion in the service-telemetry namespace for a few minutes until the processing executes against the namespace.
2. To validate your ClusterServiceVersion, use the oc get csv command. Ensure that amq7-cert-manager.v1.0.0 has a phase Succeeded:

$ oc get --namespace openshift-operators csv

NAME                       DISPLAY                                          VERSION   REPLACES   PHASE
amq7-cert-manager.v1.0.0   Red Hat Integration - AMQ Certificate Manager    1.0.0                Succeeded

2.3.8. Subscribing to the Elastic Cloud on Kubernetes Operator
Before you install the Service Telemetry Operator, and if you plan to store events in ElasticSearch, you must enable the Elastic Cloud on Kubernetes Operator.
Procedure
1. Apply the following manifest to your OCP environment to enable the Elastic Cloud on Kubernetes Operator:

oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: elastic-cloud-eck
  namespace: service-telemetry
spec:
  channel: stable
  installPlanApproval: Automatic
  name: elastic-cloud-eck
  source: operatorhubio-operators
  sourceNamespace: openshift-marketplace
EOF

2. To verify that the ClusterServiceVersion for Elastic Cloud on Kubernetes succeeded, enter the oc get csv command:

$ oc get csv

NAME                       DISPLAY                        VERSION   REPLACES                   PHASE
elastic-cloud-eck.v1.1.0   Elastic Cloud on Kubernetes    1.1.0     elastic-cloud-eck.v1.0.1   Succeeded
2.3.9. Subscribing to the Service Telemetry Operator
To instantiate an STF instance, create the ServiceTelemetry object to allow the Service Telemetry Operator to create the environment.
Procedure
1. To create the Service Telemetry Operator subscription, enter the oc apply -f command:

oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: servicetelemetry-operator
  namespace: service-telemetry
spec:
  channel: stable
  installPlanApproval: Automatic
  name: servicetelemetry-operator
  source: redhat-operators-stf
  sourceNamespace: openshift-marketplace
EOF

2. To validate the Service Telemetry Operator and the dependent operators, enter the following command:

$ oc get csv --namespace service-telemetry
NAME                                DISPLAY                                         VERSION   REPLACES                            PHASE
amq7-cert-manager.v1.0.0            Red Hat Integration - AMQ Certificate Manager   1.0.0                                         Succeeded
amq7-interconnect-operator.v1.2.0   Red Hat Integration - AMQ Interconnect          1.2.0                                         Succeeded
elastic-cloud-eck.v1.1.0            Elastic Cloud on Kubernetes                     1.1.0     elastic-cloud-eck.v1.0.1            Succeeded
prometheusoperator.0.37.0           Prometheus Operator                             0.37.0    prometheusoperator.0.32.0           Succeeded
service-telemetry-operator.v1.0.2   Service Telemetry Operator                      1.0.2     service-telemetry-operator.v1.0.1   Succeeded
smart-gateway-operator.v1.0.1       Smart Gateway Operator                          1.0.1     smart-gateway-operator.v1.0.0       Succeeded
2.3.10. Creating a ServiceTelemetry object in OCP
To deploy the Service Telemetry Framework, you must create an instance of ServiceTelemetry in OCP. By default, eventsEnabled is set to false. If you do not want to store events in ElasticSearch, ensure that eventsEnabled is set to false. For more information, see Section 2.3.2, "Deploying STF to the OCP environment without ElasticSearch".
The following core parameters are available for a ServiceTelemetry manifest:
Table 2.1. Core parameters for a ServiceTelemetry manifest

Parameter | Description | Default Value
eventsEnabled | Enable events support in STF. Requires prerequisite steps to ensure ElasticSearch can be started. For more information, see Section 2.3.8, "Subscribing to the Elastic Cloud on Kubernetes Operator". | false
metricsEnabled | Enable metrics support in STF. | true
highAvailabilityEnabled | Enable high availability in STF. For more information, see Section 4.3, "High availability". | false
storageEphemeralEnabled | Enable ephemeral storage support in STF. For more information, see Section 4.6, "Ephemeral storage". | false
Procedure
1. To store events in ElasticSearch, set eventsEnabled to true during deployment:

oc apply -f - <<EOF
apiVersion: infra.watch/v1alpha1
kind: ServiceTelemetry
metadata:
  name: stf-default
  namespace: service-telemetry
spec:
  eventsEnabled: true
  metricsEnabled: true
EOF

2. To view the STF deployment logs in the Service Telemetry Operator, use the oc logs command:

oc logs $(oc get pod --selector=name=service-telemetry-operator -oname) -c ansible

PLAY RECAP ***
localhost : ok=37 changed=0 unreachable=0 failed=0 skipped=1 rescued=0 ignored=0

3. View the pods and the status of each pod to determine that all workloads are operating nominally:

$ oc get pods

NAME                                                              READY   STATUS    RESTARTS   AGE
alertmanager-stf-default-0                                        2/2     Running   0          26m
elastic-operator-645dc8b8ff-jwnzt                                 1/1     Running   0          88m
elasticsearch-es-default-0                                        1/1     Running   0          26m
interconnect-operator-6fd49d9fb9-4bl92                            1/1     Running   0          46m
prometheus-operator-bf7d97fb9-kwnlx                               1/1     Running   0          46m
prometheus-stf-default-0                                          3/3     Running   0          26m
service-telemetry-operator-54f4c99d9b-k7ll6                       2/2     Running   0          46m
smart-gateway-operator-7ff58bcf94-66rvx                           2/2     Running   0          46m
stf-default-ceilometer-notification-smartgateway-6675df547-q4lbj  1/1     Running   0          26m
stf-default-collectd-notification-smartgateway-698c87fbb7-xj528   1/1     Running   0          26m
stf-default-collectd-telemetry-smartgateway-79c967c8f7-9hsqn      1/1     Running   0          26m
stf-default-interconnect-7458fd4d69-nqbfs                         1/1     Running   0          26m
NOTE
If you set eventsEnabled: true, the notification Smart Gateways will Error and CrashLoopBackOff for a period of time before ElasticSearch starts.
2.4. REMOVING STF FROM THE OCP ENVIRONMENT
Remove STF from an OCP environment if you no longer require the STF functionality.
Complete the following tasks:
1. Section 2.4.1, "Deleting the namespace"
2. Section 2.4.2, "Removing the OperatorSource"
2.4.1. Deleting the namespace
To remove the operational resources for STF from OCP, delete the namespace.
Procedure
1. Run the oc delete command:

oc delete project service-telemetry

2. Verify that the resources have been deleted from the namespace:

$ oc get all
No resources found.
2.4.2. Removing the OperatorSource
If you do not expect to install Service Telemetry Framework again, delete the OperatorSource. When you remove the OperatorSource, PackageManifests related to STF are removed from the Operator Lifecycle Manager catalog.
Procedure
1. Delete the OperatorSource:

$ oc delete --namespace=openshift-marketplace operatorsource redhat-operators-stf
operatorsource.operators.coreos.com "redhat-operators-stf" deleted
2. Verify that the STF PackageManifests are removed from the platform. If successful, the following command returns no result:

$ oc get packagemanifests | grep "Red Hat STF"

3. If you enabled the OperatorHub.io Community Catalog Source during the installation process, and you no longer need this catalog source, delete it:

$ oc delete --namespace=openshift-marketplace catalogsource operatorhubio-operators
catalogsource.operators.coreos.com "operatorhubio-operators" deleted
Additional resources
For more information about the OperatorHub.io Community Catalog Source, see Section 2.3, "Deploying STF to the OCP environment".
CHAPTER 3. COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION
3.1. CONNECTING RED HAT OPENSTACK PLATFORM TO SERVICE TELEMETRY FRAMEWORK
To collect metrics, events, or both, and to send them to the Service Telemetry Framework (STF) storage domain, you must configure the Red Hat OpenStack Platform overcloud to enable data collection and transport.
To deploy data collection and transport to STF on Red Hat OpenStack Platform cloud nodes that employ routed L3 domains, such as distributed compute node (DCN) or spine-leaf, see Section 3.2, "Deploying to non-standard network topologies".
3.2. DEPLOYING TO NON-STANDARD NETWORK TOPOLOGIES
If your nodes are on a separate network from the default InternalApi network, you must make configuration adjustments so that AMQ Interconnect can transport data to the Service Telemetry Framework (STF) server instance. This scenario is typical in a spine-leaf or a DCN topology. For more information about DCN configuration, see the Spine Leaf Networking guide.
If you use STF with Red Hat OpenStack Platform 16.0 and plan to monitor your Ceph, Block, or Object storage nodes, you must make configuration changes that are similar to the configuration changes that you make to the spine-leaf and DCN network configuration. To monitor Ceph nodes, use the CephStorageExtraConfig parameter to define which network interface to load into the AMQ Interconnect and collectd configuration files:

CephStorageExtraConfig:
  tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('storage')}"
  tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('storage')}"
  tripleo::profile::base::ceilometer::agent::notification::notifier_host_addr: "%{hiera('storage')}"
Similarly, you must specify the BlockStorageExtraConfig and ObjectStorageExtraConfig parameters if your environment uses Block and Object storage roles; a sketch follows this paragraph.
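This hedged sketch mirrors the CephStorageExtraConfig example above and assumes that the Block and Object storage roles also use the Storage network; verify the correct network for your roles before applying:

BlockStorageExtraConfig:
  tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('storage')}"
  tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('storage')}"
  tripleo::profile::base::ceilometer::agent::notification::notifier_host_addr: "%{hiera('storage')}"

ObjectStorageExtraConfig:
  tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('storage')}"
  tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('storage')}"
  tripleo::profile::base::ceilometer::agent::notification::notifier_host_addr: "%{hiera('storage')}"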
The deployment of a spine-leaf topology involves creating roles and networks, then assigning those networks to the available roles. When you configure data collection and transport for STF for a Red Hat OpenStack Platform deployment, the default network for roles is InternalApi. For Ceph, Block, and Object storage roles, the default network is Storage. Because a spine-leaf configuration can result in different networks being assigned to different Leaf groupings, and those names are typically unique, additional configuration is required in the parameter_defaults section of the Red Hat OpenStack Platform environment files.
Procedure
1. Document which networks are available for each of the Leaf roles. For examples of network name definitions, see Creating a network data file in the Spine Leaf Networking guide. For more information about the creation of the Leaf groupings (roles) and assignment of the networks to those groupings, see Creating a roles data file in the Spine Leaf Networking guide.
2. Add the following configuration example to the ExtraConfig section for each of the leaf roles. In this example, internal_api_subnet is the value defined in the name_lower parameter of your network definition (with _subnet appended to the name for Leaf 0), and is the network to which the ComputeLeaf0 leaf role is connected. In this case, the network identification of 0 corresponds to the Compute role for leaf 0, and represents a value that is different from the default internal API network name. For the ComputeLeaf0 leaf role, specify extra configuration to perform a hiera lookup to determine which network interface for a particular network to assign to the collectd AMQP host parameter. Perform the same configuration for the AMQ Interconnect listener address parameter:

ComputeLeaf0ExtraConfig:
  tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('internal_api_subnet')}"
  tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('internal_api_subnet')}"

Additional leaf roles typically replace _subnet with _leafN, where N represents a unique identifier for the leaf:

ComputeLeaf1ExtraConfig:
  tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('internal_api_leaf1')}"
  tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('internal_api_leaf1')}"

This example configuration is on a CephStorage leaf role:

CephStorageLeaf0ExtraConfig:
  tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('storage_subnet')}"
  tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('storage_subnet')}"
3.3. CONFIGURING RED HAT OPENSTACK PLATFORM OVERCLOUD FOR SERVICE TELEMETRY FRAMEWORK
To configure the Red Hat OpenStack Platform overcloud, you must configure the data collection applications and the data transport to STF, and deploy the overcloud.
To configure the Red Hat OpenStack Platform overcloud, complete the following tasks:
1. Section 3.3.1, "Retrieving the AMQ Interconnect route address"
2. Section 3.3.2, "Configuring the STF connection for the overcloud"
3. Section 3.3.3, "Validating client-side installation"

3.3.1. Retrieving the AMQ Interconnect route address
When you configure the Red Hat OpenStack Platform overcloud for STF, you must provide the AMQ Interconnect route address in the STF connection file.
Procedure
1. Log in to your Red Hat OpenShift Container Platform (OCP) environment.
2. In the service-telemetry project, retrieve the AMQ Interconnect route address:

$ oc get routes -ogo-template='{{ range .items }}{{printf "%s\n" .spec.host }}{{ end }}' | grep "\-5671"
stf-default-interconnect-5671-service-telemetry.apps.infra.watch
NOTE
If your STF installation differs from the documentation, ensure that you retrieve the correct AMQ Interconnect route address.
3.3.2. Configuring the STF connection for the overcloud
To configure the STF connection, you must create a file that contains the connection configuration of the AMQ Interconnect for the overcloud to the STF deployment. Enable the collection of events and storage of the events in STF, and deploy the overcloud.
Procedure
1. Log in to the Red Hat OpenStack Platform undercloud as the stack user.
2. Create a configuration file called stf-connectors.yaml in the /home/stack directory.
IMPORTANT
The Service Telemetry Operator simplifies the deployment of all data ingestion and data storage components for single cloud deployments. To share the data storage domain with multiple clouds, see Section 4.5, "Configuring multiple clouds".
3. In the stf-connectors.yaml file, configure the MetricsQdrConnectors address to connect the AMQ Interconnect on the overcloud to the STF deployment:

parameter_defaults:
  EventPipelinePublishers: []
  CeilometerQdrPublishEvents: true
  MetricsQdrConnectors:
    - host: stf-default-interconnect-5671-service-telemetry.apps.infra.watch
      port: 443
      role: edge
      sslProfile: sslProfile
      verifyHostname: false

Add the CeilometerQdrPublishEvents: true parameter to enable collection and transport of Ceilometer events to STF.
Replace the host parameter with the value of HOST/PORT that you retrieved in Section 3.3.1, "Retrieving the AMQ Interconnect route address".
4. Add the following files to your Red Hat OpenStack Platform director deployment to set up collectd and AMQ Interconnect:
the stf-connectors.yaml environment file
the enable-stf.yaml file that ensures that the environment is being used during the overcloud deployment
the ceilometer-write-qdr.yaml file that ensures that Ceilometer telemetry is sent to STF
5. Deploy the Red Hat OpenStack Platform overcloud:

openstack overcloud deploy <other arguments>
  --templates /usr/share/openstack-tripleo-heat-templates \
  --environment-file <other-environment-files> \
  --environment-file /usr/share/openstack-tripleo-heat-templates/environments/metrics/ceilometer-write-qdr.yaml \
  --environment-file /usr/share/openstack-tripleo-heat-templates/environments/enable-stf.yaml \
  --environment-file /home/stack/stf-connectors.yaml
3.3.3. Validating client-side installation
To validate data collection from the STF storage domain, query the data sources for delivered data. To validate individual nodes in the Red Hat OpenStack Platform deployment, connect to the console by using SSH.
Procedure
1. Log in to an overcloud node, for example, controller-0.
2. Ensure that the metrics_qdr container is running on the node:

$ sudo podman container inspect --format '{{.State.Status}}' metrics_qdr
running

3. Return the internal network address on which AMQ Interconnect is running, for example, 172.17.1.44 listening on port 5666:

$ sudo podman exec -it metrics_qdr cat /etc/qpid-dispatch/qdrouterd.conf

listener {
    host: 172.17.1.44
    port: 5666
    authenticatePeer: no
    saslMechanisms: ANONYMOUS
}

4. Return a list of connections to the local AMQ Interconnect:

$ sudo podman exec -it metrics_qdr qdstat --bus=172.17.1.44:5666 --connections

Connections
  id   host                                                                   container                                                                                                   role    dir  security                            authentication  tenant
  ====================================================================================================================================================================================
  1    stf-default-interconnect-5671-service-telemetry.apps.infra.watch:443   stf-default-interconnect-7458fd4d69-bgzfb                                                                   edge    out  TLSv1.2(DHE-RSA-AES256-GCM-SHA384)  anonymous-user
  12   172.17.1.44:60290                                                      openstack.org/om/container/controller-0/ceilometer-agent-notification/25/5c02cee5-50f1-43ec-9ea0-30db5cccba14  normal  in   no-security                         no-auth
  16   172.17.1.44:36408                                                      metrics                                                                                                     normal  in   no-security                         anonymous-user
  899  172.17.1.44:39500                                                      10a2e99d-1b8a-4329-b48c-4335e5f75c84                                                                        normal  in   no-security                         no-auth
There are four connections:
Outbound connection to STF
Inbound connection from collectd
Inbound connection from Ceilometer
Inbound connection from our qdstat client
The outbound STF connection is provided to the MetricsQdrConnectors host parameter and is the route for the STF storage domain. The other hosts are internal network addresses of the client connections to this AMQ Interconnect.

5. To ensure that messages are being delivered, list the links, and view the _edge address in the deliv column for delivery of messages:

$ sudo podman exec -it metrics_qdr qdstat --bus=172.17.1.44:5666 --links

Router Links
  type      dir  conn id  id    peer  class   addr                  phs  cap  pri  undel  unsett  deliv    presett  psdrop  acc  rej  rel  mod  delay  rate
  ===========================================================================================================================================================
  endpoint  out  1        5           local   _edge                      250  0    0      0       2979926  2979924  0       0    0    2    0    0      0
  endpoint  in   1        6                                              250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  in   1        7                                              250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  out  1        8                                              250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  in   1        9                                              250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  out  1        10                                             250  0    0      0       911      911      0       0    0    0    0    911    0
  endpoint  in   1        11                                             250  0    0      0       0        911      0       0    0    0    0    0      0
  endpoint  out  12       32          local   temp.lSY6Mcicol4J2Kp       250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  in   16       41                                             250  0    0      0       2979924  2979924  0       0    0    0    0    0      0
  endpoint  in   912      1834        mobile  $management           0    250  0    0      0       1        0        0       1    0    0    0    0      0
  endpoint  out  912      1835        local   temp.9Ok2resI9tmt+CT       250  0    0      0       0        0        0       0    0    0    0    0      0
6. To list the addresses from Red Hat OpenStack Platform nodes to STF, connect to OCP to get the AMQ Interconnect pod name and list the connections. List the available AMQ Interconnect pods:

$ oc get pods -l application=stf-default-interconnect

NAME                                        READY   STATUS    RESTARTS   AGE
stf-default-interconnect-7458fd4d69-bgzfb   1/1     Running   0          6d21h

7. Connect to the pod and run the qdstat --connections command to list the known connections:

$ oc exec -it stf-default-interconnect-7458fd4d69-bgzfb -- qdstat --connections

2020-04-21 18:25:47.243852 UTC
stf-default-interconnect-7458fd4d69-bgzfb

Connections
  id  host                container                                                               role    dir  security                                authentication  tenant  last dlv      uptime
  ==========================================================================================================================================================================================
  1   10.129.2.21:43062   rcv[stf-default-collectd-telemetry-smartgateway-79c967c8f7-kq4qv]       normal  in   no-security                             anonymous-user          000:00:00:00  006:21:50:25
  2   10.130.0.52:55754   rcv[stf-default-ceilometer-notification-smartgateway-6675df547-mbjk5]   normal  in   no-security                             anonymous-user          000:21:25:57  006:21:49:36
  3   10.130.0.51:43110   rcv[stf-default-collectd-notification-smartgateway-698c87fbb7-f28v6]    normal  in   no-security                             anonymous-user          000:21:36:53  006:21:49:09
  22  10.128.0.1:51948    Router.ceph-0.redhat.local                                              edge    in   TLSv1/SSLv3(DHE-RSA-AES256-GCM-SHA384)  anonymous-user          000:00:00:03  000:22:08:43
  23  10.128.0.1:51950    Router.compute-0.redhat.local                                           edge    in   TLSv1/SSLv3(DHE-RSA-AES256-GCM-SHA384)  anonymous-user          000:00:00:03  000:22:08:43
  24  10.128.0.1:52082    Router.controller-0.redhat.local                                        edge    in   TLSv1/SSLv3(DHE-RSA-AES256-GCM-SHA384)  anonymous-user          000:00:00:00  000:22:08:34
  27  127.0.0.1:42202     c2f541c1-4c97-4b37-a189-a396c08fb079                                    normal  in   no-security                             no-auth                 000:00:00:00  000:00:00:00

In this example, there are three edge connections from the Red Hat OpenStack Platform nodes, with connection id 22, 23, and 24.

8. To view the number of messages delivered by the network, use each address with the oc exec command:

$ oc exec -it stf-default-interconnect-7458fd4d69-bgzfb -- qdstat --address

2020-04-21 18:20:10.293258 UTC
stf-default-interconnect-7458fd4d69-bgzfb

Router Addresses
  class   addr                             phs  distrib    pri  local  remote  in       out      thru  fallback
  ================================================================================================================
  mobile  anycast/ceilometer/event.sample  0    balanced   -    1      0       1553     1553     0     0
  mobile  collectd/notify                  0    multicast  -    1      0       10       10       0     0
  mobile  collectd/telemetry               0    multicast  -    1      0       7798049  7798049  0     0
CHAPTER 4. ADVANCED FEATURES
The following optional features can provide additional functionality to the Service Telemetry Framework (STF):
Customizing the deployment. For more information, see Section 4.1, "Customizing the deployment".
Alerts. For more information, see Section 4.2, "Alerts".
High availability. For more information, see Section 4.3, "High availability".
Dashboards. For more information, see Section 4.4, "Dashboards".
Multiple clouds. For more information, see Section 4.5, "Configuring multiple clouds".
Ephemeral storage. For more information, see Section 4.6, "Ephemeral storage".
4.1. CUSTOMIZING THE DEPLOYMENT
The Service Telemetry Operator watches for a ServiceTelemetry manifest to load into Red Hat OpenShift Container Platform (OCP). The Operator then creates other objects in memory, which results in the dependent Operators creating the workloads they are responsible for managing.
WARNING
When you override the manifest, you must provide the entire manifest contents, including object names or namespaces. There is no dynamic parameter substitution when you override a manifest.
To override a manifest successfully with Service Telemetry Framework (STF), deploy a default environment using the core options only. For more information about the core options, see Section 2.3.10, "Creating a ServiceTelemetry object in OCP". When you deploy STF, use the oc get command to retrieve the default deployed manifest. When you use a manifest that was originally generated by the Service Telemetry Operator, the manifest is compatible with the other objects that are managed by the Operators.
For example, when the metricsEnabled: true parameter is configured in the ServiceTelemetry manifest, the Service Telemetry Operator requests components for metrics retrieval and storage using the default manifests. In some cases, you might want to override the default manifest. For more information, see Section 4.1.1, "Manifest override parameters".
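As a sketch of this workflow, you might capture the default deployed Prometheus manifest as a starting point for an override by redirecting the retrieval command listed in Table 4.1 to a file; the file name here is arbitrary:

$ oc get prometheus stf-default -oyaml > prometheus-manifest.yaml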
4.1.1. Manifest override parameters
This table describes the available parameters that you can use to override a manifest, along with the corresponding retrieval commands.
Table 4.1. Manifest override parameters
Override parameter | Description | Retrieval command
alertmanagerManifest | Override the Alertmanager object creation. The Prometheus Operator watches for Alertmanager objects. | oc get alertmanager stf-default -oyaml
alertmanagerConfigManifest | Override the Secret that contains the Alertmanager configuration. The Prometheus Operator uses a secret named alertmanager-<alertmanager-name>, for example, alertmanager-stf-default, to provide the alertmanager.yaml configuration to Alertmanager. | oc get secret alertmanager-stf-default -oyaml
elasticsearchManifest | Override the ElasticSearch object creation. The Elastic Cloud on Kubernetes Operator watches for ElasticSearch objects. | oc get elasticsearch elasticsearch -oyaml
interconnectManifest | Override the Interconnect object creation. The AMQ Interconnect Operator watches for Interconnect objects. | oc get interconnect stf-default-interconnect -oyaml
prometheusManifest | Override the Prometheus object creation. The Prometheus Operator watches for Prometheus objects. | oc get prometheus stf-default -oyaml
servicemonitorManifest | Override the ServiceMonitor object creation. The Prometheus Operator watches for ServiceMonitor objects. | oc get servicemonitor stf-default -oyaml
smartgatewayCollectdMetricsManifest | Override the SmartGateway object creation for collectd metrics. The Smart Gateway Operator watches for SmartGateway objects. | oc get smartgateway stf-default-collectd-telemetry -oyaml
smartgatewayCollectdEventsManifest | Override the SmartGateway object creation for collectd events. The Smart Gateway Operator watches for SmartGateway objects. | oc get smartgateway stf-default-collectd-notification -oyaml
smartgatewayCeilometerEventsManifest | Override the SmartGateway object creation for Ceilometer events. The Smart Gateway Operator watches for SmartGateway objects. | oc get smartgateway stf-default-ceilometer-notification -oyaml
4.1.2. Overriding a managed manifest
Edit the ServiceTelemetry object and provide a parameter and manifest. For a list of available manifest override parameters, see Section 4.1, "Customizing the deployment". The default ServiceTelemetry object is stf-default. Use oc get servicetelemetry to list the available STF deployments.
TIP
The oc edit command loads the default system editor. To override the default editor, pass or set the environment variable EDITOR to the preferred editor, for example, EDITOR=nano oc edit servicetelemetry stf-default.
Procedure
1. Log in to Red Hat OpenShift Container Platform.
2. Change to the service-telemetry namespace:

oc project service-telemetry

3. Load the ServiceTelemetry object into an editor:

oc edit servicetelemetry stf-default

4. To modify the ServiceTelemetry object, provide a manifest override parameter and the contents of the manifest to write to OCP instead of the defaults provided by STF:

NOTE
The trailing pipe (|) after entering the manifest override parameter indicates that the value provided is multi-line.

$ oc edit servicetelemetry stf-default

apiVersion: infra.watch/v1alpha1
kind: ServiceTelemetry
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"infra.watch/v1alpha1","kind":"ServiceTelemetry","metadata":{"annotations":{},"name":"stf-default","namespace":"service-telemetry"},"spec":{"metricsEnabled":true}}
  creationTimestamp: "2020-04-14T20:29:42Z"
  generation: 1
  name: stf-default
  namespace: service-telemetry
  resourceVersion: "1949423"
  selfLink: /apis/infra.watch/v1alpha1/namespaces/service-telemetry/servicetelemetrys/stf-default
  uid: d058bc41-1bb0-49f5-9a8b-642f4b8adb95
spec:
  metricsEnabled: true
  smartgatewayCollectdMetricsManifest: | 1
    apiVersion: smartgateway.infra.watch/v2alpha1
    kind: SmartGateway
    metadata:
      name: stf-default-collectd-telemetry
      namespace: service-telemetry
    spec:
      amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/telemetry
      debug: true
      prefetch: 15000
      serviceType: metrics
      size: 1
      useTimestamp: true 2
status:
  conditions:
  - ansibleResult:
      changed: 0
      completion: 2020-04-14T20:32:19.079508
      failures: 0
      ok: 52
      skipped: 1
    lastTransitionTime: "2020-04-14T20:29:59Z"
    message: Awaiting next reconciliation
    reason: Successful
    status: "True"
    type: Running
1. Manifest override parameter is defined in the spec of the ServiceTelemetry object.
2. End of the manifest override content.
5. Save and close.
4.2. ALERTS
You create alert rules in Prometheus and alert routes in Alertmanager. Alert rules in Prometheus servers send alerts to an Alertmanager, which manages the alerts. Alertmanager can silence, inhibit, or aggregate alerts, and send notifications by using email, on-call notification systems, or chat platforms.
To create an alert, complete the following tasks:
1. Create an alert rule in Prometheus. For more information, see Section 4.2.1, "Creating an alert rule in Prometheus".
2. Create an alert route in Alertmanager. For more information, see Section 4.2.3, "Creating an alert route in Alertmanager".

Additional resources
For more information about alerts or notifications with Prometheus and Alertmanager, see https://prometheus.io/docs/alerting/overview/.
To view an example set of alerts that you can use with Service Telemetry Framework (STF), see https://github.com/infrawatch/service-telemetry-operator/tree/master/deploy/alerts.

4.2.1. Creating an alert rule in Prometheus
Prometheus evaluates alert rules to trigger notifications. If the rule condition returns an empty result set, the condition is false. Otherwise, the rule is true, and it triggers an alert.
Procedure
1. Log in to Red Hat OpenShift Container Platform.
2. Change to the service-telemetry namespace:

oc project service-telemetry

3. Create a PrometheusRule object that contains the alert rule. The Prometheus Operator loads the rule into Prometheus:

oc apply -f - <<EOF
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  creationTimestamp: null
  labels:
    prometheus: stf-default
    role: alert-rules
  name: prometheus-alarm-rules
  namespace: service-telemetry
spec:
  groups:
    - name: ./openstack.rules
      rules:
        - alert: Metric Listener down
          expr: collectd_qpid_router_status < 1  # To change the rule, edit the value of the expr parameter.
EOF

4. To verify that the rules have been loaded into Prometheus by the Operator, create a pod with access to curl:

oc run curl --generator=run-pod/v1 --image=radial/busyboxplus:curl -i --tty

5. Run curl to access the prometheus-operated service to return the rules loaded into memory:

[ root@curl / ]$ curl prometheus-operated:9090/api/v1/rules
{"status":"success","data":{"groups":[{"name":"./openstack.rules","file":"/etc/prometheus/rules/prometheus-stf-default-rulefiles-0/service-telemetry-prometheus-alarm-rules.yaml","rules":[{"name":"Metric Listener down","query":"collectd_qpid_router_status \u003c 1","duration":0,"labels":{},"annotations":{},"alerts":[],"health":"ok","type":"alerting"}],"interval":30}]}}
6. To verify that the output shows the rules loaded into the PrometheusRule object, for example, that the output contains the defined ./openstack.rules, exit from the pod:

[ root@curl / ]$ exit

7. Clean up the environment by deleting the curl pod:

$ oc delete pod curl

pod "curl" deleted

Additional resources
For more information on alerting, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md.
4.2.2. Configuring custom alerts
You can add custom alerts to the PrometheusRule object that you created in Section 4.2.1, "Creating an alert rule in Prometheus".
Procedure
1. Use the oc edit command:

oc edit prometheusrules prometheus-alarm-rules

2. Edit the PrometheusRules manifest (see the sketch after this procedure).
3. Save and close.
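As an illustration, a hedged sketch of the edited manifest with one additional custom rule follows; the second alert's name, metric, and expression are illustrative only and are not part of the default STF rules:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    prometheus: stf-default
    role: alert-rules
  name: prometheus-alarm-rules
  namespace: service-telemetry
spec:
  groups:
    - name: ./openstack.rules
      rules:
        - alert: Metric Listener down
          expr: collectd_qpid_router_status < 1
        - alert: Collectd telemetry stalled                       # hypothetical additional rule
          expr: rate(collectd_qpid_router_status[5m]) == 0        # illustrative expression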
Additional resources
For more information about configuring alerting rules, see https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/.
For more information about PrometheusRules objects, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md.
4.2.3. Creating an alert route in Alertmanager
Use Alertmanager to deliver alerts to an external system, such as email, IRC, or other notification channel. The Prometheus Operator manages the Alertmanager configuration as a Red Hat OpenShift Container Platform (OCP) secret. STF by default deploys a basic configuration that results in no receivers:

alertmanager.yaml: |-
  global:
    resolve_timeout: 5m
  route:
    group_by: ['job']
    group_wait: 30s
    group_interval: 5m
    repeat_interval: 12h
    receiver: 'null'
  receivers:
  - name: 'null'
To deploy a custom Alertmanager route with STF, you must pass an alertmanagerConfigManifest parameter to the Service Telemetry Operator, which results in an updated secret managed by the Prometheus Operator. For more information, see Section 4.1.2, "Overriding a managed manifest".
Procedure
1. Log in to Red Hat OpenShift Container Platform.
2. Change to the service-telemetry namespace:

oc project service-telemetry

3. Edit the ServiceTelemetry object for your STF deployment:

oc edit servicetelemetry stf-default

4. Add the new parameter alertmanagerConfigManifest and the Secret object contents to define the alertmanager.yaml configuration for Alertmanager:

NOTE
This loads the default template that is already managed by the Service Telemetry Operator. To validate that the changes are populating correctly, change a value, return the alertmanager-stf-default secret, and verify that the new value is loaded into memory, for example, by changing the value of the parameter global.resolve_timeout from 5m to 10m.

apiVersion: infra.watch/v1alpha1
kind: ServiceTelemetry
metadata:
  name: stf-default
  namespace: service-telemetry
spec:
  metricsEnabled: true
  alertmanagerConfigManifest: |
    apiVersion: v1
    kind: Secret
    metadata:
      name: 'alertmanager-stf-default'
      namespace: 'service-telemetry'
    type: Opaque
    stringData:
      alertmanager.yaml: |-
        global:
          resolve_timeout: 10m
        route:
          group_by: ['job']
          group_wait: 30s
          group_interval: 5m
          repeat_interval: 12h
          receiver: 'null'
        receivers:
        - name: 'null'
5. Verify that the configuration was applied to the secret:

$ oc get secret alertmanager-stf-default -o go-template='{{index .data "alertmanager.yaml" | base64decode}}'

global:
  resolve_timeout: 10m
route:
  group_by: ['job']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h
  receiver: 'null'
receivers:
- name: 'null'

6. To verify that the configuration has been loaded into Alertmanager, create a pod with access to curl:

oc run curl --generator=run-pod/v1 --image=radial/busyboxplus:curl -i --tty

7. Run curl against the alertmanager-operated service to retrieve the status and configYAML contents, and review that the supplied configuration matches the configuration loaded into Alertmanager:

[ root@curl / ]$ curl alertmanager-operated:9093/api/v1/status
{"status":"success","data":{"configYAML":"global:\n  resolve_timeout: 10m\n  http_config: {}\n  smtp_hello: localhost\n  smtp_require_tls: true\n  pagerduty_url: https://events.pagerduty.com/v2/enqueue\n  hipchat_api_url: https://api.hipchat.com/\n  opsgenie_api_url: https://api.opsgenie.com/\n  wechat_api_url: https://qyapi.weixin.qq.com/cgi-bin/\n  victorops_api_url: https://alert.victorops.com/integrations/generic/20131114/alert/\nroute:\n  receiver: \"null\"\n  group_by:\n  - job\n  group_wait: 30s\n  group_interval: 5m\n  repeat_interval: 12h\nreceivers:\n- name: \"null\"\ntemplates: []\n", ...}}

8. Verify that the configYAML field contains the expected changes. Exit from the pod:

[ root@curl / ]$ exit

9. To clean up the environment, delete the curl pod:

$ oc delete pod curl

pod "curl" deleted
Additional resources
For more information about the Red Hat OpenShift Container Platform secret and the Prometheus operator, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md.

4.3. HIGH AVAILABILITY
High availability is the ability of Service Telemetry Framework (STF) to rapidly recover from failures in its component services. Although Red Hat OpenShift Container Platform (OCP) restarts a failed pod if nodes are available to schedule the workload, this recovery process might take more than one minute, during which time events and metrics are lost. A high availability configuration includes multiple copies of STF components, which reduces recovery time to approximately 2 seconds. To protect against failure of an OCP node, deploy STF to an OCP cluster with three or more nodes.
NOTE
STF is not yet a fully fault tolerant system. Delivery of metrics and events during the recovery period is not guaranteed.

Enabling high availability has the following effects:
Two AMQ Interconnect pods run instead of the default 1.
Three ElasticSearch pods run instead of the default 1.
Recovery time from a lost pod in either of these services reduces to approximately 2 seconds.

4.3.1. Configuring high availability
To configure STF for high availability, add highAvailabilityEnabled: true to the ServiceTelemetry object in OCP. You can set this parameter at installation time or, if you already deployed STF, complete the following steps:
Procedure
1. Log in to Red Hat OpenShift Container Platform.
2. Change to the service-telemetry namespace:

oc project service-telemetry

3. Use the oc command to edit the ServiceTelemetry object:

$ oc edit ServiceTelemetry

4. Add highAvailabilityEnabled: true to the spec section:

spec:
  eventsEnabled: true
  metricsEnabled: true
  highAvailabilityEnabled: true

5. Save your changes and close the object.
4.4. DASHBOARDS
Use the third-party application Grafana to visualize system-level metrics gathered by collectd for each individual host node. For more information about configuring collectd, see Section 3.3, "Configuring Red Hat OpenStack Platform overcloud for Service Telemetry Framework".

4.4.1. Setting up Grafana to host the dashboard
Grafana is not included in the default Service Telemetry Framework (STF) deployment, so you must deploy the Grafana Operator from OperatorHub.io.
Procedure
1. Log in to Red Hat OpenShift Container Platform.
2. Change to the service-telemetry namespace:

oc project service-telemetry

3. Clone the dashboard repository:

git clone https://github.com/infrawatch/dashboards
cd dashboards

4. Deploy the Grafana operator:

oc create -f deploy/subscription.yaml

5. To verify that the operator launched successfully, run the oc get csv command. If the value of the PHASE column is Succeeded, the operator launched successfully:

$ oc get csv

NAME                      DISPLAY            VERSION   REPLACES   PHASE
grafana-operator.v3.2.0   Grafana Operator   3.2.0                Succeeded

6. Launch a Grafana instance:

$ oc create -f deploy/grafana.yaml

7. Verify that the Grafana instance deployed:

$ oc get pod -l app=grafana

NAME                                  READY   STATUS    RESTARTS   AGE
grafana-deployment-7fc7848b56-sbkhv   1/1     Running   0          1m

8. Create the datasource and dashboard resources:

oc create -f deploy/datasource.yaml -f deploy/rhos-dashboard.yaml
9. Verify that the resources installed correctly:

$ oc get grafanadashboards

NAME             AGE
rhos-dashboard   7d21h

$ oc get grafanadatasources
NAME                                  AGE
service-telemetry-grafanadatasource   1m

10. Navigate to https://<grafana-route-address> in a web browser. Use the oc get routes command to retrieve the Grafana route address:

oc get routes

11. To view the dashboard, click Dashboards and Manage.
Additional resources
For more information about enabling the OperatorHub.io catalog source, see Section 2.3.5, "Enabling the OperatorHub.io Community Catalog Source".

4.4.1.1. Viewing and editing queries
Procedure
1. Log in to Red Hat OpenShift Container Platform. To view and edit queries, log in as the admin user.
2. Change to the service-telemetry namespace:

oc project service-telemetry

3. To retrieve the default username and password, describe the Grafana object using oc describe:

oc describe grafana service-telemetry-grafana
4.4.2. The Grafana infrastructure dashboard
The infrastructure dashboard shows metrics for a single node at a time. Select a node from the upper left corner of the dashboard.
4.4.2.1. Top panels
Title | Unit | Description
Current Global Alerts | - | Current alerts fired by Prometheus
Recent Global Alerts | - | Recently fired alerts in 5m time steps
Status Panel | - | Node status: up, down, unavailable
Uptime | s/m/h/d/M/Y | Total operational time of node
CPU Cores | cores | Total number of cores
Memory | bytes | Total memory
Disk Size | bytes | Total storage size
Processes | processes | Total number of processes listed by type
Load Average | processes | Load average represents the average number of running and uninterruptible processes residing in the kernel execution queue
4.4.2.2. Networking panels
Panels that display the network interfaces of the node
Panel Unit Description
Physical Interfaces Ingress Errors errors Total errors with incoming data
Physical Interfaces Egress Errors errors Total errors with outgoing data
Physical Interfaces Ingress ErrorRates
errorss Rate of incoming data errors
Physical Interfaces egress ErrorRates
errorss Rate of outgoing data errors
Physical Interfaces PacketsIngress pps Incoming packets persecond
Physical Interfaces PacketsEgress
pps
Outgoing packets per second Physical Interfaces Data Ingress bytess
CHAPTER 4 ADVANCED FEATURES
37
Incoming data rates Physical Interfaces Data Egress bytess
Outgoing data rates Physical Interfaces Drop RateIngress
pps
Incoming packets drop rate Physical Interfaces Drop RateEgress
pps
4.4.2.3. CPU panels
Panels that display CPU usage of the node
Panel | Unit | Description
Current CPU Usage | percent | Instantaneous usage at the time of the last query
Aggregate CPU Usage | percent | Average non-idle CPU activity of all cores on a node
Aggr CPU Usage by Type | percent | Shows time spent for each type of thread averaged across all cores
4.4.2.4. Memory panels
Panels that display memory usage on the node
Panel | Unit | Description
Memory Used | percent | Amount of memory being used at time of last query
Huge Pages Used | hugepages | Number of hugepages being used
Memory | |
4.4.2.5. Disk/file system
Panels that display space used on disk
Panel | Unit | Description | Notes
Disk Space Usage | percent | Total disk use at time of last query |
Inode Usage | percent | Total inode use at time of last query |
Aggregate Disk Space Usage | bytes | Total disk space used and reserved | Because this query relies on the df plugin, temporary file systems that do not necessarily use disk space are included in the results. The query tries to filter out most of these, but it might not be exhaustive.
Disk Traffic | bytes/s | Shows rates for both reading and writing |
Disk Load | percent | Approximate percentage of total disk bandwidth being used. The weighted I/O time series includes the backlog that might be accumulating. For more information, see the collectd disk plugin docs. |
Operations/s | ops/s | Operations done per second |
Average I/O Operation Time | seconds | Average time each I/O operation took to complete. This average is not accurate, see the collectd disk plugin docs. |
4.5. CONFIGURING MULTIPLE CLOUDS

You can configure multiple Red Hat OpenStack Platform clouds to target a single instance of Service Telemetry Framework (STF):
1 Plan the AMQP address prefixes that you want to use for each cloud. For more information, see Section 4.5.1, "Planning AMQP address prefixes".

2 Deploy metrics and events consumer Smart Gateways for each cloud to listen on the corresponding address prefixes. For more information, see Section 4.5.2, "Deploying Smart Gateways".

3 Configure each cloud to send its metrics and events to STF on the correct address. For more information, see Section 4.5.3, "Creating the OpenStack environment file".
Figure 4.1. Two Red Hat OpenStack Platform clouds connect to STF
4.5.1. Planning AMQP address prefixes

By default, Red Hat OpenStack Platform nodes get data through two data collectors: collectd and Ceilometer. These components send telemetry data, or notifications, to the respective AMQP addresses, for example, collectd/telemetry, where STF Smart Gateways listen on those addresses for monitoring data.

To support multiple clouds, and to identify which cloud generated the monitoring data, configure each cloud to send data to a unique address. Prefix a cloud identifier to the second part of the address. The following list shows some example addresses and identifiers:
collectd/cloud1-telemetry
collectd/cloud1-notify
anycast/ceilometer/cloud1-event.sample
collectd/cloud2-telemetry
collectd/cloud2-notify
anycast/ceilometer/cloud2-event.sample
collectd/us-east-1-telemetry
collectd/us-west-3-telemetry
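To confirm which prefixed addresses are active once clouds begin sending data, you can inspect the address table on the STF-side AMQ Interconnect. This is a convenience check, not a step from the original text: the pod name filter and the qdstat flags are assumptions based on the tool's use in Section 3.3.3:

$ oc exec -it $(oc get pods -oname | grep stf-default-interconnect | head -1) -- qdstat -b 127.0.0.1:5672 -a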
4.5.2. Deploying Smart Gateways

You must deploy a Smart Gateway for each of the data collection types for each cloud: one for collectd metrics, one for collectd events, and one for Ceilometer events. Configure each of the Smart Gateways to listen on the AMQP address that you define for the corresponding cloud.

When you deploy STF for the first time, Smart Gateway manifests are created that define the initial Smart Gateways for a single cloud. When deploying Smart Gateways for multiple cloud support, you deploy multiple Smart Gateways for each of the data collection types that handle the metrics and the events data for each cloud. The initial Smart Gateways act as a template to create additional Smart Gateways, along with any authentication information required to connect to the data stores.
Procedure
1 Log in to Red Hat OpenShift Container Platform
2 Change to the service-telemetry namespace
oc project service-telemetry
3 Use the initially deployed Smart Gateways as a template for additional Smart Gateways. List the currently deployed Smart Gateways with the oc get smartgateways command. For example, if you deployed STF with metricsEnabled: true and eventsEnabled: true, the following Smart Gateways are displayed in the output
$ oc get smartgateways
NAME                                  AGE
stf-default-ceilometer-notification   14d
stf-default-collectd-notification     14d
stf-default-collectd-telemetry        14d
4 Retrieve the manifests for each Smart Gateway and store the contents in a temporary file, which you can modify later and use to create the new set of Smart Gateways
$ truncate --size 0 /tmp/cloud1-smartgateways.yaml && for sg in $(oc get smartgateways -oname)
do
  echo "---" >> /tmp/cloud1-smartgateways.yaml
  oc get $sg -oyaml --export >> /tmp/cloud1-smartgateways.yaml
done
5 Modify the Smart Gateway manifests in the /tmp/cloud1-smartgateways.yaml file. Adjust the metadata.name and spec.amqpUrl fields to include the cloud identifier from your schema (a scripted approach is sketched after this procedure). For more information, see Section 4.5.1, "Planning AMQP address prefixes". To view example Smart Gateway manifests, see Section 4.5.2.1, "Example manifests"
6 Deploy your new Smart Gateways
$ oc apply -f /tmp/cloud1-smartgateways.yaml
7 Verify that each Smart Gateway is running. This can take several minutes, depending on the number of Smart Gateways
oc get po -l app=smart-gateway
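For the bulk edits in step 5, a scripted substitution can save manual editing. The following sed expressions are a sketch under the assumptions of this example (cloud identifier cloud1, default stf-default names, two-space indentation from oc get -oyaml --export); verify the result against Section 4.5.1, "Planning AMQP address prefixes" before applying:

$ sed -i 's|^  name: \(stf-default-.*\)$|  name: \1-cloud1|' /tmp/cloud1-smartgateways.yaml
$ sed -i -e 's|/collectd/|/collectd/cloud1-|' -e 's|/ceilometer/|/ceilometer/cloud1-|' /tmp/cloud1-smartgateways.yaml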
4.5.2.1. Example manifests
IMPORTANT
The content in the following examples might be different from the file content in your deployment. Copy the manifests in your deployment.

Ensure that the name and amqpUrl parameters of each Smart Gateway match the names that you want to use for your clouds. For more information, see Section 4.5.1, "Planning AMQP address prefixes".
NOTE
Your output might have some additional metadata parameters that you can remove from the manifests that you load into OCP.
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-ceilometer-notification-cloud1 1
spec:
  amqpDataSource: ceilometer
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/anycast/ceilometer/cloud1-event.sample 2
  debug: false
  elasticPass: fkzfhghw
  elasticUrl: https://elasticsearch-es-http.service-telemetry.svc.cluster.local:9200
  elasticUser: elastic
  resetIndex: false
  serviceType: events
  size: 1
  tlsCaCert: /config/certs/ca.crt
  tlsClientCert: /config/certs/tls.crt
  tlsClientKey: /config/certs/tls.key
  tlsServerName: elasticsearch-es-http.service-telemetry.svc.cluster.local
  useBasicAuth: true
  useTls: true
---
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-collectd-notification-cloud1 3
spec:
  amqpDataSource: collectd
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/cloud1-notify 4
  debug: false
  elasticPass: fkzfhghw
  elasticUrl: https://elasticsearch-es-http.service-telemetry.svc.cluster.local:9200
  elasticUser: elastic
  resetIndex: false
  serviceType: events
  size: 1
  tlsCaCert: /config/certs/ca.crt
  tlsClientCert: /config/certs/tls.crt
  tlsClientKey: /config/certs/tls.key
  tlsServerName: elasticsearch-es-http.service-telemetry.svc.cluster.local
  useBasicAuth: true
  useTls: true
---
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-collectd-telemetry-cloud1 5
spec:
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/cloud1-telemetry 6
  debug: false
  prefetch: 15000
  serviceType: metrics
  size: 1
  useTimestamp: true

1 Name for Ceilometer notifications for cloud1
2 AMQP Address for Ceilometer notifications for cloud1
3 Name for collectd notifications for cloud1
4 AMQP Address for collectd notifications for cloud1
5 Name for collectd telemetry for cloud1
6 AMQP Address for collectd telemetry for cloud1

4.5.3. Creating the OpenStack environment file

To label traffic according to the cloud of origin, you must create a configuration with cloud-specific instance names. Create an stf-connectors.yaml file and adjust the values of CeilometerQdrEventsConfig and CollectdAmqpInstances to match the AMQP address prefix scheme. For more information, see Section 4.5.1, "Planning AMQP address prefixes".
WARNING
Remove enable-stf.yaml and ceilometer-write-qdr.yaml environment file references from your overcloud deployment. This configuration is redundant and results in duplicate information being sent from each cloud node.
Procedure
1 Create the stf-connectors.yaml file and modify it to match the AMQP address that you want for this cloud deployment:

resource_registry:
  OS::TripleO::Services::Collectd: /usr/share/openstack-tripleo-heat-templates/deployment/metrics/collectd-container-puppet.yaml
  OS::TripleO::Services::MetricsQdr: /usr/share/openstack-tripleo-heat-templates/deployment/metrics/qdr-container-puppet.yaml
  OS::TripleO::Services::CeilometerAgentCentral: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-central-container-puppet.yaml
  OS::TripleO::Services::CeilometerAgentNotification: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-notification-container-puppet.yaml
  OS::TripleO::Services::CeilometerAgentIpmi: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-ipmi-container-puppet.yaml
  OS::TripleO::Services::ComputeCeilometerAgent: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-compute-container-puppet.yaml
  OS::TripleO::Services::Redis: /usr/share/openstack-tripleo-heat-templates/deployment/database/redis-pacemaker-puppet.yaml

parameter_defaults:
  EnableSTF: true

  EventPipelinePublishers: []
  CeilometerEnablePanko: false
  CeilometerQdrPublishEvents: true
  CeilometerQdrEventsConfig:
    driver: amqp
    topic: cloud1-event 1

  CollectdConnectionType: amqp1
  CollectdAmqpInterval: 5
  CollectdDefaultPollingInterval: 5

  CollectdAmqpInstances:
    cloud1-notify: 2
      notify: true
      format: JSON
      presettle: false
    cloud1-telemetry: 3
      format: JSON
      presettle: true

  MetricsQdrAddresses:
    - prefix: collectd
      distribution: multicast
    - prefix: anycast/ceilometer
      distribution: multicast

  MetricsQdrSSLProfiles:
    - name: sslProfile

  MetricsQdrConnectors:
    - host: stf-default-interconnect-5671-service-telemetry.apps.infra.watch 4
      port: 443
      role: edge
      verifyHostname: false
      sslProfile: sslProfile

1 Define the topic for Ceilometer events. This value is the address format of anycast/ceilometer/cloud1-event.sample.
2 Define the topic for collectd events. This value is the format of collectd/cloud1-notify.
3 Define the topic for collectd metrics. This value is the format of collectd/cloud1-telemetry.
4 Adjust the MetricsQdrConnectors host to the address of the STF route.

Ensure that the naming convention in the stf-connectors.yaml file aligns with the spec.amqpUrl field in the Smart Gateway configuration. For example, configure the CeilometerQdrEventsConfig.topic field to a value of cloud1-event.

2 Save the file in a directory for custom environment files, for example /home/stack/custom_templates/

3 Source the authentication file:

[stack@undercloud-0 ~]$ source stackrc
(undercloud) [stack@undercloud-0 ~]$

4 Include the stf-connectors.yaml file in the overcloud deployment command, along with any other environment files relevant to your environment:

(undercloud) [stack@undercloud-0 ~]$ openstack overcloud deploy --templates /usr/share/openstack-tripleo-heat-templates -e /home/stack/custom_templates/stf-connectors.yaml ...
Additional resources

For information about validating the deployment, see Section 3.3.3, "Validating client-side installation".
4.5.4. Querying metrics data from multiple clouds

Data stored in Prometheus has a service label attached according to the Smart Gateway it was scraped from. You can use this label to query data from a specific cloud.

To query data from a specific cloud, use a Prometheus PromQL query that matches the associated service label, for example: collectd_uptime{service="stf-default-collectd-telemetry-cloud1-smartgateway"}.
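The same service-label filter can be exercised over the Prometheus HTTP API. The following is a sketch rather than a step from the original text: the route lookup mirrors the go-template command from Section 3.3.1, and the grep pattern for the Prometheus route is an assumption to verify with oc get routes:

$ PROM_HOST=$(oc get routes -ogo-template='{{ range .items }}{{ printf "%s\n" .spec.host }}{{ end }}' | grep prometheus)
$ curl -k -G "https://${PROM_HOST}/api/v1/query" --data-urlencode 'query=collectd_uptime{service="stf-default-collectd-telemetry-cloud1-smartgateway"}'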
4.6. EPHEMERAL STORAGE

Use ephemeral storage to run Service Telemetry Framework (STF) without persistently storing data in your Red Hat OpenShift Container Platform (OCP) cluster. Ephemeral storage is not recommended in a production environment due to the volatility of the data in the platform when operating correctly and as designed. For example, restarting a pod or rescheduling the workload to another node results in the loss of any local data written since the pod started.

If you enable ephemeral storage in STF, the Service Telemetry Operator does not add the relevant storage sections to the data storage components' manifests.
4.6.1. Configuring ephemeral storage

To configure STF for ephemeral storage, add storageEphemeralEnabled: true to the ServiceTelemetry object in OCP. You can add storageEphemeralEnabled: true at installation time or, if you already deployed STF, complete the following steps:
Procedure
1 Log in to Red Hat OpenShift Container Platform
2 Change to the service-telemetry namespace
3 Edit the ServiceTelemetry object
4 Add the storageEphemeralEnabled: true parameter to the spec section
5 Save your changes and close the object
$ oc project service-telemetry

$ oc edit ServiceTelemetry stf-default

spec:
  eventsEnabled: true
  metricsEnabled: true
  storageEphemeralEnabled: true
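A quick sanity check that is not part of the original steps: with ephemeral storage enabled, no persistent volume claims should be created for the data storage components:

$ oc get pvc -n service-telemetry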
APPENDIX A. COLLECTD PLUG-INS

This section contains a complete list of collectd plug-ins and configurations for Red Hat OpenStack Platform 16.0.
collectd-aggregation
collectd::plugin::aggregation::aggregators
collectd::plugin::aggregation::interval
collectd-amqp1
collectd-apache
collectd::plugin::apache::instances (ex: localhost ⇒ url ⇒ http://localhost/mod_status?auto)
collectd::plugin::apache::interval
collectd-apcups
collectd-battery
collectd::plugin::battery::values_percentage
collectd::plugin::battery::report_degraded
collectd::plugin::battery::query_state_fs
collectd::plugin::battery::interval
collectd-ceph
collectd::plugin::ceph::daemons
collectd::plugin::ceph::longrunavglatency
collectd::plugin::ceph::convertspecialmetrictypes
collectd-cgroups
collectd::plugin::cgroups::ignore_selected
collectd::plugin::cgroups::interval
collectd-conntrack
None
collectd-contextswitch
collectd::plugin::contextswitch::interval
collectd-cpu
collectd::plugin::cpu::reportbystate
collectd::plugin::cpu::reportbycpu
collectd::plugin::cpu::valuespercentage
collectd::plugin::cpu::reportnumcpu
collectd::plugin::cpu::reportgueststate
collectd::plugin::cpu::subtractgueststate
collectd::plugin::cpu::interval
collectd-cpufreq
None
collectd-cpusleep
collectd-csv
collectd::plugin::csv::datadir
collectd::plugin::csv::storerates
collectd::plugin::csv::interval
collectd-df
collectd::plugin::df::devices
collectd::plugin::df::fstypes
collectd::plugin::df::ignoreselected
collectd::plugin::df::mountpoints
collectd::plugin::df::reportbydevice
collectd::plugin::df::reportinodes
collectd::plugin::df::reportreserved
collectd::plugin::df::valuesabsolute
collectd::plugin::df::valuespercentage
collectd::plugin::df::interval
collectd-disk
collectd::plugin::disk::disks
collectd::plugin::disk::ignoreselected
collectd::plugin::disk::udevnameattr
collectd::plugin::disk::interval
collectd-entropy
collectd::plugin::entropy::interval
collectd-ethstat
collectd::plugin::ethstat::interfaces
collectd::plugin::ethstat::maps
collectd::plugin::ethstat::mappedonly
collectd::plugin::ethstat::interval
collectd-exec
collectd::plugin::exec::commands
collectd::plugin::exec::commands_defaults
collectd::plugin::exec::globals
collectd::plugin::exec::interval
collectd-fhcount
collectd::plugin::fhcount::valuesabsolute
collectd::plugin::fhcount::valuespercentage
collectd::plugin::fhcount::interval
collectd-filecount
collectd::plugin::filecount::directories
collectd::plugin::filecount::interval
collectd-fscache
None
collectd-hddtemp
collectd::plugin::hddtemp::host
collectd::plugin::hddtemp::port
collectd::plugin::hddtemp::interval
collectd-hugepages
collectd::plugin::hugepages::report_per_node_hp
collectd::plugin::hugepages::report_root_hp
collectd::plugin::hugepages::values_pages
collectd::plugin::hugepages::values_bytes
collectd::plugin::hugepages::values_percentage
collectd::plugin::hugepages::interval
collectd-intel_rdt
collectd-interface
collectd::plugin::interface::interfaces
collectd::plugin::interface::ignoreselected
collectd::plugin::interface::reportinactive
collectd::plugin::interface::interval
collectd-ipc
None
collectd-ipmi
collectd::plugin::ipmi::ignore_selected
collectd::plugin::ipmi::notify_sensor_add
collectd::plugin::ipmi::notify_sensor_remove
collectd::plugin::ipmi::notify_sensor_not_present
collectd::plugin::ipmi::sensors
collectd::plugin::ipmi::interval
collectd-irq
collectd::plugin::irq::irqs
collectd::plugin::irq::ignoreselected
collectd::plugin::irq::interval
collectd-load
collectd::plugin::load::report_relative
collectd::plugin::load::interval
collectd-logfile
collectd::plugin::logfile::log_level
collectd::plugin::logfile::log_file
collectd::plugin::logfile::log_timestamp
collectd::plugin::logfile::print_severity
collectd::plugin::logfile::interval
collectd-madwifi
collectd-mbmon
collectd-md
collectd-memcached
collectd::plugin::memcached::instances
collectd::plugin::memcached::interval
collectd-memory
collectd::plugin::memory::valuesabsolute
collectd::plugin::memory::valuespercentage
collectd::plugin::memory::interval
collectd-multimeter
collectd-mysql
collectd::plugin::mysql::interval
collectd-netlink
collectd::plugin::netlink::interfaces
collectd::plugin::netlink::verboseinterfaces
collectd::plugin::netlink::qdiscs
collectd::plugin::netlink::classes
collectd::plugin::netlink::filters
collectd::plugin::netlink::ignoreselected
collectd::plugin::netlink::interval
collectd-network
collectd::plugin::network::timetolive
collectd::plugin::network::maxpacketsize
collectd::plugin::network::forward
collectd::plugin::network::reportstats
collectd::plugin::network::listeners
collectd::plugin::network::servers
collectd::plugin::network::interval
collectd-nfs
collectd::plugin::nfs::interval
collectd-ntpd
collectd::plugin::ntpd::host
collectd::plugin::ntpd::port
collectd::plugin::ntpd::reverselookups
collectd::plugin::ntpd::includeunitid
collectd::plugin::ntpd::interval
collectd-numa
None
collectd-olsrd
collectd-openvpn
collectd::plugin::openvpn::statusfile
collectd::plugin::openvpn::improvednamingschema
collectd::plugin::openvpn::collectcompression
collectd::plugin::openvpn::collectindividualusers
collectd::plugin::openvpn::collectusercount
collectd::plugin::openvpn::interval
collectd-ovs_events
collectd::plugin::ovs_events::address
collectd::plugin::ovs_events::dispatch
collectd::plugin::ovs_events::interfaces
collectd::plugin::ovs_events::send_notification
collectd::plugin::ovs_events::$port
collectd::plugin::ovs_events::socket
collectd-ovs_stats
collectd::plugin::ovs_stats::address
collectd::plugin::ovs_stats::bridges
collectd::plugin::ovs_stats::port
collectd::plugin::ovs_stats::socket
collectd-ping
collectd::plugin::ping::hosts
collectd::plugin::ping::timeout
collectd::plugin::ping::ttl
collectd::plugin::ping::source_address
collectd::plugin::ping::device
collectd::plugin::ping::max_missed
collectd::plugin::ping::size
collectd::plugin::ping::interval
collectd-powerdns
collectd::plugin::powerdns::interval
collectd::plugin::powerdns::servers
collectd::plugin::powerdns::recursors
collectd::plugin::powerdns::local_socket
collectd::plugin::powerdns::interval
collectd-processes
collectd::plugin::processes::processes
collectd::plugin::processes::process_matches
collectd::plugin::processes::collect_context_switch
collectd::plugin::processes::collect_file_descriptor
collectd::plugin::processes::collect_memory_maps
collectd::plugin::powerdns::interval
collectd-protocols
collectd::plugin::protocols::ignoreselected
collectd::plugin::protocols::values
collectd-python
collectd-serial
collectd-smart
collectd::plugin::smart::disks
collectd::plugin::smart::ignoreselected
collectd::plugin::smart::interval
collectd-snmp_agent
collectd-statsd
collectd::plugin::statsd::host
collectd::plugin::statsd::port
collectd::plugin::statsd::deletecounters
collectd::plugin::statsd::deletetimers
collectd::plugin::statsd::deletegauges
collectd::plugin::statsd::deletesets
collectd::plugin::statsd::countersum
collectd::plugin::statsd::timerpercentile
collectd::plugin::statsd::timerlower
collectd::plugin::statsd::timerupper
collectd::plugin::statsd::timersum
collectd::plugin::statsd::timercount
collectd::plugin::statsd::interval
collectd-swap
collectd::plugin::swap::reportbydevice
collectd::plugin::swap::reportbytes
collectd::plugin::swap::valuesabsolute
collectd::plugin::swap::valuespercentage
collectd::plugin::swap::reportio
collectd::plugin::swap::interval
collectd-syslog
collectd::plugin::syslog::log_level
collectd::plugin::syslog::notify_level
collectd::plugin::syslog::interval
collectd-table
collectd::plugin::table::tables
collectd::plugin::table::interval
collectd-tail
collectd::plugin::tail::files
collectd::plugin::tail::interval
collectd-tail_csv
collectd::plugin::tail_csv::metrics
collectd::plugin::tail_csv::files
collectd-tcpconns
collectd::plugin::tcpconns::localports
collectd::plugin::tcpconns::remoteports
collectd::plugin::tcpconns::listening
collectd::plugin::tcpconns::allportssummary
collectd::plugin::tcpconns::interval
collectd-ted
collectd-thermal
collectd::plugin::thermal::devices
collectd::plugin::thermal::ignoreselected
collectd::plugin::thermal::interval
collectd-threshold
collectd::plugin::threshold::types
collectd::plugin::threshold::plugins
collectd::plugin::threshold::hosts
collectd::plugin::threshold::interval
collectd-turbostat
collectd::plugin::turbostat::core_c_states
collectd::plugin::turbostat::package_c_states
collectd::plugin::turbostat::system_management_interrupt
collectd::plugin::turbostat::digital_temperature_sensor
collectd::plugin::turbostat::tcc_activation_temp
collectd::plugin::turbostat::running_average_power_limit
collectd::plugin::turbostat::logical_core_names
collectd-unixsock
collectd-uptime
collectd::plugin::uptime::interval
collectd-users
collectd::plugin::users::interval
collectd-uuid
collectd::plugin::uuid::uuid_file
collectd::plugin::uuid::interval
collectd-virt
collectd::plugin::virt::connection
collectd::plugin::virt::refresh_interval
collectd::plugin::virt::domain
collectd::plugin::virt::block_device
collectd::plugin::virt::interface_device
collectd::plugin::virt::ignore_selected
collectd::plugin::virt::hostname_format
collectd::plugin::virt::interface_format
collectd::plugin::virt::extra_stats
collectd::plugin::virt::interval
collectd-vmem
collectd::plugin::vmem::verbose
collectd::plugin::vmem::interval
collectd-vserver
collectd-wireless
collectd-write_graphite
collectd::plugin::write_graphite::carbons
collectd::plugin::write_graphite::carbon_defaults
collectd::plugin::write_graphite::globals
collectd-write_kafka
collectd::plugin::write_kafka::kafka_host
collectd::plugin::write_kafka::kafka_port
collectd::plugin::write_kafka::kafka_hosts
collectd::plugin::write_kafka::topics
collectd-write_log
collectd::plugin::write_log::format
collectd-zfs_arc
None
CHAPTER 1. INTRODUCTION TO SERVICE TELEMETRY FRAMEWORK

Service Telemetry Framework (STF) provides automated collection of measurements and data from remote clients - Red Hat OpenStack Platform or third-party nodes - and transmission of that information to a centralized, receiving Red Hat OpenShift Container Platform (OCP) deployment for storage, retrieval, and monitoring. The data can be either of two types:
Metric
a numeric measurement of an application or system
Event
irregular and discrete occurrences that happen in a system
The collection components that are required on the clients are lightweight. The multicast message bus that is shared by all clients and the deployment provides fast and reliable data transport. Other modular components for receiving and storing data are deployed in containers on OCP.

STF provides access to monitoring functions such as alert generation, visualization through dashboards, and single source of truth telemetry analysis to support orchestration.
1.1. SERVICE TELEMETRY FRAMEWORK ARCHITECTURE

Service Telemetry Framework (STF) uses the components described in Table 1.1, "STF components".
Table 1.1. STF components

Client | Component | Server (OCP)
yes | An AMQP 1.x compatible messaging bus to shuttle the metrics to STF for storage in Prometheus | yes
no | Smart Gateway to pick metrics and events from the AMQP 1.x bus and to deliver events to ElasticSearch or to provide metrics to Prometheus | yes
no | Prometheus as time-series data storage | yes
no | ElasticSearch as events data storage | yes
yes | collectd to collect infrastructure metrics and events | no
yes | Ceilometer to collect Red Hat OpenStack Platform metrics and events | no
Figure 1.1. Service Telemetry Framework architecture overview
NOTE
The Service Telemetry Framework data collection components, collectd and Ceilometer, and the transport components, AMQ Interconnect and Smart Gateway, are fully supported. The data storage components, Prometheus and ElasticSearch, including the Operator artifacts, and the visualization component, Grafana, are community-supported and are not officially supported.
For metrics, on the client side, collectd collects high-resolution metrics. collectd delivers the data to Prometheus by using the AMQP1 plugin, which places the data onto the message bus. On the server side, a Golang application called the Smart Gateway takes the data stream from the bus and exposes it as a local scrape endpoint for Prometheus.

If you plan to collect and store events, collectd or Ceilometer delivers event data to the server side by using the AMQP1 plugin, which places the data onto the message bus. Another Smart Gateway writes the data to the ElasticSearch datastore.
Server-side STF monitoring infrastructure consists of the following layers:

Service Telemetry Framework 1.x (STF)
Red Hat OpenShift Container Platform (OCP)
Infrastructure platform
Figure 1.2. Server-side STF monitoring infrastructure

For more information about how to deploy Red Hat OpenShift Container Platform, see the OCP product documentation. You can install OCP on cloud platforms or on bare metal. For more information about STF performance and scaling, see https://access.redhat.com/articles/4907241.
NOTE
Do not install OCP on the same infrastructure that you want to monitor
1.2. INSTALLATION SIZE
The size of your Red Hat OpenShift Container Platform installation depends on the following factors
The number of nodes you want to monitor
The number of metrics you want to collect
The resolution of metrics
The length of time that you want to store the data
Installation of Service Telemetry Framework (STF) depends on the existing Red Hat OpenShift Container Platform environment. Ensure that you install monitoring for Red Hat OpenStack Platform on a platform separate from your Red Hat OpenStack Platform environment. You can install Red Hat OpenShift Container Platform (OCP) on bare metal or other supported cloud platforms. For more information about installing OCP, see OpenShift Container Platform 4.3 Documentation.

The size of your OCP environment depends on the infrastructure you select. For more information about minimum resources requirements when installing OCP on bare metal, see Minimum resource requirements in the Installing a cluster on bare metal guide. For installation requirements of the various public and private cloud platforms which you can install, see the corresponding installation documentation for your cloud platform of choice.
CHAPTER 2. INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK

Before you install Service Telemetry Framework (STF), ensure that Red Hat OpenShift Container Platform (OCP) version 4.x is running and that you understand the core components of the framework. As part of the OCP installation planning process, ensure that the administrator provides persistent storage and enough resources to run the STF components on top of the OCP environment.
WARNING
Red Hat OpenShift Container Platform version 4.3 or later is currently required for a successful installation of STF.
2.1. THE CORE COMPONENTS OF STF
The following STF core components are managed by Operators
Prometheus and AlertManager
ElasticSearch
Smart Gateway
AMQ Interconnect
Each component has a corresponding Operator that you can use to load the various application components and objects.
Additional resources
For more information about Operators see the Understanding Operators guide
2.2. PREPARING YOUR OCP ENVIRONMENT FOR STF

As you prepare your OCP environment for STF, you must plan for persistent storage, adequate resources, and event storage.
Ensure that persistent storage is available in your Red Hat OpenShift Container Platform cluster to permit a production grade deployment. For more information, see Section 2.2.1, "Persistent volumes".

Ensure that enough resources are available to run the Operators and the application containers. For more information, see Section 2.2.2, "Resource allocation".

To install ElasticSearch, you must use a community catalog source. If you do not want to use a community catalog or if you do not want to store events, see Section 2.3, "Deploying STF to the OCP environment".
STF uses ElasticSearch to store events, which requires a larger than normal vm.max_map_count. The vm.max_map_count value is set by default in Red Hat OpenShift Container Platform. For more information about how to edit the value of vm.max_map_count, see Section 2.2.3, "Node tuning operator".
2.2.1. Persistent volumes

STF uses persistent storage in OCP to instantiate the volumes dynamically so that Prometheus and ElasticSearch can store metrics and events.
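Before you deploy, a simple check that is not part of the original text confirms that a storage class is available for dynamic provisioning:

$ oc get storageclass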
Additional resources
For more information about configuring persistent storage for OCP, see Understanding persistent storage.

2.2.1.1. Using ephemeral storage
WARNING
You can use ephemeral storage with STF. However, if you use ephemeral storage, you might experience data loss if a pod is restarted, updated, or rescheduled onto another node. Use ephemeral storage only for development or testing, and not production environments.
Procedure
To enable ephemeral storage for STF, set storageEphemeralEnabled: true in your ServiceTelemetry manifest.
Additional resources
For more information about enabling ephemeral storage for STF, see Section 4.6.1, "Configuring ephemeral storage".

2.2.2. Resource allocation

To enable the scheduling of pods within the OCP infrastructure, you need resources for the components that are running. If you do not allocate enough resources, pods remain in a Pending state because they cannot be scheduled.

The amount of resources that you require to run STF depends on your environment and the number of nodes and clouds that you want to monitor.
Additional resources
For recommendations about sizing for metrics collection, see https://access.redhat.com/articles/4907241.

For information about sizing requirements for ElasticSearch, see https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-managing-compute-resources.html.
2.2.3. Node tuning operator
STF uses ElasticSearch to store events, which requires a larger than normal vm.max_map_count. The vm.max_map_count value is set by default in Red Hat OpenShift Container Platform.

If you want to edit the value of vm.max_map_count, you cannot apply node tuning manually using the sysctl command, because Red Hat OpenShift Container Platform manages nodes directly. To configure values and apply them to the infrastructure, you must use the node tuning operator. For more information, see Using the Node Tuning Operator.

In an OCP deployment, the default node tuning operator specification provides the required profiles for ElasticSearch workloads or pods scheduled on nodes. To view the default cluster node tuning specification, run the following command:

$ oc get Tuned/default -o yaml -n openshift-cluster-node-tuning-operator

The output of the default specification is documented at Default profiles set on a cluster. The assignment of profiles is managed in the recommend section, where profiles are applied to a node when certain conditions are met. When scheduling ElasticSearch to a node in STF, one of the following profiles is applied:
openshift-control-plane-es
openshift-node-es
When scheduling an ElasticSearch pod, there must be a label present that matches tuned.openshift.io/elasticsearch. If the label is present, one of the two profiles is assigned to the pod. No action is required by the administrator if you use the recommended Operator for ElasticSearch. If you use a custom-deployed ElasticSearch with STF, ensure that you add the tuned.openshift.io/elasticsearch label to all scheduled pods.
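For a custom-deployed ElasticSearch, the label can be added to the pod template along the following lines. This is a sketch: the deployment name my-custom-elasticsearch is hypothetical, and the empty label value assumes that the default Tuned profile matches on the label key alone:

$ oc patch deployment my-custom-elasticsearch --type=merge -p '{"spec":{"template":{"metadata":{"labels":{"tuned.openshift.io/elasticsearch":""}}}}}'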
Additional resources
For more information about virtual memory usage by ElasticSearch, see https://www.elastic.co/guide/en/elasticsearch/reference/current/vm-max-map-count.html.

For more information about how the profiles are applied to nodes, see Custom tuning specification.
2.3. DEPLOYING STF TO THE OCP ENVIRONMENT
You can deploy STF to the OCP environment in one of two ways
Deploy STF and store events with ElasticSearch. For more information, see Section 2.3.1, "Deploying STF to the OCP environment with ElasticSearch".

Deploy STF without ElasticSearch and disable events support. For more information, see Section 2.3.2, "Deploying STF to the OCP environment without ElasticSearch".

2.3.1. Deploying STF to the OCP environment with ElasticSearch
Complete the following tasks
1 Section 2.3.3, "Creating a namespace"
2 Section 2.3.4, "Creating an OperatorGroup"
3 Section 2.3.5, "Enabling the OperatorHub.io Community Catalog Source"
4 Section 2.3.6, "Enabling Red Hat STF Operator Source"
5 Section 2.3.7, "Subscribing to the AMQ Certificate Manager Operator"
6 Section 2.3.8, "Subscribing to the Elastic Cloud on Kubernetes Operator"
7 Section 2.3.9, "Subscribing to the Service Telemetry Operator"
8 Section 2.3.10, "Creating a ServiceTelemetry object in OCP"

2.3.2. Deploying STF to the OCP environment without ElasticSearch
Complete the following tasks
1 Section 2.3.3, "Creating a namespace"
2 Section 2.3.4, "Creating an OperatorGroup"
3 Section 2.3.6, "Enabling Red Hat STF Operator Source"
4 Section 2.3.7, "Subscribing to the AMQ Certificate Manager Operator"
5 Section 2.3.9, "Subscribing to the Service Telemetry Operator"
6 Section 2.3.10, "Creating a ServiceTelemetry object in OCP"

2.3.3. Creating a namespace

Create a namespace to hold the STF components. The service-telemetry namespace is used throughout the documentation.
Procedure
Enter the following command:

$ oc new-project service-telemetry
2.3.4. Creating an OperatorGroup
Create an OperatorGroup in the namespace so that you can schedule the Operator pods
Procedure
Enter the following command
oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: service-telemetry-operator-group
  namespace: service-telemetry
spec:
  targetNamespaces:
    - service-telemetry
EOF
Additional resources
For more information see OperatorGroups
2.3.5. Enabling the OperatorHub.io Community Catalog Source

Before you install ElasticSearch, you must have access to the resources on the OperatorHub.io Community Catalog Source.
Procedure
Enter the following command:

oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: operatorhubio-operators
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/operator-framework/upstream-community-operators:latest
  displayName: OperatorHub.io Operators
  publisher: OperatorHub.io
EOF

2.3.6. Enabling Red Hat STF Operator Source

Before you deploy STF on Red Hat OpenShift Container Platform, you must enable the operator source.
Procedure
1 Install an OperatorSource that contains the Service Telemetry Operator and the Smart Gateway Operator:

oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1
kind: OperatorSource
metadata:
  labels:
    opsrc-provider: redhat-operators-stf
  name: redhat-operators-stf
  namespace: openshift-marketplace
spec:
  authorizationToken: {}
  displayName: Red Hat STF Operators
  endpoint: https://quay.io/cnr
  publisher: Red Hat
  registryNamespace: redhat-operators-stf
  type: appregistry
EOF
2 To validate the creation of your OperatorSource, use the oc get operatorsources command. A successful import results in the MESSAGE field returning a result of The object has been successfully reconciled:

$ oc get -nopenshift-marketplace operatorsource redhat-operators-stf
NAME                   TYPE          ENDPOINT              REGISTRY               DISPLAYNAME             PUBLISHER   STATUS      MESSAGE
redhat-operators-stf   appregistry   https://quay.io/cnr   redhat-operators-stf   Red Hat STF Operators   Red Hat     Succeeded   The object has been successfully reconciled

3 To validate that the Operators are available from the catalog, use the oc get packagemanifests command:

$ oc get packagemanifests | grep "Red Hat STF"
smartgateway-operator       Red Hat STF Operators   2m50s
servicetelemetry-operator   Red Hat STF Operators   2m50s
2.3.7. Subscribing to the AMQ Certificate Manager Operator

You must subscribe to the AMQ Certificate Manager Operator before you deploy the other STF components, because the AMQ Certificate Manager Operator runs globally-scoped and is not compatible with the dependency management of Operator Lifecycle Manager when used with other namespace-scoped operators.
Procedure
1 Subscribe to the AMQ Certificate Manager Operator, create the subscription, and validate the AMQ7 Certificate Manager:

oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: amq7-cert-manager
  namespace: openshift-operators
spec:
  channel: alpha
  installPlanApproval: Automatic
  name: amq7-cert-manager
  source: redhat-operators
  sourceNamespace: openshift-marketplace
EOF
NOTE
The AMQ Certificate Manager is installed globally for all namespaces, so the namespace value provided is openshift-operators. You might not see your amq7-cert-manager.v1.0.0 ClusterServiceVersion in the service-telemetry namespace for a few minutes until the processing executes against the namespace.
2 To validate your ClusterServiceVersion, use the oc get csv command. Ensure that amq7-cert-manager.v1.0.0 has a phase Succeeded:

$ oc get --namespace openshift-operators csv
NAME                       DISPLAY                                         VERSION   REPLACES   PHASE
amq7-cert-manager.v1.0.0   Red Hat Integration - AMQ Certificate Manager   1.0.0                Succeeded

2.3.8. Subscribing to the Elastic Cloud on Kubernetes Operator

Before you install the Service Telemetry Operator, and if you plan to store events in ElasticSearch, you must enable the Elastic Cloud on Kubernetes Operator.
Procedure
1 Apply the following manifest to your OCP environment to enable the Elastic Cloud on Kubernetes Operator:

oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: elastic-cloud-eck
  namespace: service-telemetry
spec:
  channel: stable
  installPlanApproval: Automatic
  name: elastic-cloud-eck
  source: operatorhubio-operators
  sourceNamespace: openshift-marketplace
EOF

2 To verify that the ClusterServiceVersion for ElasticSearch Cloud on Kubernetes succeeded, enter the oc get csv command:

$ oc get csv
NAME                       DISPLAY                       VERSION   REPLACES                   PHASE
elastic-cloud-eck.v1.1.0   Elastic Cloud on Kubernetes   1.1.0     elastic-cloud-eck.v1.0.1   Succeeded

2.3.9. Subscribing to the Service Telemetry Operator

To instantiate an STF instance, create the ServiceTelemetry object to allow the Service Telemetry Operator to create the environment.
Procedure
1 To create the Service Telemetry Operator subscription, enter the oc apply -f command:

oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: servicetelemetry-operator
  namespace: service-telemetry
spec:
  channel: stable
  installPlanApproval: Automatic
  name: servicetelemetry-operator
  source: redhat-operators-stf
  sourceNamespace: openshift-marketplace
EOF
2 To validate the Service Telemetry Operator and the dependent operators, enter the following command:

$ oc get csv --namespace service-telemetry
NAME                                DISPLAY                                         VERSION   REPLACES                            PHASE
amq7-cert-manager.v1.0.0            Red Hat Integration - AMQ Certificate Manager   1.0.0                                         Succeeded
amq7-interconnect-operator.v1.2.0   Red Hat Integration - AMQ Interconnect          1.2.0                                         Succeeded
elastic-cloud-eck.v1.1.0            Elastic Cloud on Kubernetes                     1.1.0     elastic-cloud-eck.v1.0.1            Succeeded
prometheusoperator.0.37.0           Prometheus Operator                             0.37.0    prometheusoperator.0.32.0           Succeeded
service-telemetry-operator.v1.0.2   Service Telemetry Operator                      1.0.2     service-telemetry-operator.v1.0.1   Succeeded
smart-gateway-operator.v1.0.1       Smart Gateway Operator                          1.0.1     smart-gateway-operator.v1.0.0       Succeeded

2.3.10. Creating a ServiceTelemetry object in OCP

To deploy the Service Telemetry Framework, you must create an instance of ServiceTelemetry in OCP. By default, eventsEnabled is set to false. If you do not want to store events in ElasticSearch, ensure that eventsEnabled is set to false. For more information, see Section 2.3.2, "Deploying STF to the OCP environment without ElasticSearch".
The following core parameters are available for a ServiceTelemetry manifest:

Table 2.1. Core parameters for a ServiceTelemetry manifest

Parameter | Description | Default Value
eventsEnabled | Enable events support in STF. Requires prerequisite steps to ensure ElasticSearch can be started. For more information, see Section 2.3.8, "Subscribing to the Elastic Cloud on Kubernetes Operator". | false
metricsEnabled | Enable metrics support in STF. | true
highAvailabilityEnabled | Enable high availability in STF. For more information, see Section 4.3, "High availability". | false
storageEphemeralEnabled | Enable ephemeral storage support in STF. For more information, see Section 4.6, "Ephemeral storage". | false
Procedure
1 To store events in ElasticSearch, set eventsEnabled to true during deployment:

oc apply -f - <<EOF
apiVersion: infra.watch/v1alpha1
kind: ServiceTelemetry
metadata:
  name: stf-default
  namespace: service-telemetry
spec:
  eventsEnabled: true
  metricsEnabled: true
EOF
2 To view the STF deployment logs in the Service Telemetry Operator, use the oc logs command:

$ oc logs $(oc get pod --selector=name=service-telemetry-operator -oname) -c ansible

PLAY RECAP ***
localhost : ok=37 changed=0 unreachable=0 failed=0 skipped=1 rescued=0 ignored=0

3 View the pods and the status of each pod to determine that all workloads are operating nominally:

$ oc get pods

NAME                                                              READY   STATUS    RESTARTS   AGE
alertmanager-stf-default-0                                        2/2     Running   0          26m
elastic-operator-645dc8b8ff-jwnzt                                 1/1     Running   0          88m
elasticsearch-es-default-0                                        1/1     Running   0          26m
interconnect-operator-6fd49d9fb9-4bl92                            1/1     Running   0          46m
prometheus-operator-bf7d97fb9-kwnlx                               1/1     Running   0          46m
prometheus-stf-default-0                                          3/3     Running   0          26m
service-telemetry-operator-54f4c99d9b-k7ll6                       2/2     Running   0          46m
smart-gateway-operator-7ff58bcf94-66rvx                           2/2     Running   0          46m
stf-default-ceilometer-notification-smartgateway-6675df547q4lbj   1/1     Running   0          26m
stf-default-collectd-notification-smartgateway-698c87fbb7-xj528   1/1     Running   0          26m
stf-default-collectd-telemetry-smartgateway-79c967c8f7-9hsqn      1/1     Running   0          26m
stf-default-interconnect-7458fd4d69-nqbfs                         1/1     Running   0          26m
NOTE
If you set eventsEnabled: true, the notification Smart Gateways will Error and CrashLoopBackOff for a period of time before ElasticSearch starts.
2.4. REMOVING STF FROM THE OCP ENVIRONMENT
Remove STF from an OCP environment if you no longer require the STF functionality
Complete the following tasks
1 Section 2.4.1, "Deleting the namespace"
2 Section 2.4.2, "Removing the OperatorSource"

2.4.1. Deleting the namespace

To remove the operational resources for STF from OCP, delete the namespace.
Procedure
1 Run the oc delete command:

$ oc delete project service-telemetry

2 Verify that the resources have been deleted from the namespace:

$ oc get all
No resources found.
2.4.2. Removing the OperatorSource

If you do not expect to install Service Telemetry Framework again, delete the OperatorSource. When you remove the OperatorSource, PackageManifests related to STF are removed from the Operator Lifecycle Manager catalog.
Procedure
1 Delete the OperatorSource:

$ oc delete --namespace=openshift-marketplace operatorsource redhat-operators-stf
operatorsource.operators.coreos.com "redhat-operators-stf" deleted
2 Verify that the STF PackageManifests are removed from the platform. If successful, the following command returns no result:

$ oc get packagemanifests | grep "Red Hat STF"

3 If you enabled the OperatorHub.io Community Catalog Source during the installation process, and you no longer need this catalog source, delete it:

$ oc delete --namespace=openshift-marketplace catalogsource operatorhubio-operators
catalogsource.operators.coreos.com "operatorhubio-operators" deleted
Additional resources
For more information about the OperatorHub.io Community Catalog Source, see Section 2.3, "Deploying STF to the OCP environment".
CHAPTER 3. COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION

3.1. CONNECTING RED HAT OPENSTACK PLATFORM TO SERVICE TELEMETRY FRAMEWORK

To collect metrics, events, or both, and to send them to the Service Telemetry Framework (STF) storage domain, you must configure the Red Hat OpenStack Platform overcloud to enable data collection and transport.

To deploy data collection and transport to STF on Red Hat OpenStack Platform cloud nodes that employ routed L3 domains, such as distributed compute node (DCN) or spine-leaf, see Section 3.2, "Deploying to non-standard network topologies".
3.2. DEPLOYING TO NON-STANDARD NETWORK TOPOLOGIES

If your nodes are on a separate network from the default InternalApi network, you must make configuration adjustments so that AMQ Interconnect can transport data to the Service Telemetry Framework (STF) server instance. This scenario is typical in a spine-leaf or a DCN topology. For more information about DCN configuration, see the Spine Leaf Networking guide.

If you use STF with Red Hat OpenStack Platform 16.0 and plan to monitor your Ceph, Block, or Object storage nodes, you must make configuration changes that are similar to the configuration changes that you make to the spine-leaf and DCN network configuration. To monitor Ceph nodes, use the CephStorageExtraConfig parameter to define which network interface to load into the AMQ Interconnect and collectd configuration files.

Similarly, you must specify BlockStorageExtraConfig and ObjectStorageExtraConfig parameters if your environment uses Block and Object storage roles; a sketch of these appears at the end of the procedure below.

The deployment of a spine-leaf topology involves creating roles and networks, then assigning those networks to the available roles. When you configure data collection and transport for STF for a Red Hat OpenStack Platform deployment, the default network for roles is InternalApi. For Ceph, Block, and Object storage roles, the default network is Storage. Because a spine-leaf configuration can result in different networks being assigned to different Leaf groupings, and those names are typically unique, additional configuration is required in the parameter_defaults section of the Red Hat OpenStack Platform environment files.
Procedure
1 Document which networks are available for each of the Leaf roles. For examples of network name definitions, see Creating a network data file in the Spine Leaf Networking guide. For more information about the creation of the Leaf groupings (roles) and assignment of the networks to those groupings, see Creating a roles data file in the Spine Leaf Networking guide
CephStorageExtraConfig:
  tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('storage')}"
  tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('storage')}"
  tripleo::profile::base::ceilometer::agent::notification::notifier_host_addr: "%{hiera('storage')}"
2 Add the following configuration example to the ExtraConfig section for each of the leaf roles. In this example, internal_api_subnet is the value defined in the name_lower parameter of your network definition (with _subnet appended to the name for Leaf 0), and is the network to which the ComputeLeaf0 leaf role is connected. In this case, the network identification of 0 corresponds to the Compute role for leaf 0, and represents a value that is different from the default internal API network name.

For the ComputeLeaf0 leaf role, specify extra configuration to perform a hiera lookup to determine which network interface for a particular network to assign to the collectd AMQP host parameter. Perform the same configuration for the AMQ Interconnect listener address parameter:

ComputeLeaf0ExtraConfig:
  tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('internal_api_subnet')}"
  tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('internal_api_subnet')}"

Additional leaf roles typically replace _subnet with _leafN, where N represents a unique identifier for the leaf:

ComputeLeaf1ExtraConfig:
  tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('internal_api_leaf1')}"
  tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('internal_api_leaf1')}"

This example configuration is on a CephStorage leaf role:

CephStorageLeaf0ExtraConfig:
  tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('storage_subnet')}"
  tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('storage_subnet')}"
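Following the same pattern, equivalent sketches for the Block and Object storage roles mentioned above might look as follows, assuming those roles also use the Storage network:

BlockStorageExtraConfig:
  tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('storage')}"
  tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('storage')}"
  tripleo::profile::base::ceilometer::agent::notification::notifier_host_addr: "%{hiera('storage')}"

ObjectStorageExtraConfig:
  tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('storage')}"
  tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('storage')}"
  tripleo::profile::base::ceilometer::agent::notification::notifier_host_addr: "%{hiera('storage')}"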
3.3. CONFIGURING RED HAT OPENSTACK PLATFORM OVERCLOUD FOR SERVICE TELEMETRY FRAMEWORK

To configure the Red Hat OpenStack Platform overcloud, you must configure the data collection applications and the data transport to STF, and deploy the overcloud.
To configure the Red Hat OpenStack Platform overcloud complete the following tasks
1 Section 3.3.1, "Retrieving the AMQ Interconnect route address"
2 Section 3.3.2, "Configuring the STF connection for the overcloud"
3 Section 3.3.3, "Validating client-side installation"

3.3.1. Retrieving the AMQ Interconnect route address

When you configure the Red Hat OpenStack Platform overcloud for STF, you must provide the AMQ Interconnect route address in the STF connection file.
Procedure
1 Log in to your Red Hat OpenShift Container Platform (OCP) environment
2 In the service-telemetry project, retrieve the AMQ Interconnect route address:

$ oc get routes -ogo-template='{{ range .items }}{{printf "%s\n" .spec.host }}{{ end }}' | grep "\-5671"
stf-default-interconnect-5671-service-telemetry.apps.infra.watch
NOTE
If your STF installation differs from the documentation, ensure that you retrieve the correct AMQ Interconnect route address.
3.3.2. Configuring the STF connection for the overcloud

To configure the STF connection, you must create a file that contains the connection configuration of the AMQ Interconnect for the overcloud to the STF deployment. Enable the collection of events and storage of the events in STF, and deploy the overcloud.
Procedure
1 Log in to the Red Hat OpenStack Platform undercloud as the stack user
2 Create a configuration file called stf-connectors.yaml in the /home/stack directory
IMPORTANT
The Service Telemetry Operator simplifies the deployment of all dataingestion and data storage components for single cloud deployments Toshare the data storage domain with multiple clouds see Section 45ldquoConfiguring multiple cloudsrdquo
3. In the stf-connectors.yaml file, configure the MetricsQdrConnectors address to connect the AMQ Interconnect on the overcloud to the STF deployment.

   Add the CeilometerQdrPublishEvents: true parameter to enable collection and transport of Ceilometer events to STF.

   Replace the host parameter with the value of HOST/PORT that you retrieved in Section 3.3.1, "Retrieving the AMQ Interconnect route address":

       parameter_defaults:
           EventPipelinePublishers: []
           CeilometerQdrPublishEvents: true
           MetricsQdrConnectors:
               - host: stf-default-interconnect-5671-service-telemetry.apps.infra.watch
                 port: 443
                 role: edge
                 sslProfile: sslProfile
                 verifyHostname: false
4. Add the following files to your Red Hat OpenStack Platform director deployment to set up collectd and AMQ Interconnect:

   - the stf-connectors.yaml environment file

   - the enable-stf.yaml file that ensures that the environment is being used during the overcloud deployment
   - the ceilometer-write-qdr.yaml file that ensures that Ceilometer telemetry is sent to STF
5. Deploy the Red Hat OpenStack Platform overcloud:

       openstack overcloud deploy <other arguments>
         --templates /usr/share/openstack-tripleo-heat-templates \
         --environment-file <other-environment-files> \
         --environment-file /usr/share/openstack-tripleo-heat-templates/environments/metrics/ceilometer-write-qdr.yaml \
         --environment-file /usr/share/openstack-tripleo-heat-templates/environments/enable-stf.yaml \
         --environment-file /home/stack/stf-connectors.yaml
3.3.3. Validating client-side installation

To validate data collection from the STF storage domain, query the data sources for delivered data. To validate individual nodes in the Red Hat OpenStack Platform deployment, connect to the console using SSH.
Procedure
1. Log in to an overcloud node, for example, controller-0.
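   A hedged example of this step: overcloud nodes are commonly reached from the undercloud as the heat-admin user, with the node address taken from openstack server list. Both details are assumptions here, not values from this guide.

       # Illustrative only: the user name and address depend on your deployment.
       (undercloud) $ openstack server list          # find controller-0's ctlplane address
       (undercloud) $ ssh heat-admin@192.168.24.15   # connect to the node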
2. Ensure that the metrics_qdr container is running on the node:

       $ sudo podman container inspect --format '{{.State.Status}}' metrics_qdr
       running
3. Return the internal network address on which AMQ Interconnect is running, for example, 172.17.1.44 listening on port 5666:

       $ sudo podman exec -it metrics_qdr cat /etc/qpid-dispatch/qdrouterd.conf

       listener {
           host: 172.17.1.44
           port: 5666
           authenticatePeer: no
           saslMechanisms: ANONYMOUS
       }
4. Return a list of connections to the local AMQ Interconnect:

       $ sudo podman exec -it metrics_qdr qdstat --bus=172.17.1.44:5666 --connections

       Connections
         id   host                                                                   container                                                              role  dir  security                            authentication  tenant
         ===========================================================================================================================================================================================================
         1    stf-default-interconnect-5671-service-telemetry.apps.infra.watch:443  stf-default-interconnect-7458fd4d69-bgzfb                              edge  out  TLSv1.2(DHE-RSA-AES256-GCM-SHA384)  anonymous-user
         12   172.17.1.44:60290                                                      openstack.org/om/container/controller-0/ceilometer-agent-notification/25/5c02cee550f143ec9ea030db5cccba14  normal  in  no-security  no-auth
         16   172.17.1.44:36408                                                      metrics                                                                normal  in  no-security                       anonymous-user
         899  172.17.1.44:39500                                                      10a2e99d-1b8a-4329-b48c-4335e5f75c84                                   normal  in  no-security                       no-auth
   There are four connections:

   - Outbound connection to STF
   - Inbound connection from collectd
   - Inbound connection from ceilometer
   - Inbound connection from our qdstat client

   The outbound STF connection is provided to the MetricsQdrConnectors host parameter and is the route for the STF storage domain. The other hosts are internal network addresses of the client connections to this AMQ Interconnect.
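   To focus on the outbound link only, you can filter the same qdstat output; a quick sketch:

       # Show only the edge connection to STF from the local AMQ Interconnect.
       sudo podman exec -it metrics_qdr qdstat --bus=172.17.1.44:5666 --connections | grep edge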
5. To ensure that messages are being delivered, list the links, and view the _edge address in the deliv column for delivery of messages:

       $ sudo podman exec -it metrics_qdr qdstat --bus=172.17.1.44:5666 --links
       Router Links
         type      dir  conn id  id    peer  class   addr                  phs  cap  pri  undel  unsett  deliv    presett  psdrop  acc  rej  rel  mod  delay  rate
         ===========================================================================================================================================================
         endpoint  out  1        5           local   _edge                      250  0    0      0       2979926  2979924  0       0    0    2    0    0      0
         endpoint  in   1        6                                              250  0    0      0       0        0        0       0    0    0    0    0      0
         endpoint  in   1        7                                              250  0    0      0       0        0        0       0    0    0    0    0      0
         endpoint  out  1        8                                              250  0    0      0       0        0        0       0    0    0    0    0      0
         endpoint  in   1        9                                              250  0    0      0       0        0        0       0    0    0    0    0      0
         endpoint  out  1        10                                             250  0    0      0       911      911      0       0    0    0    0    911    0
         endpoint  in   1        11                                             250  0    0      0       0        911      0       0    0    0    0    0      0
         endpoint  out  12       32          local   temp.lSY6Mcicol4J2Kp       250  0    0      0       0        0        0       0    0    0    0    0      0
         endpoint  in   16       41                                             250  0    0      0       2979924  2979924  0       0    0    0    0    0      0
         endpoint  in   912      1834        mobile  $management           0    250  0    0      0       1        0        0       1    0    0    0    0      0
         endpoint  out  912      1835        local   temp.9Ok2resI9tmt+CT       250  0    0      0       0        0        0       0    0    0    0    0      0
6. To list the addresses from Red Hat OpenStack Platform nodes to STF, connect to OCP to get the AMQ Interconnect pod name and list the connections. List the available AMQ Interconnect pods:

       $ oc get pods -l application=stf-default-interconnect

       NAME                                        READY   STATUS    RESTARTS   AGE
       stf-default-interconnect-7458fd4d69-bgzfb   1/1     Running   0          6d21h
7. Connect to the pod and run the qdstat --connections command to list the known connections:

       $ oc exec -it stf-default-interconnect-7458fd4d69-bgzfb -- qdstat --connections

       2020-04-21 18:25:47.243852 UTC
       stf-default-interconnect-7458fd4d69-bgzfb

       Connections
         id  host               container                                                             role    dir  security                                authentication  tenant  last dlv      uptime
         ======================================================================================================================================================================================================
         1   10.129.2.21:43062  rcv[stf-default-collectd-telemetry-smartgateway-79c967c8f7-kq4qv]     normal  in   no-security                             anonymous-user          000:00:00:00  006:21:50:25
         2   10.130.0.52:55754  rcv[stf-default-ceilometer-notification-smartgateway-6675df547mbjk5]  normal  in   no-security                             anonymous-user          000:21:25:57  006:21:49:36
         3   10.130.0.51:43110  rcv[stf-default-collectd-notification-smartgateway-698c87fbb7-f28v6]  normal  in   no-security                             anonymous-user          000:21:36:53  006:21:49:09
         22  10.128.0.1:51948   Router.ceph-0.redhat.local                                            edge    in   TLSv1/SSLv3(DHE-RSA-AES256-GCM-SHA384)  anonymous-user          000:00:00:03  000:22:08:43
         23  10.128.0.1:51950   Router.compute-0.redhat.local                                         edge    in   TLSv1/SSLv3(DHE-RSA-AES256-GCM-SHA384)  anonymous-user          000:00:00:03  000:22:08:43
         24  10.128.0.1:52082   Router.controller-0.redhat.local                                      edge    in   TLSv1/SSLv3(DHE-RSA-AES256-GCM-SHA384)  anonymous-user          000:00:00:00  000:22:08:34
         27  127.0.0.1:42202    c2f541c1-4c97-4b37-a189-a396c08fb079                                  normal  in   no-security                             no-auth                 000:00:00:00  000:00:00:00
   In this example, there are three edge connections from the Red Hat OpenStack Platform nodes, with connection id 22, 23, and 24.
8. To view the number of messages delivered by the network, use each address with the oc exec command:

       $ oc exec -it stf-default-interconnect-7458fd4d69-bgzfb -- qdstat --address

       2020-04-21 18:20:10.293258 UTC
       stf-default-interconnect-7458fd4d69-bgzfb

       Router Addresses
         class   addr                              phs  distrib    pri  local  remote  in       out      thru  fallback
         ====================================================================================================================
         mobile  anycast/ceilometer/event.sample   0    balanced   -    1      0       1553     1553     0     0
         mobile  collectd/notify                   0    multicast  -    1      0       10       10       0     0
         mobile  collectd/telemetry                0    multicast  -    1      0       7798049  7798049  0     0
CHAPTER 4. ADVANCED FEATURES

The following optional features can provide additional functionality to the Service Telemetry Framework (STF):

- Customizing the deployment. For more information, see Section 4.1, "Customizing the deployment".

- Alerts. For more information, see Section 4.2, "Alerts".

- High availability. For more information, see Section 4.3, "High availability".

- Dashboards. For more information, see Section 4.4, "Dashboards".

- Multiple clouds. For more information, see Section 4.5, "Configuring multiple clouds".

- Ephemeral storage. For more information, see Section 4.6, "Ephemeral storage".
4.1. CUSTOMIZING THE DEPLOYMENT

The Service Telemetry Operator watches for a ServiceTelemetry manifest to load into Red Hat OpenShift Container Platform (OCP). The Operator then creates other objects in memory, which results in the dependent Operators creating the workloads they are responsible for managing.
WARNING

When you override the manifest, you must provide the entire manifest contents, including object names or namespaces. There is no dynamic parameter substitution when you override a manifest.
To override a manifest successfully with Service Telemetry Framework (STF), deploy a default environment using the core options only. For more information about the core options, see Section 2.3.10, "Creating a ServiceTelemetry object in OCP". When you deploy STF, use the oc get command to retrieve the default deployed manifest. When you use a manifest that was originally generated by the Service Telemetry Operator, the manifest is compatible with the other objects that are managed by the Operators.

For example, when the metricsEnabled: true parameter is configured in the ServiceTelemetry manifest, the Service Telemetry Operator requests components for metrics retrieval and storage using the default manifests. In some cases, you might want to override the default manifest. For more information, see Section 4.1.1, "Manifest override parameters".
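A minimal sketch of that workflow, using the Prometheus retrieval command from Table 4.1; the output file name is illustrative:

    # Save the default deployed manifest as a starting point for an override.
    oc get prometheus stf-default -oyaml > prometheus-manifest-override.yaml
    # Edit the file, then paste its contents under "prometheusManifest: |" in the
    # ServiceTelemetry object.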
4.1.1. Manifest override parameters

This table describes the available parameters that you can use to override a manifest, along with the corresponding retrieval commands.

Table 4.1. Manifest override parameters
Override parameter: alertmanagerManifest
Description: Override the Alertmanager object creation. The Prometheus Operator watches for Alertmanager objects.
Retrieval command: oc get alertmanager stf-default -oyaml

Override parameter: alertmanagerConfigManifest
Description: Override the Secret that contains the Alertmanager configuration. The Prometheus Operator uses a secret named alertmanager-<alertmanager-name>, for example stf-default, to provide the alertmanager.yaml configuration to Alertmanager.
Retrieval command: oc get secret alertmanager-stf-default -oyaml

Override parameter: elasticsearchManifest
Description: Override the ElasticSearch object creation. The Elastic Cloud on Kubernetes Operator watches for ElasticSearch objects.
Retrieval command: oc get elasticsearch elasticsearch -oyaml

Override parameter: interconnectManifest
Description: Override the Interconnect object creation. The AMQ Interconnect Operator watches for Interconnect objects.
Retrieval command: oc get interconnect stf-default-interconnect -oyaml

Override parameter: prometheusManifest
Description: Override the Prometheus object creation. The Prometheus Operator watches for Prometheus objects.
Retrieval command: oc get prometheus stf-default -oyaml

Override parameter: servicemonitorManifest
Description: Override the ServiceMonitor object creation. The Prometheus Operator watches for ServiceMonitor objects.
Retrieval command: oc get servicemonitor stf-default -oyaml

Override parameter: smartgatewayCollectdMetricsManifest
Description: Override the SmartGateway object creation for collectd metrics. The Smart Gateway Operator watches for SmartGateway objects.
Retrieval command: oc get smartgateway stf-default-collectd-telemetry -oyaml

Override parameter: smartgatewayCollectdEventsManifest
Description: Override the SmartGateway object creation for collectd events. The Smart Gateway Operator watches for SmartGateway objects.
Retrieval command: oc get smartgateway stf-default-collectd-notification -oyaml
Override parameter: smartgatewayCeilometerEventsManifest
Description: Override the SmartGateway object creation for Ceilometer events. The Smart Gateway Operator watches for SmartGateway objects.
Retrieval command: oc get smartgateway stf-default-ceilometer-notification -oyaml
4.1.2. Overriding a managed manifest

Edit the ServiceTelemetry object and provide a parameter and manifest. For a list of available manifest override parameters, see Section 4.1.1, "Manifest override parameters". The default ServiceTelemetry object is stf-default. Use oc get servicetelemetry to list the available STF deployments.
TIP

The oc edit command loads the default system editor. To override the default editor, pass or set the environment variable EDITOR to the preferred editor, for example, EDITOR=nano oc edit servicetelemetry stf-default.
Procedure
1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

       oc project service-telemetry

3. Load the ServiceTelemetry object into an editor:

       oc edit servicetelemetry stf-default
4. To modify the ServiceTelemetry object, provide a manifest override parameter and the contents of the manifest to write to OCP instead of the defaults provided by STF.
NOTE

The trailing pipe (|) after entering the manifest override parameter indicates that the value provided is multi-line.
    apiVersion: infra.watch/v1alpha1
    kind: ServiceTelemetry
    metadata:
      annotations:
        kubectl.kubernetes.io/last-applied-configuration: |
          {"apiVersion":"infra.watch/v1alpha1","kind":"ServiceTelemetry","metadata":{"annotations":{},"name":"stf-default","namespace":"service-telemetry"},"spec":{"metricsEnabled":true}}
      creationTimestamp: "2020-04-14T20:29:42Z"
      generation: 1
      name: stf-default
      namespace: service-telemetry
      resourceVersion: "1949423"
      selfLink: /apis/infra.watch/v1alpha1/namespaces/service-telemetry/servicetelemetrys/stf-default
      uid: d058bc41-1bb0-49f5-9a8b-642f4b8adb95
    spec:
      metricsEnabled: true
      smartgatewayCollectdMetricsManifest: | 1
        apiVersion: smartgateway.infra.watch/v2alpha1
        kind: SmartGateway
        metadata:
          name: stf-default-collectd-telemetry
          namespace: service-telemetry
        spec:
          amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/telemetry
          debug: true
          prefetch: 15000
          serviceType: metrics
          size: 1
          useTimestamp: true 2
    status:
      conditions:
      - ansibleResult:
          changed: 0
          completion: 2020-04-14T20:32:19.079508
          failures: 0
          ok: 52
          skipped: 1
        lastTransitionTime: "2020-04-14T20:29:59Z"
        message: Awaiting next reconciliation
        reason: Successful
        status: "True"
        type: Running
   1. Manifest override parameter is defined in the spec of the ServiceTelemetry object.

   2. End of the manifest override content.

5. Save and close.
4.2. ALERTS

You create alert rules in Prometheus and alert routes in Alertmanager. Alert rules in Prometheus servers send alerts to an Alertmanager, which manages the alerts. Alertmanager can silence, inhibit, or aggregate alerts, and send notifications using email, on-call notification systems, or chat platforms.
To create an alert, complete the following tasks:

1. Create an alert rule in Prometheus. For more information, see Section 4.2.1, "Creating an alert rule in Prometheus".
2. Create an alert route in Alertmanager. For more information, see Section 4.2.3, "Creating an alert route in Alertmanager".

Additional resources

For more information about alerts or notifications with Prometheus and Alertmanager, see https://prometheus.io/docs/alerting/overview/

To view an example set of alerts that you can use with Service Telemetry Framework (STF), see https://github.com/infrawatch/service-telemetry-operator/tree/master/deploy/alerts
4.2.1. Creating an alert rule in Prometheus

Prometheus evaluates alert rules to trigger notifications. If the rule condition returns an empty result set, the condition is false. Otherwise, the rule is true, and it triggers an alert.
Procedure
1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

       oc project service-telemetry

3. Create a PrometheusRule object that contains the alert rule. The Prometheus Operator loads the rule into Prometheus:

       oc apply -f - <<EOF
       apiVersion: monitoring.coreos.com/v1
       kind: PrometheusRule
       metadata:
         creationTimestamp: null
         labels:
           prometheus: stf-default
           role: alert-rules
         name: prometheus-alarm-rules
         namespace: service-telemetry
       spec:
         groups:
           - name: ./openstack.rules
             rules:
               - alert: Metric Listener down
                 expr: collectd_qpid_router_status < 1 # To change the rule, edit the value of the expr parameter.
       EOF

4. To verify that the rules have been loaded into Prometheus by the Operator, create a pod with access to curl:

       oc run curl --generator=run-pod/v1 --image=radial/busyboxplus:curl -i --tty

5. Run curl to access the prometheus-operated service to return the rules loaded into memory:

       [ root@curl:/ ]$ curl prometheus-operated:9090/api/v1/rules
       {"status":"success","data":{"groups":[{"name":"./openstack.rules","file":"/etc/prometheus/rules/prometheus-stf-default-rulefiles-0/service-telemetry-prometheus-alarm-rules.yaml","rules":[{"name":"Metric Listener down","query":"collectd_qpid_router_status \u003c 1","duration":0,"labels":{},"annotations":{},"alerts":[],"health":"ok","type":"alerting"}],"interval":30}]}}
6. To verify that the output shows the rules loaded into the PrometheusRule object, for example, that the output contains the defined ./openstack.rules, exit from the pod:

       [ root@curl:/ ]$ exit
7. Clean up the environment by deleting the curl pod:

       $ oc delete pod curl

       pod "curl" deleted
Additional resources

For more information on alerting, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md
4.2.2. Configuring custom alerts

You can add custom alerts to the PrometheusRule object that you created in Section 4.2.1, "Creating an alert rule in Prometheus".
Procedure
1. Use the oc edit command:

       oc edit prometheusrules prometheus-alarm-rules

2. Edit the PrometheusRules manifest.

3. Save and close.
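As an alternative to the interactive edit, the following sketch applies the same PrometheusRule object with one additional rule appended to the existing group. The second alert name and its expression are illustrative additions, not rules from this guide.

    oc apply -f - <<EOF
    apiVersion: monitoring.coreos.com/v1
    kind: PrometheusRule
    metadata:
      labels:
        prometheus: stf-default
        role: alert-rules
      name: prometheus-alarm-rules
      namespace: service-telemetry
    spec:
      groups:
        - name: ./openstack.rules
          rules:
            - alert: Metric Listener down
              expr: collectd_qpid_router_status < 1
            - alert: Collectd metrics stalled   # illustrative custom rule
              expr: rate(collectd_uptime[10m]) == 0
    EOF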
Additional resources

For more information about configuring alerting rules, see https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/

For more information about PrometheusRules objects, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md
4.2.3. Creating an alert route in Alertmanager

Use Alertmanager to deliver alerts to an external system, such as email, IRC, or other notification channel. The Prometheus Operator manages the Alertmanager configuration as a Red Hat OpenShift Container Platform (OCP) secret. STF by default deploys a basic configuration that results in no receivers:

    alertmanager.yaml: |-
      global:
        resolve_timeout: 5m
      route:
        group_by: ['job']
        group_wait: 30s
        group_interval: 5m
        repeat_interval: 12h
        receiver: 'null'
      receivers:
      - name: 'null'
To deploy a custom Alertmanager route with STF, you must pass an alertmanagerConfigManifest parameter to the Service Telemetry Operator, which results in an updated secret managed by the Prometheus Operator. For more information, see Section 4.1.2, "Overriding a managed manifest".
Procedure
1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

       oc project service-telemetry

3. Edit the ServiceTelemetry object for your STF deployment:

       oc edit servicetelemetry stf-default
4. Add a new parameter, alertmanagerConfigManifest, and the Secret object contents to define the alertmanager.yaml configuration for Alertmanager:

       apiVersion: infra.watch/v1alpha1
       kind: ServiceTelemetry
       metadata:
         name: stf-default
         namespace: service-telemetry
       spec:
         metricsEnabled: true
         alertmanagerConfigManifest: |
           apiVersion: v1
           kind: Secret
           metadata:
             name: 'alertmanager-stf-default'
             namespace: 'service-telemetry'
           type: Opaque
           stringData:
             alertmanager.yaml: |-
               global:
                 resolve_timeout: 10m
               route:
                 group_by: ['job']
                 group_wait: 30s
                 group_interval: 5m
                 repeat_interval: 12h
                 receiver: 'null'
               receivers:
               - name: 'null'
NOTE

This loads the default template that is already managed by the Service Telemetry Operator. To validate that the changes are populating correctly, change a value, return the alertmanager-stf-default secret, and verify that the new value is loaded into memory, for example, changing the value global.resolve_timeout from 5m to 10m.
5. Verify that the configuration was applied to the secret:

       $ oc get secret alertmanager-stf-default -o go-template='{{index .data "alertmanager.yaml" | base64decode }}'

       global:
         resolve_timeout: 10m
       route:
         group_by: ['job']
         group_wait: 30s
         group_interval: 5m
         repeat_interval: 12h
         receiver: 'null'
       receivers:
       - name: 'null'
6. To verify that the configuration has been loaded into Alertmanager, create a pod with access to curl:

       oc run curl --generator=run-pod/v1 --image=radial/busyboxplus:curl -i --tty
7. Run curl against the alertmanager-operated service to retrieve the status and configYAML contents, and review that the supplied configuration matches the configuration loaded into Alertmanager:

       [ root@curl:/ ]$ curl alertmanager-operated:9093/api/v1/status

       {"status":"success","data":{"configYAML":"global:\n  resolve_timeout: 10m\n  http_config: {}\n  smtp_hello: localhost\n  smtp_require_tls: true\n  pagerduty_url: https://events.pagerduty.com/v2/enqueue\n  hipchat_api_url: https://api.hipchat.com/\n  opsgenie_api_url: https://api.opsgenie.com/\n  wechat_api_url: https://qyapi.weixin.qq.com/cgi-bin/\n  victorops_api_url: https://alert.victorops.com/integrations/generic/20131114/alert/\nroute:\n  receiver: 'null'\n  group_by:\n  - job\n  group_wait: 30s\n  group_interval: 5m\n  repeat_interval: 12h\nreceivers:\n- name: 'null'\ntemplates: []\n"}}
8. Verify that the configYAML field contains the expected changes. Exit from the pod:

       [ root@curl:/ ]$ exit
9. To clean up the environment, delete the curl pod:

       $ oc delete pod curl

       pod "curl" deleted
Additional resources

For more information about the Red Hat OpenShift Container Platform secret and the Prometheus Operator, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md
4.3. HIGH AVAILABILITY

High availability is the ability of Service Telemetry Framework (STF) to rapidly recover from failures in its component services. Although Red Hat OpenShift Container Platform (OCP) restarts a failed pod if nodes are available to schedule the workload, this recovery process might take more than one minute, during which time events and metrics are lost. A high availability configuration includes multiple copies of STF components, which reduces recovery time to approximately 2 seconds. To protect against failure of an OCP node, deploy STF to an OCP cluster with three or more nodes.
NOTE

STF is not yet a fully fault tolerant system. Delivery of metrics and events during the recovery period is not guaranteed.
Enabling high availability has the following effects:

- Two AMQ Interconnect pods run instead of the default one.

- Three ElasticSearch pods run instead of the default one.

- Recovery time from a lost pod in either of these services reduces to approximately 2 seconds.
4.3.1. Configuring high availability

To configure STF for high availability, add highAvailabilityEnabled: true to the ServiceTelemetry object in OCP. You can set this parameter at installation time or, if you already deployed STF, complete the following steps:
Procedure
1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

       oc project service-telemetry

3. Use the oc command to edit the ServiceTelemetry object:

       $ oc edit ServiceTelemetry

4. Add highAvailabilityEnabled: true to the spec section:

       spec:
         eventsEnabled: true
         metricsEnabled: true
         highAvailabilityEnabled: true

5. Save your changes and close the object.
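As an alternative to the interactive edit above, you can apply the same change non-interactively; a minimal sketch, assuming the default deployment name stf-default:

    # Enable high availability with a merge patch instead of "oc edit".
    oc -n service-telemetry patch servicetelemetry stf-default --type=merge \
      -p '{"spec": {"highAvailabilityEnabled": true}}'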
4.4. DASHBOARDS

Use the third-party application Grafana to visualize system-level metrics gathered by collectd for each individual host node. For more information about configuring collectd, see Section 3.3, "Configuring Red Hat OpenStack Platform overcloud for Service Telemetry Framework".
4.4.1. Setting up Grafana to host the dashboard

Grafana is not included in the default Service Telemetry Framework (STF) deployment, so you must deploy the Grafana Operator from OperatorHub.io.
Procedure
1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

       oc project service-telemetry

3. Clone the dashboard repository:

       git clone https://github.com/infrawatch/dashboards
       cd dashboards

4. Deploy the Grafana operator:

       oc create -f deploy/subscription.yaml

5. To verify that the operator launched successfully, run the oc get csv command. If the value of the PHASE column is Succeeded, the operator launched successfully:

       $ oc get csv

       NAME                      DISPLAY            VERSION   REPLACES   PHASE
       grafana-operator.v3.2.0   Grafana Operator   3.2.0                Succeeded

6. Launch a Grafana instance:

       $ oc create -f deploy/grafana.yaml

7. Verify that the Grafana instance deployed:

       $ oc get pod -l app=grafana

       NAME                                  READY   STATUS    RESTARTS   AGE
       grafana-deployment-7fc7848b56-sbkhv   1/1     Running   0          1m

8. Create the datasource and dashboard resources:

       oc create -f deploy/datasource.yaml -f deploy/rhos-dashboard.yaml
9. Verify that the resources installed correctly:

       $ oc get grafanadashboards

       NAME             AGE
       rhos-dashboard   7d21h

       $ oc get grafanadatasources

       NAME                                  AGE
       service-telemetry-grafanadatasource   1m

10. Navigate to https://<grafana-route-address> in a web browser. Use the oc get routes command to retrieve the Grafana route address:

       oc get routes

11. To view the dashboard, click Dashboards and Manage.
Additional resources

For more information about enabling the OperatorHub.io catalog source, see Section 2.3.5, "Enabling the OperatorHub.io Community Catalog Source".
4.4.1.1. Viewing and editing queries
Procedure
1. Log in to Red Hat OpenShift Container Platform. To view and edit queries, log in as the admin user.

2. Change to the service-telemetry namespace:

       oc project service-telemetry

3. To retrieve the default username and password, describe the Grafana object using oc describe:

       oc describe grafana service-telemetry-grafana
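If you prefer a one-line query, the same credentials can usually be read from the Grafana object directly. A sketch; the field paths follow the Grafana Operator custom resource layout and are an assumption, not paths documented in this guide:

    # Assumed CR layout: spec.config.security holds the admin credentials.
    oc get grafana service-telemetry-grafana \
      -o jsonpath='{.spec.config.security.admin_user}{"/"}{.spec.config.security.admin_password}{"\n"}'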
4.4.2. The Grafana infrastructure dashboard

The infrastructure dashboard shows metrics for a single node at a time. Select a node from the upper left corner of the dashboard.

4.4.2.1. Top panels
Title                  Unit         Description
---------------------  -----------  -----------------------------------------------------------
Current Global Alerts  -            Current alerts fired by Prometheus
Recent Global Alerts   -            Recently fired alerts in 5m time steps
Status Panel           -            Node status: up, down, unavailable
Uptime                 s/m/h/d/M/Y  Total operational time of node
CPU Cores              cores        Total number of cores
Memory                 bytes        Total memory
Disk Size              bytes        Total storage size
Processes              processes    Total number of processes listed by type
Load Average           processes    Load average represents the average number of running and uninterruptible processes residing in the kernel execution queue
4.4.2.2. Networking panels

Panels that display the network interfaces of the node.

Panel                                    Unit      Description
---------------------------------------  --------  --------------------------------
Physical Interfaces Ingress Errors       errors    Total errors with incoming data
Physical Interfaces Egress Errors        errors    Total errors with outgoing data
Physical Interfaces Ingress Error Rates  errors/s  Rate of incoming data errors
Physical Interfaces Egress Error Rates   errors/s  Rate of outgoing data errors
Physical Interfaces Packets Ingress      pps       Incoming packets per second
Physical Interfaces Packets Egress       pps       Outgoing packets per second
Physical Interfaces Data Ingress         bytes/s   Incoming data rates
Physical Interfaces Data Egress          bytes/s   Outgoing data rates
Physical Interfaces Drop Rate Ingress    pps       Incoming packets drop rate
Physical Interfaces Drop Rate Egress     pps       Outgoing packets drop rate
4.4.2.3. CPU panels

Panels that display CPU usage of the node.

Panel                    Unit     Description
-----------------------  -------  ------------------------------------------------------------
Current CPU Usage        percent  Instantaneous usage at the time of the last query
Aggregate CPU Usage      percent  Average non-idle CPU activity of all cores on a node
Aggr. CPU Usage by Type  percent  Shows time spent for each type of thread, averaged across all cores
4.4.2.4. Memory panels

Panels that display memory usage on the node.

Panel            Unit       Description
---------------  ---------  -------------------------------------------------
Memory Used      percent    Amount of memory being used at time of last query
Huge Pages Used  hugepages  Number of hugepages being used
4.4.2.5. Disk/file system

Panels that display space used on disk.

Panel                       Unit     Description
--------------------------  -------  --------------------------------------------
Disk Space Usage            percent  Total disk use at time of last query
Inode Usage                 percent  Total inode use at time of last query
Aggregate Disk Space Usage  bytes    Total disk space used and reserved
    Note: Because this query relies on the df plugin, temporary file systems that do not necessarily use disk space are included in the results. The query tries to filter out most of these, but it might not be exhaustive.
Disk Traffic                bytes/s  Shows rates for both reading and writing
Disk Load                   percent  Approximate percentage of total disk bandwidth being used. The weighted I/O time series includes the backlog that might be accumulating.
    Note: For more information, see the collectd disk plugin docs.
Operations/s                ops/s    Operations done per second
Average I/O Operation Time  seconds  Average time each I/O operation took to complete. This average is not accurate, see the collectd disk plugin docs.
4.5. CONFIGURING MULTIPLE CLOUDS

You can configure multiple Red Hat OpenStack Platform clouds to target a single instance of Service Telemetry Framework (STF):

1. Plan the AMQP address prefixes that you want to use for each cloud. For more information, see Section 4.5.1, "Planning AMQP address prefixes".

2. Deploy metrics and events consumer Smart Gateways for each cloud to listen on the corresponding address prefixes. For more information, see Section 4.5.2, "Deploying Smart Gateways".

3. Configure each cloud to send its metrics and events to STF on the correct address. For more information, see Section 4.5.3, "Creating the OpenStack environment file".
CHAPTER 4 ADVANCED FEATURES
39
Figure 4.1. Two Red Hat OpenStack Platform clouds connect to STF
4.5.1. Planning AMQP address prefixes

By default, Red Hat OpenStack Platform nodes get data through two data collectors: collectd and Ceilometer. These components send telemetry data, or notifications, to the respective AMQP addresses, for example, collectd/telemetry, where STF Smart Gateways listen on those addresses for monitoring data.

To support multiple clouds, and to identify which cloud generated the monitoring data, configure each cloud to send data to a unique address. Prefix a cloud identifier to the second part of the address. The following list shows some example addresses and identifiers (see the sketch after this list for how the scheme expands):
- collectd/cloud1-telemetry

- collectd/cloud1-notify

- anycast/ceilometer/cloud1-event.sample

- collectd/cloud2-telemetry

- collectd/cloud2-notify
- anycast/ceilometer/cloud2-event.sample

- collectd/us-east-1-telemetry

- collectd/us-west-3-telemetry
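A quick sanity check of the scheme (plain shell, illustrative): print the three addresses that a cloud with a given identifier publishes to.

    CLOUD_ID=cloud1
    printf '%s\n' \
      "collectd/${CLOUD_ID}-telemetry" \
      "collectd/${CLOUD_ID}-notify" \
      "anycast/ceilometer/${CLOUD_ID}-event.sample"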
4.5.2. Deploying Smart Gateways

You must deploy a Smart Gateway for each of the data collection types for each cloud: one for collectd metrics, one for collectd events, and one for Ceilometer events. Configure each of the Smart Gateways to listen on the AMQP address that you define for the corresponding cloud.

When you deploy STF for the first time, Smart Gateway manifests are created that define the initial Smart Gateways for a single cloud. When deploying Smart Gateways for multiple cloud support, you deploy multiple Smart Gateways for each of the data collection types that handle the metrics and the events data for each cloud. The initial Smart Gateways act as a template to create additional Smart Gateways, along with any authentication information required to connect to the data stores.
Procedure
1. Log in to Red Hat OpenShift Container Platform.
2. Change to the service-telemetry namespace:
oc project service-telemetry
3. Use the initially deployed Smart Gateways as a template for additional Smart Gateways. List the currently deployed Smart Gateways with the oc get smartgateways command. For example, if you deployed STF with metricsEnabled: true and eventsEnabled: true, the following Smart Gateways are displayed in the output:
$ oc get smartgateways
NAME                                  AGE
stf-default-ceilometer-notification   14d
stf-default-collectd-notification     14d
stf-default-collectd-telemetry        14d
4. Retrieve the manifests for each Smart Gateway and store the contents in a temporary file, which you can modify later and use to create the new set of Smart Gateways:

       truncate --size 0 /tmp/cloud1-smartgateways.yaml && \
       for sg in $(oc get smartgateways -oname)
       do
         echo "---" >> /tmp/cloud1-smartgateways.yaml
         oc get ${sg} -oyaml --export >> /tmp/cloud1-smartgateways.yaml
       done
5. Modify the Smart Gateway manifests in the /tmp/cloud1-smartgateways.yaml file. Adjust the metadata.name and spec.amqpUrl fields to include the cloud identifier from your schema. For more information, see Section 4.5.1, "Planning AMQP address prefixes". To view example Smart Gateway manifests, see Section 4.5.2.1, "Example manifests".
6. Deploy your new Smart Gateways:
       oc apply -f /tmp/cloud1-smartgateways.yaml
7. Verify that each Smart Gateway is running. This can take several minutes, depending on the number of Smart Gateways:
oc get po -l app=smart-gateway
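   Instead of polling oc get po by hand, you can block until the pods report Ready; a minimal sketch with an arbitrary timeout:

       # Wait for all Smart Gateway pods matching the same label selector.
       oc wait pod -l app=smart-gateway --for=condition=Ready --timeout=300s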
4.5.2.1. Example manifests

IMPORTANT

The content in the following examples might be different to the file content in your deployment. Copy the manifests in your deployment.

Ensure that the name and amqpUrl parameters of each Smart Gateway match the names that you want to use for your clouds. For more information, see Section 4.5.1, "Planning AMQP address prefixes".

NOTE

Your output might have some additional metadata parameters that you can remove from the manifests that you load into OCP.
    apiVersion: smartgateway.infra.watch/v2alpha1
    kind: SmartGateway
    metadata:
      name: stf-default-ceilometer-notification-cloud1 1
    spec:
      amqpDataSource: ceilometer
      amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/anycast/ceilometer/cloud1-event.sample 2
      debug: false
      elasticPass: fkzfhghw
      elasticUrl: https://elasticsearch-es-http.service-telemetry.svc.cluster.local:9200
      elasticUser: elastic
      resetIndex: false
      serviceType: events
      size: 1
      tlsCaCert: /config/certs/ca.crt
      tlsClientCert: /config/certs/tls.crt
      tlsClientKey: /config/certs/tls.key
      tlsServerName: elasticsearch-es-http.service-telemetry.svc.cluster.local
      useBasicAuth: true
      useTls: true
    ---
    apiVersion: smartgateway.infra.watch/v2alpha1
    kind: SmartGateway
    metadata:
      name: stf-default-collectd-notification-cloud1 3
    spec:
      amqpDataSource: collectd
      amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/cloud1-notify 4
      debug: false
      elasticPass: fkzfhghw
      elasticUrl: https://elasticsearch-es-http.service-telemetry.svc.cluster.local:9200
      elasticUser: elastic
      resetIndex: false
      serviceType: events
      size: 1
      tlsCaCert: /config/certs/ca.crt
      tlsClientCert: /config/certs/tls.crt
      tlsClientKey: /config/certs/tls.key
      tlsServerName: elasticsearch-es-http.service-telemetry.svc.cluster.local
      useBasicAuth: true
      useTls: true
    ---
    apiVersion: smartgateway.infra.watch/v2alpha1
    kind: SmartGateway
    metadata:
      name: stf-default-collectd-telemetry-cloud1 5
    spec:
      amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/cloud1-telemetry 6
      debug: false
      prefetch: 15000
      serviceType: metrics
      size: 1
      useTimestamp: true
1. Name for Ceilometer notifications for cloud1

2. AMQP address for Ceilometer notifications for cloud1

3. Name for collectd notifications for cloud1

4. AMQP address for collectd notifications for cloud1

5. Name for collectd telemetry for cloud1

6. AMQP address for collectd telemetry for cloud1
4.5.3. Creating the OpenStack environment file

To label traffic according to the cloud of origin, you must create a configuration with cloud-specific instance names. Create an stf-connectors.yaml file and adjust the values of CeilometerQdrEventsConfig and CollectdAmqpInstances to match the AMQP address prefix scheme. For more information, see Section 4.5.1, "Planning AMQP address prefixes".
WARNING

Remove the enable-stf.yaml and ceilometer-write-qdr.yaml environment file references from your overcloud deployment. This configuration is redundant and results in duplicate information being sent from each cloud node.
Procedure
1. Create the stf-connectors.yaml file and modify it to match the AMQP address that you want for this cloud deployment:

       resource_registry:
         OS::TripleO::Services::Collectd: /usr/share/openstack-tripleo-heat-templates/deployment/metrics/collectd-container-puppet.yaml
         OS::TripleO::Services::MetricsQdr: /usr/share/openstack-tripleo-heat-templates/deployment/metrics/qdr-container-puppet.yaml
         OS::TripleO::Services::CeilometerAgentCentral: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-central-container-puppet.yaml
         OS::TripleO::Services::CeilometerAgentNotification: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-notification-container-puppet.yaml
         OS::TripleO::Services::CeilometerAgentIpmi: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-ipmi-container-puppet.yaml
         OS::TripleO::Services::ComputeCeilometerAgent: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-compute-container-puppet.yaml
         OS::TripleO::Services::Redis: /usr/share/openstack-tripleo-heat-templates/deployment/database/redis-pacemaker-puppet.yaml

       parameter_defaults:
         EnableSTF: true

         EventPipelinePublishers: []
         CeilometerEnablePanko: false
         CeilometerQdrPublishEvents: true
         CeilometerQdrEventsConfig:
           driver: amqp
           topic: cloud1-event 1

         CollectdConnectionType: amqp1
         CollectdAmqpInterval: 5
         CollectdDefaultPollingInterval: 5

         CollectdAmqpInstances:
           cloud1-notify: 2
             notify: true
             format: JSON
             presettle: false
           cloud1-telemetry: 3
             format: JSON
             presettle: true

         MetricsQdrAddresses:
           - prefix: collectd
             distribution: multicast
           - prefix: anycast/ceilometer
             distribution: multicast

         MetricsQdrSSLProfiles:
           - name: sslProfile

         MetricsQdrConnectors:
           - host: stf-default-interconnect-5671-service-telemetry.apps.infra.watch 4
             port: 443
             role: edge
             verifyHostname: false
             sslProfile: sslProfile

   1. Define the topic for Ceilometer events. This value is the address format of anycast/ceilometer/cloud1-event.sample.

   2. Define the topic for collectd events. This value is the format of collectd/cloud1-notify.

   3. Define the topic for collectd metrics. This value is the format of collectd/cloud1-telemetry.

   4. Adjust the MetricsQdrConnectors host to the address of the STF route.
2. Ensure that the naming convention in the stf-connectors.yaml file aligns with the spec.amqpUrl field in the Smart Gateway configuration. For example, configure the CeilometerQdrEventsConfig.topic field to a value of cloud1-event.

3. Save the file in a directory for custom environment files, for example /home/stack/custom_templates/.

4. Source the authentication file:

       [stack@undercloud-0 ~]$ source stackrc

       (undercloud) [stack@undercloud-0 ~]$

5. Include the stf-connectors.yaml file in the overcloud deployment command, along with any other environment files relevant to your environment:

       (undercloud) [stack@undercloud-0 ~]$ openstack overcloud deploy --templates /usr/share/openstack-tripleo-heat-templates \
         -e /home/stack/custom_templates/stf-connectors.yaml
Additional resources

For information about validating the deployment, see Section 3.3.3, "Validating client-side installation".
4.5.4. Querying metrics data from multiple clouds

Data stored in Prometheus has a service label attached according to the Smart Gateway it was scraped from. You can use this label to query data from a specific cloud.

To query data from a specific cloud, use a Prometheus PromQL query that matches the associated service label, for example: collectd_uptime{service="stf-default-collectd-telemetry-cloud1-smartgateway"}.
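A sketch of the same query issued over the Prometheus HTTP API from a pod inside the cluster, for example the curl pod used elsewhere in this guide; the in-cluster service name prometheus-operated matches the one used in Section 4.2.1:

    curl -G 'http://prometheus-operated:9090/api/v1/query' \
      --data-urlencode 'query=collectd_uptime{service="stf-default-collectd-telemetry-cloud1-smartgateway"}'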
4.6. EPHEMERAL STORAGE

Use ephemeral storage to run Service Telemetry Framework (STF) without persistently storing data in your Red Hat OpenShift Container Platform (OCP) cluster. Ephemeral storage is not recommended in a production environment due to the volatility of the data in the platform when operating correctly and as designed. For example, restarting a pod or rescheduling the workload to another node results in the loss of any local data written since the pod started.

If you enable ephemeral storage in STF, the Service Telemetry Operator does not add the relevant storage sections to the data storage components' manifests.
4.6.1. Configuring ephemeral storage

To configure STF for ephemeral storage, add storageEphemeralEnabled: true to the ServiceTelemetry object in OCP. You can add storageEphemeralEnabled: true at installation time or, if you already deployed STF, complete the following steps:
Procedure
1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

       oc project service-telemetry

3. Edit the ServiceTelemetry object:

       $ oc edit ServiceTelemetry stf-default

4. Add the storageEphemeralEnabled: true parameter to the spec section:

       spec:
         eventsEnabled: true
         metricsEnabled: true
         storageEphemeralEnabled: true

5. Save your changes and close the object.
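A quick verification sketch: with ephemeral storage enabled, the data storage components should not create PersistentVolumeClaims in the namespace.

    oc get pvc -n service-telemetry   # expect no Prometheus or ElasticSearch claims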
APPENDIX A. COLLECTD PLUG-INS

This section contains a complete list of collectd plug-ins and configurations for Red Hat OpenStack Platform 16.0.
collectd-aggregation

collectd::plugin::aggregation::aggregators
collectd::plugin::aggregation::interval

collectd-amqp1

collectd-apache

collectd::plugin::apache::instances (ex: localhost ⇒ url ⇒ http://localhost/mod_status?auto)
collectd::plugin::apache::interval

collectd-apcups

collectd-battery

collectd::plugin::battery::values_percentage
collectd::plugin::battery::report_degraded
collectd::plugin::battery::query_state_fs
collectd::plugin::battery::interval

collectd-ceph

collectd::plugin::ceph::daemons
collectd::plugin::ceph::longrunavglatency
collectd::plugin::ceph::convertspecialmetrictypes

collectd-cgroups

collectd::plugin::cgroups::ignore_selected
collectd::plugin::cgroups::interval

collectd-conntrack

None

collectd-contextswitch

collectd::plugin::contextswitch::interval

collectd-cpu
collectd::plugin::cpu::reportbystate
collectd::plugin::cpu::reportbycpu
collectd::plugin::cpu::valuespercentage
collectd::plugin::cpu::reportnumcpu
collectd::plugin::cpu::reportgueststate
collectd::plugin::cpu::subtractgueststate
collectd::plugin::cpu::interval

collectd-cpufreq

None

collectd-cpusleep

collectd-csv

collectd::plugin::csv::datadir
collectd::plugin::csv::storerates
collectd::plugin::csv::interval

collectd-df

collectd::plugin::df::devices
collectd::plugin::df::fstypes
collectd::plugin::df::ignoreselected
collectd::plugin::df::mountpoints
collectd::plugin::df::reportbydevice
collectd::plugin::df::reportinodes
collectd::plugin::df::reportreserved
collectd::plugin::df::valuesabsolute
collectd::plugin::df::valuespercentage
collectd::plugin::df::interval

collectd-disk

collectd::plugin::disk::disks
collectd::plugin::disk::ignoreselected
collectd::plugin::disk::udevnameattr
collectd::plugin::disk::interval

collectd-entropy

collectd::plugin::entropy::interval

collectd-ethstat

collectd::plugin::ethstat::interfaces
collectd::plugin::ethstat::maps
collectd::plugin::ethstat::mappedonly
collectd::plugin::ethstat::interval

collectd-exec

collectd::plugin::exec::commands
collectd::plugin::exec::commands_defaults
collectd::plugin::exec::globals
collectd::plugin::exec::interval

collectd-fhcount

collectd::plugin::fhcount::valuesabsolute
collectd::plugin::fhcount::valuespercentage
collectd::plugin::fhcount::interval

collectd-filecount

collectd::plugin::filecount::directories
collectd::plugin::filecount::interval

collectd-fscache

None

collectd-hddtemp

collectd::plugin::hddtemp::host
collectd::plugin::hddtemp::port
collectd::plugin::hddtemp::interval

collectd-hugepages

collectd::plugin::hugepages::report_per_node_hp
collectd::plugin::hugepages::report_root_hp
collectd::plugin::hugepages::values_pages
collectd::plugin::hugepages::values_bytes
collectd::plugin::hugepages::values_percentage
collectd::plugin::hugepages::interval

collectd-intel_rdt

collectd-interface

collectd::plugin::interface::interfaces
collectd::plugin::interface::ignoreselected
collectd::plugin::interface::reportinactive
collectd::plugin::interface::interval

collectd-ipc

None

collectd-ipmi

collectd::plugin::ipmi::ignore_selected
collectd::plugin::ipmi::notify_sensor_add
collectd::plugin::ipmi::notify_sensor_remove
collectd::plugin::ipmi::notify_sensor_not_present
collectd::plugin::ipmi::sensors
collectd::plugin::ipmi::interval

collectd-irq

collectd::plugin::irq::irqs
collectd::plugin::irq::ignoreselected
collectd::plugin::irq::interval

collectd-load

collectd::plugin::load::report_relative
collectd::plugin::load::interval

collectd-logfile

collectd::plugin::logfile::log_level
collectd::plugin::logfile::log_file
collectd::plugin::logfile::log_timestamp
collectd::plugin::logfile::print_severity
collectd::plugin::logfile::interval

collectd-madwifi

collectd-mbmon

collectd-md

collectd-memcached

collectd::plugin::memcached::instances
collectd::plugin::memcached::interval

collectd-memory

collectd::plugin::memory::valuesabsolute
collectd::plugin::memory::valuespercentage
collectd::plugin::memory::interval

collectd-multimeter

collectd-mysql

collectd::plugin::mysql::interval

collectd-netlink

collectd::plugin::netlink::interfaces
collectd::plugin::netlink::verboseinterfaces
collectd::plugin::netlink::qdiscs
collectd::plugin::netlink::classes
collectd::plugin::netlink::filters
collectd::plugin::netlink::ignoreselected
collectd::plugin::netlink::interval

collectd-network

collectd::plugin::network::timetolive
collectd::plugin::network::maxpacketsize
collectd::plugin::network::forward
collectd::plugin::network::reportstats
collectd::plugin::network::listeners
collectd::plugin::network::servers
collectd::plugin::network::interval

collectd-nfs

collectd::plugin::nfs::interval

collectd-ntpd

collectd::plugin::ntpd::host
collectd::plugin::ntpd::port
collectd::plugin::ntpd::reverselookups
collectd::plugin::ntpd::includeunitid
collectd::plugin::ntpd::interval

collectd-numa

None

collectd-olsrd

collectd-openvpn

collectd::plugin::openvpn::statusfile
collectd::plugin::openvpn::improvednamingschema
collectd::plugin::openvpn::collectcompression
collectd::plugin::openvpn::collectindividualusers
collectd::plugin::openvpn::collectusercount
collectd::plugin::openvpn::interval

collectd-ovs_events

collectd::plugin::ovs_events::address
collectd::plugin::ovs_events::dispatch
collectd::plugin::ovs_events::interfaces
collectd::plugin::ovs_events::send_notification
collectd::plugin::ovs_events::port
collectd::plugin::ovs_events::socket

collectd-ovs_stats

collectd::plugin::ovs_stats::address
collectd::plugin::ovs_stats::bridges
collectd::plugin::ovs_stats::port
collectd::plugin::ovs_stats::socket

collectd-ping

collectd::plugin::ping::hosts
collectd::plugin::ping::timeout
collectd::plugin::ping::ttl
collectd::plugin::ping::source_address
collectd::plugin::ping::device
collectd::plugin::ping::max_missed
collectd::plugin::ping::size
collectd::plugin::ping::interval

collectd-powerdns

collectd::plugin::powerdns::interval
collectd::plugin::powerdns::servers
collectd::plugin::powerdns::recursors
collectd::plugin::powerdns::local_socket

collectd-processes

collectd::plugin::processes::processes
collectd::plugin::processes::process_matches
collectd::plugin::processes::collect_context_switch
collectd::plugin::processes::collect_file_descriptor
collectd::plugin::processes::collect_memory_maps
collectd::plugin::processes::interval
collectd-protocols

collectd::plugin::protocols::ignoreselected
collectd::plugin::protocols::values

collectd-python

collectd-serial

collectd-smart

collectd::plugin::smart::disks
collectd::plugin::smart::ignoreselected
collectd::plugin::smart::interval

collectd-snmp_agent

collectd-statsd

collectd::plugin::statsd::host
collectd::plugin::statsd::port
collectd::plugin::statsd::deletecounters
collectd::plugin::statsd::deletetimers
collectd::plugin::statsd::deletegauges
collectd::plugin::statsd::deletesets
collectd::plugin::statsd::countersum
collectd::plugin::statsd::timerpercentile
collectd::plugin::statsd::timerlower
collectd::plugin::statsd::timerupper
collectd::plugin::statsd::timersum
collectd::plugin::statsd::timercount
collectd::plugin::statsd::interval

collectd-swap

collectd::plugin::swap::reportbydevice
collectd::plugin::swap::reportbytes
collectd::plugin::swap::valuesabsolute
collectd::plugin::swap::valuespercentage
collectd::plugin::swap::reportio
collectd::plugin::swap::interval

collectd-syslog

collectd::plugin::syslog::log_level
collectd::plugin::syslog::notify_level
collectd::plugin::syslog::interval

collectd-table

collectd::plugin::table::tables
collectd::plugin::table::interval

collectd-tail

collectd::plugin::tail::files
collectd::plugin::tail::interval

collectd-tail_csv

collectd::plugin::tail_csv::metrics
collectd::plugin::tail_csv::files

collectd-tcpconns

collectd::plugin::tcpconns::localports
collectd::plugin::tcpconns::remoteports
collectd::plugin::tcpconns::listening
collectd::plugin::tcpconns::allportssummary
collectd::plugin::tcpconns::interval

collectd-ted

collectd-thermal

collectd::plugin::thermal::devices
collectd::plugin::thermal::ignoreselected
collectd::plugin::thermal::interval

collectd-threshold
collectd::plugin::threshold::types
collectd::plugin::threshold::plugins
collectd::plugin::threshold::hosts
collectd::plugin::threshold::interval

collectd-turbostat

collectd::plugin::turbostat::core_c_states
collectd::plugin::turbostat::package_c_states
collectd::plugin::turbostat::system_management_interrupt
collectd::plugin::turbostat::digital_temperature_sensor
collectd::plugin::turbostat::tcc_activation_temp
collectd::plugin::turbostat::running_average_power_limit
collectd::plugin::turbostat::logical_core_names

collectd-unixsock

collectd-uptime

collectd::plugin::uptime::interval

collectd-users

collectd::plugin::users::interval

collectd-uuid

collectd::plugin::uuid::uuid_file
collectd::plugin::uuid::interval

collectd-virt

collectd::plugin::virt::connection
collectd::plugin::virt::refresh_interval
collectd::plugin::virt::domain
collectd::plugin::virt::block_device
collectd::plugin::virt::interface_device
collectd::plugin::virt::ignore_selected
collectd::plugin::virt::hostname_format
collectd::plugin::virt::interface_format
collectd::plugin::virt::extra_stats
collectd::plugin::virt::interval

collectd-vmem

collectd::plugin::vmem::verbose
collectd::plugin::vmem::interval

collectd-vserver

collectd-wireless

collectd-write_graphite

collectd::plugin::write_graphite::carbons
collectd::plugin::write_graphite::carbon_defaults
collectd::plugin::write_graphite::globals

collectd-write_kafka

collectd::plugin::write_kafka::kafka_host
collectd::plugin::write_kafka::kafka_port
collectd::plugin::write_kafka::kafka_hosts
collectd::plugin::write_kafka::topics

collectd-write_log

collectd::plugin::write_log::format

collectd-zfs_arc

None
CHAPTER 1 INTRODUCTION TOSERVICE TELEMETRY FRAMEWORK
Service Telemetry Framework (STF) provides automated collection of measurements and data fromremote clients - Red Hat OpenStack Platform or third-party nodes - and transmission of thatinformation to a centralized receiving Red Hat OpenShift Container Platform (OCP) deployment forstorage retrieval and monitoring The data can be either of two types
Metric
a numeric measurement of an application or system
Event
irregular and discrete occurrences that happen in a system
The collection components that are required on the clients are lightweight The multicast message busthat is shared by all clients and the deployment provides fast and reliable data transport Other modularcomponents for receiving and storing data are deployed in containers on OCP
STF provides access to monitoring functions such as alert generation visualization through dashboardsand single source of truth telemetry analysis to support orchestration
11 SERVICE TELEMETRY FRAMEWORK ARCHITECTURE
Service Telemetry Framework (STF) uses the components described in Table 11 ldquoSTF componentsrdquo
Table 11 STF components
Client Component Server(OCP)
yes An AMQP 1x compatible messaging bus to shuttle the metrics to STF forstorage in Prometheus
yes
no Smart Gateway to pick metrics and events from the AMQP 1x bus and todeliver events to ElasticSearch or to provide metrics to Prometheus
yes
no Prometheus as time-series data storage yes
no ElasticSearch as events data storage yes
yes collectd to collect infrastructure metrics and events no
yes Ceilometer to collect Red Hat OpenStack Platform metrics and events no
Figure 11 Service Telemetry Framework architecture overview
Red Hat OpenStack Platform 160 Service Telemetry Framework
4
Figure 11 Service Telemetry Framework architecture overview
NOTE
The Service Telemetry Framework data collection components collectd and Ceilometerand the transport components AMQ Interconnect and Smart Gateway are fullysupported The data storage components Prometheus and ElasticSearch including theOperator artifacts and visualization component Grafana are community-supported andare not officially supported
For metrics on the client side collectd collects high-resolution metrics collectd delivers the data toPrometheus by using the AMQP1 plugin which places the data onto the message bus On the serverside a Golang application called the Smart Gateway takes the data stream from the bus and exposes itas a local scrape endpoint for Prometheus
If you plan to collect and store events collectd or Ceilometer delivers event data to the server side byusing the AMQP1 plugin which places the data onto the message bus Another Smart Gateway writesthe data to the ElasticSearch datastore
Server-side STF monitoring infrastructure consists of the following layers
Service Telemetry Framework 1x (STF)
Red Hat OpenShift Container Platform (OCP)
CHAPTER 1 INTRODUCTION TO SERVICE TELEMETRY FRAMEWORK
5
Infrastructure platform
Figure 12 Server-side STF monitoring infrastructure
For more information about how to deploy Red Hat OpenShift Container Platform see the OCPproduct documentation You can install OCP on cloud platforms or on bare metal For more informationabout STF performance and scaling see httpsaccessredhatcomarticles4907241
NOTE
Do not install OCP on the same infrastructure that you want to monitor
12 INSTALLATION SIZE
The size of your Red Hat OpenShift Container Platform installation depends on the following factors
The number of nodes you want to monitor
The number of metrics you want to collect
The resolution of metrics
The length of time that you want to store the data
Installation of Service Telemetry Framework (STF) depends on the existingRed Hat OpenShift Container Platform environment Ensure that you install monitoring for
Red Hat OpenStack Platform 160 Service Telemetry Framework
6
Red Hat OpenStack Platform on a platform separate from your Red Hat OpenStack Platformenvironment You can install Red Hat OpenShift Container Platform (OCP) on baremetal or othersupported cloud platforms For more information about installing OCP see OpenShift ContainerPlatform 43 Documentation
The size of your OCP environment depends on the infrastructure you select For more informationabout minimum resources requirements when installing OCP on baremetal see Minimum resourcerequirements in the Installing a cluster on bare metal guide For installation requirements of the variouspublic and private cloud platforms which you can install see the corresponding installationdocumentation for your cloud platform of choice
CHAPTER 1 INTRODUCTION TO SERVICE TELEMETRY FRAMEWORK
7
CHAPTER 2. INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK

Before you install Service Telemetry Framework (STF), ensure that Red Hat OpenShift Container Platform (OCP) version 4.x is running and that you understand the core components of the framework. As part of the OCP installation planning process, ensure that the administrator provides persistent storage and enough resources to run the STF components on top of the OCP environment.
WARNING
Red Hat OpenShift Container Platform version 4.3 or later is currently required for a successful installation of STF.
2.1. THE CORE COMPONENTS OF STF

The following STF core components are managed by Operators:
Prometheus and AlertManager
ElasticSearch
Smart Gateway
AMQ Interconnect
Each component has a corresponding Operator that you can use to load the various application components and objects.
Additional resources
For more information about Operators, see the Understanding Operators guide.
2.2. PREPARING YOUR OCP ENVIRONMENT FOR STF

As you prepare your OCP environment for STF, you must plan for persistent storage, adequate resources, and event storage:

Ensure that persistent storage is available in your Red Hat OpenShift Container Platform cluster to permit a production grade deployment. For more information, see Section 2.2.1, "Persistent volumes".

Ensure that enough resources are available to run the Operators and the application containers. For more information, see Section 2.2.2, "Resource allocation".

To install ElasticSearch, you must use a community catalog source. If you do not want to use a community catalog, or if you do not want to store events, see Section 2.3, "Deploying STF to the OCP environment".
STF uses ElasticSearch to store events, which requires a larger than normal vm.max_map_count. The vm.max_map_count value is set by default in Red Hat OpenShift Container Platform. For more information about how to edit the value of vm.max_map_count, see Section 2.2.3, "Node tuning operator".
2.2.1. Persistent volumes

STF uses persistent storage in OCP to instantiate the volumes dynamically so that Prometheus and ElasticSearch can store metrics and events.
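As an optional pre-check that is not part of the product procedure, you can confirm that a storage class with a dynamic provisioner is available in your cluster. The class name and provisioner in this illustrative output depend on your environment:

$ oc get storageclass
NAME                 PROVISIONER            AGE
standard (default)   kubernetes.io/cinder   4d21h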
Additional resources
For more information about configuring persistent storage for OCP, see Understanding persistent storage.

2.2.1.1. Using ephemeral storage
WARNING
You can use ephemeral storage with STF. However, if you use ephemeral storage, you might experience data loss if a pod is restarted, updated, or rescheduled onto another node. Use ephemeral storage only for development or testing, and not production environments.
Procedure
To enable ephemeral storage for STF, set storageEphemeralEnabled: true in your ServiceTelemetry manifest.
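For example, the following is a minimal sketch of a ServiceTelemetry manifest with ephemeral storage enabled; the object name and the metricsEnabled value are illustrative:

oc apply -f - <<EOF
apiVersion: infra.watch/v1alpha1
kind: ServiceTelemetry
metadata:
  name: stf-default
  namespace: service-telemetry
spec:
  metricsEnabled: true
  storageEphemeralEnabled: true
EOF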
Additional resources
For more information about enabling ephemeral storage for STF, see Section 4.6.1, "Configuring ephemeral storage".

2.2.2. Resource allocation

To enable the scheduling of pods within the OCP infrastructure, you need resources for the components that are running. If you do not allocate enough resources, pods remain in a Pending state because they cannot be scheduled.

The amount of resources that you require to run STF depends on your environment and the number of nodes and clouds that you want to monitor.
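If pods remain in a Pending state, the following general troubleshooting commands, which are not specific to STF, can show the reason; <pod-name> is a placeholder:

$ oc get pods --field-selector=status.phase=Pending
$ oc describe pod <pod-name>    # the Events section reports messages such as "Insufficient cpu" or "Insufficient memory"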
Additional resources
For recommendations about sizing for metrics collection, see https://access.redhat.com/articles/4907241.

For information about sizing requirements for ElasticSearch, see https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-managing-compute-resources.html.

2.2.3. Node tuning operator
STF uses ElasticSearch to store events, which requires a larger than normal vm.max_map_count. The vm.max_map_count value is set by default in Red Hat OpenShift Container Platform.

If you want to edit the value of vm.max_map_count, you cannot apply node tuning manually using the sysctl command, because Red Hat OpenShift Container Platform manages nodes directly. To configure values and apply them to the infrastructure, you must use the Node Tuning Operator. For more information, see Using the Node Tuning Operator.

In an OCP deployment, the default node tuning operator specification provides the required profiles for ElasticSearch workloads or pods scheduled on nodes. To view the default cluster node tuning specification, run the following command:

oc get Tuned/default -o yaml -n openshift-cluster-node-tuning-operator

The output of the default specification is documented at Default profiles set on a cluster. The assignment of profiles is managed in the recommend section, where profiles are applied to a node when certain conditions are met. When scheduling ElasticSearch to a node in STF, one of the following profiles is applied:
openshift-control-plane-es
openshift-node-es
When scheduling an ElasticSearch pod, there must be a label present that matches tuned.openshift.io/elasticsearch. If the label is present, one of the two profiles is assigned to the pod. No action is required by the administrator if you use the recommended Operator for ElasticSearch. If you use a custom-deployed ElasticSearch with STF, ensure that you add the tuned.openshift.io/elasticsearch label to all scheduled pods.
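For a custom-deployed ElasticSearch, a pod template fragment similar to the following sketch adds the label; the pod name is illustrative, and the label value can be empty because the profile match is on the label key:

metadata:
  name: custom-elasticsearch    # illustrative name for a custom-deployed pod
  labels:
    tuned.openshift.io/elasticsearch: ""    # label required so that a tuned profile is applied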
Additional resources
For more information about virtual memory usage by ElasticSearch, see https://www.elastic.co/guide/en/elasticsearch/reference/current/vm-max-map-count.html.

For more information about how the profiles are applied to nodes, see Custom tuning specification.

2.3. DEPLOYING STF TO THE OCP ENVIRONMENT

You can deploy STF to the OCP environment in one of two ways:
Deploy STF and store events with ElasticSearch. For more information, see Section 2.3.1, "Deploying STF to the OCP environment with ElasticSearch".

Deploy STF without ElasticSearch and disable events support. For more information, see Section 2.3.2, "Deploying STF to the OCP environment without ElasticSearch".

2.3.1. Deploying STF to the OCP environment with ElasticSearch

Complete the following tasks:

1. Section 2.3.3, "Creating a namespace"

2. Section 2.3.4, "Creating an OperatorGroup"

3. Section 2.3.5, "Enabling the OperatorHub.io Community Catalog Source"
4. Section 2.3.6, "Enabling Red Hat STF Operator Source"

5. Section 2.3.7, "Subscribing to the AMQ Certificate Manager Operator"

6. Section 2.3.8, "Subscribing to the Elastic Cloud on Kubernetes Operator"

7. Section 2.3.9, "Subscribing to the Service Telemetry Operator"

8. Section 2.3.10, "Creating a ServiceTelemetry object in OCP"
2.3.2. Deploying STF to the OCP environment without ElasticSearch

Complete the following tasks:

1. Section 2.3.3, "Creating a namespace"

2. Section 2.3.4, "Creating an OperatorGroup"

3. Section 2.3.6, "Enabling Red Hat STF Operator Source"

4. Section 2.3.7, "Subscribing to the AMQ Certificate Manager Operator"

5. Section 2.3.9, "Subscribing to the Service Telemetry Operator"

6. Section 2.3.10, "Creating a ServiceTelemetry object in OCP"
2.3.3. Creating a namespace

Create a namespace to hold the STF components. The service-telemetry namespace is used throughout the documentation.
Procedure
Enter the following command:

oc new-project service-telemetry
2.3.4. Creating an OperatorGroup

Create an OperatorGroup in the namespace so that you can schedule the Operator pods.
Procedure
Enter the following command:

oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: service-telemetry-operator-group
  namespace: service-telemetry
spec:
  targetNamespaces:
  - service-telemetry
EOF
Additional resources
For more information, see OperatorGroups.

2.3.5. Enabling the OperatorHub.io Community Catalog Source

Before you install ElasticSearch, you must have access to the resources on the OperatorHub.io Community Catalog Source.
Procedure
Enter the following command:

oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: operatorhubio-operators
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/operator-framework/upstream-community-operators:latest
  displayName: OperatorHub.io Operators
  publisher: OperatorHub.io
EOF
2.3.6. Enabling Red Hat STF Operator Source

Before you deploy STF on Red Hat OpenShift Container Platform, you must enable the Operator source.
Procedure
1. Install an OperatorSource that contains the Service Telemetry Operator and the Smart Gateway Operator:

oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1
kind: OperatorSource
metadata:
  labels:
    opsrc-provider: redhat-operators-stf
  name: redhat-operators-stf
  namespace: openshift-marketplace
spec:
  authorizationToken: {}
  displayName: Red Hat STF Operators
  endpoint: https://quay.io/cnr
  publisher: Red Hat
  registryNamespace: redhat-operators-stf
  type: appregistry
EOF
2. To validate the creation of your OperatorSource, use the oc get operatorsources command. A successful import results in the MESSAGE field returning a result of The object has been successfully reconciled:

$ oc get -n openshift-marketplace operatorsource redhat-operators-stf

NAME                   TYPE          ENDPOINT              REGISTRY               DISPLAYNAME             PUBLISHER   STATUS      MESSAGE
redhat-operators-stf   appregistry   https://quay.io/cnr   redhat-operators-stf   Red Hat STF Operators   Red Hat     Succeeded   The object has been successfully reconciled

3. To validate that the Operators are available from the catalog, use the oc get packagemanifest command:

$ oc get packagemanifests | grep "Red Hat STF"

smartgateway-operator       Red Hat STF Operators   2m50s
servicetelemetry-operator   Red Hat STF Operators   2m50s
2.3.7. Subscribing to the AMQ Certificate Manager Operator

You must subscribe to the AMQ Certificate Manager Operator before you deploy the other STF components, because the AMQ Certificate Manager Operator runs globally-scoped and is not compatible with the dependency management of Operator Lifecycle Manager when used with other namespace-scoped operators.
Procedure
1. Subscribe to the AMQ Certificate Manager Operator, create the subscription, and validate the AMQ7 Certificate Manager:

oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: amq7-cert-manager
  namespace: openshift-operators
spec:
  channel: alpha
  installPlanApproval: Automatic
  name: amq7-cert-manager
  source: redhat-operators
  sourceNamespace: openshift-marketplace
EOF
NOTE
The AMQ Certificate Manager is installed globally for all namespaces, so the namespace value provided is openshift-operators. You might not see your amq7-cert-manager.v1.0.0 ClusterServiceVersion in the service-telemetry namespace for a few minutes until the processing executes against the namespace.
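As an optional convenience, you can watch for the ClusterServiceVersion to appear instead of polling manually:

$ oc get csv --namespace service-telemetry --watch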
2. To validate your ClusterServiceVersion, use the oc get csv command. Ensure that amq7-cert-manager.v1.0.0 has a phase Succeeded:

$ oc get --namespace openshift-operators csv

NAME                       DISPLAY                                          VERSION   REPLACES   PHASE
amq7-cert-manager.v1.0.0   Red Hat Integration - AMQ Certificate Manager    1.0.0                Succeeded

2.3.8. Subscribing to the Elastic Cloud on Kubernetes Operator

Before you install the Service Telemetry Operator, and if you plan to store events in ElasticSearch, you must enable the Elastic Cloud on Kubernetes Operator.
Procedure
1. Apply the following manifest to your OCP environment to enable the Elastic Cloud on Kubernetes Operator:

oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: elastic-cloud-eck
  namespace: service-telemetry
spec:
  channel: stable
  installPlanApproval: Automatic
  name: elastic-cloud-eck
  source: operatorhubio-operators
  sourceNamespace: openshift-marketplace
EOF

2. To verify that the ClusterServiceVersion for Elastic Cloud on Kubernetes succeeded, enter the oc get csv command:

$ oc get csv

NAME                       DISPLAY                       VERSION   REPLACES                   PHASE
elastic-cloud-eck.v1.1.0   Elastic Cloud on Kubernetes   1.1.0     elastic-cloud-eck.v1.0.1   Succeeded

2.3.9. Subscribing to the Service Telemetry Operator

To instantiate an STF instance, create the ServiceTelemetry object to allow the Service Telemetry Operator to create the environment.
Procedure
1. To create the Service Telemetry Operator subscription, enter the oc apply -f command:

oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: servicetelemetry-operator
  namespace: service-telemetry
spec:
  channel: stable
  installPlanApproval: Automatic
  name: servicetelemetry-operator
  source: redhat-operators-stf
  sourceNamespace: openshift-marketplace
EOF
2. To validate the Service Telemetry Operator and the dependent operators, enter the following command:

$ oc get csv --namespace service-telemetry
NAME                                DISPLAY                                          VERSION   REPLACES                            PHASE
amq7-cert-manager.v1.0.0            Red Hat Integration - AMQ Certificate Manager    1.0.0                                         Succeeded
amq7-interconnect-operator.v1.2.0   Red Hat Integration - AMQ Interconnect           1.2.0                                         Succeeded
elastic-cloud-eck.v1.1.0            Elastic Cloud on Kubernetes                      1.1.0     elastic-cloud-eck.v1.0.1            Succeeded
prometheusoperator.0.37.0           Prometheus Operator                              0.37.0    prometheusoperator.0.32.0           Succeeded
service-telemetry-operator.v1.0.2   Service Telemetry Operator                       1.0.2     service-telemetry-operator.v1.0.1   Succeeded
smart-gateway-operator.v1.0.1       Smart Gateway Operator                           1.0.1     smart-gateway-operator.v1.0.0       Succeeded

2.3.10. Creating a ServiceTelemetry object in OCP

To deploy the Service Telemetry Framework, you must create an instance of ServiceTelemetry in OCP. By default, eventsEnabled is set to false. If you do not want to store events in ElasticSearch, ensure that eventsEnabled is set to false. For more information, see Section 2.3.2, "Deploying STF to the OCP environment without ElasticSearch".
The following core parameters are available for a ServiceTelemetry manifest:

Table 2.1. Core parameters for a ServiceTelemetry manifest

Parameter | Description | Default Value
eventsEnabled | Enable events support in STF. Requires prerequisite steps to ensure ElasticSearch can be started. For more information, see Section 2.3.8, "Subscribing to the Elastic Cloud on Kubernetes Operator". | false
metricsEnabled | Enable metrics support in STF. | true
highAvailabilityEnabled | Enable high availability in STF. For more information, see Section 4.3, "High availability". | false
storageEphemeralEnabled | Enable ephemeral storage support in STF. For more information, see Section 4.6, "Ephemeral storage". | false
Procedure
1. To store events in ElasticSearch, set eventsEnabled to true during deployment:

oc apply -f - <<EOF
apiVersion: infra.watch/v1alpha1
kind: ServiceTelemetry
metadata:
  name: stf-default
  namespace: service-telemetry
spec:
  eventsEnabled: true
  metricsEnabled: true
EOF

2. To view the STF deployment logs in the Service Telemetry Operator, use the oc logs command:

oc logs $(oc get pod --selector=name=service-telemetry-operator -oname) -c ansible

PLAY RECAP ***
localhost : ok=37 changed=0 unreachable=0 failed=0 skipped=1 rescued=0 ignored=0

3. View the pods and the status of each pod to determine that all workloads are operating nominally:
NOTE
If you set eventsEnabled: true, the notification Smart Gateways will Error and CrashLoopBackOff for a period of time before ElasticSearch starts.
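You can watch the notification Smart Gateway pods recover as ElasticSearch becomes ready; the label selector is the same one used later in this guide, and the --watch flag is an optional convenience:

$ oc get pods -l app=smart-gateway --watch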
$ oc get pods

NAME                                                              READY   STATUS    RESTARTS   AGE
alertmanager-stf-default-0                                        2/2     Running   0          26m
elastic-operator-645dc8b8ff-jwnzt                                 1/1     Running   0          88m
elasticsearch-es-default-0                                        1/1     Running   0          26m
interconnect-operator-6fd49d9fb9-4bl92                            1/1     Running   0          46m
prometheus-operator-bf7d97fb9-kwnlx                               1/1     Running   0          46m
prometheus-stf-default-0                                          3/3     Running   0          26m
service-telemetry-operator-54f4c99d9b-k7ll6                       2/2     Running   0          46m
smart-gateway-operator-7ff58bcf94-66rvx                           2/2     Running   0          46m
stf-default-ceilometer-notification-smartgateway-6675df547-q4lbj  1/1     Running   0          26m
stf-default-collectd-notification-smartgateway-698c87fbb7-xj528   1/1     Running   0          26m
stf-default-collectd-telemetry-smartgateway-79c967c8f7-9hsqn      1/1     Running   0          26m
stf-default-interconnect-7458fd4d69-nqbfs                         1/1     Running   0          26m
2.4. REMOVING STF FROM THE OCP ENVIRONMENT

Remove STF from an OCP environment if you no longer require the STF functionality.

Complete the following tasks:

1. Section 2.4.1, "Deleting the namespace"

2. Section 2.4.2, "Removing the OperatorSource"
2.4.1. Deleting the namespace

To remove the operational resources for STF from OCP, delete the namespace.
Procedure
1. Run the oc delete command:

oc delete project service-telemetry

2. Verify that the resources have been deleted from the namespace:

$ oc get all
No resources found.
2.4.2. Removing the OperatorSource

If you do not expect to install Service Telemetry Framework again, delete the OperatorSource. When you remove the OperatorSource, PackageManifests related to STF are removed from the Operator Lifecycle Manager catalog.
Procedure
1. Delete the OperatorSource:

$ oc delete --namespace=openshift-marketplace operatorsource redhat-operators-stf
operatorsource.operators.coreos.com "redhat-operators-stf" deleted
2. Verify that the STF PackageManifests are removed from the platform. If successful, the following command returns no result:

$ oc get packagemanifests | grep "Red Hat STF"

3. If you enabled the OperatorHub.io Community Catalog Source during the installation process, and you no longer need this catalog source, delete it:

$ oc delete --namespace=openshift-marketplace catalogsource operatorhubio-operators
catalogsource.operators.coreos.com "operatorhubio-operators" deleted
Additional resources
For more information about the OperatorHub.io Community Catalog Source, see Section 2.3, "Deploying STF to the OCP environment".
CHAPTER 3. COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION

3.1. CONNECTING RED HAT OPENSTACK PLATFORM TO SERVICE TELEMETRY FRAMEWORK

To collect metrics, events, or both, and to send them to the Service Telemetry Framework (STF) storage domain, you must configure the Red Hat OpenStack Platform overcloud to enable data collection and transport.

To deploy data collection and transport to STF on Red Hat OpenStack Platform cloud nodes that employ routed L3 domains, such as distributed compute node (DCN) or spine-leaf, see Section 3.2, "Deploying to non-standard network topologies".
3.2. DEPLOYING TO NON-STANDARD NETWORK TOPOLOGIES

If your nodes are on a separate network from the default InternalApi network, you must make configuration adjustments so that AMQ Interconnect can transport data to the Service Telemetry Framework (STF) server instance. This scenario is typical in a spine-leaf or a DCN topology. For more information about DCN configuration, see the Spine Leaf Networking guide.
If you use STF with Red Hat OpenStack Platform 16.0, and plan to monitor your Ceph, Block, or Object storage nodes, you must make configuration changes that are similar to the configuration changes that you make to the spine-leaf and DCN network configuration. To monitor Ceph nodes, use the CephStorageExtraConfig parameter to define which network interface to load into the AMQ Interconnect and collectd configuration files:

CephStorageExtraConfig:
  tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('storage')}"
  tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('storage')}"
  tripleo::profile::base::ceilometer::agent::notification::notifier_host_addr: "%{hiera('storage')}"

Similarly, you must specify BlockStorageExtraConfig and ObjectStorageExtraConfig parameters if your environment uses Block and Object storage roles.
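By analogy with the CephStorageExtraConfig example above, a sketch for the Block and Object storage roles might look like the following; the hiera network name depends on your network definitions:

BlockStorageExtraConfig:
  tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('storage')}"
  tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('storage')}"

ObjectStorageExtraConfig:
  tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('storage')}"
  tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('storage')}"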
The deployment of a spine-leaf topology involves creating roles and networks, then assigning those networks to the available roles. When you configure data collection and transport for STF for a Red Hat OpenStack Platform deployment, the default network for roles is InternalApi. For Ceph, Block, and Object storage roles, the default network is Storage. Because a spine-leaf configuration can result in different networks being assigned to different Leaf groupings, and those names are typically unique, additional configuration is required in the parameter_defaults section of the Red Hat OpenStack Platform environment files.
Procedure
1. Document which networks are available for each of the Leaf roles. For examples of network name definitions, see Creating a network data file in the Spine Leaf Networking guide. For more information about the creation of the Leaf groupings (roles) and assignment of the networks to those groupings, see Creating a roles data file in the Spine Leaf Networking guide.
2. Add the following configuration example to the ExtraConfig section for each of the leaf roles. In this example, internal_api_subnet is the value defined in the name_lower parameter of your network definition (with _subnet appended to the name for Leaf 0), and is the network to which the ComputeLeaf0 leaf role is connected. In this case, the network identification of 0 corresponds to the Compute role for leaf 0, and represents a value that is different from the default internal API network name.

For the ComputeLeaf0 leaf role, specify extra configuration to perform a hiera lookup to determine which network interface for a particular network to assign to the collectd AMQP host parameter. Perform the same configuration for the AMQ Interconnect listener address parameter:

ComputeLeaf0ExtraConfig:
  tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('internal_api_subnet')}"
  tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('internal_api_subnet')}"

Additional leaf roles typically replace _subnet with _leafN, where N represents a unique identifier for the leaf:

ComputeLeaf1ExtraConfig:
  tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('internal_api_leaf1')}"
  tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('internal_api_leaf1')}"

This example configuration is on a CephStorage leaf role:

CephStorageLeaf0ExtraConfig:
  tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('storage_subnet')}"
  tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('storage_subnet')}"
3.3. CONFIGURING RED HAT OPENSTACK PLATFORM OVERCLOUD FOR SERVICE TELEMETRY FRAMEWORK

To configure the Red Hat OpenStack Platform overcloud, you must configure the data collection applications and the data transport to STF, and deploy the overcloud.

To configure the Red Hat OpenStack Platform overcloud, complete the following tasks:
1. Section 3.3.1, "Retrieving the AMQ Interconnect route address"

2. Section 3.3.2, "Configuring the STF connection for the overcloud"

3. Section 3.3.3, "Validating client-side installation"
3.3.1. Retrieving the AMQ Interconnect route address

When you configure the Red Hat OpenStack Platform overcloud for STF, you must provide the AMQ Interconnect route address in the STF connection file.
Procedure
1. Log in to your Red Hat OpenShift Container Platform (OCP) environment.

2. In the service-telemetry project, retrieve the AMQ Interconnect route address:

$ oc get routes -ogo-template='{{ range .items }}{{printf "%s\n" .spec.host }}{{ end }}' | grep "\-5671"
stf-default-interconnect-5671-service-telemetry.apps.infra.watch
NOTE
If your STF installation differs from the documentation, ensure that you retrieve the correct AMQ Interconnect route address.
3.3.2. Configuring the STF connection for the overcloud

To configure the STF connection, you must create a file that contains the connection configuration of the AMQ Interconnect for the overcloud to the STF deployment. Enable the collection of events and storage of the events in STF, and deploy the overcloud.
Procedure
1. Log in to the Red Hat OpenStack Platform undercloud as the stack user.

2. Create a configuration file called stf-connectors.yaml in the /home/stack directory.
IMPORTANT
The Service Telemetry Operator simplifies the deployment of all data ingestion and data storage components for single cloud deployments. To share the data storage domain with multiple clouds, see Section 4.5, "Configuring multiple clouds".

3. In the stf-connectors.yaml file, configure the MetricsQdrConnectors address to connect the AMQ Interconnect on the overcloud to the STF deployment:

parameter_defaults:
  EventPipelinePublishers: []
  CeilometerQdrPublishEvents: true
  MetricsQdrConnectors:
    - host: stf-default-interconnect-5671-service-telemetry.apps.infra.watch
      port: 443
      role: edge
      sslProfile: sslProfile
      verifyHostname: false

Add the CeilometerQdrPublishEvents: true parameter to enable collection and transport of Ceilometer events to STF.

Replace the host parameter with the value of HOST/PORT that you retrieved in Section 3.3.1, "Retrieving the AMQ Interconnect route address".
4. Add the following files to your Red Hat OpenStack Platform director deployment to set up collectd and AMQ Interconnect:

the stf-connectors.yaml environment file

the enable-stf.yaml file that ensures that the environment is being used during the overcloud deployment
the ceilometer-write-qdr.yaml file that ensures that Ceilometer telemetry is sent to STF

5. Deploy the Red Hat OpenStack Platform overcloud:

openstack overcloud deploy <other arguments>
  --templates /usr/share/openstack-tripleo-heat-templates \
  --environment-file <other-environment-files> \
  --environment-file /usr/share/openstack-tripleo-heat-templates/environments/metrics/ceilometer-write-qdr.yaml \
  --environment-file /usr/share/openstack-tripleo-heat-templates/environments/enable-stf.yaml \
  --environment-file /home/stack/stf-connectors.yaml
3.3.3. Validating client-side installation

To validate data collection from the STF storage domain, query the data sources for delivered data. To validate individual nodes in the Red Hat OpenStack Platform deployment, connect to the console using SSH.
Procedure
1. Log in to an overcloud node, for example, controller-0.

2. Ensure that the metrics_qdr container is running on the node:

$ sudo podman container inspect --format '{{.State.Status}}' metrics_qdr

running

3. Return the internal network address on which AMQ Interconnect is running, for example, 172.17.1.44 listening on port 5666:

$ sudo podman exec -it metrics_qdr cat /etc/qpid-dispatch/qdrouterd.conf

listener {
    host: 172.17.1.44
    port: 5666
    authenticatePeer: no
    saslMechanisms: ANONYMOUS
}

4. Return a list of connections to the local AMQ Interconnect:

$ sudo podman exec -it metrics_qdr qdstat --bus=172.17.1.44:5666 --connections

Connections
  id   host                                                                   container                                                                                                  role    dir  security                             authentication  tenant
  ============================================================================================================================================================================
  1    stf-default-interconnect-5671-service-telemetry.apps.infra.watch:443   stf-default-interconnect-7458fd4d69-bgzfb                                                                  edge    out  TLSv1.2(DHE-RSA-AES256-GCM-SHA384)   anonymous-user
  12   172.17.1.44:60290                                                      openstack.org/om/container/controller-0/ceilometer-agent-notification/25/5c02cee550f143ec9ea030db5cccba14  normal  in   no-security                          no-auth
  16   172.17.1.44:36408                                                      metrics                                                                                                    normal  in   no-security                          anonymous-user
  899  172.17.1.44:39500                                                      10a2e99d-1b8a-4329-b48c-4335e5f75c84                                                                       normal  in   no-security                          no-auth
There are four connections:

Outbound connection to STF

Inbound connection from collectd

Inbound connection from ceilometer

Inbound connection from our qdstat client

The outbound STF connection is provided to the MetricsQdrConnectors host parameter and is the route for the STF storage domain. The other hosts are internal network addresses of the client connections to this AMQ Interconnect.

5. To ensure that messages are being delivered, list the links, and view the _edge address in the deliv column for delivery of messages:

$ sudo podman exec -it metrics_qdr qdstat --bus=172.17.1.44:5666 --links
Router Links
  type      dir  conn id  id    peer  class   addr                  phs  cap  pri  undel  unsett  deliv    presett  psdrop  acc  rej  rel  mod  delay  rate
  ===========================================================================================================================================================
  endpoint  out  1        5           local   _edge                      250  0    0      0       2979926  2979924  0       0    0    2    0    0      0
  endpoint  in   1        6                                              250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  in   1        7                                              250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  out  1        8                                              250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  in   1        9                                              250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  out  1        10                                             250  0    0      0       911      911      0       0    0    0    0    911    0
  endpoint  in   1        11                                             250  0    0      0       0        911      0       0    0    0    0    0      0
  endpoint  out  12       32          local   temp.lSY6Mcicol4J2Kp       250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  in   16       41                                             250  0    0      0       2979924  2979924  0       0    0    0    0    0      0
  endpoint  in   912      1834        mobile  $management           0    250  0    0      0       1        0        0       1    0    0    0    0      0
  endpoint  out  912      1835        local   temp.9Ok2resI9tmt+CT       250  0    0      0       0        0        0       0    0    0    0    0      0
6. To list the addresses from Red Hat OpenStack Platform nodes to STF, connect to OCP to get the AMQ Interconnect pod name and list the connections. List the available AMQ Interconnect pods:

$ oc get pods -l application=stf-default-interconnect

NAME                                        READY   STATUS    RESTARTS   AGE
stf-default-interconnect-7458fd4d69-bgzfb   1/1     Running   0          6d21h

7. Connect to the pod and run the qdstat --connections command to list the known connections:

$ oc exec -it stf-default-interconnect-7458fd4d69-bgzfb -- qdstat --connections

2020-04-21 18:25:47.243852 UTC
stf-default-interconnect-7458fd4d69-bgzfb

Connections
  id  host               container                                                              role    dir  security                                 authentication  tenant  last dlv      uptime
  ======================================================================================================================================================================================================
  1   10.129.2.21:43062  rcv[stf-default-collectd-telemetry-smartgateway-79c967c8f7-kq4qv]      normal  in   no-security                              anonymous-user          000:00:00:00  006:21:50:25
  2   10.130.0.52:55754  rcv[stf-default-ceilometer-notification-smartgateway-6675df547-mbjk5]  normal  in   no-security                              anonymous-user          000:21:25:57  006:21:49:36
  3   10.130.0.51:43110  rcv[stf-default-collectd-notification-smartgateway-698c87fbb7-f28v6]   normal  in   no-security                              anonymous-user          000:21:36:53  006:21:49:09
  22  10.128.0.1:51948   Router.ceph-0.redhat.local                                             edge    in   TLSv1/SSLv3(DHE-RSA-AES256-GCM-SHA384)   anonymous-user          000:00:00:03  000:22:08:43
  23  10.128.0.1:51950   Router.compute-0.redhat.local                                          edge    in   TLSv1/SSLv3(DHE-RSA-AES256-GCM-SHA384)   anonymous-user          000:00:00:03  000:22:08:43
  24  10.128.0.1:52082   Router.controller-0.redhat.local                                       edge    in   TLSv1/SSLv3(DHE-RSA-AES256-GCM-SHA384)   anonymous-user          000:00:00:00  000:22:08:34
  27  127.0.0.1:42202    c2f541c1-4c97-4b37-a189-a396c08fb079                                   normal  in   no-security                              no-auth                 000:00:00:00  000:00:00:00

In this example, there are three edge connections from the Red Hat OpenStack Platform nodes, with connection id 22, 23, and 24.

8. To view the number of messages delivered by the network, use each address with the oc exec command:

$ oc exec -it stf-default-interconnect-7458fd4d69-bgzfb -- qdstat --address

2020-04-21 18:20:10.293258 UTC
stf-default-interconnect-7458fd4d69-bgzfb

Router Addresses
  class   addr                              phs  distrib    pri  local  remote  in       out      thru  fallback
  ====================================================================================================================
  mobile  anycast/ceilometer/event.sample   0    balanced   -    1      0       1553     1553     0     0
  mobile  collectd/notify                   0    multicast  -    1      0       10       10       0     0
  mobile  collectd/telemetry                0    multicast  -    1      0       7798049  7798049  0     0
CHAPTER 4. ADVANCED FEATURES

The following optional features can provide additional functionality to the Service Telemetry Framework (STF):

Customizing the deployment. For more information, see Section 4.1, "Customizing the deployment".

Alerts. For more information, see Section 4.2, "Alerts".

High availability. For more information, see Section 4.3, "High availability".

Dashboards. For more information, see Section 4.4, "Dashboards".

Multiple clouds. For more information, see Section 4.5, "Configuring multiple clouds".

Ephemeral storage. For more information, see Section 4.6, "Ephemeral storage".

4.1. CUSTOMIZING THE DEPLOYMENT

The Service Telemetry Operator watches for a ServiceTelemetry manifest to load into Red Hat OpenShift Container Platform (OCP). The Operator then creates other objects in memory, which results in the dependent Operators creating the workloads they are responsible for managing.
WARNING
When you override the manifest, you must provide the entire manifest contents, including object names or namespaces. There is no dynamic parameter substitution when you override a manifest.
To override a manifest successfully with Service Telemetry Framework (STF), deploy a default environment using the core options only. For more information about the core options, see Section 2.3.10, "Creating a ServiceTelemetry object in OCP". When you deploy STF, use the oc get command to retrieve the default deployed manifest. When you use a manifest that was originally generated by Service Telemetry Operator, the manifest is compatible with the other objects that are managed by the Operators.

For example, when the metricsEnabled: true parameter is configured in the ServiceTelemetry manifest, the Service Telemetry Operator requests components for metrics retrieval and storage using the default manifests. In some cases, you might want to override the default manifest. For more information, see Section 4.1.1, "Manifest override parameters".

4.1.1. Manifest override parameters

This table describes the available parameters that you can use to override a manifest, along with the corresponding retrieval commands.

Table 4.1. Manifest override parameters
Override parameter | Description | Retrieval command
alertmanagerManifest | Override the Alertmanager object creation. The Prometheus Operator watches for Alertmanager objects. | oc get alertmanager stf-default -oyaml
alertmanagerConfigManifest | Override the Secret that contains the Alertmanager configuration. The Prometheus Operator uses a secret named alertmanager-<alertmanager-name>, for example, stf-default, to provide the alertmanager.yaml configuration to Alertmanager. | oc get secret alertmanager-stf-default -oyaml
elasticsearchManifest | Override the ElasticSearch object creation. The Elastic Cloud on Kubernetes Operator watches for ElasticSearch objects. | oc get elasticsearch elasticsearch -oyaml
interconnectManifest | Override the Interconnect object creation. The AMQ Interconnect Operator watches for Interconnect objects. | oc get interconnect stf-default-interconnect -oyaml
prometheusManifest | Override the Prometheus object creation. The Prometheus Operator watches for Prometheus objects. | oc get prometheus stf-default -oyaml
servicemonitorManifest | Override the ServiceMonitor object creation. The Prometheus Operator watches for ServiceMonitor objects. | oc get servicemonitor stf-default -oyaml
smartgatewayCollectdMetricsManifest | Override the SmartGateway object creation for collectd metrics. The Smart Gateway Operator watches for SmartGateway objects. | oc get smartgateway stf-default-collectd-telemetry -oyaml
smartgatewayCollectdEventsManifest | Override the SmartGateway object creation for collectd events. The Smart Gateway Operator watches for SmartGateway objects. | oc get smartgateway stf-default-collectd-notification -oyaml
smartgatewayCeilometerEventsManifest | Override the SmartGateway object creation for Ceilometer events. The Smart Gateway Operator watches for SmartGateway objects. | oc get smartgateway stf-default-ceilometer-notification -oyaml
4.1.2. Overriding a managed manifest

Edit the ServiceTelemetry object and provide a parameter and manifest. For a list of available manifest override parameters, see Section 4.1, "Customizing the deployment". The default ServiceTelemetry object is stf-default. Use oc get servicetelemetry to list the available STF deployments.
TIP
The oc edit command loads the default system editor. To override the default editor, pass or set the environment variable EDITOR to the preferred editor. For example, EDITOR=nano oc edit servicetelemetry stf-default.
Procedure
1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

oc project service-telemetry

3. Load the ServiceTelemetry object into an editor:

oc edit servicetelemetry stf-default

4. To modify the ServiceTelemetry object, provide a manifest override parameter and the contents of the manifest to write to OCP instead of the defaults provided by STF:
NOTE
The trailing pipe (|) after entering the manifest override parameter indicates that the value provided is multi-line.
apiVersion: infra.watch/v1alpha1
kind: ServiceTelemetry
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"infra.watch/v1alpha1","kind":"ServiceTelemetry","metadata":{"annotations":{},"name":"stf-default","namespace":"service-telemetry"},"spec":{"metricsEnabled":true}}
  creationTimestamp: "2020-04-14T20:29:42Z"
  generation: 1
  name: stf-default
  namespace: service-telemetry
  resourceVersion: "1949423"
  selfLink: /apis/infra.watch/v1alpha1/namespaces/service-telemetry/servicetelemetrys/stf-default
  uid: d058bc41-1bb0-49f5-9a8b-642f4b8adb95
spec:
  metricsEnabled: true
  smartgatewayCollectdMetricsManifest: | 1
    apiVersion: smartgateway.infra.watch/v2alpha1
    kind: SmartGateway
    metadata:
      name: stf-default-collectd-telemetry
      namespace: service-telemetry
    spec:
      amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/telemetry
      debug: true
      prefetch: 15000
      serviceType: metrics
      size: 1
      useTimestamp: true 2
status:
  conditions:
  - ansibleResult:
      changed: 0
      completion: 2020-04-14T20:32:19.079508
      failures: 0
      ok: 52
      skipped: 1
    lastTransitionTime: "2020-04-14T20:29:59Z"
    message: Awaiting next reconciliation
    reason: Successful
    status: "True"
    type: Running
1  Manifest override parameter is defined in the spec of the ServiceTelemetry object.

2  End of the manifest override content.

5. Save and close.
4.2. ALERTS

You create alert rules in Prometheus and alert routes in Alertmanager. Alert rules in Prometheus servers send alerts to an Alertmanager, which manages the alerts. Alertmanager can silence, inhibit, or aggregate alerts, and send notifications using email, on-call notification systems, or chat platforms.

To create an alert, complete the following tasks:

1. Create an alert rule in Prometheus. For more information, see Section 4.2.1, "Creating an alert rule in Prometheus".
2. Create an alert route in Alertmanager. For more information, see Section 4.2.3, "Creating an alert route in Alertmanager".

Additional resources

For more information about alerts or notifications with Prometheus and Alertmanager, see https://prometheus.io/docs/alerting/overview/.

To view an example set of alerts that you can use with Service Telemetry Framework (STF), see https://github.com/infrawatch/service-telemetry-operator/tree/master/deploy/alerts.

4.2.1. Creating an alert rule in Prometheus

Prometheus evaluates alert rules to trigger notifications. If the rule condition returns an empty result set, the condition is false. Otherwise, the rule is true, and it triggers an alert.
Procedure
1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

oc project service-telemetry

3. Create a PrometheusRule object that contains the alert rule. The Prometheus Operator loads the rule into Prometheus:

oc apply -f - <<EOF
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  creationTimestamp: null
  labels:
    prometheus: stf-default
    role: alert-rules
  name: prometheus-alarm-rules
  namespace: service-telemetry
spec:
  groups:
    - name: ./openstack.rules
      rules:
        - alert: Metric Listener down
          expr: collectd_qpid_router_status < 1  # To change the rule, edit the value of the expr parameter.
EOF

4. To verify that the rules have been loaded into Prometheus by the Operator, create a pod with access to curl:

oc run curl --generator=run-pod/v1 --image=radial/busyboxplus:curl -i --tty

5. Run curl to access the prometheus-operated service to return the rules loaded into memory:

[ root@curl:/ ]$ curl prometheus-operated:9090/api/v1/rules
{"status":"success","data":{"groups":[{"name":"./openstack.rules","file":"/etc/prometheus/rules/prometheus-stf-default-rulefiles-0/service-telemetry-prometheus-alarm-rules.yaml","rules":[{"name":"Metric Listener down","query":"collectd_qpid_router_status \u003c 1","duration":0,"labels":{},"annotations":{},"alerts":[],"health":"ok","type":"alerting"}],"interval":30}]}}
6. To verify that the output shows the rules loaded into the PrometheusRule object, for example, that the output contains the defined ./openstack.rules, exit from the pod:

[ root@curl:/ ]$ exit

7. Clean up the environment by deleting the curl pod:

$ oc delete pod curl

pod "curl" deleted

Additional resources

For more information on alerting, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md.

4.2.2. Configuring custom alerts

You can add custom alerts to the PrometheusRule object that you created in Section 4.2.1, "Creating an alert rule in Prometheus".
Procedure
1. Use the oc edit command:

oc edit prometheusrules prometheus-alarm-rules

2. Edit the PrometheusRules manifest. For an illustrative custom rule, see the example after this procedure.

3. Save and close.
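For example, the rules array can carry more than one rule. The following sketch reuses the collectd_qpid_router_status metric from the earlier example; the second alert name and expression are illustrative only:

spec:
  groups:
    - name: ./openstack.rules
      rules:
        - alert: Metric Listener down
          expr: collectd_qpid_router_status < 1
        - alert: Metric Listener flapping    # illustrative custom alert
          expr: changes(collectd_qpid_router_status[10m]) > 2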
Additional resources

For more information about configuring alerting rules, see https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/.

For more information about PrometheusRules objects, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md.

4.2.3. Creating an alert route in Alertmanager

Use Alertmanager to deliver alerts to an external system, such as email, IRC, or other notification channel. The Prometheus Operator manages the Alertmanager configuration as a Red Hat OpenShift Container Platform (OCP) secret. STF by default deploys a basic configuration that results in no receivers:

alertmanager.yaml: |-
  global:
    resolve_timeout: 5m
  route:
    group_by: ['job']
    group_wait: 30s
    group_interval: 5m
    repeat_interval: 12h
    receiver: 'null'
  receivers:
  - name: 'null'
To deploy a custom Alertmanager route with STF, an alertmanagerConfigManifest parameter must be passed to the Service Telemetry Operator, which results in an updated secret managed by the Prometheus Operator. For more information, see Section 4.1.2, "Overriding a managed manifest".
Procedure
1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

oc project service-telemetry

3. Edit the ServiceTelemetry object for your STF deployment:

oc edit servicetelemetry stf-default

4. Add a new parameter, alertmanagerConfigManifest, and the Secret object contents to define the alertmanager.yaml configuration for Alertmanager:

apiVersion: infra.watch/v1alpha1
kind: ServiceTelemetry
metadata:
  name: stf-default
  namespace: service-telemetry
spec:
  metricsEnabled: true
  alertmanagerConfigManifest: |
    apiVersion: v1
    kind: Secret
    metadata:
      name: 'alertmanager-stf-default'
      namespace: 'service-telemetry'
    type: Opaque
    stringData:
      alertmanager.yaml: |-
        global:
          resolve_timeout: 10m
        route:
          group_by: ['job']
          group_wait: 30s
          group_interval: 5m
          repeat_interval: 12h
          receiver: 'null'
        receivers:
        - name: 'null'
NOTE
This loads the default template that is already managed by Service Telemetry Operator. To validate that the changes are populating correctly, change a value, return the alertmanager-stf-default secret, and verify that the new value is loaded into memory, for example, changing the value global.resolve_timeout from 5m to 10m.
5. Verify that the configuration was applied to the secret:

$ oc get secret alertmanager-stf-default -o go-template='{{index .data "alertmanager.yaml" | base64decode}}'

global:
  resolve_timeout: 10m
route:
  group_by: ['job']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h
  receiver: 'null'
receivers:
- name: 'null'

6. To verify the configuration has been loaded into Alertmanager, create a pod with access to curl:

oc run curl --generator=run-pod/v1 --image=radial/busyboxplus:curl -i --tty

7. Run curl against the alertmanager-operated service to retrieve the status and configYAML contents, and review that the supplied configuration matches the configuration loaded into Alertmanager:

[ root@curl:/ ]$ curl alertmanager-operated:9093/api/v1/status

{"status":"success","data":{"configYAML":"global:\n  resolve_timeout: 10m\n  http_config: {}\n  smtp_hello: localhost\n  smtp_require_tls: true\n  pagerduty_url: https://events.pagerduty.com/v2/enqueue\n  hipchat_api_url: https://api.hipchat.com/\n  opsgenie_api_url: https://api.opsgenie.com/\n  wechat_api_url: https://qyapi.weixin.qq.com/cgi-bin/\n  victorops_api_url: https://alert.victorops.com/integrations/generic/20131114/alert/\nroute:\n  receiver: 'null'\n  group_by:\n  - job\n  group_wait: 30s\n  group_interval: 5m\n  repeat_interval: 12h\nreceivers:\n- name: 'null'\ntemplates: []\n"}}

8. Verify that the configYAML field contains the expected changes. Exit from the pod:

[ root@curl:/ ]$ exit

9. To clean up the environment, delete the curl pod:

$ oc delete pod curl

pod "curl" deleted
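To route alerts somewhere useful, replace the 'null' receiver with a real one. The following alertmanager.yaml fragment is a sketch of an email receiver; the receiver name, addresses, and SMTP settings are placeholders that you must replace for your environment:

route:
  receiver: email-ops
receivers:
- name: email-ops
  email_configs:
  - to: 'ops@example.com'
    from: 'alertmanager@example.com'
    smarthost: 'smtp.example.com:587'
    auth_username: 'alertmanager'
    auth_password: 'examplepassword'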
Additional resources

For more information about the Red Hat OpenShift Container Platform secret and the Prometheus operator, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md.

4.3. HIGH AVAILABILITY

High availability is the ability of Service Telemetry Framework (STF) to rapidly recover from failures in its component services. Although Red Hat OpenShift Container Platform (OCP) restarts a failed pod if nodes are available to schedule the workload, this recovery process might take more than one minute, during which time events and metrics are lost. A high availability configuration includes multiple copies of STF components, reducing recovery time to approximately 2 seconds. To protect against failure of an OCP node, deploy STF to an OCP cluster with three or more nodes.
NOTE
STF is not yet a fully fault tolerant system. Delivery of metrics and events during the recovery period is not guaranteed.

Enabling high availability has the following effects:

Two AMQ Interconnect pods run instead of the default 1.

Three ElasticSearch pods run instead of the default 1.

Recovery time from a lost pod in either of these services reduces to approximately 2 seconds.
4.3.1. Configuring high availability

To configure STF for high availability, add highAvailabilityEnabled: true to the ServiceTelemetry object in OCP. You can set this parameter at installation time or, if you already deployed STF, complete the following steps:
Procedure
1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

oc project service-telemetry

3. Use the oc command to edit the ServiceTelemetry object:

$ oc edit ServiceTelemetry

4. Add highAvailabilityEnabled: true to the spec section:

spec:
  eventsEnabled: true
  metricsEnabled: true
  highAvailabilityEnabled: true

5. Save your changes and close the object.
4.4. DASHBOARDS

Use the third-party application Grafana to visualize system-level metrics gathered by collectd for each individual host node. For more information about configuring collectd, see Section 3.3, "Configuring Red Hat OpenStack Platform overcloud for Service Telemetry Framework".

4.4.1. Setting up Grafana to host the dashboard

Grafana is not included in the default Service Telemetry Framework (STF) deployment, so you must deploy the Grafana Operator from OperatorHub.io.
Procedure
1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

oc project service-telemetry

3. Clone the dashboard repository:

git clone https://github.com/infrawatch/dashboards
cd dashboards

4. Deploy the Grafana operator:

oc create -f deploy/subscription.yaml

5. To verify that the operator launched successfully, run the oc get csv command. If the value of the PHASE column is Succeeded, the operator launched successfully:

$ oc get csv

NAME                      DISPLAY            VERSION   REPLACES   PHASE
grafana-operator.v3.2.0   Grafana Operator   3.2.0                Succeeded

6. Launch a Grafana instance:

$ oc create -f deploy/grafana.yaml

7. Verify that the Grafana instance deployed:

$ oc get pod -l app=grafana

NAME                                  READY   STATUS    RESTARTS   AGE
grafana-deployment-7fc7848b56-sbkhv   1/1     Running   0          1m

8. Create the datasource and dashboard resources:

oc create -f deploy/datasource.yaml -f deploy/rhos-dashboard.yaml
9. Verify that the resources installed correctly:

$ oc get grafanadashboards

NAME             AGE
rhos-dashboard   7d21h

$ oc get grafanadatasources
NAME                                  AGE
service-telemetry-grafanadatasource   1m

10. Navigate to https://<grafana-route-address> in a web browser. Use the oc get routes command to retrieve the Grafana route address:

oc get routes

11. To view the dashboard, click Dashboards and Manage.
Additional resources
For more information about enabling the OperatorHub.io catalog source, see Section 2.3.5, "Enabling the OperatorHub.io Community Catalog Source".

4.4.1.1. Viewing and editing queries
Procedure
1. Log in to Red Hat OpenShift Container Platform. To view and edit queries, log in as the admin user.

2. Change to the service-telemetry namespace:

oc project service-telemetry

3. To retrieve the default username and password, describe the Grafana object using oc describe:

oc describe grafana service-telemetry-grafana

4.4.2. The Grafana infrastructure dashboard

The infrastructure dashboard shows metrics for a single node at a time. Select a node from the upper left corner of the dashboard.

4.4.2.1. Top panels

Title | Unit | Description
Current Global Alerts | - | Current alerts fired by Prometheus
Recent Global Alerts | - | Recently fired alerts in 5m time steps
Status Panel | - | Node status: up, down, unavailable
Uptime | s/m/h/d/M/Y | Total operational time of node
CPU Cores | cores | Total number of cores
Memory | bytes | Total memory
Disk Size | bytes | Total storage size
Processes | processes | Total number of processes listed by type
Load Average | processes | Load average represents the average number of running and uninterruptible processes residing in the kernel execution queue
4.4.2.2. Networking panels

Panels that display the network interfaces of the node.

Panel | Unit | Description
Physical Interfaces Ingress Errors | errors | Total errors with incoming data
Physical Interfaces Egress Errors | errors | Total errors with outgoing data
Physical Interfaces Ingress Error Rates | errors/s | Rate of incoming data errors
Physical Interfaces Egress Error Rates | errors/s | Rate of outgoing data errors
Physical Interfaces Packets Ingress | pps | Incoming packets per second
Physical Interfaces Packets Egress | pps | Outgoing packets per second
Physical Interfaces Data Ingress | bytes/s | Incoming data rates
Physical Interfaces Data Egress | bytes/s | Outgoing data rates
Physical Interfaces Drop Rate Ingress | pps | Incoming packets drop rate
Physical Interfaces Drop Rate Egress | pps | Outgoing packets drop rate
4.4.2.3. CPU panels

Panels that display CPU usage of the node.

Panel | Unit | Description
Current CPU Usage | percent | Instantaneous usage at the time of the last query
Aggregate CPU Usage | percent | Average non-idle CPU activity of all cores on a node
Aggr. CPU Usage by Type | percent | Shows time spent for each type of thread averaged across all cores
4.4.2.4. Memory panels

Panels that display memory usage on the node.

Panel | Unit | Description
Memory Used | percent | Amount of memory being used at time of last query
Huge Pages Used | hugepages | Number of hugepages being used
Memory | - | -
4.4.2.5. Disk/file system

Panels that display space used on disk.

Panel | Unit | Description | Notes
Disk Space Usage | percent | Total disk use at time of last query |
Inode Usage | percent | Total inode use at time of last query |
Aggregate Disk Space Usage | bytes | Total disk space used and reserved | Because this query relies on the df plugin, temporary file systems that do not necessarily use disk space are included in the results. The query tries to filter out most of these, but it might not be exhaustive.
Disk Traffic | bytes/s | Shows rates for both reading and writing |
Disk Load | percent | Approximate percentage of total disk bandwidth being used. The weighted IO time series includes the backlog that might be accumulating. For more information, see the collectd disk plugin docs. |
Operations/s | ops/s | Operations done per second |
Average IO Operation Time | seconds | Average time each IO operation took to complete. This average is not accurate; see the collectd disk plugin docs. |
4.5. CONFIGURING MULTIPLE CLOUDS

You can configure multiple Red Hat OpenStack Platform clouds to target a single instance of Service Telemetry Framework (STF):

1. Plan the AMQP address prefixes that you want to use for each cloud. For more information, see Section 4.5.1, "Planning AMQP address prefixes".

2. Deploy metrics and events consumer Smart Gateways for each cloud to listen on the corresponding address prefixes. For more information, see Section 4.5.2, "Deploying Smart Gateways".

3. Configure each cloud to send its metrics and events to STF on the correct address. For more information, see Section 4.5.3, "Creating the OpenStack environment file".
Figure 4.1. Two Red Hat OpenStack Platform clouds connect to STF

4.5.1. Planning AMQP address prefixes

By default, Red Hat OpenStack Platform nodes get data through two data collectors: collectd and Ceilometer. These components send telemetry data or notifications to the respective AMQP addresses, for example, collectd/telemetry, where STF Smart Gateways listen on those addresses for monitoring data.

To support multiple clouds, and to identify which cloud generated the monitoring data, configure each cloud to send data to a unique address. Prefix a cloud identifier to the second part of the address. The following list shows some example addresses and identifiers:

collectd/cloud1-telemetry

collectd/cloud1-notify

anycast/ceilometer/cloud1-event.sample

collectd/cloud2-telemetry

collectd/cloud2-notify
anycast/ceilometer/cloud2-event.sample

collectd/us-east-1-telemetry

collectd/us-west-3-telemetry

4.5.2. Deploying Smart Gateways

You must deploy a Smart Gateway for each of the data collection types for each cloud: one for collectd metrics, one for collectd events, and one for Ceilometer events. Configure each of the Smart Gateways to listen on the AMQP address that you define for the corresponding cloud.

When you deploy STF for the first time, Smart Gateway manifests are created that define the initial Smart Gateways for a single cloud. When deploying Smart Gateways for multiple cloud support, you deploy multiple Smart Gateways for each of the data collection types that handle the metrics and the events data for each cloud. The initial Smart Gateways act as a template to create additional Smart Gateways, along with any authentication information required to connect to the data stores.
Procedure
1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:
oc project service-telemetry
3. Use the initially deployed Smart Gateways as a template for additional Smart Gateways. List the currently deployed Smart Gateways with the oc get smartgateways command. For example, if you deployed STF with metricsEnabled: true and eventsEnabled: true, the following Smart Gateways are displayed in the output:
$ oc get smartgateways
NAME                                  AGE
stf-default-ceilometer-notification   14d
stf-default-collectd-notification     14d
stf-default-collectd-telemetry        14d

4. Retrieve the manifests for each Smart Gateway and store the contents in a temporary file, which you can modify later and use to create the new set of Smart Gateways:

truncate --size 0 /tmp/cloud1-smartgateways.yaml && for sg in $(oc get smartgateways -oname)
do
  echo "---" >> /tmp/cloud1-smartgateways.yaml
  oc get $sg -oyaml --export >> /tmp/cloud1-smartgateways.yaml
done

5. Modify the Smart Gateway manifests in the /tmp/cloud1-smartgateways.yaml file. Adjust the metadata.name and spec.amqpUrl fields to include the cloud identifier from your schema. For more information, see Section 4.5.1, "Planning AMQP address prefixes". To view example Smart Gateway manifests, see Section 4.5.2.1, "Example manifests".

6. Deploy your new Smart Gateways:
oc apply -f /tmp/cloud1-smartgateways.yaml

7. Verify that each Smart Gateway is running. This can take several minutes, depending on the number of Smart Gateways:
oc get po -l app=smart-gateway
4.5.2.1. Example manifests
IMPORTANT
The content in the following examples might be different to the file content in your deployment. Copy the manifests in your deployment.

Ensure that the name and amqpUrl parameters of each Smart Gateway match the names that you want to use for your clouds. For more information, see Section 4.5.1, "Planning AMQP address prefixes".
NOTE
Your output may have some additional metadata parameters that you can remove from the manifests that you load into OCP.

apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-ceilometer-notification-cloud1 1
spec:
  amqpDataSource: ceilometer
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/anycast/ceilometer/cloud1-event.sample 2
  debug: false
  elasticPass: fkzfhghw
  elasticUrl: https://elasticsearch-es-http.service-telemetry.svc.cluster.local:9200
  elasticUser: elastic
  resetIndex: false
  serviceType: events
  size: 1
  tlsCaCert: /config/certs/ca.crt
  tlsClientCert: /config/certs/tls.crt
  tlsClientKey: /config/certs/tls.key
  tlsServerName: elasticsearch-es-http.service-telemetry.svc.cluster.local
  useBasicAuth: true
  useTls: true
---
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-collectd-notification-cloud1 3
spec:
  amqpDataSource: collectd
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/cloud1-notify 4
  debug: false
  elasticPass: fkzfhghw
  elasticUrl: https://elasticsearch-es-http.service-telemetry.svc.cluster.local:9200
  elasticUser: elastic
  resetIndex: false
  serviceType: events
  size: 1
  tlsCaCert: /config/certs/ca.crt
  tlsClientCert: /config/certs/tls.crt
  tlsClientKey: /config/certs/tls.key
  tlsServerName: elasticsearch-es-http.service-telemetry.svc.cluster.local
  useBasicAuth: true
  useTls: true
---
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-collectd-telemetry-cloud1 5
spec:
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/cloud1-telemetry 6
  debug: false
  prefetch: 15000
  serviceType: metrics
  size: 1
  useTimestamp: true
<1> Name for Ceilometer notifications for cloud1.
<2> AMQP address for Ceilometer notifications for cloud1.
<3> Name for collectd notifications for cloud1.
<4> AMQP address for collectd notifications for cloud1.
<5> Name for collectd telemetry for cloud1.
<6> AMQP address for collectd telemetry for cloud1.
4.5.3. Creating the OpenStack environment file
To label traffic according to the cloud of origin, you must create a configuration with cloud-specific instance names. Create an stf-connectors.yaml file and adjust the values of CeilometerQdrEventsConfig and CollectdAmqpInstances to match the AMQP address prefix scheme. For more information, see Section 4.5.1, "Planning AMQP address prefixes".
WARNING

Remove the enable-stf.yaml and ceilometer-write-qdr.yaml environment file references from your overcloud deployment. This configuration is redundant and results in duplicate information being sent from each cloud node.
Procedure
1. Create the stf-connectors.yaml file and modify it to match the AMQP address that you want for this cloud deployment:
resource_registry:
  OS::TripleO::Services::Collectd: /usr/share/openstack-tripleo-heat-templates/deployment/metrics/collectd-container-puppet.yaml
  OS::TripleO::Services::MetricsQdr: /usr/share/openstack-tripleo-heat-templates/deployment/metrics/qdr-container-puppet.yaml
  OS::TripleO::Services::CeilometerAgentCentral: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-central-container-puppet.yaml
  OS::TripleO::Services::CeilometerAgentNotification: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-notification-container-puppet.yaml
  OS::TripleO::Services::CeilometerAgentIpmi: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-ipmi-container-puppet.yaml
  OS::TripleO::Services::ComputeCeilometerAgent: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-compute-container-puppet.yaml
  OS::TripleO::Services::Redis: /usr/share/openstack-tripleo-heat-templates/deployment/database/redis-pacemaker-puppet.yaml

parameter_defaults:
  EnableSTF: true

  EventPipelinePublishers: []
  CeilometerEnablePanko: false
  CeilometerQdrPublishEvents: true
  CeilometerQdrEventsConfig:
    driver: amqp
    topic: cloud1-event <1>

  CollectdConnectionType: amqp1
  CollectdAmqpInterval: 5
  CollectdDefaultPollingInterval: 5

  CollectdAmqpInstances:
    cloud1-notify: <2>
      notify: true
      format: JSON
      presettle: false
    cloud1-telemetry: <3>
      format: JSON
      presettle: true

  MetricsQdrAddresses:
  - prefix: collectd
    distribution: multicast
  - prefix: anycast/ceilometer
    distribution: multicast

  MetricsQdrSSLProfiles:
  - name: sslProfile

  MetricsQdrConnectors:
  - host: stf-default-interconnect-5671-service-telemetry.apps.infra.watch <4>
    port: 443
    role: edge
    verifyHostname: false
    sslProfile: sslProfile
<1> Define the topic for Ceilometer events. This value is the address format of anycast/ceilometer/cloud1-event.sample.
<2> Define the topic for collectd events. This value is the format of collectd/cloud1-notify.
<3> Define the topic for collectd metrics. This value is the format of collectd/cloud1-telemetry.
<4> Adjust the MetricsQdrConnectors host to the address of the STF route.
2. Ensure that the naming convention in the stf-connectors.yaml file aligns with the spec.amqpUrl field in the Smart Gateway configuration. For example, configure the CeilometerQdrEventsConfig.topic field to a value of cloud1-event (see the sketch after this procedure).
3. Save the file in a directory for custom environment files, for example, /home/stack/custom_templates/.
4. Source the authentication file:

[stack@undercloud-0 ~]$ source stackrc

(undercloud) [stack@undercloud-0 ~]$
5. Include the stf-connectors.yaml file in the overcloud deployment command, along with any other environment files relevant to your environment:

(undercloud) [stack@undercloud-0 ~]$ openstack overcloud deploy --templates /usr/share/openstack-tripleo-heat-templates \
  -e /home/stack/custom_templates/stf-connectors.yaml \
  ...
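The alignment described in step 2 pairs the Smart Gateway address with the environment file topic. A minimal side-by-side excerpt, using only values from this guide:

# From the cloud1 Ceilometer events Smart Gateway manifest (Section 4.5.2.1):
amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/anycast/ceilometer/cloud1-event.sample

# Matching topic in stf-connectors.yaml:
CeilometerQdrEventsConfig:
  driver: amqp
  topic: cloud1-event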
Additional resources

For information about validating the deployment, see Section 3.3.3, "Validating client-side installation".
4.5.4. Querying metrics data from multiple clouds
Data stored in Prometheus has a service label attached according to the Smart Gateway it was scraped from. You can use this label to query data from a specific cloud.
To query data from a specific cloud, use a Prometheus promql query that matches the associated service label, for example: collectd_uptime{service="stf-default-collectd-telemetry-cloud1-smartgateway"}.
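The first query below is the one from this section; the second is an illustrative addition that counts samples per cloud by grouping on the same label:

# Metrics from cloud1 only:
collectd_uptime{service="stf-default-collectd-telemetry-cloud1-smartgateway"}

# Illustrative: sample counts per Smart Gateway (one series per cloud):
count(collectd_uptime) by (service)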
4.6. EPHEMERAL STORAGE
Use ephemeral storage to run Service Telemetry Framework (STF) without persistently storing data in your Red Hat OpenShift Container Platform (OCP) cluster. Ephemeral storage is not recommended in a production environment due to the volatility of the data in the platform, even when it operates correctly and as designed. For example, restarting a pod or rescheduling the workload to another node results in the loss of any local data written since the pod started.
If you enable ephemeral storage in STF, the Service Telemetry Operator does not add the relevant storage sections to the manifests of the data storage components.
4.6.1. Configuring ephemeral storage
To configure STF for ephemeral storage, add storageEphemeralEnabled: true to the ServiceTelemetry object in OCP. You can add storageEphemeralEnabled: true at installation time or, if you already deployed STF, complete the following steps:
Procedure
1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

oc project service-telemetry

3. Edit the ServiceTelemetry object:

$ oc edit ServiceTelemetry stf-default

4. Add the storageEphemeralEnabled: true parameter to the spec section:

spec:
  eventsEnabled: true
  metricsEnabled: true
  storageEphemeralEnabled: true

5. Save your changes and close the object.
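To set the parameter at installation time instead, include it in the initial ServiceTelemetry object. The following is a sketch based on the creation example in Section 2.3.10; only storageEphemeralEnabled is added:

oc apply -f - <<EOF
apiVersion: infra.watch/v1alpha1
kind: ServiceTelemetry
metadata:
  name: stf-default
  namespace: service-telemetry
spec:
  eventsEnabled: true
  metricsEnabled: true
  storageEphemeralEnabled: true
EOF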
APPENDIX A. COLLECTD PLUG-INS

This section contains a complete list of collectd plug-ins and configurations for Red Hat OpenStack Platform 16.0.
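The parameters in this list are hiera keys that the collectd Puppet module reads. As an illustration only (this exact usage is an assumption, not taken from this guide), such a key can be supplied through an ExtraConfig section of an overcloud environment file, similar to the hiera overrides in Section 3.2:

parameter_defaults:
  ExtraConfig:
    # Hypothetical override: poll CPU metrics every 30 seconds
    collectd::plugin::cpu::interval: 30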
collectd-aggregation
collectd::plugin::aggregation::aggregators
collectd::plugin::aggregation::interval
collectd-amqp1
collectd-apache
collectd::plugin::apache::instances (ex: localhost ⇒ url ⇒ http://localhost/mod_status?auto)
collectd::plugin::apache::interval
collectd-apcups
collectd-battery
collectd::plugin::battery::values_percentage
collectd::plugin::battery::report_degraded
collectd::plugin::battery::query_state_fs
collectd::plugin::battery::interval
collectd-ceph
collectd::plugin::ceph::daemons
collectd::plugin::ceph::longrunavglatency
collectd::plugin::ceph::convertspecialmetrictypes
collectd-cgroups
collectd::plugin::cgroups::ignore_selected
collectd::plugin::cgroups::interval
collectd-conntrack
None
collectd-contextswitch
collectd::plugin::contextswitch::interval
collectd-cpu
collectd::plugin::cpu::reportbystate
collectd::plugin::cpu::reportbycpu
collectd::plugin::cpu::valuespercentage
collectd::plugin::cpu::reportnumcpu
collectd::plugin::cpu::reportgueststate
collectd::plugin::cpu::subtractgueststate
collectd::plugin::cpu::interval
collectd-cpufreq
None
collectd-cpusleep
collectd-csv
collectd::plugin::csv::datadir
collectd::plugin::csv::storerates
collectd::plugin::csv::interval
collectd-df
collectd::plugin::df::devices
collectd::plugin::df::fstypes
collectd::plugin::df::ignoreselected
collectd::plugin::df::mountpoints
collectd::plugin::df::reportbydevice
collectd::plugin::df::reportinodes
collectd::plugin::df::reportreserved
collectd::plugin::df::valuesabsolute
collectd::plugin::df::valuespercentage
collectd::plugin::df::interval
collectd-disk
collectd::plugin::disk::disks
collectd::plugin::disk::ignoreselected
collectd::plugin::disk::udevnameattr
collectd::plugin::disk::interval
collectd-entropy
collectd::plugin::entropy::interval
collectd-ethstat
collectd::plugin::ethstat::interfaces
collectd::plugin::ethstat::maps
collectd::plugin::ethstat::mappedonly
collectd::plugin::ethstat::interval
collectd-exec
collectd::plugin::exec::commands
collectd::plugin::exec::commands_defaults
collectd::plugin::exec::globals
collectd::plugin::exec::interval
collectd-fhcount
collectd::plugin::fhcount::valuesabsolute
collectd::plugin::fhcount::valuespercentage
collectd::plugin::fhcount::interval
collectd-filecount
collectd::plugin::filecount::directories
collectd::plugin::filecount::interval
collectd-fscache
None
collectd-hddtemp
collectd::plugin::hddtemp::host
collectd::plugin::hddtemp::port
collectd::plugin::hddtemp::interval
collectd-hugepages
collectd::plugin::hugepages::report_per_node_hp
collectd::plugin::hugepages::report_root_hp
collectd::plugin::hugepages::values_pages
collectd::plugin::hugepages::values_bytes
collectd::plugin::hugepages::values_percentage
collectd::plugin::hugepages::interval
collectd-intel_rdt
collectd-interface
collectd::plugin::interface::interfaces
collectd::plugin::interface::ignoreselected
collectd::plugin::interface::reportinactive
collectd::plugin::interface::interval
collectd-ipc
None
collectd-ipmi
collectd::plugin::ipmi::ignore_selected
collectd::plugin::ipmi::notify_sensor_add
collectd::plugin::ipmi::notify_sensor_remove
collectd::plugin::ipmi::notify_sensor_not_present
collectd::plugin::ipmi::sensors
collectd::plugin::ipmi::interval
collectd-irq
collectd::plugin::irq::irqs
collectd::plugin::irq::ignoreselected
collectd::plugin::irq::interval
collectd-load
collectd::plugin::load::report_relative
collectd::plugin::load::interval
collectd-logfile
collectd::plugin::logfile::log_level
collectd::plugin::logfile::log_file
collectd::plugin::logfile::log_timestamp
collectd::plugin::logfile::print_severity
collectd::plugin::logfile::interval
collectd-madwifi
collectd-mbmon
collectd-md
collectd-memcached
collectd::plugin::memcached::instances
collectd::plugin::memcached::interval
collectd-memory
collectd::plugin::memory::valuesabsolute
collectd::plugin::memory::valuespercentage
collectd::plugin::memory::interval
collectd-multimeter
collectd-mysql
collectd::plugin::mysql::interval
collectd-netlink
collectd::plugin::netlink::interfaces
collectd::plugin::netlink::verboseinterfaces
collectd::plugin::netlink::qdiscs
collectd::plugin::netlink::classes
collectd::plugin::netlink::filters
collectd::plugin::netlink::ignoreselected
collectd::plugin::netlink::interval
collectd-network
collectd::plugin::network::timetolive
collectd::plugin::network::maxpacketsize
collectd::plugin::network::forward
collectd::plugin::network::reportstats
collectd::plugin::network::listeners
collectd::plugin::network::servers
collectd::plugin::network::interval
collectd-nfs
collectd::plugin::nfs::interval
collectd-ntpd
collectd::plugin::ntpd::host
collectd::plugin::ntpd::port
collectd::plugin::ntpd::reverselookups
collectd::plugin::ntpd::includeunitid
collectd::plugin::ntpd::interval
collectd-numa
None
collectd-olsrd
collectd-openvpn
collectd::plugin::openvpn::statusfile
collectd::plugin::openvpn::improvednamingschema
collectd::plugin::openvpn::collectcompression
collectd::plugin::openvpn::collectindividualusers
collectd::plugin::openvpn::collectusercount
collectd::plugin::openvpn::interval
collectd-ovs_events
collectd::plugin::ovs_events::address
collectd::plugin::ovs_events::dispatch
collectd::plugin::ovs_events::interfaces
collectd::plugin::ovs_events::send_notification
collectd::plugin::ovs_events::port
collectd::plugin::ovs_events::socket
collectd-ovs_stats
collectd::plugin::ovs_stats::address
collectd::plugin::ovs_stats::bridges
collectd::plugin::ovs_stats::port
collectd::plugin::ovs_stats::socket
collectd-ping
collectd::plugin::ping::hosts
collectd::plugin::ping::timeout
collectd::plugin::ping::ttl
collectd::plugin::ping::source_address
collectd::plugin::ping::device
collectd::plugin::ping::max_missed
collectd::plugin::ping::size
collectd::plugin::ping::interval
collectd-powerdns
collectd::plugin::powerdns::servers
collectd::plugin::powerdns::recursors
collectd::plugin::powerdns::local_socket
collectd::plugin::powerdns::interval
collectd-processes
collectd::plugin::processes::processes
collectd::plugin::processes::process_matches
collectd::plugin::processes::collect_context_switch
collectd::plugin::processes::collect_file_descriptor
collectd::plugin::processes::collect_memory_maps
collectd::plugin::processes::interval
collectd-protocols
collectd::plugin::protocols::ignoreselected
collectd::plugin::protocols::values
collectd-python
collectd-serial
collectd-smart
collectd::plugin::smart::disks
collectd::plugin::smart::ignoreselected
collectd::plugin::smart::interval
collectd-snmp_agent
collectd-statsd
collectd::plugin::statsd::host
collectd::plugin::statsd::port
collectd::plugin::statsd::deletecounters
collectd::plugin::statsd::deletetimers
collectd::plugin::statsd::deletegauges
collectd::plugin::statsd::deletesets
collectd::plugin::statsd::countersum
collectd::plugin::statsd::timerpercentile
collectd::plugin::statsd::timerlower
collectd::plugin::statsd::timerupper
collectd::plugin::statsd::timersum
collectd::plugin::statsd::timercount
collectd::plugin::statsd::interval
collectd-swap
collectd::plugin::swap::reportbydevice
collectd::plugin::swap::reportbytes
collectd::plugin::swap::valuesabsolute
collectd::plugin::swap::valuespercentage
collectd::plugin::swap::reportio
collectd::plugin::swap::interval
collectd-syslog
collectd::plugin::syslog::log_level
collectd::plugin::syslog::notify_level
collectd::plugin::syslog::interval
collectd-table
collectd::plugin::table::tables
collectd::plugin::table::interval
collectd-tail
collectd::plugin::tail::files
collectd::plugin::tail::interval
collectd-tail_csv
collectd::plugin::tail_csv::metrics
collectd::plugin::tail_csv::files
collectd-tcpconns
collectd::plugin::tcpconns::localports
collectd::plugin::tcpconns::remoteports
collectd::plugin::tcpconns::listening
collectd::plugin::tcpconns::allportssummary
collectd::plugin::tcpconns::interval
collectd-ted
collectd-thermal
collectd::plugin::thermal::devices
collectd::plugin::thermal::ignoreselected
collectd::plugin::thermal::interval
collectd-threshold
collectd::plugin::threshold::types
collectd::plugin::threshold::plugins
collectd::plugin::threshold::hosts
collectd::plugin::threshold::interval
collectd-turbostat
collectd::plugin::turbostat::core_c_states
collectd::plugin::turbostat::package_c_states
collectd::plugin::turbostat::system_management_interrupt
collectd::plugin::turbostat::digital_temperature_sensor
collectd::plugin::turbostat::tcc_activation_temp
collectd::plugin::turbostat::running_average_power_limit
collectd::plugin::turbostat::logical_core_names
collectd-unixsock
collectd-uptime
collectd::plugin::uptime::interval
collectd-users
collectd::plugin::users::interval
collectd-uuid
collectd::plugin::uuid::uuid_file
collectd::plugin::uuid::interval
collectd-virt
collectd::plugin::virt::connection
collectd::plugin::virt::refresh_interval
collectd::plugin::virt::domain
collectd::plugin::virt::block_device
collectd::plugin::virt::interface_device
collectd::plugin::virt::ignore_selected
collectd::plugin::virt::hostname_format
collectd::plugin::virt::interface_format
collectd::plugin::virt::extra_stats
collectd::plugin::virt::interval
collectd-vmem
collectd::plugin::vmem::verbose
collectd::plugin::vmem::interval
collectd-vserver
collectd-wireless
collectd-write_graphite
collectd::plugin::write_graphite::carbons
collectd::plugin::write_graphite::carbon_defaults
collectd::plugin::write_graphite::globals
collectd-write_kafka
collectd::plugin::write_kafka::kafka_host
collectd::plugin::write_kafka::kafka_port
collectd::plugin::write_kafka::kafka_hosts
collectd::plugin::write_kafka::topics
collectd-write_log
collectd::plugin::write_log::format
collectd-zfs_arc
None
CHAPTER 1. INTRODUCTION TO SERVICE TELEMETRY FRAMEWORK
Service Telemetry Framework (STF) provides automated collection of measurements and data from remote clients - Red Hat OpenStack Platform or third-party nodes - and transmission of that information to a centralized, receiving Red Hat OpenShift Container Platform (OCP) deployment for storage, retrieval, and monitoring. The data can be either of two types:
Metric
a numeric measurement of an application or system
Event
irregular and discrete occurrences that happen in a system
The collection components that are required on the clients are lightweight. The multicast message bus that is shared by all clients and the deployment provides fast and reliable data transport. Other modular components for receiving and storing data are deployed in containers on OCP.
STF provides access to monitoring functions such as alert generation, visualization through dashboards, and single source of truth telemetry analysis to support orchestration.
1.1. SERVICE TELEMETRY FRAMEWORK ARCHITECTURE
Service Telemetry Framework (STF) uses the components described in Table 1.1, "STF components".
Table 1.1. STF components
Client | Component | Server (OCP)
yes | An AMQP 1.x compatible messaging bus to shuttle the metrics to STF for storage in Prometheus | yes
no | Smart Gateway to pick metrics and events from the AMQP 1.x bus and to deliver events to ElasticSearch or to provide metrics to Prometheus | yes
no | Prometheus as time-series data storage | yes
no | ElasticSearch as events data storage | yes
yes | collectd to collect infrastructure metrics and events | no
yes | Ceilometer to collect Red Hat OpenStack Platform metrics and events | no
Figure 1.1. Service Telemetry Framework architecture overview
NOTE

The Service Telemetry Framework data collection components, collectd and Ceilometer, and the transport components, AMQ Interconnect and Smart Gateway, are fully supported. The data storage components, Prometheus and ElasticSearch, including the Operator artifacts, and the visualization component Grafana are community-supported and are not officially supported.
For metrics, on the client side, collectd collects high-resolution metrics. collectd delivers the data to Prometheus by using the AMQP1 plugin, which places the data onto the message bus. On the server side, a Golang application called the Smart Gateway takes the data stream from the bus and exposes it as a local scrape endpoint for Prometheus.
If you plan to collect and store events, collectd or Ceilometer delivers event data to the server side by using the AMQP1 plugin, which places the data onto the message bus. Another Smart Gateway writes the data to the ElasticSearch datastore.
Server-side STF monitoring infrastructure consists of the following layers:
Service Telemetry Framework 1.x (STF)
Red Hat OpenShift Container Platform (OCP)
CHAPTER 1 INTRODUCTION TO SERVICE TELEMETRY FRAMEWORK
5
Infrastructure platform
Figure 1.2. Server-side STF monitoring infrastructure
For more information about how to deploy Red Hat OpenShift Container Platform, see the OCP product documentation. You can install OCP on cloud platforms or on bare metal. For more information about STF performance and scaling, see https://access.redhat.com/articles/4907241.
NOTE
Do not install OCP on the same infrastructure that you want to monitor
1.2. INSTALLATION SIZE
The size of your Red Hat OpenShift Container Platform installation depends on the following factors:
The number of nodes you want to monitor
The number of metrics you want to collect
The resolution of metrics
The length of time that you want to store the data
Installation of Service Telemetry Framework (STF) depends on the existing Red Hat OpenShift Container Platform environment. Ensure that you install monitoring for Red Hat OpenStack Platform on a platform separate from your Red Hat OpenStack Platform environment.
You can install Red Hat OpenShift Container Platform (OCP) on bare metal or other supported cloud platforms. For more information about installing OCP, see the OpenShift Container Platform 4.3 Documentation.
The size of your OCP environment depends on the infrastructure you select. For more information about minimum resource requirements when installing OCP on bare metal, see Minimum resource requirements in the Installing a cluster on bare metal guide. For installation requirements of the various public and private cloud platforms which you can install, see the corresponding installation documentation for your cloud platform of choice.
CHAPTER 2. INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK
Before you install Service Telemetry Framework (STF), ensure that Red Hat OpenShift Container Platform (OCP) version 4.x is running and that you understand the core components of the framework. As part of the OCP installation planning process, ensure that the administrator provides persistent storage and enough resources to run the STF components on top of the OCP environment.
WARNING

Red Hat OpenShift Container Platform version 4.3 or later is currently required for a successful installation of STF.
2.1. THE CORE COMPONENTS OF STF
The following STF core components are managed by Operators:
Prometheus and AlertManager
ElasticSearch
Smart Gateway
AMQ Interconnect
Each component has a corresponding Operator that you can use to load the various application components and objects.
Additional resources
For more information about Operators see the Understanding Operators guide
2.2. PREPARING YOUR OCP ENVIRONMENT FOR STF
As you prepare your OCP environment for STF, you must plan for persistent storage, adequate resources, and event storage.
Ensure that persistent storage is available in your Red Hat OpenShift Container Platform cluster to permit a production grade deployment. For more information, see Section 2.2.1, "Persistent volumes".
Ensure that enough resources are available to run the Operators and the application containers. For more information, see Section 2.2.2, "Resource allocation".
To install ElasticSearch, you must use a community catalog source. If you do not want to use a community catalog or if you do not want to store events, see Section 2.3, "Deploying STF to the OCP environment".
STF uses ElasticSearch to store events, which requires a larger than normal vm.max_map_count.
The vm.max_map_count value is set by default in Red Hat OpenShift Container Platform. For more information about how to edit the value of vm.max_map_count, see Section 2.2.3, "Node tuning operator".
2.2.1. Persistent volumes
STF uses persistent storage in OCP to instantiate the volumes dynamically so that Prometheus and ElasticSearch can store metrics and events.
Additional resources
For more information about configuring persistent storage for OCP, see Understanding persistent storage.
2.2.1.1. Using ephemeral storage
WARNING

You can use ephemeral storage with STF. However, if you use ephemeral storage, you might experience data loss if a pod is restarted, updated, or rescheduled onto another node. Use ephemeral storage only for development or testing, and not production environments.
Procedure
To enable ephemeral storage for STF, set storageEphemeralEnabled: true in your ServiceTelemetry manifest.
Additional resources
For more information about enabling ephemeral storage for STF, see Section 4.6.1, "Configuring ephemeral storage".
2.2.2. Resource allocation
To enable the scheduling of pods within the OCP infrastructure, you need resources for the components that are running. If you do not allocate enough resources, pods remain in a Pending state because they cannot be scheduled.
The amount of resources that you require to run STF depends on your environment and the number of nodes and clouds that you want to monitor.
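For example, to check whether any pods cannot be scheduled, you can filter on the pod phase; this uses standard oc field selectors and is not specific to this guide:

$ oc get pods --field-selector=status.phase=Pending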
Additional resources
For recommendations about sizing for metrics collection, see https://access.redhat.com/articles/4907241.
For information about sizing requirements for ElasticSearch, see https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-managing-compute-resources.html.
2.2.3. Node tuning operator
STF uses ElasticSearch to store events, which requires a larger than normal vm.max_map_count. The vm.max_map_count value is set by default in Red Hat OpenShift Container Platform.
If you want to edit the value of vm.max_map_count, you cannot apply node tuning manually using the sysctl command, because Red Hat OpenShift Container Platform manages nodes directly. To configure values and apply them to the infrastructure, you must use the node tuning operator. For more information, see Using the Node Tuning Operator.
In an OCP deployment, the default node tuning operator specification provides the required profiles for ElasticSearch workloads or pods scheduled on nodes. To view the default cluster node tuning specification, run the following command:

oc get Tuned/default -o yaml -n openshift-cluster-node-tuning-operator
The output of the default specification is documented at Default profiles set on a cluster. The assignment of profiles is managed in the recommend section, where profiles are applied to a node when certain conditions are met. When scheduling ElasticSearch to a node in STF, one of the following profiles is applied:
openshift-control-plane-es
openshift-node-es
When scheduling an ElasticSearch pod, there must be a label present that matches tuned.openshift.io/elasticsearch. If the label is present, one of the two profiles is assigned to the pod. No action is required by the administrator if you use the recommended Operator for ElasticSearch. If you use a custom-deployed ElasticSearch with STF, ensure that you add the tuned.openshift.io/elasticsearch label to all scheduled pods.
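For a custom-deployed ElasticSearch, the label belongs in the pod template metadata. A minimal sketch (the surrounding pod spec is hypothetical; only the label key is taken from this section):

metadata:
  labels:
    # Required so that the node tuning operator applies an ElasticSearch profile
    tuned.openshift.io/elasticsearch: ""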
Additional resources
For more information about virtual memory usage by ElasticSearch, see https://www.elastic.co/guide/en/elasticsearch/reference/current/vm-max-map-count.html.
For more information about how the profiles are applied to nodes see Custom tuning specification
2.3. DEPLOYING STF TO THE OCP ENVIRONMENT
You can deploy STF to the OCP environment in one of two ways:
Deploy STF and store events with ElasticSearch. For more information, see Section 2.3.1, "Deploying STF to the OCP environment with ElasticSearch".
Deploy STF without ElasticSearch and disable events support. For more information, see Section 2.3.2, "Deploying STF to the OCP environment without ElasticSearch".
2.3.1. Deploying STF to the OCP environment with ElasticSearch
Complete the following tasks:
1. Section 2.3.3, "Creating a namespace"

2. Section 2.3.4, "Creating an OperatorGroup"

3. Section 2.3.5, "Enabling the OperatorHub.io Community Catalog Source"
4. Section 2.3.6, "Enabling Red Hat STF Operator Source"

5. Section 2.3.7, "Subscribing to the AMQ Certificate Manager Operator"

6. Section 2.3.8, "Subscribing to the Elastic Cloud on Kubernetes Operator"

7. Section 2.3.9, "Subscribing to the Service Telemetry Operator"

8. Section 2.3.10, "Creating a ServiceTelemetry object in OCP"
2.3.2. Deploying STF to the OCP environment without ElasticSearch
Complete the following tasks:
1. Section 2.3.3, "Creating a namespace"

2. Section 2.3.4, "Creating an OperatorGroup"

3. Section 2.3.6, "Enabling Red Hat STF Operator Source"

4. Section 2.3.7, "Subscribing to the AMQ Certificate Manager Operator"

5. Section 2.3.9, "Subscribing to the Service Telemetry Operator"

6. Section 2.3.10, "Creating a ServiceTelemetry object in OCP"
2.3.3. Creating a namespace
Create a namespace to hold the STF components. The service-telemetry namespace is used throughout the documentation.
Procedure
Enter the following command:

oc new-project service-telemetry
2.3.4. Creating an OperatorGroup
Create an OperatorGroup in the namespace so that you can schedule the Operator pods.
Procedure
Enter the following command:

oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: service-telemetry-operator-group
  namespace: service-telemetry
spec:
  targetNamespaces:
  - service-telemetry
EOF
Additional resources
For more information, see OperatorGroups.
2.3.5. Enabling the OperatorHub.io Community Catalog Source
Before you install ElasticSearch, you must have access to the resources on the OperatorHub.io Community Catalog Source.
Procedure
Enter the following command:

oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: operatorhubio-operators
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/operator-framework/upstream-community-operators:latest
  displayName: OperatorHub.io Operators
  publisher: OperatorHub.io
EOF
2.3.6. Enabling Red Hat STF Operator Source
Before you deploy STF on Red Hat OpenShift Container Platform, you must enable the operator source.
Procedure
1. Install an OperatorSource that contains the Service Telemetry Operator and the Smart Gateway Operator:

oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1
kind: OperatorSource
metadata:
  labels:
    opsrc-provider: redhat-operators-stf
  name: redhat-operators-stf
  namespace: openshift-marketplace
spec:
  authorizationToken: {}
  displayName: Red Hat STF Operators
  endpoint: https://quay.io/cnr
  publisher: Red Hat
  registryNamespace: redhat-operators-stf
  type: appregistry
EOF
2. To validate the creation of your OperatorSource, use the oc get operatorsources command. A successful import results in the MESSAGE field returning a result of The object has been successfully reconciled:

$ oc get -nopenshift-marketplace operatorsource redhat-operators-stf

NAME                   TYPE          ENDPOINT              REGISTRY               DISPLAYNAME             PUBLISHER   STATUS      MESSAGE
redhat-operators-stf   appregistry   https://quay.io/cnr   redhat-operators-stf   Red Hat STF Operators   Red Hat     Succeeded   The object has been successfully reconciled
3. To validate that the Operators are available from the catalog, use the oc get packagemanifest command:

$ oc get packagemanifests | grep "Red Hat STF"

smartgateway-operator       Red Hat STF Operators   2m50s
servicetelemetry-operator   Red Hat STF Operators   2m50s
2.3.7. Subscribing to the AMQ Certificate Manager Operator
You must subscribe to the AMQ Certificate Manager Operator before you deploy the other STF components, because the AMQ Certificate Manager Operator runs globally-scoped and is not compatible with the dependency management of Operator Lifecycle Manager when used with other namespace-scoped operators.
Procedure
1. Subscribe to the AMQ Certificate Manager Operator, create the subscription, and validate the AMQ7 Certificate Manager:

oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: amq7-cert-manager
  namespace: openshift-operators
spec:
  channel: alpha
  installPlanApproval: Automatic
  name: amq7-cert-manager
  source: redhat-operators
  sourceNamespace: openshift-marketplace
EOF
NOTE
The AMQ Certificate Manager is installed globally for all namespaces, so the namespace value provided is openshift-operators. You might not see your amq7-cert-manager.v1.0.0 ClusterServiceVersion in the service-telemetry namespace for a few minutes until the processing executes against the namespace.
2. To validate your ClusterServiceVersion, use the oc get csv command. Ensure that amq7-cert-manager.v1.0.0 has a phase Succeeded:

$ oc get --namespace openshift-operators csv

NAME                       DISPLAY                                         VERSION   REPLACES   PHASE
amq7-cert-manager.v1.0.0   Red Hat Integration - AMQ Certificate Manager   1.0.0                Succeeded
2.3.8. Subscribing to the Elastic Cloud on Kubernetes Operator
Before you install the Service Telemetry Operator, and if you plan to store events in ElasticSearch, you must enable the Elastic Cloud on Kubernetes Operator.
Procedure
1. Apply the following manifest to your OCP environment to enable the Elastic Cloud on Kubernetes Operator:

oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: elastic-cloud-eck
  namespace: service-telemetry
spec:
  channel: stable
  installPlanApproval: Automatic
  name: elastic-cloud-eck
  source: operatorhubio-operators
  sourceNamespace: openshift-marketplace
EOF
2. To verify that the ClusterServiceVersion for Elastic Cloud on Kubernetes succeeded, enter the oc get csv command:

$ oc get csv

NAME                       DISPLAY                       VERSION   REPLACES                   PHASE
elastic-cloud-eck.v1.1.0   Elastic Cloud on Kubernetes   1.1.0     elastic-cloud-eck.v1.0.1   Succeeded
2.3.9. Subscribing to the Service Telemetry Operator
To instantiate an STF instance, create the ServiceTelemetry object to allow the Service Telemetry Operator to create the environment.
Procedure
1. To create the Service Telemetry Operator subscription, enter the oc apply -f command:

oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: servicetelemetry-operator
  namespace: service-telemetry
spec:
  channel: stable
  installPlanApproval: Automatic
  name: servicetelemetry-operator
  source: redhat-operators-stf
  sourceNamespace: openshift-marketplace
EOF
2. To validate the Service Telemetry Operator and the dependent operators, enter the following command:

$ oc get csv --namespace service-telemetry
NAME                                DISPLAY                                         VERSION   REPLACES                            PHASE
amq7-cert-manager.v1.0.0            Red Hat Integration - AMQ Certificate Manager   1.0.0                                         Succeeded
amq7-interconnect-operator.v1.2.0   Red Hat Integration - AMQ Interconnect          1.2.0                                         Succeeded
elastic-cloud-eck.v1.1.0            Elastic Cloud on Kubernetes                     1.1.0     elastic-cloud-eck.v1.0.1            Succeeded
prometheusoperator.0.37.0           Prometheus Operator                             0.37.0    prometheusoperator.0.32.0           Succeeded
service-telemetry-operator.v1.0.2   Service Telemetry Operator                      1.0.2     service-telemetry-operator.v1.0.1   Succeeded
smart-gateway-operator.v1.0.1       Smart Gateway Operator                          1.0.1     smart-gateway-operator.v1.0.0       Succeeded
2.3.10. Creating a ServiceTelemetry object in OCP
To deploy the Service Telemetry Framework, you must create an instance of ServiceTelemetry in OCP. By default, eventsEnabled is set to false. If you do not want to store events in ElasticSearch, ensure that eventsEnabled is set to false. For more information, see Section 2.3.2, "Deploying STF to the OCP environment without ElasticSearch".
The following core parameters are available for a ServiceTelemetry manifest:
Table 2.1. Core parameters for a ServiceTelemetry manifest
Parameter | Description | Default Value
eventsEnabled | Enable events support in STF. Requires prerequisite steps to ensure ElasticSearch can be started. For more information, see Section 2.3.8, "Subscribing to the Elastic Cloud on Kubernetes Operator". | false
metricsEnabled | Enable metrics support in STF. | true
highAvailabilityEnabled | Enable high availability in STF. For more information, see Section 4.3, "High availability". | false
storageEphemeralEnabled | Enable ephemeral storage support in STF. For more information, see Section 4.6, "Ephemeral storage". | false
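For reference, a ServiceTelemetry manifest that states all four core parameters explicitly looks like the following sketch; the values mirror the defaults except for eventsEnabled, matching the procedure below:

apiVersion: infra.watch/v1alpha1
kind: ServiceTelemetry
metadata:
  name: stf-default
  namespace: service-telemetry
spec:
  eventsEnabled: true
  metricsEnabled: true
  highAvailabilityEnabled: false
  storageEphemeralEnabled: false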
Procedure
1. To store events in ElasticSearch, set eventsEnabled to true during deployment:

oc apply -f - <<EOF
apiVersion: infra.watch/v1alpha1
kind: ServiceTelemetry
metadata:
  name: stf-default
  namespace: service-telemetry
spec:
  eventsEnabled: true
  metricsEnabled: true
EOF
2. To view the STF deployment logs in the Service Telemetry Operator, use the oc logs command:

oc logs $(oc get pod --selector=name=service-telemetry-operator -oname) -c ansible
PLAY RECAP ***
localhost : ok=37 changed=0 unreachable=0 failed=0 skipped=1 rescued=0 ignored=0
3. View the pods and the status of each pod to determine that all workloads are operating nominally:
NOTE
If you set eventsEnabled: true, the notification Smart Gateways will Error and CrashLoopBackOff for a period of time before ElasticSearch starts.
$ oc get pods

NAME                                                              READY   STATUS    RESTARTS   AGE
alertmanager-stf-default-0                                        2/2     Running   0          26m
elastic-operator-645dc8b8ff-jwnzt                                 1/1     Running   0          88m
elasticsearch-es-default-0                                        1/1     Running   0          26m
interconnect-operator-6fd49d9fb9-4bl92                            1/1     Running   0          46m
prometheus-operator-bf7d97fb9-kwnlx                               1/1     Running   0          46m
prometheus-stf-default-0                                          3/3     Running   0          26m
service-telemetry-operator-54f4c99d9b-k7ll6                       2/2     Running   0          46m
smart-gateway-operator-7ff58bcf94-66rvx                           2/2     Running   0          46m
stf-default-ceilometer-notification-smartgateway-6675df547q4lbj   1/1     Running   0          26m
stf-default-collectd-notification-smartgateway-698c87fbb7-xj528   1/1     Running   0          26m
stf-default-collectd-telemetry-smartgateway-79c967c8f7-9hsqn      1/1     Running   0          26m
stf-default-interconnect-7458fd4d69-nqbfs                         1/1     Running   0          26m
2.4. REMOVING STF FROM THE OCP ENVIRONMENT
Remove STF from an OCP environment if you no longer require the STF functionality.
Complete the following tasks:
1. Section 2.4.1, "Deleting the namespace"

2. Section 2.4.2, "Removing the OperatorSource"
2.4.1. Deleting the namespace
To remove the operational resources for STF from OCP, delete the namespace.
Procedure
1. Run the oc delete command:

oc delete project service-telemetry
2. Verify that the resources have been deleted from the namespace:

$ oc get all
No resources found.
2.4.2. Removing the OperatorSource
If you do not expect to install Service Telemetry Framework again, delete the OperatorSource. When you remove the OperatorSource, PackageManifests related to STF are removed from the Operator Lifecycle Manager catalog.
Procedure
1. Delete the OperatorSource:

$ oc delete --namespace=openshift-marketplace operatorsource redhat-operators-stf
operatorsource.operators.coreos.com "redhat-operators-stf" deleted
2. Verify that the STF PackageManifests are removed from the platform. If successful, the following command returns no result:

$ oc get packagemanifests | grep "Red Hat STF"
3. If you enabled the OperatorHub.io Community Catalog Source during the installation process, and you no longer need this catalog source, delete it:

$ oc delete --namespace=openshift-marketplace catalogsource operatorhubio-operators
catalogsource.operators.coreos.com "operatorhubio-operators" deleted
Additional resources
For more information about the OperatorHub.io Community Catalog Source, see Section 2.3, "Deploying STF to the OCP environment".
CHAPTER 3. COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION
3.1. CONNECTING RED HAT OPENSTACK PLATFORM TO SERVICE TELEMETRY FRAMEWORK
To collect metrics, events, or both, and to send them to the Service Telemetry Framework (STF) storage domain, you must configure the Red Hat OpenStack Platform overcloud to enable data collection and transport.
To deploy data collection and transport to STF on Red Hat OpenStack Platform cloud nodes that employ routed L3 domains, such as distributed compute node (DCN) or spine-leaf, see Section 3.2, "Deploying to non-standard network topologies".
3.2. DEPLOYING TO NON-STANDARD NETWORK TOPOLOGIES
If your nodes are on a separate network from the default InternalApi network, you must make configuration adjustments so that AMQ Interconnect can transport data to the Service Telemetry Framework (STF) server instance. This scenario is typical in a spine-leaf or a DCN topology. For more information about DCN configuration, see the Spine Leaf Networking guide.
If you use STF with Red Hat OpenStack Platform 16.0 and plan to monitor your Ceph, Block, or Object storage nodes, you must make configuration changes that are similar to the configuration changes that you make to the spine-leaf and DCN network configuration. To monitor Ceph nodes, use the CephStorageExtraConfig parameter to define which network interface to load into the AMQ Interconnect and collectd configuration files.
Similarly, you must specify BlockStorageExtraConfig and ObjectStorageExtraConfig parameters if your environment uses Block and Object storage roles.
The deployment of a spine-leaf topology involves creating roles and networks, then assigning those networks to the available roles. When you configure data collection and transport for STF for a Red Hat OpenStack Platform deployment, the default network for roles is InternalApi. For Ceph Block and Object storage roles, the default network is Storage. Because a spine-leaf configuration can result in different networks being assigned to different Leaf groupings, and those names are typically unique, additional configuration is required in the parameter_defaults section of the Red Hat OpenStack Platform environment files.
Procedure
1. Document which networks are available for each of the Leaf roles. For examples of network name definitions, see Creating a network data file in the Spine Leaf Networking guide. For more information about the creation of the Leaf groupings (roles) and assignment of the networks to those groupings, see Creating a roles data file in the Spine Leaf Networking guide.
CephStorageExtraConfig:
  tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('storage')}"
  tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('storage')}"
  tripleo::profile::base::ceilometer::agent::notification::notifier_host_addr: "%{hiera('storage')}"
2. Add the following configuration example to the ExtraConfig section for each of the leaf roles. In this example, internal_api_subnet is the value defined in the name_lower parameter of your network definition (with _subnet appended to the name for Leaf 0), and is the network to which the ComputeLeaf0 leaf role is connected. In this case, the network identification of 0 corresponds to the Compute role for leaf 0, and represents a value that is different from the default internal API network name.

For the ComputeLeaf0 leaf role, specify extra configuration to perform a hiera lookup to determine which network interface for a particular network to assign to the collectd AMQP host parameter. Perform the same configuration for the AMQ Interconnect listener address parameter:

ComputeLeaf0ExtraConfig:
  tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('internal_api_subnet')}"
  tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('internal_api_subnet')}"
Additional leaf roles typically replace _subnet with _leafN, where N represents a unique identifier for the leaf:

ComputeLeaf1ExtraConfig:
  tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('internal_api_leaf1')}"
  tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('internal_api_leaf1')}"
This example configuration is on a CephStorage leaf role:

CephStorageLeaf0ExtraConfig:
  tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('storage_subnet')}"
  tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('storage_subnet')}"
3.3. CONFIGURING RED HAT OPENSTACK PLATFORM OVERCLOUD FOR SERVICE TELEMETRY FRAMEWORK
To configure the Red Hat OpenStack Platform overcloud, you must configure the data collection applications and the data transport to STF, and deploy the overcloud.
To configure the Red Hat OpenStack Platform overcloud, complete the following tasks:
1. Section 3.3.1, "Retrieving the AMQ Interconnect route address"

2. Section 3.3.2, "Configuring the STF connection for the overcloud"

3. Section 3.3.3, "Validating client-side installation"
3.3.1. Retrieving the AMQ Interconnect route address
When you configure the Red Hat OpenStack Platform overcloud for STF, you must provide the AMQ Interconnect route address in the STF connection file.
Procedure
1. Log in to your Red Hat OpenShift Container Platform (OCP) environment.
2. In the service-telemetry project, retrieve the AMQ Interconnect route address:

$ oc get routes -ogo-template='{{ range .items }}{{printf "%s\n" .spec.host }}{{ end }}' | grep "\-5671"
stf-default-interconnect-5671-service-telemetry.apps.infra.watch
NOTE
If your STF installation differs from the documentation, ensure that you retrieve the correct AMQ Interconnect route address.
3.3.2. Configuring the STF connection for the overcloud
To configure the STF connection, you must create a file that contains the connection configuration of the AMQ Interconnect for the overcloud to the STF deployment. Enable the collection of events and storage of the events in STF, and deploy the overcloud.
Procedure
1. Log in to the Red Hat OpenStack Platform undercloud as the stack user.
2. Create a configuration file called stf-connectors.yaml in the /home/stack directory.
IMPORTANT
The Service Telemetry Operator simplifies the deployment of all data ingestion and data storage components for single cloud deployments. To share the data storage domain with multiple clouds, see Section 4.5, "Configuring multiple clouds".
3. In the stf-connectors.yaml file, configure the MetricsQdrConnectors address to connect the AMQ Interconnect on the overcloud to the STF deployment:

parameter_defaults:
  EventPipelinePublishers: []
  CeilometerQdrPublishEvents: true
  MetricsQdrConnectors:
  - host: stf-default-interconnect-5671-service-telemetry.apps.infra.watch
    port: 443
    role: edge
    sslProfile: sslProfile
    verifyHostname: false
Add the CeilometerQdrPublishEvents: true parameter to enable collection and transport of Ceilometer events to STF.
Replace the host parameter with the value of HOST/PORT that you retrieved in Section 3.3.1, "Retrieving the AMQ Interconnect route address".
4. Add the following files to your Red Hat OpenStack Platform director deployment to set up collectd and AMQ Interconnect:
the stf-connectors.yaml environment file
the enable-stf.yaml file that ensures that the environment is being used during the overcloud deployment
the ceilometer-write-qdr.yaml file that ensures that Ceilometer telemetry is sent to STF
5. Deploy the Red Hat OpenStack Platform overcloud:

openstack overcloud deploy <other arguments>
  --templates /usr/share/openstack-tripleo-heat-templates \
  --environment-file <other-environment-files> \
  --environment-file /usr/share/openstack-tripleo-heat-templates/environments/metrics/ceilometer-write-qdr.yaml \
  --environment-file /usr/share/openstack-tripleo-heat-templates/environments/enable-stf.yaml \
  --environment-file /home/stack/stf-connectors.yaml
3.3.3. Validating client-side installation
To validate data collection from the STF storage domain, query the data sources for delivered data. To validate individual nodes in the Red Hat OpenStack Platform deployment, connect to the console using SSH.
Procedure
1. Log in to an overcloud node, for example, controller-0.
2. Ensure that the metrics_qdr container is running on the node:

$ sudo podman container inspect --format '{{.State.Status}}' metrics_qdr

running
3. Return the internal network address on which AMQ Interconnect is running, for example, 172.17.1.44 listening on port 5666:

$ sudo podman exec -it metrics_qdr cat /etc/qpid-dispatch/qdrouterd.conf

listener {
    host: 172.17.1.44
    port: 5666
    authenticatePeer: no
    saslMechanisms: ANONYMOUS
}
4. Return a list of connections to the local AMQ Interconnect:

$ sudo podman exec -it metrics_qdr qdstat --bus=172.17.1.44:5666 --connections

Connections
  id   host                                                                   container                                                                                                  role    dir  security                            authentication  tenant
  ====================================================================================================================================================================================================================================================================
  1    stf-default-interconnect-5671-service-telemetry.apps.infra.watch:443  stf-default-interconnect-7458fd4d69-bgzfb                                                                  edge    out  TLSv1.2(DHE-RSA-AES256-GCM-SHA384)  anonymous-user
  12   172.17.1.44:60290                                                     openstack.org/om/container/controller-0/ceilometer-agent-notification/25/5c02cee550f143ec9ea030db5cccba14  normal  in   no-security                         no-auth
  16   172.17.1.44:36408                                                     metrics                                                                                                    normal  in   no-security                         anonymous-user
  899  172.17.1.44:39500                                                     10a2e99d-1b8a-4329-b48c-4335e5f75c84                                                                       normal  in   no-security                         no-auth
There are four connections:
Outbound connection to STF
Inbound connection from collectd
Inbound connection from ceilometer
Inbound connection from our qdstat client. The outbound STF connection is provided to the MetricsQdrConnectors host parameter, and is the route for the STF storage domain. The other hosts are internal network addresses of the client connections to this AMQ Interconnect.
5. To ensure that messages are being delivered, list the links, and view the _edge address in the deliv column for delivery of messages:

$ sudo podman exec -it metrics_qdr qdstat --bus=172.17.1.44:5666 --links
Router Links
  type      dir  conn id  id    peer  class   addr                  phs  cap  pri  undel  unsett  deliv    presett  psdrop  acc  rej  rel  mod  delay  rate
  ===========================================================================================================================================================
  endpoint  out  1        5           local   _edge                      250  0    0      0       2979926  2979924  0       0    0    2    0    0      0
  endpoint  in   1        6                                              250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  in   1        7                                              250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  out  1        8                                              250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  in   1        9                                              250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  out  1        10                                             250  0    0      0       911      911      0       0    0    0    0    911    0
  endpoint  in   1        11                                             250  0    0      0       0        911      0       0    0    0    0    0      0
  endpoint  out  12       32          local   temp.lSY6Mcicol4J2Kp       250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  in   16       41                                             250  0    0      0       2979924  2979924  0       0    0    0    0    0      0
  endpoint  in   912      1834        mobile  $management           0    250  0    0      0       1        0        0       1    0    0    0    0      0
  endpoint  out  912      1835        local   temp.9Ok2resI9tmt+CT       250  0    0      0       0        0        0       0    0    0    0    0      0
6. To list the addresses from Red Hat OpenStack Platform nodes to STF, connect to OCP to get the AMQ Interconnect pod name and list the connections. List the available AMQ Interconnect pods:

$ oc get pods -l application=stf-default-interconnect

NAME                                        READY   STATUS    RESTARTS   AGE
stf-default-interconnect-7458fd4d69-bgzfb   1/1     Running   0          6d21h
7. Connect to the pod and run the qdstat --connections command to list the known connections:

$ oc exec -it stf-default-interconnect-7458fd4d69-bgzfb -- qdstat --connections

2020-04-21 18:25:47.243852 UTC
stf-default-interconnect-7458fd4d69-bgzfb

Connections
  id  host               container                                                              role    dir  security                                authentication  tenant  last dlv      uptime
  ==================================================================================================================================================================================================
  1   10.129.2.21:43062  rcv[stf-default-collectd-telemetry-smartgateway-79c967c8f7-kq4qv]      normal  in   no-security                             anonymous-user          000:00:00:00  006:21:50:25
  2   10.130.0.52:55754  rcv[stf-default-ceilometer-notification-smartgateway-6675df547mbjk5]   normal  in   no-security                             anonymous-user          000:21:25:57  006:21:49:36
  3   10.130.0.51:43110  rcv[stf-default-collectd-notification-smartgateway-698c87fbb7-f28v6]   normal  in   no-security                             anonymous-user          000:21:36:53  006:21:49:09
  22  10.128.0.1:51948   Router.ceph-0.redhat.local                                             edge    in   TLSv1/SSLv3(DHE-RSA-AES256-GCM-SHA384)  anonymous-user          000:00:00:03  000:22:08:43
  23  10.128.0.1:51950   Router.compute-0.redhat.local                                          edge    in   TLSv1/SSLv3(DHE-RSA-AES256-GCM-SHA384)  anonymous-user          000:00:00:03  000:22:08:43
  24  10.128.0.1:52082   Router.controller-0.redhat.local                                       edge    in   TLSv1/SSLv3(DHE-RSA-AES256-GCM-SHA384)  anonymous-user          000:00:00:00  000:22:08:34
  27  127.0.0.1:42202    c2f541c1-4c97-4b37-a189-a396c08fb079                                   normal  in   no-security                             no-auth                 000:00:00:00  000:00:00:00
In this example, there are three edge connections from the Red Hat OpenStack Platform nodes, with connection id 22, 23, and 24.
8. To view the number of messages delivered by the network, use each address with the oc exec command:

$ oc exec -it stf-default-interconnect-7458fd4d69-bgzfb -- qdstat --address

2020-04-21 18:20:10.293258 UTC
stf-default-interconnect-7458fd4d69-bgzfb

Router Addresses
  class   addr                              phs  distrib    pri  local  remote  in       out      thru  fallback
  ==================================================================================================================
  mobile  anycast/ceilometer/event.sample   0    balanced   -    1      0       1553     1553     0     0
  mobile  collectd/notify                   0    multicast  -    1      0       10       10       0     0
  mobile  collectd/telemetry                0    multicast  -    1      0       7798049  7798049  0     0
CHAPTER 4. ADVANCED FEATURES

The following optional features can provide additional functionality to the Service Telemetry Framework (STF):
Customizing the deployment. For more information, see Section 4.1, "Customizing the deployment".

Alerts. For more information, see Section 4.2, "Alerts".

High availability. For more information, see Section 4.3, "High availability".

Dashboards. For more information, see Section 4.4, "Dashboards".

Multiple clouds. For more information, see Section 4.5, "Configuring multiple clouds".

Ephemeral storage. For more information, see Section 4.6, "Ephemeral storage".
4.1. CUSTOMIZING THE DEPLOYMENT
The Service Telemetry Operator watches for a ServiceTelemetry manifest to load into Red Hat OpenShift Container Platform (OCP). The Operator then creates other objects in memory, which results in the dependent Operators creating the workloads they are responsible for managing.
WARNING

When you override the manifest, you must provide the entire manifest contents, including object names or namespaces. There is no dynamic parameter substitution when you override a manifest.
To override a manifest successfully with Service Telemetry Framework (STF), deploy a default environment using the core options only. For more information about the core options, see Section 2.3.10, "Creating a ServiceTelemetry object in OCP". When you deploy STF, use the oc get command to retrieve the default deployed manifest. When you use a manifest that was originally generated by Service Telemetry Operator, the manifest is compatible with the other objects that are managed by the Operators.
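To retrieve the default deployed manifest for editing, a command of the following form can be used; the object name stf-default is the default used throughout this guide:

$ oc get servicetelemetry stf-default -oyaml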
For example, when the metricsEnabled: true parameter is configured in the ServiceTelemetry manifest, the Service Telemetry Operator requests components for metrics retrieval and storage using the default manifests. In some cases, you might want to override the default manifest. For more information, see Section 4.1.1, "Manifest override parameters".
4.1.1. Manifest override parameters
This table describes the available parameters that you can use to override a manifest, along with the corresponding retrieval commands.
Table 4.1. Manifest override parameters
Red Hat OpenStack Platform 160 Service Telemetry Framework
26
Override parameter | Description | Retrieval command

alertmanagerManifest | Override the Alertmanager object creation. The Prometheus Operator watches for Alertmanager objects. | oc get alertmanager stf-default -oyaml

alertmanagerConfigManifest | Override the Secret that contains the Alertmanager configuration. The Prometheus Operator uses a secret named alertmanager-{{alertmanager-name}}, for example, stf-default, to provide the alertmanager.yaml configuration to Alertmanager. | oc get secret alertmanager-stf-default -oyaml

elasticsearchManifest | Override the ElasticSearch object creation. The Elastic Cloud on Kubernetes Operator watches for ElasticSearch objects. | oc get elasticsearch elasticsearch -oyaml

interconnectManifest | Override the Interconnect object creation. The AMQ Interconnect Operator watches for Interconnect objects. | oc get interconnect stf-default-interconnect -oyaml

prometheusManifest | Override the Prometheus object creation. The Prometheus Operator watches for Prometheus objects. | oc get prometheus stf-default -oyaml

servicemonitorManifest | Override the ServiceMonitor object creation. The Prometheus Operator watches for ServiceMonitor objects. | oc get servicemonitor stf-default -oyaml

smartgatewayCollectdMetricsManifest | Override the SmartGateway object creation for collectd metrics. The Smart Gateway Operator watches for SmartGateway objects. | oc get smartgateway stf-default-collectd-telemetry -oyaml

smartgatewayCollectdEventsManifest | Override the SmartGateway object creation for collectd events. The Smart Gateway Operator watches for SmartGateway objects. | oc get smartgateway stf-default-collectd-notification -oyaml

smartgatewayCeilometerEventsManifest | Override the SmartGateway object creation for Ceilometer events. The Smart Gateway Operator watches for SmartGateway objects. | oc get smartgateway stf-default-ceilometer-notification -oyaml
smartgatewayCeilometerEventsManifest
    Description: Override the SmartGateway object creation for Ceilometer events. The Smart Gateway Operator watches for SmartGateway objects.
    Retrieval command: oc get smartgateway stf-default-ceilometer-notification -oyaml
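As a sketch of the workflow, you can capture the current object with the matching retrieval command, edit it, and embed the result under the corresponding override parameter in the ServiceTelemetry object. The file name prometheus.yaml is illustrative:

$ oc get prometheus stf-default -oyaml > prometheus.yaml
$ $EDITOR prometheus.yaml    # adjust the manifest; keep object names and namespaces intact
$ oc edit servicetelemetry stf-default    # paste the edited manifest under spec.prometheusManifest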
4.1.2. Overriding a managed manifest
Edit the ServiceTelemetry object and provide a parameter and manifest. For a list of available manifest override parameters, see Section 4.1, "Customizing the deployment". The default ServiceTelemetry object is stf-default. Use oc get servicetelemetry to list the available STF deployments.
TIP

The oc edit command loads the default system editor. To override the default editor, pass or set the environment variable EDITOR to the preferred editor, for example, EDITOR=nano oc edit servicetelemetry stf-default.
Procedure

1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

   $ oc project service-telemetry

3. Load the ServiceTelemetry object into an editor:

   $ oc edit servicetelemetry stf-default

4. To modify the ServiceTelemetry object, provide a manifest override parameter and the contents of the manifest to write to OCP instead of the defaults provided by STF:

   NOTE

   The trailing pipe (|) after entering the manifest override parameter indicates that the value provided is multi-line.
apiVersion: infra.watch/v1alpha1
kind: ServiceTelemetry
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"infra.watch/v1alpha1","kind":"ServiceTelemetry","metadata":{"annotations":{},"name":"stf-default","namespace":"service-telemetry"},"spec":{"metricsEnabled":true}}
  creationTimestamp: "2020-04-14T20:29:42Z"
  generation: 1
  name: stf-default
  namespace: service-telemetry
  resourceVersion: "1949423"
  selfLink: /apis/infra.watch/v1alpha1/namespaces/service-telemetry/servicetelemetrys/stf-default
  uid: d058bc41-1bb0-49f5-9a8b-642f4b8adb95
spec:
  metricsEnabled: true
  smartgatewayCollectdMetricsManifest: |  1
    apiVersion: smartgateway.infra.watch/v2alpha1
    kind: SmartGateway
    metadata:
      name: stf-default-collectd-telemetry
      namespace: service-telemetry
    spec:
      amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/telemetry
      debug: true
      prefetch: 15000
      serviceType: metrics
      size: 1
      useTimestamp: true  2
status:
  conditions:
  - ansibleResult:
      changed: 0
      completion: 2020-04-14T20:32:19.079508
      failures: 0
      ok: 52
      skipped: 1
    lastTransitionTime: "2020-04-14T20:29:59Z"
    message: Awaiting next reconciliation
    reason: Successful
    status: "True"
    type: Running
1  Manifest override parameter is defined in the spec of the ServiceTelemetry object.

2  End of the manifest override content.
5. Save and close.
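To confirm that an override was applied, you can compare the managed object against the manifest that you provided by using the matching retrieval command from Table 4.1, for example:

$ oc get smartgateway stf-default-collectd-telemetry -oyaml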
4.2. ALERTS
You create alert rules in Prometheus and alert routes in Alertmanager. Alert rules in Prometheus servers send alerts to an Alertmanager, which manages the alerts. Alertmanager can silence, inhibit, or aggregate alerts, and send notifications using email, on-call notification systems, or chat platforms.

To create an alert, complete the following tasks:

1. Create an alert rule in Prometheus. For more information, see Section 4.2.1, "Creating an alert rule in Prometheus".
2. Create an alert route in Alertmanager. For more information, see Section 4.2.3, "Creating an alert route in Alertmanager".
Additional resources

For more information about alerts or notifications with Prometheus and Alertmanager, see https://prometheus.io/docs/alerting/overview/

To view an example set of alerts that you can use with Service Telemetry Framework (STF), see https://github.com/infrawatch/service-telemetry-operator/tree/master/deploy/alerts
4.2.1. Creating an alert rule in Prometheus
Prometheus evaluates alert rules to trigger notifications. If the rule condition returns an empty result set, the condition is false. Otherwise, the rule is true and it triggers an alert.
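For example, a rule can also carry a duration, labels, and annotations, so that an alert fires only after the condition has held for some time. The following fragment is a sketch that extends the rule used in this procedure; for, labels, and annotations are standard PrometheusRule fields, and the values shown are illustrative:

- alert: Metric Listener down
  expr: collectd_qpid_router_status < 1
  for: 5m                      # illustrative: fire only after 5 minutes in breach
  labels:
    severity: warning          # illustrative label
  annotations:
    summary: AMQ Interconnect metric listener is down   # illustrative annotation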
Procedure

1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

   $ oc project service-telemetry

3. Create a PrometheusRule object that contains the alert rule. The Prometheus Operator loads the rule into Prometheus:

   oc apply -f - <<EOF
   apiVersion: monitoring.coreos.com/v1
   kind: PrometheusRule
   metadata:
     creationTimestamp: null
     labels:
       prometheus: stf-default
       role: alert-rules
     name: prometheus-alarm-rules
     namespace: service-telemetry
   spec:
     groups:
       - name: ./openstack.rules
         rules:
           - alert: Metric Listener down
             expr: collectd_qpid_router_status < 1
   EOF

   To change the rule, edit the value of the expr parameter.

4. To verify that the rules have been loaded into Prometheus by the Operator, create a pod with access to curl:

   $ oc run curl --generator=run-pod/v1 --image=radial/busyboxplus:curl -i --tty

5. Run curl to access the prometheus-operated service to return the rules loaded into memory:

   [ root@curl:/ ]$ curl prometheus-operated:9090/api/v1/rules
   {"status":"success","data":{"groups":[{"name":"./openstack.rules","file":"/etc/prometheus/rules/prometheus-stf-default-rulefiles-0/service-telemetry-prometheus-alarm-rules.yaml","rules":[{"name":"Metric Listener down","query":"collectd_qpid_router_status \u003c 1","duration":0,"labels":{},"annotations":{},"alerts":[],"health":"ok","type":"alerting"}],"interval":30}]}}

6. To verify that the output shows the rules loaded into the PrometheusRule object, for example, the output contains the defined ./openstack.rules, exit from the pod:

   [ root@curl:/ ]$ exit

7. Clean up the environment by deleting the curl pod:

   $ oc delete pod curl

   pod "curl" deleted

Additional resources

For more information on alerting, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md
4.2.2. Configuring custom alerts

You can add custom alerts to the PrometheusRule object that you created in Section 4.2.1, "Creating an alert rule in Prometheus".

Procedure

1. Use the oc edit command:

   $ oc edit prometheusrules prometheus-alarm-rules

2. Edit the PrometheusRules manifest.

3. Save and close.
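As a sketch, an edited PrometheusRules manifest might append a second rule to the existing group; the first rule comes from Section 4.2.1, and the added rule uses the built-in Prometheus up metric with an illustrative name:

spec:
  groups:
    - name: ./openstack.rules
      rules:
        - alert: Metric Listener down
          expr: collectd_qpid_router_status < 1
        - alert: Scrape target down          # hypothetical additional rule
          expr: up == 0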
Additional resources

For more information about configuring alerting rules, see https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/

For more information about PrometheusRules objects, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md
4.2.3. Creating an alert route in Alertmanager

Use Alertmanager to deliver alerts to an external system, such as email, IRC, or other notification channel. The Prometheus Operator manages the Alertmanager configuration as a Red Hat OpenShift Container Platform (OCP) secret. STF by default deploys a basic configuration that results in no receivers:

alertmanager.yaml: |-
  global:
    resolve_timeout: 5m
  route:
    group_by: ['job']
    group_wait: 30s
    group_interval: 5m
    repeat_interval: 12h
    receiver: 'null'
  receivers:
  - name: 'null'
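As a sketch of a functional route, you can replace the null receiver with a real notification channel. The following alertmanager.yaml fragment uses the standard Alertmanager email_configs receiver; the receiver name, addresses, and SMTP host are placeholders:

alertmanager.yaml: |-
  global:
    resolve_timeout: 5m
  route:
    group_by: ['job']
    group_wait: 30s
    group_interval: 5m
    repeat_interval: 12h
    receiver: 'email-team'
  receivers:
  - name: 'email-team'
    email_configs:
    - to: 'cloud-ops@example.com'         # placeholder recipient
      from: 'alertmanager@example.com'    # placeholder sender
      smarthost: 'smtp.example.com:587'   # placeholder SMTP relay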
To deploy a custom Alertmanager route with STF, you must pass an alertmanagerConfigManifest parameter to the Service Telemetry Operator, which results in an updated secret that is managed by the Prometheus Operator. For more information, see Section 4.1.2, "Overriding a managed manifest".
Procedure

1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

   $ oc project service-telemetry

3. Edit the ServiceTelemetry object for your STF deployment:

   $ oc edit servicetelemetry stf-default

4. Add a new parameter, alertmanagerConfigManifest, and the Secret object contents to define the alertmanager.yaml configuration for Alertmanager:

   NOTE

   This loads the default template that is already managed by the Service Telemetry Operator. To validate that the changes are populating correctly, change a value, retrieve the alertmanager-stf-default secret, and verify that the new value is loaded into memory, for example, changing the value global.resolve_timeout from 5m to 10m.

   apiVersion: infra.watch/v1alpha1
   kind: ServiceTelemetry
   metadata:
     name: stf-default
     namespace: service-telemetry
   spec:
     metricsEnabled: true
     alertmanagerConfigManifest: |
       apiVersion: v1
       kind: Secret
       metadata:
         name: 'alertmanager-stf-default'
         namespace: 'service-telemetry'
       type: Opaque
       stringData:
         alertmanager.yaml: |-
           global:
             resolve_timeout: 10m
           route:
             group_by: ['job']
             group_wait: 30s
             group_interval: 5m
             repeat_interval: 12h
             receiver: 'null'
           receivers:
           - name: 'null'
5. Verify that the configuration was applied to the secret:

   $ oc get secret alertmanager-stf-default -o go-template='{{index .data "alertmanager.yaml" | base64decode }}'

   global:
     resolve_timeout: 10m
   route:
     group_by: ['job']
     group_wait: 30s
     group_interval: 5m
     repeat_interval: 12h
     receiver: 'null'
   receivers:
   - name: 'null'

6. To verify that the configuration has been loaded into Alertmanager, create a pod with access to curl:

   $ oc run curl --generator=run-pod/v1 --image=radial/busyboxplus:curl -i --tty

7. Run curl against the alertmanager-operated service to retrieve the status and configYAML contents, and review whether the supplied configuration matches the configuration loaded into Alertmanager:

   [ root@curl:/ ]$ curl alertmanager-operated:9093/api/v1/status

   {"status":"success","data":{"configYAML":"global:\n  resolve_timeout: 10m\n  http_config: {}\n  smtp_hello: localhost\n  smtp_require_tls: true\n  pagerduty_url: https://events.pagerduty.com/v2/enqueue\n  hipchat_api_url: https://api.hipchat.com/\n  opsgenie_api_url: https://api.opsgenie.com/\n  wechat_api_url: https://qyapi.weixin.qq.com/cgi-bin/\n  victorops_api_url: https://alert.victorops.com/integrations/generic/20131114/alert/\nroute:\n  receiver: \"null\"\n  group_by:\n  - job\n  group_wait: 30s\n  group_interval: 5m\n  repeat_interval: 12h\nreceivers:\n- name: \"null\"\ntemplates: []\n",...}}

8. Verify that the configYAML field contains the expected changes. Exit from the pod:

   [ root@curl:/ ]$ exit

9. To clean up the environment, delete the curl pod:

   $ oc delete pod curl

   pod "curl" deleted
Additional resources

For more information about the Red Hat OpenShift Container Platform secret and the Prometheus operator, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md
4.3. HIGH AVAILABILITY
High availability is the ability of Service Telemetry Framework (STF) to rapidly recover from failures in its component services. Although Red Hat OpenShift Container Platform (OCP) restarts a failed pod if nodes are available to schedule the workload, this recovery process might take more than one minute, during which time events and metrics are lost. A high availability configuration includes multiple copies of STF components, which reduces recovery time to approximately 2 seconds. To protect against failure of an OCP node, deploy STF to an OCP cluster with three or more nodes.
NOTE

STF is not yet a fully fault tolerant system. Delivery of metrics and events during the recovery period is not guaranteed.
Enabling high availability has the following effects:

Two AMQ Interconnect pods run instead of the default 1.

Three ElasticSearch pods run instead of the default 1.

Recovery time from a lost pod in either of these services reduces to approximately 2 seconds.
4.3.1. Configuring high availability
To configure STF for high availability, add highAvailabilityEnabled: true to the ServiceTelemetry object in OCP. You can set this parameter at installation time or, if you already deployed STF, complete the following steps:
Procedure

1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

   $ oc project service-telemetry

3. Use the oc command to edit the ServiceTelemetry object:

   $ oc edit ServiceTelemetry

4. Add highAvailabilityEnabled: true to the spec section:

   spec:
     eventsEnabled: true
     metricsEnabled: true
     highAvailabilityEnabled: true

5. Save your changes and close the object.
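After the Operator reconciles the change, additional pods are scheduled. A quick, informal check is to list the pods and confirm that more than one AMQ Interconnect pod is running; pod names vary by deployment:

$ oc get pods | grep interconnect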
4.4. DASHBOARDS

Use the third-party application Grafana to visualize system-level metrics gathered by collectd for each individual host node. For more information about configuring collectd, see Section 3.3, "Configuring Red Hat OpenStack Platform overcloud for Service Telemetry Framework".
4.4.1. Setting up Grafana to host the dashboard

Grafana is not included in the default Service Telemetry Framework (STF) deployment, so you must deploy the Grafana Operator from OperatorHub.io.
Procedure

1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

   $ oc project service-telemetry

3. Clone the dashboard repository:

   $ git clone https://github.com/infrawatch/dashboards; cd dashboards

4. Deploy the Grafana operator:

   $ oc create -f deploy/subscription.yaml

5. To verify that the operator launched successfully, run the oc get csv command. If the value of the PHASE column is Succeeded, the operator launched successfully:

   $ oc get csv

   NAME                      DISPLAY            VERSION   REPLACES   PHASE
   grafana-operator.v3.2.0   Grafana Operator   3.2.0                Succeeded

6. Launch a Grafana instance:

   $ oc create -f deploy/grafana.yaml

7. Verify that the Grafana instance deployed:

   $ oc get pod -l app=grafana

   NAME                                  READY   STATUS    RESTARTS   AGE
   grafana-deployment-7fc7848b56-sbkhv   1/1     Running   0          1m

8. Create the datasource and dashboard resources:

   $ oc create -f deploy/datasource.yaml -f deploy/rhos-dashboard.yaml
9 Verify that the resources installed correctly
10 Navigate to httpsltgrafana-route-addressgt in a web browser Use the oc get routescommand to retrieve the Grafana route address
11 To view the dashboard click Dashboards and Manage
Additional resources

For more information about enabling the OperatorHub.io catalog source, see Section 2.3.5, "Enabling the OperatorHub.io Community Catalog Source".
4.4.1.1. Viewing and editing queries

Procedure

1. Log in to Red Hat OpenShift Container Platform. To view and edit queries, log in as the admin user.

2. Change to the service-telemetry namespace:

   $ oc project service-telemetry

3. To retrieve the default username and password, describe the Grafana object using oc describe:

   $ oc describe grafana service-telemetry-grafana
4.4.2. The Grafana infrastructure dashboard

The infrastructure dashboard shows metrics for a single node at a time. Select a node from the upper left corner of the dashboard.
4.4.2.1. Top panels
Title | Unit | Description
Current Global Alerts | - | Current alerts fired by Prometheus
Recent Global Alerts | - | Recently fired alerts in 5m time steps
Status Panel | - | Node status: up, down, unavailable
Uptime | s/m/h/d/M/Y | Total operational time of node
CPU Cores | cores | Total number of cores
Memory | bytes | Total memory
Disk Size | bytes | Total storage size
Processes | processes | Total number of processes listed by type
Load Average | processes | Load average represents the average number of running and uninterruptible processes residing in the kernel execution queue
4.4.2.2. Networking panels

Panels that display the network interfaces of the node.
Panel | Unit | Description
Physical Interfaces Ingress Errors | errors | Total errors with incoming data
Physical Interfaces Egress Errors | errors | Total errors with outgoing data
Physical Interfaces Ingress Error Rates | errors/s | Rate of incoming data errors
Physical Interfaces Egress Error Rates | errors/s | Rate of outgoing data errors
Physical Interfaces Packets Ingress | pps | Incoming packets per second
Physical Interfaces Packets Egress | pps | Outgoing packets per second
Physical Interfaces Data Ingress | bytes/s | Incoming data rates
Physical Interfaces Data Egress | bytes/s | Outgoing data rates
Physical Interfaces Drop Rate Ingress | pps | Incoming packets drop rate
Physical Interfaces Drop Rate Egress | pps | Outgoing packets drop rate
4.4.2.3. CPU panels

Panels that display CPU usage of the node.
Panel | Unit | Description
Current CPU Usage | percent | Instantaneous usage at the time of the last query
Aggregate CPU Usage | percent | Average non-idle CPU activity of all cores on a node
Aggr. CPU Usage by Type | percent | Shows time spent for each type of thread averaged across all cores
4.4.2.4. Memory panels

Panels that display memory usage on the node.
Panel | Unit | Description
Memory Used | percent | Amount of memory being used at time of last query
Huge Pages Used | hugepages | Number of hugepages being used
4.4.2.5. Disk/file system

Panels that display space used on disk.
Panel | Unit | Description | Notes
Disk Space Usage | percent | Total disk use at time of last query |
Inode Usage | percent | Total inode use at time of last query |
Aggregate Disk Space Usage | bytes | Total disk space used and reserved | Because this query relies on the df plugin, temporary file systems that do not necessarily use disk space are included in the results. The query tries to filter out most of these, but it might not be exhaustive.
Disk Traffic | bytes/s | Shows rates for both reading and writing |
Disk Load | percent | Approximate percentage of total disk bandwidth being used. The weighted I/O time series includes the backlog that might be accumulating. For more information, see the collectd disk plugin docs. |
Operations/s | ops/s | Operations done per second |
Average I/O Operation Time | seconds | Average time each I/O operation took to complete. This average is not accurate, see the collectd disk plugin docs. |
4.5. CONFIGURING MULTIPLE CLOUDS

You can configure multiple Red Hat OpenStack Platform clouds to target a single instance of Service Telemetry Framework (STF):
1. Plan the AMQP address prefixes that you want to use for each cloud. For more information, see Section 4.5.1, "Planning AMQP address prefixes".

2. Deploy metrics and events consumer Smart Gateways for each cloud to listen on the corresponding address prefixes. For more information, see Section 4.5.2, "Deploying Smart Gateways".

3. Configure each cloud to send its metrics and events to STF on the correct address. For more information, see Section 4.5.3, "Creating the OpenStack environment file".
Figure 4.1. Two Red Hat OpenStack Platform clouds connect to STF
4.5.1. Planning AMQP address prefixes

By default, Red Hat OpenStack Platform nodes get data through two data collectors: collectd and Ceilometer. These components send telemetry data, or notifications, to the respective AMQP addresses, for example, collectd/telemetry, where STF Smart Gateways listen on those addresses for monitoring data.

To support multiple clouds, and to identify which cloud generated the monitoring data, configure each cloud to send data to a unique address. Prefix a cloud identifier to the second part of the address. The following list shows some example addresses and identifiers:
collectd/cloud1-telemetry

collectd/cloud1-notify

anycast/ceilometer/cloud1-event.sample

collectd/cloud2-telemetry

collectd/cloud2-notify
anycast/ceilometer/cloud2-event.sample

collectd/us-east-1-telemetry

collectd/us-west-3-telemetry
4.5.2. Deploying Smart Gateways

You must deploy a Smart Gateway for each of the data collection types for each cloud: one for collectd metrics, one for collectd events, and one for Ceilometer events. Configure each of the Smart Gateways to listen on the AMQP address that you define for the corresponding cloud.

When you deploy STF for the first time, Smart Gateway manifests are created that define the initial Smart Gateways for a single cloud. When deploying Smart Gateways for multiple cloud support, you deploy multiple Smart Gateways for each of the data collection types that handle the metrics and the events data for each cloud. The initial Smart Gateways act as a template to create additional Smart Gateways, along with any authentication information required to connect to the data stores.
Procedure

1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

   $ oc project service-telemetry

3. Use the initially deployed Smart Gateways as a template for additional Smart Gateways. List the currently deployed Smart Gateways with the oc get smartgateways command. For example, if you deployed STF with metricsEnabled: true and eventsEnabled: true, the following Smart Gateways are displayed in the output:

   $ oc get smartgateways

   NAME                                  AGE
   stf-default-ceilometer-notification   14d
   stf-default-collectd-notification     14d
   stf-default-collectd-telemetry        14d

4. Retrieve the manifests for each Smart Gateway and store the contents in a temporary file, which you can modify later and use to create the new set of Smart Gateways:

   truncate --size 0 /tmp/cloud1-smartgateways.yaml && \
   for sg in $(oc get smartgateways -oname)
   do
     echo "---" >> /tmp/cloud1-smartgateways.yaml
     oc get $sg -oyaml --export >> /tmp/cloud1-smartgateways.yaml
   done

5. Modify the Smart Gateway manifests in the /tmp/cloud1-smartgateways.yaml file. Adjust the metadata.name and spec.amqpUrl fields to include the cloud identifier from your schema. For more information, see Section 4.5.1, "Planning AMQP address prefixes". To view example Smart Gateway manifests, see Section 4.5.2.1, "Example manifests".
6. Deploy your new Smart Gateways:
   $ oc apply -f /tmp/cloud1-smartgateways.yaml
7. Verify that each Smart Gateway is running. This can take several minutes, depending on the number of Smart Gateways:

   $ oc get po -l app=smart-gateway
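When the gateways are created, listing the SmartGateway objects shows both the initial set and the cloud-specific set defined in your modified manifests. The names below follow the cloud1 examples in this section; the ages are illustrative:

$ oc get smartgateways

NAME                                         AGE
stf-default-ceilometer-notification          14d
stf-default-ceilometer-notification-cloud1   1m
stf-default-collectd-notification            14d
stf-default-collectd-notification-cloud1     1m
stf-default-collectd-telemetry               14d
stf-default-collectd-telemetry-cloud1        1m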
4.5.2.1. Example manifests

IMPORTANT

The content in the following examples might be different to the file content in your deployment. Copy the manifests in your deployment.

Ensure that the name and amqpUrl parameters of each Smart Gateway match the names that you want to use for your clouds. For more information, see Section 4.5.1, "Planning AMQP address prefixes".

NOTE

Your output might have some additional metadata parameters that you can remove from the manifests that you load into OCP.
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-ceilometer-notification-cloud1  1
spec:
  amqpDataSource: ceilometer
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/anycast/ceilometer/cloud1-event.sample  2
  debug: false
  elasticPass: fkzfhghw
  elasticUrl: https://elasticsearch-es-http.service-telemetry.svc.cluster.local:9200
  elasticUser: elastic
  resetIndex: false
  serviceType: events
  size: 1
  tlsCaCert: /config/certs/ca.crt
  tlsClientCert: /config/certs/tls.crt
  tlsClientKey: /config/certs/tls.key
  tlsServerName: elasticsearch-es-http.service-telemetry.svc.cluster.local
  useBasicAuth: true
  useTls: true
---
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-collectd-notification-cloud1  3
spec:
  amqpDataSource: collectd
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/cloud1-notify  4
  debug: false
  elasticPass: fkzfhghw
  elasticUrl: https://elasticsearch-es-http.service-telemetry.svc.cluster.local:9200
  elasticUser: elastic
  resetIndex: false
  serviceType: events
  size: 1
  tlsCaCert: /config/certs/ca.crt
  tlsClientCert: /config/certs/tls.crt
  tlsClientKey: /config/certs/tls.key
  tlsServerName: elasticsearch-es-http.service-telemetry.svc.cluster.local
  useBasicAuth: true
  useTls: true
---
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-collectd-telemetry-cloud1  5
spec:
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/cloud1-telemetry  6
  debug: false
  prefetch: 15000
  serviceType: metrics
  size: 1
  useTimestamp: true
1  Name for Ceilometer notifications for cloud1.

2  AMQP address for Ceilometer notifications for cloud1.

3  Name for collectd notifications for cloud1.

4  AMQP address for collectd notifications for cloud1.

5  Name for collectd telemetry for cloud1.

6  AMQP address for collectd telemetry for cloud1.
4.5.3. Creating the OpenStack environment file

To label traffic according to the cloud of origin, you must create a configuration with cloud-specific instance names. Create an stf-connectors.yaml file and adjust the values of CeilometerQdrEventsConfig and CollectdAmqpInstances to match the AMQP address prefix scheme. For more information, see Section 4.5.1, "Planning AMQP address prefixes".
WARNING

Remove the enable-stf.yaml and ceilometer-write-qdr.yaml environment file references from your overcloud deployment. This configuration is redundant and results in duplicate information being sent from each cloud node.
Procedure

1. Create the stf-connectors.yaml file and modify it to match the AMQP address that you want for this cloud deployment:

   resource_registry:
     OS::TripleO::Services::Collectd: /usr/share/openstack-tripleo-heat-templates/deployment/metrics/collectd-container-puppet.yaml
     OS::TripleO::Services::MetricsQdr: /usr/share/openstack-tripleo-heat-templates/deployment/metrics/qdr-container-puppet.yaml
     OS::TripleO::Services::CeilometerAgentCentral: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-central-container-puppet.yaml
     OS::TripleO::Services::CeilometerAgentNotification: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-notification-container-puppet.yaml
     OS::TripleO::Services::CeilometerAgentIpmi: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-ipmi-container-puppet.yaml
     OS::TripleO::Services::ComputeCeilometerAgent: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-compute-container-puppet.yaml
     OS::TripleO::Services::Redis: /usr/share/openstack-tripleo-heat-templates/deployment/database/redis-pacemaker-puppet.yaml

   parameter_defaults:
     EnableSTF: true

     EventPipelinePublishers: []
     CeilometerEnablePanko: false
     CeilometerQdrPublishEvents: true
     CeilometerQdrEventsConfig:
       driver: amqp
       topic: cloud1-event    1

     CollectdConnectionType: amqp1
     CollectdAmqpInterval: 5
     CollectdDefaultPollingInterval: 5

     CollectdAmqpInstances:
       cloud1-notify:        2
         notify: true
         format: JSON
         presettle: false
       cloud1-telemetry:     3
         format: JSON
         presettle: true

     MetricsQdrAddresses:
       - prefix: collectd
         distribution: multicast
       - prefix: anycast/ceilometer
         distribution: multicast

     MetricsQdrSSLProfiles:
       - name: sslProfile

     MetricsQdrConnectors:
       - host: stf-default-interconnect-5671-service-telemetry.apps.infra.watch    4
         port: 443
         role: edge
         verifyHostname: false
         sslProfile: sslProfile
1  Define the topic for Ceilometer events. This value is the address format of anycast/ceilometer/cloud1-event.sample.

2  Define the topic for collectd events. This value is the format of collectd/cloud1-notify.

3  Define the topic for collectd metrics. This value is the format of collectd/cloud1-telemetry.

4  Adjust the MetricsQdrConnectors host to the address of the STF route.
2. Ensure that the naming convention in the stf-connectors.yaml file aligns with the spec.amqpUrl field in the Smart Gateway configuration. For example, configure the CeilometerQdrEventsConfig.topic field to a value of cloud1-event.

3. Save the file in a directory for custom environment files, for example /home/stack/custom_templates/.

4. Source the authentication file:

   [stack@undercloud-0 ~]$ source stackrc

   (undercloud) [stack@undercloud-0 ~]$

5. Include the stf-connectors.yaml file in the overcloud deployment command, along with any other environment files relevant to your environment:

   (undercloud) [stack@undercloud-0 ~]$ openstack overcloud deploy --templates /usr/share/openstack-tripleo-heat-templates -e /home/stack/custom_templates/stf-connectors.yaml

Additional resources

For information about validating the deployment, see Section 3.3.3, "Validating client-side installation".
4.5.4. Querying metrics data from multiple clouds

Data stored in Prometheus has a service label attached according to the Smart Gateway it was scraped from. You can use this label to query data from a specific cloud.

To query data from a specific cloud, use a Prometheus PromQL query that matches the associated service label, for example:

collectd_uptime{service="stf-default-collectd-telemetry-cloud1-smartgateway"}
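To compare the same metric across every cloud, you can use a regular-expression label matcher instead of an exact match. This is standard PromQL; the service label values follow the Smart Gateway naming scheme used in this section:

collectd_uptime{service=~"stf-default-collectd-telemetry-.*-smartgateway"}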
4.6. EPHEMERAL STORAGE

Use ephemeral storage to run Service Telemetry Framework (STF) without persistently storing data in your Red Hat OpenShift Container Platform (OCP) cluster. Ephemeral storage is not recommended in a production environment due to the volatility of the data in the platform when operating correctly and as designed. For example, restarting a pod or rescheduling the workload to another node results in the loss of any local data written since the pod started.

If you enable ephemeral storage in STF, the Service Telemetry Operator does not add the relevant storage sections to the data storage components' manifests.
4.6.1. Configuring ephemeral storage

To configure STF for ephemeral storage, add storageEphemeralEnabled: true to the ServiceTelemetry object in OCP. You can add storageEphemeralEnabled: true at installation time or, if you already deployed STF, complete the following steps:
Procedure

1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

   $ oc project service-telemetry

3. Edit the ServiceTelemetry object:

   $ oc edit ServiceTelemetry stf-default

4. Add the storageEphemeralEnabled: true parameter to the spec section:

   spec:
     eventsEnabled: true
     metricsEnabled: true
     storageEphemeralEnabled: true

5. Save your changes and close the object.
APPENDIX A. COLLECTD PLUG-INS

This section contains a complete list of collectd plug-ins and configurations for Red Hat OpenStack Platform 16.0.
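The parameters listed in this appendix are hieradata keys. On the overcloud, a common way to set them is through ExtraConfig in an environment file. A minimal sketch, assuming you want to change the polling interval of the df plug-in; the 10-second value is illustrative:

parameter_defaults:
  ExtraConfig:
    collectd::plugin::df::interval: 10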
collectd-aggregation

collectd::plugin::aggregation::aggregators
collectd::plugin::aggregation::interval

collectd-amqp1

collectd-apache

collectd::plugin::apache::instances (ex: localhost => {"url" => "http://localhost/mod_status?auto"})
collectd::plugin::apache::interval

collectd-apcups

collectd-battery

collectd::plugin::battery::values_percentage
collectd::plugin::battery::report_degraded
collectd::plugin::battery::query_state_fs
collectd::plugin::battery::interval

collectd-ceph

collectd::plugin::ceph::daemons
collectd::plugin::ceph::longrunavglatency
collectd::plugin::ceph::convertspecialmetrictypes

collectd-cgroups

collectd::plugin::cgroups::ignore_selected
collectd::plugin::cgroups::interval

collectd-conntrack

None

collectd-contextswitch

collectd::plugin::contextswitch::interval

collectd-cpu

collectd::plugin::cpu::reportbystate
collectd::plugin::cpu::reportbycpu
collectd::plugin::cpu::valuespercentage
collectd::plugin::cpu::reportnumcpu
collectd::plugin::cpu::reportgueststate
collectd::plugin::cpu::subtractgueststate
collectd::plugin::cpu::interval

collectd-cpufreq

None

collectd-cpusleep

collectd-csv

collectd::plugin::csv::datadir
collectd::plugin::csv::storerates
collectd::plugin::csv::interval

collectd-df

collectd::plugin::df::devices
collectd::plugin::df::fstypes
collectd::plugin::df::ignoreselected
collectd::plugin::df::mountpoints
collectd::plugin::df::reportbydevice
collectd::plugin::df::reportinodes
collectd::plugin::df::reportreserved
collectd::plugin::df::valuesabsolute
collectd::plugin::df::valuespercentage
collectd::plugin::df::interval

collectd-disk

collectd::plugin::disk::disks
collectd::plugin::disk::ignoreselected
collectd::plugin::disk::udevnameattr
collectd::plugin::disk::interval

collectd-entropy

collectd::plugin::entropy::interval

collectd-ethstat

collectd::plugin::ethstat::interfaces
collectd::plugin::ethstat::maps
collectd::plugin::ethstat::mappedonly
collectd::plugin::ethstat::interval

collectd-exec

collectd::plugin::exec::commands
collectd::plugin::exec::commands_defaults
collectd::plugin::exec::globals
collectd::plugin::exec::interval

collectd-fhcount

collectd::plugin::fhcount::valuesabsolute
collectd::plugin::fhcount::valuespercentage
collectd::plugin::fhcount::interval

collectd-filecount

collectd::plugin::filecount::directories
collectd::plugin::filecount::interval

collectd-fscache

None

collectd-hddtemp

collectd::plugin::hddtemp::host
collectd::plugin::hddtemp::port
collectd::plugin::hddtemp::interval

collectd-hugepages

collectd::plugin::hugepages::report_per_node_hp
collectd::plugin::hugepages::report_root_hp
collectd::plugin::hugepages::values_pages
collectd::plugin::hugepages::values_bytes
collectd::plugin::hugepages::values_percentage
collectd::plugin::hugepages::interval

collectd-intel_rdt

collectd-interface

collectd::plugin::interface::interfaces
collectd::plugin::interface::ignoreselected
collectd::plugin::interface::reportinactive
collectd::plugin::interface::interval

collectd-ipc

None

collectd-ipmi

collectd::plugin::ipmi::ignore_selected
collectd::plugin::ipmi::notify_sensor_add
collectd::plugin::ipmi::notify_sensor_remove
collectd::plugin::ipmi::notify_sensor_not_present
collectd::plugin::ipmi::sensors
collectd::plugin::ipmi::interval

collectd-irq

collectd::plugin::irq::irqs
collectd::plugin::irq::ignoreselected
collectd::plugin::irq::interval

collectd-load

collectd::plugin::load::report_relative
collectd::plugin::load::interval

collectd-logfile

collectd::plugin::logfile::log_level
collectd::plugin::logfile::log_file
collectd::plugin::logfile::log_timestamp
collectd::plugin::logfile::print_severity
collectd::plugin::logfile::interval

collectd-madwifi

collectd-mbmon

collectd-md

collectd-memcached

collectd::plugin::memcached::instances
collectd::plugin::memcached::interval

collectd-memory

collectd::plugin::memory::valuesabsolute
collectd::plugin::memory::valuespercentage
collectd::plugin::memory::interval

collectd-multimeter

collectd-mysql

collectd::plugin::mysql::interval

collectd-netlink

collectd::plugin::netlink::interfaces
collectd::plugin::netlink::verboseinterfaces
collectd::plugin::netlink::qdiscs
collectd::plugin::netlink::classes
collectd::plugin::netlink::filters
collectd::plugin::netlink::ignoreselected
collectd::plugin::netlink::interval

collectd-network

collectd::plugin::network::timetolive
collectd::plugin::network::maxpacketsize
collectd::plugin::network::forward
collectd::plugin::network::reportstats
collectd::plugin::network::listeners
collectd::plugin::network::servers
collectd::plugin::network::interval

collectd-nfs

collectd::plugin::nfs::interval

collectd-ntpd

collectd::plugin::ntpd::host
collectd::plugin::ntpd::port
collectd::plugin::ntpd::reverselookups
collectd::plugin::ntpd::includeunitid
collectd::plugin::ntpd::interval

collectd-numa

None

collectd-olsrd

collectd-openvpn

collectd::plugin::openvpn::statusfile
collectd::plugin::openvpn::improvednamingschema
collectd::plugin::openvpn::collectcompression
collectd::plugin::openvpn::collectindividualusers
collectd::plugin::openvpn::collectusercount
collectd::plugin::openvpn::interval

collectd-ovs_events

collectd::plugin::ovs_events::address
collectd::plugin::ovs_events::dispatch
collectd::plugin::ovs_events::interfaces
collectd::plugin::ovs_events::send_notification
collectd::plugin::ovs_events::port
collectd::plugin::ovs_events::socket

collectd-ovs_stats

collectd::plugin::ovs_stats::address
collectd::plugin::ovs_stats::bridges
collectd::plugin::ovs_stats::port
collectd::plugin::ovs_stats::socket

collectd-ping

collectd::plugin::ping::hosts
collectd::plugin::ping::timeout
collectd::plugin::ping::ttl
collectd::plugin::ping::source_address
collectd::plugin::ping::device
collectd::plugin::ping::max_missed
collectd::plugin::ping::size
collectd::plugin::ping::interval

collectd-powerdns

collectd::plugin::powerdns::servers
collectd::plugin::powerdns::recursors
collectd::plugin::powerdns::local_socket
collectd::plugin::powerdns::interval

collectd-processes

collectd::plugin::processes::processes
collectd::plugin::processes::process_matches
collectd::plugin::processes::collect_context_switch
collectd::plugin::processes::collect_file_descriptor
collectd::plugin::processes::collect_memory_maps
collectd::plugin::processes::interval

collectd-protocols

collectd::plugin::protocols::ignoreselected
collectd::plugin::protocols::values

collectd-python

collectd-serial

collectd-smart

collectd::plugin::smart::disks
collectd::plugin::smart::ignoreselected
collectd::plugin::smart::interval

collectd-snmp_agent

collectd-statsd

collectd::plugin::statsd::host
collectd::plugin::statsd::port
collectd::plugin::statsd::deletecounters
collectd::plugin::statsd::deletetimers
collectd::plugin::statsd::deletegauges
collectd::plugin::statsd::deletesets
collectd::plugin::statsd::countersum
collectd::plugin::statsd::timerpercentile
collectd::plugin::statsd::timerlower
collectd::plugin::statsd::timerupper
collectd::plugin::statsd::timersum
collectd::plugin::statsd::timercount
collectd::plugin::statsd::interval

collectd-swap

collectd::plugin::swap::reportbydevice
collectd::plugin::swap::reportbytes
collectd::plugin::swap::valuesabsolute
collectd::plugin::swap::valuespercentage
collectd::plugin::swap::reportio
collectd::plugin::swap::interval

collectd-syslog

collectd::plugin::syslog::log_level
collectd::plugin::syslog::notify_level
collectd::plugin::syslog::interval

collectd-table

collectd::plugin::table::tables
collectd::plugin::table::interval

collectd-tail

collectd::plugin::tail::files
collectd::plugin::tail::interval

collectd-tail_csv

collectd::plugin::tail_csv::metrics
collectd::plugin::tail_csv::files

collectd-tcpconns

collectd::plugin::tcpconns::localports
collectd::plugin::tcpconns::remoteports
collectd::plugin::tcpconns::listening
collectd::plugin::tcpconns::allportssummary
collectd::plugin::tcpconns::interval

collectd-ted

collectd-thermal

collectd::plugin::thermal::devices
collectd::plugin::thermal::ignoreselected
collectd::plugin::thermal::interval

collectd-threshold

collectd::plugin::threshold::types
collectd::plugin::threshold::plugins
collectd::plugin::threshold::hosts
collectd::plugin::threshold::interval

collectd-turbostat

collectd::plugin::turbostat::core_c_states
collectd::plugin::turbostat::package_c_states
collectd::plugin::turbostat::system_management_interrupt
collectd::plugin::turbostat::digital_temperature_sensor
collectd::plugin::turbostat::tcc_activation_temp
collectd::plugin::turbostat::running_average_power_limit
collectd::plugin::turbostat::logical_core_names

collectd-unixsock

collectd-uptime

collectd::plugin::uptime::interval

collectd-users

collectd::plugin::users::interval

collectd-uuid

collectd::plugin::uuid::uuid_file
collectd::plugin::uuid::interval

collectd-virt

collectd::plugin::virt::connection
collectd::plugin::virt::refresh_interval
collectd::plugin::virt::domain
collectd::plugin::virt::block_device
collectd::plugin::virt::interface_device
collectd::plugin::virt::ignore_selected
collectd::plugin::virt::hostname_format
collectd::plugin::virt::interface_format
collectd::plugin::virt::extra_stats
collectd::plugin::virt::interval

collectd-vmem

collectd::plugin::vmem::verbose
collectd::plugin::vmem::interval

collectd-vserver

collectd-wireless

collectd-write_graphite

collectd::plugin::write_graphite::carbons
collectd::plugin::write_graphite::carbon_defaults
collectd::plugin::write_graphite::globals

collectd-write_kafka

collectd::plugin::write_kafka::kafka_host
collectd::plugin::write_kafka::kafka_port
collectd::plugin::write_kafka::kafka_hosts
collectd::plugin::write_kafka::topics

collectd-write_log

collectd::plugin::write_log::format

collectd-zfs_arc

None
CHAPTER 1 INTRODUCTION TOSERVICE TELEMETRY FRAMEWORK
Service Telemetry Framework (STF) provides automated collection of measurements and data fromremote clients - Red Hat OpenStack Platform or third-party nodes - and transmission of thatinformation to a centralized receiving Red Hat OpenShift Container Platform (OCP) deployment forstorage retrieval and monitoring The data can be either of two types
Metric
a numeric measurement of an application or system
Event
irregular and discrete occurrences that happen in a system
The collection components that are required on the clients are lightweight The multicast message busthat is shared by all clients and the deployment provides fast and reliable data transport Other modularcomponents for receiving and storing data are deployed in containers on OCP
STF provides access to monitoring functions such as alert generation visualization through dashboardsand single source of truth telemetry analysis to support orchestration
11 SERVICE TELEMETRY FRAMEWORK ARCHITECTURE
Service Telemetry Framework (STF) uses the components described in Table 11 ldquoSTF componentsrdquo
Table 11 STF components
Client Component Server(OCP)
yes An AMQP 1x compatible messaging bus to shuttle the metrics to STF forstorage in Prometheus
yes
no Smart Gateway to pick metrics and events from the AMQP 1x bus and todeliver events to ElasticSearch or to provide metrics to Prometheus
yes
no Prometheus as time-series data storage yes
no ElasticSearch as events data storage yes
yes collectd to collect infrastructure metrics and events no
yes Ceilometer to collect Red Hat OpenStack Platform metrics and events no
Figure 11 Service Telemetry Framework architecture overview
Red Hat OpenStack Platform 160 Service Telemetry Framework
4
Figure 11 Service Telemetry Framework architecture overview
NOTE
The Service Telemetry Framework data collection components collectd and Ceilometerand the transport components AMQ Interconnect and Smart Gateway are fullysupported The data storage components Prometheus and ElasticSearch including theOperator artifacts and visualization component Grafana are community-supported andare not officially supported
For metrics on the client side collectd collects high-resolution metrics collectd delivers the data toPrometheus by using the AMQP1 plugin which places the data onto the message bus On the serverside a Golang application called the Smart Gateway takes the data stream from the bus and exposes itas a local scrape endpoint for Prometheus
If you plan to collect and store events collectd or Ceilometer delivers event data to the server side byusing the AMQP1 plugin which places the data onto the message bus Another Smart Gateway writesthe data to the ElasticSearch datastore
Server-side STF monitoring infrastructure consists of the following layers
Service Telemetry Framework 1x (STF)
Red Hat OpenShift Container Platform (OCP)
CHAPTER 1 INTRODUCTION TO SERVICE TELEMETRY FRAMEWORK
5
Infrastructure platform
Figure 12 Server-side STF monitoring infrastructure
For more information about how to deploy Red Hat OpenShift Container Platform see the OCPproduct documentation You can install OCP on cloud platforms or on bare metal For more informationabout STF performance and scaling see httpsaccessredhatcomarticles4907241
NOTE
Do not install OCP on the same infrastructure that you want to monitor
12 INSTALLATION SIZE
The size of your Red Hat OpenShift Container Platform installation depends on the following factors
The number of nodes you want to monitor
The number of metrics you want to collect
The resolution of metrics
The length of time that you want to store the data
Installation of Service Telemetry Framework (STF) depends on the existingRed Hat OpenShift Container Platform environment Ensure that you install monitoring for
Red Hat OpenStack Platform 160 Service Telemetry Framework
6
Red Hat OpenStack Platform on a platform separate from your Red Hat OpenStack Platformenvironment You can install Red Hat OpenShift Container Platform (OCP) on baremetal or othersupported cloud platforms For more information about installing OCP see OpenShift ContainerPlatform 43 Documentation
The size of your OCP environment depends on the infrastructure you select For more informationabout minimum resources requirements when installing OCP on baremetal see Minimum resourcerequirements in the Installing a cluster on bare metal guide For installation requirements of the variouspublic and private cloud platforms which you can install see the corresponding installationdocumentation for your cloud platform of choice
CHAPTER 1 INTRODUCTION TO SERVICE TELEMETRY FRAMEWORK
7
CHAPTER 2 INSTALLING THE CORE COMPONENTS OFSERVICE TELEMETRY FRAMEWORK
Before you install Service Telemetry Framework (STF) ensure thatRed Hat OpenShift Container Platform (OCP) version 4x is running and that you understand the corecomponents of the framework As part of the OCP installation planning process ensure that theadministrator provides persistent storage and enough resources to run the STF components on top ofthe OCP environment
WARNING
Red Hat OpenShift Container Platform version 43 or later is currently required for asuccessful installation of STF
21 THE CORE COMPONENTS OF STF
The following STF core components are managed by Operators
Prometheus and AlertManager
ElasticSearch
Smart Gateway
AMQ Interconnect
Each component has a corresponding Operator that you can use to load the various applicationcomponents and objects
Additional resources
For more information about Operators see the Understanding Operators guide
22 PREPARING YOUR OCP ENVIRONMENT FOR STF
As you prepare your OCP environment for STF you must plan for persistent storage adequateresources and event storage
Ensure that persistent storage is available in your Red Hat OpenShift Container Platform clusterto permit a production grade deployment For more information see Section 221 ldquoPersistentvolumesrdquo
Ensure that enough resources are available to run the Operators and the application containersFor more information see Section 222 ldquoResource allocationrdquo
To install ElasticSearch you must use a community catalog source If you do not want to use acommunity catalog or if you do not want to store events see Section 23 ldquoDeploying STF to theOCP environmentrdquo
STF uses ElasticSearch to store events which requires a larger than normal
Red Hat OpenStack Platform 160 Service Telemetry Framework
8
vmmax_map_count The vmmax_map_count value is set by default inRed Hat OpenShift Container Platform For more information about how to edit the value of vmmax_map_count see Section 223 ldquoNode tuning operatorrdquo
221 Persistent volumes
STF uses persistent storage in OCP to instantiate the volumes dynamically so that Prometheus andElasticSearch can store metrics and events
Additional resources
For more information about configuring persistent storage for OCP see Understanding persistentstorage
2211 Using ephemeral storage
WARNING
You can use ephemeral storage with STF However if you use ephemeral storageyou might experience data loss if a pod is restarted updated or rescheduled ontoanother node Use ephemeral storage only for development or testing and notproduction environments
Procedure
To enable ephemeral storage for STF set storageEphemeralEnabled true in your ServiceTelemetry manifest
Additional resources
For more information about enabling ephemeral storage for STF see Section 461 ldquoConfiguringephemeral storagerdquo
222 Resource allocation
To enable the scheduling of pods within the OCP infrastructure you need resources for thecomponents that are running If you do not allocate enough resources pods remain in a Pending statebecause they cannot be scheduled
The amount of resources that you require to run STF depends on your environment and the number ofnodes and clouds that you want to monitor
Additional resources
For recommendations about sizing for metrics collection seehttpsaccessredhatcomarticles4907241
For information about sizing requirements for ElasticSearch seehttpswwwelasticcoguideencloud-on-k8scurrentk8s-managing-compute-resourceshtml
223 Node tuning operator
CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK
9
STF uses ElasticSearch to store events which requires a larger than normal vmmax_map_count The vmmax_map_count value is set by default in Red Hat OpenShift Container Platform
If you want to edit the value of vmmax_map_count you cannot apply node tuning manually using the sysctl command because Red Hat OpenShift Container Platform manages nodes directly To configurevalues and apply them to the infrastructure you must use the node tuning operator For moreinformation see Using the Node Tuning Operator
In an OCP deployment the default node tuning operator specification provides the required profiles forElasticSearch workloads or pods scheduled on nodes To view the default cluster node tuningspecification run the following command
The output of the default specification is documented at Default profiles set on a cluster Theassignment of profiles is managed in the recommend section where profiles are applied to a node whencertain conditions are met When scheduling ElasticSearch to a node in STF one of the following profilesis applied
openshift-control-plane-es
openshift-node-es
When scheduling an ElasticSearch pod there must be a label present that matches tunedopenshiftioelasticsearch If the label is present one of the two profiles is assigned to the podNo action is required by the administrator if you use the recommended Operator for ElasticSearch Ifyou use a custom-deployed ElasticSearch with STF ensure that you add the tunedopenshiftioelasticsearch label to all scheduled pods
Additional resources
For more information about virtual memory usage by ElasticSearch seehttpswwwelasticcoguideenelasticsearchreferencecurrentvm-max-map-counthtml
For more information about how the profiles are applied to nodes see Custom tuning specification
23 DEPLOYING STF TO THE OCP ENVIRONMENT
You can deploy STF to the OCP environment in one of two ways
Deploy STF and store events with ElasticSearch For more information see Section 231ldquoDeploying STF to the OCP environment with ElasticSearchrdquo
Deploy STF without ElasticSearch and disable events support For more information seeSection 232 ldquoDeploying STF to the OCP environment without ElasticSearchrdquo
231 Deploying STF to the OCP environment with ElasticSearch
Complete the following tasks
1 Section 233 ldquoCreating a namespacerdquo
2 Section 234 ldquoCreating an OperatorGrouprdquo
3 Section 235 ldquoEnabling the OperatorHubio Community Catalog Sourcerdquo
oc get Tuneddefault -o yaml -n openshift-cluster-node-tuning-operator
Red Hat OpenStack Platform 160 Service Telemetry Framework
10
4 Section 236 ldquoEnabling Red Hat STF Operator Sourcerdquo
5 Section 237 ldquoSubscribing to the AMQ Certificate Manager Operatorrdquo
6 Section 238 ldquoSubscribing to the Elastic Cloud on Kubernetes Operatorrdquo
7 Section 239 ldquoSubscribing to the Service Telemetry Operatorrdquo
8 Section 2310 ldquoCreating a ServiceTelemetry object in OCPrdquo
232 Deploying STF to the OCP environment without ElasticSearch
Complete the following tasks
1 Section 233 ldquoCreating a namespacerdquo
2 Section 234 ldquoCreating an OperatorGrouprdquo
3 Section 236 ldquoEnabling Red Hat STF Operator Sourcerdquo
4 Section 237 ldquoSubscribing to the AMQ Certificate Manager Operatorrdquo
5 Section 239 ldquoSubscribing to the Service Telemetry Operatorrdquo
6 Section 2310 ldquoCreating a ServiceTelemetry object in OCPrdquo
233 Creating a namespace
Create a namespace to hold the STF components The service-telemetry namespace is usedthroughout the documentation
Procedure
Enter the following command
234 Creating an OperatorGroup
Create an OperatorGroup in the namespace so that you can schedule the Operator pods
Procedure
Enter the following command
oc new-project service-telemetry
oc apply -f - ltltEOFapiVersion operatorscoreoscomv1kind OperatorGroupmetadata name service-telemetry-operator-group namespace service-telemetryspec targetNamespaces - service-telemetryEOF
CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK
11
Additional resources
For more information see OperatorGroups
235 Enabling the OperatorHubio Community Catalog Source
Before you install ElasticSearch you must have access to the resources on the OperatorHubioCommunity Catalog Source
Procedure
Enter the following command
236 Enabling Red Hat STF Operator Source
Before you deploy STF on Red Hat OpenShift Container Platform you must enable the operatorsource
Procedure
1 Install an OperatorSource that contains the Service Telemetry Operator and the SmartGateway Operator
2 To validate the creation of your OperatorSource use the oc get operatorsources command A
oc apply -f - ltltEOFapiVersion operatorscoreoscomv1alpha1kind CatalogSourcemetadata name operatorhubio-operators namespace openshift-marketplacespec sourceType grpc image quayiooperator-frameworkupstream-community-operatorslatest displayName OperatorHubio Operators publisher OperatorHubioEOF
oc apply -f - ltltEOFapiVersion operatorscoreoscomv1kind OperatorSourcemetadata labels opsrc-provider redhat-operators-stf name redhat-operators-stf namespace openshift-marketplacespec authorizationToken displayName Red Hat STF Operators endpoint httpsquayiocnr publisher Red Hat registryNamespace redhat-operators-stf type appregistryEOF
Red Hat OpenStack Platform 160 Service Telemetry Framework
12
2 To validate the creation of your OperatorSource use the oc get operatorsources command Asuccessful import results in the MESSAGE field returning a result of The object has been successfully reconciled
3 To validate that the Operators are available from the catalog use the oc get packagemanifestcommand
237 Subscribing to the AMQ Certificate Manager Operator
You must subscribe to the AMQ Certificate Manager Operator before you deploy the other STFcomponents because the AMQ Certificate Manager Operator runs globally-scoped and is notcompatible with the dependency management of Operator Lifecycle Manager when used with othernamespace-scoped operators
Procedure
1 Subscribe to the AMQ Certificate Manager Operator create the subscription and validate theAMQ7 Certificate Manager
NOTE
The AMQ Certificate Manager is installed globally for all namespaces so the namespace value provided is openshift-operators You might not see your amq7-cert-managerv100 ClusterServiceVersion in the service-telemetrynamespace for a few minutes until the processing executes against thenamespace
2 To validate your ClusterServiceVersion use the oc get csv command Ensure that amq7-cert-
$ oc get -nopenshift-marketplace operatorsource redhat-operators-stf
NAME TYPE ENDPOINT REGISTRY DISPLAYNAME PUBLISHER STATUS MESSAGEredhat-operators-stf appregistry httpsquayiocnr redhat-operators-stf Red Hat STF Operators Red Hat Succeeded The object has been successfully reconciled
$ oc get packagemanifests | grep Red Hat STF
smartgateway-operator Red Hat STF Operators 2m50sservicetelemetry-operator Red Hat STF Operators 2m50s
oc apply -f - ltltEOFapiVersion operatorscoreoscomv1alpha1kind Subscriptionmetadata name amq7-cert-manager namespace openshift-operatorsspec channel alpha installPlanApproval Automatic name amq7-cert-manager source redhat-operators sourceNamespace openshift-marketplaceEOF
CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK
13
2. To validate your ClusterServiceVersion, use the oc get csv command. Ensure that amq7-cert-manager.v1.0.0 has a phase of Succeeded:

2.3.8. Subscribing to the Elastic Cloud on Kubernetes Operator

Before you install the Service Telemetry Operator, and if you plan to store events in ElasticSearch, you must enable the Elastic Cloud on Kubernetes Operator.

Procedure

1. Apply the following manifest to your OCP environment to enable the Elastic Cloud on Kubernetes Operator:

2. To verify that the ClusterServiceVersion for Elastic Cloud on Kubernetes succeeded, enter the oc get csv command:

2.3.9. Subscribing to the Service Telemetry Operator

To instantiate an STF instance, create the ServiceTelemetry object to allow the Service Telemetry Operator to create the environment.

Procedure

1. To create the Service Telemetry Operator subscription, enter the oc apply -f command:
$ oc get --namespace openshift-operators csv

NAME                       DISPLAY                                         VERSION   REPLACES   PHASE
amq7-cert-manager.v1.0.0   Red Hat Integration - AMQ Certificate Manager   1.0.0                Succeeded
oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: elastic-cloud-eck
  namespace: service-telemetry
spec:
  channel: stable
  installPlanApproval: Automatic
  name: elastic-cloud-eck
  source: operatorhubio-operators
  sourceNamespace: openshift-marketplace
EOF
$ oc get csv

NAME                       DISPLAY                       VERSION   REPLACES                   PHASE
elastic-cloud-eck.v1.1.0   Elastic Cloud on Kubernetes   1.1.0     elastic-cloud-eck.v1.0.1   Succeeded
oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: servicetelemetry-operator
  namespace: service-telemetry
spec:
  channel: stable
  installPlanApproval: Automatic
  name: servicetelemetry-operator
  source: redhat-operators-stf
  sourceNamespace: openshift-marketplace
EOF
2. To validate the Service Telemetry Operator and the dependent operators, enter the following command:

2.3.10. Creating a ServiceTelemetry object in OCP

To deploy the Service Telemetry Framework, you must create an instance of ServiceTelemetry in OCP. By default, eventsEnabled is set to false. If you do not want to store events in ElasticSearch, ensure that eventsEnabled is set to false. For more information, see Section 2.3.2, "Deploying STF to the OCP environment without ElasticSearch".

The following core parameters are available for a ServiceTelemetry manifest:
Table 2.1. Core parameters for a ServiceTelemetry manifest

Parameter        Description                                                              Default Value
eventsEnabled    Enable events support in STF. Requires prerequisite steps to ensure     false
                 ElasticSearch can be started. For more information, see Section 2.3.8,
                 "Subscribing to the Elastic Cloud on Kubernetes Operator".
metricsEnabled   Enable metrics support in STF.                                          true
$ oc get csv --namespace service-telemetry
NAME                                DISPLAY                                         VERSION   REPLACES                            PHASE
amq7-cert-manager.v1.0.0            Red Hat Integration - AMQ Certificate Manager   1.0.0                                         Succeeded
amq7-interconnect-operator.v1.2.0   Red Hat Integration - AMQ Interconnect          1.2.0                                         Succeeded
elastic-cloud-eck.v1.1.0            Elastic Cloud on Kubernetes                     1.1.0     elastic-cloud-eck.v1.0.1            Succeeded
prometheusoperator.0.37.0           Prometheus Operator                             0.37.0    prometheusoperator.0.32.0           Succeeded
service-telemetry-operator.v1.0.2   Service Telemetry Operator                      1.0.2     service-telemetry-operator.v1.0.1   Succeeded
smart-gateway-operator.v1.0.1       Smart Gateway Operator                          1.0.1     smart-gateway-operator.v1.0.0       Succeeded
highAvailabilityEnabled   Enable high availability in STF. For more information, see     false
                          Section 4.3, "High availability".
storageEphemeralEnabled   Enable ephemeral storage support in STF. For more              false
                          information, see Section 4.6, "Ephemeral storage".
Procedure
1. To store events in ElasticSearch, set eventsEnabled to true during deployment:

2. To view the STF deployment logs in the Service Telemetry Operator, use the oc logs command:
PLAY RECAP ***
localhost : ok=37  changed=0  unreachable=0  failed=0  skipped=1  rescued=0  ignored=0
3. View the pods and the status of each pod to determine that all workloads are operating nominally:

NOTE

If you set eventsEnabled: true, the notification Smart Gateways will Error and CrashLoopBackOff for a period of time before ElasticSearch starts.
oc apply -f - <<EOF
apiVersion: infra.watch/v1alpha1
kind: ServiceTelemetry
metadata:
  name: stf-default
  namespace: service-telemetry
spec:
  eventsEnabled: true
  metricsEnabled: true
EOF
oc logs $(oc get pod --selector=name=service-telemetry-operator -oname) -c ansible
$ oc get pods
NAME                                                              READY   STATUS    RESTARTS   AGE
alertmanager-stf-default-0                                        2/2     Running   0          26m
elastic-operator-645dc8b8ff-jwnzt                                 1/1     Running   0          88m
elasticsearch-es-default-0                                        1/1     Running   0          26m
interconnect-operator-6fd49d9fb9-4bl92                            1/1     Running   0          46m
prometheus-operator-bf7d97fb9-kwnlx                               1/1     Running   0          46m
prometheus-stf-default-0                                          3/3     Running   0          26m
service-telemetry-operator-54f4c99d9b-k7ll6                       2/2     Running   0          46m
smart-gateway-operator-7ff58bcf94-66rvx                           2/2     Running   0          46m
stf-default-ceilometer-notification-smartgateway-6675df547q4lbj   1/1     Running   0          26m
stf-default-collectd-notification-smartgateway-698c87fbb7-xj528   1/1     Running   0          26m
stf-default-collectd-telemetry-smartgateway-79c967c8f7-9hsqn      1/1     Running   0          26m
stf-default-interconnect-7458fd4d69-nqbfs                         1/1     Running   0          26m
2.4. REMOVING STF FROM THE OCP ENVIRONMENT

Remove STF from an OCP environment if you no longer require the STF functionality.

Complete the following tasks:

1. Section 2.4.1, "Deleting the namespace"

2. Section 2.4.2, "Removing the OperatorSource"

2.4.1. Deleting the namespace

To remove the operational resources for STF from OCP, delete the namespace.

Procedure

1. Run the oc delete command:

2. Verify that the resources have been deleted from the namespace:

2.4.2. Removing the OperatorSource

If you do not expect to install Service Telemetry Framework again, delete the OperatorSource. When you remove the OperatorSource, PackageManifests related to STF are removed from the Operator Lifecycle Manager catalog.

Procedure

1. Delete the OperatorSource:
oc delete project service-telemetry
$ oc get all
No resources found.
2. Verify that the STF PackageManifests are removed from the platform. If successful, the following command returns no result:

3. If you enabled the OperatorHub.io Community Catalog Source during the installation process, and you no longer need this catalog source, delete it:

Additional resources

For more information about the OperatorHub.io Community Catalog Source, see Section 2.3, "Deploying STF to the OCP environment".
$ oc delete --namespace=openshift-marketplace operatorsource redhat-operators-stf
operatorsource.operators.coreos.com "redhat-operators-stf" deleted
$ oc get packagemanifests | grep "Red Hat STF"
$ oc delete --namespace=openshift-marketplace catalogsource operatorhubio-operators
catalogsource.operators.coreos.com "operatorhubio-operators" deleted
CHAPTER 3. COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION

3.1. CONNECTING RED HAT OPENSTACK PLATFORM TO SERVICE TELEMETRY FRAMEWORK

To collect metrics, events, or both, and to send them to the Service Telemetry Framework (STF) storage domain, you must configure the Red Hat OpenStack Platform overcloud to enable data collection and transport.

To deploy data collection and transport to STF on Red Hat OpenStack Platform cloud nodes that employ routed L3 domains, such as distributed compute node (DCN) or spine-leaf, see Section 3.2, "Deploying to non-standard network topologies".

3.2. DEPLOYING TO NON-STANDARD NETWORK TOPOLOGIES

If your nodes are on a separate network from the default InternalApi network, you must make configuration adjustments so that AMQ Interconnect can transport data to the Service Telemetry Framework (STF) server instance. This scenario is typical in a spine-leaf or a DCN topology. For more information about DCN configuration, see the Spine Leaf Networking guide.

If you use STF with Red Hat OpenStack Platform 16.0 and plan to monitor your Ceph, Block, or Object storage nodes, you must make configuration changes that are similar to the configuration changes that you make to the spine-leaf and DCN network configuration. To monitor Ceph nodes, use the CephStorageExtraConfig parameter to define which network interface to load into the AMQ Interconnect and collectd configuration files.

Similarly, you must specify the BlockStorageExtraConfig and ObjectStorageExtraConfig parameters if your environment uses Block and Object storage roles.

The deployment of a spine-leaf topology involves creating roles and networks, then assigning those networks to the available roles. When you configure data collection and transport for STF for a Red Hat OpenStack Platform deployment, the default network for roles is InternalApi. For Ceph, Block, and Object storage roles, the default network is Storage. Because a spine-leaf configuration can result in different networks being assigned to different Leaf groupings, and those names are typically unique, additional configuration is required in the parameter_defaults section of the Red Hat OpenStack Platform environment files.

Procedure

1. Document which networks are available for each of the Leaf roles. For examples of network name definitions, see Creating a network data file in the Spine Leaf Networking guide. For more information about the creation of the Leaf groupings (roles) and assignment of the networks to those groupings, see Creating a roles data file in the Spine Leaf Networking guide.
CephStorageExtraConfig:
  tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('storage')}"
  tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('storage')}"
  tripleo::profile::base::ceilometer::agent::notification::notifier_host_addr: "%{hiera('storage')}"
2. Add the following configuration example to the ExtraConfig section for each of the leaf roles. In this example, internal_api_subnet is the value defined in the name_lower parameter of your network definition (with _subnet appended to the name for Leaf 0), and is the network to which the ComputeLeaf0 leaf role is connected. In this case, the network identification of 0 corresponds to the Compute role for leaf 0, and represents a value that is different from the default internal API network name.

For the ComputeLeaf0 leaf role, specify extra configuration to perform a hiera lookup to determine which network interface for a particular network to assign to the collectd AMQP host parameter. Perform the same configuration for the AMQ Interconnect listener address parameter.

Additional leaf roles typically replace _subnet with _leafN, where N represents a unique identifier for the leaf.

This example configuration is on a CephStorage leaf role:

3.3. CONFIGURING RED HAT OPENSTACK PLATFORM OVERCLOUD FOR SERVICE TELEMETRY FRAMEWORK

To configure the Red Hat OpenStack Platform overcloud, you must configure the data collection applications and the data transport to STF, and deploy the overcloud.

To configure the Red Hat OpenStack Platform overcloud, complete the following tasks:

1. Section 3.3.1, "Retrieving the AMQ Interconnect route address"

2. Section 3.3.2, "Configuring the STF connection for the overcloud"

3. Section 3.3.3, "Validating client-side installation"

3.3.1. Retrieving the AMQ Interconnect route address

When you configure the Red Hat OpenStack Platform overcloud for STF, you must provide the AMQ Interconnect route address in the STF connection file.

Procedure

1. Log in to your Red Hat OpenShift Container Platform (OCP) environment.

2. In the service-telemetry project, retrieve the AMQ Interconnect route address:
ComputeLeaf0ExtraConfig:
  tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('internal_api_subnet')}"
  tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('internal_api_subnet')}"

ComputeLeaf1ExtraConfig:
  tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('internal_api_leaf1')}"
  tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('internal_api_leaf1')}"

CephStorageLeaf0ExtraConfig:
  tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('storage_subnet')}"
  tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('storage_subnet')}"
NOTE

If your STF installation differs from the documentation, ensure that you retrieve the correct AMQ Interconnect route address.

3.3.2. Configuring the STF connection for the overcloud

To configure the STF connection, you must create a file that contains the connection configuration of the AMQ Interconnect for the overcloud to the STF deployment. Enable the collection of events and storage of the events in STF, and deploy the overcloud.

Procedure

1. Log in to the Red Hat OpenStack Platform undercloud as the stack user.

2. Create a configuration file called stf-connectors.yaml in the /home/stack directory.

IMPORTANT

The Service Telemetry Operator simplifies the deployment of all data ingestion and data storage components for single cloud deployments. To share the data storage domain with multiple clouds, see Section 4.5, "Configuring multiple clouds".

3. In the stf-connectors.yaml file, configure the MetricsQdrConnectors address to connect the AMQ Interconnect on the overcloud to the STF deployment.

Add the CeilometerQdrPublishEvents: true parameter to enable collection and transport of Ceilometer events to STF.

Replace the host parameter with the value of HOST/PORT that you retrieved in Section 3.3.1, "Retrieving the AMQ Interconnect route address".

4. Add the following files to your Red Hat OpenStack Platform director deployment to set up collectd and AMQ Interconnect:

the stf-connectors.yaml environment file

the enable-stf.yaml file that ensures that the environment is being used during the overcloud deployment
$ oc get routes -ogo-template='{{ range .items }}{{printf "%s\n" .spec.host }}{{ end }}' | grep "\-5671"
stf-default-interconnect-5671-service-telemetry.apps.infra.watch
parameter_defaults:
  EventPipelinePublishers: []
  CeilometerQdrPublishEvents: true
  MetricsQdrConnectors:
  - host: stf-default-interconnect-5671-service-telemetry.apps.infra.watch
    port: 443
    role: edge
    sslProfile: sslProfile
    verifyHostname: false
the ceilometer-write-qdr.yaml file that ensures that Ceilometer telemetry is sent to STF

5. Deploy the Red Hat OpenStack Platform overcloud:

3.3.3. Validating client-side installation

To validate data collection from the STF storage domain, query the data sources for delivered data. To validate individual nodes in the Red Hat OpenStack Platform deployment, connect to the console using SSH.

Procedure

1. Log in to an overcloud node, for example, controller-0.

2. Ensure that the metrics_qdr container is running on the node:

3. Return the internal network address on which AMQ Interconnect is running, for example, 172.17.1.44 listening on port 5666:

4. Return a list of connections to the local AMQ Interconnect:
openstack overcloud deploy <other arguments> \
  --templates /usr/share/openstack-tripleo-heat-templates \
  --environment-file <other-environment-files> \
  --environment-file /usr/share/openstack-tripleo-heat-templates/environments/metrics/ceilometer-write-qdr.yaml \
  --environment-file /usr/share/openstack-tripleo-heat-templates/environments/enable-stf.yaml \
  --environment-file /home/stack/stf-connectors.yaml
$ sudo podman container inspect --format '{{.State.Status}}' metrics_qdr
running
$ sudo podman exec -it metrics_qdr cat /etc/qpid-dispatch/qdrouterd.conf

listener {
    host: 172.17.1.44
    port: 5666
    authenticatePeer: no
    saslMechanisms: ANONYMOUS
}
$ sudo podman exec -it metrics_qdr qdstat --bus=172.17.1.44:5666 --connections

Connections
  id   host                                                                   container                                                                                                   role    dir  security                             authentication  tenant
  ====================================================================================================================================================================================================
  1    stf-default-interconnect-5671-service-telemetry.apps.infra.watch:443   stf-default-interconnect-7458fd4d69-bgzfb                                                                   edge    out  TLSv1.2(DHE-RSA-AES256-GCM-SHA384)   anonymous-user
  12   172.17.1.44:60290                                                      openstack.org/om/container/controller-0/ceilometer-agent-notification/25/5c02cee550f143ec9ea030db5cccba14   normal  in   no-security                          no-auth
  16   172.17.1.44:36408                                                      metrics                                                                                                     normal  in   no-security                          anonymous-user
  899  172.17.1.44:39500                                                      10a2e99d-1b8a-4329-b48c-4335e5f75c84                                                                        normal  in   no-security                          no-auth
There are four connections:

Outbound connection to STF

Inbound connection from collectd

Inbound connection from ceilometer

Inbound connection from our qdstat client

The outbound STF connection is provided to the MetricsQdrConnectors host parameter and is the route for the STF storage domain. The other hosts are internal network addresses of the client connections to this AMQ Interconnect.

5. To ensure that messages are being delivered, list the links, and view the _edge address in the deliv column for delivery of messages:
$ sudo podman exec -it metrics_qdr qdstat --bus=172.17.1.44:5666 --links
Router Links
  type      dir  conn id  id    peer  class   addr                  phs  cap  pri  undel  unsett  deliv    presett  psdrop  acc  rej  rel  mod  delay  rate
  ===========================================================================================================================================================
  endpoint  out  1        5           local   _edge                      250  0    0      0       2979926  2979924  0       0    0    2    0    0      0
  endpoint  in   1        6                                              250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  in   1        7                                              250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  out  1        8                                              250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  in   1        9                                              250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  out  1        10                                             250  0    0      0       911      911      0       0    0    0    0    911    0
  endpoint  in   1        11                                             250  0    0      0       0        911      0       0    0    0    0    0      0
  endpoint  out  12       32          local   temp.lSY6Mcicol4J2Kp       250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  in   16       41                                             250  0    0      0       2979924  2979924  0       0    0    0    0    0      0
  endpoint  in   912      1834        mobile  $management           0    250  0    0      0       1        0        0       1    0    0    0    0      0
  endpoint  out  912      1835        local   temp.9Ok2resI9tmt+CT       250  0    0      0       0        0        0       0    0    0    0    0      0
6. To list the addresses from Red Hat OpenStack Platform nodes to STF, connect to OCP to get the AMQ Interconnect pod name and list the connections. List the available AMQ Interconnect pods:

7. Connect to the pod and run the qdstat --connections command to list the known connections:

In this example, there are three edge connections from the Red Hat OpenStack Platform nodes, with connection id 22, 23, and 24.

8. To view the number of messages delivered by the network, use each address with the oc exec command:
$ oc get pods -l application=stf-default-interconnect
NAME                                        READY   STATUS    RESTARTS   AGE
stf-default-interconnect-7458fd4d69-bgzfb   1/1     Running   0          6d21h
$ oc exec -it stf-default-interconnect-7458fd4d69-bgzfb -- qdstat --connections
2020-04-21 18:25:47.243852 UTC
stf-default-interconnect-7458fd4d69-bgzfb

Connections
  id  host                container                                                               role    dir  security                                 authentication  tenant  last dlv      uptime
  =======================================================================================================================================================================================================
  1   10.129.2.21:43062   rcv[stf-default-collectd-telemetry-smartgateway-79c967c8f7-kq4qv]       normal  in   no-security                              anonymous-user          000:00:00:00  006:21:50:25
  2   10.130.0.52:55754   rcv[stf-default-ceilometer-notification-smartgateway-6675df547mbjk5]    normal  in   no-security                              anonymous-user          000:02:12:55  006:21:49:36
  3   10.130.0.51:43110   rcv[stf-default-collectd-notification-smartgateway-698c87fbb7-f28v6]    normal  in   no-security                              anonymous-user          000:02:13:65  006:21:49:09
  22  10.128.0.1:51948    Router.ceph-0.redhat.local                                              edge    in   TLSv1/SSLv3(DHE-RSA-AES256-GCM-SHA384)   anonymous-user          000:00:00:03  000:22:08:43
  23  10.128.0.1:51950    Router.compute-0.redhat.local                                           edge    in   TLSv1/SSLv3(DHE-RSA-AES256-GCM-SHA384)   anonymous-user          000:00:00:03  000:22:08:43
  24  10.128.0.1:52082    Router.controller-0.redhat.local                                        edge    in   TLSv1/SSLv3(DHE-RSA-AES256-GCM-SHA384)   anonymous-user          000:00:00:00  000:22:08:34
  27  127.0.0.1:42202     c2f541c1-4c97-4b37-a189-a396c08fb079                                    normal  in   no-security                              no-auth                 000:00:00:00  000:00:00:00
$ oc exec -it stf-default-interconnect-7458fd4d69-bgzfb -- qdstat --address
2020-04-21 18:20:10.293258 UTC
stf-default-interconnect-7458fd4d69-bgzfb

Router Addresses
  class   addr                              phs  distrib    pri  local  remote  in       out      thru  fallback
  ====================================================================================================================
  mobile  anycast/ceilometer/event.sample   0    balanced   -    1      0       1553     1553     0     0
  mobile  collectd/notify                   0    multicast  -    1      0       10       10       0     0
  mobile  collectd/telemetry                0    multicast  -    1      0       7798049  7798049  0     0
CHAPTER 4. ADVANCED FEATURES

The following optional features can provide additional functionality to the Service Telemetry Framework (STF):

Customizing the deployment. For more information, see Section 4.1, "Customizing the deployment".

Alerts. For more information, see Section 4.2, "Alerts".

High availability. For more information, see Section 4.3, "High availability".

Dashboards. For more information, see Section 4.4, "Dashboards".

Multiple clouds. For more information, see Section 4.5, "Configuring multiple clouds".

Ephemeral storage. For more information, see Section 4.6, "Ephemeral storage".

4.1. CUSTOMIZING THE DEPLOYMENT

The Service Telemetry Operator watches for a ServiceTelemetry manifest to load into Red Hat OpenShift Container Platform (OCP). The Operator then creates other objects in memory, which results in the dependent Operators creating the workloads they are responsible for managing.
WARNING

When you override the manifest, you must provide the entire manifest contents, including object names or namespaces. There is no dynamic parameter substitution when you override a manifest.

To override a manifest successfully with Service Telemetry Framework (STF), deploy a default environment using the core options only. For more information about the core options, see Section 2.3.10, "Creating a ServiceTelemetry object in OCP". When you deploy STF, use the oc get command to retrieve the default deployed manifest. When you use a manifest that was originally generated by Service Telemetry Operator, the manifest is compatible with the other objects that are managed by the Operators.
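For example, assuming the default deployment name stf-default, the following sketch captures the deployed manifest to a local file that you can use as the starting point for an override; the output file name is illustrative:

$ oc get servicetelemetry stf-default -oyaml > stf-default-deployed.yaml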
For example, when the metricsEnabled: true parameter is configured in the ServiceTelemetry manifest, the Service Telemetry Operator requests components for metrics retrieval and storage using the default manifests. In some cases, you might want to override the default manifest. For more information, see Section 4.1.1, "Manifest override parameters".

4.1.1. Manifest override parameters

This table describes the available parameters that you can use to override a manifest, along with the corresponding retrieval commands.

Table 4.1. Manifest override parameters
alertmanagerManifest
  Override the Alertmanager object creation. The Prometheus Operator watches for Alertmanager objects.
  Retrieval command: oc get alertmanager stf-default -oyaml

alertmanagerConfigManifest
  Override the Secret that contains the Alertmanager configuration. The Prometheus Operator uses a secret named alertmanager-<alertmanager-name>, for example, alertmanager-stf-default, to provide the alertmanager.yaml configuration to Alertmanager.
  Retrieval command: oc get secret alertmanager-stf-default -oyaml

elasticsearchManifest
  Override the ElasticSearch object creation. The Elastic Cloud on Kubernetes Operator watches for ElasticSearch objects.
  Retrieval command: oc get elasticsearch elasticsearch -oyaml

interconnectManifest
  Override the Interconnect object creation. The AMQ Interconnect Operator watches for Interconnect objects.
  Retrieval command: oc get interconnect stf-default-interconnect -oyaml

prometheusManifest
  Override the Prometheus object creation. The Prometheus Operator watches for Prometheus objects.
  Retrieval command: oc get prometheus stf-default -oyaml

servicemonitorManifest
  Override the ServiceMonitor object creation. The Prometheus Operator watches for ServiceMonitor objects.
  Retrieval command: oc get servicemonitor stf-default -oyaml

smartgatewayCollectdMetricsManifest
  Override the SmartGateway object creation for collectd metrics. The Smart Gateway Operator watches for SmartGateway objects.
  Retrieval command: oc get smartgateway stf-default-collectd-telemetry -oyaml

smartgatewayCollectdEventsManifest
  Override the SmartGateway object creation for collectd events. The Smart Gateway Operator watches for SmartGateway objects.
  Retrieval command: oc get smartgateway stf-default-collectd-notification -oyaml

smartgatewayCeilometerEventsManifest
  Override the SmartGateway object creation for Ceilometer events. The Smart Gateway Operator watches for SmartGateway objects.
  Retrieval command: oc get smartgateway stf-default-ceilometer-notification -oyaml
4.1.2. Overriding a managed manifest

Edit the ServiceTelemetry object and provide a parameter and manifest. For a list of available manifest override parameters, see Section 4.1, "Customizing the deployment". The default ServiceTelemetry object is stf-default. Use oc get servicetelemetry to list the available STF deployments.

TIP

The oc edit command loads the default system editor. To override the default editor, pass or set the environment variable EDITOR to the preferred editor, for example, EDITOR=nano oc edit servicetelemetry stf-default.
Procedure

1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

3. Load the ServiceTelemetry object into an editor:

4. To modify the ServiceTelemetry object, provide a manifest override parameter and the contents of the manifest to write to OCP instead of the defaults provided by STF.

NOTE

The trailing pipe (|) after entering the manifest override parameter indicates that the value provided is multi-line.
oc project service-telemetry
oc edit servicetelemetry stf-default
$ oc edit servicetelemetry stf-default
apiVersion: infra.watch/v1alpha1
kind: ServiceTelemetry
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"infra.watch/v1alpha1","kind":"ServiceTelemetry","metadata":{"annotations":{},"name":"stf-default","namespace":"service-telemetry"},"spec":{"metricsEnabled":true}}
  creationTimestamp: "2020-04-14T20:29:42Z"
  generation: 1
  name: stf-default
  namespace: service-telemetry
  resourceVersion: "1949423"
  selfLink: /apis/infra.watch/v1alpha1/namespaces/service-telemetry/servicetelemetrys/stf-default
  uid: d058bc41-1bb0-49f5-9a8b-642f4b8adb95
spec:
  metricsEnabled: true
  smartgatewayCollectdMetricsManifest: | 1
    apiVersion: smartgateway.infra.watch/v2alpha1
    kind: SmartGateway
    metadata:
      name: stf-default-collectd-telemetry
      namespace: service-telemetry
    spec:
      amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/telemetry
      debug: true
      prefetch: 15000
      serviceType: metrics
      size: 1
      useTimestamp: true 2
status:
  conditions:
  - ansibleResult:
      changed: 0
      completion: 2020-04-14T20:32:19.079508
      failures: 0
      ok: 52
      skipped: 1
    lastTransitionTime: "2020-04-14T20:29:59Z"
    message: Awaiting next reconciliation
    reason: Successful
    status: "True"
    type: Running
1  The manifest override parameter is defined in the spec of the ServiceTelemetry object.
2  End of the manifest override content.
5. Save and close.

4.2. ALERTS

You create alert rules in Prometheus and alert routes in Alertmanager. Alert rules in Prometheus servers send alerts to an Alertmanager, which manages the alerts. Alertmanager can silence, inhibit, or aggregate alerts, and send notifications using email, on-call notification systems, or chat platforms.

To create an alert, complete the following tasks:

1. Create an alert rule in Prometheus. For more information, see Section 4.2.1, "Creating an alert rule in Prometheus".
2. Create an alert route in Alertmanager. For more information, see Section 4.2.3, "Creating an alert route in Alertmanager".

Additional resources

For more information about alerts or notifications with Prometheus and Alertmanager, see https://prometheus.io/docs/alerting/overview/

To view an example set of alerts that you can use with Service Telemetry Framework (STF), see https://github.com/infrawatch/service-telemetry-operator/tree/master/deploy/alerts

4.2.1. Creating an alert rule in Prometheus

Prometheus evaluates alert rules to trigger notifications. If the rule condition returns an empty result set, the condition is false. Otherwise, the rule is true and it triggers an alert.
Procedure

1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

3. Create a PrometheusRule object that contains the alert rule. The Prometheus Operator loads the rule into Prometheus:

4. To verify that the rules have been loaded into Prometheus by the Operator, create a pod with access to curl:

5. Run curl to access the prometheus-operated service to return the rules loaded into memory:
oc project service-telemetry
oc apply -f - <<EOF
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  creationTimestamp: null
  labels:
    prometheus: stf-default
    role: alert-rules
  name: prometheus-alarm-rules
  namespace: service-telemetry
spec:
  groups:
    - name: ./openstack.rules
      rules:
        - alert: Metric Listener down
          expr: collectd_qpid_router_status < 1 # To change the rule, edit the value of the expr parameter.
EOF
oc run curl --generator=run-pod/v1 --image=radial/busyboxplus:curl -i --tty

[ root@curl:/ ]$ curl prometheus-operated:9090/api/v1/rules
{"status":"success","data":{"groups":[{"name":"./openstack.rules","file":"/etc/prometheus/rules/prometheus-stf-default-rulefiles-0/service-telemetry-prometheus-alarm-rules.yaml","rules":[{"name":"Metric Listener down","query":"collectd_qpid_router_status \u003c 1","duration":0,"labels":{},"annotations":{},"alerts":[],"health":"ok","type":"alerting"}],"interval":30}]}}
6. To verify that the output shows the rules loaded into the PrometheusRule object, for example, that the output contains the defined ./openstack.rules, exit from the pod:

7. Clean up the environment by deleting the curl pod:

Additional resources

For more information on alerting, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md

4.2.2. Configuring custom alerts

You can add custom alerts to the PrometheusRule object that you created in Section 4.2.1, "Creating an alert rule in Prometheus".

Procedure

1. Use the oc edit command:

2. Edit the PrometheusRules manifest.

3. Save and close.

Additional resources

For more information about configuring alerting rules, see https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/

For more information about PrometheusRules objects, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md

4.2.3. Creating an alert route in Alertmanager

Use Alertmanager to deliver alerts to an external system, such as email, IRC, or other notification channel. The Prometheus Operator manages the Alertmanager configuration as a Red Hat OpenShift Container Platform (OCP) secret. STF by default deploys a basic configuration that results in no receivers:
[ root@curl:/ ]$ exit

$ oc delete pod curl

pod "curl" deleted
oc edit prometheusrules prometheus-alarm-rules
alertmanager.yaml: |-
  global:
    resolve_timeout: 5m
  route:
    group_by: ['job']
    group_wait: 30s
    group_interval: 5m
    repeat_interval: 12h
    receiver: 'null'
  receivers:
  - name: 'null'
To deploy a custom Alertmanager route with STF, you must pass an alertmanagerConfigManifest parameter to the Service Telemetry Operator, which results in an updated secret managed by the Prometheus Operator. For more information, see Section 4.1.2, "Overriding a managed manifest".

Procedure

1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

3. Edit the ServiceTelemetry object for your STF deployment:

4. Add the new parameter alertmanagerConfigManifest and the Secret object contents to define the alertmanager.yaml configuration for Alertmanager:

NOTE

This loads the default template that is already managed by the Service Telemetry Operator. To validate that the changes are populating correctly, change a value, return the alertmanager-stf-default secret, and verify that the new value is loaded into memory, for example, changing the value global.resolve_timeout from 5m to 10m.
oc project service-telemetry
oc edit servicetelemetry stf-default
apiVersion: infra.watch/v1alpha1
kind: ServiceTelemetry
metadata:
  name: stf-default
  namespace: service-telemetry
spec:
  metricsEnabled: true
  alertmanagerConfigManifest: |
    apiVersion: v1
    kind: Secret
    metadata:
      name: 'alertmanager-stf-default'
      namespace: 'service-telemetry'
    type: Opaque
    stringData:
      alertmanager.yaml: |-
        global:
          resolve_timeout: 10m
        route:
          group_by: ['job']
          group_wait: 30s
          group_interval: 5m
          repeat_interval: 12h
          receiver: 'null'
        receivers:
        - name: 'null'
5. Verify that the configuration was applied to the secret:

6. To verify that the configuration has been loaded into Alertmanager, create a pod with access to curl:

7. Run curl against the alertmanager-operated service to retrieve the status and configYAML contents, and review that the supplied configuration matches the configuration loaded into Alertmanager:

8. Verify that the configYAML field contains the expected changes. Exit from the pod:

9. To clean up the environment, delete the curl pod:
$ oc get secret alertmanager-stf-default -o go-template='{{index .data "alertmanager.yaml" | base64decode }}'

global:
  resolve_timeout: 10m
route:
  group_by: ['job']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h
  receiver: 'null'
receivers:
- name: 'null'
oc run curl --generator=run-pod/v1 --image=radial/busyboxplus:curl -i --tty

[ root@curl:/ ]$ curl alertmanager-operated:9093/api/v1/status

{"status":"success","data":{"configYAML":"global:\n  resolve_timeout: 10m\n  http_config: {}\n  smtp_hello: localhost\n  smtp_require_tls: true\n  pagerduty_url: https://events.pagerduty.com/v2/enqueue\n  hipchat_api_url: https://api.hipchat.com/\n  opsgenie_api_url: https://api.opsgenie.com/\n  wechat_api_url: https://qyapi.weixin.qq.com/cgi-bin/\n  victorops_api_url: https://alert.victorops.com/integrations/generic/20131114/alert/\nroute:\n  receiver: \"null\"\n  group_by:\n  - job\n  group_wait: 30s\n  group_interval: 5m\n  repeat_interval: 12h\nreceivers:\n- name: \"null\"\ntemplates: []\n"
[ root@curl:/ ]$ exit

$ oc delete pod curl

pod "curl" deleted
Additional resources

For more information about the Red Hat OpenShift Container Platform secret and the Prometheus operator, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md

4.3. HIGH AVAILABILITY

High availability is the ability of Service Telemetry Framework (STF) to rapidly recover from failures in its component services. Although Red Hat OpenShift Container Platform (OCP) restarts a failed pod if nodes are available to schedule the workload, this recovery process might take more than one minute, during which time events and metrics are lost. A high availability configuration includes multiple copies of STF components, reducing recovery time to approximately 2 seconds. To protect against failure of an OCP node, deploy STF to an OCP cluster with three or more nodes.

NOTE

STF is not yet a fully fault tolerant system. Delivery of metrics and events during the recovery period is not guaranteed.

Enabling high availability has the following effects:

Two AMQ Interconnect pods run instead of the default 1.

Three ElasticSearch pods run instead of the default 1.

Recovery time from a lost pod in either of these services reduces to approximately 2 seconds.
4.3.1. Configuring high availability

To configure STF for high availability, add highAvailabilityEnabled: true to the ServiceTelemetry object in OCP. You can set this parameter at installation time or, if you already deployed STF, complete the following steps.
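If you set the parameter at installation time, a minimal sketch of the ServiceTelemetry manifest, based on the example in Section 2.3.10, "Creating a ServiceTelemetry object in OCP", might look like the following; the eventsEnabled and metricsEnabled values are illustrative:

oc apply -f - <<EOF
apiVersion: infra.watch/v1alpha1
kind: ServiceTelemetry
metadata:
  name: stf-default
  namespace: service-telemetry
spec:
  eventsEnabled: true
  metricsEnabled: true
  highAvailabilityEnabled: true
EOF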
Procedure

1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

3. Use the oc command to edit the ServiceTelemetry object:

4. Add highAvailabilityEnabled: true to the spec section:

5. Save your changes and close the object.
oc project service-telemetry
$ oc edit ServiceTelemetry
spec:
  eventsEnabled: true
  metricsEnabled: true
  highAvailabilityEnabled: true
4.4. DASHBOARDS

Use the third-party application Grafana to visualize system-level metrics gathered by collectd for each individual host node. For more information about configuring collectd, see Section 3.3, "Configuring Red Hat OpenStack Platform overcloud for Service Telemetry Framework".

4.4.1. Setting up Grafana to host the dashboard

Grafana is not included in the default Service Telemetry Framework (STF) deployment, so you must deploy the Grafana Operator from OperatorHub.io.

Procedure

1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

3. Clone the dashboard repository:

4. Deploy the Grafana operator:

5. To verify that the operator launched successfully, run the oc get csv command. If the value of the PHASE column is Succeeded, the operator launched successfully:

6. Launch a Grafana instance:

7. Verify that the Grafana instance deployed:

8. Create the datasource and dashboard resources:
oc project service-telemetry
git clone https://github.com/infrawatch/dashboards
cd dashboards

oc create -f deploy/subscription.yaml
$ oc get csv
NAME                      DISPLAY            VERSION   REPLACES   PHASE
grafana-operator.v3.2.0   Grafana Operator   3.2.0                Succeeded
$ oc create -f deploy/grafana.yaml
$ oc get pod -l app=grafana
NAME                                  READY   STATUS    RESTARTS   AGE
grafana-deployment-7fc7848b56-sbkhv   1/1     Running   0          1m
9. Verify that the resources installed correctly:

10. Navigate to https://<grafana-route-address> in a web browser. Use the oc get routes command to retrieve the Grafana route address:

11. To view the dashboard, click Dashboards and Manage.

Additional resources

For more information about enabling the OperatorHub.io catalog source, see Section 2.3.5, "Enabling the OperatorHub.io Community Catalog Source".

4.4.1.1. Viewing and editing queries

Procedure

1. Log in to Red Hat OpenShift Container Platform. To view and edit queries, log in as the admin user.

2. Change to the service-telemetry namespace:

3. To retrieve the default username and password, describe the Grafana object using oc describe:

4.4.2. The Grafana infrastructure dashboard

The infrastructure dashboard shows metrics for a single node at a time. Select a node from the upper left corner of the dashboard.

4.4.2.1. Top panels
oc create -f deploy/datasource.yaml -f deploy/rhos-dashboard.yaml

$ oc get grafanadashboards

NAME             AGE
rhos-dashboard   7d21h

$ oc get grafanadatasources
NAME                                  AGE
service-telemetry-grafanadatasource   1m
oc get routes
oc project service-telemetry
oc describe grafana service-telemetry-grafana
Title                   Unit          Description
Current Global Alerts   -             Current alerts fired by Prometheus
Recent Global Alerts    -             Recently fired alerts in 5m time steps
Status Panel            -             Node status: up, down, unavailable
Uptime                  s/m/h/d/M/Y   Total operational time of node
CPU Cores               cores         Total number of cores
Memory                  bytes         Total memory
Disk Size               bytes         Total storage size
Processes               processes     Total number of processes listed by type
Load Average            processes     Load average represents the average number of running and uninterruptible processes residing in the kernel execution queue
4.4.2.2. Networking panels

Panels that display the network interfaces of the node.

Panel                                     Unit       Description
Physical Interfaces Ingress Errors        errors     Total errors with incoming data
Physical Interfaces Egress Errors         errors     Total errors with outgoing data
Physical Interfaces Ingress Error Rates   errors/s   Rate of incoming data errors
Physical Interfaces Egress Error Rates    errors/s   Rate of outgoing data errors
Physical Interfaces Packets Ingress       pps        Incoming packets per second
Physical Interfaces Packets Egress        pps        Outgoing packets per second
Physical Interfaces Data Ingress          bytes/s    Incoming data rates
Physical Interfaces Data Egress           bytes/s    Outgoing data rates
Physical Interfaces Drop Rate Ingress     pps        Incoming packets drop rate
Physical Interfaces Drop Rate Egress      pps        Outgoing packets drop rate
4.4.2.3. CPU panels

Panels that display CPU usage of the node.

Panel                    Unit      Description
Current CPU Usage        percent   Instantaneous usage at the time of the last query
Aggregate CPU Usage      percent   Average non-idle CPU activity of all cores on a node
Aggr CPU Usage by Type   percent   Shows time spent for each type of thread averaged across all cores

4.4.2.4. Memory panels

Panels that display memory usage on the node.

Panel             Unit        Description
Memory Used       percent     Amount of memory being used at time of last query
Huge Pages Used   hugepages   Number of hugepages being used
Memory
4.4.2.5. Disk/file system

Panels that display space used on disk.

Panel                        Unit      Description                               Notes
Disk Space Usage             percent   Total disk use at time of last query
Inode Usage                  percent   Total inode use at time of last query
Aggregate Disk Space Usage   bytes     Total disk space used and reserved        Because this query relies on the df plugin, temporary file systems that do not necessarily use disk space are included in the results. The query tries to filter out most of these, but it might not be exhaustive.
Disk Traffic                 bytes/s   Shows rates for both reading and writing
Disk Load                    percent   Approximate percentage of total disk      The weighted IO time series includes the backlog that might be accumulating. For more information, see the collectd disk plugin docs.
                                       bandwidth being used
Operations/s                 ops/s     Operations done per second
Average IO Operation Time    seconds   Average time each IO operation took to    This average is not accurate, see the collectd disk plugin docs.
                                       complete
4.5. CONFIGURING MULTIPLE CLOUDS

You can configure multiple Red Hat OpenStack Platform clouds to target a single instance of Service Telemetry Framework (STF):

1. Plan the AMQP address prefixes that you want to use for each cloud. For more information, see Section 4.5.1, "Planning AMQP address prefixes".

2. Deploy metrics and events consumer Smart Gateways for each cloud to listen on the corresponding address prefixes. For more information, see Section 4.5.2, "Deploying Smart Gateways".

3. Configure each cloud to send its metrics and events to STF on the correct address. For more information, see Section 4.5.3, "Creating the OpenStack environment file".
Figure 4.1. Two Red Hat OpenStack Platform clouds connect to STF

4.5.1. Planning AMQP address prefixes

By default, Red Hat OpenStack Platform nodes get data through two data collectors: collectd and Ceilometer. These components send telemetry data or notifications to the respective AMQP addresses, for example, collectd/telemetry, where STF Smart Gateways listen on those addresses for monitoring data.

To support multiple clouds, and to identify which cloud generated the monitoring data, configure each cloud to send data to a unique address. Prefix a cloud identifier to the second part of the address. The following list shows some example addresses and identifiers:

collectd/cloud1-telemetry

collectd/cloud1-notify

anycast/ceilometer/cloud1-event.sample

collectd/cloud2-telemetry

collectd/cloud2-notify
anycast/ceilometer/cloud2-event.sample

collectd/us-east-1-telemetry

collectd/us-west-3-telemetry
4.5.2. Deploying Smart Gateways

You must deploy a Smart Gateway for each of the data collection types for each cloud: one for collectd metrics, one for collectd events, and one for Ceilometer events. Configure each of the Smart Gateways to listen on the AMQP address that you define for the corresponding cloud.

When you deploy STF for the first time, Smart Gateway manifests are created that define the initial Smart Gateways for a single cloud. When deploying Smart Gateways for multiple cloud support, you deploy multiple Smart Gateways for each of the data collection types that handle the metrics and the events data for each cloud. The initial Smart Gateways act as a template to create additional Smart Gateways, along with any authentication information required to connect to the data stores.
Procedure

1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

oc project service-telemetry

3. Use the initially deployed Smart Gateways as a template for additional Smart Gateways. List the currently deployed Smart Gateways with the oc get smartgateways command. For example, if you deployed STF with metricsEnabled: true and eventsEnabled: true, the following Smart Gateways are displayed in the output:

$ oc get smartgateways

NAME                                  AGE
stf-default-ceilometer-notification   14d
stf-default-collectd-notification     14d
stf-default-collectd-telemetry        14d

4. Retrieve the manifests for each Smart Gateway and store the contents in a temporary file, which you can modify later and use to create the new set of Smart Gateways:
truncate --size 0 /tmp/cloud1-smartgateways.yaml && for sg in $(oc get smartgateways -oname)
do
  echo "---" >> /tmp/cloud1-smartgateways.yaml
  oc get $sg -oyaml --export >> /tmp/cloud1-smartgateways.yaml
done
5. Modify the Smart Gateway manifests in the /tmp/cloud1-smartgateways.yaml file. Adjust the metadata.name and spec.amqpUrl fields to include the cloud identifier from your schema. For more information, see Section 4.5.1, "Planning AMQP address prefixes". To view example Smart Gateway manifests, see Section 4.5.2.1, "Example manifests".

6. Deploy your new Smart Gateways:
oc apply -f /tmp/cloud1-smartgateways.yaml

7. Verify that each Smart Gateway is running. This can take several minutes depending on the number of Smart Gateways:

oc get po -l app=smart-gateway

4.5.2.1. Example manifests

IMPORTANT

The content in the following examples might be different to the file content in your deployment. Copy the manifests in your deployment.

Ensure that the name and amqpUrl parameters of each Smart Gateway match the names that you want to use for your clouds. For more information, see Section 4.5.1, "Planning AMQP address prefixes".

NOTE

Your output may have some additional metadata parameters that you can remove from the manifests that you load into OCP.
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-ceilometer-notification-cloud1 1
spec:
  amqpDataSource: ceilometer
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/anycast/ceilometer/cloud1-event.sample 2
  debug: false
  elasticPass: fkzfhghw
  elasticUrl: https://elasticsearch-es-http.service-telemetry.svc.cluster.local:9200
  elasticUser: elastic
  resetIndex: false
  serviceType: events
  size: 1
  tlsCaCert: /config/certs/ca.crt
  tlsClientCert: /config/certs/tls.crt
  tlsClientKey: /config/certs/tls.key
  tlsServerName: elasticsearch-es-http.service-telemetry.svc.cluster.local
  useBasicAuth: true
  useTls: true
---
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-collectd-notification-cloud1 3
spec:
  amqpDataSource: collectd
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/cloud1-notify 4
  debug: false
  elasticPass: fkzfhghw
  elasticUrl: https://elasticsearch-es-http.service-telemetry.svc.cluster.local:9200
  elasticUser: elastic
  resetIndex: false
  serviceType: events
  size: 1
  tlsCaCert: /config/certs/ca.crt
  tlsClientCert: /config/certs/tls.crt
  tlsClientKey: /config/certs/tls.key
  tlsServerName: elasticsearch-es-http.service-telemetry.svc.cluster.local
  useBasicAuth: true
  useTls: true
---
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-collectd-telemetry-cloud1 5
spec:
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/cloud1-telemetry 6
  debug: false
  prefetch: 15000
  serviceType: metrics
  size: 1
  useTimestamp: true
1  Name for Ceilometer notifications for cloud1
2  AMQP address for Ceilometer notifications for cloud1
3  Name for collectd notifications for cloud1
4  AMQP address for collectd notifications for cloud1
5  Name for collectd telemetry for cloud1
6  AMQP address for collectd telemetry for cloud1
4.5.3. Creating the OpenStack environment file

To label traffic according to the cloud of origin, you must create a configuration with cloud-specific instance names. Create an stf-connectors.yaml file and adjust the values of CeilometerQdrEventsConfig and CollectdAmqpInstances to match the AMQP address prefix scheme. For more information, see Section 4.5.1, "Planning AMQP address prefixes".
WARNING

Remove the enable-stf.yaml and ceilometer-write-qdr.yaml environment file references from your overcloud deployment. This configuration is redundant and results in duplicate information being sent from each cloud node.

Procedure

1. Create the stf-connectors.yaml file and modify it to match the AMQP address that you want for this cloud deployment:
resource_registry:
  OS::TripleO::Services::Collectd: /usr/share/openstack-tripleo-heat-templates/deployment/metrics/collectd-container-puppet.yaml
  OS::TripleO::Services::MetricsQdr: /usr/share/openstack-tripleo-heat-templates/deployment/metrics/qdr-container-puppet.yaml
  OS::TripleO::Services::CeilometerAgentCentral: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-central-container-puppet.yaml
  OS::TripleO::Services::CeilometerAgentNotification: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-notification-container-puppet.yaml
  OS::TripleO::Services::CeilometerAgentIpmi: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-ipmi-container-puppet.yaml
  OS::TripleO::Services::ComputeCeilometerAgent: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-compute-container-puppet.yaml
  OS::TripleO::Services::Redis: /usr/share/openstack-tripleo-heat-templates/deployment/database/redis-pacemaker-puppet.yaml
parameter_defaults:
  EnableSTF: true

  EventPipelinePublishers: []
  CeilometerEnablePanko: false
  CeilometerQdrPublishEvents: true
  CeilometerQdrEventsConfig:
    driver: amqp
    topic: cloud1-event 1

  CollectdConnectionType: amqp1
  CollectdAmqpInterval: 5
  CollectdDefaultPollingInterval: 5

  CollectdAmqpInstances:
    cloud1-notify: 2
      notify: true
      format: JSON
      presettle: false
    cloud1-telemetry: 3
      format: JSON
      presettle: true

  MetricsQdrAddresses:
    - prefix: collectd
      distribution: multicast
    - prefix: anycast/ceilometer
      distribution: multicast

  MetricsQdrSSLProfiles:
    - name: sslProfile

  MetricsQdrConnectors:
    - host: stf-default-interconnect-5671-service-telemetry.apps.infra.watch 4
      port: 443
      role: edge
      verifyHostname: false
      sslProfile: sslProfile
1  Define the topic for Ceilometer events. This value is the address format of anycast/ceilometer/cloud1-event.sample.
2  Define the topic for collectd events. This value is the format of collectd/cloud1-notify.
3  Define the topic for collectd metrics. This value is the format of collectd/cloud1-telemetry.
4  Adjust the MetricsQdrConnectors host to the address of the STF route.

2. Ensure that the naming convention in the stf-connectors.yaml file aligns with the spec.amqpUrl field in the Smart Gateway configuration. For example, configure the CeilometerQdrEventsConfig.topic field to a value of cloud1-event.

3. Save the file in a directory for custom environment files, for example, /home/stack/custom_templates/.

4. Source the authentication file:

5. Include the stf-connectors.yaml file in the overcloud deployment command, along with any other environment files relevant to your environment:

Additional resources

For information about validating the deployment, see Section 3.3.3, "Validating client-side installation".
4.5.4. Querying metrics data from multiple clouds

Data stored in Prometheus has a service label attached according to the Smart Gateway it was scraped from. You can use this label to query data from a specific cloud.

To query data from a specific cloud, use a Prometheus promql query that matches the associated service label, for example: collectd_uptime{service="stf-default-collectd-telemetry-cloud1-smartgateway"}.
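As a sketch of how you might run such a query against the Prometheus HTTP API, assuming a pod with access to curl as in Section 4.2.1, "Creating an alert rule in Prometheus", and an illustrative Smart Gateway name:

[ root@curl:/ ]$ curl prometheus-operated:9090/api/v1/query --data 'query=collectd_uptime{service="stf-default-collectd-telemetry-cloud1-smartgateway"}'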
[stack@undercloud-0 ~]$ source stackrc

(undercloud) [stack@undercloud-0 ~]$

(undercloud) [stack@undercloud-0 ~]$ openstack overcloud deploy --templates /usr/share/openstack-tripleo-heat-templates -e /home/stack/custom_templates/stf-connectors.yaml
4.6. EPHEMERAL STORAGE

Use ephemeral storage to run Service Telemetry Framework (STF) without persistently storing data in your Red Hat OpenShift Container Platform (OCP) cluster. Ephemeral storage is not recommended in a production environment due to the volatility of the data in the platform when operating correctly and as designed. For example, restarting a pod or rescheduling the workload to another node results in the loss of any local data written since the pod started.

If you enable ephemeral storage in STF, the Service Telemetry Operator does not add the relevant storage sections to the data storage components' manifests.

4.6.1. Configuring ephemeral storage

To configure STF for ephemeral storage, add storageEphemeralEnabled: true to the ServiceTelemetry object in OCP. You can add storageEphemeralEnabled: true at installation time or, if you already deployed STF, complete the following steps.
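As an alternative to the interactive edit in the following procedure, the same field can be set non-interactively with a merge patch; this one-liner is a sketch and is not part of the original procedure:

$ oc patch servicetelemetry stf-default --type=merge --patch '{"spec": {"storageEphemeralEnabled": true}}'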
Procedure

1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

3. Edit the ServiceTelemetry object:

4. Add the storageEphemeralEnabled: true parameter to the spec section:

5. Save your changes and close the object.
oc project service-telemetry
$ oc edit ServiceTelemetry stf-default
spec:
  eventsEnabled: true
  metricsEnabled: true
  storageEphemeralEnabled: true
APPENDIX A. COLLECTD PLUG-INS

This section contains a complete list of collectd plug-ins and configurations for Red Hat OpenStack Platform 16.0.
collectd-aggregation

collectd::plugin::aggregation::aggregators

collectd::plugin::aggregation::interval

collectd-amqp1

collectd-apache

collectd::plugin::apache::instances (ex: localhost ⇒ url ⇒ http://localhost/mod_status?auto)

collectd::plugin::apache::interval

collectd-apcups

collectd-battery

collectd::plugin::battery::values_percentage

collectd::plugin::battery::report_degraded

collectd::plugin::battery::query_state_fs

collectd::plugin::battery::interval

collectd-ceph

collectd::plugin::ceph::daemons

collectd::plugin::ceph::longrunavglatency

collectd::plugin::ceph::convertspecialmetrictypes

collectd-cgroups

collectd::plugin::cgroups::ignore_selected

collectd::plugin::cgroups::interval

collectd-conntrack

None

collectd-contextswitch

collectd::plugin::contextswitch::interval

collectd-cpu
collectd::plugin::cpu::reportbystate

collectd::plugin::cpu::reportbycpu

collectd::plugin::cpu::valuespercentage

collectd::plugin::cpu::reportnumcpu

collectd::plugin::cpu::reportgueststate

collectd::plugin::cpu::subtractgueststate

collectd::plugin::cpu::interval

collectd-cpufreq

None

collectd-cpusleep

collectd-csv

collectd::plugin::csv::datadir

collectd::plugin::csv::storerates

collectd::plugin::csv::interval

collectd-df

collectd::plugin::df::devices

collectd::plugin::df::fstypes

collectd::plugin::df::ignoreselected

collectd::plugin::df::mountpoints

collectd::plugin::df::reportbydevice

collectd::plugin::df::reportinodes

collectd::plugin::df::reportreserved

collectd::plugin::df::valuesabsolute

collectd::plugin::df::valuespercentage

collectd::plugin::df::interval

collectd-disk

collectd::plugin::disk::disks

collectd::plugin::disk::ignoreselected

collectd::plugin::disk::udevnameattr
collectd::plugin::disk::interval

collectd-entropy

collectd::plugin::entropy::interval

collectd-ethstat

collectd::plugin::ethstat::interfaces

collectd::plugin::ethstat::maps

collectd::plugin::ethstat::mappedonly

collectd::plugin::ethstat::interval

collectd-exec

collectd::plugin::exec::commands

collectd::plugin::exec::commands_defaults

collectd::plugin::exec::globals

collectd::plugin::exec::interval

collectd-fhcount

collectd::plugin::fhcount::valuesabsolute

collectd::plugin::fhcount::valuespercentage

collectd::plugin::fhcount::interval

collectd-filecount

collectd::plugin::filecount::directories

collectd::plugin::filecount::interval

collectd-fscache

None

collectd-hddtemp

collectd::plugin::hddtemp::host

collectd::plugin::hddtemp::port

collectd::plugin::hddtemp::interval

collectd-hugepages

collectd::plugin::hugepages::report_per_node_hp
collectd::plugin::hugepages::report_root_hp

collectd::plugin::hugepages::values_pages

collectd::plugin::hugepages::values_bytes

collectd::plugin::hugepages::values_percentage

collectd::plugin::hugepages::interval

collectd-intel_rdt

collectd-interface

collectd::plugin::interface::interfaces

collectd::plugin::interface::ignoreselected

collectd::plugin::interface::reportinactive

collectd::plugin::interface::interval

collectd-ipc

None

collectd-ipmi

collectd::plugin::ipmi::ignore_selected

collectd::plugin::ipmi::notify_sensor_add

collectd::plugin::ipmi::notify_sensor_remove

collectd::plugin::ipmi::notify_sensor_not_present

collectd::plugin::ipmi::sensors

collectd::plugin::ipmi::interval

collectd-irq

collectd::plugin::irq::irqs

collectd::plugin::irq::ignoreselected

collectd::plugin::irq::interval

collectd-load

collectd::plugin::load::report_relative

collectd::plugin::load::interval

collectd-logfile

collectd::plugin::logfile::log_level
collectd::plugin::logfile::log_file

collectd::plugin::logfile::log_timestamp

collectd::plugin::logfile::print_severity

collectd::plugin::logfile::interval

collectd-madwifi

collectd-mbmon

collectd-md

collectd-memcached

collectd::plugin::memcached::instances

collectd::plugin::memcached::interval

collectd-memory

collectd::plugin::memory::valuesabsolute

collectd::plugin::memory::valuespercentage

collectd::plugin::memory::interval

collectd-multimeter

collectd-mysql

collectd::plugin::mysql::interval

collectd-netlink

collectd::plugin::netlink::interfaces

collectd::plugin::netlink::verboseinterfaces

collectd::plugin::netlink::qdiscs

collectd::plugin::netlink::classes

collectd::plugin::netlink::filters

collectd::plugin::netlink::ignoreselected

collectd::plugin::netlink::interval

collectd-network

collectd::plugin::network::timetolive

collectd::plugin::network::maxpacketsize
collectd::plugin::network::forward

collectd::plugin::network::reportstats

collectd::plugin::network::listeners

collectd::plugin::network::servers

collectd::plugin::network::interval

collectd-nfs

collectd::plugin::nfs::interval

collectd-ntpd

collectd::plugin::ntpd::host

collectd::plugin::ntpd::port

collectd::plugin::ntpd::reverselookups

collectd::plugin::ntpd::includeunitid

collectd::plugin::ntpd::interval

collectd-numa

None

collectd-olsrd

collectd-openvpn

collectd::plugin::openvpn::statusfile

collectd::plugin::openvpn::improvednamingschema

collectd::plugin::openvpn::collectcompression

collectd::plugin::openvpn::collectindividualusers

collectd::plugin::openvpn::collectusercount

collectd::plugin::openvpn::interval

collectd-ovs_events

collectd::plugin::ovs_events::address

collectd::plugin::ovs_events::dispatch

collectd::plugin::ovs_events::interfaces

collectd::plugin::ovs_events::send_notification
Red Hat OpenStack Platform 160 Service Telemetry Framework
52
collectdpluginovs_events$port
collectdpluginovs_eventssocket
collectd-ovs_stats
collectdpluginovs_statsaddress
collectdpluginovs_statsbridges
collectdpluginovs_statsport
collectdpluginovs_statssocket
collectd-ping
collectdpluginpinghosts
collectdpluginpingtimeout
collectdpluginpingttl
collectdpluginpingsource_address
collectdpluginpingdevice
collectdpluginpingmax_missed
collectdpluginpingsize
collectdpluginpinginterval
collectd-powerdns
collectdpluginpowerdnsinterval
collectdpluginpowerdnsservers
collectdpluginpowerdnsrecursors
collectdpluginpowerdnslocal_socket
collectdpluginpowerdnsinterval
collectd-processes
collectdpluginprocessesprocesses
collectdpluginprocessesprocess_matches
collectdpluginprocessescollect_context_switch
collectdpluginprocessescollect_file_descriptor
collectdpluginprocessescollect_memory_maps
collectdpluginpowerdnsinterval
APPENDIX A COLLECTD PLUG-INS
53
collectd-protocols
collectdpluginprotocolsignoreselected
collectdpluginprotocolsvalues
collectd-python
collectd-serial
collectd-smart
collectdpluginsmartdisks
collectdpluginsmartignoreselected
collectdpluginsmartinterval
collectd-snmp_agent
collectd-statsd
collectdpluginstatsdhost
collectdpluginstatsdport
collectdpluginstatsddeletecounters
collectdpluginstatsddeletetimers
collectdpluginstatsddeletegauges
collectdpluginstatsddeletesets
collectdpluginstatsdcountersum
collectdpluginstatsdtimerpercentile
collectdpluginstatsdtimerlower
collectdpluginstatsdtimerupper
collectdpluginstatsdtimersum
collectdpluginstatsdtimercount
collectdpluginstatsdinterval
collectd-swap
collectdpluginswapreportbydevice
collectdpluginswapreportbytes
collectdpluginswapvaluesabsolute
Red Hat OpenStack Platform 160 Service Telemetry Framework
54
collectdpluginswapvaluespercentage
collectdpluginswapreportio
collectdpluginswapinterval
collectd-syslog
collectdpluginsysloglog_level
collectdpluginsyslognotify_level
collectdpluginsysloginterval
collectd-table
collectdplugintabletables
collectdplugintableinterval
collectd-tail
collectdplugintailfiles
collectdplugintailinterval
collectd-tail_csv
collectdplugintail_csvmetrics
collectdplugintail_csvfiles
collectd-tcpconns
collectdplugintcpconnslocalports
collectdplugintcpconnsremoteports
collectdplugintcpconnslistening
collectdplugintcpconnsallportssummary
collectdplugintcpconnsinterval
collectd-ted
collectd-thermal
collectdpluginthermaldevices
collectdpluginthermalignoreselected
collectdpluginthermalinterval
collectd-threshold
APPENDIX A COLLECTD PLUG-INS
55
collectdpluginthresholdtypes
collectdpluginthresholdplugins
collectdpluginthresholdhosts
collectdpluginthresholdinterval
collectd-turbostat
collectdpluginturbostatcore_c_states
collectdpluginturbostatpackage_c_states
collectdpluginturbostatsystem_management_interrupt
collectdpluginturbostatdigital_temperature_sensor
collectdpluginturbostattcc_activation_temp
collectdpluginturbostatrunning_average_power_limit
collectdpluginturbostatlogical_core_names
collectd-unixsock
collectd-uptime
collectdpluginuptimeinterval
collectd-users
collectdpluginusersinterval
collectd-uuid
collectdpluginuuiduuid_file
collectdpluginuuidinterval
collectd-virt
collectdpluginvirtconnection
collectdpluginvirtrefresh_interval
collectdpluginvirtdomain
collectdpluginvirtblock_device
collectdpluginvirtinterface_device
collectdpluginvirtignore_selected
collectdpluginvirthostname_format
Red Hat OpenStack Platform 160 Service Telemetry Framework
56
collectdpluginvirtinterface_format
collectdpluginvirtextra_stats
collectdpluginvirtinterval
collectd-vmem
collectdpluginvmemverbose
collectdpluginvmeminterval
collectd-vserver
collectd-wireless
collectd-write_graphite
collectdpluginwrite_graphitecarbons
collectdpluginwrite_graphitecarbon_defaults
collectdpluginwrite_graphiteglobals
collectd-write_kafka
collectdpluginwrite_kafkakafka_host
collectdpluginwrite_kafkakafka_port
collectdpluginwrite_kafkakafka_hosts
collectdpluginwrite_kafkatopics
collectd-write_log
collectdpluginwrite_logformat
collectd-zfs_arc
None
APPENDIX A COLLECTD PLUG-INS
57
Figure 1.1. Service Telemetry Framework architecture overview
NOTE
The Service Telemetry Framework data collection components, collectd and Ceilometer, and the transport components, AMQ Interconnect and Smart Gateway, are fully supported. The data storage components, Prometheus and ElasticSearch, including the Operator artifacts, and the visualization component, Grafana, are community-supported and are not officially supported.
For metrics, on the client side, collectd collects high-resolution metrics. collectd delivers the data to Prometheus by using the AMQP 1.0 plugin, which places the data onto the message bus. On the server side, a Golang application called the Smart Gateway takes the data stream from the bus and exposes it as a local scrape endpoint for Prometheus.
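For orientation, the following is a minimal sketch of the client-side collectd amqp1 plugin configuration that this flow relies on. The transport name, host, and port are illustrative assumptions; in an STF deployment, the director templates generate the real configuration for you.

LoadPlugin amqp1
<Plugin amqp1>
  # Connect to the local AMQ Interconnect router (placeholder host and port).
  <Transport "interconnect">
    Host "localhost"
    Port "5672"
    Address "collectd"
    # Publishes metrics onto the bus at the collectd/telemetry address.
    <Instance "telemetry">
      Format "JSON"
      PreSettle false
    </Instance>
  </Transport>
</Plugin>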
If you plan to collect and store events, collectd or Ceilometer delivers event data to the server side by using the AMQP 1.0 plugin, which places the data onto the message bus. Another Smart Gateway writes the data to the ElasticSearch datastore.
Server-side STF monitoring infrastructure consists of the following layers:

Service Telemetry Framework 1.x (STF)
Red Hat OpenShift Container Platform (OCP)
Infrastructure platform
Figure 1.2. Server-side STF monitoring infrastructure
For more information about how to deploy Red Hat OpenShift Container Platform, see the OCP product documentation. You can install OCP on cloud platforms or on bare metal. For more information about STF performance and scaling, see https://access.redhat.com/articles/4907241.
NOTE
Do not install OCP on the same infrastructure that you want to monitor
1.2 INSTALLATION SIZE

The size of your Red Hat OpenShift Container Platform installation depends on the following factors:
The number of nodes you want to monitor
The number of metrics you want to collect
The resolution of metrics
The length of time that you want to store the data
Installation of Service Telemetry Framework (STF) depends on the existing Red Hat OpenShift Container Platform environment. Ensure that you install monitoring for Red Hat OpenStack Platform on a platform separate from your Red Hat OpenStack Platform environment. You can install Red Hat OpenShift Container Platform (OCP) on bare metal or other supported cloud platforms. For more information about installing OCP, see OpenShift Container Platform 4.3 Documentation.

The size of your OCP environment depends on the infrastructure you select. For more information about minimum resource requirements when you install OCP on bare metal, see Minimum resource requirements in the Installing a cluster on bare metal guide. For installation requirements of the various public and private cloud platforms that you can install on, see the corresponding installation documentation for your cloud platform of choice.
CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK

Before you install Service Telemetry Framework (STF), ensure that Red Hat OpenShift Container Platform (OCP) version 4.x is running and that you understand the core components of the framework. As part of the OCP installation planning process, ensure that the administrator provides persistent storage and enough resources to run the STF components on top of the OCP environment.

WARNING

Red Hat OpenShift Container Platform version 4.3 or later is currently required for a successful installation of STF.
2.1 THE CORE COMPONENTS OF STF

The following STF core components are managed by Operators:
Prometheus and AlertManager
ElasticSearch
Smart Gateway
AMQ Interconnect
Each component has a corresponding Operator that you can use to load the various application components and objects.

Additional resources

For more information about Operators, see the Understanding Operators guide.
2.2 PREPARING YOUR OCP ENVIRONMENT FOR STF

As you prepare your OCP environment for STF, you must plan for persistent storage, adequate resources, and event storage.

Ensure that persistent storage is available in your Red Hat OpenShift Container Platform cluster to permit a production-grade deployment. For more information, see Section 2.2.1, "Persistent volumes".

Ensure that enough resources are available to run the Operators and the application containers. For more information, see Section 2.2.2, "Resource allocation".

To install ElasticSearch, you must use a community catalog source. If you do not want to use a community catalog or if you do not want to store events, see Section 2.3, "Deploying STF to the OCP environment".
STF uses ElasticSearch to store events, which requires a larger than normal vm.max_map_count. The vm.max_map_count value is set by default in Red Hat OpenShift Container Platform. For more information about how to edit the value of vm.max_map_count, see Section 2.2.3, "Node tuning operator".
2.2.1 Persistent volumes

STF uses persistent storage in OCP to instantiate the volumes dynamically so that Prometheus and ElasticSearch can store metrics and events.
Additional resources
For more information about configuring persistent storage for OCP, see Understanding persistent storage.

2.2.1.1 Using ephemeral storage
WARNING
You can use ephemeral storage with STF. However, if you use ephemeral storage, you might experience data loss if a pod is restarted, updated, or rescheduled onto another node. Use ephemeral storage only for development or testing, and not production environments.
Procedure
To enable ephemeral storage for STF, set storageEphemeralEnabled: true in your ServiceTelemetry manifest, as shown in the following sketch.
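A minimal ServiceTelemetry sketch with ephemeral storage enabled, reusing the stf-default object from Section 2.3.10 (the metricsEnabled value shown is the documented default):

oc apply -f - <<EOF
apiVersion: infra.watch/v1alpha1
kind: ServiceTelemetry
metadata:
  name: stf-default
  namespace: service-telemetry
spec:
  metricsEnabled: true
  # Ephemeral storage: data is lost if a pod is restarted or rescheduled.
  storageEphemeralEnabled: true
EOF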
Additional resources
For more information about enabling ephemeral storage for STF, see Section 4.6.1, "Configuring ephemeral storage".

2.2.2 Resource allocation

To enable the scheduling of pods within the OCP infrastructure, you need resources for the components that are running. If you do not allocate enough resources, pods remain in a Pending state because they cannot be scheduled.

The amount of resources that you require to run STF depends on your environment and the number of nodes and clouds that you want to monitor.
Additional resources
For recommendations about sizing for metrics collection, see https://access.redhat.com/articles/4907241.

For information about sizing requirements for ElasticSearch, see https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-managing-compute-resources.html.

2.2.3 Node tuning operator
STF uses ElasticSearch to store events, which requires a larger than normal vm.max_map_count. The vm.max_map_count value is set by default in Red Hat OpenShift Container Platform.

If you want to edit the value of vm.max_map_count, you cannot apply node tuning manually using the sysctl command, because Red Hat OpenShift Container Platform manages nodes directly. To configure values and apply them to the infrastructure, you must use the node tuning operator. For more information, see Using the Node Tuning Operator. A sketch of such a tuning object follows.
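For illustration only, a Tuned object that raises vm.max_map_count through the node tuning operator might look like the following sketch; the object name, profile name, priority, and sysctl value are assumptions, not values that this guide mandates:

oc apply -f - <<EOF
apiVersion: tuned.openshift.io/v1
kind: Tuned
metadata:
  name: elasticsearch-vm-max-map-count   # hypothetical name
  namespace: openshift-cluster-node-tuning-operator
spec:
  profile:
  - name: elasticsearch-sysctl
    data: |
      [main]
      summary=Raise vm.max_map_count for nodes running ElasticSearch pods
      [sysctl]
      vm.max_map_count=262144
  recommend:
  - match:
    - label: tuned.openshift.io/elasticsearch
      type: pod
    priority: 10
    profile: elasticsearch-sysctl
EOF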
In an OCP deployment, the default node tuning operator specification provides the required profiles for ElasticSearch workloads or pods scheduled on nodes. To view the default cluster node tuning specification, run the following command:

The output of the default specification is documented at Default profiles set on a cluster. The assignment of profiles is managed in the recommend section, where profiles are applied to a node when certain conditions are met. When scheduling ElasticSearch to a node in STF, one of the following profiles is applied:
openshift-control-plane-es
openshift-node-es
When scheduling an ElasticSearch pod, there must be a label present that matches tuned.openshift.io/elasticsearch. If the label is present, one of the two profiles is assigned to the pod. No action is required by the administrator if you use the recommended Operator for ElasticSearch. If you use a custom-deployed ElasticSearch with STF, ensure that you add the tuned.openshift.io/elasticsearch label to all scheduled pods, as in the sketch below.
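A sketch of what that label might look like on a custom-deployed ElasticSearch pod; the pod name and image are hypothetical:

apiVersion: v1
kind: Pod
metadata:
  name: custom-elasticsearch-0           # hypothetical pod name
  labels:
    # The node tuning operator matches this label and applies one of the
    # openshift-control-plane-es or openshift-node-es profiles to the node.
    tuned.openshift.io/elasticsearch: ""
spec:
  containers:
  - name: elasticsearch
    image: registry.example.com/elasticsearch:7.6.0   # hypothetical image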
Additional resources
For more information about virtual memory usage by ElasticSearch, see https://www.elastic.co/guide/en/elasticsearch/reference/current/vm-max-map-count.html.

For more information about how the profiles are applied to nodes, see Custom tuning specification.
2.3 DEPLOYING STF TO THE OCP ENVIRONMENT

You can deploy STF to the OCP environment in one of two ways:

Deploy STF and store events with ElasticSearch. For more information, see Section 2.3.1, "Deploying STF to the OCP environment with ElasticSearch".

Deploy STF without ElasticSearch and disable events support. For more information, see Section 2.3.2, "Deploying STF to the OCP environment without ElasticSearch".

2.3.1 Deploying STF to the OCP environment with ElasticSearch

Complete the following tasks:

1. Section 2.3.3, "Creating a namespace"

2. Section 2.3.4, "Creating an OperatorGroup"

3. Section 2.3.5, "Enabling the OperatorHub.io Community Catalog Source"
oc get Tuned/default -o yaml -n openshift-cluster-node-tuning-operator
4. Section 2.3.6, "Enabling Red Hat STF Operator Source"

5. Section 2.3.7, "Subscribing to the AMQ Certificate Manager Operator"

6. Section 2.3.8, "Subscribing to the Elastic Cloud on Kubernetes Operator"

7. Section 2.3.9, "Subscribing to the Service Telemetry Operator"

8. Section 2.3.10, "Creating a ServiceTelemetry object in OCP"

2.3.2 Deploying STF to the OCP environment without ElasticSearch

Complete the following tasks:

1. Section 2.3.3, "Creating a namespace"

2. Section 2.3.4, "Creating an OperatorGroup"

3. Section 2.3.6, "Enabling Red Hat STF Operator Source"

4. Section 2.3.7, "Subscribing to the AMQ Certificate Manager Operator"

5. Section 2.3.9, "Subscribing to the Service Telemetry Operator"

6. Section 2.3.10, "Creating a ServiceTelemetry object in OCP"
2.3.3 Creating a namespace

Create a namespace to hold the STF components. The service-telemetry namespace is used throughout the documentation.
Procedure
Enter the following command:

2.3.4 Creating an OperatorGroup

Create an OperatorGroup in the namespace so that you can schedule the Operator pods.
Procedure
Enter the following command:
oc new-project service-telemetry
oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: service-telemetry-operator-group
  namespace: service-telemetry
spec:
  targetNamespaces:
  - service-telemetry
EOF
Additional resources
For more information, see OperatorGroups.

2.3.5 Enabling the OperatorHub.io Community Catalog Source

Before you install ElasticSearch, you must have access to the resources on the OperatorHub.io Community Catalog Source.
Procedure
Enter the following command:

2.3.6 Enabling Red Hat STF Operator Source

Before you deploy STF on Red Hat OpenShift Container Platform, you must enable the operator source.

Procedure

1. Install an OperatorSource that contains the Service Telemetry Operator and the Smart Gateway Operator:
oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: operatorhubio-operators
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/operator-framework/upstream-community-operators:latest
  displayName: OperatorHub.io Operators
  publisher: OperatorHub.io
EOF
oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1
kind: OperatorSource
metadata:
  labels:
    opsrc-provider: redhat-operators-stf
  name: redhat-operators-stf
  namespace: openshift-marketplace
spec:
  authorizationToken: {}
  displayName: Red Hat STF Operators
  endpoint: https://quay.io/cnr
  publisher: Red Hat
  registryNamespace: redhat-operators-stf
  type: appregistry
EOF
2. To validate the creation of your OperatorSource, use the oc get operatorsources command. A successful import results in the MESSAGE field returning a result of The object has been successfully reconciled.

3. To validate that the Operators are available from the catalog, use the oc get packagemanifest command:

2.3.7 Subscribing to the AMQ Certificate Manager Operator

You must subscribe to the AMQ Certificate Manager Operator before you deploy the other STF components, because the AMQ Certificate Manager Operator is globally scoped and is not compatible with the dependency management of Operator Lifecycle Manager when used with other namespace-scoped operators.
Procedure
1. Subscribe to the AMQ Certificate Manager Operator, create the subscription, and validate the AMQ7 Certificate Manager:
NOTE
The AMQ Certificate Manager is installed globally for all namespaces, so the namespace value provided is openshift-operators. You might not see your amq7-cert-manager.v1.0.0 ClusterServiceVersion in the service-telemetry namespace for a few minutes until the processing executes against the namespace.
$ oc get -n openshift-marketplace operatorsource redhat-operators-stf

NAME                   TYPE          ENDPOINT              REGISTRY               DISPLAYNAME             PUBLISHER   STATUS      MESSAGE
redhat-operators-stf   appregistry   https://quay.io/cnr   redhat-operators-stf   Red Hat STF Operators   Red Hat     Succeeded   The object has been successfully reconciled

$ oc get packagemanifests | grep "Red Hat STF"

smartgateway-operator       Red Hat STF Operators   2m50s
servicetelemetry-operator   Red Hat STF Operators   2m50s
oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: amq7-cert-manager
  namespace: openshift-operators
spec:
  channel: alpha
  installPlanApproval: Automatic
  name: amq7-cert-manager
  source: redhat-operators
  sourceNamespace: openshift-marketplace
EOF
2. To validate your ClusterServiceVersion, use the oc get csv command. Ensure that amq7-cert-manager.v1.0.0 has a phase of Succeeded:

2.3.8 Subscribing to the Elastic Cloud on Kubernetes Operator

Before you install the Service Telemetry Operator, and if you plan to store events in ElasticSearch, you must enable the Elastic Cloud on Kubernetes Operator.
Procedure
1. Apply the following manifest to your OCP environment to enable the Elastic Cloud on Kubernetes Operator:

2. To verify that the ClusterServiceVersion for Elastic Cloud on Kubernetes succeeded, enter the oc get csv command:
2.3.9 Subscribing to the Service Telemetry Operator

To instantiate an STF instance, create the ServiceTelemetry object to allow the Service Telemetry Operator to create the environment.
Procedure
1. To create the Service Telemetry Operator subscription, enter the oc apply -f command:
$ oc get --namespace openshift-operators csv
NAME                       DISPLAY                                         VERSION   REPLACES   PHASE
amq7-cert-manager.v1.0.0   Red Hat Integration - AMQ Certificate Manager   1.0.0                Succeeded
oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: elastic-cloud-eck
  namespace: service-telemetry
spec:
  channel: stable
  installPlanApproval: Automatic
  name: elastic-cloud-eck
  source: operatorhubio-operators
  sourceNamespace: openshift-marketplace
EOF
$ oc get csv
NAME                       DISPLAY                       VERSION   REPLACES                   PHASE
elastic-cloud-eck.v1.1.0   Elastic Cloud on Kubernetes   1.1.0     elastic-cloud-eck.v1.0.1   Succeeded
oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: servicetelemetry-operator
  namespace: service-telemetry
spec:
  channel: stable
  installPlanApproval: Automatic
  name: servicetelemetry-operator
  source: redhat-operators-stf
  sourceNamespace: openshift-marketplace
EOF
2. To validate the Service Telemetry Operator and the dependent operators, enter the following command:

$ oc get csv --namespace service-telemetry

NAME                                DISPLAY                                         VERSION   REPLACES                            PHASE
amq7-cert-manager.v1.0.0            Red Hat Integration - AMQ Certificate Manager   1.0.0                                         Succeeded
amq7-interconnect-operator.v1.2.0   Red Hat Integration - AMQ Interconnect          1.2.0                                         Succeeded
elastic-cloud-eck.v1.1.0            Elastic Cloud on Kubernetes                     1.1.0     elastic-cloud-eck.v1.0.1            Succeeded
prometheusoperator.0.37.0           Prometheus Operator                             0.37.0    prometheusoperator.0.32.0           Succeeded
service-telemetry-operator.v1.0.2   Service Telemetry Operator                      1.0.2     service-telemetry-operator.v1.0.1   Succeeded
smart-gateway-operator.v1.0.1       Smart Gateway Operator                          1.0.1     smart-gateway-operator.v1.0.0       Succeeded

2.3.10 Creating a ServiceTelemetry object in OCP

To deploy the Service Telemetry Framework, you must create an instance of ServiceTelemetry in OCP. By default, eventsEnabled is set to false. If you do not want to store events in ElasticSearch, ensure that eventsEnabled is set to false. For more information, see Section 2.3.2, "Deploying STF to the OCP environment without ElasticSearch".

The following core parameters are available for a ServiceTelemetry manifest:

Table 2.1. Core parameters for a ServiceTelemetry manifest
Parameter: eventsEnabled
  Description: Enable events support in STF. Requires prerequisite steps to ensure ElasticSearch can be started. For more information, see Section 2.3.8, "Subscribing to the Elastic Cloud on Kubernetes Operator".
  Default value: false

Parameter: metricsEnabled
  Description: Enable metrics support in STF.
  Default value: true

Parameter: highAvailabilityEnabled
  Description: Enable high availability in STF. For more information, see Section 4.3, "High availability".
  Default value: false

Parameter: storageEphemeralEnabled
  Description: Enable ephemeral storage support in STF. For more information, see Section 4.6, "Ephemeral storage".
  Default value: false
Procedure
1. To store events in ElasticSearch, set eventsEnabled to true during deployment:

2. To view the STF deployment logs in the Service Telemetry Operator, use the oc logs command:
PLAY RECAP
localhost : ok=37 changed=0 unreachable=0 failed=0 skipped=1 rescued=0 ignored=0
3. View the pods and the status of each pod to determine that all workloads are operating nominally:
NOTE
If you set eventsEnabled: true, the notification Smart Gateways will Error and CrashLoopBackOff for a period of time before ElasticSearch starts.
oc apply -f - <<EOF
apiVersion: infra.watch/v1alpha1
kind: ServiceTelemetry
metadata:
  name: stf-default
  namespace: service-telemetry
spec:
  eventsEnabled: true
  metricsEnabled: true
EOF
oc logs $(oc get pod --selector=name=service-telemetry-operator -oname) -c ansible
$ oc get pods

NAME                                                              READY   STATUS    RESTARTS   AGE
alertmanager-stf-default-0                                        2/2     Running   0          26m
elastic-operator-645dc8b8ff-jwnzt                                 1/1     Running   0          88m
elasticsearch-es-default-0                                        1/1     Running   0          26m
interconnect-operator-6fd49d9fb9-4bl92                            1/1     Running   0          46m
prometheus-operator-bf7d97fb9-kwnlx                               1/1     Running   0          46m
prometheus-stf-default-0                                          3/3     Running   0          26m
service-telemetry-operator-54f4c99d9b-k7ll6                       2/2     Running   0          46m
smart-gateway-operator-7ff58bcf94-66rvx                           2/2     Running   0          46m
stf-default-ceilometer-notification-smartgateway-6675df547q4lbj   1/1     Running   0          26m
stf-default-collectd-notification-smartgateway-698c87fbb7-xj528   1/1     Running   0          26m
stf-default-collectd-telemetry-smartgateway-79c967c8f7-9hsqn      1/1     Running   0          26m
stf-default-interconnect-7458fd4d69-nqbfs                         1/1     Running   0          26m
2.4 REMOVING STF FROM THE OCP ENVIRONMENT

Remove STF from an OCP environment if you no longer require the STF functionality.

Complete the following tasks:

1. Section 2.4.1, "Deleting the namespace"

2. Section 2.4.2, "Removing the OperatorSource"

2.4.1 Deleting the namespace

To remove the operational resources for STF from OCP, delete the namespace.
Procedure
1. Run the oc delete command:

2. Verify that the resources have been deleted from the namespace:

2.4.2 Removing the OperatorSource

If you do not expect to install Service Telemetry Framework again, delete the OperatorSource. When you remove the OperatorSource, PackageManifests related to STF are removed from the Operator Lifecycle Manager catalog.
Procedure
1. Delete the OperatorSource:
oc delete project service-telemetry
$ oc get all
No resources found
2. Verify that the STF PackageManifests are removed from the platform. If successful, the following command returns no result:

3. If you enabled the OperatorHub.io Community Catalog Source during the installation process, and you no longer need this catalog source, delete it:
Additional resources
For more information about the OperatorHub.io Community Catalog Source, see Section 2.3, "Deploying STF to the OCP environment".

$ oc delete --namespace=openshift-marketplace operatorsource redhat-operators-stf
operatorsource.operators.coreos.com "redhat-operators-stf" deleted

$ oc get packagemanifests | grep "Red Hat STF"

$ oc delete --namespace=openshift-marketplace catalogsource operatorhubio-operators
catalogsource.operators.coreos.com "operatorhubio-operators" deleted
CHAPTER 3 COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION

3.1 CONNECTING RED HAT OPENSTACK PLATFORM TO SERVICE TELEMETRY FRAMEWORK

To collect metrics, events, or both, and to send them to the Service Telemetry Framework (STF) storage domain, you must configure the Red Hat OpenStack Platform overcloud to enable data collection and transport.

To deploy data collection and transport to STF on Red Hat OpenStack Platform cloud nodes that employ routed L3 domains, such as distributed compute node (DCN) or spine-leaf, see Section 3.2, "Deploying to non-standard network topologies".
3.2 DEPLOYING TO NON-STANDARD NETWORK TOPOLOGIES

If your nodes are on a separate network from the default InternalApi network, you must make configuration adjustments so that AMQ Interconnect can transport data to the Service Telemetry Framework (STF) server instance. This scenario is typical in a spine-leaf or a DCN topology. For more information about DCN configuration, see the Spine Leaf Networking guide.

If you use STF with Red Hat OpenStack Platform 16.0 and plan to monitor your Ceph Block or Object storage nodes, you must make configuration changes that are similar to the configuration changes that you make to the spine-leaf and DCN network configuration. To monitor Ceph nodes, use the CephStorageExtraConfig parameter to define which network interface to load into the AMQ Interconnect and collectd configuration files.

Similarly, you must specify BlockStorageExtraConfig and ObjectStorageExtraConfig parameters if your environment uses Block and Object storage roles.

The deployment of a spine-leaf topology involves creating roles and networks, then assigning those networks to the available roles. When you configure data collection and transport for STF for a Red Hat OpenStack Platform deployment, the default network for roles is InternalApi. For Ceph Block and Object storage roles, the default network is Storage. Because a spine-leaf configuration can result in different networks being assigned to different Leaf groupings, and those names are typically unique, additional configuration is required in the parameter_defaults section of the Red Hat OpenStack Platform environment files.
Procedure
1. Document which networks are available for each of the Leaf roles. For examples of network name definitions, see Creating a network data file in the Spine Leaf Networking guide. For more information about the creation of the Leaf groupings (roles) and assignment of the networks to those groupings, see Creating a roles data file in the Spine Leaf Networking guide.
CephStorageExtraConfig:
    tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('storage')}"
    tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('storage')}"
    tripleo::profile::base::ceilometer::agent::notification::notifier_host_addr: "%{hiera('storage')}"
2. Add the following configuration example to the ExtraConfig section for each of the leaf roles. In this example, internal_api_subnet is the value defined in the name_lower parameter of your network definition (with _subnet appended to the name for Leaf 0), and is the network to which the ComputeLeaf0 leaf role is connected. In this case, the network identification of 0 corresponds to the Compute role for leaf 0, and represents a value that is different from the default internal API network name. For the ComputeLeaf0 leaf role, specify extra configuration to perform a hiera lookup to determine which network interface for a particular network to assign to the collectd AMQP host parameter. Perform the same configuration for the AMQ Interconnect listener address parameter.

Additional leaf roles typically replace _subnet with _leafN, where N represents a unique identifier for the leaf.

This example configuration is on a CephStorage leaf role:
3.3 CONFIGURING RED HAT OPENSTACK PLATFORM OVERCLOUD FOR SERVICE TELEMETRY FRAMEWORK

To configure the Red Hat OpenStack Platform overcloud, you must configure the data collection applications and the data transport to STF, and deploy the overcloud.

To configure the Red Hat OpenStack Platform overcloud, complete the following tasks:

1. Section 3.3.1, "Retrieving the AMQ Interconnect route address"

2. Section 3.3.2, "Configuring the STF connection for the overcloud"

3. Section 3.3.3, "Validating client-side installation"
3.3.1 Retrieving the AMQ Interconnect route address

When you configure the Red Hat OpenStack Platform overcloud for STF, you must provide the AMQ Interconnect route address in the STF connection file.
Procedure
1. Log in to your Red Hat OpenShift Container Platform (OCP) environment.

2. In the service-telemetry project, retrieve the AMQ Interconnect route address:
ComputeLeaf0ExtraConfig:
    tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('internal_api_subnet')}"
    tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('internal_api_subnet')}"

ComputeLeaf1ExtraConfig:
    tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('internal_api_leaf1')}"
    tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('internal_api_leaf1')}"

CephStorageLeaf0ExtraConfig:
    tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('storage_subnet')}"
    tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('storage_subnet')}"
NOTE
If your STF installation differs from the documentation, ensure that you retrieve the correct AMQ Interconnect route address.

3.3.2 Configuring the STF connection for the overcloud

To configure the STF connection, you must create a file that contains the connection configuration of the AMQ Interconnect for the overcloud to the STF deployment. Enable the collection of events and storage of the events in STF, and deploy the overcloud.
Procedure
1. Log in to the Red Hat OpenStack Platform undercloud as the stack user.

2. Create a configuration file called stf-connectors.yaml in the /home/stack directory.

IMPORTANT

The Service Telemetry Operator simplifies the deployment of all data ingestion and data storage components for single cloud deployments. To share the data storage domain with multiple clouds, see Section 4.5, "Configuring multiple clouds".

3. In the stf-connectors.yaml file, configure the MetricsQdrConnectors address to connect the AMQ Interconnect on the overcloud to the STF deployment.

Add the CeilometerQdrPublishEvents: true parameter to enable collection and transport of Ceilometer events to STF.

Replace the host parameter with the value of HOST/PORT that you retrieved in Section 3.3.1, "Retrieving the AMQ Interconnect route address".

4. Add the following files to your Red Hat OpenStack Platform director deployment to set up collectd and AMQ Interconnect:

the stf-connectors.yaml environment file

the enable-stf.yaml file that ensures that the environment is being used during the overcloud deployment
$ oc get routes -ogo-template='{{ range .items }}{{printf "%s\n" .spec.host }}{{ end }}' | grep "\-5671"
stf-default-interconnect-5671-service-telemetry.apps.infra.watch
parameter_defaults:
    EventPipelinePublishers: []
    CeilometerQdrPublishEvents: true
    MetricsQdrConnectors:
        - host: stf-default-interconnect-5671-service-telemetry.apps.infra.watch
          port: 443
          role: edge
          sslProfile: sslProfile
          verifyHostname: false
the ceilometer-write-qdr.yaml file that ensures that Ceilometer telemetry is sent to STF

5. Deploy the Red Hat OpenStack Platform overcloud:

3.3.3 Validating client-side installation

To validate data collection from the STF storage domain, query the data sources for delivered data. To validate individual nodes in the Red Hat OpenStack Platform deployment, connect to the console using SSH.
Procedure
1. Log in to an overcloud node, for example, controller-0.

2. Ensure that the metrics_qdr container is running on the node:

3. Return the internal network address on which AMQ Interconnect is running, for example, 172.17.1.44 listening on port 5666:

4. Return a list of connections to the local AMQ Interconnect:
openstack overcloud deploy <other arguments>
  --templates /usr/share/openstack-tripleo-heat-templates \
  --environment-file <other-environment-files> \
  --environment-file /usr/share/openstack-tripleo-heat-templates/environments/metrics/ceilometer-write-qdr.yaml \
  --environment-file /usr/share/openstack-tripleo-heat-templates/environments/enable-stf.yaml \
  --environment-file /home/stack/stf-connectors.yaml
$ sudo podman container inspect --format '{{.State.Status}}' metrics_qdr
running

$ sudo podman exec -it metrics_qdr cat /etc/qpid-dispatch/qdrouterd.conf

listener {
    host: 172.17.1.44
    port: 5666
    authenticatePeer: no
    saslMechanisms: ANONYMOUS
}

$ sudo podman exec -it metrics_qdr qdstat --bus=172.17.1.44:5666 --connections
Connections
  id   host                                                                   container                                                                                         role    dir  security                             authentication  tenant
  =====================================================================================================================================================================================================================================
  1    stf-default-interconnect-5671-service-telemetry.apps.infra.watch:443   stf-default-interconnect-7458fd4d69-bgzfb                                                         edge    out  TLSv1.2(DHE-RSA-AES256-GCM-SHA384)   anonymous-user
  12   172.17.1.44:60290                                                      openstack.org/om/container/controller-0/ceilometer-agent-notification/255c02cee550f143ec9ea030db5cccba14   normal  in   no-security                          no-auth
  16   172.17.1.44:36408                                                      metrics                                                                                           normal  in   no-security                          anonymous-user
  899  172.17.1.44:39500                                                      10a2e99d-1b8a-4329-b48c-4335e5f75c84                                                              normal  in   no-security                          no-auth
There are four connections:

Outbound connection to STF

Inbound connection from collectd

Inbound connection from ceilometer

Inbound connection from our qdstat client

The outbound STF connection is provided to the MetricsQdrConnectors host parameter and is the route for the STF storage domain. The other hosts are internal network addresses of the client connections to this AMQ Interconnect.

5. To ensure that messages are being delivered, list the links, and view the _edge address in the deliv column for delivery of messages:
$ sudo podman exec -it metrics_qdr qdstat --bus=172.17.1.44:5666 --links

Router Links
  type      dir  conn id  id    peer  class   addr                  phs  cap  pri  undel  unsett  deliv    presett  psdrop  acc  rej  rel  mod  delay  rate
  ==========================================================================================================================================================
  endpoint  out  1        5           local   _edge                      250  0    0      0       2979926  2979924  0       0    0    2    0    0      0
  endpoint  in   1        6                                              250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  in   1        7                                              250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  out  1        8                                              250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  in   1        9                                              250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  out  1        10                                             250  0    0      0       911      911      0       0    0    0    0    911    0
  endpoint  in   1        11                                             250  0    0      0       0        911      0       0    0    0    0    0      0
  endpoint  out  12       32          local   temp.lSY6Mcicol4J2Kp       250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  in   16       41                                             250  0    0      0       2979924  2979924  0       0    0    0    0    0      0
  endpoint  in   912      1834        mobile  $management           0    250  0    0      0       1        0        0       1    0    0    0    0      0
  endpoint  out  912      1835        local   temp.9Ok2resI9tmt+CT       250  0    0      0       0        0        0       0    0    0    0    0      0
CHAPTER 3 COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION
23
6. To list the addresses from Red Hat OpenStack Platform nodes to STF, connect to OCP to get the AMQ Interconnect pod name and list the connections. List the available AMQ Interconnect pods:

7. Connect to the pod and run the qdstat --connections command to list the known connections:

In this example, there are three edge connections from the Red Hat OpenStack Platform nodes, with connection id 22, 23, and 24.

8. To view the number of messages delivered by the network, use each address with the oc exec command:
$ oc get pods -l application=stf-default-interconnect
NAME                                        READY   STATUS    RESTARTS   AGE
stf-default-interconnect-7458fd4d69-bgzfb   1/1     Running   0          6d21h
$ oc exec -it stf-default-interconnect-7458fd4d69-bgzfb -- qdstat --connections
2020-04-21 18:25:47.243852 UTC
stf-default-interconnect-7458fd4d69-bgzfb
Connections
  id  host                container                                                               role    dir  security                                 authentication   tenant  last dlv      uptime
  ====================================================================================================================================================================================================
  1   10.129.2.21:43062   rcv[stf-default-collectd-telemetry-smartgateway-79c967c8f7-kq4qv]       normal  in   no-security                              anonymous-user           000:00:00:00  000:06:21:50
  2   10.130.0.52:55754   rcv[stf-default-ceilometer-notification-smartgateway-6675df547mbjk5]    normal  in   no-security                              anonymous-user           000:02:12:55  000:06:21:49
  3   10.130.0.51:43110   rcv[stf-default-collectd-notification-smartgateway-698c87fbb7-f28v6]    normal  in   no-security                              anonymous-user           000:02:13:65  000:06:21:49
  22  10.128.0.1:51948    Router.ceph-0.redhat.local                                              edge    in   TLSv1/SSLv3(DHE-RSA-AES256-GCM-SHA384)   anonymous-user           000:00:00:03  000:22:08:43
  23  10.128.0.1:51950    Router.compute-0.redhat.local                                           edge    in   TLSv1/SSLv3(DHE-RSA-AES256-GCM-SHA384)   anonymous-user           000:00:00:03  000:22:08:43
  24  10.128.0.1:52082    Router.controller-0.redhat.local                                        edge    in   TLSv1/SSLv3(DHE-RSA-AES256-GCM-SHA384)   anonymous-user           000:00:00:00  000:22:08:34
  27  127.0.0.1:42202     c2f541c1-4c97-4b37-a189-a396c08fb079                                    normal  in   no-security                              no-auth                  000:00:00:00  000:00:00:00
$ oc exec -it stf-default-interconnect-7458fd4d69-bgzfb -- qdstat --address
2020-04-21 18:20:10.293258 UTC
stf-default-interconnect-7458fd4d69-bgzfb

Router Addresses
  class   addr                             phs  distrib    pri  local  remote  in        out       thru  fallback
  ==================================================================================================================
  mobile  anycast/ceilometer/event.sample  0    balanced   -    1      0       1553      1553      0     0
  mobile  collectd/notify                  0    multicast  -    1      0       10        10        0     0
  mobile  collectd/telemetry               0    multicast  -    1      0       7798049   7798049   0     0
CHAPTER 4 ADVANCED FEATURES

The following optional features can provide additional functionality to the Service Telemetry Framework (STF):

Customizing the deployment. For more information, see Section 4.1, "Customizing the deployment".

Alerts. For more information, see Section 4.2, "Alerts".

High availability. For more information, see Section 4.3, "High availability".

Dashboards. For more information, see Section 4.4, "Dashboards".

Multiple clouds. For more information, see Section 4.5, "Configuring multiple clouds".

Ephemeral storage. For more information, see Section 4.6, "Ephemeral storage".
4.1 CUSTOMIZING THE DEPLOYMENT

The Service Telemetry Operator watches for a ServiceTelemetry manifest to load into Red Hat OpenShift Container Platform (OCP). The Operator then creates other objects in memory, which results in the dependent Operators creating the workloads they are responsible for managing.
WARNING
When you override the manifest, you must provide the entire manifest contents, including object names or namespaces. There is no dynamic parameter substitution when you override a manifest.

To override a manifest successfully with Service Telemetry Framework (STF), deploy a default environment using the core options only. For more information about the core options, see Section 2.3.10, "Creating a ServiceTelemetry object in OCP". When you deploy STF, use the oc get command to retrieve the default deployed manifest. When you use a manifest that was originally generated by the Service Telemetry Operator, the manifest is compatible with the other objects that are managed by the Operators.

For example, when the metricsEnabled: true parameter is configured in the ServiceTelemetry manifest, the Service Telemetry Operator requests components for metrics retrieval and storage using the default manifests. In some cases, you might want to override the default manifest. For more information, see Section 4.1.1, "Manifest override parameters".

4.1.1 Manifest override parameters

This table describes the available parameters that you can use to override a manifest, along with the corresponding retrieval commands.

Table 4.1. Manifest override parameters
Override parameter: alertmanagerManifest
  Description: Override the Alertmanager object creation. The Prometheus Operator watches for Alertmanager objects.
  Retrieval command: oc get alertmanager stf-default -oyaml

Override parameter: alertmanagerConfigManifest
  Description: Override the Secret that contains the Alertmanager configuration. The Prometheus Operator uses a secret named alertmanager-{{ alertmanager-name }}, for example, stf-default, to provide the alertmanager.yaml configuration to Alertmanager.
  Retrieval command: oc get secret alertmanager-stf-default -oyaml

Override parameter: elasticsearchManifest
  Description: Override the ElasticSearch object creation. The Elastic Cloud on Kubernetes Operator watches for ElasticSearch objects.
  Retrieval command: oc get elasticsearch elasticsearch -oyaml

Override parameter: interconnectManifest
  Description: Override the Interconnect object creation. The AMQ Interconnect Operator watches for Interconnect objects.
  Retrieval command: oc get interconnect stf-default-interconnect -oyaml

Override parameter: prometheusManifest
  Description: Override the Prometheus object creation. The Prometheus Operator watches for Prometheus objects.
  Retrieval command: oc get prometheus stf-default -oyaml

Override parameter: servicemonitorManifest
  Description: Override the ServiceMonitor object creation. The Prometheus Operator watches for ServiceMonitor objects.
  Retrieval command: oc get servicemonitor stf-default -oyaml

Override parameter: smartgatewayCollectdMetricsManifest
  Description: Override the SmartGateway object creation for collectd metrics. The Smart Gateway Operator watches for SmartGateway objects.
  Retrieval command: oc get smartgateway stf-default-collectd-telemetry -oyaml

Override parameter: smartgatewayCollectdEventsManifest
  Description: Override the SmartGateway object creation for collectd events. The Smart Gateway Operator watches for SmartGateway objects.
  Retrieval command: oc get smartgateway stf-default-collectd-notification -oyaml

Override parameter: smartgatewayCeilometerEventsManifest
  Description: Override the SmartGateway object creation for Ceilometer events. The Smart Gateway Operator watches for SmartGateway objects.
  Retrieval command: oc get smartgateway stf-default-ceilometer-notification -oyaml
4.1.2 Overriding a managed manifest

Edit the ServiceTelemetry object and provide a parameter and manifest. For a list of available manifest override parameters, see Section 4.1, "Customizing the deployment". The default ServiceTelemetry object is stf-default. Use oc get servicetelemetry to list the available STF deployments.
TIP
The oc edit command loads the default system editor. To override the default editor, pass or set the environment variable EDITOR to the preferred editor, for example, EDITOR=nano oc edit servicetelemetry stf-default.
Procedure
1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

3. Load the ServiceTelemetry object into an editor:

4. To modify the ServiceTelemetry object, provide a manifest override parameter and the contents of the manifest to write to OCP instead of the defaults provided by STF.

NOTE

The trailing pipe (|) after entering the manifest override parameter indicates that the value provided is multi-line.
oc project service-telemetry
oc edit servicetelemetry stf-default
$ oc edit servicetelemetry stf-default
apiVersion: infra.watch/v1alpha1
kind: ServiceTelemetry
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"infra.watch/v1alpha1","kind":"ServiceTelemetry","metadata":{"annotations":{},"name":"stf-default","namespace":"service-telemetry"},"spec":{"metricsEnabled":true}}
  creationTimestamp: "2020-04-14T20:29:42Z"
  generation: 1
  name: stf-default
  namespace: service-telemetry
  resourceVersion: "1949423"
  selfLink: /apis/infra.watch/v1alpha1/namespaces/service-telemetry/servicetelemetrys/stf-default
  uid: d058bc41-1bb0-49f5-9a8b-642f4b8adb95
spec:
  metricsEnabled: true
  smartgatewayCollectdMetricsManifest: | 1
    apiVersion: smartgateway.infra.watch/v2alpha1
    kind: SmartGateway
    metadata:
      name: stf-default-collectd-telemetry
      namespace: service-telemetry
    spec:
      amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/telemetry
      debug: true
      prefetch: 15000
      serviceType: metrics
      size: 1
      useTimestamp: true 2
status:
  conditions:
  - ansibleResult:
      changed: 0
      completion: 2020-04-14T20:32:19.079508
      failures: 0
      ok: 52
      skipped: 1
    lastTransitionTime: "2020-04-14T20:29:59Z"
    message: Awaiting next reconciliation
    reason: Successful
    status: "True"
    type: Running

1 The manifest override parameter is defined in the spec of the ServiceTelemetry object.

2 End of the manifest override content.
5. Save and close.

4.2 ALERTS

You create alert rules in Prometheus and alert routes in Alertmanager. Alert rules in Prometheus servers send alerts to an Alertmanager, which manages the alerts. Alertmanager can silence, inhibit, or aggregate alerts, and send notifications using email, on-call notification systems, or chat platforms.

To create an alert, complete the following tasks:

1. Create an alert rule in Prometheus. For more information, see Section 4.2.1, "Creating an alert rule in Prometheus".
2. Create an alert route in Alertmanager. For more information, see Section 4.2.3, "Creating an alert route in Alertmanager".

Additional resources

For more information about alerts or notifications with Prometheus and Alertmanager, see https://prometheus.io/docs/alerting/overview/

To view an example set of alerts that you can use with Service Telemetry Framework (STF), see https://github.com/infrawatch/service-telemetry-operator/tree/master/deploy/alerts

4.2.1 Creating an alert rule in Prometheus

Prometheus evaluates alert rules to trigger notifications. If the rule condition returns an empty result set, the condition is false. Otherwise, the rule is true and it triggers an alert.
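For example, the expression used by the rule later in this procedure returns a series only while the metric value is below 1, which is exactly when the alert fires; a sketch of the evaluation:

# True (returns a series, so the alert fires) only while the metric
# listener is down, that is, while the value is lower than 1.
collectd_qpid_router_status < 1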
Procedure
1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

3. Create a PrometheusRule object that contains the alert rule. The Prometheus Operator loads the rule into Prometheus:

4. To verify that the rules have been loaded into Prometheus by the Operator, create a pod with access to curl:

5. Run curl to access the prometheus-operated service to return the rules loaded into memory:
oc project service-telemetry
oc apply -f - <<EOF
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  creationTimestamp: null
  labels:
    prometheus: stf-default
    role: alert-rules
  name: prometheus-alarm-rules
  namespace: service-telemetry
spec:
  groups:
    - name: ./openstack.rules
      rules:
        - alert: Metric Listener down
          expr: collectd_qpid_router_status < 1
          # To change the rule, edit the value of the expr parameter.
EOF
oc run curl --generator=run-pod/v1 --image=radial/busyboxplus:curl -i --tty

[ root@curl:/ ]$ curl prometheus-operated:9090/api/v1/rules
{"status":"success","data":{"groups":...
6. To verify that the output shows the rules loaded into the PrometheusRule object, for example that the output contains the defined ./openstack.rules, exit from the pod:

7. Clean up the environment by deleting the curl pod:

Additional resources

For more information on alerting, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md

4.2.2 Configuring custom alerts

You can add custom alerts to the PrometheusRule object that you created in Section 4.2.1, "Creating an alert rule in Prometheus".
Procedure
1. Use the oc edit command:

2. Edit the PrometheusRules manifest, for example by adding a rule as in the sketch after this procedure.

3. Save and close.
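A sketch of a custom rule added to the existing group; the second alert name and its expression are illustrative assumptions only, not rules this guide prescribes:

spec:
  groups:
    - name: ./openstack.rules
      rules:
        - alert: Metric Listener down
          expr: collectd_qpid_router_status < 1
        # Hypothetical custom alert: fires when the load metric stops reporting.
        - alert: Load Average missing
          expr: absent(collectd_load_load_shortterm) == 1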
Additional resources

For more information about configuring alerting rules, see https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/

For more information about PrometheusRules objects, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md

4.2.3 Creating an alert route in Alertmanager

Use Alertmanager to deliver alerts to an external system, such as email, IRC, or other notification channel. The Prometheus Operator manages the Alertmanager configuration as a Red Hat OpenShift Container Platform (OCP) secret. STF by default deploys a basic configuration that results in no receivers:
{"status":"success","data":{"groups":[{"name":"./openstack.rules","file":"/etc/prometheus/rules/prometheus-stf-default-rulefiles-0/service-telemetry-prometheus-alarm-rules.yaml","rules":[{"name":"Metric Listener down","query":"collectd_qpid_router_status \u003c 1","duration":0,"labels":{},"annotations":{},"alerts":[],"health":"ok","type":"alerting"}],"interval":30}]}}

[ root@curl:/ ]$ exit

$ oc delete pod curl

pod "curl" deleted
oc edit prometheusrules prometheus-alarm-rules
alertmanager.yaml: |-
  global:
    resolve_timeout: 5m
  route:
    group_by: ['job']
    group_wait: 30s
    group_interval: 5m
    repeat_interval: 12h
    receiver: 'null'
  receivers:
  - name: 'null'
To deploy a custom Alertmanager route with STF, you must pass an alertmanagerConfigManifest parameter to the Service Telemetry Operator, which results in an updated secret managed by the Prometheus Operator. For more information, see Section 4.1.2, "Overriding a managed manifest".
Procedure
1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

3. Edit the ServiceTelemetry object for your STF deployment:

4. Add the new parameter alertmanagerConfigManifest and the Secret object contents to define the alertmanager.yaml configuration for Alertmanager:

NOTE

This loads the default template that is already managed by the Service Telemetry Operator. To validate that the changes are populating correctly, change a value, return the alertmanager-stf-default secret, and verify that the new value is loaded into memory, for example, changing the value of global.resolve_timeout from 5m to 10m.
oc project service-telemetry
oc edit servicetelemetry stf-default
apiVersion: infra.watch/v1alpha1
kind: ServiceTelemetry
metadata:
  name: stf-default
  namespace: service-telemetry
spec:
  metricsEnabled: true
  alertmanagerConfigManifest: |
    apiVersion: v1
    kind: Secret
    metadata:
      name: alertmanager-stf-default
      namespace: service-telemetry
    type: Opaque
    stringData:
      alertmanager.yaml: |-
        global:
          resolve_timeout: 10m
        route:
          group_by: ['job']
          group_wait: 30s
          group_interval: 5m
          repeat_interval: 12h
          receiver: 'null'
        receivers:
        - name: 'null'
5. Verify that the configuration was applied to the secret:

6. To verify that the configuration has been loaded into Alertmanager, create a pod with access to curl:

7. Run curl against the alertmanager-operated service to retrieve the status and configYAML contents, and review that the supplied configuration matches the configuration loaded into Alertmanager:

8. Verify that the configYAML field contains the expected changes. Exit from the pod:

9. To clean up the environment, delete the curl pod:
$ oc get secret alertmanager-stf-default -o go-template='{{index .data "alertmanager.yaml" | base64decode}}'

global:
  resolve_timeout: 10m
route:
  group_by: ['job']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h
  receiver: 'null'
receivers:
- name: 'null'
oc run curl --generator=run-pod/v1 --image=radial/busyboxplus:curl -i --tty

[ root@curl:/ ]$ curl alertmanager-operated:9093/api/v1/status

{"status":"success","data":{"configYAML":"global:\n  resolve_timeout: 10m\n  http_config: {}\n  smtp_hello: localhost\n  smtp_require_tls: true\n  pagerduty_url: https://events.pagerduty.com/v2/enqueue\n  hipchat_api_url: https://api.hipchat.com/\n  opsgenie_api_url: https://api.opsgenie.com/\n  wechat_api_url: https://qyapi.weixin.qq.com/cgi-bin/\n  victorops_api_url: https://alert.victorops.com/integrations/generic/20131114/alert/\nroute:\n  receiver: 'null'\n  group_by:\n  - job\n  group_wait: 30s\n  group_interval: 5m\n  repeat_interval: 12h\nreceivers:\n- name: 'null'\ntemplates: []\n",...}}

[ root@curl:/ ]$ exit

$ oc delete pod curl

pod "curl" deleted
Additional resources

For more information about the Red Hat OpenShift Container Platform secret and the Prometheus operator, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md

4.3 HIGH AVAILABILITY

High availability is the ability of Service Telemetry Framework (STF) to rapidly recover from failures in its component services. Although Red Hat OpenShift Container Platform (OCP) restarts a failed pod if nodes are available to schedule the workload, this recovery process might take more than one minute, during which time events and metrics are lost. A high availability configuration includes multiple copies of STF components, reducing recovery time to approximately 2 seconds. To protect against failure of an OCP node, deploy STF to an OCP cluster with three or more nodes.
NOTE
STF is not yet a fully fault tolerant system. Delivery of metrics and events during the recovery period is not guaranteed.

Enabling high availability has the following effects:

Two AMQ Interconnect pods run instead of the default 1.

Three ElasticSearch pods run instead of the default 1.

Recovery time from a lost pod in either of these services reduces to approximately 2 seconds.
4.3.1 Configuring high availability

To configure STF for high availability, add highAvailabilityEnabled: true to the ServiceTelemetry object in OCP. You can set this parameter at installation time, or, if you already deployed STF, complete the following steps:
Procedure
1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

3. Use the oc command to edit the ServiceTelemetry object:

4. Add highAvailabilityEnabled: true to the spec section:

5. Save your changes and close the object.
oc project service-telemetry
$ oc edit ServiceTelemetry
spec:
  eventsEnabled: true
  metricsEnabled: true
  highAvailabilityEnabled: true
44 DASHBOARDS
Use the third-party application Grafana to visualize system-level metrics gathered by collectd for each individual host node. For more information about configuring collectd, see Section 3.3, "Configuring Red Hat OpenStack Platform overcloud for Service Telemetry Framework".
4.4.1. Setting up Grafana to host the dashboard
Grafana is not included in the default Service Telemetry Framework (STF) deployment, so you must deploy the Grafana Operator from OperatorHub.io.
Procedure

1. Log in to Red Hat OpenShift Container Platform.
2. Change to the service-telemetry namespace:

$ oc project service-telemetry

3. Clone the dashboard repository:

$ git clone https://github.com/infrawatch/dashboards
$ cd dashboards

4. Deploy the Grafana operator:

$ oc create -f deploy/subscription.yaml

5. To verify that the operator launched successfully, run the oc get csv command. If the value of the PHASE column is Succeeded, the operator launched successfully:

$ oc get csv

NAME                      DISPLAY            VERSION   REPLACES   PHASE
grafana-operator.v3.2.0   Grafana Operator   3.2.0                Succeeded

6. Launch a Grafana instance:

$ oc create -f deploy/grafana.yaml

7. Verify that the Grafana instance deployed:

$ oc get pod -l app=grafana

NAME                                  READY   STATUS    RESTARTS   AGE
grafana-deployment-7fc7848b56-sbkhv   1/1     Running   0          1m

8. Create the datasource and dashboard resources:

$ oc create -f deploy/datasource.yaml -f deploy/rhos-dashboard.yaml

9. Verify that the resources installed correctly:

$ oc get grafanadashboards

NAME             AGE
rhos-dashboard   7d21h

$ oc get grafanadatasources

NAME                                  AGE
service-telemetry-grafanadatasource   1m

10. Navigate to https://<grafana-route-address> in a web browser. Use the oc get routes command to retrieve the Grafana route address:

$ oc get routes

11. To view the dashboard, click Dashboards and Manage.
Additional resources

For more information about enabling the OperatorHub.io catalog source, see Section 2.3.5, "Enabling the OperatorHub.io Community Catalog Source".
4.4.1.1. Viewing and editing queries

Procedure

1. Log in to Red Hat OpenShift Container Platform. To view and edit queries, log in as the admin user.
2. Change to the service-telemetry namespace:

$ oc project service-telemetry

3. To retrieve the default username and password, describe the Grafana object using oc describe:

$ oc describe grafana service-telemetry-grafana
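If you prefer to script the credential lookup instead of reading the oc describe output, the following jsonpath queries are a sketch that assumes the grafana-operator v3 custom resource exposes the credentials under .spec.config.security; verify the exact field names against your own oc describe output:

$ oc get grafana service-telemetry-grafana -o jsonpath='{.spec.config.security.admin_user}{"\n"}'
$ oc get grafana service-telemetry-grafana -o jsonpath='{.spec.config.security.admin_password}{"\n"}'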
4.4.2. The Grafana infrastructure dashboard

The infrastructure dashboard shows metrics for a single node at a time. Select a node from the upper left corner of the dashboard.
4.4.2.1. Top panels

Title | Unit | Description
Current Global Alerts | - | Current alerts fired by Prometheus
Recent Global Alerts | - | Recently fired alerts in 5m time steps
Status Panel | - | Node status: up, down, unavailable
Uptime | s/m/h/d/M/Y | Total operational time of node
CPU Cores | cores | Total number of cores
Memory | bytes | Total memory
Disk Size | bytes | Total storage size
Processes | processes | Total number of processes listed by type
Load Average | processes | Load average represents the average number of running and uninterruptible processes residing in the kernel execution queue
4.4.2.2. Networking panels

Panels that display the network interfaces of the node.

Panel | Unit | Description
Physical Interfaces Ingress Errors | errors | Total errors with incoming data
Physical Interfaces Egress Errors | errors | Total errors with outgoing data
Physical Interfaces Ingress Error Rates | errors/s | Rate of incoming data errors
Physical Interfaces Egress Error Rates | errors/s | Rate of outgoing data errors
Physical Interfaces Packets Ingress | pps | Incoming packets per second
Physical Interfaces Packets Egress | pps | Outgoing packets per second
Physical Interfaces Data Ingress | bytes/s | Incoming data rates
Physical Interfaces Data Egress | bytes/s | Outgoing data rates
Physical Interfaces Drop Rate Ingress | pps | Incoming packets drop rate
Physical Interfaces Drop Rate Egress | pps | Outgoing packets drop rate
4.4.2.3. CPU panels

Panels that display CPU usage of the node.

Panel | Unit | Description
Current CPU Usage | percent | Instantaneous usage at the time of the last query
Aggregate CPU Usage | percent | Average non-idle CPU activity of all cores on a node
Aggr. CPU Usage by Type | percent | Shows time spent for each type of thread averaged across all cores
4.4.2.4. Memory panels

Panels that display memory usage on the node.

Panel | Unit | Description
Memory Used | percent | Amount of memory being used at time of last query
Huge Pages Used | hugepages | Number of huge pages being used
Memory
4.4.2.5. Disk/file system

Panels that display space used on disk.

Panel | Unit | Description | Notes
Disk Space Usage | percent | Total disk use at time of last query |
Inode Usage | percent | Total inode use at time of last query |
Aggregate Disk Space Usage | bytes | Total disk space used and reserved | Because this query relies on the df plugin, temporary file systems that do not necessarily use disk space are included in the results. The query tries to filter out most of these, but it might not be exhaustive.
Disk Traffic | bytes/s | Shows rates for both reading and writing |
Disk Load | percent | Approximate percentage of total disk bandwidth being used. The weighted I/O time series includes the backlog that might be accumulating. For more information, see the collectd disk plugin docs. |
Operations/s | ops/s | Operations done per second |
Average I/O Operation Time | seconds | Average time each I/O operation took to complete. This average is not accurate; see the collectd disk plugin docs. |
4.5. CONFIGURING MULTIPLE CLOUDS

You can configure multiple Red Hat OpenStack Platform clouds to target a single instance of Service Telemetry Framework (STF):

1. Plan the AMQP address prefixes that you want to use for each cloud. For more information, see Section 4.5.1, "Planning AMQP address prefixes".
2. Deploy metrics and events consumer Smart Gateways for each cloud to listen on the corresponding address prefixes. For more information, see Section 4.5.2, "Deploying Smart Gateways".
3. Configure each cloud to send its metrics and events to STF on the correct address. For more information, see Section 4.5.3, "Creating the OpenStack environment file".
Figure 4.1. Two Red Hat OpenStack Platform clouds connect to STF
4.5.1. Planning AMQP address prefixes

By default, Red Hat OpenStack Platform nodes get data through two data collectors: collectd and Ceilometer. These components send telemetry data, or notifications, to the respective AMQP addresses, for example collectd/telemetry, where STF Smart Gateways listen on those addresses for monitoring data.

To support multiple clouds, and to identify which cloud generated the monitoring data, configure each cloud to send data to a unique address. Prefix a cloud identifier to the second part of the address. The following list shows some example addresses and identifiers:
collectd/cloud1-telemetry

collectd/cloud1-notify

anycast/ceilometer/cloud1-event.sample

collectd/cloud2-telemetry

collectd/cloud2-notify

anycast/ceilometer/cloud2-event.sample

collectd/us-east-1-telemetry

collectd/us-west-3-telemetry
4.5.2. Deploying Smart Gateways

You must deploy a Smart Gateway for each of the data collection types for each cloud: one for collectd metrics, one for collectd events, and one for Ceilometer events. Configure each of the Smart Gateways to listen on the AMQP address that you define for the corresponding cloud.

When you deploy STF for the first time, Smart Gateway manifests are created that define the initial Smart Gateways for a single cloud. When deploying Smart Gateways for multiple cloud support, you deploy multiple Smart Gateways for each of the data collection types that handle the metrics and the events data for each cloud. The initial Smart Gateways act as a template to create additional Smart Gateways, along with any authentication information required to connect to the data stores.
Procedure

1. Log in to Red Hat OpenShift Container Platform.
2. Change to the service-telemetry namespace:

$ oc project service-telemetry

3. Use the initially deployed Smart Gateways as a template for additional Smart Gateways. List the currently deployed Smart Gateways with the oc get smartgateways command. For example, if you deployed STF with metricsEnabled: true and eventsEnabled: true, the following Smart Gateways are displayed in the output:

$ oc get smartgateways

NAME                                  AGE
stf-default-ceilometer-notification   14d
stf-default-collectd-notification     14d
stf-default-collectd-telemetry        14d

4. Retrieve the manifests for each Smart Gateway and store the contents in a temporary file, which you can modify later and use to create the new set of Smart Gateways:

$ truncate --size 0 /tmp/cloud1-smartgateways.yaml && \
for sg in $(oc get smartgateways -oname)
do
  echo "---" >> /tmp/cloud1-smartgateways.yaml
  oc get $sg -oyaml --export >> /tmp/cloud1-smartgateways.yaml
done
5. Modify the Smart Gateway manifests in the /tmp/cloud1-smartgateways.yaml file. Adjust the metadata.name and spec.amqpUrl fields to include the cloud identifier from your schema. For more information, see Section 4.5.1, "Planning AMQP address prefixes". To view example Smart Gateway manifests, see Section 4.5.2.1, "Example manifests".
6. Deploy your new Smart Gateways:

$ oc apply -f /tmp/cloud1-smartgateways.yaml

7. Verify that each Smart Gateway is running. This can take several minutes, depending on the number of Smart Gateways:

$ oc get po -l app=smart-gateway
4.5.2.1. Example manifests
IMPORTANT

The content in the following examples might be different from the file content in your deployment. Copy the manifests in your deployment.
Ensure that the name and amqpUrl parameters of each Smart Gateway match the names that you want to use for your clouds. For more information, see Section 4.5.1, "Planning AMQP address prefixes".
NOTE

Your output might have some additional metadata parameters that you can remove from the manifests that you load into OCP.
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-ceilometer-notification-cloud1 1
spec:
  amqpDataSource: ceilometer
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/anycast/ceilometer/cloud1-event.sample 2
  debug: false
  elasticPass: fkzfhghw
  elasticUrl: https://elasticsearch-es-http.service-telemetry.svc.cluster.local:9200
  elasticUser: elastic
  resetIndex: false
  serviceType: events
  size: 1
  tlsCaCert: /config/certs/ca.crt
  tlsClientCert: /config/certs/tls.crt
  tlsClientKey: /config/certs/tls.key
  tlsServerName: elasticsearch-es-http.service-telemetry.svc.cluster.local
  useBasicAuth: true
  useTls: true
---
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-collectd-notification-cloud1 3
spec:
  amqpDataSource: collectd
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/cloud1-notify 4
  debug: false
  elasticPass: fkzfhghw
  elasticUrl: https://elasticsearch-es-http.service-telemetry.svc.cluster.local:9200
  elasticUser: elastic
  resetIndex: false
  serviceType: events
  size: 1
  tlsCaCert: /config/certs/ca.crt
  tlsClientCert: /config/certs/tls.crt
  tlsClientKey: /config/certs/tls.key
  tlsServerName: elasticsearch-es-http.service-telemetry.svc.cluster.local
  useBasicAuth: true
  useTls: true
---
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-collectd-telemetry-cloud1 5
spec:
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/cloud1-telemetry 6
  debug: false
  prefetch: 15000
  serviceType: metrics
  size: 1
  useTimestamp: true

1 Name for Ceilometer notifications for cloud1.
2 AMQP address for Ceilometer notifications for cloud1.
3 Name for collectd notifications for cloud1.
4 AMQP address for collectd notifications for cloud1.
5 Name for collectd telemetry for cloud1.
6 AMQP address for collectd telemetry for cloud1.

4.5.3. Creating the OpenStack environment file

To label traffic according to the cloud of origin, you must create a configuration with cloud-specific instance names. Create an stf-connectors.yaml file and adjust the values of CeilometerQdrEventsConfig and CollectdAmqpInstances to match the AMQP address prefix scheme. For more information, see Section 4.5.1, "Planning AMQP address prefixes".
WARNING

Remove the enable-stf.yaml and ceilometer-write-qdr.yaml environment file references from your overcloud deployment. This configuration is redundant and results in duplicate information being sent from each cloud node.
Procedure

1. Create the stf-connectors.yaml file and modify it to match the AMQP address that you want for this cloud deployment:

resource_registry:
  OS::TripleO::Services::Collectd: /usr/share/openstack-tripleo-heat-templates/deployment/metrics/collectd-container-puppet.yaml
  OS::TripleO::Services::MetricsQdr: /usr/share/openstack-tripleo-heat-templates/deployment/metrics/qdr-container-puppet.yaml
  OS::TripleO::Services::CeilometerAgentCentral: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-central-container-puppet.yaml
  OS::TripleO::Services::CeilometerAgentNotification: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-notification-container-puppet.yaml
  OS::TripleO::Services::CeilometerAgentIpmi: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-ipmi-container-puppet.yaml
  OS::TripleO::Services::ComputeCeilometerAgent: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-compute-container-puppet.yaml
  OS::TripleO::Services::Redis: /usr/share/openstack-tripleo-heat-templates/deployment/database/redis-pacemaker-puppet.yaml

parameter_defaults:
  EnableSTF: true

  EventPipelinePublishers: []
  CeilometerEnablePanko: false
  CeilometerQdrPublishEvents: true
  CeilometerQdrEventsConfig:
    driver: amqp
    topic: cloud1-event 1

  CollectdConnectionType: amqp1
  CollectdAmqpInterval: 5
  CollectdDefaultPollingInterval: 5

  CollectdAmqpInstances:
    cloud1-notify: 2
      notify: true
      format: JSON
      presettle: false
    cloud1-telemetry: 3
      format: JSON
      presettle: true

  MetricsQdrAddresses:
  - prefix: collectd
    distribution: multicast
  - prefix: anycast/ceilometer
    distribution: multicast

  MetricsQdrSSLProfiles:
  - name: sslProfile

  MetricsQdrConnectors:
  - host: stf-default-interconnect-5671-service-telemetry.apps.infra.watch 4
    port: 443
    role: edge
    verifyHostname: false
    sslProfile: sslProfile

1 Define the topic for Ceilometer events. This value is the address format of anycast/ceilometer/cloud1-event.sample. Ensure that the naming convention in the stf-connectors.yaml file aligns with the spec.amqpUrl field in the Smart Gateway configuration; for example, configure the CeilometerQdrEventsConfig.topic field to a value of cloud1-event.
2 Define the topic for collectd events. This value is the format of collectd/cloud1-notify.
3 Define the topic for collectd metrics. This value is the format of collectd/cloud1-telemetry.
4 Adjust the MetricsQdrConnectors host to the address of the STF route.

2. Save the file in a directory for custom environment files, for example /home/stack/custom_templates/.
3. Source the authentication file:

[stack@undercloud-0 ~]$ source stackrc

(undercloud) [stack@undercloud-0 ~]$

4. Include the stf-connectors.yaml file in the overcloud deployment command, along with any other environment files relevant to your environment:

(undercloud) [stack@undercloud-0 ~]$ openstack overcloud deploy --templates /usr/share/openstack-tripleo-heat-templates \
  -e /home/stack/custom_templates/stf-connectors.yaml

Additional resources

For information about validating the deployment, see Section 3.3.3, "Validating client-side installation".

4.5.4. Querying metrics data from multiple clouds

Data stored in Prometheus has a service label attached according to the Smart Gateway it was scraped from. You can use this label to query data from a specific cloud.

To query data from a specific cloud, use a Prometheus PromQL query that matches the associated service label, for example: collectd_uptime{service="stf-default-collectd-telemetry-cloud1-smartgateway"}.
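To run such a query without exposing Prometheus externally, you can reuse the curl pod technique from Section 4.2.3. This is a hedged sketch: the prometheus-operated service name and port 9090 are Prometheus Operator defaults and are assumptions, not values stated in this guide:

$ oc run curl --generator=run-pod/v1 --image=radial/busyboxplus:curl -i --tty
[ root@curl:/ ]$ curl 'prometheus-operated:9090/api/v1/query?query=collectd_uptime{service="stf-default-collectd-telemetry-cloud1-smartgateway"}'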
4.6. EPHEMERAL STORAGE
Use ephemeral storage to run Service Telemetry Framework (STF) without persistently storing data in your Red Hat OpenShift Container Platform (OCP) cluster. Ephemeral storage is not recommended in a production environment due to the volatility of the data in the platform when operating correctly and as designed. For example, restarting a pod or rescheduling the workload to another node results in the loss of any local data written since the pod started.

If you enable ephemeral storage in STF, the Service Telemetry Operator does not add the relevant storage sections to the data storage components' manifests.
4.6.1. Configuring ephemeral storage

To configure STF for ephemeral storage, add storageEphemeralEnabled: true to the ServiceTelemetry object in OCP. You can add storageEphemeralEnabled: true at installation time or, if you already deployed STF, complete the following steps:
Procedure

1. Log in to Red Hat OpenShift Container Platform.
2. Change to the service-telemetry namespace:

$ oc project service-telemetry

3. Edit the ServiceTelemetry object:

$ oc edit ServiceTelemetry stf-default

4. Add the storageEphemeralEnabled: true parameter to the spec section:

spec:
  eventsEnabled: true
  metricsEnabled: true
  storageEphemeralEnabled: true

5. Save your changes and close the object.
APPENDIX A. COLLECTD PLUG-INS

This section contains a complete list of collectd plug-ins and configurations for Red Hat OpenStack Platform 16.0.
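The parameter names listed below are Hiera keys, so you can set them from a TripleO environment file. The following snippet is an illustrative sketch only: the df plug-in values shown are hypothetical examples, not recommended settings:

parameter_defaults:
  ExtraConfig:
    collectd::plugin::df::fstypes: ['xfs']        # hypothetical: report only XFS file systems
    collectd::plugin::df::ignoreselected: false   # hypothetical: keep (rather than ignore) the selection above
    collectd::plugin::df::interval: 120           # hypothetical: seconds between reads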
collectd-aggregation

collectd::plugin::aggregation::aggregators
collectd::plugin::aggregation::interval

collectd-amqp1

collectd-apache

collectd::plugin::apache::instances (ex: localhost => url => http://localhost/mod_status?auto)
collectd::plugin::apache::interval

collectd-apcups

collectd-battery

collectd::plugin::battery::values_percentage
collectd::plugin::battery::report_degraded
collectd::plugin::battery::query_state_fs
collectd::plugin::battery::interval

collectd-ceph

collectd::plugin::ceph::daemons
collectd::plugin::ceph::longrunavglatency
collectd::plugin::ceph::convertspecialmetrictypes

collectd-cgroups

collectd::plugin::cgroups::ignore_selected
collectd::plugin::cgroups::interval

collectd-conntrack

None

collectd-contextswitch

collectd::plugin::contextswitch::interval

collectd-cpu

collectd::plugin::cpu::reportbystate
collectd::plugin::cpu::reportbycpu
collectd::plugin::cpu::valuespercentage
collectd::plugin::cpu::reportnumcpu
collectd::plugin::cpu::reportgueststate
collectd::plugin::cpu::subtractgueststate
collectd::plugin::cpu::interval

collectd-cpufreq

None

collectd-cpusleep

collectd-csv

collectd::plugin::csv::datadir
collectd::plugin::csv::storerates
collectd::plugin::csv::interval

collectd-df

collectd::plugin::df::devices
collectd::plugin::df::fstypes
collectd::plugin::df::ignoreselected
collectd::plugin::df::mountpoints
collectd::plugin::df::reportbydevice
collectd::plugin::df::reportinodes
collectd::plugin::df::reportreserved
collectd::plugin::df::valuesabsolute
collectd::plugin::df::valuespercentage
collectd::plugin::df::interval

collectd-disk

collectd::plugin::disk::disks
collectd::plugin::disk::ignoreselected
collectd::plugin::disk::udevnameattr
collectd::plugin::disk::interval

collectd-entropy

collectd::plugin::entropy::interval

collectd-ethstat

collectd::plugin::ethstat::interfaces
collectd::plugin::ethstat::maps
collectd::plugin::ethstat::mappedonly
collectd::plugin::ethstat::interval

collectd-exec

collectd::plugin::exec::commands
collectd::plugin::exec::commands_defaults
collectd::plugin::exec::globals
collectd::plugin::exec::interval

collectd-fhcount

collectd::plugin::fhcount::valuesabsolute
collectd::plugin::fhcount::valuespercentage
collectd::plugin::fhcount::interval

collectd-filecount

collectd::plugin::filecount::directories
collectd::plugin::filecount::interval

collectd-fscache

None

collectd-hddtemp

collectd::plugin::hddtemp::host
collectd::plugin::hddtemp::port
collectd::plugin::hddtemp::interval

collectd-hugepages

collectd::plugin::hugepages::report_per_node_hp
collectd::plugin::hugepages::report_root_hp
collectd::plugin::hugepages::values_pages
collectd::plugin::hugepages::values_bytes
collectd::plugin::hugepages::values_percentage
collectd::plugin::hugepages::interval

collectd-intel_rdt

collectd-interface

collectd::plugin::interface::interfaces
collectd::plugin::interface::ignoreselected
collectd::plugin::interface::reportinactive
collectd::plugin::interface::interval

collectd-ipc

None

collectd-ipmi

collectd::plugin::ipmi::ignore_selected
collectd::plugin::ipmi::notify_sensor_add
collectd::plugin::ipmi::notify_sensor_remove
collectd::plugin::ipmi::notify_sensor_not_present
collectd::plugin::ipmi::sensors
collectd::plugin::ipmi::interval

collectd-irq

collectd::plugin::irq::irqs
collectd::plugin::irq::ignoreselected
collectd::plugin::irq::interval

collectd-load

collectd::plugin::load::report_relative
collectd::plugin::load::interval

collectd-logfile

collectd::plugin::logfile::log_level
collectd::plugin::logfile::log_file
collectd::plugin::logfile::log_timestamp
collectd::plugin::logfile::print_severity
collectd::plugin::logfile::interval

collectd-madwifi

collectd-mbmon

collectd-md

collectd-memcached

collectd::plugin::memcached::instances
collectd::plugin::memcached::interval

collectd-memory

collectd::plugin::memory::valuesabsolute
collectd::plugin::memory::valuespercentage
collectd::plugin::memory::interval

collectd-multimeter

collectd-mysql

collectd::plugin::mysql::interval

collectd-netlink

collectd::plugin::netlink::interfaces
collectd::plugin::netlink::verboseinterfaces
collectd::plugin::netlink::qdiscs
collectd::plugin::netlink::classes
collectd::plugin::netlink::filters
collectd::plugin::netlink::ignoreselected
collectd::plugin::netlink::interval

collectd-network

collectd::plugin::network::timetolive
collectd::plugin::network::maxpacketsize
collectd::plugin::network::forward
collectd::plugin::network::reportstats
collectd::plugin::network::listeners
collectd::plugin::network::servers
collectd::plugin::network::interval

collectd-nfs

collectd::plugin::nfs::interval

collectd-ntpd

collectd::plugin::ntpd::host
collectd::plugin::ntpd::port
collectd::plugin::ntpd::reverselookups
collectd::plugin::ntpd::includeunitid
collectd::plugin::ntpd::interval

collectd-numa

None

collectd-olsrd

collectd-openvpn

collectd::plugin::openvpn::statusfile
collectd::plugin::openvpn::improvednamingschema
collectd::plugin::openvpn::collectcompression
collectd::plugin::openvpn::collectindividualusers
collectd::plugin::openvpn::collectusercount
collectd::plugin::openvpn::interval

collectd-ovs_events

collectd::plugin::ovs_events::address
collectd::plugin::ovs_events::dispatch
collectd::plugin::ovs_events::interfaces
collectd::plugin::ovs_events::send_notification
collectd::plugin::ovs_events::$port
collectd::plugin::ovs_events::socket

collectd-ovs_stats

collectd::plugin::ovs_stats::address
collectd::plugin::ovs_stats::bridges
collectd::plugin::ovs_stats::port
collectd::plugin::ovs_stats::socket

collectd-ping

collectd::plugin::ping::hosts
collectd::plugin::ping::timeout
collectd::plugin::ping::ttl
collectd::plugin::ping::source_address
collectd::plugin::ping::device
collectd::plugin::ping::max_missed
collectd::plugin::ping::size
collectd::plugin::ping::interval

collectd-powerdns

collectd::plugin::powerdns::interval
collectd::plugin::powerdns::servers
collectd::plugin::powerdns::recursors
collectd::plugin::powerdns::local_socket

collectd-processes

collectd::plugin::processes::processes
collectd::plugin::processes::process_matches
collectd::plugin::processes::collect_context_switch
collectd::plugin::processes::collect_file_descriptor
collectd::plugin::processes::collect_memory_maps
collectd::plugin::processes::interval

collectd-protocols

collectd::plugin::protocols::ignoreselected
collectd::plugin::protocols::values

collectd-python

collectd-serial

collectd-smart

collectd::plugin::smart::disks
collectd::plugin::smart::ignoreselected
collectd::plugin::smart::interval

collectd-snmp_agent

collectd-statsd

collectd::plugin::statsd::host
collectd::plugin::statsd::port
collectd::plugin::statsd::deletecounters
collectd::plugin::statsd::deletetimers
collectd::plugin::statsd::deletegauges
collectd::plugin::statsd::deletesets
collectd::plugin::statsd::countersum
collectd::plugin::statsd::timerpercentile
collectd::plugin::statsd::timerlower
collectd::plugin::statsd::timerupper
collectd::plugin::statsd::timersum
collectd::plugin::statsd::timercount
collectd::plugin::statsd::interval

collectd-swap

collectd::plugin::swap::reportbydevice
collectd::plugin::swap::reportbytes
collectd::plugin::swap::valuesabsolute
collectd::plugin::swap::valuespercentage
collectd::plugin::swap::reportio
collectd::plugin::swap::interval

collectd-syslog

collectd::plugin::syslog::log_level
collectd::plugin::syslog::notify_level
collectd::plugin::syslog::interval

collectd-table

collectd::plugin::table::tables
collectd::plugin::table::interval

collectd-tail

collectd::plugin::tail::files
collectd::plugin::tail::interval

collectd-tail_csv

collectd::plugin::tail_csv::metrics
collectd::plugin::tail_csv::files

collectd-tcpconns

collectd::plugin::tcpconns::localports
collectd::plugin::tcpconns::remoteports
collectd::plugin::tcpconns::listening
collectd::plugin::tcpconns::allportssummary
collectd::plugin::tcpconns::interval

collectd-ted

collectd-thermal

collectd::plugin::thermal::devices
collectd::plugin::thermal::ignoreselected
collectd::plugin::thermal::interval

collectd-threshold

collectd::plugin::threshold::types
collectd::plugin::threshold::plugins
collectd::plugin::threshold::hosts
collectd::plugin::threshold::interval

collectd-turbostat

collectd::plugin::turbostat::core_c_states
collectd::plugin::turbostat::package_c_states
collectd::plugin::turbostat::system_management_interrupt
collectd::plugin::turbostat::digital_temperature_sensor
collectd::plugin::turbostat::tcc_activation_temp
collectd::plugin::turbostat::running_average_power_limit
collectd::plugin::turbostat::logical_core_names

collectd-unixsock

collectd-uptime

collectd::plugin::uptime::interval

collectd-users

collectd::plugin::users::interval

collectd-uuid

collectd::plugin::uuid::uuid_file
collectd::plugin::uuid::interval

collectd-virt

collectd::plugin::virt::connection
collectd::plugin::virt::refresh_interval
collectd::plugin::virt::domain
collectd::plugin::virt::block_device
collectd::plugin::virt::interface_device
collectd::plugin::virt::ignore_selected
collectd::plugin::virt::hostname_format
collectd::plugin::virt::interface_format
collectd::plugin::virt::extra_stats
collectd::plugin::virt::interval

collectd-vmem

collectd::plugin::vmem::verbose
collectd::plugin::vmem::interval

collectd-vserver

collectd-wireless

collectd-write_graphite

collectd::plugin::write_graphite::carbons
collectd::plugin::write_graphite::carbon_defaults
collectd::plugin::write_graphite::globals

collectd-write_kafka

collectd::plugin::write_kafka::kafka_host
collectd::plugin::write_kafka::kafka_port
collectd::plugin::write_kafka::kafka_hosts
collectd::plugin::write_kafka::topics

collectd-write_log

collectd::plugin::write_log::format

collectd-zfs_arc

None
Figure 1.2. Server-side STF monitoring infrastructure
For more information about how to deploy Red Hat OpenShift Container Platform, see the OCP product documentation. You can install OCP on cloud platforms or on bare metal. For more information about STF performance and scaling, see https://access.redhat.com/articles/4907241.
NOTE
Do not install OCP on the same infrastructure that you want to monitor.
1.2. INSTALLATION SIZE
The size of your Red Hat OpenShift Container Platform installation depends on the following factors:
The number of nodes you want to monitor
The number of metrics you want to collect
The resolution of metrics
The length of time that you want to store the data
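As a rough, hedged illustration of how these factors interact (the figures below are hypothetical, and the ~2 bytes per sample figure is a common upstream Prometheus rule of thumb, not a measured STF value):

100 nodes x 200 metrics per node / 30 s resolution ≈ 667 samples/s
667 samples/s x 2 bytes/sample x 86,400 s/day ≈ 115 MB/day, or roughly 3.5 GB for 30 days of retention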
Installation of Service Telemetry Framework (STF) depends on the existing Red Hat OpenShift Container Platform environment. Ensure that you install monitoring for Red Hat OpenStack Platform on a platform separate from your Red Hat OpenStack Platform environment. You can install Red Hat OpenShift Container Platform (OCP) on bare metal or other supported cloud platforms. For more information about installing OCP, see OpenShift Container Platform 4.3 Documentation.

The size of your OCP environment depends on the infrastructure you select. For more information about minimum resource requirements when installing OCP on bare metal, see Minimum resource requirements in the Installing a cluster on bare metal guide. For installation requirements of the various public and private cloud platforms on which you can install, see the corresponding installation documentation for your cloud platform of choice.
CHAPTER 2. INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK

Before you install Service Telemetry Framework (STF), ensure that Red Hat OpenShift Container Platform (OCP) version 4.x is running and that you understand the core components of the framework. As part of the OCP installation planning process, ensure that the administrator provides persistent storage and enough resources to run the STF components on top of the OCP environment.
WARNING

Red Hat OpenShift Container Platform version 4.3 or later is currently required for a successful installation of STF.
2.1. THE CORE COMPONENTS OF STF
The following STF core components are managed by Operators:
Prometheus and AlertManager
ElasticSearch
Smart Gateway
AMQ Interconnect
Each component has a corresponding Operator that you can use to load the various application components and objects.
Additional resources
For more information about Operators, see the Understanding Operators guide.
2.2. PREPARING YOUR OCP ENVIRONMENT FOR STF
As you prepare your OCP environment for STF, you must plan for persistent storage, adequate resources, and event storage:

Ensure that persistent storage is available in your Red Hat OpenShift Container Platform cluster to permit a production-grade deployment. For more information, see Section 2.2.1, "Persistent volumes".

Ensure that enough resources are available to run the Operators and the application containers. For more information, see Section 2.2.2, "Resource allocation".

To install ElasticSearch, you must use a community catalog source. If you do not want to use a community catalog, or if you do not want to store events, see Section 2.3, "Deploying STF to the OCP environment".
STF uses ElasticSearch to store events, which requires a larger than normal vm.max_map_count. The vm.max_map_count value is set by default in Red Hat OpenShift Container Platform. For more information about how to edit the value of vm.max_map_count, see Section 2.2.3, "Node tuning operator".
2.2.1. Persistent volumes
STF uses persistent storage in OCP to instantiate the volumes dynamically so that Prometheus and ElasticSearch can store metrics and events.
Additional resources
For more information about configuring persistent storage for OCP, see Understanding persistent storage.
2.2.1.1. Using ephemeral storage
WARNING

You can use ephemeral storage with STF. However, if you use ephemeral storage, you might experience data loss if a pod is restarted, updated, or rescheduled onto another node. Use ephemeral storage only for development or testing, and not production environments.
Procedure
To enable ephemeral storage for STF, set storageEphemeralEnabled: true in your ServiceTelemetry manifest.
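A minimal sketch of the resulting manifest, following the ServiceTelemetry format shown in Section 2.3.10, "Creating a ServiceTelemetry object in OCP":

apiVersion: infra.watch/v1alpha1
kind: ServiceTelemetry
metadata:
  name: stf-default
  namespace: service-telemetry
spec:
  metricsEnabled: true
  storageEphemeralEnabled: true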
Additional resources
For more information about enabling ephemeral storage for STF, see Section 4.6.1, "Configuring ephemeral storage".
2.2.2. Resource allocation
To enable the scheduling of pods within the OCP infrastructure, you need resources for the components that are running. If you do not allocate enough resources, pods remain in a Pending state because they cannot be scheduled.

The amount of resources that you require to run STF depends on your environment and the number of nodes and clouds that you want to monitor.
Additional resources
For recommendations about sizing for metrics collection, see https://access.redhat.com/articles/4907241.

For information about sizing requirements for ElasticSearch, see https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-managing-compute-resources.html.
2.2.3. Node tuning operator
STF uses ElasticSearch to store events, which requires a larger than normal vm.max_map_count. The vm.max_map_count value is set by default in Red Hat OpenShift Container Platform.

If you want to edit the value of vm.max_map_count, you cannot apply node tuning manually using the sysctl command, because Red Hat OpenShift Container Platform manages nodes directly. To configure values and apply them to the infrastructure, you must use the node tuning operator. For more information, see Using the Node Tuning Operator.

In an OCP deployment, the default node tuning operator specification provides the required profiles for ElasticSearch workloads or pods scheduled on nodes. To view the default cluster node tuning specification, run the following command:

$ oc get Tuned/default -o yaml -n openshift-cluster-node-tuning-operator

The output of the default specification is documented at Default profiles set on a cluster. The assignment of profiles is managed in the recommend section, where profiles are applied to a node when certain conditions are met. When scheduling ElasticSearch to a node in STF, one of the following profiles is applied:
openshift-control-plane-es
openshift-node-es
When scheduling an ElasticSearch pod, there must be a label present that matches tuned.openshift.io/elasticsearch. If the label is present, one of the two profiles is assigned to the pod. No action is required by the administrator if you use the recommended Operator for ElasticSearch. If you use a custom-deployed ElasticSearch with STF, ensure that you add the tuned.openshift.io/elasticsearch label to all scheduled pods.
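For a custom-deployed ElasticSearch, a minimal, hedged sketch of the required label on a pod (the pod name and image are placeholders, not values from this guide):

apiVersion: v1
kind: Pod
metadata:
  name: custom-elasticsearch                  # placeholder name
  labels:
    tuned.openshift.io/elasticsearch: ""      # label that the node tuning profiles match on
spec:
  containers:
  - name: elasticsearch
    image: docker.elastic.co/elasticsearch/elasticsearch:7.6.0   # placeholder image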
Additional resources
For more information about virtual memory usage by ElasticSearch, see https://www.elastic.co/guide/en/elasticsearch/reference/current/vm-max-map-count.html.
For more information about how the profiles are applied to nodes see Custom tuning specification
2.3. DEPLOYING STF TO THE OCP ENVIRONMENT
You can deploy STF to the OCP environment in one of two ways:
Deploy STF and store events with ElasticSearch. For more information, see Section 2.3.1, "Deploying STF to the OCP environment with ElasticSearch".

Deploy STF without ElasticSearch and disable events support. For more information, see Section 2.3.2, "Deploying STF to the OCP environment without ElasticSearch".
2.3.1. Deploying STF to the OCP environment with ElasticSearch
Complete the following tasks:
1. Section 2.3.3, "Creating a namespace"
2. Section 2.3.4, "Creating an OperatorGroup"
3. Section 2.3.5, "Enabling the OperatorHub.io Community Catalog Source"
4. Section 2.3.6, "Enabling Red Hat STF Operator Source"
5. Section 2.3.7, "Subscribing to the AMQ Certificate Manager Operator"
6. Section 2.3.8, "Subscribing to the Elastic Cloud on Kubernetes Operator"
7. Section 2.3.9, "Subscribing to the Service Telemetry Operator"
8. Section 2.3.10, "Creating a ServiceTelemetry object in OCP"
2.3.2. Deploying STF to the OCP environment without ElasticSearch
Complete the following tasks:
1. Section 2.3.3, "Creating a namespace"
2. Section 2.3.4, "Creating an OperatorGroup"
3. Section 2.3.6, "Enabling Red Hat STF Operator Source"
4. Section 2.3.7, "Subscribing to the AMQ Certificate Manager Operator"
5. Section 2.3.9, "Subscribing to the Service Telemetry Operator"
6. Section 2.3.10, "Creating a ServiceTelemetry object in OCP"
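Following this path, the ServiceTelemetry object that you create in Section 2.3.10 disables events. A minimal sketch using the manifest format shown in that section:

$ oc apply -f - <<EOF
apiVersion: infra.watch/v1alpha1
kind: ServiceTelemetry
metadata:
  name: stf-default
  namespace: service-telemetry
spec:
  eventsEnabled: false
  metricsEnabled: true
EOF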
2.3.3. Creating a namespace
Create a namespace to hold the STF components. The service-telemetry namespace is used throughout the documentation.
Procedure
Enter the following command:

$ oc new-project service-telemetry

2.3.4. Creating an OperatorGroup

Create an OperatorGroup in the namespace so that you can schedule the Operator pods.

Procedure

Enter the following command:

$ oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: service-telemetry-operator-group
  namespace: service-telemetry
spec:
  targetNamespaces:
  - service-telemetry
EOF
Additional resources
For more information, see OperatorGroups.
2.3.5. Enabling the OperatorHub.io Community Catalog Source

Before you install ElasticSearch, you must have access to the resources on the OperatorHub.io Community Catalog Source.

Procedure

Enter the following command:

$ oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: operatorhubio-operators
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/operator-framework/upstream-community-operators:latest
  displayName: OperatorHub.io Operators
  publisher: OperatorHub.io
EOF

2.3.6. Enabling Red Hat STF Operator Source

Before you deploy STF on Red Hat OpenShift Container Platform, you must enable the operator source.

Procedure

1. Install an OperatorSource that contains the Service Telemetry Operator and the Smart Gateway Operator:

$ oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1
kind: OperatorSource
metadata:
  labels:
    opsrc-provider: redhat-operators-stf
  name: redhat-operators-stf
  namespace: openshift-marketplace
spec:
  authorizationToken: {}
  displayName: Red Hat STF Operators
  endpoint: https://quay.io/cnr
  publisher: Red Hat
  registryNamespace: redhat-operators-stf
  type: appregistry
EOF

2. To validate the creation of your OperatorSource, use the oc get operatorsources command. A successful import results in the MESSAGE field returning a result of The object has been successfully reconciled:

$ oc get -nopenshift-marketplace operatorsource redhat-operators-stf

NAME                   TYPE          ENDPOINT              REGISTRY               DISPLAYNAME             PUBLISHER   STATUS      MESSAGE
redhat-operators-stf   appregistry   https://quay.io/cnr   redhat-operators-stf   Red Hat STF Operators   Red Hat     Succeeded   The object has been successfully reconciled

3. To validate that the Operators are available from the catalog, use the oc get packagemanifest command:

$ oc get packagemanifests | grep "Red Hat STF"

smartgateway-operator       Red Hat STF Operators   2m50s
servicetelemetry-operator   Red Hat STF Operators   2m50s
2.3.7. Subscribing to the AMQ Certificate Manager Operator

You must subscribe to the AMQ Certificate Manager Operator before you deploy the other STF components, because the AMQ Certificate Manager Operator runs globally-scoped and is not compatible with the dependency management of Operator Lifecycle Manager when used with other namespace-scoped operators.

Procedure

1. Subscribe to the AMQ Certificate Manager Operator, create the subscription, and validate the AMQ7 Certificate Manager:

$ oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: amq7-cert-manager
  namespace: openshift-operators
spec:
  channel: alpha
  installPlanApproval: Automatic
  name: amq7-cert-manager
  source: redhat-operators
  sourceNamespace: openshift-marketplace
EOF

NOTE

The AMQ Certificate Manager is installed globally for all namespaces, so the namespace value provided is openshift-operators. You might not see your amq7-cert-manager.v1.0.0 ClusterServiceVersion in the service-telemetry namespace for a few minutes until the processing executes against the namespace.

2. To validate your ClusterServiceVersion, use the oc get csv command. Ensure that amq7-cert-manager.v1.0.0 has a phase Succeeded:

$ oc get --namespace openshift-operators csv

NAME                       DISPLAY                                          VERSION   REPLACES   PHASE
amq7-cert-manager.v1.0.0   Red Hat Integration - AMQ Certificate Manager    1.0.0                Succeeded
2.3.8. Subscribing to the Elastic Cloud on Kubernetes Operator

Before you install the Service Telemetry Operator, and if you plan to store events in ElasticSearch, you must enable the Elastic Cloud on Kubernetes Operator.

Procedure

1. Apply the following manifest to your OCP environment to enable the Elastic Cloud on Kubernetes Operator:

$ oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: elastic-cloud-eck
  namespace: service-telemetry
spec:
  channel: stable
  installPlanApproval: Automatic
  name: elastic-cloud-eck
  source: operatorhubio-operators
  sourceNamespace: openshift-marketplace
EOF

2. To verify that the ClusterServiceVersion for ElasticSearch Cloud on Kubernetes succeeded, enter the oc get csv command:

$ oc get csv

NAME                       DISPLAY                        VERSION   REPLACES                   PHASE
elastic-cloud-eck.v1.1.0   Elastic Cloud on Kubernetes    1.1.0     elastic-cloud-eck.v1.0.1   Succeeded

2.3.9. Subscribing to the Service Telemetry Operator

To instantiate an STF instance, create the ServiceTelemetry object to allow the Service Telemetry Operator to create the environment.

Procedure

1. To create the Service Telemetry Operator subscription, enter the oc apply -f command:

$ oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: servicetelemetry-operator
  namespace: service-telemetry
spec:
  channel: stable
  installPlanApproval: Automatic
  name: servicetelemetry-operator
  source: redhat-operators-stf
  sourceNamespace: openshift-marketplace
EOF

2. To validate the Service Telemetry Operator and the dependent operators, enter the following command:

$ oc get csv --namespace service-telemetry

NAME                                DISPLAY                                          VERSION   REPLACES                            PHASE
amq7-cert-manager.v1.0.0            Red Hat Integration - AMQ Certificate Manager    1.0.0                                         Succeeded
amq7-interconnect-operator.v1.2.0   Red Hat Integration - AMQ Interconnect           1.2.0                                         Succeeded
elastic-cloud-eck.v1.1.0            Elastic Cloud on Kubernetes                      1.1.0     elastic-cloud-eck.v1.0.1            Succeeded
prometheusoperator.0.37.0           Prometheus Operator                              0.37.0    prometheusoperator.0.32.0           Succeeded
service-telemetry-operator.v1.0.2   Service Telemetry Operator                       1.0.2     service-telemetry-operator.v1.0.1   Succeeded
smart-gateway-operator.v1.0.1       Smart Gateway Operator                           1.0.1     smart-gateway-operator.v1.0.0       Succeeded

2.3.10. Creating a ServiceTelemetry object in OCP

To deploy the Service Telemetry Framework, you must create an instance of ServiceTelemetry in OCP. By default, eventsEnabled is set to false. If you do not want to store events in ElasticSearch, ensure that eventsEnabled is set to false. For more information, see Section 2.3.2, "Deploying STF to the OCP environment without ElasticSearch".

The following core parameters are available for a ServiceTelemetry manifest:

Table 2.1. Core parameters for a ServiceTelemetry manifest

Parameter | Description | Default Value
eventsEnabled | Enable events support in STF. Requires prerequisite steps to ensure ElasticSearch can be started. For more information, see Section 2.3.8, "Subscribing to the Elastic Cloud on Kubernetes Operator". | false
metricsEnabled | Enable metrics support in STF. | true
highAvailabilityEnabled | Enable high availability in STF. For more information, see Section 4.3, "High availability". | false
storageEphemeralEnabled | Enable ephemeral storage support in STF. For more information, see Section 4.6, "Ephemeral storage". | false

Procedure

1. To store events in ElasticSearch, set eventsEnabled to true during deployment:

$ oc apply -f - <<EOF
apiVersion: infra.watch/v1alpha1
kind: ServiceTelemetry
metadata:
  name: stf-default
  namespace: service-telemetry
spec:
  eventsEnabled: true
  metricsEnabled: true
EOF

2. To view the STF deployment logs in the Service Telemetry Operator, use the oc logs command:

$ oc logs $(oc get pod --selector=name=service-telemetry-operator -oname) -c ansible

PLAY RECAP ***
localhost : ok=37 changed=0 unreachable=0 failed=0 skipped=1 rescued=0 ignored=0

3. View the pods and the status of each pod to determine that all workloads are operating nominally:

NOTE

If you set eventsEnabled: true, the notification Smart Gateways will Error and CrashLoopBackOff for a period of time before ElasticSearch starts.

$ oc get pods

NAME                                                              READY   STATUS    RESTARTS   AGE
alertmanager-stf-default-0                                        2/2     Running   0          26m
elastic-operator-645dc8b8ff-jwnzt                                 1/1     Running   0          88m
elasticsearch-es-default-0                                        1/1     Running   0          26m
interconnect-operator-6fd49d9fb9-4bl92                            1/1     Running   0          46m
prometheus-operator-bf7d97fb9-kwnlx                               1/1     Running   0          46m
prometheus-stf-default-0                                          3/3     Running   0          26m
service-telemetry-operator-54f4c99d9b-k7ll6                       2/2     Running   0          46m
smart-gateway-operator-7ff58bcf94-66rvx                           2/2     Running   0          46m
stf-default-ceilometer-notification-smartgateway-6675df547q4lbj   1/1     Running   0          26m
stf-default-collectd-notification-smartgateway-698c87fbb7-xj528   1/1     Running   0          26m
stf-default-collectd-telemetry-smartgateway-79c967c8f7-9hsqn      1/1     Running   0          26m
stf-default-interconnect-7458fd4d69-nqbfs                         1/1     Running   0          26m
2.4. REMOVING STF FROM THE OCP ENVIRONMENT

Remove STF from an OCP environment if you no longer require the STF functionality.

Complete the following tasks:

1. Section 2.4.1, "Deleting the namespace"
2. Section 2.4.2, "Removing the OperatorSource"

2.4.1. Deleting the namespace

To remove the operational resources for STF from OCP, delete the namespace.

Procedure

1. Run the oc delete command:

$ oc delete project service-telemetry

2. Verify that the resources have been deleted from the namespace:

$ oc get all
No resources found.

2.4.2. Removing the OperatorSource

If you do not expect to install Service Telemetry Framework again, delete the OperatorSource. When you remove the OperatorSource, PackageManifests related to STF are removed from the Operator Lifecycle Manager catalog.

Procedure

1. Delete the OperatorSource:

$ oc delete --namespace=openshift-marketplace operatorsource redhat-operators-stf
operatorsource.operators.coreos.com "redhat-operators-stf" deleted

2. Verify that the STF PackageManifests are removed from the platform. If successful, the following command returns no result:

$ oc get packagemanifests | grep "Red Hat STF"

3. If you enabled the OperatorHub.io Community Catalog Source during the installation process, and you no longer need this catalog source, delete it:

$ oc delete --namespace=openshift-marketplace catalogsource operatorhubio-operators
catalogsource.operators.coreos.com "operatorhubio-operators" deleted

Additional resources

For more information about the OperatorHub.io Community Catalog Source, see Section 2.3, "Deploying STF to the OCP environment".
CHAPTER 3. COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION
3.1. CONNECTING RED HAT OPENSTACK PLATFORM TO SERVICE TELEMETRY FRAMEWORK

To collect metrics, events, or both, and to send them to the Service Telemetry Framework (STF) storage domain, you must configure the Red Hat OpenStack Platform overcloud to enable data collection and transport.

To deploy data collection and transport to STF on Red Hat OpenStack Platform cloud nodes that employ routed L3 domains, such as distributed compute node (DCN) or spine-leaf, see Section 3.2, "Deploying to non-standard network topologies".

3.2. DEPLOYING TO NON-STANDARD NETWORK TOPOLOGIES
If your nodes are on a separate network from the default InternalApi network, you must make configuration adjustments so that AMQ Interconnect can transport data to the Service Telemetry Framework (STF) server instance. This scenario is typical in a spine-leaf or a DCN topology. For more information about DCN configuration, see the Spine Leaf Networking guide.

If you use STF with Red Hat OpenStack Platform 16.0 and plan to monitor your Ceph, Block, or Object storage nodes, you must make configuration changes that are similar to the configuration changes that you make to the spine-leaf and DCN network configuration. To monitor Ceph nodes, use the CephStorageExtraConfig parameter to define which network interface to load into the AMQ Interconnect and collectd configuration files:

CephStorageExtraConfig:
  tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('storage')}"
  tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('storage')}"
  tripleo::profile::base::ceilometer::agent::notification::notifier_host_addr: "%{hiera('storage')}"

Similarly, you must specify BlockStorageExtraConfig and ObjectStorageExtraConfig parameters if your environment uses Block and Object storage roles.

The deployment of a spine-leaf topology involves creating roles and networks, then assigning those networks to the available roles. When you configure data collection and transport for STF for a Red Hat OpenStack Platform deployment, the default network for roles is InternalApi. For Ceph, Block, and Object storage roles, the default network is Storage. Because a spine-leaf configuration can result in different networks being assigned to different Leaf groupings, and those names are typically unique, additional configuration is required in the parameter_defaults section of the Red Hat OpenStack Platform environment files.

Procedure

1. Document which networks are available for each of the Leaf roles. For examples of network name definitions, see Creating a network data file in the Spine Leaf Networking guide. For more information about the creation of the Leaf groupings (roles) and assignment of the networks to those groupings, see Creating a roles data file in the Spine Leaf Networking guide.

2. Add the following configuration example to the ExtraConfig section for each of the leaf roles. In this example, internal_api_subnet is the value defined in the name_lower parameter of your network definition (with _subnet appended to the name for Leaf 0), and is the network to which the ComputeLeaf0 leaf role is connected. In this case, the network identification of 0 corresponds to the Compute role for leaf 0, and represents a value that is different from the default internal API network name.

For the ComputeLeaf0 leaf role, specify extra configuration to perform a hiera lookup to determine which network interface for a particular network to assign to the collectd AMQP host parameter. Perform the same configuration for the AMQ Interconnect listener address parameter:

ComputeLeaf0ExtraConfig:
  tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('internal_api_subnet')}"
  tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('internal_api_subnet')}"

Additional leaf roles typically replace _subnet with _leafN, where N represents a unique identifier for the leaf:

ComputeLeaf1ExtraConfig:
  tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('internal_api_leaf1')}"
  tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('internal_api_leaf1')}"

This example configuration is on a CephStorage leaf role:

CephStorageLeaf0ExtraConfig:
  tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('storage_subnet')}"
  tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('storage_subnet')}"
3.3. CONFIGURING RED HAT OPENSTACK PLATFORM OVERCLOUD FOR SERVICE TELEMETRY FRAMEWORK

To configure the Red Hat OpenStack Platform overcloud, you must configure the data collection applications and the data transport to STF, and deploy the overcloud.

To configure the Red Hat OpenStack Platform overcloud, complete the following tasks:

1. Section 3.3.1, "Retrieving the AMQ Interconnect route address"
2. Section 3.3.2, "Configuring the STF connection for the overcloud"
3. Section 3.3.3, "Validating client-side installation"

3.3.1. Retrieving the AMQ Interconnect route address

When you configure the Red Hat OpenStack Platform overcloud for STF, you must provide the AMQ Interconnect route address in the STF connection file.

Procedure

1. Log in to your Red Hat OpenShift Container Platform (OCP) environment.
2. In the service-telemetry project, retrieve the AMQ Interconnect route address:

$ oc get routes -ogo-template='{{ range .items }}{{printf "%s\n" .spec.host }}{{ end }}' | grep "\-5671"
stf-default-interconnect-5671-service-telemetry.apps.infra.watch
NOTE

If your STF installation differs from the documentation, ensure that you retrieve the correct AMQ Interconnect route address.

3.3.2. Configuring the STF connection for the overcloud

To configure the STF connection, you must create a file that contains the connection configuration of the AMQ Interconnect for the overcloud to the STF deployment. Enable the collection of events and storage of the events in STF, and deploy the overcloud.

Procedure

1. Log in to the Red Hat OpenStack Platform undercloud as the stack user.
2. Create a configuration file called stf-connectors.yaml in the /home/stack directory.

IMPORTANT

The Service Telemetry Operator simplifies the deployment of all data ingestion and data storage components for single cloud deployments. To share the data storage domain with multiple clouds, see Section 4.5, "Configuring multiple clouds".

3. In the stf-connectors.yaml file, configure the MetricsQdrConnectors address to connect the AMQ Interconnect on the overcloud to the STF deployment:

Add the CeilometerQdrPublishEvents: true parameter to enable collection and transport of Ceilometer events to STF.

Replace the host parameter with the value of HOST/PORT that you retrieved in Section 3.3.1, "Retrieving the AMQ Interconnect route address":

parameter_defaults:
  EventPipelinePublishers: []
  CeilometerQdrPublishEvents: true
  MetricsQdrConnectors:
  - host: stf-default-interconnect-5671-service-telemetry.apps.infra.watch
    port: 443
    role: edge
    sslProfile: sslProfile
    verifyHostname: false

4. Add the following files to your Red Hat OpenStack Platform director deployment to set up collectd and AMQ Interconnect:

the stf-connectors.yaml environment file

the enable-stf.yaml file that ensures that the environment is being used during the overcloud deployment

the ceilometer-write-qdr.yaml file that ensures that Ceilometer telemetry is sent to STF

5. Deploy the Red Hat OpenStack Platform overcloud:

openstack overcloud deploy <other arguments> \
  --templates /usr/share/openstack-tripleo-heat-templates \
  --environment-file <other-environment-files> \
  --environment-file /usr/share/openstack-tripleo-heat-templates/environments/metrics/ceilometer-write-qdr.yaml \
  --environment-file /usr/share/openstack-tripleo-heat-templates/environments/enable-stf.yaml \
  --environment-file /home/stack/stf-connectors.yaml
333 Validating client-side installation
To validate data collection from the STF storage domain query the data sources for delivereddata To validate individual nodes in the Red Hat OpenStack Platform deployment connect to theconsole using SSH
Procedure
1 Log in to an overcloud node for example controller-0
2 Ensure that metrics_qdr container is running on the node
3 Return the internal network address on which AMQ Interconnect is running for example 17217144 listening on port 5666
4 Return a list of connections to the local AMQ Interconnect
There are four connections:
Outbound connection to STF
Inbound connection from collectd
Inbound connection from ceilometer
Inbound connection from our qdstat client

The outbound STF connection is provided to the MetricsQdrConnectors host parameter and is the route for the STF storage domain. The other hosts are internal network addresses of the client connections to this AMQ Interconnect.
5. To ensure that messages are being delivered, list the links, and view the _edge address in the deliv column for delivery of messages:

$ sudo podman exec -it metrics_qdr qdstat --bus=172.17.1.44:5666 --links

Router Links
  type      dir  conn id  id    peer  class   addr                  phs  cap  pri  undel  unsett  deliv    presett  psdrop  acc  rej  rel  mod  delay  rate
  ===========================================================================================================================================================
  endpoint  out  1        5           local   _edge                      250  0    0      0       2979926  2979924  0       0    0    2    0    0      0
  endpoint  in   1        6                                              250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  in   1        7                                              250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  out  1        8                                              250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  in   1        9                                              250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  out  1        10                                             250  0    0      0       911      911      0       0    0    0    0    911    0
  endpoint  in   1        11                                             250  0    0      0       0        911      0       0    0    0    0    0      0
  endpoint  out  12       32          local   temp.lSY6Mcicol4J2Kp       250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  in   16       41                                             250  0    0      0       2979924  2979924  0       0    0    0    0    0      0
  endpoint  in   912      1834  mobile        $management           0    250  0    0      0       1        0        0       1    0    0    0    0      0
  endpoint  out  912      1835        local   temp.9Ok2resI9tmt+CT       250  0    0      0       0        0        0       0    0    0    0    0      0
6. To list the addresses from Red Hat OpenStack Platform nodes to STF, connect to OCP to get the AMQ Interconnect pod name and list the connections. List the available AMQ Interconnect pods:

$ oc get pods -l application=stf-default-interconnect

NAME                                        READY   STATUS    RESTARTS   AGE
stf-default-interconnect-7458fd4d69-bgzfb   1/1     Running   0          6d21h
7. Connect to the pod and run the qdstat --connections command to list the known connections:

$ oc exec -it stf-default-interconnect-7458fd4d69-bgzfb -- qdstat --connections

2020-04-21 18:25:47.243852 UTC
stf-default-interconnect-7458fd4d69-bgzfb

Connections
  id  host               container                                                              role    dir  security                                authentication  tenant  last dlv      uptime
  =================================================================================================================================================================================================
  1   10.129.2.21:43062  rcv[stf-default-collectd-telemetry-smartgateway-79c967c8f7-kq4qv]      normal  in   no-security                             anonymous-user          000:00:00:00  006:21:50:25
  2   10.130.0.52:55754  rcv[stf-default-ceilometer-notification-smartgateway-6675df547mbjk5]   normal  in   no-security                             anonymous-user          000:21:25:57  006:21:49:36
  3   10.130.0.51:43110  rcv[stf-default-collectd-notification-smartgateway-698c87fbb7-f28v6]   normal  in   no-security                             anonymous-user          000:21:36:53  006:21:49:09
  22  10.128.0.1:51948   Router.ceph-0.redhat.local                                             edge    in   TLSv1/SSLv3(DHE-RSA-AES256-GCM-SHA384)  anonymous-user          000:00:00:03  000:22:08:43
  23  10.128.0.1:51950   Router.compute-0.redhat.local                                          edge    in   TLSv1/SSLv3(DHE-RSA-AES256-GCM-SHA384)  anonymous-user          000:00:00:03  000:22:08:43
  24  10.128.0.1:52082   Router.controller-0.redhat.local                                       edge    in   TLSv1/SSLv3(DHE-RSA-AES256-GCM-SHA384)  anonymous-user          000:00:00:00  000:22:08:34
  27  127.0.0.1:42202    c2f541c1-4c97-4b37-a189-a396c08fb079                                   normal  in   no-security                             no-auth                 000:00:00:00  000:00:00:00
In this example, there are three edge connections from the Red Hat OpenStack Platform nodes, with connection id 22, 23, and 24.
8. To view the number of messages delivered by the network, use each address with the oc exec command:

$ oc exec -it stf-default-interconnect-7458fd4d69-bgzfb -- qdstat --address

2020-04-21 18:20:10.293258 UTC
stf-default-interconnect-7458fd4d69-bgzfb

Router Addresses
  class   addr                             phs  distrib    pri  local  remote  in       out      thru  fallback
  ================================================================================================================
  mobile  anycast/ceilometer/event.sample  0    balanced   -    1      0       1553     1553     0     0
  mobile  collectd/notify                  0    multicast  -    1      0       10       10       0     0
  mobile  collectd/telemetry               0    multicast  -    1      0       7798049  7798049  0     0
CHAPTER 4. ADVANCED FEATURES

The following optional features can provide additional functionality to the Service Telemetry Framework (STF):
Customizing the deployment. For more information, see Section 4.1, "Customizing the deployment".

Alerts. For more information, see Section 4.2, "Alerts".

High availability. For more information, see Section 4.3, "High availability".

Dashboards. For more information, see Section 4.4, "Dashboards".

Multiple clouds. For more information, see Section 4.5, "Configuring multiple clouds".

Ephemeral storage. For more information, see Section 4.6, "Ephemeral storage".
4.1. CUSTOMIZING THE DEPLOYMENT
The Service Telemetry Operator watches for a ServiceTelemetry manifest to load into Red Hat OpenShift Container Platform (OCP). The Operator then creates other objects in memory, which results in the dependent Operators creating the workloads they are responsible for managing.
WARNING
When you override the manifest, you must provide the entire manifest contents, including object names or namespaces. There is no dynamic parameter substitution when you override a manifest.
To override a manifest successfully with Service Telemetry Framework (STF), deploy a default environment using the core options only. For more information about the core options, see Section 2.3.10, "Creating a ServiceTelemetry object in OCP". When you deploy STF, use the oc get command to retrieve the default deployed manifest. When you use a manifest that was originally generated by Service Telemetry Operator, the manifest is compatible with the other objects that are managed by the Operators.
For example, when the metricsEnabled: true parameter is configured in the ServiceTelemetry manifest, the Service Telemetry Operator requests components for metrics retrieval and storage using the default manifests. In some cases, you might want to override the default manifest. For more information, see Section 4.1.1, "Manifest override parameters".
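For example, to retrieve the manifest of the default deployment as a starting point for an override, assuming the stf-default object name used elsewhere in this guide:

$ oc get servicetelemetry stf-default -oyaml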
4.1.1. Manifest override parameters
This table describes the available parameters that you can use to override a manifest, along with the corresponding retrieval commands.
Table 4.1. Manifest override parameters
Override parameter | Description | Retrieval command
alertmanagerManifest | Override the Alertmanager object creation. The Prometheus Operator watches for Alertmanager objects. | oc get alertmanager stf-default -oyaml
alertmanagerConfigManifest | Override the Secret that contains the Alertmanager configuration. The Prometheus Operator uses a secret named alertmanager-{alertmanager-name}, for example, stf-default, to provide the alertmanager.yaml configuration to Alertmanager. | oc get secret alertmanager-stf-default -oyaml
elasticsearchManifest | Override the ElasticSearch object creation. The Elastic Cloud on Kubernetes Operator watches for ElasticSearch objects. | oc get elasticsearch elasticsearch -oyaml
interconnectManifest | Override the Interconnect object creation. The AMQ Interconnect Operator watches for Interconnect objects. | oc get interconnect stf-default-interconnect -oyaml
prometheusManifest | Override the Prometheus object creation. The Prometheus Operator watches for Prometheus objects. | oc get prometheus stf-default -oyaml
servicemonitorManifest | Override the ServiceMonitor object creation. The Prometheus Operator watches for ServiceMonitor objects. | oc get servicemonitor stf-default -oyaml
smartgatewayCollectdMetricsManifest | Override the SmartGateway object creation for collectd metrics. The Smart Gateway Operator watches for SmartGateway objects. | oc get smartgateway stf-default-collectd-telemetry -oyaml
smartgatewayCollectdEventsManifest | Override the SmartGateway object creation for collectd events. The Smart Gateway Operator watches for SmartGateway objects. | oc get smartgateway stf-default-collectd-notification -oyaml
smartgatewayCeilometerEventsManifest | Override the SmartGateway object creation for Ceilometer events. The Smart Gateway Operator watches for SmartGateway objects. | oc get smartgateway stf-default-ceilometer-notification -oyaml
4.1.2. Overriding a managed manifest
Edit the ServiceTelemetry object and provide a parameter and manifest. For a list of available manifest override parameters, see Section 4.1.1, "Manifest override parameters". The default ServiceTelemetry object is stf-default. Use oc get servicetelemetry to list the available STF deployments.
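For example, as a sketch (the AGE value shown is illustrative):

$ oc get servicetelemetry

NAME          AGE
stf-default   14d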
TIP
The oc edit command loads the default system editor. To override the default editor, pass or set the environment variable EDITOR to the preferred editor, for example, EDITOR=nano oc edit servicetelemetry stf-default.
Procedure
1. Log in to Red Hat OpenShift Container Platform.
2. Change to the service-telemetry namespace:

$ oc project service-telemetry
3. Load the ServiceTelemetry object into an editor:

$ oc edit servicetelemetry stf-default
4. To modify the ServiceTelemetry object, provide a manifest override parameter and the contents of the manifest to write to OCP instead of the defaults provided by STF.
NOTE
The trailing pipe (|) after entering the manifest override parameter indicates that the value provided is multi-line.
apiVersion: infra.watch/v1alpha1
kind: ServiceTelemetry
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"infra.watch/v1alpha1","kind":"ServiceTelemetry","metadata":{"annotations":{},"name":"stf-default","namespace":"service-telemetry"},"spec":{"metricsEnabled":true}}
  creationTimestamp: "2020-04-14T20:29:42Z"
  generation: 1
  name: stf-default
  namespace: service-telemetry
  resourceVersion: "1949423"
  selfLink: /apis/infra.watch/v1alpha1/namespaces/service-telemetry/servicetelemetrys/stf-default
  uid: d058bc41-1bb0-49f5-9a8b-642f4b8adb95
spec:
  metricsEnabled: true
  smartgatewayCollectdMetricsManifest: | 1
    apiVersion: smartgateway.infra.watch/v2alpha1
    kind: SmartGateway
    metadata:
      name: stf-default-collectd-telemetry
      namespace: service-telemetry
    spec:
      amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/telemetry
      debug: true
      prefetch: 15000
      serviceType: metrics
      size: 1
      useTimestamp: true 2
status:
  conditions:
  - ansibleResult:
      changed: "0"
      completion: 2020-04-14T20:32:19.079508
      failures: "0"
      ok: "52"
      skipped: "1"
    lastTransitionTime: "2020-04-14T20:29:59Z"
    message: Awaiting next reconciliation
    reason: Successful
    status: "True"
    type: Running

1 Manifest override parameter is defined in the spec of the ServiceTelemetry object.

2 End of the manifest override content.

5. Save and close.
4.2. ALERTS
You create alert rules in Prometheus and alert routes in Alertmanager. Alert rules in Prometheus servers send alerts to an Alertmanager, which manages the alerts. Alertmanager can silence, inhibit, or aggregate alerts, and send notifications using email, on-call notification systems, or chat platforms.
To create an alert, complete the following tasks:
1. Create an alert rule in Prometheus. For more information, see Section 4.2.1, "Creating an alert rule in Prometheus".
2. Create an alert route in Alertmanager. For more information, see Section 4.2.3, "Creating an alert route in Alertmanager".
Additional resources

For more information about alerts or notifications with Prometheus and Alertmanager, see https://prometheus.io/docs/alerting/overview/
To view an example set of alerts that you can use with Service Telemetry Framework (STF), see https://github.com/infrawatch/service-telemetry-operator/tree/master/deploy/alerts
4.2.1. Creating an alert rule in Prometheus
Prometheus evaluates alert rules to trigger notifications. If the rule condition returns an empty result set, the condition is false. Otherwise, the rule is true and it triggers an alert.
Procedure
1. Log in to Red Hat OpenShift Container Platform.
2. Change to the service-telemetry namespace:

$ oc project service-telemetry
3. Create a PrometheusRule object that contains the alert rule. The Prometheus Operator loads the rule into Prometheus:

$ oc apply -f - <<EOF
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  creationTimestamp: null
  labels:
    prometheus: stf-default
    role: alert-rules
  name: prometheus-alarm-rules
  namespace: service-telemetry
spec:
  groups:
    - name: ./openstack.rules
      rules:
      - alert: Metric Listener down
        expr: collectd_qpid_router_status < 1
EOF

To change the rule, edit the value of the expr parameter.
4. To verify that the rules have been loaded into Prometheus by the Operator, create a pod with access to curl:

$ oc run curl --generator=run-pod/v1 --image=radial/busyboxplus:curl -i --tty
5. Run curl to access the prometheus-operated service to return the rules loaded into memory:

[ root@curl:/ ]$ curl prometheus-operated:9090/api/v1/rules

{"status":"success","data":{"groups":[{"name":"./openstack.rules","file":"/etc/prometheus/rules/prometheus-stf-default-rulefiles-0/service-telemetry-prometheus-alarm-rules.yaml","rules":[{"name":"Metric Listener down","query":"collectd_qpid_router_status \u003c 1","duration":0,"labels":{},"annotations":{},"alerts":[],"health":"ok","type":"alerting"}],"interval":30}]}}
6. To verify that the output shows the rules loaded into the PrometheusRule object, for example that the output contains the defined ./openstack.rules, exit from the pod:

[ root@curl:/ ]$ exit
7. Clean up the environment by deleting the curl pod:

$ oc delete pod curl

pod "curl" deleted
Additional resources

For more information on alerting, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md
4.2.2. Configuring custom alerts
You can add custom alerts to the PrometheusRule object that you created in Section 4.2.1, "Creating an alert rule in Prometheus".
Procedure
1. Use the oc edit command:

$ oc edit prometheusrules prometheus-alarm-rules
2. Edit the PrometheusRules manifest.
3. Save and close.
Additional resources

For more information about configuring alerting rules, see https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/
For more information about PrometheusRules objects, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md
4.2.3. Creating an alert route in Alertmanager
Use Alertmanager to deliver alerts to an external system, such as email, IRC, or other notification channel. The Prometheus Operator manages the Alertmanager configuration as a Red Hat OpenShift Container Platform (OCP) secret. STF by default deploys a basic configuration that results in no receivers:

alertmanager.yaml: |-
  global:
    resolve_timeout: 5m
  route:
    group_by: ['job']
    group_wait: 30s
    group_interval: 5m
    repeat_interval: 12h
    receiver: 'null'
  receivers:
  - name: 'null'
To deploy a custom Alertmanager route with STF, you must pass an alertmanagerConfigManifest parameter to the Service Telemetry Operator, which results in an updated secret managed by the Prometheus Operator. For more information, see Section 4.1.2, "Overriding a managed manifest".
Procedure
1. Log in to Red Hat OpenShift Container Platform.
2. Change to the service-telemetry namespace:

$ oc project service-telemetry
3. Edit the ServiceTelemetry object for your STF deployment:

$ oc edit servicetelemetry stf-default
4. Add a new parameter, alertmanagerConfigManifest, and the Secret object contents to define the alertmanager.yaml configuration for Alertmanager:

apiVersion: infra.watch/v1alpha1
kind: ServiceTelemetry
metadata:
  name: stf-default
  namespace: service-telemetry
spec:
  metricsEnabled: true
  alertmanagerConfigManifest: |
    apiVersion: v1
    kind: Secret
    metadata:
      name: alertmanager-stf-default
      namespace: service-telemetry
    type: Opaque
    stringData:
      alertmanager.yaml: |-
        global:
          resolve_timeout: 10m
        route:
          group_by: ['job']
          group_wait: 30s
          group_interval: 5m
          repeat_interval: 12h
          receiver: 'null'
        receivers:
        - name: 'null'
NOTE
This loads the default template that is already managed by Service Telemetry Operator. To validate that the changes are populating correctly, change a value, return the alertmanager-stf-default secret, and verify that the new value is loaded into memory, for example, changing the value global.resolve_timeout from 5m to 10m.
5. Verify that the configuration was applied to the secret:

$ oc get secret alertmanager-stf-default -o go-template='{{index .data "alertmanager.yaml" | base64decode }}'

global:
  resolve_timeout: 10m
route:
  group_by: ['job']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h
  receiver: 'null'
receivers:
- name: 'null'
6. To verify that the configuration has been loaded into Alertmanager, create a pod with access to curl:

$ oc run curl --generator=run-pod/v1 --image=radial/busyboxplus:curl -i --tty
7. Run curl against the alertmanager-operated service to retrieve the status and configYAML contents, and review that the supplied configuration matches the configuration loaded into Alertmanager:

[ root@curl:/ ]$ curl alertmanager-operated:9093/api/v1/status

{"status":"success","data":{"configYAML":"global:\n  resolve_timeout: 10m\n  http_config: {}\n  smtp_hello: localhost\n  smtp_require_tls: true\n  pagerduty_url: https://events.pagerduty.com/v2/enqueue\n  hipchat_api_url: https://api.hipchat.com/\n  opsgenie_api_url: https://api.opsgenie.com/\n  wechat_api_url: https://qyapi.weixin.qq.com/cgi-bin/\n  victorops_api_url: https://alert.victorops.com/integrations/generic/20131114/alert/\nroute:\n  receiver: \"null\"\n  group_by:\n  - job\n  group_wait: 30s\n  group_interval: 5m\n  repeat_interval: 12h\nreceivers:\n- name: \"null\"\ntemplates: []\n",...}}
8. Verify that the configYAML field contains the expected changes. Exit from the pod:

[ root@curl:/ ]$ exit
9. To clean up the environment, delete the curl pod:

$ oc delete pod curl

pod "curl" deleted
Additional resources

For more information about the Red Hat OpenShift Container Platform secret and the Prometheus operator, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md
4.3. HIGH AVAILABILITY
High availability is the ability of Service Telemetry Framework (STF) to rapidly recover from failures in its component services. Although Red Hat OpenShift Container Platform (OCP) restarts a failed pod if nodes are available to schedule the workload, this recovery process might take more than one minute, during which time events and metrics are lost. A high availability configuration includes multiple copies of STF components, reducing recovery time to approximately 2 seconds. To protect against failure of an OCP node, deploy STF to an OCP cluster with three or more nodes.
NOTE
STF is not yet a fully fault tolerant system. Delivery of metrics and events during the recovery period is not guaranteed.
Enabling high availability has the following effects:

Two AMQ Interconnect pods run instead of the default 1.

Three ElasticSearch pods run instead of the default 1.

Recovery time from a lost pod in either of these services reduces to approximately 2 seconds.
4.3.1. Configuring high availability
To configure STF for high availability, add highAvailabilityEnabled: true to the ServiceTelemetry object in OCP. You can set this parameter at installation time, or, if you already deployed STF, complete the following steps.
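If you are deploying for the first time, you can include the parameter directly in the ServiceTelemetry object that you create. The following is a minimal sketch that assumes the default stf-default name and the core options used elsewhere in this guide:

oc apply -f - <<EOF
apiVersion: infra.watch/v1alpha1
kind: ServiceTelemetry
metadata:
  name: stf-default
  namespace: service-telemetry
spec:
  eventsEnabled: true
  metricsEnabled: true
  highAvailabilityEnabled: true
EOF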
Procedure
1. Log in to Red Hat OpenShift Container Platform.
2. Change to the service-telemetry namespace:

$ oc project service-telemetry
3. Use the oc command to edit the ServiceTelemetry object:

$ oc edit ServiceTelemetry
4. Add highAvailabilityEnabled: true to the spec section:

spec:
  eventsEnabled: true
  metricsEnabled: true
  highAvailabilityEnabled: true
5. Save your changes and close the object.
4.4. DASHBOARDS
Use the third-party application Grafana to visualize system-level metrics gathered by collectd for each individual host node. For more information about configuring collectd, see Section 3.3, "Configuring Red Hat OpenStack Platform overcloud for Service Telemetry Framework".
4.4.1. Setting up Grafana to host the dashboard
Grafana is not included in the default Service Telemetry Framework (STF) deployment, so you must deploy the Grafana Operator from OperatorHub.io.
Procedure
1. Log in to Red Hat OpenShift Container Platform.
2. Change to the service-telemetry namespace:

$ oc project service-telemetry
3. Clone the dashboard repository:

git clone https://github.com/infrawatch/dashboards
cd dashboards
4. Deploy the Grafana operator:

oc create -f deploy/subscription.yaml
5. To verify that the operator launched successfully, run the oc get csv command. If the value of the PHASE column is Succeeded, the operator launched successfully:

$ oc get csv

NAME                      DISPLAY            VERSION   REPLACES   PHASE
grafana-operator.v3.2.0   Grafana Operator   3.2.0                Succeeded
6. Launch a Grafana instance:

$ oc create -f deploy/grafana.yaml
7. Verify that the Grafana instance deployed:

$ oc get pod -l app=grafana

NAME                                  READY   STATUS    RESTARTS   AGE
grafana-deployment-7fc7848b56-sbkhv   1/1     Running   0          1m
8. Create the datasource and dashboard resources:

oc create -f deploy/datasource.yaml -f deploy/rhos-dashboard.yaml
9. Verify that the resources installed correctly:

$ oc get grafanadashboards

NAME             AGE
rhos-dashboard   7d21h

$ oc get grafanadatasources

NAME                                  AGE
service-telemetry-grafanadatasource   1m
10. Navigate to https://<grafana-route-address> in a web browser. Use the oc get routes command to retrieve the Grafana route address:

oc get routes
11. To view the dashboard, click Dashboards and Manage.
Additional resources
For more information about enabling the OperatorHub.io catalog source, see Section 2.3.5, "Enabling the OperatorHub.io Community Catalog Source".
4.4.1.1. Viewing and editing queries
Procedure
1. Log in to Red Hat OpenShift Container Platform. To view and edit queries, log in as the admin user.
2. Change to the service-telemetry namespace:

oc project service-telemetry
3. To retrieve the default username and password, describe the Grafana object using oc describe:

oc describe grafana service-telemetry-grafana
4.4.2. The Grafana infrastructure dashboard
The infrastructure dashboard shows metrics for a single node at a time. Select a node from the upper left corner of the dashboard.
4.4.2.1. Top panels
Title | Unit | Description
Current Global Alerts | - | Current alerts fired by Prometheus
Recent Global Alerts | - | Recently fired alerts in 5m time steps
Status Panel | - | Node status: up, down, unavailable
Uptime | s/m/h/d/M/Y | Total operational time of node
CPU Cores | cores | Total number of cores
Memory | bytes | Total memory
Disk Size | bytes | Total storage size
Processes | processes | Total number of processes listed by type
Load Average | processes | Load average represents the average number of running and uninterruptible processes residing in the kernel execution queue
4.4.2.2. Networking panels
Panels that display the network interfaces of the node.
Panel | Unit | Description
Physical Interfaces Ingress Errors | errors | Total errors with incoming data
Physical Interfaces Egress Errors | errors | Total errors with outgoing data
Physical Interfaces Ingress Error Rates | errors/s | Rate of incoming data errors
Physical Interfaces Egress Error Rates | errors/s | Rate of outgoing data errors
Physical Interfaces Packets Ingress | pps | Incoming packets per second
Physical Interfaces Packets Egress | pps | Outgoing packets per second
Physical Interfaces Data Ingress | bytes/s | Incoming data rates
Physical Interfaces Data Egress | bytes/s | Outgoing data rates
Physical Interfaces Drop Rate Ingress | pps | Incoming packets drop rate
Physical Interfaces Drop Rate Egress | pps | Outgoing packets drop rate
4.4.2.3. CPU panels
Panels that display CPU usage of the node.
Panel | Unit | Description
Current CPU Usage | percent | Instantaneous usage at the time of the last query
Aggregate CPU Usage | percent | Average non-idle CPU activity of all cores on a node
Aggr CPU Usage by Type | percent | Shows time spent for each type of thread averaged across all cores
4.4.2.4. Memory panels
Panels that display memory usage on the node.
Panel | Unit | Description
Memory Used | percent | Amount of memory being used at time of last query
Huge Pages Used | hugepages | Number of hugepages being used
Memory
4.4.2.5. Disk/file system
Panels that display space used on disk.
Panel | Unit | Description | Notes
Disk Space Usage | percent | Total disk use at time of last query |
Inode Usage | percent | Total inode use at time of last query |
Aggregate Disk Space Usage | bytes | Total disk space used and reserved | Because this query relies on the df plugin, temporary file systems that do not necessarily use disk space are included in the results. The query tries to filter out most of these, but it might not be exhaustive.
Disk Traffic | bytes/s | Shows rates for both reading and writing |
Disk Load | percent | Approximate percentage of total disk bandwidth being used. The weighted I/O time series includes the backlog that might be accumulating. For more information, see the collectd disk plugin docs. |
Operations/s | ops/s | Operations done per second |
Average I/O Operation Time | seconds | Average time each I/O operation took to complete. This average is not accurate, see the collectd disk plugin docs. |
4.5. CONFIGURING MULTIPLE CLOUDS
You can configure multiple Red Hat OpenStack Platform clouds to target a single instance of Service Telemetry Framework (STF):
1. Plan the AMQP address prefixes that you want to use for each cloud. For more information, see Section 4.5.1, "Planning AMQP address prefixes".
2. Deploy metrics and events consumer Smart Gateways for each cloud to listen on the corresponding address prefixes. For more information, see Section 4.5.2, "Deploying Smart Gateways".
3. Configure each cloud to send its metrics and events to STF on the correct address. For more information, see Section 4.5.3, "Creating the OpenStack environment file".
Figure 4.1. Two Red Hat OpenStack Platform clouds connect to STF
4.5.1. Planning AMQP address prefixes
By default, Red Hat OpenStack Platform nodes get data through two data collectors: collectd and Ceilometer. These components send telemetry data, or notifications, to the respective AMQP addresses, for example, collectd/telemetry, where STF Smart Gateways listen on those addresses for monitoring data.
To support multiple clouds, and to identify which cloud generated the monitoring data, configure each cloud to send data to a unique address. Prefix a cloud identifier to the second part of the address. The following list shows some example addresses and identifiers:
collectd/cloud1-telemetry

collectd/cloud1-notify

anycast/ceilometer/cloud1-event.sample

collectd/cloud2-telemetry

collectd/cloud2-notify

anycast/ceilometer/cloud2-event.sample

collectd/us-east-1-telemetry

collectd/us-west-3-telemetry
4.5.2. Deploying Smart Gateways
You must deploy a Smart Gateway for each of the data collection types for each cloud: one for collectd metrics, one for collectd events, and one for Ceilometer events. Configure each of the Smart Gateways to listen on the AMQP address that you define for the corresponding cloud.
When you deploy STF for the first time, Smart Gateway manifests are created that define the initial Smart Gateways for a single cloud. When deploying Smart Gateways for multiple cloud support, you deploy multiple Smart Gateways for each of the data collection types that handle the metrics and the events data for each cloud. The initial Smart Gateways act as a template to create additional Smart Gateways, along with any authentication information required to connect to the data stores.
Procedure
1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:
oc project service-telemetry
3. Use the initially deployed Smart Gateways as a template for additional Smart Gateways. List the currently deployed Smart Gateways with the oc get smartgateways command. For example, if you deployed STF with metricsEnabled: true and eventsEnabled: true, the following Smart Gateways are displayed in the output:
$ oc get smartgateways
NAME                                  AGE
stf-default-ceilometer-notification   14d
stf-default-collectd-notification     14d
stf-default-collectd-telemetry        14d
4. Retrieve the manifests for each Smart Gateway and store the contents in a temporary file, which you can modify later and use to create the new set of Smart Gateways:
truncate --size 0 /tmp/cloud1-smartgateways.yaml && \
for sg in $(oc get smartgateways -oname)
do
  echo --- >> /tmp/cloud1-smartgateways.yaml
  oc get $sg -oyaml --export >> /tmp/cloud1-smartgateways.yaml
done
5. Modify the Smart Gateway manifests in the /tmp/cloud1-smartgateways.yaml file. Adjust the metadata.name and spec.amqpUrl fields to include the cloud identifier from your schema. For more information, see Section 4.5.1, "Planning AMQP address prefixes". To view example Smart Gateway manifests, see Section 4.5.2.1, "Example manifests".
6. Deploy your new Smart Gateways:
oc apply -f /tmp/cloud1-smartgateways.yaml
7. Verify that each Smart Gateway is running. This can take several minutes, depending on the number of Smart Gateways:
oc get po -l app=smart-gateway
4.5.2.1. Example manifests
IMPORTANT
The content in the following examples might be different to the file content in your deployment. Copy the manifests in your deployment.
Ensure that the name and amqpUrl parameters of each Smart Gateway match the names that you want to use for your clouds. For more information, see Section 4.5.1, "Planning AMQP address prefixes".
NOTE
Your output may have some additional metadata parameters that you can remove from the manifests that you load into OCP.
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-ceilometer-notification-cloud1 1
spec:
  amqpDataSource: ceilometer
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/anycast/ceilometer/cloud1-event.sample 2
  debug: false
  elasticPass: fkzfhghw
  elasticUrl: https://elasticsearch-es-http.service-telemetry.svc.cluster.local:9200
  elasticUser: elastic
  resetIndex: false
  serviceType: events
  size: 1
  tlsCaCert: /config/certs/ca.crt
  tlsClientCert: /config/certs/tls.crt
  tlsClientKey: /config/certs/tls.key
  tlsServerName: elasticsearch-es-http.service-telemetry.svc.cluster.local
  useBasicAuth: true
  useTls: true
---
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-collectd-notification-cloud1 3
spec:
  amqpDataSource: collectd
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/cloud1-notify 4
  debug: false
  elasticPass: fkzfhghw
  elasticUrl: https://elasticsearch-es-http.service-telemetry.svc.cluster.local:9200
  elasticUser: elastic
  resetIndex: false
  serviceType: events
  size: 1
  tlsCaCert: /config/certs/ca.crt
  tlsClientCert: /config/certs/tls.crt
  tlsClientKey: /config/certs/tls.key
  tlsServerName: elasticsearch-es-http.service-telemetry.svc.cluster.local
  useBasicAuth: true
  useTls: true
---
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-collectd-telemetry-cloud1 5
spec:
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/cloud1-telemetry 6
  debug: false
  prefetch: 15000
  serviceType: metrics
  size: 1
  useTimestamp: true

1 Name for Ceilometer notifications for cloud1

2 AMQP Address for Ceilometer notifications for cloud1

3 Name for collectd notifications for cloud1

4 AMQP Address for collectd notifications for cloud1

5 Name for collectd telemetry for cloud1

6 AMQP Address for collectd telemetry for cloud1
4.5.3. Creating the OpenStack environment file
To label traffic according to the cloud of origin, you must create a configuration with cloud-specific instance names. Create an stf-connectors.yaml file and adjust the values of CeilometerQdrEventsConfig and CollectdAmqpInstances to match the AMQP address prefix scheme. For more information, see Section 4.5.1, "Planning AMQP address prefixes".
WARNING

Remove the enable-stf.yaml and ceilometer-write-qdr.yaml environment file references from your overcloud deployment. This configuration is redundant and results in duplicate information being sent from each cloud node.
Procedure
1. Create the stf-connectors.yaml file and modify it to match the AMQP address that you want for this cloud deployment:
resource_registry:
  OS::TripleO::Services::Collectd: /usr/share/openstack-tripleo-heat-templates/deployment/metrics/collectd-container-puppet.yaml
  OS::TripleO::Services::MetricsQdr: /usr/share/openstack-tripleo-heat-templates/deployment/metrics/qdr-container-puppet.yaml
  OS::TripleO::Services::CeilometerAgentCentral: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-central-container-puppet.yaml
  OS::TripleO::Services::CeilometerAgentNotification: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-notification-container-puppet.yaml
  OS::TripleO::Services::CeilometerAgentIpmi: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-ipmi-container-puppet.yaml
  OS::TripleO::Services::ComputeCeilometerAgent: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-compute-container-puppet.yaml
  OS::TripleO::Services::Redis: /usr/share/openstack-tripleo-heat-templates/deployment/database/redis-pacemaker-puppet.yaml

parameter_defaults:
  EnableSTF: true

  EventPipelinePublishers: []
  CeilometerEnablePanko: false
  CeilometerQdrPublishEvents: true
  CeilometerQdrEventsConfig:
    driver: amqp
    topic: cloud1-event 1

  CollectdConnectionType: amqp1
  CollectdAmqpInterval: 5
  CollectdDefaultPollingInterval: 5

  CollectdAmqpInstances:
    cloud1-notify: 2
      notify: true
      format: JSON
      presettle: false
    cloud1-telemetry: 3
      format: JSON
      presettle: true

  MetricsQdrAddresses:
    - prefix: collectd
      distribution: multicast
    - prefix: anycast/ceilometer
      distribution: multicast

  MetricsQdrSSLProfiles:
    - name: sslProfile

  MetricsQdrConnectors:
    - host: stf-default-interconnect-5671-service-telemetry.apps.infra.watch 4
      port: 443
      role: edge
      verifyHostname: false
      sslProfile: sslProfile

1 Define the topic for Ceilometer events. This value is the address format of anycast/ceilometer/cloud1-event.sample.

2 Define the topic for collectd events. This value is the format of collectd/cloud1-notify.

3 Define the topic for collectd metrics. This value is the format of collectd/cloud1-telemetry.

4 Adjust the MetricsQdrConnectors host to the address of the STF route.
2. Ensure that the naming convention in the stf-connectors.yaml file aligns with the spec.amqpUrl field in the Smart Gateway configuration. For example, configure the CeilometerQdrEventsConfig.topic field to a value of cloud1-event.
3. Save the file in a directory for custom environment files, for example, /home/stack/custom_templates/.
4. Source the authentication file:

[stack@undercloud-0 ~]$ source stackrc

(undercloud) [stack@undercloud-0 ~]$
5. Include the stf-connectors.yaml file in the overcloud deployment command, along with any other environment files relevant to your environment:

(undercloud) [stack@undercloud-0 ~]$ openstack overcloud deploy --templates /usr/share/openstack-tripleo-heat-templates \
  -e /home/stack/custom_templates/stf-connectors.yaml
Additional resources

For information about validating the deployment, see Section 3.3.3, "Validating client-side installation".
4.5.4. Querying metrics data from multiple clouds
Data stored in Prometheus has a service label attached according to the Smart Gateway it was scraped from. You can use this label to query data from a specific cloud.
To query data from a specific cloud, use a Prometheus promql query that matches the associated service label, for example, collectd_uptime{service="stf-default-collectd-telemetry-cloud1-smartgateway"}.
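For example, from a pod with access to curl, as in Section 4.2.1, "Creating an alert rule in Prometheus", you can run the query against the prometheus-operated service. This is a sketch that assumes the cloud1 Smart Gateway name used in the preceding examples; the -g flag stops curl from interpreting the braces in the query:

[ root@curl:/ ]$ curl -g 'http://prometheus-operated:9090/api/v1/query?query=collectd_uptime{service="stf-default-collectd-telemetry-cloud1-smartgateway"}'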
4.6. EPHEMERAL STORAGE
Use ephemeral storage to run Service Telemetry Framework (STF) without persistently storing data in your Red Hat OpenShift Container Platform (OCP) cluster. Ephemeral storage is not recommended in a production environment due to the volatility of the data in the platform when operating correctly and as designed. For example, restarting a pod or rescheduling the workload to another node results in the loss of any local data written since the pod started.
If you enable ephemeral storage in STF, the Service Telemetry Operator does not add the relevant storage sections to the data storage components' manifests.
4.6.1. Configuring ephemeral storage
To configure STF for ephemeral storage, add storageEphemeralEnabled: true to the ServiceTelemetry object in OCP. You can add storageEphemeralEnabled: true at installation time, or, if you already deployed STF, complete the following steps.
Procedure
1. Log in to Red Hat OpenShift Container Platform.
2. Change to the service-telemetry namespace:

oc project service-telemetry
3. Edit the ServiceTelemetry object:

$ oc edit ServiceTelemetry stf-default
4. Add the storageEphemeralEnabled: true parameter to the spec section:

spec:
  eventsEnabled: true
  metricsEnabled: true
  storageEphemeralEnabled: true
5. Save your changes and close the object.
APPENDIX A. COLLECTD PLUG-INS

This section contains a complete list of collectd plug-ins and configurations for Red Hat OpenStack Platform 16.0.
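The parameters listed below are Puppet hiera keys for the collectd composable service. As a rough sketch of how such values are typically supplied through director, you can set them under ExtraConfig in an environment file; the plug-in selection and values shown here are illustrative only:

parameter_defaults:
  CollectdExtraPlugins:
    - df
  ExtraConfig:
    collectd::plugin::df::fstypes: ['xfs']
    collectd::plugin::df::ignoreselected: true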
collectd-aggregation
collectd::plugin::aggregation::aggregators
collectd::plugin::aggregation::interval

collectd-amqp1

collectd-apache
collectd::plugin::apache::instances (ex: localhost => url => 'http://localhost/mod_status?auto')
collectd::plugin::apache::interval

collectd-apcups

collectd-battery
collectd::plugin::battery::values_percentage
collectd::plugin::battery::report_degraded
collectd::plugin::battery::query_state_fs
collectd::plugin::battery::interval

collectd-ceph
collectd::plugin::ceph::daemons
collectd::plugin::ceph::longrunavglatency
collectd::plugin::ceph::convertspecialmetrictypes

collectd-cgroups
collectd::plugin::cgroups::ignore_selected
collectd::plugin::cgroups::interval

collectd-conntrack
None

collectd-contextswitch
collectd::plugin::contextswitch::interval

collectd-cpu
collectd::plugin::cpu::reportbystate
collectd::plugin::cpu::reportbycpu
collectd::plugin::cpu::valuespercentage
collectd::plugin::cpu::reportnumcpu
collectd::plugin::cpu::reportgueststate
collectd::plugin::cpu::subtractgueststate
collectd::plugin::cpu::interval

collectd-cpufreq
None

collectd-cpusleep

collectd-csv
collectd::plugin::csv::datadir
collectd::plugin::csv::storerates
collectd::plugin::csv::interval

collectd-df
collectd::plugin::df::devices
collectd::plugin::df::fstypes
collectd::plugin::df::ignoreselected
collectd::plugin::df::mountpoints
collectd::plugin::df::reportbydevice
collectd::plugin::df::reportinodes
collectd::plugin::df::reportreserved
collectd::plugin::df::valuesabsolute
collectd::plugin::df::valuespercentage
collectd::plugin::df::interval

collectd-disk
collectd::plugin::disk::disks
collectd::plugin::disk::ignoreselected
collectd::plugin::disk::udevnameattr
collectd::plugin::disk::interval

collectd-entropy
collectd::plugin::entropy::interval

collectd-ethstat
collectd::plugin::ethstat::interfaces
collectd::plugin::ethstat::maps
collectd::plugin::ethstat::mappedonly
collectd::plugin::ethstat::interval

collectd-exec
collectd::plugin::exec::commands
collectd::plugin::exec::commands_defaults
collectd::plugin::exec::globals
collectd::plugin::exec::interval

collectd-fhcount
collectd::plugin::fhcount::valuesabsolute
collectd::plugin::fhcount::valuespercentage
collectd::plugin::fhcount::interval

collectd-filecount
collectd::plugin::filecount::directories
collectd::plugin::filecount::interval

collectd-fscache
None

collectd-hddtemp
collectd::plugin::hddtemp::host
collectd::plugin::hddtemp::port
collectd::plugin::hddtemp::interval

collectd-hugepages
collectd::plugin::hugepages::report_per_node_hp
collectd::plugin::hugepages::report_root_hp
collectd::plugin::hugepages::values_pages
collectd::plugin::hugepages::values_bytes
collectd::plugin::hugepages::values_percentage
collectd::plugin::hugepages::interval

collectd-intel_rdt

collectd-interface
collectd::plugin::interface::interfaces
collectd::plugin::interface::ignoreselected
collectd::plugin::interface::reportinactive
collectd::plugin::interface::interval

collectd-ipc
None

collectd-ipmi
collectd::plugin::ipmi::ignore_selected
collectd::plugin::ipmi::notify_sensor_add
collectd::plugin::ipmi::notify_sensor_remove
collectd::plugin::ipmi::notify_sensor_not_present
collectd::plugin::ipmi::sensors
collectd::plugin::ipmi::interval

collectd-irq
collectd::plugin::irq::irqs
collectd::plugin::irq::ignoreselected
collectd::plugin::irq::interval

collectd-load
collectd::plugin::load::report_relative
collectd::plugin::load::interval

collectd-logfile
collectd::plugin::logfile::log_level
collectd::plugin::logfile::log_file
collectd::plugin::logfile::log_timestamp
collectd::plugin::logfile::print_severity
collectd::plugin::logfile::interval

collectd-madwifi

collectd-mbmon

collectd-md

collectd-memcached
collectd::plugin::memcached::instances
collectd::plugin::memcached::interval

collectd-memory
collectd::plugin::memory::valuesabsolute
collectd::plugin::memory::valuespercentage
collectd::plugin::memory::interval

collectd-multimeter

collectd-mysql
collectd::plugin::mysql::interval

collectd-netlink
collectd::plugin::netlink::interfaces
collectd::plugin::netlink::verboseinterfaces
collectd::plugin::netlink::qdiscs
collectd::plugin::netlink::classes
collectd::plugin::netlink::filters
collectd::plugin::netlink::ignoreselected
collectd::plugin::netlink::interval

collectd-network
collectd::plugin::network::timetolive
collectd::plugin::network::maxpacketsize
collectd::plugin::network::forward
collectd::plugin::network::reportstats
collectd::plugin::network::listeners
collectd::plugin::network::servers
collectd::plugin::network::interval

collectd-nfs
collectd::plugin::nfs::interval

collectd-ntpd
collectd::plugin::ntpd::host
collectd::plugin::ntpd::port
collectd::plugin::ntpd::reverselookups
collectd::plugin::ntpd::includeunitid
collectd::plugin::ntpd::interval

collectd-numa
None

collectd-olsrd

collectd-openvpn
collectd::plugin::openvpn::statusfile
collectd::plugin::openvpn::improvednamingschema
collectd::plugin::openvpn::collectcompression
collectd::plugin::openvpn::collectindividualusers
collectd::plugin::openvpn::collectusercount
collectd::plugin::openvpn::interval

collectd-ovs_events
collectd::plugin::ovs_events::address
collectd::plugin::ovs_events::dispatch
collectd::plugin::ovs_events::interfaces
collectd::plugin::ovs_events::send_notification
collectd::plugin::ovs_events::$port
collectd::plugin::ovs_events::socket

collectd-ovs_stats
collectd::plugin::ovs_stats::address
collectd::plugin::ovs_stats::bridges
collectd::plugin::ovs_stats::port
collectd::plugin::ovs_stats::socket

collectd-ping
collectd::plugin::ping::hosts
collectd::plugin::ping::timeout
collectd::plugin::ping::ttl
collectd::plugin::ping::source_address
collectd::plugin::ping::device
collectd::plugin::ping::max_missed
collectd::plugin::ping::size
collectd::plugin::ping::interval

collectd-powerdns
collectd::plugin::powerdns::interval
collectd::plugin::powerdns::servers
collectd::plugin::powerdns::recursors
collectd::plugin::powerdns::local_socket
collectd::plugin::powerdns::interval

collectd-processes
collectd::plugin::processes::processes
collectd::plugin::processes::process_matches
collectd::plugin::processes::collect_context_switch
collectd::plugin::processes::collect_file_descriptor
collectd::plugin::processes::collect_memory_maps
collectd::plugin::powerdns::interval
collectd-protocols
collectd::plugin::protocols::ignoreselected
collectd::plugin::protocols::values

collectd-python

collectd-serial

collectd-smart
collectd::plugin::smart::disks
collectd::plugin::smart::ignoreselected
collectd::plugin::smart::interval

collectd-snmp_agent

collectd-statsd
collectd::plugin::statsd::host
collectd::plugin::statsd::port
collectd::plugin::statsd::deletecounters
collectd::plugin::statsd::deletetimers
collectd::plugin::statsd::deletegauges
collectd::plugin::statsd::deletesets
collectd::plugin::statsd::countersum
collectd::plugin::statsd::timerpercentile
collectd::plugin::statsd::timerlower
collectd::plugin::statsd::timerupper
collectd::plugin::statsd::timersum
collectd::plugin::statsd::timercount
collectd::plugin::statsd::interval

collectd-swap
collectd::plugin::swap::reportbydevice
collectd::plugin::swap::reportbytes
collectd::plugin::swap::valuesabsolute
collectd::plugin::swap::valuespercentage
collectd::plugin::swap::reportio
collectd::plugin::swap::interval

collectd-syslog
collectd::plugin::syslog::log_level
collectd::plugin::syslog::notify_level
collectd::plugin::syslog::interval

collectd-table
collectd::plugin::table::tables
collectd::plugin::table::interval

collectd-tail
collectd::plugin::tail::files
collectd::plugin::tail::interval

collectd-tail_csv
collectd::plugin::tail_csv::metrics
collectd::plugin::tail_csv::files

collectd-tcpconns
collectd::plugin::tcpconns::localports
collectd::plugin::tcpconns::remoteports
collectd::plugin::tcpconns::listening
collectd::plugin::tcpconns::allportssummary
collectd::plugin::tcpconns::interval

collectd-ted

collectd-thermal
collectd::plugin::thermal::devices
collectd::plugin::thermal::ignoreselected
collectd::plugin::thermal::interval

collectd-threshold
collectd::plugin::threshold::types
collectd::plugin::threshold::plugins
collectd::plugin::threshold::hosts
collectd::plugin::threshold::interval

collectd-turbostat
collectd::plugin::turbostat::core_c_states
collectd::plugin::turbostat::package_c_states
collectd::plugin::turbostat::system_management_interrupt
collectd::plugin::turbostat::digital_temperature_sensor
collectd::plugin::turbostat::tcc_activation_temp
collectd::plugin::turbostat::running_average_power_limit
collectd::plugin::turbostat::logical_core_names

collectd-unixsock

collectd-uptime
collectd::plugin::uptime::interval

collectd-users
collectd::plugin::users::interval

collectd-uuid
collectd::plugin::uuid::uuid_file
collectd::plugin::uuid::interval

collectd-virt
collectd::plugin::virt::connection
collectd::plugin::virt::refresh_interval
collectd::plugin::virt::domain
collectd::plugin::virt::block_device
collectd::plugin::virt::interface_device
collectd::plugin::virt::ignore_selected
collectd::plugin::virt::hostname_format
collectd::plugin::virt::interface_format
collectd::plugin::virt::extra_stats
collectd::plugin::virt::interval

collectd-vmem
collectd::plugin::vmem::verbose
collectd::plugin::vmem::interval

collectd-vserver

collectd-wireless

collectd-write_graphite
collectd::plugin::write_graphite::carbons
collectd::plugin::write_graphite::carbon_defaults
collectd::plugin::write_graphite::globals

collectd-write_kafka
collectd::plugin::write_kafka::kafka_host
collectd::plugin::write_kafka::kafka_port
collectd::plugin::write_kafka::kafka_hosts
collectd::plugin::write_kafka::topics

collectd-write_log
collectd::plugin::write_log::format

collectd-zfs_arc
None
- Table of Contents
- CHAPTER 1 INTRODUCTION TO SERVICE TELEMETRY FRAMEWORK
-
- 11 SERVICE TELEMETRY FRAMEWORK ARCHITECTURE
- 12 INSTALLATION SIZE
-
- CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK
-
- 21 THE CORE COMPONENTS OF STF
- 22 PREPARING YOUR OCP ENVIRONMENT FOR STF
-
- 221 Persistent volumes
-
- 2211 Using ephemeral storage
-
- 222 Resource allocation
- 223 Node tuning operator
-
- 23 DEPLOYING STF TO THE OCP ENVIRONMENT
-
- 231 Deploying STF to the OCP environment with ElasticSearch
- 232 Deploying STF to the OCP environment without ElasticSearch
- 233 Creating a namespace
- 234 Creating an OperatorGroup
- 235 Enabling the OperatorHubio Community Catalog Source
- 236 Enabling Red Hat STF Operator Source
- 237 Subscribing to the AMQ Certificate Manager Operator
- 238 Subscribing to the Elastic Cloud on Kubernetes Operator
- 239 Subscribing to the Service Telemetry Operator
- 2310 Creating a ServiceTelemetry object in OCP
-
- 24 REMOVING STF FROM THE OCP ENVIRONMENT
-
- 241 Deleting the namespace
- 242 Removing the OperatorSource
-
- CHAPTER 3 COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION
-
- 31 CONNECTING RED HAT OPENSTACK PLATFORM TO SERVICE TELEMETRY FRAMEWORK
- 32 DEPLOYING TO NON-STANDARD NETWORK TOPOLOGIES
- 33 CONFIGURING RED HAT OPENSTACK PLATFORM OVERCLOUD FOR SERVICE TELEMETRY FRAMEWORK
-
- 331 Retrieving the AMQ Interconnect route address
- 332 Configuring the STF connection for the overcloud
- 333 Validating client-side installation
-
- CHAPTER 4 ADVANCED FEATURES
-
- 41 CUSTOMIZING THE DEPLOYMENT
-
- 411 Manifest override parameters
- 412 Overriding a managed manifest
-
- Procedure
-
- 42 ALERTS
-
- Additional resources
- 421 Creating an alert rule in Prometheus
-
- Procedure
- Additional resources
-
- 422 Configuring custom alerts
-
- Procedure
- Additional resources
-
- 423 Creating an alert route in Alertmanager
-
- Procedure
- Additional resources
-
- 43 HIGH AVAILABILITY
-
- 431 Configuring high availability
-
- Procedure
-
- 44 DASHBOARDS
-
- 441 Setting up Grafana to host the dashboard
-
- 4411 Viewing and editing queries
-
- 442 The Grafana infrastructure dashboard
-
- 4421 Top panels
- 4422 Networking panels
- 4423 CPU panels
- 4424 Memory panels
- 4425 Diskfile system
-
- 45 CONFIGURING MULTIPLE CLOUDS
-
- 451 Planning AMQP address prefixes
- 452 Deploying Smart Gateways
-
- Procedure
- 4521 Example manifests
-
- 453 Creating the OpenStack environment file
-
- Procedure
- Additional resources
-
- 454 Querying metrics data from multiple clouds
-
- 46 EPHEMERAL STORAGE
-
- 461 Configuring ephemeral storage
-
- APPENDIX A COLLECTD PLUG-INS
-
Red Hat OpenStack Platform on a platform separate from your Red Hat OpenStack Platformenvironment You can install Red Hat OpenShift Container Platform (OCP) on baremetal or othersupported cloud platforms For more information about installing OCP see OpenShift ContainerPlatform 43 Documentation
The size of your OCP environment depends on the infrastructure you select For more informationabout minimum resources requirements when installing OCP on baremetal see Minimum resourcerequirements in the Installing a cluster on bare metal guide For installation requirements of the variouspublic and private cloud platforms which you can install see the corresponding installationdocumentation for your cloud platform of choice
CHAPTER 1 INTRODUCTION TO SERVICE TELEMETRY FRAMEWORK
7
CHAPTER 2 INSTALLING THE CORE COMPONENTS OFSERVICE TELEMETRY FRAMEWORK
Before you install Service Telemetry Framework (STF) ensure thatRed Hat OpenShift Container Platform (OCP) version 4x is running and that you understand the corecomponents of the framework As part of the OCP installation planning process ensure that theadministrator provides persistent storage and enough resources to run the STF components on top ofthe OCP environment
WARNING
Red Hat OpenShift Container Platform version 43 or later is currently required for asuccessful installation of STF
21 THE CORE COMPONENTS OF STF
The following STF core components are managed by Operators
Prometheus and AlertManager
ElasticSearch
Smart Gateway
AMQ Interconnect
Each component has a corresponding Operator that you can use to load the various applicationcomponents and objects
Additional resources
For more information about Operators see the Understanding Operators guide
22 PREPARING YOUR OCP ENVIRONMENT FOR STF
As you prepare your OCP environment for STF you must plan for persistent storage adequateresources and event storage
Ensure that persistent storage is available in your Red Hat OpenShift Container Platform clusterto permit a production grade deployment For more information see Section 221 ldquoPersistentvolumesrdquo
Ensure that enough resources are available to run the Operators and the application containersFor more information see Section 222 ldquoResource allocationrdquo
To install ElasticSearch you must use a community catalog source If you do not want to use acommunity catalog or if you do not want to store events see Section 23 ldquoDeploying STF to theOCP environmentrdquo
STF uses ElasticSearch to store events which requires a larger than normal
Red Hat OpenStack Platform 160 Service Telemetry Framework
8
vmmax_map_count The vmmax_map_count value is set by default inRed Hat OpenShift Container Platform For more information about how to edit the value of vmmax_map_count see Section 223 ldquoNode tuning operatorrdquo
221 Persistent volumes
STF uses persistent storage in OCP to instantiate the volumes dynamically so that Prometheus andElasticSearch can store metrics and events
Additional resources
For more information about configuring persistent storage for OCP see Understanding persistentstorage
2211 Using ephemeral storage
WARNING
You can use ephemeral storage with STF However if you use ephemeral storageyou might experience data loss if a pod is restarted updated or rescheduled ontoanother node Use ephemeral storage only for development or testing and notproduction environments
Procedure
To enable ephemeral storage for STF set storageEphemeralEnabled true in your ServiceTelemetry manifest
Additional resources
For more information about enabling ephemeral storage for STF see Section 461 ldquoConfiguringephemeral storagerdquo
222 Resource allocation
To enable the scheduling of pods within the OCP infrastructure you need resources for thecomponents that are running If you do not allocate enough resources pods remain in a Pending statebecause they cannot be scheduled
The amount of resources that you require to run STF depends on your environment and the number ofnodes and clouds that you want to monitor
Additional resources
For recommendations about sizing for metrics collection seehttpsaccessredhatcomarticles4907241
For information about sizing requirements for ElasticSearch seehttpswwwelasticcoguideencloud-on-k8scurrentk8s-managing-compute-resourceshtml
223 Node tuning operator
CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK
9
STF uses ElasticSearch to store events which requires a larger than normal vmmax_map_count The vmmax_map_count value is set by default in Red Hat OpenShift Container Platform
If you want to edit the value of vmmax_map_count you cannot apply node tuning manually using the sysctl command because Red Hat OpenShift Container Platform manages nodes directly To configurevalues and apply them to the infrastructure you must use the node tuning operator For moreinformation see Using the Node Tuning Operator
In an OCP deployment the default node tuning operator specification provides the required profiles forElasticSearch workloads or pods scheduled on nodes To view the default cluster node tuningspecification run the following command
The output of the default specification is documented at Default profiles set on a cluster Theassignment of profiles is managed in the recommend section where profiles are applied to a node whencertain conditions are met When scheduling ElasticSearch to a node in STF one of the following profilesis applied
openshift-control-plane-es
openshift-node-es
When scheduling an ElasticSearch pod there must be a label present that matches tunedopenshiftioelasticsearch If the label is present one of the two profiles is assigned to the podNo action is required by the administrator if you use the recommended Operator for ElasticSearch Ifyou use a custom-deployed ElasticSearch with STF ensure that you add the tunedopenshiftioelasticsearch label to all scheduled pods
Additional resources
For more information about virtual memory usage by ElasticSearch seehttpswwwelasticcoguideenelasticsearchreferencecurrentvm-max-map-counthtml
For more information about how the profiles are applied to nodes see Custom tuning specification
23 DEPLOYING STF TO THE OCP ENVIRONMENT
You can deploy STF to the OCP environment in one of two ways
Deploy STF and store events with ElasticSearch For more information see Section 231ldquoDeploying STF to the OCP environment with ElasticSearchrdquo
Deploy STF without ElasticSearch and disable events support For more information seeSection 232 ldquoDeploying STF to the OCP environment without ElasticSearchrdquo
231 Deploying STF to the OCP environment with ElasticSearch
Complete the following tasks
1 Section 233 ldquoCreating a namespacerdquo
2 Section 234 ldquoCreating an OperatorGrouprdquo
3 Section 235 ldquoEnabling the OperatorHubio Community Catalog Sourcerdquo
oc get Tuneddefault -o yaml -n openshift-cluster-node-tuning-operator
Red Hat OpenStack Platform 160 Service Telemetry Framework
10
4 Section 236 ldquoEnabling Red Hat STF Operator Sourcerdquo
5 Section 237 ldquoSubscribing to the AMQ Certificate Manager Operatorrdquo
6 Section 238 ldquoSubscribing to the Elastic Cloud on Kubernetes Operatorrdquo
7 Section 239 ldquoSubscribing to the Service Telemetry Operatorrdquo
8 Section 2310 ldquoCreating a ServiceTelemetry object in OCPrdquo
232 Deploying STF to the OCP environment without ElasticSearch
Complete the following tasks
1 Section 233 ldquoCreating a namespacerdquo
2 Section 234 ldquoCreating an OperatorGrouprdquo
3 Section 236 ldquoEnabling Red Hat STF Operator Sourcerdquo
4 Section 237 ldquoSubscribing to the AMQ Certificate Manager Operatorrdquo
5 Section 239 ldquoSubscribing to the Service Telemetry Operatorrdquo
6 Section 2310 ldquoCreating a ServiceTelemetry object in OCPrdquo
233 Creating a namespace
Create a namespace to hold the STF components The service-telemetry namespace is usedthroughout the documentation
Procedure
Enter the following command
234 Creating an OperatorGroup
Create an OperatorGroup in the namespace so that you can schedule the Operator pods
Procedure
Enter the following command
Additional resources
For more information, see OperatorGroups.
2.3.5. Enabling the OperatorHub.io Community Catalog Source
Before you install ElasticSearch, you must have access to the resources on the OperatorHub.io Community Catalog Source.
Procedure
Enter the following command:

oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: operatorhubio-operators
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/operator-framework/upstream-community-operators:latest
  displayName: OperatorHub.io Operators
  publisher: OperatorHub.io
EOF
2.3.6. Enabling Red Hat STF Operator Source
Before you deploy STF on Red Hat OpenShift Container Platform, you must enable the operator source.
Procedure
1. Install an OperatorSource that contains the Service Telemetry Operator and the Smart Gateway Operator:

oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1
kind: OperatorSource
metadata:
  labels:
    opsrc-provider: redhat-operators-stf
  name: redhat-operators-stf
  namespace: openshift-marketplace
spec:
  authorizationToken: {}
  displayName: Red Hat STF Operators
  endpoint: https://quay.io/cnr
  publisher: Red Hat
  registryNamespace: redhat-operators-stf
  type: appregistry
EOF
2. To validate the creation of your OperatorSource, use the oc get operatorsources command. A successful import results in the MESSAGE field returning a result of The object has been successfully reconciled:

$ oc get -nopenshift-marketplace operatorsource redhat-operators-stf

NAME                   TYPE          ENDPOINT              REGISTRY               DISPLAYNAME             PUBLISHER   STATUS      MESSAGE
redhat-operators-stf   appregistry   https://quay.io/cnr   redhat-operators-stf   Red Hat STF Operators   Red Hat     Succeeded   The object has been successfully reconciled
3. To validate that the Operators are available from the catalog, use the oc get packagemanifest command:

$ oc get packagemanifests | grep "Red Hat STF"

smartgateway-operator       Red Hat STF Operators   2m50s
servicetelemetry-operator   Red Hat STF Operators   2m50s
2.3.7. Subscribing to the AMQ Certificate Manager Operator
You must subscribe to the AMQ Certificate Manager Operator before you deploy the other STF components, because the AMQ Certificate Manager Operator runs globally-scoped and is not compatible with the dependency management of Operator Lifecycle Manager when used with other namespace-scoped operators.
Procedure
1. Subscribe to the AMQ Certificate Manager Operator, create the subscription, and validate the AMQ7 Certificate Manager:

oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: amq7-cert-manager
  namespace: openshift-operators
spec:
  channel: alpha
  installPlanApproval: Automatic
  name: amq7-cert-manager
  source: redhat-operators
  sourceNamespace: openshift-marketplace
EOF
NOTE
The AMQ Certificate Manager is installed globally for all namespaces, so the namespace value provided is openshift-operators. You might not see your amq7-cert-manager.v1.0.0 ClusterServiceVersion in the service-telemetry namespace for a few minutes until the processing executes against the namespace.
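To wait for the ClusterServiceVersion to appear without repeatedly re-running the validation command, one option is the --watch flag of oc get; a minimal sketch, not shown in the original procedure:

# Watch the openshift-operators namespace until amq7-cert-manager.v1.0.0
# reaches the Succeeded phase, then interrupt with Ctrl+C.
oc get csv --namespace openshift-operators --watch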
2. To validate your ClusterServiceVersion, use the oc get csv command. Ensure that amq7-cert-manager.v1.0.0 has a phase Succeeded:

$ oc get --namespace openshift-operators csv

NAME                       DISPLAY                                         VERSION   REPLACES   PHASE
amq7-cert-manager.v1.0.0   Red Hat Integration - AMQ Certificate Manager   1.0.0                Succeeded
2.3.8. Subscribing to the Elastic Cloud on Kubernetes Operator
Before you install the Service Telemetry Operator, and if you plan to store events in ElasticSearch, you must enable the Elastic Cloud on Kubernetes Operator.
Procedure
1. Apply the following manifest to your OCP environment to enable the Elastic Cloud on Kubernetes Operator:

oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: elastic-cloud-eck
  namespace: service-telemetry
spec:
  channel: stable
  installPlanApproval: Automatic
  name: elastic-cloud-eck
  source: operatorhubio-operators
  sourceNamespace: openshift-marketplace
EOF
2. To verify that the ClusterServiceVersion for Elastic Cloud on Kubernetes succeeded, enter the oc get csv command:

$ oc get csv

NAME                       DISPLAY                       VERSION   REPLACES                   PHASE
elastic-cloud-eck.v1.1.0   Elastic Cloud on Kubernetes   1.1.0     elastic-cloud-eck.v1.0.1   Succeeded
2.3.9. Subscribing to the Service Telemetry Operator
To instantiate an STF instance, create the ServiceTelemetry object to allow the Service Telemetry Operator to create the environment.
Procedure
1. To create the Service Telemetry Operator subscription, enter the oc apply -f command:

oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: servicetelemetry-operator
  namespace: service-telemetry
spec:
  channel: stable
  installPlanApproval: Automatic
  name: servicetelemetry-operator
  source: redhat-operators-stf
  sourceNamespace: openshift-marketplace
EOF
2. To validate the Service Telemetry Operator and the dependent operators, enter the following command:

$ oc get csv --namespace service-telemetry
NAME                                DISPLAY                                         VERSION   REPLACES                            PHASE
amq7-cert-manager.v1.0.0            Red Hat Integration - AMQ Certificate Manager   1.0.0                                         Succeeded
amq7-interconnect-operator.v1.2.0   Red Hat Integration - AMQ Interconnect          1.2.0                                         Succeeded
elastic-cloud-eck.v1.1.0            Elastic Cloud on Kubernetes                     1.1.0     elastic-cloud-eck.v1.0.1            Succeeded
prometheusoperator.0.37.0           Prometheus Operator                             0.37.0    prometheusoperator.0.32.0           Succeeded
service-telemetry-operator.v1.0.2   Service Telemetry Operator                      1.0.2     service-telemetry-operator.v1.0.1   Succeeded
smart-gateway-operator.v1.0.1       Smart Gateway Operator                          1.0.1     smart-gateway-operator.v1.0.0       Succeeded
2.3.10. Creating a ServiceTelemetry object in OCP
To deploy the Service Telemetry Framework, you must create an instance of ServiceTelemetry in OCP. By default, eventsEnabled is set to false. If you do not want to store events in ElasticSearch, ensure that eventsEnabled is set to false. For more information, see Section 2.3.2, "Deploying STF to the OCP environment without ElasticSearch".
The following core parameters are available for a ServiceTelemetry manifest:

Table 2.1. Core parameters for a ServiceTelemetry manifest
Parameter | Description | Default Value
eventsEnabled | Enable events support in STF. Requires prerequisite steps to ensure ElasticSearch can be started. For more information, see Section 2.3.8, "Subscribing to the Elastic Cloud on Kubernetes Operator". | false
metricsEnabled | Enable metrics support in STF. | true
highAvailabilityEnabled | Enable high availability in STF. For more information, see Section 4.3, "High availability". | false
storageEphemeralEnabled | Enable ephemeral storage support in STF. For more information, see Section 4.6, "Ephemeral storage". | false
Procedure
1. To store events in ElasticSearch, set eventsEnabled to true during deployment:

oc apply -f - <<EOF
apiVersion: infra.watch/v1alpha1
kind: ServiceTelemetry
metadata:
  name: stf-default
  namespace: service-telemetry
spec:
  eventsEnabled: true
  metricsEnabled: true
EOF
2. To view the STF deployment logs in the Service Telemetry Operator, use the oc logs command:

oc logs $(oc get pod --selector=name=service-telemetry-operator -oname) -c ansible

PLAY RECAP *********************************************************************
localhost : ok=37 changed=0 unreachable=0 failed=0 skipped=1 rescued=0 ignored=0
3. View the pods and the status of each pod to determine that all workloads are operating nominally:

NOTE

If you set eventsEnabled: true, the notification Smart Gateways will Error and CrashLoopBackOff for a period of time before ElasticSearch starts.

$ oc get pods

NAME                                                              READY   STATUS    RESTARTS   AGE
alertmanager-stf-default-0                                        2/2     Running   0          26m
elastic-operator-645dc8b8ff-jwnzt                                 1/1     Running   0          88m
elasticsearch-es-default-0                                        1/1     Running   0          26m
interconnect-operator-6fd49d9fb9-4bl92                            1/1     Running   0          46m
prometheus-operator-bf7d97fb9-kwnlx                               1/1     Running   0          46m
prometheus-stf-default-0                                          3/3     Running   0          26m
service-telemetry-operator-54f4c99d9b-k7ll6                       2/2     Running   0          46m
smart-gateway-operator-7ff58bcf94-66rvx                           2/2     Running   0          46m
stf-default-ceilometer-notification-smartgateway-6675df547q4lbj   1/1     Running   0          26m
stf-default-collectd-notification-smartgateway-698c87fbb7-xj528   1/1     Running   0          26m
stf-default-collectd-telemetry-smartgateway-79c967c8f7-9hsqn      1/1     Running   0          26m
stf-default-interconnect-7458fd4d69-nqbfs                         1/1     Running   0          26m
2.4. REMOVING STF FROM THE OCP ENVIRONMENT
Remove STF from an OCP environment if you no longer require the STF functionality.

Complete the following tasks:

1. Section 2.4.1, "Deleting the namespace"

2. Section 2.4.2, "Removing the OperatorSource"
2.4.1. Deleting the namespace
To remove the operational resources for STF from OCP, delete the namespace.
Procedure
1. Run the oc delete command:

oc delete project service-telemetry
2. Verify that the resources have been deleted from the namespace:

$ oc get all
No resources found.
2.4.2. Removing the OperatorSource
If you do not expect to install Service Telemetry Framework again, delete the OperatorSource. When you remove the OperatorSource, PackageManifests related to STF are removed from the Operator Lifecycle Manager catalog.
Procedure
1. Delete the OperatorSource:

$ oc delete --namespace=openshift-marketplace operatorsource redhat-operators-stf
operatorsource.operators.coreos.com "redhat-operators-stf" deleted
2. Verify that the STF PackageManifests are removed from the platform. If successful, the following command returns no result:

$ oc get packagemanifests | grep "Red Hat STF"
3. If you enabled the OperatorHub.io Community Catalog Source during the installation process, and you no longer need this catalog source, delete it:

$ oc delete --namespace=openshift-marketplace catalogsource operatorhubio-operators
catalogsource.operators.coreos.com "operatorhubio-operators" deleted
Additional resources
For more information about the OperatorHub.io Community Catalog Source, see Section 2.3, "Deploying STF to the OCP environment".
CHAPTER 3. COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION
3.1. CONNECTING RED HAT OPENSTACK PLATFORM TO SERVICE TELEMETRY FRAMEWORK
To collect metrics, events, or both, and to send them to the Service Telemetry Framework (STF) storage domain, you must configure the Red Hat OpenStack Platform overcloud to enable data collection and transport.
To deploy data collection and transport to STF on Red Hat OpenStack Platform cloud nodes that employ routed L3 domains, such as distributed compute node (DCN) or spine-leaf, see Section 3.2, "Deploying to non-standard network topologies".
3.2. DEPLOYING TO NON-STANDARD NETWORK TOPOLOGIES
If your nodes are on a separate network from the default InternalApi network, you must make configuration adjustments so that AMQ Interconnect can transport data to the Service Telemetry Framework (STF) server instance. This scenario is typical in a spine-leaf or a DCN topology. For more information about DCN configuration, see the Spine Leaf Networking guide.
If you use STF with Red Hat OpenStack Platform 16.0 and plan to monitor your Ceph, Block, or Object storage nodes, you must make configuration changes that are similar to the configuration changes that you make to the spine-leaf and DCN network configuration. To monitor Ceph nodes, use the CephStorageExtraConfig parameter to define which network interface to load into the AMQ Interconnect and collectd configuration files:

CephStorageExtraConfig:
  tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('storage')}"
  tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('storage')}"
  tripleo::profile::base::ceilometer::agent::notification::notifier_host_addr: "%{hiera('storage')}"
Similarly, you must specify the BlockStorageExtraConfig and ObjectStorageExtraConfig parameters if your environment uses Block and Object storage roles; a sketch follows.
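The guide does not show the Block and Object storage variants; by analogy with the CephStorageExtraConfig example above, they would look like the following sketch. This is an assumption based on both roles also defaulting to the Storage network:

BlockStorageExtraConfig:
  tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('storage')}"
  tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('storage')}"

ObjectStorageExtraConfig:
  tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('storage')}"
  tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('storage')}"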
The deployment of a spine-leaf topology involves creating roles and networks, then assigning those networks to the available roles. When you configure data collection and transport for STF for a Red Hat OpenStack Platform deployment, the default network for roles is InternalApi. For Ceph, Block, and Object storage roles, the default network is Storage. Because a spine-leaf configuration can result in different networks being assigned to different Leaf groupings, and those names are typically unique, additional configuration is required in the parameter_defaults section of the Red Hat OpenStack Platform environment files.
Procedure
1. Document which networks are available for each of the Leaf roles. For examples of network name definitions, see Creating a network data file in the Spine Leaf Networking guide. For more information about the creation of the Leaf groupings (roles) and assignment of the networks to those groupings, see Creating a roles data file in the Spine Leaf Networking guide.
2. Add the following configuration example to the ExtraConfig section for each of the leaf roles. In this example, internal_api_subnet is the value defined in the name_lower parameter of your network definition (with _subnet appended to the name for Leaf 0), and is the network to which the ComputeLeaf0 leaf role is connected. In this case, the network identification of 0 corresponds to the Compute role for leaf 0, and represents a value that is different from the default internal API network name.

For the ComputeLeaf0 leaf role, specify extra configuration to perform a hiera lookup to determine which network interface for a particular network to assign to the collectd AMQP host parameter. Perform the same configuration for the AMQ Interconnect listener address parameter:

ComputeLeaf0ExtraConfig:
  tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('internal_api_subnet')}"
  tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('internal_api_subnet')}"
Additional leaf roles typically replace _subnet with _leafN, where N represents a unique identifier for the leaf:

ComputeLeaf1ExtraConfig:
  tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('internal_api_leaf1')}"
  tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('internal_api_leaf1')}"
This example configuration is on a CephStorage leaf role:

CephStorageLeaf0ExtraConfig:
  tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('storage_subnet')}"
  tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('storage_subnet')}"
3.3. CONFIGURING RED HAT OPENSTACK PLATFORM OVERCLOUD FOR SERVICE TELEMETRY FRAMEWORK
To configure the Red Hat OpenStack Platform overcloud, you must configure the data collection applications and the data transport to STF, and deploy the overcloud.
To configure the Red Hat OpenStack Platform overcloud, complete the following tasks:

1. Section 3.3.1, "Retrieving the AMQ Interconnect route address"

2. Section 3.3.2, "Configuring the STF connection for the overcloud"

3. Section 3.3.3, "Validating client-side installation"
3.3.1. Retrieving the AMQ Interconnect route address
When you configure the Red Hat OpenStack Platform overcloud for STF, you must provide the AMQ Interconnect route address in the STF connection file.
Procedure
1. Log in to your Red Hat OpenShift Container Platform (OCP) environment.
2. In the service-telemetry project, retrieve the AMQ Interconnect route address:

$ oc get routes -ogo-template='{{ range .items }}{{printf "%s\n" .spec.host }}{{ end }}' | grep "\-5671"
stf-default-interconnect-5671-service-telemetry.apps.infra.watch
NOTE
If your STF installation differs from the documentation, ensure that you retrieve the correct AMQ Interconnect route address.
3.3.2. Configuring the STF connection for the overcloud
To configure the STF connection, you must create a file that contains the connection configuration of the AMQ Interconnect for the overcloud to the STF deployment. Enable the collection of events and storage of the events in STF, and deploy the overcloud.
Procedure
1. Log in to the Red Hat OpenStack Platform undercloud as the stack user.
2. Create a configuration file called stf-connectors.yaml in the /home/stack directory.
IMPORTANT
The Service Telemetry Operator simplifies the deployment of all data ingestion and data storage components for single cloud deployments. To share the data storage domain with multiple clouds, see Section 4.5, "Configuring multiple clouds".
3. In the stf-connectors.yaml file, configure the MetricsQdrConnectors address to connect the AMQ Interconnect on the overcloud to the STF deployment.

Add the CeilometerQdrPublishEvents: true parameter to enable collection and transport of Ceilometer events to STF.

Replace the host parameter with the value of HOST/PORT that you retrieved in Section 3.3.1, "Retrieving the AMQ Interconnect route address":

parameter_defaults:
  EventPipelinePublishers: []
  CeilometerQdrPublishEvents: true
  MetricsQdrConnectors:
    - host: stf-default-interconnect-5671-service-telemetry.apps.infra.watch
      port: 443
      role: edge
      sslProfile: sslProfile
      verifyHostname: false
4. Add the following files to your Red Hat OpenStack Platform director deployment to set up collectd and AMQ Interconnect:

the stf-connectors.yaml environment file

the enable-stf.yaml file that ensures that the environment is being used during the overcloud deployment
the ceilometer-write-qdr.yaml file that ensures that Ceilometer telemetry is sent to STF
5. Deploy the Red Hat OpenStack Platform overcloud:

openstack overcloud deploy <other arguments>
  --templates /usr/share/openstack-tripleo-heat-templates \
  --environment-file <other-environment-files> \
  --environment-file /usr/share/openstack-tripleo-heat-templates/environments/metrics/ceilometer-write-qdr.yaml \
  --environment-file /usr/share/openstack-tripleo-heat-templates/environments/enable-stf.yaml \
  --environment-file /home/stack/stf-connectors.yaml
3.3.3. Validating client-side installation
To validate data collection from the STF storage domain, query the data sources for delivered data. To validate individual nodes in the Red Hat OpenStack Platform deployment, connect to the console using SSH.
Procedure
1. Log in to an overcloud node, for example, controller-0.
2. Ensure that the metrics_qdr container is running on the node:

$ sudo podman container inspect --format '{{.State.Status}}' metrics_qdr

running
3. Return the internal network address on which AMQ Interconnect is running, for example, 172.17.1.44 listening on port 5666:

$ sudo podman exec -it metrics_qdr cat /etc/qpid-dispatch/qdrouterd.conf

listener {
    host: 172.17.1.44
    port: 5666
    authenticatePeer: no
    saslMechanisms: ANONYMOUS
}
4. Return a list of connections to the local AMQ Interconnect:

$ sudo podman exec -it metrics_qdr qdstat --bus=172.17.1.44:5666 --connections

Connections
  id   host                                                               container                                                                              role    dir  security                             authentication  tenant
  ====================================================================================================================================================================================================
  1    stf-default-interconnect-5671-service-telemetry.apps.infra.watch:443  stf-default-interconnect-7458fd4d69-bgzfb                                           edge    out  TLSv1.2(DHE-RSA-AES256-GCM-SHA384)   anonymous-user
  12   172.17.1.44:60290                                                  openstack.org/om/container/controller-0/ceilometer-agent-notification/255c02cee550f143ec9ea030db5cccba14  normal  in   no-security   no-auth
  16   172.17.1.44:36408                                                  metrics                                                                                normal  in   no-security                          anonymous-user
  899  172.17.1.44:39500                                                  10a2e99d-1b8a-4329-b48c-4335e5f75c84                                                   normal  in   no-security                          no-auth
There are four connections:

Outbound connection to STF

Inbound connection from collectd

Inbound connection from ceilometer

Inbound connection from our qdstat client

The outbound STF connection is provided to the MetricsQdrConnectors host parameter and is the route for the STF storage domain. The other hosts are internal network addresses of the client connections to this AMQ Interconnect.
5. To ensure that messages are being delivered, list the links, and view the _edge address in the deliv column for delivery of messages:

$ sudo podman exec -it metrics_qdr qdstat --bus=172.17.1.44:5666 --links
Router Links
  type      dir  conn id  id    peer  class   addr                  phs  cap  pri  undel  unsett  deliv    presett  psdrop  acc  rej  rel  mod  delay  rate
  ===========================================================================================================================================================
  endpoint  out  1        5           local   _edge                      250  0    0      0       2979926  2979924  0       0    0    2    0    0      0
  endpoint  in   1        6                                              250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  in   1        7                                              250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  out  1        8                                              250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  in   1        9                                              250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  out  1        10                                             250  0    0      0       911      911      0       0    0    0    0    911    0
  endpoint  in   1        11                                             250  0    0      0       0        911      0       0    0    0    0    0      0
  endpoint  out  12       32          local   temp.lSY6Mcicol4J2Kp       250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  in   16       41                                             250  0    0      0       2979924  2979924  0       0    0    0    0    0      0
  endpoint  in   912      1834        mobile  $management           0    250  0    0      0       1        0        0       1    0    0    0    0      0
  endpoint  out  912      1835        local   temp.9Ok2resI9tmt+CT       250  0    0      0       0        0        0       0    0    0    0    0      0
6. To list the addresses from Red Hat OpenStack Platform nodes to STF, connect to OCP to get the AMQ Interconnect pod name and list the connections. List the available AMQ Interconnect pods:

$ oc get pods -l application=stf-default-interconnect

NAME                                        READY   STATUS    RESTARTS   AGE
stf-default-interconnect-7458fd4d69-bgzfb   1/1     Running   0          6d21h
7. Connect to the pod and run the qdstat --connections command to list the known connections:

$ oc exec -it stf-default-interconnect-7458fd4d69-bgzfb -- qdstat --connections

2020-04-21 18:25:47.243852 UTC
stf-default-interconnect-7458fd4d69-bgzfb

Connections
  id  host               container                                                              role    dir  security                                authentication  tenant  last dlv      uptime
  ======================================================================================================================================================================================================
  1   10.129.2.21:43062  rcv[stf-default-collectd-telemetry-smartgateway-79c967c8f7-kq4qv]      normal  in   no-security                             anonymous-user          000:00:00:00  006:21:50:25
  2   10.130.0.52:55754  rcv[stf-default-ceilometer-notification-smartgateway-6675df547mbjk5]   normal  in   no-security                             anonymous-user          000:21:25:57  006:21:49:36
  3   10.130.0.51:43110  rcv[stf-default-collectd-notification-smartgateway-698c87fbb7-f28v6]   normal  in   no-security                             anonymous-user          000:21:36:53  006:21:49:09
  22  10.128.0.15:1948   Router.ceph-0.redhat.local                                             edge    in   TLSv1/SSLv3(DHE-RSA-AES256-GCM-SHA384)  anonymous-user          000:00:00:03  000:22:08:43
  23  10.128.0.15:1950   Router.compute-0.redhat.local                                          edge    in   TLSv1/SSLv3(DHE-RSA-AES256-GCM-SHA384)  anonymous-user          000:00:00:03  000:22:08:43
  24  10.128.0.15:2082   Router.controller-0.redhat.local                                       edge    in   TLSv1/SSLv3(DHE-RSA-AES256-GCM-SHA384)  anonymous-user          000:00:00:00  000:22:08:34
  27  127.0.0.1:42202    c2f541c1-4c97-4b37-a189-a396c08fb079                                   normal  in   no-security                             no-auth                 000:00:00:00  000:00:00:00
In this example, there are three edge connections from the Red Hat OpenStack Platform nodes, with connection id 22, 23, and 24.
8. To view the number of messages delivered by the network, use each address with the oc exec command:

$ oc exec -it stf-default-interconnect-7458fd4d69-bgzfb -- qdstat --address

2020-04-21 18:20:10.293258 UTC
stf-default-interconnect-7458fd4d69-bgzfb

Router Addresses
  class   addr                              phs  distrib    pri  local  remote  in       out      thru  fallback
  ====================================================================================================================
  mobile  anycast/ceilometer/event.sample   0    balanced   -    1      0       1553     1553     0     0
  mobile  collectd/notify                   0    multicast  -    1      0       10       10       0     0
  mobile  collectd/telemetry                0    multicast  -    1      0       7798049  7798049  0     0
CHAPTER 4. ADVANCED FEATURES

The following optional features can provide additional functionality to the Service Telemetry Framework (STF):
Customizing the deployment. For more information, see Section 4.1, "Customizing the deployment".

Alerts. For more information, see Section 4.2, "Alerts".

High availability. For more information, see Section 4.3, "High availability".

Dashboards. For more information, see Section 4.4, "Dashboards".

Multiple clouds. For more information, see Section 4.5, "Configuring multiple clouds".

Ephemeral storage. For more information, see Section 4.6, "Ephemeral storage".
4.1. CUSTOMIZING THE DEPLOYMENT
The Service Telemetry Operator watches for a ServiceTelemetry manifest to load into Red Hat OpenShift Container Platform (OCP). The Operator then creates other objects in memory, which results in the dependent Operators creating the workloads they are responsible for managing.
WARNING
When you override the manifest, you must provide the entire manifest contents, including object names or namespaces. There is no dynamic parameter substitution when you override a manifest.
To override a manifest successfully with Service Telemetry Framework (STF), deploy a default environment using the core options only. For more information about the core options, see Section 2.3.10, "Creating a ServiceTelemetry object in OCP". When you deploy STF, use the oc get command to retrieve the default deployed manifest. When you use a manifest that was originally generated by Service Telemetry Operator, the manifest is compatible with the other objects that are managed by the Operators.
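For example, to capture the deployed ServiceTelemetry manifest as a starting point for overrides, something like the following should work; the output file name is an arbitrary choice, and stf-default is the default object name used throughout this guide:

# Dump the deployed ServiceTelemetry object as YAML for later editing.
oc get servicetelemetry stf-default -oyaml > stf-default-manifest.yaml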
For example, when the metricsEnabled: true parameter is configured in the ServiceTelemetry manifest, the Service Telemetry Operator requests components for metrics retrieval and storage using the default manifests. In some cases, you might want to override the default manifest. For more information, see Section 4.1.1, "Manifest override parameters".
4.1.1. Manifest override parameters
This table describes the available parameters that you can use to override a manifest, along with the corresponding retrieval commands.

Table 4.1. Manifest override parameters
Override parameter | Description | Retrieval command
alertmanagerManifest | Override the Alertmanager object creation. The Prometheus Operator watches for Alertmanager objects. | oc get alertmanager stf-default -oyaml
alertmanagerConfigManifest | Override the Secret that contains the Alertmanager configuration. The Prometheus Operator uses a secret named alertmanager-<alertmanager-name>, for example, stf-default, to provide the alertmanager.yaml configuration to Alertmanager. | oc get secret alertmanager-stf-default -oyaml
elasticsearchManifest | Override the ElasticSearch object creation. The Elastic Cloud on Kubernetes Operator watches for ElasticSearch objects. | oc get elasticsearch elasticsearch -oyaml
interconnectManifest | Override the Interconnect object creation. The AMQ Interconnect Operator watches for Interconnect objects. | oc get interconnect stf-default-interconnect -oyaml
prometheusManifest | Override the Prometheus object creation. The Prometheus Operator watches for Prometheus objects. | oc get prometheus stf-default -oyaml
servicemonitorManifest | Override the ServiceMonitor object creation. The Prometheus Operator watches for ServiceMonitor objects. | oc get servicemonitor stf-default -oyaml
smartgatewayCollectdMetricsManifest | Override the SmartGateway object creation for collectd metrics. The Smart Gateway Operator watches for SmartGateway objects. | oc get smartgateway stf-default-collectd-telemetry -oyaml
smartgatewayCollectdEventsManifest | Override the SmartGateway object creation for collectd events. The Smart Gateway Operator watches for SmartGateway objects. | oc get smartgateway stf-default-collectd-notification -oyaml
smartgatewayCeilometerEventsManifest | Override the SmartGateway object creation for Ceilometer events. The Smart Gateway Operator watches for SmartGateway objects. | oc get smartgateway stf-default-ceilometer-notification -oyaml
4.1.2. Overriding a managed manifest
Edit the ServiceTelemetry object and provide a parameter and manifest. For a list of available manifest override parameters, see Section 4.1.1, "Manifest override parameters". The default ServiceTelemetry object is stf-default. Use oc get servicetelemetry to list the available STF deployments.
TIP
The oc edit command loads the default system editor. To override the default editor, pass or set the environment variable EDITOR to the preferred editor. For example, EDITOR=nano oc edit servicetelemetry stf-default.
Procedure
1. Log in to Red Hat OpenShift Container Platform.
2. Change to the service-telemetry namespace:

oc project service-telemetry
3. Load the ServiceTelemetry object into an editor:

oc edit servicetelemetry stf-default
4. To modify the ServiceTelemetry object, provide a manifest override parameter and the contents of the manifest to write to OCP instead of the defaults provided by STF:

NOTE

The trailing pipe (|) after entering the manifest override parameter indicates that the value provided is multi-line.

apiVersion: infra.watch/v1alpha1
kind: ServiceTelemetry
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"infra.watch/v1alpha1","kind":"ServiceTelemetry","metadata":{"annotations":{},"name":"stf-default","namespace":"service-telemetry"},"spec":{"metricsEnabled":true}}
  creationTimestamp: "2020-04-14T20:29:42Z"
  generation: 1
  name: stf-default
  namespace: service-telemetry
  resourceVersion: "1949423"
  selfLink: /apis/infra.watch/v1alpha1/namespaces/service-telemetry/servicetelemetrys/stf-default
  uid: d058bc41-1bb0-49f5-9a8b-642f4b8adb95
spec:
  metricsEnabled: true
  smartgatewayCollectdMetricsManifest: | 1
    apiVersion: smartgateway.infra.watch/v2alpha1
    kind: SmartGateway
    metadata:
      name: stf-default-collectd-telemetry
      namespace: service-telemetry
    spec:
      amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/telemetry
      debug: true
      prefetch: 15000
      serviceType: metrics
      size: 1
      useTimestamp: true 2
status:
  conditions:
  - ansibleResult:
      changed: 0
      completion: 2020-04-14T20:32:19.079508
      failures: 0
      ok: 52
      skipped: 1
    lastTransitionTime: "2020-04-14T20:29:59Z"
    message: Awaiting next reconciliation
    reason: Successful
    status: "True"
    type: Running

1 The manifest override parameter is defined in the spec of the ServiceTelemetry object.

2 End of the manifest override content.

5. Save and close.
4.2. ALERTS
You create alert rules in Prometheus and alert routes in Alertmanager. Alert rules in Prometheus servers send alerts to an Alertmanager, which manages the alerts. Alertmanager can silence, inhibit, or aggregate alerts, and send notifications using email, on-call notification systems, or chat platforms.
To create an alert, complete the following tasks:
1. Create an alert rule in Prometheus. For more information, see Section 4.2.1, "Creating an alert rule in Prometheus".
2. Create an alert route in Alertmanager. For more information, see Section 4.2.3, "Creating an alert route in Alertmanager".
Additional resources

For more information about alerts or notifications with Prometheus and Alertmanager, see https://prometheus.io/docs/alerting/overview/
To view an example set of alerts that you can use with Service Telemetry Framework (STF), see https://github.com/infrawatch/service-telemetry-operator/tree/master/deploy/alerts
4.2.1. Creating an alert rule in Prometheus
Prometheus evaluates alert rules to trigger notifications. If the rule condition returns an empty result set, the condition is false. Otherwise, the rule is true, and it triggers an alert.
Procedure
1. Log in to Red Hat OpenShift Container Platform.
2. Change to the service-telemetry namespace:

oc project service-telemetry
3. Create a PrometheusRule object that contains the alert rule. The Prometheus Operator loads the rule into Prometheus:

oc apply -f - <<EOF
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  creationTimestamp: null
  labels:
    prometheus: stf-default
    role: alert-rules
  name: prometheus-alarm-rules
  namespace: service-telemetry
spec:
  groups:
    - name: ./openstack.rules
      rules:
        - alert: Metric Listener down
          expr: collectd_qpid_router_status < 1 # To change the rule, edit the value of the expr parameter.
EOF
4. To verify that the rules have been loaded into Prometheus by the Operator, create a pod with access to curl:

oc run curl --generator=run-pod/v1 --image=radial/busyboxplus:curl -i --tty
5. Run curl to access the prometheus-operated service to return the rules loaded into memory:

[ root@curl:/ ]$ curl prometheus-operated:9090/api/v1/rules
{"status":"success","data":{"groups":[{"name":"./openstack.rules","file":"/etc/prometheus/rules/prometheus-stf-default-rulefiles-0/service-telemetry-prometheus-alarm-rules.yaml","rules":[{"name":"Metric Listener down","query":"collectd_qpid_router_status \u003c 1","duration":0,"labels":{},"annotations":{},"alerts":[],"health":"ok","type":"alerting"}],"interval":30}]}}
6. To verify that the output shows the rules loaded into the PrometheusRule object, for example, the output contains the defined ./openstack.rules, exit from the pod:

[ root@curl:/ ]$ exit
7. Clean up the environment by deleting the curl pod:

$ oc delete pod curl

pod "curl" deleted
Additional resources

For more information on alerting, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md
4.2.2. Configuring custom alerts
You can add custom alerts to the PrometheusRule object that you created in Section 4.2.1, "Creating an alert rule in Prometheus".
Procedure
1. Use the oc edit command:

oc edit prometheusrules prometheus-alarm-rules
2. Edit the PrometheusRules manifest.
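The guide does not show a worked custom rule; as a hedged illustration, an extra rule appended to the existing openstack.rules group might look like the following sketch. The second alert name and its threshold expression are hypothetical:

spec:
  groups:
    - name: ./openstack.rules
      rules:
        - alert: Metric Listener down
          expr: collectd_qpid_router_status < 1
        - alert: Collectd metrics stalled          # hypothetical custom alert
          expr: rate(collectd_uptime[5m]) == 0     # hypothetical threshold expression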
3. Save and close.
Additional resources

For more information about configuring alerting rules, see https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/
For more information about PrometheusRules objects, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md
4.2.3. Creating an alert route in Alertmanager
Use Alertmanager to deliver alerts to an external system, such as email, IRC, or other notification channel. The Prometheus Operator manages the Alertmanager configuration as a Red Hat OpenShift Container Platform (OCP) secret. STF by default deploys a basic configuration that results in no receivers:

alertmanager.yaml: |-
  global:
    resolve_timeout: 5m
  route:
    group_by: ['job']
    group_wait: 30s
    group_interval: 5m
    repeat_interval: 12h
    receiver: 'null'
  receivers:
  - name: 'null'
To deploy a custom Alertmanager route with STF, you must pass an alertmanagerConfigManifest parameter to the Service Telemetry Operator, which results in an updated secret managed by the Prometheus Operator. For more information, see Section 4.1.2, "Overriding a managed manifest".
Procedure
1. Log in to Red Hat OpenShift Container Platform.
2. Change to the service-telemetry namespace:

oc project service-telemetry
3. Edit the ServiceTelemetry object for your STF deployment:

oc edit servicetelemetry stf-default
4. Add a new parameter, alertmanagerConfigManifest, and the Secret object contents to define the alertmanager.yaml configuration for Alertmanager:

apiVersion: infra.watch/v1alpha1
kind: ServiceTelemetry
metadata:
  name: stf-default
  namespace: service-telemetry
spec:
  metricsEnabled: true
  alertmanagerConfigManifest: |
    apiVersion: v1
    kind: Secret
    metadata:
      name: 'alertmanager-stf-default'
      namespace: 'service-telemetry'
    type: Opaque
    stringData:
      alertmanager.yaml: |-
        global:
          resolve_timeout: 10m
        route:
          group_by: ['job']
          group_wait: 30s
          group_interval: 5m
          repeat_interval: 12h
          receiver: 'null'
        receivers:
        - name: 'null'
NOTE
This loads the default template that is already managed by the Service Telemetry Operator. To validate that the changes are populating correctly, change a value, return the alertmanager-stf-default secret, and verify that the new value is loaded into memory, for example, changing the value of the global resolve_timeout from 5m to 10m.
5. Verify that the configuration was applied to the secret:

$ oc get secret alertmanager-stf-default -o go-template='{{index .data "alertmanager.yaml" | base64decode }}'

global:
  resolve_timeout: 10m
route:
  group_by: ['job']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h
  receiver: 'null'
receivers:
- name: 'null'
6. To verify that the configuration has been loaded into Alertmanager, create a pod with access to curl:

oc run curl --generator=run-pod/v1 --image=radial/busyboxplus:curl -i --tty
7. Run curl against the alertmanager-operated service to retrieve the status and configYAML contents, and review whether the supplied configuration matches the configuration loaded into Alertmanager:

[ root@curl:/ ]$ curl alertmanager-operated:9093/api/v1/status

{"status":"success","data":{"configYAML":"global:\n  resolve_timeout: 10m\n  http_config: {}\n  smtp_hello: localhost\n  smtp_require_tls: true\n  pagerduty_url: https://events.pagerduty.com/v2/enqueue\n  hipchat_api_url: https://api.hipchat.com/\n  opsgenie_api_url: https://api.opsgenie.com/\n  wechat_api_url: https://qyapi.weixin.qq.com/cgi-bin/\n  victorops_api_url: https://alert.victorops.com/integrations/generic/20131114/alert/\nroute:\n  receiver: 'null'\n  group_by:\n  - job\n  group_wait: 30s\n  group_interval: 5m\n  repeat_interval: 12h\nreceivers:\n- name: 'null'\ntemplates: []\n",...}}
8. Verify that the configYAML field contains the expected changes. Exit from the pod:

[ root@curl:/ ]$ exit
9. To clean up the environment, delete the curl pod:

$ oc delete pod curl

pod "curl" deleted
Additional resources

For more information about the Red Hat OpenShift Container Platform secret and the Prometheus operator, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md
4.3. HIGH AVAILABILITY
High availability is the ability of Service Telemetry Framework (STF) to rapidly recover from failures in its component services. Although Red Hat OpenShift Container Platform (OCP) restarts a failed pod if nodes are available to schedule the workload, this recovery process might take more than one minute, during which time events and metrics are lost. A high availability configuration includes multiple copies of STF components, reducing recovery time to approximately 2 seconds. To protect against failure of an OCP node, deploy STF to an OCP cluster with three or more nodes.
NOTE
STF is not yet a fully fault tolerant system. Delivery of metrics and events during the recovery period is not guaranteed.
Enabling high availability has the following effects:

Two AMQ Interconnect pods run instead of the default 1.

Three ElasticSearch pods run instead of the default 1.

Recovery time from a lost pod in either of these services reduces to approximately 2 seconds.
4.3.1. Configuring high availability
To configure STF for high availability, add highAvailabilityEnabled: true to the ServiceTelemetry object in OCP. You can set this parameter at installation time, or, if you already deployed STF, complete the following steps:
Procedure
1. Log in to Red Hat OpenShift Container Platform.
2. Change to the service-telemetry namespace:

oc project service-telemetry
3. Use the oc command to edit the ServiceTelemetry object:

$ oc edit ServiceTelemetry
4. Add highAvailabilityEnabled: true to the spec section:

spec:
  eventsEnabled: true
  metricsEnabled: true
  highAvailabilityEnabled: true
5. Save your changes and close the object.
4.4. DASHBOARDS
Use the third-party application Grafana to visualize system-level metrics gathered by collectd for each individual host node. For more information about configuring collectd, see Section 3.3, "Configuring Red Hat OpenStack Platform overcloud for Service Telemetry Framework".
4.4.1. Setting up Grafana to host the dashboard
Grafana is not included in the default Service Telemetry Framework (STF) deployment, so you must deploy the Grafana Operator from OperatorHub.io.
Procedure
1. Log in to Red Hat OpenShift Container Platform.
2. Change to the service-telemetry namespace:

oc project service-telemetry
3. Clone the dashboard repository:

git clone https://github.com/infrawatch/dashboards
cd dashboards
4. Deploy the Grafana operator:

oc create -f deploy/subscription.yaml
5. To verify that the operator launched successfully, run the oc get csv command. If the value of the PHASE column is Succeeded, the operator launched successfully:

$ oc get csv

NAME                      DISPLAY            VERSION   REPLACES   PHASE
grafana-operator.v3.2.0   Grafana Operator   3.2.0                Succeeded
6. Launch a Grafana instance:

$ oc create -f deploy/grafana.yaml
7. Verify that the Grafana instance deployed:

$ oc get pod -l app=grafana

NAME                                  READY   STATUS    RESTARTS   AGE
grafana-deployment-7fc7848b56-sbkhv   1/1     Running   0          1m
8. Create the datasource and dashboard resources:

oc create -f deploy/datasource.yaml -f deploy/rhos-dashboard.yaml
9. Verify that the resources installed correctly:

$ oc get grafanadashboards

NAME             AGE
rhos-dashboard   7d21h

$ oc get grafanadatasources
NAME                                  AGE
service-telemetry-grafanadatasource   1m
10. Navigate to https://<grafana-route-address> in a web browser. Use the oc get routes command to retrieve the Grafana route address:

oc get routes
11. To view the dashboard, click Dashboards and Manage.
Additional resources
For more information about enabling the OperatorHub.io catalog source, see Section 2.3.5, "Enabling the OperatorHub.io Community Catalog Source".
4.4.1.1. Viewing and editing queries
Procedure
1. Log in to Red Hat OpenShift Container Platform. To view and edit queries, log in as the admin user.
2. Change to the service-telemetry namespace:

oc project service-telemetry
3. To retrieve the default username and password, describe the Grafana object using oc describe:

oc describe grafana service-telemetry-grafana
4.4.2. The Grafana infrastructure dashboard
The infrastructure dashboard shows metrics for a single node at a time. Select a node from the upper left corner of the dashboard.
4.4.2.1. Top panels
Title | Unit | Description
Current Global Alerts | - | Current alerts fired by Prometheus
Recent Global Alerts | - | Recently fired alerts in 5m time steps
Status Panel | - | Node status: up, down, unavailable
Uptime | s/m/h/d/M/Y | Total operational time of node
CPU Cores | cores | Total number of cores
Memory | bytes | Total memory
Disk Size | bytes | Total storage size
Processes | processes | Total number of processes listed by type
Load Average | processes | Load average represents the average number of running and uninterruptible processes residing in the kernel execution queue
4.4.2.2. Networking panels
Panels that display the network interfaces of the node.
Panel | Unit | Description
Physical Interfaces Ingress Errors | errors | Total errors with incoming data
Physical Interfaces Egress Errors | errors | Total errors with outgoing data
Physical Interfaces Ingress Error Rates | errors/s | Rate of incoming data errors
Physical Interfaces Egress Error Rates | errors/s | Rate of outgoing data errors
Physical Interfaces Packets Ingress | pps | Incoming packets per second
Physical Interfaces Packets Egress | pps | Outgoing packets per second
Physical Interfaces Data Ingress | bytes/s | Incoming data rates
Physical Interfaces Data Egress | bytes/s | Outgoing data rates
Physical Interfaces Drop Rate Ingress | pps | Incoming packets drop rate
Physical Interfaces Drop Rate Egress | pps | Outgoing packets drop rate
4.4.2.3. CPU panels
Panels that display CPU usage of the node.
Panel | Unit | Description
Current CPU Usage | percent | Instantaneous usage at the time of the last query
Aggregate CPU Usage | percent | Average non-idle CPU activity of all cores on a node
Aggr CPU Usage by Type | percent | Shows time spent for each type of thread averaged across all cores
4.4.2.4. Memory panels
Panels that display memory usage on the node.
Panel | Unit | Description
Memory Used | percent | Amount of memory being used at time of last query
Huge Pages Used | hugepages | Number of hugepages being used
Memory
4.4.2.5. Disk/file system
Panels that display space used on disk.
Panel | Unit | Description | Notes
Disk Space Usage | percent | Total disk use at time of last query |
Inode Usage | percent | Total inode use at time of last query |
Aggregate Disk Space Usage | bytes | Total disk space used and reserved | Because this query relies on the df plugin, temporary file systems that do not necessarily use disk space are included in the results. The query tries to filter out most of these, but it might not be exhaustive.
Disk Traffic | bytes/s | Shows rates for both reading and writing |
Disk Load | percent | Approximate percentage of total disk bandwidth being used. The weighted IO time series includes the backlog that might be accumulating. | For more information, see the collectd disk plugin docs.
Operations/s | ops/s | Operations done per second |
Average IO Operation Time | seconds | Average time each IO operation took to complete. | This average is not accurate; see the collectd disk plugin docs.
4.5. CONFIGURING MULTIPLE CLOUDS
You can configure multiple Red Hat OpenStack Platform clouds to target a single instance of Service Telemetry Framework (STF):

1. Plan the AMQP address prefixes that you want to use for each cloud. For more information, see Section 4.5.1, "Planning AMQP address prefixes".

2. Deploy metrics and events consumer Smart Gateways for each cloud to listen on the corresponding address prefixes. For more information, see Section 4.5.2, "Deploying Smart Gateways".

3. Configure each cloud to send its metrics and events to STF on the correct address. For more information, see Section 4.5.3, "Creating the OpenStack environment file".
Figure 4.1. Two Red Hat OpenStack Platform clouds connect to STF
4.5.1. Planning AMQP address prefixes
By default, Red Hat OpenStack Platform nodes get data through two data collectors: collectd and Ceilometer. These components send telemetry data or notifications to the respective AMQP addresses, for example, collectd/telemetry, where STF Smart Gateways listen on those addresses for monitoring data.
To support multiple clouds, and to identify which cloud generated the monitoring data, configure each cloud to send data to a unique address. Prefix a cloud identifier to the second part of the address. The following list shows some example addresses and identifiers:
collectd/cloud1-telemetry

collectd/cloud1-notify

anycast/ceilometer/cloud1-event.sample

collectd/cloud2-telemetry

collectd/cloud2-notify

anycast/ceilometer/cloud2-event.sample

collectd/us-east-1-telemetry

collectd/us-west-3-telemetry
4.5.2. Deploying Smart Gateways
You must deploy a Smart Gateway for each of the data collection types for each cloud: one for collectd metrics, one for collectd events, and one for Ceilometer events. Configure each of the Smart Gateways to listen on the AMQP address that you define for the corresponding cloud.
When you deploy STF for the first time, Smart Gateway manifests are created that define the initial Smart Gateways for a single cloud. When deploying Smart Gateways for multiple cloud support, you deploy multiple Smart Gateways for each of the data collection types that handle the metrics and the events data for each cloud. The initial Smart Gateways act as a template to create additional Smart Gateways, along with any authentication information required to connect to the data stores.
Procedure
1. Log in to Red Hat OpenShift Container Platform.
2. Change to the service-telemetry namespace:
oc project service-telemetry
3. Use the initially deployed Smart Gateways as a template for additional Smart Gateways. List the currently deployed Smart Gateways with the oc get smartgateways command. For example, if you deployed STF with metricsEnabled: true and eventsEnabled: true, the following Smart Gateways are displayed in the output:
$ oc get smartgateways

NAME                                  AGE
stf-default-ceilometer-notification   14d
stf-default-collectd-notification     14d
stf-default-collectd-telemetry        14d
4. Retrieve the manifests for each Smart Gateway and store the contents in a temporary file, which you can modify later and use to create the new set of Smart Gateways:

truncate --size 0 /tmp/cloud1-smartgateways.yaml && \
for sg in $(oc get smartgateways -oname)
do
  echo "---" >> /tmp/cloud1-smartgateways.yaml
  oc get $sg -oyaml --export >> /tmp/cloud1-smartgateways.yaml
done
5. Modify the Smart Gateway manifest in the /tmp/cloud1-smartgateways.yaml file. Adjust the metadata.name and spec.amqpUrl fields to include the cloud identifier from your schema. For more information, see Section 4.5.1, "Planning AMQP address prefixes". To view example Smart Gateway manifests, see Section 4.5.2.1, "Example manifests".
6. Deploy your new Smart Gateways:

oc apply -f /tmp/cloud1-smartgateways.yaml
7. Verify that each Smart Gateway is running. This can take several minutes, depending on the number of Smart Gateways:

oc get po -l app=smart-gateway
4.5.2.1. Example manifests
IMPORTANT
The content in the following examples might be different to the file content in your deployment. Copy the manifests in your deployment.
Ensure that the name and amqpUrl parameters of each Smart Gateway match the names that you want to use for your clouds. For more information, see Section 4.5.1, "Planning AMQP address prefixes".
NOTE
Your output might have some additional metadata parameters that you can remove from the manifests that you load into OCP.
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-ceilometer-notification-cloud1 1
spec:
  amqpDataSource: ceilometer
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/anycast/ceilometer/cloud1-event.sample 2
  debug: false
  elasticPass: fkzfhghw
  elasticUrl: https://elasticsearch-es-http.service-telemetry.svc.cluster.local:9200
  elasticUser: elastic
  resetIndex: false
  serviceType: events
  size: 1
  tlsCaCert: /config/certs/ca.crt
  tlsClientCert: /config/certs/tls.crt
  tlsClientKey: /config/certs/tls.key
  tlsServerName: elasticsearch-es-http.service-telemetry.svc.cluster.local
  useBasicAuth: true
  useTls: true
---
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-collectd-notification-cloud1 3
spec:
  amqpDataSource: collectd
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/cloud1-notify 4
  debug: false
  elasticPass: fkzfhghw
  elasticUrl: https://elasticsearch-es-http.service-telemetry.svc.cluster.local:9200
  elasticUser: elastic
  resetIndex: false
  serviceType: events
  size: 1
  tlsCaCert: /config/certs/ca.crt
  tlsClientCert: /config/certs/tls.crt
  tlsClientKey: /config/certs/tls.key
  tlsServerName: elasticsearch-es-http.service-telemetry.svc.cluster.local
  useBasicAuth: true
  useTls: true
---
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-collectd-telemetry-cloud1 5
spec:
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/cloud1-telemetry 6
  debug: false
  prefetch: 15000
  serviceType: metrics
  size: 1
  useTimestamp: true

1 Name for Ceilometer notifications for cloud1

2 AMQP Address for Ceilometer notifications for cloud1

3 Name for collectd notifications for cloud1

4 AMQP Address for collectd notifications for cloud1

5 Name for collectd telemetry for cloud1

6 AMQP Address for collectd telemetry for cloud1
4.5.3. Creating the OpenStack environment file
To label traffic according to the cloud of origin, you must create a configuration with cloud-specific instance names. Create an stf-connectors.yaml file and adjust the values of CeilometerQdrEventsConfig and CollectdAmqpInstances to match the AMQP address prefix scheme. For more information, see Section 4.5.1, "Planning AMQP address prefixes".
WARNING
Remove the enable-stf.yaml and ceilometer-write-qdr.yaml environment file references from your overcloud deployment. This configuration is redundant and results in duplicate information being sent from each cloud node.
Procedure
1. Create the stf-connectors.yaml file and modify it to match the AMQP address that you want for this cloud deployment:

resource_registry:
  OS::TripleO::Services::Collectd: /usr/share/openstack-tripleo-heat-templates/deployment/metrics/collectd-container-puppet.yaml
  OS::TripleO::Services::MetricsQdr: /usr/share/openstack-tripleo-heat-templates/deployment/metrics/qdr-container-puppet.yaml
  OS::TripleO::Services::CeilometerAgentCentral: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-central-container-puppet.yaml
  OS::TripleO::Services::CeilometerAgentNotification: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-notification-container-puppet.yaml
  OS::TripleO::Services::CeilometerAgentIpmi: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-ipmi-container-puppet.yaml
  OS::TripleO::Services::ComputeCeilometerAgent: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-compute-container-puppet.yaml
  OS::TripleO::Services::Redis: /usr/share/openstack-tripleo-heat-templates/deployment/database/redis-pacemaker-puppet.yaml

parameter_defaults:
  EnableSTF: true

  EventPipelinePublishers: []
  CeilometerEnablePanko: false
  CeilometerQdrPublishEvents: true
  CeilometerQdrEventsConfig:
    driver: amqp
    topic: cloud1-event 1

  CollectdConnectionType: amqp1
  CollectdAmqpInterval: 5
  CollectdDefaultPollingInterval: 5

  CollectdAmqpInstances:
    cloud1-notify: 2
      notify: true
      format: JSON
      presettle: false
    cloud1-telemetry: 3
      format: JSON
      presettle: true

  MetricsQdrAddresses:
    - prefix: collectd
      distribution: multicast
    - prefix: anycast/ceilometer
      distribution: multicast

  MetricsQdrSSLProfiles:
    - name: sslProfile

  MetricsQdrConnectors:
    - host: stf-default-interconnect-5671-service-telemetry.apps.infra.watch 4
      port: 443
      role: edge
      verifyHostname: false
      sslProfile: sslProfile

1 Define the topic for Ceilometer events. This value is the address format of anycast/ceilometer/cloud1-event.sample.

2 Define the topic for collectd events. This value is the format of collectd/cloud1-notify.

3 Define the topic for collectd metrics. This value is the format of collectd/cloud1-telemetry.

4 Adjust the MetricsQdrConnectors host to the address of the STF route.
Ensure that the naming convention in the stf-connectors.yaml file aligns with the spec.amqpUrl field in the Smart Gateway configuration. For example, configure the CeilometerQdrEventsConfig.topic field to a value of cloud1-event.
2. Save the file in a directory for custom environment files, for example, /home/stack/custom_templates/.
3. Source the authentication file:

[stack@undercloud-0 ~]$ source stackrc

(undercloud) [stack@undercloud-0 ~]$
4. Include the stf-connectors.yaml file in the overcloud deployment command, along with any other environment files relevant to your environment:

(undercloud) [stack@undercloud-0 ~]$ openstack overcloud deploy --templates /usr/share/openstack-tripleo-heat-templates \
  -e /home/stack/custom_templates/stf-connectors.yaml \
  ...
Additional resources

For information about validating the deployment, see Section 3.3.3, "Validating client-side installation".
4.5.4. Querying metrics data from multiple clouds
Data stored in Prometheus has a service label attached according to the Smart Gateway it was scraped from. You can use this label to query data from a specific cloud.
To query data from a specific cloud, use a Prometheus promql query that matches the associated service label, for example: collectd_uptime{service="stf-default-collectd-telemetry-cloud1-smartgateway"}.
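Following the pattern used earlier in this guide for validating Prometheus rules, one way to run such a query from inside the cluster is with a curl pod against the prometheus-operated service. The following is a sketch, not part of the original procedure; the URL-encoded query string mirrors the promql example above:

oc run curl --generator=run-pod/v1 --image=radial/busyboxplus:curl -i --tty

# Query collectd_uptime for the cloud1 Smart Gateway;
# --data-urlencode handles the quoting of the label selector.
[ root@curl:/ ]$ curl -G prometheus-operated:9090/api/v1/query \
  --data-urlencode 'query=collectd_uptime{service="stf-default-collectd-telemetry-cloud1-smartgateway"}'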
4.6. EPHEMERAL STORAGE
Use ephemeral storage to run Service Telemetry Framework (STF) without persistently storing data in your Red Hat OpenShift Container Platform (OCP) cluster. Ephemeral storage is not recommended in a production environment due to the volatility of the data in the platform when operating correctly and as designed. For example, restarting a pod or rescheduling the workload to another node results in the loss of any local data written since the pod started.
If you enable ephemeral storage in STF, the Service Telemetry Operator does not add the relevant storage sections to the data storage components' manifests.
4.6.1. Configuring ephemeral storage
To configure STF for ephemeral storage, add storageEphemeralEnabled: true to the ServiceTelemetry object in OCP. You can add storageEphemeralEnabled: true at installation time, or, if you already deployed STF, complete the following steps:
Procedure
1. Log in to Red Hat OpenShift Container Platform.
2. Change to the service-telemetry namespace:

oc project service-telemetry
3. Edit the ServiceTelemetry object:

$ oc edit ServiceTelemetry stf-default
4. Add the storageEphemeralEnabled: true parameter to the spec section:

spec:
  eventsEnabled: true
  metricsEnabled: true
  storageEphemeralEnabled: true
5. Save your changes and close the object.
APPENDIX A. COLLECTD PLUG-INS

This section contains a complete list of collectd plug-ins and configurations for Red Hat OpenStack Platform 16.0.
collectd-aggregation
collectd::plugin::aggregation::aggregators
collectd::plugin::aggregation::interval
collectd-amqp1
collectd-apache
collectd::plugin::apache::instances (ex: localhost ⇒ url ⇒ "http://localhost/mod_status?auto")
collectd::plugin::apache::interval
collectd-apcups
collectd-battery
collectd::plugin::battery::values_percentage
collectd::plugin::battery::report_degraded
collectd::plugin::battery::query_state_fs
collectd::plugin::battery::interval
collectd-ceph
collectd::plugin::ceph::daemons
collectd::plugin::ceph::longrunavglatency
collectd::plugin::ceph::convertspecialmetrictypes
collectd-cgroups
collectd::plugin::cgroups::ignore_selected
collectd::plugin::cgroups::interval
collectd-conntrack
None
collectd-contextswitch
collectd::plugin::contextswitch::interval
collectd-cpu
collectd::plugin::cpu::reportbystate
collectd::plugin::cpu::reportbycpu
collectd::plugin::cpu::valuespercentage
collectd::plugin::cpu::reportnumcpu
collectd::plugin::cpu::reportgueststate
collectd::plugin::cpu::subtractgueststate
collectd::plugin::cpu::interval
collectd-cpufreq
None
collectd-cpusleep
collectd-csv
collectd::plugin::csv::datadir
collectd::plugin::csv::storerates
collectd::plugin::csv::interval
collectd-df
collectd::plugin::df::devices
collectd::plugin::df::fstypes
collectd::plugin::df::ignoreselected
collectd::plugin::df::mountpoints
collectd::plugin::df::reportbydevice
collectd::plugin::df::reportinodes
collectd::plugin::df::reportreserved
collectd::plugin::df::valuesabsolute
collectd::plugin::df::valuespercentage
collectd::plugin::df::interval
collectd-disk
collectd::plugin::disk::disks
collectd::plugin::disk::ignoreselected
collectd::plugin::disk::udevnameattr
collectd::plugin::disk::interval
collectd-entropy
collectd::plugin::entropy::interval
collectd-ethstat
collectd::plugin::ethstat::interfaces
collectd::plugin::ethstat::maps
collectd::plugin::ethstat::mappedonly
collectd::plugin::ethstat::interval
collectd-exec
collectd::plugin::exec::commands
collectd::plugin::exec::commands_defaults
collectd::plugin::exec::globals
collectd::plugin::exec::interval
collectd-fhcount
collectd::plugin::fhcount::valuesabsolute
collectd::plugin::fhcount::valuespercentage
collectd::plugin::fhcount::interval
collectd-filecount
collectd::plugin::filecount::directories
collectd::plugin::filecount::interval
collectd-fscache
None
collectd-hddtemp
collectd::plugin::hddtemp::host
collectd::plugin::hddtemp::port
collectd::plugin::hddtemp::interval
collectd-hugepages
collectd::plugin::hugepages::report_per_node_hp
collectd::plugin::hugepages::report_root_hp
collectd::plugin::hugepages::values_pages
collectd::plugin::hugepages::values_bytes
collectd::plugin::hugepages::values_percentage
collectd::plugin::hugepages::interval
collectd-intel_rdt
collectd-interface
collectd::plugin::interface::interfaces
collectd::plugin::interface::ignoreselected
collectd::plugin::interface::reportinactive
collectd::plugin::interface::interval
collectd-ipc
None
collectd-ipmi
collectd::plugin::ipmi::ignore_selected
collectd::plugin::ipmi::notify_sensor_add
collectd::plugin::ipmi::notify_sensor_remove
collectd::plugin::ipmi::notify_sensor_not_present
collectd::plugin::ipmi::sensors
collectd::plugin::ipmi::interval
collectd-irq
collectd::plugin::irq::irqs
collectd::plugin::irq::ignoreselected
collectd::plugin::irq::interval
collectd-load
collectd::plugin::load::report_relative
collectd::plugin::load::interval
collectd-logfile
collectd::plugin::logfile::log_level
collectd::plugin::logfile::log_file
collectd::plugin::logfile::log_timestamp
collectd::plugin::logfile::print_severity
collectd::plugin::logfile::interval
collectd-madwifi
collectd-mbmon
collectd-md
collectd-memcached
collectd::plugin::memcached::instances
collectd::plugin::memcached::interval
collectd-memory
collectd::plugin::memory::valuesabsolute
collectd::plugin::memory::valuespercentage
collectd::plugin::memory::interval
collectd-multimeter
collectd-mysql
collectd::plugin::mysql::interval
collectd-netlink
collectd::plugin::netlink::interfaces
collectd::plugin::netlink::verboseinterfaces
collectd::plugin::netlink::qdiscs
collectd::plugin::netlink::classes
collectd::plugin::netlink::filters
collectd::plugin::netlink::ignoreselected
collectd::plugin::netlink::interval
collectd-network
collectd::plugin::network::timetolive
collectd::plugin::network::maxpacketsize
collectd::plugin::network::forward
collectd::plugin::network::reportstats
collectd::plugin::network::listeners
collectd::plugin::network::servers
collectd::plugin::network::interval
collectd-nfs
collectd::plugin::nfs::interval
collectd-ntpd
collectd::plugin::ntpd::host
collectd::plugin::ntpd::port
collectd::plugin::ntpd::reverselookups
collectd::plugin::ntpd::includeunitid
collectd::plugin::ntpd::interval
collectd-numa
None
collectd-olsrd
collectd-openvpn
collectd::plugin::openvpn::statusfile
collectd::plugin::openvpn::improvednamingschema
collectd::plugin::openvpn::collectcompression
collectd::plugin::openvpn::collectindividualusers
collectd::plugin::openvpn::collectusercount
collectd::plugin::openvpn::interval
collectd-ovs_events
collectd::plugin::ovs_events::address
collectd::plugin::ovs_events::dispatch
collectd::plugin::ovs_events::interfaces
collectd::plugin::ovs_events::send_notification
collectd::plugin::ovs_events::$port
collectd::plugin::ovs_events::socket
collectd-ovs_stats
collectd::plugin::ovs_stats::address
collectd::plugin::ovs_stats::bridges
collectd::plugin::ovs_stats::port
collectd::plugin::ovs_stats::socket
collectd-ping
collectd::plugin::ping::hosts
collectd::plugin::ping::timeout
collectd::plugin::ping::ttl
collectd::plugin::ping::source_address
collectd::plugin::ping::device
collectd::plugin::ping::max_missed
collectd::plugin::ping::size
collectd::plugin::ping::interval
collectd-powerdns
collectd::plugin::powerdns::interval
collectd::plugin::powerdns::servers
collectd::plugin::powerdns::recursors
collectd::plugin::powerdns::local_socket
collectd::plugin::powerdns::interval
collectd-processes
collectd::plugin::processes::processes
collectd::plugin::processes::process_matches
collectd::plugin::processes::collect_context_switch
collectd::plugin::processes::collect_file_descriptor
collectd::plugin::processes::collect_memory_maps
collectd::plugin::powerdns::interval
collectd-protocols
collectd::plugin::protocols::ignoreselected
collectd::plugin::protocols::values
collectd-python
collectd-serial
collectd-smart
collectd::plugin::smart::disks
collectd::plugin::smart::ignoreselected
collectd::plugin::smart::interval
collectd-snmp_agent
collectd-statsd
collectd::plugin::statsd::host
collectd::plugin::statsd::port
collectd::plugin::statsd::deletecounters
collectd::plugin::statsd::deletetimers
collectd::plugin::statsd::deletegauges
collectd::plugin::statsd::deletesets
collectd::plugin::statsd::countersum
collectd::plugin::statsd::timerpercentile
collectd::plugin::statsd::timerlower
collectd::plugin::statsd::timerupper
collectd::plugin::statsd::timersum
collectd::plugin::statsd::timercount
collectd::plugin::statsd::interval
collectd-swap
collectd::plugin::swap::reportbydevice
collectd::plugin::swap::reportbytes
collectd::plugin::swap::valuesabsolute
collectd::plugin::swap::valuespercentage
collectd::plugin::swap::reportio
collectd::plugin::swap::interval
collectd-syslog
collectd::plugin::syslog::log_level
collectd::plugin::syslog::notify_level
collectd::plugin::syslog::interval
collectd-table
collectd::plugin::table::tables
collectd::plugin::table::interval
collectd-tail
collectd::plugin::tail::files
collectd::plugin::tail::interval
collectd-tail_csv
collectd::plugin::tail_csv::metrics
collectd::plugin::tail_csv::files
collectd-tcpconns
collectd::plugin::tcpconns::localports
collectd::plugin::tcpconns::remoteports
collectd::plugin::tcpconns::listening
collectd::plugin::tcpconns::allportssummary
collectd::plugin::tcpconns::interval
collectd-ted
collectd-thermal
collectd::plugin::thermal::devices
collectd::plugin::thermal::ignoreselected
collectd::plugin::thermal::interval
collectd-threshold
collectd::plugin::threshold::types
collectd::plugin::threshold::plugins
collectd::plugin::threshold::hosts
collectd::plugin::threshold::interval
collectd-turbostat
collectd::plugin::turbostat::core_c_states
collectd::plugin::turbostat::package_c_states
collectd::plugin::turbostat::system_management_interrupt
collectd::plugin::turbostat::digital_temperature_sensor
collectd::plugin::turbostat::tcc_activation_temp
collectd::plugin::turbostat::running_average_power_limit
collectd::plugin::turbostat::logical_core_names
collectd-unixsock
collectd-uptime
collectd::plugin::uptime::interval
collectd-users
collectd::plugin::users::interval
collectd-uuid
collectd::plugin::uuid::uuid_file
collectd::plugin::uuid::interval
collectd-virt
collectd::plugin::virt::connection
collectd::plugin::virt::refresh_interval
collectd::plugin::virt::domain
collectd::plugin::virt::block_device
collectd::plugin::virt::interface_device
collectd::plugin::virt::ignore_selected
collectd::plugin::virt::hostname_format
collectd::plugin::virt::interface_format
collectd::plugin::virt::extra_stats
collectd::plugin::virt::interval
collectd-vmem
collectd::plugin::vmem::verbose
collectd::plugin::vmem::interval
collectd-vserver
collectd-wireless
collectd-write_graphite
collectd::plugin::write_graphite::carbons
collectd::plugin::write_graphite::carbon_defaults
collectd::plugin::write_graphite::globals
collectd-write_kafka
collectd::plugin::write_kafka::kafka_host
collectd::plugin::write_kafka::kafka_port
collectd::plugin::write_kafka::kafka_hosts
collectd::plugin::write_kafka::topics
collectd-write_log
collectd::plugin::write_log::format
collectd-zfs_arc
None
CHAPTER 2. INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK
Before you install Service Telemetry Framework (STF), ensure that Red Hat OpenShift Container Platform (OCP) version 4.x is running and that you understand the core components of the framework. As part of the OCP installation planning process, ensure that the administrator provides persistent storage and enough resources to run the STF components on top of the OCP environment.
WARNING
Red Hat OpenShift Container Platform version 4.3 or later is currently required for a successful installation of STF.
2.1. THE CORE COMPONENTS OF STF
The following STF core components are managed by Operators:
Prometheus and AlertManager
ElasticSearch
Smart Gateway
AMQ Interconnect
Each component has a corresponding Operator that you can use to load the various application components and objects.
Additional resources
For more information about Operators, see the Understanding Operators guide.
2.2. PREPARING YOUR OCP ENVIRONMENT FOR STF
As you prepare your OCP environment for STF, you must plan for persistent storage, adequate resources, and event storage:
Ensure that persistent storage is available in your Red Hat OpenShift Container Platform cluster to permit a production-grade deployment. For more information, see Section 2.2.1, "Persistent volumes".
Ensure that enough resources are available to run the Operators and the application containers. For more information, see Section 2.2.2, "Resource allocation".
To install ElasticSearch, you must use a community catalog source. If you do not want to use a community catalog, or if you do not want to store events, see Section 2.3, "Deploying STF to the OCP environment".
STF uses ElasticSearch to store events, which requires a larger than normal
vm.max_map_count. The vm.max_map_count value is set by default in Red Hat OpenShift Container Platform. For more information about how to edit the value of vm.max_map_count, see Section 2.2.3, "Node tuning operator".
2.2.1. Persistent volumes
STF uses persistent storage in OCP to instantiate the volumes dynamically so that Prometheus and ElasticSearch can store metrics and events.
Additional resources
For more information about configuring persistent storage for OCP, see Understanding persistent storage.
2.2.1.1. Using ephemeral storage
WARNING
You can use ephemeral storage with STF. However, if you use ephemeral storage, you might experience data loss if a pod is restarted, updated, or rescheduled onto another node. Use ephemeral storage only for development or testing, and not production environments.
Procedure
To enable ephemeral storage for STF, set storageEphemeralEnabled: true in your ServiceTelemetry manifest.
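For example, a minimal ServiceTelemetry manifest with ephemeral storage enabled might look like the following sketch; the stf-default name matches the examples used throughout this guide:

apiVersion: infra.watch/v1alpha1
kind: ServiceTelemetry
metadata:
  name: stf-default
  namespace: service-telemetry
spec:
  eventsEnabled: true
  metricsEnabled: true
  storageEphemeralEnabled: true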
Additional resources
For more information about enabling ephemeral storage for STF, see Section 4.6.1, "Configuring ephemeral storage".
2.2.2. Resource allocation
To enable the scheduling of pods within the OCP infrastructure, you need resources for the components that are running. If you do not allocate enough resources, pods remain in a Pending state because they cannot be scheduled.
The amount of resources that you require to run STF depends on your environment and the number of nodes and clouds that you want to monitor.
Additional resources
For recommendations about sizing for metrics collection, see https://access.redhat.com/articles/4907241.
For information about sizing requirements for ElasticSearch, see https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-managing-compute-resources.html.
2.2.3. Node tuning operator
STF uses ElasticSearch to store events, which requires a larger than normal vm.max_map_count. The vm.max_map_count value is set by default in Red Hat OpenShift Container Platform.
If you want to edit the value of vm.max_map_count, you cannot apply node tuning manually using the sysctl command, because Red Hat OpenShift Container Platform manages nodes directly. To configure values and apply them to the infrastructure, you must use the node tuning operator. For more information, see Using the Node Tuning Operator.
In an OCP deployment, the default node tuning operator specification provides the required profiles for ElasticSearch workloads or pods scheduled on nodes. To view the default cluster node tuning specification, run the following command:
The output of the default specification is documented at Default profiles set on a cluster. The assignment of profiles is managed in the recommend section, where profiles are applied to a node when certain conditions are met. When scheduling ElasticSearch to a node in STF, one of the following profiles is applied:
openshift-control-plane-es
openshift-node-es
When scheduling an ElasticSearch pod, there must be a label present that matches tuned.openshift.io/elasticsearch. If the label is present, one of the two profiles is assigned to the pod. No action is required by the administrator if you use the recommended Operator for ElasticSearch. If you use a custom-deployed ElasticSearch with STF, ensure that you add the tuned.openshift.io/elasticsearch label to all scheduled pods.
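For a custom-deployed ElasticSearch, a label of the following form would be added to the pod template metadata. This is a generic sketch, not an STF-managed manifest; the profile match is on the label key, and the empty value shown here is an assumption:

metadata:
  labels:
    tuned.openshift.io/elasticsearch: ""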
Additional resources
For more information about virtual memory usage by ElasticSearch, see https://www.elastic.co/guide/en/elasticsearch/reference/current/vm-max-map-count.html.
For more information about how the profiles are applied to nodes, see Custom tuning specification.
2.3. DEPLOYING STF TO THE OCP ENVIRONMENT
You can deploy STF to the OCP environment in one of two ways:
Deploy STF and store events with ElasticSearch. For more information, see Section 2.3.1, "Deploying STF to the OCP environment with ElasticSearch".
Deploy STF without ElasticSearch and disable events support. For more information, see Section 2.3.2, "Deploying STF to the OCP environment without ElasticSearch".
2.3.1. Deploying STF to the OCP environment with ElasticSearch
Complete the following tasks:
1. Section 2.3.3, "Creating a namespace"
2. Section 2.3.4, "Creating an OperatorGroup"
3. Section 2.3.5, "Enabling the OperatorHub.io Community Catalog Source"
$ oc get Tuned/default -o yaml -n openshift-cluster-node-tuning-operator
4. Section 2.3.6, "Enabling Red Hat STF Operator Source"
5. Section 2.3.7, "Subscribing to the AMQ Certificate Manager Operator"
6. Section 2.3.8, "Subscribing to the Elastic Cloud on Kubernetes Operator"
7. Section 2.3.9, "Subscribing to the Service Telemetry Operator"
8. Section 2.3.10, "Creating a ServiceTelemetry object in OCP"
2.3.2. Deploying STF to the OCP environment without ElasticSearch
Complete the following tasks:
1. Section 2.3.3, "Creating a namespace"
2. Section 2.3.4, "Creating an OperatorGroup"
3. Section 2.3.6, "Enabling Red Hat STF Operator Source"
4. Section 2.3.7, "Subscribing to the AMQ Certificate Manager Operator"
5. Section 2.3.9, "Subscribing to the Service Telemetry Operator"
6. Section 2.3.10, "Creating a ServiceTelemetry object in OCP"
2.3.3. Creating a namespace
Create a namespace to hold the STF components. The service-telemetry namespace is used throughout the documentation.
Procedure
Enter the following command:
2.3.4. Creating an OperatorGroup
Create an OperatorGroup in the namespace so that you can schedule the Operator pods.
Procedure
Enter the following command:
$ oc new-project service-telemetry

oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: service-telemetry-operator-group
  namespace: service-telemetry
spec:
  targetNamespaces:
  - service-telemetry
EOF
Additional resources
For more information, see OperatorGroups.
2.3.5. Enabling the OperatorHub.io Community Catalog Source
Before you install ElasticSearch, you must have access to the resources on the OperatorHub.io Community Catalog Source.
Procedure
Enter the following command:
2.3.6. Enabling Red Hat STF Operator Source
Before you deploy STF on Red Hat OpenShift Container Platform, you must enable the operator source.
Procedure
1. Install an OperatorSource that contains the Service Telemetry Operator and the Smart Gateway Operator:
oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: operatorhubio-operators
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/operator-framework/upstream-community-operators:latest
  displayName: OperatorHub.io Operators
  publisher: OperatorHub.io
EOF

oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1
kind: OperatorSource
metadata:
  labels:
    opsrc-provider: redhat-operators-stf
  name: redhat-operators-stf
  namespace: openshift-marketplace
spec:
  authorizationToken: {}
  displayName: Red Hat STF Operators
  endpoint: https://quay.io/cnr
  publisher: Red Hat
  registryNamespace: redhat-operators-stf
  type: appregistry
EOF
2. To validate the creation of your OperatorSource, use the oc get operatorsources command. A successful import results in the MESSAGE field returning a result of The object has been successfully reconciled.
3. To validate that the Operators are available from the catalog, use the oc get packagemanifest command:
2.3.7. Subscribing to the AMQ Certificate Manager Operator
You must subscribe to the AMQ Certificate Manager Operator before you deploy the other STF components, because the AMQ Certificate Manager Operator is globally scoped and is not compatible with the dependency management of Operator Lifecycle Manager when used with other namespace-scoped operators.
Procedure
1. Subscribe to the AMQ Certificate Manager Operator, create the subscription, and validate the AMQ7 Certificate Manager:
NOTE
The AMQ Certificate Manager is installed globally for all namespaces, so the namespace value provided is openshift-operators. You might not see your amq7-cert-manager.v1.0.0 ClusterServiceVersion in the service-telemetry namespace for a few minutes until the processing executes against the namespace.
$ oc get -n openshift-marketplace operatorsource redhat-operators-stf

NAME                   TYPE          ENDPOINT              REGISTRY               DISPLAYNAME             PUBLISHER   STATUS      MESSAGE
redhat-operators-stf   appregistry   https://quay.io/cnr   redhat-operators-stf   Red Hat STF Operators   Red Hat     Succeeded   The object has been successfully reconciled

$ oc get packagemanifests | grep "Red Hat STF"

smartgateway-operator       Red Hat STF Operators   2m50s
servicetelemetry-operator   Red Hat STF Operators   2m50s

oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: amq7-cert-manager
  namespace: openshift-operators
spec:
  channel: alpha
  installPlanApproval: Automatic
  name: amq7-cert-manager
  source: redhat-operators
  sourceNamespace: openshift-marketplace
EOF
2. To validate your ClusterServiceVersion, use the oc get csv command. Ensure that amq7-cert-manager.v1.0.0 has a phase of Succeeded.
2.3.8. Subscribing to the Elastic Cloud on Kubernetes Operator
Before you install the Service Telemetry Operator, and if you plan to store events in ElasticSearch, you must enable the Elastic Cloud on Kubernetes Operator.
Procedure
1. Apply the following manifest to your OCP environment to enable the Elastic Cloud on Kubernetes Operator:
2. To verify that the ClusterServiceVersion for Elastic Cloud on Kubernetes succeeded, enter the oc get csv command:
2.3.9. Subscribing to the Service Telemetry Operator
To instantiate an STF instance, create the ServiceTelemetry object to allow the Service Telemetry Operator to create the environment.
Procedure
1. To create the Service Telemetry Operator subscription, enter the oc apply -f command:
$ oc get --namespace openshift-operators csv

NAME                       DISPLAY                                          VERSION   REPLACES   PHASE
amq7-cert-manager.v1.0.0   Red Hat Integration - AMQ Certificate Manager    1.0.0                Succeeded

oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: elastic-cloud-eck
  namespace: service-telemetry
spec:
  channel: stable
  installPlanApproval: Automatic
  name: elastic-cloud-eck
  source: operatorhubio-operators
  sourceNamespace: openshift-marketplace
EOF

$ oc get csv

NAME                       DISPLAY                       VERSION   REPLACES                   PHASE
elastic-cloud-eck.v1.1.0   Elastic Cloud on Kubernetes   1.1.0     elastic-cloud-eck.v1.0.1   Succeeded

oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: servicetelemetry-operator
  namespace: service-telemetry
spec:
  channel: stable
  installPlanApproval: Automatic
  name: servicetelemetry-operator
  source: redhat-operators-stf
  sourceNamespace: openshift-marketplace
EOF
2. To validate the Service Telemetry Operator and the dependent operators, enter the following command:
2.3.10. Creating a ServiceTelemetry object in OCP
To deploy the Service Telemetry Framework, you must create an instance of ServiceTelemetry in OCP. By default, eventsEnabled is set to false. If you do not want to store events in ElasticSearch, ensure that eventsEnabled is set to false. For more information, see Section 2.3.2, "Deploying STF to the OCP environment without ElasticSearch".
The following core parameters are available for a ServiceTelemetry manifest:
Table 2.1. Core parameters for a ServiceTelemetry manifest
Parameter | Description | Default value
eventsEnabled | Enable events support in STF. Requires prerequisite steps to ensure ElasticSearch can be started. For more information, see Section 2.3.8, "Subscribing to the Elastic Cloud on Kubernetes Operator". | false
metricsEnabled | Enable metrics support in STF. | true
$ oc get csv --namespace service-telemetry

NAME                                DISPLAY                                          VERSION   REPLACES                            PHASE
amq7-cert-manager.v1.0.0            Red Hat Integration - AMQ Certificate Manager    1.0.0                                         Succeeded
amq7-interconnect-operator.v1.2.0   Red Hat Integration - AMQ Interconnect           1.2.0                                         Succeeded
elastic-cloud-eck.v1.1.0            Elastic Cloud on Kubernetes                      1.1.0     elastic-cloud-eck.v1.0.1            Succeeded
prometheusoperator.0.37.0           Prometheus Operator                              0.37.0    prometheusoperator.0.32.0           Succeeded
service-telemetry-operator.v1.0.2   Service Telemetry Operator                       1.0.2     service-telemetry-operator.v1.0.1   Succeeded
smart-gateway-operator.v1.0.1       Smart Gateway Operator                           1.0.1     smart-gateway-operator.v1.0.0       Succeeded
highAvailabilityEnabled | Enable high availability in STF. For more information, see Section 4.3, "High availability". | false
storageEphemeralEnabled | Enable ephemeral storage support in STF. For more information, see Section 4.6, "Ephemeral storage". | false
Procedure
1. To store events in ElasticSearch, set eventsEnabled to true during deployment:
2. To view the STF deployment logs in the Service Telemetry Operator, use the oc logs command:
PLAY RECAP ***
localhost : ok=37  changed=0  unreachable=0  failed=0  skipped=1  rescued=0  ignored=0
3. View the pods and the status of each pod to determine that all workloads are operating nominally:
NOTE
If you set eventsEnabled: true, the notification Smart Gateways will Error and CrashLoopBackOff for a period of time before ElasticSearch starts.
oc apply -f - <<EOF
apiVersion: infra.watch/v1alpha1
kind: ServiceTelemetry
metadata:
  name: stf-default
  namespace: service-telemetry
spec:
  eventsEnabled: true
  metricsEnabled: true
EOF

$ oc logs $(oc get pod --selector=name=service-telemetry-operator -oname) -c ansible

$ oc get pods
NAME                                                              READY   STATUS    RESTARTS   AGE
alertmanager-stf-default-0                                        2/2     Running   0          26m
elastic-operator-645dc8b8ff-jwnzt                                 1/1     Running   0          88m
elasticsearch-es-default-0                                        1/1     Running   0          26m
interconnect-operator-6fd49d9fb9-4bl92                            1/1     Running   0          46m
prometheus-operator-bf7d97fb9-kwnlx                               1/1     Running   0          46m
prometheus-stf-default-0                                          3/3     Running   0          26m
service-telemetry-operator-54f4c99d9b-k7ll6                       2/2     Running   0          46m
smart-gateway-operator-7ff58bcf94-66rvx                           2/2     Running   0          46m
stf-default-ceilometer-notification-smartgateway-6675df547q4lbj   1/1     Running   0          26m
stf-default-collectd-notification-smartgateway-698c87fbb7-xj528   1/1     Running   0          26m
stf-default-collectd-telemetry-smartgateway-79c967c8f7-9hsqn      1/1     Running   0          26m
stf-default-interconnect-7458fd4d69-nqbfs                         1/1     Running   0          26m
2.4. REMOVING STF FROM THE OCP ENVIRONMENT
Remove STF from an OCP environment if you no longer require the STF functionality.
Complete the following tasks:
1. Section 2.4.1, "Deleting the namespace"
2. Section 2.4.2, "Removing the OperatorSource"
2.4.1. Deleting the namespace
To remove the operational resources for STF from OCP, delete the namespace.
Procedure
1. Run the oc delete command:
2. Verify that the resources have been deleted from the namespace:
2.4.2. Removing the OperatorSource
If you do not expect to install Service Telemetry Framework again, delete the OperatorSource. When you remove the OperatorSource, PackageManifests related to STF are removed from the Operator Lifecycle Manager catalog.
Procedure
1. Delete the OperatorSource:
$ oc delete project service-telemetry

$ oc get all
No resources found.
2. Verify that the STF PackageManifests are removed from the platform. If successful, the following command returns no result:
3. If you enabled the OperatorHub.io Community Catalog Source during the installation process, and you no longer need this catalog source, delete it:
Additional resources
For more information about the OperatorHub.io Community Catalog Source, see Section 2.3, "Deploying STF to the OCP environment".
$ oc delete --namespace=openshift-marketplace operatorsource redhat-operators-stf
operatorsource.operators.coreos.com "redhat-operators-stf" deleted

$ oc get packagemanifests | grep "Red Hat STF"

$ oc delete --namespace=openshift-marketplace catalogsource operatorhubio-operators
catalogsource.operators.coreos.com "operatorhubio-operators" deleted
CHAPTER 3. COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION
3.1. CONNECTING RED HAT OPENSTACK PLATFORM TO SERVICE TELEMETRY FRAMEWORK
To collect metrics, events, or both, and to send them to the Service Telemetry Framework (STF) storage domain, you must configure the Red Hat OpenStack Platform overcloud to enable data collection and transport.
To deploy data collection and transport to STF on Red Hat OpenStack Platform cloud nodes that employ routed L3 domains, such as distributed compute node (DCN) or spine-leaf, see Section 3.2, "Deploying to non-standard network topologies".
3.2. DEPLOYING TO NON-STANDARD NETWORK TOPOLOGIES
If your nodes are on a separate network from the default InternalApi network, you must make configuration adjustments so that AMQ Interconnect can transport data to the Service Telemetry Framework (STF) server instance. This scenario is typical in a spine-leaf or a DCN topology. For more information about DCN configuration, see the Spine Leaf Networking guide.
If you use STF with Red Hat OpenStack Platform 16.0 and plan to monitor your Ceph, Block, or Object storage nodes, you must make configuration changes that are similar to the configuration changes that you make to the spine-leaf and DCN network configuration. To monitor Ceph nodes, use the CephStorageExtraConfig parameter to define which network interface to load into the AMQ Interconnect and collectd configuration files.
Similarly, you must specify BlockStorageExtraConfig and ObjectStorageExtraConfig parameters if your environment uses Block and Object storage roles, as sketched below.
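By analogy with the CephStorageExtraConfig example shown after the procedure below, the Block and Object storage variants would take a form like the following sketch. The hiera network names are assumptions that must match your composable network definitions:

BlockStorageExtraConfig:
  tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('storage')}"
  tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('storage')}"

ObjectStorageExtraConfig:
  tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('storage')}"
  tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('storage')}"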
The deployment of a spine-leaf topology involves creating roles and networks, then assigning those networks to the available roles. When you configure data collection and transport for STF for a Red Hat OpenStack Platform deployment, the default network for roles is InternalApi. For Ceph, Block, and Object storage roles, the default network is Storage. Because a spine-leaf configuration can result in different networks being assigned to different Leaf groupings, and those names are typically unique, additional configuration is required in the parameter_defaults section of the Red Hat OpenStack Platform environment files.
Procedure
1. Document which networks are available for each of the Leaf roles. For examples of network name definitions, see Creating a network data file in the Spine Leaf Networking guide. For more information about the creation of the Leaf groupings (roles) and assignment of the networks to those groupings, see Creating a roles data file in the Spine Leaf Networking guide.

CephStorageExtraConfig:
  tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('storage')}"
  tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('storage')}"
  tripleo::profile::base::ceilometer::agent::notification::notifier_host_addr: "%{hiera('storage')}"
2. Add the following configuration example to the ExtraConfig section for each of the leaf roles. In this example, internal_api_subnet is the value defined in the name_lower parameter of your network definition (with _subnet appended to the name for Leaf 0), and is the network to which the ComputeLeaf0 leaf role is connected. In this case, the network identification of 0 corresponds to the Compute role for leaf 0, and represents a value that is different from the default internal API network name. For the ComputeLeaf0 leaf role, specify extra configuration to perform a hiera lookup to determine which network interface for a particular network to assign to the collectd AMQP host parameter. Perform the same configuration for the AMQ Interconnect listener address parameter.
Additional leaf roles typically replace _subnet with _leafN, where N represents a unique identifier for the leaf.
This example configuration is on a CephStorage leaf role:
3.3. CONFIGURING RED HAT OPENSTACK PLATFORM OVERCLOUD FOR SERVICE TELEMETRY FRAMEWORK
To configure the Red Hat OpenStack Platform overcloud, you must configure the data collection applications and the data transport to STF, and deploy the overcloud.
To configure the Red Hat OpenStack Platform overcloud, complete the following tasks:
1. Section 3.3.1, "Retrieving the AMQ Interconnect route address"
2. Section 3.3.2, "Configuring the STF connection for the overcloud"
3. Section 3.3.3, "Validating client-side installation"
3.3.1. Retrieving the AMQ Interconnect route address
When you configure the Red Hat OpenStack Platform overcloud for STF, you must provide the AMQ Interconnect route address in the STF connection file.
Procedure
1. Log in to your Red Hat OpenShift Container Platform (OCP) environment.
2. In the service-telemetry project, retrieve the AMQ Interconnect route address:
ComputeLeaf0ExtraConfig:
  tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('internal_api_subnet')}"
  tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('internal_api_subnet')}"

ComputeLeaf1ExtraConfig:
  tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('internal_api_leaf1')}"
  tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('internal_api_leaf1')}"

CephStorageLeaf0ExtraConfig:
  tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('storage_subnet')}"
  tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('storage_subnet')}"
NOTE
If your STF installation differs from the documentation, ensure that you retrieve the correct AMQ Interconnect route address.
3.3.2. Configuring the STF connection for the overcloud
To configure the STF connection, you must create a file that contains the connection configuration of the AMQ Interconnect for the overcloud to the STF deployment. Enable the collection of events and storage of the events in STF, and deploy the overcloud.
Procedure
1. Log in to the Red Hat OpenStack Platform undercloud as the stack user.
2. Create a configuration file called stf-connectors.yaml in the /home/stack directory.
IMPORTANT
The Service Telemetry Operator simplifies the deployment of all data ingestion and data storage components for single cloud deployments. To share the data storage domain with multiple clouds, see Section 4.5, "Configuring multiple clouds".
3. In the stf-connectors.yaml file, configure the MetricsQdrConnectors address to connect the AMQ Interconnect on the overcloud to the STF deployment.
Add the CeilometerQdrPublishEvents: true parameter to enable collection and transport of Ceilometer events to STF.
Replace the host parameter with the value of HOST/PORT that you retrieved in Section 3.3.1, "Retrieving the AMQ Interconnect route address".
4. Add the following files to your Red Hat OpenStack Platform director deployment to set up collectd and AMQ Interconnect:
the stf-connectors.yaml environment file
the enable-stf.yaml file that ensures that the environment is being used during the overcloud deployment
$ oc get routes -ogo-template='{{ range .items }}{{printf "%s\n" .spec.host }}{{ end }}' | grep "\-5671"
stf-default-interconnect-5671-service-telemetry.apps.infra.watch

parameter_defaults:
  EventPipelinePublishers: []
  CeilometerQdrPublishEvents: true
  MetricsQdrConnectors:
    - host: stf-default-interconnect-5671-service-telemetry.apps.infra.watch
      port: 443
      role: edge
      sslProfile: sslProfile
      verifyHostname: false
the ceilometer-write-qdr.yaml file that ensures that Ceilometer telemetry is sent to STF
5. Deploy the Red Hat OpenStack Platform overcloud.
3.3.3. Validating client-side installation
To validate data collection from the STF storage domain, query the data sources for delivered data. To validate individual nodes in the Red Hat OpenStack Platform deployment, connect to the console using SSH.
Procedure
1. Log in to an overcloud node, for example, controller-0.
2. Ensure that the metrics_qdr container is running on the node:
3. Return the internal network address on which AMQ Interconnect is running, for example, 172.17.1.44 listening on port 5666:
4. Return a list of connections to the local AMQ Interconnect:
openstack overcloud deploy <other arguments>
  --templates /usr/share/openstack-tripleo-heat-templates \
  --environment-file <other-environment-files> \
  --environment-file /usr/share/openstack-tripleo-heat-templates/environments/metrics/ceilometer-write-qdr.yaml \
  --environment-file /usr/share/openstack-tripleo-heat-templates/environments/enable-stf.yaml \
  --environment-file /home/stack/stf-connectors.yaml

$ sudo podman container inspect --format '{{.State.Status}}' metrics_qdr
running

$ sudo podman exec -it metrics_qdr cat /etc/qpid-dispatch/qdrouterd.conf

listener {
    host: 172.17.1.44
    port: 5666
    authenticatePeer: no
    saslMechanisms: ANONYMOUS
}

$ sudo podman exec -it metrics_qdr qdstat --bus=172.17.1.44:5666 --connections
Connections
  id   host                                                                   container                                                                                               role    dir  security                             authentication  tenant
  ============================================================================================================================================================
  1    stf-default-interconnect-5671-service-telemetry.apps.infra.watch:443   stf-default-interconnect-7458fd4d69-bgzfb                                                               edge    out  TLSv1.2(DHE-RSA-AES256-GCM-SHA384)   anonymous-user
  12   172.17.1.44:60290                                                      openstack.org/om/container/controller-0/ceilometer-agent-notification/255c02cee550f143ec9ea030db5cccba14  normal  in   no-security                          no-auth
  16   172.17.1.44:36408                                                      metrics                                                                                                 normal  in   no-security                          anonymous-user
  899  172.17.1.44:39500                                                      10a2e99d-1b8a-4329-b48c-4335e5f75c84                                                                    normal  in   no-security                          no-auth
There are four connections:
Outbound connection to STF
Inbound connection from collectd
Inbound connection from ceilometer
Inbound connection from our qdstat client
The outbound STF connection is provided to the MetricsQdrConnectors host parameter and is the route for the STF storage domain. The other hosts are internal network addresses of the client connections to this AMQ Interconnect.
5. To ensure that messages are being delivered, list the links, and view the _edge address in the deliv column for delivery of messages:
$ sudo podman exec -it metrics_qdr qdstat --bus=172.17.1.44:5666 --links

Router Links
  type      dir  conn id  id    peer  class   addr                  phs  cap  pri  undel  unsett  deliv    presett  psdrop  acc  rej  rel  mod  delay  rate
  =========================================================================================================================================================
  endpoint  out  1        5           local   _edge                      250  0    0      0       2979926  2979924  0       0    0    2    0    0      0
  endpoint  in   1        6                                              250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  in   1        7                                              250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  out  1        8                                              250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  in   1        9                                              250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  out  1        10                                             250  0    0      0       911      911      0       0    0    0    0    911    0
  endpoint  in   1        11                                             250  0    0      0       0        911      0       0    0    0    0    0      0
  endpoint  out  12       32          local   temp.lSY6Mcicol4J2Kp       250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  in   16       41                                             250  0    0      0       2979924  2979924  0       0    0    0    0    0      0
  endpoint  in   912      1834        mobile  $management           0    250  0    0      0       1        0        0       1    0    0    0    0      0
  endpoint  out  912      1835        local   temp.9Ok2resI9tmt+CT       250  0    0      0       0        0        0       0    0    0    0    0      0
6. To list the addresses from Red Hat OpenStack Platform nodes to STF, connect to OCP to get the AMQ Interconnect pod name and list the connections. List the available AMQ Interconnect pods:
7. Connect to the pod and run the qdstat --connections command to list the known connections:
In this example, there are three edge connections from the Red Hat OpenStack Platform nodes, with connection id 22, 23, and 24.
8. To view the number of messages delivered by the network, use each address with the oc exec command:
$ oc get pods -l application=stf-default-interconnect
NAME                                        READY   STATUS    RESTARTS   AGE
stf-default-interconnect-7458fd4d69-bgzfb   1/1     Running   0          6d21h
$ oc exec -it stf-default-interconnect-7458fd4d69-bgzfb -- qdstat --connections
2020-04-21 18:25:47.243852 UTC
stf-default-interconnect-7458fd4d69-bgzfb

Connections
  id  host               container                                                               role    dir  security                                 authentication  tenant  last dlv      uptime
  ===================================================================================================================================================================
  1   10.129.2.21:43062  rcv[stf-default-collectd-telemetry-smartgateway-79c967c8f7-kq4qv]       normal  in   no-security                              anonymous-user          000:00:00:00  006:21:50:25
  2   10.130.0.52:55754  rcv[stf-default-ceilometer-notification-smartgateway-6675df547mbjk5]    normal  in   no-security                              anonymous-user          000:21:25:57  006:21:49:36
  3   10.130.0.51:43110  rcv[stf-default-collectd-notification-smartgateway-698c87fbb7-f28v6]    normal  in   no-security                              anonymous-user          000:21:36:53  006:21:49:09
  22  10.128.0.1:51948   Router.ceph-0.redhat.local                                              edge    in   TLSv1/SSLv3(DHE-RSA-AES256-GCM-SHA384)   anonymous-user          000:00:00:03  000:22:08:43
  23  10.128.0.1:51950   Router.compute-0.redhat.local                                           edge    in   TLSv1/SSLv3(DHE-RSA-AES256-GCM-SHA384)   anonymous-user          000:00:00:03  000:22:08:43
  24  10.128.0.1:52082   Router.controller-0.redhat.local                                        edge    in   TLSv1/SSLv3(DHE-RSA-AES256-GCM-SHA384)   anonymous-user          000:00:00:00  000:22:08:34
  27  127.0.0.1:42202    c2f541c1-4c97-4b37-a189-a396c08fb079                                    normal  in   no-security                              no-auth                 000:00:00:00  000:00:00:00
$ oc exec -it stf-default-interconnect-7458fd4d69-bgzfb -- qdstat --address
2020-04-21 18:20:10.293258 UTC
stf-default-interconnect-7458fd4d69-bgzfb

Router Addresses
  class   addr                              phs  distrib    pri  local  remote  in       out      thru  fallback
  ===============================================================================================================
  mobile  anycast/ceilometer/event.sample   0    balanced   -    1      0       1553     1553     0     0
  mobile  collectd/notify                   0    multicast  -    1      0       10       10       0     0
  mobile  collectd/telemetry                0    multicast  -    1      0       7798049  7798049  0     0
CHAPTER 4. ADVANCED FEATURES
The following optional features can provide additional functionality to the Service Telemetry Framework (STF):
Customizing the deployment. For more information, see Section 4.1, "Customizing the deployment".
Alerts. For more information, see Section 4.2, "Alerts".
High availability. For more information, see Section 4.3, "High availability".
Dashboards. For more information, see Section 4.4, "Dashboards".
Multiple clouds. For more information, see Section 4.5, "Configuring multiple clouds".
Ephemeral storage. For more information, see Section 4.6, "Ephemeral storage".
4.1. CUSTOMIZING THE DEPLOYMENT
The Service Telemetry Operator watches for a ServiceTelemetry manifest to load into Red Hat OpenShift Container Platform (OCP). The Operator then creates other objects in memory, which results in the dependent Operators creating the workloads they are responsible for managing.
WARNING
When you override the manifest, you must provide the entire manifest contents, including object names or namespaces. There is no dynamic parameter substitution when you override a manifest.
To override a manifest successfully with Service Telemetry Framework (STF), deploy a default environment using the core options only. For more information about the core options, see Section 2.3.10, "Creating a ServiceTelemetry object in OCP". When you deploy STF, use the oc get command to retrieve the default deployed manifest. When you use a manifest that was originally generated by Service Telemetry Operator, the manifest is compatible with the other objects that are managed by the Operators.
For example, when the metricsEnabled: true parameter is configured in the ServiceTelemetry manifest, the Service Telemetry Operator requests components for metrics retrieval and storage using the default manifests. In some cases, you might want to override the default manifest. For more information, see Section 4.1.1, "Manifest override parameters".
4.1.1. Manifest override parameters
This table describes the available parameters that you can use to override a manifest, along with the corresponding retrieval commands.
Table 4.1. Manifest override parameters
Override parameter | Description | Retrieval command
alertmanagerManifest | Override the Alertmanager object creation. The Prometheus Operator watches for Alertmanager objects. | oc get alertmanager stf-default -oyaml
alertmanagerConfigManifest | Override the Secret that contains the Alertmanager configuration. The Prometheus Operator uses a secret named alertmanager-<alertmanager-name>, for example alertmanager-stf-default, to provide the alertmanager.yaml configuration to Alertmanager. | oc get secret alertmanager-stf-default -oyaml
elasticsearchManifest | Override the ElasticSearch object creation. The Elastic Cloud on Kubernetes Operator watches for ElasticSearch objects. | oc get elasticsearch elasticsearch -oyaml
interconnectManifest | Override the Interconnect object creation. The AMQ Interconnect Operator watches for Interconnect objects. | oc get interconnect stf-default-interconnect -oyaml
prometheusManifest | Override the Prometheus object creation. The Prometheus Operator watches for Prometheus objects. | oc get prometheus stf-default -oyaml
servicemonitorManifest | Override the ServiceMonitor object creation. The Prometheus Operator watches for ServiceMonitor objects. | oc get servicemonitor stf-default -oyaml
smartgatewayCollectdMetricsManifest | Override the SmartGateway object creation for collectd metrics. The Smart Gateway Operator watches for SmartGateway objects. | oc get smartgateway stf-default-collectd-telemetry -oyaml
smartgatewayCollectdEventsManifest | Override the SmartGateway object creation for collectd events. The Smart Gateway Operator watches for SmartGateway objects. | oc get smartgateway stf-default-collectd-notification -oyaml
smartgatewayCeilometerEventsManifest | Override the SmartGateway object creation for Ceilometer events. The Smart Gateway Operator watches for SmartGateway objects. | oc get smartgateway stf-default-ceilometer-notification -oyaml
4.1.2. Overriding a managed manifest
Edit the ServiceTelemetry object and provide a parameter and manifest. For a list of available manifest override parameters, see Section 4.1.1, "Manifest override parameters". The default ServiceTelemetry object is stf-default. Use oc get servicetelemetry to list the available STF deployments.
TIP
The oc edit command loads the default system editor. To override the default editor, pass or set the environment variable EDITOR to the preferred editor, for example, EDITOR=nano oc edit servicetelemetry stf-default.
Procedure
1. Log in to Red Hat OpenShift Container Platform.
2. Change to the service-telemetry namespace.
3. Load the ServiceTelemetry object into an editor.
4. To modify the ServiceTelemetry object, provide a manifest override parameter and the contents of the manifest to write to OCP instead of the defaults provided by STF.
NOTE
The trailing pipe (|) after entering the manifest override parameter indicates that the value provided is multi-line.
$ oc project service-telemetry

$ oc edit servicetelemetry stf-default
apiVersion: infra.watch/v1alpha1
kind: ServiceTelemetry
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"infra.watch/v1alpha1","kind":"ServiceTelemetry","metadata":{"annotations":{},"name":"stf-default","namespace":"service-telemetry"},"spec":{"metricsEnabled":true}}
  creationTimestamp: "2020-04-14T20:29:42Z"
1. Manifest override parameter is defined in the spec of the ServiceTelemetry object.
2. End of the manifest override content.
5. Save and close.
4.2. ALERTS
You create alert rules in Prometheus and alert routes in Alertmanager. Alert rules in Prometheus servers send alerts to an Alertmanager, which manages the alerts. Alertmanager can silence, inhibit, or aggregate alerts, and send notifications using email, on-call notification systems, or chat platforms.
To create an alert, complete the following tasks:
1. Create an alert rule in Prometheus. For more information, see Section 4.2.1, "Creating an alert rule in Prometheus".
  generation: 1
  name: stf-default
  namespace: service-telemetry
  resourceVersion: "1949423"
  selfLink: /apis/infra.watch/v1alpha1/namespaces/service-telemetry/servicetelemetrys/stf-default
  uid: d058bc41-1bb0-49f5-9a8b-642f4b8adb95
spec:
  metricsEnabled: true
  smartgatewayCollectdMetricsManifest: | 1
    apiVersion: smartgateway.infra.watch/v2alpha1
    kind: SmartGateway
    metadata:
      name: stf-default-collectd-telemetry
      namespace: service-telemetry
    spec:
      amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/telemetry
      debug: true
      prefetch: 15000
      serviceType: metrics
      size: 1
      useTimestamp: true 2
status:
  conditions:
  - ansibleResult:
      changed: 0
      completion: 2020-04-14T20:32:19.079508
      failures: 0
      ok: 52
      skipped: 1
    lastTransitionTime: "2020-04-14T20:29:59Z"
    message: Awaiting next reconciliation
    reason: Successful
    status: "True"
    type: Running
2. Create an alert route in Alertmanager. For more information, see Section 4.2.3, "Creating an alert route in Alertmanager".
Additional resources
For more information about alerts or notifications with Prometheus and Alertmanager, see https://prometheus.io/docs/alerting/overview.
To view an example set of alerts that you can use with Service Telemetry Framework (STF), see https://github.com/infrawatch/service-telemetry-operator/tree/master/deploy/alerts.
4.2.1. Creating an alert rule in Prometheus
Prometheus evaluates alert rules to trigger notifications. If the rule condition returns an empty result set, the condition is false. Otherwise, the rule is true and it triggers an alert.
Procedure
1. Log in to Red Hat OpenShift Container Platform.
2. Change to the service-telemetry namespace.
3. Create a PrometheusRule object that contains the alert rule. The Prometheus Operator loads the rule into Prometheus:
4. To verify that the rules have been loaded into Prometheus by the Operator, create a pod with access to curl:
5. Run curl to access the prometheus-operated service to return the rules loaded into memory:
$ oc project service-telemetry

oc apply -f - <<EOF
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  creationTimestamp: null
  labels:
    prometheus: stf-default
    role: alert-rules
  name: prometheus-alarm-rules
  namespace: service-telemetry
spec:
  groups:
    - name: ./openstack.rules
      rules:
        - alert: Metric Listener down
          expr: collectd_qpid_router_status < 1
EOF

To change the rule, edit the value of the expr parameter.

$ oc run curl --generator=run-pod/v1 --image=radial/busyboxplus:curl -i --tty

[ root@curl:/ ]$ curl prometheus-operated:9090/api/v1/rules
{"status":"success","data":{"groups":[{"name":"./openstack.rules","file":"/etc/prometheus/rules/prometheus-stf-default-rulefiles-0/service-telemetry-prometheus-alarm-rules.yaml","rules":[{"name":"Metric Listener down","query":"collectd_qpid_router_status < 1","duration":0,"labels":{},"annotations":{},"alerts":[],"health":"ok","type":"alerting"}],"interval":30}]}}
6. To verify that the output shows the rules loaded into the PrometheusRule object, for example, that the output contains the defined ./openstack.rules, exit from the pod:
7. Clean up the environment by deleting the curl pod:
Additional resources
For more information on alerting, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md.
4.2.2. Configuring custom alerts
You can add custom alerts to the PrometheusRule object that you created in Section 4.2.1, "Creating an alert rule in Prometheus".
Procedure
1. Use the oc edit command:
2. Edit the PrometheusRules manifest; an example edited manifest appears after the command below.
3. Save and close.
Additional resources
For more information about configuring alerting rules, see https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/.
For more information about PrometheusRules objects, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md.
4.2.3. Creating an alert route in Alertmanager
Use Alertmanager to deliver alerts to an external system, such as email, IRC, or other notification channel. The Prometheus Operator manages the Alertmanager configuration as a Red Hat OpenShift Container Platform (OCP) secret. By default, STF deploys a basic configuration that results in no receivers.
[ root@curl:/ ]$ exit

$ oc delete pod curl
pod "curl" deleted

$ oc edit prometheusrules prometheus-alarm-rules
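For example, an edited manifest might add a second rule alongside the existing one. The alert name and expression below are illustrative assumptions, not rules shipped with STF:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    prometheus: stf-default
    role: alert-rules
  name: prometheus-alarm-rules
  namespace: service-telemetry
spec:
  groups:
    - name: ./openstack.rules
      rules:
        - alert: Metric Listener down
          expr: collectd_qpid_router_status < 1
        - alert: Collectd metrics stalled        # hypothetical custom alert
          expr: rate(collectd_uptime[5m]) == 0   # fires when uptime counters stop advancing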
alertmanager.yaml: |-
  global:
    resolve_timeout: 5m
  route:
    group_by: ['job']
    group_wait: 30s
    group_interval: 5m
    repeat_interval: 12h
    receiver: 'null'
  receivers:
  - name: 'null'
To deploy a custom Alertmanager route with STF, you must pass an alertmanagerConfigManifest parameter to the Service Telemetry Operator, which results in an updated secret managed by the Prometheus Operator. For more information, see Section 4.1.2, "Overriding a managed manifest".
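As an illustration of a non-null route, the alertmanager.yaml embedded in alertmanagerConfigManifest could define a receiver such as the following sketch. The SMTP details are placeholder assumptions:

alertmanager.yaml: |-
  global:
    resolve_timeout: 10m
  route:
    group_by: ['job']
    receiver: 'email-alerts'
  receivers:
  - name: 'email-alerts'
    email_configs:
    - to: 'oncall@example.com'             # placeholder recipient
      from: 'alertmanager@example.com'     # placeholder sender
      smarthost: 'smtp.example.com:587'    # placeholder SMTP relay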
Procedure
1. Log in to Red Hat OpenShift Container Platform.
2. Change to the service-telemetry namespace.
3. Edit the ServiceTelemetry object for your STF deployment.
4. Add a new parameter, alertmanagerConfigManifest, and the Secret object contents to define the alertmanager.yaml configuration for Alertmanager.
NOTE
This loads the default template that is already managed by the Service Telemetry Operator. To validate that the changes are populating correctly, change a value, return the alertmanager-stf-default secret, and verify that the new value is loaded into memory, for example, changing the value of global.resolve_timeout from 5m to 10m.
$ oc project service-telemetry

$ oc edit servicetelemetry stf-default

apiVersion: infra.watch/v1alpha1
kind: ServiceTelemetry
metadata:
  name: stf-default
  namespace: service-telemetry
spec:
  metricsEnabled: true
  alertmanagerConfigManifest: |
    apiVersion: v1
    kind: Secret
    metadata:
      name: 'alertmanager-stf-default'
      namespace: 'service-telemetry'
    type: Opaque
    stringData:
      alertmanager.yaml: |-
        global:
          resolve_timeout: 10m
        route:
          group_by: ['job']
          group_wait: 30s
          group_interval: 5m
          repeat_interval: 12h
          receiver: 'null'
        receivers:
        - name: 'null'
5. Verify that the configuration was applied to the secret:
6. To verify that the configuration has been loaded into Alertmanager, create a pod with access to curl:
7. Run curl against the alertmanager-operated service to retrieve the status and configYAML contents, and review whether the supplied configuration matches the configuration loaded into Alertmanager:
8. Verify that the configYAML field contains the expected changes. Exit from the pod:
9. To clean up the environment, delete the curl pod:
$ oc get secret alertmanager-stf-default -o go-template='{{ index .data "alertmanager.yaml" | base64decode }}'

global:
  resolve_timeout: 10m
route:
  group_by: ['job']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h
  receiver: 'null'
receivers:
- name: 'null'

$ oc run curl --generator=run-pod/v1 --image=radial/busyboxplus:curl -i --tty

[ root@curl:/ ]$ curl alertmanager-operated:9093/api/v1/status
{"status":"success","data":{"configYAML":"global:\n  resolve_timeout: 10m\n  http_config: {}\n  smtp_hello: localhost\n  smtp_require_tls: true\n  pagerduty_url: https://events.pagerduty.com/v2/enqueue\n  hipchat_api_url: https://api.hipchat.com/\n  opsgenie_api_url: https://api.opsgenie.com/\n  wechat_api_url: https://qyapi.weixin.qq.com/cgi-bin/\n  victorops_api_url: https://alert.victorops.com/integrations/generic/20131114/alert/\nroute:\n  receiver: \"null\"\n  group_by:\n  - job\n  group_wait: 30s\n  group_interval: 5m\n  repeat_interval: 12h\nreceivers:\n- name: \"null\"\ntemplates: []\n"}}

[ root@curl:/ ]$ exit

$ oc delete pod curl
pod "curl" deleted
Additional resources
For more information about the Red Hat OpenShift Container Platform secret and the Prometheus operator, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md.
4.3. HIGH AVAILABILITY
High availability is the ability of Service Telemetry Framework (STF) to rapidly recover from failures in its component services. Although Red Hat OpenShift Container Platform (OCP) restarts a failed pod if nodes are available to schedule the workload, this recovery process might take more than one minute, during which time events and metrics are lost. A high availability configuration includes multiple copies of STF components, reducing recovery time to approximately 2 seconds. To protect against failure of an OCP node, deploy STF to an OCP cluster with three or more nodes.
NOTE
STF is not yet a fully fault tolerant system. Delivery of metrics and events during the recovery period is not guaranteed.
Enabling high availability has the following effects:
Two AMQ Interconnect pods run instead of the default 1.
Three ElasticSearch pods run instead of the default 1.
Recovery time from a lost pod in either of these services reduces to approximately 2 seconds.
4.3.1. Configuring high availability
To configure STF for high availability, add highAvailabilityEnabled: true to the ServiceTelemetry object in OCP. You can set this parameter at installation time or, if you already deployed STF, complete the following steps:
Procedure
1. Log in to Red Hat OpenShift Container Platform.
2. Change to the service-telemetry namespace:
3. Use the oc command to edit the ServiceTelemetry object:
4. Add highAvailabilityEnabled: true to the spec section:
5. Save your changes and close the object.
oc project service-telemetry
$ oc edit ServiceTelemetry
spec:
  eventsEnabled: true
  metricsEnabled: true
  highAvailabilityEnabled: true
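If you prefer not to open an editor, the same change can be applied with a single oc patch command. This is a sketch that assumes the default stf-default object name used in this guide:

# Merge-patch the ServiceTelemetry object to enable high availability
oc patch servicetelemetry stf-default --type merge --patch '{"spec": {"highAvailabilityEnabled": true}}'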
4.4. DASHBOARDS
Use the third-party application Grafana to visualize system-level metrics gathered by collectd for each individual host node. For more information about configuring collectd, see Section 3.3, "Configuring Red Hat OpenStack Platform overcloud for Service Telemetry Framework".
4.4.1. Setting up Grafana to host the dashboard
Grafana is not included in the default Service Telemetry Framework (STF) deployment, so you must deploy the Grafana Operator from OperatorHub.io.
Procedure
1. Log in to Red Hat OpenShift Container Platform.
2. Change to the service-telemetry namespace:
3. Clone the dashboard repository:
4. Deploy the Grafana operator:
5. To verify that the operator launched successfully, run the oc get csv command. If the value of the PHASE column is Succeeded, the operator launched successfully:
6. Launch a Grafana instance:
7. Verify that the Grafana instance deployed:
8. Create the datasource and dashboard resources:
oc project service-telemetry
git clone https://github.com/infrawatch/dashboards
cd dashboards
oc create -f deploy/subscription.yaml
$ oc get csv
NAME                      DISPLAY            VERSION   REPLACES   PHASE
grafana-operator.v3.2.0   Grafana Operator   3.2.0                Succeeded
$ oc create -f deploy/grafana.yaml
$ oc get pod -l app=grafana
NAME                                   READY   STATUS    RESTARTS   AGE
grafana-deployment-7fc7848b56-sbkhv    1/1     Running   0          1m
9. Verify that the resources installed correctly:
10. Navigate to https://<grafana-route-address> in a web browser. Use the oc get routes command to retrieve the Grafana route address:
11. To view the dashboard, click Dashboards and Manage.
Additional resources
For more information about enabling the OperatorHub.io catalog source, see Section 2.3.5, "Enabling the OperatorHub.io Community Catalog Source".
4.4.1.1. Viewing and editing queries
Procedure
1. Log in to Red Hat OpenShift Container Platform. To view and edit queries, log in as the admin user.
2. Change to the service-telemetry namespace:
3. To retrieve the default username and password, describe the Grafana object using oc describe:
4.4.2. The Grafana infrastructure dashboard
The infrastructure dashboard shows metrics for a single node at a time. Select a node from the upper left corner of the dashboard.
4.4.2.1. Top panels
Title | Unit | Description
oc create -f deploy/datasource.yaml -f deploy/rhos-dashboard.yaml
$ oc get grafanadashboards
NAME             AGE
rhos-dashboard   7d21h

$ oc get grafanadatasources
NAME                                  AGE
service-telemetry-grafanadatasource   1m
oc get routes
oc project service-telemetry
oc describe grafana service-telemetry-grafana
Current Global Alerts | - | Current alerts fired by Prometheus
Recent Global Alerts | - | Recently fired alerts in 5m time steps
Status Panel | - | Node status: up, down, unavailable
Uptime | s/m/h/d/M/Y | Total operational time of node
CPU Cores | cores | Total number of cores
Memory | bytes | Total memory
Disk Size | bytes | Total storage size
Processes | processes | Total number of processes listed by type
Load Average | processes | Load average represents the average number of running and uninterruptible processes residing in the kernel execution queue
4.4.2.2. Networking panels
Panels that display the network interfaces of the node.
Panel | Unit | Description
Physical Interfaces Ingress Errors | errors | Total errors with incoming data
Physical Interfaces Egress Errors | errors | Total errors with outgoing data
Physical Interfaces Ingress Error Rates | errors/s | Rate of incoming data errors
Physical Interfaces Egress Error Rates | errors/s | Rate of outgoing data errors
Physical Interfaces Packets Ingress | pps | Incoming packets per second
Physical Interfaces Packets Egress | pps | Outgoing packets per second
Physical Interfaces Data Ingress | bytes/s | Incoming data rates
Physical Interfaces Data Egress | bytes/s | Outgoing data rates
Physical Interfaces Drop Rate Ingress | pps | Incoming packets drop rate
Physical Interfaces Drop Rate Egress | pps | Outgoing packets drop rate
4.4.2.3. CPU panels
Panels that display CPU usage of the node.
Panel | Unit | Description
Current CPU Usage | percent | Instantaneous usage at the time of the last query
Aggregate CPU Usage | percent | Average non-idle CPU activity of all cores on a node
Aggr CPU Usage by Type | percent | Shows time spent for each type of thread averaged across all cores
4.4.2.4. Memory panels
Panels that display memory usage on the node.
Panel | Unit | Description
Memory Used | percent | Amount of memory being used at time of last query
Huge Pages Used | hugepages | Number of hugepages being used
4.4.2.5. Disk/file system
Panels that display space used on disk.
Panel | Unit | Description | Notes
Disk Space Usage | percent | Total disk use at time of last query |
Inode Usage | percent | Total inode use at time of last query |
Aggregate Disk Space Usage | bytes | Total disk space used and reserved | Because this query relies on the df plugin, temporary file systems that do not necessarily use disk space are included in the results. The query tries to filter out most of these, but it might not be exhaustive.
Disk Traffic | bytes/s | Shows rates for both reading and writing |
Disk Load | percent | Approximate percentage of total disk bandwidth being used. The weighted I/O time series includes the backlog that might be accumulating. For more information, see the collectd disk plugin docs. |
Operations/s | ops/s | Operations done per second |
Average I/O Operation Time | seconds | Average time each I/O operation took to complete. This average is not accurate; see the collectd disk plugin docs. |
4.5. CONFIGURING MULTIPLE CLOUDS
You can configure multiple Red Hat OpenStack Platform clouds to target a single instance of Service Telemetry Framework (STF):
1. Plan the AMQP address prefixes that you want to use for each cloud. For more information, see Section 4.5.1, "Planning AMQP address prefixes".
2. Deploy metrics and events consumer Smart Gateways for each cloud to listen on the corresponding address prefixes. For more information, see Section 4.5.2, "Deploying Smart Gateways".
3. Configure each cloud to send its metrics and events to STF on the correct address. For more information, see Section 4.5.3, "Creating the OpenStack environment file".
CHAPTER 4 ADVANCED FEATURES
39
Figure 4.1: Two Red Hat OpenStack Platform clouds connect to STF
4.5.1. Planning AMQP address prefixes
By default, Red Hat OpenStack Platform nodes get data through two data collectors: collectd and Ceilometer. These components send telemetry data or notifications to the respective AMQP addresses, for example, collectd/telemetry, where STF Smart Gateways listen on those addresses for monitoring data.
To support multiple clouds, and to identify which cloud generated the monitoring data, configure each cloud to send data to a unique address. Prefix a cloud identifier to the second part of the address. The following list shows some example addresses and identifiers:
collectd/cloud1-telemetry
collectd/cloud1-notify
anycast/ceilometer/cloud1-event.sample
collectd/cloud2-telemetry
collectd/cloud2-notify
anycast/ceilometer/cloud2-event.sample
collectd/us-east-1-telemetry
collectd/us-west-3-telemetry
4.5.2. Deploying Smart Gateways
You must deploy a Smart Gateway for each of the data collection types for each cloud: one for collectd metrics, one for collectd events, and one for Ceilometer events. Configure each of the Smart Gateways to listen on the AMQP address that you define for the corresponding cloud.
When you deploy STF for the first time, Smart Gateway manifests are created that define the initial Smart Gateways for a single cloud. When deploying Smart Gateways for multiple cloud support, you deploy multiple Smart Gateways for each of the data collection types that handle the metrics and the events data for each cloud. The initial Smart Gateways act as a template to create additional Smart Gateways, along with any authentication information required to connect to the data stores.
Procedure
1. Log in to Red Hat OpenShift Container Platform.
2. Change to the service-telemetry namespace:
oc project service-telemetry
3. Use the initially deployed Smart Gateways as a template for additional Smart Gateways. List the currently deployed Smart Gateways with the oc get smartgateways command. For example, if you deployed STF with metricsEnabled: true and eventsEnabled: true, the following Smart Gateways are displayed in the output:
$ oc get smartgateways
NAME                                  AGE
stf-default-ceilometer-notification   14d
stf-default-collectd-notification     14d
stf-default-collectd-telemetry        14d
4. Retrieve the manifests for each Smart Gateway and store the contents in a temporary file, which you can modify later and use to create the new set of Smart Gateways:

truncate --size 0 /tmp/cloud1-smartgateways.yaml && \
for sg in $(oc get smartgateways -oname)
do
  echo "---" >> /tmp/cloud1-smartgateways.yaml
  oc get $sg -oyaml --export >> /tmp/cloud1-smartgateways.yaml
done

5. Modify the Smart Gateway manifests in the /tmp/cloud1-smartgateways.yaml file. Adjust the metadata.name and spec.amqpUrl fields to include the cloud identifier from your schema. For more information, see Section 4.5.1, "Planning AMQP address prefixes". To view example Smart Gateway manifests, see Section 4.5.2.1, "Example manifests".
6. Deploy your new Smart Gateways:
oc apply -f /tmp/cloud1-smartgateways.yaml
7. Verify that each Smart Gateway is running. This can take several minutes, depending on the number of Smart Gateways:
oc get po -l app=smart-gateway
4.5.2.1. Example manifests
IMPORTANT
The content in the following examples might be different from the file content in your deployment. Copy the manifests in your deployment.
Ensure that the name and amqpUrl parameters of each Smart Gateway match the names that you want to use for your clouds. For more information, see Section 4.5.1, "Planning AMQP address prefixes".
NOTE
Your output might have some additional metadata parameters that you can remove from the manifests that you load into OCP.
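For reference, the server-generated fields that typically appear in exported manifests, and that are safe to strip before re-applying, look like the following sketch (an illustrative, non-exhaustive list):

metadata:
  creationTimestamp: ...   # server-generated; remove
  generation: ...          # server-generated; remove
  resourceVersion: ...     # server-generated; remove
  selfLink: ...            # server-generated; remove
  uid: ...                 # server-generated; remove
status: ...                # the entire status block is server-generated; remove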
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-ceilometer-notification-cloud1 1
spec:
  amqpDataSource: ceilometer
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/anycast/ceilometer/cloud1-event.sample 2
  debug: false
  elasticPass: fkzfhghw
  elasticUrl: https://elasticsearch-es-http.service-telemetry.svc.cluster.local:9200
  elasticUser: elastic
  resetIndex: false
  serviceType: events
  size: 1
  tlsCaCert: /config/certs/ca.crt
  tlsClientCert: /config/certs/tls.crt
  tlsClientKey: /config/certs/tls.key
  tlsServerName: elasticsearch-es-http.service-telemetry.svc.cluster.local
  useBasicAuth: true
  useTls: true
---
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-collectd-notification-cloud1 3
spec:
  amqpDataSource: collectd
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/cloud1-notify 4
  debug: false
  elasticPass: fkzfhghw
  elasticUrl: https://elasticsearch-es-http.service-telemetry.svc.cluster.local:9200
  elasticUser: elastic
  resetIndex: false
  serviceType: events
  size: 1
  tlsCaCert: /config/certs/ca.crt
  tlsClientCert: /config/certs/tls.crt
  tlsClientKey: /config/certs/tls.key
  tlsServerName: elasticsearch-es-http.service-telemetry.svc.cluster.local
  useBasicAuth: true
  useTls: true
---
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-collectd-telemetry-cloud1 5
spec:
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/cloud1-telemetry 6
  debug: false
  prefetch: 15000
  serviceType: metrics
  size: 1
  useTimestamp: true
1. Name for Ceilometer notifications for cloud1.
2. AMQP address for Ceilometer notifications for cloud1.
3. Name for collectd notifications for cloud1.
4. AMQP address for collectd notifications for cloud1.
5. Name for collectd telemetry for cloud1.
6. AMQP address for collectd telemetry for cloud1.
4.5.3. Creating the OpenStack environment file
To label traffic according to the cloud of origin, you must create a configuration with cloud-specific instance names. Create an stf-connectors.yaml file and adjust the values of CeilometerQdrEventsConfig and CollectdAmqpInstances to match the AMQP address prefix scheme. For more information, see Section 4.5.1, "Planning AMQP address prefixes".
WARNING
Remove the enable-stf.yaml and ceilometer-write-qdr.yaml environment file references from your overcloud deployment. This configuration is redundant and results in duplicate information being sent from each cloud node.
Procedure
1. Create the stf-connectors.yaml file and modify it to match the AMQP address that you want for this cloud deployment:
resource_registry:
  OS::TripleO::Services::Collectd: /usr/share/openstack-tripleo-heat-templates/deployment/metrics/collectd-container-puppet.yaml
  OS::TripleO::Services::MetricsQdr: /usr/share/openstack-tripleo-heat-templates/deployment/metrics/qdr-container-puppet.yaml
  OS::TripleO::Services::CeilometerAgentCentral: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-central-container-puppet.yaml
  OS::TripleO::Services::CeilometerAgentNotification: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-notification-container-puppet.yaml
  OS::TripleO::Services::CeilometerAgentIpmi: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-ipmi-container-puppet.yaml
  OS::TripleO::Services::ComputeCeilometerAgent: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-compute-container-puppet.yaml
  OS::TripleO::Services::Redis: /usr/share/openstack-tripleo-heat-templates/deployment/database/redis-pacemaker-puppet.yaml

parameter_defaults:
  EnableSTF: true

  EventPipelinePublishers: []
  CeilometerEnablePanko: false
  CeilometerQdrPublishEvents: true
  CeilometerQdrEventsConfig:
    driver: amqp
    topic: cloud1-event 1

  CollectdConnectionType: amqp1
  CollectdAmqpInterval: 5
  CollectdDefaultPollingInterval: 5

  CollectdAmqpInstances:
    cloud1-notify: 2
      notify: true
      format: JSON
      presettle: false
    cloud1-telemetry: 3
      format: JSON
      presettle: true

  MetricsQdrAddresses:
  - prefix: collectd
    distribution: multicast
  - prefix: anycast/ceilometer
    distribution: multicast

  MetricsQdrSSLProfiles:
  - name: sslProfile

  MetricsQdrConnectors:
  - host: stf-default-interconnect-5671-service-telemetry.apps.infra.watch 4
    port: 443
    role: edge
    verifyHostname: false
    sslProfile: sslProfile
1. Define the topic for Ceilometer events. This value is the address format of anycast/ceilometer/cloud1-event.sample. Ensure that the naming convention in the stf-connectors.yaml file aligns with the spec.amqpUrl field in the Smart Gateway configuration; for example, configure the CeilometerQdrEventsConfig.topic field to a value of cloud1-event.
2. Define the topic for collectd events. This value is the format of collectd/cloud1-notify.
3. Define the topic for collectd metrics. This value is the format of collectd/cloud1-telemetry.
4. Adjust the MetricsQdrConnectors host to the address of the STF route.
2. Save the file in a directory for custom environment files, for example /home/stack/custom_templates.
3. Source the authentication file:

[stack@undercloud-0 ~]$ source stackrc

(undercloud) [stack@undercloud-0 ~]$

4. Include the stf-connectors.yaml file in the overcloud deployment command, along with any other environment files relevant to your environment:

(undercloud) [stack@undercloud-0 ~]$ openstack overcloud deploy --templates /usr/share/openstack-tripleo-heat-templates -e /home/stack/custom_templates/stf-connectors.yaml

Additional resources
For information about validating the deployment, see Section 3.3.3, "Validating client-side installation".
4.5.4. Querying metrics data from multiple clouds
Data stored in Prometheus has a service label attached according to the Smart Gateway it was scraped from. You can use this label to query data from a specific cloud.
To query data from a specific cloud, use a Prometheus PromQL query that matches the associated service label, for example: collectd_uptime{service="stf-default-collectd-telemetry-cloud1-smartgateway"}.
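The following PromQL sketches extend that example; the exact service label values depend on the Smart Gateway names in your deployment, and the cloud2 gateway name is hypothetical:

# Uptime samples from cloud1 only
collectd_uptime{service="stf-default-collectd-telemetry-cloud1-smartgateway"}

# The same metric scoped to a hypothetical cloud2 Smart Gateway
collectd_uptime{service="stf-default-collectd-telemetry-cloud2-smartgateway"}

# Compare all clouds at once by grouping on the service label
sum by (service) (collectd_uptime)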
4.6. EPHEMERAL STORAGE
Use ephemeral storage to run Service Telemetry Framework (STF) without persistently storing data in your Red Hat OpenShift Container Platform (OCP) cluster. Ephemeral storage is not recommended in a production environment, because the data is volatile even when the platform operates correctly and as designed. For example, restarting a pod or rescheduling the workload to another node results in the loss of any local data written since the pod started.
If you enable ephemeral storage in STF, the Service Telemetry Operator does not add the relevant storage sections to the manifests of the data storage components.
4.6.1. Configuring ephemeral storage
To configure STF for ephemeral storage, add storageEphemeralEnabled: true to the ServiceTelemetry object in OCP. You can add storageEphemeralEnabled: true at installation time or, if you already deployed STF, complete the following steps:
Procedure
1. Log in to Red Hat OpenShift Container Platform.
2. Change to the service-telemetry namespace:
3. Edit the ServiceTelemetry object:
4. Add the storageEphemeralEnabled: true parameter to the spec section:
5. Save your changes and close the object.
oc project service-telemetry
$ oc edit ServiceTelemetry stf-default
spec:
  eventsEnabled: true
  metricsEnabled: true
  storageEphemeralEnabled: true
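As with high availability, you can apply this change without opening an editor by patching the object; a sketch, assuming the default stf-default object name used in this guide:

# Merge-patch the ServiceTelemetry object to enable ephemeral storage
oc patch servicetelemetry stf-default --type merge --patch '{"spec": {"storageEphemeralEnabled": true}}'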
APPENDIX A. COLLECTD PLUG-INS
This section contains a complete list of collectd plug-ins and configurations for Red Hat OpenStack Platform 16.0.
collectd-aggregation
  collectd::plugin::aggregation::aggregators
  collectd::plugin::aggregation::interval
collectd-amqp1
collectd-apache
  collectd::plugin::apache::instances (ex: localhost => url => http://localhost/mod_status?auto)
  collectd::plugin::apache::interval
collectd-apcups
collectd-battery
  collectd::plugin::battery::values_percentage
  collectd::plugin::battery::report_degraded
  collectd::plugin::battery::query_state_fs
  collectd::plugin::battery::interval
collectd-ceph
  collectd::plugin::ceph::daemons
  collectd::plugin::ceph::longrunavglatency
  collectd::plugin::ceph::convertspecialmetrictypes
collectd-cgroups
  collectd::plugin::cgroups::ignore_selected
  collectd::plugin::cgroups::interval
collectd-conntrack
  None
collectd-contextswitch
  collectd::plugin::contextswitch::interval
collectd-cpu
  collectd::plugin::cpu::reportbystate
  collectd::plugin::cpu::reportbycpu
  collectd::plugin::cpu::valuespercentage
  collectd::plugin::cpu::reportnumcpu
  collectd::plugin::cpu::reportgueststate
  collectd::plugin::cpu::subtractgueststate
  collectd::plugin::cpu::interval
collectd-cpufreq
  None
collectd-cpusleep
collectd-csv
  collectd::plugin::csv::datadir
  collectd::plugin::csv::storerates
  collectd::plugin::csv::interval
collectd-df
  collectd::plugin::df::devices
  collectd::plugin::df::fstypes
  collectd::plugin::df::ignoreselected
  collectd::plugin::df::mountpoints
  collectd::plugin::df::reportbydevice
  collectd::plugin::df::reportinodes
  collectd::plugin::df::reportreserved
  collectd::plugin::df::valuesabsolute
  collectd::plugin::df::valuespercentage
  collectd::plugin::df::interval
collectd-disk
  collectd::plugin::disk::disks
  collectd::plugin::disk::ignoreselected
  collectd::plugin::disk::udevnameattr
  collectd::plugin::disk::interval
collectd-entropy
  collectd::plugin::entropy::interval
collectd-ethstat
  collectd::plugin::ethstat::interfaces
  collectd::plugin::ethstat::maps
  collectd::plugin::ethstat::mappedonly
  collectd::plugin::ethstat::interval
collectd-exec
  collectd::plugin::exec::commands
  collectd::plugin::exec::commands_defaults
  collectd::plugin::exec::globals
  collectd::plugin::exec::interval
collectd-fhcount
  collectd::plugin::fhcount::valuesabsolute
  collectd::plugin::fhcount::valuespercentage
  collectd::plugin::fhcount::interval
collectd-filecount
  collectd::plugin::filecount::directories
  collectd::plugin::filecount::interval
collectd-fscache
  None
collectd-hddtemp
  collectd::plugin::hddtemp::host
  collectd::plugin::hddtemp::port
  collectd::plugin::hddtemp::interval
collectd-hugepages
  collectd::plugin::hugepages::report_per_node_hp
  collectd::plugin::hugepages::report_root_hp
  collectd::plugin::hugepages::values_pages
  collectd::plugin::hugepages::values_bytes
  collectd::plugin::hugepages::values_percentage
  collectd::plugin::hugepages::interval
collectd-intel_rdt
collectd-interface
  collectd::plugin::interface::interfaces
  collectd::plugin::interface::ignoreselected
  collectd::plugin::interface::reportinactive
  collectd::plugin::interface::interval
collectd-ipc
  None
collectd-ipmi
  collectd::plugin::ipmi::ignore_selected
  collectd::plugin::ipmi::notify_sensor_add
  collectd::plugin::ipmi::notify_sensor_remove
  collectd::plugin::ipmi::notify_sensor_not_present
  collectd::plugin::ipmi::sensors
  collectd::plugin::ipmi::interval
collectd-irq
  collectd::plugin::irq::irqs
  collectd::plugin::irq::ignoreselected
  collectd::plugin::irq::interval
collectd-load
  collectd::plugin::load::report_relative
  collectd::plugin::load::interval
collectd-logfile
  collectd::plugin::logfile::log_level
  collectd::plugin::logfile::log_file
  collectd::plugin::logfile::log_timestamp
  collectd::plugin::logfile::print_severity
  collectd::plugin::logfile::interval
collectd-madwifi
collectd-mbmon
collectd-md
collectd-memcached
  collectd::plugin::memcached::instances
  collectd::plugin::memcached::interval
collectd-memory
  collectd::plugin::memory::valuesabsolute
  collectd::plugin::memory::valuespercentage
  collectd::plugin::memory::interval
collectd-multimeter
collectd-mysql
  collectd::plugin::mysql::interval
collectd-netlink
  collectd::plugin::netlink::interfaces
  collectd::plugin::netlink::verboseinterfaces
  collectd::plugin::netlink::qdiscs
  collectd::plugin::netlink::classes
  collectd::plugin::netlink::filters
  collectd::plugin::netlink::ignoreselected
  collectd::plugin::netlink::interval
collectd-network
  collectd::plugin::network::timetolive
  collectd::plugin::network::maxpacketsize
  collectd::plugin::network::forward
  collectd::plugin::network::reportstats
  collectd::plugin::network::listeners
  collectd::plugin::network::servers
  collectd::plugin::network::interval
collectd-nfs
  collectd::plugin::nfs::interval
collectd-ntpd
  collectd::plugin::ntpd::host
  collectd::plugin::ntpd::port
  collectd::plugin::ntpd::reverselookups
  collectd::plugin::ntpd::includeunitid
  collectd::plugin::ntpd::interval
collectd-numa
  None
collectd-olsrd
collectd-openvpn
  collectd::plugin::openvpn::statusfile
  collectd::plugin::openvpn::improvednamingschema
  collectd::plugin::openvpn::collectcompression
  collectd::plugin::openvpn::collectindividualusers
  collectd::plugin::openvpn::collectusercount
  collectd::plugin::openvpn::interval
collectd-ovs_events
  collectd::plugin::ovs_events::address
  collectd::plugin::ovs_events::dispatch
  collectd::plugin::ovs_events::interfaces
  collectd::plugin::ovs_events::send_notification
  collectd::plugin::ovs_events::$port
  collectd::plugin::ovs_events::socket
collectd-ovs_stats
  collectd::plugin::ovs_stats::address
  collectd::plugin::ovs_stats::bridges
  collectd::plugin::ovs_stats::port
  collectd::plugin::ovs_stats::socket
collectd-ping
  collectd::plugin::ping::hosts
  collectd::plugin::ping::timeout
  collectd::plugin::ping::ttl
  collectd::plugin::ping::source_address
  collectd::plugin::ping::device
  collectd::plugin::ping::max_missed
  collectd::plugin::ping::size
  collectd::plugin::ping::interval
collectd-powerdns
  collectd::plugin::powerdns::interval
  collectd::plugin::powerdns::servers
  collectd::plugin::powerdns::recursors
  collectd::plugin::powerdns::local_socket
collectd-processes
  collectd::plugin::processes::processes
  collectd::plugin::processes::process_matches
  collectd::plugin::processes::collect_context_switch
  collectd::plugin::processes::collect_file_descriptor
  collectd::plugin::processes::collect_memory_maps
collectd-protocols
  collectd::plugin::protocols::ignoreselected
  collectd::plugin::protocols::values
collectd-python
collectd-serial
collectd-smart
  collectd::plugin::smart::disks
  collectd::plugin::smart::ignoreselected
  collectd::plugin::smart::interval
collectd-snmp_agent
collectd-statsd
  collectd::plugin::statsd::host
  collectd::plugin::statsd::port
  collectd::plugin::statsd::deletecounters
  collectd::plugin::statsd::deletetimers
  collectd::plugin::statsd::deletegauges
  collectd::plugin::statsd::deletesets
  collectd::plugin::statsd::countersum
  collectd::plugin::statsd::timerpercentile
  collectd::plugin::statsd::timerlower
  collectd::plugin::statsd::timerupper
  collectd::plugin::statsd::timersum
  collectd::plugin::statsd::timercount
  collectd::plugin::statsd::interval
collectd-swap
  collectd::plugin::swap::reportbydevice
  collectd::plugin::swap::reportbytes
  collectd::plugin::swap::valuesabsolute
  collectd::plugin::swap::valuespercentage
  collectd::plugin::swap::reportio
  collectd::plugin::swap::interval
collectd-syslog
  collectd::plugin::syslog::log_level
  collectd::plugin::syslog::notify_level
  collectd::plugin::syslog::interval
collectd-table
  collectd::plugin::table::tables
  collectd::plugin::table::interval
collectd-tail
  collectd::plugin::tail::files
  collectd::plugin::tail::interval
collectd-tail_csv
  collectd::plugin::tail_csv::metrics
  collectd::plugin::tail_csv::files
collectd-tcpconns
  collectd::plugin::tcpconns::localports
  collectd::plugin::tcpconns::remoteports
  collectd::plugin::tcpconns::listening
  collectd::plugin::tcpconns::allportssummary
  collectd::plugin::tcpconns::interval
collectd-ted
collectd-thermal
  collectd::plugin::thermal::devices
  collectd::plugin::thermal::ignoreselected
  collectd::plugin::thermal::interval
collectd-threshold
  collectd::plugin::threshold::types
  collectd::plugin::threshold::plugins
  collectd::plugin::threshold::hosts
  collectd::plugin::threshold::interval
collectd-turbostat
  collectd::plugin::turbostat::core_c_states
  collectd::plugin::turbostat::package_c_states
  collectd::plugin::turbostat::system_management_interrupt
  collectd::plugin::turbostat::digital_temperature_sensor
  collectd::plugin::turbostat::tcc_activation_temp
  collectd::plugin::turbostat::running_average_power_limit
  collectd::plugin::turbostat::logical_core_names
collectd-unixsock
collectd-uptime
  collectd::plugin::uptime::interval
collectd-users
  collectd::plugin::users::interval
collectd-uuid
  collectd::plugin::uuid::uuid_file
  collectd::plugin::uuid::interval
collectd-virt
  collectd::plugin::virt::connection
  collectd::plugin::virt::refresh_interval
  collectd::plugin::virt::domain
  collectd::plugin::virt::block_device
  collectd::plugin::virt::interface_device
  collectd::plugin::virt::ignore_selected
  collectd::plugin::virt::hostname_format
  collectd::plugin::virt::interface_format
  collectd::plugin::virt::extra_stats
  collectd::plugin::virt::interval
collectd-vmem
  collectd::plugin::vmem::verbose
  collectd::plugin::vmem::interval
collectd-vserver
collectd-wireless
collectd-write_graphite
  collectd::plugin::write_graphite::carbons
  collectd::plugin::write_graphite::carbon_defaults
  collectd::plugin::write_graphite::globals
collectd-write_kafka
  collectd::plugin::write_kafka::kafka_host
  collectd::plugin::write_kafka::kafka_port
  collectd::plugin::write_kafka::kafka_hosts
  collectd::plugin::write_kafka::topics
collectd-write_log
  collectd::plugin::write_log::format
collectd-zfs_arc
  None
vm.max_map_count: The vm.max_map_count value is set by default in Red Hat OpenShift Container Platform. For more information about how to edit the value of vm.max_map_count, see Section 2.2.3, "Node tuning operator".
2.2.1. Persistent volumes
STF uses persistent storage in OCP to instantiate the volumes dynamically so that Prometheus and ElasticSearch can store metrics and events.
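To confirm that the volumes were provisioned after you deploy STF, you can list the persistent volume claims; a quick check, assuming the service-telemetry namespace used in this guide:

# List the PVCs that the data storage components requested
oc get persistentvolumeclaims -n service-telemetry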
Additional resources
For more information about configuring persistent storage for OCP, see Understanding persistent storage.
2.2.1.1. Using ephemeral storage
WARNING
You can use ephemeral storage with STF. However, if you use ephemeral storage, you might experience data loss if a pod is restarted, updated, or rescheduled onto another node. Use ephemeral storage only for development or testing, and not in production environments.
Procedure
To enable ephemeral storage for STF, set storageEphemeralEnabled: true in your ServiceTelemetry manifest.
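A minimal ServiceTelemetry manifest with ephemeral storage enabled might look like the following sketch, using the default object name and namespace from this guide:

apiVersion: infra.watch/v1alpha1
kind: ServiceTelemetry
metadata:
  name: stf-default
  namespace: service-telemetry
spec:
  metricsEnabled: true
  storageEphemeralEnabled: true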
Additional resources
For more information about enabling ephemeral storage for STF, see Section 4.6.1, "Configuring ephemeral storage".
2.2.2. Resource allocation
To enable the scheduling of pods within the OCP infrastructure, you need resources for the components that are running. If you do not allocate enough resources, pods remain in a Pending state because they cannot be scheduled.
The amount of resources that you require to run STF depends on your environment and the number of nodes and clouds that you want to monitor.
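To spot pods that cannot be scheduled because of insufficient resources, you can filter on the Pending phase; a sketch, assuming the service-telemetry namespace:

# Show only pods that are stuck in the Pending phase
oc get pods -n service-telemetry --field-selector=status.phase=Pending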
Additional resources
For recommendations about sizing for metrics collection, see https://access.redhat.com/articles/4907241.
For information about sizing requirements for ElasticSearch, see https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-managing-compute-resources.html.
2.2.3. Node tuning operator
STF uses ElasticSearch to store events, which requires a larger than normal vm.max_map_count. The vm.max_map_count value is set by default in Red Hat OpenShift Container Platform.
If you want to edit the value of vm.max_map_count, you cannot apply node tuning manually using the sysctl command, because Red Hat OpenShift Container Platform manages nodes directly. To configure values and apply them to the infrastructure, you must use the node tuning operator. For more information, see Using the Node Tuning Operator.
In an OCP deployment, the default node tuning operator specification provides the required profiles for ElasticSearch workloads or pods scheduled on nodes. To view the default cluster node tuning specification, run the following command:

oc get Tuned/default -o yaml -n openshift-cluster-node-tuning-operator

The output of the default specification is documented at Default profiles set on a cluster. The assignment of profiles is managed in the recommend section, where profiles are applied to a node when certain conditions are met. When scheduling ElasticSearch to a node in STF, one of the following profiles is applied:
openshift-control-plane-es
openshift-node-es
When scheduling an ElasticSearch pod, there must be a label present that matches tuned.openshift.io/elasticsearch. If the label is present, one of the two profiles is assigned to the pod. No action is required by the administrator if you use the recommended Operator for ElasticSearch. If you use a custom-deployed ElasticSearch with STF, ensure that you add the tuned.openshift.io/elasticsearch label to all scheduled pods.
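For a custom-deployed ElasticSearch, the label can be added in the pod template metadata; only the label line below is significant, the rest is an illustrative sketch:

metadata:
  labels:
    # the node tuning operator matches on the presence of this label
    tuned.openshift.io/elasticsearch: ""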
Additional resources
For more information about virtual memory usage by ElasticSearch, see https://www.elastic.co/guide/en/elasticsearch/reference/current/vm-max-map-count.html.
For more information about how the profiles are applied to nodes, see Custom tuning specification.
2.3. DEPLOYING STF TO THE OCP ENVIRONMENT
You can deploy STF to the OCP environment in one of two ways:
Deploy STF and store events with ElasticSearch. For more information, see Section 2.3.1, "Deploying STF to the OCP environment with ElasticSearch".
Deploy STF without ElasticSearch and disable events support. For more information, see Section 2.3.2, "Deploying STF to the OCP environment without ElasticSearch".
2.3.1. Deploying STF to the OCP environment with ElasticSearch
Complete the following tasks:
1. Section 2.3.3, "Creating a namespace"
2. Section 2.3.4, "Creating an OperatorGroup"
3. Section 2.3.5, "Enabling the OperatorHub.io Community Catalog Source"
4. Section 2.3.6, "Enabling Red Hat STF Operator Source"
5. Section 2.3.7, "Subscribing to the AMQ Certificate Manager Operator"
6. Section 2.3.8, "Subscribing to the Elastic Cloud on Kubernetes Operator"
7. Section 2.3.9, "Subscribing to the Service Telemetry Operator"
8. Section 2.3.10, "Creating a ServiceTelemetry object in OCP"
2.3.2. Deploying STF to the OCP environment without ElasticSearch
Complete the following tasks:
1. Section 2.3.3, "Creating a namespace"
2. Section 2.3.4, "Creating an OperatorGroup"
3. Section 2.3.6, "Enabling Red Hat STF Operator Source"
4. Section 2.3.7, "Subscribing to the AMQ Certificate Manager Operator"
5. Section 2.3.9, "Subscribing to the Service Telemetry Operator"
6. Section 2.3.10, "Creating a ServiceTelemetry object in OCP"
2.3.3. Creating a namespace
Create a namespace to hold the STF components. The service-telemetry namespace is used throughout the documentation.
Procedure
Enter the following command:
2.3.4. Creating an OperatorGroup
Create an OperatorGroup in the namespace so that you can schedule the Operator pods.
Procedure
Enter the following command:
oc new-project service-telemetry
oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: service-telemetry-operator-group
  namespace: service-telemetry
spec:
  targetNamespaces:
  - service-telemetry
EOF
Additional resources
For more information, see OperatorGroups.
2.3.5. Enabling the OperatorHub.io Community Catalog Source
Before you install ElasticSearch, you must have access to the resources on the OperatorHub.io Community Catalog Source.
Procedure
Enter the following command:
2.3.6. Enabling Red Hat STF Operator Source
Before you deploy STF on Red Hat OpenShift Container Platform, you must enable the operator source.
Procedure
1. Install an OperatorSource that contains the Service Telemetry Operator and the Smart Gateway Operator:
oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: operatorhubio-operators
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/operator-framework/upstream-community-operators:latest
  displayName: OperatorHub.io Operators
  publisher: OperatorHub.io
EOF
oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1
kind: OperatorSource
metadata:
  labels:
    opsrc-provider: redhat-operators-stf
  name: redhat-operators-stf
  namespace: openshift-marketplace
spec:
  authorizationToken: {}
  displayName: Red Hat STF Operators
  endpoint: https://quay.io/cnr
  publisher: Red Hat
  registryNamespace: redhat-operators-stf
  type: appregistry
EOF
2. To validate the creation of your OperatorSource, use the oc get operatorsources command. A successful import results in the MESSAGE field returning a result of The object has been successfully reconciled:
3. To validate that the Operators are available from the catalog, use the oc get packagemanifest command:
2.3.7. Subscribing to the AMQ Certificate Manager Operator
You must subscribe to the AMQ Certificate Manager Operator before you deploy the other STF components, because the AMQ Certificate Manager Operator is globally scoped and is not compatible with the dependency management of Operator Lifecycle Manager when used with other namespace-scoped operators.
Procedure
1. Subscribe to the AMQ Certificate Manager Operator, create the subscription, and validate the AMQ7 Certificate Manager:
NOTE
The AMQ Certificate Manager is installed globally for all namespaces, so the namespace value provided is openshift-operators. You might not see your amq7-cert-manager.v1.0.0 ClusterServiceVersion in the service-telemetry namespace for a few minutes until the processing executes against the namespace.
$ oc get -nopenshift-marketplace operatorsource redhat-operators-stf
NAME                   TYPE          ENDPOINT              REGISTRY               DISPLAYNAME             PUBLISHER   STATUS      MESSAGE
redhat-operators-stf   appregistry   https://quay.io/cnr   redhat-operators-stf   Red Hat STF Operators   Red Hat     Succeeded   The object has been successfully reconciled
$ oc get packagemanifests | grep "Red Hat STF"

smartgateway-operator       Red Hat STF Operators   2m50s
servicetelemetry-operator   Red Hat STF Operators   2m50s
oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: amq7-cert-manager
  namespace: openshift-operators
spec:
  channel: alpha
  installPlanApproval: Automatic
  name: amq7-cert-manager
  source: redhat-operators
  sourceNamespace: openshift-marketplace
EOF
2. To validate your ClusterServiceVersion, use the oc get csv command. Ensure that amq7-cert-manager.v1.0.0 has a phase Succeeded:
2.3.8. Subscribing to the Elastic Cloud on Kubernetes Operator
Before you install the Service Telemetry Operator, and if you plan to store events in ElasticSearch, you must enable the Elastic Cloud on Kubernetes Operator.
Procedure
1. Apply the following manifest to your OCP environment to enable the Elastic Cloud on Kubernetes Operator:
2. To verify that the ClusterServiceVersion for Elastic Cloud on Kubernetes succeeded, enter the oc get csv command:
2.3.9. Subscribing to the Service Telemetry Operator
To instantiate an STF instance, create the ServiceTelemetry object to allow the Service Telemetry Operator to create the environment.
Procedure
1. To create the Service Telemetry Operator subscription, enter the oc apply -f command:
$ oc get --namespace openshift-operators csv
NAME                       DISPLAY                                         VERSION   REPLACES   PHASE
amq7-cert-manager.v1.0.0   Red Hat Integration - AMQ Certificate Manager   1.0.0                Succeeded
oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: elastic-cloud-eck
  namespace: service-telemetry
spec:
  channel: stable
  installPlanApproval: Automatic
  name: elastic-cloud-eck
  source: operatorhubio-operators
  sourceNamespace: openshift-marketplace
EOF
$ oc get csv
NAME                       DISPLAY                       VERSION   REPLACES                   PHASE
elastic-cloud-eck.v1.1.0   Elastic Cloud on Kubernetes   1.1.0     elastic-cloud-eck.v1.0.1   Succeeded
oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: servicetelemetry-operator
  namespace: service-telemetry
spec:
  channel: stable
  installPlanApproval: Automatic
  name: servicetelemetry-operator
  source: redhat-operators-stf
  sourceNamespace: openshift-marketplace
EOF
2. To validate the Service Telemetry Operator and the dependent operators, enter the following command:

$ oc get csv --namespace service-telemetry
NAME                                DISPLAY                                         VERSION   REPLACES                            PHASE
amq7-cert-manager.v1.0.0            Red Hat Integration - AMQ Certificate Manager   1.0.0                                         Succeeded
amq7-interconnect-operator.v1.2.0   Red Hat Integration - AMQ Interconnect          1.2.0                                         Succeeded
elastic-cloud-eck.v1.1.0            Elastic Cloud on Kubernetes                     1.1.0     elastic-cloud-eck.v1.0.1            Succeeded
prometheusoperator.0.37.0           Prometheus Operator                             0.37.0    prometheusoperator.0.32.0           Succeeded
service-telemetry-operator.v1.0.2   Service Telemetry Operator                      1.0.2     service-telemetry-operator.v1.0.1   Succeeded
smart-gateway-operator.v1.0.1       Smart Gateway Operator                          1.0.1     smart-gateway-operator.v1.0.0       Succeeded
2.3.10. Creating a ServiceTelemetry object in OCP
To deploy the Service Telemetry Framework, you must create an instance of ServiceTelemetry in OCP. By default, eventsEnabled is set to false. If you do not want to store events in ElasticSearch, ensure that eventsEnabled is set to false. For more information, see Section 2.3.2, "Deploying STF to the OCP environment without ElasticSearch".
The following core parameters are available for a ServiceTelemetry manifest:
Table 2.1: Core parameters for a ServiceTelemetry manifest
Parameter | Description | Default Value
eventsEnabled | Enable events support in STF. Requires prerequisite steps to ensure ElasticSearch can be started. For more information, see Section 2.3.8, "Subscribing to the Elastic Cloud on Kubernetes Operator". | false
metricsEnabled | Enable metrics support in STF. | true
highAvailabilityEnabled | Enable high availability in STF. For more information, see Section 4.3, "High availability". | false
storageEphemeralEnabled | Enable ephemeral storage support in STF. For more information, see Section 4.6, "Ephemeral storage". | false
Procedure
1. To store events in ElasticSearch, set eventsEnabled to true during deployment:
2. To view the STF deployment logs in the Service Telemetry Operator, use the oc logs command:

PLAY RECAP *********************************************************************
localhost : ok=37 changed=0 unreachable=0 failed=0 skipped=1 rescued=0 ignored=0

3. View the pods and the status of each pod to determine that all workloads are operating nominally:
NOTE
If you set eventsEnabled: true, the notification Smart Gateways will Error and CrashLoopBackOff for a period of time before ElasticSearch starts.
oc apply -f - <<EOF
apiVersion: infra.watch/v1alpha1
kind: ServiceTelemetry
metadata:
  name: stf-default
  namespace: service-telemetry
spec:
  eventsEnabled: true
  metricsEnabled: true
EOF
oc logs $(oc get pod --selector=name=service-telemetry-operator -oname) -c ansible
$ oc get pods

NAME                                                              READY   STATUS    RESTARTS   AGE
alertmanager-stf-default-0                                        2/2     Running   0          26m
elastic-operator-645dc8b8ff-jwnzt                                 1/1     Running   0          88m
elasticsearch-es-default-0                                        1/1     Running   0          26m
interconnect-operator-6fd49d9fb9-4bl92                            1/1     Running   0          46m
prometheus-operator-bf7d97fb9-kwnlx                               1/1     Running   0          46m
prometheus-stf-default-0                                          3/3     Running   0          26m
service-telemetry-operator-54f4c99d9b-k7ll6                       2/2     Running   0          46m
smart-gateway-operator-7ff58bcf94-66rvx                           2/2     Running   0          46m
stf-default-ceilometer-notification-smartgateway-6675df547q4lbj   1/1     Running   0          26m
stf-default-collectd-notification-smartgateway-698c87fbb7-xj528   1/1     Running   0          26m
stf-default-collectd-telemetry-smartgateway-79c967c8f7-9hsqn      1/1     Running   0          26m
stf-default-interconnect-7458fd4d69-nqbfs                         1/1     Running   0          26m
2.4. REMOVING STF FROM THE OCP ENVIRONMENT
Remove STF from an OCP environment if you no longer require the STF functionality.
Complete the following tasks:
1. Section 2.4.1, "Deleting the namespace"
2. Section 2.4.2, "Removing the OperatorSource"
2.4.1. Deleting the namespace
To remove the operational resources for STF from OCP, delete the namespace.
Procedure
1. Run the oc delete command:
2. Verify that the resources have been deleted from the namespace:
2.4.2. Removing the OperatorSource
If you do not expect to install Service Telemetry Framework again, delete the OperatorSource. When you remove the OperatorSource, PackageManifests related to STF are removed from the Operator Lifecycle Manager catalog.
Procedure
1. Delete the OperatorSource:
oc delete project service-telemetry
$ oc get all
No resources found.
2. Verify that the STF PackageManifests are removed from the platform. If successful, the following command returns no result:
3. If you enabled the OperatorHub.io Community Catalog Source during the installation process, and you no longer need this catalog source, delete it:
Additional resources
For more information about the OperatorHub.io Community Catalog Source, see Section 2.3, "Deploying STF to the OCP environment".
$ oc delete --namespace=openshift-marketplace operatorsource redhat-operators-stf
operatorsource.operators.coreos.com "redhat-operators-stf" deleted
$ oc get packagemanifests | grep "Red Hat STF"

$ oc delete --namespace=openshift-marketplace catalogsource operatorhubio-operators
catalogsource.operators.coreos.com "operatorhubio-operators" deleted
CHAPTER 3. COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION
3.1. CONNECTING RED HAT OPENSTACK PLATFORM TO SERVICE TELEMETRY FRAMEWORK
To collect metrics, events, or both, and to send them to the Service Telemetry Framework (STF) storage domain, you must configure the Red Hat OpenStack Platform overcloud to enable data collection and transport.
To deploy data collection and transport to STF on Red Hat OpenStack Platform cloud nodes that employ routed L3 domains, such as distributed compute node (DCN) or spine-leaf, see Section 3.2, "Deploying to non-standard network topologies".
3.2. DEPLOYING TO NON-STANDARD NETWORK TOPOLOGIES
If your nodes are on a separate network from the default InternalApi network, you must make configuration adjustments so that AMQ Interconnect can transport data to the Service Telemetry Framework (STF) server instance. This scenario is typical in a spine-leaf or a DCN topology. For more information about DCN configuration, see the Spine Leaf Networking guide.
If you use STF with Red Hat OpenStack Platform 16.0 and plan to monitor your Ceph, Block, or Object storage nodes, you must make configuration changes that are similar to the configuration changes that you make to the spine-leaf and DCN network configuration. To monitor Ceph nodes, use the CephStorageExtraConfig parameter to define which network interface to load into the AMQ Interconnect and collectd configuration files:

CephStorageExtraConfig:
  tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('storage')}"
  tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('storage')}"
  tripleo::profile::base::ceilometer::agent::notification::notifier_host_addr: "%{hiera('storage')}"

Similarly, you must specify BlockStorageExtraConfig and ObjectStorageExtraConfig parameters if your environment uses Block and Object storage roles.
The deployment of a spine-leaf topology involves creating roles and networks, then assigning those networks to the available roles. When you configure data collection and transport for STF for a Red Hat OpenStack Platform deployment, the default network for roles is InternalApi. For Ceph, Block, and Object storage roles, the default network is Storage. Because a spine-leaf configuration can result in different networks being assigned to different Leaf groupings, and those names are typically unique, additional configuration is required in the parameter_defaults section of the Red Hat OpenStack Platform environment files.
Procedure
1. Document which networks are available for each of the Leaf roles. For examples of network name definitions, see Creating a network data file in the Spine Leaf Networking guide. For more information about the creation of the Leaf groupings (roles) and assignment of the networks to those groupings, see Creating a roles data file in the Spine Leaf Networking guide.
2. Add the following configuration example to the ExtraConfig section for each of the leaf roles. In this example, internal_api_subnet is the value defined in the name_lower parameter of your network definition (with _subnet appended to the name for Leaf 0), and is the network to which the ComputeLeaf0 leaf role is connected. In this case, the network identification of 0 corresponds to the Compute role for leaf 0, and represents a value that is different from the default internal API network name. For the ComputeLeaf0 leaf role, specify extra configuration to perform a hiera lookup to determine which network interface for a particular network to assign to the collectd AMQP host parameter. Perform the same configuration for the AMQ Interconnect listener address parameter:

ComputeLeaf0ExtraConfig:
  tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('internal_api_subnet')}"
  tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('internal_api_subnet')}"

Additional leaf roles typically replace _subnet with _leafN, where N represents a unique identifier for the leaf:

ComputeLeaf1ExtraConfig:
  tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('internal_api_leaf1')}"
  tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('internal_api_leaf1')}"

This example configuration is on a CephStorage leaf role:

CephStorageLeaf0ExtraConfig:
  tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('storage_subnet')}"
  tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('storage_subnet')}"
3.3. CONFIGURING RED HAT OPENSTACK PLATFORM OVERCLOUD FOR SERVICE TELEMETRY FRAMEWORK
To configure the Red Hat OpenStack Platform overcloud, you must configure the data collection applications and the data transport to STF, and deploy the overcloud.
To configure the Red Hat OpenStack Platform overcloud, complete the following tasks:
1. Section 3.3.1, "Retrieving the AMQ Interconnect route address"
2. Section 3.3.2, "Configuring the STF connection for the overcloud"
3. Section 3.3.3, "Validating client-side installation"
3.3.1. Retrieving the AMQ Interconnect route address
When you configure the Red Hat OpenStack Platform overcloud for STF, you must provide the AMQ Interconnect route address in the STF connection file.
Procedure
1. Log in to your Red Hat OpenShift Container Platform (OCP) environment.
2. In the service-telemetry project, retrieve the AMQ Interconnect route address:

$ oc get routes -ogo-template='{{ range .items }}{{printf "%s\n" .spec.host }}{{ end }}' | grep "\-5671"
stf-default-interconnect-5671-service-telemetry.apps.infra.watch
NOTE
If your STF installation differs from the documentation, ensure that you retrieve the correct AMQ Interconnect route address.
3.3.2. Configuring the STF connection for the overcloud
To configure the STF connection, you must create a file that contains the connection configuration of the AMQ Interconnect for the overcloud to the STF deployment. Enable the collection of events and storage of the events in STF, and deploy the overcloud.
Procedure
1. Log in to the Red Hat OpenStack Platform undercloud as the stack user.
2. Create a configuration file called stf-connectors.yaml in the /home/stack directory.
IMPORTANT
The Service Telemetry Operator simplifies the deployment of all data ingestion and data storage components for single cloud deployments. To share the data storage domain with multiple clouds, see Section 4.5, "Configuring multiple clouds".
3. In the stf-connectors.yaml file, configure the MetricsQdrConnectors address to connect the AMQ Interconnect on the overcloud to the STF deployment:

parameter_defaults:
  EventPipelinePublishers: []
  CeilometerQdrPublishEvents: true
  MetricsQdrConnectors:
  - host: stf-default-interconnect-5671-service-telemetry.apps.infra.watch
    port: 443
    role: edge
    sslProfile: sslProfile
    verifyHostname: false

Add the CeilometerQdrPublishEvents: true parameter to enable collection and transport of Ceilometer events to STF.
Replace the host parameter with the value of HOST/PORT that you retrieved in Section 3.3.1, "Retrieving the AMQ Interconnect route address".
4. Add the following files to your Red Hat OpenStack Platform director deployment to set up collectd and AMQ Interconnect:
the stf-connectors.yaml environment file
the enable-stf.yaml file that ensures that the environment is being used during the overcloud deployment
the ceilometer-write-qdr.yaml file that ensures that Ceilometer telemetry is sent to STF
5. Deploy the Red Hat OpenStack Platform overcloud:

openstack overcloud deploy <other arguments>
  --templates /usr/share/openstack-tripleo-heat-templates \
  --environment-file <other-environment-files> \
  --environment-file /usr/share/openstack-tripleo-heat-templates/environments/metrics/ceilometer-write-qdr.yaml \
  --environment-file /usr/share/openstack-tripleo-heat-templates/environments/enable-stf.yaml \
  --environment-file /home/stack/stf-connectors.yaml

3.3.3. Validating client-side installation
To validate data collection from the STF storage domain, query the data sources for delivered data. To validate individual nodes in the Red Hat OpenStack Platform deployment, connect to the console using SSH.
Procedure
1. Log in to an overcloud node, for example, controller-0.
2. Ensure that the metrics_qdr container is running on the node:

$ sudo podman container inspect --format '{{.State.Status}}' metrics_qdr
running

3. Return the internal network address on which AMQ Interconnect is running, for example, 172.17.1.44 listening on port 5666:

$ sudo podman exec -it metrics_qdr cat /etc/qpid-dispatch/qdrouterd.conf

listener {
    host: 172.17.1.44
    port: 5666
    authenticatePeer: no
    saslMechanisms: ANONYMOUS
}

4. Return a list of connections to the local AMQ Interconnect:

$ sudo podman exec -it metrics_qdr qdstat --bus=172.17.1.44:5666 --connections

Connections
  id   host                                                                   container                                                                                              role    dir  security                            authentication  tenant
  ============================================================================================================================================================
  1    stf-default-interconnect-5671-service-telemetry.apps.infra.watch:443   stf-default-interconnect-7458fd4d69-bgzfb                                                              edge    out  TLSv1.2(DHE-RSA-AES256-GCM-SHA384)  anonymous-user
  12   172.17.1.44:60290                                                      openstack.org/om/container/controller-0/ceilometer-agent-notification/25/5c02cee550f143ec9ea030db5cccba14  normal  in   no-security                    no-auth
  16   172.17.1.44:36408                                                      metrics                                                                                                normal  in   no-security                         anonymous-user
  899  172.17.1.44:39500                                                      10a2e99d-1b8a-4329-b48c-4335e5f75c84                                                                   normal  in   no-security                         no-auth
There are four connections:
Outbound connection to STF
Inbound connection from collectd
Inbound connection from ceilometer
Inbound connection from our qdstat client
The outbound STF connection is provided to the MetricsQdrConnectors host parameter and is the route for the STF storage domain. The other hosts are internal network addresses of the client connections to this AMQ Interconnect.
5. To ensure that messages are being delivered, list the links, and view the _edge address in the deliv column for delivery of messages:

$ sudo podman exec -it metrics_qdr qdstat --bus=172.17.1.44:5666 --links
Router Links
  type      dir  conn id  id    peer  class   addr                  phs  cap  pri  undel  unsett  deliv    presett  psdrop  acc  rej  rel  mod  delay  rate
  ===========================================================================================================================================================
  endpoint  out  1        5           local   _edge                      250  0    0      0       2979926  2979924  0       0    0    2    0    0      0
  endpoint  in   1        6                                              250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  in   1        7                                              250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  out  1        8                                              250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  in   1        9                                              250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  out  1        10                                             250  0    0      0       911      911      0       0    0    0    0    911    0
  endpoint  in   1        11                                             250  0    0      0       0        911      0       0    0    0    0    0      0
  endpoint  out  12       32          local   temp.lSY6Mcicol4J2Kp       250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  in   16       41                                             250  0    0      0       2979924  2979924  0       0    0    0    0    0      0
  endpoint  in   912      1834        mobile  $management           0    250  0    0      0       1        0        0       1    0    0    0    0      0
  endpoint  out  912      1835        local   temp.9Ok2resI9tmt+CT       250  0    0      0       0        0        0       0    0    0    0    0      0
6. To list the addresses from Red Hat OpenStack Platform nodes to STF, connect to OCP to get the AMQ Interconnect pod name and list the connections. List the available AMQ Interconnect pods:

$ oc get pods -l application=stf-default-interconnect

NAME                                        READY   STATUS    RESTARTS   AGE
stf-default-interconnect-7458fd4d69-bgzfb   1/1     Running   0          6d21h

7. Connect to the pod and run the qdstat --connections command to list the known connections:

$ oc exec -it stf-default-interconnect-7458fd4d69-bgzfb -- qdstat --connections

2020-04-21 18:25:47.243852 UTC
stf-default-interconnect-7458fd4d69-bgzfb

Connections
  id  host               container                                                               role    dir  security                                authentication  tenant  last dlv      uptime
  ======================================================================================================================================================================================
  1   10.129.22.1:43062  rcv[stf-default-collectd-telemetry-smartgateway-79c967c8f7-kq4qv]       normal  in   no-security                             anonymous-user          000:00:00:00  006:21:50:25
  2   10.130.0.52:55754  rcv[stf-default-ceilometer-notification-smartgateway-6675df547mbjk5]    normal  in   no-security                             anonymous-user          000:21:25:57  006:21:49:36
  3   10.130.0.51:43110  rcv[stf-default-collectd-notification-smartgateway-698c87fbb7-f28v6]    normal  in   no-security                             anonymous-user          000:21:36:53  006:21:49:09
  22  10.128.0.1:51948   Router.ceph-0.redhat.local                                              edge    in   TLSv1/SSLv3(DHE-RSA-AES256-GCM-SHA384)  anonymous-user          000:00:00:03  000:22:08:43
  23  10.128.0.1:51950   Router.compute-0.redhat.local                                           edge    in   TLSv1/SSLv3(DHE-RSA-AES256-GCM-SHA384)  anonymous-user          000:00:00:03  000:22:08:43
  24  10.128.0.1:52082   Router.controller-0.redhat.local                                        edge    in   TLSv1/SSLv3(DHE-RSA-AES256-GCM-SHA384)  anonymous-user          000:00:00:00  000:22:08:34
  27  127.0.0.1:42202    c2f541c1-4c97-4b37-a189-a396c08fb079                                    normal  in   no-security                             no-auth                 000:00:00:00  000:00:00:00

In this example, there are three edge connections from the Red Hat OpenStack Platform nodes, with connection id 22, 23, and 24.

8. To view the number of messages delivered by the network, use each address with the oc exec command:

$ oc exec -it stf-default-interconnect-7458fd4d69-bgzfb -- qdstat --address

2020-04-21 18:20:10.293258 UTC
stf-default-interconnect-7458fd4d69-bgzfb

Router Addresses
  class   addr                             phs  distrib    pri  local  remote  in       out      thru  fallback
  ================================================================================================================
  mobile  anycast/ceilometer/event.sample  0    balanced   -    1      0       1553     1553     0     0
  mobile  collectd/notify                  0    multicast  -    1      0       10       10       0     0
  mobile  collectd/telemetry               0    multicast  -    1      0       7798049  7798049  0     0
$ oc exec -it stf-default-interconnect-7458fd4d69-bgzfb -- qdstat --address

2020-04-21 18:20:10.293258 UTC
stf-default-interconnect-7458fd4d69-bgzfb

Router Addresses
  class   addr                              phs  distrib    pri  local  remote  in       out      thru  fallback
  ====================================================================================================================
  mobile  anycast/ceilometer/event.sample   0    balanced   -    1      0       1553     1553     0     0
  mobile  collectd/notify                   0    multicast  -    1      0       10       10       0     0
  mobile  collectd/telemetry                0    multicast  -    1      0       7798049  7798049  0     0
CHAPTER 4. ADVANCED FEATURES

The following optional features can provide additional functionality to the Service Telemetry Framework (STF):
Customizing the deployment. For more information, see Section 4.1, "Customizing the deployment".

Alerts. For more information, see Section 4.2, "Alerts".

High availability. For more information, see Section 4.3, "High availability".

Dashboards. For more information, see Section 4.4, "Dashboards".

Multiple clouds. For more information, see Section 4.5, "Configuring multiple clouds".

Ephemeral storage. For more information, see Section 4.6, "Ephemeral storage".
4.1. CUSTOMIZING THE DEPLOYMENT
The Service Telemetry Operator watches for a ServiceTelemetry manifest to load into Red Hat OpenShift Container Platform (OCP). The Operator then creates other objects in memory, which results in the dependent Operators creating the workloads they are responsible for managing.
WARNING

When you override the manifest, you must provide the entire manifest contents, including object names or namespaces. There is no dynamic parameter substitution when you override a manifest.
To override a manifest successfully with Service Telemetry Framework (STF), deploy a default environment using the core options only. For more information about the core options, see Section 2.3.10, "Creating a ServiceTelemetry object in OCP". When you deploy STF, use the oc get command to retrieve the default deployed manifest. When you use a manifest that was originally generated by Service Telemetry Operator, the manifest is compatible with the other objects that are managed by the Operators.

For example, when the metricsEnabled: true parameter is configured in the ServiceTelemetry manifest, the Service Telemetry Operator requests components for metrics retrieval and storage using the default manifests. In some cases, you might want to override the default manifest. For more information, see Section 4.1.1, "Manifest override parameters".
4.1.1. Manifest override parameters

This table describes the available parameters that you can use to override a manifest, along with the corresponding retrieval commands.

Table 4.1. Manifest override parameters
alertmanagerManifest
  Description: Override the Alertmanager object creation. The Prometheus Operator watches for Alertmanager objects.
  Retrieval command: oc get alertmanager stf-default -oyaml

alertmanagerConfigManifest
  Description: Override the Secret that contains the Alertmanager configuration. The Prometheus Operator uses a secret named alertmanager-<alertmanager-name> (for example, alertmanager-stf-default) to provide the alertmanager.yaml configuration to Alertmanager.
  Retrieval command: oc get secret alertmanager-stf-default -oyaml

elasticsearchManifest
  Description: Override the ElasticSearch object creation. The Elastic Cloud on Kubernetes Operator watches for ElasticSearch objects.
  Retrieval command: oc get elasticsearch elasticsearch -oyaml

interconnectManifest
  Description: Override the Interconnect object creation. The AMQ Interconnect Operator watches for Interconnect objects.
  Retrieval command: oc get interconnect stf-default-interconnect -oyaml

prometheusManifest
  Description: Override the Prometheus object creation. The Prometheus Operator watches for Prometheus objects.
  Retrieval command: oc get prometheus stf-default -oyaml

servicemonitorManifest
  Description: Override the ServiceMonitor object creation. The Prometheus Operator watches for ServiceMonitor objects.
  Retrieval command: oc get servicemonitor stf-default -oyaml

smartgatewayCollectdMetricsManifest
  Description: Override the SmartGateway object creation for collectd metrics. The Smart Gateway Operator watches for SmartGateway objects.
  Retrieval command: oc get smartgateway stf-default-collectd-telemetry -oyaml

smartgatewayCollectdEventsManifest
  Description: Override the SmartGateway object creation for collectd events. The Smart Gateway Operator watches for SmartGateway objects.
  Retrieval command: oc get smartgateway stf-default-collectd-notification -oyaml
smartgatewayCeilometerEventsManifest
  Description: Override the SmartGateway object creation for Ceilometer events. The Smart Gateway Operator watches for SmartGateway objects.
  Retrieval command: oc get smartgateway stf-default-ceilometer-notification -oyaml
4.1.2. Overriding a managed manifest

Edit the ServiceTelemetry object and provide a parameter and manifest. For a list of available manifest override parameters, see Section 4.1.1, "Manifest override parameters". The default ServiceTelemetry object is stf-default. Use oc get servicetelemetry to list the available STF deployments.
TIP

The oc edit command loads the default system editor. To override the default editor, pass or set the environment variable EDITOR to the preferred editor, for example, EDITOR=nano oc edit servicetelemetry stf-default.
Procedure

1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

3. Load the ServiceTelemetry object into an editor:

4. To modify the ServiceTelemetry object, provide a manifest override parameter and the contents of the manifest to write to OCP instead of the defaults provided by STF.
NOTE

The trailing pipe (|) after entering the manifest override parameter indicates that the value provided is multi-line.
oc project service-telemetry
oc edit servicetelemetry stf-default
$ oc edit servicetelemetry stf-default

apiVersion: infra.watch/v1alpha1
kind: ServiceTelemetry
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"infra.watch/v1alpha1","kind":"ServiceTelemetry","metadata":{"annotations":{},"name":"stf-default","namespace":"service-telemetry"},"spec":{"metricsEnabled":true}}
  creationTimestamp: "2020-04-14T20:29:42Z"
  generation: 1
  name: stf-default
  namespace: service-telemetry
  resourceVersion: "1949423"
  selfLink: /apis/infra.watch/v1alpha1/namespaces/service-telemetry/servicetelemetrys/stf-default
  uid: d058bc41-1bb0-49f5-9a8b-642f4b8adb95
spec:
  metricsEnabled: true
  smartgatewayCollectdMetricsManifest: | 1
    apiVersion: smartgateway.infra.watch/v2alpha1
    kind: SmartGateway
    metadata:
      name: stf-default-collectd-telemetry
      namespace: service-telemetry
    spec:
      amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/telemetry
      debug: true
      prefetch: 15000
      serviceType: metrics
      size: 1
      useTimestamp: true 2
status:
  conditions:
  - ansibleResult:
      changed: 0
      completion: 2020-04-14T20:32:19.079508
      failures: 0
      ok: 52
      skipped: 1
    lastTransitionTime: "2020-04-14T20:29:59Z"
    message: Awaiting next reconciliation
    reason: Successful
    status: "True"
    type: Running
1  Manifest override parameter is defined in the spec of the ServiceTelemetry object.

2  End of the manifest override content.
5. Save and close.
4.2. ALERTS
You create alert rules in Prometheus and alert routes in Alertmanager. Alert rules in Prometheus servers send alerts to an Alertmanager, which manages the alerts. Alertmanager can silence, inhibit, or aggregate alerts, and send notifications using email, on-call notification systems, or chat platforms.
To create an alert, complete the following tasks:

1. Create an alert rule in Prometheus. For more information, see Section 4.2.1, "Creating an alert rule in Prometheus".
2. Create an alert route in Alertmanager. For more information, see Section 4.2.3, "Creating an alert route in Alertmanager".
Additional resources

For more information about alerts or notifications with Prometheus and Alertmanager, see https://prometheus.io/docs/alerting/overview/.

To view an example set of alerts that you can use with Service Telemetry Framework (STF), see https://github.com/infrawatch/service-telemetry-operator/tree/master/deploy/alerts.
4.2.1. Creating an alert rule in Prometheus

Prometheus evaluates alert rules to trigger notifications. If the rule condition returns an empty result set, the condition is false. Otherwise, the rule is true and it triggers an alert.
Procedure

1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

3. Create a PrometheusRule object that contains the alert rule. The Prometheus Operator loads the rule into Prometheus:

4. To verify that the rules have been loaded into Prometheus by the Operator, create a pod with access to curl:

5. Run curl to access the prometheus-operated service to return the rules loaded into memory:
oc project service-telemetry
oc apply -f - <<EOF
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  creationTimestamp: null
  labels:
    prometheus: stf-default
    role: alert-rules
  name: prometheus-alarm-rules
  namespace: service-telemetry
spec:
  groups:
    - name: openstack.rules
      rules:
        - alert: Metric Listener down
          expr: collectd_qpid_router_status < 1 # To change the rule, edit the value of the expr parameter.
EOF
oc run curl --generator=run-pod/v1 --image=radial/busyboxplus:curl -i --tty
[ root@curl:/ ]$ curl prometheus-operated:9090/api/v1/rules

{"status":"success","data":{"groups":[{"name":"openstack.rules","file":"/etc/prometheus/rules/prometheus-stf-default-rulefiles-0/service-telemetry-prometheus-alarm-rules.yaml","rules":[{"name":"Metric Listener down","query":"collectd_qpid_router_status \u003c 1","duration":0,"labels":{},"annotations":{},"alerts":[],"health":"ok","type":"alerting"}],"interval":30}]}}
6. To verify that the output shows the rules loaded into the PrometheusRule object, for example, the output contains the defined openstack.rules, exit from the pod:

7. Clean up the environment by deleting the curl pod:
Additional resources

For more information on alerting, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md.
4.2.2. Configuring custom alerts

You can add custom alerts to the PrometheusRule object that you created in Section 4.2.1, "Creating an alert rule in Prometheus".
Procedure

1. Use the oc edit command:

2. Edit the PrometheusRules manifest.

3. Save and close.
Additional resources

For more information about configuring alerting rules, see https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/.

For more information about PrometheusRules objects, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md.
4.2.3. Creating an alert route in Alertmanager

Use Alertmanager to deliver alerts to an external system, such as email, IRC, or other notification channel. The Prometheus Operator manages the Alertmanager configuration as a Red Hat OpenShift Container Platform (OCP) secret. By default, STF deploys a basic configuration that results in no receivers.
[ root@curl:/ ]$ exit

$ oc delete pod curl

pod "curl" deleted
oc edit prometheusrules prometheus-alarm-rules
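For example, after you add a second rule, the manifest might look like the following sketch. The "Collectd host silent" alert name and its absent(collectd_uptime) expression are illustrative additions, not defaults that ship with STF:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    prometheus: stf-default
    role: alert-rules
  name: prometheus-alarm-rules
  namespace: service-telemetry
spec:
  groups:
    - name: openstack.rules
      rules:
        - alert: Metric Listener down
          expr: collectd_qpid_router_status < 1
        # Illustrative custom rule: fires when no collectd_uptime series
        # exists, that is, when hosts stop reporting entirely.
        - alert: Collectd host silent
          expr: absent(collectd_uptime)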
alertmanager.yaml: |-
  global:
    resolve_timeout: 5m
  route:
    group_by: ['job']
    group_wait: 30s
    group_interval: 5m
    repeat_interval: 12h
    receiver: 'null'
  receivers:
  - name: 'null'
To deploy a custom Alertmanager route with STF, you must pass an alertmanagerConfigManifest parameter to the Service Telemetry Operator, which results in an updated secret managed by the Prometheus Operator. For more information, see Section 4.1.2, "Overriding a managed manifest".
Procedure

1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

3. Edit the ServiceTelemetry object for your STF deployment:

4. Add a new parameter, alertmanagerConfigManifest, and the Secret object contents to define the alertmanager.yaml configuration for Alertmanager:

NOTE

This loads the default template that is already managed by Service Telemetry Operator. To validate that the changes are populating correctly, change a value, return the alertmanager-stf-default secret, and verify that the new value is loaded into memory, for example, changing the value of global.resolve_timeout from 5m to 10m.
oc project service-telemetry
oc edit servicetelemetry stf-default
apiVersion: infra.watch/v1alpha1
kind: ServiceTelemetry
metadata:
  name: stf-default
  namespace: service-telemetry
spec:
  metricsEnabled: true
  alertmanagerConfigManifest: |
    apiVersion: v1
    kind: Secret
    metadata:
      name: 'alertmanager-stf-default'
      namespace: 'service-telemetry'
    type: Opaque
    stringData:
      alertmanager.yaml: |-
        global:
          resolve_timeout: 10m
        route:
          group_by: ['job']
          group_wait: 30s
          group_interval: 5m
          repeat_interval: 12h
          receiver: 'null'
        receivers:
        - name: 'null'
5. Verify that the configuration was applied to the secret:

6. To verify that the configuration has been loaded into Alertmanager, create a pod with access to curl:

7. Run curl against the alertmanager-operated service to retrieve the status and configYAML contents, and review that the supplied configuration matches the configuration loaded into Alertmanager:

8. Verify that the configYAML field contains the expected changes. Exit from the pod:

9. To clean up the environment, delete the curl pod:
$ oc get secret alertmanager-stf-default -o go-template='{{index .data "alertmanager.yaml" | base64decode}}'

global:
  resolve_timeout: 10m
route:
  group_by: ['job']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h
  receiver: 'null'
receivers:
- name: 'null'
oc run curl --generator=run-pod/v1 --image=radial/busyboxplus:curl -i --tty

[ root@curl:/ ]$ curl alertmanager-operated:9093/api/v1/status

{"status":"success","data":{"configYAML":"global:\n  resolve_timeout: 10m\n  http_config: {}\n  smtp_hello: localhost\n  smtp_require_tls: true\n  pagerduty_url: https://events.pagerduty.com/v2/enqueue\n  hipchat_api_url: https://api.hipchat.com/\n  opsgenie_api_url: https://api.opsgenie.com/\n  wechat_api_url: https://qyapi.weixin.qq.com/cgi-bin/\n  victorops_api_url: https://alert.victorops.com/integrations/generic/20131114/alert/\nroute:\n  receiver: \"null\"\n  group_by:\n  - job\n  group_wait: 30s\n  group_interval: 5m\n  repeat_interval: 12h\nreceivers:\n- name: \"null\"\ntemplates: []\n"}}
[ root@curl:/ ]$ exit

$ oc delete pod curl

pod "curl" deleted
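As a sketch of a route that delivers to a real receiver, the alertmanager.yaml payload inside alertmanagerConfigManifest could replace the 'null' receiver with an email receiver. The SMTP host and addresses below are placeholders that you must adapt to your mail infrastructure:

alertmanager.yaml: |-
  global:
    resolve_timeout: 10m
    smtp_smarthost: 'smtp.example.com:587'   # placeholder SMTP relay
    smtp_from: 'alertmanager@example.com'    # placeholder sender address
  route:
    group_by: ['job']
    group_wait: 30s
    group_interval: 5m
    repeat_interval: 12h
    receiver: 'email-ops'
  receivers:
  - name: 'email-ops'
    email_configs:
    - to: 'ops-team@example.com'             # placeholder recipient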
Additional resources

For more information about the Red Hat OpenShift Container Platform secret and the Prometheus operator, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md.
4.3. HIGH AVAILABILITY

High availability is the ability of Service Telemetry Framework (STF) to rapidly recover from failures in its component services. Although Red Hat OpenShift Container Platform (OCP) restarts a failed pod if nodes are available to schedule the workload, this recovery process might take more than one minute, during which time events and metrics are lost. A high availability configuration includes multiple copies of STF components, reducing recovery time to approximately 2 seconds. To protect against failure of an OCP node, deploy STF to an OCP cluster with three or more nodes.
NOTE

STF is not yet a fully fault tolerant system. Delivery of metrics and events during the recovery period is not guaranteed.

Enabling high availability has the following effects:

Two AMQ Interconnect pods run instead of the default 1.

Three ElasticSearch pods run instead of the default 1.

Recovery time from a lost pod in either of these services reduces to approximately 2 seconds.
4.3.1. Configuring high availability

To configure STF for high availability, add highAvailabilityEnabled: true to the ServiceTelemetry object in OCP. You can set this parameter at installation time or, if you already deployed STF, complete the following steps:
Procedure

1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

3. Use the oc command to edit the ServiceTelemetry object:

4. Add highAvailabilityEnabled: true to the spec section:

5. Save your changes and close the object.
oc project service-telemetry
$ oc edit ServiceTelemetry
spec:
  eventsEnabled: true
  metricsEnabled: true
  highAvailabilityEnabled: true
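To confirm that the additional replicas started, you can count the pods for each service. The interconnect label below is the one used earlier in this guide; the ElasticSearch label assumes the standard label applied by the Elastic Cloud on Kubernetes Operator:

$ oc get pods -l application=stf-default-interconnect
$ oc get pods -l common.k8s.elastic.co/type=elasticsearch

With high availability enabled, you can expect two AMQ Interconnect pods and three ElasticSearch pods in the output.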
4.4. DASHBOARDS

Use the third-party application Grafana to visualize system-level metrics gathered by collectd for each individual host node. For more information about configuring collectd, see Section 3.3, "Configuring Red Hat OpenStack Platform overcloud for Service Telemetry Framework".
4.4.1. Setting up Grafana to host the dashboard

Grafana is not included in the default Service Telemetry Framework (STF) deployment, so you must deploy the Grafana Operator from OperatorHub.io.
Procedure

1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

3. Clone the dashboard repository:

4. Deploy the Grafana operator:

5. To verify that the operator launched successfully, run the oc get csv command. If the value of the PHASE column is Succeeded, the operator launched successfully:

6. Launch a Grafana instance:

7. Verify that the Grafana instance deployed:

8. Create the datasource and dashboard resources:
oc project service-telemetry
git clone https://github.com/infrawatch/dashboards
cd dashboards
oc create -f deploy/subscription.yaml
$ oc get csv
NAME                      DISPLAY            VERSION   REPLACES   PHASE
grafana-operator.v3.2.0   Grafana Operator   3.2.0                Succeeded
$ oc create -f deploy/grafana.yaml
$ oc get pod -l app=grafana
NAME                                  READY   STATUS    RESTARTS   AGE
grafana-deployment-7fc7848b56-sbkhv   1/1     Running   0          1m
9. Verify that the resources installed correctly:

10. Navigate to https://<grafana-route-address> in a web browser. Use the oc get routes command to retrieve the Grafana route address:

11. To view the dashboard, click Dashboards and Manage.
Additional resources

For more information about enabling the OperatorHub.io catalog source, see Section 2.3.5, "Enabling the OperatorHub.io Community Catalog Source".
4.4.1.1. Viewing and editing queries

Procedure

1. Log in to Red Hat OpenShift Container Platform. To view and edit queries, log in as the admin user.

2. Change to the service-telemetry namespace:

3. To retrieve the default username and password, describe the Grafana object using oc describe:
4.4.2. The Grafana infrastructure dashboard

The infrastructure dashboard shows metrics for a single node at a time. Select a node from the upper left corner of the dashboard.
4.4.2.1. Top panels
oc create -f deploy/datasource.yaml -f deploy/rhos-dashboard.yaml
$ oc get grafanadashboards
NAME             AGE
rhos-dashboard   7d21h
$ oc get grafanadatasources

NAME                                  AGE
service-telemetry-grafanadatasource   1m
oc get routes
oc project service-telemetry
oc describe grafana service-telemetry-grafana
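If you prefer to extract a single credential value instead of reading the full describe output, a jsonpath query along the following lines might work. The spec.config.security path is an assumption about the Grafana Operator custom resource schema; verify it against your oc describe output:

$ oc get grafana service-telemetry-grafana -o jsonpath='{.spec.config.security.admin_user}'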
Title                  Unit         Description
Current Global Alerts  -            Current alerts fired by Prometheus
Recent Global Alerts   -            Recently fired alerts in 5m time steps
Status Panel           -            Node status: up, down, unavailable
Uptime                 s/m/h/d/M/Y  Total operational time of node
CPU Cores              cores        Total number of cores
Memory                 bytes        Total memory
Disk Size              bytes        Total storage size
Processes              processes    Total number of processes listed by type
Load Average           processes    Load average represents the average number of running and uninterruptible processes residing in the kernel execution queue
4.4.2.2. Networking panels

Panels that display the network interfaces of the node.
Panel                                    Unit      Description
Physical Interfaces Ingress Errors       errors    Total errors with incoming data
Physical Interfaces Egress Errors        errors    Total errors with outgoing data
Physical Interfaces Ingress Error Rates  errors/s  Rate of incoming data errors
Physical Interfaces Egress Error Rates   errors/s  Rate of outgoing data errors
Physical Interfaces Packets Ingress      pps       Incoming packets per second
Physical Interfaces Packets Egress       pps       Outgoing packets per second
Physical Interfaces Data Ingress         bytes/s   Incoming data rates
Physical Interfaces Data Egress          bytes/s   Outgoing data rates
Physical Interfaces Drop Rate Ingress    pps       Incoming packets drop rate
Physical Interfaces Drop Rate Egress     pps       Outgoing packets drop rate
4.4.2.3. CPU panels

Panels that display CPU usage of the node.

Panel                    Unit     Description
Current CPU Usage        percent  Instantaneous usage at the time of the last query
Aggregate CPU Usage      percent  Average non-idle CPU activity of all cores on a node
Aggr. CPU Usage by Type  percent  Shows time spent for each type of thread averaged across all cores
4.4.2.4. Memory panels

Panels that display memory usage on the node.

Panel            Unit       Description
Memory Used      percent    Amount of memory being used at time of last query
Huge Pages Used  hugepages  Number of hugepages being used
4.4.2.5. Disk/file system

Panels that display space used on disk.

Panel                       Unit     Description                                Notes
Disk Space Usage            percent  Total disk use at time of last query
Inode Usage                 percent  Total inode use at time of last query
Aggregate Disk Space Usage  bytes    Total disk space used and reserved        Because this query relies on the df plugin, temporary file systems that do not necessarily use disk space are included in the results. The query tries to filter out most of these, but it might not be exhaustive.
Disk Traffic                bytes/s  Shows rates for both reading and writing
Disk Load                   percent  Approximate percentage of total disk bandwidth being used. The weighted I/O time series includes the backlog that might be accumulating. For more information, see the collectd disk plugin docs.
Operations/s                ops/s    Operations done per second
Average I/O Operation Time  seconds  Average time each I/O operation took to complete. This average is not accurate, see the collectd disk plugin docs.
4.5. CONFIGURING MULTIPLE CLOUDS

You can configure multiple Red Hat OpenStack Platform clouds to target a single instance of Service Telemetry Framework (STF):

1. Plan the AMQP address prefixes that you want to use for each cloud. For more information, see Section 4.5.1, "Planning AMQP address prefixes".

2. Deploy metrics and events consumer Smart Gateways for each cloud to listen on the corresponding address prefixes. For more information, see Section 4.5.2, "Deploying Smart Gateways".

3. Configure each cloud to send its metrics and events to STF on the correct address. For more information, see Section 4.5.3, "Creating the OpenStack environment file".
Figure 4.1. Two Red Hat OpenStack Platform clouds connect to STF
4.5.1. Planning AMQP address prefixes

By default, Red Hat OpenStack Platform nodes get data through two data collectors: collectd and Ceilometer. These components send telemetry data, or notifications, to the respective AMQP addresses, for example, collectd/telemetry, where STF Smart Gateways listen on those addresses for monitoring data.

To support multiple clouds, and to identify which cloud generated the monitoring data, configure each cloud to send data to a unique address. Prefix a cloud identifier to the second part of the address. The following list shows some example addresses and identifiers:

collectd/cloud1-telemetry

collectd/cloud1-notify

anycast/ceilometer/cloud1-event.sample

collectd/cloud2-telemetry

collectd/cloud2-notify
anycast/ceilometer/cloud2-event.sample

collectd/us-east-1-telemetry

collectd/us-west-3-telemetry
4.5.2. Deploying Smart Gateways

You must deploy a Smart Gateway for each of the data collection types for each cloud: one for collectd metrics, one for collectd events, and one for Ceilometer events. Configure each of the Smart Gateways to listen on the AMQP address that you define for the corresponding cloud.

When you deploy STF for the first time, Smart Gateway manifests are created that define the initial Smart Gateways for a single cloud. When deploying Smart Gateways for multiple cloud support, you deploy multiple Smart Gateways for each of the data collection types that handle the metrics and the events data for each cloud. The initial Smart Gateways act as a template to create additional Smart Gateways, along with any authentication information required to connect to the data stores.
Procedure

1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:
oc project service-telemetry
3. Use the initially deployed Smart Gateways as a template for additional Smart Gateways. List the currently deployed Smart Gateways with the oc get smartgateways command. For example, if you deployed STF with metricsEnabled: true and eventsEnabled: true, the following Smart Gateways are displayed in the output:
$ oc get smartgateways
NAME                                  AGE
stf-default-ceilometer-notification   14d
stf-default-collectd-notification     14d
stf-default-collectd-telemetry        14d
4. Retrieve the manifests for each Smart Gateway and store the contents in a temporary file, which you can modify later and use to create the new set of Smart Gateways:
truncate --size 0 /tmp/cloud1-smartgateways.yaml && \
for sg in $(oc get smartgateways -oname)
do
  echo --- >> /tmp/cloud1-smartgateways.yaml
  oc get $sg -oyaml --export >> /tmp/cloud1-smartgateways.yaml
done
5. Modify the Smart Gateway manifests in the /tmp/cloud1-smartgateways.yaml file. Adjust the metadata.name and spec.amqpUrl fields to include the cloud identifier from your schema, as in the sketch that follows. For more information, see Section 4.5.1, "Planning AMQP address prefixes". To view example Smart Gateway manifests, see Section 4.5.2.1, "Example manifests".
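For example, on a schema that appends the cloud1 identifier, edits along the following lines might apply. The sed expressions are an illustrative sketch that assumes the default stf-default names and addresses shown in Section 4.5.2.1, "Example manifests"; review the resulting file before you apply it:

# Append the cloud identifier to each Smart Gateway name (illustrative).
sed -i '/^  name: stf-default-/ s/$/-cloud1/' /tmp/cloud1-smartgateways.yaml
# Insert the cloud identifier into each AMQP address (illustrative).
sed -i 's|5672/collectd/telemetry|5672/collectd/cloud1-telemetry|' /tmp/cloud1-smartgateways.yaml
sed -i 's|5672/collectd/notify|5672/collectd/cloud1-notify|' /tmp/cloud1-smartgateways.yaml
sed -i 's|5672/anycast/ceilometer/event.sample|5672/anycast/ceilometer/cloud1-event.sample|' /tmp/cloud1-smartgateways.yaml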
6. Deploy your new Smart Gateways:
oc apply -f /tmp/cloud1-smartgateways.yaml
7. Verify that each Smart Gateway is running. This can take several minutes, depending on the number of Smart Gateways:
oc get po -l app=smart-gateway
4.5.2.1. Example manifests

IMPORTANT

The content in the following examples might be different to the file content in your deployment. Copy the manifests in your deployment.

Ensure that the name and amqpUrl parameters of each Smart Gateway match the names that you want to use for your clouds. For more information, see Section 4.5.1, "Planning AMQP address prefixes".
NOTE

Your output may have some additional metadata parameters that you can remove from the manifests that you load into OCP.
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-ceilometer-notification-cloud1 1
spec:
  amqpDataSource: ceilometer
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/anycast/ceilometer/cloud1-event.sample 2
  debug: false
  elasticPass: fkzfhghw
  elasticUrl: https://elasticsearch-es-http.service-telemetry.svc.cluster.local:9200
  elasticUser: elastic
  resetIndex: false
  serviceType: events
  size: 1
  tlsCaCert: /config/certs/ca.crt
  tlsClientCert: /config/certs/tls.crt
  tlsClientKey: /config/certs/tls.key
  tlsServerName: elasticsearch-es-http.service-telemetry.svc.cluster.local
  useBasicAuth: true
  useTls: true
---
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-collectd-notification-cloud1 3
spec:
  amqpDataSource: collectd
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/cloud1-notify 4
  debug: false
  elasticPass: fkzfhghw
  elasticUrl: https://elasticsearch-es-http.service-telemetry.svc.cluster.local:9200
  elasticUser: elastic
  resetIndex: false
  serviceType: events
  size: 1
  tlsCaCert: /config/certs/ca.crt
  tlsClientCert: /config/certs/tls.crt
  tlsClientKey: /config/certs/tls.key
  tlsServerName: elasticsearch-es-http.service-telemetry.svc.cluster.local
  useBasicAuth: true
  useTls: true
---
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-collectd-telemetry-cloud1 5
spec:
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/cloud1-telemetry 6
  debug: false
  prefetch: 15000
  serviceType: metrics
  size: 1
  useTimestamp: true
1  Name for Ceilometer notifications for cloud1.

2  AMQP address for Ceilometer notifications for cloud1.

3  Name for collectd notifications for cloud1.

4  AMQP address for collectd notifications for cloud1.

5  Name for collectd telemetry for cloud1.

6  AMQP address for collectd telemetry for cloud1.
4.5.3. Creating the OpenStack environment file

To label traffic according to the cloud of origin, you must create a configuration with cloud-specific instance names. Create an stf-connectors.yaml file and adjust the values of CeilometerQdrEventsConfig and CollectdAmqpInstances to match the AMQP address prefix scheme. For more information, see Section 4.5.1, "Planning AMQP address prefixes".
WARNING

Remove the enable-stf.yaml and ceilometer-write-qdr.yaml environment file references from your overcloud deployment. This configuration is redundant and results in duplicate information being sent from each cloud node.
Procedure

1. Create the stf-connectors.yaml file and modify it to match the AMQP address that you want for this cloud deployment:
resource_registry:
  OS::TripleO::Services::Collectd: /usr/share/openstack-tripleo-heat-templates/deployment/metrics/collectd-container-puppet.yaml
  OS::TripleO::Services::MetricsQdr: /usr/share/openstack-tripleo-heat-templates/deployment/metrics/qdr-container-puppet.yaml
  OS::TripleO::Services::CeilometerAgentCentral: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-central-container-puppet.yaml
  OS::TripleO::Services::CeilometerAgentNotification: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-notification-container-puppet.yaml
  OS::TripleO::Services::CeilometerAgentIpmi: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-ipmi-container-puppet.yaml
  OS::TripleO::Services::ComputeCeilometerAgent: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-compute-container-puppet.yaml
  OS::TripleO::Services::Redis: /usr/share/openstack-tripleo-heat-templates/deployment/database/redis-pacemaker-puppet.yaml

parameter_defaults:
  EnableSTF: true

  EventPipelinePublishers: []
  CeilometerEnablePanko: false
  CeilometerQdrPublishEvents: true
  CeilometerQdrEventsConfig:
    driver: amqp
    topic: cloud1-event 1

  CollectdConnectionType: amqp1
  CollectdAmqpInterval: 5
  CollectdDefaultPollingInterval: 5

  CollectdAmqpInstances:
    cloud1-notify: 2
      notify: true
      format: JSON
      presettle: false
    cloud1-telemetry: 3
      format: JSON
      presettle: true

  MetricsQdrAddresses:
    - prefix: 'collectd'
      distribution: multicast
    - prefix: 'anycast/ceilometer'
      distribution: multicast

  MetricsQdrSSLProfiles:
    - name: sslProfile

  MetricsQdrConnectors:
    - host: stf-default-interconnect-5671-service-telemetry.apps.infra.watch 4
      port: 443
      role: edge
      verifyHostname: false
      sslProfile: sslProfile
1  Define the topic for Ceilometer events. This value is the address format of anycast/ceilometer/cloud1-event.sample.

2  Define the topic for collectd events. This value is the format of collectd/cloud1-notify.

3  Define the topic for collectd metrics. This value is the format of collectd/cloud1-telemetry.

4  Adjust the MetricsQdrConnectors host to the address of the STF route.
1. Ensure that the naming convention in the stf-connectors.yaml file aligns with the spec.amqpUrl field in the Smart Gateway configuration. For example, configure the CeilometerQdrEventsConfig.topic field to a value of cloud1-event.

2. Save the file in a directory for custom environment files, for example, /home/stack/custom_templates/.

3. Source the authentication file:

4. Include the stf-connectors.yaml file in the overcloud deployment command, along with any other environment files relevant to your environment:
Additional resources

For information about validating the deployment, see Section 3.3.3, "Validating client-side installation".
4.5.4. Querying metrics data from multiple clouds

Data stored in Prometheus has a service label attached according to the Smart Gateway it was scraped from. You can use this label to query data from a specific cloud.

To query data from a specific cloud, use a Prometheus promql query that matches the associated service label, for example: collectd_uptime{service="stf-default-collectd-telemetry-cloud1-smartgateway"}.
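One way to run such a query is from a curl pod, as in the alert verification steps earlier in this chapter. The label value below assumes the cloud1 Smart Gateway naming used in this section:

$ oc run curl --generator=run-pod/v1 --image=radial/busyboxplus:curl -i --tty
[ root@curl:/ ]$ curl -G 'prometheus-operated:9090/api/v1/query' \
    --data-urlencode 'query=collectd_uptime{service="stf-default-collectd-telemetry-cloud1-smartgateway"}'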
[stack@undercloud-0 ~]$ source stackrc

(undercloud) [stack@undercloud-0 ~]$

(undercloud) [stack@undercloud-0 ~]$ openstack overcloud deploy --templates /usr/share/openstack-tripleo-heat-templates \
  -e /home/stack/custom_templates/stf-connectors.yaml
4.6. EPHEMERAL STORAGE

Use ephemeral storage to run Service Telemetry Framework (STF) without persistently storing data in your Red Hat OpenShift Container Platform (OCP) cluster. Ephemeral storage is not recommended in a production environment due to the volatility of the data in the platform when operating correctly and as designed. For example, restarting a pod or rescheduling the workload to another node results in the loss of any local data written since the pod started.

If you enable ephemeral storage in STF, the Service Telemetry Operator does not add the relevant storage sections to the data storage components' manifests.
4.6.1. Configuring ephemeral storage

To configure STF for ephemeral storage, add storageEphemeralEnabled: true to the ServiceTelemetry object in OCP. You can add storageEphemeralEnabled: true at installation time or, if you already deployed STF, complete the following steps:
Procedure

1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

3. Edit the ServiceTelemetry object:

4. Add the storageEphemeralEnabled: true parameter to the spec section:

5. Save your changes and close the object.
oc project service-telemetry
$ oc edit ServiceTelemetry stf-default
spec:
  eventsEnabled: true
  metricsEnabled: true
  storageEphemeralEnabled: true
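Because the storage sections are omitted, the data storage components run with ephemeral volumes only. One quick check, after the change reconciles, is to list the PersistentVolumeClaims in the namespace; with ephemeral storage enabled, no new claims are created for the data storage components:

$ oc get pvc -n service-telemetry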
APPENDIX A. COLLECTD PLUG-INS

This section contains a complete list of collectd plug-ins and configurations for Red Hat OpenStack Platform 16.0.
collectd-aggregation

collectd::plugin::aggregation::aggregators

collectd::plugin::aggregation::interval

collectd-amqp1

collectd-apache

collectd::plugin::apache::instances (ex: localhost ⇒ url ⇒ 'http://localhost/mod_status?auto')

collectd::plugin::apache::interval

collectd-apcups

collectd-battery

collectd::plugin::battery::values_percentage

collectd::plugin::battery::report_degraded

collectd::plugin::battery::query_state_fs

collectd::plugin::battery::interval

collectd-ceph

collectd::plugin::ceph::daemons

collectd::plugin::ceph::longrunavglatency

collectd::plugin::ceph::convertspecialmetrictypes

collectd-cgroups

collectd::plugin::cgroups::ignore_selected

collectd::plugin::cgroups::interval

collectd-conntrack

None

collectd-contextswitch

collectd::plugin::contextswitch::interval

collectd-cpu
collectd::plugin::cpu::reportbystate

collectd::plugin::cpu::reportbycpu

collectd::plugin::cpu::valuespercentage

collectd::plugin::cpu::reportnumcpu

collectd::plugin::cpu::reportgueststate

collectd::plugin::cpu::subtractgueststate

collectd::plugin::cpu::interval

collectd-cpufreq

None

collectd-cpusleep

collectd-csv

collectd::plugin::csv::datadir

collectd::plugin::csv::storerates

collectd::plugin::csv::interval

collectd-df

collectd::plugin::df::devices

collectd::plugin::df::fstypes

collectd::plugin::df::ignoreselected

collectd::plugin::df::mountpoints

collectd::plugin::df::reportbydevice

collectd::plugin::df::reportinodes

collectd::plugin::df::reportreserved

collectd::plugin::df::valuesabsolute

collectd::plugin::df::valuespercentage

collectd::plugin::df::interval

collectd-disk

collectd::plugin::disk::disks

collectd::plugin::disk::ignoreselected

collectd::plugin::disk::udevnameattr
collectd::plugin::disk::interval

collectd-entropy

collectd::plugin::entropy::interval

collectd-ethstat

collectd::plugin::ethstat::interfaces

collectd::plugin::ethstat::maps

collectd::plugin::ethstat::mappedonly

collectd::plugin::ethstat::interval

collectd-exec

collectd::plugin::exec::commands

collectd::plugin::exec::commands_defaults

collectd::plugin::exec::globals

collectd::plugin::exec::interval

collectd-fhcount

collectd::plugin::fhcount::valuesabsolute

collectd::plugin::fhcount::valuespercentage

collectd::plugin::fhcount::interval

collectd-filecount

collectd::plugin::filecount::directories

collectd::plugin::filecount::interval

collectd-fscache

None

collectd-hddtemp

collectd::plugin::hddtemp::host

collectd::plugin::hddtemp::port

collectd::plugin::hddtemp::interval

collectd-hugepages

collectd::plugin::hugepages::report_per_node_hp
collectd::plugin::hugepages::report_root_hp

collectd::plugin::hugepages::values_pages

collectd::plugin::hugepages::values_bytes

collectd::plugin::hugepages::values_percentage

collectd::plugin::hugepages::interval

collectd-intel_rdt

collectd-interface

collectd::plugin::interface::interfaces

collectd::plugin::interface::ignoreselected

collectd::plugin::interface::reportinactive

collectd::plugin::interface::interval

collectd-ipc

None

collectd-ipmi

collectd::plugin::ipmi::ignore_selected

collectd::plugin::ipmi::notify_sensor_add

collectd::plugin::ipmi::notify_sensor_remove

collectd::plugin::ipmi::notify_sensor_not_present

collectd::plugin::ipmi::sensors

collectd::plugin::ipmi::interval

collectd-irq

collectd::plugin::irq::irqs

collectd::plugin::irq::ignoreselected

collectd::plugin::irq::interval

collectd-load

collectd::plugin::load::report_relative

collectd::plugin::load::interval

collectd-logfile

collectd::plugin::logfile::log_level
collectd::plugin::logfile::log_file

collectd::plugin::logfile::log_timestamp

collectd::plugin::logfile::print_severity

collectd::plugin::logfile::interval

collectd-madwifi

collectd-mbmon

collectd-md

collectd-memcached

collectd::plugin::memcached::instances

collectd::plugin::memcached::interval

collectd-memory

collectd::plugin::memory::valuesabsolute

collectd::plugin::memory::valuespercentage

collectd::plugin::memory::interval

collectd-multimeter

collectd-mysql

collectd::plugin::mysql::interval

collectd-netlink

collectd::plugin::netlink::interfaces

collectd::plugin::netlink::verboseinterfaces

collectd::plugin::netlink::qdiscs

collectd::plugin::netlink::classes

collectd::plugin::netlink::filters

collectd::plugin::netlink::ignoreselected

collectd::plugin::netlink::interval

collectd-network

collectd::plugin::network::timetolive

collectd::plugin::network::maxpacketsize
collectd::plugin::network::forward

collectd::plugin::network::reportstats

collectd::plugin::network::listeners

collectd::plugin::network::servers

collectd::plugin::network::interval

collectd-nfs

collectd::plugin::nfs::interval

collectd-ntpd

collectd::plugin::ntpd::host

collectd::plugin::ntpd::port

collectd::plugin::ntpd::reverselookups

collectd::plugin::ntpd::includeunitid

collectd::plugin::ntpd::interval

collectd-numa

None

collectd-olsrd

collectd-openvpn

collectd::plugin::openvpn::statusfile

collectd::plugin::openvpn::improvednamingschema

collectd::plugin::openvpn::collectcompression

collectd::plugin::openvpn::collectindividualusers

collectd::plugin::openvpn::collectusercount

collectd::plugin::openvpn::interval

collectd-ovs_events

collectd::plugin::ovs_events::address

collectd::plugin::ovs_events::dispatch

collectd::plugin::ovs_events::interfaces

collectd::plugin::ovs_events::send_notification
collectd::plugin::ovs_events::$port

collectd::plugin::ovs_events::socket

collectd-ovs_stats

collectd::plugin::ovs_stats::address

collectd::plugin::ovs_stats::bridges

collectd::plugin::ovs_stats::port

collectd::plugin::ovs_stats::socket

collectd-ping

collectd::plugin::ping::hosts

collectd::plugin::ping::timeout

collectd::plugin::ping::ttl

collectd::plugin::ping::source_address

collectd::plugin::ping::device

collectd::plugin::ping::max_missed

collectd::plugin::ping::size

collectd::plugin::ping::interval

collectd-powerdns

collectd::plugin::powerdns::interval

collectd::plugin::powerdns::servers

collectd::plugin::powerdns::recursors

collectd::plugin::powerdns::local_socket

collectd::plugin::powerdns::interval

collectd-processes

collectd::plugin::processes::processes

collectd::plugin::processes::process_matches

collectd::plugin::processes::collect_context_switch

collectd::plugin::processes::collect_file_descriptor

collectd::plugin::processes::collect_memory_maps

collectd::plugin::powerdns::interval
collectd-protocols

collectd::plugin::protocols::ignoreselected

collectd::plugin::protocols::values

collectd-python

collectd-serial

collectd-smart

collectd::plugin::smart::disks

collectd::plugin::smart::ignoreselected

collectd::plugin::smart::interval

collectd-snmp_agent

collectd-statsd

collectd::plugin::statsd::host

collectd::plugin::statsd::port

collectd::plugin::statsd::deletecounters

collectd::plugin::statsd::deletetimers

collectd::plugin::statsd::deletegauges

collectd::plugin::statsd::deletesets

collectd::plugin::statsd::countersum

collectd::plugin::statsd::timerpercentile

collectd::plugin::statsd::timerlower

collectd::plugin::statsd::timerupper

collectd::plugin::statsd::timersum

collectd::plugin::statsd::timercount

collectd::plugin::statsd::interval

collectd-swap

collectd::plugin::swap::reportbydevice

collectd::plugin::swap::reportbytes

collectd::plugin::swap::valuesabsolute
collectd::plugin::swap::valuespercentage

collectd::plugin::swap::reportio

collectd::plugin::swap::interval

collectd-syslog

collectd::plugin::syslog::log_level

collectd::plugin::syslog::notify_level

collectd::plugin::syslog::interval

collectd-table

collectd::plugin::table::tables

collectd::plugin::table::interval

collectd-tail

collectd::plugin::tail::files

collectd::plugin::tail::interval

collectd-tail_csv

collectd::plugin::tail_csv::metrics

collectd::plugin::tail_csv::files

collectd-tcpconns

collectd::plugin::tcpconns::localports

collectd::plugin::tcpconns::remoteports

collectd::plugin::tcpconns::listening

collectd::plugin::tcpconns::allportssummary

collectd::plugin::tcpconns::interval

collectd-ted

collectd-thermal

collectd::plugin::thermal::devices

collectd::plugin::thermal::ignoreselected

collectd::plugin::thermal::interval

collectd-threshold
collectd::plugin::threshold::types

collectd::plugin::threshold::plugins

collectd::plugin::threshold::hosts

collectd::plugin::threshold::interval

collectd-turbostat

collectd::plugin::turbostat::core_c_states

collectd::plugin::turbostat::package_c_states

collectd::plugin::turbostat::system_management_interrupt

collectd::plugin::turbostat::digital_temperature_sensor

collectd::plugin::turbostat::tcc_activation_temp

collectd::plugin::turbostat::running_average_power_limit

collectd::plugin::turbostat::logical_core_names

collectd-unixsock

collectd-uptime

collectd::plugin::uptime::interval

collectd-users

collectd::plugin::users::interval

collectd-uuid

collectd::plugin::uuid::uuid_file

collectd::plugin::uuid::interval

collectd-virt

collectd::plugin::virt::connection

collectd::plugin::virt::refresh_interval

collectd::plugin::virt::domain

collectd::plugin::virt::block_device

collectd::plugin::virt::interface_device

collectd::plugin::virt::ignore_selected

collectd::plugin::virt::hostname_format
collectd::plugin::virt::interface_format

collectd::plugin::virt::extra_stats

collectd::plugin::virt::interval

collectd-vmem

collectd::plugin::vmem::verbose

collectd::plugin::vmem::interval

collectd-vserver

collectd-wireless

collectd-write_graphite

collectd::plugin::write_graphite::carbons

collectd::plugin::write_graphite::carbon_defaults

collectd::plugin::write_graphite::globals

collectd-write_kafka

collectd::plugin::write_kafka::kafka_host

collectd::plugin::write_kafka::kafka_port

collectd::plugin::write_kafka::kafka_hosts

collectd::plugin::write_kafka::topics

collectd-write_log

collectd::plugin::write_log::format

collectd-zfs_arc

None
2.2.3. Node tuning operator

STF uses ElasticSearch to store events, which requires a larger than normal vm.max_map_count. The vm.max_map_count value is set by default in Red Hat OpenShift Container Platform.

If you want to edit the value of vm.max_map_count, you cannot apply node tuning manually using the sysctl command, because Red Hat OpenShift Container Platform manages nodes directly. To configure values and apply them to the infrastructure, you must use the node tuning operator. For more information, see Using the Node Tuning Operator.
In an OCP deployment, the default node tuning operator specification provides the required profiles for ElasticSearch workloads, or pods, scheduled on nodes. To view the default cluster node tuning specification, run the following command:

The output of the default specification is documented at Default profiles set on a cluster. The assignment of profiles is managed in the recommend section, where profiles are applied to a node when certain conditions are met. When scheduling ElasticSearch to a node in STF, one of the following profiles is applied:
openshift-control-plane-es
openshift-node-es
When scheduling an ElasticSearch pod, there must be a label present that matches tuned.openshift.io/elasticsearch. If the label is present, one of the two profiles is assigned to the pod. No action is required by the administrator if you use the recommended Operator for ElasticSearch. If you use a custom-deployed ElasticSearch with STF, ensure that you add the tuned.openshift.io/elasticsearch label to all scheduled pods.
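For reference, the recommend entries that match ElasticSearch pods in the default specification have roughly the following shape. This excerpt is an illustrative sketch; run the command shown later in this section and consult its output for the authoritative content on your cluster:

recommend:
- match:
  - label: tuned.openshift.io/elasticsearch
    match:
    - label: node-role.kubernetes.io/master
    - label: node-role.kubernetes.io/infra
    type: pod
  priority: 10
  profile: openshift-control-plane-es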
Additional resources
For more information about virtual memory usage by ElasticSearch, see https://www.elastic.co/guide/en/elasticsearch/reference/current/vm-max-map-count.html.
For more information about how the profiles are applied to nodes, see Custom tuning specification.
2.3. DEPLOYING STF TO THE OCP ENVIRONMENT

You can deploy STF to the OCP environment in one of two ways:

Deploy STF and store events with ElasticSearch. For more information, see Section 2.3.1, "Deploying STF to the OCP environment with ElasticSearch".

Deploy STF without ElasticSearch and disable events support. For more information, see Section 2.3.2, "Deploying STF to the OCP environment without ElasticSearch".
2.3.1. Deploying STF to the OCP environment with ElasticSearch

Complete the following tasks:

1. Section 2.3.3, "Creating a namespace".

2. Section 2.3.4, "Creating an OperatorGroup".

3. Section 2.3.5, "Enabling the OperatorHub.io Community Catalog Source".
oc get Tuned/default -o yaml -n openshift-cluster-node-tuning-operator
4. Section 2.3.6, "Enabling Red Hat STF Operator Source".

5. Section 2.3.7, "Subscribing to the AMQ Certificate Manager Operator".

6. Section 2.3.8, "Subscribing to the Elastic Cloud on Kubernetes Operator".

7. Section 2.3.9, "Subscribing to the Service Telemetry Operator".

8. Section 2.3.10, "Creating a ServiceTelemetry object in OCP".
2.3.2. Deploying STF to the OCP environment without ElasticSearch

Complete the following tasks:

1. Section 2.3.3, "Creating a namespace".

2. Section 2.3.4, "Creating an OperatorGroup".

3. Section 2.3.6, "Enabling Red Hat STF Operator Source".

4. Section 2.3.7, "Subscribing to the AMQ Certificate Manager Operator".

5. Section 2.3.9, "Subscribing to the Service Telemetry Operator".

6. Section 2.3.10, "Creating a ServiceTelemetry object in OCP".
2.3.3. Creating a namespace

Create a namespace to hold the STF components. The service-telemetry namespace is used throughout the documentation.

Procedure

Enter the following command:
2.3.4. Creating an OperatorGroup

Create an OperatorGroup in the namespace so that you can schedule the Operator pods.

Procedure

Enter the following command:
oc new-project service-telemetry
oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: service-telemetry-operator-group
  namespace: service-telemetry
spec:
  targetNamespaces:
  - service-telemetry
EOF
Additional resources

For more information, see OperatorGroups.

2.3.5. Enabling the OperatorHub.io Community Catalog Source

Before you install ElasticSearch, you must have access to the resources on the OperatorHub.io Community Catalog Source.

Procedure

Enter the following command:
2.3.6. Enabling Red Hat STF Operator Source

Before you deploy STF on Red Hat OpenShift Container Platform, you must enable the operator source.

Procedure

1. Install an OperatorSource that contains the Service Telemetry Operator and the Smart Gateway Operator:
oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: operatorhubio-operators
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/operator-framework/upstream-community-operators:latest
  displayName: OperatorHub.io Operators
  publisher: OperatorHub.io
EOF
oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1
kind: OperatorSource
metadata:
  labels:
    opsrc-provider: redhat-operators-stf
  name: redhat-operators-stf
  namespace: openshift-marketplace
spec:
  authorizationToken: {}
  displayName: Red Hat STF Operators
  endpoint: https://quay.io/cnr
  publisher: Red Hat
  registryNamespace: redhat-operators-stf
  type: appregistry
EOF
2. To validate the creation of your OperatorSource, use the oc get operatorsources command. A successful import results in the MESSAGE field returning a result of The object has been successfully reconciled.

3. To validate that the Operators are available from the catalog, use the oc get packagemanifest command:
2.3.7. Subscribing to the AMQ Certificate Manager Operator

You must subscribe to the AMQ Certificate Manager Operator before you deploy the other STF components, because the AMQ Certificate Manager Operator runs globally-scoped and is not compatible with the dependency management of Operator Lifecycle Manager when used with other namespace-scoped operators.
Procedure

1. Subscribe to the AMQ Certificate Manager Operator, create the subscription, and validate the AMQ7 Certificate Manager:

NOTE

The AMQ Certificate Manager is installed globally for all namespaces, so the namespace value provided is openshift-operators. You might not see your amq7-cert-manager.v1.0.0 ClusterServiceVersion in the service-telemetry namespace for a few minutes until the processing executes against the namespace.
$ oc get -nopenshift-marketplace operatorsource redhat-operators-stf

NAME                   TYPE          ENDPOINT              REGISTRY               DISPLAYNAME             PUBLISHER   STATUS      MESSAGE
redhat-operators-stf   appregistry   https://quay.io/cnr   redhat-operators-stf   Red Hat STF Operators   Red Hat     Succeeded   The object has been successfully reconciled
$ oc get packagemanifests | grep "Red Hat STF"

smartgateway-operator       Red Hat STF Operators   2m50s
servicetelemetry-operator   Red Hat STF Operators   2m50s
oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: amq7-cert-manager
  namespace: openshift-operators
spec:
  channel: alpha
  installPlanApproval: Automatic
  name: amq7-cert-manager
  source: redhat-operators
  sourceNamespace: openshift-marketplace
EOF
2. To validate your ClusterServiceVersion, use the oc get csv command. Ensure that amq7-cert-manager.v1.0.0 has a phase Succeeded:
2.3.8. Subscribing to the Elastic Cloud on Kubernetes Operator

Before you install the Service Telemetry Operator, and if you plan to store events in ElasticSearch, you must enable the Elastic Cloud on Kubernetes Operator.

Procedure

1. Apply the following manifest to your OCP environment to enable the Elastic Cloud on Kubernetes Operator:

2. To verify that the ClusterServiceVersion for ElasticSearch Cloud on Kubernetes succeeded, enter the oc get csv command:

2.3.9. Subscribing to the Service Telemetry Operator

To instantiate an STF instance, create the ServiceTelemetry object to allow the Service Telemetry Operator to create the environment.

Procedure

1. To create the Service Telemetry Operator subscription, enter the oc apply -f command:
$ oc get --namespace openshift-operators csv

NAME                       DISPLAY                                         VERSION   REPLACES   PHASE
amq7-cert-manager.v1.0.0   Red Hat Integration - AMQ Certificate Manager   1.0.0                Succeeded
oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: elastic-cloud-eck
  namespace: service-telemetry
spec:
  channel: stable
  installPlanApproval: Automatic
  name: elastic-cloud-eck
  source: operatorhubio-operators
  sourceNamespace: openshift-marketplace
EOF
$ oc get csv

NAME                       DISPLAY                       VERSION   REPLACES                   PHASE
elastic-cloud-eck.v1.1.0   Elastic Cloud on Kubernetes   1.1.0     elastic-cloud-eck.v1.0.1   Succeeded
oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: servicetelemetry-operator
  namespace: service-telemetry
spec:
  channel: stable
  installPlanApproval: Automatic
  name: servicetelemetry-operator
  source: redhat-operators-stf
  sourceNamespace: openshift-marketplace
EOF
2. To validate the Service Telemetry Operator and the dependent operators, enter the following command:
2.3.10. Creating a ServiceTelemetry object in OCP

To deploy the Service Telemetry Framework, you must create an instance of ServiceTelemetry in OCP. By default, eventsEnabled is set to false. If you do not want to store events in ElasticSearch, ensure that eventsEnabled is set to false. For more information, see Section 2.3.2, "Deploying STF to the OCP environment without ElasticSearch".

The following core parameters are available for a ServiceTelemetry manifest:

Table 2.1. Core parameters for a ServiceTelemetry manifest

eventsEnabled
  Description: Enable events support in STF. Requires prerequisite steps to ensure ElasticSearch can be started. For more information, see Section 2.3.8, "Subscribing to the Elastic Cloud on Kubernetes Operator".
  Default value: false

metricsEnabled
  Description: Enable metrics support in STF.
  Default value: true
$ oc get csv --namespace service-telemetry

NAME                                DISPLAY                                         VERSION   REPLACES                            PHASE
amq7-cert-manager.v1.0.0            Red Hat Integration - AMQ Certificate Manager   1.0.0                                         Succeeded
amq7-interconnect-operator.v1.2.0   Red Hat Integration - AMQ Interconnect          1.2.0                                         Succeeded
elastic-cloud-eck.v1.1.0            Elastic Cloud on Kubernetes                     1.1.0     elastic-cloud-eck.v1.0.1            Succeeded
prometheusoperator.0.37.0           Prometheus Operator                             0.37.0    prometheusoperator.0.32.0           Succeeded
service-telemetry-operator.v1.0.2   Service Telemetry Operator                      1.0.2     service-telemetry-operator.v1.0.1   Succeeded
smart-gateway-operator.v1.0.1       Smart Gateway Operator                          1.0.1     smart-gateway-operator.v1.0.0       Succeeded
highAvailabilityEnabled
  Description: Enable high availability in STF. For more information, see Section 4.3, "High availability".
  Default value: false

storageEphemeralEnabled
  Description: Enable ephemeral storage support in STF. For more information, see Section 4.6, "Ephemeral storage".
  Default value: false
Procedure

1. To store events in ElasticSearch, set eventsEnabled to true during deployment:

2. To view the STF deployment logs in the Service Telemetry Operator, use the oc logs command:

PLAY RECAP ***
localhost : ok=37 changed=0 unreachable=0 failed=0 skipped=1 rescued=0 ignored=0

3. View the pods and the status of each pod to determine that all workloads are operating nominally:

NOTE

If you set eventsEnabled: true, the notification Smart Gateways will Error and CrashLoopBackOff for a period of time before ElasticSearch starts.
oc apply -f - ltltEOFapiVersion infrawatchv1alpha1kind ServiceTelemetrymetadata name stf-default namespace service-telemetryspec eventsEnabled true metricsEnabled trueEOF
oc logs $(oc get pod --selector=name=service-telemetry-operator -oname) -c ansible
$ oc get pods
NAME READY STATUS RESTARTS AGEalertmanager-stf-default-0 22 Running 0 26melastic-operator-645dc8b8ff-jwnzt 11 Running 0 88melasticsearch-es-default-0 11 Running 0 26minterconnect-operator-6fd49d9fb9-4bl92 11 Running 0
2.4. REMOVING STF FROM THE OCP ENVIRONMENT

Remove STF from an OCP environment if you no longer require the STF functionality.

Complete the following tasks:

1. Section 2.4.1, "Deleting the namespace"

2. Section 2.4.2, "Removing the OperatorSource"

2.4.1. Deleting the namespace

To remove the operational resources for STF from OCP, delete the namespace.
Procedure
1. Run the oc delete command:

oc delete project service-telemetry

2. Verify that the resources have been deleted from the namespace:

$ oc get all
No resources found.
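Namespace deletion is asynchronous, so oc get all can briefly return results while pods terminate. As an additional check, which is a sketch rather than part of the documented procedure, you can confirm that the namespace object itself is gone:

$ oc get namespace service-telemetry
# Expect a NotFound error after the deletion completes.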
2.4.2. Removing the OperatorSource

If you do not expect to install Service Telemetry Framework again, delete the OperatorSource. When you remove the OperatorSource, PackageManifests related to STF are removed from the Operator Lifecycle Manager catalog.
Procedure
1. Delete the OperatorSource:

$ oc delete --namespace=openshift-marketplace operatorsource redhat-operators-stf
operatorsource.operators.coreos.com "redhat-operators-stf" deleted
2. Verify that the STF PackageManifests are removed from the platform. If successful, the following command returns no result:

$ oc get packagemanifests | grep "Red Hat STF"

3. If you enabled the OperatorHub.io Community Catalog Source during the installation process, and you no longer need this catalog source, delete it:

$ oc delete --namespace=openshift-marketplace catalogsource operatorhubio-operators
catalogsource.operators.coreos.com "operatorhubio-operators" deleted
Additional resources

For more information about the OperatorHub.io Community Catalog Source, see Section 2.3, "Deploying STF to the OCP environment".
CHAPTER 3. COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION

3.1. CONNECTING RED HAT OPENSTACK PLATFORM TO SERVICE TELEMETRY FRAMEWORK

To collect metrics, events, or both, and to send them to the Service Telemetry Framework (STF) storage domain, you must configure the Red Hat OpenStack Platform overcloud to enable data collection and transport.

To deploy data collection and transport to STF on Red Hat OpenStack Platform cloud nodes that employ routed L3 domains, such as distributed compute node (DCN) or spine-leaf, see Section 3.2, "Deploying to non-standard network topologies".
3.2. DEPLOYING TO NON-STANDARD NETWORK TOPOLOGIES

If your nodes are on a separate network from the default InternalApi network, you must make configuration adjustments so that AMQ Interconnect can transport data to the Service Telemetry Framework (STF) server instance. This scenario is typical in a spine-leaf or a DCN topology. For more information about DCN configuration, see the Spine Leaf Networking guide.

If you use STF with Red Hat OpenStack Platform 16.0 and plan to monitor your Ceph, Block, or Object storage nodes, you must make configuration changes that are similar to the configuration changes that you make to the spine-leaf and DCN network configuration. To monitor Ceph nodes, use the CephStorageExtraConfig parameter to define which network interface to load into the AMQ Interconnect and collectd configuration files:

CephStorageExtraConfig:
  tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('storage')}"
  tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('storage')}"
  tripleo::profile::base::ceilometer::agent::notification::notifier_host_addr: "%{hiera('storage')}"

Similarly, you must specify the BlockStorageExtraConfig and ObjectStorageExtraConfig parameters if your environment uses Block and Object storage roles.
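The guide does not show the Block and Object storage equivalents, so the following is a hedged sketch by analogy with the CephStorageExtraConfig example above, assuming those roles also attach to the storage network; verify the hiera key against the network that your roles actually use:

BlockStorageExtraConfig:
  tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('storage')}"
  tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('storage')}"

ObjectStorageExtraConfig:
  tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('storage')}"
  tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('storage')}"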
The deployment of a spine-leaf topology involves creating roles and networks, then assigning those networks to the available roles. When you configure data collection and transport for STF for a Red Hat OpenStack Platform deployment, the default network for roles is InternalApi. For Ceph, Block, and Object storage roles, the default network is Storage. Because a spine-leaf configuration can result in different networks being assigned to different Leaf groupings, and those names are typically unique, additional configuration is required in the parameter_defaults section of the Red Hat OpenStack Platform environment files.
Procedure

1. Document which networks are available for each of the Leaf roles. For examples of network name definitions, see Creating a network data file in the Spine Leaf Networking guide. For more information about the creation of the Leaf groupings (roles) and assignment of the networks to those groupings, see Creating a roles data file in the Spine Leaf Networking guide.
2. Add the following configuration example to the ExtraConfig section for each of the leaf roles. In this example, internal_api_subnet is the value defined in the name_lower parameter of your network definition (with _subnet appended to the name for Leaf 0), and is the network to which the ComputeLeaf0 leaf role is connected. In this case, the network identification of 0 corresponds to the Compute role for leaf 0, and represents a value that is different from the default internal API network name.

For the ComputeLeaf0 leaf role, specify extra configuration to perform a hiera lookup to determine which network interface for a particular network to assign to the collectd AMQP host parameter. Perform the same configuration for the AMQ Interconnect listener address parameter:

ComputeLeaf0ExtraConfig:
  tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('internal_api_subnet')}"
  tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('internal_api_subnet')}"

Additional leaf roles typically replace _subnet with _leafN, where N represents a unique identifier for the leaf:

ComputeLeaf1ExtraConfig:
  tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('internal_api_leaf1')}"
  tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('internal_api_leaf1')}"

This example configuration is on a CephStorage leaf role:

CephStorageLeaf0ExtraConfig:
  tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('storage_subnet')}"
  tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('storage_subnet')}"

3.3. CONFIGURING RED HAT OPENSTACK PLATFORM OVERCLOUD FOR SERVICE TELEMETRY FRAMEWORK

To configure the Red Hat OpenStack Platform overcloud, you must configure the data collection applications and the data transport to STF, and deploy the overcloud.

To configure the Red Hat OpenStack Platform overcloud, complete the following tasks:

1. Section 3.3.1, "Retrieving the AMQ Interconnect route address"

2. Section 3.3.2, "Configuring the STF connection for the overcloud"

3. Section 3.3.3, "Validating client-side installation"

3.3.1. Retrieving the AMQ Interconnect route address

When you configure the Red Hat OpenStack Platform overcloud for STF, you must provide the AMQ Interconnect route address in the STF connection file.

Procedure

1. Log in to your Red Hat OpenShift Container Platform (OCP) environment.

2. In the service-telemetry project, retrieve the AMQ Interconnect route address:
$ oc get routes -ogo-template='{{ range .items }}{{printf "%s\n" .spec.host }}{{ end }}' | grep "\-5671"
stf-default-interconnect-5671-service-telemetry.apps.infra.watch

NOTE

If your STF installation differs from the documentation, ensure that you retrieve the correct AMQ Interconnect route address.

3.3.2. Configuring the STF connection for the overcloud

To configure the STF connection, you must create a file that contains the connection configuration of the AMQ Interconnect for the overcloud to the STF deployment. Enable the collection of events and storage of the events in STF, and deploy the overcloud.

Procedure

1. Log in to the Red Hat OpenStack Platform undercloud as the stack user.

2. Create a configuration file called stf-connectors.yaml in the /home/stack directory.

IMPORTANT

The Service Telemetry Operator simplifies the deployment of all data ingestion and data storage components for single cloud deployments. To share the data storage domain with multiple clouds, see Section 4.5, "Configuring multiple clouds".

3. In the stf-connectors.yaml file, configure the MetricsQdrConnectors address to connect the AMQ Interconnect on the overcloud to the STF deployment:

parameter_defaults:
    EventPipelinePublishers: []
    CeilometerQdrPublishEvents: true
    MetricsQdrConnectors:
        - host: stf-default-interconnect-5671-service-telemetry.apps.infra.watch
          port: 443
          role: edge
          sslProfile: sslProfile
          verifyHostname: false

Add the CeilometerQdrPublishEvents: true parameter to enable collection and transport of Ceilometer events to STF.

Replace the host parameter with the value of HOST/PORT that you retrieved in Section 3.3.1, "Retrieving the AMQ Interconnect route address".

4. Add the following files to your Red Hat OpenStack Platform director deployment to set up collectd and AMQ Interconnect:

the stf-connectors.yaml environment file

the enable-stf.yaml file that ensures that the environment is being used during the overcloud deployment

the ceilometer-write-qdr.yaml file that ensures that Ceilometer telemetry is sent to STF

5. Deploy the Red Hat OpenStack Platform overcloud:

openstack overcloud deploy <other arguments>
  --templates /usr/share/openstack-tripleo-heat-templates \
  --environment-file <other-environment-files> \
  --environment-file /usr/share/openstack-tripleo-heat-templates/environments/metrics/ceilometer-write-qdr.yaml \
  --environment-file /usr/share/openstack-tripleo-heat-templates/environments/enable-stf.yaml \
  --environment-file /home/stack/stf-connectors.yaml

3.3.3. Validating client-side installation

To validate data collection from the STF storage domain, query the data sources for delivered data. To validate individual nodes in the Red Hat OpenStack Platform deployment, connect to the console using SSH.

Procedure

1. Log in to an overcloud node, for example, controller-0.

2. Ensure that the metrics_qdr container is running on the node:

$ sudo podman container inspect --format '{{.State.Status}}' metrics_qdr
running

3. Return the internal network address on which AMQ Interconnect is running, for example, 172.17.1.44 listening on port 5666:

$ sudo podman exec -it metrics_qdr cat /etc/qpid-dispatch/qdrouterd.conf

listener {
    host: 172.17.1.44
    port: 5666
    authenticatePeer: no
    saslMechanisms: ANONYMOUS
}

4. Return a list of connections to the local AMQ Interconnect:

$ sudo podman exec -it metrics_qdr qdstat --bus=172.17.1.44:5666 --connections

Connections
  id   host                                                                   container                                                                                                  role    dir  security                            authentication  tenant
  ================================================================================================================================================================
  1    stf-default-interconnect-5671-service-telemetry.apps.infra.watch:443   stf-default-interconnect-7458fd4d69-bgzfb                                                                  edge    out  TLSv1.2(DHE-RSA-AES256-GCM-SHA384)  anonymous-user
  12   172.17.1.44:60290                                                      openstack.org/om/container/controller-0/ceilometer-agent-notification/25/5c02cee550f143ec9ea030db5cccba14  normal  in   no-security                         no-auth
  16   172.17.1.44:36408                                                      metrics                                                                                                    normal  in   no-security                         anonymous-user
  899  172.17.1.44:39500                                                      10a2e99d-1b8a-4329-b48c-4335e5f75c84                                                                       normal  in   no-security                         no-auth

There are four connections:

Outbound connection to STF

Inbound connection from collectd

Inbound connection from Ceilometer

Inbound connection from our qdstat client

The outbound STF connection is provided to the MetricsQdrConnectors host parameter, and is the route for the STF storage domain. The other hosts are internal network addresses of the client connections to this AMQ Interconnect.

5. To ensure that messages are being delivered, list the links, and view the _edge address in the deliv column for delivery of messages:

$ sudo podman exec -it metrics_qdr qdstat --bus=172.17.1.44:5666 --links
Router Links
  type      dir  conn id  id    peer  class   addr                  phs  cap  pri  undel  unsett  deliv    presett  psdrop  acc  rej  rel  mod  delay  rate
  ===========================================================================================================================================================
  endpoint  out  1        5           local   _edge                      250  0    0      0       2979926  2979924  0       0    0    2    0    0      0
  endpoint  in   1        6                                              250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  in   1        7                                              250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  out  1        8                                              250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  in   1        9                                              250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  out  1        10                                             250  0    0      0       911      911      0       0    0    0    0    911    0
  endpoint  in   1        11                                             250  0    0      0       0        911      0       0    0    0    0    0      0
  endpoint  out  12       32          local   temp.lSY6Mcicol4J2Kp       250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  in   16       41                                             250  0    0      0       2979924  2979924  0       0    0    0    0    0      0
  endpoint  in   912      1834        mobile  $management           0    250  0    0      0       1        0        0       1    0    0    0    0      0
  endpoint  out  912      1835        local   temp.9Ok2resI9tmt+CT       250  0    0      0       0        0        0       0    0    0    0    0      0

6. To list the addresses from Red Hat OpenStack Platform nodes to STF, connect to OCP to get the AMQ Interconnect pod name and list the connections. List the available AMQ Interconnect pods:

$ oc get pods -l application=stf-default-interconnect

NAME                                        READY   STATUS    RESTARTS   AGE
stf-default-interconnect-7458fd4d69-bgzfb   1/1     Running   0          6d21h

7. Connect to the pod and run the qdstat --connections command to list the known connections:

$ oc exec -it stf-default-interconnect-7458fd4d69-bgzfb -- qdstat --connections

2020-04-21 18:25:47.243852 UTC
stf-default-interconnect-7458fd4d69-bgzfb

Connections
  id  host                container                                                              role    dir  security                                authentication  tenant  last dlv      uptime
  ================================================================================================================================================================================================
  1   10.129.2.21:43062   rcv[stf-default-collectd-telemetry-smartgateway-79c967c8f7-kq4qv]      normal  in   no-security                             anonymous-user          000:00:00:00  006:21:50:25
  2   10.130.0.52:55754   rcv[stf-default-ceilometer-notification-smartgateway-6675df547mbjk5]   normal  in   no-security                             anonymous-user          000:21:25:57  006:21:49:36
  3   10.130.0.51:43110   rcv[stf-default-collectd-notification-smartgateway-698c87fbb7-f28v6]   normal  in   no-security                             anonymous-user          000:21:36:53  006:21:49:09
  22  10.128.0.1:51948    Router.ceph-0.redhat.local                                             edge    in   TLSv1/SSLv3(DHE-RSA-AES256-GCM-SHA384)  anonymous-user          000:00:00:03  000:22:08:43
  23  10.128.0.1:51950    Router.compute-0.redhat.local                                          edge    in   TLSv1/SSLv3(DHE-RSA-AES256-GCM-SHA384)  anonymous-user          000:00:00:03  000:22:08:43
  24  10.128.0.1:52082    Router.controller-0.redhat.local                                       edge    in   TLSv1/SSLv3(DHE-RSA-AES256-GCM-SHA384)  anonymous-user          000:00:00:00  000:22:08:34
  27  127.0.0.1:42202     c2f541c1-4c97-4b37-a189-a396c08fb079                                   normal  in   no-security                             no-auth                 000:00:00:00  000:00:00:00

In this example, there are three edge connections from the Red Hat OpenStack Platform nodes, with connection id 22, 23, and 24.

8. To view the number of messages delivered by the network, use each address with the oc exec command:

$ oc exec -it stf-default-interconnect-7458fd4d69-bgzfb -- qdstat --address

2020-04-21 18:20:10.293258 UTC
stf-default-interconnect-7458fd4d69-bgzfb

Router Addresses
  class   addr                              phs  distrib    pri  local  remote  in       out      thru  fallback
  ==================================================================================================================
  mobile  anycast/ceilometer/event.sample   0    balanced   -    1      0       1553     1553     0     0
  mobile  collectd/notify                   0    multicast  -    1      0       10       10       0     0
  mobile  collectd/telemetry                0    multicast  -    1      0       7798049  7798049  0     0
CHAPTER 4. ADVANCED FEATURES

The following optional features can provide additional functionality to the Service Telemetry Framework (STF):

Customizing the deployment. For more information, see Section 4.1, "Customizing the deployment".

Alerts. For more information, see Section 4.2, "Alerts".

High availability. For more information, see Section 4.3, "High availability".

Dashboards. For more information, see Section 4.4, "Dashboards".

Multiple clouds. For more information, see Section 4.5, "Configuring multiple clouds".

Ephemeral storage. For more information, see Section 4.6, "Ephemeral storage".

4.1. CUSTOMIZING THE DEPLOYMENT
The Service Telemetry Operator watches for a ServiceTelemetry manifest to load into Red Hat OpenShift Container Platform (OCP). The Operator then creates other objects in memory, which results in the dependent Operators creating the workloads they are responsible for managing.

WARNING

When you override the manifest, you must provide the entire manifest contents, including object names or namespaces. There is no dynamic parameter substitution when you override a manifest.

To override a manifest successfully with Service Telemetry Framework (STF), deploy a default environment using the core options only. For more information about the core options, see Section 2.3.10, "Creating a ServiceTelemetry object in OCP". When you deploy STF, use the oc get command to retrieve the default deployed manifest. When you use a manifest that was originally generated by the Service Telemetry Operator, the manifest is compatible with the other objects that are managed by the Operators.
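For example, to capture the deployed manifest as a starting point for an override, you can write it to a file first. This is a sketch rather than a documented step; stf-default is the default ServiceTelemetry object name used throughout this guide, and the file name is hypothetical:

$ oc get servicetelemetry stf-default -oyaml > stf-default-manifest.yaml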
For example, when the metricsEnabled: true parameter is configured in the ServiceTelemetry manifest, the Service Telemetry Operator requests components for metrics retrieval and storage using the default manifests. In some cases, you might want to override the default manifest. For more information, see Section 4.1.1, "Manifest override parameters".
4.1.1. Manifest override parameters

This table describes the available parameters that you can use to override a manifest, along with the corresponding retrieval commands.

Table 4.1. Manifest override parameters
Override parameter | Description | Retrieval command
alertmanagerManifest | Override the Alertmanager object creation. The Prometheus Operator watches for Alertmanager objects. | oc get alertmanager stf-default -oyaml
alertmanagerConfigManifest | Override the Secret that contains the Alertmanager configuration. The Prometheus Operator uses a secret named alertmanager-<alertmanager-name>, for example, stf-default, to provide the alertmanager.yaml configuration to Alertmanager. | oc get secret alertmanager-stf-default -oyaml
elasticsearchManifest | Override the ElasticSearch object creation. The Elastic Cloud on Kubernetes Operator watches for ElasticSearch objects. | oc get elasticsearch elasticsearch -oyaml
interconnectManifest | Override the Interconnect object creation. The AMQ Interconnect Operator watches for Interconnect objects. | oc get interconnect stf-default-interconnect -oyaml
prometheusManifest | Override the Prometheus object creation. The Prometheus Operator watches for Prometheus objects. | oc get prometheus stf-default -oyaml
servicemonitorManifest | Override the ServiceMonitor object creation. The Prometheus Operator watches for ServiceMonitor objects. | oc get servicemonitor stf-default -oyaml
smartgatewayCollectdMetricsManifest | Override the SmartGateway object creation for collectd metrics. The Smart Gateway Operator watches for SmartGateway objects. | oc get smartgateway stf-default-collectd-telemetry -oyaml
smartgatewayCollectdEventsManifest | Override the SmartGateway object creation for collectd events. The Smart Gateway Operator watches for SmartGateway objects. | oc get smartgateway stf-default-collectd-notification -oyaml
smartgatewayCeilometerEventsManifest | Override the SmartGateway object creation for Ceilometer events. The Smart Gateway Operator watches for SmartGateway objects. | oc get smartgateway stf-default-ceilometer-notification -oyaml
4.1.2. Overriding a managed manifest

Edit the ServiceTelemetry object and provide a parameter and manifest. For a list of available manifest override parameters, see Section 4.1, "Customizing the deployment". The default ServiceTelemetry object is stf-default. Use oc get servicetelemetry to list the available STF deployments.

TIP

The oc edit command loads the default system editor. To override the default editor, pass or set the environment variable EDITOR to the preferred editor, for example, EDITOR=nano oc edit servicetelemetry stf-default.
Procedure

1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

oc project service-telemetry

3. Load the ServiceTelemetry object into an editor:

$ oc edit servicetelemetry stf-default

4. To modify the ServiceTelemetry object, provide a manifest override parameter and the contents of the manifest to write to OCP instead of the defaults provided by STF:

NOTE

The trailing pipe (|) after entering the manifest override parameter indicates that the value provided is multi-line.

apiVersion: infra.watch/v1alpha1
kind: ServiceTelemetry
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"infra.watch/v1alpha1","kind":"ServiceTelemetry","metadata":{"annotations":{},"name":"stf-default","namespace":"service-telemetry"},"spec":{"metricsEnabled":true}}
  creationTimestamp: "2020-04-14T20:29:42Z"
  generation: 1
  name: stf-default
  namespace: service-telemetry
  resourceVersion: "1949423"
  selfLink: /apis/infra.watch/v1alpha1/namespaces/service-telemetry/servicetelemetrys/stf-default
  uid: d058bc41-1bb0-49f5-9a8b-642f4b8adb95
spec:
  metricsEnabled: true
  smartgatewayCollectdMetricsManifest: | <1>
    apiVersion: smartgateway.infra.watch/v2alpha1
    kind: SmartGateway
    metadata:
      name: stf-default-collectd-telemetry
      namespace: service-telemetry
    spec:
      amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/telemetry
      debug: true
      prefetch: 15000
      serviceType: metrics
      size: 1
      useTimestamp: true <2>
status:
  conditions:
  - ansibleResult:
      changed: 0
      completion: 2020-04-14T20:32:19.079508
      failures: 0
      ok: 52
      skipped: 1
    lastTransitionTime: "2020-04-14T20:29:59Z"
    message: Awaiting next reconciliation
    reason: Successful
    status: "True"
    type: Running

<1> Manifest override parameter is defined in the spec of the ServiceTelemetry object.

<2> End of the manifest override content.

5. Save and close.

4.2. ALERTS

You create alert rules in Prometheus and alert routes in Alertmanager. Alert rules in Prometheus servers send alerts to an Alertmanager, which manages the alerts. Alertmanager can silence, inhibit, or aggregate alerts, and send notifications using email, on-call notification systems, or chat platforms.

To create an alert, complete the following tasks:

1. Create an alert rule in Prometheus. For more information, see Section 4.2.1, "Creating an alert rule in Prometheus".

2. Create an alert route in Alertmanager. For more information, see Section 4.2.3, "Creating an alert route in Alertmanager".
Additional resources

For more information about alerts or notifications with Prometheus and Alertmanager, see https://prometheus.io/docs/alerting/overview/

To view an example set of alerts that you can use with Service Telemetry Framework (STF), see https://github.com/infrawatch/service-telemetry-operator/tree/master/deploy/alerts
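As context for the procedures that follow: the routes that you define in Alertmanager determine where notifications go, and the default STF configuration only has a 'null' receiver, as shown in Section 4.2.3. The following is a hedged sketch of what a real receiver stanza might look like, using standard upstream Alertmanager configuration syntax; the receiver name and addresses are hypothetical:

receivers:
- name: ops-email
  email_configs:
  - to: ops@example.com
    from: alertmanager@example.com
    smarthost: smtp.example.com:587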
4.2.1. Creating an alert rule in Prometheus

Prometheus evaluates alert rules to trigger notifications. If the rule condition returns an empty result set, the condition is false. Otherwise, the rule is true, and it triggers an alert.

Procedure

1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

oc project service-telemetry

3. Create a PrometheusRule object that contains the alert rule. The Prometheus Operator loads the rule into Prometheus:

oc apply -f - <<EOF
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  creationTimestamp: null
  labels:
    prometheus: stf-default
    role: alert-rules
  name: prometheus-alarm-rules
  namespace: service-telemetry
spec:
  groups:
    - name: ./openstack.rules
      rules:
        - alert: Metric Listener down
          expr: collectd_qpid_router_status < 1 # To change the rule, edit the value of the expr parameter.
EOF

4. To verify that the rules have been loaded into Prometheus by the Operator, create a pod with access to curl:

oc run curl --generator=run-pod/v1 --image=radial/busyboxplus:curl -i --tty

5. Run curl to access the prometheus-operated service to return the rules loaded into memory:

[ root@curl:/ ]$ curl prometheus-operated:9090/api/v1/rules
{"status":"success","data":{"groups":[{"name":"./openstack.rules","file":"/etc/prometheus/rules/prometheus-stf-default-rulefiles-0/service-telemetry-prometheus-alarm-rules.yaml","rules":[{"name":"Metric Listener down","query":"collectd_qpid_router_status \u003c 1","duration":0,"labels":{},"annotations":{},"alerts":[],"health":"ok","type":"alerting"}],"interval":30}]}}

6. To verify that the output shows the rules loaded into the PrometheusRule object, for example, that the output contains the defined ./openstack.rules, exit from the pod:

[ root@curl:/ ]$ exit

7. Clean up the environment by deleting the curl pod:

$ oc delete pod curl

pod "curl" deleted

Additional resources

For more information on alerting, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md

4.2.2. Configuring custom alerts

You can add custom alerts to the PrometheusRule object that you created in Section 4.2.1, "Creating an alert rule in Prometheus".

Procedure

1. Use the oc edit command:

oc edit prometheusrules prometheus-alarm-rules

2. Edit the PrometheusRules manifest.

3. Save and close.

Additional resources

For more information about configuring alerting rules, see https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/

For more information about PrometheusRules objects, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md

4.2.3. Creating an alert route in Alertmanager

Use Alertmanager to deliver alerts to an external system, such as email, IRC, or other notification channel. The Prometheus Operator manages the Alertmanager configuration as a Red Hat OpenShift Container Platform (OCP) secret. STF by default deploys a basic configuration that results in no receivers:

alertmanager.yaml: |-
  global:
    resolve_timeout: 5m
  route:
    group_by: ['job']
    group_wait: 30s
    group_interval: 5m
    repeat_interval: 12h
    receiver: 'null'
  receivers:
  - name: 'null'

To deploy a custom Alertmanager route with STF, an alertmanagerConfigManifest parameter must be passed to the Service Telemetry Operator that results in an updated secret, managed by the Prometheus Operator. For more information, see Section 4.1.2, "Overriding a managed manifest".

Procedure

1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

oc project service-telemetry

3. Edit the ServiceTelemetry object for your STF deployment:

oc edit servicetelemetry stf-default

4. Add a new parameter, alertmanagerConfigManifest, and the Secret object contents to define the alertmanager.yaml configuration for Alertmanager:

NOTE

This loads the default template that is already managed by the Service Telemetry Operator. To validate that the changes are populating correctly, change a value, return the alertmanager-stf-default secret, and verify that the new value is loaded into memory, for example, changing the value global.resolve_timeout from 5m to 10m.

apiVersion: infra.watch/v1alpha1
kind: ServiceTelemetry
metadata:
  name: stf-default
  namespace: service-telemetry
spec:
  metricsEnabled: true
  alertmanagerConfigManifest: |
    apiVersion: v1
    kind: Secret
    metadata:
      name: 'alertmanager-stf-default'
      namespace: 'service-telemetry'
    type: Opaque
    stringData:
      alertmanager.yaml: |-
        global:
          resolve_timeout: 10m
        route:
          group_by: ['job']
          group_wait: 30s
          group_interval: 5m
          repeat_interval: 12h
          receiver: 'null'
        receivers:
        - name: 'null'

5. Verify that the configuration was applied to the secret:

$ oc get secret alertmanager-stf-default -o go-template='{{index .data "alertmanager.yaml" | base64decode }}'

global:
  resolve_timeout: 10m
route:
  group_by: ['job']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h
  receiver: 'null'
receivers:
- name: 'null'

6. To verify that the configuration has been loaded into Alertmanager, create a pod with access to curl:

oc run curl --generator=run-pod/v1 --image=radial/busyboxplus:curl -i --tty

7. Run curl against the alertmanager-operated service to retrieve the status and configYAML contents, and review that the supplied configuration matches the configuration loaded into Alertmanager:

[ root@curl:/ ]$ curl alertmanager-operated:9093/api/v1/status

{"status":"success","data":{"configYAML":"global:\n  resolve_timeout: 10m\n  http_config: {}\n  smtp_hello: localhost\n  smtp_require_tls: true\n  pagerduty_url: https://events.pagerduty.com/v2/enqueue\n  hipchat_api_url: https://api.hipchat.com/\n  opsgenie_api_url: https://api.opsgenie.com/\n  wechat_api_url: https://qyapi.weixin.qq.com/cgi-bin/\n  victorops_api_url: https://alert.victorops.com/integrations/generic/20131114/alert/\nroute:\n  receiver: \"null\"\n  group_by:\n  - job\n  group_wait: 30s\n  group_interval: 5m\n  repeat_interval: 12h\nreceivers:\n- name: \"null\"\ntemplates: []\n", ...}}

8. Verify that the configYAML field contains the expected changes. Exit from the pod:

[ root@curl:/ ]$ exit

9. To clean up the environment, delete the curl pod:

$ oc delete pod curl

pod "curl" deleted
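Returning briefly to Section 4.2.2, the guide does not show what a custom rule looks like once added. The following is a hedged sketch of the PrometheusRule object from Section 4.2.1 with one hypothetical extra rule appended; absent() is standard PromQL, and collectd_uptime is a metric name used elsewhere in this guide:

spec:
  groups:
    - name: ./openstack.rules
      rules:
        - alert: Metric Listener down
          expr: collectd_qpid_router_status < 1
        - alert: Collectd metrics absent   # hypothetical custom rule
          expr: absent(collectd_uptime) == 1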
Additional resources

For more information about the Red Hat OpenShift Container Platform secret and the Prometheus operator, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md
4.3. HIGH AVAILABILITY

High availability is the ability of Service Telemetry Framework (STF) to rapidly recover from failures in its component services. Although Red Hat OpenShift Container Platform (OCP) restarts a failed pod if nodes are available to schedule the workload, this recovery process might take more than one minute, during which time events and metrics are lost. A high availability configuration includes multiple copies of STF components, which reduces recovery time to approximately 2 seconds. To protect against failure of an OCP node, deploy STF to an OCP cluster with three or more nodes.

NOTE

STF is not yet a fully fault tolerant system. Delivery of metrics and events during the recovery period is not guaranteed.

Enabling high availability has the following effects:

Two AMQ Interconnect pods run instead of the default 1.

Three ElasticSearch pods run instead of the default 1.

Recovery time from a lost pod in either of these services reduces to approximately 2 seconds.

4.3.1. Configuring high availability

To configure STF for high availability, add highAvailabilityEnabled: true to the ServiceTelemetry object in OCP. You can set this parameter at installation time, or, if you already deployed STF, complete the following steps:
Procedure

1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

oc project service-telemetry

3. Use the oc command to edit the ServiceTelemetry object:

$ oc edit ServiceTelemetry

4. Add highAvailabilityEnabled: true to the spec section:

spec:
  eventsEnabled: true
  metricsEnabled: true
  highAvailabilityEnabled: true

5. Save your changes and close the object.
4.4. DASHBOARDS

Use the third-party application Grafana to visualize system-level metrics gathered by collectd for each individual host node. For more information about configuring collectd, see Section 3.3, "Configuring Red Hat OpenStack Platform overcloud for Service Telemetry Framework".
4.4.1. Setting up Grafana to host the dashboard

Grafana is not included in the default Service Telemetry Framework (STF) deployment, so you must deploy the Grafana Operator from OperatorHub.io.

Procedure

1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

oc project service-telemetry

3. Clone the dashboard repository:

git clone https://github.com/infrawatch/dashboards
cd dashboards

4. Deploy the Grafana operator:

oc create -f deploy/subscription.yaml

5. To verify that the operator launched successfully, run the oc get csv command. If the value of the PHASE column is Succeeded, the operator launched successfully:

$ oc get csv

NAME                      DISPLAY            VERSION   REPLACES   PHASE
grafana-operator.v3.2.0   Grafana Operator   3.2.0                Succeeded

6. Launch a Grafana instance:

$ oc create -f deploy/grafana.yaml

7. Verify that the Grafana instance deployed:

$ oc get pod -l app=grafana

NAME                                  READY   STATUS    RESTARTS   AGE
grafana-deployment-7fc7848b56-sbkhv   1/1     Running   0          1m

8. Create the datasource and dashboard resources:

oc create -f deploy/datasource.yaml -f deploy/rhos-dashboard.yaml

9. Verify that the resources installed correctly:

$ oc get grafanadashboards

NAME             AGE
rhos-dashboard   7d21h

$ oc get grafanadatasources
NAME                                  AGE
service-telemetry-grafanadatasource   1m

10. Navigate to https://<grafana-route-address> in a web browser. Use the oc get routes command to retrieve the Grafana route address:

oc get routes

11. To view the dashboard, click Dashboards and Manage.

Additional resources

For more information about enabling the OperatorHub.io catalog source, see Section 2.3.5, "Enabling the OperatorHub.io Community Catalog Source".

4.4.1.1. Viewing and editing queries

Procedure

1. Log in to Red Hat OpenShift Container Platform. To view and edit queries, log in as the admin user.

2. Change to the service-telemetry namespace:

oc project service-telemetry

3. To retrieve the default username and password, describe the Grafana object using oc describe:

oc describe grafana service-telemetry-grafana

4.4.2. The Grafana infrastructure dashboard

The infrastructure dashboard shows metrics for a single node at a time. Select a node from the upper left corner of the dashboard.

4.4.2.1. Top panels

Title | Unit | Description
Current Global Alerts | - | Current alerts fired by Prometheus
Recent Global Alerts | - | Recently fired alerts in 5m time steps
Status Panel | - | Node status: up, down, unavailable
Uptime | s/m/h/d/M/Y | Total operational time of node
CPU Cores | cores | Total number of cores
Memory | bytes | Total memory
Disk Size | bytes | Total storage size
Processes | processes | Total number of processes listed by type
Load Average | processes | Load average represents the average number of running and uninterruptible processes residing in the kernel execution queue.
4.4.2.2. Networking panels

Panels that display the network interfaces of the node.

Panel | Unit | Description
Physical Interfaces Ingress Errors | errors | Total errors with incoming data
Physical Interfaces Egress Errors | errors | Total errors with outgoing data
Physical Interfaces Ingress Error Rates | errors/s | Rate of incoming data errors
Physical Interfaces Egress Error Rates | errors/s | Rate of outgoing data errors
Physical Interfaces Packets Ingress | pps | Incoming packets per second
Physical Interfaces Packets Egress | pps | Outgoing packets per second
Physical Interfaces Data Ingress | bytes/s | Incoming data rates
Physical Interfaces Data Egress | bytes/s | Outgoing data rates
Physical Interfaces Drop Rate Ingress | pps | Incoming packets drop rate
Physical Interfaces Drop Rate Egress | pps | Outgoing packets drop rate
4.4.2.3. CPU panels

Panels that display CPU usage of the node.

Panel | Unit | Description
Current CPU Usage | percent | Instantaneous usage at the time of the last query
Aggregate CPU Usage | percent | Average non-idle CPU activity of all cores on a node
Aggr CPU Usage by Type | percent | Shows time spent for each type of thread, averaged across all cores

4.4.2.4. Memory panels

Panels that display memory usage on the node.

Panel | Unit | Description
Memory Used | percent | Amount of memory being used at time of last query
Huge Pages Used | hugepages | Number of hugepages being used
Memory
4.4.2.5. Disk/file system

Panels that display space used on disk.

Panel | Unit | Description | Notes
Disk Space Usage | percent | Total disk use at time of last query |
Inode Usage | percent | Total inode use at time of last query |
Aggregate Disk Space Usage | bytes | Total disk space used and reserved | Because this query relies on the df plugin, temporary file systems that do not necessarily use disk space are included in the results. The query tries to filter out most of these, but it might not be exhaustive.
Disk Traffic | bytes/s | Shows rates for both reading and writing |
Disk Load | percent | Approximate percentage of total disk bandwidth being used | The weighted I/O time series includes the backlog that might be accumulating. For more information, see the collectd disk plugin docs.
Operations/s | ops/s | Operations done per second |
Average I/O Operation Time | seconds | Average time each I/O operation took to complete | This average is not accurate, see the collectd disk plugin docs.
4.5. CONFIGURING MULTIPLE CLOUDS

You can configure multiple Red Hat OpenStack Platform clouds to target a single instance of Service Telemetry Framework (STF):

1. Plan the AMQP address prefixes that you want to use for each cloud. For more information, see Section 4.5.1, "Planning AMQP address prefixes".

2. Deploy metrics and events consumer Smart Gateways for each cloud to listen on the corresponding address prefixes. For more information, see Section 4.5.2, "Deploying Smart Gateways".

3. Configure each cloud to send its metrics and events to STF on the correct address. For more information, see Section 4.5.3, "Creating the OpenStack environment file".

Figure 4.1. Two Red Hat OpenStack Platform clouds connect to STF
4.5.1. Planning AMQP address prefixes

By default, Red Hat OpenStack Platform nodes get data through two data collectors: collectd and Ceilometer. These components send telemetry data, or notifications, to the respective AMQP addresses, for example, collectd/telemetry, where STF Smart Gateways listen on those addresses for monitoring data.

To support multiple clouds, and to identify which cloud generated the monitoring data, configure each cloud to send data to a unique address. Prefix a cloud identifier to the second part of the address. The following list shows some example addresses and identifiers:
collectd/cloud1-telemetry

collectd/cloud1-notify

anycast/ceilometer/cloud1-event.sample

collectd/cloud2-telemetry

collectd/cloud2-notify

anycast/ceilometer/cloud2-event.sample

collectd/us-east-1-telemetry

collectd/us-west-3-telemetry
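Once clouds are connected, one way to confirm that a given prefix is actually in use is to list the router addresses on the STF interconnect, as in Section 3.3.3. This is a hedged sketch rather than a documented step; it assumes the default stf-default-interconnect naming:

$ oc exec -it $(oc get pod -l application=stf-default-interconnect -oname | head -1) -- qdstat --address | grep cloud1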
4.5.2. Deploying Smart Gateways

You must deploy a Smart Gateway for each of the data collection types for each cloud: one for collectd metrics, one for collectd events, and one for Ceilometer events. Configure each of the Smart Gateways to listen on the AMQP address that you define for the corresponding cloud.

When you deploy STF for the first time, Smart Gateway manifests are created that define the initial Smart Gateways for a single cloud. When deploying Smart Gateways for multiple cloud support, you deploy multiple Smart Gateways for each of the data collection types that handle the metrics and the events data for each cloud. The initial Smart Gateways act as a template to create additional Smart Gateways, along with any authentication information required to connect to the data stores.
Procedure
1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

oc project service-telemetry

3. Use the initially deployed Smart Gateways as a template for additional Smart Gateways. List the currently deployed Smart Gateways with the oc get smartgateways command. For example, if you deployed STF with metricsEnabled: true and eventsEnabled: true, the following Smart Gateways are displayed in the output:

$ oc get smartgateways

NAME                                  AGE
stf-default-ceilometer-notification   14d
stf-default-collectd-notification     14d
stf-default-collectd-telemetry        14d
4. Retrieve the manifests for each Smart Gateway and store the contents in a temporary file, which you can modify later and use to create the new set of Smart Gateways:

truncate --size 0 /tmp/cloud1-smartgateways.yaml && \
for sg in $(oc get smartgateways -oname)
do
  echo "---" >> /tmp/cloud1-smartgateways.yaml
  oc get $sg -oyaml --export >> /tmp/cloud1-smartgateways.yaml
done

5. Modify the Smart Gateway manifests in the /tmp/cloud1-smartgateways.yaml file. Adjust the metadata.name and spec.amqpUrl fields to include the cloud identifier from your schema, for example with a text editor (see the sketch after this procedure). For more information, see Section 4.5.1, "Planning AMQP address prefixes". To view example Smart Gateway manifests, see Section 4.5.2.1, "Example manifests".

6. Deploy your new Smart Gateways:
oc apply -f /tmp/cloud1-smartgateways.yaml
7. Verify that each Smart Gateway is running. This can take several minutes, depending on the number of Smart Gateways:
oc get po -l app=smart-gateway
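The following is a hedged sketch of the kind of edit that step 5 describes, using sed instead of an interactive editor; the cloud1 names follow the address plan in Section 4.5.1, and the exact substitutions depend on your own schema:

sed -i -e 's/name: stf-default-collectd-telemetry/name: stf-default-collectd-telemetry-cloud1/' \
       -e 's|5672/collectd/telemetry|5672/collectd/cloud1-telemetry|' \
       /tmp/cloud1-smartgateways.yaml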
4.5.2.1. Example manifests

IMPORTANT

The content in the following examples might be different to the file content in your deployment. Copy the manifests in your deployment.

Ensure that the name and amqpUrl parameters of each Smart Gateway match the names that you want to use for your clouds. For more information, see Section 4.5.1, "Planning AMQP address prefixes".

NOTE

Your output may have some additional metadata parameters that you can remove from the manifests that you load into OCP.

apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-ceilometer-notification-cloud1 <1>
spec:
  amqpDataSource: ceilometer
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/anycast/ceilometer/cloud1-event.sample <2>
  debug: false
  elasticPass: fkzfhghw
  elasticUrl: https://elasticsearch-es-http.service-telemetry.svc.cluster.local:9200
  elasticUser: elastic
  resetIndex: false
  serviceType: events
  size: 1
  tlsCaCert: /config/certs/ca.crt
  tlsClientCert: /config/certs/tls.crt
  tlsClientKey: /config/certs/tls.key
  tlsServerName: elasticsearch-es-http.service-telemetry.svc.cluster.local
  useBasicAuth: true
  useTls: true
---
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-collectd-notification-cloud1 <3>
spec:
  amqpDataSource: collectd
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/cloud1-notify <4>
  debug: false
  elasticPass: fkzfhghw
  elasticUrl: https://elasticsearch-es-http.service-telemetry.svc.cluster.local:9200
  elasticUser: elastic
  resetIndex: false
  serviceType: events
  size: 1
  tlsCaCert: /config/certs/ca.crt
  tlsClientCert: /config/certs/tls.crt
  tlsClientKey: /config/certs/tls.key
  tlsServerName: elasticsearch-es-http.service-telemetry.svc.cluster.local
  useBasicAuth: true
  useTls: true
---
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-collectd-telemetry-cloud1 <5>
spec:
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/cloud1-telemetry <6>
  debug: false
  prefetch: 15000
  serviceType: metrics
  size: 1
  useTimestamp: true

<1> Name for Ceilometer notifications for cloud1

<2> AMQP address for Ceilometer notifications for cloud1

<3> Name for collectd notifications for cloud1

<4> AMQP address for collectd notifications for cloud1

<5> Name for collectd telemetry for cloud1

<6> AMQP address for collectd telemetry for cloud1

4.5.3. Creating the OpenStack environment file

To label traffic according to the cloud of origin, you must create a configuration with cloud-specific instance names. Create an stf-connectors.yaml file and adjust the values of CeilometerQdrEventsConfig and CollectdAmqpInstances to match the AMQP address prefix scheme. For more information, see Section 4.5.1, "Planning AMQP address prefixes".
WARNING

Remove the enable-stf.yaml and ceilometer-write-qdr.yaml environment file references from your overcloud deployment. This configuration is redundant and results in duplicate information being sent from each cloud node.

Procedure
1. Create the stf-connectors.yaml file and modify it to match the AMQP address that you want for this cloud deployment:

resource_registry:
  OS::TripleO::Services::Collectd: /usr/share/openstack-tripleo-heat-templates/deployment/metrics/collectd-container-puppet.yaml
  OS::TripleO::Services::MetricsQdr: /usr/share/openstack-tripleo-heat-templates/deployment/metrics/qdr-container-puppet.yaml
  OS::TripleO::Services::CeilometerAgentCentral: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-central-container-puppet.yaml
  OS::TripleO::Services::CeilometerAgentNotification: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-notification-container-puppet.yaml
  OS::TripleO::Services::CeilometerAgentIpmi: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-ipmi-container-puppet.yaml
  OS::TripleO::Services::ComputeCeilometerAgent: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-compute-container-puppet.yaml
  OS::TripleO::Services::Redis: /usr/share/openstack-tripleo-heat-templates/deployment/database/redis-pacemaker-puppet.yaml

parameter_defaults:
    EnableSTF: true

    EventPipelinePublishers: []
    CeilometerEnablePanko: false
    CeilometerQdrPublishEvents: true
    CeilometerQdrEventsConfig:
        driver: amqp
        topic: cloud1-event <1>

    CollectdConnectionType: amqp1
    CollectdAmqpInterval: 5
    CollectdDefaultPollingInterval: 5

    CollectdAmqpInstances:
        cloud1-notify: <2>
            notify: true
            format: JSON
            presettle: false
        cloud1-telemetry: <3>
            format: JSON
            presettle: true

    MetricsQdrAddresses:
        - prefix: collectd
          distribution: multicast
        - prefix: anycast/ceilometer
          distribution: multicast

    MetricsQdrSSLProfiles:
        - name: sslProfile

    MetricsQdrConnectors:
        - host: stf-default-interconnect-5671-service-telemetry.apps.infra.watch <4>
          port: 443
          role: edge
          verifyHostname: false
          sslProfile: sslProfile

<1> Define the topic for Ceilometer events. This value is the address format of anycast/ceilometer/cloud1-event.sample.

<2> Define the topic for collectd events. This value is the format of collectd/cloud1-notify.

<3> Define the topic for collectd metrics. This value is the format of collectd/cloud1-telemetry.

<4> Adjust the MetricsQdrConnectors host to the address of the STF route.

2. Ensure that the naming convention in the stf-connectors.yaml file aligns with the spec.amqpUrl field in the Smart Gateway configuration. For example, configure the CeilometerQdrEventsConfig.topic field to a value of cloud1-event.

3. Save the file in a directory for custom environment files, for example, /home/stack/custom_templates/.

4. Source the authentication file:

[stack@undercloud-0 ~]$ source stackrc

(undercloud) [stack@undercloud-0 ~]$

5. Include the stf-connectors.yaml file in the overcloud deployment command, along with any other environment files relevant to your environment:

(undercloud) [stack@undercloud-0 ~]$ openstack overcloud deploy --templates /usr/share/openstack-tripleo-heat-templates -e /home/stack/custom_templates/stf-connectors.yaml

Additional resources

For information about validating the deployment, see Section 3.3.3, "Validating client-side installation".

4.5.4. Querying metrics data from multiple clouds

Data stored in Prometheus has a service label attached according to the Smart Gateway it was scraped from. You can use this label to query data from a specific cloud.

To query data from a specific cloud, use a Prometheus promql query that matches the associated service label, for example: collectd_uptime{service="stf-default-collectd-telemetry-cloud1-smartgateway"}.
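Building on the query example in Section 4.5.4, the following is a hedged sketch of a comparison across two clouds. The regex matcher and avg aggregation are standard upstream PromQL; the service names assume the cloud1/cloud2 naming from Section 4.5.2:

avg by (service) (collectd_uptime{service=~"stf-default-collectd-telemetry-cloud[12]-smartgateway"})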
4.6. EPHEMERAL STORAGE

Use ephemeral storage to run Service Telemetry Framework (STF) without persistently storing data in your Red Hat OpenShift Container Platform (OCP) cluster. Ephemeral storage is not recommended in a production environment due to the volatility of the data in the platform when operating correctly and as designed. For example, restarting a pod or rescheduling the workload to another node results in the loss of any local data written since the pod started.

If you enable ephemeral storage in STF, the Service Telemetry Operator does not add the relevant storage sections to the data storage components manifests.

4.6.1. Configuring ephemeral storage

To configure STF for ephemeral storage, add storageEphemeralEnabled: true to the ServiceTelemetry object in OCP. You can add storageEphemeralEnabled: true at installation time, or, if you already deployed STF, complete the following steps:
Procedure

1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

oc project service-telemetry

3. Edit the ServiceTelemetry object:

$ oc edit ServiceTelemetry stf-default

4. Add the storageEphemeralEnabled: true parameter to the spec section:

spec:
  eventsEnabled: true
  metricsEnabled: true
  storageEphemeralEnabled: true

5. Save your changes and close the object.
APPENDIX A. COLLECTD PLUG-INS

This section contains a complete list of collectd plug-ins and configurations for Red Hat OpenStack Platform 16.0.
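The entries below are hiera keys from the puppet-collectd module. As a hedged illustration of how such keys can be applied, not a step from this guide, a tripleo environment file could set a few of them through the ExtraConfig hieradata mechanism used in Section 3.2; the specific values here are hypothetical:

parameter_defaults:
  ExtraConfig:
    collectd::plugin::cpu::interval: 10          # hypothetical polling interval
    collectd::plugin::df::ignoreselected: false  # hypothetical df plugin setting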
collectd-aggregation

collectd::plugin::aggregation::aggregators

collectd::plugin::aggregation::interval

collectd-amqp1

collectd-apache

collectd::plugin::apache::instances (ex: localhost ⇒ url ⇒ http://localhost/mod_status?auto)

collectd::plugin::apache::interval

collectd-apcups

collectd-battery

collectd::plugin::battery::values_percentage

collectd::plugin::battery::report_degraded

collectd::plugin::battery::query_state_fs

collectd::plugin::battery::interval

collectd-ceph

collectd::plugin::ceph::daemons

collectd::plugin::ceph::longrunavglatency

collectd::plugin::ceph::convertspecialmetrictypes

collectd-cgroups

collectd::plugin::cgroups::ignore_selected

collectd::plugin::cgroups::interval

collectd-conntrack

None

collectd-contextswitch

collectd::plugin::contextswitch::interval

collectd-cpu
collectd::plugin::cpu::reportbystate

collectd::plugin::cpu::reportbycpu

collectd::plugin::cpu::valuespercentage

collectd::plugin::cpu::reportnumcpu

collectd::plugin::cpu::reportgueststate

collectd::plugin::cpu::subtractgueststate

collectd::plugin::cpu::interval

collectd-cpufreq

None

collectd-cpusleep

collectd-csv

collectd::plugin::csv::datadir

collectd::plugin::csv::storerates

collectd::plugin::csv::interval

collectd-df

collectd::plugin::df::devices

collectd::plugin::df::fstypes

collectd::plugin::df::ignoreselected

collectd::plugin::df::mountpoints

collectd::plugin::df::reportbydevice

collectd::plugin::df::reportinodes

collectd::plugin::df::reportreserved

collectd::plugin::df::valuesabsolute

collectd::plugin::df::valuespercentage

collectd::plugin::df::interval

collectd-disk

collectd::plugin::disk::disks

collectd::plugin::disk::ignoreselected

collectd::plugin::disk::udevnameattr
collectd::plugin::disk::interval

collectd-entropy

collectd::plugin::entropy::interval

collectd-ethstat

collectd::plugin::ethstat::interfaces

collectd::plugin::ethstat::maps

collectd::plugin::ethstat::mappedonly

collectd::plugin::ethstat::interval

collectd-exec

collectd::plugin::exec::commands

collectd::plugin::exec::commands_defaults

collectd::plugin::exec::globals

collectd::plugin::exec::interval

collectd-fhcount

collectd::plugin::fhcount::valuesabsolute

collectd::plugin::fhcount::valuespercentage

collectd::plugin::fhcount::interval

collectd-filecount

collectd::plugin::filecount::directories

collectd::plugin::filecount::interval

collectd-fscache

None

collectd-hddtemp

collectd::plugin::hddtemp::host

collectd::plugin::hddtemp::port

collectd::plugin::hddtemp::interval

collectd-hugepages

collectd::plugin::hugepages::report_per_node_hp
collectd::plugin::hugepages::report_root_hp

collectd::plugin::hugepages::values_pages

collectd::plugin::hugepages::values_bytes

collectd::plugin::hugepages::values_percentage

collectd::plugin::hugepages::interval

collectd-intel_rdt

collectd-interface

collectd::plugin::interface::interfaces

collectd::plugin::interface::ignoreselected

collectd::plugin::interface::reportinactive

collectd::plugin::interface::interval

collectd-ipc

None

collectd-ipmi

collectd::plugin::ipmi::ignore_selected

collectd::plugin::ipmi::notify_sensor_add

collectd::plugin::ipmi::notify_sensor_remove

collectd::plugin::ipmi::notify_sensor_not_present

collectd::plugin::ipmi::sensors

collectd::plugin::ipmi::interval

collectd-irq

collectd::plugin::irq::irqs

collectd::plugin::irq::ignoreselected

collectd::plugin::irq::interval

collectd-load

collectd::plugin::load::report_relative

collectd::plugin::load::interval

collectd-logfile

collectd::plugin::logfile::log_level
collectd::plugin::logfile::log_file

collectd::plugin::logfile::log_timestamp

collectd::plugin::logfile::print_severity

collectd::plugin::logfile::interval

collectd-madwifi

collectd-mbmon

collectd-md

collectd-memcached

collectd::plugin::memcached::instances

collectd::plugin::memcached::interval

collectd-memory

collectd::plugin::memory::valuesabsolute

collectd::plugin::memory::valuespercentage

collectd::plugin::memory::interval

collectd-multimeter

collectd-mysql

collectd::plugin::mysql::interval

collectd-netlink

collectd::plugin::netlink::interfaces

collectd::plugin::netlink::verboseinterfaces

collectd::plugin::netlink::qdiscs

collectd::plugin::netlink::classes

collectd::plugin::netlink::filters

collectd::plugin::netlink::ignoreselected

collectd::plugin::netlink::interval

collectd-network

collectd::plugin::network::timetolive

collectd::plugin::network::maxpacketsize
collectd::plugin::network::forward

collectd::plugin::network::reportstats

collectd::plugin::network::listeners

collectd::plugin::network::servers

collectd::plugin::network::interval

collectd-nfs

collectd::plugin::nfs::interval

collectd-ntpd

collectd::plugin::ntpd::host

collectd::plugin::ntpd::port

collectd::plugin::ntpd::reverselookups

collectd::plugin::ntpd::includeunitid

collectd::plugin::ntpd::interval

collectd-numa

None

collectd-olsrd

collectd-openvpn

collectd::plugin::openvpn::statusfile

collectd::plugin::openvpn::improvednamingschema

collectd::plugin::openvpn::collectcompression

collectd::plugin::openvpn::collectindividualusers

collectd::plugin::openvpn::collectusercount

collectd::plugin::openvpn::interval

collectd-ovs_events

collectd::plugin::ovs_events::address

collectd::plugin::ovs_events::dispatch

collectd::plugin::ovs_events::interfaces

collectd::plugin::ovs_events::send_notification
collectd::plugin::ovs_events::$port

collectd::plugin::ovs_events::socket

collectd-ovs_stats

collectd::plugin::ovs_stats::address

collectd::plugin::ovs_stats::bridges

collectd::plugin::ovs_stats::port

collectd::plugin::ovs_stats::socket

collectd-ping

collectd::plugin::ping::hosts

collectd::plugin::ping::timeout

collectd::plugin::ping::ttl

collectd::plugin::ping::source_address

collectd::plugin::ping::device

collectd::plugin::ping::max_missed

collectd::plugin::ping::size

collectd::plugin::ping::interval

collectd-powerdns

collectd::plugin::powerdns::interval

collectd::plugin::powerdns::servers

collectd::plugin::powerdns::recursors

collectd::plugin::powerdns::local_socket

collectd::plugin::powerdns::interval

collectd-processes

collectd::plugin::processes::processes

collectd::plugin::processes::process_matches

collectd::plugin::processes::collect_context_switch

collectd::plugin::processes::collect_file_descriptor

collectd::plugin::processes::collect_memory_maps

collectd::plugin::powerdns::interval
collectd-protocols

collectd::plugin::protocols::ignoreselected

collectd::plugin::protocols::values

collectd-python

collectd-serial

collectd-smart

collectd::plugin::smart::disks

collectd::plugin::smart::ignoreselected

collectd::plugin::smart::interval

collectd-snmp_agent

collectd-statsd

collectd::plugin::statsd::host

collectd::plugin::statsd::port

collectd::plugin::statsd::deletecounters

collectd::plugin::statsd::deletetimers

collectd::plugin::statsd::deletegauges

collectd::plugin::statsd::deletesets

collectd::plugin::statsd::countersum

collectd::plugin::statsd::timerpercentile

collectd::plugin::statsd::timerlower

collectd::plugin::statsd::timerupper

collectd::plugin::statsd::timersum

collectd::plugin::statsd::timercount

collectd::plugin::statsd::interval

collectd-swap

collectd::plugin::swap::reportbydevice

collectd::plugin::swap::reportbytes

collectd::plugin::swap::valuesabsolute
collectd::plugin::swap::valuespercentage

collectd::plugin::swap::reportio

collectd::plugin::swap::interval

collectd-syslog

collectd::plugin::syslog::log_level

collectd::plugin::syslog::notify_level

collectd::plugin::syslog::interval

collectd-table

collectd::plugin::table::tables

collectd::plugin::table::interval

collectd-tail

collectd::plugin::tail::files

collectd::plugin::tail::interval

collectd-tail_csv

collectd::plugin::tail_csv::metrics

collectd::plugin::tail_csv::files

collectd-tcpconns

collectd::plugin::tcpconns::localports

collectd::plugin::tcpconns::remoteports

collectd::plugin::tcpconns::listening

collectd::plugin::tcpconns::allportssummary

collectd::plugin::tcpconns::interval

collectd-ted

collectd-thermal

collectd::plugin::thermal::devices

collectd::plugin::thermal::ignoreselected

collectd::plugin::thermal::interval

collectd-threshold
collectd::plugin::threshold::types
collectd::plugin::threshold::plugins
collectd::plugin::threshold::hosts
collectd::plugin::threshold::interval
collectd-turbostat
collectd::plugin::turbostat::core_c_states
collectd::plugin::turbostat::package_c_states
collectd::plugin::turbostat::system_management_interrupt
collectd::plugin::turbostat::digital_temperature_sensor
collectd::plugin::turbostat::tcc_activation_temp
collectd::plugin::turbostat::running_average_power_limit
collectd::plugin::turbostat::logical_core_names
collectd-unixsock
collectd-uptime
collectd::plugin::uptime::interval
collectd-users
collectd::plugin::users::interval
collectd-uuid
collectd::plugin::uuid::uuid_file
collectd::plugin::uuid::interval
collectd-virt
collectd::plugin::virt::connection
collectd::plugin::virt::refresh_interval
collectd::plugin::virt::domain
collectd::plugin::virt::block_device
collectd::plugin::virt::interface_device
collectd::plugin::virt::ignore_selected
collectd::plugin::virt::hostname_format
collectd::plugin::virt::interface_format
collectd::plugin::virt::extra_stats
collectd::plugin::virt::interval
collectd-vmem
collectd::plugin::vmem::verbose
collectd::plugin::vmem::interval
collectd-vserver
collectd-wireless
collectd-write_graphite
collectd::plugin::write_graphite::carbons
collectd::plugin::write_graphite::carbon_defaults
collectd::plugin::write_graphite::globals
collectd-write_kafka
collectd::plugin::write_kafka::kafka_host
collectd::plugin::write_kafka::kafka_port
collectd::plugin::write_kafka::kafka_hosts
collectd::plugin::write_kafka::topics
collectd-write_log
collectd::plugin::write_log::format
collectd-zfs_arc
None
4. Section 2.3.6, “Enabling Red Hat STF Operator Source”
5. Section 2.3.7, “Subscribing to the AMQ Certificate Manager Operator”
6. Section 2.3.8, “Subscribing to the Elastic Cloud on Kubernetes Operator”
7. Section 2.3.9, “Subscribing to the Service Telemetry Operator”
8. Section 2.3.10, “Creating a ServiceTelemetry object in OCP”

2.3.2. Deploying STF to the OCP environment without ElasticSearch

Complete the following tasks:

1. Section 2.3.3, “Creating a namespace”
2. Section 2.3.4, “Creating an OperatorGroup”
3. Section 2.3.6, “Enabling Red Hat STF Operator Source”
4. Section 2.3.7, “Subscribing to the AMQ Certificate Manager Operator”
5. Section 2.3.9, “Subscribing to the Service Telemetry Operator”
6. Section 2.3.10, “Creating a ServiceTelemetry object in OCP”
2.3.3. Creating a namespace

Create a namespace to hold the STF components. The service-telemetry namespace is used throughout the documentation.

Procedure

Enter the following command:

oc new-project service-telemetry

2.3.4. Creating an OperatorGroup

Create an OperatorGroup in the namespace so that you can schedule the Operator pods.

Procedure

Enter the following command:

oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: service-telemetry-operator-group
  namespace: service-telemetry
spec:
  targetNamespaces:
  - service-telemetry
EOF
Additional resources

For more information, see OperatorGroups.

2.3.5. Enabling the OperatorHub.io Community Catalog Source

Before you install ElasticSearch, you must have access to the resources on the OperatorHub.io Community Catalog Source.

Procedure

Enter the following command:

oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: operatorhubio-operators
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/operator-framework/upstream-community-operators:latest
  displayName: OperatorHub.io Operators
  publisher: OperatorHub.io
EOF

2.3.6. Enabling Red Hat STF Operator Source

Before you deploy STF on Red Hat OpenShift Container Platform, you must enable the operator source.

Procedure

1. Install an OperatorSource that contains the Service Telemetry Operator and the Smart Gateway Operator:

oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1
kind: OperatorSource
metadata:
  labels:
    opsrc-provider: redhat-operators-stf
  name: redhat-operators-stf
  namespace: openshift-marketplace
spec:
  authorizationToken: {}
  displayName: Red Hat STF Operators
  endpoint: https://quay.io/cnr
  publisher: Red Hat
  registryNamespace: redhat-operators-stf
  type: appregistry
EOF

2. To validate the creation of your OperatorSource, use the oc get operatorsources command. A successful import results in the MESSAGE field returning a result of The object has been successfully reconciled:

$ oc get -nopenshift-marketplace operatorsource redhat-operators-stf

NAME                   TYPE          ENDPOINT              REGISTRY               DISPLAYNAME             PUBLISHER   STATUS      MESSAGE
redhat-operators-stf   appregistry   https://quay.io/cnr   redhat-operators-stf   Red Hat STF Operators   Red Hat     Succeeded   The object has been successfully reconciled

3. To validate that the Operators are available from the catalog, use the oc get packagemanifest command:

$ oc get packagemanifests | grep "Red Hat STF"

smartgateway-operator       Red Hat STF Operators   2m50s
servicetelemetry-operator   Red Hat STF Operators   2m50s

2.3.7. Subscribing to the AMQ Certificate Manager Operator

You must subscribe to the AMQ Certificate Manager Operator before you deploy the other STF components, because the AMQ Certificate Manager Operator runs globally-scoped and is not compatible with the dependency management of Operator Lifecycle Manager when used with other namespace-scoped operators.

Procedure

1. Subscribe to the AMQ Certificate Manager Operator, create the subscription, and validate the AMQ7 Certificate Manager:

oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: amq7-cert-manager
  namespace: openshift-operators
spec:
  channel: alpha
  installPlanApproval: Automatic
  name: amq7-cert-manager
  source: redhat-operators
  sourceNamespace: openshift-marketplace
EOF

NOTE

The AMQ Certificate Manager is installed globally for all namespaces, so the namespace value provided is openshift-operators. You might not see your amq7-cert-manager.v1.0.0 ClusterServiceVersion in the service-telemetry namespace for a few minutes until the processing executes against the namespace.

2. To validate your ClusterServiceVersion, use the oc get csv command. Ensure that amq7-cert-manager.v1.0.0 has a phase Succeeded:

$ oc get --namespace openshift-operators csv

NAME                       DISPLAY                                         VERSION   REPLACES   PHASE
amq7-cert-manager.v1.0.0   Red Hat Integration - AMQ Certificate Manager   1.0.0                Succeeded
2.3.8. Subscribing to the Elastic Cloud on Kubernetes Operator

Before you install the Service Telemetry Operator, and if you plan to store events in ElasticSearch, you must enable the Elastic Cloud on Kubernetes Operator.

Procedure

1. Apply the following manifest to your OCP environment to enable the Elastic Cloud on Kubernetes Operator:

oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: elastic-cloud-eck
  namespace: service-telemetry
spec:
  channel: stable
  installPlanApproval: Automatic
  name: elastic-cloud-eck
  source: operatorhubio-operators
  sourceNamespace: openshift-marketplace
EOF

2. To verify that the ClusterServiceVersion for Elastic Cloud on Kubernetes succeeded, enter the oc get csv command:

$ oc get csv

NAME                       DISPLAY                       VERSION   REPLACES                   PHASE
elastic-cloud-eck.v1.1.0   Elastic Cloud on Kubernetes   1.1.0     elastic-cloud-eck.v1.0.1   Succeeded

2.3.9. Subscribing to the Service Telemetry Operator

To instantiate an STF instance, create the ServiceTelemetry object to allow the Service Telemetry Operator to create the environment.

Procedure

1. To create the Service Telemetry Operator subscription, enter the oc apply -f command:

oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: servicetelemetry-operator
  namespace: service-telemetry
spec:
  channel: stable
  installPlanApproval: Automatic
  name: servicetelemetry-operator
  source: redhat-operators-stf
  sourceNamespace: openshift-marketplace
EOF

2. To validate the Service Telemetry Operator and the dependent operators, enter the following command:

$ oc get csv --namespace service-telemetry

NAME                                 DISPLAY                                         VERSION   REPLACES                            PHASE
amq7-cert-manager.v1.0.0             Red Hat Integration - AMQ Certificate Manager   1.0.0                                         Succeeded
amq7-interconnect-operator.v1.2.0    Red Hat Integration - AMQ Interconnect          1.2.0                                         Succeeded
elastic-cloud-eck.v1.1.0             Elastic Cloud on Kubernetes                     1.1.0     elastic-cloud-eck.v1.0.1            Succeeded
prometheusoperator.0.37.0            Prometheus Operator                             0.37.0    prometheusoperator.0.32.0           Succeeded
service-telemetry-operator.v1.0.2    Service Telemetry Operator                      1.0.2     service-telemetry-operator.v1.0.1   Succeeded
smart-gateway-operator.v1.0.1        Smart Gateway Operator                          1.0.1     smart-gateway-operator.v1.0.0       Succeeded

2.3.10. Creating a ServiceTelemetry object in OCP

To deploy the Service Telemetry Framework, you must create an instance of ServiceTelemetry in OCP. By default, eventsEnabled is set to false. If you do not want to store events in ElasticSearch, ensure that eventsEnabled is set to false. For more information, see Section 2.3.2, “Deploying STF to the OCP environment without ElasticSearch”.

The following core parameters are available for a ServiceTelemetry manifest.

Table 2.1. Core parameters for a ServiceTelemetry manifest

Parameter: eventsEnabled
Description: Enable events support in STF. Requires prerequisite steps to ensure ElasticSearch can be started. For more information, see Section 2.3.8, “Subscribing to the Elastic Cloud on Kubernetes Operator”.
Default value: false

Parameter: metricsEnabled
Description: Enable metrics support in STF.
Default value: true
Parameter: highAvailabilityEnabled
Description: Enable high availability in STF. For more information, see Section 4.3, “High availability”.
Default value: false

Parameter: storageEphemeralEnabled
Description: Enable ephemeral storage support in STF. For more information, see Section 4.6, “Ephemeral storage”.
Default value: false
Procedure

1. To store events in ElasticSearch, set eventsEnabled to true during deployment:

oc apply -f - <<EOF
apiVersion: infra.watch/v1alpha1
kind: ServiceTelemetry
metadata:
  name: stf-default
  namespace: service-telemetry
spec:
  eventsEnabled: true
  metricsEnabled: true
EOF

2. To view the STF deployment logs in the Service Telemetry Operator, use the oc logs command:

oc logs $(oc get pod --selector=name=service-telemetry-operator -oname) -c ansible

PLAY RECAP ***
localhost : ok=37 changed=0 unreachable=0 failed=0 skipped=1 rescued=0 ignored=0

3. View the pods and the status of each pod to determine that all workloads are operating nominally:

NOTE

If you set eventsEnabled: true, the notification Smart Gateways will Error and CrashLoopBackOff for a period of time before ElasticSearch starts.

$ oc get pods

NAME                                                              READY   STATUS    RESTARTS   AGE
alertmanager-stf-default-0                                        2/2     Running   0          26m
elastic-operator-645dc8b8ff-jwnzt                                 1/1     Running   0          88m
elasticsearch-es-default-0                                        1/1     Running   0          26m
interconnect-operator-6fd49d9fb9-4bl92                            1/1     Running   0          46m
prometheus-operator-bf7d97fb9-kwnlx                               1/1     Running   0          46m
prometheus-stf-default-0                                          3/3     Running   0          26m
service-telemetry-operator-54f4c99d9b-k7ll6                       2/2     Running   0          46m
smart-gateway-operator-7ff58bcf94-66rvx                           2/2     Running   0          46m
stf-default-ceilometer-notification-smartgateway-6675df547q4lbj   1/1     Running   0          26m
stf-default-collectd-notification-smartgateway-698c87fbb7-xj528   1/1     Running   0          26m
stf-default-collectd-telemetry-smartgateway-79c967c8f7-9hsqn      1/1     Running   0          26m
stf-default-interconnect-7458fd4d69-nqbfs                         1/1     Running   0          26m
2.4. REMOVING STF FROM THE OCP ENVIRONMENT

Remove STF from an OCP environment if you no longer require the STF functionality.

Complete the following tasks:

1. Section 2.4.1, “Deleting the namespace”
2. Section 2.4.2, “Removing the OperatorSource”

2.4.1. Deleting the namespace

To remove the operational resources for STF from OCP, delete the namespace.

Procedure

1. Run the oc delete command:

oc delete project service-telemetry

2. Verify that the resources have been deleted from the namespace:

$ oc get all
No resources found.

2.4.2. Removing the OperatorSource

If you do not expect to install Service Telemetry Framework again, delete the OperatorSource. When you remove the OperatorSource, PackageManifests related to STF are removed from the Operator Lifecycle Manager catalog.

Procedure

1. Delete the OperatorSource:

$ oc delete --namespace=openshift-marketplace operatorsource redhat-operators-stf
operatorsource.operators.coreos.com "redhat-operators-stf" deleted

2. Verify that the STF PackageManifests are removed from the platform. If successful, the following command returns no result:

$ oc get packagemanifests | grep "Red Hat STF"

3. If you enabled the OperatorHub.io Community Catalog Source during the installation process, and you no longer need this catalog source, delete it:

$ oc delete --namespace=openshift-marketplace catalogsource operatorhubio-operators
catalogsource.operators.coreos.com "operatorhubio-operators" deleted

Additional resources

For more information about the OperatorHub.io Community Catalog Source, see Section 2.3, “Deploying STF to the OCP environment”.
CHAPTER 3. COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION

3.1. CONNECTING RED HAT OPENSTACK PLATFORM TO SERVICE TELEMETRY FRAMEWORK

To collect metrics, events, or both, and to send them to the Service Telemetry Framework (STF) storage domain, you must configure the Red Hat OpenStack Platform overcloud to enable data collection and transport.

To deploy data collection and transport to STF on Red Hat OpenStack Platform cloud nodes that employ routed L3 domains, such as distributed compute node (DCN) or spine-leaf, see Section 3.2, “Deploying to non-standard network topologies”.

3.2. DEPLOYING TO NON-STANDARD NETWORK TOPOLOGIES

If your nodes are on a separate network from the default InternalApi network, you must make configuration adjustments so that AMQ Interconnect can transport data to the Service Telemetry Framework (STF) server instance. This scenario is typical in a spine-leaf or a DCN topology. For more information about DCN configuration, see the Spine Leaf Networking guide.

If you use STF with Red Hat OpenStack Platform 16.0 and plan to monitor your Ceph, Block, or Object storage nodes, you must make configuration changes that are similar to the configuration changes that you make to the spine-leaf and DCN network configuration. To monitor Ceph nodes, use the CephStorageExtraConfig parameter to define which network interface to load into the AMQ Interconnect and collectd configuration files:

CephStorageExtraConfig:
  tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('storage')}"
  tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('storage')}"
  tripleo::profile::base::ceilometer::agent::notification::notifier_host_addr: "%{hiera('storage')}"

Similarly, you must specify BlockStorageExtraConfig and ObjectStorageExtraConfig parameters if your environment uses Block and Object storage roles.

The deployment of a spine-leaf topology involves creating roles and networks, then assigning those networks to the available roles. When you configure data collection and transport for STF for a Red Hat OpenStack Platform deployment, the default network for roles is InternalApi. For Ceph, Block, and Object storage roles, the default network is Storage. Because a spine-leaf configuration can result in different networks being assigned to different Leaf groupings, and those names are typically unique, additional configuration is required in the parameter_defaults section of the Red Hat OpenStack Platform environment files.

Procedure

1. Document which networks are available for each of the Leaf roles. For examples of network name definitions, see Creating a network data file in the Spine Leaf Networking guide. For more information about the creation of the Leaf groupings (roles) and assignment of the networks to those groupings, see Creating a roles data file in the Spine Leaf Networking guide.

2. Add the following configuration example to the ExtraConfig section for each of the leaf roles. In this example, internal_api_subnet is the value defined in the name_lower parameter of your network definition (with _subnet appended to the name for Leaf 0), and is the network to which the ComputeLeaf0 leaf role is connected. In this case, the network identification of 0 corresponds to the Compute role for leaf 0, and represents a value that is different from the default internal API network name.

For the ComputeLeaf0 leaf role, specify extra configuration to perform a hiera lookup to determine which network interface for a particular network to assign to the collectd AMQP host parameter. Perform the same configuration for the AMQ Interconnect listener address parameter:

ComputeLeaf0ExtraConfig:
  tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('internal_api_subnet')}"
  tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('internal_api_subnet')}"

Additional leaf roles typically replace _subnet with _leafN, where N represents a unique identifier for the leaf:

ComputeLeaf1ExtraConfig:
  tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('internal_api_leaf1')}"
  tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('internal_api_leaf1')}"

This example configuration is on a CephStorage leaf role:

CephStorageLeaf0ExtraConfig:
  tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('storage_subnet')}"
  tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('storage_subnet')}"

3.3. CONFIGURING RED HAT OPENSTACK PLATFORM OVERCLOUD FOR SERVICE TELEMETRY FRAMEWORK

To configure the Red Hat OpenStack Platform overcloud, you must configure the data collection applications and the data transport to STF, and deploy the overcloud.

To configure the Red Hat OpenStack Platform overcloud, complete the following tasks:

1. Section 3.3.1, “Retrieving the AMQ Interconnect route address”
2. Section 3.3.2, “Configuring the STF connection for the overcloud”
3. Section 3.3.3, “Validating client-side installation”

3.3.1. Retrieving the AMQ Interconnect route address

When you configure the Red Hat OpenStack Platform overcloud for STF, you must provide the AMQ Interconnect route address in the STF connection file.

Procedure

1. Log in to your Red Hat OpenShift Container Platform (OCP) environment.

2. In the service-telemetry project, retrieve the AMQ Interconnect route address:

$ oc get routes -ogo-template='{{ range .items }}{{printf "%s\n" .spec.host }}{{ end }}' | grep "\-5671"
stf-default-interconnect-5671-service-telemetry.apps.infra.watch

NOTE

If your STF installation differs from the documentation, ensure that you retrieve the correct AMQ Interconnect route address.
3.3.2. Configuring the STF connection for the overcloud

To configure the STF connection, you must create a file that contains the connection configuration of the AMQ Interconnect for the overcloud to the STF deployment. Enable the collection of events and storage of the events in STF, and deploy the overcloud.

Procedure

1. Log in to the Red Hat OpenStack Platform undercloud as the stack user.

2. Create a configuration file called stf-connectors.yaml in the /home/stack directory.

IMPORTANT

The Service Telemetry Operator simplifies the deployment of all data ingestion and data storage components for single cloud deployments. To share the data storage domain with multiple clouds, see Section 4.5, “Configuring multiple clouds”.

3. In the stf-connectors.yaml file, configure the MetricsQdrConnectors address to connect the AMQ Interconnect on the overcloud to the STF deployment:

parameter_defaults:
  EventPipelinePublishers: []
  CeilometerQdrPublishEvents: true
  MetricsQdrConnectors:
    - host: stf-default-interconnect-5671-service-telemetry.apps.infra.watch
      port: 443
      role: edge
      sslProfile: sslProfile
      verifyHostname: false

Add the CeilometerQdrPublishEvents: true parameter to enable collection and transport of Ceilometer events to STF.

Replace the host parameter with the value of HOST/PORT that you retrieved in Section 3.3.1, “Retrieving the AMQ Interconnect route address”.

4. Add the following files to your Red Hat OpenStack Platform director deployment to set up collectd and AMQ Interconnect:

the stf-connectors.yaml environment file

the enable-stf.yaml file that ensures that the environment is being used during the overcloud deployment

the ceilometer-write-qdr.yaml file that ensures that Ceilometer telemetry is sent to STF

5. Deploy the Red Hat OpenStack Platform overcloud:

openstack overcloud deploy <other arguments>
  --templates /usr/share/openstack-tripleo-heat-templates \
  --environment-file <other-environment-files> \
  --environment-file /usr/share/openstack-tripleo-heat-templates/environments/metrics/ceilometer-write-qdr.yaml \
  --environment-file /usr/share/openstack-tripleo-heat-templates/environments/enable-stf.yaml \
  --environment-file /home/stack/stf-connectors.yaml
3.3.3. Validating client-side installation

To validate data collection from the STF storage domain, query the data sources for delivered data. To validate individual nodes in the Red Hat OpenStack Platform deployment, connect to the console using SSH.

Procedure

1. Log in to an overcloud node, for example, controller-0.

2. Ensure that the metrics_qdr container is running on the node:

$ sudo podman container inspect --format '{{.State.Status}}' metrics_qdr

running

3. Return the internal network address on which AMQ Interconnect is running, for example, 172.17.1.44 listening on port 5666:

$ sudo podman exec -it metrics_qdr cat /etc/qpid-dispatch/qdrouterd.conf

listener {
    host: 172.17.1.44
    port: 5666
    authenticatePeer: no
    saslMechanisms: ANONYMOUS
}

4. Return a list of connections to the local AMQ Interconnect:

$ sudo podman exec -it metrics_qdr qdstat --bus=172.17.1.44:5666 --connections

Connections
  id   host                                                                    container                                                                                                 role    dir  security                             authentication  tenant
  ==================================================================================================================================================================================================
  1    stf-default-interconnect-5671-service-telemetry.apps.infra.watch:443   stf-default-interconnect-7458fd4d69-bgzfb                                                                  edge    out  TLSv1.2(DHE-RSA-AES256-GCM-SHA384)   anonymous-user
  12   172.17.1.44:60290                                                      openstack.org/om/container/controller-0/ceilometer-agent-notification/2/55c02cee550f143ec9ea030db5cccba14  normal  in   no-security                          no-auth
  16   172.17.1.44:36408                                                      metrics                                                                                                    normal  in   no-security                          anonymous-user
  899  172.17.1.44:39500                                                      10a2e99d-1b8a-4329-b48c-4335e5f75c84                                                                       normal  in   no-security                          no-auth

There are four connections:

Outbound connection to STF

Inbound connection from collectd

Inbound connection from ceilometer

Inbound connection from our qdstat client

The outbound STF connection is provided to the MetricsQdrConnectors host parameter and is the route for the STF storage domain. The other hosts are internal network addresses of the client connections to this AMQ Interconnect.

5. To ensure that messages are being delivered, list the links, and view the _edge address in the deliv column for delivery of messages:

$ sudo podman exec -it metrics_qdr qdstat --bus=172.17.1.44:5666 --links

Router Links
  type      dir  conn id  id    peer  class   addr                  phs  cap  pri  undel  unsett  deliv    presett  psdrop  acc  rej  rel  mod  delay  rate
  ===========================================================================================================================================================
  endpoint  out  1        5           local   _edge                      250  0    0      0       2979926  2979924  0       0    0    2    0    0      0
  endpoint  in   1        6                                              250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  in   1        7                                              250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  out  1        8                                              250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  in   1        9                                              250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  out  1        10                                             250  0    0      0       911      911      0       0    0    0    0    911    0
  endpoint  in   1        11                                             250  0    0      0       0        911      0       0    0    0    0    0      0
  endpoint  out  12       32          local   temp.lSY6Mcicol4J2Kp       250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  in   16       41                                             250  0    0      0       2979924  2979924  0       0    0    0    0    0      0
  endpoint  in   912      1834  mobile        $management           0    250  0    0      0       1        0        0       1    0    0    0    0      0
  endpoint  out  912      1835        local   temp.9Ok2resI9tmt+CT       250  0    0      0       0        0        0       0    0    0    0    0      0

6. To list the addresses from Red Hat OpenStack Platform nodes to STF, connect to OCP to get the AMQ Interconnect pod name and list the connections. List the available AMQ Interconnect pods:

$ oc get pods -l application=stf-default-interconnect

NAME                                        READY   STATUS    RESTARTS   AGE
stf-default-interconnect-7458fd4d69-bgzfb   1/1     Running   0          6d21h

7. Connect to the pod and run the qdstat --connections command to list the known connections:

$ oc exec -it stf-default-interconnect-7458fd4d69-bgzfb -- qdstat --connections

2020-04-21 18:25:47.243852 UTC
stf-default-interconnect-7458fd4d69-bgzfb

Connections
  id  host               container                                                               role    dir  security                                 authentication  tenant  last dlv      uptime
  ==========================================================================================================================================================================================
  1   10.129.2.21:43062  rcv[stf-default-collectd-telemetry-smartgateway-79c967c8f7-kq4qv]       normal  in   no-security                              anonymous-user          000:00:00:00  006:21:50:25
  2   10.130.0.52:55754  rcv[stf-default-ceilometer-notification-smartgateway-6675df547mbjk5]    normal  in   no-security                              anonymous-user          000:21:25:57  006:21:49:36
  3   10.130.0.51:43110  rcv[stf-default-collectd-notification-smartgateway-698c87fbb7-f28v6]    normal  in   no-security                              anonymous-user          000:21:36:53  006:21:49:09
  22  10.128.0.1:51948   Router.ceph-0.redhat.local                                              edge    in   TLSv1/SSLv3(DHE-RSA-AES256-GCM-SHA384)   anonymous-user          000:00:00:03  000:22:08:43
  23  10.128.0.1:51950   Router.compute-0.redhat.local                                           edge    in   TLSv1/SSLv3(DHE-RSA-AES256-GCM-SHA384)   anonymous-user          000:00:00:03  000:22:08:43
  24  10.128.0.1:52082   Router.controller-0.redhat.local                                        edge    in   TLSv1/SSLv3(DHE-RSA-AES256-GCM-SHA384)   anonymous-user          000:00:00:00  000:22:08:34
  27  127.0.0.1:42202    c2f541c1-4c97-4b37-a189-a396c08fb079                                    normal  in   no-security                              no-auth                 000:00:00:00  000:00:00:00

In this example, there are three edge connections from the Red Hat OpenStack Platform nodes, with connection id 22, 23, and 24.

8. To view the number of messages delivered by the network, use each address with the oc exec command:

$ oc exec -it stf-default-interconnect-7458fd4d69-bgzfb -- qdstat --address

2020-04-21 18:20:10.293258 UTC
stf-default-interconnect-7458fd4d69-bgzfb

Router Addresses
  class   addr                             phs  distrib    pri  local  remote  in       out      thru  fallback
  ===============================================================================================================
  mobile  anycast/ceilometer/event.sample  0    balanced   -    1      0       1553     1553     0     0
  mobile  collectd/notify                  0    multicast  -    1      0       10       10       0     0
  mobile  collectd/telemetry               0    multicast  -    1      0       7798049  7798049  0     0
CHAPTER 4. ADVANCED FEATURES

The following optional features can provide additional functionality to the Service Telemetry Framework (STF):

Customizing the deployment. For more information, see Section 4.1, “Customizing the deployment”.

Alerts. For more information, see Section 4.2, “Alerts”.

High availability. For more information, see Section 4.3, “High availability”.

Dashboards. For more information, see Section 4.4, “Dashboards”.

Multiple clouds. For more information, see Section 4.5, “Configuring multiple clouds”.

Ephemeral storage. For more information, see Section 4.6, “Ephemeral storage”.
4.1. CUSTOMIZING THE DEPLOYMENT

The Service Telemetry Operator watches for a ServiceTelemetry manifest to load into Red Hat OpenShift Container Platform (OCP). The Operator then creates other objects in memory, which results in the dependent Operators creating the workloads they are responsible for managing.

WARNING

When you override the manifest, you must provide the entire manifest contents, including object names or namespaces. There is no dynamic parameter substitution when you override a manifest.

To override a manifest successfully with Service Telemetry Framework (STF), deploy a default environment using the core options only. For more information about the core options, see Section 2.3.10, “Creating a ServiceTelemetry object in OCP”. When you deploy STF, use the oc get command to retrieve the default deployed manifest. When you use a manifest that was originally generated by Service Telemetry Operator, the manifest is compatible with the other objects that are managed by the Operators.

For example, when the metricsEnabled: true parameter is configured in the ServiceTelemetry manifest, the Service Telemetry Operator requests components for metrics retrieval and storage using the default manifests. In some cases, you might want to override the default manifest. For more information, see Section 4.1.1, “Manifest override parameters”.
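Before overriding, it can help to capture the current deployed manifest as a baseline. A minimal sketch, assuming the stf-default object name used throughout this guide; the output file name is illustrative:

$ oc get servicetelemetry stf-default --namespace service-telemetry -oyaml > stf-default-manifest.yaml

The dependent objects can be retrieved the same way with the commands listed in Table 4.1.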
4.1.1. Manifest override parameters

This table describes the available parameters that you can use to override a manifest, along with the corresponding retrieval commands.

Table 4.1. Manifest override parameters
Override parameter: alertmanagerManifest
Description: Override the Alertmanager object creation. The Prometheus Operator watches for Alertmanager objects.
Retrieval command: oc get alertmanager stf-default -oyaml

Override parameter: alertmanagerConfigManifest
Description: Override the Secret that contains the Alertmanager configuration. The Prometheus Operator uses a secret named alertmanager-<alertmanager-name>, for example, alertmanager-stf-default, to provide the alertmanager.yaml configuration to Alertmanager.
Retrieval command: oc get secret alertmanager-stf-default -oyaml

Override parameter: elasticsearchManifest
Description: Override the ElasticSearch object creation. The Elastic Cloud on Kubernetes Operator watches for ElasticSearch objects.
Retrieval command: oc get elasticsearch elasticsearch -oyaml

Override parameter: interconnectManifest
Description: Override the Interconnect object creation. The AMQ Interconnect Operator watches for Interconnect objects.
Retrieval command: oc get interconnect stf-default-interconnect -oyaml

Override parameter: prometheusManifest
Description: Override the Prometheus object creation. The Prometheus Operator watches for Prometheus objects.
Retrieval command: oc get prometheus stf-default -oyaml

Override parameter: servicemonitorManifest
Description: Override the ServiceMonitor object creation. The Prometheus Operator watches for ServiceMonitor objects.
Retrieval command: oc get servicemonitor stf-default -oyaml

Override parameter: smartgatewayCollectdMetricsManifest
Description: Override the SmartGateway object creation for collectd metrics. The Smart Gateway Operator watches for SmartGateway objects.
Retrieval command: oc get smartgateway stf-default-collectd-telemetry -oyaml

Override parameter: smartgatewayCollectdEventsManifest
Description: Override the SmartGateway object creation for collectd events. The Smart Gateway Operator watches for SmartGateway objects.
Retrieval command: oc get smartgateway stf-default-collectd-notification -oyaml

Override parameter: smartgatewayCeilometerEventsManifest
Description: Override the SmartGateway object creation for Ceilometer events. The Smart Gateway Operator watches for SmartGateway objects.
Retrieval command: oc get smartgateway stf-default-ceilometer-notification -oyaml
4.1.2. Overriding a managed manifest

Edit the ServiceTelemetry object and provide a parameter and manifest. For a list of available manifest override parameters, see Section 4.1, “Customizing the deployment”. The default ServiceTelemetry object is stf-default. Use oc get servicetelemetry to list the available STF deployments.

TIP

The oc edit command loads the default system editor. To override the default editor, pass or set the environment variable EDITOR to the preferred editor. For example, EDITOR=nano oc edit servicetelemetry stf-default.
Procedure

1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

oc project service-telemetry

3. Load the ServiceTelemetry object into an editor:

oc edit servicetelemetry stf-default

4. To modify the ServiceTelemetry object, provide a manifest override parameter and the contents of the manifest to write to OCP instead of the defaults provided by STF:

NOTE

The trailing pipe (|) after entering the manifest override parameter indicates that the value provided is multi-line.

$ oc edit servicetelemetry stf-default

apiVersion: infra.watch/v1alpha1
kind: ServiceTelemetry
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"infra.watch/v1alpha1","kind":"ServiceTelemetry","metadata":{"annotations":{},"name":"stf-default","namespace":"service-telemetry"},"spec":{"metricsEnabled":true}}
  creationTimestamp: "2020-04-14T20:29:42Z"
  generation: 1
  name: stf-default
  namespace: service-telemetry
  resourceVersion: "1949423"
  selfLink: /apis/infra.watch/v1alpha1/namespaces/service-telemetry/servicetelemetrys/stf-default
  uid: d058bc41-1bb0-49f5-9a8b-642f4b8adb95
spec:
  metricsEnabled: true
  smartgatewayCollectdMetricsManifest: |   1
    apiVersion: smartgateway.infra.watch/v2alpha1
    kind: SmartGateway
    metadata:
      name: stf-default-collectd-telemetry
      namespace: service-telemetry
    spec:
      amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/telemetry
      debug: true
      prefetch: 15000
      serviceType: metrics
      size: 1
      useTimestamp: true   2
status:
  conditions:
  - ansibleResult:
      changed: 0
      completion: 2020-04-14T20:32:19.079508
      failures: 0
      ok: 52
      skipped: 1
    lastTransitionTime: "2020-04-14T20:29:59Z"
    message: Awaiting next reconciliation
    reason: Successful
    status: "True"
    type: Running

1 Manifest override parameter is defined in the spec of the ServiceTelemetry object.

2 End of the manifest override content.

5. Save and close.
4.2. ALERTS

You create alert rules in Prometheus and alert routes in Alertmanager. Alert rules in Prometheus servers send alerts to an Alertmanager, which manages the alerts. Alertmanager can silence, inhibit, or aggregate alerts, and send notifications using email, on-call notification systems, or chat platforms.

To create an alert, complete the following tasks:

1. Create an alert rule in Prometheus. For more information, see Section 4.2.1, “Creating an alert rule in Prometheus”.

2. Create an alert route in Alertmanager. For more information, see Section 4.2.3, “Creating an alert route in Alertmanager”.

Additional resources

For more information about alerts or notifications with Prometheus and Alertmanager, see https://prometheus.io/docs/alerting/overview/

To view an example set of alerts that you can use with Service Telemetry Framework (STF), see https://github.com/infrawatch/service-telemetry-operator/tree/master/deploy/alerts
4.2.1. Creating an alert rule in Prometheus

Prometheus evaluates alert rules to trigger notifications. If the rule condition returns an empty result set, the condition is false. Otherwise, the rule is true, and it triggers an alert.

Procedure

1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

oc project service-telemetry

3. Create a PrometheusRule object that contains the alert rule. The Prometheus Operator loads the rule into Prometheus:

oc apply -f - <<EOF
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  creationTimestamp: null
  labels:
    prometheus: stf-default
    role: alert-rules
  name: prometheus-alarm-rules
  namespace: service-telemetry
spec:
  groups:
    - name: ./openstack.rules
      rules:
        - alert: Metric Listener down
          expr: collectd_qpid_router_status < 1
EOF

To change the rule, edit the value of the expr parameter.

4. To verify that the rules have been loaded into Prometheus by the Operator, create a pod with access to curl:

oc run curl --generator=run-pod/v1 --image=radial/busyboxplus:curl -i --tty

5. Run curl to access the prometheus-operated service to return the rules loaded into memory:

[ root@curl:/ ]$ curl prometheus-operated:9090/api/v1/rules

{"status":"success","data":{"groups":[{"name":"./openstack.rules","file":"/etc/prometheus/rules/prometheus-stf-default-rulefiles-0/service-telemetry-prometheus-alarm-rules.yaml","rules":[{"name":"Metric Listener down","query":"collectd_qpid_router_status \u003c 1","duration":0,"labels":{},"annotations":{},"alerts":[],"health":"ok","type":"alerting"}],"interval":30}]}}

6. To verify that the output shows the rules loaded into the PrometheusRule object, for example, that the output contains the defined ./openstack.rules, exit from the pod:

[ root@curl:/ ]$ exit

7. Clean up the environment by deleting the curl pod:

$ oc delete pod curl

pod "curl" deleted

Additional resources

For more information on alerting, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md
4.2.2. Configuring custom alerts

You can add custom alerts to the PrometheusRule object that you created in Section 4.2.1, “Creating an alert rule in Prometheus”.

Procedure

1. Use the oc edit command:

oc edit prometheusrules prometheus-alarm-rules

2. Edit the PrometheusRules manifest.

3. Save and close.
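For example, a second rule can be appended to the rules list of the existing group. The alert name and expression below are hypothetical illustrations, not part of the shipped rule set; absent() fires when no collectd_uptime series is being scraped:

spec:
  groups:
    - name: ./openstack.rules
      rules:
        - alert: Metric Listener down
          expr: collectd_qpid_router_status < 1
        - alert: Collectd uptime metric missing   # hypothetical custom alert
          expr: absent(collectd_uptime) == 1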
Additional resources

For more information about configuring alerting rules, see https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/

For more information about PrometheusRules objects, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md
4.2.3. Creating an alert route in Alertmanager

Use Alertmanager to deliver alerts to an external system, such as email, IRC, or other notification channel. The Prometheus Operator manages the Alertmanager configuration as a Red Hat OpenShift Container Platform (OCP) secret. STF by default deploys a basic configuration that results in no receivers:

alertmanager.yaml: |-
  global:
    resolve_timeout: 5m
  route:
    group_by: ['job']
    group_wait: 30s
    group_interval: 5m
    repeat_interval: 12h
    receiver: 'null'
  receivers:
  - name: 'null'

To deploy a custom Alertmanager route with STF, an alertmanagerConfigManifest parameter must be passed to the Service Telemetry Operator that results in an updated secret, managed by the Prometheus Operator. For more information, see Section 4.1.2, “Overriding a managed manifest”.

Procedure

1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

oc project service-telemetry

3. Edit the ServiceTelemetry object for your STF deployment:

oc edit servicetelemetry stf-default

4. Add a new parameter, alertmanagerConfigManifest, and the Secret object contents to define the alertmanager.yaml configuration for Alertmanager:

apiVersion: infra.watch/v1alpha1
kind: ServiceTelemetry
metadata:
  name: stf-default
  namespace: service-telemetry
spec:
  metricsEnabled: true
  alertmanagerConfigManifest: |
    apiVersion: v1
    kind: Secret
    metadata:
      name: alertmanager-stf-default
      namespace: service-telemetry
    type: Opaque
    stringData:
      alertmanager.yaml: |-
        global:
          resolve_timeout: 10m
        route:
          group_by: ['job']
          group_wait: 30s
          group_interval: 5m
          repeat_interval: 12h
          receiver: 'null'
        receivers:
        - name: 'null'

NOTE

This loads the default template that is already managed by Service Telemetry Operator. To validate that the changes are populating correctly, change a value, return the alertmanager-stf-default secret, and verify that the new value is loaded into memory, for example, changing the value global.resolve_timeout from 5m to 10m.

5. Verify that the configuration was applied to the secret:

$ oc get secret alertmanager-stf-default -o go-template='{{ index .data "alertmanager.yaml" | base64decode }}'

global:
  resolve_timeout: 10m
route:
  group_by: ['job']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h
  receiver: 'null'
receivers:
- name: 'null'

6. To verify the configuration has been loaded into Alertmanager, create a pod with access to curl:

oc run curl --generator=run-pod/v1 --image=radial/busyboxplus:curl -i --tty

7. Run curl against the alertmanager-operated service to retrieve the status and configYAML contents, and review that the supplied configuration matches the configuration loaded into Alertmanager:

[ root@curl:/ ]$ curl alertmanager-operated:9093/api/v1/status

{"status":"success","data":{"configYAML":"global:\n  resolve_timeout: 10m\n  http_config: {}\n  smtp_hello: localhost\n  smtp_require_tls: true\n  pagerduty_url: https://events.pagerduty.com/v2/enqueue\n  hipchat_api_url: https://api.hipchat.com/\n  opsgenie_api_url: https://api.opsgenie.com/\n  wechat_api_url: https://qyapi.weixin.qq.com/cgi-bin/\n  victorops_api_url: https://alert.victorops.com/integrations/generic/20131114/alert/\nroute:\n  receiver: \"null\"\n  group_by:\n  - job\n  group_wait: 30s\n  group_interval: 5m\n  repeat_interval: 12h\nreceivers:\n- name: \"null\"\ntemplates: []\n", ...}}

8. Verify that the configYAML field contains the expected changes. Exit from the pod:

[ root@curl:/ ]$ exit

9. To clean up the environment, delete the curl pod:

$ oc delete pod curl

pod "curl" deleted

Additional resources

For more information about the Red Hat OpenShift Container Platform secret and the Prometheus operator, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md
4.3. HIGH AVAILABILITY

High availability is the ability of Service Telemetry Framework (STF) to rapidly recover from failures in its component services. Although Red Hat OpenShift Container Platform (OCP) restarts a failed pod if nodes are available to schedule the workload, this recovery process might take more than one minute, during which time events and metrics are lost. A high availability configuration includes multiple copies of STF components, reducing recovery time to approximately 2 seconds. To protect against failure of an OCP node, deploy STF to an OCP cluster with three or more nodes.

NOTE

STF is not yet a fully fault tolerant system. Delivery of metrics and events during the recovery period is not guaranteed.

Enabling high availability has the following effects:

Two AMQ Interconnect pods run instead of the default 1.

Three ElasticSearch pods run instead of the default 1.

Recovery time from a lost pod in either of these services reduces to approximately 2 seconds.
4.3.1. Configuring high availability

To configure STF for high availability, add highAvailabilityEnabled: true to the ServiceTelemetry object in OCP. You can set this parameter at installation time, or, if you already deployed STF, complete the following steps:

Procedure

1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

oc project service-telemetry

3. Use the oc command to edit the ServiceTelemetry object:

$ oc edit ServiceTelemetry

4. Add highAvailabilityEnabled: true to the spec section:

spec:
  eventsEnabled: true
  metricsEnabled: true
  highAvailabilityEnabled: true

5. Save your changes and close the object.
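After the Operator reconciles the change, you can confirm that the additional pods were scheduled. A minimal check, assuming the default stf-default naming used elsewhere in this guide; with high availability enabled, two AMQ Interconnect pods are expected instead of one:

$ oc get pods -l application=stf-default-interconnect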
4.4. DASHBOARDS

Use the third-party application Grafana to visualize system-level metrics gathered by collectd for each individual host node. For more information about configuring collectd, see Section 3.3, “Configuring Red Hat OpenStack Platform overcloud for Service Telemetry Framework”.

4.4.1. Setting up Grafana to host the dashboard

Grafana is not included in the default Service Telemetry Framework (STF) deployment, so you must deploy the Grafana Operator from OperatorHub.io.

Procedure

1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

oc project service-telemetry

3. Clone the dashboard repository:

git clone https://github.com/infrawatch/dashboards
cd dashboards

4. Deploy the Grafana operator:

oc create -f deploy/subscription.yaml

5. To verify that the operator launched successfully, run the oc get csv command. If the value of the PHASE column is Succeeded, the operator launched successfully:

$ oc get csv

NAME                      DISPLAY            VERSION   REPLACES   PHASE
grafana-operator.v3.2.0   Grafana Operator   3.2.0                Succeeded

6. Launch a Grafana instance:

$ oc create -f deploy/grafana.yaml

7. Verify that the Grafana instance deployed:

$ oc get pod -l app=grafana

NAME                                  READY   STATUS    RESTARTS   AGE
grafana-deployment-7fc7848b56-sbkhv   1/1     Running   0          1m

8. Create the datasource and dashboard resources:

oc create -f deploy/datasource.yaml -f deploy/rhos-dashboard.yaml

9. Verify that the resources installed correctly:

$ oc get grafanadashboards

NAME             AGE
rhos-dashboard   7d21h

$ oc get grafanadatasources
NAME                                  AGE
service-telemetry-grafanadatasource   1m

10. Navigate to https://<grafana-route-address> in a web browser. Use the oc get routes command to retrieve the Grafana route address:

oc get routes

11. To view the dashboard, click Dashboards and Manage.

Additional resources

For more information about enabling the OperatorHub.io catalog source, see Section 2.3.5, “Enabling the OperatorHub.io Community Catalog Source”.

4.4.1.1. Viewing and editing queries

Procedure

1. Log in to Red Hat OpenShift Container Platform. To view and edit queries, log in as the admin user.

2. Change to the service-telemetry namespace:

oc project service-telemetry

3. To retrieve the default username and password, describe the Grafana object using oc describe:

oc describe grafana service-telemetry-grafana
4.4.2. The Grafana infrastructure dashboard

The infrastructure dashboard shows metrics for a single node at a time. Select a node from the upper left corner of the dashboard.

4.4.2.1. Top panels
Current Global Alerts (-): Current alerts fired by Prometheus
Recent Global Alerts (-): Recently fired alerts in 5m time steps
Status Panel (-): Node status: up, down, unavailable
Uptime (s/m/h/d/M/Y): Total operational time of node
CPU Cores (cores): Total number of cores
Memory (bytes): Total memory
Disk Size (bytes): Total storage size
Processes (processes): Total number of processes listed by type
Load Average (processes): Load average represents the average number of running and uninterruptible processes residing in the kernel execution queue
4.4.2.2. Networking panels

Panels that display the network interfaces of the node.

Physical Interfaces Ingress Errors (errors): Total errors with incoming data
Physical Interfaces Egress Errors (errors): Total errors with outgoing data
Physical Interfaces Ingress Error Rates (errors/s): Rate of incoming data errors
Physical Interfaces Egress Error Rates (errors/s): Rate of outgoing data errors
Physical Interfaces Packets Ingress (pps): Incoming packets per second
Physical Interfaces Packets Egress (pps): Outgoing packets per second
Physical Interfaces Data Ingress (bytes/s): Incoming data rates
Physical Interfaces Data Egress (bytes/s): Outgoing data rates
Physical Interfaces Drop Rate Ingress (pps): Incoming packets drop rate
Physical Interfaces Drop Rate Egress (pps): Outgoing packets drop rate
4.4.2.3. CPU panels

Panels that display CPU usage of the node.

Current CPU Usage (percent): Instantaneous usage at the time of the last query
Aggregate CPU Usage (percent): Average non-idle CPU activity of all cores on a node
Aggr CPU Usage by Type (percent): Shows time spent for each type of thread, averaged across all cores

4.4.2.4. Memory panels

Panels that display memory usage on the node.

Memory Used (percent): Amount of memory being used at time of last query
Huge Pages Used (hugepages): Number of hugepages being used
Memory

4.4.2.5. Disk/file system

Panels that display space used on disk.

Disk Space Usage (percent): Total disk use at time of last query
Inode Usage (percent): Total inode use at time of last query
Aggregate Disk Space Usage (bytes): Total disk space used and reserved. Note: Because this query relies on the df plugin, temporary file systems that do not necessarily use disk space are included in the results. The query tries to filter out most of these, but it might not be exhaustive.
Disk Traffic (bytes/s): Shows rates for both reading and writing
Disk Load (percent): Approximate percentage of total disk bandwidth being used. The weighted I/O time series includes the backlog that might be accumulating. For more information, see the collectd disk plugin docs.
Operations/s (ops/s): Operations done per second
Average I/O Operation Time (seconds): Average time each I/O operation took to complete. This average is not accurate; see the collectd disk plugin docs.
4.5. CONFIGURING MULTIPLE CLOUDS

You can configure multiple Red Hat OpenStack Platform clouds to target a single instance of Service Telemetry Framework (STF):

1. Plan the AMQP address prefixes that you want to use for each cloud. For more information, see Section 4.5.1, “Planning AMQP address prefixes”.

2. Deploy metrics and events consumer Smart Gateways for each cloud to listen on the corresponding address prefixes. For more information, see Section 4.5.2, “Deploying Smart Gateways”.

3. Configure each cloud to send its metrics and events to STF on the correct address. For more information, see Section 4.5.3, “Creating the OpenStack environment file”.

Figure 4.1. Two Red Hat OpenStack Platform clouds connect to STF

4.5.1. Planning AMQP address prefixes

By default, Red Hat OpenStack Platform nodes get data through two data collectors: collectd and Ceilometer. These components send telemetry data, or notifications, to the respective AMQP addresses, for example, collectd/telemetry, where STF Smart Gateways listen on those addresses for monitoring data.

To support multiple clouds, and to identify which cloud generated the monitoring data, configure each cloud to send data to a unique address. Prefix a cloud identifier to the second part of the address. The following list shows some example addresses and identifiers:

collectd/cloud1-telemetry

collectd/cloud1-notify

anycast/ceilometer/cloud1-event.sample

collectd/cloud2-telemetry

collectd/cloud2-notify

anycast/ceilometer/cloud2-event.sample

collectd/us-east-1-telemetry

collectd/us-west-3-telemetry
4.5.2. Deploying Smart Gateways

You must deploy a Smart Gateway for each of the data collection types for each cloud: one for collectd metrics, one for collectd events, and one for Ceilometer events. Configure each of the Smart Gateways to listen on the AMQP address that you define for the corresponding cloud.

When you deploy STF for the first time, Smart Gateway manifests are created that define the initial Smart Gateways for a single cloud. When deploying Smart Gateways for multiple cloud support, you deploy multiple Smart Gateways for each of the data collection types that handle the metrics and the events data for each cloud. The initial Smart Gateways act as a template to create additional Smart Gateways, along with any authentication information required to connect to the data stores.

Procedure

1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

oc project service-telemetry

3. Use the initially deployed Smart Gateways as a template for additional Smart Gateways. List the currently deployed Smart Gateways with the oc get smartgateways command. For example, if you deployed STF with metricsEnabled: true and eventsEnabled: true, the following Smart Gateways are displayed in the output:

$ oc get smartgateways

NAME                                  AGE
stf-default-ceilometer-notification   14d
stf-default-collectd-notification     14d
stf-default-collectd-telemetry        14d

4. Retrieve the manifests for each Smart Gateway and store the contents in a temporary file, which you can modify later and use to create the new set of Smart Gateways:

truncate --size 0 /tmp/cloud1-smartgateways.yaml && for sg in $(oc get smartgateways -oname)
do
  echo "---" >> /tmp/cloud1-smartgateways.yaml
  oc get $sg -oyaml --export >> /tmp/cloud1-smartgateways.yaml
done

5. Modify the Smart Gateway manifests in the /tmp/cloud1-smartgateways.yaml file. Adjust the metadata.name and spec.amqpUrl fields to include the cloud identifier from your schema. For more information, see Section 4.5.1, “Planning AMQP address prefixes”. To view example Smart Gateway manifests, see Section 4.5.2.1, “Example manifests”.

6. Deploy your new Smart Gateways:

oc apply -f /tmp/cloud1-smartgateways.yaml

7. Verify that each Smart Gateway is running. This can take several minutes, depending on the number of Smart Gateways:

oc get po -l app=smart-gateway
4.5.2.1. Example manifests

IMPORTANT

The content in the following examples might be different to the file content in your deployment. Copy the manifests in your deployment.

Ensure that the name and amqpUrl parameters of each Smart Gateway match the names that you want to use for your clouds. For more information, see Section 4.5.1, “Planning AMQP address prefixes”.

NOTE

Your output may have some additional metadata parameters that you can remove from the manifests that you load into OCP.

apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-ceilometer-notification-cloud1   1
spec:
  amqpDataSource: ceilometer
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/anycast/ceilometer/cloud1-event.sample   2
  debug: false
  elasticPass: fkzfhghw
  elasticUrl: https://elasticsearch-es-http.service-telemetry.svc.cluster.local:9200
  elasticUser: elastic
  resetIndex: false
  serviceType: events
  size: 1
  tlsCaCert: /config/certs/ca.crt
  tlsClientCert: /config/certs/tls.crt
  tlsClientKey: /config/certs/tls.key
  tlsServerName: elasticsearch-es-http.service-telemetry.svc.cluster.local
  useBasicAuth: true
  useTls: true
---
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-collectd-notification-cloud1   3
spec:
  amqpDataSource: collectd
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/cloud1-notify   4
  debug: false
  elasticPass: fkzfhghw
  elasticUrl: https://elasticsearch-es-http.service-telemetry.svc.cluster.local:9200
  elasticUser: elastic
  resetIndex: false
  serviceType: events
  size: 1
  tlsCaCert: /config/certs/ca.crt
  tlsClientCert: /config/certs/tls.crt
  tlsClientKey: /config/certs/tls.key
  tlsServerName: elasticsearch-es-http.service-telemetry.svc.cluster.local
  useBasicAuth: true
  useTls: true
---
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-collectd-telemetry-cloud1   5
spec:
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/cloud1-telemetry   6
  debug: false
  prefetch: 15000
  serviceType: metrics
  size: 1
  useTimestamp: true

1 Name for Ceilometer notifications for cloud1

2 AMQP address for Ceilometer notifications for cloud1

3 Name for collectd notifications for cloud1

4 AMQP address for collectd notifications for cloud1

5 Name for collectd telemetry for cloud1

6 AMQP address for collectd telemetry for cloud1
453 Creating the OpenStack environment file
To label traffic according to the cloud of origin you must create a configuration with cloud-specificinstance names Create an stf-connectorsyaml file and adjust the values of CeilometerQdrEventsConfig and CollectdAmqpInstances to match the AMQP address prefixscheme For more information see Section 451 ldquoPlanning AMQP address prefixesrdquo
notify 4 debug false elasticPass fkzfhghw elasticUrl httpselasticsearch-es-httpservice-telemetrysvcclusterlocal9200 elasticUser elastic resetIndex false serviceType events size 1 tlsCaCert configcertscacrt tlsClientCert configcertstlscrt tlsClientKey configcertstlskey tlsServerName elasticsearch-es-httpservice-telemetrysvcclusterlocal useBasicAuth true useTls true---apiVersion smartgatewayinfrawatchv2alpha1kind SmartGatewaymetadata name stf-default-collectd-telemetry-cloud1 5spec amqpUrl stf-default-interconnectservice-telemetrysvcclusterlocal5672collectdcloud1-telemetry 6 debug false prefetch 15000 serviceType metrics size 1 useTimestamp true
CHAPTER 4 ADVANCED FEATURES
43
WARNING
Remove enable-stfyaml and ceilometer-write-qdryaml environment filereferences from your overcloud deployment This configuration is redundantand results in duplicate information being sent from each cloud node
Procedure
1. Create the stf-connectors.yaml file and modify it to match the AMQP address that you want for this cloud deployment:

resource_registry:
  OS::TripleO::Services::Collectd: /usr/share/openstack-tripleo-heat-templates/deployment/metrics/collectd-container-puppet.yaml
  OS::TripleO::Services::MetricsQdr: /usr/share/openstack-tripleo-heat-templates/deployment/metrics/qdr-container-puppet.yaml
  OS::TripleO::Services::CeilometerAgentCentral: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-central-container-puppet.yaml
  OS::TripleO::Services::CeilometerAgentNotification: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-notification-container-puppet.yaml
  OS::TripleO::Services::CeilometerAgentIpmi: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-ipmi-container-puppet.yaml
  OS::TripleO::Services::ComputeCeilometerAgent: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-compute-container-puppet.yaml
  OS::TripleO::Services::Redis: /usr/share/openstack-tripleo-heat-templates/deployment/database/redis-pacemaker-puppet.yaml

parameter_defaults:
  EnableSTF: true

  EventPipelinePublishers: []
  CeilometerEnablePanko: false
  CeilometerQdrPublishEvents: true
  CeilometerQdrEventsConfig:
    driver: amqp
    topic: cloud1-event 1

  CollectdConnectionType: amqp1
  CollectdAmqpInterval: 5
  CollectdDefaultPollingInterval: 5

  CollectdAmqpInstances:
    cloud1-notify: 2
      notify: true
      format: JSON
      presettle: false
    cloud1-telemetry: 3
      format: JSON
      presettle: true

  MetricsQdrAddresses:
    - prefix: collectd
      distribution: multicast
    - prefix: anycast/ceilometer
      distribution: multicast

  MetricsQdrSSLProfiles:
    - name: sslProfile

  MetricsQdrConnectors:
    - host: stf-default-interconnect-5671-service-telemetry.apps.infra.watch 4
      port: 443
      role: edge
      verifyHostname: false
      sslProfile: sslProfile
1 Define the topic for Ceilometer events. This value is the address format of anycast/ceilometer/cloud1-event.sample.
2 Define the topic for collectd events. This value is the format of collectd/cloud1-notify.
3 Define the topic for collectd metrics. This value is the format of collectd/cloud1-telemetry.
4 Adjust the MetricsQdrConnectors host to the address of the STF route.
Ensure that the naming convention in the stf-connectors.yaml file aligns with the spec.amqpUrl field in the Smart Gateway configuration. For example, configure the CeilometerQdrEventsConfig topic field to a value of cloud1-event.

2. Save the file in a directory for custom environment files, for example /home/stack/custom_templates/.

3. Source the authentication file:

[stack@undercloud-0 ~]$ source stackrc

(undercloud) [stack@undercloud-0 ~]$

4. Include the stf-connectors.yaml file in the overcloud deployment command, along with any other environment files relevant to your environment:

(undercloud) [stack@undercloud-0 ~]$ openstack overcloud deploy --templates /usr/share/openstack-tripleo-heat-templates -e /home/stack/custom_templates/stf-connectors.yaml
Additional resources

For information about validating the deployment, see Section 3.3.3, "Validating client-side installation".
4.5.4 Querying metrics data from multiple clouds

Data stored in Prometheus has a service label attached according to the Smart Gateway it was scraped from. You can use this label to query data from a specific cloud.

To query data from a specific cloud, use a Prometheus PromQL query that matches the associated service label, for example: collectd_uptime{service="stf-default-collectd-telemetry-cloud1-smartgateway"}.
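To test such a query from inside the OCP cluster, you can reuse the curl pod pattern from Section 4.2.1 against the prometheus-operated service. This is a minimal sketch; the service name and port follow the Prometheus Operator defaults used elsewhere in this guide:

[ root@curl:/ ]$ curl -G 'prometheus-operated:9090/api/v1/query' \
  --data-urlencode 'query=collectd_uptime{service="stf-default-collectd-telemetry-cloud1-smartgateway"}'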
4.6 EPHEMERAL STORAGE

Use ephemeral storage to run Service Telemetry Framework (STF) without persistently storing data in your Red Hat OpenShift Container Platform (OCP) cluster. Ephemeral storage is not recommended in a production environment due to the volatility of the data in the platform when it is operating correctly and as designed. For example, restarting a pod or rescheduling the workload to another node results in the loss of any local data written since the pod started.

If you enable ephemeral storage in STF, the Service Telemetry Operator does not add the relevant storage sections to the manifests of the data storage components.

4.6.1 Configuring ephemeral storage

To configure STF for ephemeral storage, add storageEphemeralEnabled: true to the ServiceTelemetry object in OCP. You can add storageEphemeralEnabled: true at installation time or, if you already deployed STF, complete the following steps.
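At installation time, a minimal ServiceTelemetry manifest that enables ephemeral storage might look like the following sketch, modeled on the object shown in Section 2.3.10:

oc apply -f - <<EOF
apiVersion: infra.watch/v1alpha1
kind: ServiceTelemetry
metadata:
  name: stf-default
  namespace: service-telemetry
spec:
  metricsEnabled: true
  storageEphemeralEnabled: true
EOF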
Procedure
1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

oc project service-telemetry

3. Edit the ServiceTelemetry object:

$ oc edit ServiceTelemetry stf-default

4. Add the storageEphemeralEnabled: true parameter to the spec section:

spec:
  eventsEnabled: true
  metricsEnabled: true
  storageEphemeralEnabled: true

5. Save your changes and close the object.
APPENDIX A. COLLECTD PLUG-INS

This section contains a complete list of collectd plug-ins and configurations for Red Hat OpenStack Platform 16.0.
collectd-aggregation
collectd::plugin::aggregation::aggregators
collectd::plugin::aggregation::interval

collectd-amqp1

collectd-apache
collectd::plugin::apache::instances (ex: localhost => url => 'http://localhost/mod_status?auto')
collectd::plugin::apache::interval

collectd-apcups

collectd-battery
collectd::plugin::battery::values_percentage
collectd::plugin::battery::report_degraded
collectd::plugin::battery::query_state_fs
collectd::plugin::battery::interval

collectd-ceph
collectd::plugin::ceph::daemons
collectd::plugin::ceph::longrunavglatency
collectd::plugin::ceph::convertspecialmetrictypes

collectd-cgroups
collectd::plugin::cgroups::ignore_selected
collectd::plugin::cgroups::interval

collectd-conntrack
None

collectd-contextswitch
collectd::plugin::contextswitch::interval

collectd-cpu
collectd::plugin::cpu::reportbystate
collectd::plugin::cpu::reportbycpu
collectd::plugin::cpu::valuespercentage
collectd::plugin::cpu::reportnumcpu
collectd::plugin::cpu::reportgueststate
collectd::plugin::cpu::subtractgueststate
collectd::plugin::cpu::interval

collectd-cpufreq
None

collectd-cpusleep

collectd-csv
collectd::plugin::csv::datadir
collectd::plugin::csv::storerates
collectd::plugin::csv::interval

collectd-df
collectd::plugin::df::devices
collectd::plugin::df::fstypes
collectd::plugin::df::ignoreselected
collectd::plugin::df::mountpoints
collectd::plugin::df::reportbydevice
collectd::plugin::df::reportinodes
collectd::plugin::df::reportreserved
collectd::plugin::df::valuesabsolute
collectd::plugin::df::valuespercentage
collectd::plugin::df::interval

collectd-disk
collectd::plugin::disk::disks
collectd::plugin::disk::ignoreselected
collectd::plugin::disk::udevnameattr
collectd::plugin::disk::interval

collectd-entropy
collectd::plugin::entropy::interval

collectd-ethstat
collectd::plugin::ethstat::interfaces
collectd::plugin::ethstat::maps
collectd::plugin::ethstat::mappedonly
collectd::plugin::ethstat::interval

collectd-exec
collectd::plugin::exec::commands
collectd::plugin::exec::commands_defaults
collectd::plugin::exec::globals
collectd::plugin::exec::interval

collectd-fhcount
collectd::plugin::fhcount::valuesabsolute
collectd::plugin::fhcount::valuespercentage
collectd::plugin::fhcount::interval

collectd-filecount
collectd::plugin::filecount::directories
collectd::plugin::filecount::interval

collectd-fscache
None

collectd-hddtemp
collectd::plugin::hddtemp::host
collectd::plugin::hddtemp::port
collectd::plugin::hddtemp::interval

collectd-hugepages
collectd::plugin::hugepages::report_per_node_hp
collectd::plugin::hugepages::report_root_hp
collectd::plugin::hugepages::values_pages
collectd::plugin::hugepages::values_bytes
collectd::plugin::hugepages::values_percentage
collectd::plugin::hugepages::interval

collectd-intel_rdt

collectd-interface
collectd::plugin::interface::interfaces
collectd::plugin::interface::ignoreselected
collectd::plugin::interface::reportinactive
collectd::plugin::interface::interval

collectd-ipc
None

collectd-ipmi
collectd::plugin::ipmi::ignore_selected
collectd::plugin::ipmi::notify_sensor_add
collectd::plugin::ipmi::notify_sensor_remove
collectd::plugin::ipmi::notify_sensor_not_present
collectd::plugin::ipmi::sensors
collectd::plugin::ipmi::interval

collectd-irq
collectd::plugin::irq::irqs
collectd::plugin::irq::ignoreselected
collectd::plugin::irq::interval

collectd-load
collectd::plugin::load::report_relative
collectd::plugin::load::interval

collectd-logfile
collectd::plugin::logfile::log_level
collectd::plugin::logfile::log_file
collectd::plugin::logfile::log_timestamp
collectd::plugin::logfile::print_severity
collectd::plugin::logfile::interval

collectd-madwifi

collectd-mbmon

collectd-md

collectd-memcached
collectd::plugin::memcached::instances
collectd::plugin::memcached::interval

collectd-memory
collectd::plugin::memory::valuesabsolute
collectd::plugin::memory::valuespercentage
collectd::plugin::memory::interval

collectd-multimeter

collectd-mysql
collectd::plugin::mysql::interval

collectd-netlink
collectd::plugin::netlink::interfaces
collectd::plugin::netlink::verboseinterfaces
collectd::plugin::netlink::qdiscs
collectd::plugin::netlink::classes
collectd::plugin::netlink::filters
collectd::plugin::netlink::ignoreselected
collectd::plugin::netlink::interval

collectd-network
collectd::plugin::network::timetolive
collectd::plugin::network::maxpacketsize
collectd::plugin::network::forward
collectd::plugin::network::reportstats
collectd::plugin::network::listeners
collectd::plugin::network::servers
collectd::plugin::network::interval

collectd-nfs
collectd::plugin::nfs::interval

collectd-ntpd
collectd::plugin::ntpd::host
collectd::plugin::ntpd::port
collectd::plugin::ntpd::reverselookups
collectd::plugin::ntpd::includeunitid
collectd::plugin::ntpd::interval

collectd-numa
None

collectd-olsrd

collectd-openvpn
collectd::plugin::openvpn::statusfile
collectd::plugin::openvpn::improvednamingschema
collectd::plugin::openvpn::collectcompression
collectd::plugin::openvpn::collectindividualusers
collectd::plugin::openvpn::collectusercount
collectd::plugin::openvpn::interval

collectd-ovs_events
collectd::plugin::ovs_events::address
collectd::plugin::ovs_events::dispatch
collectd::plugin::ovs_events::interfaces
collectd::plugin::ovs_events::send_notification
collectd::plugin::ovs_events::$port
collectd::plugin::ovs_events::socket

collectd-ovs_stats
collectd::plugin::ovs_stats::address
collectd::plugin::ovs_stats::bridges
collectd::plugin::ovs_stats::port
collectd::plugin::ovs_stats::socket

collectd-ping
collectd::plugin::ping::hosts
collectd::plugin::ping::timeout
collectd::plugin::ping::ttl
collectd::plugin::ping::source_address
collectd::plugin::ping::device
collectd::plugin::ping::max_missed
collectd::plugin::ping::size
collectd::plugin::ping::interval

collectd-powerdns
collectd::plugin::powerdns::interval
collectd::plugin::powerdns::servers
collectd::plugin::powerdns::recursors
collectd::plugin::powerdns::local_socket
collectd::plugin::powerdns::interval

collectd-processes
collectd::plugin::processes::processes
collectd::plugin::processes::process_matches
collectd::plugin::processes::collect_context_switch
collectd::plugin::processes::collect_file_descriptor
collectd::plugin::processes::collect_memory_maps
collectd::plugin::powerdns::interval
collectd-protocols
collectd::plugin::protocols::ignoreselected
collectd::plugin::protocols::values

collectd-python

collectd-serial

collectd-smart
collectd::plugin::smart::disks
collectd::plugin::smart::ignoreselected
collectd::plugin::smart::interval

collectd-snmp_agent

collectd-statsd
collectd::plugin::statsd::host
collectd::plugin::statsd::port
collectd::plugin::statsd::deletecounters
collectd::plugin::statsd::deletetimers
collectd::plugin::statsd::deletegauges
collectd::plugin::statsd::deletesets
collectd::plugin::statsd::countersum
collectd::plugin::statsd::timerpercentile
collectd::plugin::statsd::timerlower
collectd::plugin::statsd::timerupper
collectd::plugin::statsd::timersum
collectd::plugin::statsd::timercount
collectd::plugin::statsd::interval

collectd-swap
collectd::plugin::swap::reportbydevice
collectd::plugin::swap::reportbytes
collectd::plugin::swap::valuesabsolute
collectd::plugin::swap::valuespercentage
collectd::plugin::swap::reportio
collectd::plugin::swap::interval

collectd-syslog
collectd::plugin::syslog::log_level
collectd::plugin::syslog::notify_level
collectd::plugin::syslog::interval

collectd-table
collectd::plugin::table::tables
collectd::plugin::table::interval

collectd-tail
collectd::plugin::tail::files
collectd::plugin::tail::interval

collectd-tail_csv
collectd::plugin::tail_csv::metrics
collectd::plugin::tail_csv::files

collectd-tcpconns
collectd::plugin::tcpconns::localports
collectd::plugin::tcpconns::remoteports
collectd::plugin::tcpconns::listening
collectd::plugin::tcpconns::allportssummary
collectd::plugin::tcpconns::interval

collectd-ted

collectd-thermal
collectd::plugin::thermal::devices
collectd::plugin::thermal::ignoreselected
collectd::plugin::thermal::interval

collectd-threshold
collectd::plugin::threshold::types
collectd::plugin::threshold::plugins
collectd::plugin::threshold::hosts
collectd::plugin::threshold::interval

collectd-turbostat
collectd::plugin::turbostat::core_c_states
collectd::plugin::turbostat::package_c_states
collectd::plugin::turbostat::system_management_interrupt
collectd::plugin::turbostat::digital_temperature_sensor
collectd::plugin::turbostat::tcc_activation_temp
collectd::plugin::turbostat::running_average_power_limit
collectd::plugin::turbostat::logical_core_names

collectd-unixsock

collectd-uptime
collectd::plugin::uptime::interval

collectd-users
collectd::plugin::users::interval

collectd-uuid
collectd::plugin::uuid::uuid_file
collectd::plugin::uuid::interval

collectd-virt
collectd::plugin::virt::connection
collectd::plugin::virt::refresh_interval
collectd::plugin::virt::domain
collectd::plugin::virt::block_device
collectd::plugin::virt::interface_device
collectd::plugin::virt::ignore_selected
collectd::plugin::virt::hostname_format
collectd::plugin::virt::interface_format
collectd::plugin::virt::extra_stats
collectd::plugin::virt::interval

collectd-vmem
collectd::plugin::vmem::verbose
collectd::plugin::vmem::interval

collectd-vserver

collectd-wireless

collectd-write_graphite
collectd::plugin::write_graphite::carbons
collectd::plugin::write_graphite::carbon_defaults
collectd::plugin::write_graphite::globals

collectd-write_kafka
collectd::plugin::write_kafka::kafka_host
collectd::plugin::write_kafka::kafka_port
collectd::plugin::write_kafka::kafka_hosts
collectd::plugin::write_kafka::topics

collectd-write_log
collectd::plugin::write_log::format

collectd-zfs_arc
None
Additional resources
For more information, see OperatorGroups.
2.3.5 Enabling the OperatorHub.io Community Catalog Source

Before you install ElasticSearch, you must have access to the resources on the OperatorHub.io Community Catalog Source.
Procedure
Enter the following command:

oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: operatorhubio-operators
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/operator-framework/upstream-community-operators:latest
  displayName: OperatorHub.io Operators
  publisher: OperatorHub.io
EOF
2.3.6 Enabling Red Hat STF Operator Source

Before you deploy STF on Red Hat OpenShift Container Platform, you must enable the operator source.
Procedure
1. Install an OperatorSource that contains the Service Telemetry Operator and the Smart Gateway Operator:

oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1
kind: OperatorSource
metadata:
  labels:
    opsrc-provider: redhat-operators-stf
  name: redhat-operators-stf
  namespace: openshift-marketplace
spec:
  authorizationToken: {}
  displayName: Red Hat STF Operators
  endpoint: https://quay.io/cnr
  publisher: Red Hat
  registryNamespace: redhat-operators-stf
  type: appregistry
EOF
2. To validate the creation of your OperatorSource, use the oc get operatorsources command. A successful import results in the MESSAGE field returning a result of The object has been successfully reconciled:

$ oc get -n openshift-marketplace operatorsource redhat-operators-stf

NAME                   TYPE          ENDPOINT              REGISTRY               DISPLAYNAME             PUBLISHER   STATUS      MESSAGE
redhat-operators-stf   appregistry   https://quay.io/cnr   redhat-operators-stf   Red Hat STF Operators   Red Hat     Succeeded   The object has been successfully reconciled

3. To validate that the Operators are available from the catalog, use the oc get packagemanifest command:

$ oc get packagemanifests | grep "Red Hat STF"

smartgateway-operator       Red Hat STF Operators   2m50s
servicetelemetry-operator   Red Hat STF Operators   2m50s
2.3.7 Subscribing to the AMQ Certificate Manager Operator

You must subscribe to the AMQ Certificate Manager Operator before you deploy the other STF components, because the AMQ Certificate Manager Operator runs globally scoped and is not compatible with the dependency management of Operator Lifecycle Manager when used with other namespace-scoped operators.
Procedure
1. Subscribe to the AMQ Certificate Manager Operator, create the subscription, and validate the AMQ7 Certificate Manager:

oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: amq7-cert-manager
  namespace: openshift-operators
spec:
  channel: alpha
  installPlanApproval: Automatic
  name: amq7-cert-manager
  source: redhat-operators
  sourceNamespace: openshift-marketplace
EOF
NOTE
The AMQ Certificate Manager is installed globally for all namespaces, so the namespace value provided is openshift-operators. You might not see your amq7-cert-manager.v1.0.0 ClusterServiceVersion in the service-telemetry namespace for a few minutes until the processing executes against the namespace.
2. To validate your ClusterServiceVersion, use the oc get csv command. Ensure that amq7-cert-manager.v1.0.0 has a phase of Succeeded:

$ oc get --namespace openshift-operators csv

NAME                       DISPLAY                                         VERSION   REPLACES   PHASE
amq7-cert-manager.v1.0.0   Red Hat Integration - AMQ Certificate Manager   1.0.0                Succeeded
2.3.8 Subscribing to the Elastic Cloud on Kubernetes Operator

Before you install the Service Telemetry Operator, and if you plan to store events in ElasticSearch, you must enable the Elastic Cloud on Kubernetes Operator.
Procedure
1. Apply the following manifest to your OCP environment to enable the Elastic Cloud on Kubernetes Operator:

oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: elastic-cloud-eck
  namespace: service-telemetry
spec:
  channel: stable
  installPlanApproval: Automatic
  name: elastic-cloud-eck
  source: operatorhubio-operators
  sourceNamespace: openshift-marketplace
EOF

2. To verify that the ClusterServiceVersion for Elastic Cloud on Kubernetes succeeded, enter the oc get csv command:

$ oc get csv

NAME                       DISPLAY                       VERSION   REPLACES                   PHASE
elastic-cloud-eck.v1.1.0   Elastic Cloud on Kubernetes   1.1.0     elastic-cloud-eck.v1.0.1   Succeeded
2.3.9 Subscribing to the Service Telemetry Operator

To instantiate an STF instance, create the ServiceTelemetry object to allow the Service Telemetry Operator to create the environment.
Procedure
1. To create the Service Telemetry Operator subscription, enter the oc apply -f command:

oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: servicetelemetry-operator
  namespace: service-telemetry
spec:
  channel: stable
  installPlanApproval: Automatic
  name: servicetelemetry-operator
  source: redhat-operators-stf
  sourceNamespace: openshift-marketplace
EOF
2. To validate the Service Telemetry Operator and the dependent operators, enter the following command:

$ oc get csv --namespace service-telemetry
NAME                                DISPLAY                                         VERSION   REPLACES                            PHASE
amq7-cert-manager.v1.0.0            Red Hat Integration - AMQ Certificate Manager   1.0.0                                         Succeeded
amq7-interconnect-operator.v1.2.0   Red Hat Integration - AMQ Interconnect          1.2.0                                         Succeeded
elastic-cloud-eck.v1.1.0            Elastic Cloud on Kubernetes                     1.1.0     elastic-cloud-eck.v1.0.1            Succeeded
prometheusoperator.0.37.0           Prometheus Operator                             0.37.0    prometheusoperator.0.32.0           Succeeded
service-telemetry-operator.v1.0.2   Service Telemetry Operator                      1.0.2     service-telemetry-operator.v1.0.1   Succeeded
smart-gateway-operator.v1.0.1       Smart Gateway Operator                          1.0.1     smart-gateway-operator.v1.0.0       Succeeded
2.3.10 Creating a ServiceTelemetry object in OCP

To deploy the Service Telemetry Framework, you must create an instance of ServiceTelemetry in OCP. By default, eventsEnabled is set to false. If you do not want to store events in ElasticSearch, ensure that eventsEnabled is set to false. For more information, see Section 2.3.2, "Deploying STF to the OCP environment without ElasticSearch".
The following core parameters are available for a ServiceTelemetry manifest:

Table 2.1 Core parameters for a ServiceTelemetry manifest

Parameter | Description | Default value
eventsEnabled | Enable events support in STF. Requires prerequisite steps to ensure ElasticSearch can be started. For more information, see Section 2.3.8, "Subscribing to the Elastic Cloud on Kubernetes Operator". | false
metricsEnabled | Enable metrics support in STF. | true
highAvailabilityEnabled | Enable high availability in STF. For more information, see Section 4.3, "High availability". | false
storageEphemeralEnabled | Enable ephemeral storage support in STF. For more information, see Section 4.6, "Ephemeral storage". | false
Procedure
1. To store events in ElasticSearch, set eventsEnabled to true during deployment:

oc apply -f - <<EOF
apiVersion: infra.watch/v1alpha1
kind: ServiceTelemetry
metadata:
  name: stf-default
  namespace: service-telemetry
spec:
  eventsEnabled: true
  metricsEnabled: true
EOF

2. To view the STF deployment logs in the Service Telemetry Operator, use the oc logs command:

oc logs $(oc get pod --selector=name=service-telemetry-operator -oname) -c ansible

PLAY RECAP ***
localhost : ok=37 changed=0 unreachable=0 failed=0 skipped=1 rescued=0 ignored=0

3. View the pods and the status of each pod to determine that all workloads are operating nominally:

NOTE

If you set eventsEnabled: true, the notification Smart Gateways will Error and CrashLoopBackOff for a period of time before ElasticSearch starts.

$ oc get pods

NAME                                                              READY   STATUS    RESTARTS   AGE
alertmanager-stf-default-0                                        2/2     Running   0          26m
elastic-operator-645dc8b8ff-jwnzt                                 1/1     Running   0          88m
elasticsearch-es-default-0                                        1/1     Running   0          26m
interconnect-operator-6fd49d9fb9-4bl92                            1/1     Running   0          46m
prometheus-operator-bf7d97fb9-kwnlx                               1/1     Running   0          46m
prometheus-stf-default-0                                          3/3     Running   0          26m
service-telemetry-operator-54f4c99d9b-k7ll6                       2/2     Running   0          46m
smart-gateway-operator-7ff58bcf94-66rvx                           2/2     Running   0          46m
stf-default-ceilometer-notification-smartgateway-6675df547q4lbj   1/1     Running   0          26m
stf-default-collectd-notification-smartgateway-698c87fbb7-xj528   1/1     Running   0          26m
stf-default-collectd-telemetry-smartgateway-79c967c8f7-9hsqn      1/1     Running   0          26m
stf-default-interconnect-7458fd4d69-nqbfs                         1/1     Running   0          26m
2.4 REMOVING STF FROM THE OCP ENVIRONMENT

Remove STF from an OCP environment if you no longer require the STF functionality.

Complete the following tasks:

1. Section 2.4.1, "Deleting the namespace"

2. Section 2.4.2, "Removing the OperatorSource"
2.4.1 Deleting the namespace

To remove the operational resources for STF from OCP, delete the namespace.

Procedure

1. Run the oc delete command:

oc delete project service-telemetry

2. Verify that the resources have been deleted from the namespace:

$ oc get all
No resources found.
2.4.2 Removing the OperatorSource

If you do not expect to install Service Telemetry Framework again, delete the OperatorSource. When you remove the OperatorSource, PackageManifests related to STF are removed from the Operator Lifecycle Manager catalog.

Procedure

1. Delete the OperatorSource:

$ oc delete --namespace=openshift-marketplace operatorsource redhat-operators-stf
operatorsource.operators.coreos.com "redhat-operators-stf" deleted
2. Verify that the STF PackageManifests are removed from the platform. If successful, the following command returns no result:

$ oc get packagemanifests | grep "Red Hat STF"

3. If you enabled the OperatorHub.io Community Catalog Source during the installation process and you no longer need this catalog source, delete it:

$ oc delete --namespace=openshift-marketplace catalogsource operatorhubio-operators
catalogsource.operators.coreos.com "operatorhubio-operators" deleted
Additional resources

For more information about the OperatorHub.io Community Catalog Source, see Section 2.3, "Deploying STF to the OCP environment".
CHAPTER 3. COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION

3.1 CONNECTING RED HAT OPENSTACK PLATFORM TO SERVICE TELEMETRY FRAMEWORK

To collect metrics, events, or both, and to send them to the Service Telemetry Framework (STF) storage domain, you must configure the Red Hat OpenStack Platform overcloud to enable data collection and transport.

To deploy data collection and transport to STF on Red Hat OpenStack Platform cloud nodes that employ routed L3 domains, such as distributed compute node (DCN) or spine-leaf, see Section 3.2, "Deploying to non-standard network topologies".

3.2 DEPLOYING TO NON-STANDARD NETWORK TOPOLOGIES

If your nodes are on a separate network from the default InternalApi network, you must make configuration adjustments so that AMQ Interconnect can transport data to the Service Telemetry Framework (STF) server instance. This scenario is typical in a spine-leaf or a DCN topology. For more information about DCN configuration, see the Spine Leaf Networking guide.

If you use STF with Red Hat OpenStack Platform 16.0 and plan to monitor your Ceph, Block, or Object storage nodes, you must make configuration changes that are similar to the configuration changes that you make to the spine-leaf and DCN network configuration. To monitor Ceph nodes, use the CephStorageExtraConfig parameter to define which network interface to load into the AMQ Interconnect and collectd configuration files:

CephStorageExtraConfig:
  tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('storage')}"
  tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('storage')}"
  tripleo::profile::base::ceilometer::agent::notification::notifier_host_addr: "%{hiera('storage')}"

Similarly, you must specify the BlockStorageExtraConfig and ObjectStorageExtraConfig parameters if your environment uses Block and Object storage roles.

The deployment of a spine-leaf topology involves creating roles and networks, then assigning those networks to the available roles. When you configure data collection and transport for STF for a Red Hat OpenStack Platform deployment, the default network for roles is InternalApi. For Ceph, Block, and Object storage roles, the default network is Storage. Because a spine-leaf configuration can result in different networks being assigned to different Leaf groupings, and those names are typically unique, additional configuration is required in the parameter_defaults section of the Red Hat OpenStack Platform environment files.
Procedure
1. Document which networks are available for each of the Leaf roles. For examples of network name definitions, see Creating a network data file in the Spine Leaf Networking guide. For more information about the creation of the Leaf groupings (roles) and assignment of the networks to those groupings, see Creating a roles data file in the Spine Leaf Networking guide.
2. Add the following configuration example to the ExtraConfig section for each of the leaf roles. In this example, internal_api_subnet is the value defined in the name_lower parameter of your network definition (with _subnet appended to the name for Leaf 0), and is the network to which the ComputeLeaf0 leaf role is connected. In this case, the network identification of 0 corresponds to the Compute role for leaf 0, and represents a value that is different from the default internal API network name.

For the ComputeLeaf0 leaf role, specify extra configuration to perform a hiera lookup to determine which network interface for a particular network to assign to the collectd AMQP host parameter. Perform the same configuration for the AMQ Interconnect listener address parameter:

ComputeLeaf0ExtraConfig:
  tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('internal_api_subnet')}"
  tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('internal_api_subnet')}"

Additional leaf roles typically replace _subnet with _leafN, where N represents a unique identifier for the leaf:

ComputeLeaf1ExtraConfig:
  tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('internal_api_leaf1')}"
  tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('internal_api_leaf1')}"

This example configuration is on a CephStorage leaf role:

CephStorageLeaf0ExtraConfig:
  tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('storage_subnet')}"
  tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('storage_subnet')}"
3.3 CONFIGURING RED HAT OPENSTACK PLATFORM OVERCLOUD FOR SERVICE TELEMETRY FRAMEWORK

To configure the Red Hat OpenStack Platform overcloud, you must configure the data collection applications and the data transport to STF, and deploy the overcloud.

To configure the Red Hat OpenStack Platform overcloud, complete the following tasks:

1. Section 3.3.1, "Retrieving the AMQ Interconnect route address"

2. Section 3.3.2, "Configuring the STF connection for the overcloud"

3. Section 3.3.3, "Validating client-side installation"
3.3.1 Retrieving the AMQ Interconnect route address

When you configure the Red Hat OpenStack Platform overcloud for STF, you must provide the AMQ Interconnect route address in the STF connection file.
Procedure
1. Log in to your Red Hat OpenShift Container Platform (OCP) environment.

2. In the service-telemetry project, retrieve the AMQ Interconnect route address:

$ oc get routes -ogo-template='{{ range .items }}{{printf "%s\n" .spec.host }}{{ end }}' | grep "\-5671"
stf-default-interconnect-5671-service-telemetry.apps.infra.watch
NOTE

If your STF installation differs from the documentation, ensure that you retrieve the correct AMQ Interconnect route address.

3.3.2 Configuring the STF connection for the overcloud

To configure the STF connection, you must create a file that contains the connection configuration of the AMQ Interconnect for the overcloud to the STF deployment. Enable the collection of events and storage of the events in STF, and deploy the overcloud.
Procedure
1. Log in to the Red Hat OpenStack Platform undercloud as the stack user.

2. Create a configuration file called stf-connectors.yaml in the /home/stack directory.

IMPORTANT

The Service Telemetry Operator simplifies the deployment of all data ingestion and data storage components for single cloud deployments. To share the data storage domain with multiple clouds, see Section 4.5, "Configuring multiple clouds".

3. In the stf-connectors.yaml file, configure the MetricsQdrConnectors address to connect the AMQ Interconnect on the overcloud to the STF deployment:

parameter_defaults:
  EventPipelinePublishers: []
  CeilometerQdrPublishEvents: true
  MetricsQdrConnectors:
    - host: stf-default-interconnect-5671-service-telemetry.apps.infra.watch
      port: 443
      role: edge
      sslProfile: sslProfile
      verifyHostname: false

Add the CeilometerQdrPublishEvents: true parameter to enable collection and transport of Ceilometer events to STF.

Replace the host parameter with the value of HOST/PORT that you retrieved in Section 3.3.1, "Retrieving the AMQ Interconnect route address".

4. Add the following files to your Red Hat OpenStack Platform director deployment to set up collectd and AMQ Interconnect:

the stf-connectors.yaml environment file

the enable-stf.yaml file that ensures that the environment is being used during the overcloud deployment
the ceilometer-write-qdr.yaml file that ensures that Ceilometer telemetry is sent to STF

5. Deploy the Red Hat OpenStack Platform overcloud:

openstack overcloud deploy <other arguments>
  --templates /usr/share/openstack-tripleo-heat-templates \
  --environment-file <other-environment-files> \
  --environment-file /usr/share/openstack-tripleo-heat-templates/environments/metrics/ceilometer-write-qdr.yaml \
  --environment-file /usr/share/openstack-tripleo-heat-templates/environments/enable-stf.yaml \
  --environment-file /home/stack/stf-connectors.yaml
3.3.3 Validating client-side installation

To validate data collection from the STF storage domain, query the data sources for delivered data. To validate individual nodes in the Red Hat OpenStack Platform deployment, connect to the console using SSH.

Procedure

1. Log in to an overcloud node, for example, controller-0.

2. Ensure that the metrics_qdr container is running on the node:

$ sudo podman container inspect --format '{{.State.Status}}' metrics_qdr

running

3. Return the internal network address on which AMQ Interconnect is running, for example, 172.17.1.44 listening on port 5666:

$ sudo podman exec -it metrics_qdr cat /etc/qpid-dispatch/qdrouterd.conf

listener {
    host: 172.17.1.44
    port: 5666
    authenticatePeer: no
    saslMechanisms: ANONYMOUS
}

4. Return a list of connections to the local AMQ Interconnect:

$ sudo podman exec -it metrics_qdr qdstat --bus=172.17.1.44:5666 --connections

Connections
  id   host                                                                   container                                                                                                  role    dir  security                            authentication  tenant
  =============================================================================================================================================================================================================
  1    stf-default-interconnect-5671-service-telemetry.apps.infra.watch:443   stf-default-interconnect-7458fd4d69-bgzfb                                                                  edge    out  TLSv1.2(DHE-RSA-AES256-GCM-SHA384)  anonymous-user
  12   172.17.1.44:60290                                                      openstack.org/om/container/controller-0/ceilometer-agent-notification/25/5c02cee550f143ec9ea030db5cccba14  normal  in   no-security                         no-auth
  16   172.17.1.44:36408                                                      metrics                                                                                                    normal  in   no-security                         anonymous-user
  899  172.17.1.44:39500                                                      10a2e99d-1b8a-4329-b48c-4335e5f75c84                                                                       normal  in   no-security                         no-auth
There are four connections:

Outbound connection to STF

Inbound connection from collectd

Inbound connection from ceilometer

Inbound connection from our qdstat client

The outbound STF connection is provided to the MetricsQdrConnectors host parameter and is the route for the STF storage domain. The other hosts are internal network addresses of the client connections to this AMQ Interconnect.

5. To ensure that messages are being delivered, list the links, and view the _edge address in the deliv column for delivery of messages:

$ sudo podman exec -it metrics_qdr qdstat --bus=172.17.1.44:5666 --links
Router Links
  type      dir  conn id  id    peer  class   addr                  phs  cap  pri  undel  unsett  deliv    presett  psdrop  acc  rej  rel  mod  delay  rate
  ===========================================================================================================================================================
  endpoint  out  1        5           local   _edge                      250  0    0      0       2979926  2979924  0       0    0    2    0    0      0
  endpoint  in   1        6                                              250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  in   1        7                                              250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  out  1        8                                              250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  in   1        9                                              250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  out  1        10                                             250  0    0      0       911      911      0       0    0    0    0    911    0
  endpoint  in   1        11                                             250  0    0      0       0        911      0       0    0    0    0    0      0
  endpoint  out  12       32          local   temp.SY6Mcicol4J2Kp        250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  in   16       41                                             250  0    0      0       2979924  2979924  0       0    0    0    0    0      0
  endpoint  in   912      1834        mobile  $management           0    250  0    0      0       1        0        0       1    0    0    0    0      0
  endpoint  out  912      1835        local   temp.9Ok2resI9tmt+CT       250  0    0      0       0        0        0       0    0    0    0    0      0
6. To list the addresses from Red Hat OpenStack Platform nodes to STF, connect to OCP to get the AMQ Interconnect pod name and list the connections. List the available AMQ Interconnect pods:

$ oc get pods -l application=stf-default-interconnect

NAME                                        READY   STATUS    RESTARTS   AGE
stf-default-interconnect-7458fd4d69-bgzfb   1/1     Running   0          6d21h

7. Connect to the pod and run the qdstat --connections command to list the known connections:

$ oc exec -it stf-default-interconnect-7458fd4d69-bgzfb -- qdstat --connections

2020-04-21 18:25:47.243852 UTC
stf-default-interconnect-7458fd4d69-bgzfb

Connections
  id  host               container                                                              role    dir  security                                authentication  tenant  last dlv      uptime
  ======================================================================================================================================================================================================
  1   10.129.2.21:43062  rcv[stf-default-collectd-telemetry-smartgateway-79c967c8f7-kq4qv]      normal  in   no-security                             anonymous-user          000:00:00:00  006:21:50:25
  2   10.130.0.52:55754  rcv[stf-default-ceilometer-notification-smartgateway-6675df547mbjk5]   normal  in   no-security                             anonymous-user          000:21:25:57  006:21:49:36
  3   10.130.0.51:43110  rcv[stf-default-collectd-notification-smartgateway-698c87fbb7-f28v6]   normal  in   no-security                             anonymous-user          000:21:36:53  006:21:49:09
  22  10.128.0.1:51948   Router.ceph-0.redhat.local                                             edge    in   TLSv1/SSLv3(DHE-RSA-AES256-GCM-SHA384)  anonymous-user          000:00:00:03  000:22:08:43
  23  10.128.0.1:51950   Router.compute-0.redhat.local                                          edge    in   TLSv1/SSLv3(DHE-RSA-AES256-GCM-SHA384)  anonymous-user          000:00:00:03  000:22:08:43
  24  10.128.0.1:52082   Router.controller-0.redhat.local                                       edge    in   TLSv1/SSLv3(DHE-RSA-AES256-GCM-SHA384)  anonymous-user          000:00:00:00  000:22:08:34
  27  127.0.0.1:42202    c2f541c1-4c97-4b37-a189-a396c08fb079                                   normal  in   no-security                             no-auth                 000:00:00:00  000:00:00:00

In this example, there are three edge connections from the Red Hat OpenStack Platform nodes, with connection id 22, 23, and 24.

8. To view the number of messages delivered by the network, use each address with the oc exec command:

$ oc exec -it stf-default-interconnect-7458fd4d69-bgzfb -- qdstat --address

2020-04-21 18:20:10.293258 UTC
stf-default-interconnect-7458fd4d69-bgzfb

Router Addresses
  class   addr                              phs  distrib    pri  local  remote  in       out      thru  fallback
  ====================================================================================================================
  mobile  anycast/ceilometer/event.sample   0    balanced   -    1      0       1553     1553     0     0
  mobile  collectd/notify                   0    multicast  -    1      0       10       10       0     0
  mobile  collectd/telemetry                0    multicast  -    1      0       7798049  7798049  0     0
CHAPTER 4. ADVANCED FEATURES

The following optional features can provide additional functionality to the Service Telemetry Framework (STF):

Customizing the deployment. For more information, see Section 4.1, "Customizing the deployment".

Alerts. For more information, see Section 4.2, "Alerts".

High availability. For more information, see Section 4.3, "High availability".

Dashboards. For more information, see Section 4.4, "Dashboards".

Multiple clouds. For more information, see Section 4.5, "Configuring multiple clouds".

Ephemeral storage. For more information, see Section 4.6, "Ephemeral storage".
4.1 CUSTOMIZING THE DEPLOYMENT

The Service Telemetry Operator watches for a ServiceTelemetry manifest to load into Red Hat OpenShift Container Platform (OCP). The Operator then creates other objects in memory, which results in the dependent Operators creating the workloads they are responsible for managing.

WARNING

When you override the manifest, you must provide the entire manifest contents, including object names or namespaces. There is no dynamic parameter substitution when you override a manifest.

To override a manifest successfully with Service Telemetry Framework (STF), deploy a default environment using the core options only. For more information about the core options, see Section 2.3.10, "Creating a ServiceTelemetry object in OCP". When you deploy STF, use the oc get command to retrieve the default deployed manifest. When you use a manifest that was originally generated by the Service Telemetry Operator, the manifest is compatible with the other objects that are managed by the Operators.
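For example, to capture a default deployed manifest as a starting point for an override, you can save the output of one of the retrieval commands listed in Section 4.1.1; the output file name here is illustrative:

$ oc get prometheus stf-default -oyaml > prometheus-manifest.yaml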
For example, when the metricsEnabled: true parameter is configured in the ServiceTelemetry manifest, the Service Telemetry Operator requests components for metrics retrieval and storage using the default manifests. In some cases, you might want to override the default manifest. For more information, see Section 4.1.1, "Manifest override parameters".
4.1.1 Manifest override parameters

This table describes the available parameters that you can use to override a manifest, along with the corresponding retrieval commands.

Table 4.1 Manifest override parameters
Override parameter | Description | Retrieval command

alertmanagerManifest | Override the Alertmanager object creation. The Prometheus Operator watches for Alertmanager objects. | oc get alertmanager stf-default -oyaml

alertmanagerConfigManifest | Override the Secret that contains the Alertmanager configuration. The Prometheus Operator uses a secret named alertmanager-{alertmanager-name}, for example stf-default, to provide the alertmanager.yaml configuration to Alertmanager. | oc get secret alertmanager-stf-default -oyaml

elasticsearchManifest | Override the ElasticSearch object creation. The Elastic Cloud on Kubernetes Operator watches for ElasticSearch objects. | oc get elasticsearch elasticsearch -oyaml

interconnectManifest | Override the Interconnect object creation. The AMQ Interconnect Operator watches for Interconnect objects. | oc get interconnect stf-default-interconnect -oyaml

prometheusManifest | Override the Prometheus object creation. The Prometheus Operator watches for Prometheus objects. | oc get prometheus stf-default -oyaml

servicemonitorManifest | Override the ServiceMonitor object creation. The Prometheus Operator watches for ServiceMonitor objects. | oc get servicemonitor stf-default -oyaml

smartgatewayCollectdMetricsManifest | Override the SmartGateway object creation for collectd metrics. The Smart Gateway Operator watches for SmartGateway objects. | oc get smartgateway stf-default-collectd-telemetry -oyaml

smartgatewayCollectdEventsManifest | Override the SmartGateway object creation for collectd events. The Smart Gateway Operator watches for SmartGateway objects. | oc get smartgateway stf-default-collectd-notification -oyaml
smartgatewayCeilometerEventsManifest | Override the SmartGateway object creation for Ceilometer events. The Smart Gateway Operator watches for SmartGateway objects. | oc get smartgateway stf-default-ceilometer-notification -oyaml
4.1.2 Overriding a managed manifest

Edit the ServiceTelemetry object and provide a parameter and manifest. For a list of available manifest override parameters, see Section 4.1, "Customizing the deployment". The default ServiceTelemetry object is stf-default. Use oc get servicetelemetry to list the available STF deployments.

TIP

The oc edit command loads the default system editor. To override the default editor, pass or set the environment variable EDITOR to the preferred editor, for example, EDITOR=nano oc edit servicetelemetry stf-default.
Procedure

1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

oc project service-telemetry

3. Load the ServiceTelemetry object into an editor:

$ oc edit servicetelemetry stf-default

4. To modify the ServiceTelemetry object, provide a manifest override parameter and the contents of the manifest to write to OCP instead of the defaults provided by STF.

NOTE

The trailing pipe (|) after entering the manifest override parameter indicates that the value provided is multi-line.

apiVersion: infra.watch/v1alpha1
kind: ServiceTelemetry
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"infra.watch/v1alpha1","kind":"ServiceTelemetry","metadata":{"annotations":{},"name":"stf-default","namespace":"service-telemetry"},"spec":{"metricsEnabled":true}}
  creationTimestamp: "2020-04-14T20:29:42Z"
  generation: 1
  name: stf-default
  namespace: service-telemetry
  resourceVersion: "1949423"
  selfLink: /apis/infra.watch/v1alpha1/namespaces/service-telemetry/servicetelemetrys/stf-default
  uid: d058bc41-1bb0-49f5-9a8b-642f4b8adb95
spec:
  metricsEnabled: true
  smartgatewayCollectdMetricsManifest: | 1
    apiVersion: smartgateway.infra.watch/v2alpha1
    kind: SmartGateway
    metadata:
      name: stf-default-collectd-telemetry
      namespace: service-telemetry
    spec:
      amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/telemetry
      debug: true
      prefetch: 15000
      serviceType: metrics
      size: 1
      useTimestamp: true 2
status:
  conditions:
  - ansibleResult:
      changed: 0
      completion: 2020-04-14T20:32:19.079508
      failures: 0
      ok: 52
      skipped: 1
    lastTransitionTime: "2020-04-14T20:29:59Z"
    message: Awaiting next reconciliation
    reason: Successful
    status: "True"
    type: Running
1 Manifest override parameter is defined in the spec of the ServiceTelemetry object.

2 End of the manifest override content.

5. Save and close.
4.2 ALERTS

You create alert rules in Prometheus and alert routes in Alertmanager. Alert rules in Prometheus servers send alerts to an Alertmanager, which manages the alerts. Alertmanager can silence, inhibit, or aggregate alerts, and send notifications using email, on-call notification systems, or chat platforms.
To create an alert, complete the following tasks:

1. Create an alert rule in Prometheus. For more information, see Section 4.2.1, "Creating an alert rule in Prometheus".
2. Create an alert route in Alertmanager. For more information, see Section 4.2.3, "Creating an alert route in Alertmanager".

Additional resources

For more information about alerts or notifications with Prometheus and Alertmanager, see https://prometheus.io/docs/alerting/overview/

To view an example set of alerts that you can use with Service Telemetry Framework (STF), see https://github.com/infrawatch/service-telemetry-operator/tree/master/deploy/alerts
4.2.1 Creating an alert rule in Prometheus

Prometheus evaluates alert rules to trigger notifications. If the rule condition returns an empty result set, the condition is false. Otherwise, the rule is true, and it triggers an alert.
Procedure
1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

oc project service-telemetry

3. Create a PrometheusRule object that contains the alert rule. The Prometheus Operator loads the rule into Prometheus:

oc apply -f - <<EOF
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  creationTimestamp: null
  labels:
    prometheus: stf-default
    role: alert-rules
  name: prometheus-alarm-rules
  namespace: service-telemetry
spec:
  groups:
    - name: ./openstack.rules
      rules:
        - alert: Metric Listener down
          expr: collectd_qpid_router_status < 1
EOF

To change the rule, edit the value of the expr parameter.

4. To verify that the rules have been loaded into Prometheus by the Operator, create a pod with access to curl:

oc run curl --generator=run-pod/v1 --image=radial/busyboxplus:curl -i --tty

5. Run curl to access the prometheus-operated service to return the rules loaded into memory:

[ root@curl:/ ]$ curl prometheus-operated:9090/api/v1/rules
{"status":"success","data":{"groups":[{"name":"./openstack.rules","file":"/etc/prometheus/rules/prometheus-stf-default-rulefiles-0/service-telemetry-prometheus-alarm-rules.yaml","rules":[{"name":"Metric Listener down","query":"collectd_qpid_router_status \u003c 1","duration":0,"labels":{},"annotations":{},"alerts":[],"health":"ok","type":"alerting"}],"interval":30}]}}
6. To verify that the output shows the rules loaded into the PrometheusRule object, for example that the output contains the defined ./openstack.rules, exit from the pod:

[ root@curl:/ ]$ exit

7. Clean up the environment by deleting the curl pod:

$ oc delete pod curl

pod "curl" deleted

Additional resources

For more information on alerting, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md

4.2.2 Configuring custom alerts

You can add custom alerts to the PrometheusRule object that you created in Section 4.2.1, "Creating an alert rule in Prometheus".
Procedure
1. Use the oc edit command:

oc edit prometheusrules prometheus-alarm-rules

2. Edit the PrometheusRules manifest.

3. Save and close.
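For example, an edited manifest with one added custom rule might look like the following sketch; the second alert name and expression are illustrative, not part of the shipped rules:

spec:
  groups:
    - name: ./openstack.rules
      rules:
        - alert: Metric Listener down
          expr: collectd_qpid_router_status < 1
        # Illustrative custom alert: fires when a node reports less than ten minutes of uptime
        - alert: Node recently restarted
          expr: collectd_uptime < 600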
Additional resources

For more information about configuring alerting rules, see https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/

For more information about PrometheusRules objects, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md

4.2.3 Creating an alert route in Alertmanager

Use Alertmanager to deliver alerts to an external system, such as email, IRC, or other notification channel. The Prometheus Operator manages the Alertmanager configuration as a Red Hat OpenShift Container Platform (OCP) secret. STF by default deploys a basic configuration that results in no receivers:

alertmanager.yaml: |-
  global:
    resolve_timeout: 5m
  route:
    group_by: ['job']
    group_wait: 30s
    group_interval: 5m
    repeat_interval: 12h
    receiver: 'null'
  receivers:
  - name: 'null'
To deploy a custom Alertmanager route with STF, you must pass an alertmanagerConfigManifest parameter to the Service Telemetry Operator, which results in an updated secret managed by the Prometheus Operator. For more information, see Section 4.1.2, "Overriding a managed manifest".
Procedure
1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

oc project service-telemetry

3. Edit the ServiceTelemetry object for your STF deployment:

oc edit servicetelemetry stf-default

4. Add a new parameter, alertmanagerConfigManifest, and the Secret object contents to define the alertmanager.yaml configuration for Alertmanager:

apiVersion: infra.watch/v1alpha1
kind: ServiceTelemetry
metadata:
  name: stf-default
  namespace: service-telemetry
spec:
  metricsEnabled: true
  alertmanagerConfigManifest: |
    apiVersion: v1
    kind: Secret
    metadata:
      name: 'alertmanager-stf-default'
      namespace: 'service-telemetry'
    type: Opaque
    stringData:
      alertmanager.yaml: |-
        global:
          resolve_timeout: 10m
        route:
          group_by: ['job']
          group_wait: 30s
          group_interval: 5m
          repeat_interval: 12h
          receiver: 'null'
        receivers:
        - name: 'null'

NOTE

This loads the default template that is already managed by the Service Telemetry Operator. To validate that the changes are populating correctly, change a value, return the alertmanager-stf-default secret, and verify that the new value is loaded into memory, for example, changing the value global.resolve_timeout from 5m to 10m.
5. Verify that the configuration was applied to the secret:

$ oc get secret alertmanager-stf-default -o go-template='{{index .data "alertmanager.yaml" | base64decode }}'

global:
  resolve_timeout: 10m
route:
  group_by: ['job']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h
  receiver: 'null'
receivers:
- name: 'null'

6. To verify the configuration has been loaded into Alertmanager, create a pod with access to curl:

oc run curl --generator=run-pod/v1 --image=radial/busyboxplus:curl -i --tty

7. Run curl against the alertmanager-operated service to retrieve the status and configYAML contents, and review that the supplied configuration matches the configuration loaded into Alertmanager:

[ root@curl:/ ]$ curl alertmanager-operated:9093/api/v1/status

{"status":"success","data":{"configYAML":"global:\n  resolve_timeout: 10m\n  http_config: {}\n  smtp_hello: localhost\n  smtp_require_tls: true\n  pagerduty_url: https://events.pagerduty.com/v2/enqueue\n  hipchat_api_url: https://api.hipchat.com/\n  opsgenie_api_url: https://api.opsgenie.com/\n  wechat_api_url: https://qyapi.weixin.qq.com/cgi-bin/\n  victorops_api_url: https://alert.victorops.com/integrations/generic/20131114/alert/\nroute:\n  receiver: \"null\"\n  group_by:\n  - job\n  group_wait: 30s\n  group_interval: 5m\n  repeat_interval: 12h\nreceivers:\n- name: \"null\"\ntemplates: []\n",...}}

8. Verify that the configYAML field contains the expected changes. Exit from the pod:

[ root@curl:/ ]$ exit

9. To clean up the environment, delete the curl pod:

$ oc delete pod curl

pod "curl" deleted
Additional resources

For more information about the Red Hat OpenShift Container Platform secret and the Prometheus operator, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md
4.3 HIGH AVAILABILITY

High availability is the ability of Service Telemetry Framework (STF) to rapidly recover from failures in its component services. Although Red Hat OpenShift Container Platform (OCP) restarts a failed pod if nodes are available to schedule the workload, this recovery process might take more than one minute, during which time events and metrics are lost. A high availability configuration includes multiple copies of STF components, which reduces recovery time to approximately 2 seconds. To protect against failure of an OCP node, deploy STF to an OCP cluster with three or more nodes.

NOTE

STF is not yet a fully fault tolerant system. Delivery of metrics and events during the recovery period is not guaranteed.
Enabling high availability has the following effects:

Two AMQ Interconnect pods run instead of the default 1.

Three ElasticSearch pods run instead of the default 1.

Recovery time from a lost pod in either of these services reduces to approximately 2 seconds.
4.3.1 Configuring high availability

To configure STF for high availability, add highAvailabilityEnabled: true to the ServiceTelemetry object in OCP. You can set this parameter at installation time or, if you already deployed STF, complete the following steps:

Procedure

1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

oc project service-telemetry

3. Use the oc command to edit the ServiceTelemetry object:

$ oc edit ServiceTelemetry

4. Add highAvailabilityEnabled: true to the spec section:

spec:
  eventsEnabled: true
  metricsEnabled: true
  highAvailabilityEnabled: true

5. Save your changes and close the object.
4.4 DASHBOARDS

Use the third-party application Grafana to visualize system-level metrics gathered by collectd for each individual host node. For more information about configuring collectd, see Section 3.3, "Configuring Red Hat OpenStack Platform overcloud for Service Telemetry Framework".

4.4.1 Setting up Grafana to host the dashboard

Grafana is not included in the default Service Telemetry Framework (STF) deployment, so you must deploy the Grafana Operator from OperatorHub.io.
Procedure
1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

oc project service-telemetry

3. Clone the dashboard repository:

git clone https://github.com/infrawatch/dashboards
cd dashboards

4. Deploy the Grafana operator:

oc create -f deploy/subscription.yaml

5. To verify that the operator launched successfully, run the oc get csv command. If the value of the PHASE column is Succeeded, the operator launched successfully:

$ oc get csv

NAME                      DISPLAY            VERSION   REPLACES   PHASE
grafana-operator.v3.2.0   Grafana Operator   3.2.0                Succeeded

6. Launch a Grafana instance:

$ oc create -f deploy/grafana.yaml

7. Verify that the Grafana instance deployed:

$ oc get pod -l app=grafana

NAME                                  READY   STATUS    RESTARTS   AGE
grafana-deployment-7fc7848b56-sbkhv   1/1     Running   0          1m

8. Create the datasource and dashboard resources:

oc create -f deploy/datasource.yaml -f deploy/rhos-dashboard.yaml
9. Verify that the resources installed correctly:

$ oc get grafanadashboards

NAME             AGE
rhos-dashboard   7d21h

$ oc get grafanadatasources
NAME                                  AGE
service-telemetry-grafanadatasource   1m

10. Navigate to https://<grafana-route-address> in a web browser. Use the oc get routes command to retrieve the Grafana route address:

oc get routes

11. To view the dashboard, click Dashboards and Manage.
Additional resources

For more information about enabling the OperatorHub.io catalog source, see Section 2.3.5, "Enabling the OperatorHub.io Community Catalog Source".
4.4.1.1 Viewing and editing queries

Procedure

1. Log in to Red Hat OpenShift Container Platform. To view and edit queries, log in as the admin user.

2. Change to the service-telemetry namespace:

oc project service-telemetry

3. To retrieve the default username and password, describe the Grafana object using oc describe:

oc describe grafana service-telemetry-grafana
4.4.2 The Grafana infrastructure dashboard

The infrastructure dashboard shows metrics for a single node at a time. Select a node from the upper left corner of the dashboard.

4.4.2.1 Top panels

Title | Unit | Description
Red Hat OpenStack Platform 160 Service Telemetry Framework
36
Title | Unit | Description
Current Global Alerts | - | Current alerts fired by Prometheus
Recent Global Alerts | - | Recently fired alerts in 5m time steps
Status Panel | - | Node status: up, down, unavailable
Uptime | s/m/h/d/M/Y | Total operational time of node
CPU Cores | cores | Total number of cores
Memory | bytes | Total memory
Disk Size | bytes | Total storage size
Processes | processes | Total number of processes listed by type
Load Average | processes | Load average represents the average number of running and uninterruptible processes residing in the kernel execution queue
4.4.2.2. Networking panels

Panels that display the network interfaces of the node.

Panel | Unit | Description
Physical Interfaces Ingress Errors | errors | Total errors with incoming data
Physical Interfaces Egress Errors | errors | Total errors with outgoing data
Physical Interfaces Ingress Error Rates | errors/s | Rate of incoming data errors
Physical Interfaces Egress Error Rates | errors/s | Rate of outgoing data errors
Physical Interfaces Packets Ingress | pps | Incoming packets per second
Physical Interfaces Packets Egress | pps | Outgoing packets per second
Physical Interfaces Data Ingress | bytes/s | Incoming data rates
Physical Interfaces Data Egress | bytes/s | Outgoing data rates
Physical Interfaces Drop Rate Ingress | pps | Incoming packets drop rate
Physical Interfaces Drop Rate Egress | pps | Outgoing packets drop rate
4.4.2.3. CPU panels

Panels that display CPU usage of the node.

Panel | Unit | Description
Current CPU Usage | percent | Instantaneous usage at the time of the last query
Aggregate CPU Usage | percent | Average non-idle CPU activity of all cores on a node
Aggr CPU Usage by Type | percent | Shows time spent for each type of thread averaged across all cores
4.4.2.4. Memory panels

Panels that display memory usage on the node.

Panel | Unit | Description
Memory Used | percent | Amount of memory being used at time of last query
Huge Pages Used | hugepages | Number of hugepages being used
4.4.2.5. Disk/file system

Panels that display space used on disk.

Panel | Unit | Description | Notes
Disk Space Usage | percent | Total disk use at time of last query |
Inode Usage | percent | Total inode use at time of last query |
Aggregate Disk Space Usage | bytes | Total disk space used and reserved | Because this query relies on the df plugin, temporary file systems that do not necessarily use disk space are included in the results. The query tries to filter out most of these, but it might not be exhaustive.
Disk Traffic | bytes/s | Shows rates for both reading and writing |
Disk Load | percent | Approximate percentage of total disk bandwidth being used. The weighted I/O time series includes the backlog that might be accumulating. For more information, see the collectd disk plugin docs. |
Operations/s | ops/s | Operations done per second |
Average I/O Operation Time | seconds | Average time each I/O operation took to complete. This average is not accurate, see the collectd disk plugin docs. |
4.5. CONFIGURING MULTIPLE CLOUDS

You can configure multiple Red Hat OpenStack Platform clouds to target a single instance of Service Telemetry Framework (STF):

1. Plan the AMQP address prefixes that you want to use for each cloud. For more information, see Section 4.5.1, "Planning AMQP address prefixes".

2. Deploy metrics and events consumer Smart Gateways for each cloud to listen on the corresponding address prefixes. For more information, see Section 4.5.2, "Deploying Smart Gateways".

3. Configure each cloud to send its metrics and events to STF on the correct address. For more information, see Section 4.5.3, "Creating the OpenStack environment file".
Figure 4.1. Two Red Hat OpenStack Platform clouds connect to STF

4.5.1. Planning AMQP address prefixes

By default, Red Hat OpenStack Platform nodes get data through two data collectors: collectd and Ceilometer. These components send telemetry data or notifications to the respective AMQP addresses, for example, collectd/telemetry, where STF Smart Gateways listen on those addresses for monitoring data.

To support multiple clouds, and to identify which cloud generated the monitoring data, configure each cloud to send data to a unique address. Prefix a cloud identifier to the second part of the address. The following list shows some example addresses and identifiers:
collectd/cloud1-telemetry

collectd/cloud1-notify

anycast/ceilometer/cloud1-event.sample

collectd/cloud2-telemetry

collectd/cloud2-notify
anycast/ceilometer/cloud2-event.sample

collectd/us-east-1-telemetry

collectd/us-west-3-telemetry
4.5.2. Deploying Smart Gateways

You must deploy a Smart Gateway for each of the data collection types for each cloud: one for collectd metrics, one for collectd events, and one for Ceilometer events. Configure each of the Smart Gateways to listen on the AMQP address that you define for the corresponding cloud.

When you deploy STF for the first time, Smart Gateway manifests are created that define the initial Smart Gateways for a single cloud. When you deploy Smart Gateways for multiple cloud support, you deploy multiple Smart Gateways for each of the data collection types that handle the metrics and the events data for each cloud. The initial Smart Gateways act as a template to create additional Smart Gateways, along with any authentication information required to connect to the data stores.
Procedure
1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

$ oc project service-telemetry

3. Use the initially deployed Smart Gateways as a template for additional Smart Gateways. List the currently deployed Smart Gateways with the oc get smartgateways command. For example, if you deployed STF with metricsEnabled: true and eventsEnabled: true, the following Smart Gateways are displayed in the output:

$ oc get smartgateways

NAME                                  AGE
stf-default-ceilometer-notification   14d
stf-default-collectd-notification     14d
stf-default-collectd-telemetry        14d

4. Retrieve the manifests for each Smart Gateway and store the contents in a temporary file, which you can modify later and use to create the new set of Smart Gateways:

$ truncate --size 0 /tmp/cloud1-smartgateways.yaml && \
for sg in $(oc get smartgateways -oname)
do
  echo "---" >> /tmp/cloud1-smartgateways.yaml
  oc get $sg -oyaml --export >> /tmp/cloud1-smartgateways.yaml
done

5. Modify the Smart Gateway manifests in the /tmp/cloud1-smartgateways.yaml file. Adjust the metadata.name and spec.amqpUrl fields to include the cloud identifier from your schema. For more information, see Section 4.5.1, "Planning AMQP address prefixes". To view example Smart Gateway manifests, see Section 4.5.2.1, "Example manifests".
6. Deploy your new Smart Gateways:

$ oc apply -f /tmp/cloud1-smartgateways.yaml

7. Verify that each Smart Gateway is running. This can take several minutes depending on the number of Smart Gateways:

$ oc get po -l app=smart-gateway
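Each cloud adds its own set of Smart Gateway pods, so after you define a second cloud you can expect the listing to roughly double. The following output is illustrative only; the pod names, suffixes, and ages are hypothetical:

NAME                                                              READY   STATUS    RESTARTS   AGE
stf-default-collectd-telemetry-smartgateway-79c967c8f7-9hsqn      1/1     Running   0          26m
stf-default-collectd-telemetry-cloud1-smartgateway-7d9c5c9f6b-x   1/1     Running   0          2m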
4.5.2.1. Example manifests
IMPORTANT
The content in the following examples might be different from the file content in your deployment. Copy the manifests in your deployment.

Ensure that the name and amqpUrl parameters of each Smart Gateway match the names that you want to use for your clouds. For more information, see Section 4.5.1, "Planning AMQP address prefixes".
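As a quick consistency check before you apply the file, you can list the names and addresses together. This is a sketch, not part of the original procedure; it assumes the temporary file path used in the steps above:

$ grep -E 'name:|amqpUrl:' /tmp/cloud1-smartgateways.yaml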
NOTE
Your output might have some additional metadata parameters that you can remove from the manifests that you load into OCP.
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-ceilometer-notification-cloud1 1
spec:
  amqpDataSource: ceilometer
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/anycast/ceilometer/cloud1-event.sample 2
  debug: false
  elasticPass: fkzfhghw
  elasticUrl: https://elasticsearch-es-http.service-telemetry.svc.cluster.local:9200
  elasticUser: elastic
  resetIndex: false
  serviceType: events
  size: 1
  tlsCaCert: /config/certs/ca.crt
  tlsClientCert: /config/certs/tls.crt
  tlsClientKey: /config/certs/tls.key
  tlsServerName: elasticsearch-es-http.service-telemetry.svc.cluster.local
  useBasicAuth: true
  useTls: true
---
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-collectd-notification-cloud1 3
spec:
  amqpDataSource: collectd
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/cloud1-notify 4
  debug: false
  elasticPass: fkzfhghw
  elasticUrl: https://elasticsearch-es-http.service-telemetry.svc.cluster.local:9200
  elasticUser: elastic
  resetIndex: false
  serviceType: events
  size: 1
  tlsCaCert: /config/certs/ca.crt
  tlsClientCert: /config/certs/tls.crt
  tlsClientKey: /config/certs/tls.key
  tlsServerName: elasticsearch-es-http.service-telemetry.svc.cluster.local
  useBasicAuth: true
  useTls: true
---
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-collectd-telemetry-cloud1 5
spec:
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/cloud1-telemetry 6
  debug: false
  prefetch: 15000
  serviceType: metrics
  size: 1
  useTimestamp: true
1 Name for Ceilometer notifications for cloud1

2 AMQP address for Ceilometer notifications for cloud1

3 Name for collectd notifications for cloud1

4 AMQP address for collectd notifications for cloud1

5 Name for collectd telemetry for cloud1

6 AMQP address for collectd telemetry for cloud1
4.5.3. Creating the OpenStack environment file

To label traffic according to the cloud of origin, you must create a configuration with cloud-specific instance names. Create an stf-connectors.yaml file and adjust the values of CeilometerQdrEventsConfig and CollectdAmqpInstances to match the AMQP address prefix scheme. For more information, see Section 4.5.1, "Planning AMQP address prefixes".
WARNING
Remove the enable-stf.yaml and ceilometer-write-qdr.yaml environment file references from your overcloud deployment. This configuration is redundant and results in duplicate information being sent from each cloud node.
Procedure
1. Create the stf-connectors.yaml file and modify it to match the AMQP address that you want for this cloud deployment:

resource_registry:
  OS::TripleO::Services::Collectd: /usr/share/openstack-tripleo-heat-templates/deployment/metrics/collectd-container-puppet.yaml
  OS::TripleO::Services::MetricsQdr: /usr/share/openstack-tripleo-heat-templates/deployment/metrics/qdr-container-puppet.yaml
  OS::TripleO::Services::CeilometerAgentCentral: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-central-container-puppet.yaml
  OS::TripleO::Services::CeilometerAgentNotification: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-notification-container-puppet.yaml
  OS::TripleO::Services::CeilometerAgentIpmi: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-ipmi-container-puppet.yaml
  OS::TripleO::Services::ComputeCeilometerAgent: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-compute-container-puppet.yaml
  OS::TripleO::Services::Redis: /usr/share/openstack-tripleo-heat-templates/deployment/database/redis-pacemaker-puppet.yaml

parameter_defaults:
  EnableSTF: true

  EventPipelinePublishers: []
  CeilometerEnablePanko: false
  CeilometerQdrPublishEvents: true
  CeilometerQdrEventsConfig:
    driver: amqp
    topic: cloud1-event 1

  CollectdConnectionType: amqp1
  CollectdAmqpInterval: 5
  CollectdDefaultPollingInterval: 5

  CollectdAmqpInstances:
    cloud1-notify: 2
      notify: true
      format: JSON
      presettle: false
    cloud1-telemetry: 3
      format: JSON
      presettle: true

  MetricsQdrAddresses:
    - prefix: collectd
      distribution: multicast
    - prefix: anycast/ceilometer
      distribution: multicast

  MetricsQdrSSLProfiles:
    - name: sslProfile

  MetricsQdrConnectors:
    - host: stf-default-interconnect-5671-service-telemetry.apps.infra.watch 4
      port: 443
      role: edge
      verifyHostname: false
      sslProfile: sslProfile
1 Define the topic for Ceilometer events. This value is the address format of anycast/ceilometer/cloud1-event.sample.

2 Define the topic for collectd events. This value is the format of collectd/cloud1-notify.

3 Define the topic for collectd metrics. This value is the format of collectd/cloud1-telemetry.

4 Adjust the MetricsQdrConnectors host to the address of the STF route.
2. Ensure that the naming convention in the stf-connectors.yaml file aligns with the spec.amqpUrl field in the Smart Gateway configuration. For example, configure the CeilometerQdrEventsConfig.topic field to a value of cloud1-event.

3. Save the file in a directory for custom environment files, for example /home/stack/custom_templates/.

4. Source the authentication file:

[stack@undercloud-0 ~]$ source stackrc

(undercloud) [stack@undercloud-0 ~]$

5. Include the stf-connectors.yaml file in the overcloud deployment command, along with any other environment files relevant to your environment:

(undercloud) [stack@undercloud-0 ~]$ openstack overcloud deploy --templates /usr/share/openstack-tripleo-heat-templates -e /home/stack/custom_templates/stf-connectors.yaml

Additional resources

For information about validating the deployment, see Section 3.3.3, "Validating client-side installation".
4.5.4. Querying metrics data from multiple clouds

Data stored in Prometheus has a service label attached according to the Smart Gateway it was scraped from. You can use this label to query data from a specific cloud.

To query data from a specific cloud, use a Prometheus PromQL query that matches the associated service label, for example: collectd_uptime{service="stf-default-collectd-telemetry-cloud1-smartgateway"}.
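If you have a pod with access to curl, as in Section 4.2.1, "Creating an alert rule in Prometheus", you can run such a query against the Prometheus HTTP API. This is a sketch only; the prometheus-operated service name follows the earlier sections, and the query string is the example above:

[ root@curl:/ ]$ curl -G 'prometheus-operated:9090/api/v1/query' \
  --data-urlencode 'query=collectd_uptime{service="stf-default-collectd-telemetry-cloud1-smartgateway"}'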
4.6. EPHEMERAL STORAGE

Use ephemeral storage to run Service Telemetry Framework (STF) without persistently storing data in your Red Hat OpenShift Container Platform (OCP) cluster. Ephemeral storage is not recommended in a production environment because the data is volatile even when the platform operates correctly and as designed. For example, restarting a pod or rescheduling the workload to another node results in the loss of any local data written since the pod started.

If you enable ephemeral storage in STF, the Service Telemetry Operator does not add the relevant storage sections to the data storage components' manifests.

4.6.1. Configuring ephemeral storage

To configure STF for ephemeral storage, add storageEphemeralEnabled: true to the ServiceTelemetry object in OCP. You can add storageEphemeralEnabled: true at installation time or, if you already deployed STF, complete the following steps:
Procedure
1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

$ oc project service-telemetry

3. Edit the ServiceTelemetry object:

$ oc edit ServiceTelemetry stf-default

4. Add the storageEphemeralEnabled: true parameter to the spec section:

spec:
  eventsEnabled: true
  metricsEnabled: true
  storageEphemeralEnabled: true

5. Save your changes and close the object.
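Because no persistent storage is requested in this mode, one way to confirm that the setting took effect is to check that no PersistentVolumeClaims exist for the data storage components. This is a sketch, not part of the original procedure; claim names vary by deployment:

$ oc get pvc
No resources found.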
APPENDIX A. COLLECTD PLUG-INS

This section contains a complete list of collectd plug-ins and configurations for Red Hat OpenStack Platform 16.0.

collectd-aggregation
collectd::plugin::aggregation::aggregators
collectd::plugin::aggregation::interval

collectd-amqp1

collectd-apache
collectd::plugin::apache::instances (ex: localhost ⇒ url ⇒ http://localhost/mod_status?auto)
collectd::plugin::apache::interval

collectd-apcups

collectd-battery
collectd::plugin::battery::values_percentage
collectd::plugin::battery::report_degraded
collectd::plugin::battery::query_state_fs
collectd::plugin::battery::interval

collectd-ceph
collectd::plugin::ceph::daemons
collectd::plugin::ceph::longrunavglatency
collectd::plugin::ceph::convertspecialmetrictypes

collectd-cgroups
collectd::plugin::cgroups::ignore_selected
collectd::plugin::cgroups::interval

collectd-conntrack
None

collectd-contextswitch
collectd::plugin::contextswitch::interval

collectd-cpu
collectd::plugin::cpu::reportbystate
collectd::plugin::cpu::reportbycpu
collectd::plugin::cpu::valuespercentage
collectd::plugin::cpu::reportnumcpu
collectd::plugin::cpu::reportgueststate
collectd::plugin::cpu::subtractgueststate
collectd::plugin::cpu::interval

collectd-cpufreq
None

collectd-cpusleep

collectd-csv
collectd::plugin::csv::datadir
collectd::plugin::csv::storerates
collectd::plugin::csv::interval

collectd-df
collectd::plugin::df::devices
collectd::plugin::df::fstypes
collectd::plugin::df::ignoreselected
collectd::plugin::df::mountpoints
collectd::plugin::df::reportbydevice
collectd::plugin::df::reportinodes
collectd::plugin::df::reportreserved
collectd::plugin::df::valuesabsolute
collectd::plugin::df::valuespercentage
collectd::plugin::df::interval

collectd-disk
collectd::plugin::disk::disks
collectd::plugin::disk::ignoreselected
collectd::plugin::disk::udevnameattr
collectd::plugin::disk::interval

collectd-entropy
collectd::plugin::entropy::interval

collectd-ethstat
collectd::plugin::ethstat::interfaces
collectd::plugin::ethstat::maps
collectd::plugin::ethstat::mappedonly
collectd::plugin::ethstat::interval

collectd-exec
collectd::plugin::exec::commands
collectd::plugin::exec::commands_defaults
collectd::plugin::exec::globals
collectd::plugin::exec::interval

collectd-fhcount
collectd::plugin::fhcount::valuesabsolute
collectd::plugin::fhcount::valuespercentage
collectd::plugin::fhcount::interval

collectd-filecount
collectd::plugin::filecount::directories
collectd::plugin::filecount::interval

collectd-fscache
None

collectd-hddtemp
collectd::plugin::hddtemp::host
collectd::plugin::hddtemp::port
collectd::plugin::hddtemp::interval

collectd-hugepages
collectd::plugin::hugepages::report_per_node_hp
collectd::plugin::hugepages::report_root_hp
collectd::plugin::hugepages::values_pages
collectd::plugin::hugepages::values_bytes
collectd::plugin::hugepages::values_percentage
collectd::plugin::hugepages::interval

collectd-intel_rdt

collectd-interface
collectd::plugin::interface::interfaces
collectd::plugin::interface::ignoreselected
collectd::plugin::interface::reportinactive
collectd::plugin::interface::interval

collectd-ipc
None

collectd-ipmi
collectd::plugin::ipmi::ignore_selected
collectd::plugin::ipmi::notify_sensor_add
collectd::plugin::ipmi::notify_sensor_remove
collectd::plugin::ipmi::notify_sensor_not_present
collectd::plugin::ipmi::sensors
collectd::plugin::ipmi::interval

collectd-irq
collectd::plugin::irq::irqs
collectd::plugin::irq::ignoreselected
collectd::plugin::irq::interval

collectd-load
collectd::plugin::load::report_relative
collectd::plugin::load::interval

collectd-logfile
collectd::plugin::logfile::log_level
collectd::plugin::logfile::log_file
collectd::plugin::logfile::log_timestamp
collectd::plugin::logfile::print_severity
collectd::plugin::logfile::interval

collectd-madwifi

collectd-mbmon

collectd-md

collectd-memcached
collectd::plugin::memcached::instances
collectd::plugin::memcached::interval

collectd-memory
collectd::plugin::memory::valuesabsolute
collectd::plugin::memory::valuespercentage
collectd::plugin::memory::interval

collectd-multimeter

collectd-mysql
collectd::plugin::mysql::interval

collectd-netlink
collectd::plugin::netlink::interfaces
collectd::plugin::netlink::verboseinterfaces
collectd::plugin::netlink::qdiscs
collectd::plugin::netlink::classes
collectd::plugin::netlink::filters
collectd::plugin::netlink::ignoreselected
collectd::plugin::netlink::interval

collectd-network
collectd::plugin::network::timetolive
collectd::plugin::network::maxpacketsize
collectd::plugin::network::forward
collectd::plugin::network::reportstats
collectd::plugin::network::listeners
collectd::plugin::network::servers
collectd::plugin::network::interval

collectd-nfs
collectd::plugin::nfs::interval

collectd-ntpd
collectd::plugin::ntpd::host
collectd::plugin::ntpd::port
collectd::plugin::ntpd::reverselookups
collectd::plugin::ntpd::includeunitid
collectd::plugin::ntpd::interval

collectd-numa
None

collectd-olsrd

collectd-openvpn
collectd::plugin::openvpn::statusfile
collectd::plugin::openvpn::improvednamingschema
collectd::plugin::openvpn::collectcompression
collectd::plugin::openvpn::collectindividualusers
collectd::plugin::openvpn::collectusercount
collectd::plugin::openvpn::interval

collectd-ovs_events
collectd::plugin::ovs_events::address
collectd::plugin::ovs_events::dispatch
collectd::plugin::ovs_events::interfaces
collectd::plugin::ovs_events::send_notification
collectd::plugin::ovs_events::$port
collectd::plugin::ovs_events::socket

collectd-ovs_stats
collectd::plugin::ovs_stats::address
collectd::plugin::ovs_stats::bridges
collectd::plugin::ovs_stats::port
collectd::plugin::ovs_stats::socket

collectd-ping
collectd::plugin::ping::hosts
collectd::plugin::ping::timeout
collectd::plugin::ping::ttl
collectd::plugin::ping::source_address
collectd::plugin::ping::device
collectd::plugin::ping::max_missed
collectd::plugin::ping::size
collectd::plugin::ping::interval

collectd-powerdns
collectd::plugin::powerdns::interval
collectd::plugin::powerdns::servers
collectd::plugin::powerdns::recursors
collectd::plugin::powerdns::local_socket

collectd-processes
collectd::plugin::processes::processes
collectd::plugin::processes::process_matches
collectd::plugin::processes::collect_context_switch
collectd::plugin::processes::collect_file_descriptor
collectd::plugin::processes::collect_memory_maps

collectd-protocols
collectd::plugin::protocols::ignoreselected
collectd::plugin::protocols::values

collectd-python

collectd-serial

collectd-smart
collectd::plugin::smart::disks
collectd::plugin::smart::ignoreselected
collectd::plugin::smart::interval

collectd-snmp_agent

collectd-statsd
collectd::plugin::statsd::host
collectd::plugin::statsd::port
collectd::plugin::statsd::deletecounters
collectd::plugin::statsd::deletetimers
collectd::plugin::statsd::deletegauges
collectd::plugin::statsd::deletesets
collectd::plugin::statsd::countersum
collectd::plugin::statsd::timerpercentile
collectd::plugin::statsd::timerlower
collectd::plugin::statsd::timerupper
collectd::plugin::statsd::timersum
collectd::plugin::statsd::timercount
collectd::plugin::statsd::interval

collectd-swap
collectd::plugin::swap::reportbydevice
collectd::plugin::swap::reportbytes
collectd::plugin::swap::valuesabsolute
collectd::plugin::swap::valuespercentage
collectd::plugin::swap::reportio
collectd::plugin::swap::interval

collectd-syslog
collectd::plugin::syslog::log_level
collectd::plugin::syslog::notify_level
collectd::plugin::syslog::interval

collectd-table
collectd::plugin::table::tables
collectd::plugin::table::interval

collectd-tail
collectd::plugin::tail::files
collectd::plugin::tail::interval

collectd-tail_csv
collectd::plugin::tail_csv::metrics
collectd::plugin::tail_csv::files

collectd-tcpconns
collectd::plugin::tcpconns::localports
collectd::plugin::tcpconns::remoteports
collectd::plugin::tcpconns::listening
collectd::plugin::tcpconns::allportssummary
collectd::plugin::tcpconns::interval

collectd-ted

collectd-thermal
collectd::plugin::thermal::devices
collectd::plugin::thermal::ignoreselected
collectd::plugin::thermal::interval

collectd-threshold
collectd::plugin::threshold::types
collectd::plugin::threshold::plugins
collectd::plugin::threshold::hosts
collectd::plugin::threshold::interval

collectd-turbostat
collectd::plugin::turbostat::core_c_states
collectd::plugin::turbostat::package_c_states
collectd::plugin::turbostat::system_management_interrupt
collectd::plugin::turbostat::digital_temperature_sensor
collectd::plugin::turbostat::tcc_activation_temp
collectd::plugin::turbostat::running_average_power_limit
collectd::plugin::turbostat::logical_core_names

collectd-unixsock

collectd-uptime
collectd::plugin::uptime::interval

collectd-users
collectd::plugin::users::interval

collectd-uuid
collectd::plugin::uuid::uuid_file
collectd::plugin::uuid::interval

collectd-virt
collectd::plugin::virt::connection
collectd::plugin::virt::refresh_interval
collectd::plugin::virt::domain
collectd::plugin::virt::block_device
collectd::plugin::virt::interface_device
collectd::plugin::virt::ignore_selected
collectd::plugin::virt::hostname_format
collectd::plugin::virt::interface_format
collectd::plugin::virt::extra_stats
collectd::plugin::virt::interval

collectd-vmem
collectd::plugin::vmem::verbose
collectd::plugin::vmem::interval

collectd-vserver

collectd-wireless

collectd-write_graphite
collectd::plugin::write_graphite::carbons
collectd::plugin::write_graphite::carbon_defaults
collectd::plugin::write_graphite::globals

collectd-write_kafka
collectd::plugin::write_kafka::kafka_host
collectd::plugin::write_kafka::kafka_port
collectd::plugin::write_kafka::kafka_hosts
collectd::plugin::write_kafka::topics

collectd-write_log
collectd::plugin::write_log::format

collectd-zfs_arc
None
2. To validate the creation of your OperatorSource, use the oc get operatorsources command. A successful import results in the MESSAGE field returning a result of The object has been successfully reconciled:

$ oc get -nopenshift-marketplace operatorsource redhat-operators-stf

NAME                   TYPE          ENDPOINT              REGISTRY               DISPLAYNAME             PUBLISHER   STATUS      MESSAGE
redhat-operators-stf   appregistry   https://quay.io/cnr   redhat-operators-stf   Red Hat STF Operators   Red Hat     Succeeded   The object has been successfully reconciled

3. To validate that the Operators are available from the catalog, use the oc get packagemanifests command:

$ oc get packagemanifests | grep "Red Hat STF"

smartgateway-operator       Red Hat STF Operators   2m50s
servicetelemetry-operator   Red Hat STF Operators   2m50s

2.3.7. Subscribing to the AMQ Certificate Manager Operator

You must subscribe to the AMQ Certificate Manager Operator before you deploy the other STF components, because the AMQ Certificate Manager Operator is globally scoped and is not compatible with the dependency management of Operator Lifecycle Manager when used with other namespace-scoped operators.
Procedure
1. Subscribe to the AMQ Certificate Manager Operator, create the subscription, and validate the AMQ7 Certificate Manager:

$ oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: amq7-cert-manager
  namespace: openshift-operators
spec:
  channel: alpha
  installPlanApproval: Automatic
  name: amq7-cert-manager
  source: redhat-operators
  sourceNamespace: openshift-marketplace
EOF
NOTE
The AMQ Certificate Manager is installed globally for all namespaces, so the namespace value provided is openshift-operators. You might not see your amq7-cert-manager.v1.0.0 ClusterServiceVersion in the service-telemetry namespace for a few minutes until the processing executes against the namespace.
2. To validate your ClusterServiceVersion, use the oc get csv command. Ensure that amq7-cert-manager.v1.0.0 has a phase of Succeeded:

$ oc get --namespace openshift-operators csv

NAME                       DISPLAY                                         VERSION   REPLACES   PHASE
amq7-cert-manager.v1.0.0   Red Hat Integration - AMQ Certificate Manager   1.0.0                Succeeded

2.3.8. Subscribing to the Elastic Cloud on Kubernetes Operator

Before you install the Service Telemetry Operator, and if you plan to store events in ElasticSearch, you must enable the Elastic Cloud on Kubernetes Operator.
Procedure
1. Apply the following manifest to your OCP environment to enable the Elastic Cloud on Kubernetes Operator:

$ oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: elastic-cloud-eck
  namespace: service-telemetry
spec:
  channel: stable
  installPlanApproval: Automatic
  name: elastic-cloud-eck
  source: operatorhubio-operators
  sourceNamespace: openshift-marketplace
EOF

2. To verify that the ClusterServiceVersion for Elastic Cloud on Kubernetes succeeded, enter the oc get csv command:

$ oc get csv

NAME                       DISPLAY                       VERSION   REPLACES                   PHASE
elastic-cloud-eck.v1.1.0   Elastic Cloud on Kubernetes   1.1.0     elastic-cloud-eck.v1.0.1   Succeeded

2.3.9. Subscribing to the Service Telemetry Operator

To instantiate an STF instance, create the ServiceTelemetry object to allow the Service Telemetry Operator to create the environment.
Procedure
1. To create the Service Telemetry Operator subscription, enter the oc apply -f command:

$ oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: servicetelemetry-operator
  namespace: service-telemetry
spec:
  channel: stable
  installPlanApproval: Automatic
  name: servicetelemetry-operator
  source: redhat-operators-stf
  sourceNamespace: openshift-marketplace
EOF
2. To validate the Service Telemetry Operator and the dependent operators, enter the following command:

$ oc get csv --namespace service-telemetry

NAME                                DISPLAY                                         VERSION   REPLACES                            PHASE
amq7-cert-manager.v1.0.0            Red Hat Integration - AMQ Certificate Manager   1.0.0                                         Succeeded
amq7-interconnect-operator.v1.2.0   Red Hat Integration - AMQ Interconnect          1.2.0                                         Succeeded
elastic-cloud-eck.v1.1.0            Elastic Cloud on Kubernetes                     1.1.0     elastic-cloud-eck.v1.0.1            Succeeded
prometheusoperator.0.37.0           Prometheus Operator                             0.37.0    prometheusoperator.0.32.0           Succeeded
service-telemetry-operator.v1.0.2   Service Telemetry Operator                      1.0.2     service-telemetry-operator.v1.0.1   Succeeded
smart-gateway-operator.v1.0.1       Smart Gateway Operator                          1.0.1     smart-gateway-operator.v1.0.0       Succeeded

2.3.10. Creating a ServiceTelemetry object in OCP

To deploy the Service Telemetry Framework, you must create an instance of ServiceTelemetry in OCP. By default, eventsEnabled is set to false. If you do not want to store events in ElasticSearch, ensure that eventsEnabled is set to false. For more information, see Section 2.3.2, "Deploying STF to the OCP environment without ElasticSearch".

The following core parameters are available for a ServiceTelemetry manifest:

Table 2.1. Core parameters for a ServiceTelemetry manifest

Parameter | Description | Default value
eventsEnabled | Enable events support in STF. Requires prerequisite steps to ensure ElasticSearch can be started. For more information, see Section 2.3.8, "Subscribing to the Elastic Cloud on Kubernetes Operator". | false
metricsEnabled | Enable metrics support in STF. | true
highAvailabilityEnabled | Enable high availability in STF. For more information, see Section 4.3, "High availability". | false
storageEphemeralEnabled | Enable ephemeral storage support in STF. For more information, see Section 4.6, "Ephemeral storage". | false
Procedure
1. To store events in ElasticSearch, set eventsEnabled to true during deployment:

$ oc apply -f - <<EOF
apiVersion: infra.watch/v1alpha1
kind: ServiceTelemetry
metadata:
  name: stf-default
  namespace: service-telemetry
spec:
  eventsEnabled: true
  metricsEnabled: true
EOF

2. To view the STF deployment logs in the Service Telemetry Operator, use the oc logs command:

$ oc logs $(oc get pod --selector=name=service-telemetry-operator -oname) -c ansible

PLAY RECAP ***
localhost : ok=37 changed=0 unreachable=0 failed=0 skipped=1 rescued=0 ignored=0

3. View the pods and the status of each pod to determine that all workloads are operating nominally:

$ oc get pods

NAME                                                              READY   STATUS    RESTARTS   AGE
alertmanager-stf-default-0                                        2/2     Running   0          26m
elastic-operator-645dc8b8ff-jwnzt                                 1/1     Running   0          88m
elasticsearch-es-default-0                                        1/1     Running   0          26m
interconnect-operator-6fd49d9fb9-4bl92                            1/1     Running   0          46m
prometheus-operator-bf7d97fb9-kwnlx                               1/1     Running   0          46m
prometheus-stf-default-0                                          3/3     Running   0          26m
service-telemetry-operator-54f4c99d9b-k7ll6                       2/2     Running   0          46m
smart-gateway-operator-7ff58bcf94-66rvx                           2/2     Running   0          46m
stf-default-ceilometer-notification-smartgateway-6675df547q4lbj   1/1     Running   0          26m
stf-default-collectd-notification-smartgateway-698c87fbb7-xj528   1/1     Running   0          26m
stf-default-collectd-telemetry-smartgateway-79c967c8f7-9hsqn      1/1     Running   0          26m
stf-default-interconnect-7458fd4d69-nqbfs                         1/1     Running   0          26m

NOTE

If you set eventsEnabled: true, the notification Smart Gateways will Error and CrashLoopBackOff for a period of time before ElasticSearch starts.
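The deployment can take a few minutes to converge. A convenient way to follow progress is to watch the pod list until everything reports Running; this is a sketch, not part of the original procedure:

$ oc get pods -w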
2.4. REMOVING STF FROM THE OCP ENVIRONMENT

Remove STF from an OCP environment if you no longer require the STF functionality.

Complete the following tasks:

1. Section 2.4.1, "Deleting the namespace"

2. Section 2.4.2, "Removing the OperatorSource"

2.4.1. Deleting the namespace

To remove the operational resources for STF from OCP, delete the namespace.
Procedure
1. Run the oc delete command:

$ oc delete project service-telemetry

2. Verify that the resources have been deleted from the namespace:

$ oc get all
No resources found.
2.4.2. Removing the OperatorSource

If you do not expect to install Service Telemetry Framework again, delete the OperatorSource. When you remove the OperatorSource, PackageManifests related to STF are removed from the Operator Lifecycle Manager catalog.
Procedure
1. Delete the OperatorSource:

$ oc delete --namespace=openshift-marketplace operatorsource redhat-operators-stf
operatorsource.operators.coreos.com "redhat-operators-stf" deleted
2. Verify that the STF PackageManifests are removed from the platform. If successful, the following command returns no result:

$ oc get packagemanifests | grep "Red Hat STF"

3. If you enabled the OperatorHub.io Community Catalog Source during the installation process, and you no longer need this catalog source, delete it:

$ oc delete --namespace=openshift-marketplace catalogsource operatorhubio-operators
catalogsource.operators.coreos.com "operatorhubio-operators" deleted
Additional resources
For more information about the OperatorHub.io Community Catalog Source, see Section 2.3, "Deploying STF to the OCP environment".
CHAPTER 3. COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION

3.1. CONNECTING RED HAT OPENSTACK PLATFORM TO SERVICE TELEMETRY FRAMEWORK

To collect metrics, events, or both, and to send them to the Service Telemetry Framework (STF) storage domain, you must configure the Red Hat OpenStack Platform overcloud to enable data collection and transport.

To deploy data collection and transport to STF on Red Hat OpenStack Platform cloud nodes that employ routed L3 domains, such as distributed compute node (DCN) or spine-leaf, see Section 3.2, "Deploying to non-standard network topologies".

3.2. DEPLOYING TO NON-STANDARD NETWORK TOPOLOGIES

If your nodes are on a separate network from the default InternalApi network, you must make configuration adjustments so that AMQ Interconnect can transport data to the Service Telemetry Framework (STF) server instance. This scenario is typical in a spine-leaf or a DCN topology. For more information about DCN configuration, see the Spine Leaf Networking guide.

If you use STF with Red Hat OpenStack Platform 16.0 and plan to monitor your Ceph, Block, or Object storage nodes, you must make configuration changes that are similar to the configuration changes that you make to the spine-leaf and DCN network configuration. To monitor Ceph nodes, use the CephStorageExtraConfig parameter to define which network interface to load into the AMQ Interconnect and collectd configuration files:

CephStorageExtraConfig:
  tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('storage')}"
  tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('storage')}"
  tripleo::profile::base::ceilometer::agent::notification::notifier_host_addr: "%{hiera('storage')}"

Similarly, you must specify BlockStorageExtraConfig and ObjectStorageExtraConfig parameters if your environment uses Block and Object storage roles.

The deployment of a spine-leaf topology involves creating roles and networks, then assigning those networks to the available roles. When you configure data collection and transport for STF for a Red Hat OpenStack Platform deployment, the default network for roles is InternalApi. For Ceph, Block, and Object storage roles, the default network is Storage. Because a spine-leaf configuration can result in different networks being assigned to different Leaf groupings, and those names are typically unique, additional configuration is required in the parameter_defaults section of the Red Hat OpenStack Platform environment files.
Procedure
1. Document which networks are available for each of the Leaf roles. For examples of network name definitions, see Creating a network data file in the Spine Leaf Networking guide. For more information about the creation of the Leaf groupings (roles) and assignment of the networks to those groupings, see Creating a roles data file in the Spine Leaf Networking guide.
2. Add the following configuration example to the ExtraConfig section for each of the leaf roles. In this example, internal_api_subnet is the value defined in the name_lower parameter of your network definition (with _subnet appended to the name for Leaf 0), and is the network to which the ComputeLeaf0 leaf role is connected. In this case, the network identification of 0 corresponds to the Compute role for leaf 0, and represents a value that is different from the default internal API network name.

For the ComputeLeaf0 leaf role, specify extra configuration to perform a hiera lookup to determine which network interface for a particular network to assign to the collectd AMQP host parameter. Perform the same configuration for the AMQ Interconnect listener address parameter:

ComputeLeaf0ExtraConfig:
  tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('internal_api_subnet')}"
  tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('internal_api_subnet')}"

Additional leaf roles typically replace _subnet with _leafN, where N represents a unique identifier for the leaf:

ComputeLeaf1ExtraConfig:
  tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('internal_api_leaf1')}"
  tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('internal_api_leaf1')}"

This example configuration is on a CephStorage leaf role:

CephStorageLeaf0ExtraConfig:
  tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('storage_subnet')}"
  tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('storage_subnet')}"
3.3. CONFIGURING RED HAT OPENSTACK PLATFORM OVERCLOUD FOR SERVICE TELEMETRY FRAMEWORK

To configure the Red Hat OpenStack Platform overcloud, you must configure the data collection applications and the data transport to STF, and deploy the overcloud.

To configure the Red Hat OpenStack Platform overcloud, complete the following tasks:

1. Section 3.3.1, "Retrieving the AMQ Interconnect route address"

2. Section 3.3.2, "Configuring the STF connection for the overcloud"

3. Section 3.3.3, "Validating client-side installation"

3.3.1. Retrieving the AMQ Interconnect route address

When you configure the Red Hat OpenStack Platform overcloud for STF, you must provide the AMQ Interconnect route address in the STF connection file.
Procedure
1. Log in to your Red Hat OpenShift Container Platform (OCP) environment.

2. In the service-telemetry project, retrieve the AMQ Interconnect route address:

$ oc get routes -ogo-template='{{ range .items }}{{printf "%s\n" .spec.host }}{{ end }}' | grep "\-5671"
stf-default-interconnect-5671-service-telemetry.apps.infra.watch
NOTE
If your STF installation differs from the documentation, ensure that you retrieve the correct AMQ Interconnect route address.
3.3.2. Configuring the STF connection for the overcloud

To configure the STF connection, you must create a file that contains the connection configuration of the AMQ Interconnect for the overcloud to the STF deployment. Enable the collection of events and storage of the events in STF, and deploy the overcloud.
Procedure
1. Log in to the Red Hat OpenStack Platform undercloud as the stack user.

2. Create a configuration file called stf-connectors.yaml in the /home/stack directory.
IMPORTANT
The Service Telemetry Operator simplifies the deployment of all data ingestion and data storage components for single cloud deployments. To share the data storage domain with multiple clouds, see Section 4.5, "Configuring multiple clouds".

3. In the stf-connectors.yaml file, configure the MetricsQdrConnectors address to connect the AMQ Interconnect on the overcloud to the STF deployment:

parameter_defaults:
  EventPipelinePublishers: []
  CeilometerQdrPublishEvents: true
  MetricsQdrConnectors:
    - host: stf-default-interconnect-5671-service-telemetry.apps.infra.watch
      port: 443
      role: edge
      sslProfile: sslProfile
      verifyHostname: false

Add the CeilometerQdrPublishEvents: true parameter to enable collection and transport of Ceilometer events to STF.

Replace the host parameter with the value of HOST/PORT that you retrieved in Section 3.3.1, "Retrieving the AMQ Interconnect route address".
4. Add the following files to your Red Hat OpenStack Platform director deployment to set up collectd and AMQ Interconnect:

the stf-connectors.yaml environment file

the enable-stf.yaml file that ensures that the environment is being used during the overcloud deployment
the ceilometer-write-qdr.yaml file that ensures that Ceilometer telemetry is sent to STF

5. Deploy the Red Hat OpenStack Platform overcloud:

$ openstack overcloud deploy <other arguments> \
  --templates /usr/share/openstack-tripleo-heat-templates \
  --environment-file <other-environment-files> \
  --environment-file /usr/share/openstack-tripleo-heat-templates/environments/metrics/ceilometer-write-qdr.yaml \
  --environment-file /usr/share/openstack-tripleo-heat-templates/environments/enable-stf.yaml \
  --environment-file /home/stack/stf-connectors.yaml
3.3.3. Validating client-side installation

To validate data collection from the STF storage domain, query the data sources for delivered data. To validate individual nodes in the Red Hat OpenStack Platform deployment, connect to the console using SSH.
Procedure
1. Log in to an overcloud node, for example, controller-0.

2. Ensure that the metrics_qdr container is running on the node:

$ sudo podman container inspect --format '{{.State.Status}}' metrics_qdr
running

3. Return the internal network address on which AMQ Interconnect is running, for example, 172.17.1.44 listening on port 5666:

$ sudo podman exec -it metrics_qdr cat /etc/qpid-dispatch/qdrouterd.conf

listener {
    host: 172.17.1.44
    port: 5666
    authenticatePeer: no
    saslMechanisms: ANONYMOUS
}

4. Return a list of connections to the local AMQ Interconnect:

$ sudo podman exec -it metrics_qdr qdstat --bus=172.17.1.44:5666 --connections

Connections
  id   host                                                                   container                                                                                               role    dir  security                             authentication  tenant
  ================================================================================================================================================================================================================================================================
  1    stf-default-interconnect-5671-service-telemetry.apps.infra.watch:443   stf-default-interconnect-7458fd4d69-bgzfb                                                               edge    out  TLSv1.2(DHE-RSA-AES256-GCM-SHA384)   anonymous-user
  12   172.17.1.44:60290                                                      openstack.org/om/container/controller-0/ceilometer-agent-notification/25/5c02cee550f143ec9ea030db5cccba14   normal  in   no-security                          no-auth
  16   172.17.1.44:36408                                                      metrics                                                                                                 normal  in   no-security                          anonymous-user
  899  172.17.1.44:39500                                                      10a2e99d-1b8a-4329-b48c-4335e5f75c84                                                                    normal  in   no-security                          no-auth
There are four connections:

Outbound connection to STF

Inbound connection from collectd

Inbound connection from ceilometer

Inbound connection from our qdstat client

The outbound STF connection is provided to the MetricsQdrConnectors host parameter and is the route for the STF storage domain. The other hosts are internal network addresses of the client connections to this AMQ Interconnect.
5. To ensure that messages are being delivered, list the links, and view the _edge address in the deliv column for delivery of messages:

$ sudo podman exec -it metrics_qdr qdstat --bus=172.17.1.44:5666 --links

Router Links
  type      dir  conn id  id    peer  class   addr                  phs  cap  pri  undel  unsett  deliv    presett  psdrop  acc  rej  rel  mod  delay  rate
  ===========================================================================================================================================================
  endpoint  out  1        5           local   _edge                      250  0    0      0       2979926  2979924  0       0    0    2    0    0      0
  endpoint  in   1        6                                              250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  in   1        7                                              250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  out  1        8                                              250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  in   1        9                                              250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  out  1        10                                             250  0    0      0       911      911      0       0    0    0    0    911    0
  endpoint  in   1        11                                             250  0    0      0       0        911      0       0    0    0    0    0      0
  endpoint  out  12       32          local   temp.lSY6Mcicol4J2Kp       250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  in   16       41                                             250  0    0      0       2979924  2979924  0       0    0    0    0    0      0
  endpoint  in   912      1834        mobile  $management           0    250  0    0      0       1        0        0       1    0    0    0    0      0
  endpoint  out  912      1835        local   temp.9Ok2resI9tmt+CT       250  0    0      0       0        0        0       0    0    0    0    0      0
6. To list the addresses from Red Hat OpenStack Platform nodes to STF, connect to OCP to get the AMQ Interconnect pod name and list the connections. List the available AMQ Interconnect pods:

$ oc get pods -l application=stf-default-interconnect

NAME                                        READY   STATUS    RESTARTS   AGE
stf-default-interconnect-7458fd4d69-bgzfb   1/1     Running   0          6d21h
7. Connect to the pod and run the qdstat --connections command to list the known connections:

$ oc exec -it stf-default-interconnect-7458fd4d69-bgzfb -- qdstat --connections

2020-04-21 18:25:47.243852 UTC
stf-default-interconnect-7458fd4d69-bgzfb

Connections
  id  host                container                                                               role    dir  security                                 authentication  tenant  last dlv      uptime
  ===================================================================================================================================================================================================
  1   10.129.2.21:43062   rcv[stf-default-collectd-telemetry-smartgateway-79c967c8f7-kq4qv]       normal  in   no-security                              anonymous-user          000:00:00:00  006:21:50:25
  2   10.130.0.52:55754   rcv[stf-default-ceilometer-notification-smartgateway-6675df547mbjk5]    normal  in   no-security                              anonymous-user          000:21:25:57  006:21:49:36
  3   10.130.0.51:43110   rcv[stf-default-collectd-notification-smartgateway-698c87fbb7-f28v6]    normal  in   no-security                              anonymous-user          000:21:36:53  006:21:49:09
  22  10.128.0.1:51948    Router.ceph-0.redhat.local                                              edge    in   TLSv1/SSLv3(DHE-RSA-AES256-GCM-SHA384)   anonymous-user          000:00:00:03  000:22:08:43
  23  10.128.0.1:51950    Router.compute-0.redhat.local                                           edge    in   TLSv1/SSLv3(DHE-RSA-AES256-GCM-SHA384)   anonymous-user          000:00:00:03  000:22:08:43
  24  10.128.0.1:52082    Router.controller-0.redhat.local                                        edge    in   TLSv1/SSLv3(DHE-RSA-AES256-GCM-SHA384)   anonymous-user          000:00:00:00  000:22:08:34
  27  127.0.0.1:42202     c2f541c1-4c97-4b37-a189-a396c08fb079                                    normal  in   no-security                              no-auth                 000:00:00:00  000:00:00:00
In this example, there are three edge connections from the Red Hat OpenStack Platform nodes, with connection id 22, 23, and 24.
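To count only the edge connections from cloud nodes, you can filter the qdstat output. This is a sketch, not part of the original procedure, reusing the pod name from the previous step:

$ oc exec -it stf-default-interconnect-7458fd4d69-bgzfb -- qdstat --connections | grep edge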
8. To view the number of messages delivered by the network, use each address with the oc exec command:

$ oc exec -it stf-default-interconnect-7458fd4d69-bgzfb -- qdstat --address

2020-04-21 18:20:10.293258 UTC
stf-default-interconnect-7458fd4d69-bgzfb

Router Addresses
  class   addr                              phs  distrib    pri  local  remote  in       out      thru  fallback
  ================================================================================================================
  mobile  anycast/ceilometer/event.sample   0    balanced   -    1      0       1553     1553     0     0
  mobile  collectd/notify                   0    multicast  -    1      0       10       10       0     0
  mobile  collectd/telemetry                0    multicast  -    1      0       7798049  7798049  0     0
CHAPTER 4. ADVANCED FEATURES

The following optional features can provide additional functionality to the Service Telemetry Framework (STF):

Customizing the deployment. For more information, see Section 4.1, "Customizing the deployment".

Alerts. For more information, see Section 4.2, "Alerts".

High availability. For more information, see Section 4.3, "High availability".

Dashboards. For more information, see Section 4.4, "Dashboards".

Multiple clouds. For more information, see Section 4.5, "Configuring multiple clouds".

Ephemeral storage. For more information, see Section 4.6, "Ephemeral storage".

4.1. CUSTOMIZING THE DEPLOYMENT

The Service Telemetry Operator watches for a ServiceTelemetry manifest to load into Red Hat OpenShift Container Platform (OCP). The Operator then creates other objects in memory, which results in the dependent Operators creating the workloads they are responsible for managing.
WARNING
When you override the manifest, you must provide the entire manifest contents, including object names or namespaces. There is no dynamic parameter substitution when you override a manifest.

To override a manifest successfully with Service Telemetry Framework (STF), deploy a default environment using the core options only. For more information about the core options, see Section 2.3.10, "Creating a ServiceTelemetry object in OCP". When you deploy STF, use the oc get command to retrieve the default deployed manifest. When you use a manifest that was originally generated by Service Telemetry Operator, the manifest is compatible with the other objects that are managed by the Operators.

For example, when the metricsEnabled: true parameter is configured in the ServiceTelemetry manifest, the Service Telemetry Operator requests components for metrics retrieval and storage using the default manifests. In some cases, you might want to override the default manifest. For more information, see Section 4.1.1, "Manifest override parameters".

4.1.1. Manifest override parameters

This table describes the available parameters that you can use to override a manifest, along with the corresponding retrieval commands.

Table 4.1. Manifest override parameters
Override parameter | Description | Retrieval command
alertmanagerManifest | Override the Alertmanager object creation. The Prometheus Operator watches for Alertmanager objects. | oc get alertmanager stf-default -oyaml
alertmanagerConfigManifest | Override the Secret that contains the Alertmanager configuration. The Prometheus Operator uses a secret named alertmanager-<alertmanager-name>, for example, stf-default, to provide the alertmanager.yaml configuration to Alertmanager. | oc get secret alertmanager-stf-default -oyaml
elasticsearchManifest | Override the ElasticSearch object creation. The Elastic Cloud on Kubernetes Operator watches for ElasticSearch objects. | oc get elasticsearch elasticsearch -oyaml
interconnectManifest | Override the Interconnect object creation. The AMQ Interconnect Operator watches for Interconnect objects. | oc get interconnect stf-default-interconnect -oyaml
prometheusManifest | Override the Prometheus object creation. The Prometheus Operator watches for Prometheus objects. | oc get prometheus stf-default -oyaml
servicemonitorManifest | Override the ServiceMonitor object creation. The Prometheus Operator watches for ServiceMonitor objects. | oc get servicemonitor stf-default -oyaml
smartgatewayCollectdMetricsManifest | Override the SmartGateway object creation for collectd metrics. The Smart Gateway Operator watches for SmartGateway objects. | oc get smartgateway stf-default-collectd-telemetry -oyaml
smartgatewayCollectdEventsManifest | Override the SmartGateway object creation for collectd events. The Smart Gateway Operator watches for SmartGateway objects. | oc get smartgateway stf-default-collectd-notification -oyaml
smartgatewayCeilometerEventsManifest | Override the SmartGateway object creation for Ceilometer events. The Smart Gateway Operator watches for SmartGateway objects. | oc get smartgateway stf-default-ceilometer-notification -oyaml
4.1.2. Overriding a managed manifest

Edit the ServiceTelemetry object and provide a parameter and manifest. For a list of available manifest override parameters, see Section 4.1, "Customizing the deployment". The default ServiceTelemetry object is stf-default. Use oc get servicetelemetry to list the available STF deployments.
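For example, listing the deployments might return output like the following; the AGE value shown is illustrative:

$ oc get servicetelemetry
NAME          AGE
stf-default   14d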
TIP
The oc edit command loads the default system editor. To override the default editor, pass or set the environment variable EDITOR to the preferred editor, for example, EDITOR=nano oc edit servicetelemetry stf-default.
Procedure
1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

$ oc project service-telemetry

3. Load the ServiceTelemetry object into an editor:

$ oc edit servicetelemetry stf-default

4. To modify the ServiceTelemetry object, provide a manifest override parameter and the contents of the manifest to write to OCP instead of the defaults provided by STF.

NOTE

The trailing pipe (|) after entering the manifest override parameter indicates that the value provided is multi-line.
apiVersion: infra.watch/v1alpha1
kind: ServiceTelemetry
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"infra.watch/v1alpha1","kind":"ServiceTelemetry","metadata":{"annotations":{},"name":"stf-default","namespace":"service-telemetry"},"spec":{"metricsEnabled":true}}
  creationTimestamp: "2020-04-14T20:29:42Z"
  generation: 1
  name: stf-default
  namespace: service-telemetry
  resourceVersion: "1949423"
  selfLink: /apis/infra.watch/v1alpha1/namespaces/service-telemetry/servicetelemetrys/stf-default
  uid: d058bc41-1bb0-49f5-9a8b-642f4b8adb95
spec:
  metricsEnabled: true
  smartgatewayCollectdMetricsManifest: | 1
    apiVersion: smartgateway.infra.watch/v2alpha1
    kind: SmartGateway
    metadata:
      name: stf-default-collectd-telemetry
      namespace: service-telemetry
    spec:
      amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/telemetry
      debug: true
      prefetch: 15000
      serviceType: metrics
      size: 1
      useTimestamp: true 2
status:
  conditions:
  - ansibleResult:
      changed: 0
      completion: 2020-04-14T20:32:19.079508
      failures: 0
      ok: 52
      skipped: 1
    lastTransitionTime: "2020-04-14T20:29:59Z"
    message: Awaiting next reconciliation
    reason: Successful
    status: "True"
    type: Running
1 The manifest override parameter is defined in the spec of the ServiceTelemetry object.

2 The end of the manifest override content.

5. Save and close.
4.2. ALERTS

You create alert rules in Prometheus and alert routes in Alertmanager. Alert rules in Prometheus servers send alerts to an Alertmanager, which manages the alerts. Alertmanager can silence, inhibit, or aggregate alerts, and send notifications using email, on-call notification systems, or chat platforms.

To create an alert, complete the following tasks:

1. Create an alert rule in Prometheus. For more information, see Section 4.2.1, "Creating an alert rule in Prometheus".
2. Create an alert route in Alertmanager. For more information, see Section 4.2.3, "Creating an alert route in Alertmanager".

Additional resources

For more information about alerts or notifications with Prometheus and Alertmanager, see https://prometheus.io/docs/alerting/overview/

To view an example set of alerts that you can use with Service Telemetry Framework (STF), see https://github.com/infrawatch/service-telemetry-operator/tree/master/deploy/alerts

4.2.1. Creating an alert rule in Prometheus

Prometheus evaluates alert rules to trigger notifications. If the rule condition returns an empty result set, the condition is false. Otherwise, the rule is true and it triggers an alert.
Procedure
1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

$ oc project service-telemetry

3. Create a PrometheusRule object that contains the alert rule. The Prometheus Operator loads the rule into Prometheus:

$ oc apply -f - <<EOF
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  creationTimestamp: null
  labels:
    prometheus: stf-default
    role: alert-rules
  name: prometheus-alarm-rules
  namespace: service-telemetry
spec:
  groups:
    - name: ./openstack.rules
      rules:
        - alert: Metric Listener down
          expr: collectd_qpid_router_status < 1 # To change the rule, edit the value of the expr parameter.
EOF

4. To verify that the rules have been loaded into Prometheus by the Operator, create a pod with access to curl:

$ oc run curl --generator=run-pod/v1 --image=radial/busyboxplus:curl -i --tty

5. Run curl to access the prometheus-operated service to return the rules loaded into memory:

[ root@curl:/ ]$ curl prometheus-operated:9090/api/v1/rules
{"status":"success","data":{"groups":[{"name":"./openstack.rules","file":"/etc/prometheus/rules/prometheus-stf-default-rulefiles-0/service-telemetry-prometheus-alarm-rules.yaml","rules":[{"name":"Metric Listener down","query":"collectd_qpid_router_status \u003c 1","duration":0,"labels":{},"annotations":{},"alerts":[],"health":"ok","type":"alerting"}],"interval":30}]}}
6. To verify that the output shows the rules loaded into the PrometheusRule object, for example that the output contains the defined ./openstack.rules, exit from the pod:

[ root@curl:/ ]$ exit

7. Clean up the environment by deleting the curl pod:

$ oc delete pod curl

pod "curl" deleted
Additional resources

For more information on alerting, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md.
4.2.2. Configuring custom alerts

You can add custom alerts to the PrometheusRule object that you created in Section 4.2.1, "Creating an alert rule in Prometheus".
Procedure
1. Use the oc edit command:

$ oc edit prometheusrules prometheus-alarm-rules

2. Edit the PrometheusRules manifest.

3. Save and close.
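For example, you might append a second rule to the same group. This is a sketch only: the alert name, expression, and for duration below are illustrative assumptions, not values from this guide. It reuses the collectd_uptime metric that STF stores in Prometheus:

spec:
  groups:
    - name: ./openstack.rules
      rules:
        - alert: Metric Listener down
          expr: collectd_qpid_router_status < 1
        # Hypothetical additional rule: fire if uptime stops increasing
        - alert: Collectd metrics stalled
          expr: rate(collectd_uptime[5m]) == 0
          for: 10m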
Additional resources

For more information about configuring alerting rules, see https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/.

For more information about PrometheusRules objects, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md.
4.2.3. Creating an alert route in Alertmanager

Use Alertmanager to deliver alerts to an external system, such as email, IRC, or other notification channel. The Prometheus Operator manages the Alertmanager configuration as a Red Hat OpenShift Container Platform (OCP) secret. STF by default deploys a basic configuration that results in no receivers:

alertmanager.yaml: |-
  global:
    resolve_timeout: 5m
  route:
    group_by: ['job']
    group_wait: 30s
    group_interval: 5m
    repeat_interval: 12h
    receiver: 'null'
  receivers:
  - name: 'null'
To deploy a custom Alertmanager route with STF, you must pass an alertmanagerConfigManifest parameter to the Service Telemetry Operator, which results in an updated secret that is managed by the Prometheus Operator. For more information, see Section 4.1.2, "Overriding a managed manifest".
Procedure
1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

$ oc project service-telemetry

3. Edit the ServiceTelemetry object for your STF deployment:

$ oc edit servicetelemetry stf-default

4. Add the new parameter alertmanagerConfigManifest and the Secret object contents to define the alertmanager.yaml configuration for Alertmanager:
NOTE

This loads the default template that is already managed by the Service Telemetry Operator. To validate that the changes are populating correctly, change a value, return the alertmanager-stf-default secret, and verify that the new value is loaded into memory, for example, by changing the value of global.resolve_timeout from 5m to 10m.
apiVersion: infrawatch/v1alpha1
kind: ServiceTelemetry
metadata:
  name: stf-default
  namespace: service-telemetry
spec:
  metricsEnabled: true
  alertmanagerConfigManifest: |
    apiVersion: v1
    kind: Secret
    metadata:
      name: 'alertmanager-stf-default'
      namespace: 'service-telemetry'
    type: Opaque
    stringData:
      alertmanager.yaml: |-
        global:
          resolve_timeout: 10m
        route:
          group_by: ['job']
          group_wait: 30s
          group_interval: 5m
          repeat_interval: 12h
          receiver: 'null'
        receivers:
        - name: 'null'
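The 'null' receiver discards all notifications. As an illustrative sketch only, not a configuration from this guide (the receiver name, addresses, and SMTP relay below are assumptions), a route that delivers alerts by email would replace the receiver and receivers entries, for example:

        route:
          receiver: 'email-alerts'
        receivers:
        - name: 'email-alerts'            # hypothetical receiver name
          email_configs:
          - to: 'cloud-ops@example.com'         # assumed destination address
            from: 'alertmanager@example.com'    # assumed sender address
            smarthost: 'smtp.example.com:587'   # assumed SMTP relay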
5. Verify that the configuration was applied to the secret:

$ oc get secret alertmanager-stf-default -o go-template='{{index .data "alertmanager.yaml" | base64decode}}'

global:
  resolve_timeout: 10m
route:
  group_by: ['job']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h
  receiver: 'null'
receivers:
- name: 'null'

6. To verify that the configuration has been loaded into Alertmanager, create a pod with access to curl:

$ oc run curl --generator=run-pod/v1 --image=radial/busyboxplus:curl -i --tty

7. Run curl against the alertmanager-operated service to retrieve the status and configYAML contents, and verify that the supplied configuration matches the configuration loaded into Alertmanager:

[ root@curl:/ ]$ curl alertmanager-operated:9093/api/v1/status

{"status":"success","data":{"configYAML":"global:\n  resolve_timeout: 10m\n  http_config: {}\n  smtp_hello: localhost\n  smtp_require_tls: true\n  pagerduty_url: https://events.pagerduty.com/v2/enqueue\n  hipchat_api_url: https://api.hipchat.com/\n  opsgenie_api_url: https://api.opsgenie.com/\n  wechat_api_url: https://qyapi.weixin.qq.com/cgi-bin/\n  victorops_api_url: https://alert.victorops.com/integrations/generic/20131114/alert/\nroute:\n  receiver: 'null'\n  group_by:\n  - job\n  group_wait: 30s\n  group_interval: 5m\n  repeat_interval: 12h\nreceivers:\n- name: 'null'\ntemplates: []\n"}}

8. Verify that the configYAML field contains the expected changes. Exit from the pod:

[ root@curl:/ ]$ exit

9. To clean up the environment, delete the curl pod:

$ oc delete pod curl

pod "curl" deleted
Additional resources

For more information about the Red Hat OpenShift Container Platform secret and the Prometheus Operator, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md.
4.3. HIGH AVAILABILITY

High availability is the ability of Service Telemetry Framework (STF) to rapidly recover from failures in its component services. Although Red Hat OpenShift Container Platform (OCP) restarts a failed pod if nodes are available to schedule the workload, this recovery process might take more than one minute, during which time events and metrics are lost. A high availability configuration includes multiple copies of STF components, which reduces recovery time to approximately 2 seconds. To protect against failure of an OCP node, deploy STF to an OCP cluster with three or more nodes.
NOTE

STF is not yet a fully fault tolerant system. Delivery of metrics and events during the recovery period is not guaranteed.

Enabling high availability has the following effects:

Two AMQ Interconnect pods run instead of the default 1.

Three ElasticSearch pods run instead of the default 1.

Recovery time from a lost pod in either of these services reduces to approximately 2 seconds.
4.3.1. Configuring high availability

To configure STF for high availability, add highAvailabilityEnabled: true to the ServiceTelemetry object in OCP. You can set this parameter at installation time, or, if you already deployed STF, complete the following steps:
Procedure
1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

$ oc project service-telemetry

3. Use the oc command to edit the ServiceTelemetry object:

$ oc edit ServiceTelemetry

4. Add highAvailabilityEnabled: true to the spec section:

spec:
  eventsEnabled: true
  metricsEnabled: true
  highAvailabilityEnabled: true

5. Save your changes and close the object.
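As a quick check, assuming the default stf-default naming used throughout this guide, you can confirm that the additional AMQ Interconnect pod was scheduled by counting the pods with the interconnect application label; with high availability enabled you can expect two:

$ oc get pods -l application=stf-default-interconnect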
4.4. DASHBOARDS

Use the third-party application Grafana to visualize system-level metrics gathered by collectd for each individual host node. For more information about configuring collectd, see Section 3.3, "Configuring Red Hat OpenStack Platform overcloud for Service Telemetry Framework".
4.4.1. Setting up Grafana to host the dashboard

Grafana is not included in the default Service Telemetry Framework (STF) deployment, so you must deploy the Grafana Operator from OperatorHub.io.
Procedure
1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

$ oc project service-telemetry

3. Clone the dashboard repository:

$ git clone https://github.com/infrawatch/dashboards
$ cd dashboards

4. Deploy the Grafana operator:

$ oc create -f deploy/subscription.yaml

5. To verify that the operator launched successfully, run the oc get csv command. If the value of the PHASE column is Succeeded, the operator launched successfully:

$ oc get csv

NAME                      DISPLAY            VERSION   REPLACES   PHASE
grafana-operator.v3.2.0   Grafana Operator   3.2.0                Succeeded

6. Launch a Grafana instance:

$ oc create -f deploy/grafana.yaml

7. Verify that the Grafana instance deployed:

$ oc get pod -l app=grafana

NAME                                  READY   STATUS    RESTARTS   AGE
grafana-deployment-7fc7848b56-sbkhv   1/1     Running   0          1m

8. Create the datasource and dashboard resources:

$ oc create -f deploy/datasource.yaml -f deploy/rhos-dashboard.yaml
9. Verify that the resources installed correctly:

$ oc get grafanadashboards

NAME             AGE
rhos-dashboard   7d21h

$ oc get grafanadatasources

NAME                                  AGE
service-telemetry-grafanadatasource   1m

10. Navigate to https://<grafana-route-address> in a web browser. Use the oc get routes command to retrieve the Grafana route address:

$ oc get routes

11. To view the dashboard, click Dashboards and Manage.
Additional resources

For more information about enabling the OperatorHub.io catalog source, see Section 2.3.5, "Enabling the OperatorHub.io Community Catalog Source".
4.4.1.1. Viewing and editing queries

Procedure

1. Log in to Red Hat OpenShift Container Platform. To view and edit queries, log in as the admin user.

2. Change to the service-telemetry namespace:

$ oc project service-telemetry

3. To retrieve the default username and password, describe the Grafana object using oc describe:

$ oc describe grafana service-telemetry-grafana
4.4.2. The Grafana infrastructure dashboard

The infrastructure dashboard shows metrics for a single node at a time. Select a node from the upper left corner of the dashboard.
4.4.2.1. Top panels

Title | Unit | Description
Current Global Alerts | - | Current alerts fired by Prometheus
Recent Global Alerts | - | Recently fired alerts in 5m time steps
Status Panel | - | Node status: up, down, unavailable
Uptime | s/m/h/d/M/Y | Total operational time of node
CPU Cores | cores | Total number of cores
Memory | bytes | Total memory
Disk Size | bytes | Total storage size
Processes | processes | Total number of processes listed by type
Load Average | processes | Load average represents the average number of running and uninterruptible processes residing in the kernel execution queue
4.4.2.2. Networking panels

Panels that display the network interfaces of the node.

Panel | Unit | Description
Physical Interfaces Ingress Errors | errors | Total errors with incoming data
Physical Interfaces Egress Errors | errors | Total errors with outgoing data
Physical Interfaces Ingress Error Rates | errors/s | Rate of incoming data errors
Physical Interfaces Egress Error Rates | errors/s | Rate of outgoing data errors
Physical Interfaces Packets Ingress | pps | Incoming packets per second
Physical Interfaces Packets Egress | pps | Outgoing packets per second
Physical Interfaces Data Ingress | bytes/s | Incoming data rates
Physical Interfaces Data Egress | bytes/s | Outgoing data rates
Physical Interfaces Drop Rate Ingress | pps | Incoming packets drop rate
Physical Interfaces Drop Rate Egress | pps | Outgoing packets drop rate
4.4.2.3. CPU panels

Panels that display CPU usage of the node.

Panel | Unit | Description
Current CPU Usage | percent | Instantaneous usage at the time of the last query
Aggregate CPU Usage | percent | Average non-idle CPU activity of all cores on a node
Aggr. CPU Usage by Type | percent | Shows time spent for each type of thread averaged across all cores
4.4.2.4. Memory panels

Panels that display memory usage on the node.

Panel | Unit | Description
Memory Used | percent | Amount of memory being used at time of last query
Huge Pages Used | hugepages | Number of huge pages being used
Memory
4.4.2.5. Disk/file system

Panels that display space used on disk.

Panel | Unit | Description | Notes

Disk Space Usage | percent | Total disk use at time of last query |
Inode Usage | percent | Total inode use at time of last query |
Aggregate Disk Space Usage | bytes | Total disk space used and reserved | Because this query relies on the df plugin, temporary file systems that do not necessarily use disk space are included in the results. The query tries to filter out most of these, but it might not be exhaustive.
Disk Traffic | bytes/s | Shows rates for both reading and writing |
Disk Load | percent | Approximate percentage of total disk bandwidth being used. The weighted I/O time series includes the backlog that might be accumulating. For more information, see the collectd disk plugin docs. |
Operations/s | ops/s | Operations done per second |
Average I/O Operation Time | seconds | Average time each I/O operation took to complete. This average is not accurate, see the collectd disk plugin docs. |
4.5. CONFIGURING MULTIPLE CLOUDS

You can configure multiple Red Hat OpenStack Platform clouds to target a single instance of Service Telemetry Framework (STF):

1. Plan the AMQP address prefixes that you want to use for each cloud. For more information, see Section 4.5.1, "Planning AMQP address prefixes".

2. Deploy metrics and events consumer Smart Gateways for each cloud to listen on the corresponding address prefixes. For more information, see Section 4.5.2, "Deploying Smart Gateways".

3. Configure each cloud to send its metrics and events to STF on the correct address. For more information, see Section 4.5.3, "Creating the OpenStack environment file".
Figure 4.1. Two Red Hat OpenStack Platform clouds connect to STF
4.5.1. Planning AMQP address prefixes

By default, Red Hat OpenStack Platform nodes get data through two data collectors: collectd and Ceilometer. These components send telemetry data or notifications to the respective AMQP addresses, for example, collectd/telemetry, where STF Smart Gateways listen on those addresses for monitoring data.

To support multiple clouds, and to identify which cloud generated the monitoring data, configure each cloud to send data to a unique address. Prefix a cloud identifier to the second part of the address. The following list shows some example addresses and identifiers:
collectd/cloud1-telemetry

collectd/cloud1-notify

anycast/ceilometer/cloud1-event.sample

collectd/cloud2-telemetry

collectd/cloud2-notify

anycast/ceilometer/cloud2-event.sample

collectd/us-east-1-telemetry

collectd/us-west-3-telemetry
4.5.2. Deploying Smart Gateways

You must deploy a Smart Gateway for each of the data collection types for each cloud: one for collectd metrics, one for collectd events, and one for Ceilometer events. Configure each of the Smart Gateways to listen on the AMQP address that you define for the corresponding cloud.

When you deploy STF for the first time, Smart Gateway manifests are created that define the initial Smart Gateways for a single cloud. When deploying Smart Gateways for multiple cloud support, you deploy multiple Smart Gateways for each of the data collection types that handle the metrics and the events data for each cloud. The initial Smart Gateways act as a template to create additional Smart Gateways, along with any authentication information required to connect to the data stores.
Procedure
1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

$ oc project service-telemetry

3. Use the initially deployed Smart Gateways as a template for additional Smart Gateways. List the currently deployed Smart Gateways with the oc get smartgateways command. For example, if you deployed STF with metricsEnabled: true and eventsEnabled: true, the following Smart Gateways are displayed in the output:

$ oc get smartgateways

NAME                                  AGE
stf-default-ceilometer-notification   14d
stf-default-collectd-notification     14d
stf-default-collectd-telemetry        14d
4. Retrieve the manifests for each Smart Gateway and store the contents in a temporary file, which you can modify later and use to create the new set of Smart Gateways:

$ truncate --size 0 /tmp/cloud1-smartgateways.yaml && \
  for sg in $(oc get smartgateways -oname)
  do
    echo "---" >> /tmp/cloud1-smartgateways.yaml
    oc get $sg -oyaml --export >> /tmp/cloud1-smartgateways.yaml
  done
5. Modify the Smart Gateway manifests in the /tmp/cloud1-smartgateways.yaml file. Adjust the metadata.name and spec.amqpUrl fields to include the cloud identifier from your schema. For more information, see Section 4.5.1, "Planning AMQP address prefixes". To view example Smart Gateway manifests, see Section 4.5.2.1, "Example manifests".
6. Deploy your new Smart Gateways:
$ oc apply -f /tmp/cloud1-smartgateways.yaml
7. Verify that each Smart Gateway is running. This can take several minutes, depending on the number of Smart Gateways:

$ oc get po -l app=smart-gateway
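The pod names that you see depend on your deployment; as an illustrative sketch only (replica-set hashes replaced with <hash>), the cloud1 gateways from the example manifests in Section 4.5.2.1 would appear alongside the original set:

NAME                                                             READY   STATUS    RESTARTS   AGE
stf-default-ceilometer-notification-cloud1-smartgateway-<hash>   1/1     Running   0          2m
stf-default-collectd-notification-cloud1-smartgateway-<hash>     1/1     Running   0          2m
stf-default-collectd-telemetry-cloud1-smartgateway-<hash>        1/1     Running   0          2m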
4.5.2.1. Example manifests

IMPORTANT

The content in the following examples might be different to the file content in your deployment. Copy the manifests in your deployment.

Ensure that the name and amqpUrl parameters of each Smart Gateway match the names that you want to use for your clouds. For more information, see Section 4.5.1, "Planning AMQP address prefixes".

NOTE

Your output may have some additional metadata parameters that you can remove from the manifests that you load into OCP.
apiVersion: smartgateway.infrawatch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-ceilometer-notification-cloud1 1
spec:
  amqpDataSource: ceilometer
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/anycast/ceilometer/cloud1-event.sample 2
  debug: false
  elasticPass: fkzfhghw
  elasticUrl: https://elasticsearch-es-http.service-telemetry.svc.cluster.local:9200
  elasticUser: elastic
  resetIndex: false
  serviceType: events
  size: 1
  tlsCaCert: /config/certs/ca.crt
  tlsClientCert: /config/certs/tls.crt
  tlsClientKey: /config/certs/tls.key
  tlsServerName: elasticsearch-es-http.service-telemetry.svc.cluster.local
  useBasicAuth: true
  useTls: true
---
apiVersion: smartgateway.infrawatch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-collectd-notification-cloud1 3
spec:
  amqpDataSource: collectd
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/cloud1-notify 4
  debug: false
  elasticPass: fkzfhghw
  elasticUrl: https://elasticsearch-es-http.service-telemetry.svc.cluster.local:9200
  elasticUser: elastic
  resetIndex: false
  serviceType: events
  size: 1
  tlsCaCert: /config/certs/ca.crt
  tlsClientCert: /config/certs/tls.crt
  tlsClientKey: /config/certs/tls.key
  tlsServerName: elasticsearch-es-http.service-telemetry.svc.cluster.local
  useBasicAuth: true
  useTls: true
---
apiVersion: smartgateway.infrawatch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-collectd-telemetry-cloud1 5
spec:
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/cloud1-telemetry 6
  debug: false
  prefetch: 15000
  serviceType: metrics
  size: 1
  useTimestamp: true
1 Name for Ceilometer notifications for cloud1

2 AMQP address for Ceilometer notifications for cloud1

3 Name for collectd notifications for cloud1

4 AMQP address for collectd notifications for cloud1

5 Name for collectd telemetry for cloud1

6 AMQP address for collectd telemetry for cloud1
4.5.3. Creating the OpenStack environment file

To label traffic according to the cloud of origin, you must create a configuration with cloud-specific instance names. Create an stf-connectors.yaml file and adjust the values of CeilometerQdrEventsConfig and CollectdAmqpInstances to match the AMQP address prefix scheme. For more information, see Section 4.5.1, "Planning AMQP address prefixes".
WARNING

Remove the enable-stf.yaml and ceilometer-write-qdr.yaml environment file references from your overcloud deployment. This configuration is redundant and results in duplicate information being sent from each cloud node.
Procedure
1. Create the stf-connectors.yaml file and modify it to match the AMQP address that you want for this cloud deployment:

resource_registry:
  OS::TripleO::Services::Collectd: /usr/share/openstack-tripleo-heat-templates/deployment/metrics/collectd-container-puppet.yaml
  OS::TripleO::Services::MetricsQdr: /usr/share/openstack-tripleo-heat-templates/deployment/metrics/qdr-container-puppet.yaml
  OS::TripleO::Services::CeilometerAgentCentral: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-central-container-puppet.yaml
  OS::TripleO::Services::CeilometerAgentNotification: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-notification-container-puppet.yaml
  OS::TripleO::Services::CeilometerAgentIpmi: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-ipmi-container-puppet.yaml
  OS::TripleO::Services::ComputeCeilometerAgent: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-compute-container-puppet.yaml
  OS::TripleO::Services::Redis: /usr/share/openstack-tripleo-heat-templates/deployment/database/redis-pacemaker-puppet.yaml

parameter_defaults:
  EnableSTF: true

  EventPipelinePublishers: []
  CeilometerEnablePanko: false
  CeilometerQdrPublishEvents: true
  CeilometerQdrEventsConfig:
    driver: amqp
    topic: cloud1-event 1

  CollectdConnectionType: amqp1
  CollectdAmqpInterval: 5
  CollectdDefaultPollingInterval: 5

  CollectdAmqpInstances:
    cloud1-notify: 2
      notify: true
      format: JSON
      presettle: false
    cloud1-telemetry: 3
      format: JSON
      presettle: true

  MetricsQdrAddresses:
    - prefix: collectd
      distribution: multicast
    - prefix: anycast/ceilometer
      distribution: multicast

  MetricsQdrSSLProfiles:
    - name: sslProfile

  MetricsQdrConnectors:
    - host: stf-default-interconnect-5671-service-telemetry.apps.infra.watch 4
      port: 443
      role: edge
      verifyHostname: false
      sslProfile: sslProfile
1 Define the topic for Ceilometer events. This value is the address format of anycast/ceilometer/cloud1-event.sample.

2 Define the topic for collectd events. This value is the format of collectd/cloud1-notify.

3 Define the topic for collectd metrics. This value is the format of collectd/cloud1-telemetry.

4 Adjust the MetricsQdrConnectors host to the address of the STF route.
2. Ensure that the naming convention in the stf-connectors.yaml file aligns with the spec.amqpUrl field in the Smart Gateway configuration. For example, configure the CeilometerQdrEventsConfig.topic field to a value of cloud1-event.

3. Save the file in a directory for custom environment files, for example /home/stack/custom_templates/.

4. Source the authentication file:

[stack@undercloud-0 ~]$ source stackrc

(undercloud) [stack@undercloud-0 ~]$

5. Include the stf-connectors.yaml file in the overcloud deployment command, along with any other environment files relevant to your environment:

(undercloud) [stack@undercloud-0 ~]$ openstack overcloud deploy --templates /usr/share/openstack-tripleo-heat-templates \
  -e /home/stack/custom_templates/stf-connectors.yaml \
  ...
Additional resources

For information about validating the deployment, see Section 3.3.3, "Validating client-side installation".
4.5.4. Querying metrics data from multiple clouds

Data stored in Prometheus has a service label attached according to the Smart Gateway it was scraped from. You can use this label to query data from a specific cloud.

To query data from a specific cloud, use a Prometheus PromQL query that matches the associated service label, for example: collectd_uptime{service="stf-default-collectd-telemetry-cloud1-smartgateway"}.
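As a minimal sketch of running such a query against the Prometheus API from inside the cluster, reusing the curl pod pattern from Section 4.2.1 (the service label value is the cloud1 example above; adjust it to your own Smart Gateway names):

[ root@curl:/ ]$ curl -G prometheus-operated:9090/api/v1/query \
  --data-urlencode 'query=collectd_uptime{service="stf-default-collectd-telemetry-cloud1-smartgateway"}'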
4.6. EPHEMERAL STORAGE

Use ephemeral storage to run Service Telemetry Framework (STF) without persistently storing data in your Red Hat OpenShift Container Platform (OCP) cluster. Ephemeral storage is not recommended in a production environment due to the volatility of the data in the platform, even when operating correctly and as designed. For example, restarting a pod or rescheduling the workload to another node results in the loss of any local data written since the pod started.

If you enable ephemeral storage in STF, the Service Telemetry Operator does not add the relevant storage sections to the data storage components' manifests.
4.6.1. Configuring ephemeral storage

To configure STF for ephemeral storage, add storageEphemeralEnabled: true to the ServiceTelemetry object in OCP. You can add storageEphemeralEnabled: true at installation time, or, if you already deployed STF, complete the following steps:
Procedure
1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

$ oc project service-telemetry

3. Edit the ServiceTelemetry object:

$ oc edit ServiceTelemetry stf-default

4. Add the storageEphemeralEnabled: true parameter to the spec section:

spec:
  eventsEnabled: true
  metricsEnabled: true
  storageEphemeralEnabled: true

5. Save your changes and close the object.
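A quick way to observe the effect, assuming standard OpenShift behaviour rather than a step from this guide: with ephemeral storage enabled, the data storage components run without PersistentVolumeClaims, so listing claims in the namespace should show none bound to those components:

$ oc get pvc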
APPENDIX A. COLLECTD PLUG-INS

This section contains a complete list of collectd plug-ins and configurations for Red Hat OpenStack Platform 16.0.
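The parameters in this appendix are hiera values that the collectd composable service reads. As an illustrative sketch only (the specific values shown are assumptions, not recommendations from this guide), you might set them from the parameter_defaults section of a custom environment file with ExtraConfig:

parameter_defaults:
  ExtraConfig:
    # Assumed example value: poll CPU metrics every 10 seconds
    collectd::plugin::cpu::interval: 10
    # Assumed example value: report df results as percentages
    collectd::plugin::df::valuespercentage: true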
collectd-aggregation

collectd::plugin::aggregation::aggregators

collectd::plugin::aggregation::interval

collectd-amqp1

collectd-apache

collectd::plugin::apache::instances (ex: localhost => {'url' => 'http://localhost/mod_status?auto'})

collectd::plugin::apache::interval

collectd-apcups

collectd-battery

collectd::plugin::battery::values_percentage

collectd::plugin::battery::report_degraded

collectd::plugin::battery::query_state_fs

collectd::plugin::battery::interval

collectd-ceph

collectd::plugin::ceph::daemons

collectd::plugin::ceph::longrunavglatency

collectd::plugin::ceph::convertspecialmetrictypes

collectd-cgroups

collectd::plugin::cgroups::ignore_selected

collectd::plugin::cgroups::interval

collectd-conntrack

None

collectd-contextswitch

collectd::plugin::contextswitch::interval

collectd-cpu

collectd::plugin::cpu::reportbystate

collectd::plugin::cpu::reportbycpu

collectd::plugin::cpu::valuespercentage

collectd::plugin::cpu::reportnumcpu

collectd::plugin::cpu::reportgueststate

collectd::plugin::cpu::subtractgueststate

collectd::plugin::cpu::interval

collectd-cpufreq

None

collectd-cpusleep

collectd-csv

collectd::plugin::csv::datadir

collectd::plugin::csv::storerates

collectd::plugin::csv::interval

collectd-df

collectd::plugin::df::devices

collectd::plugin::df::fstypes

collectd::plugin::df::ignoreselected

collectd::plugin::df::mountpoints

collectd::plugin::df::reportbydevice

collectd::plugin::df::reportinodes

collectd::plugin::df::reportreserved

collectd::plugin::df::valuesabsolute

collectd::plugin::df::valuespercentage

collectd::plugin::df::interval

collectd-disk

collectd::plugin::disk::disks

collectd::plugin::disk::ignoreselected

collectd::plugin::disk::udevnameattr

collectd::plugin::disk::interval

collectd-entropy

collectd::plugin::entropy::interval

collectd-ethstat

collectd::plugin::ethstat::interfaces

collectd::plugin::ethstat::maps

collectd::plugin::ethstat::mappedonly

collectd::plugin::ethstat::interval

collectd-exec

collectd::plugin::exec::commands

collectd::plugin::exec::commands_defaults

collectd::plugin::exec::globals

collectd::plugin::exec::interval

collectd-fhcount

collectd::plugin::fhcount::valuesabsolute

collectd::plugin::fhcount::valuespercentage

collectd::plugin::fhcount::interval

collectd-filecount

collectd::plugin::filecount::directories

collectd::plugin::filecount::interval

collectd-fscache

None

collectd-hddtemp

collectd::plugin::hddtemp::host

collectd::plugin::hddtemp::port

collectd::plugin::hddtemp::interval

collectd-hugepages

collectd::plugin::hugepages::report_per_node_hp

collectd::plugin::hugepages::report_root_hp

collectd::plugin::hugepages::values_pages

collectd::plugin::hugepages::values_bytes

collectd::plugin::hugepages::values_percentage

collectd::plugin::hugepages::interval

collectd-intel_rdt

collectd-interface

collectd::plugin::interface::interfaces

collectd::plugin::interface::ignoreselected

collectd::plugin::interface::reportinactive

collectd::plugin::interface::interval

collectd-ipc

None

collectd-ipmi

collectd::plugin::ipmi::ignore_selected

collectd::plugin::ipmi::notify_sensor_add

collectd::plugin::ipmi::notify_sensor_remove

collectd::plugin::ipmi::notify_sensor_not_present

collectd::plugin::ipmi::sensors

collectd::plugin::ipmi::interval

collectd-irq

collectd::plugin::irq::irqs

collectd::plugin::irq::ignoreselected

collectd::plugin::irq::interval

collectd-load

collectd::plugin::load::report_relative

collectd::plugin::load::interval

collectd-logfile

collectd::plugin::logfile::log_level

collectd::plugin::logfile::log_file

collectd::plugin::logfile::log_timestamp

collectd::plugin::logfile::print_severity

collectd::plugin::logfile::interval

collectd-madwifi

collectd-mbmon

collectd-md

collectd-memcached

collectd::plugin::memcached::instances

collectd::plugin::memcached::interval

collectd-memory

collectd::plugin::memory::valuesabsolute

collectd::plugin::memory::valuespercentage

collectd::plugin::memory::interval

collectd-multimeter

collectd-mysql

collectd::plugin::mysql::interval

collectd-netlink

collectd::plugin::netlink::interfaces

collectd::plugin::netlink::verboseinterfaces

collectd::plugin::netlink::qdiscs

collectd::plugin::netlink::classes

collectd::plugin::netlink::filters

collectd::plugin::netlink::ignoreselected

collectd::plugin::netlink::interval

collectd-network

collectd::plugin::network::timetolive

collectd::plugin::network::maxpacketsize

collectd::plugin::network::forward

collectd::plugin::network::reportstats

collectd::plugin::network::listeners

collectd::plugin::network::servers

collectd::plugin::network::interval

collectd-nfs

collectd::plugin::nfs::interval

collectd-ntpd

collectd::plugin::ntpd::host

collectd::plugin::ntpd::port

collectd::plugin::ntpd::reverselookups

collectd::plugin::ntpd::includeunitid

collectd::plugin::ntpd::interval

collectd-numa

None

collectd-olsrd

collectd-openvpn

collectd::plugin::openvpn::statusfile

collectd::plugin::openvpn::improvednamingschema

collectd::plugin::openvpn::collectcompression

collectd::plugin::openvpn::collectindividualusers

collectd::plugin::openvpn::collectusercount

collectd::plugin::openvpn::interval

collectd-ovs_events

collectd::plugin::ovs_events::address

collectd::plugin::ovs_events::dispatch

collectd::plugin::ovs_events::interfaces

collectd::plugin::ovs_events::send_notification

collectd::plugin::ovs_events::$port

collectd::plugin::ovs_events::socket

collectd-ovs_stats

collectd::plugin::ovs_stats::address

collectd::plugin::ovs_stats::bridges

collectd::plugin::ovs_stats::port

collectd::plugin::ovs_stats::socket

collectd-ping

collectd::plugin::ping::hosts

collectd::plugin::ping::timeout

collectd::plugin::ping::ttl

collectd::plugin::ping::source_address

collectd::plugin::ping::device

collectd::plugin::ping::max_missed

collectd::plugin::ping::size

collectd::plugin::ping::interval

collectd-powerdns

collectd::plugin::powerdns::interval

collectd::plugin::powerdns::servers

collectd::plugin::powerdns::recursors

collectd::plugin::powerdns::local_socket

collectd::plugin::powerdns::interval

collectd-processes

collectd::plugin::processes::processes

collectd::plugin::processes::process_matches

collectd::plugin::processes::collect_context_switch

collectd::plugin::processes::collect_file_descriptor

collectd::plugin::processes::collect_memory_maps

collectd::plugin::processes::interval

collectd-protocols

collectd::plugin::protocols::ignoreselected

collectd::plugin::protocols::values

collectd-python

collectd-serial

collectd-smart

collectd::plugin::smart::disks

collectd::plugin::smart::ignoreselected

collectd::plugin::smart::interval

collectd-snmp_agent

collectd-statsd

collectd::plugin::statsd::host

collectd::plugin::statsd::port

collectd::plugin::statsd::deletecounters

collectd::plugin::statsd::deletetimers

collectd::plugin::statsd::deletegauges

collectd::plugin::statsd::deletesets

collectd::plugin::statsd::countersum

collectd::plugin::statsd::timerpercentile

collectd::plugin::statsd::timerlower

collectd::plugin::statsd::timerupper

collectd::plugin::statsd::timersum

collectd::plugin::statsd::timercount

collectd::plugin::statsd::interval

collectd-swap

collectd::plugin::swap::reportbydevice

collectd::plugin::swap::reportbytes

collectd::plugin::swap::valuesabsolute

collectd::plugin::swap::valuespercentage

collectd::plugin::swap::reportio

collectd::plugin::swap::interval

collectd-syslog

collectd::plugin::syslog::log_level

collectd::plugin::syslog::notify_level

collectd::plugin::syslog::interval

collectd-table

collectd::plugin::table::tables

collectd::plugin::table::interval

collectd-tail

collectd::plugin::tail::files

collectd::plugin::tail::interval

collectd-tail_csv

collectd::plugin::tail_csv::metrics

collectd::plugin::tail_csv::files

collectd-tcpconns

collectd::plugin::tcpconns::localports

collectd::plugin::tcpconns::remoteports

collectd::plugin::tcpconns::listening

collectd::plugin::tcpconns::allportssummary

collectd::plugin::tcpconns::interval

collectd-ted

collectd-thermal

collectd::plugin::thermal::devices

collectd::plugin::thermal::ignoreselected

collectd::plugin::thermal::interval

collectd-threshold

collectd::plugin::threshold::types

collectd::plugin::threshold::plugins

collectd::plugin::threshold::hosts

collectd::plugin::threshold::interval

collectd-turbostat

collectd::plugin::turbostat::core_c_states

collectd::plugin::turbostat::package_c_states

collectd::plugin::turbostat::system_management_interrupt

collectd::plugin::turbostat::digital_temperature_sensor

collectd::plugin::turbostat::tcc_activation_temp

collectd::plugin::turbostat::running_average_power_limit

collectd::plugin::turbostat::logical_core_names

collectd-unixsock

collectd-uptime

collectd::plugin::uptime::interval

collectd-users

collectd::plugin::users::interval

collectd-uuid

collectd::plugin::uuid::uuid_file

collectd::plugin::uuid::interval

collectd-virt

collectd::plugin::virt::connection

collectd::plugin::virt::refresh_interval

collectd::plugin::virt::domain

collectd::plugin::virt::block_device

collectd::plugin::virt::interface_device

collectd::plugin::virt::ignore_selected

collectd::plugin::virt::hostname_format

collectd::plugin::virt::interface_format

collectd::plugin::virt::extra_stats

collectd::plugin::virt::interval

collectd-vmem

collectd::plugin::vmem::verbose

collectd::plugin::vmem::interval

collectd-vserver

collectd-wireless

collectd-write_graphite

collectd::plugin::write_graphite::carbons

collectd::plugin::write_graphite::carbon_defaults

collectd::plugin::write_graphite::globals

collectd-write_kafka

collectd::plugin::write_kafka::kafka_host

collectd::plugin::write_kafka::kafka_port

collectd::plugin::write_kafka::kafka_hosts

collectd::plugin::write_kafka::topics

collectd-write_log

collectd::plugin::write_log::format

collectd-zfs_arc

None
2. To validate your ClusterServiceVersion, use the oc get csv command. Ensure that amq7-cert-manager.v1.0.0 has a phase Succeeded:

$ oc get --namespace openshift-operators csv

NAME                       DISPLAY                                         VERSION   REPLACES   PHASE
amq7-cert-manager.v1.0.0   Red Hat Integration - AMQ Certificate Manager   1.0.0                Succeeded
2.3.8. Subscribing to the Elastic Cloud on Kubernetes Operator

Before you install the Service Telemetry Operator, and if you plan to store events in ElasticSearch, you must enable the Elastic Cloud on Kubernetes Operator.
Procedure
1. Apply the following manifest to your OCP environment to enable the Elastic Cloud on Kubernetes Operator:

$ oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: elastic-cloud-eck
  namespace: service-telemetry
spec:
  channel: stable
  installPlanApproval: Automatic
  name: elastic-cloud-eck
  source: operatorhubio-operators
  sourceNamespace: openshift-marketplace
EOF

2. To verify that the ClusterServiceVersion for ElasticSearch Cloud on Kubernetes succeeded, enter the oc get csv command:

$ oc get csv

NAME                       DISPLAY                       VERSION   REPLACES                   PHASE
elastic-cloud-eck.v1.1.0   Elastic Cloud on Kubernetes   1.1.0     elastic-cloud-eck.v1.0.1   Succeeded
2.3.9. Subscribing to the Service Telemetry Operator

To instantiate an STF instance, create the ServiceTelemetry object to allow the Service Telemetry Operator to create the environment.

Procedure

1. To create the Service Telemetry Operator subscription, enter the oc apply -f command:

$ oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: servicetelemetry-operator
  namespace: service-telemetry
spec:
  channel: stable
  installPlanApproval: Automatic
  name: servicetelemetry-operator
  source: redhat-operators-stf
  sourceNamespace: openshift-marketplace
EOF
2. To validate the Service Telemetry Operator and the dependent operators, enter the following command:

$ oc get csv --namespace service-telemetry

NAME                                DISPLAY                                         VERSION   REPLACES                            PHASE
amq7-cert-manager.v1.0.0            Red Hat Integration - AMQ Certificate Manager   1.0.0                                         Succeeded
amq7-interconnect-operator.v1.2.0   Red Hat Integration - AMQ Interconnect          1.2.0                                         Succeeded
elastic-cloud-eck.v1.1.0            Elastic Cloud on Kubernetes                     1.1.0     elastic-cloud-eck.v1.0.1            Succeeded
prometheusoperator.0.37.0           Prometheus Operator                             0.37.0    prometheusoperator.0.32.0           Succeeded
service-telemetry-operator.v1.0.2   Service Telemetry Operator                      1.0.2     service-telemetry-operator.v1.0.1   Succeeded
smart-gateway-operator.v1.0.1       Smart Gateway Operator                          1.0.1     smart-gateway-operator.v1.0.0       Succeeded
2.3.10. Creating a ServiceTelemetry object in OCP

To deploy the Service Telemetry Framework, you must create an instance of ServiceTelemetry in OCP. By default, eventsEnabled is set to false. If you do not want to store events in ElasticSearch, ensure that eventsEnabled is set to false. For more information, see Section 2.3.2, "Deploying STF to the OCP environment without ElasticSearch".
The following core parameters are available for a ServiceTelemetry manifest:

Table 2.1. Core parameters for a ServiceTelemetry manifest

Parameter | Description | Default Value

eventsEnabled | Enable events support in STF. Requires prerequisite steps to ensure ElasticSearch can be started. For more information, see Section 2.3.8, "Subscribing to the Elastic Cloud on Kubernetes Operator". | false

metricsEnabled | Enable metrics support in STF. | true
highAvailabilityEnabled | Enable high availability in STF. For more information, see Section 4.3, "High availability". | false

storageEphemeralEnabled | Enable ephemeral storage support in STF. For more information, see Section 4.6, "Ephemeral storage". | false
Procedure

1. To store events in ElasticSearch, set eventsEnabled to true during deployment:

$ oc apply -f - <<EOF
apiVersion: infrawatch/v1alpha1
kind: ServiceTelemetry
metadata:
  name: stf-default
  namespace: service-telemetry
spec:
  eventsEnabled: true
  metricsEnabled: true
EOF

2. To view the STF deployment logs in the Service Telemetry Operator, use the oc logs command:

$ oc logs $(oc get pod --selector=name=service-telemetry-operator -oname) -c ansible

PLAY RECAP ***
localhost : ok=37 changed=0 unreachable=0 failed=0 skipped=1 rescued=0 ignored=0
3. View the pods and the status of each pod to determine that all workloads are operating nominally:

NOTE

If you set eventsEnabled: true, the notification Smart Gateways will Error and CrashLoopBackOff for a period of time before ElasticSearch starts.

$ oc get pods

NAME                                                              READY   STATUS    RESTARTS   AGE
alertmanager-stf-default-0                                        2/2     Running   0          26m
elastic-operator-645dc8b8ff-jwnzt                                 1/1     Running   0          88m
elasticsearch-es-default-0                                        1/1     Running   0          26m
interconnect-operator-6fd49d9fb9-4bl92                            1/1     Running   0          46m
prometheus-operator-bf7d97fb9-kwnlx                               1/1     Running   0          46m
prometheus-stf-default-0                                          3/3     Running   0          26m
service-telemetry-operator-54f4c99d9b-k7ll6                       2/2     Running   0          46m
smart-gateway-operator-7ff58bcf94-66rvx                           2/2     Running   0          46m
stf-default-ceilometer-notification-smartgateway-6675df547q4lbj   1/1     Running   0          26m
stf-default-collectd-notification-smartgateway-698c87fbb7-xj528   1/1     Running   0          26m
stf-default-collectd-telemetry-smartgateway-79c967c8f7-9hsqn      1/1     Running   0          26m
stf-default-interconnect-7458fd4d69-nqbfs                         1/1     Running   0          26m
2.4. REMOVING STF FROM THE OCP ENVIRONMENT

Remove STF from an OCP environment if you no longer require the STF functionality.

Complete the following tasks:

1. Section 2.4.1, "Deleting the namespace"

2. Section 2.4.2, "Removing the OperatorSource"

2.4.1. Deleting the namespace

To remove the operational resources for STF from OCP, delete the namespace.

Procedure
1. Run the oc delete command:

$ oc delete project service-telemetry

2. Verify that the resources have been deleted from the namespace:

$ oc get all
No resources found.
2.4.2. Removing the OperatorSource

If you do not expect to install Service Telemetry Framework again, delete the OperatorSource. When you remove the OperatorSource, PackageManifests related to STF are removed from the Operator Lifecycle Manager catalog.
Procedure
1. Delete the OperatorSource:

$ oc delete --namespace=openshift-marketplace operatorsource redhat-operators-stf
operatorsource.operators.coreos.com "redhat-operators-stf" deleted
2. Verify that the STF PackageManifests are removed from the platform. If successful, the following command returns no result:

$ oc get packagemanifests | grep "Red Hat STF"
3. If you enabled the OperatorHub.io Community Catalog Source during the installation process, and you no longer need this catalog source, delete it:

$ oc delete --namespace=openshift-marketplace catalogsource operatorhubio-operators
catalogsource.operators.coreos.com "operatorhubio-operators" deleted
Additional resources

For more information about the OperatorHub.io Community Catalog Source, see Section 2.3, "Deploying STF to the OCP environment".
CHAPTER 3. COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION

3.1. CONNECTING RED HAT OPENSTACK PLATFORM TO SERVICE TELEMETRY FRAMEWORK

To collect metrics, events, or both, and to send them to the Service Telemetry Framework (STF) storage domain, you must configure the Red Hat OpenStack Platform overcloud to enable data collection and transport.

To deploy data collection and transport to STF on Red Hat OpenStack Platform cloud nodes that employ routed L3 domains, such as distributed compute node (DCN) or spine-leaf, see Section 3.2, "Deploying to non-standard network topologies".
3.2. DEPLOYING TO NON-STANDARD NETWORK TOPOLOGIES

If your nodes are on a separate network from the default InternalApi network, you must make configuration adjustments so that AMQ Interconnect can transport data to the Service Telemetry Framework (STF) server instance. This scenario is typical in a spine-leaf or a DCN topology. For more information about DCN configuration, see the Spine Leaf Networking guide.

If you use STF with Red Hat OpenStack Platform 16.0 and plan to monitor your Ceph, Block, or Object storage nodes, you must make configuration changes that are similar to the configuration changes that you make to the spine-leaf and DCN network configuration. To monitor Ceph nodes, use the CephStorageExtraConfig parameter to define which network interface to load into the AMQ Interconnect and collectd configuration files:

CephStorageExtraConfig:
  tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('storage')}"
  tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('storage')}"
  tripleo::profile::base::ceilometer::agent::notification::notifier_host_addr: "%{hiera('storage')}"

Similarly, you must specify BlockStorageExtraConfig and ObjectStorageExtraConfig parameters if your environment uses Block and Object storage roles.

The deployment of a spine-leaf topology involves creating roles and networks, then assigning those networks to the available roles. When you configure data collection and transport for STF for a Red Hat OpenStack Platform deployment, the default network for roles is InternalApi. For Ceph, Block, and Object storage roles, the default network is Storage. Because a spine-leaf configuration can result in different networks being assigned to different Leaf groupings, and those names are typically unique, additional configuration is required in the parameter_defaults section of the Red Hat OpenStack Platform environment files.
Procedure
1. Document which networks are available for each of the Leaf roles. For examples of network name definitions, see Creating a network data file in the Spine Leaf Networking guide. For more information about the creation of the Leaf groupings (roles) and assignment of the networks to those groupings, see Creating a roles data file in the Spine Leaf Networking guide.
2. Add the following configuration example to the ExtraConfig section for each of the leaf roles. In this example, internal_api_subnet is the value defined in the name_lower parameter of your network definition (with _subnet appended to the name for Leaf 0), and is the network to which the ComputeLeaf0 leaf role is connected. In this case, the network identification of 0 corresponds to the Compute role for leaf 0, and represents a value that is different from the default internal API network name.

For the ComputeLeaf0 leaf role, specify extra configuration to perform a hiera lookup to determine which network interface for a particular network to assign to the collectd AMQP host parameter. Perform the same configuration for the AMQ Interconnect listener address parameter:

ComputeLeaf0ExtraConfig:
  tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('internal_api_subnet')}"
  tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('internal_api_subnet')}"

Additional leaf roles typically replace _subnet with _leafN, where N represents a unique identifier for the leaf:

ComputeLeaf1ExtraConfig:
  tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('internal_api_leaf1')}"
  tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('internal_api_leaf1')}"

This example configuration is on a CephStorage leaf role:

CephStorageLeaf0ExtraConfig:
  tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('storage_subnet')}"
  tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('storage_subnet')}"
3.3. CONFIGURING RED HAT OPENSTACK PLATFORM OVERCLOUD FOR SERVICE TELEMETRY FRAMEWORK

To configure the Red Hat OpenStack Platform overcloud, you must configure the data collection applications and the data transport to STF, and deploy the overcloud.

To configure the Red Hat OpenStack Platform overcloud, complete the following tasks:

1. Section 3.3.1, "Retrieving the AMQ Interconnect route address"

2. Section 3.3.2, "Configuring the STF connection for the overcloud"

3. Section 3.3.3, "Validating client-side installation"
3.3.1. Retrieving the AMQ Interconnect route address

When you configure the Red Hat OpenStack Platform overcloud for STF, you must provide the AMQ Interconnect route address in the STF connection file.

Procedure

1. Log in to your Red Hat OpenShift Container Platform (OCP) environment.

2. In the service-telemetry project, retrieve the AMQ Interconnect route address:

$ oc get routes -ogo-template='{{ range .items }}{{printf "%s\n" .spec.host }}{{ end }}' | grep "\-5671"
stf-default-interconnect-5671-service-telemetry.apps.infra.watch
NOTE

If your STF installation differs from the documentation, ensure that you retrieve the correct AMQ Interconnect route address.
3.3.2. Configuring the STF connection for the overcloud

To configure the STF connection, you must create a file that contains the connection configuration of the AMQ Interconnect for the overcloud to the STF deployment. Enable the collection of events and storage of the events in STF, and deploy the overcloud.
Procedure
1. Log in to the Red Hat OpenStack Platform undercloud as the stack user.

2. Create a configuration file called stf-connectors.yaml in the /home/stack directory.

IMPORTANT

The Service Telemetry Operator simplifies the deployment of all data ingestion and data storage components for single cloud deployments. To share the data storage domain with multiple clouds, see Section 4.5, "Configuring multiple clouds".

3. In the stf-connectors.yaml file, configure the MetricsQdrConnectors address to connect the AMQ Interconnect on the overcloud to the STF deployment.

Add the CeilometerQdrPublishEvents: true parameter to enable collection and transport of Ceilometer events to STF.

Replace the host parameter with the value of HOST/PORT that you retrieved in Section 3.3.1, "Retrieving the AMQ Interconnect route address":

parameter_defaults:
  EventPipelinePublishers: []
  CeilometerQdrPublishEvents: true
  MetricsQdrConnectors:
    - host: stf-default-interconnect-5671-service-telemetry.apps.infra.watch
      port: 443
      role: edge
      sslProfile: sslProfile
      verifyHostname: false
4. Add the following files to your Red Hat OpenStack Platform director deployment to set up collectd and AMQ Interconnect:

the stf-connectors.yaml environment file

the enable-stf.yaml file that ensures that the environment is being used during the overcloud deployment
the ceilometer-write-qdr.yaml file that ensures that Ceilometer telemetry is sent to STF
5. Deploy the Red Hat OpenStack Platform overcloud:

openstack overcloud deploy <other arguments>
  --templates /usr/share/openstack-tripleo-heat-templates \
  --environment-file <other-environment-files> \
  --environment-file /usr/share/openstack-tripleo-heat-templates/environments/metrics/ceilometer-write-qdr.yaml \
  --environment-file /usr/share/openstack-tripleo-heat-templates/environments/enable-stf.yaml \
  --environment-file /home/stack/stf-connectors.yaml
3.3.3. Validating client-side installation

To validate data collection from the STF storage domain, query the data sources for delivered data. To validate individual nodes in the Red Hat OpenStack Platform deployment, connect to the console using SSH.
Procedure
1. Log in to an overcloud node, for example, controller-0.

2. Ensure that the metrics_qdr container is running on the node:

$ sudo podman container inspect --format '{{.State.Status}}' metrics_qdr

running

3. Return the internal network address on which AMQ Interconnect is running, for example, 172.17.1.44 listening on port 5666:

$ sudo podman exec -it metrics_qdr cat /etc/qpid-dispatch/qdrouterd.conf

listener {
    host: 172.17.1.44
    port: 5666
    authenticatePeer: no
    saslMechanisms: ANONYMOUS
}

4. Return a list of connections to the local AMQ Interconnect:

$ sudo podman exec -it metrics_qdr qdstat --bus=172.17.1.44:5666 --connections

Connections
  id   host                                                                   container                                                                                             role    dir  security                            authentication  tenant
  ============================================================================================================================================================================================
  1    stf-default-interconnect-5671-service-telemetry.apps.infra.watch:443   stf-default-interconnect-7458fd4d69-bgzfb                                                             edge    out  TLSv1.2(DHE-RSA-AES256-GCM-SHA384)  anonymous-user
  12   172.17.1.44:60290                                                      openstack.org/om/container/controller-0/ceilometer-agent-notification/25/5c02cee550f143ec9ea030db5cccba14  normal  in   no-security                    no-auth
  16   172.17.1.44:36408                                                      metrics                                                                                               normal  in   no-security                         anonymous-user
  899  172.17.1.44:39500                                                      10a2e99d-1b8a-4329-b48c-4335e5f75c84                                                                  normal  in   no-security                         no-auth
There are four connections:

Outbound connection to STF

Inbound connection from collectd

Inbound connection from ceilometer

Inbound connection from our qdstat client

The outbound STF connection is provided to the MetricsQdrConnectors host parameter and is the route for the STF storage domain. The other hosts are internal network addresses of the client connections to this AMQ Interconnect.

5. To ensure that messages are being delivered, list the links, and view the _edge address in the deliv column for delivery of messages:

$ sudo podman exec -it metrics_qdr qdstat --bus=172.17.1.44:5666 --links

Router Links
  type      dir  conn id  id    peer  class   addr                  phs  cap  pri  undel  unsett  deliv    presett  psdrop  acc  rej  rel  mod  delay  rate
  ===========================================================================================================================================================
  endpoint  out  1        5           local   _edge                      250  0    0      0       2979926  2979924  0       0    0    2    0    0      0
  endpoint  in   1        6                                              250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  in   1        7                                              250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  out  1        8                                              250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  in   1        9                                              250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  out  1        10                                             250  0    0      0       911      911      0       0    0    0    0    911    0
  endpoint  in   1        11                                             250  0    0      0       0        911      0       0    0    0    0    0      0
  endpoint  out  12       32          local   temp.lSY6Mcicol4J2Kp       250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  in   16       41                                             250  0    0      0       2979924  2979924  0       0    0    0    0    0      0
  endpoint  in   912      1834        mobile  $management           0    250  0    0      0       1        0        0       1    0    0    0    0      0
  endpoint  out  912      1835        local   temp.9Ok2resI9tmt+CT       250  0    0      0       0        0        0       0    0    0    0    0      0
6. To list the addresses from Red Hat OpenStack Platform nodes to STF, connect to OCP to get the AMQ Interconnect pod name and list the connections. List the available AMQ Interconnect pods:

$ oc get pods -l application=stf-default-interconnect

NAME                                        READY   STATUS    RESTARTS   AGE
stf-default-interconnect-7458fd4d69-bgzfb   1/1     Running   0          6d21h

7. Connect to the pod and run the qdstat --connections command to list the known connections:

$ oc exec -it stf-default-interconnect-7458fd4d69-bgzfb -- qdstat --connections

2020-04-21 18:25:47.243852 UTC
stf-default-interconnect-7458fd4d69-bgzfb

Connections
  id  host               container                                                              role    dir  security                                authentication  tenant  last dlv      uptime
  ======================================================================================================================================================================================================
  1   10.129.2.21:43062  rcv[stf-default-collectd-telemetry-smartgateway-79c967c8f7-kq4qv]      normal  in   no-security                             anonymous-user          000:00:00:00  006:21:50:25
  2   10.130.0.52:55754  rcv[stf-default-ceilometer-notification-smartgateway-6675df547mbjk5]   normal  in   no-security                             anonymous-user          000:21:25:57  006:21:49:36
  3   10.130.0.51:43110  rcv[stf-default-collectd-notification-smartgateway-698c87fbb7-f28v6]   normal  in   no-security                             anonymous-user          000:21:36:53  006:21:49:09
  22  10.128.0.1:51948   Router.ceph-0.redhat.local                                             edge    in   TLSv1/SSLv3(DHE-RSA-AES256-GCM-SHA384)  anonymous-user          000:00:00:03  000:22:08:43
  23  10.128.0.1:51950   Router.compute-0.redhat.local                                          edge    in   TLSv1/SSLv3(DHE-RSA-AES256-GCM-SHA384)  anonymous-user          000:00:00:03  000:22:08:43
  24  10.128.0.1:52082   Router.controller-0.redhat.local                                       edge    in   TLSv1/SSLv3(DHE-RSA-AES256-GCM-SHA384)  anonymous-user          000:00:00:00  000:22:08:34
  27  127.0.0.1:42202    c2f541c1-4c97-4b37-a189-a396c08fb079                                   normal  in   no-security                             no-auth                 000:00:00:00  000:00:00:00

In this example, there are three edge connections from the Red Hat OpenStack Platform nodes, with connection id 22, 23, and 24.

8. To view the number of messages delivered by the network, use each address with the oc exec command:

$ oc exec -it stf-default-interconnect-7458fd4d69-bgzfb -- qdstat --address

2020-04-21 18:20:10.293258 UTC
stf-default-interconnect-7458fd4d69-bgzfb

Router Addresses
  class   addr                              phs  distrib    pri  local  remote  in       out      thru  fallback
  ================================================================================================================
  mobile  anycast/ceilometer/event.sample   0    balanced   -    1      0       1553     1553     0     0
  mobile  collectd/notify                   0    multicast  -    1      0       10       10       0     0
  mobile  collectd/telemetry                0    multicast  -    1      0       7798049  7798049  0     0
CHAPTER 4. ADVANCED FEATURES

The following optional features can provide additional functionality to the Service Telemetry Framework (STF):

Customizing the deployment. For more information, see Section 4.1, "Customizing the deployment".

Alerts. For more information, see Section 4.2, "Alerts".

High availability. For more information, see Section 4.3, "High availability".

Dashboards. For more information, see Section 4.4, "Dashboards".

Multiple clouds. For more information, see Section 4.5, "Configuring multiple clouds".

Ephemeral storage. For more information, see Section 4.6, "Ephemeral storage".
41 CUSTOMIZING THE DEPLOYMENT
The Service Telemetry Operator watches for a ServiceTelemetry manifest to load intoRed Hat OpenShift Container Platform (OCP) The Operator then creates other objects in memorywhich results in the dependent Operators creating the workloads they are responsible formanaging
WARNING
When you override the manifest, you must provide the entire manifest contents, including object names or namespaces. There is no dynamic parameter substitution when you override a manifest.
To override a manifest successfully with Service Telemetry Framework (STF), deploy a default environment using the core options only. For more information about the core options, see Section 2.3.10, "Creating a ServiceTelemetry object in OCP". When you deploy STF, use the oc get command to retrieve the default deployed manifest. When you use a manifest that was originally generated by Service Telemetry Operator, the manifest is compatible with the other objects that are managed by the Operators.
For example, when the metricsEnabled: true parameter is configured in the ServiceTelemetry manifest, the Service Telemetry Operator requests components for metrics retrieval and storage using the default manifests. In some cases, you might want to override the default manifest. For more information, see Section 4.1.1, "Manifest override parameters".
4.1.1. Manifest override parameters
This table describes the available parameters that you can use to override a manifest, along with the corresponding retrieval commands.
Table 4.1. Manifest override parameters
Override parameter | Description | Retrieval command
alertmanagerManifest | Override the Alertmanager object creation. The Prometheus Operator watches for Alertmanager objects. | oc get alertmanager stf-default -oyaml
alertmanagerConfigManifest | Override the Secret that contains the Alertmanager configuration. The Prometheus Operator uses a secret named alertmanager-<alertmanager-name>, for example, stf-default, to provide the alertmanager.yaml configuration to Alertmanager. | oc get secret alertmanager-stf-default -oyaml
elasticsearchManifest | Override the ElasticSearch object creation. The Elastic Cloud on Kubernetes Operator watches for ElasticSearch objects. | oc get elasticsearch elasticsearch -oyaml
interconnectManifest | Override the Interconnect object creation. The AMQ Interconnect Operator watches for Interconnect objects. | oc get interconnect stf-default-interconnect -oyaml
prometheusManifest | Override the Prometheus object creation. The Prometheus Operator watches for Prometheus objects. | oc get prometheus stf-default -oyaml
servicemonitorManifest | Override the ServiceMonitor object creation. The Prometheus Operator watches for ServiceMonitor objects. | oc get servicemonitor stf-default -oyaml
smartgatewayCollectdMetricsManifest | Override the SmartGateway object creation for collectd metrics. The Smart Gateway Operator watches for SmartGateway objects. | oc get smartgateway stf-default-collectd-telemetry -oyaml
smartgatewayCollectdEventsManifest | Override the SmartGateway object creation for collectd events. The Smart Gateway Operator watches for SmartGateway objects. | oc get smartgateway stf-default-collectd-notification -oyaml
smartgatewayCeilometerEventsManifest | Override the SmartGateway object creation for Ceilometer events. The Smart Gateway Operator watches for SmartGateway objects. | oc get smartgateway stf-default-ceilometer-notification -oyaml
4.1.2. Overriding a managed manifest
Edit the ServiceTelemetry object and provide a parameter and manifest. For a list of available manifest override parameters, see Section 4.1, "Customizing the deployment". The default ServiceTelemetry object is stf-default. Use oc get servicetelemetry to list the available STF deployments.
TIP
The oc edit command loads the default system editor. To override the default editor, pass or set the environment variable EDITOR to the preferred editor. For example, EDITOR=nano oc edit servicetelemetry stf-default.
Procedure
1. Log in to Red Hat OpenShift Container Platform.
2. Change to the service-telemetry namespace:
3. Load the ServiceTelemetry object into an editor:
4. To modify the ServiceTelemetry object, provide a manifest override parameter and the contents of the manifest to write to OCP instead of the defaults provided by STF.
NOTE
The trailing pipe (|) after entering the manifest override parameter indicates that the value provided is multi-line.
oc project service-telemetry
oc edit servicetelemetry stf-default
$ oc edit servicetelemetry stf-default
apiVersion: infra.watch/v1alpha1
kind: ServiceTelemetry
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"infra.watch/v1alpha1","kind":"ServiceTelemetry","metadata":{"annotations":{},"name":"stf-default","namespace":"service-telemetry"},"spec":{"metricsEnabled":true}}
  creationTimestamp: "2020-04-14T20:29:42Z"
  generation: 1
  name: stf-default
  namespace: service-telemetry
  resourceVersion: "1949423"
  selfLink: /apis/infra.watch/v1alpha1/namespaces/service-telemetry/servicetelemetrys/stf-default
  uid: d058bc41-1bb0-49f5-9a8b-642f4b8adb95
spec:
  metricsEnabled: true
  smartgatewayCollectdMetricsManifest: | 1
    apiVersion: smartgateway.infra.watch/v2alpha1
    kind: SmartGateway
    metadata:
      name: stf-default-collectd-telemetry
      namespace: service-telemetry
    spec:
      amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/telemetry
      debug: true
      prefetch: 15000
      serviceType: metrics
      size: 1
      useTimestamp: true 2
status:
  conditions:
  - ansibleResult:
      changed: 0
      completion: 2020-04-14T20:32:19.079508
      failures: 0
      ok: 52
      skipped: 1
    lastTransitionTime: "2020-04-14T20:29:59Z"
    message: Awaiting next reconciliation
    reason: Successful
    status: "True"
    type: Running
1 Manifest override parameter is defined in the spec of the ServiceTelemetry object.
2 End of the manifest override content.
5. Save and close.
4.2. ALERTS
You create alert rules in Prometheus and alert routes in Alertmanager. Alert rules in Prometheus servers send alerts to an Alertmanager, which manages the alerts. Alertmanager can silence, inhibit, or aggregate alerts, and send notifications using email, on-call notification systems, or chat platforms.
To create an alert, complete the following tasks:
1. Create an alert rule in Prometheus. For more information, see Section 4.2.1, "Creating an alert rule in Prometheus".
2. Create an alert route in Alertmanager. For more information, see Section 4.2.3, "Creating an alert route in Alertmanager".
Additional resources
For more information about alerts or notifications with Prometheus and Alertmanager, see https://prometheus.io/docs/alerting/overview/
To view an example set of alerts that you can use with Service Telemetry Framework (STF), see https://github.com/infrawatch/service-telemetry-operator/tree/master/deploy/alerts
4.2.1. Creating an alert rule in Prometheus
Prometheus evaluates alert rules to trigger notifications. If the rule condition returns an empty result set, the condition is false. Otherwise, the rule is true and it triggers an alert.
Procedure
1. Log in to Red Hat OpenShift Container Platform.
2. Change to the service-telemetry namespace:
3. Create a PrometheusRule object that contains the alert rule. The Prometheus Operator loads the rule into Prometheus:
4. To verify that the rules have been loaded into Prometheus by the Operator, create a pod with access to curl:
5. Run curl to access the prometheus-operated service to return the rules loaded into memory:
oc project service-telemetry
oc apply -f - <<EOF
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  creationTimestamp: null
  labels:
    prometheus: stf-default
    role: alert-rules
  name: prometheus-alarm-rules
  namespace: service-telemetry
spec:
  groups:
    - name: ./openstack.rules
      rules:
        - alert: Metric Listener down
          expr: collectd_qpid_router_status < 1 # To change the rule, edit the value of the expr parameter.
EOF
oc run curl --generator=run-pod/v1 --image=radial/busyboxplus:curl -i --tty
[ root@curl:/ ]$ curl prometheus-operated:9090/api/v1/rules
{"status":"success","data":{"groups":[{"name":"./openstack.rules","file":"/etc/prometheus/rules/prometheus-stf-default-rulefiles-0/service-telemetry-prometheus-alarm-rules.yaml","rules":[{"name":"Metric Listener down","query":"collectd_qpid_router_status \u003c 1","duration":0,"labels":{},"annotations":{},"alerts":[],"health":"ok","type":"alerting"}],"interval":30}]}}
6. To verify that the output shows the rules loaded into the PrometheusRule object, for example, that the output contains the defined ./openstack.rules, exit from the pod:
[ root@curl:/ ]$ exit
7. Clean up the environment by deleting the curl pod:
$ oc delete pod curl
pod "curl" deleted
Additional resources
For more information on alerting, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md
4.2.2. Configuring custom alerts
You can add custom alerts to the PrometheusRule object that you created in Section 4.2.1, "Creating an alert rule in Prometheus".
Procedure
1. Use the oc edit command:
oc edit prometheusrules prometheus-alarm-rules
2. Edit the PrometheusRules manifest. A sketch of a possible edit follows this procedure.
3. Save and close.
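For illustration only, the following sketch shows a PrometheusRules manifest after a custom rule is added. The second alert name and expression are assumptions for demonstration, not values defined by this guide; any metric that Prometheus stores can be used in expr:

spec:
  groups:
    - name: ./openstack.rules
      rules:
        - alert: Metric Listener down
          expr: collectd_qpid_router_status < 1
        # Assumed example of a custom rule: fires when collectd uptime counters stop increasing for 10 minutes
        - alert: Collectd telemetry stalled
          expr: rate(collectd_uptime[10m]) == 0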
Additional resources
For more information about configuring alerting rules, see https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/
For more information about PrometheusRules objects, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md
4.2.3. Creating an alert route in Alertmanager
Use Alertmanager to deliver alerts to an external system, such as email, IRC, or other notification channel. The Prometheus Operator manages the Alertmanager configuration as a Red Hat OpenShift Container Platform (OCP) secret. By default, STF deploys a basic configuration that results in no receivers:
alertmanager.yaml: |-
  global:
    resolve_timeout: 5m
  route:
    group_by: ['job']
    group_wait: 30s
    group_interval: 5m
    repeat_interval: 12h
    receiver: 'null'
  receivers:
  - name: 'null'
To deploy a custom Alertmanager route with STF, you must pass an alertmanagerConfigManifest parameter to the Service Telemetry Operator, which results in an updated secret that the Prometheus Operator manages. For more information, see Section 4.1.2, "Overriding a managed manifest".
Procedure
1. Log in to Red Hat OpenShift Container Platform.
2. Change to the service-telemetry namespace:
3. Edit the ServiceTelemetry object for your STF deployment:
4. Add a new parameter, alertmanagerConfigManifest, and the Secret object contents to define the alertmanager.yaml configuration for Alertmanager:
NOTE
This loads the default template that is already managed by the Service Telemetry Operator. To validate that your changes are populating correctly, change a value, return the alertmanager-stf-default secret, and verify that the new value is loaded into memory, for example, changing the value global.resolve_timeout from 5m to 10m.
oc project service-telemetry
oc edit servicetelemetry stf-default
apiVersion: infra.watch/v1alpha1
kind: ServiceTelemetry
metadata:
  name: stf-default
  namespace: service-telemetry
spec:
  metricsEnabled: true
  alertmanagerConfigManifest: |
    apiVersion: v1
    kind: Secret
    metadata:
      name: alertmanager-stf-default
      namespace: service-telemetry
    type: Opaque
    stringData:
      alertmanager.yaml: |-
        global:
          resolve_timeout: 10m
        route:
          group_by: ['job']
          group_wait: 30s
          group_interval: 5m
          repeat_interval: 12h
          receiver: 'null'
        receivers:
        - name: 'null'
5. Verify that the configuration was applied to the secret:
6. To verify that the configuration has been loaded into Alertmanager, create a pod with access to curl:
7. Run curl against the alertmanager-operated service to retrieve the status and configYAML contents, and verify that the supplied configuration matches the configuration loaded into Alertmanager:
8. Verify that the configYAML field contains the expected changes. Exit from the pod:
9. To clean up the environment, delete the curl pod:
$ oc get secret alertmanager-stf-default -o go-template='{{index .data "alertmanager.yaml" | base64decode }}'
global:
  resolve_timeout: 10m
route:
  group_by: ['job']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h
  receiver: 'null'
receivers:
- name: 'null'
oc run curl --generator=run-pod/v1 --image=radial/busyboxplus:curl -i --tty
[ root@curl:/ ]$ curl alertmanager-operated:9093/api/v1/status
{"status":"success","data":{"configYAML":"global:\n  resolve_timeout: 10m\n  http_config: {}\n  smtp_hello: localhost\n  smtp_require_tls: true\n  pagerduty_url: https://events.pagerduty.com/v2/enqueue\n  hipchat_api_url: https://api.hipchat.com/\n  opsgenie_api_url: https://api.opsgenie.com/\n  wechat_api_url: https://qyapi.weixin.qq.com/cgi-bin/\n  victorops_api_url: https://alert.victorops.com/integrations/generic/20131114/alert/\nroute:\n  receiver: \"null\"\n  group_by:\n  - job\n  group_wait: 30s\n  group_interval: 5m\n  repeat_interval: 12h\nreceivers:\n- name: \"null\"\ntemplates: []\n"}}
[ root@curl:/ ]$ exit
$ oc delete pod curl
pod curl deleted
Additional resources
For more information about the Red Hat OpenShift Container Platform secret and the Prometheus operator, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md
4.3. HIGH AVAILABILITY
High availability is the ability of Service Telemetry Framework (STF) to rapidly recover from failures in its component services. Although Red Hat OpenShift Container Platform (OCP) restarts a failed pod if nodes are available to schedule the workload, this recovery process might take more than one minute, during which time events and metrics are lost. A high availability configuration includes multiple copies of STF components, reducing recovery time to approximately 2 seconds. To protect against failure of an OCP node, deploy STF to an OCP cluster with three or more nodes.
NOTE
STF is not yet a fully fault tolerant system. Delivery of metrics and events during the recovery period is not guaranteed.
Enabling high availability has the following effects:
Two AMQ Interconnect pods run instead of the default 1.
Three ElasticSearch pods run instead of the default 1.
Recovery time from a lost pod in either of these services reduces to approximately 2 seconds.
4.3.1. Configuring high availability
To configure STF for high availability, add highAvailabilityEnabled: true to the ServiceTelemetry object in OCP. You can set this parameter at installation time or, if you already deployed STF, complete the following steps:
Procedure
1. Log in to Red Hat OpenShift Container Platform.
2. Change to the service-telemetry namespace:
3. Use the oc command to edit the ServiceTelemetry object:
4. Add highAvailabilityEnabled: true to the spec section:
5. Save your changes and close the object.
oc project service-telemetry
$ oc edit ServiceTelemetry
spec:
  eventsEnabled: true
  metricsEnabled: true
  highAvailabilityEnabled: true
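After the Service Telemetry Operator reconciles the change, you can confirm the additional pods. The following quick check assumes the pod labels used elsewhere in this guide; expect two AMQ Interconnect pods and three ElasticSearch pods:

$ oc get pods -l application=stf-default-interconnect
$ oc get pods | grep elasticsearch-es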
4.4. DASHBOARDS
Use the third-party application Grafana to visualize system-level metrics gathered by collectd for each individual host node. For more information about configuring collectd, see Section 3.3, "Configuring Red Hat OpenStack Platform overcloud for Service Telemetry Framework".
4.4.1. Setting up Grafana to host the dashboard
Grafana is not included in the default Service Telemetry Framework (STF) deployment, so you must deploy the Grafana Operator from OperatorHub.io.
Procedure
1. Log in to Red Hat OpenShift Container Platform.
2. Change to the service-telemetry namespace:
3. Clone the dashboard repository:
4. Deploy the Grafana operator:
5. To verify that the operator launched successfully, run the oc get csv command. If the value of the PHASE column is Succeeded, the operator launched successfully:
6. Launch a Grafana instance:
7. Verify that the Grafana instance deployed:
8. Create the datasource and dashboard resources:
oc project service-telemetry
git clone https://github.com/infrawatch/dashboards
cd dashboards
oc create -f deploy/subscription.yaml
$ oc get csv
NAME                      DISPLAY            VERSION   REPLACES   PHASE
grafana-operator.v3.2.0   Grafana Operator   3.2.0                Succeeded
$ oc create -f deploy/grafana.yaml
$ oc get pod -l app=grafana
NAME                                  READY   STATUS    RESTARTS   AGE
grafana-deployment-7fc7848b56-sbkhv   1/1     Running   0          1m

oc create -f deploy/datasource.yaml -f deploy/rhos-dashboard.yaml
9. Verify that the resources installed correctly:
$ oc get grafanadashboards
NAME             AGE
rhos-dashboard   7d21h
$ oc get grafanadatasources
NAME                                  AGE
service-telemetry-grafanadatasource   1m
10. Navigate to https://<grafana-route-address> in a web browser. Use the oc get routes command to retrieve the Grafana route address:
oc get routes
11. To view the dashboard, click Dashboards and Manage.
Additional resources
For more information about enabling the OperatorHub.io Community Catalog Source, see Section 2.3.5, "Enabling the OperatorHub.io Community Catalog Source".
4.4.1.1. Viewing and editing queries
Procedure
1. Log in to Red Hat OpenShift Container Platform. To view and edit queries, log in as the admin user.
2. Change to the service-telemetry namespace:
oc project service-telemetry
3. To retrieve the default username and password, describe the Grafana object using oc describe:
oc describe grafana service-telemetry-grafana
4.4.2. The Grafana infrastructure dashboard
The infrastructure dashboard shows metrics for a single node at a time. Select a node from the upper left corner of the dashboard.
4.4.2.1. Top panels
Title | Unit | Description
Current Global Alerts | - | Current alerts fired by Prometheus
Recent Global Alerts | - | Recently fired alerts in 5m time steps
Status Panel | - | Node status: up, down, unavailable
Uptime | s/m/h/d/M/Y | Total operational time of node
CPU Cores | cores | Total number of cores
Memory | bytes | Total memory
Disk Size | bytes | Total storage size
Processes | processes | Total number of processes listed by type
Load Average | processes | Load average represents the average number of running and uninterruptible processes residing in the kernel execution queue
4.4.2.2. Networking panels
Panels that display the network interfaces of the node.
Panel | Unit | Description
Physical Interfaces Ingress Errors | errors | Total errors with incoming data
Physical Interfaces Egress Errors | errors | Total errors with outgoing data
Physical Interfaces Ingress Error Rates | errors/s | Rate of incoming data errors
Physical Interfaces Egress Error Rates | errors/s | Rate of outgoing data errors
Physical Interfaces Packets Ingress | pps | Incoming packets per second
Physical Interfaces Packets Egress | pps | Outgoing packets per second
Physical Interfaces Data Ingress | bytes/s | Incoming data rates
Physical Interfaces Data Egress | bytes/s | Outgoing data rates
Physical Interfaces Drop Rate Ingress | pps | Incoming packets drop rate
Physical Interfaces Drop Rate Egress | pps | Outgoing packets drop rate
4.4.2.3. CPU panels
Panels that display CPU usage of the node.
Panel | Unit | Description
Current CPU Usage | percent | Instantaneous usage at the time of the last query
Aggregate CPU Usage | percent | Average non-idle CPU activity of all cores on a node
Aggr CPU Usage by Type | percent | Shows time spent for each type of thread averaged across all cores
4.4.2.4. Memory panels
Panels that display memory usage on the node.
Panel | Unit | Description
Memory Used | percent | Amount of memory being used at time of last query
Huge Pages Used | hugepages | Number of hugepages being used
Memory
4.4.2.5. Disk/file system
Panels that display space used on disk.
Panel | Unit | Description | Notes
Disk Space Usage | percent | Total disk use at time of last query | -
Inode Usage | percent | Total inode use at time of last query | -
Aggregate Disk Space Usage | bytes | Total disk space used and reserved | Because this query relies on the df plugin, temporary file systems that do not necessarily use disk space are included in the results. The query tries to filter out most of these, but it might not be exhaustive.
Disk Traffic | bytes/s | Shows rates for both reading and writing | -
Disk Load | percent | Approximate percentage of total disk bandwidth being used. The weighted IO time series includes the backlog that might be accumulating. For more information, see the collectd disk plugin docs. | -
Operations/s | ops/s | Operations done per second | -
Average IO Operation Time | seconds | Average time each IO operation took to complete. This average is not accurate, see the collectd disk plugin docs. | -
4.5. CONFIGURING MULTIPLE CLOUDS
You can configure multiple Red Hat OpenStack Platform clouds to target a single instance of Service Telemetry Framework (STF):
1. Plan the AMQP address prefixes that you want to use for each cloud. For more information, see Section 4.5.1, "Planning AMQP address prefixes".
2. Deploy metrics and events consumer Smart Gateways for each cloud to listen on the corresponding address prefixes. For more information, see Section 4.5.2, "Deploying Smart Gateways".
3. Configure each cloud to send its metrics and events to STF on the correct address. For more information, see Section 4.5.3, "Creating the OpenStack environment file".
Figure 4.1. Two Red Hat OpenStack Platform clouds connect to STF
4.5.1. Planning AMQP address prefixes
By default, Red Hat OpenStack Platform nodes get data through two data collectors: collectd and Ceilometer. These components send telemetry data, or notifications, to the respective AMQP addresses, for example, collectd/telemetry, where STF Smart Gateways listen on those addresses for monitoring data.
To support multiple clouds, and to identify which cloud generated the monitoring data, configure each cloud to send data to a unique address. Prefix a cloud identifier to the second part of the address. The following list shows some example addresses and identifiers:
collectd/cloud1-telemetry
collectd/cloud1-notify
anycast/ceilometer/cloud1-event.sample
collectd/cloud2-telemetry
collectd/cloud2-notify
anycast/ceilometer/cloud2-event.sample
collectd/us-east-1-telemetry
collectd/us-west-3-telemetry
4.5.2. Deploying Smart Gateways
You must deploy a Smart Gateway for each of the data collection types for each cloud: one for collectd metrics, one for collectd events, and one for Ceilometer events. Configure each of the Smart Gateways to listen on the AMQP address that you define for the corresponding cloud.
When you deploy STF for the first time, Smart Gateway manifests are created that define the initial Smart Gateways for a single cloud. When deploying Smart Gateways for multiple cloud support, you deploy multiple Smart Gateways for each of the data collection types that handle the metrics and the events data for each cloud. The initial Smart Gateways act as a template to create additional Smart Gateways, along with any authentication information required to connect to the data stores.
Procedure
1. Log in to Red Hat OpenShift Container Platform.
2. Change to the service-telemetry namespace:
oc project service-telemetry
3. Use the initially deployed Smart Gateways as a template for additional Smart Gateways. List the currently deployed Smart Gateways with the oc get smartgateways command. For example, if you deployed STF with metricsEnabled: true and eventsEnabled: true, the following Smart Gateways are displayed in the output:
$ oc get smartgateways
NAME                                  AGE
stf-default-ceilometer-notification   14d
stf-default-collectd-notification     14d
stf-default-collectd-telemetry        14d
4. Retrieve the manifests for each Smart Gateway and store the contents in a temporary file, which you can modify later and use to create the new set of Smart Gateways:
truncate --size 0 /tmp/cloud1-smartgateways.yaml && \
for sg in $(oc get smartgateways -oname)
do
  echo --- >> /tmp/cloud1-smartgateways.yaml
  oc get $sg -oyaml --export >> /tmp/cloud1-smartgateways.yaml
done
5. Modify the Smart Gateway manifest in the /tmp/cloud1-smartgateways.yaml file. Adjust the metadata.name and spec.amqpUrl fields to include the cloud identifier from your schema. For more information, see Section 4.5.1, "Planning AMQP address prefixes". To view example Smart Gateway manifests, see Section 4.5.2.1, "Example manifests".
6. Deploy your new Smart Gateways:
oc apply -f /tmp/cloud1-smartgateways.yaml
7. Verify that each Smart Gateway is running. This can take several minutes depending on the number of Smart Gateways:
oc get po -l app=smart-gateway
4.5.2.1. Example manifests
IMPORTANT
The content in the following examples might be different from the file content in your deployment. Copy the manifests in your deployment.
Ensure that the name and amqpUrl parameters of each Smart Gateway match the names that you want to use for your clouds. For more information, see Section 4.5.1, "Planning AMQP address prefixes".
NOTE
Your output might have some additional metadata parameters that you can remove from the manifests that you load into OCP.
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-ceilometer-notification-cloud1 1
spec:
  amqpDataSource: ceilometer
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/anycast/ceilometer/cloud1-event.sample 2
  debug: false
  elasticPass: fkzfhghw
  elasticUrl: https://elasticsearch-es-http.service-telemetry.svc.cluster.local:9200
  elasticUser: elastic
  resetIndex: false
  serviceType: events
  size: 1
  tlsCaCert: /config/certs/ca.crt
  tlsClientCert: /config/certs/tls.crt
  tlsClientKey: /config/certs/tls.key
  tlsServerName: elasticsearch-es-http.service-telemetry.svc.cluster.local
  useBasicAuth: true
  useTls: true
---
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-collectd-notification-cloud1 3
spec:
  amqpDataSource: collectd
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/cloud1-notify 4
  debug: false
  elasticPass: fkzfhghw
  elasticUrl: https://elasticsearch-es-http.service-telemetry.svc.cluster.local:9200
  elasticUser: elastic
  resetIndex: false
  serviceType: events
  size: 1
  tlsCaCert: /config/certs/ca.crt
  tlsClientCert: /config/certs/tls.crt
  tlsClientKey: /config/certs/tls.key
  tlsServerName: elasticsearch-es-http.service-telemetry.svc.cluster.local
  useBasicAuth: true
  useTls: true
---
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-collectd-telemetry-cloud1 5
spec:
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/cloud1-telemetry 6
  debug: false
  prefetch: 15000
  serviceType: metrics
  size: 1
  useTimestamp: true
1 Name for Ceilometer notifications for cloud1.
2 AMQP address for Ceilometer notifications for cloud1.
3 Name for collectd notifications for cloud1.
4 AMQP address for collectd notifications for cloud1.
5 Name for collectd telemetry for cloud1.
6 AMQP address for collectd telemetry for cloud1.
4.5.3. Creating the OpenStack environment file
To label traffic according to the cloud of origin, you must create a configuration with cloud-specific instance names. Create an stf-connectors.yaml file and adjust the values of CeilometerQdrEventsConfig and CollectdAmqpInstances to match the AMQP address prefix scheme. For more information, see Section 4.5.1, "Planning AMQP address prefixes".
WARNING
Remove enable-stf.yaml and ceilometer-write-qdr.yaml environment file references from your overcloud deployment. This configuration is redundant and results in duplicate information being sent from each cloud node.
Procedure
1. Create the stf-connectors.yaml file and modify it to match the AMQP address that you want for this cloud deployment:
resource_registry:
  OS::TripleO::Services::Collectd: /usr/share/openstack-tripleo-heat-templates/deployment/metrics/collectd-container-puppet.yaml
  OS::TripleO::Services::MetricsQdr: /usr/share/openstack-tripleo-heat-templates/deployment/metrics/qdr-container-puppet.yaml
  OS::TripleO::Services::CeilometerAgentCentral: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-central-container-puppet.yaml
  OS::TripleO::Services::CeilometerAgentNotification: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-notification-container-puppet.yaml
  OS::TripleO::Services::CeilometerAgentIpmi: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-ipmi-container-puppet.yaml
  OS::TripleO::Services::ComputeCeilometerAgent: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-compute-container-puppet.yaml
  OS::TripleO::Services::Redis: /usr/share/openstack-tripleo-heat-templates/deployment/database/redis-pacemaker-puppet.yaml

parameter_defaults:
  EnableSTF: true

  EventPipelinePublishers: []
  CeilometerEnablePanko: false
  CeilometerQdrPublishEvents: true
  CeilometerQdrEventsConfig:
    driver: amqp
    topic: cloud1-event 1

  CollectdConnectionType: amqp1
  CollectdAmqpInterval: 5
  CollectdDefaultPollingInterval: 5

  CollectdAmqpInstances:
    cloud1-notify: 2
      notify: true
      format: JSON
      presettle: false
    cloud1-telemetry: 3
      format: JSON
      presettle: true

  MetricsQdrAddresses:
    - prefix: collectd
      distribution: multicast
    - prefix: anycast/ceilometer
      distribution: multicast

  MetricsQdrSSLProfiles:
    - name: sslProfile

  MetricsQdrConnectors:
    - host: stf-default-interconnect-5671-service-telemetry.apps.infra.watch 4
      port: 443
      role: edge
      verifyHostname: false
      sslProfile: sslProfile
1 Define the topic for Ceilometer events. This value is the address format of anycast/ceilometer/cloud1-event.sample.
2 Define the topic for collectd events. This value is the format of collectd/cloud1-notify.
3 Define the topic for collectd metrics. This value is the format of collectd/cloud1-telemetry.
4 Adjust the MetricsQdrConnectors host to the address of the STF route.
Ensure that the naming convention in the stf-connectors.yaml file aligns with the spec.amqpUrl field in the Smart Gateway configuration. For example, configure the CeilometerQdrEventsConfig.topic field to a value of cloud1-event.
2. Save the file in a directory for custom environment files, for example, /home/stack/custom_templates/.
3. Source the authentication file:
[stack@undercloud-0 ~]$ source stackrc
(undercloud) [stack@undercloud-0 ~]$
4. Include the stf-connectors.yaml file in the overcloud deployment command, along with any other environment files relevant to your environment:
(undercloud) [stack@undercloud-0 ~]$ openstack overcloud deploy --templates /usr/share/openstack-tripleo-heat-templates -e /home/stack/custom_templates/stf-connectors.yaml
Additional resources
For information about validating the deployment, see Section 3.3.3, "Validating client-side installation".
4.5.4. Querying metrics data from multiple clouds
Data stored in Prometheus has a service label attached according to the Smart Gateway it was scraped from. You can use this label to query data from a specific cloud.
To query data from a specific cloud, use a Prometheus promql query that matches the associated service label, for example: collectd_uptime{service="stf-default-collectd-telemetry-cloud1-smartgateway"}.
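As an illustration, assuming two clouds named cloud1 and cloud2 under the address scheme in Section 4.5.1, "Planning AMQP address prefixes", queries such as the following select the per-cloud uptime series:

collectd_uptime{service="stf-default-collectd-telemetry-cloud1-smartgateway"}
collectd_uptime{service="stf-default-collectd-telemetry-cloud2-smartgateway"}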
4.6. EPHEMERAL STORAGE
Use ephemeral storage to run Service Telemetry Framework (STF) without persistently storing data in your Red Hat OpenShift Container Platform (OCP) cluster. Ephemeral storage is not recommended in a production environment due to the volatility of the data in the platform when operating correctly and as designed. For example, restarting a pod or rescheduling the workload to another node results in the loss of any local data written since the pod started.
If you enable ephemeral storage in STF, the Service Telemetry Operator does not add the relevant storage sections to the data storage components manifests.
4.6.1. Configuring ephemeral storage
To configure STF for ephemeral storage, add storageEphemeralEnabled: true to the ServiceTelemetry object in OCP. You can add storageEphemeralEnabled: true at installation time or, if you already deployed STF, complete the following steps:
Procedure
1. Log in to Red Hat OpenShift Container Platform.
2. Change to the service-telemetry namespace:
3. Edit the ServiceTelemetry object:
4. Add the storageEphemeralEnabled: true parameter to the spec section:
5. Save your changes and close the object.
oc project service-telemetry
$ oc edit ServiceTelemetry stf-default
spec:
  eventsEnabled: true
  metricsEnabled: true
  storageEphemeralEnabled: true
APPENDIX A. COLLECTD PLUG-INS
This section contains a complete list of collectd plug-ins and configurations for Red Hat OpenStack Platform 16.0.
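The parameters in this list are hiera keys. You can typically set them in the ExtraConfig section of a Red Hat OpenStack Platform environment file; the following minimal sketch shows the pattern, with an assumed plug-in choice and assumed values for illustration:

parameter_defaults:
  ExtraConfig:
    # Assumed example: tune the virt plug-in keys listed in this appendix
    collectd::plugin::virt::connection: "qemu:///system"
    collectd::plugin::virt::interval: 30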
collectd-aggregation
collectd::plugin::aggregation::aggregators
collectd::plugin::aggregation::interval
collectd-amqp1
collectd-apache
collectd::plugin::apache::instances (ex: localhost => url => 'http://localhost/mod_status?auto')
collectd::plugin::apache::interval
collectd-apcups
collectd-battery
collectd::plugin::battery::values_percentage
collectd::plugin::battery::report_degraded
collectd::plugin::battery::query_state_fs
collectd::plugin::battery::interval
collectd-ceph
collectd::plugin::ceph::daemons
collectd::plugin::ceph::longrunavglatency
collectd::plugin::ceph::convertspecialmetrictypes
collectd-cgroups
collectd::plugin::cgroups::ignore_selected
collectd::plugin::cgroups::interval
collectd-conntrack
None
collectd-contextswitch
collectd::plugin::contextswitch::interval
collectd-cpu
collectd::plugin::cpu::reportbystate
collectd::plugin::cpu::reportbycpu
collectd::plugin::cpu::valuespercentage
collectd::plugin::cpu::reportnumcpu
collectd::plugin::cpu::reportgueststate
collectd::plugin::cpu::subtractgueststate
collectd::plugin::cpu::interval
collectd-cpufreq
None
collectd-cpusleep
collectd-csv
collectd::plugin::csv::datadir
collectd::plugin::csv::storerates
collectd::plugin::csv::interval
collectd-df
collectd::plugin::df::devices
collectd::plugin::df::fstypes
collectd::plugin::df::ignoreselected
collectd::plugin::df::mountpoints
collectd::plugin::df::reportbydevice
collectd::plugin::df::reportinodes
collectd::plugin::df::reportreserved
collectd::plugin::df::valuesabsolute
collectd::plugin::df::valuespercentage
collectd::plugin::df::interval
collectd-disk
collectd::plugin::disk::disks
collectd::plugin::disk::ignoreselected
collectd::plugin::disk::udevnameattr
collectd::plugin::disk::interval
collectd-entropy
collectd::plugin::entropy::interval
collectd-ethstat
collectd::plugin::ethstat::interfaces
collectd::plugin::ethstat::maps
collectd::plugin::ethstat::mappedonly
collectd::plugin::ethstat::interval
collectd-exec
collectd::plugin::exec::commands
collectd::plugin::exec::commands_defaults
collectd::plugin::exec::globals
collectd::plugin::exec::interval
collectd-fhcount
collectd::plugin::fhcount::valuesabsolute
collectd::plugin::fhcount::valuespercentage
collectd::plugin::fhcount::interval
collectd-filecount
collectd::plugin::filecount::directories
collectd::plugin::filecount::interval
collectd-fscache
None
collectd-hddtemp
collectd::plugin::hddtemp::host
collectd::plugin::hddtemp::port
collectd::plugin::hddtemp::interval
collectd-hugepages
collectd::plugin::hugepages::report_per_node_hp
collectd::plugin::hugepages::report_root_hp
collectd::plugin::hugepages::values_pages
collectd::plugin::hugepages::values_bytes
collectd::plugin::hugepages::values_percentage
collectd::plugin::hugepages::interval
collectd-intel_rdt
collectd-interface
collectd::plugin::interface::interfaces
collectd::plugin::interface::ignoreselected
collectd::plugin::interface::reportinactive
collectd::plugin::interface::interval
collectd-ipc
None
collectd-ipmi
collectd::plugin::ipmi::ignore_selected
collectd::plugin::ipmi::notify_sensor_add
collectd::plugin::ipmi::notify_sensor_remove
collectd::plugin::ipmi::notify_sensor_not_present
collectd::plugin::ipmi::sensors
collectd::plugin::ipmi::interval
collectd-irq
collectd::plugin::irq::irqs
collectd::plugin::irq::ignoreselected
collectd::plugin::irq::interval
collectd-load
collectd::plugin::load::report_relative
collectd::plugin::load::interval
collectd-logfile
collectd::plugin::logfile::log_level
collectd::plugin::logfile::log_file
collectd::plugin::logfile::log_timestamp
collectd::plugin::logfile::print_severity
collectd::plugin::logfile::interval
collectd-madwifi
collectd-mbmon
collectd-md
collectd-memcached
collectd::plugin::memcached::instances
collectd::plugin::memcached::interval
collectd-memory
collectd::plugin::memory::valuesabsolute
collectd::plugin::memory::valuespercentage
collectd::plugin::memory::interval
collectd-multimeter
collectd-mysql
collectd::plugin::mysql::interval
collectd-netlink
collectd::plugin::netlink::interfaces
collectd::plugin::netlink::verboseinterfaces
collectd::plugin::netlink::qdiscs
collectd::plugin::netlink::classes
collectd::plugin::netlink::filters
collectd::plugin::netlink::ignoreselected
collectd::plugin::netlink::interval
collectd-network
collectd::plugin::network::timetolive
collectd::plugin::network::maxpacketsize
collectd::plugin::network::forward
collectd::plugin::network::reportstats
collectd::plugin::network::listeners
collectd::plugin::network::servers
collectd::plugin::network::interval
collectd-nfs
collectd::plugin::nfs::interval
collectd-ntpd
collectd::plugin::ntpd::host
collectd::plugin::ntpd::port
collectd::plugin::ntpd::reverselookups
collectd::plugin::ntpd::includeunitid
collectd::plugin::ntpd::interval
collectd-numa
None
collectd-olsrd
collectd-openvpn
collectd::plugin::openvpn::statusfile
collectd::plugin::openvpn::improvednamingschema
collectd::plugin::openvpn::collectcompression
collectd::plugin::openvpn::collectindividualusers
collectd::plugin::openvpn::collectusercount
collectd::plugin::openvpn::interval
collectd-ovs_events
collectd::plugin::ovs_events::address
collectd::plugin::ovs_events::dispatch
collectd::plugin::ovs_events::interfaces
collectd::plugin::ovs_events::send_notification
collectd::plugin::ovs_events::$port
collectd::plugin::ovs_events::socket
collectd-ovs_stats
collectd::plugin::ovs_stats::address
collectd::plugin::ovs_stats::bridges
collectd::plugin::ovs_stats::port
collectd::plugin::ovs_stats::socket
collectd-ping
collectd::plugin::ping::hosts
collectd::plugin::ping::timeout
collectd::plugin::ping::ttl
collectd::plugin::ping::source_address
collectd::plugin::ping::device
collectd::plugin::ping::max_missed
collectd::plugin::ping::size
collectd::plugin::ping::interval
collectd-powerdns
collectd::plugin::powerdns::interval
collectd::plugin::powerdns::servers
collectd::plugin::powerdns::recursors
collectd::plugin::powerdns::local_socket
collectd::plugin::powerdns::interval
collectd-processes
collectd::plugin::processes::processes
collectd::plugin::processes::process_matches
collectd::plugin::processes::collect_context_switch
collectd::plugin::processes::collect_file_descriptor
collectd::plugin::processes::collect_memory_maps
collectd::plugin::powerdns::interval
collectd-protocols
collectd::plugin::protocols::ignoreselected
collectd::plugin::protocols::values
collectd-python
collectd-serial
collectd-smart
collectd::plugin::smart::disks
collectd::plugin::smart::ignoreselected
collectd::plugin::smart::interval
collectd-snmp_agent
collectd-statsd
collectd::plugin::statsd::host
collectd::plugin::statsd::port
collectd::plugin::statsd::deletecounters
collectd::plugin::statsd::deletetimers
collectd::plugin::statsd::deletegauges
collectd::plugin::statsd::deletesets
collectd::plugin::statsd::countersum
collectd::plugin::statsd::timerpercentile
collectd::plugin::statsd::timerlower
collectd::plugin::statsd::timerupper
collectd::plugin::statsd::timersum
collectd::plugin::statsd::timercount
collectd::plugin::statsd::interval
collectd-swap
collectd::plugin::swap::reportbydevice
collectd::plugin::swap::reportbytes
collectd::plugin::swap::valuesabsolute
collectd::plugin::swap::valuespercentage
collectd::plugin::swap::reportio
collectd::plugin::swap::interval
collectd-syslog
collectd::plugin::syslog::log_level
collectd::plugin::syslog::notify_level
collectd::plugin::syslog::interval
collectd-table
collectd::plugin::table::tables
collectd::plugin::table::interval
collectd-tail
collectd::plugin::tail::files
collectd::plugin::tail::interval
collectd-tail_csv
collectd::plugin::tail_csv::metrics
collectd::plugin::tail_csv::files
collectd-tcpconns
collectd::plugin::tcpconns::localports
collectd::plugin::tcpconns::remoteports
collectd::plugin::tcpconns::listening
collectd::plugin::tcpconns::allportssummary
collectd::plugin::tcpconns::interval
collectd-ted
collectd-thermal
collectd::plugin::thermal::devices
collectd::plugin::thermal::ignoreselected
collectd::plugin::thermal::interval
collectd-threshold
collectd::plugin::threshold::types
collectd::plugin::threshold::plugins
collectd::plugin::threshold::hosts
collectd::plugin::threshold::interval
collectd-turbostat
collectd::plugin::turbostat::core_c_states
collectd::plugin::turbostat::package_c_states
collectd::plugin::turbostat::system_management_interrupt
collectd::plugin::turbostat::digital_temperature_sensor
collectd::plugin::turbostat::tcc_activation_temp
collectd::plugin::turbostat::running_average_power_limit
collectd::plugin::turbostat::logical_core_names
collectd-unixsock
collectd-uptime
collectd::plugin::uptime::interval
collectd-users
collectd::plugin::users::interval
collectd-uuid
collectd::plugin::uuid::uuid_file
collectd::plugin::uuid::interval
collectd-virt
collectd::plugin::virt::connection
collectd::plugin::virt::refresh_interval
collectd::plugin::virt::domain
collectd::plugin::virt::block_device
collectd::plugin::virt::interface_device
collectd::plugin::virt::ignore_selected
collectd::plugin::virt::hostname_format
collectd::plugin::virt::interface_format
collectd::plugin::virt::extra_stats
collectd::plugin::virt::interval
collectd-vmem
collectd::plugin::vmem::verbose
collectd::plugin::vmem::interval
collectd-vserver
collectd-wireless
collectd-write_graphite
collectd::plugin::write_graphite::carbons
collectd::plugin::write_graphite::carbon_defaults
collectd::plugin::write_graphite::globals
collectd-write_kafka
collectd::plugin::write_kafka::kafka_host
collectd::plugin::write_kafka::kafka_port
collectd::plugin::write_kafka::kafka_hosts
collectd::plugin::write_kafka::topics
collectd-write_log
collectd::plugin::write_log::format
collectd-zfs_arc
None
CHAPTER 3 COMPLETING THESERVICE TELEMETRY FRAMEWORK CONFIGURATION
31 CONNECTING RED HAT OPENSTACK PLATFORM TOSERVICE TELEMETRY FRAMEWORK
To collect metrics events or both and to send them to the Service Telemetry Framework (STF)storage domain you must configure the Red Hat OpenStack Platform overcloud to enable datacollection and transport
To deploy data collection and transport to STF on Red Hat OpenStack Platform cloud nodes thatemploy routed L3 domains such as distributed compute node (DCN) or spine-leaf see Section 32ldquoDeploying to non-standard network topologiesrdquo
32 DEPLOYING TO NON-STANDARD NETWORK TOPOLOGIES
If your nodes are on a separate network from the default InternalApi network you must makeconfiguration adjustments so that AMQ Interconnect can transport data to theService Telemetry Framework (STF) server instance This scenario is typical in a spine-leaf or aDCN topology For more information about DCN configuration see the Spine Leaf Networkingguide
If you use STF with Red Hat OpenStack Platform 160 and plan to monitor your Ceph Block orObject storage nodes you must make configuration changes that are similar to the configurationchanges that you make to the spine-leaf and DCN network configuration To monitor Ceph nodesuse the CephStorageExtraConfig parameter to define which network interface to load into theAMQ Interconnect and collectd configuration files
Similarly you must specify BlockStorageExtraConfig and ObjectStorageExtraConfig parameters ifyour environment uses Block and Object storage roles
The deployment of a spine-leaf topology involves creating roles and networks then assigningthose networks to the available roles When you configure data collection and transport for STF foran Red Hat OpenStack Platform deployment the default network for roles is InternalApi For CephBlock and Object storage roles the default network is Storage Because a spine-leaf configurationcan result in different networks being assigned to different Leaf groupings and those names aretypically unique additional configuration is required in the parameter_defaults section of theRed Hat OpenStack Platform environment files
Procedure
1 Document which networks are available for each of the Leaf roles For examples of networkname definitions see Creating a network data file in the Spine Leaf Networking guide Formore information about the creation of the Leaf groupings (roles) and assignment of thenetworks to those groupings see Creating a roles data file in the Spine Leaf Networkingguide
CephStorageExtraConfig tripleoprofilebasemetricscollectdamqp_host hiera(storage) tripleoprofilebasemetricsqdrlistener_addr hiera(storage) tripleoprofilebaseceilometeragentnotificationnotifier_host_addr hiera(storage)
CHAPTER 3 COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION
19
2 Add the following configuration example to the ExtraConfig section for each of the leafroles In this example internal_api_subnet is the value defined in the name_lowerparameter of your network definition (with _subnet appended to the name for Leaf 0) andis the network to which the ComputeLeaf0 leaf role is connected In this case the networkidentification of 0 corresponds to the Compute role for leaf 0 and represents a value thatis different from the default internal API network nameFor the ComputeLeaf0 leaf role specify extra configuration to perform a hiera lookup todetermine which network interface for a particular network to assign to the collectd AMQPhost parameter Perform the same configuration for the AMQ Interconnect listeneraddress parameter
Additional leaf roles typically replace _subnet with _leafN where N represents a uniqueindentifier for the leaf
This example configuration is on a CephStorage leaf role
33 CONFIGURING RED HAT OPENSTACK PLATFORM OVERCLOUDFOR SERVICE TELEMETRY FRAMEWORK
To configure the Red Hat OpenStack Platform overcloud you must configure the data collectionapplications and the data transport to STF and deploy the overcloud
To configure the Red Hat OpenStack Platform overcloud complete the following tasks
1 Section 331 ldquoRetrieving the AMQ Interconnect route addressrdquo
2 Section 332 ldquoConfiguring the STF connection for the overcloudrdquo
3 Section 333 ldquoValidating client-side installationrdquo
331 Retrieving the AMQ Interconnect route address
When you configure the Red Hat OpenStack Platform overcloud for STF you must provide theAMQ Interconnect route address in the STF connection file
Procedure
1 Log in to your Red Hat OpenShift Container Platform (OCP) environment
2 In the service-telemetry project retrieve the AMQ Interconnect route address
ComputeLeaf0ExtraConfigrsaquo tripleoprofilebasemetricscollectdamqp_host hiera(internal_api_subnet)rsaquo tripleoprofilebasemetricsqdrlistener_addr hiera(internal_api_subnet)
ComputeLeaf1ExtraConfigrsaquo tripleoprofilebasemetricscollectdamqp_host hiera(internal_api_leaf1)rsaquo tripleoprofilebasemetricsqdrlistener_addr hiera(internal_api_leaf1)
CephStorageLeaf0ExtraConfigrsaquo tripleoprofilebasemetricscollectdamqp_host hiera(storage_subnet)rsaquo tripleoprofilebasemetricsqdrlistener_addr hiera(storage_subnet)
Red Hat OpenStack Platform 160 Service Telemetry Framework
20
NOTE
If your STF installation differs from the documentation ensure that youretrieve the correct AMQ Interconnect route address
332 Configuring the STF connection for the overcloud
To configure the STF connection you must create a file that contains the connection configurationof the AMQ Interconnect for the overcloud to the STF deployment Enable the collection of eventsand storage of the events in STF and deploy the overcloud
Procedure
1 Log in to the Red Hat OpenStack Platform undercloud as the stack user
2 Create a configuration file called stf-connectorsyaml in the homestack directory
IMPORTANT
The Service Telemetry Operator simplifies the deployment of all dataingestion and data storage components for single cloud deployments Toshare the data storage domain with multiple clouds see Section 45ldquoConfiguring multiple cloudsrdquo
3 In the stf-connectorsyaml file configure the MetricsQdrConnectors address to connectthe AMQ Interconnect on the overcloud to the STF deployment
Add the CeilometerQdrPublishEvents true parameter to enable collection andtransport of Ceilometer events to STF
Replace the host parameter with the value of HOSTPORT that you retrieved inSection 331 ldquoRetrieving the AMQ Interconnect route addressrdquo
4 Add the following files to your Red Hat OpenStack Platform director deployment to setupcollectd and AMQ Interconnect
the stf-connectorsyaml environment file
the enable-stfyaml file that ensures that the environment is being used during theovercloud deployment
$ oc get routes -ogo-template= range items printf sn spechost end | grep -5671stf-default-interconnect-5671-service-telemetryappsinfrawatch
parameter_defaults EventPipelinePublishers [] CeilometerQdrPublishEvents true MetricsQdrConnectors - host stf-default-interconnect-5671-service-telemetryappsinfrawatch port 443 role edge sslProfile sslProfile verifyHostname false
CHAPTER 3 COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION
21
the ceilometer-write-qdryaml file that ensures that Ceilometer telemetry is sent to STF
5 Deploy the Red Hat OpenStack Platform overcloud
333 Validating client-side installation
To validate data collection from the STF storage domain query the data sources for delivereddata To validate individual nodes in the Red Hat OpenStack Platform deployment connect to theconsole using SSH
Procedure
1 Log in to an overcloud node for example controller-0
2 Ensure that metrics_qdr container is running on the node
3 Return the internal network address on which AMQ Interconnect is running for example 17217144 listening on port 5666
4 Return a list of connections to the local AMQ Interconnect
There are four connections:

Outbound connection to STF

Inbound connection from collectd

Inbound connection from ceilometer

Inbound connection from our qdstat client

The outbound STF connection is provided to the MetricsQdrConnectors host parameter and is the route for the STF storage domain. The other hosts are internal network addresses of the client connections to this AMQ Interconnect.

5. To ensure that messages are being delivered, list the links, and view the _edge address in the deliv column for delivery of messages:
$ sudo podman exec -it metrics_qdr qdstat --bus=172.17.1.44:5666 --links

Router Links
  type      dir  conn id  id    peer  class   addr                  phs  cap  pri  undel  unsett  deliv    presett  psdrop  acc  rej  rel  mod  delay  rate
  ===========================================================================================================================================================
  endpoint  out  1        5           local   _edge                      250  0    0      0       2979926  2979924  0       0    0    2    0    0      0
  endpoint  in   1        6                                              250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  in   1        7                                              250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  out  1        8                                              250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  in   1        9                                              250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  out  1        10                                             250  0    0      0       911      911      0       0    0    0    0    911    0
  endpoint  in   1        11                                             250  0    0      0       0        911      0       0    0    0    0    0      0
  endpoint  out  12       32          local   temp.lSY6Mcicol4J2Kp       250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  in   16       41                                             250  0    0      0       2979924  2979924  0       0    0    0    0    0      0
  endpoint  in   912      1834        mobile  $management           0    250  0    0      0       1        0        0       1    0    0    0    0      0
  endpoint  out  912      1835        local   temp.9Ok2resI9tmt+CT       250  0    0      0       0        0        0       0    0    0    0    0      0
6. To list the addresses from Red Hat OpenStack Platform nodes to STF, connect to OCP to get the AMQ Interconnect pod name and list the connections. List the available AMQ Interconnect pods:

$ oc get pods -l application=stf-default-interconnect

NAME                                        READY   STATUS    RESTARTS   AGE
stf-default-interconnect-7458fd4d69-bgzfb   1/1     Running   0          6d21h

7. Connect to the pod and run the qdstat --connections command to list the known connections:

$ oc exec -it stf-default-interconnect-7458fd4d69-bgzfb -- qdstat --connections

2020-04-21 18:25:47.243852 UTC
stf-default-interconnect-7458fd4d69-bgzfb

Connections
  id  host               container                                                               role    dir  security                                authentication  tenant  last dlv      uptime
  ======================================================================================================================================================================================================
  1   10.129.2.21:43062  rcv[stf-default-collectd-telemetry-smartgateway-79c967c8f7-kq4qv]       normal  in   no-security                             anonymous-user          000:00:00:00  006:21:50:25
  2   10.130.0.52:55754  rcv[stf-default-ceilometer-notification-smartgateway-6675df547-mbjk5]   normal  in   no-security                             anonymous-user          000:21:25:57  006:21:49:36
  3   10.130.0.51:43110  rcv[stf-default-collectd-notification-smartgateway-698c87fbb7-f28v6]    normal  in   no-security                             anonymous-user          000:21:36:53  006:21:49:09
  22  10.128.0.1:51948   Router.ceph-0.redhat.local                                              edge    in   TLSv1/SSLv3(DHE-RSA-AES256-GCM-SHA384)  anonymous-user          000:00:00:03  000:22:08:43
  23  10.128.0.1:51950   Router.compute-0.redhat.local                                           edge    in   TLSv1/SSLv3(DHE-RSA-AES256-GCM-SHA384)  anonymous-user          000:00:00:03  000:22:08:43
  24  10.128.0.1:52082   Router.controller-0.redhat.local                                        edge    in   TLSv1/SSLv3(DHE-RSA-AES256-GCM-SHA384)  anonymous-user          000:00:00:00  000:22:08:34
  27  127.0.0.1:42202    c2f541c1-4c97-4b37-a189-a396c08fb079                                    normal  in   no-security                             no-auth                 000:00:00:00  000:00:00:00

In this example, there are three edge connections from the Red Hat OpenStack Platform nodes, with connection id 22, 23, and 24.

8. To view the number of messages delivered by the network, use each address with the oc exec command:
$ oc exec -it stf-default-interconnect-7458fd4d69-bgzfb -- qdstat --address

2020-04-21 18:20:10.293258 UTC
stf-default-interconnect-7458fd4d69-bgzfb

Router Addresses
  class   addr                             phs  distrib    pri  local  remote  in         out        thru  fallback
  ====================================================================================================================
  mobile  anycast/ceilometer/event.sample  0    balanced   -    1      0       1,553      1,553      0     0
  mobile  collectd/notify                  0    multicast  -    1      0       10         10         0     0
  mobile  collectd/telemetry               0    multicast  -    1      0       7,798,049  7,798,049  0     0
CHAPTER 4. ADVANCED FEATURES

The following optional features can provide additional functionality to the Service Telemetry Framework (STF):
Customizing the deployment. For more information, see Section 4.1, "Customizing the deployment".

Alerts. For more information, see Section 4.2, "Alerts".

High availability. For more information, see Section 4.3, "High availability".

Dashboards. For more information, see Section 4.4, "Dashboards".

Multiple clouds. For more information, see Section 4.5, "Configuring multiple clouds".

Ephemeral storage. For more information, see Section 4.6, "Ephemeral storage".
4.1. CUSTOMIZING THE DEPLOYMENT

The Service Telemetry Operator watches for a ServiceTelemetry manifest to load into Red Hat OpenShift Container Platform (OCP). The Operator then creates other objects in memory, which results in the dependent Operators creating the workloads they are responsible for managing.
WARNING

When you override the manifest, you must provide the entire manifest contents, including object names or namespaces. There is no dynamic parameter substitution when you override a manifest.

To override a manifest successfully with Service Telemetry Framework (STF), deploy a default environment using the core options only. For more information about the core options, see Section 2.3.10, "Creating a ServiceTelemetry object in OCP". When you deploy STF, use the oc get command to retrieve the default deployed manifest. When you use a manifest that was originally generated by Service Telemetry Operator, the manifest is compatible with the other objects that are managed by the Operators.

For example, when the metricsEnabled: true parameter is configured in the ServiceTelemetry manifest, the Service Telemetry Operator requests components for metrics retrieval and storage using the default manifests. In some cases, you might want to override the default manifest. For more information, see Section 4.1.1, "Manifest override parameters".
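For reference, a minimal ServiceTelemetry manifest of the kind that the Service Telemetry Operator watches for looks like the following; this matches the core-options object that Section 2.3.10, "Creating a ServiceTelemetry object in OCP" creates:

apiVersion: infra.watch/v1alpha1
kind: ServiceTelemetry
metadata:
  name: stf-default
  namespace: service-telemetry
spec:
  eventsEnabled: true
  metricsEnabled: true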
4.1.1. Manifest override parameters

This table describes the available parameters that you can use to override a manifest, along with the corresponding retrieval commands.

Table 4.1. Manifest override parameters
Override parameter: alertmanagerManifest
Description: Override the Alertmanager object creation. The Prometheus Operator watches for Alertmanager objects.
Retrieval command: oc get alertmanager stf-default -oyaml

Override parameter: alertmanagerConfigManifest
Description: Override the Secret that contains the Alertmanager configuration. The Prometheus Operator uses a secret named alertmanager-{{ alertmanager-name }}, for example, stf-default, to provide the alertmanager.yaml configuration to Alertmanager.
Retrieval command: oc get secret alertmanager-stf-default -oyaml

Override parameter: elasticsearchManifest
Description: Override the ElasticSearch object creation. The Elastic Cloud on Kubernetes Operator watches for ElasticSearch objects.
Retrieval command: oc get elasticsearch elasticsearch -oyaml

Override parameter: interconnectManifest
Description: Override the Interconnect object creation. The AMQ Interconnect Operator watches for Interconnect objects.
Retrieval command: oc get interconnect stf-default-interconnect -oyaml

Override parameter: prometheusManifest
Description: Override the Prometheus object creation. The Prometheus Operator watches for Prometheus objects.
Retrieval command: oc get prometheus stf-default -oyaml

Override parameter: servicemonitorManifest
Description: Override the ServiceMonitor object creation. The Prometheus Operator watches for ServiceMonitor objects.
Retrieval command: oc get servicemonitor stf-default -oyaml

Override parameter: smartgatewayCollectdMetricsManifest
Description: Override the SmartGateway object creation for collectd metrics. The Smart Gateway Operator watches for SmartGateway objects.
Retrieval command: oc get smartgateway stf-default-collectd-telemetry -oyaml

Override parameter: smartgatewayCollectdEventsManifest
Description: Override the SmartGateway object creation for collectd events. The Smart Gateway Operator watches for SmartGateway objects.
Retrieval command: oc get smartgateway stf-default-collectd-notification -oyaml

Override parameter: smartgatewayCeilometerEventsManifest
Description: Override the SmartGateway object creation for Ceilometer events. The Smart Gateway Operator watches for SmartGateway objects.
Retrieval command: oc get smartgateway stf-default-ceilometer-notification -oyaml
4.1.2. Overriding a managed manifest

Edit the ServiceTelemetry object and provide a parameter and manifest. For a list of available manifest override parameters, see Section 4.1, "Customizing the deployment". The default ServiceTelemetry object is stf-default. Use oc get servicetelemetry to list the available STF deployments.
TIP

The oc edit command loads the default system editor. To override the default editor, pass or set the environment variable EDITOR to the preferred editor, for example, EDITOR=nano oc edit servicetelemetry stf-default.
Procedure

1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

oc project service-telemetry

3. Load the ServiceTelemetry object into an editor:

oc edit servicetelemetry stf-default

4. To modify the ServiceTelemetry object, provide a manifest override parameter and the contents of the manifest to write to OCP instead of the defaults provided by STF.

NOTE

The trailing pipe (|) after entering the manifest override parameter indicates that the value provided is multi-line.
apiVersion: infra.watch/v1alpha1
kind: ServiceTelemetry
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"infra.watch/v1alpha1","kind":"ServiceTelemetry","metadata":{"annotations":{},"name":"stf-default","namespace":"service-telemetry"},"spec":{"metricsEnabled":true}}
  creationTimestamp: "2020-04-14T20:29:42Z"
  generation: 1
  name: stf-default
  namespace: service-telemetry
  resourceVersion: "1949423"
  selfLink: /apis/infra.watch/v1alpha1/namespaces/service-telemetry/servicetelemetrys/stf-default
  uid: d058bc41-1bb0-49f5-9a8b-642f4b8adb95
spec:
  metricsEnabled: true
  smartgatewayCollectdMetricsManifest: | 1
    apiVersion: smartgateway.infra.watch/v2alpha1
    kind: SmartGateway
    metadata:
      name: stf-default-collectd-telemetry
      namespace: service-telemetry
    spec:
      amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/telemetry
      debug: true
      prefetch: 15000
      serviceType: metrics
      size: 1
      useTimestamp: true 2
status:
  conditions:
  - ansibleResult:
      changed: 0
      completion: 2020-04-14T20:32:19.079508
      failures: 0
      ok: 52
      skipped: 1
    lastTransitionTime: "2020-04-14T20:29:59Z"
    message: Awaiting next reconciliation
    reason: Successful
    status: "True"
    type: Running
1  Manifest override parameter is defined in the spec of the ServiceTelemetry object.

2  End of the manifest override content.
5. Save and close.
4.2. ALERTS

You create alert rules in Prometheus and alert routes in Alertmanager. Alert rules in Prometheus servers send alerts to an Alertmanager, which manages the alerts. Alertmanager can silence, inhibit, or aggregate alerts, and send notifications using email, on-call notification systems, or chat platforms.

To create an alert, complete the following tasks:

1. Create an alert rule in Prometheus. For more information, see Section 4.2.1, "Creating an alert rule in Prometheus".
2. Create an alert route in Alertmanager. For more information, see Section 4.2.3, "Creating an alert route in Alertmanager".

Additional resources

For more information about alerts or notifications with Prometheus and Alertmanager, see https://prometheus.io/docs/alerting/overview/

To view an example set of alerts that you can use with Service Telemetry Framework (STF), see https://github.com/infrawatch/service-telemetry-operator/tree/master/deploy/alerts
4.2.1. Creating an alert rule in Prometheus

Prometheus evaluates alert rules to trigger notifications. If the rule condition returns an empty result set, the condition is false. Otherwise, the rule is true and it triggers an alert.
Procedure
1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

oc project service-telemetry

3. Create a PrometheusRule object that contains the alert rule. The Prometheus Operator loads the rule into Prometheus:

oc apply -f - <<EOF
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  creationTimestamp: null
  labels:
    prometheus: stf-default
    role: alert-rules
  name: prometheus-alarm-rules
  namespace: service-telemetry
spec:
  groups:
    - name: ./openstack.rules
      rules:
        - alert: Metric Listener down
          expr: collectd_qpid_router_status < 1 # To change the rule, edit the value of the expr parameter
EOF

4. To verify that the rules have been loaded into Prometheus by the Operator, create a pod with access to curl:

oc run curl --generator=run-pod/v1 --image=radial/busyboxplus:curl -i --tty

5. Run curl to access the prometheus-operated service to return the rules loaded into memory:

[ root@curl:/ ]$ curl prometheus-operated:9090/api/v1/rules

{"status":"success","data":{"groups":[{"name":"./openstack.rules","file":"/etc/prometheus/rules/prometheus-stf-default-rulefiles-0/service-telemetry-prometheus-alarm-rules.yaml","rules":[{"name":"Metric Listener down","query":"collectd_qpid_router_status \u003c 1","duration":0,"labels":{},"annotations":{},"alerts":[],"health":"ok","type":"alerting"}],"interval":30}]}}
6. To verify that the output shows the rules loaded into the PrometheusRule object, for example, that the output contains the defined ./openstack.rules, exit from the pod:

[ root@curl:/ ]$ exit

7. Clean up the environment by deleting the curl pod:

$ oc delete pod curl

pod "curl" deleted

Additional resources

For more information on alerting, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md
4.2.2. Configuring custom alerts

You can add custom alerts to the PrometheusRule object that you created in Section 4.2.1, "Creating an alert rule in Prometheus".

Procedure

1. Use the oc edit command:

oc edit prometheusrules prometheus-alarm-rules

2. Edit the PrometheusRules manifest.

3. Save and close.
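The following is a minimal sketch of what an edited manifest might look like with one additional rule appended to the ./openstack.rules group. The second alert name and expression are hypothetical placeholders, not part of the default STF deployment; collectd_qpid_router_status and collectd_uptime are the only metrics shown elsewhere in this guide:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    prometheus: stf-default
    role: alert-rules
  name: prometheus-alarm-rules
  namespace: service-telemetry
spec:
  groups:
    - name: ./openstack.rules
      rules:
        - alert: Metric Listener down
          expr: collectd_qpid_router_status < 1
        - alert: Collectd telemetry stalled        # hypothetical custom rule
          expr: rate(collectd_uptime[10m]) == 0    # fires when uptime samples stop increasing
          for: 10m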
Additional resources

For more information about configuring alerting rules, see https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/

For more information about PrometheusRules objects, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md
4.2.3. Creating an alert route in Alertmanager

Use Alertmanager to deliver alerts to an external system, such as email, IRC, or other notification channel. The Prometheus Operator manages the Alertmanager configuration as a Red Hat OpenShift Container Platform (OCP) secret. STF by default deploys a basic configuration that results in no receivers:
alertmanager.yaml: |-
  global:
    resolve_timeout: 5m
  route:
    group_by: ['job']
    group_wait: 30s
    group_interval: 5m
    repeat_interval: 12h
    receiver: 'null'
  receivers:
  - name: 'null'
To deploy a custom Alertmanager route with STF, you must pass an alertmanagerConfigManifest parameter to the Service Telemetry Operator, which results in an updated secret managed by the Prometheus Operator. For more information, see Section 4.1.2, "Overriding a managed manifest".
Procedure
1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

oc project service-telemetry

3. Edit the ServiceTelemetry object for your STF deployment:

oc edit servicetelemetry stf-default

4. Add a new parameter, alertmanagerConfigManifest, and the Secret object contents to define the alertmanager.yaml configuration for Alertmanager:

NOTE

This loads the default template that is already managed by the Service Telemetry Operator. To validate that the changes are populating correctly, change a value, return the alertmanager-stf-default secret, and verify that the new value is loaded into memory, for example, changing the value global.resolve_timeout from 5m to 10m.
apiVersion: infra.watch/v1alpha1
kind: ServiceTelemetry
metadata:
  name: stf-default
  namespace: service-telemetry
spec:
  metricsEnabled: true
  alertmanagerConfigManifest: |
    apiVersion: v1
    kind: Secret
    metadata:
      name: 'alertmanager-stf-default'
      namespace: 'service-telemetry'
    type: Opaque
    stringData:
      alertmanager.yaml: |-
        global:
          resolve_timeout: 10m
        route:
          group_by: ['job']
          group_wait: 30s
          group_interval: 5m
          repeat_interval: 12h
          receiver: 'null'
        receivers:
        - name: 'null'
5. Verify that the configuration was applied to the secret:

$ oc get secret alertmanager-stf-default -o go-template='{{index .data "alertmanager.yaml" | base64decode }}'

global:
  resolve_timeout: 10m
route:
  group_by: ['job']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h
  receiver: 'null'
receivers:
- name: 'null'

6. To verify that the configuration has been loaded into Alertmanager, create a pod with access to curl:

oc run curl --generator=run-pod/v1 --image=radial/busyboxplus:curl -i --tty

7. Run curl against the alertmanager-operated service to retrieve the status and configYAML contents, and review that the supplied configuration matches the configuration loaded into Alertmanager:

[ root@curl:/ ]$ curl alertmanager-operated:9093/api/v1/status

{"status":"success","data":{"configYAML":"global:\n  resolve_timeout: 10m\n  http_config: {}\n  smtp_hello: localhost\n  smtp_require_tls: true\n  pagerduty_url: https://events.pagerduty.com/v2/enqueue\n  hipchat_api_url: https://api.hipchat.com/\n  opsgenie_api_url: https://api.opsgenie.com/\n  wechat_api_url: https://qyapi.weixin.qq.com/cgi-bin/\n  victorops_api_url: https://alert.victorops.com/integrations/generic/20131114/alert/\nroute:\n  receiver: 'null'\n  group_by:\n  - job\n  group_wait: 30s\n  group_interval: 5m\n  repeat_interval: 12h\nreceivers:\n- name: 'null'\ntemplates: []\n"

8. Verify that the configYAML field contains the expected changes. Exit from the pod:

[ root@curl:/ ]$ exit

9. To clean up the environment, delete the curl pod:

$ oc delete pod curl

pod "curl" deleted
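Because the default configuration defines only a 'null' receiver, alerts are accepted but delivered nowhere. As an illustration only, the following alertmanager.yaml fragment sketches what a route with an email receiver might look like; the receiver name, SMTP relay, credentials, and addresses are hypothetical placeholders that you must replace with values from your own environment:

alertmanager.yaml: |-
  global:
    resolve_timeout: 10m
  route:
    group_by: ['job']
    group_wait: 30s
    group_interval: 5m
    repeat_interval: 12h
    receiver: 'email-ops'
  receivers:
  - name: 'email-ops'
    email_configs:
    - to: 'ops@example.com'                # hypothetical recipient
      from: 'alertmanager@example.com'     # hypothetical sender
      smarthost: 'smtp.example.com:587'    # hypothetical SMTP relay
      auth_username: 'alertmanager'
      auth_password: 'examplepassword'
      require_tls: true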
Additional resources

For more information about the Red Hat OpenShift Container Platform secret and the Prometheus operator, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md
4.3. HIGH AVAILABILITY

High availability is the ability of Service Telemetry Framework (STF) to rapidly recover from failures in its component services. Although Red Hat OpenShift Container Platform (OCP) restarts a failed pod if nodes are available to schedule the workload, this recovery process might take more than one minute, during which time events and metrics are lost. A high availability configuration includes multiple copies of STF components, which reduces recovery time to approximately 2 seconds. To protect against failure of an OCP node, deploy STF to an OCP cluster with three or more nodes.

NOTE

STF is not yet a fully fault tolerant system. Delivery of metrics and events during the recovery period is not guaranteed.

Enabling high availability has the following effects:

Two AMQ Interconnect pods run instead of the default 1.

Three ElasticSearch pods run instead of the default 1.

Recovery time from a lost pod in either of these services reduces to approximately 2 seconds.
4.3.1. Configuring high availability

To configure STF for high availability, add highAvailabilityEnabled: true to the ServiceTelemetry object in OCP. You can set this parameter at installation time or, if you already deployed STF, complete the following steps:

Procedure

1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

oc project service-telemetry

3. Use the oc command to edit the ServiceTelemetry object:

$ oc edit ServiceTelemetry

4. Add highAvailabilityEnabled: true to the spec section:

spec:
  eventsEnabled: true
  metricsEnabled: true
  highAvailabilityEnabled: true

5. Save your changes and close the object.
4.4. DASHBOARDS

Use the third-party application Grafana to visualize system-level metrics gathered by collectd for each individual host node. For more information about configuring collectd, see Section 3.3, "Configuring Red Hat OpenStack Platform overcloud for Service Telemetry Framework".
4.4.1. Setting up Grafana to host the dashboard

Grafana is not included in the default Service Telemetry Framework (STF) deployment, so you must deploy the Grafana Operator from OperatorHub.io.

Procedure

1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

oc project service-telemetry

3. Clone the dashboard repository:

git clone https://github.com/infrawatch/dashboards
cd dashboards

4. Deploy the Grafana operator:

oc create -f deploy/subscription.yaml

5. To verify that the operator launched successfully, run the oc get csv command. If the value of the PHASE column is Succeeded, the operator launched successfully:

$ oc get csv

NAME                      DISPLAY            VERSION   REPLACES   PHASE
grafana-operator.v3.2.0   Grafana Operator   3.2.0                Succeeded

6. Launch a Grafana instance:

$ oc create -f deploy/grafana.yaml

7. Verify that the Grafana instance deployed:

$ oc get pod -l app=grafana

NAME                                  READY   STATUS    RESTARTS   AGE
grafana-deployment-7fc7848b56-sbkhv   1/1     Running   0          1m

8. Create the datasource and dashboard resources:

oc create -f deploy/datasource.yaml -f deploy/rhos-dashboard.yaml
9. Verify that the resources installed correctly:

$ oc get grafanadashboards

NAME             AGE
rhos-dashboard   7d21h

$ oc get grafanadatasources

NAME                                  AGE
service-telemetry-grafanadatasource   1m

10. Navigate to https://<grafana-route-address> in a web browser. Use the oc get routes command to retrieve the Grafana route address:

oc get routes

11. To view the dashboard, click Dashboards and Manage.

Additional resources

For more information about enabling the OperatorHub.io catalog source, see Section 2.3.5, "Enabling the OperatorHub.io Community Catalog Source".
4.4.1.1. Viewing and editing queries

Procedure

1. Log in to Red Hat OpenShift Container Platform. To view and edit queries, log in as the admin user.

2. Change to the service-telemetry namespace:

oc project service-telemetry

3. To retrieve the default username and password, describe the Grafana object using oc describe:

oc describe grafana service-telemetry-grafana
4.4.2. The Grafana infrastructure dashboard

The infrastructure dashboard shows metrics for a single node at a time. Select a node from the upper left corner of the dashboard.

4.4.2.1. Top panels

Title                  Unit         Description
Current Global Alerts  -            Current alerts fired by Prometheus
Recent Global Alerts   -            Recently fired alerts in 5m time steps
Status Panel           -            Node status: up, down, unavailable
Uptime                 s/m/h/d/M/Y  Total operational time of node
CPU Cores              cores        Total number of cores
Memory                 bytes        Total memory
Disk Size              bytes        Total storage size
Processes              processes    Total number of processes listed by type
Load Average           processes    Load average represents the average number of running and uninterruptible processes residing in the kernel execution queue
4.4.2.2. Networking panels

Panels that display the network interfaces of the node.

Panel                                    Unit      Description
Physical Interfaces Ingress Errors       errors    Total errors with incoming data
Physical Interfaces Egress Errors        errors    Total errors with outgoing data
Physical Interfaces Ingress Error Rates  errors/s  Rate of incoming data errors
Physical Interfaces Egress Error Rates   errors/s  Rate of outgoing data errors
Physical Interfaces Packets Ingress      pps       Incoming packets per second
Physical Interfaces Packets Egress       pps       Outgoing packets per second
Physical Interfaces Data Ingress         bytes/s   Incoming data rates
Physical Interfaces Data Egress          bytes/s   Outgoing data rates
Physical Interfaces Drop Rate Ingress    pps       Incoming packets drop rate
Physical Interfaces Drop Rate Egress     pps       Outgoing packets drop rate
4.4.2.3. CPU panels

Panels that display CPU usage of the node.

Panel                    Unit     Description
Current CPU Usage        percent  Instantaneous usage at the time of the last query
Aggregate CPU Usage      percent  Average non-idle CPU activity of all cores on a node
Aggr. CPU Usage by Type  percent  Shows time spent for each type of thread averaged across all cores
4.4.2.4. Memory panels

Panels that display memory usage on the node.

Panel            Unit       Description
Memory Used      percent    Amount of memory being used at time of last query
Huge Pages Used  hugepages  Number of hugepages being used
Memory
4.4.2.5. Disk/file system

Panels that display space used on disk.

Panel                       Unit     Description                                Notes
Disk Space Usage            percent  Total disk use at time of last query
Inode Usage                 percent  Total inode use at time of last query
Aggregate Disk Space Usage  bytes    Total disk space used and reserved        Because this query relies on the df plugin, temporary file systems that do not necessarily use disk space are included in the results. The query tries to filter out most of these, but it might not be exhaustive.
Disk Traffic                bytes/s  Shows rates for both reading and writing
Disk Load                   percent  Approximate percentage of total disk bandwidth being used. The weighted I/O time series includes the backlog that might be accumulating. For more information, see the collectd disk plugin docs.
Operations/s                ops/s    Operations done per second
Average I/O Operation Time  seconds  Average time each I/O operation took to complete. This average is not accurate, see the collectd disk plugin docs.
4.5. CONFIGURING MULTIPLE CLOUDS

You can configure multiple Red Hat OpenStack Platform clouds to target a single instance of Service Telemetry Framework (STF):

1. Plan the AMQP address prefixes that you want to use for each cloud. For more information, see Section 4.5.1, "Planning AMQP address prefixes".

2. Deploy metrics and events consumer Smart Gateways for each cloud to listen on the corresponding address prefixes. For more information, see Section 4.5.2, "Deploying Smart Gateways".

3. Configure each cloud to send its metrics and events to STF on the correct address. For more information, see Section 4.5.3, "Creating the OpenStack environment file".

Figure 4.1. Two Red Hat OpenStack Platform clouds connect to STF

4.5.1. Planning AMQP address prefixes

By default, Red Hat OpenStack Platform nodes get data through two data collectors: collectd and Ceilometer. These components send telemetry data, or notifications, to the respective AMQP addresses, for example, collectd/telemetry, where STF Smart Gateways listen on those addresses for monitoring data.

To support multiple clouds, and to identify which cloud generated the monitoring data, configure each cloud to send data to a unique address. Prefix a cloud identifier to the second part of the address. The following list shows some example addresses and identifiers:
collectd/cloud1-telemetry

collectd/cloud1-notify

anycast/ceilometer/cloud1-event.sample

collectd/cloud2-telemetry

collectd/cloud2-notify

anycast/ceilometer/cloud2-event.sample

collectd/us-east-1-telemetry

collectd/us-west-3-telemetry
4.5.2. Deploying Smart Gateways

You must deploy a Smart Gateway for each of the data collection types for each cloud: one for collectd metrics, one for collectd events, and one for Ceilometer events. Configure each of the Smart Gateways to listen on the AMQP address that you define for the corresponding cloud.

When you deploy STF for the first time, Smart Gateway manifests are created that define the initial Smart Gateways for a single cloud. When deploying Smart Gateways for multiple cloud support, you deploy multiple Smart Gateways for each of the data collection types that handle the metrics and the events data for each cloud. The initial Smart Gateways act as a template to create additional Smart Gateways, along with any authentication information required to connect to the data stores.

Procedure

1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

oc project service-telemetry

3. Use the initially deployed Smart Gateways as a template for additional Smart Gateways. List the currently deployed Smart Gateways with the oc get smartgateways command. For example, if you deployed STF with metricsEnabled: true and eventsEnabled: true, the following Smart Gateways are displayed in the output:

$ oc get smartgateways

NAME                                  AGE
stf-default-ceilometer-notification   14d
stf-default-collectd-notification     14d
stf-default-collectd-telemetry        14d

4. Retrieve the manifests for each Smart Gateway and store the contents in a temporary file, which you can modify later and use to create the new set of Smart Gateways:

truncate --size 0 /tmp/cloud1-smartgateways.yaml && for sg in $(oc get smartgateways -oname)
do
  echo --- >> /tmp/cloud1-smartgateways.yaml
  oc get $sg -oyaml --export >> /tmp/cloud1-smartgateways.yaml
done

5. Modify the Smart Gateway manifest in the /tmp/cloud1-smartgateways.yaml file. Adjust the metadata.name and spec.amqpUrl fields to include the cloud identifier from your schema. For more information, see Section 4.5.1, "Planning AMQP address prefixes". To view example Smart Gateway manifests, see Section 4.5.2.1, "Example manifests".

6. Deploy your new Smart Gateways:

oc apply -f /tmp/cloud1-smartgateways.yaml

7. Verify that each Smart Gateway is running. This can take several minutes depending on the number of Smart Gateways:

oc get po -l app=smart-gateway
4.5.2.1. Example manifests

IMPORTANT

The content in the following examples might be different to the file content in your deployment. Copy the manifests in your deployment.

Ensure that the name and amqpUrl parameters of each Smart Gateway match the names that you want to use for your clouds. For more information, see Section 4.5.1, "Planning AMQP address prefixes".

NOTE

Your output may have some additional metadata parameters that you can remove from the manifests that you load into OCP.

apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-ceilometer-notification-cloud1 1
spec:
  amqpDataSource: ceilometer
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/anycast/ceilometer/cloud1-event.sample 2
  debug: false
  elasticPass: fkzfhghw
  elasticUrl: https://elasticsearch-es-http.service-telemetry.svc.cluster.local:9200
  elasticUser: elastic
  resetIndex: false
  serviceType: events
  size: 1
  tlsCaCert: /config/certs/ca.crt
  tlsClientCert: /config/certs/tls.crt
  tlsClientKey: /config/certs/tls.key
  tlsServerName: elasticsearch-es-http.service-telemetry.svc.cluster.local
  useBasicAuth: true
  useTls: true
---
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-collectd-notification-cloud1 3
spec:
  amqpDataSource: collectd
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/cloud1-notify 4
  debug: false
  elasticPass: fkzfhghw
  elasticUrl: https://elasticsearch-es-http.service-telemetry.svc.cluster.local:9200
  elasticUser: elastic
  resetIndex: false
  serviceType: events
  size: 1
  tlsCaCert: /config/certs/ca.crt
  tlsClientCert: /config/certs/tls.crt
  tlsClientKey: /config/certs/tls.key
  tlsServerName: elasticsearch-es-http.service-telemetry.svc.cluster.local
  useBasicAuth: true
  useTls: true
---
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-collectd-telemetry-cloud1 5
spec:
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/cloud1-telemetry 6
  debug: false
  prefetch: 15000
  serviceType: metrics
  size: 1
  useTimestamp: true
1  Name for Ceilometer notifications for cloud1

2  AMQP address for Ceilometer notifications for cloud1

3  Name for collectd notifications for cloud1

4  AMQP address for collectd notifications for cloud1

5  Name for collectd telemetry for cloud1

6  AMQP address for collectd telemetry for cloud1
4.5.3. Creating the OpenStack environment file

To label traffic according to the cloud of origin, you must create a configuration with cloud-specific instance names. Create an stf-connectors.yaml file and adjust the values of CeilometerQdrEventsConfig and CollectdAmqpInstances to match the AMQP address prefix scheme. For more information, see Section 4.5.1, "Planning AMQP address prefixes".
WARNING

Remove the enable-stf.yaml and ceilometer-write-qdr.yaml environment file references from your overcloud deployment. This configuration is redundant and results in duplicate information being sent from each cloud node.

Procedure

1. Create the stf-connectors.yaml file and modify it to match the AMQP address that you want for this cloud deployment:
resource_registry:
  OS::TripleO::Services::Collectd: /usr/share/openstack-tripleo-heat-templates/deployment/metrics/collectd-container-puppet.yaml
  OS::TripleO::Services::MetricsQdr: /usr/share/openstack-tripleo-heat-templates/deployment/metrics/qdr-container-puppet.yaml
  OS::TripleO::Services::CeilometerAgentCentral: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-central-container-puppet.yaml
  OS::TripleO::Services::CeilometerAgentNotification: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-notification-container-puppet.yaml
  OS::TripleO::Services::CeilometerAgentIpmi: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-ipmi-container-puppet.yaml
  OS::TripleO::Services::ComputeCeilometerAgent: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-compute-container-puppet.yaml
  OS::TripleO::Services::Redis: /usr/share/openstack-tripleo-heat-templates/deployment/database/redis-pacemaker-puppet.yaml

parameter_defaults:
    EnableSTF: true

    EventPipelinePublishers: []
    CeilometerEnablePanko: false
    CeilometerQdrPublishEvents: true
    CeilometerQdrEventsConfig:
        driver: amqp
        topic: cloud1-event 1

    CollectdConnectionType: amqp1
    CollectdAmqpInterval: 5
    CollectdDefaultPollingInterval: 5

    CollectdAmqpInstances:
        cloud1-notify: 2
            notify: true
            format: JSON
            presettle: false
        cloud1-telemetry: 3
            format: JSON
            presettle: true

    MetricsQdrAddresses:
        - prefix: collectd
          distribution: multicast
        - prefix: anycast/ceilometer
          distribution: multicast

    MetricsQdrSSLProfiles:
        - name: sslProfile

    MetricsQdrConnectors:
        - host: stf-default-interconnect-5671-service-telemetry.apps.infra.watch 4
          port: 443
          role: edge
          verifyHostname: false
          sslProfile: sslProfile
1  Define the topic for Ceilometer events. This value is the address format of anycast/ceilometer/cloud1-event.sample.

2  Define the topic for collectd events. This value is the format of collectd/cloud1-notify.

3  Define the topic for collectd metrics. This value is the format of collectd/cloud1-telemetry.

4  Adjust the MetricsQdrConnectors host to the address of the STF route.

Ensure that the naming convention in the stf-connectors.yaml file aligns with the spec.amqpUrl field in the Smart Gateway configuration. For example, configure the CeilometerQdrEventsConfig.topic field to a value of cloud1-event.
2. Save the file in a directory for custom environment files, for example, /home/stack/custom_templates/.

3. Source the authentication file:

[stack@undercloud-0 ~]$ source stackrc

(undercloud) [stack@undercloud-0 ~]$

4. Include the stf-connectors.yaml file in the overcloud deployment command, along with any other environment files relevant to your environment:

(undercloud) [stack@undercloud-0 ~]$ openstack overcloud deploy --templates /usr/share/openstack-tripleo-heat-templates \
  -e /home/stack/custom_templates/stf-connectors.yaml
Additional resources

For information about validating the deployment, see Section 3.3.3, "Validating client-side installation".

4.5.4. Querying metrics data from multiple clouds

Data stored in Prometheus has a service label attached according to the Smart Gateway it was scraped from. You can use this label to query data from a specific cloud.

To query data from a specific cloud, use a Prometheus promql query that matches the associated service label, for example: collectd_uptime{service="stf-default-collectd-telemetry-cloud1-smartgateway"}.
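As a minimal sketch of running such a query by hand, you can reuse the curl pod pattern from Section 4.2.1, "Creating an alert rule in Prometheus" and call the Prometheus HTTP API against the prometheus-operated service; the service label value here is only an example, substitute your own cloud's Smart Gateway name:

oc run curl --generator=run-pod/v1 --image=radial/busyboxplus:curl -i --tty

# -G with --data-urlencode URL-encodes the promql expression for the query API
[ root@curl:/ ]$ curl -G 'prometheus-operated:9090/api/v1/query' \
  --data-urlencode 'query=collectd_uptime{service="stf-default-collectd-telemetry-cloud1-smartgateway"}'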
4.6. EPHEMERAL STORAGE

Use ephemeral storage to run Service Telemetry Framework (STF) without persistently storing data in your Red Hat OpenShift Container Platform (OCP) cluster. Ephemeral storage is not recommended in a production environment due to the volatility of the data in the platform, even when operating correctly and as designed. For example, restarting a pod or rescheduling the workload to another node results in the loss of any local data written since the pod started.

If you enable ephemeral storage in STF, the Service Telemetry Operator does not add the relevant storage sections to the data storage components manifests.

4.6.1. Configuring ephemeral storage

To configure STF for ephemeral storage, add storageEphemeralEnabled: true to the ServiceTelemetry object in OCP. You can add storageEphemeralEnabled: true at installation time or, if you already deployed STF, complete the following steps:
Procedure

1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

oc project service-telemetry

3. Edit the ServiceTelemetry object:

$ oc edit ServiceTelemetry stf-default

4. Add the storageEphemeralEnabled: true parameter to the spec section:

spec:
  eventsEnabled: true
  metricsEnabled: true
  storageEphemeralEnabled: true

5. Save your changes and close the object.
APPENDIX A. COLLECTD PLUG-INS

This section contains a complete list of collectd plug-ins and configurations for Red Hat OpenStack Platform 16.0.
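The parameters in this list are hiera keys from the puppet-collectd module. As a minimal sketch, assuming that you apply them through director hiera overrides in the same way as the ExtraConfig examples in Section 3.2, "Deploying to non-standard network topologies", an environment file entry might look like the following; the chosen plug-in and value are illustrative only:

parameter_defaults:
  ExtraConfig:
    # Example only: poll the virt plug-in every 30 seconds
    collectd::plugin::virt::interval: 30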
collectd-aggregation

collectd::plugin::aggregation::aggregators

collectd::plugin::aggregation::interval

collectd-amqp1

collectd-apache

collectd::plugin::apache::instances (ex: localhost ⇒ url ⇒ "http://localhost/mod_status?auto")

collectd::plugin::apache::interval

collectd-apcups

collectd-battery

collectd::plugin::battery::values_percentage

collectd::plugin::battery::report_degraded

collectd::plugin::battery::query_state_fs

collectd::plugin::battery::interval

collectd-ceph

collectd::plugin::ceph::daemons

collectd::plugin::ceph::longrunavglatency

collectd::plugin::ceph::convertspecialmetrictypes

collectd-cgroups

collectd::plugin::cgroups::ignore_selected

collectd::plugin::cgroups::interval

collectd-conntrack

None

collectd-contextswitch

collectd::plugin::contextswitch::interval

collectd-cpu

collectd::plugin::cpu::reportbystate

collectd::plugin::cpu::reportbycpu

collectd::plugin::cpu::valuespercentage

collectd::plugin::cpu::reportnumcpu

collectd::plugin::cpu::reportgueststate

collectd::plugin::cpu::subtractgueststate

collectd::plugin::cpu::interval

collectd-cpufreq

None

collectd-cpusleep

collectd-csv

collectd::plugin::csv::datadir

collectd::plugin::csv::storerates

collectd::plugin::csv::interval

collectd-df

collectd::plugin::df::devices

collectd::plugin::df::fstypes

collectd::plugin::df::ignoreselected

collectd::plugin::df::mountpoints

collectd::plugin::df::reportbydevice

collectd::plugin::df::reportinodes

collectd::plugin::df::reportreserved

collectd::plugin::df::valuesabsolute

collectd::plugin::df::valuespercentage

collectd::plugin::df::interval

collectd-disk

collectd::plugin::disk::disks

collectd::plugin::disk::ignoreselected

collectd::plugin::disk::udevnameattr

collectd::plugin::disk::interval

collectd-entropy

collectd::plugin::entropy::interval

collectd-ethstat

collectd::plugin::ethstat::interfaces

collectd::plugin::ethstat::maps

collectd::plugin::ethstat::mappedonly

collectd::plugin::ethstat::interval

collectd-exec

collectd::plugin::exec::commands

collectd::plugin::exec::commands_defaults

collectd::plugin::exec::globals

collectd::plugin::exec::interval

collectd-fhcount

collectd::plugin::fhcount::valuesabsolute

collectd::plugin::fhcount::valuespercentage

collectd::plugin::fhcount::interval

collectd-filecount

collectd::plugin::filecount::directories

collectd::plugin::filecount::interval

collectd-fscache

None

collectd-hddtemp

collectd::plugin::hddtemp::host

collectd::plugin::hddtemp::port

collectd::plugin::hddtemp::interval

collectd-hugepages

collectd::plugin::hugepages::report_per_node_hp

collectd::plugin::hugepages::report_root_hp

collectd::plugin::hugepages::values_pages

collectd::plugin::hugepages::values_bytes

collectd::plugin::hugepages::values_percentage

collectd::plugin::hugepages::interval

collectd-intel_rdt

collectd-interface

collectd::plugin::interface::interfaces

collectd::plugin::interface::ignoreselected

collectd::plugin::interface::reportinactive

collectd::plugin::interface::interval

collectd-ipc

None

collectd-ipmi

collectd::plugin::ipmi::ignore_selected

collectd::plugin::ipmi::notify_sensor_add

collectd::plugin::ipmi::notify_sensor_remove

collectd::plugin::ipmi::notify_sensor_not_present

collectd::plugin::ipmi::sensors

collectd::plugin::ipmi::interval

collectd-irq

collectd::plugin::irq::irqs

collectd::plugin::irq::ignoreselected

collectd::plugin::irq::interval

collectd-load

collectd::plugin::load::report_relative

collectd::plugin::load::interval

collectd-logfile

collectd::plugin::logfile::log_level

collectd::plugin::logfile::log_file

collectd::plugin::logfile::log_timestamp

collectd::plugin::logfile::print_severity

collectd::plugin::logfile::interval

collectd-madwifi

collectd-mbmon

collectd-md

collectd-memcached

collectd::plugin::memcached::instances

collectd::plugin::memcached::interval

collectd-memory

collectd::plugin::memory::valuesabsolute

collectd::plugin::memory::valuespercentage

collectd::plugin::memory::interval

collectd-multimeter

collectd-mysql

collectd::plugin::mysql::interval

collectd-netlink

collectd::plugin::netlink::interfaces

collectd::plugin::netlink::verboseinterfaces

collectd::plugin::netlink::qdiscs

collectd::plugin::netlink::classes

collectd::plugin::netlink::filters

collectd::plugin::netlink::ignoreselected

collectd::plugin::netlink::interval

collectd-network

collectd::plugin::network::timetolive

collectd::plugin::network::maxpacketsize

collectd::plugin::network::forward

collectd::plugin::network::reportstats

collectd::plugin::network::listeners

collectd::plugin::network::servers

collectd::plugin::network::interval

collectd-nfs

collectd::plugin::nfs::interval

collectd-ntpd

collectd::plugin::ntpd::host

collectd::plugin::ntpd::port

collectd::plugin::ntpd::reverselookups

collectd::plugin::ntpd::includeunitid

collectd::plugin::ntpd::interval

collectd-numa

None

collectd-olsrd

collectd-openvpn

collectd::plugin::openvpn::statusfile

collectd::plugin::openvpn::improvednamingschema

collectd::plugin::openvpn::collectcompression

collectd::plugin::openvpn::collectindividualusers

collectd::plugin::openvpn::collectusercount

collectd::plugin::openvpn::interval

collectd-ovs_events

collectd::plugin::ovs_events::address

collectd::plugin::ovs_events::dispatch

collectd::plugin::ovs_events::interfaces

collectd::plugin::ovs_events::send_notification

collectd::plugin::ovs_events::$port

collectd::plugin::ovs_events::socket

collectd-ovs_stats

collectd::plugin::ovs_stats::address

collectd::plugin::ovs_stats::bridges

collectd::plugin::ovs_stats::port

collectd::plugin::ovs_stats::socket

collectd-ping

collectd::plugin::ping::hosts

collectd::plugin::ping::timeout

collectd::plugin::ping::ttl

collectd::plugin::ping::source_address

collectd::plugin::ping::device

collectd::plugin::ping::max_missed

collectd::plugin::ping::size

collectd::plugin::ping::interval

collectd-powerdns

collectd::plugin::powerdns::interval

collectd::plugin::powerdns::servers

collectd::plugin::powerdns::recursors

collectd::plugin::powerdns::local_socket

collectd-processes

collectd::plugin::processes::processes

collectd::plugin::processes::process_matches

collectd::plugin::processes::collect_context_switch

collectd::plugin::processes::collect_file_descriptor

collectd::plugin::processes::collect_memory_maps

collectd::plugin::processes::interval

collectd-protocols

collectd::plugin::protocols::ignoreselected

collectd::plugin::protocols::values

collectd-python

collectd-serial

collectd-smart

collectd::plugin::smart::disks

collectd::plugin::smart::ignoreselected

collectd::plugin::smart::interval

collectd-snmp_agent

collectd-statsd

collectd::plugin::statsd::host

collectd::plugin::statsd::port

collectd::plugin::statsd::deletecounters

collectd::plugin::statsd::deletetimers

collectd::plugin::statsd::deletegauges

collectd::plugin::statsd::deletesets

collectd::plugin::statsd::countersum

collectd::plugin::statsd::timerpercentile

collectd::plugin::statsd::timerlower

collectd::plugin::statsd::timerupper

collectd::plugin::statsd::timersum

collectd::plugin::statsd::timercount

collectd::plugin::statsd::interval

collectd-swap

collectd::plugin::swap::reportbydevice

collectd::plugin::swap::reportbytes

collectd::plugin::swap::valuesabsolute

collectd::plugin::swap::valuespercentage

collectd::plugin::swap::reportio

collectd::plugin::swap::interval

collectd-syslog

collectd::plugin::syslog::log_level

collectd::plugin::syslog::notify_level

collectd::plugin::syslog::interval

collectd-table

collectd::plugin::table::tables

collectd::plugin::table::interval

collectd-tail

collectd::plugin::tail::files

collectd::plugin::tail::interval

collectd-tail_csv

collectd::plugin::tail_csv::metrics

collectd::plugin::tail_csv::files

collectd-tcpconns

collectd::plugin::tcpconns::localports

collectd::plugin::tcpconns::remoteports

collectd::plugin::tcpconns::listening

collectd::plugin::tcpconns::allportssummary

collectd::plugin::tcpconns::interval

collectd-ted

collectd-thermal

collectd::plugin::thermal::devices

collectd::plugin::thermal::ignoreselected

collectd::plugin::thermal::interval

collectd-threshold

collectd::plugin::threshold::types

collectd::plugin::threshold::plugins

collectd::plugin::threshold::hosts

collectd::plugin::threshold::interval

collectd-turbostat

collectd::plugin::turbostat::core_c_states

collectd::plugin::turbostat::package_c_states

collectd::plugin::turbostat::system_management_interrupt

collectd::plugin::turbostat::digital_temperature_sensor

collectd::plugin::turbostat::tcc_activation_temp

collectd::plugin::turbostat::running_average_power_limit

collectd::plugin::turbostat::logical_core_names

collectd-unixsock

collectd-uptime

collectd::plugin::uptime::interval

collectd-users

collectd::plugin::users::interval

collectd-uuid

collectd::plugin::uuid::uuid_file

collectd::plugin::uuid::interval

collectd-virt

collectd::plugin::virt::connection

collectd::plugin::virt::refresh_interval

collectd::plugin::virt::domain

collectd::plugin::virt::block_device

collectd::plugin::virt::interface_device

collectd::plugin::virt::ignore_selected

collectd::plugin::virt::hostname_format

collectd::plugin::virt::interface_format

collectd::plugin::virt::extra_stats

collectd::plugin::virt::interval

collectd-vmem

collectd::plugin::vmem::verbose

collectd::plugin::vmem::interval

collectd-vserver

collectd-wireless

collectd-write_graphite

collectd::plugin::write_graphite::carbons

collectd::plugin::write_graphite::carbon_defaults

collectd::plugin::write_graphite::globals

collectd-write_kafka

collectd::plugin::write_kafka::kafka_host

collectd::plugin::write_kafka::kafka_port

collectd::plugin::write_kafka::kafka_hosts

collectd::plugin::write_kafka::topics

collectd-write_log

collectd::plugin::write_log::format

collectd-zfs_arc

None
highAvailabilityEnabled Enable high availability in STF Formore information see Section 43ldquoHigh availabilityrdquo
false
storageEphemeralEnabled Enable ephemeral storagesupport in STF For moreinformation see Section 46ldquoEphemeral storagerdquo
false
Parameter Description Default Value
Procedure
1 To store events in ElasticSearch set eventsEnabled to true during deployment
2 To view the STF deployment logs in the Service Telemetry Operator use the oc logscommand
PLAY RECAP localhost ok=37 changed=0 unreachable=0 failed=0 skipped=1 rescued=0 ignored=0
3 View the pods and the status of each pod to determine that all workloads are operatingnominally
NOTE
If you set eventsEnabled true the notification Smart Gateways will Error andCrashLoopBackOff for a period of time before ElasticSearch starts
oc apply -f - ltltEOFapiVersion infrawatchv1alpha1kind ServiceTelemetrymetadata name stf-default namespace service-telemetryspec eventsEnabled true metricsEnabled trueEOF
oc logs $(oc get pod --selector=name=service-telemetry-operator -oname) -c ansible
$ oc get pods
NAME READY STATUS RESTARTS AGEalertmanager-stf-default-0 22 Running 0 26melastic-operator-645dc8b8ff-jwnzt 11 Running 0 88melasticsearch-es-default-0 11 Running 0 26minterconnect-operator-6fd49d9fb9-4bl92 11 Running 0
Red Hat OpenStack Platform 160 Service Telemetry Framework
16
24 REMOVING STF FROM THE OCP ENVIRONMENT
Remove STF from an OCP environment if you no longer require the STF functionality
Complete the following tasks
1 Section 241 ldquoDeleting the namespacerdquo
2 Section 242 ldquoRemoving the OperatorSourcerdquo
241 Deleting the namespace
To remove the operational resources for STF from OCP delete the namespace
Procedure
1 Run the oc delete command
2 Verify that the resources have been deleted from the namespace
242 Removing the OperatorSource
If you do not expect to install Service Telemetry Framework again delete the OperatorSourceWhen you remove the OperatorSource PackageManifests related to STF are removed from theOperator Lifecycle Manager catalog
Procedure
1 Delete the OperatorSource
46mprometheus-operator-bf7d97fb9-kwnlx 11 Running 0 46mprometheus-stf-default-0 33 Running 0 26mservice-telemetry-operator-54f4c99d9b-k7ll6 22 Running 0 46msmart-gateway-operator-7ff58bcf94-66rvx 22 Running 0 46mstf-default-ceilometer-notification-smartgateway-6675df547q4lbj 11 Running 0 26mstf-default-collectd-notification-smartgateway-698c87fbb7-xj528 11 Running 0 26mstf-default-collectd-telemetry-smartgateway-79c967c8f7-9hsqn 11 Running 0 26mstf-default-interconnect-7458fd4d69-nqbfs 11 Running 0 26m
oc delete project service-telemetry
$ oc get allNo resources found
CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK
17
2 Verify that the STF PackageManifests are removed from the platform If successful thefollowing command returns no result
3 If you enabled the OperatorHubio Community Catalog Source during the installationprocess and you no longer need this catalog source delete it
Additional resources
For more information about the OperatorHubio Community Catalog Source see Section 23ldquoDeploying STF to the OCP environmentrdquo
$ oc delete --namespace=openshift-marketplace operatorsource redhat-operators-stfoperatorsourceoperatorscoreoscom redhat-operators-stf deleted
$ oc get packagemanifests | grep Red Hat STF
$ oc delete --namespace=openshift-marketplace catalogsource operatorhubio-operatorscatalogsourceoperatorscoreoscom operatorhubio-operators deleted
Red Hat OpenStack Platform 160 Service Telemetry Framework
18
CHAPTER 3 COMPLETING THESERVICE TELEMETRY FRAMEWORK CONFIGURATION
31 CONNECTING RED HAT OPENSTACK PLATFORM TOSERVICE TELEMETRY FRAMEWORK
To collect metrics events or both and to send them to the Service Telemetry Framework (STF)storage domain you must configure the Red Hat OpenStack Platform overcloud to enable datacollection and transport
To deploy data collection and transport to STF on Red Hat OpenStack Platform cloud nodes thatemploy routed L3 domains such as distributed compute node (DCN) or spine-leaf see Section 32ldquoDeploying to non-standard network topologiesrdquo
32 DEPLOYING TO NON-STANDARD NETWORK TOPOLOGIES
If your nodes are on a separate network from the default InternalApi network you must makeconfiguration adjustments so that AMQ Interconnect can transport data to theService Telemetry Framework (STF) server instance This scenario is typical in a spine-leaf or aDCN topology For more information about DCN configuration see the Spine Leaf Networkingguide
If you use STF with Red Hat OpenStack Platform 160 and plan to monitor your Ceph Block orObject storage nodes you must make configuration changes that are similar to the configurationchanges that you make to the spine-leaf and DCN network configuration To monitor Ceph nodesuse the CephStorageExtraConfig parameter to define which network interface to load into theAMQ Interconnect and collectd configuration files
Similarly you must specify BlockStorageExtraConfig and ObjectStorageExtraConfig parameters ifyour environment uses Block and Object storage roles
The deployment of a spine-leaf topology involves creating roles and networks then assigningthose networks to the available roles When you configure data collection and transport for STF foran Red Hat OpenStack Platform deployment the default network for roles is InternalApi For CephBlock and Object storage roles the default network is Storage Because a spine-leaf configurationcan result in different networks being assigned to different Leaf groupings and those names aretypically unique additional configuration is required in the parameter_defaults section of theRed Hat OpenStack Platform environment files
Procedure
1 Document which networks are available for each of the Leaf roles For examples of networkname definitions see Creating a network data file in the Spine Leaf Networking guide Formore information about the creation of the Leaf groupings (roles) and assignment of thenetworks to those groupings see Creating a roles data file in the Spine Leaf Networkingguide
CephStorageExtraConfig tripleoprofilebasemetricscollectdamqp_host hiera(storage) tripleoprofilebasemetricsqdrlistener_addr hiera(storage) tripleoprofilebaseceilometeragentnotificationnotifier_host_addr hiera(storage)
CHAPTER 3 COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION
19
2 Add the following configuration example to the ExtraConfig section for each of the leafroles In this example internal_api_subnet is the value defined in the name_lowerparameter of your network definition (with _subnet appended to the name for Leaf 0) andis the network to which the ComputeLeaf0 leaf role is connected In this case the networkidentification of 0 corresponds to the Compute role for leaf 0 and represents a value thatis different from the default internal API network nameFor the ComputeLeaf0 leaf role specify extra configuration to perform a hiera lookup todetermine which network interface for a particular network to assign to the collectd AMQPhost parameter Perform the same configuration for the AMQ Interconnect listeneraddress parameter
Additional leaf roles typically replace _subnet with _leafN where N represents a uniqueindentifier for the leaf
This example configuration is on a CephStorage leaf role
3.3. CONFIGURING RED HAT OPENSTACK PLATFORM OVERCLOUD FOR SERVICE TELEMETRY FRAMEWORK

To configure the Red Hat OpenStack Platform overcloud, you must configure the data collection applications and the data transport to STF, and deploy the overcloud.

To configure the Red Hat OpenStack Platform overcloud, complete the following tasks:

1 Section 3.3.1, "Retrieving the AMQ Interconnect route address"

2 Section 3.3.2, "Configuring the STF connection for the overcloud"

3 Section 3.3.3, "Validating client-side installation"

3.3.1. Retrieving the AMQ Interconnect route address

When you configure the Red Hat OpenStack Platform overcloud for STF, you must provide the AMQ Interconnect route address in the STF connection file.
Procedure
1 Log in to your Red Hat OpenShift Container Platform (OCP) environment
2 In the service-telemetry project, retrieve the AMQ Interconnect route address:

$ oc get routes -ogo-template='{{ range .items }}{{printf "%s\n" .spec.host }}{{ end }}' | grep "\-5671"

stf-default-interconnect-5671-service-telemetry.apps.infra.watch
NOTE
If your STF installation differs from the documentation, ensure that you retrieve the correct AMQ Interconnect route address.

3.3.2. Configuring the STF connection for the overcloud

To configure the STF connection, you must create a file that contains the connection configuration of the AMQ Interconnect for the overcloud to the STF deployment. Enable the collection and storage of events in STF, and deploy the overcloud.
Procedure
1 Log in to the Red Hat OpenStack Platform undercloud as the stack user
2 Create a configuration file called stf-connectors.yaml in the /home/stack directory.
IMPORTANT
The Service Telemetry Operator simplifies the deployment of all data ingestion and data storage components for single cloud deployments. To share the data storage domain with multiple clouds, see Section 4.5, "Configuring multiple clouds".

3 In the stf-connectors.yaml file, configure the MetricsQdrConnectors address to connect the AMQ Interconnect on the overcloud to the STF deployment.

Add the CeilometerQdrPublishEvents: true parameter to enable collection and transport of Ceilometer events to STF.

Replace the host parameter with the value of HOST/PORT that you retrieved in Section 3.3.1, "Retrieving the AMQ Interconnect route address":

parameter_defaults:
    EventPipelinePublishers: []
    CeilometerQdrPublishEvents: true
    MetricsQdrConnectors:
        - host: stf-default-interconnect-5671-service-telemetry.apps.infra.watch
          port: 443
          role: edge
          sslProfile: sslProfile
          verifyHostname: false

4 Add the following files to your Red Hat OpenStack Platform director deployment to set up collectd and AMQ Interconnect:

the stf-connectors.yaml environment file

the enable-stf.yaml file that ensures that the environment is being used during the overcloud deployment
the ceilometer-write-qdr.yaml file that ensures that Ceilometer telemetry is sent to STF

5 Deploy the Red Hat OpenStack Platform overcloud:

openstack overcloud deploy <other arguments>
  --templates /usr/share/openstack-tripleo-heat-templates \
  --environment-file <other-environment-files> \
  --environment-file /usr/share/openstack-tripleo-heat-templates/environments/metrics/ceilometer-write-qdr.yaml \
  --environment-file /usr/share/openstack-tripleo-heat-templates/environments/enable-stf.yaml \
  --environment-file /home/stack/stf-connectors.yaml

3.3.3. Validating client-side installation

To validate data collection from the STF storage domain, query the data sources for delivered data. To validate individual nodes in the Red Hat OpenStack Platform deployment, connect to the console using SSH.
Procedure
1 Log in to an overcloud node, for example, controller-0.
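As a sketch of connecting from the undercloud, assuming the default TripleO heat-admin user and an illustrative ctlplane IP address that you would obtain from openstack server list:

(undercloud) [stack@undercloud-0 ~]$ openstack server list
(undercloud) [stack@undercloud-0 ~]$ ssh heat-admin@192.168.24.15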
2 Ensure that the metrics_qdr container is running on the node:

$ sudo podman container inspect --format '{{.State.Status}}' metrics_qdr

running

3 Return the internal network address on which AMQ Interconnect is running, for example, 172.17.1.44 listening on port 5666:

$ sudo podman exec -it metrics_qdr cat /etc/qpid-dispatch/qdrouterd.conf

listener {
    host: 172.17.1.44
    port: 5666
    authenticatePeer: no
    saslMechanisms: ANONYMOUS
}

4 Return a list of connections to the local AMQ Interconnect:

$ sudo podman exec -it metrics_qdr qdstat --bus=172.17.1.44:5666 --connections

Connections
  id   host                                                                   container                                                                                                  role    dir  security                            authentication  tenant
  ============================================================================================================================================================================
  1    stf-default-interconnect-5671-service-telemetry.apps.infra.watch:443   stf-default-interconnect-7458fd4d69-bgzfb                                                                  edge    out  TLSv1.2(DHE-RSA-AES256-GCM-SHA384)  anonymous-user
  12   172.17.1.44:60290                                                      openstack.org/om/container/controller-0/ceilometer-agent-notification/25/5c02cee550f143ec9ea030db5cccba14  normal  in   no-security                         no-auth
  16   172.17.1.44:36408                                                      metrics                                                                                                    normal  in   no-security                         anonymous-user
  899  172.17.1.44:39500                                                      10a2e99d-1b8a-4329-b48c-4335e5f75c84                                                                       normal  in   no-security                         no-auth
There are four connections:

Outbound connection to STF

Inbound connection from collectd

Inbound connection from ceilometer

Inbound connection from our qdstat client

The outbound STF connection is provided to the MetricsQdrConnectors host parameter and is the route for the STF storage domain. The other hosts are internal network addresses of the client connections to this AMQ Interconnect.

5 To ensure that messages are being delivered, list the links, and view the _edge address in the deliv column for delivery of messages:

$ sudo podman exec -it metrics_qdr qdstat --bus=172.17.1.44:5666 --links

Router Links
  type      dir  conn id  id    peer  class   addr                  phs  cap  pri  undel  unsett  deliv    presett  psdrop  acc  rej  rel  mod  delay  rate
  ===========================================================================================================================================================
  endpoint  out  1        5           local   _edge                      250  0    0      0       2979926  2979924  0       0    0    2    0    0      0
  endpoint  in   1        6                                              250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  in   1        7                                              250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  out  1        8                                              250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  in   1        9                                              250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  out  1        10                                             250  0    0      0       911      911      0       0    0    0    0    911    0
  endpoint  in   1        11                                             250  0    0      0       0        911      0       0    0    0    0    0      0
  endpoint  out  12       32          local   temp.lSY6Mcicol4J2Kp       250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  in   16       41                                             250  0    0      0       2979924  2979924  0       0    0    0    0    0      0
  endpoint  in   912      1834        mobile  $management           0    250  0    0      0       1        0        0       1    0    0    0    0      0
  endpoint  out  912      1835        local   temp.9Ok2resI9tmt+CT       250  0    0      0       0        0        0       0    0    0    0    0      0
6 To list the addresses from Red Hat OpenStack Platform nodes to STF, connect to OCP to get the AMQ Interconnect pod name and list the connections. List the available AMQ Interconnect pods:

$ oc get pods -l application=stf-default-interconnect

NAME                                        READY   STATUS    RESTARTS   AGE
stf-default-interconnect-7458fd4d69-bgzfb   1/1     Running   0          6d21h

7 Connect to the pod and run the qdstat --connections command to list the known connections:

$ oc exec -it stf-default-interconnect-7458fd4d69-bgzfb -- qdstat --connections

2020-04-21 18:25:47.243852 UTC
stf-default-interconnect-7458fd4d69-bgzfb

Connections
  id  host               container                                                              role    dir  security                                authentication  tenant  last dlv      uptime
  ======================================================================================================================================================================================================
  1   10.129.2.21:43062  rcv[stf-default-collectd-telemetry-smartgateway-79c967c8f7-kq4qv]      normal  in   no-security                             anonymous-user          000:00:00:00  006:21:50:25
  2   10.130.0.52:55754  rcv[stf-default-ceilometer-notification-smartgateway-6675df547mbjk5]   normal  in   no-security                             anonymous-user          000:21:25:57  006:21:49:36
  3   10.130.0.51:43110  rcv[stf-default-collectd-notification-smartgateway-698c87fbb7-f28v6]   normal  in   no-security                             anonymous-user          000:21:36:53  006:21:49:09
  22  10.128.0.1:51948   Router.ceph-0.redhat.local                                             edge    in   TLSv1/SSLv3(DHE-RSA-AES256-GCM-SHA384)  anonymous-user          000:00:00:03  000:22:08:43
  23  10.128.0.1:51950   Router.compute-0.redhat.local                                          edge    in   TLSv1/SSLv3(DHE-RSA-AES256-GCM-SHA384)  anonymous-user          000:00:00:03  000:22:08:43
  24  10.128.0.1:52082   Router.controller-0.redhat.local                                       edge    in   TLSv1/SSLv3(DHE-RSA-AES256-GCM-SHA384)  anonymous-user          000:00:00:00  000:22:08:34
  27  127.0.0.1:42202    c2f541c1-4c97-4b37-a189-a396c08fb079                                   normal  in   no-security                             no-auth                 000:00:00:00  000:00:00:00

In this example, there are three edge connections from the Red Hat OpenStack Platform nodes, with connection id 22, 23, and 24.

8 To view the number of messages delivered by the network, use each address with the oc exec command:

$ oc exec -it stf-default-interconnect-7458fd4d69-bgzfb -- qdstat --address

2020-04-21 18:20:10.293258 UTC
stf-default-interconnect-7458fd4d69-bgzfb

Router Addresses
  class   addr                             phs  distrib    pri  local  remote  in       out      thru  fallback
  ====================================================================================================================
  mobile  anycast/ceilometer/event.sample  0    balanced   -    1      0       1553     1553     0     0
  mobile  collectd/notify                  0    multicast  -    1      0       10       10       0     0
  mobile  collectd/telemetry               0    multicast  -    1      0       7798049  7798049  0     0
CHAPTER 4. ADVANCED FEATURES

The following optional features can provide additional functionality to the Service Telemetry Framework (STF):

Customizing the deployment. For more information, see Section 4.1, "Customizing the deployment".

Alerts. For more information, see Section 4.2, "Alerts".

High availability. For more information, see Section 4.3, "High availability".

Dashboards. For more information, see Section 4.4, "Dashboards".

Multiple clouds. For more information, see Section 4.5, "Configuring multiple clouds".

Ephemeral storage. For more information, see Section 4.6, "Ephemeral storage".

4.1. CUSTOMIZING THE DEPLOYMENT

The Service Telemetry Operator watches for a ServiceTelemetry manifest to load into Red Hat OpenShift Container Platform (OCP). The Operator then creates other objects in memory, which results in the dependent Operators creating the workloads they are responsible for managing.
WARNING
When you override the manifest, you must provide the entire manifest contents, including object names or namespaces. There is no dynamic parameter substitution when you override a manifest.
To override a manifest successfully with Service Telemetry Framework (STF), deploy a default environment using the core options only. For more information about the core options, see Section 2.3.10, "Creating a ServiceTelemetry object in OCP". When you deploy STF, use the oc get command to retrieve the default deployed manifest. When you use a manifest that was originally generated by the Service Telemetry Operator, the manifest is compatible with the other objects that are managed by the Operators.
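For example, a command such as the following retrieves the currently deployed manifest so that you can use it as the starting point for an override; stf-default is the default object name used throughout this guide:

$ oc get servicetelemetry stf-default -oyaml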
For example, when the metricsEnabled: true parameter is configured in the ServiceTelemetry manifest, the Service Telemetry Operator requests components for metrics retrieval and storage using the default manifests. In some cases, you might want to override the default manifest. For more information, see Section 4.1.1, "Manifest override parameters".

4.1.1. Manifest override parameters

This table describes the available parameters that you can use to override a manifest, along with the corresponding retrieval commands.

Table 4.1. Manifest override parameters
Override parameter Description Retrieval command
alertmanagerManifest
Override the Alertmanager object creation. The Prometheus Operator watches for Alertmanager objects.
oc get alertmanager stf-default -oyaml
alertmanagerConfigManifest
Override the Secret that contains the Alertmanager configuration. The Prometheus Operator uses a secret named alertmanager-<alertmanager-name>, for example stf-default, to provide the alertmanager.yaml configuration to Alertmanager.
oc get secret alertmanager-stf-default -oyaml
elasticsearchManifest
Override the ElasticSearch object creation. The Elastic Cloud on Kubernetes Operator watches for ElasticSearch objects.
oc get elasticsearch elasticsearch -oyaml
interconnectManifest
Override the Interconnect object creation. The AMQ Interconnect Operator watches for Interconnect objects.
oc get interconnect stf-default-interconnect -oyaml
prometheusManifest
Override the Prometheus object creation. The Prometheus Operator watches for Prometheus objects.
oc get prometheus stf-default -oyaml
servicemonitorManifest
Override the ServiceMonitor object creation. The Prometheus Operator watches for ServiceMonitor objects.
oc get servicemonitor stf-default -oyaml
smartgatewayCollectdMetricsManifest
Override the SmartGateway object creation for collectd metrics. The Smart Gateway Operator watches for SmartGateway objects.
oc get smartgateway stf-default-collectd-telemetry -oyaml
smartgatewayCollectdEventsManifest
Override the SmartGateway object creation for collectd events. The Smart Gateway Operator watches for SmartGateway objects.
oc get smartgateway stf-default-collectd-notification -oyaml
smartgatewayCeilometerEventsManifest
Override the SmartGateway object creation for Ceilometer events. The Smart Gateway Operator watches for SmartGateway objects.
oc get smartgateway stf-default-ceilometer-notification -oyaml
4.1.2. Overriding a managed manifest

Edit the ServiceTelemetry object and provide a parameter and manifest. For a list of available manifest override parameters, see Section 4.1.1, "Manifest override parameters". The default ServiceTelemetry object is stf-default. Use oc get servicetelemetry to list the available STF deployments.
TIP
The oc edit command loads the default system editor. To override the default editor, pass or set the environment variable EDITOR to the preferred editor, for example, EDITOR=nano oc edit servicetelemetry stf-default.
Procedure
1 Log in to Red Hat OpenShift Container Platform
2 Change to the service-telemetry namespace:

oc project service-telemetry

3 Load the ServiceTelemetry object into an editor:

oc edit servicetelemetry stf-default

4 To modify the ServiceTelemetry object, provide a manifest override parameter and the contents of the manifest to write to OCP instead of the defaults provided by STF.
NOTE
The trailing pipe (|) after the manifest override parameter indicates that the value provided is multi-line.
$ oc edit servicetelemetry stf-default

apiVersion: infra.watch/v1alpha1
kind: ServiceTelemetry
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"infra.watch/v1alpha1","kind":"ServiceTelemetry","metadata":{"annotations":{},"name":"stf-default","namespace":"service-telemetry"},"spec":{"metricsEnabled":true}}
  creationTimestamp: "2020-04-14T20:29:42Z"
  generation: 1
  name: stf-default
  namespace: service-telemetry
  resourceVersion: "1949423"
  selfLink: /apis/infra.watch/v1alpha1/namespaces/service-telemetry/servicetelemetrys/stf-default
  uid: d058bc41-1bb0-49f5-9a8b-642f4b8adb95
spec:
  metricsEnabled: true
  smartgatewayCollectdMetricsManifest: |    1
    apiVersion: smartgateway.infra.watch/v2alpha1
    kind: SmartGateway
    metadata:
      name: stf-default-collectd-telemetry
      namespace: service-telemetry
    spec:
      amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/telemetry
      debug: true
      prefetch: 15000
      serviceType: metrics
      size: 1
      useTimestamp: true    2
status:
  conditions:
  - ansibleResult:
      changed: 0
      completion: 2020-04-14T20:32:19.079508
      failures: 0
      ok: 52
      skipped: 1
    lastTransitionTime: "2020-04-14T20:29:59Z"
    message: Awaiting next reconciliation
    reason: Successful
    status: "True"
    type: Running
1 The manifest override parameter is defined in the spec of the ServiceTelemetry object.
2 End of the manifest override content.
5 Save and close
4.2. ALERTS

You create alert rules in Prometheus and alert routes in Alertmanager. Alert rules in Prometheus servers send alerts to an Alertmanager, which manages the alerts. Alertmanager can silence, inhibit, or aggregate alerts, and send notifications using email, on-call notification systems, or chat platforms.

To create an alert, complete the following tasks:

1 Create an alert rule in Prometheus. For more information, see Section 4.2.1, "Creating an alert rule in Prometheus".
2 Create an alert route in Alertmanager. For more information, see Section 4.2.3, "Creating an alert route in Alertmanager".

Additional resources

For more information about alerts or notifications with Prometheus and Alertmanager, see https://prometheus.io/docs/alerting/overview/

To view an example set of alerts that you can use with Service Telemetry Framework (STF), see https://github.com/infrawatch/service-telemetry-operator/tree/master/deploy/alerts

4.2.1. Creating an alert rule in Prometheus

Prometheus evaluates alert rules to trigger notifications. If the rule condition returns an empty result set, the condition is false. Otherwise, the rule is true, and it triggers an alert.
Procedure
1 Log in to Red Hat OpenShift Container Platform
2 Change to the service-telemetry namespace:

oc project service-telemetry

3 Create a PrometheusRule object that contains the alert rule. The Prometheus Operator loads the rule into Prometheus:

oc apply -f - <<EOF
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  creationTimestamp: null
  labels:
    prometheus: stf-default
    role: alert-rules
  name: prometheus-alarm-rules
  namespace: service-telemetry
spec:
  groups:
    - name: ./openstack.rules
      rules:
        - alert: Metric Listener down
          expr: collectd_qpid_router_status < 1 # To change the rule, edit the value of the expr parameter.
EOF

4 To verify that the rules have been loaded into Prometheus by the Operator, create a pod with access to curl:

oc run curl --generator=run-pod/v1 --image=radial/busyboxplus:curl -i --tty

5 Run curl to access the prometheus-operated service to return the rules loaded into memory:

[ root@curl:/ ]$ curl prometheus-operated:9090/api/v1/rules

{"status":"success","data":{"groups":[{"name":"./openstack.rules","file":"/etc/prometheus/rules/prometheus-stf-default-rulefiles-0/service-telemetry-prometheus-alarm-rules.yaml","rules":[{"name":"Metric Listener down","query":"collectd_qpid_router_status \u003c 1","duration":0,"labels":{},"annotations":{},"alerts":[],"health":"ok","type":"alerting"}],"interval":30}]}}
6 To verify that the output shows the rules loaded into the PrometheusRule object, for example, that the output contains the defined ./openstack.rules, exit from the pod:

[ root@curl:/ ]$ exit

7 Clean up the environment by deleting the curl pod:

$ oc delete pod curl

pod "curl" deleted

Additional resources

For more information on alerting, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md

4.2.2. Configuring custom alerts

You can add custom alerts to the PrometheusRule object that you created in Section 4.2.1, "Creating an alert rule in Prometheus".
Procedure
1 Use the oc edit command:

oc edit prometheusrules prometheus-alarm-rules
2 Edit the PrometheusRules manifest
3 Save and close
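As a sketch of what an added rule might look like when appended to the existing group, note that the second alert name, metric, and threshold below are illustrative assumptions, not values from this guide:

spec:
  groups:
    - name: ./openstack.rules
      rules:
        - alert: Metric Listener down
          expr: collectd_qpid_router_status < 1
        - alert: Collectd Out of Memory    # hypothetical example rule
          expr: collectd_memory{memory="free"} < 100000000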
Additional resources

For more information about configuring alerting rules, see https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/

For more information about PrometheusRules objects, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md

4.2.3. Creating an alert route in Alertmanager

Use Alertmanager to deliver alerts to an external system, such as email, IRC, or other notification channel. The Prometheus Operator manages the Alertmanager configuration as a Red Hat OpenShift Container Platform (OCP) secret. STF by default deploys a basic configuration that results in no receivers:
alertmanager.yaml: |-
  global:
    resolve_timeout: 5m
  route:
    group_by: ['job']
    group_wait: 30s
    group_interval: 5m
    repeat_interval: 12h
    receiver: 'null'
  receivers:
  - name: 'null'
To deploy a custom Alertmanager route with STF, you must pass an alertmanagerConfigManifest parameter to the Service Telemetry Operator, which results in an updated secret that is managed by the Prometheus Operator. For more information, see Section 4.1.2, "Overriding a managed manifest".
Procedure
1 Log in to Red Hat OpenShift Container Platform
2 Change to the service-telemetry namespace:

oc project service-telemetry

3 Edit the ServiceTelemetry object for your STF deployment:

oc edit servicetelemetry stf-default

4 Add a new parameter, alertmanagerConfigManifest, and the Secret object contents to define the alertmanager.yaml configuration for Alertmanager:

apiVersion: infra.watch/v1alpha1
kind: ServiceTelemetry
metadata:
  name: stf-default
  namespace: service-telemetry
spec:
  metricsEnabled: true
  alertmanagerConfigManifest: |
    apiVersion: v1
    kind: Secret
    metadata:
      name: 'alertmanager-stf-default'
      namespace: 'service-telemetry'
    type: Opaque
    stringData:
      alertmanager.yaml: |-
        global:
          resolve_timeout: 10m
        route:
          group_by: ['job']
          group_wait: 30s
          group_interval: 5m
          repeat_interval: 12h
          receiver: 'null'
        receivers:
        - name: 'null'
NOTE
This loads the default template that is already managed by the Service Telemetry Operator. To validate that the changes are populating correctly, change a value, return the alertmanager-stf-default secret, and verify that the new value is loaded into memory, for example, changing the value of global.resolve_timeout from 5m to 10m.
5 Verify that the configuration was applied to the secret:

$ oc get secret alertmanager-stf-default -o go-template='{{index .data "alertmanager.yaml" | base64decode }}'

global:
  resolve_timeout: 10m
route:
  group_by: ['job']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h
  receiver: 'null'
receivers:
- name: 'null'

6 To verify that the configuration has been loaded into Alertmanager, create a pod with access to curl:

oc run curl --generator=run-pod/v1 --image=radial/busyboxplus:curl -i --tty

7 Run curl against the alertmanager-operated service to retrieve the status and configYAML contents, and review whether the supplied configuration matches the configuration loaded into Alertmanager:

[ root@curl:/ ]$ curl alertmanager-operated:9093/api/v1/status

{"status":"success","data":{"configYAML":"global:\n  resolve_timeout: 10m\n  http_config: {}\n  smtp_hello: localhost\n  smtp_require_tls: true\n  pagerduty_url: https://events.pagerduty.com/v2/enqueue\n  hipchat_api_url: https://api.hipchat.com/\n  opsgenie_api_url: https://api.opsgenie.com/\n  wechat_api_url: https://qyapi.weixin.qq.com/cgi-bin/\n  victorops_api_url: https://alert.victorops.com/integrations/generic/20131114/alert/\nroute:\n  receiver: 'null'\n  group_by:\n  - job\n  group_wait: 30s\n  group_interval: 5m\n  repeat_interval: 12h\nreceivers:\n- name: 'null'\ntemplates: []\n",...}}

8 Verify that the configYAML field contains the expected changes. Exit from the pod:

[ root@curl:/ ]$ exit

9 To clean up the environment, delete the curl pod:

$ oc delete pod curl

pod "curl" deleted
Additional resources

For more information about the Red Hat OpenShift Container Platform secret and the Prometheus operator, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md

4.3. HIGH AVAILABILITY

High availability is the ability of Service Telemetry Framework (STF) to rapidly recover from failures in its component services. Although Red Hat OpenShift Container Platform (OCP) restarts a failed pod if nodes are available to schedule the workload, this recovery process might take more than one minute, during which time events and metrics are lost. A high availability configuration includes multiple copies of STF components, which reduces recovery time to approximately 2 seconds. To protect against failure of an OCP node, deploy STF to an OCP cluster with three or more nodes.
NOTE
STF is not yet a fully fault tolerant system. Delivery of metrics and events during the recovery period is not guaranteed.

Enabling high availability has the following effects:

Two AMQ Interconnect pods run instead of the default one.

Three ElasticSearch pods run instead of the default one.

Recovery time from a lost pod in either of these services reduces to approximately 2 seconds.

4.3.1. Configuring high availability

To configure STF for high availability, add highAvailabilityEnabled: true to the ServiceTelemetry object in OCP. You can set this parameter at installation time, or, if you already deployed STF, complete the following steps:
Procedure
1 Log in to Red Hat OpenShift Container Platform
2 Change to the service-telemetry namespace:

oc project service-telemetry

3 Use the oc command to edit the ServiceTelemetry object:

$ oc edit ServiceTelemetry

4 Add highAvailabilityEnabled: true to the spec section:

spec:
  eventsEnabled: true
  metricsEnabled: true
  highAvailabilityEnabled: true

5 Save your changes and close the object.
4.4. DASHBOARDS

Use the third-party application Grafana to visualize system-level metrics gathered by collectd for each individual host node. For more information about configuring collectd, see Section 3.3, "Configuring Red Hat OpenStack Platform overcloud for Service Telemetry Framework".

4.4.1. Setting up Grafana to host the dashboard

Grafana is not included in the default Service Telemetry Framework (STF) deployment, so you must deploy the Grafana Operator from OperatorHub.io.
Procedure
1 Log in to Red Hat OpenShift Container Platform
2 Change to the service-telemetry namespace:

oc project service-telemetry

3 Clone the dashboard repository:

git clone https://github.com/infrawatch/dashboards
cd dashboards

4 Deploy the Grafana operator:

oc create -f deploy/subscription.yaml

5 To verify that the operator launched successfully, run the oc get csv command. If the value of the PHASE column is Succeeded, the operator launched successfully:

$ oc get csv

NAME                      DISPLAY            VERSION   REPLACES   PHASE
grafana-operator.v3.2.0   Grafana Operator   3.2.0                Succeeded

6 Launch a Grafana instance:

$ oc create -f deploy/grafana.yaml

7 Verify that the Grafana instance deployed:

$ oc get pod -l app=grafana

NAME                                  READY   STATUS    RESTARTS   AGE
grafana-deployment-7fc7848b56-sbkhv   1/1     Running   0          1m

8 Create the datasource and dashboard resources:

oc create -f deploy/datasource.yaml -f deploy/rhos-dashboard.yaml
9 Verify that the resources installed correctly:

$ oc get grafanadashboards

NAME             AGE
rhos-dashboard   7d21h

$ oc get grafanadatasources

NAME                                  AGE
service-telemetry-grafanadatasource   1m

10 Navigate to https://<grafana-route-address> in a web browser. Use the oc get routes command to retrieve the Grafana route address:

oc get routes

11 To view the dashboard, click Dashboards and Manage.
Additional resources
For more information about enabling the OperatorHub.io Community Catalog Source, see Section 2.3.5, "Enabling the OperatorHub.io Community Catalog Source".

4.4.1.1. Viewing and editing queries
Procedure
1 Log in to Red Hat OpenShift Container Platform. To view and edit queries, log in as the admin user.

2 Change to the service-telemetry namespace:

oc project service-telemetry

3 To retrieve the default username and password, describe the Grafana object using oc describe:

oc describe grafana service-telemetry-grafana

4.4.2. The Grafana infrastructure dashboard

The infrastructure dashboard shows metrics for a single node at a time. Select a node from the upper left corner of the dashboard.

4.4.2.1. Top panels
Title Unit Description
Current Global Alerts   -             Current alerts fired by Prometheus
Recent Global Alerts    -             Recently fired alerts in 5m time steps
Status Panel            -             Node status: up, down, unavailable
Uptime                  s/m/h/d/M/Y   Total operational time of node
CPU Cores               cores         Total number of cores
Memory                  bytes         Total memory
Disk Size               bytes         Total storage size
Processes               processes     Total number of processes listed by type
Load Average            processes     Load average represents the average number of running and uninterruptible processes residing in the kernel execution queue
4.4.2.2. Networking panels

Panels that display the network interfaces of the node.

Panel                                    Unit       Description
Physical Interfaces Ingress Errors       errors     Total errors with incoming data
Physical Interfaces Egress Errors        errors     Total errors with outgoing data
Physical Interfaces Ingress Error Rates  errors/s   Rate of incoming data errors
Physical Interfaces Egress Error Rates   errors/s   Rate of outgoing data errors
Physical Interfaces Packets Ingress      pps        Incoming packets per second
Physical Interfaces Packets Egress       pps        Outgoing packets per second
Physical Interfaces Data Ingress         bytes/s    Incoming data rates
Physical Interfaces Data Egress          bytes/s    Outgoing data rates
Physical Interfaces Drop Rate Ingress    pps        Incoming packets drop rate
Physical Interfaces Drop Rate Egress     pps        Outgoing packets drop rate
4.4.2.3. CPU panels

Panels that display CPU usage of the node.

Panel                   Unit      Description
Current CPU Usage       percent   Instantaneous usage at the time of the last query
Aggregate CPU Usage     percent   Average non-idle CPU activity of all cores on a node
Aggr CPU Usage by Type  percent   Shows time spent for each type of thread, averaged across all cores
4.4.2.4. Memory panels

Panels that display memory usage on the node.

Panel            Unit        Description
Memory Used      percent     Amount of memory being used at time of last query
Huge Pages Used  hugepages   Number of hugepages being used
Memory
4.4.2.5. Disk/file system

Panels that display space used on disk.

Panel             Unit      Description                             Notes
Disk Space Usage  percent   Total disk use at time of last query
Inode Usage                 percent   Total inode use at time of last query
Aggregate Disk Space Usage  bytes     Total disk space used and reserved      Because this query relies on the df plugin, temporary file systems that do not necessarily use disk space are included in the results. The query tries to filter out most of these, but it might not be exhaustive.
Disk Traffic                bytes/s   Shows rates for both reading and writing
Disk Load                   percent   Approximate percentage of total disk bandwidth being used. The weighted IO time series includes the backlog that might be accumulating. For more information, see the collectd disk plugin docs.
Operations/s                ops/s     Operations done per second
Average IO Operation Time   seconds   Average time each IO operation took to complete. This average is not accurate, see the collectd disk plugin docs.
4.5. CONFIGURING MULTIPLE CLOUDS

You can configure multiple Red Hat OpenStack Platform clouds to target a single instance of Service Telemetry Framework (STF):

1 Plan the AMQP address prefixes that you want to use for each cloud. For more information, see Section 4.5.1, "Planning AMQP address prefixes".

2 Deploy metrics and events consumer Smart Gateways for each cloud to listen on the corresponding address prefixes. For more information, see Section 4.5.2, "Deploying Smart Gateways".

3 Configure each cloud to send its metrics and events to STF on the correct address. For more information, see Section 4.5.3, "Creating the OpenStack environment file".
Figure 4.1. Two Red Hat OpenStack Platform clouds connect to STF

4.5.1. Planning AMQP address prefixes

By default, Red Hat OpenStack Platform nodes get data through two data collectors: collectd and Ceilometer. These components send telemetry data, or notifications, to the respective AMQP addresses, for example, collectd/telemetry, where STF Smart Gateways listen on those addresses for monitoring data.

To support multiple clouds, and to identify which cloud generated the monitoring data, configure each cloud to send data to a unique address. Prefix a cloud identifier to the second part of the address. The following list shows some example addresses and identifiers:

collectd/cloud1-telemetry

collectd/cloud1-notify

anycast/ceilometer/cloud1-event.sample

collectd/cloud2-telemetry

collectd/cloud2-notify
anycast/ceilometer/cloud2-event.sample

collectd/us-east-1-telemetry

collectd/us-west-3-telemetry

4.5.2. Deploying Smart Gateways

You must deploy a Smart Gateway for each of the data collection types for each cloud: one for collectd metrics, one for collectd events, and one for Ceilometer events. Configure each of the Smart Gateways to listen on the AMQP address that you define for the corresponding cloud.

When you deploy STF for the first time, Smart Gateway manifests are created that define the initial Smart Gateways for a single cloud. When you deploy Smart Gateways for multiple cloud support, you deploy multiple Smart Gateways for each of the data collection types that handle the metrics and the events data for each cloud. The initial Smart Gateways act as a template to create additional Smart Gateways, along with any authentication information required to connect to the data stores.
Procedure
1 Log in to Red Hat OpenShift Container Platform
2 Change to the service-telemetry namespace
oc project service-telemetry
3 Use the initially deployed Smart Gateways as a template for additional Smart Gateways. List the currently deployed Smart Gateways with the oc get smartgateways command. For example, if you deployed STF with metricsEnabled: true and eventsEnabled: true, the following Smart Gateways are displayed in the output:

$ oc get smartgateways

NAME                                  AGE
stf-default-ceilometer-notification   14d
stf-default-collectd-notification     14d
stf-default-collectd-telemetry        14d

4 Retrieve the manifests for each Smart Gateway and store the contents in a temporary file, which you can modify later and use to create the new set of Smart Gateways:

truncate --size 0 /tmp/cloud1-smartgateways.yaml && \
for sg in $(oc get smartgateways -oname)
do
  echo --- >> /tmp/cloud1-smartgateways.yaml
  oc get $sg -oyaml --export >> /tmp/cloud1-smartgateways.yaml
done

5 Modify the Smart Gateway manifests in the /tmp/cloud1-smartgateways.yaml file. Adjust the metadata.name and spec.amqpUrl fields to include the cloud identifier from your schema. For more information, see Section 4.5.1, "Planning AMQP address prefixes". To view example Smart Gateway manifests, see Section 4.5.2.1, "Example manifests".
6 Deploy your new Smart Gateways
oc apply -f /tmp/cloud1-smartgateways.yaml

7 Verify that each Smart Gateway is running. This can take several minutes, depending on the number of Smart Gateways:
oc get po -l app=smart-gateway
4.5.2.1. Example manifests
IMPORTANT
The content in the following examples might differ from the file content in your deployment. Copy the manifests in your deployment.

Ensure that the name and amqpUrl parameters of each Smart Gateway match the names that you want to use for your clouds. For more information, see Section 4.5.1, "Planning AMQP address prefixes".
NOTE
Your output might have some additional metadata parameters that you can remove from the manifests that you load into OCP.

apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-ceilometer-notification-cloud1    1
spec:
  amqpDataSource: ceilometer
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/anycast/ceilometer/cloud1-event.sample    2
  debug: false
  elasticPass: fkzfhghw
  elasticUrl: https://elasticsearch-es-http.service-telemetry.svc.cluster.local:9200
  elasticUser: elastic
  resetIndex: false
  serviceType: events
  size: 1
  tlsCaCert: /config/certs/ca.crt
  tlsClientCert: /config/certs/tls.crt
  tlsClientKey: /config/certs/tls.key
  tlsServerName: elasticsearch-es-http.service-telemetry.svc.cluster.local
  useBasicAuth: true
  useTls: true
---
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-collectd-notification-cloud1    3
spec:
  amqpDataSource: collectd
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/cloud1-notify    4
  debug: false
  elasticPass: fkzfhghw
  elasticUrl: https://elasticsearch-es-http.service-telemetry.svc.cluster.local:9200
  elasticUser: elastic
  resetIndex: false
  serviceType: events
  size: 1
  tlsCaCert: /config/certs/ca.crt
  tlsClientCert: /config/certs/tls.crt
  tlsClientKey: /config/certs/tls.key
  tlsServerName: elasticsearch-es-http.service-telemetry.svc.cluster.local
  useBasicAuth: true
  useTls: true
---
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-collectd-telemetry-cloud1    5
spec:
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/cloud1-telemetry    6
  debug: false
  prefetch: 15000
  serviceType: metrics
  size: 1
  useTimestamp: true
1 Name for Ceilometer notifications for cloud1.
2 AMQP address for Ceilometer notifications for cloud1.
3 Name for collectd notifications for cloud1.
4 AMQP address for collectd notifications for cloud1.
5 Name for collectd telemetry for cloud1.
6 AMQP address for collectd telemetry for cloud1.
4.5.3. Creating the OpenStack environment file

To label traffic according to the cloud of origin, you must create a configuration with cloud-specific instance names. Create an stf-connectors.yaml file and adjust the values of CeilometerQdrEventsConfig and CollectdAmqpInstances to match the AMQP address prefix scheme. For more information, see Section 4.5.1, "Planning AMQP address prefixes".
WARNING
Remove the enable-stf.yaml and ceilometer-write-qdr.yaml environment file references from your overcloud deployment. This configuration is redundant and results in duplicate information being sent from each cloud node.
Procedure
1 Create the stf-connectors.yaml file, and modify it to match the AMQP address that you want for this cloud deployment:

resource_registry:
  OS::TripleO::Services::Collectd: /usr/share/openstack-tripleo-heat-templates/deployment/metrics/collectd-container-puppet.yaml
  OS::TripleO::Services::MetricsQdr: /usr/share/openstack-tripleo-heat-templates/deployment/metrics/qdr-container-puppet.yaml
  OS::TripleO::Services::CeilometerAgentCentral: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-central-container-puppet.yaml
  OS::TripleO::Services::CeilometerAgentNotification: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-notification-container-puppet.yaml
  OS::TripleO::Services::CeilometerAgentIpmi: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-ipmi-container-puppet.yaml
  OS::TripleO::Services::ComputeCeilometerAgent: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-compute-container-puppet.yaml
  OS::TripleO::Services::Redis: /usr/share/openstack-tripleo-heat-templates/deployment/database/redis-pacemaker-puppet.yaml

parameter_defaults:
    EnableSTF: true

    EventPipelinePublishers: []
    CeilometerEnablePanko: false
    CeilometerQdrPublishEvents: true
    CeilometerQdrEventsConfig:
        driver: amqp
        topic: cloud1-event    1

    CollectdConnectionType: amqp1
    CollectdAmqpInterval: 5
    CollectdDefaultPollingInterval: 5

    CollectdAmqpInstances:
        cloud1-notify:    2
            notify: true
            format: JSON
            presettle: false
        cloud1-telemetry:    3
            format: JSON
            presettle: true

    MetricsQdrAddresses:
        - prefix: collectd
          distribution: multicast
        - prefix: anycast/ceilometer
          distribution: multicast

    MetricsQdrSSLProfiles:
        - name: sslProfile

    MetricsQdrConnectors:
        - host: stf-default-interconnect-5671-service-telemetry.apps.infra.watch    4
          port: 443
          role: edge
          verifyHostname: false
          sslProfile: sslProfile
1 Define the topic for Ceilometer events. This value is the address format of anycast/ceilometer/cloud1-event.sample.
2 Define the topic for collectd events. This value is the format of collectd/cloud1-notify.
3 Define the topic for collectd metrics. This value is the format of collectd/cloud1-telemetry.
4 Adjust the MetricsQdrConnectors host to the address of the STF route.
2 Ensure that the naming convention in the stf-connectors.yaml file aligns with the spec.amqpUrl field in the Smart Gateway configuration. For example, configure the CeilometerQdrEventsConfig.topic field to a value of cloud1-event.

3 Save the file in a directory for custom environment files, for example /home/stack/custom_templates/.

4 Source the authentication file:

[stack@undercloud-0 ~]$ source stackrc

(undercloud) [stack@undercloud-0 ~]$

5 Include the stf-connectors.yaml file in the overcloud deployment command, along with any other environment files relevant to your environment:

(undercloud) [stack@undercloud-0 ~]$ openstack overcloud deploy --templates /usr/share/openstack-tripleo-heat-templates -e /home/stack/custom_templates/stf-connectors.yaml
Additional resources

For information about validating the deployment, see Section 3.3.3, "Validating client-side installation".
4.5.4. Querying metrics data from multiple clouds

Data stored in Prometheus has a service label attached, according to the Smart Gateway it was scraped from. You can use this label to query data from a specific cloud.

To query data from a specific cloud, use a Prometheus promql query that matches the associated service label, for example collectd_uptime{service="stf-default-collectd-telemetry-cloud1-smartgateway"}.
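As a sketch of running such a query from inside the cluster, you can reuse the curl pod approach from Section 4.2.1, "Creating an alert rule in Prometheus". The -g flag stops curl from interpreting the braces in the PromQL expression, although some clients might require the query string to be URL-encoded:

$ oc run curl --generator=run-pod/v1 --image=radial/busyboxplus:curl -i --tty
[ root@curl:/ ]$ curl -g 'prometheus-operated:9090/api/v1/query?query=collectd_uptime{service="stf-default-collectd-telemetry-cloud1-smartgateway"}'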
4.6. EPHEMERAL STORAGE

Use ephemeral storage to run Service Telemetry Framework (STF) without persistently storing data in your Red Hat OpenShift Container Platform (OCP) cluster. Ephemeral storage is not recommended in a production environment due to the volatility of the data in the platform when operating correctly and as designed. For example, restarting a pod or rescheduling the workload to another node results in the loss of any local data written since the pod started.

If you enable ephemeral storage in STF, the Service Telemetry Operator does not add the relevant storage sections to the data storage components' manifests.

4.6.1. Configuring ephemeral storage

To configure STF for ephemeral storage, add storageEphemeralEnabled: true to the ServiceTelemetry object in OCP. You can add storageEphemeralEnabled: true at installation time, or, if you already deployed STF, complete the following steps:
Procedure
1 Log in to Red Hat OpenShift Container Platform
2 Change to the service-telemetry namespace:

oc project service-telemetry

3 Edit the ServiceTelemetry object:

$ oc edit ServiceTelemetry stf-default

4 Add the storageEphemeralEnabled: true parameter to the spec section:

spec:
  eventsEnabled: true
  metricsEnabled: true
  storageEphemeralEnabled: true

5 Save your changes and close the object.
APPENDIX A. COLLECTD PLUG-INS

This section contains a complete list of collectd plug-ins and configurations for Red Hat OpenStack Platform 16.0.
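The parameter names in this appendix are hiera keys that are consumed by the puppet-collectd module. As a minimal sketch of how you might set them, assuming the generic TripleO ExtraConfig mechanism and illustrative values for the df plug-in:

parameter_defaults:
  ExtraConfig:
    # Illustrative values; see the df entries below for the available keys.
    collectd::plugin::df::interval: 30
    collectd::plugin::df::reportreserved: false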
collectd-aggregation

collectd::plugin::aggregation::aggregators

collectd::plugin::aggregation::interval

collectd-amqp1

collectd-apache

collectd::plugin::apache::instances (ex: localhost => url => 'http://localhost/mod_status?auto')

collectd::plugin::apache::interval

collectd-apcups

collectd-battery

collectd::plugin::battery::values_percentage

collectd::plugin::battery::report_degraded

collectd::plugin::battery::query_state_fs

collectd::plugin::battery::interval

collectd-ceph

collectd::plugin::ceph::daemons

collectd::plugin::ceph::longrunavglatency

collectd::plugin::ceph::convertspecialmetrictypes

collectd-cgroups

collectd::plugin::cgroups::ignore_selected

collectd::plugin::cgroups::interval

collectd-conntrack

None

collectd-contextswitch

collectd::plugin::contextswitch::interval

collectd-cpu
collectd::plugin::cpu::reportbystate

collectd::plugin::cpu::reportbycpu

collectd::plugin::cpu::valuespercentage

collectd::plugin::cpu::reportnumcpu

collectd::plugin::cpu::reportgueststate

collectd::plugin::cpu::subtractgueststate

collectd::plugin::cpu::interval

collectd-cpufreq

None

collectd-cpusleep

collectd-csv

collectd::plugin::csv::datadir

collectd::plugin::csv::storerates

collectd::plugin::csv::interval

collectd-df

collectd::plugin::df::devices

collectd::plugin::df::fstypes

collectd::plugin::df::ignoreselected

collectd::plugin::df::mountpoints

collectd::plugin::df::reportbydevice

collectd::plugin::df::reportinodes

collectd::plugin::df::reportreserved

collectd::plugin::df::valuesabsolute

collectd::plugin::df::valuespercentage

collectd::plugin::df::interval

collectd-disk

collectd::plugin::disk::disks

collectd::plugin::disk::ignoreselected

collectd::plugin::disk::udevnameattr
collectd::plugin::disk::interval

collectd-entropy

collectd::plugin::entropy::interval

collectd-ethstat

collectd::plugin::ethstat::interfaces

collectd::plugin::ethstat::maps

collectd::plugin::ethstat::mappedonly

collectd::plugin::ethstat::interval

collectd-exec

collectd::plugin::exec::commands

collectd::plugin::exec::commands_defaults

collectd::plugin::exec::globals

collectd::plugin::exec::interval

collectd-fhcount

collectd::plugin::fhcount::valuesabsolute

collectd::plugin::fhcount::valuespercentage

collectd::plugin::fhcount::interval

collectd-filecount

collectd::plugin::filecount::directories

collectd::plugin::filecount::interval

collectd-fscache

None

collectd-hddtemp

collectd::plugin::hddtemp::host

collectd::plugin::hddtemp::port

collectd::plugin::hddtemp::interval

collectd-hugepages

collectd::plugin::hugepages::report_per_node_hp
collectd::plugin::hugepages::report_root_hp

collectd::plugin::hugepages::values_pages

collectd::plugin::hugepages::values_bytes

collectd::plugin::hugepages::values_percentage

collectd::plugin::hugepages::interval

collectd-intel_rdt

collectd-interface

collectd::plugin::interface::interfaces

collectd::plugin::interface::ignoreselected

collectd::plugin::interface::reportinactive

collectd::plugin::interface::interval

collectd-ipc

None

collectd-ipmi

collectd::plugin::ipmi::ignore_selected

collectd::plugin::ipmi::notify_sensor_add

collectd::plugin::ipmi::notify_sensor_remove

collectd::plugin::ipmi::notify_sensor_not_present

collectd::plugin::ipmi::sensors

collectd::plugin::ipmi::interval

collectd-irq

collectd::plugin::irq::irqs

collectd::plugin::irq::ignoreselected

collectd::plugin::irq::interval

collectd-load

collectd::plugin::load::report_relative

collectd::plugin::load::interval

collectd-logfile

collectd::plugin::logfile::log_level
collectd::plugin::logfile::log_file

collectd::plugin::logfile::log_timestamp

collectd::plugin::logfile::print_severity

collectd::plugin::logfile::interval

collectd-madwifi

collectd-mbmon

collectd-md

collectd-memcached

collectd::plugin::memcached::instances

collectd::plugin::memcached::interval

collectd-memory

collectd::plugin::memory::valuesabsolute

collectd::plugin::memory::valuespercentage

collectd::plugin::memory::interval

collectd-multimeter

collectd-mysql

collectd::plugin::mysql::interval

collectd-netlink

collectd::plugin::netlink::interfaces

collectd::plugin::netlink::verboseinterfaces

collectd::plugin::netlink::qdiscs

collectd::plugin::netlink::classes

collectd::plugin::netlink::filters

collectd::plugin::netlink::ignoreselected

collectd::plugin::netlink::interval

collectd-network

collectd::plugin::network::timetolive

collectd::plugin::network::maxpacketsize
collectd::plugin::network::forward

collectd::plugin::network::reportstats

collectd::plugin::network::listeners

collectd::plugin::network::servers

collectd::plugin::network::interval

collectd-nfs

collectd::plugin::nfs::interval

collectd-ntpd

collectd::plugin::ntpd::host

collectd::plugin::ntpd::port

collectd::plugin::ntpd::reverselookups

collectd::plugin::ntpd::includeunitid

collectd::plugin::ntpd::interval

collectd-numa

None

collectd-olsrd

collectd-openvpn

collectd::plugin::openvpn::statusfile

collectd::plugin::openvpn::improvednamingschema

collectd::plugin::openvpn::collectcompression

collectd::plugin::openvpn::collectindividualusers

collectd::plugin::openvpn::collectusercount

collectd::plugin::openvpn::interval

collectd-ovs_events

collectd::plugin::ovs_events::address

collectd::plugin::ovs_events::dispatch

collectd::plugin::ovs_events::interfaces

collectd::plugin::ovs_events::send_notification
collectd::plugin::ovs_events::$port

collectd::plugin::ovs_events::socket

collectd-ovs_stats

collectd::plugin::ovs_stats::address

collectd::plugin::ovs_stats::bridges

collectd::plugin::ovs_stats::port

collectd::plugin::ovs_stats::socket

collectd-ping

collectd::plugin::ping::hosts

collectd::plugin::ping::timeout

collectd::plugin::ping::ttl

collectd::plugin::ping::source_address

collectd::plugin::ping::device

collectd::plugin::ping::max_missed

collectd::plugin::ping::size

collectd::plugin::ping::interval

collectd-powerdns

collectd::plugin::powerdns::interval

collectd::plugin::powerdns::servers

collectd::plugin::powerdns::recursors

collectd::plugin::powerdns::local_socket

collectd::plugin::powerdns::interval

collectd-processes

collectd::plugin::processes::processes

collectd::plugin::processes::process_matches

collectd::plugin::processes::collect_context_switch

collectd::plugin::processes::collect_file_descriptor

collectd::plugin::processes::collect_memory_maps

collectd::plugin::powerdns::interval
collectd-protocols

collectd::plugin::protocols::ignoreselected

collectd::plugin::protocols::values

collectd-python

collectd-serial

collectd-smart

collectd::plugin::smart::disks

collectd::plugin::smart::ignoreselected

collectd::plugin::smart::interval

collectd-snmp_agent

collectd-statsd

collectd::plugin::statsd::host

collectd::plugin::statsd::port

collectd::plugin::statsd::deletecounters

collectd::plugin::statsd::deletetimers

collectd::plugin::statsd::deletegauges

collectd::plugin::statsd::deletesets

collectd::plugin::statsd::countersum

collectd::plugin::statsd::timerpercentile

collectd::plugin::statsd::timerlower

collectd::plugin::statsd::timerupper

collectd::plugin::statsd::timersum

collectd::plugin::statsd::timercount

collectd::plugin::statsd::interval

collectd-swap

collectd::plugin::swap::reportbydevice

collectd::plugin::swap::reportbytes

collectd::plugin::swap::valuesabsolute
collectd::plugin::swap::valuespercentage

collectd::plugin::swap::reportio

collectd::plugin::swap::interval

collectd-syslog

collectd::plugin::syslog::log_level

collectd::plugin::syslog::notify_level

collectd::plugin::syslog::interval

collectd-table

collectd::plugin::table::tables

collectd::plugin::table::interval

collectd-tail

collectd::plugin::tail::files

collectd::plugin::tail::interval

collectd-tail_csv

collectd::plugin::tail_csv::metrics

collectd::plugin::tail_csv::files

collectd-tcpconns

collectd::plugin::tcpconns::localports

collectd::plugin::tcpconns::remoteports

collectd::plugin::tcpconns::listening

collectd::plugin::tcpconns::allportssummary

collectd::plugin::tcpconns::interval

collectd-ted

collectd-thermal

collectd::plugin::thermal::devices

collectd::plugin::thermal::ignoreselected

collectd::plugin::thermal::interval

collectd-threshold
collectd::plugin::threshold::types

collectd::plugin::threshold::plugins

collectd::plugin::threshold::hosts

collectd::plugin::threshold::interval

collectd-turbostat

collectd::plugin::turbostat::core_c_states

collectd::plugin::turbostat::package_c_states

collectd::plugin::turbostat::system_management_interrupt

collectd::plugin::turbostat::digital_temperature_sensor

collectd::plugin::turbostat::tcc_activation_temp

collectd::plugin::turbostat::running_average_power_limit

collectd::plugin::turbostat::logical_core_names

collectd-unixsock

collectd-uptime

collectd::plugin::uptime::interval

collectd-users

collectd::plugin::users::interval

collectd-uuid

collectd::plugin::uuid::uuid_file

collectd::plugin::uuid::interval

collectd-virt

collectd::plugin::virt::connection

collectd::plugin::virt::refresh_interval

collectd::plugin::virt::domain

collectd::plugin::virt::block_device

collectd::plugin::virt::interface_device

collectd::plugin::virt::ignore_selected

collectd::plugin::virt::hostname_format
collectd::plugin::virt::interface_format

collectd::plugin::virt::extra_stats

collectd::plugin::virt::interval

collectd-vmem

collectd::plugin::vmem::verbose

collectd::plugin::vmem::interval

collectd-vserver

collectd-wireless

collectd-write_graphite

collectd::plugin::write_graphite::carbons

collectd::plugin::write_graphite::carbon_defaults

collectd::plugin::write_graphite::globals

collectd-write_kafka

collectd::plugin::write_kafka::kafka_host

collectd::plugin::write_kafka::kafka_port

collectd::plugin::write_kafka::kafka_hosts

collectd::plugin::write_kafka::topics

collectd-write_log

collectd::plugin::write_log::format

collectd-zfs_arc

None
CHAPTER 3 COMPLETING THESERVICE TELEMETRY FRAMEWORK CONFIGURATION
31 CONNECTING RED HAT OPENSTACK PLATFORM TOSERVICE TELEMETRY FRAMEWORK
To collect metrics events or both and to send them to the Service Telemetry Framework (STF)storage domain you must configure the Red Hat OpenStack Platform overcloud to enable datacollection and transport
To deploy data collection and transport to STF on Red Hat OpenStack Platform cloud nodes thatemploy routed L3 domains such as distributed compute node (DCN) or spine-leaf see Section 32ldquoDeploying to non-standard network topologiesrdquo
32 DEPLOYING TO NON-STANDARD NETWORK TOPOLOGIES
If your nodes are on a separate network from the default InternalApi network you must makeconfiguration adjustments so that AMQ Interconnect can transport data to theService Telemetry Framework (STF) server instance This scenario is typical in a spine-leaf or aDCN topology For more information about DCN configuration see the Spine Leaf Networkingguide
If you use STF with Red Hat OpenStack Platform 16.0 and plan to monitor your Ceph, Block, or Object storage nodes, you must make configuration changes that are similar to the configuration changes that you make to the spine-leaf and DCN network configuration. To monitor Ceph nodes, use the CephStorageExtraConfig parameter to define which network interface to load into the AMQ Interconnect and collectd configuration files:

   CephStorageExtraConfig:
     tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('storage')}"
     tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('storage')}"
     tripleo::profile::base::ceilometer::agent::notification::notifier_host_addr: "%{hiera('storage')}"

Similarly, you must specify the BlockStorageExtraConfig and ObjectStorageExtraConfig parameters if your environment uses Block and Object storage roles.

The deployment of a spine-leaf topology involves creating roles and networks, then assigning those networks to the available roles. When you configure data collection and transport for STF for a Red Hat OpenStack Platform deployment, the default network for roles is InternalApi. For Ceph, Block, and Object storage roles, the default network is Storage. Because a spine-leaf configuration can result in different networks being assigned to different Leaf groupings, and those names are typically unique, additional configuration is required in the parameter_defaults section of the Red Hat OpenStack Platform environment files.

Procedure

1. Document which networks are available for each of the Leaf roles. For examples of network name definitions, see Creating a network data file in the Spine Leaf Networking guide. For more information about the creation of the Leaf groupings (roles) and assignment of the networks to those groupings, see Creating a roles data file in the Spine Leaf Networking guide.
2. Add the following configuration example to the ExtraConfig section for each of the leaf roles. In this example, internal_api_subnet is the value defined in the name_lower parameter of your network definition (with _subnet appended to the name for Leaf 0) and is the network to which the ComputeLeaf0 leaf role is connected. In this case, the network identification of 0 corresponds to the Compute role for leaf 0, and represents a value that is different from the default internal API network name.

   For the ComputeLeaf0 leaf role, specify extra configuration to perform a hiera lookup to determine which network interface for a particular network to assign to the collectd AMQP host parameter. Perform the same configuration for the AMQ Interconnect listener address parameter:

   ComputeLeaf0ExtraConfig:
     tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('internal_api_subnet')}"
     tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('internal_api_subnet')}"

   Additional leaf roles typically replace _subnet with _leafN, where N represents a unique identifier for the leaf:

   ComputeLeaf1ExtraConfig:
     tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('internal_api_leaf1')}"
     tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('internal_api_leaf1')}"

   This example configuration is on a CephStorage leaf role:

   CephStorageLeaf0ExtraConfig:
     tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('storage_subnet')}"
     tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('storage_subnet')}"

3.3. CONFIGURING RED HAT OPENSTACK PLATFORM OVERCLOUD FOR SERVICE TELEMETRY FRAMEWORK

To configure the Red Hat OpenStack Platform overcloud, you must configure the data collection applications and the data transport to STF, and deploy the overcloud.

To configure the Red Hat OpenStack Platform overcloud, complete the following tasks:

1. Section 3.3.1, "Retrieving the AMQ Interconnect route address"

2. Section 3.3.2, "Configuring the STF connection for the overcloud"

3. Section 3.3.3, "Validating client-side installation"

3.3.1. Retrieving the AMQ Interconnect route address

When you configure the Red Hat OpenStack Platform overcloud for STF, you must provide the AMQ Interconnect route address in the STF connection file.

Procedure

1. Log in to your Red Hat OpenShift Container Platform (OCP) environment.

2. In the service-telemetry project, retrieve the AMQ Interconnect route address:

   $ oc get routes -ogo-template='{{ range .items }}{{printf "%s\n" .spec.host }}{{ end }}' | grep "\-5671"
   stf-default-interconnect-5671-service-telemetry.apps.infra.watch

NOTE

If your STF installation differs from the documentation, ensure that you retrieve the correct AMQ Interconnect route address.

3.3.2. Configuring the STF connection for the overcloud

To configure the STF connection, you must create a file that contains the connection configuration of the AMQ Interconnect for the overcloud to the STF deployment. Enable the collection of events and storage of the events in STF, and deploy the overcloud.

Procedure

1. Log in to the Red Hat OpenStack Platform undercloud as the stack user.

2. Create a configuration file called stf-connectors.yaml in the /home/stack directory.

   IMPORTANT

   The Service Telemetry Operator simplifies the deployment of all data ingestion and data storage components for single cloud deployments. To share the data storage domain with multiple clouds, see Section 4.5, "Configuring multiple clouds".

3. In the stf-connectors.yaml file, configure the MetricsQdrConnectors address to connect the AMQ Interconnect on the overcloud to the STF deployment:

   parameter_defaults:
       EventPipelinePublishers: []
       CeilometerQdrPublishEvents: true
       MetricsQdrConnectors:
           - host: stf-default-interconnect-5671-service-telemetry.apps.infra.watch
             port: 443
             role: edge
             sslProfile: sslProfile
             verifyHostname: false

   Add the CeilometerQdrPublishEvents: true parameter to enable collection and transport of Ceilometer events to STF.

   Replace the host parameter with the value of HOST/PORT that you retrieved in Section 3.3.1, "Retrieving the AMQ Interconnect route address".

4. Add the following files to your Red Hat OpenStack Platform director deployment to set up collectd and AMQ Interconnect:

   the stf-connectors.yaml environment file

   the enable-stf.yaml file that ensures that the environment is being used during the overcloud deployment

   the ceilometer-write-qdr.yaml file that ensures that Ceilometer telemetry is sent to STF

5. Deploy the Red Hat OpenStack Platform overcloud:
   $ openstack overcloud deploy <other arguments> \
     --templates /usr/share/openstack-tripleo-heat-templates \
     --environment-file <other-environment-files> \
     --environment-file /usr/share/openstack-tripleo-heat-templates/environments/metrics/ceilometer-write-qdr.yaml \
     --environment-file /usr/share/openstack-tripleo-heat-templates/environments/enable-stf.yaml \
     --environment-file /home/stack/stf-connectors.yaml

3.3.3. Validating client-side installation

To validate data collection from the STF storage domain, query the data sources for delivered data. To validate individual nodes in the Red Hat OpenStack Platform deployment, connect to the console using SSH.

Procedure

1. Log in to an overcloud node, for example, controller-0.

2. Ensure that the metrics_qdr container is running on the node:

   $ sudo podman container inspect --format '{{.State.Status}}' metrics_qdr
   running

3. Return the internal network address on which AMQ Interconnect is running, for example, 172.17.1.44 listening on port 5666:

   $ sudo podman exec -it metrics_qdr cat /etc/qpid-dispatch/qdrouterd.conf
   listener {
       host: 172.17.1.44
       port: 5666
       authenticatePeer: no
       saslMechanisms: ANONYMOUS
   }

4. Return a list of connections to the local AMQ Interconnect:

   $ sudo podman exec -it metrics_qdr qdstat --bus=172.17.1.44:5666 --connections
   Connections
     id   host                                                                   container                                                                                                  role    dir  security                            authentication  tenant
     =========================================================================================================================================================================================
     1    stf-default-interconnect-5671-service-telemetry.apps.infra.watch:443  stf-default-interconnect-7458fd4d69-bgzfb                                                                  edge    out  TLSv1.2(DHE-RSA-AES256-GCM-SHA384)  anonymous-user
     12   172.17.1.44:60290                                                     openstack.org/om/container/controller-0/ceilometer-agent-notification/25/5c02cee550f143ec9ea030db5cccba14  normal  in   no-security                        no-auth
     16   172.17.1.44:36408                                                     metrics                                                                                                    normal  in   no-security                        anonymous-user
     899  172.17.1.44:39500                                                     10a2e99d-1b8a-4329-b48c-4335e5f75c84                                                                       normal  in   no-security                        no-auth

   There are four connections:

   Outbound connection to STF

   Inbound connection from collectd

   Inbound connection from ceilometer

   Inbound connection from our qdstat client

   The outbound STF connection is provided to the MetricsQdrConnectors host parameter and is the route for the STF storage domain. The other hosts are internal network addresses of the client connections to this AMQ Interconnect.

5. To ensure that messages are being delivered, list the links, and view the _edge address in the deliv column for delivery of messages:

   $ sudo podman exec -it metrics_qdr qdstat --bus=172.17.1.44:5666 --links
   Router Links
     type      dir  conn id  id    peer  class   addr                  phs  cap  pri  undel  unsett  deliv    presett  psdrop  acc  rej  rel  mod  delay  rate
     ===========================================================================================================================================================
     endpoint  out  1        5           local   _edge                      250  0    0      0       2979926  2979924  0       0    0    2    0    0      0
     endpoint  in   1        6                                              250  0    0      0       0        0        0       0    0    0    0    0      0
     endpoint  in   1        7                                              250  0    0      0       0        0        0       0    0    0    0    0      0
     endpoint  out  1        8                                              250  0    0      0       0        0        0       0    0    0    0    0      0
     endpoint  in   1        9                                              250  0    0      0       0        0        0       0    0    0    0    0      0
     endpoint  out  1        10                                             250  0    0      0       911      911      0       0    0    0    0    911    0
     endpoint  in   1        11                                             250  0    0      0       0        911      0       0    0    0    0    0      0
     endpoint  out  12       32          local   temp.lSY6Mcicol4J2Kp       250  0    0      0       0        0        0       0    0    0    0    0      0
     endpoint  in   16       41                                             250  0    0      0       2979924  2979924  0       0    0    0    0    0      0
     endpoint  in   912      1834        mobile  $management           0    250  0    0      0       1        0        0       1    0    0    0    0      0
     endpoint  out  912      1835        local   temp.9Ok2resI9tmt+CT       250  0    0      0       0        0        0       0    0    0    0    0      0

6. To list the addresses from Red Hat OpenStack Platform nodes to STF, connect to OCP to get the AMQ Interconnect pod name and list the connections. List the available AMQ Interconnect pods:

   $ oc get pods -l application=stf-default-interconnect
   NAME                                        READY   STATUS    RESTARTS   AGE
   stf-default-interconnect-7458fd4d69-bgzfb   1/1     Running   0          6d21h

7. Connect to the pod and run the qdstat --connections command to list the known connections:

   $ oc exec -it stf-default-interconnect-7458fd4d69-bgzfb -- qdstat --connections
   2020-04-21 18:25:47.243852 UTC
   stf-default-interconnect-7458fd4d69-bgzfb

   Connections
     id  host               container                                                               role    dir  security                                authentication  tenant  last dlv      uptime
     ==================================================================================================================================================================================================
     1   10.129.2.21:43062  rcv[stf-default-collectd-telemetry-smartgateway-79c967c8f7-kq4qv]       normal  in   no-security                             anonymous-user          000:00:00:00  006:21:50:25
     2   10.130.0.52:55754  rcv[stf-default-ceilometer-notification-smartgateway-6675df547-mbjk5]   normal  in   no-security                             anonymous-user          000:21:25:57  006:21:49:36
     3   10.130.0.51:43110  rcv[stf-default-collectd-notification-smartgateway-698c87fbb7-f28v6]    normal  in   no-security                             anonymous-user          000:21:36:53  006:21:49:09
     22  10.128.0.1:51948   Router.ceph-0.redhat.local                                              edge    in   TLSv1/SSLv3(DHE-RSA-AES256-GCM-SHA384)  anonymous-user          000:00:00:03  000:22:08:43
     23  10.128.0.1:51950   Router.compute-0.redhat.local                                           edge    in   TLSv1/SSLv3(DHE-RSA-AES256-GCM-SHA384)  anonymous-user          000:00:00:03  000:22:08:43
     24  10.128.0.1:52082   Router.controller-0.redhat.local                                        edge    in   TLSv1/SSLv3(DHE-RSA-AES256-GCM-SHA384)  anonymous-user          000:00:00:00  000:22:08:34
     27  127.0.0.1:42202    c2f541c1-4c97-4b37-a189-a396c08fb079                                    normal  in   no-security                             no-auth                 000:00:00:00  000:00:00:00

   In this example, there are three edge connections from the Red Hat OpenStack Platform nodes, with connection id 22, 23, and 24.

8. To view the number of messages delivered by the network, use each address with the oc exec command:

   $ oc exec -it stf-default-interconnect-7458fd4d69-bgzfb -- qdstat --address
   2020-04-21 18:20:10.293258 UTC
   stf-default-interconnect-7458fd4d69-bgzfb

   Router Addresses
     class   addr                             phs  distrib    pri  local  remote  in       out      thru  fallback
     ====================================================================================================================
     mobile  anycast/ceilometer/event.sample  0    balanced   -    1      0       1553     1553     0     0
     mobile  collectd/notify                  0    multicast  -    1      0       10       10       0     0
     mobile  collectd/telemetry               0    multicast  -    1      0       7798049  7798049  0     0
CHAPTER 4. ADVANCED FEATURES

The following optional features can provide additional functionality to the Service Telemetry Framework (STF):
Customizing the deployment. For more information, see Section 4.1, "Customizing the deployment".

Alerts. For more information, see Section 4.2, "Alerts".

High availability. For more information, see Section 4.3, "High availability".

Dashboards. For more information, see Section 4.4, "Dashboards".

Multiple clouds. For more information, see Section 4.5, "Configuring multiple clouds".

Ephemeral storage. For more information, see Section 4.6, "Ephemeral storage".
4.1. CUSTOMIZING THE DEPLOYMENT
The Service Telemetry Operator watches for a ServiceTelemetry manifest to load into Red Hat OpenShift Container Platform (OCP). The Operator then creates other objects in memory, which results in the dependent Operators creating the workloads they are responsible for managing.
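For reference, a minimal ServiceTelemetry manifest of the kind that the Operator watches for looks like the following sketch. It mirrors the stf-default object shown later in this chapter; only metricsEnabled is set, so all other behavior uses the Operator defaults:

   apiVersion: infra.watch/v1alpha1
   kind: ServiceTelemetry
   metadata:
     name: stf-default
     namespace: service-telemetry
   spec:
     metricsEnabled: true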
WARNING
When you override the manifest, you must provide the entire manifest contents, including object names or namespaces. There is no dynamic parameter substitution when you override a manifest.
To override a manifest successfully with Service Telemetry Framework (STF), deploy a default environment using the core options only. For more information about the core options, see Section 2.3.10, "Creating a ServiceTelemetry object in OCP". When you deploy STF, use the oc get command to retrieve the default deployed manifest. When you use a manifest that was originally generated by Service Telemetry Operator, the manifest is compatible with the other objects that are managed by the Operators.
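For example, to dump the live manifest for inspection before copying it into an override, you can use the same oc get pattern as the retrieval commands in Table 4.1 (a minimal sketch; stf-default is the default deployment name used throughout this guide):

   $ oc get servicetelemetry stf-default -oyaml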
For example, when the metricsEnabled: true parameter is configured in the ServiceTelemetry manifest, the Service Telemetry Operator requests components for metrics retrieval and storage using the default manifests. In some cases, you might want to override the default manifest. For more information, see Section 4.1.1, "Manifest override parameters".
4.1.1. Manifest override parameters

This table describes the available parameters that you can use to override a manifest, along with the corresponding retrieval commands.

Table 4.1. Manifest override parameters
Override parameter: alertmanagerManifest
Description: Override the Alertmanager object creation. The Prometheus Operator watches for Alertmanager objects.
Retrieval command: oc get alertmanager stf-default -oyaml

Override parameter: alertmanagerConfigManifest
Description: Override the Secret that contains the Alertmanager configuration. The Prometheus Operator uses a secret named alertmanager-{alertmanager-name}, for example stf-default, to provide the alertmanager.yaml configuration to Alertmanager.
Retrieval command: oc get secret alertmanager-stf-default -oyaml

Override parameter: elasticsearchManifest
Description: Override the ElasticSearch object creation. The Elastic Cloud on Kubernetes Operator watches for ElasticSearch objects.
Retrieval command: oc get elasticsearch elasticsearch -oyaml

Override parameter: interconnectManifest
Description: Override the Interconnect object creation. The AMQ Interconnect Operator watches for Interconnect objects.
Retrieval command: oc get interconnect stf-default-interconnect -oyaml

Override parameter: prometheusManifest
Description: Override the Prometheus object creation. The Prometheus Operator watches for Prometheus objects.
Retrieval command: oc get prometheus stf-default -oyaml

Override parameter: servicemonitorManifest
Description: Override the ServiceMonitor object creation. The Prometheus Operator watches for ServiceMonitor objects.
Retrieval command: oc get servicemonitor stf-default -oyaml

Override parameter: smartgatewayCollectdMetricsManifest
Description: Override the SmartGateway object creation for collectd metrics. The Smart Gateway Operator watches for SmartGateway objects.
Retrieval command: oc get smartgateway stf-default-collectd-telemetry -oyaml

Override parameter: smartgatewayCollectdEventsManifest
Description: Override the SmartGateway object creation for collectd events. The Smart Gateway Operator watches for SmartGateway objects.
Retrieval command: oc get smartgateway stf-default-collectd-notification -oyaml

Override parameter: smartgatewayCeilometerEventsManifest
Description: Override the SmartGateway object creation for Ceilometer events. The Smart Gateway Operator watches for SmartGateway objects.
Retrieval command: oc get smartgateway stf-default-ceilometer-notification -oyaml
4.1.2. Overriding a managed manifest

Edit the ServiceTelemetry object and provide a parameter and manifest. For a list of available manifest override parameters, see Section 4.1, "Customizing the deployment". The default ServiceTelemetry object is stf-default. Use oc get servicetelemetry to list the available STF deployments.
TIP
The oc edit command loads the default system editor. To override the default editor, pass or set the environment variable EDITOR to the preferred editor, for example, EDITOR=nano oc edit servicetelemetry stf-default.
Procedure

1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

   $ oc project service-telemetry

3. Load the ServiceTelemetry object into an editor:

   $ oc edit servicetelemetry stf-default

4. To modify the ServiceTelemetry object, provide a manifest override parameter and the contents of the manifest to write to OCP instead of the defaults provided by STF.

   NOTE

   The trailing pipe (|) after entering the manifest override parameter indicates that the value provided is multi-line.

   apiVersion: infra.watch/v1alpha1
   kind: ServiceTelemetry
   metadata:
     annotations:
       kubectl.kubernetes.io/last-applied-configuration: |
         {"apiVersion":"infra.watch/v1alpha1","kind":"ServiceTelemetry","metadata":{"annotations":{},"name":"stf-default","namespace":"service-telemetry"},"spec":{"metricsEnabled":true}}
     creationTimestamp: "2020-04-14T20:29:42Z"
     generation: 1
     name: stf-default
     namespace: service-telemetry
     resourceVersion: "1949423"
     selfLink: /apis/infra.watch/v1alpha1/namespaces/service-telemetry/servicetelemetrys/stf-default
     uid: d058bc41-1bb0-49f5-9a8b-642f4b8adb95
   spec:
     metricsEnabled: true
     smartgatewayCollectdMetricsManifest: |    1
       apiVersion: smartgateway.infra.watch/v2alpha1
       kind: SmartGateway
       metadata:
         name: stf-default-collectd-telemetry
         namespace: service-telemetry
       spec:
         amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/telemetry
         debug: true
         prefetch: 15000
         serviceType: metrics
         size: 1
         useTimestamp: true    2
   status:
     conditions:
     - ansibleResult:
         changed: 0
         completion: 2020-04-14T20:32:19.079508
         failures: 0
         ok: 52
         skipped: 1
       lastTransitionTime: "2020-04-14T20:29:59Z"
       message: Awaiting next reconciliation
       reason: Successful
       status: "True"
       type: Running

   1 Manifest override parameter is defined in the spec of the ServiceTelemetry object.

   2 End of the manifest override content.

5. Save and close.

4.2. ALERTS

You create alert rules in Prometheus and alert routes in Alertmanager. Alert rules in Prometheus servers send alerts to an Alertmanager, which manages the alerts. Alertmanager can silence, inhibit, or aggregate alerts, and send notifications using email, on-call notification systems, or chat platforms.

To create an alert, complete the following tasks:

1. Create an alert rule in Prometheus. For more information, see Section 4.2.1, "Creating an alert rule in Prometheus".

2. Create an alert route in Alertmanager. For more information, see Section 4.2.3, "Creating an alert route in Alertmanager".
Additional resources

For more information about alerts or notifications with Prometheus and Alertmanager, see https://prometheus.io/docs/alerting/overview/.

To view an example set of alerts that you can use with Service Telemetry Framework (STF), see https://github.com/infrawatch/service-telemetry-operator/tree/master/deploy/alerts.

4.2.1. Creating an alert rule in Prometheus

Prometheus evaluates alert rules to trigger notifications. If the rule condition returns an empty result set, the condition is false. Otherwise, the rule is true and it triggers an alert.
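For example, the rule expression used later in this procedure follows exactly this pattern: it returns a non-empty result, and therefore fires, only while a metrics listener reports a status below 1:

   # Non-empty result set => condition true => alert fires.
   # Empty result set => condition false => alert resolved.
   collectd_qpid_router_status < 1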
Procedure

1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

   $ oc project service-telemetry

3. Create a PrometheusRule object that contains the alert rule. The Prometheus Operator loads the rule into Prometheus:

   $ oc apply -f - <<EOF
   apiVersion: monitoring.coreos.com/v1
   kind: PrometheusRule
   metadata:
     creationTimestamp: null
     labels:
       prometheus: stf-default
       role: alert-rules
     name: prometheus-alarm-rules
     namespace: service-telemetry
   spec:
     groups:
       - name: openstack.rules
         rules:
           - alert: Metric Listener down
             expr: collectd_qpid_router_status < 1
   EOF

   To change the rule, edit the value of the expr parameter.

4. To verify that the rules have been loaded into Prometheus by the Operator, create a pod with access to curl:

   $ oc run curl --generator=run-pod/v1 --image=radial/busyboxplus:curl -i --tty

5. Run curl to access the prometheus-operated service to return the rules loaded into memory:

   [ root@curl / ]$ curl prometheus-operated:9090/api/v1/rules
   {"status":"success","data":{"groups":[{"name":"openstack.rules","file":"/etc/prometheus/rules/prometheus-stf-default-rulefiles-0/service-telemetry-prometheus-alarm-rules.yaml","rules":[{"name":"Metric Listener down","query":"collectd_qpid_router_status \u003c 1","duration":0,"labels":{},"annotations":{},"alerts":[],"health":"ok","type":"alerting"}],"interval":30}]}}

6. To verify that the output shows the rules loaded into the PrometheusRule object, for example that the output contains the defined openstack.rules, exit from the pod:

   [ root@curl / ]$ exit

7. Clean up the environment by deleting the curl pod:

   $ oc delete pod curl
   pod "curl" deleted

Additional resources

For more information on alerting, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md.

4.2.2. Configuring custom alerts

You can add custom alerts to the PrometheusRule object that you created in Section 4.2.1, "Creating an alert rule in Prometheus".

Procedure

1. Use the oc edit command:

   $ oc edit prometheusrules prometheus-alarm-rules

2. Edit the PrometheusRules manifest.

3. Save and close.

Additional resources

For more information about configuring alerting rules, see https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/.

For more information about PrometheusRules objects, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md.

4.2.3. Creating an alert route in Alertmanager

Use Alertmanager to deliver alerts to an external system, such as email, IRC, or other notification channel. The Prometheus Operator manages the Alertmanager configuration as a Red Hat OpenShift Container Platform (OCP) secret. STF by default deploys a basic configuration that results in no receivers:

   alertmanager.yaml: |-
     global:
       resolve_timeout: 5m
     route:
       group_by: ['job']
       group_wait: 30s
       group_interval: 5m
       repeat_interval: 12h
       receiver: 'null'
     receivers:
     - name: 'null'

To deploy a custom Alertmanager route with STF, an alertmanagerConfigManifest parameter must be passed to the Service Telemetry Operator that results in an updated secret, managed by the Prometheus Operator. For more information, see Section 4.1.2, "Overriding a managed manifest".

Procedure

1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

   $ oc project service-telemetry

3. Edit the ServiceTelemetry object for your STF deployment:

   $ oc edit servicetelemetry stf-default

4. Add a new parameter, alertmanagerConfigManifest, and the Secret object contents to define the alertmanager.yaml configuration for Alertmanager:

   apiVersion: infra.watch/v1alpha1
   kind: ServiceTelemetry
   metadata:
     name: stf-default
     namespace: service-telemetry
   spec:
     metricsEnabled: true
     alertmanagerConfigManifest: |
       apiVersion: v1
       kind: Secret
       metadata:
         name: 'alertmanager-stf-default'
         namespace: 'service-telemetry'
       type: Opaque
       stringData:
         alertmanager.yaml: |-
           global:
             resolve_timeout: 10m
           route:
             group_by: ['job']
             group_wait: 30s
             group_interval: 5m
             repeat_interval: 12h
             receiver: 'null'
           receivers:
           - name: 'null'

   NOTE

   This loads the default template that is already managed by Service Telemetry Operator. To validate that the changes are populating correctly, change a value, return the alertmanager-stf-default secret, and verify that the new value is loaded into memory, for example, changing the value global.resolve_timeout from 5m to 10m.

5. Verify that the configuration was applied to the secret:

   $ oc get secret alertmanager-stf-default -o go-template='{{index .data "alertmanager.yaml" | base64decode }}'
   global:
     resolve_timeout: 10m
   route:
     group_by: ['job']
     group_wait: 30s
     group_interval: 5m
     repeat_interval: 12h
     receiver: 'null'
   receivers:
   - name: 'null'

6. To verify that the configuration has been loaded into Alertmanager, create a pod with access to curl:

   $ oc run curl --generator=run-pod/v1 --image=radial/busyboxplus:curl -i --tty

7. Run curl against the alertmanager-operated service to retrieve the status and configYAML contents, and review that the supplied configuration matches the configuration loaded into Alertmanager:

   [ root@curl / ]$ curl alertmanager-operated:9093/api/v1/status
   {"status":"success","data":{"configYAML":"global:\n  resolve_timeout: 10m\n  http_config: {}\n  smtp_hello: localhost\n  smtp_require_tls: true\n  pagerduty_url: https://events.pagerduty.com/v2/enqueue\n  hipchat_api_url: https://api.hipchat.com/\n  opsgenie_api_url: https://api.opsgenie.com/\n  wechat_api_url: https://qyapi.weixin.qq.com/cgi-bin/\n  victorops_api_url: https://alert.victorops.com/integrations/generic/20131114/alert/\nroute:\n  receiver: \"null\"\n  group_by:\n  - job\n  group_wait: 30s\n  group_interval: 5m\n  repeat_interval: 12h\nreceivers:\n- name: \"null\"\ntemplates: []\n",...}}

8. Verify that the configYAML field contains the expected changes. Exit from the pod:

   [ root@curl / ]$ exit

9. To clean up the environment, delete the curl pod:

   $ oc delete pod curl
   pod "curl" deleted

Additional resources

For more information about the Red Hat OpenShift Container Platform secret and the Prometheus operator, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md.
4.3. HIGH AVAILABILITY

High availability is the ability of Service Telemetry Framework (STF) to rapidly recover from failures in its component services. Although Red Hat OpenShift Container Platform (OCP) restarts a failed pod if nodes are available to schedule the workload, this recovery process might take more than one minute, during which time events and metrics are lost. A high availability configuration includes multiple copies of STF components, which reduces recovery time to approximately 2 seconds. To protect against failure of an OCP node, deploy STF to an OCP cluster with three or more nodes.
NOTE
STF is not yet a fully fault tolerant system. Delivery of metrics and events during the recovery period is not guaranteed.
Enabling high availability has the following effects:

Two AMQ Interconnect pods run instead of the default 1.

Three ElasticSearch pods run instead of the default 1.

Recovery time from a lost pod in either of these services reduces to approximately 2 seconds.
4.3.1. Configuring high availability

To configure STF for high availability, add highAvailabilityEnabled: true to the ServiceTelemetry object in OCP. You can set this parameter at installation time, or, if you already deployed STF, complete the following steps:
Procedure
1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

   $ oc project service-telemetry

3. Use the oc command to edit the ServiceTelemetry object:

   $ oc edit ServiceTelemetry

4. Add highAvailabilityEnabled: true to the spec section:

   spec:
     eventsEnabled: true
     metricsEnabled: true
     highAvailabilityEnabled: true

5. Save your changes and close the object.
4.4. DASHBOARDS

Use the third-party application Grafana to visualize system-level metrics gathered by collectd for each individual host node. For more information about configuring collectd, see Section 3.3, "Configuring Red Hat OpenStack Platform overcloud for Service Telemetry Framework".

4.4.1. Setting up Grafana to host the dashboard

Grafana is not included in the default Service Telemetry Framework (STF) deployment, so you must deploy the Grafana Operator from OperatorHub.io.
Procedure
1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

   $ oc project service-telemetry

3. Clone the dashboard repository:

   $ git clone https://github.com/infrawatch/dashboards
   $ cd dashboards

4. Deploy the Grafana operator:

   $ oc create -f deploy/subscription.yaml

5. To verify that the operator launched successfully, run the oc get csv command. If the value of the PHASE column is Succeeded, the operator launched successfully:

   $ oc get csv
   NAME                      DISPLAY            VERSION   REPLACES   PHASE
   grafana-operator.v3.2.0   Grafana Operator   3.2.0                Succeeded

6. Launch a Grafana instance:

   $ oc create -f deploy/grafana.yaml

7. Verify that the Grafana instance deployed:

   $ oc get pod -l app=grafana
   NAME                                  READY   STATUS    RESTARTS   AGE
   grafana-deployment-7fc7848b56-sbkhv   1/1     Running   0          1m

8. Create the datasource and dashboard resources:

   $ oc create -f deploy/datasource.yaml -f deploy/rhos-dashboard.yaml

9. Verify that the resources installed correctly:

   $ oc get grafanadashboards
   NAME             AGE
   rhos-dashboard   7d21h

   $ oc get grafanadatasources
   NAME                                  AGE
   service-telemetry-grafanadatasource   1m

10. Navigate to https://<grafana-route-address> in a web browser. Use the oc get routes command to retrieve the Grafana route address:

   $ oc get routes

11. To view the dashboard, click Dashboards and Manage.

Additional resources

For more information about enabling the OperatorHub.io catalog source, see Section 2.3.5, "Enabling the OperatorHub.io Community Catalog Source".

4.4.1.1. Viewing and editing queries

Procedure

1. Log in to Red Hat OpenShift Container Platform. To view and edit queries, log in as the admin user.

2. Change to the service-telemetry namespace:

   $ oc project service-telemetry

3. To retrieve the default username and password, describe the Grafana object using oc describe:

   $ oc describe grafana service-telemetry-grafana

4.4.2. The Grafana infrastructure dashboard

The infrastructure dashboard shows metrics for a single node at a time. Select a node from the upper left corner of the dashboard.

4.4.2.1. Top panels

Title                   Unit           Description
Current Global Alerts   -              Current alerts fired by Prometheus
Recent Global Alerts    -              Recently fired alerts in 5m time steps
Status Panel            -              Node status: up, down, unavailable
Uptime                  s/m/h/d/M/Y    Total operational time of node
CPU Cores               cores          Total number of cores
Memory                  bytes          Total memory
Disk Size               bytes          Total storage size
Processes               processes      Total number of processes listed by type
Load Average            processes      Load average represents the average number of running and uninterruptible processes residing in the kernel execution queue
4.4.2.2. Networking panels

Panels that display the network interfaces of the node.

Panel                                     Unit       Description
Physical Interfaces Ingress Errors        errors     Total errors with incoming data
Physical Interfaces Egress Errors         errors     Total errors with outgoing data
Physical Interfaces Ingress Error Rates   errors/s   Rate of incoming data errors
Physical Interfaces Egress Error Rates    errors/s   Rate of outgoing data errors
Physical Interfaces Packets Ingress       pps        Incoming packets per second
Physical Interfaces Packets Egress        pps        Outgoing packets per second
Physical Interfaces Data Ingress          bytes/s    Incoming data rates
Physical Interfaces Data Egress           bytes/s    Outgoing data rates
Physical Interfaces Drop Rate Ingress     pps        Incoming packets drop rate
Physical Interfaces Drop Rate Egress      pps        Outgoing packets drop rate
4.4.2.3. CPU panels

Panels that display CPU usage of the node.

Panel                     Unit      Description
Current CPU Usage         percent   Instantaneous usage at the time of the last query
Aggregate CPU Usage       percent   Average non-idle CPU activity of all cores on a node
Aggr. CPU Usage by Type   percent   Shows time spent for each type of thread averaged across all cores
4.4.2.4. Memory panels

Panels that display memory usage on the node.

Panel             Unit        Description
Memory Used       percent     Amount of memory being used at time of last query
Huge Pages Used   hugepages   Number of hugepages being used
Memory
4.4.2.5. Disk/file system panels

Panels that display space used on disk.

Panel                        Unit      Description                                Notes
Disk Space Usage             percent   Total disk use at time of last query
Inode Usage                  percent   Total inode use at time of last query
Aggregate Disk Space Usage   bytes     Total disk space used and reserved        Because this query relies on the df plugin, temporary file systems that do not necessarily use disk space are included in the results. The query tries to filter out most of these, but it might not be exhaustive.
Disk Traffic                 bytes/s   Shows rates for both reading and writing
Disk Load                    percent   Approximate percentage of total disk bandwidth being used. The weighted I/O time series includes the backlog that might be accumulating. For more information, see the collectd disk plugin docs.
Operations/s                 ops/s     Operations done per second
Average I/O Operation Time   seconds   Average time each I/O operation took to complete. This average is not accurate, see the collectd disk plugin docs.
4.5. CONFIGURING MULTIPLE CLOUDS

You can configure multiple Red Hat OpenStack Platform clouds to target a single instance of Service Telemetry Framework (STF):

1. Plan the AMQP address prefixes that you want to use for each cloud. For more information, see Section 4.5.1, "Planning AMQP address prefixes".

2. Deploy metrics and events consumer Smart Gateways for each cloud to listen on the corresponding address prefixes. For more information, see Section 4.5.2, "Deploying Smart Gateways".

3. Configure each cloud to send its metrics and events to STF on the correct address. For more information, see Section 4.5.3, "Creating the OpenStack environment file".
Figure 4.1. Two Red Hat OpenStack Platform clouds connect to STF

4.5.1. Planning AMQP address prefixes

By default, Red Hat OpenStack Platform nodes get data through two data collectors: collectd and Ceilometer. These components send telemetry data, or notifications, to the respective AMQP addresses, for example, collectd/telemetry, where STF Smart Gateways listen on those addresses for monitoring data.

To support multiple clouds, and to identify which cloud generated the monitoring data, configure each cloud to send data to a unique address. Prefix a cloud identifier to the second part of the address. The following list shows some example addresses and identifiers:
collectd/cloud1-telemetry

collectd/cloud1-notify

anycast/ceilometer/cloud1-event.sample

collectd/cloud2-telemetry

collectd/cloud2-notify

anycast/ceilometer/cloud2-event.sample

collectd/us-east-1-telemetry

collectd/us-west-3-telemetry
4.5.2. Deploying Smart Gateways

You must deploy a Smart Gateway for each of the data collection types for each cloud: one for collectd metrics, one for collectd events, and one for Ceilometer events. Configure each of the Smart Gateways to listen on the AMQP address that you define for the corresponding cloud.

When you deploy STF for the first time, Smart Gateway manifests are created that define the initial Smart Gateways for a single cloud. When you deploy Smart Gateways for multiple cloud support, you deploy multiple Smart Gateways for each of the data collection types that handle the metrics and the events data for each cloud. The initial Smart Gateways act as a template to create additional Smart Gateways, along with any authentication information required to connect to the data stores.
Procedure
1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

   $ oc project service-telemetry
3. Use the initially deployed Smart Gateways as a template for additional Smart Gateways. List the currently deployed Smart Gateways with the oc get smartgateways command. For example, if you deployed STF with metricsEnabled: true and eventsEnabled: true, the following Smart Gateways are displayed in the output:

   $ oc get smartgateways
   NAME                                  AGE
   stf-default-ceilometer-notification   14d
   stf-default-collectd-notification     14d
   stf-default-collectd-telemetry        14d

4. Retrieve the manifests for each Smart Gateway, and store the contents in a temporary file, which you can modify later and use to create the new set of Smart Gateways:

   $ truncate --size 0 /tmp/cloud1-smartgateways.yaml && for sg in $(oc get smartgateways -oname); do echo "---" >> /tmp/cloud1-smartgateways.yaml; oc get $sg -oyaml --export >> /tmp/cloud1-smartgateways.yaml; done

5. Modify the Smart Gateway manifests in the /tmp/cloud1-smartgateways.yaml file. Adjust the metadata.name and spec.amqpUrl fields to include the cloud identifier from your schema. For more information, see Section 4.5.1, "Planning AMQP address prefixes". To view example Smart Gateway manifests, see Section 4.5.2.1, "Example manifests".

6. Deploy your new Smart Gateways:

   $ oc apply -f /tmp/cloud1-smartgateways.yaml

7. Verify that each Smart Gateway is running. This can take several minutes, depending on the number of Smart Gateways:

   $ oc get po -l app=smart-gateway
4.5.2.1. Example manifests

IMPORTANT

The content in the following examples might be different from the file content in your deployment. Copy the manifests in your deployment.

Ensure that the name and amqpUrl parameters of each Smart Gateway match the names that you want to use for your clouds. For more information, see Section 4.5.1, "Planning AMQP address prefixes".

NOTE

Your output might have some additional metadata parameters that you can remove from the manifests that you load into OCP.
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-ceilometer-notification-cloud1    1
spec:
  amqpDataSource: ceilometer
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/anycast/ceilometer/cloud1-event.sample    2
  debug: false
  elasticPass: fkzfhghw
  elasticUrl: https://elasticsearch-es-http.service-telemetry.svc.cluster.local:9200
  elasticUser: elastic
  resetIndex: false
  serviceType: events
  size: 1
  tlsCaCert: /config/certs/ca.crt
  tlsClientCert: /config/certs/tls.crt
  tlsClientKey: /config/certs/tls.key
  tlsServerName: elasticsearch-es-http.service-telemetry.svc.cluster.local
  useBasicAuth: true
  useTls: true
---
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-collectd-notification-cloud1    3
spec:
  amqpDataSource: collectd
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/cloud1-notify    4
  debug: false
  elasticPass: fkzfhghw
  elasticUrl: https://elasticsearch-es-http.service-telemetry.svc.cluster.local:9200
  elasticUser: elastic
  resetIndex: false
  serviceType: events
  size: 1
  tlsCaCert: /config/certs/ca.crt
  tlsClientCert: /config/certs/tls.crt
  tlsClientKey: /config/certs/tls.key
  tlsServerName: elasticsearch-es-http.service-telemetry.svc.cluster.local
  useBasicAuth: true
  useTls: true
---
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-collectd-telemetry-cloud1    5
spec:
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/cloud1-telemetry    6
  debug: false
  prefetch: 15000
  serviceType: metrics
  size: 1
  useTimestamp: true

1 Name for Ceilometer notifications for cloud1

2 AMQP address for Ceilometer notifications for cloud1

3 Name for collectd notifications for cloud1

4 AMQP address for collectd notifications for cloud1

5 Name for collectd telemetry for cloud1

6 AMQP address for collectd telemetry for cloud1

4.5.3. Creating the OpenStack environment file

To label traffic according to the cloud of origin, you must create a configuration with cloud-specific instance names. Create an stf-connectors.yaml file and adjust the values of CeilometerQdrEventsConfig and CollectdAmqpInstances to match the AMQP address prefix scheme. For more information, see Section 4.5.1, "Planning AMQP address prefixes".
WARNING

Remove the enable-stf.yaml and ceilometer-write-qdr.yaml environment file references from your overcloud deployment. This configuration is redundant and results in duplicate information being sent from each cloud node.

Procedure

1. Create the stf-connectors.yaml file, and modify it to match the AMQP address that you want for this cloud deployment:

   resource_registry:
     OS::TripleO::Services::Collectd: /usr/share/openstack-tripleo-heat-templates/deployment/metrics/collectd-container-puppet.yaml
     OS::TripleO::Services::MetricsQdr: /usr/share/openstack-tripleo-heat-templates/deployment/metrics/qdr-container-puppet.yaml
     OS::TripleO::Services::CeilometerAgentCentral: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-central-container-puppet.yaml
     OS::TripleO::Services::CeilometerAgentNotification: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-notification-container-puppet.yaml
     OS::TripleO::Services::CeilometerAgentIpmi: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-ipmi-container-puppet.yaml
     OS::TripleO::Services::ComputeCeilometerAgent: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-compute-container-puppet.yaml
     OS::TripleO::Services::Redis: /usr/share/openstack-tripleo-heat-templates/deployment/database/redis-pacemaker-puppet.yaml

   parameter_defaults:
     EnableSTF: true

     EventPipelinePublishers: []
     CeilometerEnablePanko: false
     CeilometerQdrPublishEvents: true
     CeilometerQdrEventsConfig:
       driver: amqp
       topic: cloud1-event    1

     CollectdConnectionType: amqp1
     CollectdAmqpInterval: 5
     CollectdDefaultPollingInterval: 5

     CollectdAmqpInstances:
       cloud1-notify:    2
         notify: true
         format: JSON
         presettle: false
       cloud1-telemetry:    3
         format: JSON
         presettle: true

     MetricsQdrAddresses:
       - prefix: collectd
         distribution: multicast
       - prefix: anycast/ceilometer
         distribution: multicast

     MetricsQdrSSLProfiles:
       - name: sslProfile

     MetricsQdrConnectors:
       - host: stf-default-interconnect-5671-service-telemetry.apps.infra.watch    4
         port: 443
         role: edge
         verifyHostname: false
         sslProfile: sslProfile

   1 Define the topic for Ceilometer events. This value is the address format of anycast/ceilometer/cloud1-event.sample.

   2 Define the topic for collectd events. This value is the format of collectd/cloud1-notify.

   3 Define the topic for collectd metrics. This value is the format of collectd/cloud1-telemetry.

   4 Adjust the MetricsQdrConnectors host to the address of the STF route.

2. Ensure that the naming convention in the stf-connectors.yaml file aligns with the spec.amqpUrl field in the Smart Gateway configuration. For example, configure the CeilometerQdrEventsConfig.topic field to a value of cloud1-event.

3. Save the file in a directory for custom environment files, for example /home/stack/custom_templates/.

4. Source the authentication file:

   [stack@undercloud-0 ~]$ source stackrc
   (undercloud) [stack@undercloud-0 ~]$

5. Include the stf-connectors.yaml file in the overcloud deployment command, along with any other environment files relevant to your environment:

   (undercloud) [stack@undercloud-0 ~]$ openstack overcloud deploy --templates /usr/share/openstack-tripleo-heat-templates \
     -e /home/stack/custom_templates/stf-connectors.yaml

Additional resources

For information about validating the deployment, see Section 3.3.3, "Validating client-side installation".

4.5.4. Querying metrics data from multiple clouds

Data stored in Prometheus has a service label attached according to the Smart Gateway it was scraped from. You can use this label to query data from a specific cloud.

To query data from a specific cloud, use a Prometheus promql query that matches the associated service label, for example: collectd_uptime{service="stf-default-collectd-telemetry-cloud1-smartgateway"}.
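To run such a query from inside the cluster, you can reuse the curl pod pattern from Section 4.2.1, "Creating an alert rule in Prometheus", against the prometheus-operated service. This is a sketch that assumes the default service name and port used earlier in this guide; the --data-urlencode flag handles the braces and quotes in the PromQL expression:

   $ oc run curl --generator=run-pod/v1 --image=radial/busyboxplus:curl -i --tty
   [ root@curl / ]$ curl -G 'prometheus-operated:9090/api/v1/query' \
       --data-urlencode 'query=collectd_uptime{service="stf-default-collectd-telemetry-cloud1-smartgateway"}'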
4.6. EPHEMERAL STORAGE

Use ephemeral storage to run Service Telemetry Framework (STF) without persistently storing data in your Red Hat OpenShift Container Platform (OCP) cluster. Ephemeral storage is not recommended in a production environment due to the volatility of the data in the platform when operating correctly and as designed. For example, restarting a pod or rescheduling the workload to another node results in the loss of any local data written since the pod started.

If you enable ephemeral storage in STF, the Service Telemetry Operator does not add the relevant storage sections to the data storage components manifests.

4.6.1. Configuring ephemeral storage

To configure STF for ephemeral storage, add storageEphemeralEnabled: true to the ServiceTelemetry object in OCP. You can add storageEphemeralEnabled: true at installation time, or, if you already deployed STF, complete the following steps:
Procedure
1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

   $ oc project service-telemetry

3. Edit the ServiceTelemetry object:

   $ oc edit ServiceTelemetry stf-default

4. Add the storageEphemeralEnabled: true parameter to the spec section:

   spec:
     eventsEnabled: true
     metricsEnabled: true
     storageEphemeralEnabled: true

5. Save your changes and close the object.
APPENDIX A. COLLECTD PLUG-INS

This section contains a complete list of collectd plug-ins and configurations for Red Hat OpenStack Platform 16.0.
collectd-aggregation

collectd::plugin::aggregation::aggregators

collectd::plugin::aggregation::interval

collectd-amqp1

collectd-apache

collectd::plugin::apache::instances (ex: localhost ⇒ url ⇒ http://localhost/mod_status?auto)

collectd::plugin::apache::interval

collectd-apcups

collectd-battery

collectd::plugin::battery::values_percentage

collectd::plugin::battery::report_degraded

collectd::plugin::battery::query_state_fs

collectd::plugin::battery::interval

collectd-ceph

collectd::plugin::ceph::daemons

collectd::plugin::ceph::longrunavglatency

collectd::plugin::ceph::convertspecialmetrictypes

collectd-cgroups

collectd::plugin::cgroups::ignore_selected

collectd::plugin::cgroups::interval

collectd-conntrack

None

collectd-contextswitch

collectd::plugin::contextswitch::interval

collectd-cpu

collectd::plugin::cpu::reportbystate

collectd::plugin::cpu::reportbycpu

collectd::plugin::cpu::valuespercentage

collectd::plugin::cpu::reportnumcpu

collectd::plugin::cpu::reportgueststate

collectd::plugin::cpu::subtractgueststate

collectd::plugin::cpu::interval

collectd-cpufreq

None

collectd-cpusleep

collectd-csv

collectd::plugin::csv::datadir

collectd::plugin::csv::storerates

collectd::plugin::csv::interval

collectd-df

collectd::plugin::df::devices

collectd::plugin::df::fstypes

collectd::plugin::df::ignoreselected

collectd::plugin::df::mountpoints

collectd::plugin::df::reportbydevice

collectd::plugin::df::reportinodes

collectd::plugin::df::reportreserved

collectd::plugin::df::valuesabsolute

collectd::plugin::df::valuespercentage

collectd::plugin::df::interval

collectd-disk

collectd::plugin::disk::disks

collectd::plugin::disk::ignoreselected

collectd::plugin::disk::udevnameattr

collectd::plugin::disk::interval

collectd-entropy

collectd::plugin::entropy::interval

collectd-ethstat

collectd::plugin::ethstat::interfaces

collectd::plugin::ethstat::maps

collectd::plugin::ethstat::mappedonly

collectd::plugin::ethstat::interval

collectd-exec

collectd::plugin::exec::commands

collectd::plugin::exec::commands_defaults

collectd::plugin::exec::globals

collectd::plugin::exec::interval

collectd-fhcount

collectd::plugin::fhcount::valuesabsolute

collectd::plugin::fhcount::valuespercentage

collectd::plugin::fhcount::interval

collectd-filecount

collectd::plugin::filecount::directories

collectd::plugin::filecount::interval

collectd-fscache

None

collectd-hddtemp

collectd::plugin::hddtemp::host

collectd::plugin::hddtemp::port

collectd::plugin::hddtemp::interval

collectd-hugepages

collectd::plugin::hugepages::report_per_node_hp

collectd::plugin::hugepages::report_root_hp

collectd::plugin::hugepages::values_pages

collectd::plugin::hugepages::values_bytes

collectd::plugin::hugepages::values_percentage

collectd::plugin::hugepages::interval

collectd-intel_rdt

collectd-interface

collectd::plugin::interface::interfaces

collectd::plugin::interface::ignoreselected

collectd::plugin::interface::reportinactive

collectd::plugin::interface::interval

collectd-ipc

None

collectd-ipmi

collectd::plugin::ipmi::ignore_selected

collectd::plugin::ipmi::notify_sensor_add

collectd::plugin::ipmi::notify_sensor_remove

collectd::plugin::ipmi::notify_sensor_not_present

collectd::plugin::ipmi::sensors

collectd::plugin::ipmi::interval

collectd-irq

collectd::plugin::irq::irqs

collectd::plugin::irq::ignoreselected

collectd::plugin::irq::interval

collectd-load

collectd::plugin::load::report_relative

collectd::plugin::load::interval

collectd-logfile

collectd::plugin::logfile::log_level

collectd::plugin::logfile::log_file

collectd::plugin::logfile::log_timestamp

collectd::plugin::logfile::print_severity

collectd::plugin::logfile::interval

collectd-madwifi

collectd-mbmon

collectd-md

collectd-memcached

collectd::plugin::memcached::instances

collectd::plugin::memcached::interval

collectd-memory

collectd::plugin::memory::valuesabsolute

collectd::plugin::memory::valuespercentage

collectd::plugin::memory::interval

collectd-multimeter

collectd-mysql

collectd::plugin::mysql::interval

collectd-netlink

collectd::plugin::netlink::interfaces

collectd::plugin::netlink::verboseinterfaces

collectd::plugin::netlink::qdiscs

collectd::plugin::netlink::classes

collectd::plugin::netlink::filters

collectd::plugin::netlink::ignoreselected

collectd::plugin::netlink::interval

collectd-network

collectd::plugin::network::timetolive

collectd::plugin::network::maxpacketsize

collectd::plugin::network::forward

collectd::plugin::network::reportstats

collectd::plugin::network::listeners

collectd::plugin::network::servers

collectd::plugin::network::interval

collectd-nfs

collectd::plugin::nfs::interval

collectd-ntpd

collectd::plugin::ntpd::host

collectd::plugin::ntpd::port

collectd::plugin::ntpd::reverselookups

collectd::plugin::ntpd::includeunitid

collectd::plugin::ntpd::interval

collectd-numa

None

collectd-olsrd

collectd-openvpn

collectd::plugin::openvpn::statusfile

collectd::plugin::openvpn::improvednamingschema

collectd::plugin::openvpn::collectcompression

collectd::plugin::openvpn::collectindividualusers

collectd::plugin::openvpn::collectusercount

collectd::plugin::openvpn::interval

collectd-ovs_events

collectd::plugin::ovs_events::address

collectd::plugin::ovs_events::dispatch

collectd::plugin::ovs_events::interfaces

collectd::plugin::ovs_events::send_notification

collectd::plugin::ovs_events::port

collectd::plugin::ovs_events::socket

collectd-ovs_stats

collectd::plugin::ovs_stats::address

collectd::plugin::ovs_stats::bridges

collectd::plugin::ovs_stats::port

collectd::plugin::ovs_stats::socket

collectd-ping

collectd::plugin::ping::hosts

collectd::plugin::ping::timeout

collectd::plugin::ping::ttl

collectd::plugin::ping::source_address

collectd::plugin::ping::device

collectd::plugin::ping::max_missed

collectd::plugin::ping::size

collectd::plugin::ping::interval

collectd-powerdns

collectd::plugin::powerdns::servers

collectd::plugin::powerdns::recursors

collectd::plugin::powerdns::local_socket

collectd::plugin::powerdns::interval

collectd-processes

collectd::plugin::processes::processes

collectd::plugin::processes::process_matches

collectd::plugin::processes::collect_context_switch

collectd::plugin::processes::collect_file_descriptor

collectd::plugin::processes::collect_memory_maps

collectd::plugin::powerdns::interval

collectd-protocols

collectd::plugin::protocols::ignoreselected

collectd::plugin::protocols::values

collectd-python

collectd-serial

collectd-smart

collectd::plugin::smart::disks

collectd::plugin::smart::ignoreselected

collectd::plugin::smart::interval

collectd-snmp_agent

collectd-statsd

collectd::plugin::statsd::host

collectd::plugin::statsd::port

collectd::plugin::statsd::deletecounters

collectd::plugin::statsd::deletetimers

collectd::plugin::statsd::deletegauges

collectd::plugin::statsd::deletesets

collectd::plugin::statsd::countersum

collectd::plugin::statsd::timerpercentile

collectd::plugin::statsd::timerlower

collectd::plugin::statsd::timerupper

collectd::plugin::statsd::timersum

collectd::plugin::statsd::timercount

collectd::plugin::statsd::interval

collectd-swap

collectd::plugin::swap::reportbydevice

collectd::plugin::swap::reportbytes

collectd::plugin::swap::valuesabsolute

collectd::plugin::swap::valuespercentage

collectd::plugin::swap::reportio

collectd::plugin::swap::interval

collectd-syslog

collectd::plugin::syslog::log_level

collectd::plugin::syslog::notify_level

collectd::plugin::syslog::interval

collectd-table

collectd::plugin::table::tables

collectd::plugin::table::interval

collectd-tail

collectd::plugin::tail::files

collectd::plugin::tail::interval

collectd-tail_csv

collectd::plugin::tail_csv::metrics

collectd::plugin::tail_csv::files

collectd-tcpconns

collectd::plugin::tcpconns::localports

collectd::plugin::tcpconns::remoteports

collectd::plugin::tcpconns::listening

collectd::plugin::tcpconns::allportssummary

collectd::plugin::tcpconns::interval

collectd-ted

collectd-thermal

collectd::plugin::thermal::devices

collectd::plugin::thermal::ignoreselected

collectd::plugin::thermal::interval

collectd-threshold

collectd::plugin::threshold::types

collectd::plugin::threshold::plugins

collectd::plugin::threshold::hosts

collectd::plugin::threshold::interval

collectd-turbostat

collectd::plugin::turbostat::core_c_states

collectd::plugin::turbostat::package_c_states

collectd::plugin::turbostat::system_management_interrupt

collectd::plugin::turbostat::digital_temperature_sensor

collectd::plugin::turbostat::tcc_activation_temp

collectd::plugin::turbostat::running_average_power_limit

collectd::plugin::turbostat::logical_core_names

collectd-unixsock

collectd-uptime

collectd::plugin::uptime::interval

collectd-users

collectd::plugin::users::interval

collectd-uuid

collectd::plugin::uuid::uuid_file

collectd::plugin::uuid::interval

collectd-virt

collectd::plugin::virt::connection

collectd::plugin::virt::refresh_interval

collectd::plugin::virt::domain

collectd::plugin::virt::block_device

collectd::plugin::virt::interface_device

collectd::plugin::virt::ignore_selected

collectd::plugin::virt::hostname_format

collectd::plugin::virt::interface_format

collectd::plugin::virt::extra_stats

collectd::plugin::virt::interval

collectd-vmem

collectd::plugin::vmem::verbose

collectd::plugin::vmem::interval

collectd-vserver

collectd-wireless

collectd-write_graphite

collectd::plugin::write_graphite::carbons

collectd::plugin::write_graphite::carbon_defaults

collectd::plugin::write_graphite::globals

collectd-write_kafka

collectd::plugin::write_kafka::kafka_host

collectd::plugin::write_kafka::kafka_port

collectd::plugin::write_kafka::kafka_hosts

collectd::plugin::write_kafka::topics

collectd-write_log

collectd::plugin::write_log::format

collectd-zfs_arc

None
- Table of Contents
- CHAPTER 1 INTRODUCTION TO SERVICE TELEMETRY FRAMEWORK
-
- 11 SERVICE TELEMETRY FRAMEWORK ARCHITECTURE
- 12 INSTALLATION SIZE
-
- CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK
-
- 21 THE CORE COMPONENTS OF STF
- 22 PREPARING YOUR OCP ENVIRONMENT FOR STF
-
- 221 Persistent volumes
-
- 2211 Using ephemeral storage
-
- 222 Resource allocation
- 223 Node tuning operator
-
- 23 DEPLOYING STF TO THE OCP ENVIRONMENT
-
- 231 Deploying STF to the OCP environment with ElasticSearch
- 232 Deploying STF to the OCP environment without ElasticSearch
- 233 Creating a namespace
- 234 Creating an OperatorGroup
- 235 Enabling the OperatorHubio Community Catalog Source
- 236 Enabling Red Hat STF Operator Source
- 237 Subscribing to the AMQ Certificate Manager Operator
- 238 Subscribing to the Elastic Cloud on Kubernetes Operator
- 239 Subscribing to the Service Telemetry Operator
- 2310 Creating a ServiceTelemetry object in OCP
-
- 24 REMOVING STF FROM THE OCP ENVIRONMENT
-
- 241 Deleting the namespace
- 242 Removing the OperatorSource
-
- CHAPTER 3 COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION
-
- 31 CONNECTING RED HAT OPENSTACK PLATFORM TO SERVICE TELEMETRY FRAMEWORK
- 32 DEPLOYING TO NON-STANDARD NETWORK TOPOLOGIES
- 33 CONFIGURING RED HAT OPENSTACK PLATFORM OVERCLOUD FOR SERVICE TELEMETRY FRAMEWORK
-
- 331 Retrieving the AMQ Interconnect route address
- 332 Configuring the STF connection for the overcloud
- 333 Validating client-side installation
-
- CHAPTER 4 ADVANCED FEATURES
-
- 41 CUSTOMIZING THE DEPLOYMENT
-
- 411 Manifest override parameters
- 412 Overriding a managed manifest
-
- Procedure
-
- 42 ALERTS
-
- Additional resources
- 421 Creating an alert rule in Prometheus
-
- Procedure
- Additional resources
-
- 422 Configuring custom alerts
-
- Procedure
- Additional resources
-
- 423 Creating an alert route in Alertmanager
-
- Procedure
- Additional resources
-
- 43 HIGH AVAILABILITY
-
- 431 Configuring high availability
-
- Procedure
-
- 44 DASHBOARDS
-
- 441 Setting up Grafana to host the dashboard
-
- 4411 Viewing and editing queries
-
- 442 The Grafana infrastructure dashboard
-
- 4421 Top panels
- 4422 Networking panels
- 4423 CPU panels
- 4424 Memory panels
- 4425 Diskfile system
-
- 45 CONFIGURING MULTIPLE CLOUDS
-
- 451 Planning AMQP address prefixes
- 452 Deploying Smart Gateways
-
- Procedure
- 4521 Example manifests
-
- 453 Creating the OpenStack environment file
-
- Procedure
- Additional resources
-
- 454 Querying metrics data from multiple clouds
-
- 46 EPHEMERAL STORAGE
-
- 461 Configuring ephemeral storage
-
- APPENDIX A COLLECTD PLUG-INS
-
2 Verify that the STF PackageManifests are removed from the platform If successful thefollowing command returns no result
3 If you enabled the OperatorHubio Community Catalog Source during the installationprocess and you no longer need this catalog source delete it
Additional resources
For more information about the OperatorHubio Community Catalog Source see Section 23ldquoDeploying STF to the OCP environmentrdquo
$ oc delete --namespace=openshift-marketplace operatorsource redhat-operators-stfoperatorsourceoperatorscoreoscom redhat-operators-stf deleted
$ oc get packagemanifests | grep Red Hat STF
$ oc delete --namespace=openshift-marketplace catalogsource operatorhubio-operatorscatalogsourceoperatorscoreoscom operatorhubio-operators deleted
Red Hat OpenStack Platform 160 Service Telemetry Framework
18
CHAPTER 3. COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION

3.1. CONNECTING RED HAT OPENSTACK PLATFORM TO SERVICE TELEMETRY FRAMEWORK

To collect metrics, events, or both, and to send them to the Service Telemetry Framework (STF) storage domain, you must configure the Red Hat OpenStack Platform overcloud to enable data collection and transport.

To deploy data collection and transport to STF on Red Hat OpenStack Platform cloud nodes that employ routed L3 domains, such as distributed compute node (DCN) or spine-leaf, see Section 3.2, "Deploying to non-standard network topologies".

3.2. DEPLOYING TO NON-STANDARD NETWORK TOPOLOGIES

If your nodes are on a separate network from the default InternalApi network, you must make configuration adjustments so that AMQ Interconnect can transport data to the Service Telemetry Framework (STF) server instance. This scenario is typical in a spine-leaf or a DCN topology. For more information about DCN configuration, see the Spine Leaf Networking guide.

If you use STF with Red Hat OpenStack Platform 16.0 and plan to monitor your Ceph, Block, or Object storage nodes, you must make configuration changes that are similar to the configuration changes that you make to the spine-leaf and DCN network configuration. To monitor Ceph nodes, use the CephStorageExtraConfig parameter to define which network interface to load into the AMQ Interconnect and collectd configuration files:

CephStorageExtraConfig:
    tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('storage')}"
    tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('storage')}"
    tripleo::profile::base::ceilometer::agent::notification::notifier_host_addr: "%{hiera('storage')}"
Similarly, you must specify the BlockStorageExtraConfig and ObjectStorageExtraConfig parameters if your environment uses Block and Object storage roles, as shown in the sketch that follows.
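A minimal sketch of the equivalent Block Storage configuration, assuming the Block Storage role resolves its interfaces through the same storage hiera key as the Ceph example above (an Object Storage variant would substitute ObjectStorageExtraConfig):

BlockStorageExtraConfig:
    tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('storage')}"
    tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('storage')}"
    tripleo::profile::base::ceilometer::agent::notification::notifier_host_addr: "%{hiera('storage')}"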
The deployment of a spine-leaf topology involves creating roles and networks, then assigning those networks to the available roles. When you configure data collection and transport for STF for a Red Hat OpenStack Platform deployment, the default network for roles is InternalApi. For Ceph Block and Object storage roles, the default network is Storage. Because a spine-leaf configuration can result in different networks being assigned to different Leaf groupings, and those names are typically unique, additional configuration is required in the parameter_defaults section of the Red Hat OpenStack Platform environment files.
Procedure
1. Document which networks are available for each of the Leaf roles. For examples of network name definitions, see Creating a network data file in the Spine Leaf Networking guide. For more information about the creation of the Leaf groupings (roles) and assignment of the networks to those groupings, see Creating a roles data file in the Spine Leaf Networking guide.
2. Add the following configuration example to the ExtraConfig section for each of the leaf roles. In this example, internal_api_subnet is the value defined in the name_lower parameter of your network definition (with _subnet appended to the name for Leaf 0) and is the network to which the ComputeLeaf0 leaf role is connected. In this case, the network identification of 0 corresponds to the Compute role for leaf 0, and represents a value that is different from the default internal API network name.
For the ComputeLeaf0 leaf role, specify extra configuration to perform a hiera lookup to determine which network interface for a particular network to assign to the collectd AMQP host parameter. Perform the same configuration for the AMQ Interconnect listener address parameter:

ComputeLeaf0ExtraConfig:
    tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('internal_api_subnet')}"
    tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('internal_api_subnet')}"

Additional leaf roles typically replace _subnet with _leafN, where N represents a unique identifier for the leaf:

ComputeLeaf1ExtraConfig:
    tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('internal_api_leaf1')}"
    tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('internal_api_leaf1')}"

This example configuration is on a CephStorage leaf role:

CephStorageLeaf0ExtraConfig:
    tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('storage_subnet')}"
    tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('storage_subnet')}"
3.3. CONFIGURING RED HAT OPENSTACK PLATFORM OVERCLOUD FOR SERVICE TELEMETRY FRAMEWORK

To configure the Red Hat OpenStack Platform overcloud, you must configure the data collection applications and the data transport to STF, and deploy the overcloud.

To configure the Red Hat OpenStack Platform overcloud, complete the following tasks:

1. Section 3.3.1, "Retrieving the AMQ Interconnect route address"
2. Section 3.3.2, "Configuring the STF connection for the overcloud"
3. Section 3.3.3, "Validating client-side installation"
3.3.1. Retrieving the AMQ Interconnect route address

When you configure the Red Hat OpenStack Platform overcloud for STF, you must provide the AMQ Interconnect route address in the STF connection file.
Procedure
1. Log in to your Red Hat OpenShift Container Platform (OCP) environment.
2. In the service-telemetry project, retrieve the AMQ Interconnect route address:

$ oc get routes -ogo-template='{{ range .items }}{{printf "%s\n" .spec.host }}{{ end }}' | grep "\-5671"
stf-default-interconnect-5671-service-telemetry.apps.infra.watch
NOTE
If your STF installation differs from the documentation, ensure that you retrieve the correct AMQ Interconnect route address.
3.3.2. Configuring the STF connection for the overcloud

To configure the STF connection, you must create a file that contains the connection configuration of the AMQ Interconnect for the overcloud to the STF deployment. Enable the collection of events and storage of the events in STF, and deploy the overcloud.
Procedure
1. Log in to the Red Hat OpenStack Platform undercloud as the stack user.
2. Create a configuration file called stf-connectors.yaml in the /home/stack directory.
IMPORTANT
The Service Telemetry Operator simplifies the deployment of all data ingestion and data storage components for single cloud deployments. To share the data storage domain with multiple clouds, see Section 4.5, "Configuring multiple clouds".
3. In the stf-connectors.yaml file, configure the MetricsQdrConnectors address to connect the AMQ Interconnect on the overcloud to the STF deployment:

parameter_defaults:
    EventPipelinePublishers: []
    CeilometerQdrPublishEvents: true
    MetricsQdrConnectors:
        - host: stf-default-interconnect-5671-service-telemetry.apps.infra.watch
          port: 443
          role: edge
          sslProfile: sslProfile
          verifyHostname: false

Add the CeilometerQdrPublishEvents: true parameter to enable collection and transport of Ceilometer events to STF.
Replace the host parameter with the value of HOST/PORT that you retrieved in Section 3.3.1, "Retrieving the AMQ Interconnect route address".
4. Add the following files to your Red Hat OpenStack Platform director deployment to set up collectd and AMQ Interconnect:

the stf-connectors.yaml environment file

the enable-stf.yaml file that ensures that the environment is being used during the overcloud deployment
the ceilometer-write-qdr.yaml file that ensures that Ceilometer telemetry is sent to STF

5. Deploy the Red Hat OpenStack Platform overcloud:

openstack overcloud deploy <other arguments>
  --templates /usr/share/openstack-tripleo-heat-templates \
  --environment-file <other-environment-files> \
  --environment-file /usr/share/openstack-tripleo-heat-templates/environments/metrics/ceilometer-write-qdr.yaml \
  --environment-file /usr/share/openstack-tripleo-heat-templates/environments/enable-stf.yaml \
  --environment-file /home/stack/stf-connectors.yaml
3.3.3. Validating client-side installation

To validate data collection from the STF storage domain, query the data sources for delivered data. To validate individual nodes in the Red Hat OpenStack Platform deployment, connect to the console using SSH.
Procedure
1. Log in to an overcloud node, for example, controller-0.
2. Ensure that the metrics_qdr container is running on the node:

$ sudo podman container inspect --format '{{.State.Status}}' metrics_qdr

running

3. Return the internal network address on which AMQ Interconnect is running, for example, 172.17.1.44 listening on port 5666:

$ sudo podman exec -it metrics_qdr cat /etc/qpid-dispatch/qdrouterd.conf

listener {
    host: 172.17.1.44
    port: 5666
    authenticatePeer: no
    saslMechanisms: ANONYMOUS
}

4. Return a list of connections to the local AMQ Interconnect:

$ sudo podman exec -it metrics_qdr qdstat --bus=172.17.1.44:5666 --connections

Connections
  id   host                                                                   container                                                                                                  role    dir  security                            authentication  tenant
  ====================================================================================================================================================================================
  1    stf-default-interconnect-5671-service-telemetry.apps.infra.watch:443   stf-default-interconnect-7458fd4d69-bgzfb                                                                  edge    out  TLSv1.2(DHE-RSA-AES256-GCM-SHA384)  anonymous-user
  12   172.17.1.44:60290                                                      openstack.org/om/container/controller-0/ceilometer-agent-notification/25/5c02cee550f143ec9ea030db5cccba14  normal  in   no-security                         no-auth
  16   172.17.1.44:36408                                                      metrics                                                                                                    normal  in   no-security                         anonymous-user
  899  172.17.1.44:39500                                                      10a2e99d-1b8a-4329-b48c-4335e5f75c84                                                                       normal  in   no-security                         no-auth
There are four connections:

Outbound connection to STF

Inbound connection from collectd

Inbound connection from ceilometer

Inbound connection from our qdstat client

The outbound STF connection is provided to the MetricsQdrConnectors host parameter and is the route for the STF storage domain. The other hosts are internal network addresses of the client connections to this AMQ Interconnect.
5. To ensure that messages are being delivered, list the links, and view the _edge address in the deliv column for delivery of messages:

$ sudo podman exec -it metrics_qdr qdstat --bus=172.17.1.44:5666 --links

Router Links
  type      dir  conn id  id    peer  class   addr                  phs  cap  pri  undel  unsett  deliv    presett  psdrop  acc  rej  rel  mod  delay  rate
  ===========================================================================================================================================================
  endpoint  out  1        5           local   _edge                      250  0    0      0       2979926  2979924  0       0    0    2    0    0      0
  endpoint  in   1        6                                              250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  in   1        7                                              250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  out  1        8                                              250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  in   1        9                                              250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  out  1        10                                             250  0    0      0       911      911      0       0    0    0    0    911    0
  endpoint  in   1        11                                             250  0    0      0       0        911      0       0    0    0    0    0      0
  endpoint  out  12       32          local   temp.lSY6Mcicol4J2Kp       250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  in   16       41                                             250  0    0      0       2979924  2979924  0       0    0    0    0    0      0
  endpoint  in   912      1834        mobile  $management           0    250  0    0      0       1        0        0       1    0    0    0    0      0
  endpoint  out  912      1835        local   temp.9Ok2resI9tmt+CT       250  0    0      0       0        0        0       0    0    0    0    0      0
6. To list the addresses from Red Hat OpenStack Platform nodes to STF, connect to OCP to get the AMQ Interconnect pod name and list the connections. List the available AMQ Interconnect pods:

$ oc get pods -l application=stf-default-interconnect

NAME                                        READY   STATUS    RESTARTS   AGE
stf-default-interconnect-7458fd4d69-bgzfb   1/1     Running   0          6d21h

7. Connect to the pod and run the qdstat --connections command to list the known connections:

$ oc exec -it stf-default-interconnect-7458fd4d69-bgzfb -- qdstat --connections

2020-04-21 18:25:47.243852 UTC
stf-default-interconnect-7458fd4d69-bgzfb

Connections
  id  host               container                                                               role    dir  security                                authentication  tenant  last dlv      uptime
  ===============================================================================================================================================================================================
  1   10.129.2.21:43062  rcv[stf-default-collectd-telemetry-smartgateway-79c967c8f7-kq4qv]       normal  in   no-security                             anonymous-user          000:00:00:00  006:21:50:25
  2   10.130.0.52:55754  rcv[stf-default-ceilometer-notification-smartgateway-6675df547-mbjk5]   normal  in   no-security                             anonymous-user          000:21:25:57  006:21:49:36
  3   10.130.0.51:43110  rcv[stf-default-collectd-notification-smartgateway-698c87fbb7-f28v6]    normal  in   no-security                             anonymous-user          000:21:36:53  006:21:49:09
  22  10.128.0.1:51948   Router.ceph-0.redhat.local                                              edge    in   TLSv1/SSLv3(DHE-RSA-AES256-GCM-SHA384)  anonymous-user          000:00:00:03  000:22:08:43
  23  10.128.0.1:51950   Router.compute-0.redhat.local                                           edge    in   TLSv1/SSLv3(DHE-RSA-AES256-GCM-SHA384)  anonymous-user          000:00:00:03  000:22:08:43
  24  10.128.0.1:52082   Router.controller-0.redhat.local                                        edge    in   TLSv1/SSLv3(DHE-RSA-AES256-GCM-SHA384)  anonymous-user          000:00:00:00  000:22:08:34
  27  127.0.0.1:42202    c2f541c1-4c97-4b37-a189-a396c08fb079                                    normal  in   no-security                             no-auth                 000:00:00:00  000:00:00:00

In this example, there are three edge connections from the Red Hat OpenStack Platform nodes, with connection id 22, 23, and 24.

8. To view the number of messages delivered by the network, use each address with the oc exec command:

$ oc exec -it stf-default-interconnect-7458fd4d69-bgzfb -- qdstat --address

2020-04-21 18:20:10.293258 UTC
stf-default-interconnect-7458fd4d69-bgzfb

Router Addresses
  class   addr                             phs  distrib    pri  local  remote  in       out      thru  fallback
  ====================================================================================================================
  mobile  anycast/ceilometer/event.sample  0    balanced   -    1      0       1553     1553     0     0
  mobile  collectd/notify                  0    multicast  -    1      0       10       10       0     0
  mobile  collectd/telemetry               0    multicast  -    1      0       7798049  7798049  0     0
CHAPTER 4. ADVANCED FEATURES

The following optional features can provide additional functionality to the Service Telemetry Framework (STF):

Customizing the deployment. For more information, see Section 4.1, "Customizing the deployment".

Alerts. For more information, see Section 4.2, "Alerts".

High availability. For more information, see Section 4.3, "High availability".

Dashboards. For more information, see Section 4.4, "Dashboards".

Multiple clouds. For more information, see Section 4.5, "Configuring multiple clouds".

Ephemeral storage. For more information, see Section 4.6, "Ephemeral storage".
4.1. CUSTOMIZING THE DEPLOYMENT

The Service Telemetry Operator watches for a ServiceTelemetry manifest to load into Red Hat OpenShift Container Platform (OCP). The Operator then creates other objects in memory, which results in the dependent Operators creating the workloads they are responsible for managing.
WARNING
When you override the manifest, you must provide the entire manifest contents, including object names or namespaces. There is no dynamic parameter substitution when you override a manifest.

To override a manifest successfully with Service Telemetry Framework (STF), deploy a default environment using the core options only. For more information about the core options, see Section 2.3.10, "Creating a ServiceTelemetry object in OCP". When you deploy STF, use the oc get command to retrieve the default deployed manifest. When you use a manifest that was originally generated by Service Telemetry Operator, the manifest is compatible with the other objects that are managed by the Operators.

For example, when the metricsEnabled: true parameter is configured in the ServiceTelemetry manifest, the Service Telemetry Operator requests components for metrics retrieval and storage using the default manifests. In some cases, you might want to override the default manifest. For more information, see Section 4.1.1, "Manifest override parameters".
4.1.1. Manifest override parameters

This table describes the available parameters that you can use to override a manifest, along with the corresponding retrieval commands.

Table 4.1. Manifest override parameters
Override parameter | Description | Retrieval command
alertmanagerManifest | Override the Alertmanager object creation. The Prometheus Operator watches for Alertmanager objects. | oc get alertmanager stf-default -oyaml
alertmanagerConfigManifest | Override the Secret that contains the Alertmanager configuration. The Prometheus Operator uses a secret named alertmanager-{alertmanager-name}, for example, stf-default, to provide the alertmanager.yaml configuration to Alertmanager. | oc get secret alertmanager-stf-default -oyaml
elasticsearchManifest | Override the ElasticSearch object creation. The Elastic Cloud on Kubernetes Operator watches for ElasticSearch objects. | oc get elasticsearch elasticsearch -oyaml
interconnectManifest | Override the Interconnect object creation. The AMQ Interconnect Operator watches for Interconnect objects. | oc get interconnect stf-default-interconnect -oyaml
prometheusManifest | Override the Prometheus object creation. The Prometheus Operator watches for Prometheus objects. | oc get prometheus stf-default -oyaml
servicemonitorManifest | Override the ServiceMonitor object creation. The Prometheus Operator watches for ServiceMonitor objects. | oc get servicemonitor stf-default -oyaml
smartgatewayCollectdMetricsManifest | Override the SmartGateway object creation for collectd metrics. The Smart Gateway Operator watches for SmartGateway objects. | oc get smartgateway stf-default-collectd-telemetry -oyaml
smartgatewayCollectdEventsManifest | Override the SmartGateway object creation for collectd events. The Smart Gateway Operator watches for SmartGateway objects. | oc get smartgateway stf-default-collectd-notification -oyaml
smartgatewayCeilometerEventsManifest | Override the SmartGateway object creation for Ceilometer events. The Smart Gateway Operator watches for SmartGateway objects. | oc get smartgateway stf-default-ceilometer-notification -oyaml
4.1.2. Overriding a managed manifest

Edit the ServiceTelemetry object and provide a parameter and manifest. For a list of available manifest override parameters, see Section 4.1.1, "Manifest override parameters". The default ServiceTelemetry object is stf-default. Use oc get servicetelemetry to list the available STF deployments.

TIP

The oc edit command loads the default system editor. To override the default editor, pass or set the environment variable EDITOR to the preferred editor, for example, EDITOR=nano oc edit servicetelemetry stf-default.
Procedure
1. Log in to Red Hat OpenShift Container Platform.
2. Change to the service-telemetry namespace:

oc project service-telemetry

3. Load the ServiceTelemetry object into an editor:

oc edit servicetelemetry stf-default

4. To modify the ServiceTelemetry object, provide a manifest override parameter and the contents of the manifest to write to OCP instead of the defaults provided by STF:
NOTE
The trailing pipe (|) after entering the manifest override parameter indicates that the value provided is multi-line.
apiVersion: infra.watch/v1alpha1
kind: ServiceTelemetry
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"infra.watch/v1alpha1","kind":"ServiceTelemetry","metadata":{"annotations":{},"name":"stf-default","namespace":"service-telemetry"},"spec":{"metricsEnabled":true}}
  creationTimestamp: "2020-04-14T20:29:42Z"
  generation: 1
  name: stf-default
  namespace: service-telemetry
  resourceVersion: "1949423"
  selfLink: /apis/infra.watch/v1alpha1/namespaces/service-telemetry/servicetelemetrys/stf-default
  uid: d058bc41-1bb0-49f5-9a8b-642f4b8adb95
spec:
  metricsEnabled: true
  smartgatewayCollectdMetricsManifest: | 1
    apiVersion: smartgateway.infra.watch/v2alpha1
    kind: SmartGateway
    metadata:
      name: stf-default-collectd-telemetry
      namespace: service-telemetry
    spec:
      amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/telemetry
      debug: true
      prefetch: 15000
      serviceType: metrics
      size: 1
      useTimestamp: true 2
status:
  conditions:
  - ansibleResult:
      changed: 0
      completion: 2020-04-14T20:32:19.079508
      failures: 0
      ok: 52
      skipped: 1
    lastTransitionTime: "2020-04-14T20:29:59Z"
    message: Awaiting next reconciliation
    reason: Successful
    status: "True"
    type: Running
1. Manifest override parameter is defined in the spec of the ServiceTelemetry object.
2. End of the manifest override content.
5. Save and close.
4.2. ALERTS

You create alert rules in Prometheus and alert routes in Alertmanager. Alert rules in Prometheus servers send alerts to an Alertmanager, which manages the alerts. Alertmanager can silence, inhibit, or aggregate alerts, and send notifications using email, on-call notification systems, or chat platforms.

To create an alert, complete the following tasks:

1. Create an alert rule in Prometheus. For more information, see Section 4.2.1, "Creating an alert rule in Prometheus".
2. Create an alert route in Alertmanager. For more information, see Section 4.2.3, "Creating an alert route in Alertmanager".

Additional resources

For more information about alerts or notifications with Prometheus and Alertmanager, see https://prometheus.io/docs/alerting/overview

To view an example set of alerts that you can use with Service Telemetry Framework (STF), see https://github.com/infrawatch/service-telemetry-operator/tree/master/deploy/alerts
4.2.1. Creating an alert rule in Prometheus

Prometheus evaluates alert rules to trigger notifications. If the rule condition returns an empty result set, the condition is false. Otherwise, the rule is true and it triggers an alert.
Procedure
1. Log in to Red Hat OpenShift Container Platform.
2. Change to the service-telemetry namespace:

oc project service-telemetry

3. Create a PrometheusRule object that contains the alert rule. The Prometheus Operator loads the rule into Prometheus:

oc apply -f - <<EOF
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  creationTimestamp: null
  labels:
    prometheus: stf-default
    role: alert-rules
  name: prometheus-alarm-rules
  namespace: service-telemetry
spec:
  groups:
    - name: ./openstack.rules
      rules:
        - alert: Metric Listener down
          expr: collectd_qpid_router_status < 1 # To change the rule, edit the value of the expr parameter.
EOF

4. To verify that the rules have been loaded into Prometheus by the Operator, create a pod with access to curl:

oc run curl --generator=run-pod/v1 --image=radial/busyboxplus:curl -i --tty

5. Run curl to access the prometheus-operated service to return the rules loaded into memory:

[ root@curl:/ ]$ curl prometheus-operated:9090/api/v1/rules
{"status":"success","data":{"groups":[{"name":"./openstack.rules","file":"/etc/prometheus/rules/prometheus-stf-default-rulefiles-0/service-telemetry-prometheus-alarm-rules.yaml","rules":[{"name":"Metric Listener down","query":"collectd_qpid_router_status \u003c 1","duration":0,"labels":{},"annotations":{},"alerts":[],"health":"ok","type":"alerting"}],"interval":30}]}}
6. To verify that the output shows the rules loaded into the PrometheusRule object, for example, the output contains the defined ./openstack.rules, exit from the pod:

[ root@curl:/ ]$ exit

7. Clean up the environment by deleting the curl pod:

$ oc delete pod curl

pod "curl" deleted
Additional resources

For more information on alerting, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md
4.2.2. Configuring custom alerts

You can add custom alerts to the PrometheusRule object that you created in Section 4.2.1, "Creating an alert rule in Prometheus".

Procedure

1. Use the oc edit command:

oc edit prometheusrules prometheus-alarm-rules

2. Edit the PrometheusRules manifest, as shown in the sketch that follows this procedure.
3. Save and close.
Additional resources

For more information about configuring alerting rules, see https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/

For more information about PrometheusRules objects, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md
4.2.3. Creating an alert route in Alertmanager

Use Alertmanager to deliver alerts to an external system, such as email, IRC, or other notification channel. The Prometheus Operator manages the Alertmanager configuration as a Red Hat OpenShift Container Platform (OCP) secret. STF by default deploys a basic configuration that results in no receivers:
alertmanager.yaml: |-
  global:
    resolve_timeout: 5m
  route:
    group_by: ['job']
    group_wait: 30s
    group_interval: 5m
    repeat_interval: 12h
    receiver: 'null'
  receivers:
  - name: 'null'
To deploy a custom Alertmanager route with STF, you must pass an alertmanagerConfigManifest parameter to the Service Telemetry Operator that results in an updated secret, managed by the Prometheus Operator. For more information, see Section 4.1.2, "Overriding a managed manifest".
Procedure
1. Log in to Red Hat OpenShift Container Platform.
2. Change to the service-telemetry namespace:

oc project service-telemetry

3. Edit the ServiceTelemetry object for your STF deployment:

oc edit servicetelemetry stf-default

4. Add a new parameter, alertmanagerConfigManifest, and the Secret object contents to define the alertmanager.yaml configuration for Alertmanager:
NOTE
This loads the default template that is already managed by the Service Telemetry Operator. To validate that the changes are populating correctly, change a value, return the alertmanager-stf-default secret, and verify that the new value is loaded into memory, for example, changing the value of the global resolve_timeout from 5m to 10m.
apiVersion: infra.watch/v1alpha1
kind: ServiceTelemetry
metadata:
  name: stf-default
  namespace: service-telemetry
spec:
  metricsEnabled: true
  alertmanagerConfigManifest: |
    apiVersion: v1
    kind: Secret
    metadata:
      name: 'alertmanager-stf-default'
      namespace: 'service-telemetry'
    type: Opaque
    stringData:
      alertmanager.yaml: |-
        global:
          resolve_timeout: 10m
        route:
          group_by: ['job']
          group_wait: 30s
          group_interval: 5m
          repeat_interval: 12h
          receiver: 'null'
        receivers:
        - name: 'null'
5. Verify that the configuration was applied to the secret:

$ oc get secret alertmanager-stf-default -o go-template='{{ index .data "alertmanager.yaml" | base64decode }}'

global:
  resolve_timeout: 10m
route:
  group_by: ['job']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h
  receiver: 'null'
receivers:
- name: 'null'

6. To verify the configuration has been loaded into Alertmanager, create a pod with access to curl:

oc run curl --generator=run-pod/v1 --image=radial/busyboxplus:curl -i --tty

7. Run curl against the alertmanager-operated service to retrieve the status and configYAML contents, and verify that the supplied configuration matches the configuration loaded into Alertmanager:

[ root@curl:/ ]$ curl alertmanager-operated:9093/api/v1/status

{"status":"success","data":{"configYAML":"global:\n  resolve_timeout: 10m\n  http_config: {}\n  smtp_hello: localhost\n  smtp_require_tls: true\n  pagerduty_url: https://events.pagerduty.com/v2/enqueue\n  hipchat_api_url: https://api.hipchat.com/\n  opsgenie_api_url: https://api.opsgenie.com/\n  wechat_api_url: https://qyapi.weixin.qq.com/cgi-bin/\n  victorops_api_url: https://alert.victorops.com/integrations/generic/20131114/alert/\nroute:\n  receiver: \"null\"\n  group_by:\n  - job\n  group_wait: 30s\n  group_interval: 5m\n  repeat_interval: 12h\nreceivers:\n- name: \"null\"\ntemplates: []\n"

8. Verify that the configYAML field contains the expected changes. Exit from the pod:

[ root@curl:/ ]$ exit

9. To clean up the environment, delete the curl pod:

$ oc delete pod curl

pod "curl" deleted
Additional resources

For more information about the Red Hat OpenShift Container Platform secret and the Prometheus operator, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md
4.3. HIGH AVAILABILITY

High availability is the ability of Service Telemetry Framework (STF) to rapidly recover from failures in its component services. Although Red Hat OpenShift Container Platform (OCP) restarts a failed pod if nodes are available to schedule the workload, this recovery process might take more than one minute, during which time events and metrics are lost. A high availability configuration includes multiple copies of STF components, reducing recovery time to approximately 2 seconds. To protect against failure of an OCP node, deploy STF to an OCP cluster with three or more nodes.
NOTE
STF is not yet a fully fault tolerant system. Delivery of metrics and events during the recovery period is not guaranteed.
Enabling high availability has the following effects:

Two AMQ Interconnect pods run instead of the default 1.

Three ElasticSearch pods run instead of the default 1.

Recovery time from a lost pod in either of these services reduces to approximately 2 seconds.
4.3.1. Configuring high availability

To configure STF for high availability, add highAvailabilityEnabled: true to the ServiceTelemetry object in OCP. You can set this parameter at installation time or, if you already deployed STF, complete the following steps:
Procedure
1. Log in to Red Hat OpenShift Container Platform.
2. Change to the service-telemetry namespace:

oc project service-telemetry

3. Use the oc command to edit the ServiceTelemetry object:

$ oc edit ServiceTelemetry

4. Add highAvailabilityEnabled: true to the spec section:

spec:
  eventsEnabled: true
  metricsEnabled: true
  highAvailabilityEnabled: true

5. Save your changes and close the object.
4.4. DASHBOARDS

Use the third-party application Grafana to visualize system-level metrics gathered by collectd for each individual host node. For more information about configuring collectd, see Section 3.3, "Configuring Red Hat OpenStack Platform overcloud for Service Telemetry Framework".

4.4.1. Setting up Grafana to host the dashboard

Grafana is not included in the default Service Telemetry Framework (STF) deployment, so you must deploy the Grafana Operator from OperatorHub.io.
Procedure
1. Log in to Red Hat OpenShift Container Platform.
2. Change to the service-telemetry namespace:

oc project service-telemetry

3. Clone the dashboard repository:

git clone https://github.com/infrawatch/dashboards
cd dashboards

4. Deploy the Grafana operator:

oc create -f deploy/subscription.yaml

5. To verify that the operator launched successfully, run the oc get csv command. If the value of the PHASE column is Succeeded, the operator launched successfully:

$ oc get csv

NAME                      DISPLAY            VERSION   REPLACES   PHASE
grafana-operator.v3.2.0   Grafana Operator   3.2.0                Succeeded

6. Launch a Grafana instance:

$ oc create -f deploy/grafana.yaml

7. Verify that the Grafana instance deployed:

$ oc get pod -l app=grafana

NAME                                  READY   STATUS    RESTARTS   AGE
grafana-deployment-7fc7848b56-sbkhv   1/1     Running   0          1m

8. Create the datasource and dashboard resources:

oc create -f deploy/datasource.yaml -f deploy/rhos-dashboard.yaml
9. Verify that the resources installed correctly:

$ oc get grafanadashboards

NAME             AGE
rhos-dashboard   7d21h

$ oc get grafanadatasources

NAME                                  AGE
service-telemetry-grafanadatasource   1m

10. Navigate to https://<grafana-route-address> in a web browser. Use the oc get routes command to retrieve the Grafana route address:

oc get routes

11. To view the dashboard, click Dashboards and Manage.
Additional resources
For more information about enabling the OperatorHub.io catalog source, see Section 2.3.5, "Enabling the OperatorHub.io Community Catalog Source".
4.4.1.1. Viewing and editing queries
Procedure
1. Log in to Red Hat OpenShift Container Platform. To view and edit queries, log in as the admin user.
2. Change to the service-telemetry namespace:

oc project service-telemetry

3. To retrieve the default username and password, describe the Grafana object using oc describe:

oc describe grafana service-telemetry-grafana
4.4.2. The Grafana infrastructure dashboard

The infrastructure dashboard shows metrics for a single node at a time. Select a node from the upper left corner of the dashboard.

4.4.2.1. Top panels

Title | Unit | Description
Current Global Alerts | - | Current alerts fired by Prometheus
Recent Global Alerts | - | Recently fired alerts in 5m time steps
Status Panel | - | Node status: up, down, unavailable
Uptime | s/m/h/d/M/Y | Total operational time of node
CPU Cores | cores | Total number of cores
Memory | bytes | Total memory
Disk Size | bytes | Total storage size
Processes | processes | Total number of processes listed by type
Load Average | processes | Load average represents the average number of running and uninterruptible processes residing in the kernel execution queue
4.4.2.2. Networking panels

Panels that display the network interfaces of the node.

Panel | Unit | Description
Physical Interfaces Ingress Errors | errors | Total errors with incoming data
Physical Interfaces Egress Errors | errors | Total errors with outgoing data
Physical Interfaces Ingress Error Rates | errors/s | Rate of incoming data errors
Physical Interfaces Egress Error Rates | errors/s | Rate of outgoing data errors
Physical Interfaces Packets Ingress | pps | Incoming packets per second
Physical Interfaces Packets Egress | pps | Outgoing packets per second
Physical Interfaces Data Ingress | bytes/s | Incoming data rates
Physical Interfaces Data Egress | bytes/s | Outgoing data rates
Physical Interfaces Drop Rate Ingress | pps | Incoming packets drop rate
Physical Interfaces Drop Rate Egress | pps | Outgoing packets drop rate
4.4.2.3. CPU panels

Panels that display CPU usage of the node.

Panel | Unit | Description
Current CPU Usage | percent | Instantaneous usage at the time of the last query
Aggregate CPU Usage | percent | Average non-idle CPU activity of all cores on a node
Aggr. CPU Usage by Type | percent | Shows time spent for each type of thread averaged across all cores
4.4.2.4. Memory panels

Panels that display memory usage on the node.

Panel | Unit | Description
Memory Used | percent | Amount of memory being used at time of last query
Huge Pages Used | hugepages | Number of hugepages being used
Memory
4.4.2.5. Disk/file system

Panels that display space used on disk.

Panel | Unit | Description | Notes
Disk Space Usage | percent | Total disk use at time of last query |
Inode Usage | percent | Total inode use at time of last query |
Aggregate Disk Space Usage | bytes | Total disk space used and reserved | Because this query relies on the df plugin, temporary file systems that do not necessarily use disk space are included in the results. The query tries to filter out most of these, but it might not be exhaustive.
Disk Traffic | bytes/s | Shows rates for both reading and writing |
Disk Load | percent | Approximate percentage of total disk bandwidth being used. The weighted I/O time series includes the backlog that might be accumulating. For more information, see the collectd disk plugin docs. |
Operations/s | ops/s | Operations done per second |
Average I/O Operation Time | seconds | Average time each I/O operation took to complete. This average is not accurate, see the collectd disk plugin docs. |
4.5. CONFIGURING MULTIPLE CLOUDS

You can configure multiple Red Hat OpenStack Platform clouds to target a single instance of Service Telemetry Framework (STF):

1. Plan the AMQP address prefixes that you want to use for each cloud. For more information, see Section 4.5.1, "Planning AMQP address prefixes".
2. Deploy metrics and events consumer Smart Gateways for each cloud to listen on the corresponding address prefixes. For more information, see Section 4.5.2, "Deploying Smart Gateways".
3. Configure each cloud to send its metrics and events to STF on the correct address. For more information, see Section 4.5.3, "Creating the OpenStack environment file".
Figure 4.1. Two Red Hat OpenStack Platform clouds connect to STF
4.5.1. Planning AMQP address prefixes

By default, Red Hat OpenStack Platform nodes get data through two data collectors: collectd and Ceilometer. These components send telemetry data or notifications to the respective AMQP addresses, for example, collectd/telemetry, where STF Smart Gateways listen on those addresses for monitoring data.

To support multiple clouds and to identify which cloud generated the monitoring data, configure each cloud to send data to a unique address. Prefix a cloud identifier to the second part of the address. The following list shows some example addresses and identifiers:
collectd/cloud1-telemetry

collectd/cloud1-notify

anycast/ceilometer/cloud1-event.sample

collectd/cloud2-telemetry

collectd/cloud2-notify
anycast/ceilometer/cloud2-event.sample

collectd/us-east-1-telemetry

collectd/us-west-3-telemetry
4.5.2. Deploying Smart Gateways

You must deploy a Smart Gateway for each of the data collection types for each cloud: one for collectd metrics, one for collectd events, and one for Ceilometer events. Configure each of the Smart Gateways to listen on the AMQP address that you define for the corresponding cloud.

When you deploy STF for the first time, Smart Gateway manifests are created that define the initial Smart Gateways for a single cloud. When deploying Smart Gateways for multiple cloud support, you deploy multiple Smart Gateways for each of the data collection types that handle the metrics and the events data for each cloud. The initial Smart Gateways act as a template to create additional Smart Gateways, along with any authentication information required to connect to the data stores.
Procedure
1. Log in to Red Hat OpenShift Container Platform.
2. Change to the service-telemetry namespace:
oc project service-telemetry
3. Use the initially deployed Smart Gateways as a template for additional Smart Gateways. List the currently deployed Smart Gateways with the oc get smartgateways command. For example, if you deployed STF with metricsEnabled: true and eventsEnabled: true, the following Smart Gateways are displayed in the output:
$ oc get smartgateways
NAME                                  AGE
stf-default-ceilometer-notification   14d
stf-default-collectd-notification     14d
stf-default-collectd-telemetry        14d
4. Retrieve the manifests for each Smart Gateway and store the contents in a temporary file, which you can modify later and use to create the new set of Smart Gateways:
truncate --size 0 /tmp/cloud1-smartgateways.yaml && \
for sg in $(oc get smartgateways -oname)
do
  echo "---" >> /tmp/cloud1-smartgateways.yaml
  oc get ${sg} -oyaml --export >> /tmp/cloud1-smartgateways.yaml
done
5. Modify the Smart Gateway manifest in the /tmp/cloud1-smartgateways.yaml file. Adjust the metadata.name and spec.amqpUrl fields to include the cloud identifier from your schema. For more information, see Section 4.5.1, "Planning AMQP address prefixes". To view example Smart Gateway manifests, see Section 4.5.2.1, "Example manifests".
6. Deploy your new Smart Gateways:
oc apply -f /tmp/cloud1-smartgateways.yaml
7. Verify that each Smart Gateway is running. This can take several minutes, depending on the number of Smart Gateways:
oc get po -l app=smart-gateway
4.5.2.1. Example manifests

IMPORTANT

The content in the following examples might be different to the file content in your deployment. Copy the manifests in your deployment.

Ensure that the name and amqpUrl parameters of each Smart Gateway match the names that you want to use for your clouds. For more information, see Section 4.5.1, "Planning AMQP address prefixes".
NOTE
Your output might have some additional metadata parameters that you can remove from the manifests that you load into OCP.
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-ceilometer-notification-cloud1 1
spec:
  amqpDataSource: ceilometer
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/anycast/ceilometer/cloud1-event.sample 2
  debug: false
  elasticPass: fkzfhghw
  elasticUrl: https://elasticsearch-es-http.service-telemetry.svc.cluster.local:9200
  elasticUser: elastic
  resetIndex: false
  serviceType: events
  size: 1
  tlsCaCert: /config/certs/ca.crt
  tlsClientCert: /config/certs/tls.crt
  tlsClientKey: /config/certs/tls.key
  tlsServerName: elasticsearch-es-http.service-telemetry.svc.cluster.local
  useBasicAuth: true
  useTls: true
---
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-collectd-notification-cloud1 3
spec:
  amqpDataSource: collectd
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/cloud1-notify 4
  debug: false
  elasticPass: fkzfhghw
  elasticUrl: https://elasticsearch-es-http.service-telemetry.svc.cluster.local:9200
  elasticUser: elastic
  resetIndex: false
  serviceType: events
  size: 1
  tlsCaCert: /config/certs/ca.crt
  tlsClientCert: /config/certs/tls.crt
  tlsClientKey: /config/certs/tls.key
  tlsServerName: elasticsearch-es-http.service-telemetry.svc.cluster.local
  useBasicAuth: true
  useTls: true
---
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-collectd-telemetry-cloud1 5
spec:
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/cloud1-telemetry 6
  debug: false
  prefetch: 15000
  serviceType: metrics
  size: 1
  useTimestamp: true
1. Name for Ceilometer notifications for cloud1
2. AMQP address for Ceilometer notifications for cloud1
3. Name for collectd notifications for cloud1
4. AMQP address for collectd notifications for cloud1
5. Name for collectd telemetry for cloud1
6. AMQP address for collectd telemetry for cloud1
4.5.3. Creating the OpenStack environment file

To label traffic according to the cloud of origin, you must create a configuration with cloud-specific instance names. Create an stf-connectors.yaml file and adjust the values of CeilometerQdrEventsConfig and CollectdAmqpInstances to match the AMQP address prefix scheme. For more information, see Section 4.5.1, "Planning AMQP address prefixes".
WARNING
Remove the enable-stf.yaml and ceilometer-write-qdr.yaml environment file references from your overcloud deployment. This configuration is redundant and results in duplicate information being sent from each cloud node.
Procedure

1. Create the stf-connectors.yaml file and modify it to match the AMQP address that you want for this cloud deployment:
resource_registry:
  OS::TripleO::Services::Collectd: /usr/share/openstack-tripleo-heat-templates/deployment/metrics/collectd-container-puppet.yaml
  OS::TripleO::Services::MetricsQdr: /usr/share/openstack-tripleo-heat-templates/deployment/metrics/qdr-container-puppet.yaml
  OS::TripleO::Services::CeilometerAgentCentral: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-central-container-puppet.yaml
  OS::TripleO::Services::CeilometerAgentNotification: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-notification-container-puppet.yaml
  OS::TripleO::Services::CeilometerAgentIpmi: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-ipmi-container-puppet.yaml
  OS::TripleO::Services::ComputeCeilometerAgent: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-compute-container-puppet.yaml
  OS::TripleO::Services::Redis: /usr/share/openstack-tripleo-heat-templates/deployment/database/redis-pacemaker-puppet.yaml

parameter_defaults:
  EnableSTF: true

  EventPipelinePublishers: []
  CeilometerEnablePanko: false
  CeilometerQdrPublishEvents: true
  CeilometerQdrEventsConfig:
    driver: amqp
    topic: cloud1-event 1

  CollectdConnectionType: amqp1
  CollectdAmqpInterval: 5
  CollectdDefaultPollingInterval: 5

  CollectdAmqpInstances:
    cloud1-notify: 2
      notify: true
      format: JSON
      presettle: false
    cloud1-telemetry: 3
      format: JSON
      presettle: true

  MetricsQdrAddresses:
    - prefix: collectd
      distribution: multicast
    - prefix: anycast/ceilometer
      distribution: multicast

  MetricsQdrSSLProfiles:
    - name: sslProfile

  MetricsQdrConnectors:
    - host: stf-default-interconnect-5671-service-telemetry.apps.infra.watch 4
      port: 443
      role: edge
      verifyHostname: false
      sslProfile: sslProfile
1. Define the topic for Ceilometer events. This value is the address format of anycast/ceilometer/cloud1-event.sample.
2. Define the topic for collectd events. This value is the format of collectd/cloud1-notify.
3. Define the topic for collectd metrics. This value is the format of collectd/cloud1-telemetry.
4. Adjust the MetricsQdrConnectors host to the address of the STF route.
2. Ensure that the naming convention in the stf-connectors.yaml file aligns with the spec.amqpUrl field in the Smart Gateway configuration. For example, configure the CeilometerQdrEventsConfig.topic field to a value of cloud1-event.
3. Save the file in a directory for custom environment files, for example, /home/stack/custom_templates/.
4. Source the authentication file:

[stack@undercloud-0 ~]$ source stackrc

(undercloud) [stack@undercloud-0 ~]$

5. Include the stf-connectors.yaml file in the overcloud deployment command, along with any other environment files relevant to your environment:

(undercloud) [stack@undercloud-0 ~]$ openstack overcloud deploy --templates /usr/share/openstack-tripleo-heat-templates \
  -e /home/stack/custom_templates/stf-connectors.yaml

Additional resources

For information about validating the deployment, see Section 3.3.3, "Validating client-side installation".
4.5.4. Querying metrics data from multiple clouds

Data stored in Prometheus has a service label attached according to the Smart Gateway it was scraped from. You can use this label to query data from a specific cloud.

To query data from a specific cloud, use a Prometheus promql query that matches the associated service label, for example: collectd_uptime{service="stf-default-collectd-telemetry-cloud1-smartgateway"}.
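For example, a minimal sketch of running this query through the Prometheus HTTP API from a pod with curl access, assuming the prometheus-operated service name used earlier in this chapter:

$ curl 'prometheus-operated:9090/api/v1/query' \
  --data-urlencode 'query=collectd_uptime{service="stf-default-collectd-telemetry-cloud1-smartgateway"}'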
4.6. EPHEMERAL STORAGE

Use ephemeral storage to run Service Telemetry Framework (STF) without persistently storing data in your Red Hat OpenShift Container Platform (OCP) cluster. Ephemeral storage is not recommended in a production environment due to the volatility of the data in the platform when operating correctly and as designed. For example, restarting a pod or rescheduling the workload to another node results in the loss of any local data written since the pod started.

If you enable ephemeral storage in STF, the Service Telemetry Operator does not add the relevant storage sections to the data storage components manifests.
4.6.1. Configuring ephemeral storage

To configure STF for ephemeral storage, add storageEphemeralEnabled: true to the ServiceTelemetry object in OCP. You can add storageEphemeralEnabled: true at installation time or, if you already deployed STF, complete the following steps:
Procedure
1. Log in to Red Hat OpenShift Container Platform.
2. Change to the service-telemetry namespace:

oc project service-telemetry

3. Edit the ServiceTelemetry object:

$ oc edit ServiceTelemetry stf-default

4. Add the storageEphemeralEnabled: true parameter to the spec section:

spec:
  eventsEnabled: true
  metricsEnabled: true
  storageEphemeralEnabled: true

5. Save your changes and close the object.
APPENDIX A. COLLECTD PLUG-INS

This section contains a complete list of collectd plug-ins and configurations for Red Hat OpenStack Platform 16.0.
collectd-aggregation
collectd::plugin::aggregation::aggregators

collectd::plugin::aggregation::interval

collectd-amqp1

collectd-apache

collectd::plugin::apache::instances (ex: localhost ⇒ url ⇒ "http://localhost/mod_status?auto")

collectd::plugin::apache::interval

collectd-apcups

collectd-battery

collectd::plugin::battery::values_percentage

collectd::plugin::battery::report_degraded

collectd::plugin::battery::query_state_fs

collectd::plugin::battery::interval

collectd-ceph

collectd::plugin::ceph::daemons

collectd::plugin::ceph::longrunavglatency

collectd::plugin::ceph::convertspecialmetrictypes

collectd-cgroups

collectd::plugin::cgroups::ignore_selected

collectd::plugin::cgroups::interval

collectd-conntrack

None

collectd-contextswitch

collectd::plugin::contextswitch::interval

collectd-cpu
collectd::plugin::cpu::reportbystate

collectd::plugin::cpu::reportbycpu

collectd::plugin::cpu::valuespercentage

collectd::plugin::cpu::reportnumcpu

collectd::plugin::cpu::reportgueststate

collectd::plugin::cpu::subtractgueststate

collectd::plugin::cpu::interval

collectd-cpufreq

None

collectd-cpusleep

collectd-csv

collectd::plugin::csv::datadir

collectd::plugin::csv::storerates

collectd::plugin::csv::interval

collectd-df

collectd::plugin::df::devices

collectd::plugin::df::fstypes

collectd::plugin::df::ignoreselected

collectd::plugin::df::mountpoints

collectd::plugin::df::reportbydevice

collectd::plugin::df::reportinodes

collectd::plugin::df::reportreserved

collectd::plugin::df::valuesabsolute

collectd::plugin::df::valuespercentage

collectd::plugin::df::interval

collectd-disk

collectd::plugin::disk::disks

collectd::plugin::disk::ignoreselected

collectd::plugin::disk::udevnameattr
collectd::plugin::disk::interval

collectd-entropy

collectd::plugin::entropy::interval

collectd-ethstat

collectd::plugin::ethstat::interfaces

collectd::plugin::ethstat::maps

collectd::plugin::ethstat::mappedonly

collectd::plugin::ethstat::interval

collectd-exec

collectd::plugin::exec::commands

collectd::plugin::exec::commands_defaults

collectd::plugin::exec::globals

collectd::plugin::exec::interval

collectd-fhcount

collectd::plugin::fhcount::valuesabsolute

collectd::plugin::fhcount::valuespercentage

collectd::plugin::fhcount::interval

collectd-filecount

collectd::plugin::filecount::directories

collectd::plugin::filecount::interval

collectd-fscache

None

collectd-hddtemp

collectd::plugin::hddtemp::host

collectd::plugin::hddtemp::port

collectd::plugin::hddtemp::interval

collectd-hugepages

collectd::plugin::hugepages::report_per_node_hp
collectd::plugin::hugepages::report_root_hp

collectd::plugin::hugepages::values_pages

collectd::plugin::hugepages::values_bytes

collectd::plugin::hugepages::values_percentage

collectd::plugin::hugepages::interval

collectd-intel_rdt

collectd-interface

collectd::plugin::interface::interfaces

collectd::plugin::interface::ignoreselected

collectd::plugin::interface::reportinactive

collectd::plugin::interface::interval

collectd-ipc

None

collectd-ipmi

collectd::plugin::ipmi::ignore_selected

collectd::plugin::ipmi::notify_sensor_add

collectd::plugin::ipmi::notify_sensor_remove

collectd::plugin::ipmi::notify_sensor_not_present

collectd::plugin::ipmi::sensors

collectd::plugin::ipmi::interval

collectd-irq

collectd::plugin::irq::irqs

collectd::plugin::irq::ignoreselected

collectd::plugin::irq::interval

collectd-load

collectd::plugin::load::report_relative

collectd::plugin::load::interval

collectd-logfile

collectd::plugin::logfile::log_level
collectd::plugin::logfile::log_file

collectd::plugin::logfile::log_timestamp

collectd::plugin::logfile::print_severity

collectd::plugin::logfile::interval

collectd-madwifi

collectd-mbmon

collectd-md

collectd-memcached

collectd::plugin::memcached::instances

collectd::plugin::memcached::interval

collectd-memory

collectd::plugin::memory::valuesabsolute

collectd::plugin::memory::valuespercentage

collectd::plugin::memory::interval

collectd-multimeter

collectd-mysql

collectd::plugin::mysql::interval

collectd-netlink

collectd::plugin::netlink::interfaces

collectd::plugin::netlink::verboseinterfaces

collectd::plugin::netlink::qdiscs

collectd::plugin::netlink::classes

collectd::plugin::netlink::filters

collectd::plugin::netlink::ignoreselected

collectd::plugin::netlink::interval

collectd-network

collectd::plugin::network::timetolive

collectd::plugin::network::maxpacketsize
collectd::plugin::network::forward

collectd::plugin::network::reportstats

collectd::plugin::network::listeners

collectd::plugin::network::servers

collectd::plugin::network::interval

collectd-nfs

collectd::plugin::nfs::interval

collectd-ntpd

collectd::plugin::ntpd::host

collectd::plugin::ntpd::port

collectd::plugin::ntpd::reverselookups

collectd::plugin::ntpd::includeunitid

collectd::plugin::ntpd::interval

collectd-numa

None

collectd-olsrd

collectd-openvpn

collectd::plugin::openvpn::statusfile

collectd::plugin::openvpn::improvednamingschema

collectd::plugin::openvpn::collectcompression

collectd::plugin::openvpn::collectindividualusers

collectd::plugin::openvpn::collectusercount

collectd::plugin::openvpn::interval

collectd-ovs_events

collectd::plugin::ovs_events::address

collectd::plugin::ovs_events::dispatch

collectd::plugin::ovs_events::interfaces

collectd::plugin::ovs_events::send_notification
collectd::plugin::ovs_events::$port

collectd::plugin::ovs_events::socket

collectd-ovs_stats

collectd::plugin::ovs_stats::address

collectd::plugin::ovs_stats::bridges

collectd::plugin::ovs_stats::port

collectd::plugin::ovs_stats::socket

collectd-ping

collectd::plugin::ping::hosts

collectd::plugin::ping::timeout

collectd::plugin::ping::ttl

collectd::plugin::ping::source_address

collectd::plugin::ping::device

collectd::plugin::ping::max_missed

collectd::plugin::ping::size

collectd::plugin::ping::interval

collectd-powerdns

collectd::plugin::powerdns::interval

collectd::plugin::powerdns::servers

collectd::plugin::powerdns::recursors

collectd::plugin::powerdns::local_socket

collectd-processes

collectd::plugin::processes::processes

collectd::plugin::processes::process_matches

collectd::plugin::processes::collect_context_switch

collectd::plugin::processes::collect_file_descriptor

collectd::plugin::processes::collect_memory_maps

collectd::plugin::processes::interval
collectd-protocols

collectd::plugin::protocols::ignoreselected

collectd::plugin::protocols::values

collectd-python

collectd-serial

collectd-smart

collectd::plugin::smart::disks

collectd::plugin::smart::ignoreselected

collectd::plugin::smart::interval

collectd-snmp_agent

collectd-statsd

collectd::plugin::statsd::host

collectd::plugin::statsd::port

collectd::plugin::statsd::deletecounters

collectd::plugin::statsd::deletetimers

collectd::plugin::statsd::deletegauges

collectd::plugin::statsd::deletesets

collectd::plugin::statsd::countersum

collectd::plugin::statsd::timerpercentile

collectd::plugin::statsd::timerlower

collectd::plugin::statsd::timerupper

collectd::plugin::statsd::timersum

collectd::plugin::statsd::timercount

collectd::plugin::statsd::interval

collectd-swap

collectd::plugin::swap::reportbydevice

collectd::plugin::swap::reportbytes

collectd::plugin::swap::valuesabsolute
collectd::plugin::swap::valuespercentage

collectd::plugin::swap::reportio

collectd::plugin::swap::interval

collectd-syslog

collectd::plugin::syslog::log_level

collectd::plugin::syslog::notify_level

collectd::plugin::syslog::interval

collectd-table

collectd::plugin::table::tables

collectd::plugin::table::interval

collectd-tail

collectd::plugin::tail::files

collectd::plugin::tail::interval

collectd-tail_csv

collectd::plugin::tail_csv::metrics

collectd::plugin::tail_csv::files

collectd-tcpconns

collectd::plugin::tcpconns::localports

collectd::plugin::tcpconns::remoteports

collectd::plugin::tcpconns::listening

collectd::plugin::tcpconns::allportssummary

collectd::plugin::tcpconns::interval

collectd-ted

collectd-thermal

collectd::plugin::thermal::devices

collectd::plugin::thermal::ignoreselected

collectd::plugin::thermal::interval

collectd-threshold
collectd::plugin::threshold::types

collectd::plugin::threshold::plugins

collectd::plugin::threshold::hosts

collectd::plugin::threshold::interval

collectd-turbostat

collectd::plugin::turbostat::core_c_states

collectd::plugin::turbostat::package_c_states

collectd::plugin::turbostat::system_management_interrupt

collectd::plugin::turbostat::digital_temperature_sensor

collectd::plugin::turbostat::tcc_activation_temp

collectd::plugin::turbostat::running_average_power_limit

collectd::plugin::turbostat::logical_core_names

collectd-unixsock

collectd-uptime

collectd::plugin::uptime::interval

collectd-users

collectd::plugin::users::interval

collectd-uuid

collectd::plugin::uuid::uuid_file

collectd::plugin::uuid::interval

collectd-virt

collectd::plugin::virt::connection

collectd::plugin::virt::refresh_interval

collectd::plugin::virt::domain

collectd::plugin::virt::block_device

collectd::plugin::virt::interface_device

collectd::plugin::virt::ignore_selected

collectd::plugin::virt::hostname_format
collectd::plugin::virt::interface_format

collectd::plugin::virt::extra_stats

collectd::plugin::virt::interval

collectd-vmem

collectd::plugin::vmem::verbose

collectd::plugin::vmem::interval

collectd-vserver

collectd-wireless

collectd-write_graphite

collectd::plugin::write_graphite::carbons

collectd::plugin::write_graphite::carbon_defaults

collectd::plugin::write_graphite::globals

collectd-write_kafka

collectd::plugin::write_kafka::kafka_host

collectd::plugin::write_kafka::kafka_port

collectd::plugin::write_kafka::kafka_hosts

collectd::plugin::write_kafka::topics

collectd-write_log

collectd::plugin::write_log::format

collectd-zfs_arc

None
- Table of Contents
- CHAPTER 1 INTRODUCTION TO SERVICE TELEMETRY FRAMEWORK
-
- 11 SERVICE TELEMETRY FRAMEWORK ARCHITECTURE
- 12 INSTALLATION SIZE
-
- CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK
-
- 21 THE CORE COMPONENTS OF STF
- 22 PREPARING YOUR OCP ENVIRONMENT FOR STF
-
- 221 Persistent volumes
-
- 2211 Using ephemeral storage
-
- 222 Resource allocation
- 223 Node tuning operator
-
- 23 DEPLOYING STF TO THE OCP ENVIRONMENT
-
- 231 Deploying STF to the OCP environment with ElasticSearch
- 232 Deploying STF to the OCP environment without ElasticSearch
- 233 Creating a namespace
- 234 Creating an OperatorGroup
- 235 Enabling the OperatorHubio Community Catalog Source
- 236 Enabling Red Hat STF Operator Source
- 237 Subscribing to the AMQ Certificate Manager Operator
- 238 Subscribing to the Elastic Cloud on Kubernetes Operator
- 239 Subscribing to the Service Telemetry Operator
- 2310 Creating a ServiceTelemetry object in OCP
-
- 24 REMOVING STF FROM THE OCP ENVIRONMENT
-
- 241 Deleting the namespace
- 242 Removing the OperatorSource
-
- CHAPTER 3 COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION
-
- 31 CONNECTING RED HAT OPENSTACK PLATFORM TO SERVICE TELEMETRY FRAMEWORK
- 32 DEPLOYING TO NON-STANDARD NETWORK TOPOLOGIES
- 33 CONFIGURING RED HAT OPENSTACK PLATFORM OVERCLOUD FOR SERVICE TELEMETRY FRAMEWORK
-
- 331 Retrieving the AMQ Interconnect route address
- 332 Configuring the STF connection for the overcloud
- 333 Validating client-side installation
-
- CHAPTER 4 ADVANCED FEATURES
-
- 41 CUSTOMIZING THE DEPLOYMENT
-
- 411 Manifest override parameters
- 412 Overriding a managed manifest
-
- Procedure
-
- 42 ALERTS
-
- Additional resources
- 421 Creating an alert rule in Prometheus
-
- Procedure
- Additional resources
-
- 422 Configuring custom alerts
-
- Procedure
- Additional resources
-
- 423 Creating an alert route in Alertmanager
-
- Procedure
- Additional resources
-
- 43 HIGH AVAILABILITY
-
- 431 Configuring high availability
-
- Procedure
-
- 44 DASHBOARDS
-
- 441 Setting up Grafana to host the dashboard
-
- 4411 Viewing and editing queries
-
- 442 The Grafana infrastructure dashboard
-
- 4421 Top panels
- 4422 Networking panels
- 4423 CPU panels
- 4424 Memory panels
- 4425 Diskfile system
-
- 45 CONFIGURING MULTIPLE CLOUDS
-
- 451 Planning AMQP address prefixes
- 452 Deploying Smart Gateways
-
- Procedure
- 4521 Example manifests
-
- 453 Creating the OpenStack environment file
-
- Procedure
- Additional resources
-
- 454 Querying metrics data from multiple clouds
-
- 46 EPHEMERAL STORAGE
-
- 461 Configuring ephemeral storage
-
- APPENDIX A COLLECTD PLUG-INS
-
CHAPTER 3. COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION

3.1. CONNECTING RED HAT OPENSTACK PLATFORM TO SERVICE TELEMETRY FRAMEWORK

To collect metrics, events, or both, and to send them to the Service Telemetry Framework (STF) storage domain, you must configure the Red Hat OpenStack Platform overcloud to enable data collection and transport.

To deploy data collection and transport to STF on Red Hat OpenStack Platform cloud nodes that employ routed L3 domains, such as distributed compute node (DCN) or spine-leaf, see Section 3.2, "Deploying to non-standard network topologies".

3.2. DEPLOYING TO NON-STANDARD NETWORK TOPOLOGIES

If your nodes are on a separate network from the default InternalApi network, you must make configuration adjustments so that AMQ Interconnect can transport data to the Service Telemetry Framework (STF) server instance. This scenario is typical in a spine-leaf or a DCN topology. For more information about DCN configuration, see the Spine Leaf Networking guide.
If you use STF with Red Hat OpenStack Platform 16.0 and plan to monitor your Ceph, Block, or Object storage nodes, you must make configuration changes that are similar to the configuration changes that you make to the spine-leaf and DCN network configuration. To monitor Ceph nodes, use the CephStorageExtraConfig parameter to define which network interface to load into the AMQ Interconnect and collectd configuration files:

CephStorageExtraConfig:
  tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('storage')}"
  tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('storage')}"
  tripleo::profile::base::ceilometer::agent::notification::notifier_host_addr: "%{hiera('storage')}"

Similarly, you must specify the BlockStorageExtraConfig and ObjectStorageExtraConfig parameters if your environment uses Block and Object storage roles.
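For example, a minimal sketch of these parameters, assuming the Block and Object storage roles also use the default Storage network as in the Ceph example above; adjust the hiera key to match your network definition:

BlockStorageExtraConfig:
  # Assumption: default Storage network; replace 'storage' with your network's hiera key.
  tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('storage')}"
  tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('storage')}"

ObjectStorageExtraConfig:
  tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('storage')}"
  tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('storage')}"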
The deployment of a spine-leaf topology involves creating roles and networks, then assigning those networks to the available roles. When you configure data collection and transport for STF for a Red Hat OpenStack Platform deployment, the default network for roles is InternalApi. For Ceph, Block, and Object storage roles, the default network is Storage. Because a spine-leaf configuration can result in different networks being assigned to different leaf groupings, and those names are typically unique, additional configuration is required in the parameter_defaults section of the Red Hat OpenStack Platform environment files.

Procedure

1. Document which networks are available for each of the leaf roles. For examples of network name definitions, see Creating a network data file in the Spine Leaf Networking guide. For more information about the creation of the leaf groupings (roles) and assignment of the networks to those groupings, see Creating a roles data file in the Spine Leaf Networking guide.
2. Add the following configuration example to the ExtraConfig section for each of the leaf roles. In this example, internal_api_subnet is the value defined in the name_lower parameter of your network definition (with _subnet appended to the name for Leaf 0) and is the network to which the ComputeLeaf0 leaf role is connected. In this case, the network identification of 0 corresponds to the Compute role for leaf 0, and represents a value that is different from the default internal API network name.
For the ComputeLeaf0 leaf role, specify extra configuration to perform a hiera lookup to determine which network interface for a particular network to assign to the collectd AMQP host parameter. Perform the same configuration for the AMQ Interconnect listener address parameter:

ComputeLeaf0ExtraConfig:
  tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('internal_api_subnet')}"
  tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('internal_api_subnet')}"

Additional leaf roles typically replace _subnet with _leafN, where N represents a unique identifier for the leaf:

ComputeLeaf1ExtraConfig:
  tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('internal_api_leaf1')}"
  tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('internal_api_leaf1')}"

This example configuration is on a CephStorage leaf role:

CephStorageLeaf0ExtraConfig:
  tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('storage_subnet')}"
  tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('storage_subnet')}"
3.3. CONFIGURING RED HAT OPENSTACK PLATFORM OVERCLOUD FOR SERVICE TELEMETRY FRAMEWORK

To configure the Red Hat OpenStack Platform overcloud, you must configure the data collection applications and the data transport to STF, and deploy the overcloud.

To configure the Red Hat OpenStack Platform overcloud, complete the following tasks:

1. Section 3.3.1, "Retrieving the AMQ Interconnect route address"

2. Section 3.3.2, "Configuring the STF connection for the overcloud"

3. Section 3.3.3, "Validating client-side installation"

3.3.1. Retrieving the AMQ Interconnect route address

When you configure the Red Hat OpenStack Platform overcloud for STF, you must provide the AMQ Interconnect route address in the STF connection file.

Procedure

1. Log in to your Red Hat OpenShift Container Platform (OCP) environment.

2. In the service-telemetry project, retrieve the AMQ Interconnect route address:

$ oc get routes -ogo-template='{{ range .items }}{{printf "%s\n" .spec.host }}{{ end }}' | grep "\-5671"
stf-default-interconnect-5671-service-telemetry.apps.infra.watch
NOTE
If your STF installation differs from the documentation, ensure that you retrieve the correct AMQ Interconnect route address.
3.3.2. Configuring the STF connection for the overcloud

To configure the STF connection, you must create a file that contains the connection configuration of the AMQ Interconnect for the overcloud to the STF deployment. Enable the collection of events and storage of the events in STF, and deploy the overcloud.

Procedure

1. Log in to the Red Hat OpenStack Platform undercloud as the stack user.

2. Create a configuration file called stf-connectors.yaml in the /home/stack directory.

IMPORTANT

The Service Telemetry Operator simplifies the deployment of all data ingestion and data storage components for single cloud deployments. To share the data storage domain with multiple clouds, see Section 4.5, "Configuring multiple clouds".

3. In the stf-connectors.yaml file, configure the MetricsQdrConnectors address to connect the AMQ Interconnect on the overcloud to the STF deployment:

parameter_defaults:
    EventPipelinePublishers: []
    CeilometerQdrPublishEvents: true
    MetricsQdrConnectors:
        - host: stf-default-interconnect-5671-service-telemetry.apps.infra.watch
          port: 443
          role: edge
          sslProfile: sslProfile
          verifyHostname: false

Add the CeilometerQdrPublishEvents: true parameter to enable collection and transport of Ceilometer events to STF.

Replace the host parameter with the value of HOST/PORT that you retrieved in Section 3.3.1, "Retrieving the AMQ Interconnect route address".

4. Add the following files to your Red Hat OpenStack Platform director deployment to set up collectd and AMQ Interconnect:

the stf-connectors.yaml environment file

the enable-stf.yaml file that ensures that the environment is being used during the overcloud deployment
the ceilometer-write-qdr.yaml file that ensures that Ceilometer telemetry is sent to STF

5. Deploy the Red Hat OpenStack Platform overcloud:

openstack overcloud deploy <other arguments>
 --templates /usr/share/openstack-tripleo-heat-templates \
 --environment-file <other-environment-files> \
 --environment-file /usr/share/openstack-tripleo-heat-templates/environments/metrics/ceilometer-write-qdr.yaml \
 --environment-file /usr/share/openstack-tripleo-heat-templates/environments/enable-stf.yaml \
 --environment-file /home/stack/stf-connectors.yaml

3.3.3. Validating client-side installation

To validate data collection from the STF storage domain, query the data sources for delivered data. To validate individual nodes in the Red Hat OpenStack Platform deployment, connect to the console using SSH.

Procedure

1. Log in to an overcloud node, for example, controller-0.

2. Ensure that the metrics_qdr container is running on the node:

$ sudo podman container inspect --format '{{.State.Status}}' metrics_qdr
running

3. Return the internal network address on which AMQ Interconnect is running, for example, 172.17.1.44 listening on port 5666:

$ sudo podman exec -it metrics_qdr cat /etc/qpid-dispatch/qdrouterd.conf

listener {
    host: 172.17.1.44
    port: 5666
    authenticatePeer: no
    saslMechanisms: ANONYMOUS
}

4. Return a list of connections to the local AMQ Interconnect:

$ sudo podman exec -it metrics_qdr qdstat --bus=172.17.1.44:5666 --connections

Connections
  id   host                                                                   container                                                                                                 role    dir  security                            authentication  tenant
  ============================================================================================================================================================================
  1    stf-default-interconnect-5671-service-telemetry.apps.infra.watch:443  stf-default-interconnect-7458fd4d69-bgzfb                                                                 edge    out  TLSv1.2(DHE-RSA-AES256-GCM-SHA384)  anonymous-user
  12   172.17.1.44:60290                                                     openstack.org/om/container/controller-0/ceilometer-agent-notification/25/5c02cee550f143ec9ea030db5cccba14  normal  in   no-security                         no-auth
  16   172.17.1.44:36408                                                     metrics                                                                                                   normal  in   no-security                         anonymous-user
  899  172.17.1.44:39500                                                     10a2e99d-1b8a-4329-b48c-4335e5f75c84                                                                      normal  in   no-security                         no-auth
There are four connections:

Outbound connection to STF

Inbound connection from collectd

Inbound connection from ceilometer

Inbound connection from our qdstat client

The outbound STF connection is provided to the MetricsQdrConnectors host parameter and is the route for the STF storage domain. The other hosts are internal network addresses of the client connections to this AMQ Interconnect.

5. To ensure that messages are being delivered, list the links, and view the _edge address in the deliv column for delivery of messages:

$ sudo podman exec -it metrics_qdr qdstat --bus=172.17.1.44:5666 --links

Router Links
  type      dir  conn id  id    peer  class   addr                  phs  cap  pri  undel  unsett  deliv    presett  psdrop  acc  rej  rel  mod  delay  rate
  ===========================================================================================================================================================
  endpoint  out  1        5           local   _edge                      250  0    0      0       2979926  2979924  0       0    0    2    0    0      0
  endpoint  in   1        6                                              250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  in   1        7                                              250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  out  1        8                                              250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  in   1        9                                              250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  out  1        10                                             250  0    0      0       911      911      0       0    0    0    0    911    0
  endpoint  in   1        11                                             250  0    0      0       0        911      0       0    0    0    0    0      0
  endpoint  out  12       32          local   temp.lSY6Mcicol4J2Kp       250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  in   16       41                                             250  0    0      0       2979924  2979924  0       0    0    0    0    0      0
  endpoint  in   912      1834        mobile  $management           0    250  0    0      0       1        0        0       1    0    0    0    0      0
  endpoint  out  912      1835        local   temp.9Ok2resI9tmt+CT       250  0    0      0       0        0        0       0    0    0    0    0      0
6. To list the addresses from Red Hat OpenStack Platform nodes to STF, connect to OCP to get the AMQ Interconnect pod name and list the connections. List the available AMQ Interconnect pods:

$ oc get pods -l application=stf-default-interconnect

NAME                                        READY   STATUS    RESTARTS   AGE
stf-default-interconnect-7458fd4d69-bgzfb   1/1     Running   0          6d21h

7. Connect to the pod and run the qdstat --connections command to list the known connections:

$ oc exec -it stf-default-interconnect-7458fd4d69-bgzfb -- qdstat --connections

2020-04-21 18:25:47.243852 UTC
stf-default-interconnect-7458fd4d69-bgzfb

Connections
  id  host               container                                                               role    dir  security                                authentication  tenant  last dlv      uptime
  ======================================================================================================================================================================================================
  1   10.129.2.21:43062  rcv[stf-default-collectd-telemetry-smartgateway-79c967c8f7-kq4qv]       normal  in   no-security                             anonymous-user          000:00:00:00  006:21:50:25
  2   10.130.0.52:55754  rcv[stf-default-ceilometer-notification-smartgateway-6675df547-mbjk5]   normal  in   no-security                             anonymous-user          000:21:25:57  006:21:49:36
  3   10.130.0.51:43110  rcv[stf-default-collectd-notification-smartgateway-698c87fbb7-f28v6]    normal  in   no-security                             anonymous-user          000:21:36:53  006:21:49:09
  22  10.128.0.1:51948   Router.ceph-0.redhat.local                                              edge    in   TLSv1/SSLv3(DHE-RSA-AES256-GCM-SHA384)  anonymous-user          000:00:00:03  000:22:08:43
  23  10.128.0.1:51950   Router.compute-0.redhat.local                                           edge    in   TLSv1/SSLv3(DHE-RSA-AES256-GCM-SHA384)  anonymous-user          000:00:00:03  000:22:08:43
  24  10.128.0.1:52082   Router.controller-0.redhat.local                                        edge    in   TLSv1/SSLv3(DHE-RSA-AES256-GCM-SHA384)  anonymous-user          000:00:00:00  000:22:08:34
  27  127.0.0.1:42202    c2f541c1-4c97-4b37-a189-a396c08fb079                                    normal  in   no-security                             no-auth                 000:00:00:00  000:00:00:00

In this example, there are three edge connections from the Red Hat OpenStack Platform nodes, with connection id 22, 23, and 24.

8. To view the number of messages delivered by the network, use each address with the oc exec command:

$ oc exec -it stf-default-interconnect-7458fd4d69-bgzfb -- qdstat --address

2020-04-21 18:20:10.293258 UTC
stf-default-interconnect-7458fd4d69-bgzfb

Router Addresses
  class   addr                             phs  distrib    pri  local  remote  in       out      thru  fallback
  ====================================================================================================================
  mobile  anycast/ceilometer/event.sample  0    balanced   -    1      0       1553     1553     0     0
  mobile  collectd/notify                  0    multicast  -    1      0       10       10       0     0
  mobile  collectd/telemetry               0    multicast  -    1      0       7798049  7798049  0     0
CHAPTER 4. ADVANCED FEATURES

The following optional features can provide additional functionality to the Service Telemetry Framework (STF):

Customizing the deployment. For more information, see Section 4.1, "Customizing the deployment".

Alerts. For more information, see Section 4.2, "Alerts".

High availability. For more information, see Section 4.3, "High availability".

Dashboards. For more information, see Section 4.4, "Dashboards".

Multiple clouds. For more information, see Section 4.5, "Configuring multiple clouds".

Ephemeral storage. For more information, see Section 4.6, "Ephemeral storage".

4.1. CUSTOMIZING THE DEPLOYMENT

The Service Telemetry Operator watches for a ServiceTelemetry manifest to load into Red Hat OpenShift Container Platform (OCP). The Operator then creates other objects in memory, which results in the dependent Operators creating the workloads they are responsible for managing.
WARNING
When you override the manifest, you must provide the entire manifest contents, including object names or namespaces. There is no dynamic parameter substitution when you override a manifest.
To override a manifest successfully with Service Telemetry Framework (STF), deploy a default environment using the core options only. For more information about the core options, see Section 2.3.10, "Creating a ServiceTelemetry object in OCP". When you deploy STF, use the oc get command to retrieve the default deployed manifest. When you use a manifest that was originally generated by Service Telemetry Operator, the manifest is compatible with the other objects that are managed by the Operators.
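For example, a minimal sketch of retrieving a deployed manifest to use as a starting point for an override, assuming the default stf-default deployment name used throughout this guide:

$ oc get servicetelemetry stf-default -oyaml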
For example, when the metricsEnabled: true parameter is configured in the ServiceTelemetry manifest, the Service Telemetry Operator requests components for metrics retrieval and storage using the default manifests. In some cases, you might want to override the default manifest. For more information, see Section 4.1.1, "Manifest override parameters".
4.1.1. Manifest override parameters

This table describes the available parameters that you can use to override a manifest, along with the corresponding retrieval commands.

Table 4.1. Manifest override parameters
Override parameter | Description | Retrieval command

alertmanagerManifest | Override the Alertmanager object creation. The Prometheus Operator watches for Alertmanager objects. | oc get alertmanager stf-default -oyaml

alertmanagerConfigManifest | Override the Secret that contains the Alertmanager configuration. The Prometheus Operator uses a secret named alertmanager-<alertmanager-name>, for example, stf-default, to provide the alertmanager.yaml configuration to Alertmanager. | oc get secret alertmanager-stf-default -oyaml

elasticsearchManifest | Override the ElasticSearch object creation. The Elastic Cloud on Kubernetes Operator watches for ElasticSearch objects. | oc get elasticsearch elasticsearch -oyaml

interconnectManifest | Override the Interconnect object creation. The AMQ Interconnect Operator watches for Interconnect objects. | oc get interconnect stf-default-interconnect -oyaml

prometheusManifest | Override the Prometheus object creation. The Prometheus Operator watches for Prometheus objects. | oc get prometheus stf-default -oyaml

servicemonitorManifest | Override the ServiceMonitor object creation. The Prometheus Operator watches for ServiceMonitor objects. | oc get servicemonitor stf-default -oyaml

smartgatewayCollectdMetricsManifest | Override the SmartGateway object creation for collectd metrics. The Smart Gateway Operator watches for SmartGateway objects. | oc get smartgateway stf-default-collectd-telemetry -oyaml

smartgatewayCollectdEventsManifest | Override the SmartGateway object creation for collectd events. The Smart Gateway Operator watches for SmartGateway objects. | oc get smartgateway stf-default-collectd-notification -oyaml
smartgatewayCeilometerEventsManifest | Override the SmartGateway object creation for Ceilometer events. The Smart Gateway Operator watches for SmartGateway objects. | oc get smartgateway stf-default-ceilometer-notification -oyaml
4.1.2. Overriding a managed manifest

Edit the ServiceTelemetry object and provide a parameter and manifest. For a list of available manifest override parameters, see Section 4.1, "Customizing the deployment". The default ServiceTelemetry object is stf-default. Use oc get servicetelemetry to list the available STF deployments.
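For example, a minimal sketch of listing the deployments; the AGE value shown is illustrative:

$ oc get servicetelemetry
NAME          AGE
stf-default   14d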
TIP
The oc edit command loads the default system editor. To override the default editor, pass or set the environment variable EDITOR to the preferred editor. For example, EDITOR=nano oc edit servicetelemetry stf-default.
Procedure
1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

oc project service-telemetry

3. Load the ServiceTelemetry object into an editor:

oc edit servicetelemetry stf-default

4. To modify the ServiceTelemetry object, provide a manifest override parameter and the contents of the manifest to write to OCP instead of the defaults provided by STF.

NOTE

The trailing pipe (|) after entering the manifest override parameter indicates that the value provided is multi-line.
apiVersion: infra.watch/v1alpha1
kind: ServiceTelemetry
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"infra.watch/v1alpha1","kind":"ServiceTelemetry","metadata":{"annotations":{},"name":"stf-default","namespace":"service-telemetry"},"spec":{"metricsEnabled":true}}
  creationTimestamp: "2020-04-14T20:29:42Z"
  generation: 1
  name: stf-default
  namespace: service-telemetry
  resourceVersion: "1949423"
  selfLink: /apis/infra.watch/v1alpha1/namespaces/service-telemetry/servicetelemetrys/stf-default
  uid: d058bc41-1bb0-49f5-9a8b-642f4b8adb95
spec:
  metricsEnabled: true
  smartgatewayCollectdMetricsManifest: | 1
    apiVersion: smartgateway.infra.watch/v2alpha1
    kind: SmartGateway
    metadata:
      name: stf-default-collectd-telemetry
      namespace: service-telemetry
    spec:
      amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/telemetry
      debug: true
      prefetch: 15000
      serviceType: metrics
      size: 1
      useTimestamp: true 2
status:
  conditions:
  - ansibleResult:
      changed: 0
      completion: 2020-04-14T20:32:19.079508
      failures: 0
      ok: 52
      skipped: 1
    lastTransitionTime: "2020-04-14T20:29:59Z"
    message: Awaiting next reconciliation
    reason: Successful
    status: "True"
    type: Running
1. Manifest override parameter is defined in the spec of the ServiceTelemetry object.

2. End of the manifest override content.

5. Save and close.
4.2. ALERTS

You create alert rules in Prometheus and alert routes in Alertmanager. Alert rules in Prometheus servers send alerts to an Alertmanager, which manages the alerts. Alertmanager can silence, inhibit, or aggregate alerts, and send notifications using email, on-call notification systems, or chat platforms.

To create an alert, complete the following tasks:

1. Create an alert rule in Prometheus. For more information, see Section 4.2.1, "Creating an alert rule in Prometheus".
2. Create an alert route in Alertmanager. For more information, see Section 4.2.3, "Creating an alert route in Alertmanager".

Additional resources

For more information about alerts or notifications with Prometheus and Alertmanager, see https://prometheus.io/docs/alerting/overview

To view an example set of alerts that you can use with Service Telemetry Framework (STF), see https://github.com/infrawatch/service-telemetry-operator/tree/master/deploy/alerts

4.2.1. Creating an alert rule in Prometheus

Prometheus evaluates alert rules to trigger notifications. If the rule condition returns an empty result set, the condition is false. Otherwise, the rule is true and it triggers an alert.
Procedure
1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

oc project service-telemetry

3. Create a PrometheusRule object that contains the alert rule. The Prometheus Operator loads the rule into Prometheus:

oc apply -f - <<EOF
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  creationTimestamp: null
  labels:
    prometheus: stf-default
    role: alert-rules
  name: prometheus-alarm-rules
  namespace: service-telemetry
spec:
  groups:
    - name: ./openstack.rules
      rules:
        - alert: Metric Listener down
          expr: collectd_qpid_router_status < 1
EOF

To change the rule, edit the value of the expr parameter.

4. To verify that the rules have been loaded into Prometheus by the Operator, create a pod with access to curl:

$ oc run curl --generator=run-pod/v1 --image=radial/busyboxplus:curl -i --tty

5. Run curl to access the prometheus-operated service to return the rules loaded into memory:

[ root@curl:/ ]$ curl prometheus-operated:9090/api/v1/rules
{"status":"success","data":{"groups":[{"name":"./openstack.rules","file":"/etc/prometheus/rules/prometheus-stf-default-rulefiles-0/service-telemetry-prometheus-alarm-rules.yaml","rules":[{"name":"Metric Listener down","query":"collectd_qpid_router_status \u003c 1","duration":0,"labels":{},"annotations":{},"alerts":[],"health":"ok","type":"alerting"}],"interval":30}]}}
6. To verify that the output shows the rules loaded into the PrometheusRule object, for example, the output contains the defined ./openstack.rules, exit from the pod:

[ root@curl:/ ]$ exit

7. Clean up the environment by deleting the curl pod:

$ oc delete pod curl

pod "curl" deleted

Additional resources

For more information on alerting, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md

4.2.2. Configuring custom alerts

You can add custom alerts to the PrometheusRule object that you created in Section 4.2.1, "Creating an alert rule in Prometheus".
Procedure
1. Use the oc edit command:

oc edit prometheusrules prometheus-alarm-rules

2. Edit the PrometheusRules manifest, for example, by appending a rule to the existing group, as shown in the sketch after this procedure.

3. Save and close.
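For example, a minimal sketch of an additional custom rule appended to the groups list of the manifest. The second alert name and expression are illustrative assumptions, not values from this guide; adjust them to your environment:

spec:
  groups:
    - name: ./openstack.rules
      rules:
        - alert: Metric Listener down
          expr: collectd_qpid_router_status < 1
        # Illustrative custom alert (assumed expression): fires when no
        # collectd_uptime series is present at all.
        - alert: Collectd metrics missing
          expr: absent(collectd_uptime) == 1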
Additional resources

For more information about configuring alerting rules, see https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/

For more information about PrometheusRules objects, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md

4.2.3. Creating an alert route in Alertmanager

Use Alertmanager to deliver alerts to an external system, such as email, IRC, or other notification channel. The Prometheus Operator manages the Alertmanager configuration as a Red Hat OpenShift Container Platform (OCP) secret. STF by default deploys a basic configuration that results in no receivers:

alertmanager.yaml: |-
  global:
    resolve_timeout: 5m
  route:
    group_by: ['job']
    group_wait: 30s
    group_interval: 5m
    repeat_interval: 12h
    receiver: 'null'
  receivers:
  - name: 'null'
To deploy a custom Alertmanager route with STF, you must pass an alertmanagerConfigManifest parameter to the Service Telemetry Operator, which results in an updated secret that is managed by the Prometheus Operator. For more information, see Section 4.1.2, "Overriding a managed manifest".
Procedure
1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

oc project service-telemetry

3. Edit the ServiceTelemetry object for your STF deployment:

oc edit servicetelemetry stf-default

4. Add a new parameter, alertmanagerConfigManifest, and the Secret object contents to define the alertmanager.yaml configuration for Alertmanager:

apiVersion: infra.watch/v1alpha1
kind: ServiceTelemetry
metadata:
  name: stf-default
  namespace: service-telemetry
spec:
  metricsEnabled: true
  alertmanagerConfigManifest: |
    apiVersion: v1
    kind: Secret
    metadata:
      name: 'alertmanager-stf-default'
      namespace: 'service-telemetry'
    type: Opaque
    stringData:
      alertmanager.yaml: |-
        global:
          resolve_timeout: 10m
        route:
          group_by: ['job']
          group_wait: 30s
          group_interval: 5m
          repeat_interval: 12h
          receiver: 'null'
        receivers:
        - name: 'null'
NOTE
This loads the default template that is already managed by Service Telemetry Operator. To validate that the changes are populating correctly, change a value, return the alertmanager-stf-default secret, and verify that the new value is loaded into memory, for example, changing the value of global.resolve_timeout from 5m to 10m.
5. Verify that the configuration was applied to the secret:

$ oc get secret alertmanager-stf-default -o go-template='{{index .data "alertmanager.yaml" | base64decode }}'

global:
  resolve_timeout: 10m
route:
  group_by: ['job']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h
  receiver: 'null'
receivers:
- name: 'null'

6. To verify the configuration has been loaded into Alertmanager, create a pod with access to curl:

oc run curl --generator=run-pod/v1 --image=radial/busyboxplus:curl -i --tty

7. Run curl against the alertmanager-operated service to retrieve the status and configYAML contents, and review that the supplied configuration matches the configuration loaded into Alertmanager:

[ root@curl:/ ]$ curl alertmanager-operated:9093/api/v1/status

{"status":"success","data":{"configYAML":"global:\n  resolve_timeout: 10m\n  http_config: {}\n  smtp_hello: localhost\n  smtp_require_tls: true\n  pagerduty_url: https://events.pagerduty.com/v2/enqueue\n  hipchat_api_url: https://api.hipchat.com/\n  opsgenie_api_url: https://api.opsgenie.com/\n  wechat_api_url: https://qyapi.weixin.qq.com/cgi-bin/\n  victorops_api_url: https://alert.victorops.com/integrations/generic/20131114/alert/\nroute:\n  receiver: \"null\"\n  group_by:\n  - job\n  group_wait: 30s\n  group_interval: 5m\n  repeat_interval: 12h\nreceivers:\n- name: \"null\"\ntemplates: []\n"}}

8. Verify that the configYAML field contains the expected changes. Exit from the pod:

[ root@curl:/ ]$ exit

9. To clean up the environment, delete the curl pod:

$ oc delete pod curl

pod "curl" deleted
Additional resources

For more information about the Red Hat OpenShift Container Platform secret and the Prometheus operator, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md

4.3. HIGH AVAILABILITY

High availability is the ability of Service Telemetry Framework (STF) to rapidly recover from failures in its component services. Although Red Hat OpenShift Container Platform (OCP) restarts a failed pod if nodes are available to schedule the workload, this recovery process might take more than one minute, during which time events and metrics are lost. A high availability configuration includes multiple copies of STF components, reducing recovery time to approximately 2 seconds. To protect against failure of an OCP node, deploy STF to an OCP cluster with three or more nodes.
NOTE
STF is not yet a fully fault tolerant system. Delivery of metrics and events during the recovery period is not guaranteed.
Enabling high availability has the following effects:

Two AMQ Interconnect pods run instead of the default 1.

Three ElasticSearch pods run instead of the default 1.

Recovery time from a lost pod in either of these services reduces to approximately 2 seconds.
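A quick way to observe the effect after enabling high availability is to list the AMQ Interconnect pods, reusing the pod selector from Section 3.3.3. This is a sketch; the pod names and ages shown are illustrative:

$ oc get pods -l application=stf-default-interconnect
NAME                                        READY   STATUS    RESTARTS   AGE
stf-default-interconnect-7458fd4d69-bgzfb   1/1     Running   0          6d21h
stf-default-interconnect-7458fd4d69-xg2d7   1/1     Running   0          2m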
4.3.1. Configuring high availability

To configure STF for high availability, add highAvailabilityEnabled: true to the ServiceTelemetry object in OCP. You can set this parameter at installation time or, if you already deployed STF, complete the following steps:
Procedure
1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

oc project service-telemetry

3. Use the oc command to edit the ServiceTelemetry object:

$ oc edit ServiceTelemetry

4. Add highAvailabilityEnabled: true to the spec section:

spec:
  eventsEnabled: true
  metricsEnabled: true
  highAvailabilityEnabled: true

5. Save your changes and close the object.
4.4. DASHBOARDS

Use the third-party application Grafana to visualize system-level metrics gathered by collectd for each individual host node. For more information about configuring collectd, see Section 3.3, "Configuring Red Hat OpenStack Platform overcloud for Service Telemetry Framework".

4.4.1. Setting up Grafana to host the dashboard

Grafana is not included in the default Service Telemetry Framework (STF) deployment, so you must deploy the Grafana Operator from OperatorHub.io.
Procedure
1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

oc project service-telemetry

3. Clone the dashboard repository:

git clone https://github.com/infrawatch/dashboards
cd dashboards

4. Deploy the Grafana operator:

oc create -f deploy/subscription.yaml

5. To verify that the operator launched successfully, run the oc get csv command. If the value of the PHASE column is Succeeded, the operator launched successfully:

$ oc get csv

NAME                      DISPLAY            VERSION   REPLACES   PHASE
grafana-operator.v3.2.0   Grafana Operator   3.2.0                Succeeded

6. Launch a Grafana instance:

$ oc create -f deploy/grafana.yaml

7. Verify that the Grafana instance deployed:

$ oc get pod -l app=grafana

NAME                                  READY   STATUS    RESTARTS   AGE
grafana-deployment-7fc7848b56-sbkhv   1/1     Running   0          1m

8. Create the datasource and dashboard resources:

oc create -f deploy/datasource.yaml -f deploy/rhos-dashboard.yaml
9. Verify that the resources installed correctly:

$ oc get grafanadashboards

NAME             AGE
rhos-dashboard   7d21h

$ oc get grafanadatasources

NAME                                  AGE
service-telemetry-grafanadatasource   1m

10. Navigate to https://<grafana-route-address> in a web browser. Use the oc get routes command to retrieve the Grafana route address:

oc get routes

11. To view the dashboard, click Dashboards and Manage.

Additional resources

For more information about enabling the OperatorHub.io catalog source, see Section 2.3.5, "Enabling the OperatorHub.io Community Catalog Source".
4.4.1.1. Viewing and editing queries

Procedure

1. Log in to Red Hat OpenShift Container Platform. To view and edit queries, log in as the admin user.

2. Change to the service-telemetry namespace:

oc project service-telemetry

3. To retrieve the default username and password, describe the Grafana object using oc describe:

oc describe grafana service-telemetry-grafana
4.4.2. The Grafana infrastructure dashboard

The infrastructure dashboard shows metrics for a single node at a time. Select a node from the upper left corner of the dashboard.

4.4.2.1. Top panels

Title | Unit | Description
Current Global Alerts | - | Current alerts fired by Prometheus
Recent Global Alerts | - | Recently fired alerts in 5m time steps
Status Panel | - | Node status: up, down, unavailable
Uptime | s/m/h/d/M/Y | Total operational time of node
CPU Cores | cores | Total number of cores
Memory | bytes | Total memory
Disk Size | bytes | Total storage size
Processes | processes | Total number of processes listed by type
Load Average | processes | Load average represents the average number of running and uninterruptible processes residing in the kernel execution queue
4.4.2.2. Networking panels

Panels that display the network interfaces of the node.

Panel | Unit | Description

Physical Interfaces Ingress Errors | errors | Total errors with incoming data
Physical Interfaces Egress Errors | errors | Total errors with outgoing data
Physical Interfaces Ingress Error Rates | errors/s | Rate of incoming data errors
Physical Interfaces Egress Error Rates | errors/s | Rate of outgoing data errors
Physical Interfaces Packets Ingress | pps | Incoming packets per second
Physical Interfaces Packets Egress | pps | Outgoing packets per second
Physical Interfaces Data Ingress | bytes/s | Incoming data rates
Physical Interfaces Data Egress | bytes/s | Outgoing data rates
Physical Interfaces Drop Rate Ingress | pps | Incoming packets drop rate
Physical Interfaces Drop Rate Egress | pps | Outgoing packets drop rate
4.4.2.3. CPU panels

Panels that display CPU usage of the node.

Panel | Unit | Description

Current CPU Usage | percent | Instantaneous usage at the time of the last query
Aggregate CPU Usage | percent | Average non-idle CPU activity of all cores on a node
Aggr CPU Usage by Type | percent | Shows time spent for each type of thread averaged across all cores

4.4.2.4. Memory panels

Panels that display memory usage on the node.

Panel | Unit | Description

Memory Used | percent | Amount of memory being used at time of last query
Huge Pages Used | hugepages | Number of hugepages being used
Memory
4.4.2.5. Disk/file system

Panels that display space used on disk.

Panel | Unit | Description | Notes

Disk Space Usage | percent | Total disk use at time of last query
Inode Usage | percent | Total inode use at time of last query
Aggregate Disk Space Usage | bytes | Total disk space used and reserved | Because this query relies on the df plugin, temporary file systems that do not necessarily use disk space are included in the results. The query tries to filter out most of these, but it might not be exhaustive.
Disk Traffic | bytes/s | Shows rates for both reading and writing
Disk Load | percent | Approximate percentage of total disk bandwidth being used | The weighted I/O time series includes the backlog that might be accumulating. For more information, see the collectd disk plugin docs.
Operations/s | ops/s | Operations done per second
Average I/O Operation Time | seconds | Average time each I/O operation took to complete | This average is not accurate, see the collectd disk plugin docs.
4.5. CONFIGURING MULTIPLE CLOUDS

You can configure multiple Red Hat OpenStack Platform clouds to target a single instance of Service Telemetry Framework (STF):

1. Plan the AMQP address prefixes that you want to use for each cloud. For more information, see Section 4.5.1, "Planning AMQP address prefixes".

2. Deploy metrics and events consumer Smart Gateways for each cloud to listen on the corresponding address prefixes. For more information, see Section 4.5.2, "Deploying Smart Gateways".

3. Configure each cloud to send its metrics and events to STF on the correct address. For more information, see Section 4.5.3, "Creating the OpenStack environment file".
Figure 4.1. Two Red Hat OpenStack Platform clouds connect to STF

4.5.1. Planning AMQP address prefixes

By default, Red Hat OpenStack Platform nodes get data through two data collectors: collectd and Ceilometer. These components send telemetry data or notifications to the respective AMQP addresses, for example, collectd/telemetry, where STF Smart Gateways listen on those addresses for monitoring data.

To support multiple clouds, and to identify which cloud generated the monitoring data, configure each cloud to send data to a unique address. Prefix a cloud identifier to the second part of the address. The following list shows some example addresses and identifiers:

collectd/cloud1-telemetry

collectd/cloud1-notify

anycast/ceilometer/cloud1-event.sample

collectd/cloud2-telemetry

collectd/cloud2-notify
anycast/ceilometer/cloud2-event.sample

collectd/us-east-1-telemetry

collectd/us-west-3-telemetry

4.5.2. Deploying Smart Gateways

You must deploy a Smart Gateway for each of the data collection types for each cloud: one for collectd metrics, one for collectd events, and one for Ceilometer events. Configure each of the Smart Gateways to listen on the AMQP address that you define for the corresponding cloud.

When you deploy STF for the first time, Smart Gateway manifests are created that define the initial Smart Gateways for a single cloud. When deploying Smart Gateways for multiple cloud support, you deploy multiple Smart Gateways for each of the data collection types that handle the metrics and the events data for each cloud. The initial Smart Gateways act as a template to create additional Smart Gateways, along with any authentication information required to connect to the data stores.
Procedure
1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:
oc project service-telemetry
3. Use the initially deployed Smart Gateways as a template for additional Smart Gateways. List the currently deployed Smart Gateways with the oc get smartgateways command. For example, if you deployed STF with metricsEnabled: true and eventsEnabled: true, the following Smart Gateways are displayed in the output:
$ oc get smartgateways
NAME                                  AGE
stf-default-ceilometer-notification   14d
stf-default-collectd-notification     14d
stf-default-collectd-telemetry        14d
4. Retrieve the manifests for each Smart Gateway and store the contents in a temporary file, which you can modify later and use to create the new set of Smart Gateways:

truncate --size 0 /tmp/cloud1-smartgateways.yaml && \
for sg in $(oc get smartgateways -oname)
do
  echo "---" >> /tmp/cloud1-smartgateways.yaml
  oc get $sg -oyaml --export >> /tmp/cloud1-smartgateways.yaml
done

5. Modify the Smart Gateway manifests in the /tmp/cloud1-smartgateways.yaml file. Adjust the metadata.name and spec.amqpUrl fields to include the cloud identifier from your schema. For more information, see Section 4.5.1, "Planning AMQP address prefixes". To view example Smart Gateway manifests, see Section 4.5.2.1, "Example manifests".

6. Deploy your new Smart Gateways:
oc apply -f /tmp/cloud1-smartgateways.yaml

7. Verify that each Smart Gateway is running. This can take several minutes, depending on the number of Smart Gateways:
oc get po -l app=smart-gateway
4.5.2.1. Example manifests
IMPORTANT
The content in the following examples might be different to the file content in your deployment. Copy the manifests in your deployment.

Ensure that the name and amqpUrl parameters of each Smart Gateway match the names that you want to use for your clouds. For more information, see Section 4.5.1, "Planning AMQP address prefixes".
NOTE
Your output may have some additional metadata parameters that you can remove from the manifests that you load into OCP.
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-ceilometer-notification-cloud1 1
spec:
  amqpDataSource: ceilometer
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/anycast/ceilometer/cloud1-event.sample 2
  debug: false
  elasticPass: fkzfhghw
  elasticUrl: https://elasticsearch-es-http.service-telemetry.svc.cluster.local:9200
  elasticUser: elastic
  resetIndex: false
  serviceType: events
  size: 1
  tlsCaCert: /config/certs/ca.crt
  tlsClientCert: /config/certs/tls.crt
  tlsClientKey: /config/certs/tls.key
  tlsServerName: elasticsearch-es-http.service-telemetry.svc.cluster.local
  useBasicAuth: true
  useTls: true
---
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-collectd-notification-cloud1 3
spec:
  amqpDataSource: collectd
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/cloud1-notify 4
  debug: false
  elasticPass: fkzfhghw
  elasticUrl: https://elasticsearch-es-http.service-telemetry.svc.cluster.local:9200
  elasticUser: elastic
  resetIndex: false
  serviceType: events
  size: 1
  tlsCaCert: /config/certs/ca.crt
  tlsClientCert: /config/certs/tls.crt
  tlsClientKey: /config/certs/tls.key
  tlsServerName: elasticsearch-es-http.service-telemetry.svc.cluster.local
  useBasicAuth: true
  useTls: true
---
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-collectd-telemetry-cloud1 5
spec:
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/cloud1-telemetry 6
  debug: false
  prefetch: 15000
  serviceType: metrics
  size: 1
  useTimestamp: true
1. Name for Ceilometer notifications for cloud1.

2. AMQP address for Ceilometer notifications for cloud1.

3. Name for collectd notifications for cloud1.

4. AMQP address for collectd notifications for cloud1.

5. Name for collectd telemetry for cloud1.

6. AMQP address for collectd telemetry for cloud1.
4.5.3. Creating the OpenStack environment file

To label traffic according to the cloud of origin, you must create a configuration with cloud-specific instance names. Create an stf-connectors.yaml file and adjust the values of CeilometerQdrEventsConfig and CollectdAmqpInstances to match the AMQP address prefix scheme. For more information, see Section 4.5.1, "Planning AMQP address prefixes".
WARNING
Remove the enable-stf.yaml and ceilometer-write-qdr.yaml environment file references from your overcloud deployment. This configuration is redundant and results in duplicate information being sent from each cloud node.
Procedure
1. Create the stf-connectors.yaml file and modify it to match the AMQP address that you want for this cloud deployment:
resource_registry:
  OS::TripleO::Services::Collectd: /usr/share/openstack-tripleo-heat-templates/deployment/metrics/collectd-container-puppet.yaml
  OS::TripleO::Services::MetricsQdr: /usr/share/openstack-tripleo-heat-templates/deployment/metrics/qdr-container-puppet.yaml
  OS::TripleO::Services::CeilometerAgentCentral: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-central-container-puppet.yaml
  OS::TripleO::Services::CeilometerAgentNotification: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-notification-container-puppet.yaml
  OS::TripleO::Services::CeilometerAgentIpmi: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-ipmi-container-puppet.yaml
  OS::TripleO::Services::ComputeCeilometerAgent: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-compute-container-puppet.yaml
  OS::TripleO::Services::Redis: /usr/share/openstack-tripleo-heat-templates/deployment/database/redis-pacemaker-puppet.yaml

parameter_defaults:
  EnableSTF: true

  EventPipelinePublishers: []
  CeilometerEnablePanko: false
  CeilometerQdrPublishEvents: true
  CeilometerQdrEventsConfig:
    driver: amqp
    topic: cloud1-event 1

  CollectdConnectionType: amqp1
  CollectdAmqpInterval: 5
  CollectdDefaultPollingInterval: 5

  CollectdAmqpInstances:
    cloud1-notify: 2
      notify: true
      format: JSON
      presettle: false
    cloud1-telemetry: 3
      format: JSON
      presettle: true

  MetricsQdrAddresses:
    - prefix: collectd
      distribution: multicast
    - prefix: anycast/ceilometer
      distribution: multicast

  MetricsQdrSSLProfiles:
    - name: sslProfile

  MetricsQdrConnectors:
    - host: stf-default-interconnect-5671-service-telemetry.apps.infra.watch 4
      port: 443
      role: edge
      verifyHostname: false
      sslProfile: sslProfile
1. Define the topic for Ceilometer events. This value is the address format of anycast/ceilometer/cloud1-event.sample.

2. Define the topic for collectd events. This value is the format of collectd/cloud1-notify.

3. Define the topic for collectd metrics. This value is the format of collectd/cloud1-telemetry.

4. Adjust the MetricsQdrConnectors host to the address of the STF route.

2. Ensure that the naming convention in the stf-connectors.yaml file aligns with the spec.amqpUrl field in the Smart Gateway configuration. For example, configure the CeilometerQdrEventsConfig.topic field to a value of cloud1-event.

3. Save the file in a directory for custom environment files, for example, /home/stack/custom_templates/.

4. Source the authentication file:

[stack@undercloud-0 ~]$ source stackrc

(undercloud) [stack@undercloud-0 ~]$

5. Include the stf-connectors.yaml file in the overcloud deployment command, along with any other environment files relevant to your environment:

(undercloud) [stack@undercloud-0 ~]$ openstack overcloud deploy --templates /usr/share/openstack-tripleo-heat-templates \
  -e /home/stack/custom_templates/stf-connectors.yaml
Additional resources

For information about validating the deployment, see Section 3.3.3, "Validating client-side installation".

4.5.4. Querying metrics data from multiple clouds

Data stored in Prometheus has a service label attached according to the Smart Gateway it was scraped from. You can use this label to query data from a specific cloud.

To query data from a specific cloud, use a Prometheus promql query that matches the associated service label, for example: collectd_uptime{service="stf-default-collectd-telemetry-cloud1-smartgateway"}.
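For example, a minimal sketch of running this query from inside the cluster, reusing the curl pod pattern from Section 4.2.1 against the prometheus-operated service; the cloud1 service label follows the example above and is an assumption to adjust for your deployment:

$ oc run curl --generator=run-pod/v1 --image=radial/busyboxplus:curl -i --tty
[ root@curl:/ ]$ curl 'prometheus-operated:9090/api/v1/query?query=collectd_uptime%7Bservice%3D%22stf-default-collectd-telemetry-cloud1-smartgateway%22%7D'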
4.6. EPHEMERAL STORAGE

Use ephemeral storage to run Service Telemetry Framework (STF) without persistently storing data in your Red Hat OpenShift Container Platform (OCP) cluster. Ephemeral storage is not recommended in a production environment due to the volatility of the data in the platform when operating correctly and as designed. For example, restarting a pod or rescheduling the workload to another node results in the loss of any local data written since the pod started.

If you enable ephemeral storage in STF, the Service Telemetry Operator does not add the relevant storage sections to the data storage components' manifests.
4.6.1. Configuring ephemeral storage

To configure STF for ephemeral storage, add storageEphemeralEnabled: true to the ServiceTelemetry object in OCP. You can add storageEphemeralEnabled: true at installation time or, if you already deployed STF, complete the following steps:
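For example, a minimal sketch of a ServiceTelemetry object that enables ephemeral storage at installation time, assuming the stf-default name and core options used elsewhere in this guide:

apiVersion: infra.watch/v1alpha1
kind: ServiceTelemetry
metadata:
  name: stf-default
  namespace: service-telemetry
spec:
  metricsEnabled: true
  storageEphemeralEnabled: true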
Procedure
1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

oc project service-telemetry

3. Edit the ServiceTelemetry object:

$ oc edit ServiceTelemetry stf-default

4. Add the storageEphemeralEnabled: true parameter to the spec section:

spec:
  eventsEnabled: true
  metricsEnabled: true
  storageEphemeralEnabled: true

5. Save your changes and close the object.
APPENDIX A. COLLECTD PLUG-INS

This section contains a complete list of collectd plug-ins and configurations for Red Hat OpenStack Platform 16.0.
collectd-aggregation

collectd::plugin::aggregation::aggregators

collectd::plugin::aggregation::interval

collectd-amqp1

collectd-apache

collectd::plugin::apache::instances (ex: localhost => url => "http://localhost/mod_status?auto")

collectd::plugin::apache::interval

collectd-apcups

collectd-battery

collectd::plugin::battery::values_percentage

collectd::plugin::battery::report_degraded

collectd::plugin::battery::query_state_fs

collectd::plugin::battery::interval

collectd-ceph

collectd::plugin::ceph::daemons

collectd::plugin::ceph::longrunavglatency

collectd::plugin::ceph::convertspecialmetrictypes

collectd-cgroups

collectd::plugin::cgroups::ignore_selected

collectd::plugin::cgroups::interval

collectd-conntrack

None

collectd-contextswitch

collectd::plugin::contextswitch::interval

collectd-cpu

collectd::plugin::cpu::reportbystate

collectd::plugin::cpu::reportbycpu

collectd::plugin::cpu::valuespercentage

collectd::plugin::cpu::reportnumcpu

collectd::plugin::cpu::reportgueststate

collectd::plugin::cpu::subtractgueststate

collectd::plugin::cpu::interval

collectd-cpufreq

None

collectd-cpusleep

collectd-csv

collectd::plugin::csv::datadir

collectd::plugin::csv::storerates

collectd::plugin::csv::interval

collectd-df

collectd::plugin::df::devices

collectd::plugin::df::fstypes

collectd::plugin::df::ignoreselected

collectd::plugin::df::mountpoints

collectd::plugin::df::reportbydevice

collectd::plugin::df::reportinodes

collectd::plugin::df::reportreserved

collectd::plugin::df::valuesabsolute

collectd::plugin::df::valuespercentage

collectd::plugin::df::interval

collectd-disk

collectd::plugin::disk::disks

collectd::plugin::disk::ignoreselected

collectd::plugin::disk::udevnameattr

collectd::plugin::disk::interval

collectd-entropy

collectd::plugin::entropy::interval

collectd-ethstat

collectd::plugin::ethstat::interfaces

collectd::plugin::ethstat::maps

collectd::plugin::ethstat::mappedonly

collectd::plugin::ethstat::interval

collectd-exec

collectd::plugin::exec::commands

collectd::plugin::exec::commands_defaults

collectd::plugin::exec::globals

collectd::plugin::exec::interval

collectd-fhcount

collectd::plugin::fhcount::valuesabsolute

collectd::plugin::fhcount::valuespercentage

collectd::plugin::fhcount::interval

collectd-filecount

collectd::plugin::filecount::directories

collectd::plugin::filecount::interval

collectd-fscache

None

collectd-hddtemp

collectd::plugin::hddtemp::host

collectd::plugin::hddtemp::port

collectd::plugin::hddtemp::interval

collectd-hugepages

collectd::plugin::hugepages::report_per_node_hp

collectd::plugin::hugepages::report_root_hp

collectd::plugin::hugepages::values_pages

collectd::plugin::hugepages::values_bytes

collectd::plugin::hugepages::values_percentage

collectd::plugin::hugepages::interval

collectd-intel_rdt

collectd-interface

collectd::plugin::interface::interfaces

collectd::plugin::interface::ignoreselected

collectd::plugin::interface::reportinactive

collectd::plugin::interface::interval

collectd-ipc

None

collectd-ipmi

collectd::plugin::ipmi::ignore_selected

collectd::plugin::ipmi::notify_sensor_add

collectd::plugin::ipmi::notify_sensor_remove

collectd::plugin::ipmi::notify_sensor_not_present

collectd::plugin::ipmi::sensors

collectd::plugin::ipmi::interval

collectd-irq

collectd::plugin::irq::irqs

collectd::plugin::irq::ignoreselected

collectd::plugin::irq::interval

collectd-load

collectd::plugin::load::report_relative

collectd::plugin::load::interval

collectd-logfile

collectd::plugin::logfile::log_level

collectd::plugin::logfile::log_file

collectd::plugin::logfile::log_timestamp

collectd::plugin::logfile::print_severity

collectd::plugin::logfile::interval

collectd-madwifi

collectd-mbmon

collectd-md

collectd-memcached

collectd::plugin::memcached::instances

collectd::plugin::memcached::interval

collectd-memory

collectd::plugin::memory::valuesabsolute

collectd::plugin::memory::valuespercentage

collectd::plugin::memory::interval

collectd-multimeter

collectd-mysql

collectd::plugin::mysql::interval

collectd-netlink

collectd::plugin::netlink::interfaces

collectd::plugin::netlink::verboseinterfaces

collectd::plugin::netlink::qdiscs

collectd::plugin::netlink::classes

collectd::plugin::netlink::filters

collectd::plugin::netlink::ignoreselected

collectd::plugin::netlink::interval

collectd-network

collectd::plugin::network::timetolive

collectd::plugin::network::maxpacketsize

collectd::plugin::network::forward

collectd::plugin::network::reportstats

collectd::plugin::network::listeners

collectd::plugin::network::servers

collectd::plugin::network::interval

collectd-nfs

collectd::plugin::nfs::interval

collectd-ntpd

collectd::plugin::ntpd::host

collectd::plugin::ntpd::port

collectd::plugin::ntpd::reverselookups

collectd::plugin::ntpd::includeunitid

collectd::plugin::ntpd::interval

collectd-numa

None

collectd-olsrd

collectd-openvpn

collectd::plugin::openvpn::statusfile

collectd::plugin::openvpn::improvednamingschema

collectd::plugin::openvpn::collectcompression

collectd::plugin::openvpn::collectindividualusers

collectd::plugin::openvpn::collectusercount

collectd::plugin::openvpn::interval

collectd-ovs_events

collectd::plugin::ovs_events::address

collectd::plugin::ovs_events::dispatch

collectd::plugin::ovs_events::interfaces

collectd::plugin::ovs_events::send_notification

collectd::plugin::ovs_events::$port

collectd::plugin::ovs_events::socket

collectd-ovs_stats

collectd::plugin::ovs_stats::address

collectd::plugin::ovs_stats::bridges

collectd::plugin::ovs_stats::port

collectd::plugin::ovs_stats::socket

collectd-ping

collectd::plugin::ping::hosts

collectd::plugin::ping::timeout

collectd::plugin::ping::ttl

collectd::plugin::ping::source_address

collectd::plugin::ping::device

collectd::plugin::ping::max_missed

collectd::plugin::ping::size

collectd::plugin::ping::interval

collectd-powerdns

collectd::plugin::powerdns::interval

collectd::plugin::powerdns::servers

collectd::plugin::powerdns::recursors

collectd::plugin::powerdns::local_socket

collectd::plugin::powerdns::interval

collectd-processes

collectd::plugin::processes::processes

collectd::plugin::processes::process_matches

collectd::plugin::processes::collect_context_switch

collectd::plugin::processes::collect_file_descriptor

collectd::plugin::processes::collect_memory_maps

collectd::plugin::powerdns::interval

collectd-protocols

collectd::plugin::protocols::ignoreselected

collectd::plugin::protocols::values

collectd-python

collectd-serial

collectd-smart

collectd::plugin::smart::disks

collectd::plugin::smart::ignoreselected

collectd::plugin::smart::interval

collectd-snmp_agent

collectd-statsd

collectd::plugin::statsd::host

collectd::plugin::statsd::port

collectd::plugin::statsd::deletecounters

collectd::plugin::statsd::deletetimers

collectd::plugin::statsd::deletegauges

collectd::plugin::statsd::deletesets

collectd::plugin::statsd::countersum

collectd::plugin::statsd::timerpercentile

collectd::plugin::statsd::timerlower

collectd::plugin::statsd::timerupper

collectd::plugin::statsd::timersum

collectd::plugin::statsd::timercount

collectd::plugin::statsd::interval

collectd-swap

collectd::plugin::swap::reportbydevice

collectd::plugin::swap::reportbytes

collectd::plugin::swap::valuesabsolute

collectd::plugin::swap::valuespercentage

collectd::plugin::swap::reportio

collectd::plugin::swap::interval

collectd-syslog

collectd::plugin::syslog::log_level

collectd::plugin::syslog::notify_level

collectd::plugin::syslog::interval

collectd-table

collectd::plugin::table::tables

collectd::plugin::table::interval

collectd-tail

collectd::plugin::tail::files

collectd::plugin::tail::interval

collectd-tail_csv

collectd::plugin::tail_csv::metrics

collectd::plugin::tail_csv::files

collectd-tcpconns

collectd::plugin::tcpconns::localports

collectd::plugin::tcpconns::remoteports

collectd::plugin::tcpconns::listening

collectd::plugin::tcpconns::allportssummary

collectd::plugin::tcpconns::interval

collectd-ted

collectd-thermal

collectd::plugin::thermal::devices

collectd::plugin::thermal::ignoreselected

collectd::plugin::thermal::interval

collectd-threshold

collectd::plugin::threshold::types

collectd::plugin::threshold::plugins

collectd::plugin::threshold::hosts

collectd::plugin::threshold::interval

collectd-turbostat

collectd::plugin::turbostat::core_c_states

collectd::plugin::turbostat::package_c_states

collectd::plugin::turbostat::system_management_interrupt

collectd::plugin::turbostat::digital_temperature_sensor

collectd::plugin::turbostat::tcc_activation_temp

collectd::plugin::turbostat::running_average_power_limit

collectd::plugin::turbostat::logical_core_names

collectd-unixsock

collectd-uptime

collectd::plugin::uptime::interval

collectd-users

collectd::plugin::users::interval

collectd-uuid

collectd::plugin::uuid::uuid_file

collectd::plugin::uuid::interval

collectd-virt

collectd::plugin::virt::connection

collectd::plugin::virt::refresh_interval

collectd::plugin::virt::domain

collectd::plugin::virt::block_device

collectd::plugin::virt::interface_device

collectd::plugin::virt::ignore_selected

collectd::plugin::virt::hostname_format

collectd::plugin::virt::interface_format

collectd::plugin::virt::extra_stats

collectd::plugin::virt::interval

collectd-vmem

collectd::plugin::vmem::verbose

collectd::plugin::vmem::interval

collectd-vserver

collectd-wireless

collectd-write_graphite

collectd::plugin::write_graphite::carbons

collectd::plugin::write_graphite::carbon_defaults

collectd::plugin::write_graphite::globals

collectd-write_kafka

collectd::plugin::write_kafka::kafka_host

collectd::plugin::write_kafka::kafka_port

collectd::plugin::write_kafka::kafka_hosts

collectd::plugin::write_kafka::topics

collectd-write_log

collectd::plugin::write_log::format

collectd-zfs_arc

None
2. Add the following configuration example to the ExtraConfig section for each of the leaf roles. In this example, internal_api_subnet is the value defined in the name_lower parameter of your network definition (with _subnet appended to the name for Leaf 0), and is the network to which the ComputeLeaf0 leaf role is connected. In this case, the network identification of 0 corresponds to the Compute role for leaf 0, and represents a value that is different from the default internal API network name.
For the ComputeLeaf0 leaf role, specify extra configuration to perform a hiera lookup to determine which network interface for a particular network to assign to the collectd AMQP host parameter. Perform the same configuration for the AMQ Interconnect listener address parameter.
Additional leaf roles typically replace _subnet with _leafN, where N represents a unique identifier for the leaf.
This example configuration is on a CephStorage leaf role:
3.3. CONFIGURING RED HAT OPENSTACK PLATFORM OVERCLOUD FOR SERVICE TELEMETRY FRAMEWORK
To configure the Red Hat OpenStack Platform overcloud, you must configure the data collection applications and the data transport to STF, and deploy the overcloud.
To configure the Red Hat OpenStack Platform overcloud, complete the following tasks:
1. Section 3.3.1, "Retrieving the AMQ Interconnect route address"
2. Section 3.3.2, "Configuring the STF connection for the overcloud"
3. Section 3.3.3, "Validating client-side installation"
3.3.1. Retrieving the AMQ Interconnect route address
When you configure the Red Hat OpenStack Platform overcloud for STF, you must provide the AMQ Interconnect route address in the STF connection file.
Procedure
1. Log in to your Red Hat OpenShift Container Platform (OCP) environment.
2. In the service-telemetry project, retrieve the AMQ Interconnect route address:
ComputeLeaf0ExtraConfig:
  tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('internal_api_subnet')}"
  tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('internal_api_subnet')}"

ComputeLeaf1ExtraConfig:
  tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('internal_api_leaf1')}"
  tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('internal_api_leaf1')}"

CephStorageLeaf0ExtraConfig:
  tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('storage_subnet')}"
  tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('storage_subnet')}"
NOTE
If your STF installation differs from the documentation, ensure that you retrieve the correct AMQ Interconnect route address.
3.3.2. Configuring the STF connection for the overcloud
To configure the STF connection, you must create a file that contains the connection configuration of the AMQ Interconnect for the overcloud to the STF deployment. Enable the collection of events and storage of the events in STF, and deploy the overcloud.
Procedure
1. Log in to the Red Hat OpenStack Platform undercloud as the stack user.
2. Create a configuration file called stf-connectors.yaml in the /home/stack directory.
IMPORTANT
The Service Telemetry Operator simplifies the deployment of all data ingestion and data storage components for single cloud deployments. To share the data storage domain with multiple clouds, see Section 4.5, "Configuring multiple clouds".
3. In the stf-connectors.yaml file, configure the MetricsQdrConnectors address to connect the AMQ Interconnect on the overcloud to the STF deployment.
Add the CeilometerQdrPublishEvents: true parameter to enable collection and transport of Ceilometer events to STF.
Replace the host parameter with the value of HOST/PORT that you retrieved in Section 3.3.1, "Retrieving the AMQ Interconnect route address".
4. Add the following files to your Red Hat OpenStack Platform director deployment to set up collectd and AMQ Interconnect:
the stf-connectors.yaml environment file
the enable-stf.yaml file that ensures that the environment is being used during the overcloud deployment
$ oc get routes -ogo-template='{{ range .items }}{{printf "%s\n" .spec.host }}{{ end }}' | grep "\-5671"
stf-default-interconnect-5671-service-telemetry.apps.infra.watch
parameter_defaults:
    EventPipelinePublishers: []
    CeilometerQdrPublishEvents: true
    MetricsQdrConnectors:
        - host: stf-default-interconnect-5671-service-telemetry.apps.infra.watch
          port: 443
          role: edge
          sslProfile: sslProfile
          verifyHostname: false
the ceilometer-write-qdr.yaml file that ensures that Ceilometer telemetry is sent to STF
5. Deploy the Red Hat OpenStack Platform overcloud.
3.3.3. Validating client-side installation
To validate data collection from the STF storage domain, query the data sources for delivered data. To validate individual nodes in the Red Hat OpenStack Platform deployment, connect to the console using SSH.
Procedure
1. Log in to an overcloud node, for example, controller-0.
2. Ensure that the metrics_qdr container is running on the node:
3. Return the internal network address on which AMQ Interconnect is running, for example, 172.17.1.44 listening on port 5666:
4. Return a list of connections to the local AMQ Interconnect:
openstack overcloud deploy <other arguments>
  --templates /usr/share/openstack-tripleo-heat-templates \
  --environment-file <other-environment-files> \
  --environment-file /usr/share/openstack-tripleo-heat-templates/environments/metrics/ceilometer-write-qdr.yaml \
  --environment-file /usr/share/openstack-tripleo-heat-templates/environments/enable-stf.yaml \
  --environment-file /home/stack/stf-connectors.yaml
$ sudo podman container inspect --format '{{.State.Status}}' metrics_qdr

running

$ sudo podman exec -it metrics_qdr cat /etc/qpid-dispatch/qdrouterd.conf

listener {
    host: 172.17.1.44
    port: 5666
    authenticatePeer: no
    saslMechanisms: ANONYMOUS
}
$ sudo podman exec -it metrics_qdr qdstat --bus=172.17.1.44:5666 --connections

Connections
  id   host                                                                  container                                                                                                  role    dir  security                            authentication  tenant
  ============================================================================================================================================================================================================================================================
  1    stf-default-interconnect-5671-service-telemetry.apps.infra.watch:443  stf-default-interconnect-7458fd4d69-bgzfb                                                                  edge    out  TLSv1.2(DHE-RSA-AES256-GCM-SHA384)  anonymous-user
  12   172.17.1.44:60290                                                     openstack.org/om/container/controller-0/ceilometer-agent-notification/25/5c02cee550f143ec9ea030db5cccba14  normal  in   no-security                        no-auth
  16   172.17.1.44:36408                                                     metrics                                                                                                    normal  in   no-security                        anonymous-user
  899  172.17.1.44:39500                                                     10a2e99d-1b8a-4329-b48c-4335e5f75c84                                                                       normal  in   no-security                        no-auth
There are four connections:
Outbound connection to STF
Inbound connection from collectd
Inbound connection from ceilometer
Inbound connection from our qdstat client
The outbound STF connection is provided to the MetricsQdrConnectors host parameter and is the route for the STF storage domain. The other hosts are internal network addresses of the client connections to this AMQ Interconnect.
5. To ensure that messages are being delivered, list the links, and view the _edge address in the deliv column for delivery of messages:
$ sudo podman exec -it metrics_qdr qdstat --bus=172.17.1.44:5666 --links

Router Links
  type      dir  conn id  id    peer  class   addr                  phs  cap  pri  undel  unsett  deliv    presett  psdrop  acc  rej  rel  mod  delay  rate
  =========================================================================================================================================================
  endpoint  out  1        5           local   _edge                      250  0    0      0       2979926  2979924  0       0    0    2    0    0      0
  endpoint  in   1        6                                              250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  in   1        7                                              250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  out  1        8                                              250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  in   1        9                                              250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  out  1        10                                             250  0    0      0       911      911      0       0    0    0    0    911    0
  endpoint  in   1        11                                             250  0    0      0       0        911      0       0    0    0    0    0      0
  endpoint  out  12       32          local   temp.lSY6Mcicol4J2Kp       250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  in   16       41                                             250  0    0      0       2979924  2979924  0       0    0    0    0    0      0
  endpoint  in   912      1834        mobile  $management           0    250  0    0      0       1        0        0       1    0    0    0    0      0
  endpoint  out  912      1835        local   temp.9Ok2resI9tmt+CT       250  0    0      0       0        0        0       0    0    0    0    0      0
6. To list the addresses from Red Hat OpenStack Platform nodes to STF, connect to OCP to get the AMQ Interconnect pod name and list the connections. List the available AMQ Interconnect pods:
7. Connect to the pod and run the qdstat --connections command to list the known connections:
In this example, there are three edge connections from the Red Hat OpenStack Platform nodes, with connection id 22, 23, and 24.
8. To view the number of messages delivered by the network, use each address with the oc exec command:
$ oc get pods -l application=stf-default-interconnect

NAME                                        READY   STATUS    RESTARTS   AGE
stf-default-interconnect-7458fd4d69-bgzfb   1/1     Running   0          6d21h

$ oc exec -it stf-default-interconnect-7458fd4d69-bgzfb -- qdstat --connections

2020-04-21 18:25:47.243852 UTC
stf-default-interconnect-7458fd4d69-bgzfb

Connections
  id  host                container                                                              role    dir  security                                authentication  tenant  last dlv      uptime
  =================================================================================================================================================================================================
  1   10.129.2.21:43062   rcv[stf-default-collectd-telemetry-smartgateway-79c967c8f7-kq4qv]      normal  in   no-security                             anonymous-user          000:00:00:00  006:21:50:25
  2   10.130.0.52:55754   rcv[stf-default-ceilometer-notification-smartgateway-6675df547mbjk5]   normal  in   no-security                             anonymous-user          000:21:25:57  006:21:49:36
  3   10.130.0.51:43110   rcv[stf-default-collectd-notification-smartgateway-698c87fbb7-f28v6]   normal  in   no-security                             anonymous-user          000:21:36:53  006:21:49:09
  22  10.128.0.1:51948    Router.ceph-0.redhat.local                                             edge    in   TLSv1/SSLv3(DHE-RSA-AES256-GCM-SHA384)  anonymous-user          000:00:00:03  000:22:08:43
  23  10.128.0.1:51950    Router.compute-0.redhat.local                                          edge    in   TLSv1/SSLv3(DHE-RSA-AES256-GCM-SHA384)  anonymous-user          000:00:00:03  000:22:08:43
  24  10.128.0.1:52082    Router.controller-0.redhat.local                                       edge    in   TLSv1/SSLv3(DHE-RSA-AES256-GCM-SHA384)  anonymous-user          000:00:00:00  000:22:08:34
  27  127.0.0.1:42202     c2f541c1-4c97-4b37-a189-a396c08fb079                                   normal  in   no-security                             no-auth                 000:00:00:00  000:00:00:00
$ oc exec -it stf-default-interconnect-7458fd4d69-bgzfb -- qdstat --address

2020-04-21 18:20:10.293258 UTC
stf-default-interconnect-7458fd4d69-bgzfb

Router Addresses
  class   addr                             phs  distrib    pri  local  remote  in       out      thru  fallback
  ===============================================================================================================
  mobile  anycast/ceilometer/event.sample  0    balanced   -    1      0       1553     1553     0     0
  mobile  collectd/notify                  0    multicast  -    1      0       10       10       0     0
  mobile  collectd/telemetry               0    multicast  -    1      0       7798049  7798049  0     0
CHAPTER 4. ADVANCED FEATURES
The following optional features can provide additional functionality to the Service Telemetry Framework (STF):
Customizing the deployment. For more information, see Section 4.1, "Customizing the deployment".
Alerts. For more information, see Section 4.2, "Alerts".
High availability. For more information, see Section 4.3, "High availability".
Dashboards. For more information, see Section 4.4, "Dashboards".
Multiple clouds. For more information, see Section 4.5, "Configuring multiple clouds".
Ephemeral storage. For more information, see Section 4.6, "Ephemeral storage".
4.1. CUSTOMIZING THE DEPLOYMENT
The Service Telemetry Operator watches for a ServiceTelemetry manifest to load into Red Hat OpenShift Container Platform (OCP). The Operator then creates other objects in memory, which results in the dependent Operators creating the workloads they are responsible for managing.
WARNING
When you override the manifest, you must provide the entire manifest contents, including object names or namespaces. There is no dynamic parameter substitution when you override a manifest.
To override a manifest successfully with Service Telemetry Framework (STF), deploy a default environment using the core options only. For more information about the core options, see Section 2.3.10, "Creating a ServiceTelemetry object in OCP". When you deploy STF, use the oc get command to retrieve the default deployed manifest. When you use a manifest that was originally generated by the Service Telemetry Operator, the manifest is compatible with the other objects that are managed by the Operators.
For example, when the metricsEnabled: true parameter is configured in the ServiceTelemetry manifest, the Service Telemetry Operator requests components for metrics retrieval and storage using the default manifests. In some cases, you might want to override the default manifest. For more information, see Section 4.1.1, "Manifest override parameters".
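For example, to start from the manifest that the Service Telemetry Operator generated, you can retrieve the deployed ServiceTelemetry object in YAML form. This is a minimal sketch; the object name stf-default assumes the default deployment name used throughout this guide:

$ oc get servicetelemetry stf-default -oyaml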
4.1.1. Manifest override parameters
This table describes the available parameters that you can use to override a manifest, along with the corresponding retrieval commands.
Table 4.1. Manifest override parameters
Override parameter | Description | Retrieval command

alertmanagerManifest | Override the Alertmanager object creation. The Prometheus Operator watches for Alertmanager objects. | oc get alertmanager stf-default -oyaml

alertmanagerConfigManifest | Override the Secret that contains the Alertmanager configuration. The Prometheus Operator uses a secret named alertmanager-<alertmanager-name>, for example, stf-default, to provide the alertmanager.yaml configuration to Alertmanager. | oc get secret alertmanager-stf-default -oyaml

elasticsearchManifest | Override the ElasticSearch object creation. The Elastic Cloud on Kubernetes Operator watches for ElasticSearch objects. | oc get elasticsearch elasticsearch -oyaml

interconnectManifest | Override the Interconnect object creation. The AMQ Interconnect Operator watches for Interconnect objects. | oc get interconnect stf-default-interconnect -oyaml

prometheusManifest | Override the Prometheus object creation. The Prometheus Operator watches for Prometheus objects. | oc get prometheus stf-default -oyaml

servicemonitorManifest | Override the ServiceMonitor object creation. The Prometheus Operator watches for ServiceMonitor objects. | oc get servicemonitor stf-default -oyaml

smartgatewayCollectdMetricsManifest | Override the SmartGateway object creation for collectd metrics. The Smart Gateway Operator watches for SmartGateway objects. | oc get smartgateway stf-default-collectd-telemetry -oyaml

smartgatewayCollectdEventsManifest | Override the SmartGateway object creation for collectd events. The Smart Gateway Operator watches for SmartGateway objects. | oc get smartgateway stf-default-collectd-notification -oyaml
smartgatewayCeilometerEventsManifest | Override the SmartGateway object creation for Ceilometer events. The Smart Gateway Operator watches for SmartGateway objects. | oc get smartgateway stf-default-ceilometer-notification -oyaml
4.1.2. Overriding a managed manifest
Edit the ServiceTelemetry object and provide a parameter and manifest. For a list of available manifest override parameters, see Section 4.1, "Customizing the deployment". The default ServiceTelemetry object is stf-default. Use oc get servicetelemetry to list the available STF deployments.
TIP
The oc edit command loads the default system editor. To override the default editor, pass or set the environment variable EDITOR to the preferred editor, for example, EDITOR=nano oc edit servicetelemetry stf-default.
Procedure
1. Log in to Red Hat OpenShift Container Platform.
2. Change to the service-telemetry namespace:
3. Load the ServiceTelemetry object into an editor:
4. To modify the ServiceTelemetry object, provide a manifest override parameter and the contents of the manifest to write to OCP instead of the defaults provided by STF.
NOTE
The trailing pipe (|) after entering the manifest override parameter indicates that the value provided is multi-line.
oc project service-telemetry
oc edit servicetelemetry stf-default
$ oc edit servicetelemetry stf-default
apiVersion: infra.watch/v1alpha1
kind: ServiceTelemetry
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"infra.watch/v1alpha1","kind":"ServiceTelemetry","metadata":{"annotations":{},"name":"stf-default","namespace":"service-telemetry"},"spec":{"metricsEnabled":true}}
  creationTimestamp: "2020-04-14T20:29:42Z"
  generation: 1
  name: stf-default
  namespace: service-telemetry
  resourceVersion: "1949423"
  selfLink: /apis/infra.watch/v1alpha1/namespaces/service-telemetry/servicetelemetrys/stf-default
  uid: d058bc41-1bb0-49f5-9a8b-642f4b8adb95
spec:
  metricsEnabled: true
  smartgatewayCollectdMetricsManifest: | 1
    apiVersion: smartgateway.infra.watch/v2alpha1
    kind: SmartGateway
    metadata:
      name: stf-default-collectd-telemetry
      namespace: service-telemetry
    spec:
      amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/telemetry
      debug: true
      prefetch: 15000
      serviceType: metrics
      size: 1
      useTimestamp: true 2
status:
  conditions:
  - ansibleResult:
      changed: 0
      completion: 2020-04-14T20:32:19.079508
      failures: 0
      ok: 52
      skipped: 1
    lastTransitionTime: "2020-04-14T20:29:59Z"
    message: Awaiting next reconciliation
    reason: Successful
    status: "True"
    type: Running
1 The manifest override parameter is defined in the spec of the ServiceTelemetry object.
2 End of the manifest override content.
5. Save and close.
4.2. ALERTS
You create alert rules in Prometheus and alert routes in Alertmanager. Alert rules in Prometheus servers send alerts to an Alertmanager, which manages the alerts. Alertmanager can silence, inhibit, or aggregate alerts, and send notifications using email, on-call notification systems, or chat platforms.
To create an alert, complete the following tasks:
1. Create an alert rule in Prometheus. For more information, see Section 4.2.1, "Creating an alert rule in Prometheus".
2. Create an alert route in Alertmanager. For more information, see Section 4.2.3, "Creating an alert route in Alertmanager".
Additional resources
For more information about alerts or notifications with Prometheus and Alertmanager, see https://prometheus.io/docs/alerting/overview/
To view an example set of alerts that you can use with Service Telemetry Framework (STF), see https://github.com/infrawatch/service-telemetry-operator/tree/master/deploy/alerts
4.2.1. Creating an alert rule in Prometheus
Prometheus evaluates alert rules to trigger notifications. If the rule condition returns an empty result set, the condition is false. Otherwise, the rule is true and it triggers an alert.
Procedure
1. Log in to Red Hat OpenShift Container Platform.
2. Change to the service-telemetry namespace:
3. Create a PrometheusRule object that contains the alert rule. The Prometheus Operator loads the rule into Prometheus:
4. To verify that the rules have been loaded into Prometheus by the Operator, create a pod with access to curl:
5. Run curl to access the prometheus-operated service to return the rules loaded into memory:
oc project service-telemetry
oc apply -f - <<EOF
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  creationTimestamp: null
  labels:
    prometheus: stf-default
    role: alert-rules
  name: prometheus-alarm-rules
  namespace: service-telemetry
spec:
  groups:
    - name: ./openstack.rules
      rules:
        - alert: Metric Listener down
          expr: collectd_qpid_router_status < 1
EOF

To change the rule, edit the value of the expr parameter.
oc run curl --generator=run-pod/v1 --image=radial/busyboxplus:curl -i --tty

[ root@curl:/ ]$ curl prometheus-operated:9090/api/v1/rules

{"status":"success","data":{"groups":[{"name":"./openstack.rules","file":"/etc/prometheus/rules/prometheus-stf-default-rulefiles-0/service-telemetry-prometheus-alarm-rules.yaml","rules":[{"name":"Metric Listener down","query":"collectd_qpid_router_status \u003c 1","duration":0,"labels":{},"annotations":{},"alerts":[],"health":"ok","type":"alerting"}],"interval":30}]}}
6. To verify that the output shows the rules loaded into the PrometheusRule object, for example, the output contains the defined ./openstack.rules, exit from the pod:

[ root@curl:/ ]$ exit

7. Clean up the environment by deleting the curl pod:

$ oc delete pod curl

pod "curl" deleted

Additional resources
For more information on alerting, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md
4.2.2. Configuring custom alerts
You can add custom alerts to the PrometheusRule object that you created in Section 4.2.1, "Creating an alert rule in Prometheus".
Procedure
1. Use the oc edit command:

oc edit prometheusrules prometheus-alarm-rules

2. Edit the PrometheusRules manifest.
3. Save and close.
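For example, a custom rule can be appended to the rules list of the PrometheusRule manifest. The following is a minimal sketch, not taken from the product documentation: the second alert name and expression are illustrative only, and assume the collectd_uptime metric shown in Section 4.5.4, "Querying metrics data from multiple clouds":

spec:
  groups:
    - name: ./openstack.rules
      rules:
        - alert: Metric Listener down
          expr: collectd_qpid_router_status < 1
        # Hypothetical custom alert: fires when the collectd_uptime
        # metric is entirely absent from Prometheus.
        - alert: Collectd telemetry missing
          expr: absent(collectd_uptime)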
Additional resources
For more information about configuring alerting rules, see https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/
For more information about PrometheusRules objects, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md
4.2.3. Creating an alert route in Alertmanager
Use Alertmanager to deliver alerts to an external system, such as email, IRC, or other notification channel. The Prometheus Operator manages the Alertmanager configuration as a Red Hat OpenShift Container Platform (OCP) secret. STF by default deploys a basic configuration that results in no receivers:
alertmanager.yaml: |-
  global:
    resolve_timeout: 5m
  route:
    group_by: ['job']
    group_wait: 30s
    group_interval: 5m
    repeat_interval: 12h
    receiver: 'null'
  receivers:
  - name: 'null'
To deploy a custom Alertmanager route with STF, you must pass an alertmanagerConfigManifest parameter to the Service Telemetry Operator, which results in an updated secret managed by the Prometheus Operator. For more information, see Section 4.1.2, "Overriding a managed manifest".
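For example, a receiver that sends email can replace the default null receiver in the alertmanager.yaml payload. This is a minimal sketch only; the SMTP host, addresses, and credentials are placeholder values, not defaults shipped with STF:

alertmanager.yaml: |-
  global:
    resolve_timeout: 5m
  route:
    group_by: ['job']
    receiver: 'email-ops'
  receivers:
  - name: 'email-ops'
    email_configs:
    - to: 'cloud-ops@example.com'        # placeholder recipient
      from: 'alertmanager@example.com'   # placeholder sender
      smarthost: 'smtp.example.com:587'  # placeholder SMTP relay
      auth_username: 'alertmanager'      # placeholder credentials
      auth_password: 'changeme'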
Procedure
1. Log in to Red Hat OpenShift Container Platform.
2. Change to the service-telemetry namespace:
3. Edit the ServiceTelemetry object for your STF deployment:
4. Add a new parameter, alertmanagerConfigManifest, and the Secret object contents to define the alertmanager.yaml configuration for Alertmanager:
NOTE
This loads the default template that is already managed by the Service Telemetry Operator. To validate that the changes are populating correctly, change a value, return the alertmanager-stf-default secret, and verify that the new value is loaded into memory, for example, changing the value global.resolve_timeout from 5m to 10m.
oc project service-telemetry
oc edit servicetelemetry stf-default
apiVersion: infra.watch/v1alpha1
kind: ServiceTelemetry
metadata:
  name: stf-default
  namespace: service-telemetry
spec:
  metricsEnabled: true
  alertmanagerConfigManifest: |
    apiVersion: v1
    kind: Secret
    metadata:
      name: 'alertmanager-stf-default'
      namespace: 'service-telemetry'
    type: Opaque
    stringData:
      alertmanager.yaml: |-
        global:
          resolve_timeout: 10m
        route:
          group_by: ['job']
          group_wait: 30s
          group_interval: 5m
          repeat_interval: 12h
          receiver: 'null'
        receivers:
        - name: 'null'
5. Verify that the configuration was applied to the secret:
6. To verify that the configuration has been loaded into Alertmanager, create a pod with access to curl:
7. Run curl against the alertmanager-operated service to retrieve the status and configYAML contents, and review that the supplied configuration matches the configuration loaded into Alertmanager:
8. Verify that the configYAML field contains the expected changes. Exit from the pod:
9. To clean up the environment, delete the curl pod:
$ oc get secret alertmanager-stf-default -o go-template='{{ index .data "alertmanager.yaml" | base64decode }}'

global:
  resolve_timeout: 10m
route:
  group_by: ['job']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h
  receiver: 'null'
receivers:
- name: 'null'
oc run curl --generator=run-pod/v1 --image=radial/busyboxplus:curl -i --tty

[ root@curl:/ ]$ curl alertmanager-operated:9093/api/v1/status

{"status":"success","data":{"configYAML":"global:\n  resolve_timeout: 10m\n  http_config: {}\n  smtp_hello: localhost\n  smtp_require_tls: true\n  pagerduty_url: https://events.pagerduty.com/v2/enqueue\n  hipchat_api_url: https://api.hipchat.com/\n  opsgenie_api_url: https://api.opsgenie.com/\n  wechat_api_url: https://qyapi.weixin.qq.com/cgi-bin/\n  victorops_api_url: https://alert.victorops.com/integrations/generic/20131114/alert/\nroute:\n  receiver: 'null'\n  group_by:\n  - job\n  group_wait: 30s\n  group_interval: 5m\n  repeat_interval: 12h\nreceivers:\n- name: 'null'\ntemplates: []\n"}}
[ root@curl:/ ]$ exit

$ oc delete pod curl

pod "curl" deleted
Additional resources
For more information about the Red Hat OpenShift Container Platform secret and the Prometheus operator, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md
4.3. HIGH AVAILABILITY
High availability is the ability of Service Telemetry Framework (STF) to rapidly recover from failures in its component services. Although Red Hat OpenShift Container Platform (OCP) restarts a failed pod if nodes are available to schedule the workload, this recovery process might take more than one minute, during which time events and metrics are lost. A high availability configuration includes multiple copies of STF components, which reduces recovery time to approximately 2 seconds. To protect against failure of an OCP node, deploy STF to an OCP cluster with three or more nodes.
NOTE
STF is not yet a fully fault tolerant system. Delivery of metrics and events during the recovery period is not guaranteed.
Enabling high availability has the following effects:
Two AMQ Interconnect pods run instead of the default 1.
Three ElasticSearch pods run instead of the default 1.
Recovery time from a lost pod in either of these services reduces to approximately 2 seconds.
4.3.1. Configuring high availability
To configure STF for high availability, add highAvailabilityEnabled: true to the ServiceTelemetry object in OCP. You can set this parameter at installation time or, if you already deployed STF, complete the following steps:
Procedure
1. Log in to Red Hat OpenShift Container Platform.
2. Change to the service-telemetry namespace:
3. Use the oc command to edit the ServiceTelemetry object:
4. Add highAvailabilityEnabled: true to the spec section:
5. Save your changes and close the object.
oc project service-telemetry
$ oc edit ServiceTelemetry
spec:
  eventsEnabled: true
  metricsEnabled: true
  highAvailabilityEnabled: true
4.4. DASHBOARDS
Use the third-party application Grafana to visualize system-level metrics gathered by collectd for each individual host node. For more information about configuring collectd, see Section 3.3, "Configuring Red Hat OpenStack Platform overcloud for Service Telemetry Framework".
4.4.1. Setting up Grafana to host the dashboard
Grafana is not included in the default Service Telemetry Framework (STF) deployment, so you must deploy the Grafana Operator from OperatorHub.io.
Procedure
1. Log in to Red Hat OpenShift Container Platform.
2. Change to the service-telemetry namespace:
3. Clone the dashboard repository:
4. Deploy the Grafana operator:
5. To verify that the operator launched successfully, run the oc get csv command. If the value of the PHASE column is Succeeded, the operator launched successfully:
6. Launch a Grafana instance:
7. Verify that the Grafana instance deployed:
8. Create the datasource and dashboard resources:
oc project service-telemetry
git clone https://github.com/infrawatch/dashboards
cd dashboards

oc create -f deploy/subscription.yaml
$ oc get csv
NAME                      DISPLAY            VERSION   REPLACES   PHASE
grafana-operator.v3.2.0   Grafana Operator   3.2.0                Succeeded
$ oc create -f deploy/grafana.yaml
$ oc get pod -l app=grafana
NAME                                  READY   STATUS    RESTARTS   AGE
grafana-deployment-7fc7848b56-sbkhv   1/1     Running   0          1m
oc create -f deploy/datasource.yaml -f deploy/rhos-dashboard.yaml

9. Verify that the resources installed correctly:

$ oc get grafanadashboards

NAME             AGE
rhos-dashboard   7d21h

$ oc get grafanadatasources

NAME                                  AGE
service-telemetry-grafanadatasource   1m

10. Navigate to https://<grafana-route-address> in a web browser. Use the oc get routes command to retrieve the Grafana route address:

oc get routes

11. To view the dashboard, click Dashboards and Manage.
Additional resources
For more information about enabling the OperatorHub.io catalog source, see Section 2.3.5, "Enabling the OperatorHub.io Community Catalog Source".
4.4.1.1. Viewing and editing queries
Procedure
1. Log in to Red Hat OpenShift Container Platform. To view and edit queries, log in as the admin user.
2. Change to the service-telemetry namespace:

oc project service-telemetry

3. To retrieve the default username and password, describe the Grafana object using oc describe:

oc describe grafana service-telemetry-grafana
4.4.2. The Grafana infrastructure dashboard
The infrastructure dashboard shows metrics for a single node at a time. Select a node from the upper left corner of the dashboard.
4.4.2.1. Top panels
Title | Unit | Description
Current Global Alerts | - | Current alerts fired by Prometheus
Recent Global Alerts | - | Recently fired alerts in 5m time steps
Status Panel | - | Node status: up, down, unavailable
Uptime | s/m/h/d/M/Y | Total operational time of node
CPU Cores | cores | Total number of cores
Memory | bytes | Total memory
Disk Size | bytes | Total storage size
Processes | processes | Total number of processes listed by type
Load Average | processes | Load average represents the average number of running and uninterruptible processes residing in the kernel execution queue
4.4.2.2. Networking panels
Panels that display the network interfaces of the node.
Panel | Unit | Description
Physical Interfaces Ingress Errors | errors | Total errors with incoming data
Physical Interfaces Egress Errors | errors | Total errors with outgoing data
Physical Interfaces Ingress Error Rates | errors/s | Rate of incoming data errors
Physical Interfaces Egress Error Rates | errors/s | Rate of outgoing data errors
Physical Interfaces Packets Ingress | pps | Incoming packets per second
Physical Interfaces Packets Egress | pps | Outgoing packets per second
Physical Interfaces Data Ingress | bytes/s | Incoming data rates
Physical Interfaces Data Egress | bytes/s | Outgoing data rates
Physical Interfaces Drop Rate Ingress | pps | Incoming packets drop rate
Physical Interfaces Drop Rate Egress | pps | Outgoing packets drop rate
4.4.2.3. CPU panels
Panels that display CPU usage of the node.
Panel | Unit | Description
Current CPU Usage | percent | Instantaneous usage at the time of the last query
Aggregate CPU Usage | percent | Average non-idle CPU activity of all cores on a node
Aggr CPU Usage by Type | percent | Shows time spent for each type of thread, averaged across all cores
4.4.2.4. Memory panels
Panels that display memory usage on the node.
Panel | Unit | Description
Memory Used | percent | Amount of memory being used at time of last query
Huge Pages Used | hugepages | Number of hugepages being used
Memory
4.4.2.5. Disk/file system
Panels that display space used on disk.
Panel | Unit | Description | Notes
Disk Space Usage | percent | Total disk use at time of last query
Inode Usage | percent | Total inode use at time of last query
Aggregate Disk Space Usage | bytes | Total disk space used and reserved | Because this query relies on the df plugin, temporary file systems that do not necessarily use disk space are included in the results. The query tries to filter out most of these, but it might not be exhaustive.
Disk Traffic | bytes/s | Shows rates for both reading and writing
Disk Load | percent | Approximate percentage of total disk bandwidth being used. The weighted I/O time series includes the backlog that might be accumulating. For more information, see the collectd disk plugin docs.
Operations/s | ops/s | Operations done per second
Average I/O Operation Time | seconds | Average time each I/O operation took to complete. This average is not accurate, see the collectd disk plugin docs.
4.5. CONFIGURING MULTIPLE CLOUDS
You can configure multiple Red Hat OpenStack Platform clouds to target a single instance of Service Telemetry Framework (STF):
1. Plan the AMQP address prefixes that you want to use for each cloud. For more information, see Section 4.5.1, "Planning AMQP address prefixes".
2. Deploy metrics and events consumer Smart Gateways for each cloud to listen on the corresponding address prefixes. For more information, see Section 4.5.2, "Deploying Smart Gateways".
3. Configure each cloud to send its metrics and events to STF on the correct address. For more information, see Section 4.5.3, "Creating the OpenStack environment file".
Figure 4.1. Two Red Hat OpenStack Platform clouds connect to STF
4.5.1. Planning AMQP address prefixes
By default, Red Hat OpenStack Platform nodes get data through two data collectors: collectd and Ceilometer. These components send telemetry data, or notifications, to the respective AMQP addresses, for example, collectd/telemetry, where STF Smart Gateways listen on those addresses for monitoring data.
To support multiple clouds, and to identify which cloud generated the monitoring data, configure each cloud to send data to a unique address. Prefix a cloud identifier to the second part of the address. The following list shows some example addresses and identifiers:
collectd/cloud1-telemetry
collectd/cloud1-notify
anycast/ceilometer/cloud1-event.sample
collectd/cloud2-telemetry
collectd/cloud2-notify
anycast/ceilometer/cloud2-event.sample
collectd/us-east-1-telemetry
collectd/us-west-3-telemetry
4.5.2. Deploying Smart Gateways
You must deploy a Smart Gateway for each of the data collection types for each cloud: one for collectd metrics, one for collectd events, and one for Ceilometer events. Configure each of the Smart Gateways to listen on the AMQP address that you define for the corresponding cloud.
When you deploy STF for the first time, Smart Gateway manifests are created that define the initial Smart Gateways for a single cloud. When you deploy Smart Gateways for multiple cloud support, you deploy multiple Smart Gateways for each of the data collection types that handle the metrics and the events data for each cloud. The initial Smart Gateways act as a template to create additional Smart Gateways, along with any authentication information required to connect to the data stores.
Procedure
1. Log in to Red Hat OpenShift Container Platform.
2. Change to the service-telemetry namespace:
oc project service-telemetry
3. Use the initially deployed Smart Gateways as a template for additional Smart Gateways. List the currently deployed Smart Gateways with the oc get smartgateways command. For example, if you deployed STF with metricsEnabled: true and eventsEnabled: true, the following Smart Gateways are displayed in the output:
$ oc get smartgateways
NAME                                  AGE
stf-default-ceilometer-notification   14d
stf-default-collectd-notification     14d
stf-default-collectd-telemetry        14d
4. Retrieve the manifests for each Smart Gateway and store the contents in a temporary file, which you can modify later and use to create the new set of Smart Gateways:

truncate --size 0 /tmp/cloud1-smartgateways.yaml && \
for sg in $(oc get smartgateways -oname)
do
  echo "---" >> /tmp/cloud1-smartgateways.yaml
  oc get ${sg} -oyaml --export >> /tmp/cloud1-smartgateways.yaml
done
5. Modify the Smart Gateway manifest in the /tmp/cloud1-smartgateways.yaml file. Adjust the metadata.name and spec.amqpUrl fields to include the cloud identifier from your schema. For more information, see Section 4.5.1, "Planning AMQP address prefixes". To view example Smart Gateway manifests, see Section 4.5.2.1, "Example manifests".
6. Deploy your new Smart Gateways:
oc apply -f /tmp/cloud1-smartgateways.yaml

7. Verify that each Smart Gateway is running. This can take several minutes, depending on the number of Smart Gateways:

oc get po -l app=smart-gateway
4.5.2.1. Example manifests
IMPORTANT
The content in the following examples might be different to the file content in your deployment. Copy the manifests in your deployment.
Ensure that the name and amqpUrl parameters of each Smart Gateway match the names that you want to use for your clouds. For more information, see Section 4.5.1, "Planning AMQP address prefixes".
NOTE
Your output might have some additional metadata parameters that you can remove from the manifests that you load into OCP.
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-ceilometer-notification-cloud1 1
spec:
  amqpDataSource: ceilometer
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/anycast/ceilometer/cloud1-event.sample 2
  debug: false
  elasticPass: fkzfhghw
  elasticUrl: https://elasticsearch-es-http.service-telemetry.svc.cluster.local:9200
  elasticUser: elastic
  resetIndex: false
  serviceType: events
  size: 1
  tlsCaCert: /config/certs/ca.crt
  tlsClientCert: /config/certs/tls.crt
  tlsClientKey: /config/certs/tls.key
  tlsServerName: elasticsearch-es-http.service-telemetry.svc.cluster.local
  useBasicAuth: true
  useTls: true
---
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-collectd-notification-cloud1 3
spec:
  amqpDataSource: collectd
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/cloud1-notify 4
  debug: false
  elasticPass: fkzfhghw
  elasticUrl: https://elasticsearch-es-http.service-telemetry.svc.cluster.local:9200
  elasticUser: elastic
  resetIndex: false
  serviceType: events
  size: 1
  tlsCaCert: /config/certs/ca.crt
  tlsClientCert: /config/certs/tls.crt
  tlsClientKey: /config/certs/tls.key
  tlsServerName: elasticsearch-es-http.service-telemetry.svc.cluster.local
  useBasicAuth: true
  useTls: true
---
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-collectd-telemetry-cloud1 5
spec:
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/cloud1-telemetry 6
  debug: false
  prefetch: 15000
  serviceType: metrics
  size: 1
  useTimestamp: true
1 Name for Ceilometer notifications for cloud1.
2 AMQP address for Ceilometer notifications for cloud1.
3 Name for collectd notifications for cloud1.
4 AMQP address for collectd notifications for cloud1.
5 Name for collectd telemetry for cloud1.
6 AMQP address for collectd telemetry for cloud1.
4.5.3. Creating the OpenStack environment file
To label traffic according to the cloud of origin, you must create a configuration with cloud-specific instance names. Create an stf-connectors.yaml file and adjust the values of CeilometerQdrEventsConfig and CollectdAmqpInstances to match the AMQP address prefix scheme. For more information, see Section 4.5.1, "Planning AMQP address prefixes".
WARNING
Remove the enable-stf.yaml and ceilometer-write-qdr.yaml environment file references from your overcloud deployment. This configuration is redundant and results in duplicate information being sent from each cloud node.
Procedure
1. Create the stf-connectors.yaml file and modify it to match the AMQP address that you want for this cloud deployment:
resource_registry:
  OS::TripleO::Services::Collectd: /usr/share/openstack-tripleo-heat-templates/deployment/metrics/collectd-container-puppet.yaml
  OS::TripleO::Services::MetricsQdr: /usr/share/openstack-tripleo-heat-templates/deployment/metrics/qdr-container-puppet.yaml
  OS::TripleO::Services::CeilometerAgentCentral: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-central-container-puppet.yaml
  OS::TripleO::Services::CeilometerAgentNotification: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-notification-container-puppet.yaml
  OS::TripleO::Services::CeilometerAgentIpmi: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-ipmi-container-puppet.yaml
  OS::TripleO::Services::ComputeCeilometerAgent: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-compute-container-puppet.yaml
  OS::TripleO::Services::Redis: /usr/share/openstack-tripleo-heat-templates/deployment/database/redis-pacemaker-puppet.yaml

parameter_defaults:
  EnableSTF: true

  EventPipelinePublishers: []
  CeilometerEnablePanko: false
  CeilometerQdrPublishEvents: true
  CeilometerQdrEventsConfig:
    driver: amqp
    topic: cloud1-event 1

  CollectdConnectionType: amqp1
  CollectdAmqpInterval: 5
  CollectdDefaultPollingInterval: 5

  CollectdAmqpInstances:
    cloud1-notify: 2
      notify: true
      format: JSON
      presettle: false
    cloud1-telemetry: 3
      format: JSON
      presettle: true

  MetricsQdrAddresses:
    - prefix: collectd
      distribution: multicast
    - prefix: anycast/ceilometer
      distribution: multicast

  MetricsQdrSSLProfiles:
    - name: sslProfile

  MetricsQdrConnectors:
    - host: stf-default-interconnect-5671-service-telemetry.apps.infra.watch 4
      port: 443
      role: edge
      verifyHostname: false
      sslProfile: sslProfile
1 Define the topic for Ceilometer events. This value is the address format of anycast/ceilometer/cloud1-event.sample.
2 Define the topic for collectd events. This value is the format of collectd/cloud1-notify.
3 Define the topic for collectd metrics. This value is the format of collectd/cloud1-telemetry.
4 Adjust the MetricsQdrConnectors host to the address of the STF route.
2. Ensure that the naming convention in the stf-connectors.yaml file aligns with the spec.amqpUrl field in the Smart Gateway configuration. For example, configure the CeilometerQdrEventsConfig.topic field to a value of cloud1-event.
3. Save the file in a directory for custom environment files, for example /home/stack/custom_templates/.
4. Source the authentication file:

[stack@undercloud-0 ~]$ source stackrc

(undercloud) [stack@undercloud-0 ~]$

5. Include the stf-connectors.yaml file in the overcloud deployment command, along with any other environment files relevant to your environment:

(undercloud) [stack@undercloud-0 ~]$ openstack overcloud deploy --templates /usr/share/openstack-tripleo-heat-templates \
  -e /home/stack/custom_templates/stf-connectors.yaml

Additional resources
For information about validating the deployment, see Section 3.3.3, "Validating client-side installation".
4.5.4. Querying metrics data from multiple clouds
Data stored in Prometheus has a service label attached according to the Smart Gateway it was scraped from. You can use this label to query data from a specific cloud.
To query data from a specific cloud, use a Prometheus promql query that matches the associated service label, for example: collectd_uptime{service="stf-default-collectd-telemetry-cloud1-smartgateway"}.
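As an illustration, such a query can be run from a pod with access to curl, in the same way as the verification steps in Section 4.2.1, "Creating an alert rule in Prometheus". This is a minimal sketch; the cloud1 service label assumes the Smart Gateway naming used in Section 4.5.2, "Deploying Smart Gateways":

[ root@curl:/ ]$ curl 'prometheus-operated:9090/api/v1/query' \
  --data-urlencode 'query=collectd_uptime{service="stf-default-collectd-telemetry-cloud1-smartgateway"}'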
4.6. EPHEMERAL STORAGE
Use ephemeral storage to run Service Telemetry Framework (STF) without persistently storing data in your Red Hat OpenShift Container Platform (OCP) cluster. Ephemeral storage is not recommended in a production environment due to the volatility of the data in the platform when operating correctly and as designed. For example, restarting a pod or rescheduling the workload to another node results in the loss of any local data written since the pod started.
If you enable ephemeral storage in STF, the Service Telemetry Operator does not add the relevant storage sections to the data storage components' manifests.
4.6.1. Configuring ephemeral storage
To configure STF for ephemeral storage, add storageEphemeralEnabled: true to the ServiceTelemetry object in OCP. You can add storageEphemeralEnabled: true at installation time or, if you already deployed STF, complete the following steps:
Procedure
1. Log in to Red Hat OpenShift Container Platform.
2. Change to the service-telemetry namespace:
3. Edit the ServiceTelemetry object:
4. Add the storageEphemeralEnabled: true parameter to the spec section:
5. Save your changes and close the object.
oc project service-telemetry
$ oc edit ServiceTelemetry stf-default
spec:
  eventsEnabled: true
  metricsEnabled: true
  storageEphemeralEnabled: true
APPENDIX A. COLLECTD PLUG-INS
This section contains a complete list of collectd plug-ins and configurations for Red Hat OpenStack Platform 16.0.
collectd-aggregation
collectd::plugin::aggregation::aggregators
collectd::plugin::aggregation::interval
collectd-amqp1
collectd-apache
collectd::plugin::apache::instances (ex: localhost ⇒ url ⇒ http://localhost/mod_status?auto)
collectd::plugin::apache::interval
collectd-apcups
collectd-battery
collectd::plugin::battery::values_percentage
collectd::plugin::battery::report_degraded
collectd::plugin::battery::query_state_fs
collectd::plugin::battery::interval
collectd-ceph
collectd::plugin::ceph::daemons
collectd::plugin::ceph::longrunavglatency
collectd::plugin::ceph::convertspecialmetrictypes
collectd-cgroups
collectd::plugin::cgroups::ignore_selected
collectd::plugin::cgroups::interval
collectd-conntrack
None
collectd-contextswitch
collectd::plugin::contextswitch::interval
collectd-cpu
collectd::plugin::cpu::reportbystate
collectd::plugin::cpu::reportbycpu
collectd::plugin::cpu::valuespercentage
collectd::plugin::cpu::reportnumcpu
collectd::plugin::cpu::reportgueststate
collectd::plugin::cpu::subtractgueststate
collectd::plugin::cpu::interval
collectd-cpufreq
None
collectd-cpusleep
collectd-csv
collectd::plugin::csv::datadir
collectd::plugin::csv::storerates
collectd::plugin::csv::interval
collectd-df
collectd::plugin::df::devices
collectd::plugin::df::fstypes
collectd::plugin::df::ignoreselected
collectd::plugin::df::mountpoints
collectd::plugin::df::reportbydevice
collectd::plugin::df::reportinodes
collectd::plugin::df::reportreserved
collectd::plugin::df::valuesabsolute
collectd::plugin::df::valuespercentage
collectd::plugin::df::interval
collectd-disk
collectd::plugin::disk::disks
collectd::plugin::disk::ignoreselected
collectd::plugin::disk::udevnameattr
collectd::plugin::disk::interval
collectd-entropy
collectd::plugin::entropy::interval
collectd-ethstat
collectd::plugin::ethstat::interfaces
collectd::plugin::ethstat::maps
collectd::plugin::ethstat::mappedonly
collectd::plugin::ethstat::interval
collectd-exec
collectd::plugin::exec::commands
collectd::plugin::exec::commands_defaults
collectd::plugin::exec::globals
collectd::plugin::exec::interval
collectd-fhcount
collectd::plugin::fhcount::valuesabsolute
collectd::plugin::fhcount::valuespercentage
collectd::plugin::fhcount::interval
collectd-filecount
collectd::plugin::filecount::directories
collectd::plugin::filecount::interval
collectd-fscache
None
collectd-hddtemp
collectd::plugin::hddtemp::host
collectd::plugin::hddtemp::port
collectd::plugin::hddtemp::interval
collectd-hugepages
collectd::plugin::hugepages::report_per_node_hp
collectd::plugin::hugepages::report_root_hp
collectd::plugin::hugepages::values_pages
collectd::plugin::hugepages::values_bytes
collectd::plugin::hugepages::values_percentage
collectd::plugin::hugepages::interval
collectd-intel_rdt
collectd-interface
collectd::plugin::interface::interfaces
collectd::plugin::interface::ignoreselected
collectd::plugin::interface::reportinactive
collectd::plugin::interface::interval
collectd-ipc
None
collectd-ipmi
collectd::plugin::ipmi::ignore_selected
collectd::plugin::ipmi::notify_sensor_add
collectd::plugin::ipmi::notify_sensor_remove
collectd::plugin::ipmi::notify_sensor_not_present
collectd::plugin::ipmi::sensors
collectd::plugin::ipmi::interval
collectd-irq
collectd::plugin::irq::irqs
collectd::plugin::irq::ignoreselected
collectd::plugin::irq::interval
collectd-load
collectd::plugin::load::report_relative
collectd::plugin::load::interval
collectd-logfile
collectd::plugin::logfile::log_level
collectd::plugin::logfile::log_file
collectd::plugin::logfile::log_timestamp
collectd::plugin::logfile::print_severity
collectd::plugin::logfile::interval
collectd-madwifi
collectd-mbmon
collectd-md
collectd-memcached
collectd::plugin::memcached::instances
collectd::plugin::memcached::interval
collectd-memory
collectd::plugin::memory::valuesabsolute
collectd::plugin::memory::valuespercentage
collectd::plugin::memory::interval
collectd-multimeter
collectd-mysql
collectd::plugin::mysql::interval
collectd-netlink
collectd::plugin::netlink::interfaces
collectd::plugin::netlink::verboseinterfaces
collectd::plugin::netlink::qdiscs
collectd::plugin::netlink::classes
collectd::plugin::netlink::filters
collectd::plugin::netlink::ignoreselected
collectd::plugin::netlink::interval
collectd-network
collectd::plugin::network::timetolive
collectd::plugin::network::maxpacketsize
collectd::plugin::network::forward
collectd::plugin::network::reportstats
collectd::plugin::network::listeners
collectd::plugin::network::servers
collectd::plugin::network::interval
collectd-nfs
collectd::plugin::nfs::interval
collectd-ntpd
collectd::plugin::ntpd::host
collectd::plugin::ntpd::port
collectd::plugin::ntpd::reverselookups
collectd::plugin::ntpd::includeunitid
collectd::plugin::ntpd::interval
collectd-numa
None
collectd-olsrd
collectd-openvpn
collectd::plugin::openvpn::statusfile
collectd::plugin::openvpn::improvednamingschema
collectd::plugin::openvpn::collectcompression
collectd::plugin::openvpn::collectindividualusers
collectd::plugin::openvpn::collectusercount
collectd::plugin::openvpn::interval
collectd-ovs_events
collectd::plugin::ovs_events::address
collectd::plugin::ovs_events::dispatch
collectd::plugin::ovs_events::interfaces
collectd::plugin::ovs_events::send_notification
collectd::plugin::ovs_events::$port
collectd::plugin::ovs_events::socket
collectd-ovs_stats
collectd::plugin::ovs_stats::address
collectd::plugin::ovs_stats::bridges
collectd::plugin::ovs_stats::port
collectd::plugin::ovs_stats::socket
collectd-ping
collectd::plugin::ping::hosts
collectd::plugin::ping::timeout
collectd::plugin::ping::ttl
collectd::plugin::ping::source_address
collectd::plugin::ping::device
collectd::plugin::ping::max_missed
collectd::plugin::ping::size
collectd::plugin::ping::interval
collectd-powerdns
collectd::plugin::powerdns::interval
collectd::plugin::powerdns::servers
collectd::plugin::powerdns::recursors
collectd::plugin::powerdns::local_socket
collectd::plugin::powerdns::interval
collectd-processes
collectd::plugin::processes::processes
collectd::plugin::processes::process_matches
collectd::plugin::processes::collect_context_switch
collectd::plugin::processes::collect_file_descriptor
collectd::plugin::processes::collect_memory_maps
collectd::plugin::powerdns::interval
collectd-protocols
collectd::plugin::protocols::ignoreselected
collectd::plugin::protocols::values
collectd-python
collectd-serial
collectd-smart
collectd::plugin::smart::disks
collectd::plugin::smart::ignoreselected
collectd::plugin::smart::interval
collectd-snmp_agent
collectd-statsd
collectd::plugin::statsd::host
collectd::plugin::statsd::port
collectd::plugin::statsd::deletecounters
collectd::plugin::statsd::deletetimers
collectd::plugin::statsd::deletegauges
collectd::plugin::statsd::deletesets
collectd::plugin::statsd::countersum
collectd::plugin::statsd::timerpercentile
collectd::plugin::statsd::timerlower
collectd::plugin::statsd::timerupper
collectd::plugin::statsd::timersum
collectd::plugin::statsd::timercount
collectd::plugin::statsd::interval
collectd-swap
collectd::plugin::swap::reportbydevice
collectd::plugin::swap::reportbytes
collectd::plugin::swap::valuesabsolute
collectd::plugin::swap::valuespercentage
collectd::plugin::swap::reportio
collectd::plugin::swap::interval
collectd-syslog
collectd::plugin::syslog::log_level
collectd::plugin::syslog::notify_level
collectd::plugin::syslog::interval
collectd-table
collectd::plugin::table::tables
collectd::plugin::table::interval
collectd-tail
collectd::plugin::tail::files
collectd::plugin::tail::interval
collectd-tail_csv
collectd::plugin::tail_csv::metrics
collectd::plugin::tail_csv::files
collectd-tcpconns
collectd::plugin::tcpconns::localports
collectd::plugin::tcpconns::remoteports
collectd::plugin::tcpconns::listening
collectd::plugin::tcpconns::allportssummary
collectd::plugin::tcpconns::interval
collectd-ted
collectd-thermal
collectd::plugin::thermal::devices
collectd::plugin::thermal::ignoreselected
collectd::plugin::thermal::interval
collectd-threshold
collectd::plugin::threshold::types
collectd::plugin::threshold::plugins
collectd::plugin::threshold::hosts
collectd::plugin::threshold::interval
collectd-turbostat
collectd::plugin::turbostat::core_c_states
collectd::plugin::turbostat::package_c_states
collectd::plugin::turbostat::system_management_interrupt
collectd::plugin::turbostat::digital_temperature_sensor
collectd::plugin::turbostat::tcc_activation_temp
collectd::plugin::turbostat::running_average_power_limit
collectd::plugin::turbostat::logical_core_names
collectd-unixsock
collectd-uptime
collectd::plugin::uptime::interval
collectd-users
collectd::plugin::users::interval
collectd-uuid
collectd::plugin::uuid::uuid_file
collectd::plugin::uuid::interval
collectd-virt
collectd::plugin::virt::connection
collectd::plugin::virt::refresh_interval
collectd::plugin::virt::domain
collectd::plugin::virt::block_device
collectd::plugin::virt::interface_device
collectd::plugin::virt::ignore_selected
collectd::plugin::virt::hostname_format
collectd::plugin::virt::interface_format
collectd::plugin::virt::extra_stats
collectd::plugin::virt::interval
collectd-vmem
collectd::plugin::vmem::verbose
collectd::plugin::vmem::interval
collectd-vserver
collectd-wireless
collectd-write_graphite
collectd::plugin::write_graphite::carbons
collectd::plugin::write_graphite::carbon_defaults
collectd::plugin::write_graphite::globals
collectd-write_kafka
collectd::plugin::write_kafka::kafka_host
collectd::plugin::write_kafka::kafka_port
collectd::plugin::write_kafka::kafka_hosts
collectd::plugin::write_kafka::topics
collectd-write_log
collectd::plugin::write_log::format
collectd-zfs_arc
None
NOTE
If your STF installation differs from the documentation ensure that youretrieve the correct AMQ Interconnect route address
332 Configuring the STF connection for the overcloud
To configure the STF connection you must create a file that contains the connection configurationof the AMQ Interconnect for the overcloud to the STF deployment Enable the collection of eventsand storage of the events in STF and deploy the overcloud
Procedure
1 Log in to the Red Hat OpenStack Platform undercloud as the stack user
2 Create a configuration file called stf-connectorsyaml in the homestack directory
IMPORTANT
The Service Telemetry Operator simplifies the deployment of all dataingestion and data storage components for single cloud deployments Toshare the data storage domain with multiple clouds see Section 45ldquoConfiguring multiple cloudsrdquo
3. In the stf-connectors.yaml file, configure the MetricsQdrConnectors address to connect the AMQ Interconnect on the overcloud to the STF deployment.
   Add the CeilometerQdrPublishEvents: true parameter to enable collection and transport of Ceilometer events to STF.
   Replace the host parameter with the value of HOST/PORT that you retrieved in Section 3.3.1, "Retrieving the AMQ Interconnect route address".
4. Add the following files to your Red Hat OpenStack Platform director deployment to set up collectd and AMQ Interconnect:
   the stf-connectors.yaml environment file
   the enable-stf.yaml file that ensures that the environment is being used during the overcloud deployment
$ oc get routes -ogo-template='{{ range .items }}{{printf "%s\n" .spec.host }}{{ end }}' | grep "\-5671"
stf-default-interconnect-5671-service-telemetry.apps.infra.watch
parameter_defaults:
  EventPipelinePublishers: []
  CeilometerQdrPublishEvents: true
  MetricsQdrConnectors:
    - host: stf-default-interconnect-5671-service-telemetry.apps.infra.watch
      port: 443
      role: edge
      sslProfile: sslProfile
      verifyHostname: false
   the ceilometer-write-qdr.yaml file that ensures that Ceilometer telemetry is sent to STF
5. Deploy the Red Hat OpenStack Platform overcloud.
3.3.3. Validating client-side installation
To validate data collection from the STF storage domain, query the data sources for delivered data. To validate individual nodes in the Red Hat OpenStack Platform deployment, connect to the console using SSH.
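For example, a hedged sketch of connecting to a controller from the undercloud; overcloud nodes are reachable as the heat-admin user, and the ctlplane IP address shown is illustrative:

(undercloud) [stack@undercloud-0 ~]$ openstack server list
(undercloud) [stack@undercloud-0 ~]$ ssh heat-admin@192.168.24.10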
Procedure
1. Log in to an overcloud node, for example, controller-0.
2. Ensure that the metrics_qdr container is running on the node:
3. Return the internal network address on which AMQ Interconnect is running, for example, 172.17.1.44 listening on port 5666:
4. Return a list of connections to the local AMQ Interconnect:
openstack overcloud deploy <other arguments>
  --templates /usr/share/openstack-tripleo-heat-templates \
  --environment-file <...other-environment-files...> \
  --environment-file /usr/share/openstack-tripleo-heat-templates/environments/metrics/ceilometer-write-qdr.yaml \
  --environment-file /usr/share/openstack-tripleo-heat-templates/environments/enable-stf.yaml \
  --environment-file /home/stack/stf-connectors.yaml
$ sudo podman container inspect --format '{{.State.Status}}' metrics_qdr

running
$ sudo podman exec -it metrics_qdr cat /etc/qpid-dispatch/qdrouterd.conf

listener {
    host: 172.17.1.44
    port: 5666
    authenticatePeer: no
    saslMechanisms: ANONYMOUS
}
$ sudo podman exec -it metrics_qdr qdstat --bus=172.17.1.44:5666 --connections
Connections
  id   host                                                                   container                                                                                                  role    dir  security                            authentication  tenant
  ==========================================================================================================================================================================================================
  1    stf-default-interconnect-5671-service-telemetry.apps.infra.watch:443   stf-default-interconnect-7458fd4d69-bgzfb                                                                  edge    out  TLSv1.2(DHE-RSA-AES256-GCM-SHA384)  anonymous-user
  12   172.17.1.44:60290                                                      openstack.org/om/container/controller-0/ceilometer-agent-notification/25/5c02cee550f143ec9ea030db5cccba14  normal  in   no-security                         no-auth
  16   172.17.1.44:36408                                                      metrics                                                                                                    normal  in   no-security                         anonymous-user
  899  172.17.1.44:39500                                                      10a2e99d-1b8a-4329-b48c-4335e5f75c84                                                                       normal  in   no-security                         no-auth
There are four connections:
   Outbound connection to STF
   Inbound connection from collectd
   Inbound connection from ceilometer
   Inbound connection from our qdstat client
The outbound STF connection is provided to the MetricsQdrConnectors host parameter and is the route for the STF storage domain. The other hosts are internal network addresses of the client connections to this AMQ Interconnect.
5. To ensure that messages are being delivered, list the links, and view the _edge address in the deliv column for delivery of messages:
$ sudo podman exec -it metrics_qdr qdstat --bus=172.17.1.44:5666 --links

Router Links
  type      dir  conn id  id    peer  class   addr                  phs  cap  pri  undel  unsett  deliv    presett  psdrop  acc  rej  rel  mod  delay  rate
  ===========================================================================================================================================
  endpoint  out  1        5           local   _edge                      250  0    0      0       2979926  2979924  0       0    0    2    0    0      0
  endpoint  in   1        6                                              250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  in   1        7                                              250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  out  1        8                                              250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  in   1        9                                              250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  out  1        10                                             250  0    0      0       911      911      0       0    0    0    0    911    0
  endpoint  in   1        11                                             250  0    0      0       0        911      0       0    0    0    0    0      0
  endpoint  out  12       32          local   temp.lSY6Mcicol4J2Kp       250  0    0      0       0        0        0       0    0    0    0    0      0
  endpoint  in   16       41                                             250  0    0      0       2979924  2979924  0       0    0    0    0    0      0
  endpoint  in   912      1834        mobile  $management           0    250  0    0      0       1        0        0       1    0    0    0    0      0
  endpoint  out  912      1835        local   temp.9Ok2resI9tmt+CT       250  0    0      0       0        0        0       0    0    0    0    0      0
6. To list the addresses from Red Hat OpenStack Platform nodes to STF, connect to OCP to get the AMQ Interconnect pod name and list the connections. List the available AMQ Interconnect pods:
7. Connect to the pod and run the qdstat --connections command to list the known connections:
   In this example, there are three edge connections from the Red Hat OpenStack Platform nodes, with connection id 22, 23, and 24.
8. To view the number of messages delivered by the network, use each address with the oc exec command:
$ oc get pods -l application=stf-default-interconnect
NAME                                        READY   STATUS    RESTARTS   AGE
stf-default-interconnect-7458fd4d69-bgzfb   1/1     Running   0          6d21h
$ oc exec -it stf-default-interconnect-7458fd4d69-bgzfb -- qdstat --connections
2020-04-21 18:25:47.243852 UTC
stf-default-interconnect-7458fd4d69-bgzfb
Connections
  id  host               container                                                               role    dir  security                                authentication  tenant  last dlv      uptime
  ======================================================================================================================================================================================================
  1   10.129.2.21:43062  rcv[stf-default-collectd-telemetry-smartgateway-79c967c8f7-kq4qv]       normal  in   no-security                             anonymous-user          000:00:00:00  006:21:50:25
  2   10.130.0.52:55754  rcv[stf-default-ceilometer-notification-smartgateway-6675df547mbjk5]    normal  in   no-security                             anonymous-user          000:21:25:57  006:21:49:36
  3   10.130.0.51:43110  rcv[stf-default-collectd-notification-smartgateway-698c87fbb7-f28v6]    normal  in   no-security                             anonymous-user          000:21:36:53  006:21:49:09
  22  10.128.0.1:51948   Router.ceph-0.redhat.local                                              edge    in   TLSv1/SSLv3(DHE-RSA-AES256-GCM-SHA384)  anonymous-user          000:00:00:03  000:22:08:43
  23  10.128.0.1:51950   Router.compute-0.redhat.local                                           edge    in   TLSv1/SSLv3(DHE-RSA-AES256-GCM-SHA384)  anonymous-user          000:00:00:03  000:22:08:43
  24  10.128.0.1:52082   Router.controller-0.redhat.local                                        edge    in   TLSv1/SSLv3(DHE-RSA-AES256-GCM-SHA384)  anonymous-user          000:00:00:00  000:22:08:34
  27  127.0.0.1:42202    c2f541c1-4c97-4b37-a189-a396c08fb079                                    normal  in   no-security                             no-auth                 000:00:00:00  000:00:00:00
$ oc exec -it stf-default-interconnect-7458fd4d69-bgzfb -- qdstat --address
2020-04-21 18:20:10.293258 UTC
stf-default-interconnect-7458fd4d69-bgzfb
Router Addresses
  class   addr                              phs  distrib    pri  local  remote  in       out      thru  fallback
  ====================================================================================================================
  mobile  anycast/ceilometer/event.sample   0    balanced   -    1      0       1553     1553     0     0
  mobile  collectd/notify                   0    multicast  -    1      0       10       10       0     0
  mobile  collectd/telemetry                0    multicast  -    1      0       7798049  7798049  0     0
CHAPTER 4. ADVANCED FEATURES
The following optional features can provide additional functionality to the Service Telemetry Framework (STF):
Customizing the deployment. For more information, see Section 4.1, "Customizing the deployment".
Alerts. For more information, see Section 4.2, "Alerts".
High availability. For more information, see Section 4.3, "High availability".
Dashboards. For more information, see Section 4.4, "Dashboards".
Multiple clouds. For more information, see Section 4.5, "Configuring multiple clouds".
Ephemeral storage. For more information, see Section 4.6, "Ephemeral storage".
4.1. CUSTOMIZING THE DEPLOYMENT
The Service Telemetry Operator watches for a ServiceTelemetry manifest to load into Red Hat OpenShift Container Platform (OCP). The Operator then creates other objects in memory, which results in the dependent Operators creating the workloads they are responsible for managing.
WARNING
When you override the manifest, you must provide the entire manifest contents, including object names or namespaces. There is no dynamic parameter substitution when you override a manifest.
To override a manifest successfully with Service Telemetry Framework (STF), deploy a default environment using the core options only. For more information about the core options, see Section 2.3.10, "Creating a ServiceTelemetry object in OCP". When you deploy STF, use the oc get command to retrieve the default deployed manifest. When you use a manifest that was originally generated by the Service Telemetry Operator, the manifest is compatible with the other objects that are managed by the Operators.
For example, when the metricsEnabled: true parameter is configured in the ServiceTelemetry manifest, the Service Telemetry Operator requests components for metrics retrieval and storage using the default manifests. In some cases, you might want to override the default manifest. For more information, see Section 4.1.1, "Manifest override parameters".
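For example, a minimal sketch of retrieving the default deployed manifest as a starting point for an override; the command and object name mirror the retrieval commands listed in Table 4.1:

$ oc get servicetelemetry stf-default -oyaml > stf-default.yaml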
4.1.1. Manifest override parameters
This table describes the available parameters that you can use to override a manifest, along with the corresponding retrieval commands.
Table 4.1. Manifest override parameters
Override parameter / Description / Retrieval command

alertmanagerManifest
    Override the Alertmanager object creation. The Prometheus Operator watches for Alertmanager objects.
    Retrieval command: oc get alertmanager stf-default -oyaml

alertmanagerConfigManifest
    Override the Secret that contains the Alertmanager configuration. The Prometheus Operator uses a secret named alertmanager-{alertmanager-name}, for example, stf-default, to provide the alertmanager.yaml configuration to Alertmanager.
    Retrieval command: oc get secret alertmanager-stf-default -oyaml

elasticsearchManifest
    Override the ElasticSearch object creation. The Elastic Cloud on Kubernetes Operator watches for ElasticSearch objects.
    Retrieval command: oc get elasticsearch elasticsearch -oyaml

interconnectManifest
    Override the Interconnect object creation. The AMQ Interconnect Operator watches for Interconnect objects.
    Retrieval command: oc get interconnect stf-default-interconnect -oyaml

prometheusManifest
    Override the Prometheus object creation. The Prometheus Operator watches for Prometheus objects.
    Retrieval command: oc get prometheus stf-default -oyaml

servicemonitorManifest
    Override the ServiceMonitor object creation. The Prometheus Operator watches for ServiceMonitor objects.
    Retrieval command: oc get servicemonitor stf-default -oyaml

smartgatewayCollectdMetricsManifest
    Override the SmartGateway object creation for collectd metrics. The Smart Gateway Operator watches for SmartGateway objects.
    Retrieval command: oc get smartgateway stf-default-collectd-telemetry -oyaml

smartgatewayCollectdEventsManifest
    Override the SmartGateway object creation for collectd events. The Smart Gateway Operator watches for SmartGateway objects.
    Retrieval command: oc get smartgateway stf-default-collectd-notification -oyaml
smartgatewayCeilometerEventsManifest
    Override the SmartGateway object creation for Ceilometer events. The Smart Gateway Operator watches for SmartGateway objects.
    Retrieval command: oc get smartgateway stf-default-ceilometer-notification -oyaml
4.1.2. Overriding a managed manifest
Edit the ServiceTelemetry object and provide a parameter and manifest. For a list of available manifest override parameters, see Section 4.1.1, "Manifest override parameters". The default ServiceTelemetry object is stf-default. Use oc get servicetelemetry to list the available STF deployments.
TIP
The oc edit command loads the default system editor. To override the default editor, pass or set the environment variable EDITOR to the preferred editor. For example, EDITOR=nano oc edit servicetelemetry stf-default.
Procedure
1. Log in to Red Hat OpenShift Container Platform.
2. Change to the service-telemetry namespace:
3. Load the ServiceTelemetry object into an editor:
4. To modify the ServiceTelemetry object, provide a manifest override parameter and the contents of the manifest to write to OCP instead of the defaults provided by STF.
NOTE
The trailing pipe (|) after entering the manifest override parameter indicates that the value provided is multi-line.
oc project service-telemetry
oc edit servicetelemetry stf-default
$ oc edit servicetelemetry stf-default
apiVersion: infra.watch/v1alpha1
kind: ServiceTelemetry
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"infra.watch/v1alpha1","kind":"ServiceTelemetry","metadata":{"annotations":{},"name":"stf-default","namespace":"service-telemetry"},"spec":{"metricsEnabled":true}}
  creationTimestamp: "2020-04-14T20:29:42Z"
  generation: 1
  name: stf-default
  namespace: service-telemetry
  resourceVersion: "1949423"
  selfLink: /apis/infra.watch/v1alpha1/namespaces/service-telemetry/servicetelemetrys/stf-default
  uid: d058bc41-1bb0-49f5-9a8b-642f4b8adb95
spec:
  metricsEnabled: true
  smartgatewayCollectdMetricsManifest: | 1
    apiVersion: smartgateway.infra.watch/v2alpha1
    kind: SmartGateway
    metadata:
      name: stf-default-collectd-telemetry
      namespace: service-telemetry
    spec:
      amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/telemetry
      debug: true
      prefetch: 15000
      serviceType: metrics
      size: 1
      useTimestamp: true 2
status:
  conditions:
  - ansibleResult:
      changed: 0
      completion: 2020-04-14T20:32:19.079508
      failures: 0
      ok: 52
      skipped: 1
    lastTransitionTime: "2020-04-14T20:29:59Z"
    message: Awaiting next reconciliation
    reason: Successful
    status: "True"
    type: Running
1  Manifest override parameter is defined in the spec of the ServiceTelemetry object.
2  End of the manifest override content.
5. Save and close.
4.2. ALERTS
You create alert rules in Prometheus and alert routes in Alertmanager. Alert rules in Prometheus servers send alerts to an Alertmanager, which manages the alerts. Alertmanager can silence, inhibit, or aggregate alerts, and send notifications using email, on-call notification systems, or chat platforms.
To create an alert, complete the following tasks:
1. Create an alert rule in Prometheus. For more information, see Section 4.2.1, "Creating an alert rule in Prometheus".
2. Create an alert route in Alertmanager. For more information, see Section 4.2.3, "Creating an alert route in Alertmanager".
Additional resources
For more information about alerts or notifications with Prometheus and Alertmanager, see https://prometheus.io/docs/alerting/overview/
To view an example set of alerts that you can use with Service Telemetry Framework (STF), see https://github.com/infrawatch/service-telemetry-operator/tree/master/deploy/alerts
4.2.1. Creating an alert rule in Prometheus
Prometheus evaluates alert rules to trigger notifications. If the rule condition returns an empty result set, the condition is false. Otherwise, the rule is true, and it triggers an alert.
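For illustration, a hedged sketch of the rule shape with the standard Prometheus rule fields for, labels, and annotations; the alert name and threshold mirror the example later in this procedure, and the label and annotation values are placeholders:

- alert: Metric Listener down
  expr: collectd_qpid_router_status < 1
  for: 5m                    # optional: the condition must hold for 5 minutes before the alert fires
  labels:
    severity: warning        # illustrative label
  annotations:
    summary: "qdrouterd metric listener is down"   # illustrative annotation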
Procedure
1. Log in to Red Hat OpenShift Container Platform.
2. Change to the service-telemetry namespace:
3. Create a PrometheusRule object that contains the alert rule. The Prometheus Operator loads the rule into Prometheus:
4. To verify that the rules have been loaded into Prometheus by the Operator, create a pod with access to curl:
5. Run curl to access the prometheus-operated service to return the rules loaded into memory:
oc project service-telemetry
oc apply -f - <<EOF
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  creationTimestamp: null
  labels:
    prometheus: stf-default
    role: alert-rules
  name: prometheus-alarm-rules
  namespace: service-telemetry
spec:
  groups:
    - name: ./openstack.rules
      rules:
        - alert: Metric Listener down
          expr: collectd_qpid_router_status < 1 # To change the rule, edit the value of the expr parameter.
EOF
oc run curl --generator=run-pod/v1 --image=radial/busyboxplus:curl -i --tty
[ root@curl:/ ]$ curl prometheus-operated:9090/api/v1/rules

{"status":"success","data":{"groups":[{"name":"./openstack.rules","file":"/etc/prometheus/rules/prometheus-stf-default-rulefiles-0/service-telemetry-prometheus-alarm-rules.yaml","rules":[{"name":"Metric Listener down","query":"collectd_qpid_router_status \u003c 1","duration":0,"labels":{},"annotations":{},"alerts":[],"health":"ok","type":"alerting"}],"interval":30}]}}
6. To verify that the output shows the rules loaded into the PrometheusRule object, for example, the output contains the defined ./openstack.rules, exit from the pod:
7. Clean up the environment by deleting the curl pod:
Additional resources
For more information on alerting, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md
4.2.2. Configuring custom alerts
You can add custom alerts to the PrometheusRule object that you created in Section 4.2.1, "Creating an alert rule in Prometheus".
Procedure
1. Use the oc edit command:
2. Edit the PrometheusRules manifest, as shown in the sketch after this procedure.
3. Save and close.
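A minimal sketch of an edited manifest; the first rule is the one created in Section 4.2.1, and the second rule is hypothetical, added only to illustrate where custom rules go:

spec:
  groups:
    - name: ./openstack.rules
      rules:
        - alert: Metric Listener down
          expr: collectd_qpid_router_status < 1
        - alert: Collectd uptime stalled          # hypothetical custom alert
          expr: rate(collectd_uptime[10m]) == 0
          for: 10m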
Additional resources
For more information about configuring alerting rules, see https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/
For more information about PrometheusRules objects, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md
4.2.3. Creating an alert route in Alertmanager
Use Alertmanager to deliver alerts to an external system, such as email, IRC, or other notification channel. The Prometheus Operator manages the Alertmanager configuration as a Red Hat OpenShift Container Platform (OCP) secret. STF by default deploys a basic configuration that results in no receivers.
[ root@curl:/ ]$ exit
$ oc delete pod curl

pod "curl" deleted
oc edit prometheusrules prometheus-alarm-rules
alertmanager.yaml: |-
  global:
    resolve_timeout: 5m
  route:
    group_by: ['job']
    group_wait: 30s
    group_interval: 5m
    repeat_interval: 12h
    receiver: 'null'
  receivers:
  - name: 'null'
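For comparison, a hedged sketch of an alertmanager.yaml that routes alerts to email instead of the null receiver; the SMTP relay, sender, and recipient addresses are placeholders, and email_configs is the standard Alertmanager receiver type for email:

alertmanager.yaml: |-
  global:
    resolve_timeout: 5m
    smtp_smarthost: 'smtp.example.com:587'   # placeholder SMTP relay
    smtp_from: 'alertmanager@example.com'    # placeholder sender
  route:
    receiver: 'email-ops'
  receivers:
  - name: 'email-ops'
    email_configs:
    - to: 'ops@example.com'                  # placeholder recipient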
To deploy a custom Alertmanager route with STF, you must pass an alertmanagerConfigManifest parameter to the Service Telemetry Operator, which results in an updated secret, managed by the Prometheus Operator. For more information, see Section 4.1.2, "Overriding a managed manifest".
Procedure
1. Log in to Red Hat OpenShift Container Platform.
2. Change to the service-telemetry namespace:
3. Edit the ServiceTelemetry object for your STF deployment:
4. Add the new parameter alertmanagerConfigManifest and the Secret object contents to define the alertmanager.yaml configuration for Alertmanager.
NOTE
This loads the default template that is already managed by the Service Telemetry Operator. To validate that the changes are populating correctly, change a value, return the alertmanager-stf-default secret, and verify that the new value is loaded into memory, for example, changing the value of the global resolve_timeout from 5m to 10m.
oc project service-telemetry
oc edit servicetelemetry stf-default
apiVersion: infra.watch/v1alpha1
kind: ServiceTelemetry
metadata:
  name: stf-default
  namespace: service-telemetry
spec:
  metricsEnabled: true
  alertmanagerConfigManifest: |
    apiVersion: v1
    kind: Secret
    metadata:
      name: 'alertmanager-stf-default'
      namespace: 'service-telemetry'
    type: Opaque
    stringData:
      alertmanager.yaml: |-
        global:
          resolve_timeout: 10m
        route:
          group_by: ['job']
          group_wait: 30s
          group_interval: 5m
          repeat_interval: 12h
          receiver: 'null'
        receivers:
        - name: 'null'
5. Verify that the configuration was applied to the secret:
6. To verify that the configuration has been loaded into Alertmanager, create a pod with access to curl:
7. Run curl against the alertmanager-operated service to retrieve the status and configYAML contents, and verify that the supplied configuration matches the configuration loaded into Alertmanager:
8. Verify that the configYAML field contains the expected changes. Exit from the pod:
9. To clean up the environment, delete the curl pod:
$ oc get secret alertmanager-stf-default -o go-template='{{index .data "alertmanager.yaml" | base64decode }}'

global:
  resolve_timeout: 10m
route:
  group_by: ['job']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h
  receiver: 'null'
receivers:
- name: 'null'
oc run curl --generator=run-pod/v1 --image=radial/busyboxplus:curl -i --tty
[ root@curl:/ ]$ curl alertmanager-operated:9093/api/v1/status
{"status":"success","data":{"configYAML":"global:\n  resolve_timeout: 10m\n  http_config: {}\n  smtp_hello: localhost\n  smtp_require_tls: true\n  pagerduty_url: https://events.pagerduty.com/v2/enqueue\n  hipchat_api_url: https://api.hipchat.com/\n  opsgenie_api_url: https://api.opsgenie.com/\n  wechat_api_url: https://qyapi.weixin.qq.com/cgi-bin/\n  victorops_api_url: https://alert.victorops.com/integrations/generic/20131114/alert/\nroute:\n  receiver: \"null\"\n  group_by:\n  - job\n  group_wait: 30s\n  group_interval: 5m\n  repeat_interval: 12h\nreceivers:\n- name: \"null\"\ntemplates: []\n"}}
[ root@curl:/ ]$ exit
$ oc delete pod curl

pod "curl" deleted
Additional resources
For more information about the Red Hat OpenShift Container Platform secret and the Prometheus operator, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md
4.3. HIGH AVAILABILITY
High availability is the ability of Service Telemetry Framework (STF) to rapidly recover from failures in its component services. Although Red Hat OpenShift Container Platform (OCP) restarts a failed pod if nodes are available to schedule the workload, this recovery process might take more than one minute, during which time events and metrics are lost. A high availability configuration includes multiple copies of STF components, which reduces recovery time to approximately 2 seconds. To protect against failure of an OCP node, deploy STF to an OCP cluster with three or more nodes.
NOTE
STF is not yet a fully fault tolerant system. Delivery of metrics and events during the recovery period is not guaranteed.
Enabling high availability has the following effects:
   Two AMQ Interconnect pods run instead of the default 1.
   Three ElasticSearch pods run instead of the default 1.
   Recovery time from a lost pod in either of these services reduces to approximately 2 seconds.
4.3.1. Configuring high availability
To configure STF for high availability, add highAvailabilityEnabled: true to the ServiceTelemetry object in OCP. You can set this parameter at installation time or, if you already deployed STF, complete the following steps.
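For example, a hedged sketch of enabling high availability at installation time, assuming the core spec options from Section 2.3.10, "Creating a ServiceTelemetry object in OCP":

apiVersion: infra.watch/v1alpha1
kind: ServiceTelemetry
metadata:
  name: stf-default
  namespace: service-telemetry
spec:
  eventsEnabled: true
  metricsEnabled: true
  highAvailabilityEnabled: true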
Procedure
1. Log in to Red Hat OpenShift Container Platform.
2. Change to the service-telemetry namespace:
3. Use the oc command to edit the ServiceTelemetry object:
4. Add highAvailabilityEnabled: true to the spec section:
5. Save your changes and close the object.
oc project service-telemetry
$ oc edit ServiceTelemetry
spec:
  eventsEnabled: true
  metricsEnabled: true
  highAvailabilityEnabled: true
4.4. DASHBOARDS
Use the third-party application Grafana to visualize system-level metrics gathered by collectd for each individual host node. For more information about configuring collectd, see Section 3.3, "Configuring Red Hat OpenStack Platform overcloud for Service Telemetry Framework".
4.4.1. Setting up Grafana to host the dashboard
Grafana is not included in the default Service Telemetry Framework (STF) deployment, so you must deploy the Grafana Operator from OperatorHub.io.
Procedure
1. Log in to Red Hat OpenShift Container Platform.
2. Change to the service-telemetry namespace:
3. Clone the dashboard repository:
4. Deploy the Grafana operator:
5. To verify that the operator launched successfully, run the oc get csv command. If the value of the PHASE column is Succeeded, the operator launched successfully:
6. Launch a Grafana instance:
7. Verify that the Grafana instance deployed:
8. Create the datasource and dashboard resources:
oc project service-telemetry
git clone https://github.com/infrawatch/dashboards
cd dashboards
oc create -f deploy/subscription.yaml
$ oc get csv
NAME                      DISPLAY            VERSION   REPLACES   PHASE
grafana-operator.v3.2.0   Grafana Operator   3.2.0                Succeeded
$ oc create -f deploy/grafana.yaml
$ oc get pod -l app=grafana
NAME                                  READY   STATUS    RESTARTS   AGE
grafana-deployment-7fc7848b56-sbkhv   1/1     Running   0          1m
9. Verify that the resources installed correctly:
10. Navigate to https://<grafana-route-address> in a web browser. Use the oc get routes command to retrieve the Grafana route address:
11. To view the dashboard, click Dashboards and Manage.
Additional resources
For more information about enabling the OperatorHub.io catalog source, see Section 2.3.5, "Enabling the OperatorHub.io Community Catalog Source".
4.4.1.1. Viewing and editing queries
Procedure
1. Log in to Red Hat OpenShift Container Platform. To view and edit queries, log in as the admin user.
2. Change to the service-telemetry namespace:
3. To retrieve the default username and password, describe the Grafana object using oc describe:
4.4.2. The Grafana infrastructure dashboard
The infrastructure dashboard shows metrics for a single node at a time. Select a node from the upper left corner of the dashboard.
4.4.2.1. Top panels
Title    Unit    Description
oc create -f deploy/datasource.yaml -f deploy/rhos-dashboard.yaml
$ oc get grafanadashboards

NAME             AGE
rhos-dashboard   7d21h

$ oc get grafanadatasources

NAME                                  AGE
service-telemetry-grafanadatasource   1m
oc get routes
oc project service-telemetry
oc describe grafana service-telemetry-grafana
Current Global Alerts    -    Current alerts fired by Prometheus
Recent Global Alerts    -    Recently fired alerts in 5m time steps
Status Panel    -    Node status: up, down, unavailable
Uptime    s/m/h/d/M/Y    Total operational time of node
CPU Cores    cores    Total number of cores
Memory    bytes    Total memory
Disk Size    bytes    Total storage size
Processes    processes    Total number of processes listed by type
Load Average    processes    Load average represents the average number of running and uninterruptible processes residing in the kernel execution queue
4.4.2.2. Networking panels
Panels that display the network interfaces of the node.
Panel    Unit    Description
Physical Interfaces Ingress Errors    errors    Total errors with incoming data
Physical Interfaces Egress Errors    errors    Total errors with outgoing data
Physical Interfaces Ingress Error Rates    errors/s    Rate of incoming data errors
Physical Interfaces Egress Error Rates    errors/s    Rate of outgoing data errors
Physical Interfaces Packets Ingress    pps    Incoming packets per second
Physical Interfaces Packets Egress    pps    Outgoing packets per second
Physical Interfaces Data Ingress    bytes/s    Incoming data rates
Physical Interfaces Data Egress    bytes/s    Outgoing data rates
Physical Interfaces Drop Rate Ingress    pps    Incoming packets drop rate
Physical Interfaces Drop Rate Egress    pps    Outgoing packets drop rate
4.4.2.3. CPU panels
Panels that display CPU usage of the node.
Panel    Unit    Description
Current CPU Usage    percent    Instantaneous usage at the time of the last query
Aggregate CPU Usage    percent    Average non-idle CPU activity of all cores on a node
Aggr. CPU Usage by Type    percent    Shows time spent for each type of thread, averaged across all cores
4.4.2.4. Memory panels
Panels that display memory usage on the node.
Panel    Unit    Description
Memory Used    percent    Amount of memory being used at time of last query
Huge Pages Used    hugepages    Number of hugepages being used
Memory
4.4.2.5. Disk/file system
Panels that display space used on disk.
Panel    Unit    Description    Notes
Disk Space Usage    percent    Total disk use at time of last query
Inode Usage    percent    Total inode use at time of last query
Aggregate Disk Space Usage    bytes    Total disk space used and reserved    Because this query relies on the df plugin, temporary file systems that do not necessarily use disk space are included in the results. The query tries to filter out most of these, but it might not be exhaustive.
Disk Traffic    bytes/s    Shows rates for both reading and writing
Disk Load    percent    Approximate percentage of total disk bandwidth being used. The weighted I/O time series includes the backlog that might be accumulating. For more information, see the collectd disk plugin docs.
Operations/s    ops/s    Operations done per second
Average I/O Operation Time    seconds    Average time each I/O operation took to complete. This average is not accurate, see the collectd disk plugin docs.
4.5. CONFIGURING MULTIPLE CLOUDS
You can configure multiple Red Hat OpenStack Platform clouds to target a single instance of Service Telemetry Framework (STF):
1. Plan the AMQP address prefixes that you want to use for each cloud. For more information, see Section 4.5.1, "Planning AMQP address prefixes".
2. Deploy metrics and events consumer Smart Gateways for each cloud to listen on the corresponding address prefixes. For more information, see Section 4.5.2, "Deploying Smart Gateways".
3. Configure each cloud to send its metrics and events to STF on the correct address. For more information, see Section 4.5.3, "Creating the OpenStack environment file".
Figure 4.1. Two Red Hat OpenStack Platform clouds connect to STF
4.5.1. Planning AMQP address prefixes
By default, Red Hat OpenStack Platform nodes get data through two data collectors: collectd and Ceilometer. These components send telemetry data, or notifications, to the respective AMQP addresses, for example, collectd/telemetry, where STF Smart Gateways listen on those addresses for monitoring data.
To support multiple clouds, and to identify which cloud generated the monitoring data, configure each cloud to send data to a unique address. Prefix a cloud identifier to the second part of the address. The following list shows some example addresses and identifiers:
collectd/cloud1-telemetry
collectd/cloud1-notify
anycast/ceilometer/cloud1-event.sample
collectd/cloud2-telemetry
collectd/cloud2-notify
anycast/ceilometer/cloud2-event.sample
collectd/us-east-1-telemetry
collectd/us-west-3-telemetry
4.5.2. Deploying Smart Gateways
You must deploy a Smart Gateway for each of the data collection types for each cloud: one for collectd metrics, one for collectd events, and one for Ceilometer events. Configure each of the Smart Gateways to listen on the AMQP address that you define for the corresponding cloud.
When you deploy STF for the first time, Smart Gateway manifests are created that define the initial Smart Gateways for a single cloud. When you deploy Smart Gateways for multiple cloud support, you deploy multiple Smart Gateways for each of the data collection types that handle the metrics and the events data for each cloud. The initial Smart Gateways act as a template to create additional Smart Gateways, along with any authentication information required to connect to the data stores.
Procedure
1. Log in to Red Hat OpenShift Container Platform.
2. Change to the service-telemetry namespace:
oc project service-telemetry
3. Use the initially deployed Smart Gateways as a template for additional Smart Gateways. List the currently deployed Smart Gateways with the oc get smartgateways command. For example, if you deployed STF with metricsEnabled: true and eventsEnabled: true, the following Smart Gateways are displayed in the output:
$ oc get smartgateways
NAME                                  AGE
stf-default-ceilometer-notification   14d
stf-default-collectd-notification     14d
stf-default-collectd-telemetry        14d
4. Retrieve the manifests for each Smart Gateway and store the contents in a temporary file, which you can modify later and use to create the new set of Smart Gateways:
truncate --size 0 /tmp/cloud1-smartgateways.yaml && \
for sg in $(oc get smartgateways -oname)
do
  echo --- >> /tmp/cloud1-smartgateways.yaml
  oc get $sg -oyaml --export >> /tmp/cloud1-smartgateways.yaml
done
5. Modify the Smart Gateway manifests in the /tmp/cloud1-smartgateways.yaml file. Adjust the metadata.name and spec.amqpUrl fields to include the cloud identifier from your schema. For more information, see Section 4.5.1, "Planning AMQP address prefixes". To view example Smart Gateway manifests, see Section 4.5.2.1, "Example manifests". A sed sketch for this step follows this procedure.
6. Deploy your new Smart Gateways:
oc apply -f /tmp/cloud1-smartgateways.yaml
7. Verify that each Smart Gateway is running. This can take several minutes, depending on the number of Smart Gateways:
oc get po -l app=smart-gateway
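A hedged sed sketch for step 5, appending the cloud1 identifier to the object names and AMQP addresses; review the result before applying, because the patterns assume the default stf-default names and addresses:

$ sed -i \
    -e 's|name: stf-default-\(.*\)|name: stf-default-\1-cloud1|' \
    -e 's|5672/collectd/|5672/collectd/cloud1-|' \
    -e 's|5672/anycast/ceilometer/|5672/anycast/ceilometer/cloud1-|' \
    /tmp/cloud1-smartgateways.yaml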
4.5.2.1. Example manifests
IMPORTANT
The content in the following examples might be different to the file content in your deployment. Copy the manifests in your deployment.
Ensure that the name and amqpUrl parameters of each Smart Gateway match the names that you want to use for your clouds. For more information, see Section 4.5.1, "Planning AMQP address prefixes".
NOTE
Your output might have some additional metadata parameters that you can remove from the manifests that you load into OCP.
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-ceilometer-notification-cloud1 1
spec:
  amqpDataSource: ceilometer
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/anycast/ceilometer/cloud1-event.sample 2
  debug: false
  elasticPass: fkzfhghw
  elasticUrl: https://elasticsearch-es-http.service-telemetry.svc.cluster.local:9200
  elasticUser: elastic
  resetIndex: false
  serviceType: events
  size: 1
  tlsCaCert: /config/certs/ca.crt
  tlsClientCert: /config/certs/tls.crt
  tlsClientKey: /config/certs/tls.key
  tlsServerName: elasticsearch-es-http.service-telemetry.svc.cluster.local
  useBasicAuth: true
  useTls: true
---
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-collectd-notification-cloud1 3
spec:
  amqpDataSource: collectd
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/cloud1-notify 4
  debug: false
  elasticPass: fkzfhghw
  elasticUrl: https://elasticsearch-es-http.service-telemetry.svc.cluster.local:9200
  elasticUser: elastic
  resetIndex: false
  serviceType: events
  size: 1
  tlsCaCert: /config/certs/ca.crt
  tlsClientCert: /config/certs/tls.crt
  tlsClientKey: /config/certs/tls.key
  tlsServerName: elasticsearch-es-http.service-telemetry.svc.cluster.local
  useBasicAuth: true
  useTls: true
---
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-collectd-telemetry-cloud1 5
spec:
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/cloud1-telemetry 6
  debug: false
  prefetch: 15000
  serviceType: metrics
  size: 1
  useTimestamp: true
1  Name for Ceilometer notifications for cloud1
2  AMQP address for Ceilometer notifications for cloud1
3  Name for collectd notifications for cloud1
4  AMQP address for collectd notifications for cloud1
5  Name for collectd telemetry for cloud1
6  AMQP address for collectd telemetry for cloud1
4.5.3. Creating the OpenStack environment file
To label traffic according to the cloud of origin, you must create a configuration with cloud-specific instance names. Create an stf-connectors.yaml file and adjust the values of CeilometerQdrEventsConfig and CollectdAmqpInstances to match the AMQP address prefix scheme. For more information, see Section 4.5.1, "Planning AMQP address prefixes".
WARNING
Remove the enable-stf.yaml and ceilometer-write-qdr.yaml environment file references from your overcloud deployment. This configuration is redundant and results in duplicate information being sent from each cloud node.
Procedure
1. Create the stf-connectors.yaml file and modify it to match the AMQP address that you want for this cloud deployment:
resource_registry:
  OS::TripleO::Services::Collectd: /usr/share/openstack-tripleo-heat-templates/deployment/metrics/collectd-container-puppet.yaml
  OS::TripleO::Services::MetricsQdr: /usr/share/openstack-tripleo-heat-templates/deployment/metrics/qdr-container-puppet.yaml
  OS::TripleO::Services::CeilometerAgentCentral: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-central-container-puppet.yaml
  OS::TripleO::Services::CeilometerAgentNotification: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-notification-container-puppet.yaml
  OS::TripleO::Services::CeilometerAgentIpmi: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-ipmi-container-puppet.yaml
  OS::TripleO::Services::ComputeCeilometerAgent: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-compute-container-puppet.yaml
  OS::TripleO::Services::Redis: /usr/share/openstack-tripleo-heat-templates/deployment/database/redis-pacemaker-puppet.yaml

parameter_defaults:
  EnableSTF: true

  EventPipelinePublishers: []
  CeilometerEnablePanko: false
  CeilometerQdrPublishEvents: true
  CeilometerQdrEventsConfig:
    driver: amqp
    topic: cloud1-event 1

  CollectdConnectionType: amqp1
  CollectdAmqpInterval: 5
  CollectdDefaultPollingInterval: 5

  CollectdAmqpInstances:
    cloud1-notify: 2
      notify: true
      format: JSON
      presettle: false
    cloud1-telemetry: 3
      format: JSON
      presettle: true

  MetricsQdrAddresses:
    - prefix: collectd
      distribution: multicast
    - prefix: anycast/ceilometer
      distribution: multicast

  MetricsQdrSSLProfiles:
    - name: sslProfile

  MetricsQdrConnectors:
    - host: stf-default-interconnect-5671-service-telemetry.apps.infra.watch 4
      port: 443
      role: edge
      verifyHostname: false
      sslProfile: sslProfile
1  Define the topic for Ceilometer events. This value is the address format of anycast/ceilometer/cloud1-event.sample.
2  Define the topic for collectd events. This value is the format of collectd/cloud1-notify.
3  Define the topic for collectd metrics. This value is the format of collectd/cloud1-telemetry.
4  Adjust the MetricsQdrConnectors host to the address of the STF route.
   Ensure that the naming convention in the stf-connectors.yaml file aligns with the spec.amqpUrl field in the Smart Gateway configuration. For example, configure the CeilometerQdrEventsConfig.topic field to a value of cloud1-event. See the excerpt after this procedure for a side-by-side view.
2. Save the file in a directory for custom environment files, for example, /home/stack/custom_templates/.
3. Source the authentication file:
4. Include the stf-connectors.yaml file in the overcloud deployment command, along with any other environment files relevant to your environment:
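For reference, a hedged excerpt showing how the two sides align; the overcloud topic cloud1-event expands to the address that the cloud1 Ceilometer events Smart Gateway listens on:

# stf-connectors.yaml (overcloud side)
CeilometerQdrEventsConfig:
  driver: amqp
  topic: cloud1-event

# Smart Gateway manifest (STF side)
amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/anycast/ceilometer/cloud1-event.sample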
Additional resources
For information about validating the deployment, see Section 3.3.3, "Validating client-side installation".
4.5.4. Querying metrics data from multiple clouds
Data stored in Prometheus has a service label attached according to the Smart Gateway it was scraped from. You can use this label to query data from a specific cloud.
To query data from a specific cloud, use a Prometheus PromQL query that matches the associated service label, for example, collectd_uptime{service="stf-default-collectd-telemetry-cloud1-smartgateway"}.
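For example, a hedged sketch of running this query against the Prometheus API from a pod with access to curl, as in Section 4.2.1; the -G and --data-urlencode options are standard curl flags:

[ root@curl:/ ]$ curl -G 'http://prometheus-operated:9090/api/v1/query' \
    --data-urlencode 'query=collectd_uptime{service="stf-default-collectd-telemetry-cloud1-smartgateway"}'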
[stack@undercloud-0 ~]$ source stackrc

(undercloud) [stack@undercloud-0 ~]$
(undercloud) [stack@undercloud-0 ~]$ openstack overcloud deploy --templates /usr/share/openstack-tripleo-heat-templates \
  -e /home/stack/custom_templates/stf-connectors.yaml \
  ...
4.6. EPHEMERAL STORAGE
Use ephemeral storage to run Service Telemetry Framework (STF) without persistently storing data in your Red Hat OpenShift Container Platform (OCP) cluster. Ephemeral storage is not recommended in a production environment due to the volatility of the data in the platform when operating correctly and as designed. For example, restarting a pod or rescheduling the workload to another node results in the loss of any local data written since the pod started.
If you enable ephemeral storage in STF, the Service Telemetry Operator does not add the relevant storage sections to the data storage components manifests.
4.6.1. Configuring ephemeral storage
To configure STF for ephemeral storage, add storageEphemeralEnabled: true to the ServiceTelemetry object in OCP. You can add storageEphemeralEnabled: true at installation time or, if you already deployed STF, complete the following steps.
Procedure
1. Log in to Red Hat OpenShift Container Platform.
2. Change to the service-telemetry namespace:
3. Edit the ServiceTelemetry object:
4. Add the storageEphemeralEnabled: true parameter to the spec section:
5. Save your changes and close the object.
oc project service-telemetry
$ oc edit ServiceTelemetry stf-default
spec:
  eventsEnabled: true
  metricsEnabled: true
  storageEphemeralEnabled: true
APPENDIX A. COLLECTD PLUG-INS
This section contains a complete list of collectd plug-ins and configurations for Red Hat OpenStack Platform 16.0.
collectd-aggregation
collectd::plugin::aggregation::aggregators
collectd::plugin::aggregation::interval
collectd-amqp1
collectd-apache
collectd::plugin::apache::instances (ex: localhost ⇒ url ⇒ http://localhost/mod_status?auto)
collectd::plugin::apache::interval
collectd-apcups
collectd-battery
collectd::plugin::battery::values_percentage
collectd::plugin::battery::report_degraded
collectd::plugin::battery::query_state_fs
collectd::plugin::battery::interval
collectd-ceph
collectd::plugin::ceph::daemons
collectd::plugin::ceph::longrunavglatency
collectd::plugin::ceph::convertspecialmetrictypes
collectd-cgroups
collectd::plugin::cgroups::ignore_selected
collectd::plugin::cgroups::interval
collectd-conntrack
None
collectd-contextswitch
collectd::plugin::contextswitch::interval
collectd-cpu
collectd::plugin::cpu::reportbystate
collectd::plugin::cpu::reportbycpu
collectd::plugin::cpu::valuespercentage
collectd::plugin::cpu::reportnumcpu
collectd::plugin::cpu::reportgueststate
collectd::plugin::cpu::subtractgueststate
collectd::plugin::cpu::interval
collectd-cpufreq
None
collectd-cpusleep
collectd-csv
collectd::plugin::csv::datadir
collectd::plugin::csv::storerates
collectd::plugin::csv::interval
collectd-df
collectd::plugin::df::devices
collectd::plugin::df::fstypes
collectd::plugin::df::ignoreselected
collectd::plugin::df::mountpoints
collectd::plugin::df::reportbydevice
collectd::plugin::df::reportinodes
collectd::plugin::df::reportreserved
collectd::plugin::df::valuesabsolute
collectd::plugin::df::valuespercentage
collectd::plugin::df::interval
collectd-disk
collectd::plugin::disk::disks
collectd::plugin::disk::ignoreselected
collectd::plugin::disk::udevnameattr
collectd::plugin::disk::interval
collectd-entropy
collectd::plugin::entropy::interval
collectd-ethstat
collectd::plugin::ethstat::interfaces
collectd::plugin::ethstat::maps
collectd::plugin::ethstat::mappedonly
collectd::plugin::ethstat::interval
collectd-exec
collectd::plugin::exec::commands
collectd::plugin::exec::commands_defaults
collectd::plugin::exec::globals
collectd::plugin::exec::interval
collectd-fhcount
collectd::plugin::fhcount::valuesabsolute
collectd::plugin::fhcount::valuespercentage
collectd::plugin::fhcount::interval
collectd-filecount
collectd::plugin::filecount::directories
collectd::plugin::filecount::interval
collectd-fscache
None
collectd-hddtemp
collectd::plugin::hddtemp::host
collectd::plugin::hddtemp::port
collectd::plugin::hddtemp::interval
collectd-hugepages
collectd::plugin::hugepages::report_per_node_hp
collectd::plugin::hugepages::report_root_hp
collectd::plugin::hugepages::values_pages
collectd::plugin::hugepages::values_bytes
collectd::plugin::hugepages::values_percentage
collectd::plugin::hugepages::interval
collectd-intel_rdt
collectd-interface
collectd::plugin::interface::interfaces
collectd::plugin::interface::ignoreselected
collectd::plugin::interface::reportinactive
collectd::plugin::interface::interval
collectd-ipc
None
collectd-ipmi
collectd::plugin::ipmi::ignore_selected
collectd::plugin::ipmi::notify_sensor_add
collectd::plugin::ipmi::notify_sensor_remove
collectd::plugin::ipmi::notify_sensor_not_present
collectd::plugin::ipmi::sensors
collectd::plugin::ipmi::interval
collectd-irq
collectd::plugin::irq::irqs
collectd::plugin::irq::ignoreselected
collectd::plugin::irq::interval
collectd-load
collectd::plugin::load::report_relative
collectd::plugin::load::interval
collectd-logfile
collectd::plugin::logfile::log_level
collectd::plugin::logfile::log_file
collectd::plugin::logfile::log_timestamp
collectd::plugin::logfile::print_severity
collectd::plugin::logfile::interval
collectd-madwifi
collectd-mbmon
collectd-md
collectd-memcached
collectd::plugin::memcached::instances
collectd::plugin::memcached::interval
collectd-memory
collectd::plugin::memory::valuesabsolute
collectd::plugin::memory::valuespercentage
collectd::plugin::memory::interval
collectd-multimeter
collectd-mysql
collectd::plugin::mysql::interval
collectd-netlink
collectd::plugin::netlink::interfaces
collectd::plugin::netlink::verboseinterfaces
collectd::plugin::netlink::qdiscs
collectd::plugin::netlink::classes
collectd::plugin::netlink::filters
collectd::plugin::netlink::ignoreselected
collectd::plugin::netlink::interval
collectd-network
collectd::plugin::network::timetolive
collectd::plugin::network::maxpacketsize
collectd::plugin::network::forward
collectd::plugin::network::reportstats
collectd::plugin::network::listeners
collectd::plugin::network::servers
collectd::plugin::network::interval
collectd-nfs
collectd::plugin::nfs::interval
collectd-ntpd
collectd::plugin::ntpd::host
collectd::plugin::ntpd::port
collectd::plugin::ntpd::reverselookups
collectd::plugin::ntpd::includeunitid
collectd::plugin::ntpd::interval
collectd-numa
None
collectd-olsrd
collectd-openvpn
collectd::plugin::openvpn::statusfile
collectd::plugin::openvpn::improvednamingschema
collectd::plugin::openvpn::collectcompression
collectd::plugin::openvpn::collectindividualusers
collectd::plugin::openvpn::collectusercount
collectd::plugin::openvpn::interval
collectd-ovs_events
collectd::plugin::ovs_events::address
collectd::plugin::ovs_events::dispatch
collectd::plugin::ovs_events::interfaces
collectd::plugin::ovs_events::send_notification
collectd::plugin::ovs_events::$port
collectd::plugin::ovs_events::socket
collectd-ovs_stats
collectd::plugin::ovs_stats::address
collectd::plugin::ovs_stats::bridges
collectd::plugin::ovs_stats::port
collectd::plugin::ovs_stats::socket
collectd-ping
collectd::plugin::ping::hosts
collectd::plugin::ping::timeout
collectd::plugin::ping::ttl
collectd::plugin::ping::source_address
collectd::plugin::ping::device
collectd::plugin::ping::max_missed
collectd::plugin::ping::size
collectd::plugin::ping::interval
collectd-powerdns
collectd::plugin::powerdns::interval
collectd::plugin::powerdns::servers
collectd::plugin::powerdns::recursors
collectd::plugin::powerdns::local_socket
collectd-processes
collectd::plugin::processes::processes
collectd::plugin::processes::process_matches
collectd::plugin::processes::collect_context_switch
collectd::plugin::processes::collect_file_descriptor
collectd::plugin::processes::collect_memory_maps
collectd-protocols
collectd::plugin::protocols::ignoreselected
collectd::plugin::protocols::values
collectd-python
collectd-serial
collectd-smart
collectd::plugin::smart::disks
collectd::plugin::smart::ignoreselected
collectd::plugin::smart::interval
collectd-snmp_agent
collectd-statsd
collectd::plugin::statsd::host
collectd::plugin::statsd::port
collectd::plugin::statsd::deletecounters
collectd::plugin::statsd::deletetimers
collectd::plugin::statsd::deletegauges
collectd::plugin::statsd::deletesets
collectd::plugin::statsd::countersum
collectd::plugin::statsd::timerpercentile
collectd::plugin::statsd::timerlower
collectd::plugin::statsd::timerupper
collectd::plugin::statsd::timersum
collectd::plugin::statsd::timercount
collectd::plugin::statsd::interval
collectd-swap
collectd::plugin::swap::reportbydevice
collectd::plugin::swap::reportbytes
collectd::plugin::swap::valuesabsolute
collectd::plugin::swap::valuespercentage
collectd::plugin::swap::reportio
collectd::plugin::swap::interval
collectd-syslog
collectd::plugin::syslog::log_level
collectd::plugin::syslog::notify_level
collectd::plugin::syslog::interval
collectd-table
collectd::plugin::table::tables
collectd::plugin::table::interval
collectd-tail
collectd::plugin::tail::files
collectd::plugin::tail::interval
collectd-tail_csv
collectd::plugin::tail_csv::metrics
collectd::plugin::tail_csv::files
collectd-tcpconns
collectd::plugin::tcpconns::localports
collectd::plugin::tcpconns::remoteports
collectd::plugin::tcpconns::listening
collectd::plugin::tcpconns::allportssummary
collectd::plugin::tcpconns::interval
collectd-ted
collectd-thermal
collectd::plugin::thermal::devices
collectd::plugin::thermal::ignoreselected
collectd::plugin::thermal::interval
collectd-threshold
collectd::plugin::threshold::types
collectd::plugin::threshold::plugins
collectd::plugin::threshold::hosts
collectd::plugin::threshold::interval
collectd-turbostat
collectd::plugin::turbostat::core_c_states
collectd::plugin::turbostat::package_c_states
collectd::plugin::turbostat::system_management_interrupt
collectd::plugin::turbostat::digital_temperature_sensor
collectd::plugin::turbostat::tcc_activation_temp
collectd::plugin::turbostat::running_average_power_limit
collectd::plugin::turbostat::logical_core_names
collectd-unixsock
collectd-uptime
collectd::plugin::uptime::interval
collectd-users
collectd::plugin::users::interval
collectd-uuid
collectd::plugin::uuid::uuid_file
collectd::plugin::uuid::interval
collectd-virt
collectd::plugin::virt::connection
collectd::plugin::virt::refresh_interval
collectd::plugin::virt::domain
collectd::plugin::virt::block_device
collectd::plugin::virt::interface_device
collectd::plugin::virt::ignore_selected
collectd::plugin::virt::hostname_format
collectd::plugin::virt::interface_format
collectd::plugin::virt::extra_stats
collectd::plugin::virt::interval
collectd-vmem
collectd::plugin::vmem::verbose
collectd::plugin::vmem::interval
collectd-vserver
collectd-wireless
collectd-write_graphite
collectd::plugin::write_graphite::carbons
collectd::plugin::write_graphite::carbon_defaults
collectd::plugin::write_graphite::globals
collectd-write_kafka
collectd::plugin::write_kafka::kafka_host
collectd::plugin::write_kafka::kafka_port
collectd::plugin::write_kafka::kafka_hosts
collectd::plugin::write_kafka::topics
collectd-write_log
collectd::plugin::write_log::format
collectd-zfs_arc
None
- Table of Contents
- CHAPTER 1 INTRODUCTION TO SERVICE TELEMETRY FRAMEWORK
-
- 11 SERVICE TELEMETRY FRAMEWORK ARCHITECTURE
- 12 INSTALLATION SIZE
-
- CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK
-
- 21 THE CORE COMPONENTS OF STF
- 22 PREPARING YOUR OCP ENVIRONMENT FOR STF
-
- 221 Persistent volumes
-
- 2211 Using ephemeral storage
-
- 222 Resource allocation
- 223 Node tuning operator
-
- 23 DEPLOYING STF TO THE OCP ENVIRONMENT
-
- 231 Deploying STF to the OCP environment with ElasticSearch
- 232 Deploying STF to the OCP environment without ElasticSearch
- 233 Creating a namespace
- 234 Creating an OperatorGroup
- 235 Enabling the OperatorHubio Community Catalog Source
- 236 Enabling Red Hat STF Operator Source
- 237 Subscribing to the AMQ Certificate Manager Operator
- 238 Subscribing to the Elastic Cloud on Kubernetes Operator
- 239 Subscribing to the Service Telemetry Operator
- 2310 Creating a ServiceTelemetry object in OCP
-
- 24 REMOVING STF FROM THE OCP ENVIRONMENT
-
- 241 Deleting the namespace
- 242 Removing the OperatorSource
-
- CHAPTER 3 COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION
-
- 31 CONNECTING RED HAT OPENSTACK PLATFORM TO SERVICE TELEMETRY FRAMEWORK
- 32 DEPLOYING TO NON-STANDARD NETWORK TOPOLOGIES
- 33 CONFIGURING RED HAT OPENSTACK PLATFORM OVERCLOUD FOR SERVICE TELEMETRY FRAMEWORK
-
- 331 Retrieving the AMQ Interconnect route address
- 332 Configuring the STF connection for the overcloud
- 333 Validating client-side installation
-
- CHAPTER 4 ADVANCED FEATURES
-
- 41 CUSTOMIZING THE DEPLOYMENT
-
- 411 Manifest override parameters
- 412 Overriding a managed manifest
-
- Procedure
-
- 42 ALERTS
-
- Additional resources
- 421 Creating an alert rule in Prometheus
-
- Procedure
- Additional resources
-
- 422 Configuring custom alerts
-
- Procedure
- Additional resources
-
- 423 Creating an alert route in Alertmanager
-
- Procedure
- Additional resources
-
- 43 HIGH AVAILABILITY
-
- 431 Configuring high availability
-
- Procedure
-
- 44 DASHBOARDS
-
- 441 Setting up Grafana to host the dashboard
-
- 4411 Viewing and editing queries
-
- 442 The Grafana infrastructure dashboard
-
- 4421 Top panels
- 4422 Networking panels
- 4423 CPU panels
- 4424 Memory panels
- 4425 Diskfile system
-
- 45 CONFIGURING MULTIPLE CLOUDS
-
- 451 Planning AMQP address prefixes
- 452 Deploying Smart Gateways
-
- Procedure
- 4521 Example manifests
-
- 453 Creating the OpenStack environment file
-
- Procedure
- Additional resources
-
- 454 Querying metrics data from multiple clouds
-
- 46 EPHEMERAL STORAGE
-
- 461 Configuring ephemeral storage
-
- APPENDIX A COLLECTD PLUG-INS
-
the ceilometer-write-qdryaml file that ensures that Ceilometer telemetry is sent to STF
5 Deploy the Red Hat OpenStack Platform overcloud
333 Validating client-side installation
To validate data collection from the STF storage domain query the data sources for delivereddata To validate individual nodes in the Red Hat OpenStack Platform deployment connect to theconsole using SSH
Procedure
1 Log in to an overcloud node for example controller-0
2 Ensure that metrics_qdr container is running on the node
3 Return the internal network address on which AMQ Interconnect is running for example 17217144 listening on port 5666
4 Return a list of connections to the local AMQ Interconnect
openstack overcloud deploy ltother argumentsgt --templates usrshareopenstack-tripleo-heat-templates --environment-file ltother-environment-filesgt --environment-file usrshareopenstack-tripleo-heat-templatesenvironmentsmetricsceilometer-write-qdryaml --environment-file usrshareopenstack-tripleo-heat-templatesenvironmentsenable-stfyaml --environment-file homestackstf-connectorsyaml
$ sudo podman container inspect --format StateStatus metrics_qdr
running
$ sudo podman exec -it metrics_qdr cat etcqpid-dispatchqdrouterdconf
listener host 17217144 port 5666 authenticatePeer no saslMechanisms ANONYMOUS
$ sudo podman exec -it metrics_qdr qdstat --bus=172171445666 --connections
Connections id host container role dir security authentication tenant ============================================================================================================================================================================================================================================================================================ 1 stf-default-interconnect-5671-service-telemetryappsinfrawatch443 stf-default-
Red Hat OpenStack Platform 160 Service Telemetry Framework
22
There are four connections
Outbound connection to STF
Inbound connection from collectd
Inbound connection from ceilometer
Inbound connection from our qdstat clientThe outbound STF connection is provided to the MetricsQdrConnectors hostparameter and is the route for the STF storage domain The other hosts are internalnetwork addresses of the client connections to this AMQ Interconnect
5 To ensure that messages are being delivered list the links and view the _edge address inthe deliv column for delivery of messages
interconnect-7458fd4d69-bgzfb edge out TLSv12(DHE-RSA-AES256-GCM-SHA384) anonymous-user 12 1721714460290 openstackorgomcontainercontroller-0ceilometer-agent-notification255c02cee550f143ec9ea030db5cccba14 normal in no-security no-auth 16 1721714436408 metrics normal in no-security anonymous-user 899 1721714439500 10a2e99d-1b8a-4329-b48c-4335e5f75c84 normal in no-security no-auth
$ sudo podman exec -it metrics_qdr qdstat --bus=172171445666 --linksRouter Links type dir conn id id peer class addr phs cap pri undel unsett deliv presett psdrop acc rej rel mod delay rate =========================================================================================================================================================== endpoint out 1 5 local _edge 250 0 0 0 2979926 2979924 0 0 0 2 0 0 0 endpoint in 1 6 250 0 0 0 0 0 0 0 0 0 0 0 0 endpoint in 1 7 250 0 0 0 0 0 0 0 0 0 0 0 0 endpoint out 1 8 250 0 0 0 0 0 0 0 0 0 0 0 0 endpoint in 1 9 250 0 0 0 0 0 0 0 0 0 0 0 0 endpoint out 1 10 250 0 0 0 911 911 0 0 0 0 0 911 0 endpoint in 1 11 250 0 0 0 0 911 0 0 0 0 0 0 0 endpoint out 12 32 local templSY6Mcicol4J2Kp 250 0 0 0 0 0 0 0 0 0 0 0 0 endpoint in 16 41 250 0 0 0 2979924 2979924 0 0 0 0 0 0 0 endpoint in 912 1834 mobile $management 0 250 0 0 0 1
CHAPTER 3 COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION
23
6 To list the addresses from Red Hat OpenStack Platform nodes to STF connect to OCP toget the AMQ Interconnect pod name and list the connections List the availableAMQ Interconnect pods
7 Connect to the pod and run the qdstat --connections command to list the knownconnections
In this example there are three edge connections from the Red Hat OpenStack Platformnodes with connection id 22 23 and 24
8 To view the number of messages delivered by the network use each address with the oc exec command
0 0 1 0 0 0 0 0 endpoint out 912 1835 local temp9Ok2resI9tmt+CT 250 0 0 0 0 0 0 0 0 0 0 0 0
$ oc get pods -l application=stf-default-interconnect
NAME READY STATUS RESTARTS AGEstf-default-interconnect-7458fd4d69-bgzfb 11 Running 0 6d21h
$ oc exec -it stf-default-interconnect-7458fd4d69-bgzfb -- qdstat --connections
2020-04-21 182547243852 UTCstf-default-interconnect-7458fd4d69-bgzfb
Connections id host container role dir security authentication tenant last dlv uptime ====================================================================================================================================================================================================== 1 1012922143062 rcv[stf-default-collectd-telemetry-smartgateway-79c967c8f7-kq4qv] normal in no-security anonymous-user 000000000 006215025 2 1013005255754 rcv[stf-default-ceilometer-notification-smartgateway-6675df547mbjk5] normal in no-security anonymous-user 000212557 006214936 3 1013005143110 rcv[stf-default-collectd-notification-smartgateway-698c87fbb7-f28v6] normal in no-security anonymous-user 000213653 006214909 22 101280151948 Routerceph-0redhatlocal edge in TLSv1SSLv3(DHE-RSA-AES256-GCM-SHA384) anonymous-user 000000003 000220843 23 101280151950 Routercompute-0redhatlocal edge in TLSv1SSLv3(DHE-RSA-AES256-GCM-SHA384) anonymous-user 000000003 000220843 24 101280152082 Routercontroller-0redhatlocal edge in TLSv1SSLv3(DHE-RSA-AES256-GCM-SHA384) anonymous-user 000000000 000220834 27 12700142202 c2f541c1-4c97-4b37-a189-a396c08fb079 normal in no-security no-auth 000000000 000000000
$ oc exec -it stf-default-interconnect-7458fd4d69-bgzfb -- qdstat --address
2020-04-21 18:20:10.293258 UTC
stf-default-interconnect-7458fd4d69-bgzfb
Router Addresses
  class   addr                             phs  distrib    pri  local  remote  in       out      thru  fallback
  ====================================================================================================================
  mobile  anycast/ceilometer/event.sample  0    balanced   -    1      0       1553     1553     0     0
  mobile  collectd/notify                  0    multicast  -    1      0       10       10       0     0
  mobile  collectd/telemetry               0    multicast  -    1      0       7798049  7798049  0     0
CHAPTER 4. ADVANCED FEATURES

The following optional features can provide additional functionality to the Service Telemetry Framework (STF):
Customizing the deployment. For more information, see Section 4.1, "Customizing the deployment".

Alerts. For more information, see Section 4.2, "Alerts".

High availability. For more information, see Section 4.3, "High availability".

Dashboards. For more information, see Section 4.4, "Dashboards".

Multiple clouds. For more information, see Section 4.5, "Configuring multiple clouds".

Ephemeral storage. For more information, see Section 4.6, "Ephemeral storage".
4.1. CUSTOMIZING THE DEPLOYMENT
The Service Telemetry Operator watches for a ServiceTelemetry manifest to load into Red Hat OpenShift Container Platform (OCP). The Operator then creates other objects in memory, which results in the dependent Operators creating the workloads they are responsible for managing.
WARNING

When you override the manifest, you must provide the entire manifest contents, including object names or namespaces. There is no dynamic parameter substitution when you override a manifest.
To override a manifest successfully with Service Telemetry Framework (STF), deploy a default environment using the core options only. For more information about the core options, see Section 2.3.10, "Creating a ServiceTelemetry object in OCP". When you deploy STF, use the oc get command to retrieve the default deployed manifest. When you use a manifest that was originally generated by Service Telemetry Operator, the manifest is compatible with the other objects that are managed by the Operators.
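For example, the following command retrieves the deployed manifest; a minimal sketch, assuming the default object name stf-default:

$ oc get servicetelemetry stf-default -oyaml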
For example, when the metricsEnabled: true parameter is configured in the ServiceTelemetry manifest, the Service Telemetry Operator requests components for metrics retrieval and storage using the default manifests. In some cases, you might want to override the default manifest. For more information, see Section 4.1.1, "Manifest override parameters".
4.1.1. Manifest override parameters
This table describes the available parameters that you can use to override a manifest, along with the corresponding retrieval commands.

Table 4.1. Manifest override parameters
Override parameter Description Retrieval command
alertmanagerManifest: Override the Alertmanager object creation. The Prometheus Operator watches for Alertmanager objects.

oc get alertmanager stf-default -oyaml

alertmanagerConfigManifest: Override the Secret that contains the Alertmanager configuration. The Prometheus Operator uses a secret named alertmanager-<alertmanager-name>, for example, stf-default, to provide the alertmanager.yaml configuration to Alertmanager.

oc get secret alertmanager-stf-default -oyaml

elasticsearchManifest: Override the ElasticSearch object creation. The Elastic Cloud on Kubernetes Operator watches for ElasticSearch objects.

oc get elasticsearch elasticsearch -oyaml

interconnectManifest: Override the Interconnect object creation. The AMQ Interconnect Operator watches for Interconnect objects.

oc get interconnect stf-default-interconnect -oyaml

prometheusManifest: Override the Prometheus object creation. The Prometheus Operator watches for Prometheus objects.

oc get prometheus stf-default -oyaml

servicemonitorManifest: Override the ServiceMonitor object creation. The Prometheus Operator watches for ServiceMonitor objects.

oc get servicemonitor stf-default -oyaml

smartgatewayCollectdMetricsManifest: Override the SmartGateway object creation for collectd metrics. The Smart Gateway Operator watches for SmartGateway objects.

oc get smartgateway stf-default-collectd-telemetry -oyaml

smartgatewayCollectdEventsManifest: Override the SmartGateway object creation for collectd events. The Smart Gateway Operator watches for SmartGateway objects.

oc get smartgateway stf-default-collectd-notification -oyaml
smartgatewayCeilometerEventsManifest: Override the SmartGateway object creation for Ceilometer events. The Smart Gateway Operator watches for SmartGateway objects.

oc get smartgateway stf-default-ceilometer-notification -oyaml
4.1.2. Overriding a managed manifest
Edit the ServiceTelemetry object and provide a parameter and manifest. For a list of available manifest override parameters, see Section 4.1, "Customizing the deployment". The default ServiceTelemetry object is stf-default. Use oc get servicetelemetry to list the available STF deployments.
TIP
The oc edit command loads the default system editor. To override the default editor, pass or set the environment variable EDITOR to the preferred editor. For example, EDITOR=nano oc edit servicetelemetry stf-default
Procedure
1 Log in to Red Hat OpenShift Container Platform
2 Change to the service-telemetry namespace
3 Load the ServiceTelemetry object into an editor
4 To modify the ServiceTelemetry object, provide a manifest override parameter and the contents of the manifest to write to OCP instead of the defaults provided by STF:
NOTE
The trailing pipe (|) after entering the manifest override parameter indicates that the value provided is multi-line.
oc project service-telemetry
oc edit servicetelemetry stf-default
$ oc edit servicetelemetry stf-default
apiVersion: infra.watch/v1alpha1
kind: ServiceTelemetry
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"infra.watch/v1alpha1","kind":"ServiceTelemetry","metadata":{"annotations":{},"name":"stf-default","namespace":"service-telemetry"},"spec":{"metricsEnabled":true}}
  creationTimestamp: "2020-04-14T20:29:42Z"
1 Manifest override parameter is defined in the spec of the ServiceTelemetry object.

2 End of the manifest override content.
5 Save and close
4.2. ALERTS
You create alert rules in Prometheus and alert routes in Alertmanager. Alert rules in Prometheus servers send alerts to an Alertmanager, which manages the alerts. Alertmanager can silence, inhibit, or aggregate alerts, and send notifications using email, on-call notification systems, or chat platforms.
To create an alert, complete the following tasks:
1 Create an alert rule in Prometheus. For more information, see Section 4.2.1, "Creating an alert rule in Prometheus".
  generation: 1
  name: stf-default
  namespace: service-telemetry
  resourceVersion: "1949423"
  selfLink: /apis/infra.watch/v1alpha1/namespaces/service-telemetry/servicetelemetrys/stf-default
  uid: d058bc41-1bb0-49f5-9a8b-642f4b8adb95
spec:
  metricsEnabled: true
  smartgatewayCollectdMetricsManifest: | 1
    apiVersion: smartgateway.infra.watch/v2alpha1
    kind: SmartGateway
    metadata:
      name: stf-default-collectd-telemetry
      namespace: service-telemetry
    spec:
      amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/telemetry
      debug: true
      prefetch: 15000
      serviceType: metrics
      size: 1
      useTimestamp: true 2
status:
  conditions:
  - ansibleResult:
      changed: 0
      completion: 2020-04-14T20:32:19.079508
      failures: 0
      ok: 52
      skipped: 1
    lastTransitionTime: "2020-04-14T20:29:59Z"
    message: Awaiting next reconciliation
    reason: Successful
    status: "True"
    type: Running
2 Create an alert route in Alertmanager. For more information, see Section 4.2.3, "Creating an alert route in Alertmanager".

Additional resources
For more information about alerts or notifications with Prometheus and Alertmanager, see https://prometheus.io/docs/alerting/overview

To view an example set of alerts that you can use with Service Telemetry Framework (STF), see https://github.com/infrawatch/service-telemetry-operator/tree/master/deploy/alerts
4.2.1. Creating an alert rule in Prometheus
Prometheus evaluates alert rules to trigger notifications. If the rule condition returns an empty result set, the condition is false. Otherwise, the rule is true, and it triggers an alert.
Procedure
1 Log in to Red Hat OpenShift Container Platform
2 Change to the service-telemetry namespace
3 Create a PrometheusRule object that contains the alert rule. The Prometheus Operator loads the rule into Prometheus:

4 To verify that the rules have been loaded into Prometheus by the Operator, create a pod with access to curl:

5 Run curl to access the prometheus-operated service to return the rules loaded into memory:
oc project service-telemetry
oc apply -f - <<EOF
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  creationTimestamp: null
  labels:
    prometheus: stf-default
    role: alert-rules
  name: prometheus-alarm-rules
  namespace: service-telemetry
spec:
  groups:
    - name: openstack.rules
      rules:
        - alert: Metric Listener down
          expr: collectd_qpid_router_status < 1  # To change the rule, edit the value of the expr parameter.
EOF
oc run curl --generator=run-pod/v1 --image=radial/busyboxplus:curl -i --tty

[ root@curl:/ ]$ curl prometheus-operated:9090/api/v1/rules
{"status":"success","data":{"groups":
6 To verify that the output shows the rules loaded into the PrometheusRule object, for example, that the output contains the defined openstack.rules, exit from the pod:

7 Clean up the environment by deleting the curl pod:
Additional resources
For more information on alerting, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md
4.2.2. Configuring custom alerts

You can add custom alerts to the PrometheusRule object that you created in Section 4.2.1, "Creating an alert rule in Prometheus".
Procedure
1 Use the oc edit command
2 Edit the PrometheusRules manifest
3 Save and close
Additional resources
For more information about configuring alerting rules, see https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/

For more information about PrometheusRules objects, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md
4.2.3. Creating an alert route in Alertmanager

Use Alertmanager to deliver alerts to an external system, such as email, IRC, or other notification channel. The Prometheus Operator manages the Alertmanager configuration as a Red Hat OpenShift Container Platform (OCP) secret. By default, STF deploys a basic configuration that results in no receivers.
[{"name":"openstack.rules","file":"/etc/prometheus/rules/prometheus-stf-default-rulefiles-0/service-telemetry-prometheus-alarm-rules.yaml","rules":[{"name":"Metric Listener down","query":"collectd_qpid_router_status \u003c 1","duration":0,"labels":{},"annotations":{},"alerts":[],"health":"ok","type":"alerting"}],"interval":30}]
[ root@curl:/ ]$ exit
$ oc delete pod curl
pod curl deleted
oc edit prometheusrules prometheus-alarm-rules
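For example, to add a custom alert, append a new entry to the rules list of the existing group. The following is a minimal sketch; the second alert name and expression are illustrative assumptions, not values from your deployment:

spec:
  groups:
    - name: openstack.rules
      rules:
        - alert: Metric Listener down
          expr: collectd_qpid_router_status < 1
        - alert: Collectd metrics stalled          # hypothetical custom alert
          expr: rate(collectd_uptime[10m]) == 0    # illustrative expression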
alertmanager.yaml: |-
  global:
    resolve_timeout: 5m
  route:
To deploy a custom Alertmanager route with STF, an alertmanagerConfigManifest parameter must be passed to the Service Telemetry Operator, which results in an updated secret managed by the Prometheus Operator. For more information, see Section 4.1.2, "Overriding a managed manifest".
Procedure
1 Log in to Red Hat OpenShift Container Platform
2 Change to the service-telemetry namespace
3 Edit the ServiceTelemetry object for your STF deployment
4 Add a new parameter, alertmanagerConfigManifest, and the Secret object contents to define the alertmanager.yaml configuration for Alertmanager:
NOTE
This loads the default template that is already managed by Service Telemetry Operator. To validate that the changes are populating correctly, change a value, retrieve the alertmanager-stf-default secret, and verify that the new value is loaded into memory, for example, changing the value global.resolve_timeout from 5m to 10m.
    group_by: ['job']
    group_wait: 30s
    group_interval: 5m
    repeat_interval: 12h
    receiver: 'null'
  receivers:
  - name: 'null'
oc project service-telemetry
oc edit servicetelemetry stf-default
apiVersion: infra.watch/v1alpha1
kind: ServiceTelemetry
metadata:
  name: stf-default
  namespace: service-telemetry
spec:
  metricsEnabled: true
  alertmanagerConfigManifest: |
    apiVersion: v1
    kind: Secret
    metadata:
      name: 'alertmanager-stf-default'
      namespace: 'service-telemetry'
    type: Opaque
    stringData:
      alertmanager.yaml: |-
        global:
          resolve_timeout: 10m
        route:
5 Verify that the configuration was applied to the secret
6 To verify that the configuration has been loaded into Alertmanager, create a pod with access to curl:

7 Run curl against the alertmanager-operated service to retrieve the status and configYAML contents, and review that the supplied configuration matches the configuration loaded into Alertmanager:

8 Verify that the configYAML field contains the expected changes. Exit from the pod:

9 To clean up the environment, delete the curl pod:
          group_by: ['job']
          group_wait: 30s
          group_interval: 5m
          repeat_interval: 12h
          receiver: 'null'
        receivers:
        - name: 'null'
$ oc get secret alertmanager-stf-default -o go-template='{{index .data "alertmanager.yaml" | base64decode}}'
global:
  resolve_timeout: 10m
route:
  group_by: ['job']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h
  receiver: 'null'
receivers:
- name: 'null'
oc run curl --generator=run-podv1 --image=radialbusyboxpluscurl -i --tty
[ root@curl:/ ]$ curl alertmanager-operated:9093/api/v1/status
{"status":"success","data":{"configYAML":"global:\n  resolve_timeout: 10m\n  http_config: {}\n  smtp_hello: localhost\n  smtp_require_tls: true\n  pagerduty_url: https://events.pagerduty.com/v2/enqueue\n  hipchat_api_url: https://api.hipchat.com/\n  opsgenie_api_url: https://api.opsgenie.com/\n  wechat_api_url: https://qyapi.weixin.qq.com/cgi-bin/\n  victorops_api_url: https://alert.victorops.com/integrations/generic/20131114/alert/\nroute:\n  receiver: \"null\"\n  group_by:\n  - job\n  group_wait: 30s\n  group_interval: 5m\n  repeat_interval: 12h\nreceivers:\n- name: \"null\"\ntemplates: []\n"}}
[ root@curl:/ ]$ exit
$ oc delete pod curl
pod curl deleted
Additional resources
For more information about the Red Hat OpenShift Container Platform secret and the Prometheus operator, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md
4.3. HIGH AVAILABILITY
High availability is the ability of Service Telemetry Framework (STF) to rapidly recover from failures in its component services. Although Red Hat OpenShift Container Platform (OCP) restarts a failed pod if nodes are available to schedule the workload, this recovery process might take more than one minute, during which time events and metrics are lost. A high availability configuration includes multiple copies of STF components, reducing recovery time to approximately 2 seconds. To protect against failure of an OCP node, deploy STF to an OCP cluster with three or more nodes.
NOTE
STF is not yet a fully fault tolerant system. Delivery of metrics and events during the recovery period is not guaranteed.
Enabling high availability has the following effects:

Two AMQ Interconnect pods run instead of the default 1.

Three ElasticSearch pods run instead of the default 1.

Recovery time from a lost pod in either of these services reduces to approximately 2 seconds.
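As a quick check after you enable high availability, you can count the running pods. The following is a sketch; the AMQ Interconnect label matches the one used earlier in this guide, and the ElasticSearch label is an assumption that might differ in your deployment:

$ oc get pods -l application=stf-default-interconnect            # expect 2 pods
$ oc get pods -l common.k8s.elastic.co/type=elasticsearch        # label is an assumption; expect 3 pods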
4.3.1. Configuring high availability
To configure STF for high availability, add highAvailabilityEnabled: true to the ServiceTelemetry object in OCP. You can set this parameter at installation time or, if you already deployed STF, complete the following steps:
Procedure
1 Log in to Red Hat OpenShift Container Platform
2 Change to the service-telemetry namespace
3 Use the oc command to edit the ServiceTelemetry object
4 Add highAvailabilityEnabled: true to the spec section:
5 Save your changes and close the object
oc project service-telemetry
$ oc edit ServiceTelemetry
spec:
  eventsEnabled: true
  metricsEnabled: true
  highAvailabilityEnabled: true
4.4. DASHBOARDS
Use the third-party application Grafana to visualize system-level metrics gathered by collectd for each individual host node. For more information about configuring collectd, see Section 3.3, "Configuring Red Hat OpenStack Platform overcloud for Service Telemetry Framework".
4.4.1. Setting up Grafana to host the dashboard
Grafana is not included in the default Service Telemetry Framework (STF) deployment, so you must deploy the Grafana Operator from OperatorHub.io.
Procedure
1 Log in to Red Hat OpenShift Container Platform
2 Change to the service-telemetry namespace
3 Clone the dashboard repository
4 Deploy the Grafana operator
5 To verify that the operator launched successfully, run the oc get csv command. If the value of the PHASE column is Succeeded, the operator launched successfully:
6 Launch a Grafana instance
7 Verify that the Grafana instance deployed
8 Create the datasource and dashboard resources
oc project service-telemetry
git clone https://github.com/infrawatch/dashboards
cd dashboards
oc create -f deploy/subscription.yaml
$ oc get csv
NAME                      DISPLAY            VERSION   REPLACES   PHASE
grafana-operator.v3.2.0   Grafana Operator   3.2.0                Succeeded
$ oc create -f deploy/grafana.yaml
$ oc get pod -l app=grafana
NAME                                  READY   STATUS    RESTARTS   AGE
grafana-deployment-7fc7848b56-sbkhv   1/1     Running   0          1m
9 Verify that the resources installed correctly
10 Navigate to https://<grafana-route-address> in a web browser. Use the oc get routes command to retrieve the Grafana route address:

11 To view the dashboard, click Dashboards and Manage.
Additional resources
For more information about enabling the OperatorHub.io catalog source, see Section 2.3.5, "Enabling the OperatorHub.io Community Catalog Source".
4.4.1.1. Viewing and editing queries
Procedure
1 Log in to Red Hat OpenShift Container Platform. To view and edit queries, log in as the admin user.
2 Change to the service-telemetry namespace
3 To retrieve the default username and password, describe the Grafana object using oc describe:
4.4.2. The Grafana infrastructure dashboard
The infrastructure dashboard shows metrics for a single node at a time. Select a node from the upper left corner of the dashboard.
4.4.2.1. Top panels
Title | Unit | Description
oc create -f deploy/datasource.yaml -f deploy/rhos-dashboard.yaml
$ oc get grafanadashboards
NAME             AGE
rhos-dashboard   7d21h

$ oc get grafanadatasources
NAME                                  AGE
service-telemetry-grafanadatasource   1m
oc get routes
oc project service-telemetry
oc describe grafana service-telemetry-grafana
Current Global Alerts | - | Current alerts fired by Prometheus

Recent Global Alerts | - | Recently fired alerts in 5m time steps

Status Panel | - | Node status: up, down, unavailable

Uptime | s/m/h/d/M/Y | Total operational time of node

CPU Cores | cores | Total number of cores

Memory | bytes | Total memory

Disk Size | bytes | Total storage size

Processes | processes | Total number of processes listed by type

Load Average | processes | Load average represents the average number of running and uninterruptible processes residing in the kernel execution queue
4.4.2.2. Networking panels
Panels that display the network interfaces of the node
Panel | Unit | Description
Physical Interfaces Ingress Errors | errors | Total errors with incoming data

Physical Interfaces Egress Errors | errors | Total errors with outgoing data

Physical Interfaces Ingress Error Rates | errors/s | Rate of incoming data errors

Physical Interfaces Egress Error Rates | errors/s | Rate of outgoing data errors

Physical Interfaces Packets Ingress | pps | Incoming packets per second

Physical Interfaces Packets Egress | pps | Outgoing packets per second

Physical Interfaces Data Ingress | bytes/s | Incoming data rates

Physical Interfaces Data Egress | bytes/s | Outgoing data rates

Physical Interfaces Drop Rate Ingress | pps | Incoming packets drop rate

Physical Interfaces Drop Rate Egress | pps | Outgoing packets drop rate
4.4.2.3. CPU panels
Panels that display CPU usage of the node
Panel | Unit | Description
Current CPU Usage | percent | Instantaneous usage at the time of the last query

Aggregate CPU Usage | percent | Average non-idle CPU activity of all cores on a node

Aggr CPU Usage by Type | percent | Shows time spent for each type of thread averaged across all cores
4.4.2.4. Memory panels
Panels that display memory usage on the node
Panel | Unit | Description
Memory Used | percent | Amount of memory being used at time of last query

Huge Pages Used | hugepages | Number of hugepages being used

Memory
4.4.2.5. Disk/file system
Panels that display space used on disk
Panel | Unit | Description | Notes
Disk Space Usage | percent | Total disk use at time of last query

Inode Usage | percent | Total inode use at time of last query

Aggregate Disk Space Usage | bytes | Total disk space used and reserved | Because this query relies on the df plugin, temporary file systems that do not necessarily use disk space are included in the results. The query tries to filter out most of these, but it might not be exhaustive.

Disk Traffic | bytes/s | Shows rates for both reading and writing

Disk Load | percent | Approximate percentage of total disk bandwidth being used. The weighted I/O time series includes the backlog that might be accumulating. For more information, see the collectd disk plugin docs.

Operations/s | ops/s | Operations done per second

Average I/O Operation Time | seconds | Average time each I/O operation took to complete. This average is not accurate, see the collectd disk plugin docs.
4.5. CONFIGURING MULTIPLE CLOUDS
You can configure multiple Red Hat OpenStack Platform clouds to target a single instance of Service Telemetry Framework (STF):

1 Plan the AMQP address prefixes that you want to use for each cloud. For more information, see Section 4.5.1, "Planning AMQP address prefixes".

2 Deploy metrics and events consumer Smart Gateways for each cloud to listen on the corresponding address prefixes. For more information, see Section 4.5.2, "Deploying Smart Gateways".

3 Configure each cloud to send its metrics and events to STF on the correct address. For more information, see Section 4.5.3, "Creating the OpenStack environment file".
Figure 4.1. Two Red Hat OpenStack Platform clouds connect to STF
4.5.1. Planning AMQP address prefixes
By default, Red Hat OpenStack Platform nodes get data through two data collectors: collectd and Ceilometer. These components send telemetry data, or notifications, to the respective AMQP addresses, for example, collectd/telemetry, where STF Smart Gateways listen on those addresses for monitoring data.

To support multiple clouds, and to identify which cloud generated the monitoring data, configure each cloud to send data to a unique address. Prefix a cloud identifier to the second part of the address. The following list shows some example addresses and identifiers:
collectd/cloud1-telemetry

collectd/cloud1-notify

anycast/ceilometer/cloud1-event.sample

collectd/cloud2-telemetry

collectd/cloud2-notify
anycast/ceilometer/cloud2-event.sample

collectd/us-east-1-telemetry

collectd/us-west-3-telemetry
4.5.2. Deploying Smart Gateways
You must deploy a Smart Gateway for each of the data collection types for each cloud: one for collectd metrics, one for collectd events, and one for Ceilometer events. Configure each of the Smart Gateways to listen on the AMQP address that you define for the corresponding cloud.

When you deploy STF for the first time, Smart Gateway manifests are created that define the initial Smart Gateways for a single cloud. When deploying Smart Gateways for multiple cloud support, you deploy multiple Smart Gateways for each of the data collection types that handle the metrics and the events data for each cloud. The initial Smart Gateways act as a template to create additional Smart Gateways, along with any authentication information required to connect to the data stores.
Procedure
1 Log in to Red Hat OpenShift Container Platform
2 Change to the service-telemetry namespace
oc project service-telemetry
3 Use the initially deployed Smart Gateways as a template for additional Smart Gateways. List the currently deployed Smart Gateways with the oc get smartgateways command. For example, if you deployed STF with metricsEnabled: true and eventsEnabled: true, the following Smart Gateways are displayed in the output:
$ oc get smartgateways
NAME                                  AGE
stf-default-ceilometer-notification   14d
stf-default-collectd-notification     14d
stf-default-collectd-telemetry        14d
4 Retrieve the manifests for each Smart Gateway and store the contents in a temporary file, which you can modify later and use to create the new set of Smart Gateways:
truncate --size 0 /tmp/cloud1-smartgateways.yaml && for sg in $(oc get smartgateways -oname)
do
  echo "---" >> /tmp/cloud1-smartgateways.yaml
  oc get $sg -oyaml --export >> /tmp/cloud1-smartgateways.yaml
done
5 Modify the Smart Gateway manifest in the /tmp/cloud1-smartgateways.yaml file. Adjust the metadata.name and spec.amqpUrl fields to include the cloud identifier from your schema. For more information, see Section 4.5.1, "Planning AMQP address prefixes". To view example Smart Gateway manifests, see Section 4.5.2.1, "Example manifests".
6 Deploy your new Smart Gateways
oc apply -f /tmp/cloud1-smartgateways.yaml
7 Verify that each Smart Gateway is running. This can take several minutes, depending on the number of Smart Gateways:
oc get po -l app=smart-gateway
4.5.2.1. Example manifests
IMPORTANT
The content in the following examples might be different to the file content in your deployment. Copy the manifests in your deployment.
Ensure that the name and amqpUrl parameters of each Smart Gateway match the names that you want to use for your clouds. For more information, see Section 4.5.1, "Planning AMQP address prefixes".
NOTE
Your output may have some additional metadata parameters that you can remove from the manifests that you load into OCP.
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-ceilometer-notification-cloud1 1
spec:
  amqpDataSource: ceilometer
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/anycast/ceilometer/cloud1-event.sample 2
  debug: false
  elasticPass: fkzfhghw
  elasticUrl: https://elasticsearch-es-http.service-telemetry.svc.cluster.local:9200
  elasticUser: elastic
  resetIndex: false
  serviceType: events
  size: 1
  tlsCaCert: /config/certs/ca.crt
  tlsClientCert: /config/certs/tls.crt
  tlsClientKey: /config/certs/tls.key
  tlsServerName: elasticsearch-es-http.service-telemetry.svc.cluster.local
  useBasicAuth: true
  useTls: true
---
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-collectd-notification-cloud1 3
spec:
  amqpDataSource: collectd
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/cloud1-
1 Name for Ceilometer notifications for cloud1

2 AMQP address for Ceilometer notifications for cloud1

3 Name for collectd notifications for cloud1

4 AMQP address for collectd notifications for cloud1

5 Name for collectd telemetry for cloud1

6 AMQP address for collectd telemetry for cloud1
4.5.3. Creating the OpenStack environment file
To label traffic according to the cloud of origin, you must create a configuration with cloud-specific instance names. Create an stf-connectors.yaml file and adjust the values of CeilometerQdrEventsConfig and CollectdAmqpInstances to match the AMQP address prefix scheme. For more information, see Section 4.5.1, "Planning AMQP address prefixes".
notify 4
  debug: false
  elasticPass: fkzfhghw
  elasticUrl: https://elasticsearch-es-http.service-telemetry.svc.cluster.local:9200
  elasticUser: elastic
  resetIndex: false
  serviceType: events
  size: 1
  tlsCaCert: /config/certs/ca.crt
  tlsClientCert: /config/certs/tls.crt
  tlsClientKey: /config/certs/tls.key
  tlsServerName: elasticsearch-es-http.service-telemetry.svc.cluster.local
  useBasicAuth: true
  useTls: true
---
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-collectd-telemetry-cloud1 5
spec:
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/cloud1-telemetry 6
  debug: false
  prefetch: 15000
  serviceType: metrics
  size: 1
  useTimestamp: true
WARNING
Remove the enable-stf.yaml and ceilometer-write-qdr.yaml environment file references from your overcloud deployment. This configuration is redundant and results in duplicate information being sent from each cloud node.
Procedure
1 Create the stf-connectors.yaml file and modify it to match the AMQP address that you want for this cloud deployment:
resource_registry:
  OS::TripleO::Services::Collectd: /usr/share/openstack-tripleo-heat-templates/deployment/metrics/collectd-container-puppet.yaml
  OS::TripleO::Services::MetricsQdr: /usr/share/openstack-tripleo-heat-templates/deployment/metrics/qdr-container-puppet.yaml
  OS::TripleO::Services::CeilometerAgentCentral: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-central-container-puppet.yaml
  OS::TripleO::Services::CeilometerAgentNotification: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-notification-container-puppet.yaml
  OS::TripleO::Services::CeilometerAgentIpmi: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-ipmi-container-puppet.yaml
  OS::TripleO::Services::ComputeCeilometerAgent: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-compute-container-puppet.yaml
  OS::TripleO::Services::Redis: /usr/share/openstack-tripleo-heat-templates/deployment/database/redis-pacemaker-puppet.yaml

parameter_defaults:
  EnableSTF: true

  EventPipelinePublishers: []
  CeilometerEnablePanko: false
  CeilometerQdrPublishEvents: true
  CeilometerQdrEventsConfig:
    driver: amqp
    topic: cloud1-event 1

  CollectdConnectionType: amqp1
  CollectdAmqpInterval: 5
  CollectdDefaultPollingInterval: 5

  CollectdAmqpInstances:
    cloud1-notify: 2
      notify: true
      format: JSON
      presettle: false
    cloud1-telemetry: 3
      format: JSON
      presettle: true

  MetricsQdrAddresses:
1 Define the topic for Ceilometer events. This value is the address format of anycast/ceilometer/cloud1-event.sample.

2 Define the topic for collectd events. This value is the format of collectd/cloud1-notify.

3 Define the topic for collectd metrics. This value is the format of collectd/cloud1-telemetry.

4 Adjust the MetricsQdrConnectors host to the address of the STF route.
1 Ensure that the naming convention in the stf-connectors.yaml file aligns with the spec.amqpUrl field in the Smart Gateway configuration. For example, configure the CeilometerQdrEventsConfig.topic field to a value of cloud1-event.
2 Save the file in a directory for custom environment files, for example /home/stack/custom_templates.
3 Source the authentication file
4 Include the stf-connectors.yaml file in the overcloud deployment command, along with any other environment files relevant to your environment:
Additional resources
For information about validating the deployment, see Section 3.3.3, "Validating client-side installation".
4.5.4. Querying metrics data from multiple clouds
Data stored in Prometheus has a service label attached according to the Smart Gateway it was scraped from. You can use this label to query data from a specific cloud.
To query data from a specific cloud, use a Prometheus promql query that matches the associated service label, for example: collectd_uptime{service="stf-default-collectd-telemetry-cloud1-smartgateway"}.
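You can run such a query from a pod with access to curl, in the same way as in Section 4.2.1, "Creating an alert rule in Prometheus". The following is a sketch, assuming the prometheus-operated service used earlier in this chapter:

[ root@curl:/ ]$ curl 'prometheus-operated:9090/api/v1/query' --data-urlencode 'query=collectd_uptime{service="stf-default-collectd-telemetry-cloud1-smartgateway"}'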
    - prefix: 'collectd'
      distribution: multicast
    - prefix: 'anycast/ceilometer'
      distribution: multicast

  MetricsQdrSSLProfiles:
    - name: sslProfile

  MetricsQdrConnectors:
    - host: stf-default-interconnect-5671-service-telemetry.apps.infra.watch 4
      port: 443
      role: edge
      verifyHostname: false
      sslProfile: sslProfile
[stack@undercloud-0 ~]$ source stackrc

(undercloud) [stack@undercloud-0 ~]$

(undercloud) [stack@undercloud-0 ~]$ openstack overcloud deploy --templates /usr/share/openstack-tripleo-heat-templates -e /home/stack/custom_templates/stf-connectors.yaml
4.6. EPHEMERAL STORAGE
Use ephemeral storage to run Service Telemetry Framework (STF) without persistently storing data in your Red Hat OpenShift Container Platform (OCP) cluster. Ephemeral storage is not recommended in a production environment due to the volatility of the data in the platform when operating correctly and as designed. For example, restarting a pod or rescheduling the workload to another node results in the loss of any local data written since the pod started.

If you enable ephemeral storage in STF, the Service Telemetry Operator does not add the relevant storage sections to the data storage components' manifests.
4.6.1. Configuring ephemeral storage
To configure STF for ephemeral storage, add storageEphemeralEnabled: true to the ServiceTelemetry object in OCP. You can add storageEphemeralEnabled: true at installation time or, if you already deployed STF, complete the following steps:
Procedure
1 Log in to Red Hat OpenShift Container Platform
2 Change to the service-telemetry namespace
3 Edit the ServiceTelemetry object
4 Add the storageEphemeralEnabled: true parameter to the spec section:
5 Save your changes and close the object
oc project service-telemetry
$ oc edit ServiceTelemetry stf-default
spec:
  eventsEnabled: true
  metricsEnabled: true
  storageEphemeralEnabled: true
APPENDIX A. COLLECTD PLUG-INS

This section contains a complete list of collectd plug-ins and configurations for Red Hat OpenStack Platform 16.0.
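The parameters in this appendix are hieradata keys. As an illustrative sketch only (not part of the original procedure), such keys can be set from a custom environment file through the tripleo ExtraConfig parameter; the keys and values shown here are assumptions for demonstration:

parameter_defaults:
  ExtraConfig:
    collectd::plugin::df::ignoreselected: false              # hypothetical example value
    collectd::plugin::virt::connection: "qemu:///system"     # hypothetical example value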
collectd-aggregation

collectd::plugin::aggregation::aggregators

collectd::plugin::aggregation::interval

collectd-amqp1

collectd-apache

collectd::plugin::apache::instances (ex: localhost => url => 'http://localhost/mod_status?auto')

collectd::plugin::apache::interval

collectd-apcups

collectd-battery

collectd::plugin::battery::values_percentage

collectd::plugin::battery::report_degraded

collectd::plugin::battery::query_state_fs

collectd::plugin::battery::interval

collectd-ceph

collectd::plugin::ceph::daemons

collectd::plugin::ceph::longrunavglatency

collectd::plugin::ceph::convertspecialmetrictypes

collectd-cgroups

collectd::plugin::cgroups::ignore_selected

collectd::plugin::cgroups::interval

collectd-conntrack

None

collectd-contextswitch

collectd::plugin::contextswitch::interval

collectd-cpu
collectd::plugin::cpu::reportbystate

collectd::plugin::cpu::reportbycpu

collectd::plugin::cpu::valuespercentage

collectd::plugin::cpu::reportnumcpu

collectd::plugin::cpu::reportgueststate

collectd::plugin::cpu::subtractgueststate

collectd::plugin::cpu::interval

collectd-cpufreq

None

collectd-cpusleep

collectd-csv

collectd::plugin::csv::datadir

collectd::plugin::csv::storerates

collectd::plugin::csv::interval

collectd-df

collectd::plugin::df::devices

collectd::plugin::df::fstypes

collectd::plugin::df::ignoreselected

collectd::plugin::df::mountpoints

collectd::plugin::df::reportbydevice

collectd::plugin::df::reportinodes

collectd::plugin::df::reportreserved

collectd::plugin::df::valuesabsolute

collectd::plugin::df::valuespercentage

collectd::plugin::df::interval

collectd-disk

collectd::plugin::disk::disks

collectd::plugin::disk::ignoreselected

collectd::plugin::disk::udevnameattr
collectd::plugin::disk::interval

collectd-entropy

collectd::plugin::entropy::interval

collectd-ethstat

collectd::plugin::ethstat::interfaces

collectd::plugin::ethstat::maps

collectd::plugin::ethstat::mappedonly

collectd::plugin::ethstat::interval

collectd-exec

collectd::plugin::exec::commands

collectd::plugin::exec::commands_defaults

collectd::plugin::exec::globals

collectd::plugin::exec::interval

collectd-fhcount

collectd::plugin::fhcount::valuesabsolute

collectd::plugin::fhcount::valuespercentage

collectd::plugin::fhcount::interval

collectd-filecount

collectd::plugin::filecount::directories

collectd::plugin::filecount::interval

collectd-fscache

None

collectd-hddtemp

collectd::plugin::hddtemp::host

collectd::plugin::hddtemp::port

collectd::plugin::hddtemp::interval

collectd-hugepages

collectd::plugin::hugepages::report_per_node_hp
collectd::plugin::hugepages::report_root_hp

collectd::plugin::hugepages::values_pages

collectd::plugin::hugepages::values_bytes

collectd::plugin::hugepages::values_percentage

collectd::plugin::hugepages::interval

collectd-intel_rdt

collectd-interface

collectd::plugin::interface::interfaces

collectd::plugin::interface::ignoreselected

collectd::plugin::interface::reportinactive

collectd::plugin::interface::interval

collectd-ipc

None

collectd-ipmi

collectd::plugin::ipmi::ignore_selected

collectd::plugin::ipmi::notify_sensor_add

collectd::plugin::ipmi::notify_sensor_remove

collectd::plugin::ipmi::notify_sensor_not_present

collectd::plugin::ipmi::sensors

collectd::plugin::ipmi::interval

collectd-irq

collectd::plugin::irq::irqs

collectd::plugin::irq::ignoreselected

collectd::plugin::irq::interval

collectd-load

collectd::plugin::load::report_relative

collectd::plugin::load::interval

collectd-logfile

collectd::plugin::logfile::log_level
collectd::plugin::logfile::log_file

collectd::plugin::logfile::log_timestamp

collectd::plugin::logfile::print_severity

collectd::plugin::logfile::interval

collectd-madwifi

collectd-mbmon

collectd-md

collectd-memcached

collectd::plugin::memcached::instances

collectd::plugin::memcached::interval

collectd-memory

collectd::plugin::memory::valuesabsolute

collectd::plugin::memory::valuespercentage

collectd::plugin::memory::interval

collectd-multimeter

collectd-mysql

collectd::plugin::mysql::interval

collectd-netlink

collectd::plugin::netlink::interfaces

collectd::plugin::netlink::verboseinterfaces

collectd::plugin::netlink::qdiscs

collectd::plugin::netlink::classes

collectd::plugin::netlink::filters

collectd::plugin::netlink::ignoreselected

collectd::plugin::netlink::interval

collectd-network

collectd::plugin::network::timetolive

collectd::plugin::network::maxpacketsize
collectd::plugin::network::forward

collectd::plugin::network::reportstats

collectd::plugin::network::listeners

collectd::plugin::network::servers

collectd::plugin::network::interval

collectd-nfs

collectd::plugin::nfs::interval

collectd-ntpd

collectd::plugin::ntpd::host

collectd::plugin::ntpd::port

collectd::plugin::ntpd::reverselookups

collectd::plugin::ntpd::includeunitid

collectd::plugin::ntpd::interval

collectd-numa

None

collectd-olsrd

collectd-openvpn

collectd::plugin::openvpn::statusfile

collectd::plugin::openvpn::improvednamingschema

collectd::plugin::openvpn::collectcompression

collectd::plugin::openvpn::collectindividualusers

collectd::plugin::openvpn::collectusercount

collectd::plugin::openvpn::interval

collectd-ovs_events

collectd::plugin::ovs_events::address

collectd::plugin::ovs_events::dispatch

collectd::plugin::ovs_events::interfaces

collectd::plugin::ovs_events::send_notification
collectd::plugin::ovs_events::$port

collectd::plugin::ovs_events::socket

collectd-ovs_stats

collectd::plugin::ovs_stats::address

collectd::plugin::ovs_stats::bridges

collectd::plugin::ovs_stats::port

collectd::plugin::ovs_stats::socket

collectd-ping

collectd::plugin::ping::hosts

collectd::plugin::ping::timeout

collectd::plugin::ping::ttl

collectd::plugin::ping::source_address

collectd::plugin::ping::device

collectd::plugin::ping::max_missed

collectd::plugin::ping::size

collectd::plugin::ping::interval

collectd-powerdns

collectd::plugin::powerdns::interval

collectd::plugin::powerdns::servers

collectd::plugin::powerdns::recursors

collectd::plugin::powerdns::local_socket

collectd::plugin::powerdns::interval

collectd-processes

collectd::plugin::processes::processes

collectd::plugin::processes::process_matches

collectd::plugin::processes::collect_context_switch

collectd::plugin::processes::collect_file_descriptor

collectd::plugin::processes::collect_memory_maps

collectd::plugin::powerdns::interval
collectd-protocols

collectd::plugin::protocols::ignoreselected

collectd::plugin::protocols::values

collectd-python

collectd-serial

collectd-smart

collectd::plugin::smart::disks

collectd::plugin::smart::ignoreselected

collectd::plugin::smart::interval

collectd-snmp_agent

collectd-statsd

collectd::plugin::statsd::host

collectd::plugin::statsd::port

collectd::plugin::statsd::deletecounters

collectd::plugin::statsd::deletetimers

collectd::plugin::statsd::deletegauges

collectd::plugin::statsd::deletesets

collectd::plugin::statsd::countersum

collectd::plugin::statsd::timerpercentile

collectd::plugin::statsd::timerlower

collectd::plugin::statsd::timerupper

collectd::plugin::statsd::timersum

collectd::plugin::statsd::timercount

collectd::plugin::statsd::interval

collectd-swap

collectd::plugin::swap::reportbydevice

collectd::plugin::swap::reportbytes

collectd::plugin::swap::valuesabsolute
collectd::plugin::swap::valuespercentage

collectd::plugin::swap::reportio

collectd::plugin::swap::interval

collectd-syslog

collectd::plugin::syslog::log_level

collectd::plugin::syslog::notify_level

collectd::plugin::syslog::interval

collectd-table

collectd::plugin::table::tables

collectd::plugin::table::interval

collectd-tail

collectd::plugin::tail::files

collectd::plugin::tail::interval

collectd-tail_csv

collectd::plugin::tail_csv::metrics

collectd::plugin::tail_csv::files

collectd-tcpconns

collectd::plugin::tcpconns::localports

collectd::plugin::tcpconns::remoteports

collectd::plugin::tcpconns::listening

collectd::plugin::tcpconns::allportssummary

collectd::plugin::tcpconns::interval

collectd-ted

collectd-thermal

collectd::plugin::thermal::devices

collectd::plugin::thermal::ignoreselected

collectd::plugin::thermal::interval

collectd-threshold
collectd::plugin::threshold::types

collectd::plugin::threshold::plugins

collectd::plugin::threshold::hosts

collectd::plugin::threshold::interval

collectd-turbostat

collectd::plugin::turbostat::core_c_states

collectd::plugin::turbostat::package_c_states

collectd::plugin::turbostat::system_management_interrupt

collectd::plugin::turbostat::digital_temperature_sensor

collectd::plugin::turbostat::tcc_activation_temp

collectd::plugin::turbostat::running_average_power_limit

collectd::plugin::turbostat::logical_core_names

collectd-unixsock

collectd-uptime

collectd::plugin::uptime::interval

collectd-users

collectd::plugin::users::interval

collectd-uuid

collectd::plugin::uuid::uuid_file

collectd::plugin::uuid::interval

collectd-virt

collectd::plugin::virt::connection

collectd::plugin::virt::refresh_interval

collectd::plugin::virt::domain

collectd::plugin::virt::block_device

collectd::plugin::virt::interface_device

collectd::plugin::virt::ignore_selected

collectd::plugin::virt::hostname_format
collectd::plugin::virt::interface_format

collectd::plugin::virt::extra_stats

collectd::plugin::virt::interval

collectd-vmem

collectd::plugin::vmem::verbose

collectd::plugin::vmem::interval

collectd-vserver

collectd-wireless

collectd-write_graphite

collectd::plugin::write_graphite::carbons

collectd::plugin::write_graphite::carbon_defaults

collectd::plugin::write_graphite::globals

collectd-write_kafka

collectd::plugin::write_kafka::kafka_host

collectd::plugin::write_kafka::kafka_port

collectd::plugin::write_kafka::kafka_hosts

collectd::plugin::write_kafka::topics

collectd-write_log

collectd::plugin::write_log::format

collectd-zfs_arc

None
APPENDIX A COLLECTD PLUG-INS
57
- Table of Contents
- CHAPTER 1 INTRODUCTION TO SERVICE TELEMETRY FRAMEWORK
-
- 11 SERVICE TELEMETRY FRAMEWORK ARCHITECTURE
- 12 INSTALLATION SIZE
-
- CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK
-
- 21 THE CORE COMPONENTS OF STF
- 22 PREPARING YOUR OCP ENVIRONMENT FOR STF
-
- 221 Persistent volumes
-
- 2211 Using ephemeral storage
-
- 222 Resource allocation
- 223 Node tuning operator
-
- 23 DEPLOYING STF TO THE OCP ENVIRONMENT
-
- 231 Deploying STF to the OCP environment with ElasticSearch
- 232 Deploying STF to the OCP environment without ElasticSearch
- 233 Creating a namespace
- 234 Creating an OperatorGroup
- 235 Enabling the OperatorHubio Community Catalog Source
- 236 Enabling Red Hat STF Operator Source
- 237 Subscribing to the AMQ Certificate Manager Operator
- 238 Subscribing to the Elastic Cloud on Kubernetes Operator
- 239 Subscribing to the Service Telemetry Operator
- 2310 Creating a ServiceTelemetry object in OCP
-
- 24 REMOVING STF FROM THE OCP ENVIRONMENT
-
- 241 Deleting the namespace
- 242 Removing the OperatorSource
-
- CHAPTER 3 COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION
-
- 31 CONNECTING RED HAT OPENSTACK PLATFORM TO SERVICE TELEMETRY FRAMEWORK
- 32 DEPLOYING TO NON-STANDARD NETWORK TOPOLOGIES
- 33 CONFIGURING RED HAT OPENSTACK PLATFORM OVERCLOUD FOR SERVICE TELEMETRY FRAMEWORK
-
- 331 Retrieving the AMQ Interconnect route address
- 332 Configuring the STF connection for the overcloud
- 333 Validating client-side installation
-
- CHAPTER 4 ADVANCED FEATURES
-
- 41 CUSTOMIZING THE DEPLOYMENT
-
- 411 Manifest override parameters
- 412 Overriding a managed manifest
-
- Procedure
-
- 42 ALERTS
-
- Additional resources
- 421 Creating an alert rule in Prometheus
-
- Procedure
- Additional resources
-
- 422 Configuring custom alerts
-
- Procedure
- Additional resources
-
- 423 Creating an alert route in Alertmanager
-
- Procedure
- Additional resources
-
- 43 HIGH AVAILABILITY
-
- 431 Configuring high availability
-
- Procedure
-
- 44 DASHBOARDS
-
- 441 Setting up Grafana to host the dashboard
-
- 4411 Viewing and editing queries
-
- 442 The Grafana infrastructure dashboard
-
- 4421 Top panels
- 4422 Networking panels
- 4423 CPU panels
- 4424 Memory panels
- 4425 Diskfile system
-
- 45 CONFIGURING MULTIPLE CLOUDS
-
- 451 Planning AMQP address prefixes
- 452 Deploying Smart Gateways
-
- Procedure
- 4521 Example manifests
-
- 453 Creating the OpenStack environment file
-
- Procedure
- Additional resources
-
- 454 Querying metrics data from multiple clouds
-
- 46 EPHEMERAL STORAGE
-
- 461 Configuring ephemeral storage
-
- APPENDIX A COLLECTD PLUG-INS
-
There are four connections
Outbound connection to STF
Inbound connection from collectd
Inbound connection from ceilometer
Inbound connection from our qdstat clientThe outbound STF connection is provided to the MetricsQdrConnectors hostparameter and is the route for the STF storage domain The other hosts are internalnetwork addresses of the client connections to this AMQ Interconnect
5 To ensure that messages are being delivered list the links and view the _edge address inthe deliv column for delivery of messages
interconnect-7458fd4d69-bgzfb edge out TLSv12(DHE-RSA-AES256-GCM-SHA384) anonymous-user 12 1721714460290 openstackorgomcontainercontroller-0ceilometer-agent-notification255c02cee550f143ec9ea030db5cccba14 normal in no-security no-auth 16 1721714436408 metrics normal in no-security anonymous-user 899 1721714439500 10a2e99d-1b8a-4329-b48c-4335e5f75c84 normal in no-security no-auth
$ sudo podman exec -it metrics_qdr qdstat --bus=172171445666 --linksRouter Links type dir conn id id peer class addr phs cap pri undel unsett deliv presett psdrop acc rej rel mod delay rate =========================================================================================================================================================== endpoint out 1 5 local _edge 250 0 0 0 2979926 2979924 0 0 0 2 0 0 0 endpoint in 1 6 250 0 0 0 0 0 0 0 0 0 0 0 0 endpoint in 1 7 250 0 0 0 0 0 0 0 0 0 0 0 0 endpoint out 1 8 250 0 0 0 0 0 0 0 0 0 0 0 0 endpoint in 1 9 250 0 0 0 0 0 0 0 0 0 0 0 0 endpoint out 1 10 250 0 0 0 911 911 0 0 0 0 0 911 0 endpoint in 1 11 250 0 0 0 0 911 0 0 0 0 0 0 0 endpoint out 12 32 local templSY6Mcicol4J2Kp 250 0 0 0 0 0 0 0 0 0 0 0 0 endpoint in 16 41 250 0 0 0 2979924 2979924 0 0 0 0 0 0 0 endpoint in 912 1834 mobile $management 0 250 0 0 0 1
CHAPTER 3 COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION
23
6 To list the addresses from Red Hat OpenStack Platform nodes to STF connect to OCP toget the AMQ Interconnect pod name and list the connections List the availableAMQ Interconnect pods
7 Connect to the pod and run the qdstat --connections command to list the knownconnections
In this example there are three edge connections from the Red Hat OpenStack Platformnodes with connection id 22 23 and 24
8 To view the number of messages delivered by the network use each address with the oc exec command
0 0 1 0 0 0 0 0 endpoint out 912 1835 local temp9Ok2resI9tmt+CT 250 0 0 0 0 0 0 0 0 0 0 0 0
$ oc get pods -l application=stf-default-interconnect
NAME READY STATUS RESTARTS AGEstf-default-interconnect-7458fd4d69-bgzfb 11 Running 0 6d21h
$ oc exec -it stf-default-interconnect-7458fd4d69-bgzfb -- qdstat --connections
2020-04-21 182547243852 UTCstf-default-interconnect-7458fd4d69-bgzfb
Connections id host container role dir security authentication tenant last dlv uptime ====================================================================================================================================================================================================== 1 1012922143062 rcv[stf-default-collectd-telemetry-smartgateway-79c967c8f7-kq4qv] normal in no-security anonymous-user 000000000 006215025 2 1013005255754 rcv[stf-default-ceilometer-notification-smartgateway-6675df547mbjk5] normal in no-security anonymous-user 000212557 006214936 3 1013005143110 rcv[stf-default-collectd-notification-smartgateway-698c87fbb7-f28v6] normal in no-security anonymous-user 000213653 006214909 22 101280151948 Routerceph-0redhatlocal edge in TLSv1SSLv3(DHE-RSA-AES256-GCM-SHA384) anonymous-user 000000003 000220843 23 101280151950 Routercompute-0redhatlocal edge in TLSv1SSLv3(DHE-RSA-AES256-GCM-SHA384) anonymous-user 000000003 000220843 24 101280152082 Routercontroller-0redhatlocal edge in TLSv1SSLv3(DHE-RSA-AES256-GCM-SHA384) anonymous-user 000000000 000220834 27 12700142202 c2f541c1-4c97-4b37-a189-a396c08fb079 normal in no-security no-auth 000000000 000000000
Red Hat OpenStack Platform 160 Service Telemetry Framework
24
$ oc exec -it stf-default-interconnect-7458fd4d69-bgzfb -- qdstat --address
2020-04-21 182010293258 UTCstf-default-interconnect-7458fd4d69-bgzfb
Router Addresses class addr phs distrib pri local remote in out thru fallback ==================================================================================================================== mobile anycastceilometereventsample 0 balanced - 1 0 1553 1553 0 0 mobile collectdnotify 0 multicast - 1 0 10 10 0 0 mobile collectdtelemetry 0 multicast - 1 0 7798049 7798049 0 0
CHAPTER 3 COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION
25
CHAPTER 4 ADVANCED FEATURESThe following optional features can provide additional functionality to theService Telemetry Framework (STF)
Customizing the deployment For more information see Section 41 ldquoCustomizing thedeploymentrdquo
Alerts For more information see Section 42 ldquoAlertsrdquo
High availability For more information see Section 43 ldquoHigh availabilityrdquo
Dashboards For more information see Section 44 ldquoDashboardsrdquo
Multiple clouds For more information see Section 45 ldquoConfiguring multiple cloudsrdquo
Ephemeral storage For more information see Section 46 ldquoEphemeral storagerdquo
41 CUSTOMIZING THE DEPLOYMENT
The Service Telemetry Operator watches for a ServiceTelemetry manifest to load intoRed Hat OpenShift Container Platform (OCP) The Operator then creates other objects in memorywhich results in the dependent Operators creating the workloads they are responsible formanaging
WARNING
When you override the manifest you must provide the entire manifest contentsincluding object names or namespaces There is no dynamic parametersubstitution when you override a manifest
To override a manifest successfully with Service Telemetry Framework (STF) deploy a defaultenvironment using the core options only For more information about the core options seeSection 2310 ldquoCreating a ServiceTelemetry object in OCPrdquo When you deploy STF use the oc getcommand to retrieve the default deployed manifest When you use a manifest that was originallygenerated by Service Telemetry Operator the manifest is compatible with the other objects thatare managed by the Operators
For example when the metricsEnabled true parameter is configured in the ServiceTelemetrymanifest the Service Telemetry Operator requests components for metrics retrieval and storageusing the default manifests In some cases you might want to override the default manifest Formore information see Section 411 ldquoManifest override parametersrdquo
411 Manifest override parameters
This table describes the available parameters that you can use to override a manifest along withthe corresponding retrieval commands
Table 41 Manifest override parameters
Red Hat OpenStack Platform 160 Service Telemetry Framework
26
Override parameter Description Retrieval command
alertmanagerManifest Override the Alertmanagerobject creation The PrometheusOperator watches for Alertmanager objects
oc get alertmanager stf-default -oyaml
alertmanagerConfigManifest Override the Secret thatcontains the Alertmanagerconfiguration The PrometheusOperator uses a secret named alertmanager- alertmanager-name forexample stf-default to providethe alertmanageryamlconfiguration to Alertmanager
oc get secret alertmanager-stf-default -oyaml
elasticsearchManifest Override the ElasticSearchobject creation The Elastic Cloudon Kuberneters Operator watchesfor ElasticSearch objects
oc get elasticsearch elasticsearch -oyaml
interconnectManifest Override the Interconnectobject creation The AMQInterconnect Operator watchesfor Interconnect objects
oc get interconnect stf-default-interconnect -oyaml
prometheusManifest Override the Prometheusobject creation The PrometheusOperator watches for Prometheus objects
oc get prometheus stf-default -oyaml
servicemonitorManifest Override the ServiceMonitorobject creation The PrometheusOperator watches for ServiceMonitor objects
oc get servicemonitor stf-default -oyaml
smartgatewayCollectdMetricsManifest
Override the SmartGatewayobject creation for collectdmetrics The Smart GatewayOperator watches for SmartGateway objects
oc get smartgateway stf-default-collectd-telemetry -oyaml
smartgatewayCollectdEventsManifest
Override the SmartGatewayobject creation for collectdevents The Smart GatewayOperator watches for SmartGateway objects
oc get smartgateway stf-default-collectd-notification -oyaml
CHAPTER 4 ADVANCED FEATURES
27
smartgatewayCeilometerEventsManifest
Override the SmartGatewayobject creation for Ceilometerevents The Smart GatewayOperator watches for SmartGateway objects
oc get smartgateway stf-default-ceilometer-notification -oyaml
Override parameter Description Retrieval command
412 Overriding a managed manifest
Edit the ServiceTelemetry object and provide a parameter and manifest For a list of availablemanifest override parameters see Section 41 ldquoCustomizing the deploymentrdquo The default ServiceTelemetry object is stf-default Use oc get servicetelemetry to list the available STFdeployments
TIP
The oc edit command loads the default system editor To override the default editor pass or setthe environment variable EDITOR to the preferred editor For example EDITOR=nano oc edit servicetelemetry stf-default
Procedure
1 Log in to Red Hat OpenShift Container Platform
2 Change to the service-telemetry namespace
3 Load the ServiceTelemetry object into an editor
4 To modify the ServiceTelemetry object provide a manifest override parameter and thecontents of the manifest to write to OCP instead of the defaults provided by STF
NOTE
The trailing pipe (|) after entering the manifest override parameter indicatesthat the value provided is multi-line
oc project service-telemetry
oc edit servicetelemetry stf-default
$ oc edit servicetelemetry stf-default
apiVersion infrawatchv1alpha1kind ServiceTelemetrymetadata annotations kubectlkubernetesiolast-applied-configuration | apiVersioninfrawatchv1alpha1kindServiceTelemetrymetadataannotationsnamestf-defaultnamespaceservice-telemetryspecmetricsEnabledtrue creationTimestamp 2020-04-14T202942Z
Red Hat OpenStack Platform 160 Service Telemetry Framework
28
1
2
Manifest override parameter is defined in the spec of the ServiceTelemetry object
End of the manifest override content
5 Save and close
42 ALERTS
You create alert rules in Prometheus and alert routes in Alertmanager Alert rules in Prometheusservers send alerts to an Alertmanager which manages the alerts Alertmanager can silenceinhibit or aggregate alerts and send notifications using email on-call notification systems or chatplatforms
To create an alert complete the following tasks
1 Create an alert rule in Prometheus For more information see Section 421 ldquoCreating analert rule in Prometheusrdquo
generation 1 name stf-default namespace service-telemetry resourceVersion 1949423 selfLink apisinfrawatchv1alpha1namespacesservice-telemetryservicetelemetrysstf-default uid d058bc41-1bb0-49f5-9a8b-642f4b8adb95spec metricsEnabled true smartgatewayCollectdMetricsManifest | 1 apiVersion smartgatewayinfrawatchv2alpha1 kind SmartGateway metadata name stf-default-collectd-telemetry namespace service-telemetry spec amqpUrl stf-default-interconnectservice-telemetrysvcclusterlocal5672collectdtelemetry debug true prefetch 15000 serviceType metrics size 1 useTimestamp true 2status conditions - ansibleResult changed 0 completion 2020-04-14T203219079508 failures 0 ok 52 skipped 1 lastTransitionTime 2020-04-14T202959Z message Awaiting next reconciliation reason Successful status True type Running
CHAPTER 4 ADVANCED FEATURES
29
2 Create an alert route in Alertmanager For more information see Section 423 ldquoCreating analert route in Alertmanagerrdquo
Additional resourcesFor more information about alerts or notifications with Prometheus and Alertmanager seehttpsprometheusiodocsalertingoverview
To view an example set of alerts that you can use with Service Telemetry Framework (STF) seehttpsgithubcominfrawatchservice-telemetry-operatortreemasterdeployalerts
421 Creating an alert rule in Prometheus
Prometheus evaluates alert rules to trigger notifications If the rule condition returns an emptyresult set the condition is false Otherwise the rule is true and it triggers an alert
Procedure

1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

   $ oc project service-telemetry

3. Create a PrometheusRule object that contains the alert rule. The Prometheus Operator loads the rule into Prometheus:

   $ oc apply -f - <<EOF
   apiVersion: monitoring.coreos.com/v1
   kind: PrometheusRule
   metadata:
     creationTimestamp: null
     labels:
       prometheus: stf-default
       role: alert-rules
     name: prometheus-alarm-rules
     namespace: service-telemetry
   spec:
     groups:
       - name: ./openstack.rules
         rules:
           - alert: Metric Listener down
             expr: collectd_qpid_router_status < 1
   EOF

   To change the rule, edit the value of the expr parameter.

4. To verify that the rules have been loaded into Prometheus by the Operator, create a pod with access to curl:

   $ oc run curl --generator=run-pod/v1 --image=radial/busyboxplus:curl -i --tty

5. Run curl to access the prometheus-operated service to return the rules loaded into memory:

   [ root@curl:/ ]$ curl prometheus-operated:9090/api/v1/rules

   {"status":"success","data":{"groups":[{"name":"./openstack.rules","file":"/etc/prometheus/rules/prometheus-stf-default-rulefiles-0/service-telemetry-prometheus-alarm-rules.yaml","rules":[{"name":"Metric Listener down","query":"collectd_qpid_router_status \u003c 1","duration":0,"labels":{},"annotations":{},"alerts":[],"health":"ok","type":"alerting"}],"interval":30}]}}
6. To verify that the output shows the rules loaded into the PrometheusRule object, for example that the output contains the defined ./openstack.rules, exit from the pod:

   [ root@curl:/ ]$ exit

7. Clean up the environment by deleting the curl pod:

   $ oc delete pod curl

   pod "curl" deleted
Additional resources

For more information on alerting, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md
4.2.2. Configuring custom alerts

You can add custom alerts to the PrometheusRule object that you created in Section 4.2.1, "Creating an alert rule in Prometheus".

Procedure

1. Use the oc edit command:

   $ oc edit prometheusrules prometheus-alarm-rules

2. Edit the PrometheusRules manifest.

3. Save and close.
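For example, a custom alert appended to the existing ./openstack.rules group might watch for the collectd_uptime metric disappearing entirely. The second rule below is a hypothetical sketch; only the first rule comes from this guide:

   spec:
     groups:
       - name: ./openstack.rules
         rules:
           - alert: Metric Listener down
             expr: collectd_qpid_router_status < 1
           - alert: Collectd telemetry absent     # hypothetical custom rule
             expr: absent(collectd_uptime)        # true when no uptime series is being received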
Additional resources

For more information about configuring alerting rules, see https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/

For more information about PrometheusRules objects, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md
4.2.3. Creating an alert route in Alertmanager

Use Alertmanager to deliver alerts to an external system, such as email, IRC, or other notification channel. The Prometheus Operator manages the Alertmanager configuration as a Red Hat OpenShift Container Platform (OCP) secret. By default, STF deploys a basic configuration that results in no receivers:

   alertmanager.yaml: |-
     global:
       resolve_timeout: 5m
     route:
       group_by: ['job']
       group_wait: 30s
       group_interval: 5m
       repeat_interval: 12h
       receiver: 'null'
     receivers:
     - name: 'null'
To deploy a custom Alertmanager route with STF, you must pass an alertmanagerConfigManifest parameter to the Service Telemetry Operator, which results in an updated secret managed by the Prometheus Operator. For more information, see Section 4.1.2, "Overriding a managed manifest".
Procedure

1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

   $ oc project service-telemetry

3. Edit the ServiceTelemetry object for your STF deployment:

   $ oc edit servicetelemetry stf-default

4. Add a new parameter, alertmanagerConfigManifest, and the Secret object contents to define the alertmanager.yaml configuration for Alertmanager:

   NOTE

   This loads the default template that is already managed by the Service Telemetry Operator. To validate that the changes are populating correctly, change a value, return the alertmanager-stf-default secret, and verify that the new value is loaded into memory, for example, changing the value global.resolve_timeout from 5m to 10m.

   apiVersion: infra.watch/v1alpha1
   kind: ServiceTelemetry
   metadata:
     name: stf-default
     namespace: service-telemetry
   spec:
     metricsEnabled: true
     alertmanagerConfigManifest: |
       apiVersion: v1
       kind: Secret
       metadata:
         name: 'alertmanager-stf-default'
         namespace: 'service-telemetry'
       type: Opaque
       stringData:
         alertmanager.yaml: |-
           global:
             resolve_timeout: 10m
           route:
             group_by: ['job']
             group_wait: 30s
             group_interval: 5m
             repeat_interval: 12h
             receiver: 'null'
           receivers:
           - name: 'null'
5. Verify that the configuration was applied to the secret:

   $ oc get secret alertmanager-stf-default -o go-template='{{index .data "alertmanager.yaml" | base64decode}}'

   global:
     resolve_timeout: 10m
   route:
     group_by: ['job']
     group_wait: 30s
     group_interval: 5m
     repeat_interval: 12h
     receiver: 'null'
   receivers:
   - name: 'null'

6. To verify that the configuration has been loaded into Alertmanager, create a pod with access to curl:

   $ oc run curl --generator=run-pod/v1 --image=radial/busyboxplus:curl -i --tty

7. Run curl against the alertmanager-operated service to retrieve the status and configYAML contents, and review whether the supplied configuration matches the configuration loaded into Alertmanager:

   [ root@curl:/ ]$ curl alertmanager-operated:9093/api/v1/status

   {"status":"success","data":{"configYAML":"global:\n  resolve_timeout: 10m\n  http_config: {}\n  smtp_hello: localhost\n  smtp_require_tls: true\n  pagerduty_url: https://events.pagerduty.com/v2/enqueue\n  hipchat_api_url: https://api.hipchat.com/\n  opsgenie_api_url: https://api.opsgenie.com/\n  wechat_api_url: https://qyapi.weixin.qq.com/cgi-bin/\n  victorops_api_url: https://alert.victorops.com/integrations/generic/20131114/alert/\nroute:\n  receiver: 'null'\n  group_by:\n  - job\n  group_wait: 30s\n  group_interval: 5m\n  repeat_interval: 12h\nreceivers:\n- name: 'null'\ntemplates: []\n"}}

8. Verify that the configYAML field contains the expected changes. Exit from the pod:

   [ root@curl:/ ]$ exit

9. To clean up the environment, delete the curl pod:

   $ oc delete pod curl

   pod "curl" deleted
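To route alerts to a real notification channel, replace the 'null' receiver in the alertmanagerConfigManifest. The following alertmanager.yaml fragment is a hypothetical sketch of an email receiver; the receiver name, addresses, and SMTP host are placeholders, not values from this deployment:

   alertmanager.yaml: |-
     global:
       resolve_timeout: 10m
     route:
       group_by: ['job']
       group_wait: 30s
       group_interval: 5m
       repeat_interval: 12h
       receiver: email-ops            # placeholder receiver name
     receivers:
     - name: email-ops
       email_configs:
       - to: ops@example.com          # placeholder addresses and host
         from: alertmanager@example.com
         smarthost: smtp.example.com:587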
Additional resources

For more information about the Red Hat OpenShift Container Platform secret and the Prometheus operator, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md
4.3. HIGH AVAILABILITY
High availability is the ability of Service Telemetry Framework (STF) to rapidly recover from failures in its component services. Although Red Hat OpenShift Container Platform (OCP) restarts a failed pod if nodes are available to schedule the workload, this recovery process might take more than one minute, during which time events and metrics are lost. A high availability configuration includes multiple copies of STF components, which reduces recovery time to approximately 2 seconds. To protect against failure of an OCP node, deploy STF to an OCP cluster with three or more nodes.
NOTE

STF is not yet a fully fault tolerant system. Delivery of metrics and events during the recovery period is not guaranteed.

Enabling high availability has the following effects:

- Two AMQ Interconnect pods run instead of the default 1.
- Three ElasticSearch pods run instead of the default 1.
- Recovery time from a lost pod in either of these services reduces to approximately 2 seconds.
4.3.1. Configuring high availability
To configure STF for high availability, add highAvailabilityEnabled: true to the ServiceTelemetry object in OCP. You can set this parameter at installation time or, if you already deployed STF, complete the following steps.
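If you prefer not to open an editor, an equivalent one-line merge patch (assuming the default stf-default object name) sets the same field as the editor-based steps below:

   # assumes the default ServiceTelemetry object name, stf-default
   $ oc patch servicetelemetry stf-default --type=merge -p '{"spec": {"highAvailabilityEnabled": true}}'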
Procedure

1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

   $ oc project service-telemetry

3. Use the oc command to edit the ServiceTelemetry object:

   $ oc edit ServiceTelemetry

4. Add highAvailabilityEnabled: true to the spec section:

   spec:
     eventsEnabled: true
     metricsEnabled: true
     highAvailabilityEnabled: true

5. Save your changes and close the object.
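You can then confirm the additional replicas, for example by listing the AMQ Interconnect pods; with high availability enabled, two pods report Running. The pod name suffixes and ages in this sketch are illustrative:

   $ oc get pods -l application=stf-default-interconnect

   NAME                                        READY   STATUS    RESTARTS   AGE
   stf-default-interconnect-7458fd4d69-bgzfb   1/1     Running   0          2m
   stf-default-interconnect-7458fd4d69-xw5td   1/1     Running   0          2m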
4.4. DASHBOARDS
Use the third-party application Grafana to visualize system-level metrics gathered by collectd for each individual host node. For more information about configuring collectd, see Section 3.3, "Configuring Red Hat OpenStack Platform overcloud for Service Telemetry Framework".
4.4.1. Setting up Grafana to host the dashboard
Grafana is not included in the default Service Telemetry Framework (STF) deployment, so you must deploy the Grafana Operator from OperatorHub.io.
Procedure

1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

   $ oc project service-telemetry

3. Clone the dashboard repository:

   $ git clone https://github.com/infrawatch/dashboards
   $ cd dashboards

4. Deploy the Grafana operator:

   $ oc create -f deploy/subscription.yaml

5. To verify that the operator launched successfully, run the oc get csv command. If the value of the PHASE column is Succeeded, the operator launched successfully:

   $ oc get csv

   NAME                      DISPLAY            VERSION   REPLACES   PHASE
   grafana-operator.v3.2.0   Grafana Operator   3.2.0                Succeeded

6. Launch a Grafana instance:

   $ oc create -f deploy/grafana.yaml

7. Verify that the Grafana instance deployed:

   $ oc get pod -l app=grafana

   NAME                                  READY   STATUS    RESTARTS   AGE
   grafana-deployment-7fc7848b56-sbkhv   1/1     Running   0          1m

8. Create the datasource and dashboard resources:

   $ oc create -f deploy/datasource.yaml -f deploy/rhos-dashboard.yaml
9. Verify that the resources installed correctly:

   $ oc get grafanadashboards

   NAME             AGE
   rhos-dashboard   7d21h

   $ oc get grafanadatasources

   NAME                                  AGE
   service-telemetry-grafanadatasource   1m

10. Navigate to https://<grafana-route-address> in a web browser. Use the oc get routes command to retrieve the Grafana route address:

    $ oc get routes

11. To view the dashboard, click Dashboards and Manage.

Additional resources

For more information about enabling the OperatorHub.io catalog source, see Section 2.3.5, "Enabling the OperatorHub.io Community Catalog Source".
4.4.1.1. Viewing and editing queries

Procedure

1. Log in to Red Hat OpenShift Container Platform. To view and edit queries, log in as the admin user.

2. Change to the service-telemetry namespace:

   $ oc project service-telemetry

3. To retrieve the default username and password, describe the Grafana object using oc describe:

   $ oc describe grafana service-telemetry-grafana
4.4.2. The Grafana infrastructure dashboard

The infrastructure dashboard shows metrics for a single node at a time. Select a node from the upper left corner of the dashboard.
4.4.2.1. Top panels

Title | Unit | Description
Current Global Alerts | - | Current alerts fired by Prometheus
Recent Global Alerts | - | Recently fired alerts in 5m time steps
Status Panel | - | Node status: up, down, unavailable
Uptime | s/m/h/d/M/Y | Total operational time of node
CPU Cores | cores | Total number of cores
Memory | bytes | Total memory
Disk Size | bytes | Total storage size
Processes | processes | Total number of processes listed by type
Load Average | processes | Load average represents the average number of running and uninterruptible processes residing in the kernel execution queue
4.4.2.2. Networking panels

Panels that display the network interfaces of the node.

Panel | Unit | Description
Physical Interfaces Ingress Errors | errors | Total errors with incoming data
Physical Interfaces Egress Errors | errors | Total errors with outgoing data
Physical Interfaces Ingress Error Rates | errors/s | Rate of incoming data errors
Physical Interfaces Egress Error Rates | errors/s | Rate of outgoing data errors
Physical Interfaces Packets Ingress | pps | Incoming packets per second
Physical Interfaces Packets Egress | pps | Outgoing packets per second
Physical Interfaces Data Ingress | bytes/s | Incoming data rates
Physical Interfaces Data Egress | bytes/s | Outgoing data rates
Physical Interfaces Drop Rate Ingress | pps | Incoming packets drop rate
Physical Interfaces Drop Rate Egress | pps | Outgoing packets drop rate
4.4.2.3. CPU panels

Panels that display CPU usage of the node.

Panel | Unit | Description
Current CPU Usage | percent | Instantaneous usage at the time of the last query
Aggregate CPU Usage | percent | Average non-idle CPU activity of all cores on a node
Aggr CPU Usage by Type | percent | Shows time spent for each type of thread, averaged across all cores
4.4.2.4. Memory panels

Panels that display memory usage on the node.

Panel | Unit | Description
Memory Used | percent | Amount of memory being used at time of last query
Huge Pages Used | hugepages | Number of huge pages being used
Memory
4.4.2.5. Disk/file system

Panels that display space used on disk.

Panel | Unit | Description | Notes
Disk Space Usage | percent | Total disk use at time of last query |
Inode Usage | percent | Total inode use at time of last query |
Aggregate Disk Space Usage | bytes | Total disk space used and reserved | Because this query relies on the df plugin, temporary file systems that do not necessarily use disk space are included in the results. The query tries to filter out most of these, but it might not be exhaustive.
Disk Traffic | bytes/s | Shows rates for both reading and writing |
Disk Load | percent | Approximate percentage of total disk bandwidth being used. The weighted I/O time series includes the backlog that might be accumulating. For more information, see the collectd disk plugin docs. |
Operations/s | ops/s | Operations done per second |
Average I/O Operation Time | seconds | Average time each I/O operation took to complete. This average is not accurate; see the collectd disk plugin docs. |
4.5. CONFIGURING MULTIPLE CLOUDS

You can configure multiple Red Hat OpenStack Platform clouds to target a single instance of Service Telemetry Framework (STF):

1. Plan the AMQP address prefixes that you want to use for each cloud. For more information, see Section 4.5.1, "Planning AMQP address prefixes".

2. Deploy metrics and events consumer Smart Gateways for each cloud to listen on the corresponding address prefixes. For more information, see Section 4.5.2, "Deploying Smart Gateways".

3. Configure each cloud to send its metrics and events to STF on the correct address. For more information, see Section 4.5.3, "Creating the OpenStack environment file".
Figure 4.1. Two Red Hat OpenStack Platform clouds connect to STF
4.5.1. Planning AMQP address prefixes
By default, Red Hat OpenStack Platform nodes get data through two data collectors: collectd and Ceilometer. These components send telemetry data or notifications to the respective AMQP addresses, for example, collectd/telemetry, where STF Smart Gateways listen on those addresses for monitoring data.

To support multiple clouds, and to identify which cloud generated the monitoring data, configure each cloud to send data to a unique address. Prefix a cloud identifier to the second part of the address. The following list shows some example addresses and identifiers:
- collectd/cloud1-telemetry
- collectd/cloud1-notify
- anycast/ceilometer/cloud1-event.sample
- collectd/cloud2-telemetry
- collectd/cloud2-notify
- anycast/ceilometer/cloud2-event.sample
- collectd/us-east-1-telemetry
- collectd/us-west-3-telemetry
4.5.2. Deploying Smart Gateways
You must deploy a Smart Gateway for each of the data collection types for each cloud: one for collectd metrics, one for collectd events, and one for Ceilometer events. Configure each of the Smart Gateways to listen on the AMQP address that you define for the corresponding cloud.

When you deploy STF for the first time, Smart Gateway manifests are created that define the initial Smart Gateways for a single cloud. When you deploy Smart Gateways for multiple cloud support, you deploy multiple Smart Gateways for each of the data collection types that handle the metrics and the events data for each cloud. The initial Smart Gateways act as a template to create additional Smart Gateways, along with any authentication information required to connect to the data stores.
Procedure

1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

   $ oc project service-telemetry

3. Use the initially deployed Smart Gateways as a template for additional Smart Gateways. List the currently deployed Smart Gateways with the oc get smartgateways command. For example, if you deployed STF with metricsEnabled: true and eventsEnabled: true, the following Smart Gateways are displayed in the output:

   $ oc get smartgateways

   NAME                                  AGE
   stf-default-ceilometer-notification   14d
   stf-default-collectd-notification     14d
   stf-default-collectd-telemetry        14d

4. Retrieve the manifests for each Smart Gateway and store the contents in a temporary file, which you can modify later and use to create the new set of Smart Gateways:

   $ truncate --size 0 /tmp/cloud1-smartgateways.yaml && \
     for sg in $(oc get smartgateways -oname)
     do
       echo "---" >> /tmp/cloud1-smartgateways.yaml
       oc get $sg -oyaml --export >> /tmp/cloud1-smartgateways.yaml
     done

5. Modify the Smart Gateway manifests in the /tmp/cloud1-smartgateways.yaml file. Adjust the metadata.name and spec.amqpUrl fields to include the cloud identifier from your schema. For more information, see Section 4.5.1, "Planning AMQP address prefixes". To view example Smart Gateway manifests, see Section 4.5.2.1, "Example manifests".
6. Deploy your new Smart Gateways:

   $ oc apply -f /tmp/cloud1-smartgateways.yaml

7. Verify that each Smart Gateway is running. This can take several minutes, depending on the number of Smart Gateways:

   $ oc get po -l app=smart-gateway
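The output resembles the following after the cloud1 Smart Gateways are scheduled. This listing is an illustrative sketch; the pod hash suffixes and ages depend on your deployment:

   NAME                                                                  READY   STATUS    RESTARTS   AGE
   stf-default-ceilometer-notification-cloud1-smartgateway-7f3b9xk2wq   1/1     Running   0          2m
   stf-default-collectd-notification-cloud1-smartgateway-65c8d4f28v6    1/1     Running   0          2m
   stf-default-collectd-telemetry-cloud1-smartgateway-79c967c8kq4qv     1/1     Running   0          2m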
4.5.2.1. Example manifests

IMPORTANT

The content in the following examples might be different to the file content in your deployment. Copy the manifests in your deployment.

Ensure that the name and amqpUrl parameters of each Smart Gateway match the names that you want to use for your clouds. For more information, see Section 4.5.1, "Planning AMQP address prefixes".

NOTE

Your output might have some additional metadata parameters that you can remove from the manifests that you load into OCP.
   apiVersion: smartgateway.infra.watch/v2alpha1
   kind: SmartGateway
   metadata:
     name: stf-default-ceilometer-notification-cloud1 1
   spec:
     amqpDataSource: ceilometer
     amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/anycast/ceilometer/cloud1-event.sample 2
     debug: false
     elasticPass: fkzfhghw
     elasticUrl: https://elasticsearch-es-http.service-telemetry.svc.cluster.local:9200
     elasticUser: elastic
     resetIndex: false
     serviceType: events
     size: 1
     tlsCaCert: /config/certs/ca.crt
     tlsClientCert: /config/certs/tls.crt
     tlsClientKey: /config/certs/tls.key
     tlsServerName: elasticsearch-es-http.service-telemetry.svc.cluster.local
     useBasicAuth: true
     useTls: true
   ---
   apiVersion: smartgateway.infra.watch/v2alpha1
   kind: SmartGateway
   metadata:
     name: stf-default-collectd-notification-cloud1 3
   spec:
     amqpDataSource: collectd
     amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/cloud1-notify 4
     debug: false
     elasticPass: fkzfhghw
     elasticUrl: https://elasticsearch-es-http.service-telemetry.svc.cluster.local:9200
     elasticUser: elastic
     resetIndex: false
     serviceType: events
     size: 1
     tlsCaCert: /config/certs/ca.crt
     tlsClientCert: /config/certs/tls.crt
     tlsClientKey: /config/certs/tls.key
     tlsServerName: elasticsearch-es-http.service-telemetry.svc.cluster.local
     useBasicAuth: true
     useTls: true
   ---
   apiVersion: smartgateway.infra.watch/v2alpha1
   kind: SmartGateway
   metadata:
     name: stf-default-collectd-telemetry-cloud1 5
   spec:
     amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/cloud1-telemetry 6
     debug: false
     prefetch: 15000
     serviceType: metrics
     size: 1
     useTimestamp: true

1 Name for Ceilometer notifications for cloud1.

2 AMQP address for Ceilometer notifications for cloud1.

3 Name for collectd notifications for cloud1.

4 AMQP address for collectd notifications for cloud1.

5 Name for collectd telemetry for cloud1.

6 AMQP address for collectd telemetry for cloud1.
4.5.3. Creating the OpenStack environment file

To label traffic according to the cloud of origin, you must create a configuration with cloud-specific instance names. Create an stf-connectors.yaml file and adjust the values of CeilometerQdrEventsConfig and CollectdAmqpInstances to match the AMQP address prefix scheme. For more information, see Section 4.5.1, "Planning AMQP address prefixes".
WARNING

Remove the enable-stf.yaml and ceilometer-write-qdr.yaml environment file references from your overcloud deployment. This configuration is redundant and results in duplicate information being sent from each cloud node.
Procedure

1. Create the stf-connectors.yaml file and modify it to match the AMQP address that you want for this cloud deployment:

   resource_registry:
     OS::TripleO::Services::Collectd: /usr/share/openstack-tripleo-heat-templates/deployment/metrics/collectd-container-puppet.yaml
     OS::TripleO::Services::MetricsQdr: /usr/share/openstack-tripleo-heat-templates/deployment/metrics/qdr-container-puppet.yaml
     OS::TripleO::Services::CeilometerAgentCentral: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-central-container-puppet.yaml
     OS::TripleO::Services::CeilometerAgentNotification: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-notification-container-puppet.yaml
     OS::TripleO::Services::CeilometerAgentIpmi: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-ipmi-container-puppet.yaml
     OS::TripleO::Services::ComputeCeilometerAgent: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-compute-container-puppet.yaml
     OS::TripleO::Services::Redis: /usr/share/openstack-tripleo-heat-templates/deployment/database/redis-pacemaker-puppet.yaml

   parameter_defaults:
     EnableSTF: true

     EventPipelinePublishers: []
     CeilometerEnablePanko: false
     CeilometerQdrPublishEvents: true
     CeilometerQdrEventsConfig:
       driver: amqp
       topic: cloud1-event 1

     CollectdConnectionType: amqp1
     CollectdAmqpInterval: 5
     CollectdDefaultPollingInterval: 5

     CollectdAmqpInstances:
       cloud1-notify: 2
         notify: true
         format: JSON
         presettle: false
       cloud1-telemetry: 3
         format: JSON
         presettle: true

     MetricsQdrAddresses:
       - prefix: collectd
         distribution: multicast
       - prefix: anycast/ceilometer
         distribution: multicast

     MetricsQdrSSLProfiles:
       - name: sslProfile

     MetricsQdrConnectors:
       - host: stf-default-interconnect-5671-service-telemetry.apps.infra.watch 4
         port: 443
         role: edge
         verifyHostname: false
         sslProfile: sslProfile

   1 Define the topic for Ceilometer events. This value is the address format of anycast/ceilometer/cloud1-event.sample.

   2 Define the topic for collectd events. This value is the format of collectd/cloud1-notify.

   3 Define the topic for collectd metrics. This value is the format of collectd/cloud1-telemetry.

   4 Adjust the MetricsQdrConnectors host to the address of the STF route.

2. Ensure that the naming convention in the stf-connectors.yaml file aligns with the spec.amqpUrl field in the Smart Gateway configuration. For example, configure the CeilometerQdrEventsConfig.topic field to a value of cloud1-event.

3. Save the file in a directory for custom environment files, for example /home/stack/custom_templates/.

4. Source the authentication file:

   [stack@undercloud-0 ~]$ source stackrc

   (undercloud) [stack@undercloud-0 ~]$

5. Include the stf-connectors.yaml file in the overcloud deployment command, along with any other environment files relevant to your environment:

   (undercloud) [stack@undercloud-0 ~]$ openstack overcloud deploy --templates /usr/share/openstack-tripleo-heat-templates \
     -e /home/stack/custom_templates/stf-connectors.yaml

Additional resources

For information about validating the deployment, see Section 3.3.3, "Validating client-side installation".
4.5.4. Querying metrics data from multiple clouds

Data stored in Prometheus has a service label attached according to the Smart Gateway it was scraped from. You can use this label to query data from a specific cloud.

To query data from a specific cloud, use a Prometheus PromQL query that matches the associated service label, for example: collectd_uptime{service="stf-default-collectd-telemetry-cloud1-smartgateway"}.
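For example, the following PromQL queries return the collectd uptime series for each cloud separately, using the Smart Gateway naming scheme from Section 4.5.2, "Deploying Smart Gateways"; the cloud identifiers are the ones used throughout this chapter:

   # uptime series reported by cloud1
   collectd_uptime{service="stf-default-collectd-telemetry-cloud1-smartgateway"}

   # uptime series reported by cloud2
   collectd_uptime{service="stf-default-collectd-telemetry-cloud2-smartgateway"}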
4.6. EPHEMERAL STORAGE

Use ephemeral storage to run Service Telemetry Framework (STF) without persistently storing data in your Red Hat OpenShift Container Platform (OCP) cluster. Ephemeral storage is not recommended in a production environment due to the volatility of the data in the platform, even when it is operating correctly and as designed. For example, restarting a pod or rescheduling the workload to another node results in the loss of any local data written since the pod started.

If you enable ephemeral storage in STF, the Service Telemetry Operator does not add the relevant storage sections to the data storage components' manifests.
4.6.1. Configuring ephemeral storage

To configure STF for ephemeral storage, add storageEphemeralEnabled: true to the ServiceTelemetry object in OCP. You can add storageEphemeralEnabled: true at installation time or, if you already deployed STF, complete the following steps.
Procedure

1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

   $ oc project service-telemetry

3. Edit the ServiceTelemetry object:

   $ oc edit ServiceTelemetry stf-default

4. Add the storageEphemeralEnabled: true parameter to the spec section:

   spec:
     eventsEnabled: true
     metricsEnabled: true
     storageEphemeralEnabled: true

5. Save your changes and close the object.
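With ephemeral storage enabled, you can check that the data storage components did not request persistent volume claims; an empty listing confirms the setting took effect. The output message shown here is illustrative and varies by client version:

   # list persistent volume claims in the STF namespace
   $ oc get pvc -n service-telemetry

   No resources found in service-telemetry namespace.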
APPENDIX A. COLLECTD PLUG-INS

This section contains a complete list of collectd plug-ins and configurations for Red Hat OpenStack Platform 16.0.
collectd-aggregation

  collectd::plugin::aggregation::aggregators
  collectd::plugin::aggregation::interval

collectd-amqp1

collectd-apache

  collectd::plugin::apache::instances (ex: localhost ⇒ url ⇒ 'http://localhost/mod_status?auto')
  collectd::plugin::apache::interval

collectd-apcups

collectd-battery

  collectd::plugin::battery::values_percentage
  collectd::plugin::battery::report_degraded
  collectd::plugin::battery::query_state_fs
  collectd::plugin::battery::interval

collectd-ceph

  collectd::plugin::ceph::daemons
  collectd::plugin::ceph::longrunavglatency
  collectd::plugin::ceph::convertspecialmetrictypes

collectd-cgroups

  collectd::plugin::cgroups::ignore_selected
  collectd::plugin::cgroups::interval

collectd-conntrack

  None

collectd-contextswitch

  collectd::plugin::contextswitch::interval

collectd-cpu

  collectd::plugin::cpu::reportbystate
  collectd::plugin::cpu::reportbycpu
  collectd::plugin::cpu::valuespercentage
  collectd::plugin::cpu::reportnumcpu
  collectd::plugin::cpu::reportgueststate
  collectd::plugin::cpu::subtractgueststate
  collectd::plugin::cpu::interval

collectd-cpufreq

  None

collectd-cpusleep

collectd-csv

  collectd::plugin::csv::datadir
  collectd::plugin::csv::storerates
  collectd::plugin::csv::interval

collectd-df

  collectd::plugin::df::devices
  collectd::plugin::df::fstypes
  collectd::plugin::df::ignoreselected
  collectd::plugin::df::mountpoints
  collectd::plugin::df::reportbydevice
  collectd::plugin::df::reportinodes
  collectd::plugin::df::reportreserved
  collectd::plugin::df::valuesabsolute
  collectd::plugin::df::valuespercentage
  collectd::plugin::df::interval

collectd-disk

  collectd::plugin::disk::disks
  collectd::plugin::disk::ignoreselected
  collectd::plugin::disk::udevnameattr
  collectd::plugin::disk::interval

collectd-entropy

  collectd::plugin::entropy::interval

collectd-ethstat

  collectd::plugin::ethstat::interfaces
  collectd::plugin::ethstat::maps
  collectd::plugin::ethstat::mappedonly
  collectd::plugin::ethstat::interval

collectd-exec

  collectd::plugin::exec::commands
  collectd::plugin::exec::commands_defaults
  collectd::plugin::exec::globals
  collectd::plugin::exec::interval

collectd-fhcount

  collectd::plugin::fhcount::valuesabsolute
  collectd::plugin::fhcount::valuespercentage
  collectd::plugin::fhcount::interval

collectd-filecount

  collectd::plugin::filecount::directories
  collectd::plugin::filecount::interval

collectd-fscache

  None

collectd-hddtemp

  collectd::plugin::hddtemp::host
  collectd::plugin::hddtemp::port
  collectd::plugin::hddtemp::interval

collectd-hugepages

  collectd::plugin::hugepages::report_per_node_hp
  collectd::plugin::hugepages::report_root_hp
  collectd::plugin::hugepages::values_pages
  collectd::plugin::hugepages::values_bytes
  collectd::plugin::hugepages::values_percentage
  collectd::plugin::hugepages::interval

collectd-intel_rdt

collectd-interface

  collectd::plugin::interface::interfaces
  collectd::plugin::interface::ignoreselected
  collectd::plugin::interface::reportinactive
  collectd::plugin::interface::interval

collectd-ipc

  None

collectd-ipmi

  collectd::plugin::ipmi::ignore_selected
  collectd::plugin::ipmi::notify_sensor_add
  collectd::plugin::ipmi::notify_sensor_remove
  collectd::plugin::ipmi::notify_sensor_not_present
  collectd::plugin::ipmi::sensors
  collectd::plugin::ipmi::interval

collectd-irq

  collectd::plugin::irq::irqs
  collectd::plugin::irq::ignoreselected
  collectd::plugin::irq::interval

collectd-load

  collectd::plugin::load::report_relative
  collectd::plugin::load::interval

collectd-logfile

  collectd::plugin::logfile::log_level
  collectd::plugin::logfile::log_file
  collectd::plugin::logfile::log_timestamp
  collectd::plugin::logfile::print_severity
  collectd::plugin::logfile::interval

collectd-madwifi

collectd-mbmon

collectd-md

collectd-memcached

  collectd::plugin::memcached::instances
  collectd::plugin::memcached::interval

collectd-memory

  collectd::plugin::memory::valuesabsolute
  collectd::plugin::memory::valuespercentage
  collectd::plugin::memory::interval

collectd-multimeter

collectd-mysql

  collectd::plugin::mysql::interval

collectd-netlink

  collectd::plugin::netlink::interfaces
  collectd::plugin::netlink::verboseinterfaces
  collectd::plugin::netlink::qdiscs
  collectd::plugin::netlink::classes
  collectd::plugin::netlink::filters
  collectd::plugin::netlink::ignoreselected
  collectd::plugin::netlink::interval

collectd-network

  collectd::plugin::network::timetolive
  collectd::plugin::network::maxpacketsize
  collectd::plugin::network::forward
  collectd::plugin::network::reportstats
  collectd::plugin::network::listeners
  collectd::plugin::network::servers
  collectd::plugin::network::interval

collectd-nfs

  collectd::plugin::nfs::interval

collectd-ntpd

  collectd::plugin::ntpd::host
  collectd::plugin::ntpd::port
  collectd::plugin::ntpd::reverselookups
  collectd::plugin::ntpd::includeunitid
  collectd::plugin::ntpd::interval

collectd-numa

  None

collectd-olsrd

collectd-openvpn

  collectd::plugin::openvpn::statusfile
  collectd::plugin::openvpn::improvednamingschema
  collectd::plugin::openvpn::collectcompression
  collectd::plugin::openvpn::collectindividualusers
  collectd::plugin::openvpn::collectusercount
  collectd::plugin::openvpn::interval

collectd-ovs_events

  collectd::plugin::ovs_events::address
  collectd::plugin::ovs_events::dispatch
  collectd::plugin::ovs_events::interfaces
  collectd::plugin::ovs_events::send_notification
  collectd::plugin::ovs_events::$port
  collectd::plugin::ovs_events::socket

collectd-ovs_stats

  collectd::plugin::ovs_stats::address
  collectd::plugin::ovs_stats::bridges
  collectd::plugin::ovs_stats::port
  collectd::plugin::ovs_stats::socket

collectd-ping

  collectd::plugin::ping::hosts
  collectd::plugin::ping::timeout
  collectd::plugin::ping::ttl
  collectd::plugin::ping::source_address
  collectd::plugin::ping::device
  collectd::plugin::ping::max_missed
  collectd::plugin::ping::size
  collectd::plugin::ping::interval

collectd-powerdns

  collectd::plugin::powerdns::servers
  collectd::plugin::powerdns::recursors
  collectd::plugin::powerdns::local_socket
  collectd::plugin::powerdns::interval

collectd-processes

  collectd::plugin::processes::processes
  collectd::plugin::processes::process_matches
  collectd::plugin::processes::collect_context_switch
  collectd::plugin::processes::collect_file_descriptor
  collectd::plugin::processes::collect_memory_maps
  collectd::plugin::processes::interval

collectd-protocols

  collectd::plugin::protocols::ignoreselected
  collectd::plugin::protocols::values

collectd-python

collectd-serial

collectd-smart

  collectd::plugin::smart::disks
  collectd::plugin::smart::ignoreselected
  collectd::plugin::smart::interval

collectd-snmp_agent

collectd-statsd

  collectd::plugin::statsd::host
  collectd::plugin::statsd::port
  collectd::plugin::statsd::deletecounters
  collectd::plugin::statsd::deletetimers
  collectd::plugin::statsd::deletegauges
  collectd::plugin::statsd::deletesets
  collectd::plugin::statsd::countersum
  collectd::plugin::statsd::timerpercentile
  collectd::plugin::statsd::timerlower
  collectd::plugin::statsd::timerupper
  collectd::plugin::statsd::timersum
  collectd::plugin::statsd::timercount
  collectd::plugin::statsd::interval

collectd-swap

  collectd::plugin::swap::reportbydevice
  collectd::plugin::swap::reportbytes
  collectd::plugin::swap::valuesabsolute
  collectd::plugin::swap::valuespercentage
  collectd::plugin::swap::reportio
  collectd::plugin::swap::interval

collectd-syslog

  collectd::plugin::syslog::log_level
  collectd::plugin::syslog::notify_level
  collectd::plugin::syslog::interval

collectd-table

  collectd::plugin::table::tables
  collectd::plugin::table::interval

collectd-tail

  collectd::plugin::tail::files
  collectd::plugin::tail::interval

collectd-tail_csv

  collectd::plugin::tail_csv::metrics
  collectd::plugin::tail_csv::files

collectd-tcpconns

  collectd::plugin::tcpconns::localports
  collectd::plugin::tcpconns::remoteports
  collectd::plugin::tcpconns::listening
  collectd::plugin::tcpconns::allportssummary
  collectd::plugin::tcpconns::interval

collectd-ted

collectd-thermal

  collectd::plugin::thermal::devices
  collectd::plugin::thermal::ignoreselected
  collectd::plugin::thermal::interval

collectd-threshold

  collectd::plugin::threshold::types
  collectd::plugin::threshold::plugins
  collectd::plugin::threshold::hosts
  collectd::plugin::threshold::interval

collectd-turbostat

  collectd::plugin::turbostat::core_c_states
  collectd::plugin::turbostat::package_c_states
  collectd::plugin::turbostat::system_management_interrupt
  collectd::plugin::turbostat::digital_temperature_sensor
  collectd::plugin::turbostat::tcc_activation_temp
  collectd::plugin::turbostat::running_average_power_limit
  collectd::plugin::turbostat::logical_core_names

collectd-unixsock

collectd-uptime

  collectd::plugin::uptime::interval

collectd-users

  collectd::plugin::users::interval

collectd-uuid

  collectd::plugin::uuid::uuid_file
  collectd::plugin::uuid::interval

collectd-virt

  collectd::plugin::virt::connection
  collectd::plugin::virt::refresh_interval
  collectd::plugin::virt::domain
  collectd::plugin::virt::block_device
  collectd::plugin::virt::interface_device
  collectd::plugin::virt::ignore_selected
  collectd::plugin::virt::hostname_format
  collectd::plugin::virt::interface_format
  collectd::plugin::virt::extra_stats
  collectd::plugin::virt::interval

collectd-vmem

  collectd::plugin::vmem::verbose
  collectd::plugin::vmem::interval

collectd-vserver

collectd-wireless

collectd-write_graphite

  collectd::plugin::write_graphite::carbons
  collectd::plugin::write_graphite::carbon_defaults
  collectd::plugin::write_graphite::globals

collectd-write_kafka

  collectd::plugin::write_kafka::kafka_host
  collectd::plugin::write_kafka::kafka_port
  collectd::plugin::write_kafka::kafka_hosts
  collectd::plugin::write_kafka::topics

collectd-write_log

  collectd::plugin::write_log::format

collectd-zfs_arc

  None
6 To list the addresses from Red Hat OpenStack Platform nodes to STF connect to OCP toget the AMQ Interconnect pod name and list the connections List the availableAMQ Interconnect pods
7 Connect to the pod and run the qdstat --connections command to list the knownconnections
In this example there are three edge connections from the Red Hat OpenStack Platformnodes with connection id 22 23 and 24
8 To view the number of messages delivered by the network use each address with the oc exec command
0 0 1 0 0 0 0 0 endpoint out 912 1835 local temp9Ok2resI9tmt+CT 250 0 0 0 0 0 0 0 0 0 0 0 0
$ oc get pods -l application=stf-default-interconnect
NAME READY STATUS RESTARTS AGEstf-default-interconnect-7458fd4d69-bgzfb 11 Running 0 6d21h
$ oc exec -it stf-default-interconnect-7458fd4d69-bgzfb -- qdstat --connections
2020-04-21 182547243852 UTCstf-default-interconnect-7458fd4d69-bgzfb
Connections id host container role dir security authentication tenant last dlv uptime ====================================================================================================================================================================================================== 1 1012922143062 rcv[stf-default-collectd-telemetry-smartgateway-79c967c8f7-kq4qv] normal in no-security anonymous-user 000000000 006215025 2 1013005255754 rcv[stf-default-ceilometer-notification-smartgateway-6675df547mbjk5] normal in no-security anonymous-user 000212557 006214936 3 1013005143110 rcv[stf-default-collectd-notification-smartgateway-698c87fbb7-f28v6] normal in no-security anonymous-user 000213653 006214909 22 101280151948 Routerceph-0redhatlocal edge in TLSv1SSLv3(DHE-RSA-AES256-GCM-SHA384) anonymous-user 000000003 000220843 23 101280151950 Routercompute-0redhatlocal edge in TLSv1SSLv3(DHE-RSA-AES256-GCM-SHA384) anonymous-user 000000003 000220843 24 101280152082 Routercontroller-0redhatlocal edge in TLSv1SSLv3(DHE-RSA-AES256-GCM-SHA384) anonymous-user 000000000 000220834 27 12700142202 c2f541c1-4c97-4b37-a189-a396c08fb079 normal in no-security no-auth 000000000 000000000
Red Hat OpenStack Platform 160 Service Telemetry Framework
24
$ oc exec -it stf-default-interconnect-7458fd4d69-bgzfb -- qdstat --address
2020-04-21 182010293258 UTCstf-default-interconnect-7458fd4d69-bgzfb
Router Addresses class addr phs distrib pri local remote in out thru fallback ==================================================================================================================== mobile anycastceilometereventsample 0 balanced - 1 0 1553 1553 0 0 mobile collectdnotify 0 multicast - 1 0 10 10 0 0 mobile collectdtelemetry 0 multicast - 1 0 7798049 7798049 0 0
CHAPTER 3 COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION
25
CHAPTER 4 ADVANCED FEATURESThe following optional features can provide additional functionality to theService Telemetry Framework (STF)
Customizing the deployment For more information see Section 41 ldquoCustomizing thedeploymentrdquo
Alerts For more information see Section 42 ldquoAlertsrdquo
High availability For more information see Section 43 ldquoHigh availabilityrdquo
Dashboards For more information see Section 44 ldquoDashboardsrdquo
Multiple clouds For more information see Section 45 ldquoConfiguring multiple cloudsrdquo
Ephemeral storage For more information see Section 46 ldquoEphemeral storagerdquo
41 CUSTOMIZING THE DEPLOYMENT
The Service Telemetry Operator watches for a ServiceTelemetry manifest to load intoRed Hat OpenShift Container Platform (OCP) The Operator then creates other objects in memorywhich results in the dependent Operators creating the workloads they are responsible formanaging
WARNING
When you override the manifest you must provide the entire manifest contentsincluding object names or namespaces There is no dynamic parametersubstitution when you override a manifest
To override a manifest successfully with Service Telemetry Framework (STF) deploy a defaultenvironment using the core options only For more information about the core options seeSection 2310 ldquoCreating a ServiceTelemetry object in OCPrdquo When you deploy STF use the oc getcommand to retrieve the default deployed manifest When you use a manifest that was originallygenerated by Service Telemetry Operator the manifest is compatible with the other objects thatare managed by the Operators
For example when the metricsEnabled true parameter is configured in the ServiceTelemetrymanifest the Service Telemetry Operator requests components for metrics retrieval and storageusing the default manifests In some cases you might want to override the default manifest Formore information see Section 411 ldquoManifest override parametersrdquo
411 Manifest override parameters
This table describes the available parameters that you can use to override a manifest along withthe corresponding retrieval commands
Table 41 Manifest override parameters
Red Hat OpenStack Platform 160 Service Telemetry Framework
26
Override parameter Description Retrieval command
alertmanagerManifest Override the Alertmanagerobject creation The PrometheusOperator watches for Alertmanager objects
oc get alertmanager stf-default -oyaml
alertmanagerConfigManifest Override the Secret thatcontains the Alertmanagerconfiguration The PrometheusOperator uses a secret named alertmanager- alertmanager-name forexample stf-default to providethe alertmanageryamlconfiguration to Alertmanager
oc get secret alertmanager-stf-default -oyaml
elasticsearchManifest Override the ElasticSearchobject creation The Elastic Cloudon Kuberneters Operator watchesfor ElasticSearch objects
oc get elasticsearch elasticsearch -oyaml
interconnectManifest Override the Interconnectobject creation The AMQInterconnect Operator watchesfor Interconnect objects
oc get interconnect stf-default-interconnect -oyaml
prometheusManifest Override the Prometheusobject creation The PrometheusOperator watches for Prometheus objects
oc get prometheus stf-default -oyaml
servicemonitorManifest Override the ServiceMonitorobject creation The PrometheusOperator watches for ServiceMonitor objects
oc get servicemonitor stf-default -oyaml
smartgatewayCollectdMetricsManifest
Override the SmartGatewayobject creation for collectdmetrics The Smart GatewayOperator watches for SmartGateway objects
oc get smartgateway stf-default-collectd-telemetry -oyaml
smartgatewayCollectdEventsManifest
Override the SmartGatewayobject creation for collectdevents The Smart GatewayOperator watches for SmartGateway objects
oc get smartgateway stf-default-collectd-notification -oyaml
CHAPTER 4 ADVANCED FEATURES
27
smartgatewayCeilometerEventsManifest
Override the SmartGatewayobject creation for Ceilometerevents The Smart GatewayOperator watches for SmartGateway objects
oc get smartgateway stf-default-ceilometer-notification -oyaml
Override parameter Description Retrieval command
412 Overriding a managed manifest
Edit the ServiceTelemetry object and provide a parameter and manifest For a list of availablemanifest override parameters see Section 41 ldquoCustomizing the deploymentrdquo The default ServiceTelemetry object is stf-default Use oc get servicetelemetry to list the available STFdeployments
TIP
The oc edit command loads the default system editor To override the default editor pass or setthe environment variable EDITOR to the preferred editor For example EDITOR=nano oc edit servicetelemetry stf-default
Procedure
1 Log in to Red Hat OpenShift Container Platform
2 Change to the service-telemetry namespace
3 Load the ServiceTelemetry object into an editor
4 To modify the ServiceTelemetry object provide a manifest override parameter and thecontents of the manifest to write to OCP instead of the defaults provided by STF
NOTE
The trailing pipe (|) after entering the manifest override parameter indicatesthat the value provided is multi-line
oc project service-telemetry
oc edit servicetelemetry stf-default
$ oc edit servicetelemetry stf-default
apiVersion infrawatchv1alpha1kind ServiceTelemetrymetadata annotations kubectlkubernetesiolast-applied-configuration | apiVersioninfrawatchv1alpha1kindServiceTelemetrymetadataannotationsnamestf-defaultnamespaceservice-telemetryspecmetricsEnabledtrue creationTimestamp 2020-04-14T202942Z
Red Hat OpenStack Platform 160 Service Telemetry Framework
28
1
2
Manifest override parameter is defined in the spec of the ServiceTelemetry object
End of the manifest override content
5 Save and close
42 ALERTS
You create alert rules in Prometheus and alert routes in Alertmanager Alert rules in Prometheusservers send alerts to an Alertmanager which manages the alerts Alertmanager can silenceinhibit or aggregate alerts and send notifications using email on-call notification systems or chatplatforms
To create an alert complete the following tasks
1 Create an alert rule in Prometheus For more information see Section 421 ldquoCreating analert rule in Prometheusrdquo
generation 1 name stf-default namespace service-telemetry resourceVersion 1949423 selfLink apisinfrawatchv1alpha1namespacesservice-telemetryservicetelemetrysstf-default uid d058bc41-1bb0-49f5-9a8b-642f4b8adb95spec metricsEnabled true smartgatewayCollectdMetricsManifest | 1 apiVersion smartgatewayinfrawatchv2alpha1 kind SmartGateway metadata name stf-default-collectd-telemetry namespace service-telemetry spec amqpUrl stf-default-interconnectservice-telemetrysvcclusterlocal5672collectdtelemetry debug true prefetch 15000 serviceType metrics size 1 useTimestamp true 2status conditions - ansibleResult changed 0 completion 2020-04-14T203219079508 failures 0 ok 52 skipped 1 lastTransitionTime 2020-04-14T202959Z message Awaiting next reconciliation reason Successful status True type Running
CHAPTER 4 ADVANCED FEATURES
29
2 Create an alert route in Alertmanager For more information see Section 423 ldquoCreating analert route in Alertmanagerrdquo
Additional resourcesFor more information about alerts or notifications with Prometheus and Alertmanager seehttpsprometheusiodocsalertingoverview
To view an example set of alerts that you can use with Service Telemetry Framework (STF) seehttpsgithubcominfrawatchservice-telemetry-operatortreemasterdeployalerts
421 Creating an alert rule in Prometheus
Prometheus evaluates alert rules to trigger notifications If the rule condition returns an emptyresult set the condition is false Otherwise the rule is true and it triggers an alert
Procedure
1 Log in to Red Hat OpenShift Container Platform
2 Change to the service-telemetry namespace
3 Create a PrometheusRule object that contains the alert rule The Prometheus Operatorloads the rule into Prometheus
4 To verify that the rules have been loaded into Prometheus by the Operator create a podwith access to curl
5 Run curl to access the prometheus-operated service to return the rules loaded intomemory
oc project service-telemetry
oc apply -f - ltltEOFapiVersion monitoringcoreoscomv1kind PrometheusRulemetadata creationTimestamp null labels prometheus stf-default role alert-rules name prometheus-alarm-rules namespace service-telemetryspec groups - name openstackrules rules - alert Metric Listener down expr collectd_qpid_router_status lt 1 To change the rule edit the value of the expr parameterEOF
oc run curl --generator=run-podv1 --image=radialbusyboxpluscurl -i --tty
[ rootcurl ]$ curl prometheus-operated9090apiv1rulesstatussuccessdatagroups
Red Hat OpenStack Platform 160 Service Telemetry Framework
30
6 To verify that the output shows the rules loaded into the PrometheusRule object forexample the output contains the defined openstackrules exit from the pod
7 Clean up the environment by deleting the curl pod
Additional resourcesFor more information on alerting see httpsgithubcomcoreosprometheus-operatorblobmasterDocumentationuser-guidesalertingmd
422 Configuring custom alerts
You can add custom alerts to the PrometheusRule object that you created in Section 421ldquoCreating an alert rule in Prometheusrdquo
Procedure
1 Use the oc edit command
2 Edit the PrometheusRules manifest
3 Save and close
Additional resourcesFor more information about configuring alerting rules seehttpsprometheusiodocsprometheuslatestconfigurationalerting_rules
For more information about PrometheusRules objects seehttpsgithubcomcoreosprometheus-operatorblobmasterDocumentationuser-guidesalertingmd
423 Creating an alert route in Alertmanager
Use Alertmanager to deliver alerts to an external system such as email IRC or other notificationchannel The Prometheus Operator manages the Alertmanager configuration as anRed Hat OpenShift Container Platform (OCP) secret STF by default deploys a basic configurationthat results in no receivers
[nameopenstackrulesfileetcprometheusrulesprometheus-stf-default-rulefiles-0service-telemetry-prometheus-alarm-rulesyamlrules[nameMetric Listener downquerycollectd_qpid_router_status u003c 1duration0labelsannotationsalerts[]healthoktypealerting]interval30]
[ rootcurl ]$ exit
$ oc delete pod curl
pod curl deleted
oc edit prometheusrules prometheus-alarm-rules
alertmanageryaml |- global resolve_timeout 5m route
CHAPTER 4 ADVANCED FEATURES
31
To deploy a custom Alertmanager route with STF an alertmanagerConfigManifest parameter mustbe passed to the Service Telemetry Operator that results in an updated secret managed by thePrometheus Operator For more information see Section 412 ldquoOverriding a managed manifestrdquo
Procedure
1 Log in to Red Hat OpenShift Container Platform
2 Change to the service-telemetry namespace
3 Edit the ServiceTelemetry object for your STF deployment
4 Add a new parameter alertmanagerConfigManifest and the Secret object contents todefine the alertmanageryaml configuration for Alertmanager
NOTE
This loads the default template that is already managed by ServiceTelemetry Operator To validate the changes are populating correctlychange a value return the alertmanager-stf-default secret and verify thatthe new value is loaded into memory for example changing the value globalresolve_timeout from 5m to 10m
group_by [job] group_wait 30s group_interval 5m repeat_interval 12h receiver null receivers - name null
oc project service-telemetry
oc edit servicetelemetry stf-default
apiVersion infrawatchv1alpha1kind ServiceTelemetrymetadata name stf-default namespace service-telemetryspec metricsEnabled true alertmanagerConfigManifest | apiVersion v1 kind Secret metadata name alertmanager-stf-default namespace service-telemetry type Opaque stringData alertmanageryaml |- global resolve_timeout 10m route
Red Hat OpenStack Platform 160 Service Telemetry Framework
32
5 Verify that the configuration was applied to the secret
6 To verify the configuration has been loaded into Alertmanager create a pod with access to curl
7 Run curl against the alertmanager-operated service to retrieve the status and configYAMLcontents and review the supplied configuration matches the configuration loaded intoAlertmanager
8 Verify that the configYAML field contains the expected changes Exit from the pod
9 To clean up the environment delete the curl pod
group_by [job] group_wait 30s group_interval 5m repeat_interval 12h receiver null receivers - name null
$ oc get secret alertmanager-stf-default -o go-template=index data alertmanageryaml | base64decode
global resolve_timeout 10mroute group_by [job] group_wait 30s group_interval 5m repeat_interval 12h receiver nullreceivers- name null
oc run curl --generator=run-podv1 --image=radialbusyboxpluscurl -i --tty
[ rootcurl ]$ curl alertmanager-operated9093apiv1status
statussuccessdataconfigYAMLglobaln resolve_timeout 10mn http_config n smtp_hello localhostn smtp_require_tls truen pagerduty_url httpseventspagerdutycomv2enqueuen hipchat_api_url httpsapihipchatcomn opsgenie_api_url httpsapiopsgeniecomn wechat_api_url httpsqyapiweixinqqcomcgi-binn victorops_api_url httpsalertvictoropscomintegrationsgeneric20131114alertnrouten receiver nulln group_byn - jobn group_wait 30sn group_interval 5mn repeat_interval 12hnreceiversn- name nullntemplates []n
[ rootcurl ]$ exit
$ oc delete pod curl
pod curl deleted
CHAPTER 4 ADVANCED FEATURES
33
Additional resourcesFor more information about the Red Hat OpenShift Container Platform secret and the Prometheusoperator see httpsgithubcomcoreosprometheus-operatorblobmasterDocumentationuser-guidesalertingmd
43 HIGH AVAILABILITY
High availability is the ability of Service Telemetry Framework (STF) to rapidly recover fromfailures in its component services Although Red Hat OpenShift Container Platform (OCP) restartsa failed pod if nodes are available to schedule the workload this recovery process might take morethan one minute during which time events and metrics are lost A high availability configurationincludes multiple copies of STF components reducing recovery time to approximately 2 secondsTo protect against failure of an OCP node deploy STF to an OCP cluster with three or more nodes
NOTE
STF is not yet a fully fault tolerant system Delivery of metrics and events during therecovery period is not guaranteed
Enabling high availability has the following effects
Two AMQ Interconnect pods run instead of the default 1
Three ElasticSearch pods run instead of the default 1
Recovery time from a lost pod in either of these services reduces to approximately 2seconds
431 Configuring high availability
To configure STF for high availability add highAvailabilityEnabled true to the ServiceTelemetryobject in OCP You can this set this parameter at installation time or if you already deployed STFcomplete the following steps
Procedure
1 Log in to Red Hat OpenShift Container Platform
2 Change to the service-telemetry namespace
3 Use the oc command to edit the ServiceTelemetry object
4 Add highAvailabilityEnabled true to the spec section
5 Save your changes and close the object
oc project service-telemetry
$ oc edit ServiceTelemetry
spec eventsEnabled true metricsEnabled true highAvailabilityEnabled true
Red Hat OpenStack Platform 160 Service Telemetry Framework
34
44 DASHBOARDS
Use third-party application Grafana to visualize system-level metrics gathered by collectd for eachindividual host node For more information about configuring collectd see Section 33ldquoConfiguring Red Hat OpenStack Platform overcloud for Service Telemetry Frameworkrdquo
4.4.1. Setting up Grafana to host the dashboard
Grafana is not included in the default Service Telemetry Framework (STF) deployment, so you must deploy the Grafana Operator from OperatorHub.io.
Procedure
1. Log in to Red Hat OpenShift Container Platform.
2. Change to the service-telemetry namespace:

oc project service-telemetry

3. Clone the dashboard repository:

git clone https://github.com/infrawatch/dashboards
cd dashboards

4. Deploy the Grafana operator:

oc create -f deploy/subscription.yaml

5. To verify that the operator launched successfully, run the oc get csv command. If the value of the PHASE column is Succeeded, the operator launched successfully:

$ oc get csv

NAME                      DISPLAY            VERSION   REPLACES   PHASE
grafana-operator.v3.2.0   Grafana Operator   3.2.0                Succeeded

6. Launch a Grafana instance:

$ oc create -f deploy/grafana.yaml

7. Verify that the Grafana instance deployed:

$ oc get pod -l app=grafana

NAME                                  READY   STATUS    RESTARTS   AGE
grafana-deployment-7fc7848b56-sbkhv   1/1     Running   0          1m

8. Create the datasource and dashboard resources:

oc create -f deploy/datasource.yaml -f deploy/rhos-dashboard.yaml

9. Verify that the resources installed correctly:

$ oc get grafanadashboards

NAME             AGE
rhos-dashboard   7d21h

$ oc get grafanadatasources

NAME                                  AGE
service-telemetry-grafanadatasource   1m

10. Navigate to https://<grafana-route-address> in a web browser. Use the oc get routes command to retrieve the Grafana route address:

oc get routes

11. To view the dashboard, click Dashboards and Manage.
Additional resources
For more information about enabling the OperatorHub.io catalog source, see Section 2.3.5, "Enabling the OperatorHub.io Community Catalog Source".
4.4.1.1. Viewing and editing queries
Procedure
1. Log in to Red Hat OpenShift Container Platform. To view and edit queries, log in as the admin user.
2. Change to the service-telemetry namespace:

oc project service-telemetry

3. To retrieve the default username and password, describe the Grafana object using oc describe:

oc describe grafana service-telemetry-grafana

4.4.2. The Grafana infrastructure dashboard
The infrastructure dashboard shows metrics for a single node at a time. Select a node from the upper left corner of the dashboard.
4.4.2.1. Top panels
Title | Unit | Description
Current Global Alerts | - | Current alerts fired by Prometheus
Recent Global Alerts | - | Recently fired alerts in 5m time steps
Status Panel | - | Node status: up, down, unavailable
Uptime | s/m/h/d/M/Y | Total operational time of node
CPU Cores | cores | Total number of cores
Memory | bytes | Total memory
Disk Size | bytes | Total storage size
Processes | processes | Total number of processes listed by type
Load Average | processes | Load average represents the average number of running and uninterruptible processes residing in the kernel execution queue
4.4.2.2. Networking panels
Panels that display the network interfaces of the node.
Panel | Unit | Description
Physical Interfaces Ingress Errors | errors | Total errors with incoming data
Physical Interfaces Egress Errors | errors | Total errors with outgoing data
Physical Interfaces Ingress Error Rates | errors/s | Rate of incoming data errors
Physical Interfaces Egress Error Rates | errors/s | Rate of outgoing data errors
Physical Interfaces Packets Ingress | pps | Incoming packets per second
Physical Interfaces Packets Egress | pps | Outgoing packets per second
Physical Interfaces Data Ingress | bytes/s | Incoming data rates
Physical Interfaces Data Egress | bytes/s | Outgoing data rates
Physical Interfaces Drop Rate Ingress | pps | Incoming packets drop rate
Physical Interfaces Drop Rate Egress | pps | Outgoing packets drop rate
4.4.2.3. CPU panels
Panels that display CPU usage of the node.
Panel | Unit | Description
Current CPU Usage | percent | Instantaneous usage at the time of the last query
Aggregate CPU Usage | percent | Average non-idle CPU activity of all cores on a node
Aggr CPU Usage by Type | percent | Shows time spent for each type of thread averaged across all cores
4.4.2.4. Memory panels
Panels that display memory usage on the node.
Panel | Unit | Description
Memory Used | percent | Amount of memory being used at time of last query
Huge Pages Used | hugepages | Number of huge pages being used
Memory
4.4.2.5. Disk/file system
Panels that display space used on disk.
Panel | Unit | Description | Notes
Disk Space Usage | percent | Total disk use at time of last query |
Inode Usage | percent | Total inode use at time of last query |
Aggregate Disk Space Usage | bytes | Total disk space used and reserved | Because this query relies on the df plugin, temporary file systems that do not necessarily use disk space are included in the results. The query tries to filter out most of these, but it might not be exhaustive.
Disk Traffic | bytes/s | Shows rates for both reading and writing |
Disk Load | percent | Approximate percentage of total disk bandwidth being used | The weighted I/O time series includes the backlog that might be accumulating. For more information, see the collectd disk plugin docs.
Operations/s | ops/s | Operations done per second |
Average I/O Operation Time | seconds | Average time each I/O operation took to complete | This average is not accurate; see the collectd disk plugin docs.
4.5. CONFIGURING MULTIPLE CLOUDS
You can configure multiple Red Hat OpenStack Platform clouds to target a single instance of Service Telemetry Framework (STF):
1. Plan the AMQP address prefixes that you want to use for each cloud. For more information, see Section 4.5.1, "Planning AMQP address prefixes".
2. Deploy metrics and events consumer Smart Gateways for each cloud to listen on the corresponding address prefixes. For more information, see Section 4.5.2, "Deploying Smart Gateways".
3. Configure each cloud to send its metrics and events to STF on the correct address. For more information, see Section 4.5.3, "Creating the OpenStack environment file".
Figure 4.1. Two Red Hat OpenStack Platform clouds connect to STF
4.5.1. Planning AMQP address prefixes
By default, Red Hat OpenStack Platform nodes get data through two data collectors: collectd and Ceilometer. These components send telemetry data or notifications to the respective AMQP addresses, for example collectd/telemetry, where STF Smart Gateways listen on those addresses for monitoring data.
To support multiple clouds, and to identify which cloud generated the monitoring data, configure each cloud to send data to a unique address. Prefix a cloud identifier to the second part of the address. The following list shows some example addresses and identifiers:
collectd/cloud1-telemetry
collectd/cloud1-notify
anycast/ceilometer/cloud1-event.sample
collectd/cloud2-telemetry
collectd/cloud2-notify
anycast/ceilometer/cloud2-event.sample
collectd/us-east-1-telemetry
collectd/us-west-3-telemetry
4.5.2. Deploying Smart Gateways
You must deploy a Smart Gateway for each of the data collection types for each cloud: one for collectd metrics, one for collectd events, and one for Ceilometer events. Configure each of the Smart Gateways to listen on the AMQP address that you define for the corresponding cloud.
When you deploy STF for the first time, Smart Gateway manifests are created that define the initial Smart Gateways for a single cloud. When deploying Smart Gateways for multiple cloud support, you deploy multiple Smart Gateways for each of the data collection types that handle the metrics and the events data for each cloud. The initial Smart Gateways act as a template to create additional Smart Gateways, along with any authentication information required to connect to the data stores.
Procedure
1. Log in to Red Hat OpenShift Container Platform.
2. Change to the service-telemetry namespace:
oc project service-telemetry
3. Use the initially deployed Smart Gateways as a template for additional Smart Gateways. List the currently deployed Smart Gateways with the oc get smartgateways command. For example, if you deployed STF with metricsEnabled: true and eventsEnabled: true, the following Smart Gateways are displayed in the output:
$ oc get smartgateways
NAME                                  AGE
stf-default-ceilometer-notification   14d
stf-default-collectd-notification     14d
stf-default-collectd-telemetry        14d
4. Retrieve the manifests for each Smart Gateway and store the contents in a temporary file, which you can modify later and use to create the new set of Smart Gateways:
truncate --size 0 /tmp/cloud1-smartgateways.yaml && for sg in $(oc get smartgateways -o name)
do
  echo --- >> /tmp/cloud1-smartgateways.yaml
  oc get $sg -o yaml --export >> /tmp/cloud1-smartgateways.yaml
done
5. Modify the Smart Gateway manifests in the /tmp/cloud1-smartgateways.yaml file. Adjust the metadata.name and spec.amqpUrl fields to include the cloud identifier from your schema. For more information, see Section 4.5.1, "Planning AMQP address prefixes". To view example Smart Gateway manifests, see Section 4.5.2.1, "Example manifests".
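As a convenience, you can apply the cloud identifier with a substitution pass before reviewing the file by hand. A minimal sketch, assuming the default names and addresses shown in this guide and the cloud1 identifier; verify the result before you apply it:

sed -i \
  -e 's|^  name: \(stf-default-.*\)$|  name: \1-cloud1|' \
  -e 's|/collectd/|/collectd/cloud1-|' \
  -e 's|/anycast/ceilometer/|/anycast/ceilometer/cloud1-|' \
  /tmp/cloud1-smartgateways.yaml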
6. Deploy your new Smart Gateways:

oc apply -f /tmp/cloud1-smartgateways.yaml

7. Verify that each Smart Gateway is running. This can take several minutes depending on the number of Smart Gateways:
oc get po -l app=smart-gateway
4.5.2.1. Example manifests
IMPORTANT
The content in the following examples might be different from the file content in your deployment. Copy the manifests in your deployment.
Ensure that the name and amqpUrl parameters of each Smart Gateway match the names that you want to use for your clouds. For more information, see Section 4.5.1, "Planning AMQP address prefixes".
NOTE
Your output might have some additional metadata parameters that you can remove from the manifests that you load into OCP.
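For example, a quick way to drop the single-line export artifacts is a deletion pass over the file; an illustrative sketch only, and you should still review the file and remove any remaining status section manually:

sed -i -e '/creationTimestamp:/d' -e '/resourceVersion:/d' -e '/selfLink:/d' -e '/uid:/d' /tmp/cloud1-smartgateways.yaml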
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-ceilometer-notification-cloud1  # <1>
spec:
  amqpDataSource: ceilometer
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/anycast/ceilometer/cloud1-event.sample  # <2>
  debug: false
  elasticPass: fkzfhghw
  elasticUrl: https://elasticsearch-es-http.service-telemetry.svc.cluster.local:9200
  elasticUser: elastic
  resetIndex: false
  serviceType: events
  size: 1
  tlsCaCert: /config/certs/ca.crt
  tlsClientCert: /config/certs/tls.crt
  tlsClientKey: /config/certs/tls.key
  tlsServerName: elasticsearch-es-http.service-telemetry.svc.cluster.local
  useBasicAuth: true
  useTls: true
---
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-collectd-notification-cloud1  # <3>
spec:
  amqpDataSource: collectd
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/cloud1-notify  # <4>
  debug: false
  elasticPass: fkzfhghw
  elasticUrl: https://elasticsearch-es-http.service-telemetry.svc.cluster.local:9200
  elasticUser: elastic
  resetIndex: false
  serviceType: events
  size: 1
  tlsCaCert: /config/certs/ca.crt
  tlsClientCert: /config/certs/tls.crt
  tlsClientKey: /config/certs/tls.key
  tlsServerName: elasticsearch-es-http.service-telemetry.svc.cluster.local
  useBasicAuth: true
  useTls: true
---
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-collectd-telemetry-cloud1  # <5>
spec:
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/cloud1-telemetry  # <6>
  debug: false
  prefetch: 15000
  serviceType: metrics
  size: 1
  useTimestamp: true

<1> Name for Ceilometer notifications for cloud1
<2> AMQP address for Ceilometer notifications for cloud1
<3> Name for collectd notifications for cloud1
<4> AMQP address for collectd notifications for cloud1
<5> Name for collectd telemetry for cloud1
<6> AMQP address for collectd telemetry for cloud1

4.5.3. Creating the OpenStack environment file
To label traffic according to the cloud of origin, you must create a configuration with cloud-specific instance names. Create an stf-connectors.yaml file and adjust the values of CeilometerQdrEventsConfig and CollectdAmqpInstances to match the AMQP address prefix scheme. For more information, see Section 4.5.1, "Planning AMQP address prefixes".
WARNING
Remove the enable-stf.yaml and ceilometer-write-qdr.yaml environment file references from your overcloud deployment. This configuration is redundant and results in duplicate information being sent from each cloud node.
Procedure
1. Create the stf-connectors.yaml file and modify it to match the AMQP address that you want for this cloud deployment:

resource_registry:
  OS::TripleO::Services::Collectd: /usr/share/openstack-tripleo-heat-templates/deployment/metrics/collectd-container-puppet.yaml
  OS::TripleO::Services::MetricsQdr: /usr/share/openstack-tripleo-heat-templates/deployment/metrics/qdr-container-puppet.yaml
  OS::TripleO::Services::CeilometerAgentCentral: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-central-container-puppet.yaml
  OS::TripleO::Services::CeilometerAgentNotification: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-notification-container-puppet.yaml
  OS::TripleO::Services::CeilometerAgentIpmi: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-ipmi-container-puppet.yaml
  OS::TripleO::Services::ComputeCeilometerAgent: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-compute-container-puppet.yaml
  OS::TripleO::Services::Redis: /usr/share/openstack-tripleo-heat-templates/deployment/database/redis-pacemaker-puppet.yaml

parameter_defaults:
  EnableSTF: true

  EventPipelinePublishers: []
  CeilometerEnablePanko: false
  CeilometerQdrPublishEvents: true
  CeilometerQdrEventsConfig:
    driver: amqp
    topic: cloud1-event  # <1>

  CollectdConnectionType: amqp1
  CollectdAmqpInterval: 5
  CollectdDefaultPollingInterval: 5

  CollectdAmqpInstances:
    cloud1-notify:  # <2>
      notify: true
      format: JSON
      presettle: false
    cloud1-telemetry:  # <3>
      format: JSON
      presettle: true

  MetricsQdrAddresses:
    - prefix: collectd
      distribution: multicast
    - prefix: anycast/ceilometer
      distribution: multicast

  MetricsQdrSSLProfiles:
    - name: sslProfile

  MetricsQdrConnectors:
    - host: stf-default-interconnect-5671-service-telemetry.apps.infra.watch  # <4>
      port: 443
      role: edge
      verifyHostname: false
      sslProfile: sslProfile

<1> Define the topic for Ceilometer events. This value is the address format of anycast/ceilometer/cloud1-event.sample.
<2> Define the topic for collectd events. This value is the format of collectd/cloud1-notify.
<3> Define the topic for collectd metrics. This value is the format of collectd/cloud1-telemetry.
<4> Adjust the MetricsQdrConnectors host to the address of the STF route.

2. Ensure that the naming convention in the stf-connectors.yaml file aligns with the spec.amqpUrl field in the Smart Gateway configuration. For example, configure the CeilometerQdrEventsConfig.topic field to a value of cloud1-event.

3. Save the file in a directory for custom environment files, for example /home/stack/custom_templates/.

4. Source the authentication file:

[stack@undercloud-0 ~]$ source stackrc

(undercloud) [stack@undercloud-0 ~]$

5. Include the stf-connectors.yaml file in the overcloud deployment command, along with any other environment files relevant to your environment:

(undercloud) [stack@undercloud-0 ~]$ openstack overcloud deploy --templates /usr/share/openstack-tripleo-heat-templates -e /home/stack/custom_templates/stf-connectors.yaml
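To onboard a second cloud, you repeat this procedure with a second connector file that uses that cloud's identifier. A minimal illustrative fragment of the parameter_defaults for such a file, with names following the cloud2 schema above:

  CeilometerQdrEventsConfig:
    driver: amqp
    topic: cloud2-event
  CollectdAmqpInstances:
    cloud2-notify:
      notify: true
      format: JSON
      presettle: false
    cloud2-telemetry:
      format: JSON
      presettle: true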
Additional resources
For information about validating the deployment, see Section 3.3.3, "Validating client-side installation".
4.5.4. Querying metrics data from multiple clouds
Data stored in Prometheus has a service label attached according to the Smart Gateway it was scraped from. You can use this label to query data from a specific cloud.
To query data from a specific cloud, use a Prometheus PromQL query that matches the associated service label, for example: collectd_uptime{service="stf-default-collectd-telemetry-cloud1-smartgateway"}.
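For example, you can issue the query against the prometheus-operated service from a pod in the service-telemetry namespace that has curl available; the query shape is illustrative:

$ curl -G 'http://prometheus-operated:9090/api/v1/query' --data-urlencode 'query=collectd_uptime{service="stf-default-collectd-telemetry-cloud1-smartgateway"}'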
4.6. EPHEMERAL STORAGE
Use ephemeral storage to run Service Telemetry Framework (STF) without persistently storing data in your Red Hat OpenShift Container Platform (OCP) cluster. Ephemeral storage is not recommended in a production environment due to the volatility of the data in the platform when operating correctly and as designed. For example, restarting a pod or rescheduling the workload to another node results in the loss of any local data written since the pod started.
If you enable ephemeral storage in STF, the Service Telemetry Operator does not add the relevant storage sections to the data storage components' manifests.
4.6.1. Configuring ephemeral storage
To configure STF for ephemeral storage, add storageEphemeralEnabled: true to the ServiceTelemetry object in OCP. You can add storageEphemeralEnabled: true at installation time or, if you already deployed STF, complete the following steps:
Procedure
1. Log in to Red Hat OpenShift Container Platform.
2. Change to the service-telemetry namespace:

oc project service-telemetry

3. Edit the ServiceTelemetry object:

$ oc edit ServiceTelemetry stf-default

4. Add the storageEphemeralEnabled: true parameter to the spec section:

spec:
  eventsEnabled: true
  metricsEnabled: true
  storageEphemeralEnabled: true

5. Save your changes and close the object.
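For a fresh ephemeral deployment, one way to confirm that the setting took effect is to check that no persistent volume claims were created for the data storage components; a minimal check, which is expected to return no resources:

$ oc get pvc -n service-telemetry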
APPENDIX A. COLLECTD PLUG-INS
This section contains a complete list of collectd plug-ins and configurations for Red Hat OpenStack Platform 16.0.
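The entries below are hiera keys that the collectd Puppet modules read. As a sketch of how one of these parameters can be set, you can supply hieradata through ExtraConfig in a custom environment file; the key chosen and the value shown here are illustrative assumptions:

parameter_defaults:
  ExtraConfig:
    collectd::plugin::df::fstypes: ['xfs']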
collectd-aggregation
collectd::plugin::aggregation::aggregators
collectd::plugin::aggregation::interval
collectd-amqp1
collectd-apache
collectd::plugin::apache::instances (ex: localhost => url => 'http://localhost/mod_status?auto')
collectd::plugin::apache::interval
collectd-apcups
collectd-battery
collectd::plugin::battery::values_percentage
collectd::plugin::battery::report_degraded
collectd::plugin::battery::query_state_fs
collectd::plugin::battery::interval
collectd-ceph
collectd::plugin::ceph::daemons
collectd::plugin::ceph::longrunavglatency
collectd::plugin::ceph::convertspecialmetrictypes
collectd-cgroups
collectd::plugin::cgroups::ignore_selected
collectd::plugin::cgroups::interval
collectd-conntrack
None
collectd-contextswitch
collectd::plugin::contextswitch::interval
collectd-cpu
collectd::plugin::cpu::reportbystate
collectd::plugin::cpu::reportbycpu
collectd::plugin::cpu::valuespercentage
collectd::plugin::cpu::reportnumcpu
collectd::plugin::cpu::reportgueststate
collectd::plugin::cpu::subtractgueststate
collectd::plugin::cpu::interval
collectd-cpufreq
None
collectd-cpusleep
collectd-csv
collectd::plugin::csv::datadir
collectd::plugin::csv::storerates
collectd::plugin::csv::interval
collectd-df
collectd::plugin::df::devices
collectd::plugin::df::fstypes
collectd::plugin::df::ignoreselected
collectd::plugin::df::mountpoints
collectd::plugin::df::reportbydevice
collectd::plugin::df::reportinodes
collectd::plugin::df::reportreserved
collectd::plugin::df::valuesabsolute
collectd::plugin::df::valuespercentage
collectd::plugin::df::interval
collectd-disk
collectd::plugin::disk::disks
collectd::plugin::disk::ignoreselected
collectd::plugin::disk::udevnameattr
collectd::plugin::disk::interval
collectd-entropy
collectd::plugin::entropy::interval
collectd-ethstat
collectd::plugin::ethstat::interfaces
collectd::plugin::ethstat::maps
collectd::plugin::ethstat::mappedonly
collectd::plugin::ethstat::interval
collectd-exec
collectd::plugin::exec::commands
collectd::plugin::exec::commands_defaults
collectd::plugin::exec::globals
collectd::plugin::exec::interval
collectd-fhcount
collectd::plugin::fhcount::valuesabsolute
collectd::plugin::fhcount::valuespercentage
collectd::plugin::fhcount::interval
collectd-filecount
collectd::plugin::filecount::directories
collectd::plugin::filecount::interval
collectd-fscache
None
collectd-hddtemp
collectd::plugin::hddtemp::host
collectd::plugin::hddtemp::port
collectd::plugin::hddtemp::interval
collectd-hugepages
collectd::plugin::hugepages::report_per_node_hp
collectd::plugin::hugepages::report_root_hp
collectd::plugin::hugepages::values_pages
collectd::plugin::hugepages::values_bytes
collectd::plugin::hugepages::values_percentage
collectd::plugin::hugepages::interval
collectd-intel_rdt
collectd-interface
collectd::plugin::interface::interfaces
collectd::plugin::interface::ignoreselected
collectd::plugin::interface::reportinactive
collectd::plugin::interface::interval
collectd-ipc
None
collectd-ipmi
collectd::plugin::ipmi::ignore_selected
collectd::plugin::ipmi::notify_sensor_add
collectd::plugin::ipmi::notify_sensor_remove
collectd::plugin::ipmi::notify_sensor_not_present
collectd::plugin::ipmi::sensors
collectd::plugin::ipmi::interval
collectd-irq
collectd::plugin::irq::irqs
collectd::plugin::irq::ignoreselected
collectd::plugin::irq::interval
collectd-load
collectd::plugin::load::report_relative
collectd::plugin::load::interval
collectd-logfile
collectd::plugin::logfile::log_level
collectd::plugin::logfile::log_file
collectd::plugin::logfile::log_timestamp
collectd::plugin::logfile::print_severity
collectd::plugin::logfile::interval
collectd-madwifi
collectd-mbmon
collectd-md
collectd-memcached
collectd::plugin::memcached::instances
collectd::plugin::memcached::interval
collectd-memory
collectd::plugin::memory::valuesabsolute
collectd::plugin::memory::valuespercentage
collectd::plugin::memory::interval
collectd-multimeter
collectd-mysql
collectd::plugin::mysql::interval
collectd-netlink
collectd::plugin::netlink::interfaces
collectd::plugin::netlink::verboseinterfaces
collectd::plugin::netlink::qdiscs
collectd::plugin::netlink::classes
collectd::plugin::netlink::filters
collectd::plugin::netlink::ignoreselected
collectd::plugin::netlink::interval
collectd-network
collectd::plugin::network::timetolive
collectd::plugin::network::maxpacketsize
collectd::plugin::network::forward
collectd::plugin::network::reportstats
collectd::plugin::network::listeners
collectd::plugin::network::servers
collectd::plugin::network::interval
collectd-nfs
collectd::plugin::nfs::interval
collectd-ntpd
collectd::plugin::ntpd::host
collectd::plugin::ntpd::port
collectd::plugin::ntpd::reverselookups
collectd::plugin::ntpd::includeunitid
collectd::plugin::ntpd::interval
collectd-numa
None
collectd-olsrd
collectd-openvpn
collectd::plugin::openvpn::statusfile
collectd::plugin::openvpn::improvednamingschema
collectd::plugin::openvpn::collectcompression
collectd::plugin::openvpn::collectindividualusers
collectd::plugin::openvpn::collectusercount
collectd::plugin::openvpn::interval
collectd-ovs_events
collectd::plugin::ovs_events::address
collectd::plugin::ovs_events::dispatch
collectd::plugin::ovs_events::interfaces
collectd::plugin::ovs_events::send_notification
collectd::plugin::ovs_events::$port
collectd::plugin::ovs_events::socket
collectd-ovs_stats
collectd::plugin::ovs_stats::address
collectd::plugin::ovs_stats::bridges
collectd::plugin::ovs_stats::port
collectd::plugin::ovs_stats::socket
collectd-ping
collectd::plugin::ping::hosts
collectd::plugin::ping::timeout
collectd::plugin::ping::ttl
collectd::plugin::ping::source_address
collectd::plugin::ping::device
collectd::plugin::ping::max_missed
collectd::plugin::ping::size
collectd::plugin::ping::interval
collectd-powerdns
collectd::plugin::powerdns::servers
collectd::plugin::powerdns::recursors
collectd::plugin::powerdns::local_socket
collectd::plugin::powerdns::interval
collectd-processes
collectd::plugin::processes::processes
collectd::plugin::processes::process_matches
collectd::plugin::processes::collect_context_switch
collectd::plugin::processes::collect_file_descriptor
collectd::plugin::processes::collect_memory_maps
collectd::plugin::processes::interval
collectd-protocols
collectd::plugin::protocols::ignoreselected
collectd::plugin::protocols::values
collectd-python
collectd-serial
collectd-smart
collectd::plugin::smart::disks
collectd::plugin::smart::ignoreselected
collectd::plugin::smart::interval
collectd-snmp_agent
collectd-statsd
collectd::plugin::statsd::host
collectd::plugin::statsd::port
collectd::plugin::statsd::deletecounters
collectd::plugin::statsd::deletetimers
collectd::plugin::statsd::deletegauges
collectd::plugin::statsd::deletesets
collectd::plugin::statsd::countersum
collectd::plugin::statsd::timerpercentile
collectd::plugin::statsd::timerlower
collectd::plugin::statsd::timerupper
collectd::plugin::statsd::timersum
collectd::plugin::statsd::timercount
collectd::plugin::statsd::interval
collectd-swap
collectd::plugin::swap::reportbydevice
collectd::plugin::swap::reportbytes
collectd::plugin::swap::valuesabsolute
collectd::plugin::swap::valuespercentage
collectd::plugin::swap::reportio
collectd::plugin::swap::interval
collectd-syslog
collectd::plugin::syslog::log_level
collectd::plugin::syslog::notify_level
collectd::plugin::syslog::interval
collectd-table
collectd::plugin::table::tables
collectd::plugin::table::interval
collectd-tail
collectd::plugin::tail::files
collectd::plugin::tail::interval
collectd-tail_csv
collectd::plugin::tail_csv::metrics
collectd::plugin::tail_csv::files
collectd-tcpconns
collectd::plugin::tcpconns::localports
collectd::plugin::tcpconns::remoteports
collectd::plugin::tcpconns::listening
collectd::plugin::tcpconns::allportssummary
collectd::plugin::tcpconns::interval
collectd-ted
collectd-thermal
collectd::plugin::thermal::devices
collectd::plugin::thermal::ignoreselected
collectd::plugin::thermal::interval
collectd-threshold
collectd::plugin::threshold::types
collectd::plugin::threshold::plugins
collectd::plugin::threshold::hosts
collectd::plugin::threshold::interval
collectd-turbostat
collectd::plugin::turbostat::core_c_states
collectd::plugin::turbostat::package_c_states
collectd::plugin::turbostat::system_management_interrupt
collectd::plugin::turbostat::digital_temperature_sensor
collectd::plugin::turbostat::tcc_activation_temp
collectd::plugin::turbostat::running_average_power_limit
collectd::plugin::turbostat::logical_core_names
collectd-unixsock
collectd-uptime
collectd::plugin::uptime::interval
collectd-users
collectd::plugin::users::interval
collectd-uuid
collectd::plugin::uuid::uuid_file
collectd::plugin::uuid::interval
collectd-virt
collectd::plugin::virt::connection
collectd::plugin::virt::refresh_interval
collectd::plugin::virt::domain
collectd::plugin::virt::block_device
collectd::plugin::virt::interface_device
collectd::plugin::virt::ignore_selected
collectd::plugin::virt::hostname_format
collectd::plugin::virt::interface_format
collectd::plugin::virt::extra_stats
collectd::plugin::virt::interval
collectd-vmem
collectd::plugin::vmem::verbose
collectd::plugin::vmem::interval
collectd-vserver
collectd-wireless
collectd-write_graphite
collectd::plugin::write_graphite::carbons
collectd::plugin::write_graphite::carbon_defaults
collectd::plugin::write_graphite::globals
collectd-write_kafka
collectd::plugin::write_kafka::kafka_host
collectd::plugin::write_kafka::kafka_port
collectd::plugin::write_kafka::kafka_hosts
collectd::plugin::write_kafka::topics
collectd-write_log
collectd::plugin::write_log::format
collectd-zfs_arc
None
High availability is the ability of Service Telemetry Framework (STF) to rapidly recover fromfailures in its component services Although Red Hat OpenShift Container Platform (OCP) restartsa failed pod if nodes are available to schedule the workload this recovery process might take morethan one minute during which time events and metrics are lost A high availability configurationincludes multiple copies of STF components reducing recovery time to approximately 2 secondsTo protect against failure of an OCP node deploy STF to an OCP cluster with three or more nodes
NOTE
STF is not yet a fully fault tolerant system Delivery of metrics and events during therecovery period is not guaranteed
Enabling high availability has the following effects
Two AMQ Interconnect pods run instead of the default 1
Three ElasticSearch pods run instead of the default 1
Recovery time from a lost pod in either of these services reduces to approximately 2seconds
431 Configuring high availability
To configure STF for high availability add highAvailabilityEnabled true to the ServiceTelemetryobject in OCP You can this set this parameter at installation time or if you already deployed STFcomplete the following steps
Procedure
1 Log in to Red Hat OpenShift Container Platform
2 Change to the service-telemetry namespace
3 Use the oc command to edit the ServiceTelemetry object
4 Add highAvailabilityEnabled true to the spec section
5 Save your changes and close the object
oc project service-telemetry
$ oc edit ServiceTelemetry
spec eventsEnabled true metricsEnabled true highAvailabilityEnabled true
Red Hat OpenStack Platform 160 Service Telemetry Framework
34
44 DASHBOARDS
Use third-party application Grafana to visualize system-level metrics gathered by collectd for eachindividual host node For more information about configuring collectd see Section 33ldquoConfiguring Red Hat OpenStack Platform overcloud for Service Telemetry Frameworkrdquo
441 Setting up Grafana to host the dashboard
Grafana is not included in the default Service Telemetry Framework (STF) deployment so you mustdeploy the Grafana Operator from OperatorHubio
Procedure
1 Log in to Red Hat OpenShift Container Platform
2 Change to the service-telemetry namespace
3 Clone the dashboard repository
4 Deploy the Grafana operator
5 To verify that the operator launched successfully run the oc get csv command If the valueof the PHASE column is Succeeded the operator launched successfully
6 Launch a Grafana instance
7 Verify that the Grafana instance deployed
8 Create the datasource and dashboard resources
oc project service-telemetry
git clone httpsgithubcominfrawatchdashboardscd dashboards
oc create -f deploysubscriptionyaml
$ oc get csv
NAME DISPLAY VERSION REPLACES PHASEgrafana-operatorv320 Grafana Operator 320 Succeeded
$ oc create -f deploygrafanayaml
$ oc get pod -l app=grafana
NAME READY STATUS RESTARTS AGEgrafana-deployment-7fc7848b56-sbkhv 11 Running 0 1m
CHAPTER 4 ADVANCED FEATURES
35
9 Verify that the resources installed correctly
10 Navigate to httpsltgrafana-route-addressgt in a web browser Use the oc get routescommand to retrieve the Grafana route address
11 To view the dashboard click Dashboards and Manage
Additional resources
For more information about enabling the OperatorHubio catalog source see Section 235ldquoEnabling the OperatorHubio Community Catalog Sourcerdquo
4411 Viewing and editing queries
Procedure
1 Log in to Red Hat OpenShift Container Platform To view and edit queries log in as the admin user
2 Change to the service-telemetry namespace
3 To retrieve the default username and password describe the Grafana object using oc describe
442 The Grafana infrastructure dashboard
The infrastructure dashboard shows metrics for a single node at a time Select a node from theupper left corner of the dashboard
4421 Top panels
Title Unit Description
oc create -f deploydatasourceyaml -f deployrhos-dashboardyaml
$ oc get grafanadashboards
NAME AGErhos-dashboard 7d21h
$ oc get grafanadatasourcesNAME AGEservice-telemetry-grafanadatasource 1m
oc get routes
oc project service-telemetry
oc describe grafana service-telemetry-grafana
Red Hat OpenStack Platform 160 Service Telemetry Framework
36
Current Global Alerts - Current alerts fired byPrometheus
Recent Global Alerts - Recently fired alerts in 5m timesteps
Status Panel - Node status up downunavailable
Uptime smhdMY Total operational time of node
CPU Cores cores Total number of cores
Memory bytes Total memory
Disk Size bytes Total storage size
Processes processes Total number of processes listedby type
Load Average processes Load average represents theaverage number of running anduninterruptible processes residingin the kernel execution queue
4422 Networking panels
Panels that display the network interfaces of the node
Panel Unit Description
Physical Interfaces Ingress Errors errors Total errors with incoming data
Physical Interfaces Egress Errors errors Total errors with outgoing data
Physical Interfaces Ingress ErrorRates
errorss Rate of incoming data errors
Physical Interfaces egress ErrorRates
errorss Rate of outgoing data errors
Physical Interfaces PacketsIngress pps Incoming packets persecond
Physical Interfaces PacketsEgress
pps
Outgoing packets per second Physical Interfaces Data Ingress bytess
CHAPTER 4 ADVANCED FEATURES
37
Incoming data rates Physical Interfaces Data Egress bytess
Outgoing data rates Physical Interfaces Drop RateIngress
pps
Incoming packets drop rate Physical Interfaces Drop RateEgress
pps
4423 CPU panels
Panels that display CPU usage of the node
Panel Unit Description
Current CPU Usage percent Instantaneous usage at the timeof the last query
Aggregate CPU Usage percent Average non-idle CPU activity ofall cores on a node
Aggr CPU Usage by Type percent Shows time spent for each type ofthread averaged across all cores
4424 Memory panels
Panels that display memory usage on the node
Panel Unit Description
Memory Used percent Amount of memory being used attime of last query
Huge Pages Used hugepages Number of hugepages beingused
Memory
4425 Diskfile system
Panels that display space used on disk
Panel Unit Description Notes
Disk Space Usage percent Total disk use at time oflast query
Red Hat OpenStack Platform 160 Service Telemetry Framework
38
Inode Usage percent Total inode use at timeof last query
Aggregate Disk SpaceUsage
bytes Total disk space usedand reserved
Because this query relieson the df plugintemporary file systemsthat do not necessarilyuse disk space areincluded in the resultsThe query tries to filterout most of these but itmight not be exhaustive
Disk Traffic bytess Shows rates for bothreading and writing
Disk Load percent Approximatepercentage of total diskbandwidth being usedThe weighted IO timeseries includes thebacklog that might beaccumulating For moreinformation see thecollectd disk plugin docs
Operationss opss Operations done persecond
Average IO OperationTime
seconds Average time each IOoperation took tocomplete This averageis not accurate see thecollectd disk plugin docs
Panel Unit Description Notes
45 CONFIGURING MULTIPLE CLOUDS
You can configure multiple Red Hat OpenStack Platform clouds to target a single instance ofService Telemetry Framework (STF)
1 Plan the AMQP address prefixes that you want to use for each cloud For more informationsee Section 451 ldquoPlanning AMQP address prefixesrdquo
2 Deploy metrics and events consumer Smart Gateways for each cloud to listen on thecorresponding address prefixes For more information see Section 452 ldquoDeploying SmartGatewaysrdquo
3 Configure each cloud to send its metrics and events to STF on the correct address Formore information see Section 453 ldquoCreating the OpenStack environment filerdquo
CHAPTER 4 ADVANCED FEATURES
39
Figure 41 Two Red Hat OpenStack Platform clouds connect to STF
451 Planning AMQP address prefixes
By default Red Hat OpenStack Platform nodes get data through two data collectors collectd andCeilometer These components send telemetry data or notifications to the respective AMQPaddresses for example collectdtelemetry where STF Smart Gateways listen on those addressesfor monitoring data
To support multiple clouds and to identify which cloud generated the monitoring data configureeach cloud to send data to a unique address Prefix a cloud identifier to the second part of theaddress The following list shows some example addresses and identifiers
collectdcloud1-telemetry
collectdcloud1-notify
anycastceilometercloud1-eventsample
collectdcloud2-telemetry
collectdcloud2-notify
Red Hat OpenStack Platform 160 Service Telemetry Framework
40
anycastceilometercloud2-eventsample
collectdus-east-1-telemetry
collectdus-west-3-telemetry
452 Deploying Smart Gateways
You must deploy a Smart Gateway for each of the data collection types for each cloud one forcollectd metrics one for collectd events and one for Ceilometer events Configure each of theSmart Gateways to listen on the AMQP address that you define for the corresponding cloud
When you deploy STF for the first time Smart Gateway manifests are created that define the initialSmart Gateways for a single cloud When deploying Smart Gateways for multiple cloud supportyou deploy multiple Smart Gateways for each of the data collection types that handle the metricsand the events data for each cloud The initial Smart Gateways act as a template to createadditional Smart Gateways along with any authentication information required to connect to thedata stores
Procedure
1 Log in to Red Hat OpenShift Container Platform
2 Change to the service-telemetry namespace
oc project service-telemetry
3 Use the initially deployed Smart Gateways as a template for additional Smart GatewaysList the currently deployed Smart Gateways with the oc get smartgateways command Forexample if you deployed STF with metricsEnabled true and eventsEnabled true thefollowing Smart Gateways are displayed in the output
$ oc get smartgateways
NAME AGEstf-default-ceilometer-notification 14dstf-default-collectd-notification 14dstf-default-collectd-telemetry 14d
4 Retrieve the manifests for each Smart Gateway and store the contents in a temporary filewhich you can modify later and use to create the new set of Smart Gateways
truncate --size 0 tmpcloud1-smartgatewaysyaml ampamp for sg in $(oc get smartgateways -oname)do echo --- gtgt tmpcloud1-smartgatewaysyaml oc get $sg -oyaml --export gtgt tmpcloud1-smartgatewaysyamldone
5 Modify the Smart Gateway manifest in the tmpcloud1-smartgatewaysyaml file Adjustthe metadataname and specamqpUrl fields to include the cloud identifier from yourschema For more information see ] To view example Smart Gateway manifests seeltltexample-manifests_advanced-features[
6 Deploy your new Smart Gateways
CHAPTER 4 ADVANCED FEATURES
41
oc apply -f tmpcloud1-smartgatewaysyaml
7 Verify that each Smart Gateway is running This can take several minutes depending on thenumber of Smart Gateways
oc get po -l app=smart-gateway
4521 Example manifests
IMPORTANT
The content in the following examples might be different to the file content in yourdeployment Copy the manifests in your deployment
Ensure that the name and amqpUrl parameters of each Smart Gateway match the names that youwant to use for your clouds For more information see Section 451 ldquoPlanning AMQP addressprefixesrdquo
NOTE
Your output may have some additional metadata parameters that you can removefrom the manifests you that load into OCP
apiVersion smartgatewayinfrawatchv2alpha1kind SmartGatewaymetadata name stf-default-ceilometer-notification-cloud1 1spec amqpDataSource ceilometer amqpUrl stf-default-interconnectservice-telemetrysvcclusterlocal5672anycastceilometercloud1-eventsample 2 debug false elasticPass fkzfhghw elasticUrl httpselasticsearch-es-httpservice-telemetrysvcclusterlocal9200 elasticUser elastic resetIndex false serviceType events size 1 tlsCaCert configcertscacrt tlsClientCert configcertstlscrt tlsClientKey configcertstlskey tlsServerName elasticsearch-es-httpservice-telemetrysvcclusterlocal useBasicAuth true useTls true---apiVersion smartgatewayinfrawatchv2alpha1kind SmartGatewaymetadata name stf-default-collectd-notification-cloud1 3spec amqpDataSource collectd amqpUrl stf-default-interconnectservice-telemetrysvcclusterlocal5672collectdcloud1-
Red Hat OpenStack Platform 160 Service Telemetry Framework
42
1
2
3
4
5
6
Name for Ceilometer notifications for cloud1
AMQP Address for Ceilometer notifications for cloud1
Name for collectd telemetry for cloud1
AMQP Address for collectd telemetry for cloud1
Name for collectd notifications for cloud1
AMQP Address for collectd notifications for cloud1
453 Creating the OpenStack environment file
To label traffic according to the cloud of origin you must create a configuration with cloud-specificinstance names Create an stf-connectorsyaml file and adjust the values of CeilometerQdrEventsConfig and CollectdAmqpInstances to match the AMQP address prefixscheme For more information see Section 451 ldquoPlanning AMQP address prefixesrdquo
notify 4 debug false elasticPass fkzfhghw elasticUrl httpselasticsearch-es-httpservice-telemetrysvcclusterlocal9200 elasticUser elastic resetIndex false serviceType events size 1 tlsCaCert configcertscacrt tlsClientCert configcertstlscrt tlsClientKey configcertstlskey tlsServerName elasticsearch-es-httpservice-telemetrysvcclusterlocal useBasicAuth true useTls true---apiVersion smartgatewayinfrawatchv2alpha1kind SmartGatewaymetadata name stf-default-collectd-telemetry-cloud1 5spec amqpUrl stf-default-interconnectservice-telemetrysvcclusterlocal5672collectdcloud1-telemetry 6 debug false prefetch 15000 serviceType metrics size 1 useTimestamp true
CHAPTER 4 ADVANCED FEATURES
43
WARNING
Remove enable-stfyaml and ceilometer-write-qdryaml environment filereferences from your overcloud deployment This configuration is redundantand results in duplicate information being sent from each cloud node
Procedure
1 Create the stf-connectorsyaml file and modify it to match the AMQP address that youwant for this cloud deployment
   resource_registry:
     OS::TripleO::Services::Collectd: /usr/share/openstack-tripleo-heat-templates/deployment/metrics/collectd-container-puppet.yaml
     OS::TripleO::Services::MetricsQdr: /usr/share/openstack-tripleo-heat-templates/deployment/metrics/qdr-container-puppet.yaml
     OS::TripleO::Services::CeilometerAgentCentral: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-central-container-puppet.yaml
     OS::TripleO::Services::CeilometerAgentNotification: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-notification-container-puppet.yaml
     OS::TripleO::Services::CeilometerAgentIpmi: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-ipmi-container-puppet.yaml
     OS::TripleO::Services::ComputeCeilometerAgent: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-compute-container-puppet.yaml
     OS::TripleO::Services::Redis: /usr/share/openstack-tripleo-heat-templates/deployment/database/redis-pacemaker-puppet.yaml

   parameter_defaults:
     EnableSTF: true

     EventPipelinePublishers: []
     CeilometerEnablePanko: false
     CeilometerQdrPublishEvents: true
     CeilometerQdrEventsConfig:
       driver: amqp
       topic: cloud1-event <1>

     CollectdConnectionType: amqp1
     CollectdAmqpInterval: 5
     CollectdDefaultPollingInterval: 5

     CollectdAmqpInstances:
       cloud1-notify: <2>
         notify: true
         format: JSON
         presettle: false
       cloud1-telemetry: <3>
         format: JSON
         presettle: true

     MetricsQdrAddresses:
       - prefix: collectd
         distribution: multicast
       - prefix: anycast/ceilometer
         distribution: multicast

     MetricsQdrSSLProfiles:
       - name: sslProfile

     MetricsQdrConnectors:
       - host: stf-default-interconnect-5671-service-telemetry.apps.infra.watch <4>
         port: 443
         role: edge
         verifyHostname: false
         sslProfile: sslProfile

   <1> Define the topic for Ceilometer events. This value is the address format of anycast/ceilometer/cloud1-event.sample.
   <2> Define the topic for collectd events. This value is the format of collectd/cloud1-notify.
   <3> Define the topic for collectd metrics. This value is the format of collectd/cloud1-telemetry.
   <4> Adjust the MetricsQdrConnectors host to the address of the STF route.

2. Ensure that the naming convention in the stf-connectors.yaml file aligns with the spec.amqpUrl field in the Smart Gateway configuration. For example, configure the CeilometerQdrEventsConfig.topic field to a value of cloud1-event.

3. Save the file in a directory for custom environment files, for example /home/stack/custom_templates/.

4. Source the authentication file:

   [stack@undercloud-0 ~]$ source stackrc
   (undercloud) [stack@undercloud-0 ~]$

5. Include the stf-connectors.yaml file in the overcloud deployment command, along with any other environment files relevant to your environment:

   (undercloud) [stack@undercloud-0 ~]$ openstack overcloud deploy --templates /usr/share/openstack-tripleo-heat-templates -e /home/stack/custom_templates/stf-connectors.yaml

Additional resources
For information about validating the deployment, see Section 3.3.3, "Validating client-side installation".

4.5.4. Querying metrics data from multiple clouds

Data stored in Prometheus has a service label attached according to the Smart Gateway it was scraped from. You can use this label to query data from a specific cloud.

To query data from a specific cloud, use a Prometheus PromQL query that matches the associated service label, for example: collectd_uptime{service="stf-default-collectd-telemetry-cloud1-smartgateway"}.
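If you want to check such a query from the command line instead of the Prometheus web UI, you can send it to the Prometheus HTTP API from a pod inside the cluster. The following is a minimal sketch, not one of this guide's procedures: it assumes the prometheus-operated service that the Prometheus Operator exposes on port 9090 and reuses the curl pod image shown in the alerting procedures earlier in this guide. Substitute the service label value for your own cloud naming scheme.

   $ oc run curl --generator=run-pod/v1 --image=radial/busyboxplus:curl -i --tty
   [ root@curl:/ ]$ curl 'http://prometheus-operated:9090/api/v1/query' --data-urlencode 'query=collectd_uptime{service="stf-default-collectd-telemetry-cloud1-smartgateway"}'

A successful response returns a JSON body with "status":"success" and one result entry per matching time series.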
4.6. EPHEMERAL STORAGE

Use ephemeral storage to run Service Telemetry Framework (STF) without persistently storing data in your Red Hat OpenShift Container Platform (OCP) cluster. Ephemeral storage is not recommended in a production environment due to the volatility of the data in the platform when operating correctly and as designed. For example, restarting a pod or rescheduling the workload to another node results in the loss of any local data written since the pod started.

If you enable ephemeral storage in STF, the Service Telemetry Operator does not add the relevant storage sections to the data storage components' manifests.

4.6.1. Configuring ephemeral storage

To configure STF for ephemeral storage, add storageEphemeralEnabled: true to the ServiceTelemetry object in OCP. You can add storageEphemeralEnabled: true at installation time or, if you already deployed STF, complete the following steps:
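For reference, the following sketch shows a ServiceTelemetry object with ephemeral storage enabled at installation time. It assumes the default stf-default name and service-telemetry namespace used throughout this guide, and the spec fields follow the layout shown in the procedure below:

   apiVersion: infra.watch/v1alpha1
   kind: ServiceTelemetry
   metadata:
     name: stf-default
     namespace: service-telemetry
   spec:
     eventsEnabled: true
     metricsEnabled: true
     storageEphemeralEnabled: true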
Procedure
1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

   $ oc project service-telemetry

3. Edit the ServiceTelemetry object:

   $ oc edit ServiceTelemetry stf-default

4. Add the storageEphemeralEnabled: true parameter to the spec section:

   spec:
     eventsEnabled: true
     metricsEnabled: true
     storageEphemeralEnabled: true

5. Save your changes and close the object.
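As an alternative to the interactive oc edit session, a merge patch sets the same field non-interactively. This is a sketch under the assumption that your deployment uses the default stf-default object name:

   $ oc patch servicetelemetry stf-default --type=merge -p '{"spec":{"storageEphemeralEnabled":true}}'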
APPENDIX A. COLLECTD PLUG-INS

This section contains a complete list of collectd plug-ins and configurations for Red Hat OpenStack Platform 16.0.
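Most of the parameters in this list are hiera keys from the underlying puppet-collectd module. As an illustration of how such a key can be tuned from a custom overcloud environment file, the following sketch sets two df plug-in parameters through ExtraConfig. The CollectdExtraPlugins parameter and the ExtraConfig mechanism are general TripleO conventions and are an assumption here, not part of this appendix:

   parameter_defaults:
     CollectdExtraPlugins:
       - df
     ExtraConfig:
       collectd::plugin::df::fstypes:
         - xfs
       collectd::plugin::df::ignoreselected: false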
collectd-aggregation

collectd::plugin::aggregation::aggregators

collectd::plugin::aggregation::interval

collectd-amqp1

collectd-apache

collectd::plugin::apache::instances (ex: localhost ⇒ url ⇒ "http://localhost/mod_status?auto")

collectd::plugin::apache::interval

collectd-apcups

collectd-battery

collectd::plugin::battery::values_percentage

collectd::plugin::battery::report_degraded

collectd::plugin::battery::query_state_fs

collectd::plugin::battery::interval

collectd-ceph

collectd::plugin::ceph::daemons

collectd::plugin::ceph::longrunavglatency

collectd::plugin::ceph::convertspecialmetrictypes

collectd-cgroups

collectd::plugin::cgroups::ignore_selected

collectd::plugin::cgroups::interval

collectd-conntrack

None

collectd-contextswitch

collectd::plugin::contextswitch::interval

collectd-cpu
collectd::plugin::cpu::reportbystate

collectd::plugin::cpu::reportbycpu

collectd::plugin::cpu::valuespercentage

collectd::plugin::cpu::reportnumcpu

collectd::plugin::cpu::reportgueststate

collectd::plugin::cpu::subtractgueststate

collectd::plugin::cpu::interval

collectd-cpufreq

None

collectd-cpusleep

collectd-csv

collectd::plugin::csv::datadir

collectd::plugin::csv::storerates

collectd::plugin::csv::interval

collectd-df

collectd::plugin::df::devices

collectd::plugin::df::fstypes

collectd::plugin::df::ignoreselected

collectd::plugin::df::mountpoints

collectd::plugin::df::reportbydevice

collectd::plugin::df::reportinodes

collectd::plugin::df::reportreserved

collectd::plugin::df::valuesabsolute

collectd::plugin::df::valuespercentage

collectd::plugin::df::interval

collectd-disk

collectd::plugin::disk::disks

collectd::plugin::disk::ignoreselected

collectd::plugin::disk::udevnameattr
collectd::plugin::disk::interval

collectd-entropy

collectd::plugin::entropy::interval

collectd-ethstat

collectd::plugin::ethstat::interfaces

collectd::plugin::ethstat::maps

collectd::plugin::ethstat::mappedonly

collectd::plugin::ethstat::interval

collectd-exec

collectd::plugin::exec::commands

collectd::plugin::exec::commands_defaults

collectd::plugin::exec::globals

collectd::plugin::exec::interval

collectd-fhcount

collectd::plugin::fhcount::valuesabsolute

collectd::plugin::fhcount::valuespercentage

collectd::plugin::fhcount::interval

collectd-filecount

collectd::plugin::filecount::directories

collectd::plugin::filecount::interval

collectd-fscache

None

collectd-hddtemp

collectd::plugin::hddtemp::host

collectd::plugin::hddtemp::port

collectd::plugin::hddtemp::interval

collectd-hugepages

collectd::plugin::hugepages::report_per_node_hp
collectd::plugin::hugepages::report_root_hp

collectd::plugin::hugepages::values_pages

collectd::plugin::hugepages::values_bytes

collectd::plugin::hugepages::values_percentage

collectd::plugin::hugepages::interval

collectd-intel_rdt

collectd-interface

collectd::plugin::interface::interfaces

collectd::plugin::interface::ignoreselected

collectd::plugin::interface::reportinactive

collectd::plugin::interface::interval

collectd-ipc

None

collectd-ipmi

collectd::plugin::ipmi::ignore_selected

collectd::plugin::ipmi::notify_sensor_add

collectd::plugin::ipmi::notify_sensor_remove

collectd::plugin::ipmi::notify_sensor_not_present

collectd::plugin::ipmi::sensors

collectd::plugin::ipmi::interval

collectd-irq

collectd::plugin::irq::irqs

collectd::plugin::irq::ignoreselected

collectd::plugin::irq::interval

collectd-load

collectd::plugin::load::report_relative

collectd::plugin::load::interval

collectd-logfile

collectd::plugin::logfile::log_level
collectd::plugin::logfile::log_file

collectd::plugin::logfile::log_timestamp

collectd::plugin::logfile::print_severity

collectd::plugin::logfile::interval

collectd-madwifi

collectd-mbmon

collectd-md

collectd-memcached

collectd::plugin::memcached::instances

collectd::plugin::memcached::interval

collectd-memory

collectd::plugin::memory::valuesabsolute

collectd::plugin::memory::valuespercentage

collectd::plugin::memory::interval

collectd-multimeter

collectd-mysql

collectd::plugin::mysql::interval

collectd-netlink

collectd::plugin::netlink::interfaces

collectd::plugin::netlink::verboseinterfaces

collectd::plugin::netlink::qdiscs

collectd::plugin::netlink::classes

collectd::plugin::netlink::filters

collectd::plugin::netlink::ignoreselected

collectd::plugin::netlink::interval

collectd-network

collectd::plugin::network::timetolive

collectd::plugin::network::maxpacketsize
collectd::plugin::network::forward

collectd::plugin::network::reportstats

collectd::plugin::network::listeners

collectd::plugin::network::servers

collectd::plugin::network::interval

collectd-nfs

collectd::plugin::nfs::interval

collectd-ntpd

collectd::plugin::ntpd::host

collectd::plugin::ntpd::port

collectd::plugin::ntpd::reverselookups

collectd::plugin::ntpd::includeunitid

collectd::plugin::ntpd::interval

collectd-numa

None

collectd-olsrd

collectd-openvpn

collectd::plugin::openvpn::statusfile

collectd::plugin::openvpn::improvednamingschema

collectd::plugin::openvpn::collectcompression

collectd::plugin::openvpn::collectindividualusers

collectd::plugin::openvpn::collectusercount

collectd::plugin::openvpn::interval

collectd-ovs_events

collectd::plugin::ovs_events::address

collectd::plugin::ovs_events::dispatch

collectd::plugin::ovs_events::interfaces

collectd::plugin::ovs_events::send_notification
collectd::plugin::ovs_events::$port

collectd::plugin::ovs_events::socket

collectd-ovs_stats

collectd::plugin::ovs_stats::address

collectd::plugin::ovs_stats::bridges

collectd::plugin::ovs_stats::port

collectd::plugin::ovs_stats::socket

collectd-ping

collectd::plugin::ping::hosts

collectd::plugin::ping::timeout

collectd::plugin::ping::ttl

collectd::plugin::ping::source_address

collectd::plugin::ping::device

collectd::plugin::ping::max_missed

collectd::plugin::ping::size

collectd::plugin::ping::interval

collectd-powerdns

collectd::plugin::powerdns::interval

collectd::plugin::powerdns::servers

collectd::plugin::powerdns::recursors

collectd::plugin::powerdns::local_socket

collectd::plugin::powerdns::interval

collectd-processes

collectd::plugin::processes::processes

collectd::plugin::processes::process_matches

collectd::plugin::processes::collect_context_switch

collectd::plugin::processes::collect_file_descriptor

collectd::plugin::processes::collect_memory_maps

collectd::plugin::powerdns::interval
collectd-protocols

collectd::plugin::protocols::ignoreselected

collectd::plugin::protocols::values

collectd-python

collectd-serial

collectd-smart

collectd::plugin::smart::disks

collectd::plugin::smart::ignoreselected

collectd::plugin::smart::interval

collectd-snmp_agent

collectd-statsd

collectd::plugin::statsd::host

collectd::plugin::statsd::port

collectd::plugin::statsd::deletecounters

collectd::plugin::statsd::deletetimers

collectd::plugin::statsd::deletegauges

collectd::plugin::statsd::deletesets

collectd::plugin::statsd::countersum

collectd::plugin::statsd::timerpercentile

collectd::plugin::statsd::timerlower

collectd::plugin::statsd::timerupper

collectd::plugin::statsd::timersum

collectd::plugin::statsd::timercount

collectd::plugin::statsd::interval

collectd-swap

collectd::plugin::swap::reportbydevice

collectd::plugin::swap::reportbytes

collectd::plugin::swap::valuesabsolute
collectd::plugin::swap::valuespercentage

collectd::plugin::swap::reportio

collectd::plugin::swap::interval

collectd-syslog

collectd::plugin::syslog::log_level

collectd::plugin::syslog::notify_level

collectd::plugin::syslog::interval

collectd-table

collectd::plugin::table::tables

collectd::plugin::table::interval

collectd-tail

collectd::plugin::tail::files

collectd::plugin::tail::interval

collectd-tail_csv

collectd::plugin::tail_csv::metrics

collectd::plugin::tail_csv::files

collectd-tcpconns

collectd::plugin::tcpconns::localports

collectd::plugin::tcpconns::remoteports

collectd::plugin::tcpconns::listening

collectd::plugin::tcpconns::allportssummary

collectd::plugin::tcpconns::interval

collectd-ted

collectd-thermal

collectd::plugin::thermal::devices

collectd::plugin::thermal::ignoreselected

collectd::plugin::thermal::interval

collectd-threshold
collectd::plugin::threshold::types

collectd::plugin::threshold::plugins

collectd::plugin::threshold::hosts

collectd::plugin::threshold::interval

collectd-turbostat

collectd::plugin::turbostat::core_c_states

collectd::plugin::turbostat::package_c_states

collectd::plugin::turbostat::system_management_interrupt

collectd::plugin::turbostat::digital_temperature_sensor

collectd::plugin::turbostat::tcc_activation_temp

collectd::plugin::turbostat::running_average_power_limit

collectd::plugin::turbostat::logical_core_names

collectd-unixsock

collectd-uptime

collectd::plugin::uptime::interval

collectd-users

collectd::plugin::users::interval

collectd-uuid

collectd::plugin::uuid::uuid_file

collectd::plugin::uuid::interval

collectd-virt

collectd::plugin::virt::connection

collectd::plugin::virt::refresh_interval

collectd::plugin::virt::domain

collectd::plugin::virt::block_device

collectd::plugin::virt::interface_device

collectd::plugin::virt::ignore_selected

collectd::plugin::virt::hostname_format
collectd::plugin::virt::interface_format

collectd::plugin::virt::extra_stats

collectd::plugin::virt::interval

collectd-vmem

collectd::plugin::vmem::verbose

collectd::plugin::vmem::interval

collectd-vserver

collectd-wireless

collectd-write_graphite

collectd::plugin::write_graphite::carbons

collectd::plugin::write_graphite::carbon_defaults

collectd::plugin::write_graphite::globals

collectd-write_kafka

collectd::plugin::write_kafka::kafka_host

collectd::plugin::write_kafka::kafka_port

collectd::plugin::write_kafka::kafka_hosts

collectd::plugin::write_kafka::topics

collectd-write_log

collectd::plugin::write_log::format

collectd-zfs_arc

None
apiVersion smartgatewayinfrawatchv2alpha1kind SmartGatewaymetadata name stf-default-ceilometer-notification-cloud1 1spec amqpDataSource ceilometer amqpUrl stf-default-interconnectservice-telemetrysvcclusterlocal5672anycastceilometercloud1-eventsample 2 debug false elasticPass fkzfhghw elasticUrl httpselasticsearch-es-httpservice-telemetrysvcclusterlocal9200 elasticUser elastic resetIndex false serviceType events size 1 tlsCaCert configcertscacrt tlsClientCert configcertstlscrt tlsClientKey configcertstlskey tlsServerName elasticsearch-es-httpservice-telemetrysvcclusterlocal useBasicAuth true useTls true---apiVersion smartgatewayinfrawatchv2alpha1kind SmartGatewaymetadata name stf-default-collectd-notification-cloud1 3spec amqpDataSource collectd amqpUrl stf-default-interconnectservice-telemetrysvcclusterlocal5672collectdcloud1-
Red Hat OpenStack Platform 160 Service Telemetry Framework
42
1
2
3
4
5
6
Name for Ceilometer notifications for cloud1
AMQP Address for Ceilometer notifications for cloud1
Name for collectd telemetry for cloud1
AMQP Address for collectd telemetry for cloud1
Name for collectd notifications for cloud1
AMQP Address for collectd notifications for cloud1
453 Creating the OpenStack environment file
To label traffic according to the cloud of origin you must create a configuration with cloud-specificinstance names Create an stf-connectorsyaml file and adjust the values of CeilometerQdrEventsConfig and CollectdAmqpInstances to match the AMQP address prefixscheme For more information see Section 451 ldquoPlanning AMQP address prefixesrdquo
notify 4 debug false elasticPass fkzfhghw elasticUrl httpselasticsearch-es-httpservice-telemetrysvcclusterlocal9200 elasticUser elastic resetIndex false serviceType events size 1 tlsCaCert configcertscacrt tlsClientCert configcertstlscrt tlsClientKey configcertstlskey tlsServerName elasticsearch-es-httpservice-telemetrysvcclusterlocal useBasicAuth true useTls true---apiVersion smartgatewayinfrawatchv2alpha1kind SmartGatewaymetadata name stf-default-collectd-telemetry-cloud1 5spec amqpUrl stf-default-interconnectservice-telemetrysvcclusterlocal5672collectdcloud1-telemetry 6 debug false prefetch 15000 serviceType metrics size 1 useTimestamp true
CHAPTER 4 ADVANCED FEATURES
43
WARNING
Remove enable-stfyaml and ceilometer-write-qdryaml environment filereferences from your overcloud deployment This configuration is redundantand results in duplicate information being sent from each cloud node
Procedure
1 Create the stf-connectorsyaml file and modify it to match the AMQP address that youwant for this cloud deployment
resource_registry OSTripleOServicesCollectd usrshareopenstack-tripleo-heat-templatesdeploymentmetricscollectd-container-puppetyaml OSTripleOServicesMetricsQdr usrshareopenstack-tripleo-heat-templatesdeploymentmetricsqdr-container-puppetyaml OSTripleOServicesCeilometerAgentCentral usrshareopenstack-tripleo-heat-templatesdeploymentceilometerceilometer-agent-central-container-puppetyaml OSTripleOServicesCeilometerAgentNotification usrshareopenstack-tripleo-heat-templatesdeploymentceilometerceilometer-agent-notification-container-puppetyaml OSTripleOServicesCeilometerAgentIpmi usrshareopenstack-tripleo-heat-templatesdeploymentceilometerceilometer-agent-ipmi-container-puppetyaml OSTripleOServicesComputeCeilometerAgent usrshareopenstack-tripleo-heat-templatesdeploymentceilometerceilometer-agent-compute-container-puppetyaml OSTripleOServicesRedis usrshareopenstack-tripleo-heat-templatesdeploymentdatabaseredis-pacemaker-puppetyaml
parameter_defaults EnableSTF true
EventPipelinePublishers [] CeilometerEnablePanko false CeilometerQdrPublishEvents true CeilometerQdrEventsConfig driver amqp topic cloud1-event 1
CollectdConnectionType amqp1 CollectdAmqpInterval 5 CollectdDefaultPollingInterval 5
CollectdAmqpInstances cloud1-notify 2 notify true format JSON presettle false cloud1-telemetry 3 format JSON presettle true
MetricsQdrAddresses
Red Hat OpenStack Platform 160 Service Telemetry Framework
44
+ lt1gt Define the topic for Ceilometer events This value is the address format of anycastceilometercloud1-eventsample lt2gt Define the topic for collectd events This value is theformat of collectdcloud1-notify lt3gt Define the topic for collectd metrics This value is the formatof collectdcloud1-telemetry lt4gt Adjust the MetricsQdrConnectors host to the address of the STFroute
1 Ensure that the naming convention in the stf-connectorsyaml file aligns with the specamqpUrl field in the Smart Gateway configuration For example configure the CeilometerQdrEventsConfigtopic field to a value of cloud1-event
2 Save the file in a directory for custom environment files for example homestackcustom_templates
3 Source the authentication file
4 Include the stf-connectorsyaml file in the overcloud deployment command along with anyother environment files relevant to your environment
Additional resourcesFor information about validating the deployment see Section 333 ldquoValidating client-sideinstallationrdquo
454 Querying metrics data from multiple clouds
Data stored in Prometheus has a service label attached according to the Smart Gateway it wasscraped from You can use this label to query data from a specific cloud
To query data from a specific cloud use a Prometheus promql query that matches the associatedservice label for example collectd_uptimeservice=stf-default-collectd-telemetry-cloud1-smartgateway
- prefix collectd distribution multicast - prefix anycastceilometer distribution multicast
MetricsQdrSSLProfiles - name sslProfile
MetricsQdrConnectors - host stf-default-interconnect-5671-service-telemetryappsinfrawatch 4 port 443 role edge verifyHostname false sslProfile sslProfile
[stackundercloud-0 ~]$ source stackrc
(undercloud) [stackundercloud-0 ~]$
(undercloud) [stackundercloud-0 ~]$ openstack overcloud deploy --templates usrshareopenstack-tripleo-heat-templates -e homestackcustom_templatesstf-connectorsyaml
CHAPTER 4 ADVANCED FEATURES
45
46 EPHEMERAL STORAGE
Use ephemeral storage to run Service Telemetry Framework (STF) without persistently storingdata in your Red Hat OpenShift Container Platform (OCP) cluster Ephemeral storage is notrecommended in a production environment due to the volatility of the data in the platform whenoperating correctly and as designed For example restarting a pod or rescheduling the workload toanother node results in the loss of any local data written since the pod started
If you enable ephemeral storage in STF the Service Telemetry Operator does not add the relevant storage sections to the data storage components manifests
461 Configuring ephemeral storage
To configure STF for ephemeral storage add storageEphemeralEnabled true to theServiceTelemetry object in OCP You can add storageEphemeralEnabled true at installation timeor if you already deployed STF complete the following steps
Procedure
1 Log in to Red Hat OpenShift Container Platform
2 Change to the service-telemetry namespace
3 Edit the ServiceTelemetry object
4 Add the storageEphemeralEnabled true parameter to the spec section
5 Save your changes and close the object
oc project service-telemetry
$ oc edit ServiceTelemetry stf-default
spec eventsEnabled true metricsEnabled true storageEphemeralEnabled true
Red Hat OpenStack Platform 160 Service Telemetry Framework
46
APPENDIX A COLLECTD PLUG-INSThis section contains a complete list of collectd plug-ins and configurations forRed Hat OpenStack Platform 160
collectd-aggregation
collectdpluginaggregationaggregators
collectdpluginaggregationinterval
collectd-amqp1
collectd-apache
collectdpluginapacheinstances (ex localhost rArr url rArr httplocalhostmod_statusauto)
collectdpluginapacheinterval
collectd-apcups
collectd-battery
collectdpluginbatteryvalues_percentage
collectdpluginbatteryreport_degraded
collectdpluginbatteryquery_state_fs
collectdpluginbatteryinterval
collectd-ceph
collectdplugincephdaemons
collectdplugincephlongrunavglatency
collectdplugincephconvertspecialmetrictypes
collectd-cgroups
collectdplugincgroupsignore_selected
collectdplugincgroupsinterval
collectd-conntrack
None
collectd-contextswitch
collectdplugincontextswitchinterval
collectd-cpu
APPENDIX A COLLECTD PLUG-INS
47
collectdplugincpureportbystate
collectdplugincpureportbycpu
collectdplugincpuvaluespercentage
collectdplugincpureportnumcpu
collectdplugincpureportgueststate
collectdplugincpusubtractgueststate
collectdplugincpuinterval
collectd-cpufreq
None
collectd-cpusleep
collectd-csv
collectdplugincsvdatadir
collectdplugincsvstorerates
collectdplugincsvinterval
collectd-df
collectdplugindfdevices
collectdplugindffstypes
collectdplugindfignoreselected
collectdplugindfmountpoints
collectdplugindfreportbydevice
collectdplugindfreportinodes
collectdplugindfreportreserved
collectdplugindfvaluesabsolute
collectdplugindfvaluespercentage
collectdplugindfinterval
collectd-disk
collectdplugindiskdisks
collectdplugindiskignoreselected
collectdplugindiskudevnameattr
Red Hat OpenStack Platform 160 Service Telemetry Framework
48
collectdplugindiskinterval
collectd-entropy
collectdpluginentropyinterval
collectd-ethstat
collectdpluginethstatinterfaces
collectdpluginethstatmaps
collectdpluginethstatmappedonly
collectdpluginethstatinterval
collectd-exec
collectdpluginexeccommands
collectdpluginexeccommands_defaults
collectdpluginexecglobals
collectdpluginexecinterval
collectd-fhcount
collectdpluginfhcountvaluesabsolute
collectdpluginfhcountvaluespercentage
collectdpluginfhcountinterval
collectd-filecount
collectdpluginfilecountdirectories
collectdpluginfilecountinterval
collectd-fscache
None
collectd-hddtemp
collectdpluginhddtemphost
collectdpluginhddtempport
collectdpluginhddtempinterval
collectd-hugepages
collectdpluginhugepagesreport_per_node_hp
APPENDIX A COLLECTD PLUG-INS
49
collectdpluginhugepagesreport_root_hp
collectdpluginhugepagesvalues_pages
collectdpluginhugepagesvalues_bytes
collectdpluginhugepagesvalues_percentage
collectdpluginhugepagesinterval
collectd-intel_rdt
collectd-interface
collectdplugininterfaceinterfaces
collectdplugininterfaceignoreselected
collectdplugininterfacereportinactive
Collectdplugininterfaceinterval
collectd-ipc
None
collectd-ipmi
collectdpluginipmiignore_selected
collectdpluginipminotify_sensor_add
collectdpluginipminotify_sensor_remove
collectdpluginipminotify_sensor_not_present
collectdpluginipmisensors
collectdpluginipmiinterval
collectd-irq
collectdpluginirqirqs
collectdpluginirqignoreselected
collectdpluginirqinterval
collectd-load
collectdpluginloadreport_relative
collectdpluginloadinterval
collectd-logfile
collectdpluginlogfilelog_level
Red Hat OpenStack Platform 160 Service Telemetry Framework
50
collectdpluginlogfilelog_level
collectdpluginlogfilelog_file
collectdpluginlogfilelog_timestamp
collectdpluginlogfileprint_severity
collectdpluginlogfileinterval
collectd-madwifi
collectd-mbmon
collectd-md
collectd-memcached
collectdpluginmemcachedinstances
collectdpluginmemcachedinterval
collectd-memory
collectdpluginmemoryvaluesabsolute
collectdpluginmemoryvaluespercentage
collectdpluginmemoryinterval collectd-multimeter
collectd-multimeter
collectd-mysql
collectdpluginmysqlinterval
collectd-netlink
collectdpluginnetlinkinterfaces
collectdpluginnetlinkverboseinterfaces
collectdpluginnetlinkqdiscs
collectdpluginnetlinkclasses
collectdpluginnetlinkfilters
collectdpluginnetlinkignoreselected
collectdpluginnetlinkinterval
collectd-network
collectdpluginnetworktimetolive
collectdpluginnetworkmaxpacketsize
APPENDIX A COLLECTD PLUG-INS
51
collectdpluginnetworkforward
collectdpluginnetworkreportstats
collectdpluginnetworklisteners
collectdpluginnetworkservers
collectdpluginnetworkinterval
collectd-nfs
collectdpluginnfsinterval
collectd-ntpd
collectdpluginntpdhost
collectdpluginntpdport
collectdpluginntpdreverselookups
collectdpluginntpdincludeunitid
collectdpluginntpdinterval
collectd-numa
None
collectd-olsrd
collectd-openvpn
collectdpluginopenvpnstatusfile
collectdpluginopenvpnimprovednamingschema
collectdpluginopenvpncollectcompression
collectdpluginopenvpncollectindividualusers
collectdpluginopenvpncollectusercount
collectdpluginopenvpninterval
collectd-ovs_events
collectdpluginovs_eventsaddress
collectdpluginovs_eventsdispatch
collectdpluginovs_eventsinterfaces
collectdpluginovs_eventssend_notification
Red Hat OpenStack Platform 160 Service Telemetry Framework
52
collectdpluginovs_events$port
collectdpluginovs_eventssocket
collectd-ovs_stats
collectdpluginovs_statsaddress
collectdpluginovs_statsbridges
collectdpluginovs_statsport
collectdpluginovs_statssocket
collectd-ping
collectdpluginpinghosts
collectdpluginpingtimeout
collectdpluginpingttl
collectdpluginpingsource_address
collectdpluginpingdevice
collectdpluginpingmax_missed
collectdpluginpingsize
collectdpluginpinginterval
collectd-powerdns
collectdpluginpowerdnsinterval
collectdpluginpowerdnsservers
collectdpluginpowerdnsrecursors
collectdpluginpowerdnslocal_socket
collectdpluginpowerdnsinterval
collectd-processes
collectdpluginprocessesprocesses
collectdpluginprocessesprocess_matches
collectdpluginprocessescollect_context_switch
collectdpluginprocessescollect_file_descriptor
collectdpluginprocessescollect_memory_maps
collectdpluginpowerdnsinterval
APPENDIX A COLLECTD PLUG-INS
53
collectd-protocols
collectdpluginprotocolsignoreselected
collectdpluginprotocolsvalues
collectd-python
collectd-serial
collectd-smart
collectdpluginsmartdisks
collectdpluginsmartignoreselected
collectdpluginsmartinterval
collectd-snmp_agent
collectd-statsd
collectdpluginstatsdhost
collectdpluginstatsdport
collectdpluginstatsddeletecounters
collectdpluginstatsddeletetimers
collectdpluginstatsddeletegauges
collectdpluginstatsddeletesets
collectdpluginstatsdcountersum
collectdpluginstatsdtimerpercentile
collectdpluginstatsdtimerlower
collectdpluginstatsdtimerupper
collectdpluginstatsdtimersum
collectdpluginstatsdtimercount
collectdpluginstatsdinterval
collectd-swap
collectdpluginswapreportbydevice
collectdpluginswapreportbytes
collectdpluginswapvaluesabsolute
Red Hat OpenStack Platform 160 Service Telemetry Framework
54
collectdpluginswapvaluespercentage
collectdpluginswapreportio
collectdpluginswapinterval
collectd-syslog
collectdpluginsysloglog_level
collectdpluginsyslognotify_level
collectdpluginsysloginterval
collectd-table
collectdplugintabletables
collectdplugintableinterval
collectd-tail
collectdplugintailfiles
collectdplugintailinterval
collectd-tail_csv
collectdplugintail_csvmetrics
collectdplugintail_csvfiles
collectd-tcpconns
collectdplugintcpconnslocalports
collectdplugintcpconnsremoteports
collectdplugintcpconnslistening
collectdplugintcpconnsallportssummary
collectdplugintcpconnsinterval
collectd-ted
collectd-thermal
collectdpluginthermaldevices
collectdpluginthermalignoreselected
collectdpluginthermalinterval
collectd-threshold
Override parameter: alertmanagerManifest
Description: Override the Alertmanager object creation. The Prometheus Operator watches for Alertmanager objects.
Retrieval command: oc get alertmanager stf-default -oyaml

Override parameter: alertmanagerConfigManifest
Description: Override the Secret that contains the Alertmanager configuration. The Prometheus Operator uses a secret named alertmanager-<alertmanager-name>, for example stf-default, to provide the alertmanager.yaml configuration to Alertmanager.
Retrieval command: oc get secret alertmanager-stf-default -oyaml

Override parameter: elasticsearchManifest
Description: Override the ElasticSearch object creation. The Elastic Cloud on Kubernetes Operator watches for ElasticSearch objects.
Retrieval command: oc get elasticsearch elasticsearch -oyaml

Override parameter: interconnectManifest
Description: Override the Interconnect object creation. The AMQ Interconnect Operator watches for Interconnect objects.
Retrieval command: oc get interconnect stf-default-interconnect -oyaml

Override parameter: prometheusManifest
Description: Override the Prometheus object creation. The Prometheus Operator watches for Prometheus objects.
Retrieval command: oc get prometheus stf-default -oyaml

Override parameter: servicemonitorManifest
Description: Override the ServiceMonitor object creation. The Prometheus Operator watches for ServiceMonitor objects.
Retrieval command: oc get servicemonitor stf-default -oyaml

Override parameter: smartgatewayCollectdMetricsManifest
Description: Override the SmartGateway object creation for collectd metrics. The Smart Gateway Operator watches for SmartGateway objects.
Retrieval command: oc get smartgateway stf-default-collectd-telemetry -oyaml

Override parameter: smartgatewayCollectdEventsManifest
Description: Override the SmartGateway object creation for collectd events. The Smart Gateway Operator watches for SmartGateway objects.
Retrieval command: oc get smartgateway stf-default-collectd-notification -oyaml

Override parameter: smartgatewayCeilometerEventsManifest
Description: Override the SmartGateway object creation for Ceilometer events. The Smart Gateway Operator watches for SmartGateway objects.
Retrieval command: oc get smartgateway stf-default-ceilometer-notification -oyaml
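A typical workflow, as a sketch: dump the manifest that STF currently manages with the retrieval command from the table, edit it, and then embed the result under the matching override parameter as described in the next section. The /tmp path below is only an example:

$ oc get prometheus stf-default -oyaml > /tmp/prometheus-manifest.yaml
$ ${EDITOR:-vi} /tmp/prometheus-manifest.yaml   # adjust fields, then copy the result under spec.prometheusManifest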
4.1.2. Overriding a managed manifest

Edit the ServiceTelemetry object and provide a parameter and manifest. For a list of available manifest override parameters, see Section 4.1, "Customizing the deployment". The default ServiceTelemetry object is stf-default. Use oc get servicetelemetry to list the available STF deployments.

TIP

The oc edit command loads the default system editor. To override the default editor, pass or set the environment variable EDITOR to the preferred editor. For example, EDITOR=nano oc edit servicetelemetry stf-default
Procedure

1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

$ oc project service-telemetry

3. Load the ServiceTelemetry object into an editor:

$ oc edit servicetelemetry stf-default

4. To modify the ServiceTelemetry object, provide a manifest override parameter and the contents of the manifest to write to OCP instead of the defaults provided by STF.

NOTE

The trailing pipe (|) after entering the manifest override parameter indicates that the value provided is multi-line.

apiVersion: infra.watch/v1alpha1
kind: ServiceTelemetry
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"infra.watch/v1alpha1","kind":"ServiceTelemetry","metadata":{"annotations":{},"name":"stf-default","namespace":"service-telemetry"},"spec":{"metricsEnabled":true}}
  creationTimestamp: "2020-04-14T20:29:42Z"
  generation: 1
  name: stf-default
  namespace: service-telemetry
  resourceVersion: "1949423"
  selfLink: /apis/infra.watch/v1alpha1/namespaces/service-telemetry/servicetelemetrys/stf-default
  uid: d058bc41-1bb0-49f5-9a8b-642f4b8adb95
spec:
  metricsEnabled: true
  smartgatewayCollectdMetricsManifest: | 1
    apiVersion: smartgateway.infra.watch/v2alpha1
    kind: SmartGateway
    metadata:
      name: stf-default-collectd-telemetry
      namespace: service-telemetry
    spec:
      amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/telemetry
      debug: true
      prefetch: 15000
      serviceType: metrics
      size: 1
      useTimestamp: true 2
status:
  conditions:
  - ansibleResult:
      changed: "0"
      completion: 2020-04-14T20:32:19.079508
      failures: "0"
      ok: "52"
      skipped: "1"
    lastTransitionTime: "2020-04-14T20:29:59Z"
    message: Awaiting next reconciliation
    reason: Successful
    status: "True"
    type: Running
1 Manifest override parameter is defined in the spec of the ServiceTelemetry object.
2 End of the manifest override content.

5. Save and close.
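For example, a minimal sketch of overriding the Prometheus object creation to extend metric retention. The retention value is illustrative, and the manifest is abbreviated to the fields relevant to the override:

spec:
  metricsEnabled: true
  prometheusManifest: |
    apiVersion: monitoring.coreos.com/v1
    kind: Prometheus
    metadata:
      name: stf-default
      namespace: service-telemetry
    spec:
      retention: 30d  # illustrative value; retention is a standard field of the Prometheus object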
4.2. ALERTS

You create alert rules in Prometheus and alert routes in Alertmanager. Alert rules in Prometheus servers send alerts to an Alertmanager, which manages the alerts. Alertmanager can silence, inhibit, or aggregate alerts, and send notifications using email, on-call notification systems, or chat platforms.

To create an alert, complete the following tasks:

1. Create an alert rule in Prometheus. For more information, see Section 4.2.1, "Creating an alert rule in Prometheus".

2. Create an alert route in Alertmanager. For more information, see Section 4.2.3, "Creating an alert route in Alertmanager".
Additional resources

For more information about alerts or notifications with Prometheus and Alertmanager, see https://prometheus.io/docs/alerting/overview/

To view an example set of alerts that you can use with Service Telemetry Framework (STF), see https://github.com/infrawatch/service-telemetry-operator/tree/master/deploy/alerts
4.2.1. Creating an alert rule in Prometheus

Prometheus evaluates alert rules to trigger notifications. If the rule condition returns an empty result set, the condition is false. Otherwise, the rule is true and it triggers an alert.
Procedure

1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

$ oc project service-telemetry

3. Create a PrometheusRule object that contains the alert rule. The Prometheus Operator loads the rule into Prometheus:

$ oc apply -f - <<EOF
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  creationTimestamp: null
  labels:
    prometheus: stf-default
    role: alert-rules
  name: prometheus-alarm-rules
  namespace: service-telemetry
spec:
  groups:
    - name: ./openstack.rules
      rules:
        - alert: Metric Listener down
          expr: collectd_qpid_router_status < 1  # To change the rule, edit the value of the expr parameter.
EOF

4. To verify that the rules have been loaded into Prometheus by the Operator, create a pod with access to curl:

$ oc run curl --generator=run-pod/v1 --image=radial/busyboxplus:curl -i --tty

5. Run curl to access the prometheus-operated service to return the rules loaded into memory:

[ root@curl:/ ]$ curl prometheus-operated:9090/api/v1/rules
{"status":"success","data":{"groups":[{"name":"./openstack.rules","file":"/etc/prometheus/rules/prometheus-stf-default-rulefiles-0/service-telemetry-prometheus-alarm-rules.yaml","rules":[{"name":"Metric Listener down","query":"collectd_qpid_router_status \u003c 1","duration":0,"labels":{},"annotations":{},"alerts":[],"health":"ok","type":"alerting"}],"interval":30}]}}

6. To verify that the output shows the rules loaded into the PrometheusRule object, for example, that the output contains the defined ./openstack.rules, exit from the pod:

[ root@curl:/ ]$ exit

7. Clean up the environment by deleting the curl pod:

$ oc delete pod curl

pod "curl" deleted
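As an optional further check from a curl-capable pod, you can also query the active alerts endpoint; an empty list is expected while no rule condition is true. The output shown is a sketch:

[ root@curl:/ ]$ curl prometheus-operated:9090/api/v1/alerts
{"status":"success","data":{"alerts":[]}}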
Additional resources

For more information on alerting, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md
4.2.2. Configuring custom alerts

You can add custom alerts to the PrometheusRule object that you created in Section 4.2.1, "Creating an alert rule in Prometheus".

Procedure

1. Use the oc edit command:

$ oc edit prometheusrules prometheus-alarm-rules

2. Edit the PrometheusRules manifest.

3. Save and close.
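For example, a sketch of a second rule appended to the existing group. The alert name and expression below are illustrative only and are not part of the default rules:

spec:
  groups:
    - name: ./openstack.rules
      rules:
        - alert: Metric Listener down
          expr: collectd_qpid_router_status < 1
        - alert: Collectd metrics stalled          # illustrative custom rule
          expr: rate(collectd_uptime[10m]) == 0    # example expression only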
Additional resources

For more information about configuring alerting rules, see https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/

For more information about PrometheusRules objects, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md
4.2.3. Creating an alert route in Alertmanager

Use Alertmanager to deliver alerts to an external system, such as email, IRC, or other notification channel. The Prometheus Operator manages the Alertmanager configuration as a Red Hat OpenShift Container Platform (OCP) secret. By default, STF deploys a basic configuration that results in no receivers:

alertmanager.yaml: |-
  global:
    resolve_timeout: 5m
  route:
    group_by: ['job']
    group_wait: 30s
    group_interval: 5m
    repeat_interval: 12h
    receiver: 'null'
  receivers:
  - name: 'null'
To deploy a custom Alertmanager route with STF, you must pass an alertmanagerConfigManifest parameter to the Service Telemetry Operator, which results in an updated secret managed by the Prometheus Operator. For more information, see Section 4.1.2, "Overriding a managed manifest".
Procedure

1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

$ oc project service-telemetry

3. Edit the ServiceTelemetry object for your STF deployment:

$ oc edit servicetelemetry stf-default

4. Add a new parameter, alertmanagerConfigManifest, and the Secret object contents to define the alertmanager.yaml configuration for Alertmanager:

NOTE

This loads the default template that is already managed by the Service Telemetry Operator. To validate that the changes are populating correctly, change a value, for example, change the value of global.resolve_timeout from 5m to 10m, then retrieve the alertmanager-stf-default secret and verify that the new value is loaded into memory.

apiVersion: infra.watch/v1alpha1
kind: ServiceTelemetry
metadata:
  name: stf-default
  namespace: service-telemetry
spec:
  metricsEnabled: true
  alertmanagerConfigManifest: |
    apiVersion: v1
    kind: Secret
    metadata:
      name: 'alertmanager-stf-default'
      namespace: 'service-telemetry'
    type: Opaque
    stringData:
      alertmanager.yaml: |-
        global:
          resolve_timeout: 10m
        route:
          group_by: ['job']
          group_wait: 30s
          group_interval: 5m
          repeat_interval: 12h
          receiver: 'null'
        receivers:
        - name: 'null'
5. Verify that the configuration was applied to the secret:

$ oc get secret alertmanager-stf-default -o go-template='{{index .data "alertmanager.yaml" | base64decode }}'

global:
  resolve_timeout: 10m
route:
  group_by: ['job']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h
  receiver: 'null'
receivers:
- name: 'null'

6. To verify that the configuration has been loaded into Alertmanager, create a pod with access to curl:

$ oc run curl --generator=run-pod/v1 --image=radial/busyboxplus:curl -i --tty

7. Run curl against the alertmanager-operated service to retrieve the status and configYAML contents, and review that the supplied configuration matches the configuration loaded into Alertmanager:

[ root@curl:/ ]$ curl alertmanager-operated:9093/api/v1/status

{"status":"success","data":{"configYAML":"global:\n  resolve_timeout: 10m\n  http_config: {}\n  smtp_hello: localhost\n  smtp_require_tls: true\n  pagerduty_url: https://events.pagerduty.com/v2/enqueue\n  hipchat_api_url: https://api.hipchat.com/\n  opsgenie_api_url: https://api.opsgenie.com/\n  wechat_api_url: https://qyapi.weixin.qq.com/cgi-bin/\n  victorops_api_url: https://alert.victorops.com/integrations/generic/20131114/alert/\nroute:\n  receiver: \"null\"\n  group_by:\n  - job\n  group_wait: 30s\n  group_interval: 5m\n  repeat_interval: 12h\nreceivers:\n- name: \"null\"\ntemplates: []\n"}}

8. Verify that the configYAML field contains the expected changes. Exit from the pod:

[ root@curl:/ ]$ exit

9. To clean up the environment, delete the curl pod:

$ oc delete pod curl

pod "curl" deleted
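The configuration above still routes everything to the 'null' receiver. To deliver alerts somewhere real, replace it with a configured receiver. The following is a sketch of an email receiver using standard Alertmanager fields; the addresses and SMTP host are placeholders:

alertmanager.yaml: |-
  global:
    resolve_timeout: 10m
  route:
    group_by: ['job']
    group_wait: 30s
    group_interval: 5m
    repeat_interval: 12h
    receiver: ops-email
  receivers:
  - name: ops-email
    email_configs:
    - to: 'ops@example.com'             # placeholder address
      from: 'alertmanager@example.com'  # placeholder address
      smarthost: 'smtp.example.com:587' # placeholder SMTP relay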
Additional resources

For more information about the Red Hat OpenShift Container Platform secret and the Prometheus operator, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md
4.3. HIGH AVAILABILITY

High availability is the ability of Service Telemetry Framework (STF) to rapidly recover from failures in its component services. Although Red Hat OpenShift Container Platform (OCP) restarts a failed pod if nodes are available to schedule the workload, this recovery process might take more than one minute, during which time events and metrics are lost. A high availability configuration includes multiple copies of STF components, reducing recovery time to approximately 2 seconds. To protect against failure of an OCP node, deploy STF to an OCP cluster with three or more nodes.

NOTE

STF is not yet a fully fault tolerant system. Delivery of metrics and events during the recovery period is not guaranteed.

Enabling high availability has the following effects:

- Two AMQ Interconnect pods run instead of the default 1.
- Three ElasticSearch pods run instead of the default 1.
- Recovery time from a lost pod in either of these services reduces to approximately 2 seconds.
4.3.1. Configuring high availability

To configure STF for high availability, add highAvailabilityEnabled: true to the ServiceTelemetry object in OCP. You can set this parameter at installation time or, if you already deployed STF, complete the following steps:

Procedure

1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

$ oc project service-telemetry

3. Use the oc command to edit the ServiceTelemetry object:

$ oc edit ServiceTelemetry

4. Add highAvailabilityEnabled: true to the spec section:

spec:
  eventsEnabled: true
  metricsEnabled: true
  highAvailabilityEnabled: true

5. Save your changes and close the object.
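After the change reconciles, you can sanity-check the pod counts described in Section 4.3. A quick check, assuming the default pod naming of this deployment:

$ oc get pods -n service-telemetry | grep -E 'interconnect|elasticsearch'

Expect two AMQ Interconnect pods and three ElasticSearch pods once reconciliation completes.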
4.4. DASHBOARDS

Use the third-party application Grafana to visualize system-level metrics gathered by collectd for each individual host node. For more information about configuring collectd, see Section 3.3, "Configuring Red Hat OpenStack Platform overcloud for Service Telemetry Framework".
4.4.1. Setting up Grafana to host the dashboard

Grafana is not included in the default Service Telemetry Framework (STF) deployment, so you must deploy the Grafana Operator from OperatorHub.io.

Procedure

1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

$ oc project service-telemetry

3. Clone the dashboard repository:

$ git clone https://github.com/infrawatch/dashboards
$ cd dashboards

4. Deploy the Grafana operator:

$ oc create -f deploy/subscription.yaml

5. To verify that the operator launched successfully, run the oc get csv command. If the value of the PHASE column is Succeeded, the operator launched successfully:

$ oc get csv

NAME                      DISPLAY            VERSION   REPLACES   PHASE
grafana-operator.v3.2.0   Grafana Operator   3.2.0                Succeeded

6. Launch a Grafana instance:

$ oc create -f deploy/grafana.yaml

7. Verify that the Grafana instance deployed:

$ oc get pod -l app=grafana

NAME                                  READY   STATUS    RESTARTS   AGE
grafana-deployment-7fc7848b56-sbkhv   1/1     Running   0          1m

8. Create the datasource and dashboard resources:

$ oc create -f deploy/datasource.yaml -f deploy/rhos-dashboard.yaml

9. Verify that the resources installed correctly:

$ oc get grafanadashboards

NAME             AGE
rhos-dashboard   7d21h

$ oc get grafanadatasources

NAME                                  AGE
service-telemetry-grafanadatasource   1m

10. Navigate to https://<grafana-route-address> in a web browser. Use the oc get routes command to retrieve the Grafana route address:

$ oc get routes

11. To view the dashboard, click Dashboards and Manage.
Additional resources

For more information about enabling the OperatorHub.io catalog source, see Section 2.3.5, "Enabling the OperatorHub.io Community Catalog Source".

4.4.1.1. Viewing and editing queries

Procedure

1. Log in to Red Hat OpenShift Container Platform. To view and edit queries, log in as the admin user.

2. Change to the service-telemetry namespace:

$ oc project service-telemetry

3. To retrieve the default username and password, describe the Grafana object using oc describe:

$ oc describe grafana service-telemetry-grafana

4.4.2. The Grafana infrastructure dashboard

The infrastructure dashboard shows metrics for a single node at a time. Select a node from the upper left corner of the dashboard.

4.4.2.1. Top panels

Title | Unit | Description
Current Global Alerts | - | Current alerts fired by Prometheus
Recent Global Alerts | - | Recently fired alerts in 5m time steps
Status Panel | - | Node status: up, down, unavailable
Uptime | s/m/h/d/M/Y | Total operational time of node
CPU Cores | cores | Total number of cores
Memory | bytes | Total memory
Disk Size | bytes | Total storage size
Processes | processes | Total number of processes listed by type
Load Average | processes | Load average represents the average number of running and uninterruptible processes residing in the kernel execution queue
4.4.2.2. Networking panels

Panels that display the network interfaces of the node.

Panel | Unit | Description
Physical Interfaces Ingress Errors | errors | Total errors with incoming data
Physical Interfaces Egress Errors | errors | Total errors with outgoing data
Physical Interfaces Ingress Error Rates | errors/s | Rate of incoming data errors
Physical Interfaces Egress Error Rates | errors/s | Rate of outgoing data errors
Physical Interfaces Packets Ingress | pps | Incoming packets per second
Physical Interfaces Packets Egress | pps | Outgoing packets per second
Physical Interfaces Data Ingress | bytes/s | Incoming data rates
Physical Interfaces Data Egress | bytes/s | Outgoing data rates
Physical Interfaces Drop Rate Ingress | pps | Incoming packets drop rate
Physical Interfaces Drop Rate Egress | pps | Outgoing packets drop rate
4.4.2.3. CPU panels

Panels that display CPU usage of the node.

Panel | Unit | Description

Current CPU Usage | percent | Instantaneous usage at the time of the last query
Aggregate CPU Usage | percent | Average non-idle CPU activity of all cores on a node
Aggr. CPU Usage by Type | percent | Shows time spent for each type of thread averaged across all cores
4.4.2.4. Memory panels

Panels that display memory usage on the node.

Panel | Unit | Description

Memory Used | percent | Amount of memory being used at time of last query
Huge Pages Used | hugepages | Number of hugepages being used
4.4.2.5. Disk/file system

Panels that display space used on disk.

Panel | Unit | Description | Notes

Disk Space Usage | percent | Total disk use at time of last query |
Inode Usage | percent | Total inode use at time of last query |
Aggregate Disk Space Usage | bytes | Total disk space used and reserved | Because this query relies on the df plugin, temporary file systems that do not necessarily use disk space are included in the results. The query tries to filter out most of these, but it might not be exhaustive.
Disk Traffic | bytes/s | Shows rates for both reading and writing |
Disk Load | percent | Approximate percentage of total disk bandwidth being used. The weighted I/O time series includes the backlog that might be accumulating. For more information, see the collectd disk plugin docs. |
Operations/s | ops/s | Operations done per second |
Average I/O Operation Time | seconds | Average time each I/O operation took to complete. This average is not accurate, see the collectd disk plugin docs. |
4.5. CONFIGURING MULTIPLE CLOUDS

You can configure multiple Red Hat OpenStack Platform clouds to target a single instance of Service Telemetry Framework (STF):

1. Plan the AMQP address prefixes that you want to use for each cloud. For more information, see Section 4.5.1, "Planning AMQP address prefixes".

2. Deploy metrics and events consumer Smart Gateways for each cloud to listen on the corresponding address prefixes. For more information, see Section 4.5.2, "Deploying Smart Gateways".

3. Configure each cloud to send its metrics and events to STF on the correct address. For more information, see Section 4.5.3, "Creating the OpenStack environment file".

Figure 4.1. Two Red Hat OpenStack Platform clouds connect to STF
4.5.1. Planning AMQP address prefixes

By default, Red Hat OpenStack Platform nodes get data through two data collectors: collectd and Ceilometer. These components send telemetry data or notifications to the respective AMQP addresses, for example, collectd/telemetry, where STF Smart Gateways listen on those addresses for monitoring data.

To support multiple clouds and to identify which cloud generated the monitoring data, configure each cloud to send data to a unique address. Prefix a cloud identifier to the second part of the address. The following list shows some example addresses and identifiers:

- collectd/cloud1-telemetry
- collectd/cloud1-notify
- anycast/ceilometer/cloud1-event.sample
- collectd/cloud2-telemetry
- collectd/cloud2-notify
- anycast/ceilometer/cloud2-event.sample
- collectd/us-east-1-telemetry
- collectd/us-west-3-telemetry
4.5.2. Deploying Smart Gateways

You must deploy a Smart Gateway for each of the data collection types for each cloud: one for collectd metrics, one for collectd events, and one for Ceilometer events. Configure each of the Smart Gateways to listen on the AMQP address that you define for the corresponding cloud.

When you deploy STF for the first time, Smart Gateway manifests are created that define the initial Smart Gateways for a single cloud. When deploying Smart Gateways for multiple cloud support, you deploy multiple Smart Gateways for each of the data collection types that handle the metrics and the events data for each cloud. The initial Smart Gateways act as a template to create additional Smart Gateways, along with any authentication information required to connect to the data stores.
Procedure

1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

$ oc project service-telemetry

3. Use the initially deployed Smart Gateways as a template for additional Smart Gateways. List the currently deployed Smart Gateways with the oc get smartgateways command. For example, if you deployed STF with metricsEnabled: true and eventsEnabled: true, the following Smart Gateways are displayed in the output:

$ oc get smartgateways

NAME                                  AGE
stf-default-ceilometer-notification   14d
stf-default-collectd-notification     14d
stf-default-collectd-telemetry        14d

4. Retrieve the manifests for each Smart Gateway and store the contents in a temporary file, which you can modify later and use to create the new set of Smart Gateways:

$ truncate --size 0 /tmp/cloud1-smartgateways.yaml && \
for sg in $(oc get smartgateways -oname)
do
  echo --- >> /tmp/cloud1-smartgateways.yaml
  oc get $sg -oyaml --export >> /tmp/cloud1-smartgateways.yaml
done

5. Modify the Smart Gateway manifests in the /tmp/cloud1-smartgateways.yaml file. Adjust the metadata.name and spec.amqpUrl fields to include the cloud identifier from your schema. For more information, see Section 4.5.1, "Planning AMQP address prefixes". To view example Smart Gateway manifests, see Section 4.5.2.1, "Example manifests". A possible scripted shortcut is shown after this procedure.

6. Deploy your new Smart Gateways:

$ oc apply -f /tmp/cloud1-smartgateways.yaml

7. Verify that each Smart Gateway is running. This can take several minutes depending on the number of Smart Gateways:

$ oc get po -l app=smart-gateway
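As the shortcut referenced in step 5: if your edits are uniform, a sed pass over the temporary file can apply the cloud1 identifier. This sketch assumes the default names and addresses shown above for the telemetry gateway; adapt it for the notification gateways and review the file before applying it:

$ sed -i -e 's/name: stf-default-collectd-telemetry/name: stf-default-collectd-telemetry-cloud1/' \
         -e 's#5672/collectd/telemetry#5672/collectd/cloud1-telemetry#' \
         /tmp/cloud1-smartgateways.yaml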
4.5.2.1. Example manifests

IMPORTANT

The content in the following examples might be different to the file content in your deployment. Copy the manifests in your deployment.

Ensure that the name and amqpUrl parameters of each Smart Gateway match the names that you want to use for your clouds. For more information, see Section 4.5.1, "Planning AMQP address prefixes".

NOTE

Your output might have some additional metadata parameters that you can remove from the manifests that you load into OCP.
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-ceilometer-notification-cloud1 1
spec:
  amqpDataSource: ceilometer
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/anycast/ceilometer/cloud1-event.sample 2
  debug: false
  elasticPass: fkzfhghw
  elasticUrl: https://elasticsearch-es-http.service-telemetry.svc.cluster.local:9200
  elasticUser: elastic
  resetIndex: false
  serviceType: events
  size: 1
  tlsCaCert: /config/certs/ca.crt
  tlsClientCert: /config/certs/tls.crt
  tlsClientKey: /config/certs/tls.key
  tlsServerName: elasticsearch-es-http.service-telemetry.svc.cluster.local
  useBasicAuth: true
  useTls: true
---
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-collectd-notification-cloud1 3
spec:
  amqpDataSource: collectd
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/cloud1-notify 4
  debug: false
  elasticPass: fkzfhghw
  elasticUrl: https://elasticsearch-es-http.service-telemetry.svc.cluster.local:9200
  elasticUser: elastic
  resetIndex: false
  serviceType: events
  size: 1
  tlsCaCert: /config/certs/ca.crt
  tlsClientCert: /config/certs/tls.crt
  tlsClientKey: /config/certs/tls.key
  tlsServerName: elasticsearch-es-http.service-telemetry.svc.cluster.local
  useBasicAuth: true
  useTls: true
---
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-collectd-telemetry-cloud1 5
spec:
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/cloud1-telemetry 6
  debug: false
  prefetch: 15000
  serviceType: metrics
  size: 1
  useTimestamp: true

1 Name for Ceilometer notifications for cloud1.
2 AMQP address for Ceilometer notifications for cloud1.
3 Name for collectd notifications for cloud1.
4 AMQP address for collectd notifications for cloud1.
5 Name for collectd telemetry for cloud1.
6 AMQP address for collectd telemetry for cloud1.
4.5.3. Creating the OpenStack environment file

To label traffic according to the cloud of origin, you must create a configuration with cloud-specific instance names. Create an stf-connectors.yaml file and adjust the values of CeilometerQdrEventsConfig and CollectdAmqpInstances to match the AMQP address prefix scheme. For more information, see Section 4.5.1, "Planning AMQP address prefixes".
WARNING

Remove the enable-stf.yaml and ceilometer-write-qdr.yaml environment file references from your overcloud deployment. This configuration is redundant and results in duplicate information being sent from each cloud node.
Procedure

1. Create the stf-connectors.yaml file and modify it to match the AMQP address that you want for this cloud deployment:

resource_registry:
  OS::TripleO::Services::Collectd: /usr/share/openstack-tripleo-heat-templates/deployment/metrics/collectd-container-puppet.yaml
  OS::TripleO::Services::MetricsQdr: /usr/share/openstack-tripleo-heat-templates/deployment/metrics/qdr-container-puppet.yaml
  OS::TripleO::Services::CeilometerAgentCentral: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-central-container-puppet.yaml
  OS::TripleO::Services::CeilometerAgentNotification: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-notification-container-puppet.yaml
  OS::TripleO::Services::CeilometerAgentIpmi: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-ipmi-container-puppet.yaml
  OS::TripleO::Services::ComputeCeilometerAgent: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-compute-container-puppet.yaml
  OS::TripleO::Services::Redis: /usr/share/openstack-tripleo-heat-templates/deployment/database/redis-pacemaker-puppet.yaml

parameter_defaults:
  EnableSTF: true

  EventPipelinePublishers: []
  CeilometerEnablePanko: false
  CeilometerQdrPublishEvents: true
  CeilometerQdrEventsConfig:
    driver: amqp
    topic: cloud1-event 1

  CollectdConnectionType: amqp1
  CollectdAmqpInterval: 5
  CollectdDefaultPollingInterval: 5

  CollectdAmqpInstances:
    cloud1-notify: 2
      notify: true
      format: JSON
      presettle: false
    cloud1-telemetry: 3
      format: JSON
      presettle: true

  MetricsQdrAddresses:
    - prefix: collectd
      distribution: multicast
    - prefix: anycast/ceilometer
      distribution: multicast

  MetricsQdrSSLProfiles:
    - name: sslProfile

  MetricsQdrConnectors:
    - host: stf-default-interconnect-5671-service-telemetry.apps.infra.watch 4
      port: 443
      role: edge
      verifyHostname: false
      sslProfile: sslProfile

1 Define the topic for Ceilometer events. This value is the address format of anycast/ceilometer/cloud1-event.sample.
2 Define the topic for collectd events. This value is the format of collectd/cloud1-notify.
3 Define the topic for collectd metrics. This value is the format of collectd/cloud1-telemetry.
4 Adjust the MetricsQdrConnectors host to the address of the STF route.

2. Ensure that the naming convention in the stf-connectors.yaml file aligns with the spec.amqpUrl field in the Smart Gateway configuration. For example, configure the CeilometerQdrEventsConfig.topic field to a value of cloud1-event.

3. Save the file in a directory for custom environment files, for example /home/stack/custom_templates/.

4. Source the authentication file:

[stack@undercloud-0 ~]$ source stackrc

(undercloud) [stack@undercloud-0 ~]$

5. Include the stf-connectors.yaml file in the overcloud deployment command, along with any other environment files relevant to your environment:

(undercloud) [stack@undercloud-0 ~]$ openstack overcloud deploy --templates /usr/share/openstack-tripleo-heat-templates \
  -e /home/stack/custom_templates/stf-connectors.yaml
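For a second cloud, the same file shape applies with the cloud2 prefix substituted. A sketch of the parameters that change, with all other settings remaining as in the cloud1 example:

parameter_defaults:
  CeilometerQdrEventsConfig:
    driver: amqp
    topic: cloud2-event
  CollectdAmqpInstances:
    cloud2-notify:
      notify: true
      format: JSON
      presettle: false
    cloud2-telemetry:
      format: JSON
      presettle: true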
Additional resources

For information about validating the deployment, see Section 3.3.3, "Validating client-side installation".
4.5.4. Querying metrics data from multiple clouds

Data stored in Prometheus has a service label attached according to the Smart Gateway it was scraped from. You can use this label to query data from a specific cloud.

To query data from a specific cloud, use a Prometheus PromQL query that matches the associated service label, for example: collectd_uptime{service="stf-default-collectd-telemetry-cloud1-smartgateway"}.
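For example, the first query below pins results to cloud1, and the second, an illustrative regular-expression match, selects the same metric across all clouds that follow this naming scheme:

collectd_uptime{service="stf-default-collectd-telemetry-cloud1-smartgateway"}
collectd_uptime{service=~"stf-default-collectd-telemetry-.+-smartgateway"}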
4.6. EPHEMERAL STORAGE

Use ephemeral storage to run Service Telemetry Framework (STF) without persistently storing data in your Red Hat OpenShift Container Platform (OCP) cluster. Ephemeral storage is not recommended in a production environment due to the volatility of the data in the platform when operating correctly and as designed. For example, restarting a pod or rescheduling the workload to another node results in the loss of any local data written since the pod started.

If you enable ephemeral storage in STF, the Service Telemetry Operator does not add the relevant storage sections to the data storage components' manifests.
4.6.1. Configuring ephemeral storage

To configure STF for ephemeral storage, add storageEphemeralEnabled: true to the ServiceTelemetry object in OCP. You can add storageEphemeralEnabled: true at installation time or, if you already deployed STF, complete the following steps:

Procedure

1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

$ oc project service-telemetry

3. Edit the ServiceTelemetry object:

$ oc edit ServiceTelemetry stf-default

4. Add the storageEphemeralEnabled: true parameter to the spec section:

spec:
  eventsEnabled: true
  metricsEnabled: true
  storageEphemeralEnabled: true

5. Save your changes and close the object.
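With ephemeral storage enabled, you would expect no persistent volume claims for the data stores. A quick check, assuming no unrelated claims exist in the namespace:

$ oc get pvc -n service-telemetry
No resources found in service-telemetry namespace.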
APPENDIX A. COLLECTD PLUG-INS

This section contains a complete list of collectd plug-ins and configurations for Red Hat OpenStack Platform 16.0.
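These parameters are hiera keys consumed by the collectd Puppet module on overcloud nodes. As a sketch, you can set one of them through ExtraConfig in a custom environment file; the key and value below are illustrative only:

parameter_defaults:
  ExtraConfig:
    collectd::plugin::virt::hostname_format: "name"  # illustrative key/value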
collectd-aggregation
collectd::plugin::aggregation::aggregators
collectd::plugin::aggregation::interval
collectd-amqp1
collectd-apache
collectd::plugin::apache::instances (ex: localhost => url => 'http://localhost/mod_status?auto')
collectd::plugin::apache::interval
collectd-apcups
collectd-battery
collectd::plugin::battery::values_percentage
collectd::plugin::battery::report_degraded
collectd::plugin::battery::query_state_fs
collectd::plugin::battery::interval
collectd-ceph
collectd::plugin::ceph::daemons
collectd::plugin::ceph::longrunavglatency
collectd::plugin::ceph::convertspecialmetrictypes
collectd-cgroups
collectd::plugin::cgroups::ignore_selected
collectd::plugin::cgroups::interval
collectd-conntrack
None
collectd-contextswitch
collectd::plugin::contextswitch::interval
collectd-cpu
collectd::plugin::cpu::reportbystate
collectd::plugin::cpu::reportbycpu
collectd::plugin::cpu::valuespercentage
collectd::plugin::cpu::reportnumcpu
collectd::plugin::cpu::reportgueststate
collectd::plugin::cpu::subtractgueststate
collectd::plugin::cpu::interval
collectd-cpufreq
None
collectd-cpusleep
collectd-csv
collectd::plugin::csv::datadir
collectd::plugin::csv::storerates
collectd::plugin::csv::interval
collectd-df
collectd::plugin::df::devices
collectd::plugin::df::fstypes
collectd::plugin::df::ignoreselected
collectd::plugin::df::mountpoints
collectd::plugin::df::reportbydevice
collectd::plugin::df::reportinodes
collectd::plugin::df::reportreserved
collectd::plugin::df::valuesabsolute
collectd::plugin::df::valuespercentage
collectd::plugin::df::interval
collectd-disk
collectd::plugin::disk::disks
collectd::plugin::disk::ignoreselected
collectd::plugin::disk::udevnameattr
collectd::plugin::disk::interval
collectd-entropy
collectd::plugin::entropy::interval
collectd-ethstat
collectd::plugin::ethstat::interfaces
collectd::plugin::ethstat::maps
collectd::plugin::ethstat::mappedonly
collectd::plugin::ethstat::interval
collectd-exec
collectd::plugin::exec::commands
collectd::plugin::exec::commands_defaults
collectd::plugin::exec::globals
collectd::plugin::exec::interval
collectd-fhcount
collectd::plugin::fhcount::valuesabsolute
collectd::plugin::fhcount::valuespercentage
collectd::plugin::fhcount::interval
collectd-filecount
collectd::plugin::filecount::directories
collectd::plugin::filecount::interval
collectd-fscache
None
collectd-hddtemp
collectd::plugin::hddtemp::host
collectd::plugin::hddtemp::port
collectd::plugin::hddtemp::interval
collectd-hugepages
collectd::plugin::hugepages::report_per_node_hp
collectd::plugin::hugepages::report_root_hp
collectd::plugin::hugepages::values_pages
collectd::plugin::hugepages::values_bytes
collectd::plugin::hugepages::values_percentage
collectd::plugin::hugepages::interval
collectd-intel_rdt
collectd-interface
collectd::plugin::interface::interfaces
collectd::plugin::interface::ignoreselected
collectd::plugin::interface::reportinactive
collectd::plugin::interface::interval
collectd-ipc
None
collectd-ipmi
collectd::plugin::ipmi::ignore_selected
collectd::plugin::ipmi::notify_sensor_add
collectd::plugin::ipmi::notify_sensor_remove
collectd::plugin::ipmi::notify_sensor_not_present
collectd::plugin::ipmi::sensors
collectd::plugin::ipmi::interval
collectd-irq
collectd::plugin::irq::irqs
collectd::plugin::irq::ignoreselected
collectd::plugin::irq::interval
collectd-load
collectd::plugin::load::report_relative
collectd::plugin::load::interval
collectd-logfile
collectd::plugin::logfile::log_level
collectd::plugin::logfile::log_file
collectd::plugin::logfile::log_timestamp
collectd::plugin::logfile::print_severity
collectd::plugin::logfile::interval
collectd-madwifi
collectd-mbmon
collectd-md
collectd-memcached
collectd::plugin::memcached::instances
collectd::plugin::memcached::interval
collectd-memory
collectd::plugin::memory::valuesabsolute
collectd::plugin::memory::valuespercentage
collectd::plugin::memory::interval
collectd-multimeter
collectd-mysql
collectd::plugin::mysql::interval
collectd-netlink
collectd::plugin::netlink::interfaces
collectd::plugin::netlink::verboseinterfaces
collectd::plugin::netlink::qdiscs
collectd::plugin::netlink::classes
collectd::plugin::netlink::filters
collectd::plugin::netlink::ignoreselected
collectd::plugin::netlink::interval
collectd-network
collectd::plugin::network::timetolive
collectd::plugin::network::maxpacketsize
collectd::plugin::network::forward
collectd::plugin::network::reportstats
collectd::plugin::network::listeners
collectd::plugin::network::servers
collectd::plugin::network::interval
collectd-nfs
collectd::plugin::nfs::interval
collectd-ntpd
collectd::plugin::ntpd::host
collectd::plugin::ntpd::port
collectd::plugin::ntpd::reverselookups
collectd::plugin::ntpd::includeunitid
collectd::plugin::ntpd::interval
collectd-numa
None
collectd-olsrd
collectd-openvpn
collectd::plugin::openvpn::statusfile
collectd::plugin::openvpn::improvednamingschema
collectd::plugin::openvpn::collectcompression
collectd::plugin::openvpn::collectindividualusers
collectd::plugin::openvpn::collectusercount
collectd::plugin::openvpn::interval
collectd-ovs_events
collectd::plugin::ovs_events::address
collectd::plugin::ovs_events::dispatch
collectd::plugin::ovs_events::interfaces
collectd::plugin::ovs_events::send_notification
collectd::plugin::ovs_events::$port
collectd::plugin::ovs_events::socket
collectd-ovs_stats
collectd::plugin::ovs_stats::address
collectd::plugin::ovs_stats::bridges
collectd::plugin::ovs_stats::port
collectd::plugin::ovs_stats::socket
collectd-ping
collectd::plugin::ping::hosts
collectd::plugin::ping::timeout
collectd::plugin::ping::ttl
collectd::plugin::ping::source_address
collectd::plugin::ping::device
collectd::plugin::ping::max_missed
collectd::plugin::ping::size
collectd::plugin::ping::interval
collectd-powerdns
collectd::plugin::powerdns::interval
collectd::plugin::powerdns::servers
collectd::plugin::powerdns::recursors
collectd::plugin::powerdns::local_socket
collectd-processes
collectd::plugin::processes::processes
collectd::plugin::processes::process_matches
collectd::plugin::processes::collect_context_switch
collectd::plugin::processes::collect_file_descriptor
collectd::plugin::processes::collect_memory_maps
collectd::plugin::processes::interval
collectd-protocols
collectd::plugin::protocols::ignoreselected
collectd::plugin::protocols::values
collectd-python
collectd-serial
collectd-smart
collectd::plugin::smart::disks
collectd::plugin::smart::ignoreselected
collectd::plugin::smart::interval
collectd-snmp_agent
collectd-statsd
collectd::plugin::statsd::host
collectd::plugin::statsd::port
collectd::plugin::statsd::deletecounters
collectd::plugin::statsd::deletetimers
collectd::plugin::statsd::deletegauges
collectd::plugin::statsd::deletesets
collectd::plugin::statsd::countersum
collectd::plugin::statsd::timerpercentile
collectd::plugin::statsd::timerlower
collectd::plugin::statsd::timerupper
collectd::plugin::statsd::timersum
collectd::plugin::statsd::timercount
collectd::plugin::statsd::interval
collectd-swap
collectd::plugin::swap::reportbydevice
collectd::plugin::swap::reportbytes
collectd::plugin::swap::valuesabsolute
collectd::plugin::swap::valuespercentage
collectd::plugin::swap::reportio
collectd::plugin::swap::interval
collectd-syslog
collectd::plugin::syslog::log_level
collectd::plugin::syslog::notify_level
collectd::plugin::syslog::interval
collectd-table
collectd::plugin::table::tables
collectd::plugin::table::interval
collectd-tail
collectd::plugin::tail::files
collectd::plugin::tail::interval
collectd-tail_csv
collectd::plugin::tail_csv::metrics
collectd::plugin::tail_csv::files
collectd-tcpconns
collectd::plugin::tcpconns::localports
collectd::plugin::tcpconns::remoteports
collectd::plugin::tcpconns::listening
collectd::plugin::tcpconns::allportssummary
collectd::plugin::tcpconns::interval
collectd-ted
collectd-thermal
collectd::plugin::thermal::devices
collectd::plugin::thermal::ignoreselected
collectd::plugin::thermal::interval
collectd-threshold
collectd::plugin::threshold::types
collectd::plugin::threshold::plugins
collectd::plugin::threshold::hosts
collectd::plugin::threshold::interval
collectd-turbostat
collectd::plugin::turbostat::core_c_states
collectd::plugin::turbostat::package_c_states
collectd::plugin::turbostat::system_management_interrupt
collectd::plugin::turbostat::digital_temperature_sensor
collectd::plugin::turbostat::tcc_activation_temp
collectd::plugin::turbostat::running_average_power_limit
collectd::plugin::turbostat::logical_core_names
collectd-unixsock
collectd-uptime
collectd::plugin::uptime::interval
collectd-users
collectd::plugin::users::interval
collectd-uuid
collectd::plugin::uuid::uuid_file
collectd::plugin::uuid::interval
collectd-virt
collectd::plugin::virt::connection
collectd::plugin::virt::refresh_interval
collectd::plugin::virt::domain
collectd::plugin::virt::block_device
collectd::plugin::virt::interface_device
collectd::plugin::virt::ignore_selected
collectd::plugin::virt::hostname_format
collectd::plugin::virt::interface_format
collectd::plugin::virt::extra_stats
collectd::plugin::virt::interval
collectd-vmem
collectd::plugin::vmem::verbose
collectd::plugin::vmem::interval
collectd-vserver
collectd-wireless
collectd-write_graphite
collectd::plugin::write_graphite::carbons
collectd::plugin::write_graphite::carbon_defaults
collectd::plugin::write_graphite::globals
collectd-write_kafka
collectd::plugin::write_kafka::kafka_host
collectd::plugin::write_kafka::kafka_port
collectd::plugin::write_kafka::kafka_hosts
collectd::plugin::write_kafka::topics
collectd-write_log
collectd::plugin::write_log::format
collectd-zfs_arc
None
smartgatewayCeilometerEventsManifest
Override the SmartGatewayobject creation for Ceilometerevents The Smart GatewayOperator watches for SmartGateway objects
oc get smartgateway stf-default-ceilometer-notification -oyaml
Override parameter Description Retrieval command
412 Overriding a managed manifest
Edit the ServiceTelemetry object and provide a parameter and manifest For a list of availablemanifest override parameters see Section 41 ldquoCustomizing the deploymentrdquo The default ServiceTelemetry object is stf-default Use oc get servicetelemetry to list the available STFdeployments
TIP
The oc edit command loads the default system editor To override the default editor pass or setthe environment variable EDITOR to the preferred editor For example EDITOR=nano oc edit servicetelemetry stf-default
Procedure
1 Log in to Red Hat OpenShift Container Platform
2 Change to the service-telemetry namespace
3 Load the ServiceTelemetry object into an editor
4 To modify the ServiceTelemetry object provide a manifest override parameter and thecontents of the manifest to write to OCP instead of the defaults provided by STF
NOTE
The trailing pipe (|) after entering the manifest override parameter indicatesthat the value provided is multi-line
oc project service-telemetry
oc edit servicetelemetry stf-default
$ oc edit servicetelemetry stf-default
apiVersion infrawatchv1alpha1kind ServiceTelemetrymetadata annotations kubectlkubernetesiolast-applied-configuration | apiVersioninfrawatchv1alpha1kindServiceTelemetrymetadataannotationsnamestf-defaultnamespaceservice-telemetryspecmetricsEnabledtrue creationTimestamp 2020-04-14T202942Z
Red Hat OpenStack Platform 160 Service Telemetry Framework
28
1
2
Manifest override parameter is defined in the spec of the ServiceTelemetry object
End of the manifest override content
5 Save and close
42 ALERTS
You create alert rules in Prometheus and alert routes in Alertmanager Alert rules in Prometheusservers send alerts to an Alertmanager which manages the alerts Alertmanager can silenceinhibit or aggregate alerts and send notifications using email on-call notification systems or chatplatforms
To create an alert complete the following tasks
1 Create an alert rule in Prometheus For more information see Section 421 ldquoCreating analert rule in Prometheusrdquo
generation 1 name stf-default namespace service-telemetry resourceVersion 1949423 selfLink apisinfrawatchv1alpha1namespacesservice-telemetryservicetelemetrysstf-default uid d058bc41-1bb0-49f5-9a8b-642f4b8adb95spec metricsEnabled true smartgatewayCollectdMetricsManifest | 1 apiVersion smartgatewayinfrawatchv2alpha1 kind SmartGateway metadata name stf-default-collectd-telemetry namespace service-telemetry spec amqpUrl stf-default-interconnectservice-telemetrysvcclusterlocal5672collectdtelemetry debug true prefetch 15000 serviceType metrics size 1 useTimestamp true 2status conditions - ansibleResult changed 0 completion 2020-04-14T203219079508 failures 0 ok 52 skipped 1 lastTransitionTime 2020-04-14T202959Z message Awaiting next reconciliation reason Successful status True type Running
CHAPTER 4 ADVANCED FEATURES
29
2 Create an alert route in Alertmanager For more information see Section 423 ldquoCreating analert route in Alertmanagerrdquo
Additional resourcesFor more information about alerts or notifications with Prometheus and Alertmanager seehttpsprometheusiodocsalertingoverview
To view an example set of alerts that you can use with Service Telemetry Framework (STF) seehttpsgithubcominfrawatchservice-telemetry-operatortreemasterdeployalerts
421 Creating an alert rule in Prometheus
Prometheus evaluates alert rules to trigger notifications If the rule condition returns an emptyresult set the condition is false Otherwise the rule is true and it triggers an alert
Procedure
1 Log in to Red Hat OpenShift Container Platform
2 Change to the service-telemetry namespace
3 Create a PrometheusRule object that contains the alert rule The Prometheus Operatorloads the rule into Prometheus
4 To verify that the rules have been loaded into Prometheus by the Operator create a podwith access to curl
5 Run curl to access the prometheus-operated service to return the rules loaded intomemory
oc project service-telemetry
oc apply -f - ltltEOFapiVersion monitoringcoreoscomv1kind PrometheusRulemetadata creationTimestamp null labels prometheus stf-default role alert-rules name prometheus-alarm-rules namespace service-telemetryspec groups - name openstackrules rules - alert Metric Listener down expr collectd_qpid_router_status lt 1 To change the rule edit the value of the expr parameterEOF
oc run curl --generator=run-podv1 --image=radialbusyboxpluscurl -i --tty
[ rootcurl ]$ curl prometheus-operated9090apiv1rulesstatussuccessdatagroups
Red Hat OpenStack Platform 160 Service Telemetry Framework
30
6 To verify that the output shows the rules loaded into the PrometheusRule object forexample the output contains the defined openstackrules exit from the pod
7 Clean up the environment by deleting the curl pod
Additional resourcesFor more information on alerting see httpsgithubcomcoreosprometheus-operatorblobmasterDocumentationuser-guidesalertingmd
422 Configuring custom alerts
You can add custom alerts to the PrometheusRule object that you created in Section 421ldquoCreating an alert rule in Prometheusrdquo
Procedure
1 Use the oc edit command
2 Edit the PrometheusRules manifest
3 Save and close
Additional resourcesFor more information about configuring alerting rules seehttpsprometheusiodocsprometheuslatestconfigurationalerting_rules
For more information about PrometheusRules objects seehttpsgithubcomcoreosprometheus-operatorblobmasterDocumentationuser-guidesalertingmd
423 Creating an alert route in Alertmanager
Use Alertmanager to deliver alerts to an external system such as email IRC or other notificationchannel The Prometheus Operator manages the Alertmanager configuration as anRed Hat OpenShift Container Platform (OCP) secret STF by default deploys a basic configurationthat results in no receivers
[nameopenstackrulesfileetcprometheusrulesprometheus-stf-default-rulefiles-0service-telemetry-prometheus-alarm-rulesyamlrules[nameMetric Listener downquerycollectd_qpid_router_status u003c 1duration0labelsannotationsalerts[]healthoktypealerting]interval30]
[ rootcurl ]$ exit
$ oc delete pod curl
pod curl deleted
oc edit prometheusrules prometheus-alarm-rules
alertmanageryaml |- global resolve_timeout 5m route
CHAPTER 4 ADVANCED FEATURES
31
To deploy a custom Alertmanager route with STF an alertmanagerConfigManifest parameter mustbe passed to the Service Telemetry Operator that results in an updated secret managed by thePrometheus Operator For more information see Section 412 ldquoOverriding a managed manifestrdquo
Procedure
1 Log in to Red Hat OpenShift Container Platform
2 Change to the service-telemetry namespace
3 Edit the ServiceTelemetry object for your STF deployment
4 Add a new parameter alertmanagerConfigManifest and the Secret object contents todefine the alertmanageryaml configuration for Alertmanager
NOTE
This loads the default template that is already managed by ServiceTelemetry Operator To validate the changes are populating correctlychange a value return the alertmanager-stf-default secret and verify thatthe new value is loaded into memory for example changing the value globalresolve_timeout from 5m to 10m
group_by [job] group_wait 30s group_interval 5m repeat_interval 12h receiver null receivers - name null
oc project service-telemetry
oc edit servicetelemetry stf-default
apiVersion infrawatchv1alpha1kind ServiceTelemetrymetadata name stf-default namespace service-telemetryspec metricsEnabled true alertmanagerConfigManifest | apiVersion v1 kind Secret metadata name alertmanager-stf-default namespace service-telemetry type Opaque stringData alertmanageryaml |- global resolve_timeout 10m route
Red Hat OpenStack Platform 160 Service Telemetry Framework
32
5 Verify that the configuration was applied to the secret
6 To verify the configuration has been loaded into Alertmanager create a pod with access to curl
7 Run curl against the alertmanager-operated service to retrieve the status and configYAMLcontents and review the supplied configuration matches the configuration loaded intoAlertmanager
8 Verify that the configYAML field contains the expected changes Exit from the pod
9 To clean up the environment delete the curl pod
group_by [job] group_wait 30s group_interval 5m repeat_interval 12h receiver null receivers - name null
$ oc get secret alertmanager-stf-default -o go-template=index data alertmanageryaml | base64decode
global resolve_timeout 10mroute group_by [job] group_wait 30s group_interval 5m repeat_interval 12h receiver nullreceivers- name null
oc run curl --generator=run-podv1 --image=radialbusyboxpluscurl -i --tty
[ rootcurl ]$ curl alertmanager-operated9093apiv1status
statussuccessdataconfigYAMLglobaln resolve_timeout 10mn http_config n smtp_hello localhostn smtp_require_tls truen pagerduty_url httpseventspagerdutycomv2enqueuen hipchat_api_url httpsapihipchatcomn opsgenie_api_url httpsapiopsgeniecomn wechat_api_url httpsqyapiweixinqqcomcgi-binn victorops_api_url httpsalertvictoropscomintegrationsgeneric20131114alertnrouten receiver nulln group_byn - jobn group_wait 30sn group_interval 5mn repeat_interval 12hnreceiversn- name nullntemplates []n
[ rootcurl ]$ exit
$ oc delete pod curl
pod curl deleted
CHAPTER 4 ADVANCED FEATURES
33
Additional resourcesFor more information about the Red Hat OpenShift Container Platform secret and the Prometheusoperator see httpsgithubcomcoreosprometheus-operatorblobmasterDocumentationuser-guidesalertingmd
43 HIGH AVAILABILITY
High availability is the ability of Service Telemetry Framework (STF) to rapidly recover fromfailures in its component services Although Red Hat OpenShift Container Platform (OCP) restartsa failed pod if nodes are available to schedule the workload this recovery process might take morethan one minute during which time events and metrics are lost A high availability configurationincludes multiple copies of STF components reducing recovery time to approximately 2 secondsTo protect against failure of an OCP node deploy STF to an OCP cluster with three or more nodes
NOTE
STF is not yet a fully fault tolerant system Delivery of metrics and events during therecovery period is not guaranteed
Enabling high availability has the following effects
Two AMQ Interconnect pods run instead of the default 1
Three ElasticSearch pods run instead of the default 1
Recovery time from a lost pod in either of these services reduces to approximately 2seconds
431 Configuring high availability
To configure STF for high availability add highAvailabilityEnabled true to the ServiceTelemetryobject in OCP You can this set this parameter at installation time or if you already deployed STFcomplete the following steps
Procedure
1 Log in to Red Hat OpenShift Container Platform
2 Change to the service-telemetry namespace
3 Use the oc command to edit the ServiceTelemetry object
4 Add highAvailabilityEnabled true to the spec section
5 Save your changes and close the object
oc project service-telemetry
$ oc edit ServiceTelemetry
spec eventsEnabled true metricsEnabled true highAvailabilityEnabled true
Red Hat OpenStack Platform 160 Service Telemetry Framework
34
44 DASHBOARDS
Use third-party application Grafana to visualize system-level metrics gathered by collectd for eachindividual host node For more information about configuring collectd see Section 33ldquoConfiguring Red Hat OpenStack Platform overcloud for Service Telemetry Frameworkrdquo
441 Setting up Grafana to host the dashboard
Grafana is not included in the default Service Telemetry Framework (STF) deployment so you mustdeploy the Grafana Operator from OperatorHubio
Procedure
1 Log in to Red Hat OpenShift Container Platform
2 Change to the service-telemetry namespace
3 Clone the dashboard repository
4 Deploy the Grafana operator
5 To verify that the operator launched successfully run the oc get csv command If the valueof the PHASE column is Succeeded the operator launched successfully
6 Launch a Grafana instance
7 Verify that the Grafana instance deployed
8 Create the datasource and dashboard resources
oc project service-telemetry
git clone httpsgithubcominfrawatchdashboardscd dashboards
oc create -f deploysubscriptionyaml
$ oc get csv
NAME DISPLAY VERSION REPLACES PHASEgrafana-operatorv320 Grafana Operator 320 Succeeded
$ oc create -f deploygrafanayaml
$ oc get pod -l app=grafana
NAME READY STATUS RESTARTS AGEgrafana-deployment-7fc7848b56-sbkhv 11 Running 0 1m
CHAPTER 4 ADVANCED FEATURES
35
9 Verify that the resources installed correctly
10 Navigate to httpsltgrafana-route-addressgt in a web browser Use the oc get routescommand to retrieve the Grafana route address
11 To view the dashboard click Dashboards and Manage
Additional resources
For more information about enabling the OperatorHubio catalog source see Section 235ldquoEnabling the OperatorHubio Community Catalog Sourcerdquo
4411 Viewing and editing queries
Procedure
1 Log in to Red Hat OpenShift Container Platform To view and edit queries log in as the admin user
2 Change to the service-telemetry namespace
3 To retrieve the default username and password describe the Grafana object using oc describe
442 The Grafana infrastructure dashboard
The infrastructure dashboard shows metrics for a single node at a time Select a node from theupper left corner of the dashboard
4421 Top panels
Title Unit Description
oc create -f deploydatasourceyaml -f deployrhos-dashboardyaml
$ oc get grafanadashboards
NAME AGErhos-dashboard 7d21h
$ oc get grafanadatasourcesNAME AGEservice-telemetry-grafanadatasource 1m
oc get routes
oc project service-telemetry
oc describe grafana service-telemetry-grafana
Red Hat OpenStack Platform 160 Service Telemetry Framework
36
Current Global Alerts - Current alerts fired byPrometheus
Recent Global Alerts - Recently fired alerts in 5m timesteps
Status Panel - Node status up downunavailable
Uptime smhdMY Total operational time of node
CPU Cores cores Total number of cores
Memory bytes Total memory
Disk Size bytes Total storage size
Processes processes Total number of processes listedby type
Load Average processes Load average represents theaverage number of running anduninterruptible processes residingin the kernel execution queue
4.4.2.2. Networking panels

Panels that display the network interfaces of the node.

Panel | Unit | Description
Physical Interfaces Ingress Errors | errors | Total errors with incoming data
Physical Interfaces Egress Errors | errors | Total errors with outgoing data
Physical Interfaces Ingress Error Rates | errors/s | Rate of incoming data errors
Physical Interfaces Egress Error Rates | errors/s | Rate of outgoing data errors
Physical Interfaces Packets Ingress | pps | Incoming packets per second
Physical Interfaces Packets Egress | pps | Outgoing packets per second
Physical Interfaces Data Ingress | bytes/s | Incoming data rates
Physical Interfaces Data Egress | bytes/s | Outgoing data rates
Physical Interfaces Drop Rate Ingress | pps | Incoming packets drop rate
Physical Interfaces Drop Rate Egress | pps | Outgoing packets drop rate
4.4.2.3. CPU panels

Panels that display CPU usage of the node.

Panel | Unit | Description
Current CPU Usage | percent | Instantaneous usage at the time of the last query
Aggregate CPU Usage | percent | Average non-idle CPU activity of all cores on a node
Aggr CPU Usage by Type | percent | Shows time spent for each type of thread, averaged across all cores
4.4.2.4. Memory panels

Panels that display memory usage on the node.

Panel | Unit | Description
Memory Used | percent | Amount of memory being used at time of last query
Huge Pages Used | hugepages | Number of hugepages being used
Memory
4.4.2.5. Disk/file system

Panels that display space used on disk.

Panel | Unit | Description | Notes
Disk Space Usage | percent | Total disk use at time of last query |
Inode Usage | percent | Total inode use at time of last query |
Aggregate Disk Space Usage | bytes | Total disk space used and reserved | Because this query relies on the df plugin, temporary file systems that do not necessarily use disk space are included in the results. The query tries to filter out most of these, but it might not be exhaustive.
Disk Traffic | bytes/s | Shows rates for both reading and writing |
Disk Load | percent | Approximate percentage of total disk bandwidth being used. The weighted I/O time series includes the backlog that might be accumulating. For more information, see the collectd disk plugin docs. |
Operations/s | ops/s | Operations done per second |
Average I/O Operation Time | seconds | Average time each I/O operation took to complete. This average is not accurate; see the collectd disk plugin docs. |
4.5. CONFIGURING MULTIPLE CLOUDS

You can configure multiple Red Hat OpenStack Platform clouds to target a single instance of Service Telemetry Framework (STF). To do so, complete the following tasks:

1. Plan the AMQP address prefixes that you want to use for each cloud. For more information, see Section 4.5.1, “Planning AMQP address prefixes”.

2. Deploy metrics and events consumer Smart Gateways for each cloud to listen on the corresponding address prefixes. For more information, see Section 4.5.2, “Deploying Smart Gateways”.

3. Configure each cloud to send its metrics and events to STF on the correct address. For more information, see Section 4.5.3, “Creating the OpenStack environment file”.
Figure 4.1. Two Red Hat OpenStack Platform clouds connect to STF
4.5.1. Planning AMQP address prefixes

By default, Red Hat OpenStack Platform nodes get data through two data collectors: collectd and Ceilometer. These components send telemetry data or notifications to the respective AMQP addresses, for example collectd/telemetry, where STF Smart Gateways listen on those addresses for monitoring data.

To support multiple clouds, and to identify which cloud generated the monitoring data, configure each cloud to send data to a unique address. Prefix a cloud identifier to the second part of the address. The following list shows some example addresses and identifiers:

collectd/cloud1-telemetry
collectd/cloud1-notify
anycast/ceilometer/cloud1-event.sample
collectd/cloud2-telemetry
collectd/cloud2-notify
anycast/ceilometer/cloud2-event.sample
collectd/us-east-1-telemetry
collectd/us-west-3-telemetry
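Whichever identifiers you choose, use them consistently on both sides of the connection. The following cross-reference sketch for cloud1 shows where each address reappears in the configuration that you create later in this section:

   collectd/cloud1-telemetry               →  Smart Gateway spec.amqpUrl; CollectdAmqpInstances key cloud1-telemetry
   collectd/cloud1-notify                  →  Smart Gateway spec.amqpUrl; CollectdAmqpInstances key cloud1-notify
   anycast/ceilometer/cloud1-event.sample  →  Smart Gateway spec.amqpUrl; CeilometerQdrEventsConfig topic cloud1-event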
4.5.2. Deploying Smart Gateways

You must deploy a Smart Gateway for each of the data collection types for each cloud: one for collectd metrics, one for collectd events, and one for Ceilometer events. Configure each of the Smart Gateways to listen on the AMQP address that you define for the corresponding cloud.

When you deploy STF for the first time, Smart Gateway manifests are created that define the initial Smart Gateways for a single cloud. When deploying Smart Gateways for multiple cloud support, you deploy multiple Smart Gateways for each of the data collection types that handle the metrics and the events data for each cloud. The initial Smart Gateways act as a template to create additional Smart Gateways, along with any authentication information required to connect to the data stores.
Procedure

1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

   $ oc project service-telemetry

3. Use the initially deployed Smart Gateways as a template for additional Smart Gateways. List the currently deployed Smart Gateways with the oc get smartgateways command. For example, if you deployed STF with metricsEnabled: true and eventsEnabled: true, the following Smart Gateways are displayed in the output:

   $ oc get smartgateways

   NAME                                  AGE
   stf-default-ceilometer-notification   14d
   stf-default-collectd-notification     14d
   stf-default-collectd-telemetry        14d
4. Retrieve the manifests for each Smart Gateway and store the contents in a temporary file, which you can modify later and use to create the new set of Smart Gateways:

   $ truncate --size 0 /tmp/cloud1-smartgateways.yaml && for sg in $(oc get smartgateways -oname)
   do
     echo --- >> /tmp/cloud1-smartgateways.yaml
     oc get $sg -oyaml --export >> /tmp/cloud1-smartgateways.yaml
   done
5. Modify the Smart Gateway manifests in the /tmp/cloud1-smartgateways.yaml file. Adjust the metadata.name and spec.amqpUrl fields to include the cloud identifier from your schema. For more information, see Section 4.5.1, “Planning AMQP address prefixes”. To view example Smart Gateway manifests, see Section 4.5.2.1, “Example manifests”.
6. Deploy your new Smart Gateways:

   $ oc apply -f /tmp/cloud1-smartgateways.yaml

7. Verify that each Smart Gateway is running. This can take several minutes depending on the number of Smart Gateways:

   $ oc get po -l app=smart-gateway
4.5.2.1. Example manifests

IMPORTANT

The content in the following examples might be different to the file content in your deployment. Copy the manifests in your deployment.

Ensure that the name and amqpUrl parameters of each Smart Gateway match the names that you want to use for your clouds. For more information, see Section 4.5.1, “Planning AMQP address prefixes”.

NOTE

Your output might have some additional metadata parameters that you can remove from the manifests that you load into OCP.
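For example, one way to strip the server-generated fields from every document in the exported file in a single pass is with yq. This is a sketch only, and assumes that yq v4 is available on your workstation; you can equally delete the fields in a text editor:

   $ yq eval 'del(.metadata.creationTimestamp, .metadata.resourceVersion, .metadata.selfLink, .metadata.uid, .status)' -i /tmp/cloud1-smartgateways.yaml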
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-ceilometer-notification-cloud1 1
spec:
  amqpDataSource: ceilometer
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/anycast/ceilometer/cloud1-event.sample 2
  debug: false
  elasticPass: fkzfhghw
  elasticUrl: https://elasticsearch-es-http.service-telemetry.svc.cluster.local:9200
  elasticUser: elastic
  resetIndex: false
  serviceType: events
  size: 1
  tlsCaCert: /config/certs/ca.crt
  tlsClientCert: /config/certs/tls.crt
  tlsClientKey: /config/certs/tls.key
  tlsServerName: elasticsearch-es-http.service-telemetry.svc.cluster.local
  useBasicAuth: true
  useTls: true
---
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-collectd-notification-cloud1 3
spec:
  amqpDataSource: collectd
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/cloud1-notify 4
  debug: false
  elasticPass: fkzfhghw
  elasticUrl: https://elasticsearch-es-http.service-telemetry.svc.cluster.local:9200
  elasticUser: elastic
  resetIndex: false
  serviceType: events
  size: 1
  tlsCaCert: /config/certs/ca.crt
  tlsClientCert: /config/certs/tls.crt
  tlsClientKey: /config/certs/tls.key
  tlsServerName: elasticsearch-es-http.service-telemetry.svc.cluster.local
  useBasicAuth: true
  useTls: true
---
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-collectd-telemetry-cloud1 5
spec:
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/cloud1-telemetry 6
  debug: false
  prefetch: 15000
  serviceType: metrics
  size: 1
  useTimestamp: true

1 Name for Ceilometer notifications for cloud1
2 AMQP address for Ceilometer notifications for cloud1
3 Name for collectd notifications for cloud1
4 AMQP address for collectd notifications for cloud1
5 Name for collectd telemetry for cloud1
6 AMQP address for collectd telemetry for cloud1

4.5.3. Creating the OpenStack environment file

To label traffic according to the cloud of origin, you must create a configuration with cloud-specific instance names. Create an stf-connectors.yaml file and adjust the values of CeilometerQdrEventsConfig and CollectdAmqpInstances to match the AMQP address prefix scheme. For more information, see Section 4.5.1, “Planning AMQP address prefixes”.
WARNING

Remove the enable-stf.yaml and ceilometer-write-qdr.yaml environment file references from your overcloud deployment. This configuration is redundant and results in duplicate information being sent from each cloud node.
Procedure

1. Create the stf-connectors.yaml file and modify it to match the AMQP address that you want for this cloud deployment:

   resource_registry:
     OS::TripleO::Services::Collectd: /usr/share/openstack-tripleo-heat-templates/deployment/metrics/collectd-container-puppet.yaml
     OS::TripleO::Services::MetricsQdr: /usr/share/openstack-tripleo-heat-templates/deployment/metrics/qdr-container-puppet.yaml
     OS::TripleO::Services::CeilometerAgentCentral: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-central-container-puppet.yaml
     OS::TripleO::Services::CeilometerAgentNotification: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-notification-container-puppet.yaml
     OS::TripleO::Services::CeilometerAgentIpmi: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-ipmi-container-puppet.yaml
     OS::TripleO::Services::ComputeCeilometerAgent: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-compute-container-puppet.yaml
     OS::TripleO::Services::Redis: /usr/share/openstack-tripleo-heat-templates/deployment/database/redis-pacemaker-puppet.yaml

   parameter_defaults:
     EnableSTF: true

     EventPipelinePublishers: []
     CeilometerEnablePanko: false
     CeilometerQdrPublishEvents: true
     CeilometerQdrEventsConfig:
       driver: amqp
       topic: cloud1-event 1

     CollectdConnectionType: amqp1
     CollectdAmqpInterval: 5
     CollectdDefaultPollingInterval: 5

     CollectdAmqpInstances:
       cloud1-notify: 2
         notify: true
         format: JSON
         presettle: false
       cloud1-telemetry: 3
         format: JSON
         presettle: true

     MetricsQdrAddresses:
       - prefix: collectd
         distribution: multicast
       - prefix: anycast/ceilometer
         distribution: multicast

     MetricsQdrSSLProfiles:
       - name: sslProfile

     MetricsQdrConnectors:
       - host: stf-default-interconnect-5671-service-telemetry.apps.infra.watch 4
         port: 443
         role: edge
         verifyHostname: false
         sslProfile: sslProfile

   1 Define the topic for Ceilometer events. This value is the address format of anycast/ceilometer/cloud1-event.sample.
   2 Define the topic for collectd events. This value is the format of collectd/cloud1-notify.
   3 Define the topic for collectd metrics. This value is the format of collectd/cloud1-telemetry.
   4 Adjust the MetricsQdrConnectors host to the address of the STF route.

2. Ensure that the naming convention in the stf-connectors.yaml file aligns with the spec.amqpUrl field in the Smart Gateway configuration. For example, configure the CeilometerQdrEventsConfig.topic field to a value of cloud1-event.

3. Save the file in a directory for custom environment files, for example /home/stack/custom_templates/.

4. Source the authentication file:

   [stack@undercloud-0 ~]$ source stackrc

   (undercloud) [stack@undercloud-0 ~]$

5. Include the stf-connectors.yaml file in the overcloud deployment command, along with any other environment files relevant to your environment:

   (undercloud) [stack@undercloud-0 ~]$ openstack overcloud deploy --templates /usr/share/openstack-tripleo-heat-templates -e /home/stack/custom_templates/stf-connectors.yaml

Additional resources

For information about validating the deployment, see Section 3.3.3, “Validating client-side installation”.

4.5.4. Querying metrics data from multiple clouds

Data stored in Prometheus has a service label attached according to the Smart Gateway it was scraped from. You can use this label to query data from a specific cloud.

To query data from a specific cloud, use a Prometheus promql query that matches the associated service label, for example: collectd_uptime{service="stf-default-collectd-telemetry-cloud1-smartgateway"}.
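For example, to compare the same metric across clouds, filter on each Smart Gateway service label. The queries below are a sketch that assumes the cloud1/cloud2 naming scheme from Section 4.5.1, “Planning AMQP address prefixes”:

   # Most recent uptime samples reported by each cloud
   collectd_uptime{service="stf-default-collectd-telemetry-cloud1-smartgateway"}
   collectd_uptime{service="stf-default-collectd-telemetry-cloud2-smartgateway"}

   # Number of time series reporting per cloud, grouped by Smart Gateway
   count(collectd_uptime{service=~"stf-default-collectd-telemetry-.*-smartgateway"}) by (service)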
4.6. EPHEMERAL STORAGE

Use ephemeral storage to run Service Telemetry Framework (STF) without persistently storing data in your Red Hat OpenShift Container Platform (OCP) cluster. Ephemeral storage is not recommended in a production environment due to the volatility of the data in the platform, even when it operates correctly and as designed. For example, restarting a pod or rescheduling the workload to another node results in the loss of any local data written since the pod started.

If you enable ephemeral storage in STF, the Service Telemetry Operator does not add the relevant storage sections to the data storage components' manifests.

4.6.1. Configuring ephemeral storage

To configure STF for ephemeral storage, add storageEphemeralEnabled: true to the ServiceTelemetry object in OCP. You can add storageEphemeralEnabled: true at installation time or, if you already deployed STF, complete the following steps:
Procedure

1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

   $ oc project service-telemetry

3. Edit the ServiceTelemetry object:

   $ oc edit ServiceTelemetry stf-default

4. Add the storageEphemeralEnabled: true parameter to the spec section:

   spec:
     eventsEnabled: true
     metricsEnabled: true
     storageEphemeralEnabled: true

5. Save your changes and close the object.
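Because no persistent volumes are requested in this mode, one quick check after the change is that no PersistentVolumeClaims are created for the data storage components; any data they hold lives only inside the running pods. A verification sketch, with output dependent on your deployment:

   $ oc get pvc -n service-telemetry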
APPENDIX A. COLLECTD PLUG-INS

This section contains a complete list of collectd plug-ins and configurations for Red Hat OpenStack Platform 16.0.
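The parameter names in this list are hiera keys from the puppet-collectd module. As a usage sketch, you can set any of them from a custom environment file through the ExtraConfig interface; the df values below are illustrative only and are not a recommendation:

   parameter_defaults:
     ExtraConfig:
       collectd::plugin::df::fstypes: ['xfs']
       collectd::plugin::df::reportreserved: false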
collectd-aggregation
  collectd::plugin::aggregation::aggregators
  collectd::plugin::aggregation::interval

collectd-amqp1

collectd-apache
  collectd::plugin::apache::instances (ex: localhost ⇒ url ⇒ 'http://localhost/mod_status?auto')
  collectd::plugin::apache::interval

collectd-apcups

collectd-battery
  collectd::plugin::battery::values_percentage
  collectd::plugin::battery::report_degraded
  collectd::plugin::battery::query_state_fs
  collectd::plugin::battery::interval

collectd-ceph
  collectd::plugin::ceph::daemons
  collectd::plugin::ceph::longrunavglatency
  collectd::plugin::ceph::convertspecialmetrictypes

collectd-cgroups
  collectd::plugin::cgroups::ignore_selected
  collectd::plugin::cgroups::interval

collectd-conntrack
  None

collectd-contextswitch
  collectd::plugin::contextswitch::interval

collectd-cpu
  collectd::plugin::cpu::reportbystate
  collectd::plugin::cpu::reportbycpu
  collectd::plugin::cpu::valuespercentage
  collectd::plugin::cpu::reportnumcpu
  collectd::plugin::cpu::reportgueststate
  collectd::plugin::cpu::subtractgueststate
  collectd::plugin::cpu::interval

collectd-cpufreq
  None

collectd-cpusleep

collectd-csv
  collectd::plugin::csv::datadir
  collectd::plugin::csv::storerates
  collectd::plugin::csv::interval

collectd-df
  collectd::plugin::df::devices
  collectd::plugin::df::fstypes
  collectd::plugin::df::ignoreselected
  collectd::plugin::df::mountpoints
  collectd::plugin::df::reportbydevice
  collectd::plugin::df::reportinodes
  collectd::plugin::df::reportreserved
  collectd::plugin::df::valuesabsolute
  collectd::plugin::df::valuespercentage
  collectd::plugin::df::interval

collectd-disk
  collectd::plugin::disk::disks
  collectd::plugin::disk::ignoreselected
  collectd::plugin::disk::udevnameattr
  collectd::plugin::disk::interval

collectd-entropy
  collectd::plugin::entropy::interval

collectd-ethstat
  collectd::plugin::ethstat::interfaces
  collectd::plugin::ethstat::maps
  collectd::plugin::ethstat::mappedonly
  collectd::plugin::ethstat::interval

collectd-exec
  collectd::plugin::exec::commands
  collectd::plugin::exec::commands_defaults
  collectd::plugin::exec::globals
  collectd::plugin::exec::interval

collectd-fhcount
  collectd::plugin::fhcount::valuesabsolute
  collectd::plugin::fhcount::valuespercentage
  collectd::plugin::fhcount::interval

collectd-filecount
  collectd::plugin::filecount::directories
  collectd::plugin::filecount::interval

collectd-fscache
  None

collectd-hddtemp
  collectd::plugin::hddtemp::host
  collectd::plugin::hddtemp::port
  collectd::plugin::hddtemp::interval

collectd-hugepages
  collectd::plugin::hugepages::report_per_node_hp
  collectd::plugin::hugepages::report_root_hp
  collectd::plugin::hugepages::values_pages
  collectd::plugin::hugepages::values_bytes
  collectd::plugin::hugepages::values_percentage
  collectd::plugin::hugepages::interval

collectd-intel_rdt

collectd-interface
  collectd::plugin::interface::interfaces
  collectd::plugin::interface::ignoreselected
  collectd::plugin::interface::reportinactive
  collectd::plugin::interface::interval

collectd-ipc
  None

collectd-ipmi
  collectd::plugin::ipmi::ignore_selected
  collectd::plugin::ipmi::notify_sensor_add
  collectd::plugin::ipmi::notify_sensor_remove
  collectd::plugin::ipmi::notify_sensor_not_present
  collectd::plugin::ipmi::sensors
  collectd::plugin::ipmi::interval

collectd-irq
  collectd::plugin::irq::irqs
  collectd::plugin::irq::ignoreselected
  collectd::plugin::irq::interval

collectd-load
  collectd::plugin::load::report_relative
  collectd::plugin::load::interval

collectd-logfile
  collectd::plugin::logfile::log_level
  collectd::plugin::logfile::log_file
  collectd::plugin::logfile::log_timestamp
  collectd::plugin::logfile::print_severity
  collectd::plugin::logfile::interval

collectd-madwifi

collectd-mbmon

collectd-md

collectd-memcached
  collectd::plugin::memcached::instances
  collectd::plugin::memcached::interval

collectd-memory
  collectd::plugin::memory::valuesabsolute
  collectd::plugin::memory::valuespercentage
  collectd::plugin::memory::interval

collectd-multimeter

collectd-mysql
  collectd::plugin::mysql::interval

collectd-netlink
  collectd::plugin::netlink::interfaces
  collectd::plugin::netlink::verboseinterfaces
  collectd::plugin::netlink::qdiscs
  collectd::plugin::netlink::classes
  collectd::plugin::netlink::filters
  collectd::plugin::netlink::ignoreselected
  collectd::plugin::netlink::interval

collectd-network
  collectd::plugin::network::timetolive
  collectd::plugin::network::maxpacketsize
  collectd::plugin::network::forward
  collectd::plugin::network::reportstats
  collectd::plugin::network::listeners
  collectd::plugin::network::servers
  collectd::plugin::network::interval

collectd-nfs
  collectd::plugin::nfs::interval

collectd-ntpd
  collectd::plugin::ntpd::host
  collectd::plugin::ntpd::port
  collectd::plugin::ntpd::reverselookups
  collectd::plugin::ntpd::includeunitid
  collectd::plugin::ntpd::interval

collectd-numa
  None

collectd-olsrd

collectd-openvpn
  collectd::plugin::openvpn::statusfile
  collectd::plugin::openvpn::improvednamingschema
  collectd::plugin::openvpn::collectcompression
  collectd::plugin::openvpn::collectindividualusers
  collectd::plugin::openvpn::collectusercount
  collectd::plugin::openvpn::interval

collectd-ovs_events
  collectd::plugin::ovs_events::address
  collectd::plugin::ovs_events::dispatch
  collectd::plugin::ovs_events::interfaces
  collectd::plugin::ovs_events::send_notification
  collectd::plugin::ovs_events::$port
  collectd::plugin::ovs_events::socket

collectd-ovs_stats
  collectd::plugin::ovs_stats::address
  collectd::plugin::ovs_stats::bridges
  collectd::plugin::ovs_stats::port
  collectd::plugin::ovs_stats::socket

collectd-ping
  collectd::plugin::ping::hosts
  collectd::plugin::ping::timeout
  collectd::plugin::ping::ttl
  collectd::plugin::ping::source_address
  collectd::plugin::ping::device
  collectd::plugin::ping::max_missed
  collectd::plugin::ping::size
  collectd::plugin::ping::interval

collectd-powerdns
  collectd::plugin::powerdns::servers
  collectd::plugin::powerdns::recursors
  collectd::plugin::powerdns::local_socket
  collectd::plugin::powerdns::interval

collectd-processes
  collectd::plugin::processes::processes
  collectd::plugin::processes::process_matches
  collectd::plugin::processes::collect_context_switch
  collectd::plugin::processes::collect_file_descriptor
  collectd::plugin::processes::collect_memory_maps
  collectd::plugin::processes::interval

collectd-protocols
  collectd::plugin::protocols::ignoreselected
  collectd::plugin::protocols::values

collectd-python

collectd-serial

collectd-smart
  collectd::plugin::smart::disks
  collectd::plugin::smart::ignoreselected
  collectd::plugin::smart::interval

collectd-snmp_agent

collectd-statsd
  collectd::plugin::statsd::host
  collectd::plugin::statsd::port
  collectd::plugin::statsd::deletecounters
  collectd::plugin::statsd::deletetimers
  collectd::plugin::statsd::deletegauges
  collectd::plugin::statsd::deletesets
  collectd::plugin::statsd::countersum
  collectd::plugin::statsd::timerpercentile
  collectd::plugin::statsd::timerlower
  collectd::plugin::statsd::timerupper
  collectd::plugin::statsd::timersum
  collectd::plugin::statsd::timercount
  collectd::plugin::statsd::interval

collectd-swap
  collectd::plugin::swap::reportbydevice
  collectd::plugin::swap::reportbytes
  collectd::plugin::swap::valuesabsolute
  collectd::plugin::swap::valuespercentage
  collectd::plugin::swap::reportio
  collectd::plugin::swap::interval

collectd-syslog
  collectd::plugin::syslog::log_level
  collectd::plugin::syslog::notify_level
  collectd::plugin::syslog::interval

collectd-table
  collectd::plugin::table::tables
  collectd::plugin::table::interval

collectd-tail
  collectd::plugin::tail::files
  collectd::plugin::tail::interval

collectd-tail_csv
  collectd::plugin::tail_csv::metrics
  collectd::plugin::tail_csv::files

collectd-tcpconns
  collectd::plugin::tcpconns::localports
  collectd::plugin::tcpconns::remoteports
  collectd::plugin::tcpconns::listening
  collectd::plugin::tcpconns::allportssummary
  collectd::plugin::tcpconns::interval

collectd-ted

collectd-thermal
  collectd::plugin::thermal::devices
  collectd::plugin::thermal::ignoreselected
  collectd::plugin::thermal::interval

collectd-threshold
  collectd::plugin::threshold::types
  collectd::plugin::threshold::plugins
  collectd::plugin::threshold::hosts
  collectd::plugin::threshold::interval

collectd-turbostat
  collectd::plugin::turbostat::core_c_states
  collectd::plugin::turbostat::package_c_states
  collectd::plugin::turbostat::system_management_interrupt
  collectd::plugin::turbostat::digital_temperature_sensor
  collectd::plugin::turbostat::tcc_activation_temp
  collectd::plugin::turbostat::running_average_power_limit
  collectd::plugin::turbostat::logical_core_names

collectd-unixsock

collectd-uptime
  collectd::plugin::uptime::interval

collectd-users
  collectd::plugin::users::interval

collectd-uuid
  collectd::plugin::uuid::uuid_file
  collectd::plugin::uuid::interval

collectd-virt
  collectd::plugin::virt::connection
  collectd::plugin::virt::refresh_interval
  collectd::plugin::virt::domain
  collectd::plugin::virt::block_device
  collectd::plugin::virt::interface_device
  collectd::plugin::virt::ignore_selected
  collectd::plugin::virt::hostname_format
  collectd::plugin::virt::interface_format
  collectd::plugin::virt::extra_stats
  collectd::plugin::virt::interval

collectd-vmem
  collectd::plugin::vmem::verbose
  collectd::plugin::vmem::interval

collectd-vserver

collectd-wireless

collectd-write_graphite
  collectd::plugin::write_graphite::carbons
  collectd::plugin::write_graphite::carbon_defaults
  collectd::plugin::write_graphite::globals

collectd-write_kafka
  collectd::plugin::write_kafka::kafka_host
  collectd::plugin::write_kafka::kafka_port
  collectd::plugin::write_kafka::kafka_hosts
  collectd::plugin::write_kafka::topics

collectd-write_log
  collectd::plugin::write_log::format

collectd-zfs_arc
  None
collectdpluginuuidinterval
collectd-virt
collectdpluginvirtconnection
collectdpluginvirtrefresh_interval
collectdpluginvirtdomain
collectdpluginvirtblock_device
collectdpluginvirtinterface_device
collectdpluginvirtignore_selected
collectdpluginvirthostname_format
Red Hat OpenStack Platform 160 Service Telemetry Framework
56
collectdpluginvirtinterface_format
collectdpluginvirtextra_stats
collectdpluginvirtinterval
collectd-vmem
collectdpluginvmemverbose
collectdpluginvmeminterval
collectd-vserver
collectd-wireless
collectd-write_graphite
collectdpluginwrite_graphitecarbons
collectdpluginwrite_graphitecarbon_defaults
collectdpluginwrite_graphiteglobals
collectd-write_kafka
collectdpluginwrite_kafkakafka_host
collectdpluginwrite_kafkakafka_port
collectdpluginwrite_kafkakafka_hosts
collectdpluginwrite_kafkatopics
collectd-write_log
collectdpluginwrite_logformat
collectd-zfs_arc
None
APPENDIX A COLLECTD PLUG-INS
57
- Table of Contents
- CHAPTER 1 INTRODUCTION TO SERVICE TELEMETRY FRAMEWORK
-
- 11 SERVICE TELEMETRY FRAMEWORK ARCHITECTURE
- 12 INSTALLATION SIZE
-
- CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK
-
- 21 THE CORE COMPONENTS OF STF
- 22 PREPARING YOUR OCP ENVIRONMENT FOR STF
-
- 221 Persistent volumes
-
- 2211 Using ephemeral storage
-
- 222 Resource allocation
- 223 Node tuning operator
-
- 23 DEPLOYING STF TO THE OCP ENVIRONMENT
-
- 231 Deploying STF to the OCP environment with ElasticSearch
- 232 Deploying STF to the OCP environment without ElasticSearch
- 233 Creating a namespace
- 234 Creating an OperatorGroup
- 235 Enabling the OperatorHubio Community Catalog Source
- 236 Enabling Red Hat STF Operator Source
- 237 Subscribing to the AMQ Certificate Manager Operator
- 238 Subscribing to the Elastic Cloud on Kubernetes Operator
- 239 Subscribing to the Service Telemetry Operator
- 2310 Creating a ServiceTelemetry object in OCP
-
- 24 REMOVING STF FROM THE OCP ENVIRONMENT
-
- 241 Deleting the namespace
- 242 Removing the OperatorSource
-
- CHAPTER 3 COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION
-
- 31 CONNECTING RED HAT OPENSTACK PLATFORM TO SERVICE TELEMETRY FRAMEWORK
- 32 DEPLOYING TO NON-STANDARD NETWORK TOPOLOGIES
- 33 CONFIGURING RED HAT OPENSTACK PLATFORM OVERCLOUD FOR SERVICE TELEMETRY FRAMEWORK
-
- 331 Retrieving the AMQ Interconnect route address
- 332 Configuring the STF connection for the overcloud
- 333 Validating client-side installation
-
- CHAPTER 4 ADVANCED FEATURES
-
- 41 CUSTOMIZING THE DEPLOYMENT
-
- 411 Manifest override parameters
- 412 Overriding a managed manifest
-
- Procedure
-
- 42 ALERTS
-
- Additional resources
- 421 Creating an alert rule in Prometheus
-
- Procedure
- Additional resources
-
- 422 Configuring custom alerts
-
- Procedure
- Additional resources
-
- 423 Creating an alert route in Alertmanager
-
- Procedure
- Additional resources
-
- 43 HIGH AVAILABILITY
-
- 431 Configuring high availability
-
- Procedure
-
- 44 DASHBOARDS
-
- 441 Setting up Grafana to host the dashboard
-
- 4411 Viewing and editing queries
-
- 442 The Grafana infrastructure dashboard
-
- 4421 Top panels
- 4422 Networking panels
- 4423 CPU panels
- 4424 Memory panels
- 4425 Diskfile system
-
- 45 CONFIGURING MULTIPLE CLOUDS
-
- 451 Planning AMQP address prefixes
- 452 Deploying Smart Gateways
-
- Procedure
- 4521 Example manifests
-
- 453 Creating the OpenStack environment file
-
- Procedure
- Additional resources
-
- 454 Querying metrics data from multiple clouds
-
- 46 EPHEMERAL STORAGE
-
- 461 Configuring ephemeral storage
-
- APPENDIX A COLLECTD PLUG-INS
-
2. Create an alert route in Alertmanager. For more information, see Section 4.2.3, “Creating an alert route in Alertmanager”.

Additional resources

For more information about alerts or notifications with Prometheus and Alertmanager, see https://prometheus.io/docs/alerting/overview/

To view an example set of alerts that you can use with Service Telemetry Framework (STF), see https://github.com/infrawatch/service-telemetry-operator/tree/master/deploy/alerts
4.2.1. Creating an alert rule in Prometheus
Prometheus evaluates alert rules to trigger notifications. If the rule condition returns an empty result set, the condition is false. Otherwise, the rule is true and it triggers an alert.
Procedure
1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

   $ oc project service-telemetry

3. Create a PrometheusRule object that contains the alert rule. The Prometheus Operator loads the rule into Prometheus:

   $ oc apply -f - <<EOF
   apiVersion: monitoring.coreos.com/v1
   kind: PrometheusRule
   metadata:
     creationTimestamp: null
     labels:
       prometheus: stf-default
       role: alert-rules
     name: prometheus-alarm-rules
     namespace: service-telemetry
   spec:
     groups:
       - name: openstack.rules
         rules:
           - alert: Metric Listener down
             expr: collectd_qpid_router_status < 1
   EOF

   To change the rule, edit the value of the expr parameter.

4. To verify that the rules have been loaded into Prometheus by the Operator, create a pod with access to curl:

   $ oc run curl --generator=run-pod/v1 --image=radial/busyboxplus:curl -i --tty

5. Run curl to access the prometheus-operated service to return the rules loaded into memory:

   [ root@curl:/ ]$ curl prometheus-operated:9090/api/v1/rules

   {"status":"success","data":{"groups":[{"name":"openstack.rules","file":"/etc/prometheus/rules/prometheus-stf-default-rulefiles-0/service-telemetry-prometheus-alarm-rules.yaml","rules":[{"name":"Metric Listener down","query":"collectd_qpid_router_status \u003c 1","duration":0,"labels":{},"annotations":{},"alerts":[],"health":"ok","type":"alerting"}],"interval":30}]}}
6. To verify that the output shows the rules loaded into the PrometheusRule object, for example that the output contains the defined openstack.rules, exit from the pod:

   [ root@curl:/ ]$ exit

7. Clean up the environment by deleting the curl pod:

   $ oc delete pod curl

   pod "curl" deleted
Additional resources

For more information on alerting, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md
4.2.2. Configuring custom alerts
You can add custom alerts to the PrometheusRule object that you created in Section 4.2.1, “Creating an alert rule in Prometheus”.
Procedure
1. Use the oc edit command:

   $ oc edit prometheusrules prometheus-alarm-rules

2. Edit the PrometheusRules manifest.

3. Save and close.
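For example, to alert when a host stops reporting telemetry, you might append a second rule to the group. The following snippet is a minimal sketch: the Metric Listener down rule is the one created earlier, while the Host telemetry absent rule name and expression are illustrative additions, not part of the default deployment:

   spec:
     groups:
       - name: openstack.rules
         rules:
           - alert: Metric Listener down
             expr: collectd_qpid_router_status < 1
           - alert: Host telemetry absent           # illustrative custom rule
             expr: absent(collectd_uptime) == 1     # fires when no uptime metric is being collected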
Additional resources

For more information about configuring alerting rules, see https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/

For more information about PrometheusRules objects, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md
4.2.3. Creating an alert route in Alertmanager
Use Alertmanager to deliver alerts to an external system, such as email, IRC, or other notification channel. The Prometheus Operator manages the Alertmanager configuration as a Red Hat OpenShift Container Platform (OCP) secret. By default, STF deploys a basic configuration that results in no receivers:

   alertmanager.yaml: |-
     global:
       resolve_timeout: 5m
     route:
       group_by: ['job']
       group_wait: 30s
       group_interval: 5m
       repeat_interval: 12h
       receiver: 'null'
     receivers:
     - name: 'null'
To deploy a custom Alertmanager route with STF, you must pass an alertmanagerConfigManifest parameter to the Service Telemetry Operator, which results in an updated secret that the Prometheus Operator manages. For more information, see Section 4.1.2, “Overriding a managed manifest”.
Procedure
1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

   $ oc project service-telemetry

3. Edit the ServiceTelemetry object for your STF deployment:

   $ oc edit servicetelemetry stf-default

4. Add the new parameter alertmanagerConfigManifest and the Secret object contents to define the alertmanager.yaml configuration for Alertmanager:

   NOTE

   This loads the default template that is already managed by the Service Telemetry Operator. To validate that the changes are populating correctly, change a value, return the alertmanager-stf-default secret, and verify that the new value is loaded into memory, for example, changing the value global.resolve_timeout from 5m to 10m.

   apiVersion: infra.watch/v1alpha1
   kind: ServiceTelemetry
   metadata:
     name: stf-default
     namespace: service-telemetry
   spec:
     metricsEnabled: true
     alertmanagerConfigManifest: |
       apiVersion: v1
       kind: Secret
       metadata:
         name: alertmanager-stf-default
         namespace: service-telemetry
       type: Opaque
       stringData:
         alertmanager.yaml: |-
           global:
             resolve_timeout: 10m
           route:
             group_by: ['job']
             group_wait: 30s
             group_interval: 5m
             repeat_interval: 12h
             receiver: 'null'
           receivers:
           - name: 'null'
5. Verify that the configuration was applied to the secret:

   $ oc get secret alertmanager-stf-default -o go-template='{{index .data "alertmanager.yaml" | base64decode}}'

   global:
     resolve_timeout: 10m
   route:
     group_by: ['job']
     group_wait: 30s
     group_interval: 5m
     repeat_interval: 12h
     receiver: 'null'
   receivers:
   - name: 'null'

6. To verify that the configuration has been loaded into Alertmanager, create a pod with access to curl:

   $ oc run curl --generator=run-pod/v1 --image=radial/busyboxplus:curl -i --tty

7. Run curl against the alertmanager-operated service to retrieve the status and configYAML contents, and review that the supplied configuration matches the configuration loaded into Alertmanager:

   [ root@curl:/ ]$ curl alertmanager-operated:9093/api/v1/status

   {"status":"success","data":{"configYAML":"global:\n  resolve_timeout: 10m\n  http_config: {}\n  smtp_hello: localhost\n  smtp_require_tls: true\n  pagerduty_url: https://events.pagerduty.com/v2/enqueue\n  hipchat_api_url: https://api.hipchat.com/\n  opsgenie_api_url: https://api.opsgenie.com/\n  wechat_api_url: https://qyapi.weixin.qq.com/cgi-bin/\n  victorops_api_url: https://alert.victorops.com/integrations/generic/20131114/alert/\nroute:\n  receiver: \"null\"\n  group_by:\n  - job\n  group_wait: 30s\n  group_interval: 5m\n  repeat_interval: 12h\nreceivers:\n- name: \"null\"\ntemplates: []\n"}}

8. Verify that the configYAML field contains the expected changes. Exit from the pod:

   [ root@curl:/ ]$ exit

9. To clean up the environment, delete the curl pod:

   $ oc delete pod curl

   pod "curl" deleted
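The 'null' receiver discards every alert. To deliver notifications, define a real receiver in the alertmanagerConfigManifest and point the route at it. The following fragment is a sketch of an email receiver; the receiver name, addresses, and SMTP host are placeholders for your environment, not values from the default deployment:

   alertmanager.yaml: |-
     global:
       resolve_timeout: 10m
     route:
       group_by: ['job']
       group_wait: 30s
       group_interval: 5m
       repeat_interval: 12h
       receiver: 'email-ops'
     receivers:
     - name: 'email-ops'
       email_configs:
       - to: 'ops@example.com'             # placeholder recipient
         from: 'alertmanager@example.com'  # placeholder sender
         smarthost: 'smtp.example.com:587' # placeholder SMTP relay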
Additional resources

For more information about the Red Hat OpenShift Container Platform secret and the Prometheus operator, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md
4.3. HIGH AVAILABILITY
High availability is the ability of Service Telemetry Framework (STF) to rapidly recover from failures in its component services. Although Red Hat OpenShift Container Platform (OCP) restarts a failed pod if nodes are available to schedule the workload, this recovery process might take more than one minute, during which time events and metrics are lost. A high availability configuration includes multiple copies of STF components, reducing recovery time to approximately 2 seconds. To protect against failure of an OCP node, deploy STF to an OCP cluster with three or more nodes.
NOTE
STF is not yet a fully fault tolerant system. Delivery of metrics and events during the recovery period is not guaranteed.
Enabling high availability has the following effects:
Two AMQ Interconnect pods run instead of the default one.

Three ElasticSearch pods run instead of the default one.

Recovery time from a lost pod in either of these services reduces to approximately 2 seconds.
4.3.1. Configuring high availability
To configure STF for high availability, add highAvailabilityEnabled: true to the ServiceTelemetry object in OCP. You can set this parameter at installation time or, if you already deployed STF, complete the following steps:
Procedure
1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

   $ oc project service-telemetry

3. Use the oc command to edit the ServiceTelemetry object:

   $ oc edit ServiceTelemetry

4. Add highAvailabilityEnabled: true to the spec section:

   spec:
     eventsEnabled: true
     metricsEnabled: true
     highAvailabilityEnabled: true

5. Save your changes and close the object.
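To confirm that the additional replicas started, you can list the pods in the namespace. This is a quick check, assuming the default stf-default naming; exact pod names include generated suffixes:

   $ oc get pods

With high availability enabled, the output shows, for example, two stf-default-interconnect pods and three ElasticSearch pods in the Running state.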
4.4. DASHBOARDS
Use the third-party application Grafana to visualize system-level metrics gathered by collectd for each individual host node. For more information about configuring collectd, see Section 3.3, “Configuring Red Hat OpenStack Platform overcloud for Service Telemetry Framework”.
4.4.1. Setting up Grafana to host the dashboard
Grafana is not included in the default Service Telemetry Framework (STF) deployment, so you must deploy the Grafana Operator from OperatorHub.io.
Procedure
1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

   $ oc project service-telemetry

3. Clone the dashboard repository:

   $ git clone https://github.com/infrawatch/dashboards
   $ cd dashboards

4. Deploy the Grafana operator:

   $ oc create -f deploy/subscription.yaml

5. To verify that the operator launched successfully, run the oc get csv command. If the value of the PHASE column is Succeeded, the operator launched successfully:

   $ oc get csv

   NAME                      DISPLAY            VERSION   REPLACES   PHASE
   grafana-operator.v3.2.0   Grafana Operator   3.2.0                Succeeded

6. Launch a Grafana instance:

   $ oc create -f deploy/grafana.yaml

7. Verify that the Grafana instance deployed:

   $ oc get pod -l app=grafana

   NAME                                  READY   STATUS    RESTARTS   AGE
   grafana-deployment-7fc7848b56-sbkhv   1/1     Running   0          1m

8. Create the datasource and dashboard resources:

   $ oc create -f deploy/datasource.yaml -f deploy/rhos-dashboard.yaml
9. Verify that the resources installed correctly:

   $ oc get grafanadashboards

   NAME             AGE
   rhos-dashboard   7d21h

   $ oc get grafanadatasources

   NAME                                  AGE
   service-telemetry-grafanadatasource   1m

10. Navigate to https://<grafana-route-address> in a web browser. Use the oc get routes command to retrieve the Grafana route address:

   $ oc get routes

11. To view the dashboard, click Dashboards and Manage.
Additional resources

For more information about enabling the OperatorHub.io catalog source, see Section 2.3.5, “Enabling the OperatorHub.io Community Catalog Source”.
4.4.1.1. Viewing and editing queries
Procedure
1. Log in to Red Hat OpenShift Container Platform. To view and edit queries, log in as the admin user.

2. Change to the service-telemetry namespace:

   $ oc project service-telemetry

3. To retrieve the default username and password, describe the Grafana object using oc describe:

   $ oc describe grafana service-telemetry-grafana
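After you log in to Grafana, you can open any dashboard panel and select Edit to inspect the PromQL query behind it. As a sketch, a panel that charts node uptime might use a query similar to the following, where $node is an assumed dashboard template variable rather than a documented default:

   collectd_uptime{instance=~"$node"}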
4.4.2. The Grafana infrastructure dashboard
The infrastructure dashboard shows metrics for a single node at a time. Select a node from the upper left corner of the dashboard.
4.4.2.1. Top panels
Title                  Unit          Description
Current Global Alerts  -             Current alerts fired by Prometheus
Recent Global Alerts   -             Recently fired alerts in 5m time steps
Status Panel           -             Node status: up, down, unavailable
Uptime                 s/m/h/d/M/Y   Total operational time of node
CPU Cores              cores         Total number of cores
Memory                 bytes         Total memory
Disk Size              bytes         Total storage size
Processes              processes     Total number of processes, listed by type
Load Average           processes     Load average represents the average number of running and uninterruptible processes residing in the kernel execution queue
4.4.2.2. Networking panels
Panels that display the network interfaces of the node.

Panel                                    Unit      Description
Physical Interfaces Ingress Errors       errors    Total errors with incoming data
Physical Interfaces Egress Errors        errors    Total errors with outgoing data
Physical Interfaces Ingress Error Rates  errors/s  Rate of incoming data errors
Physical Interfaces Egress Error Rates   errors/s  Rate of outgoing data errors
Physical Interfaces Packets Ingress      pps       Incoming packets per second
Physical Interfaces Packets Egress       pps       Outgoing packets per second
Physical Interfaces Data Ingress         bytes/s   Incoming data rates
Physical Interfaces Data Egress          bytes/s   Outgoing data rates
Physical Interfaces Drop Rate Ingress    pps       Incoming packets drop rate
Physical Interfaces Drop Rate Egress     pps       Outgoing packets drop rate
4.4.2.3. CPU panels
Panels that display CPU usage of the node.

Panel                   Unit     Description
Current CPU Usage       percent  Instantaneous usage at the time of the last query
Aggregate CPU Usage     percent  Average non-idle CPU activity of all cores on a node
Aggr CPU Usage by Type  percent  Shows time spent for each type of thread, averaged across all cores
4.4.2.4. Memory panels
Panels that display memory usage on the node.

Panel            Unit       Description
Memory Used      percent    Amount of memory being used at time of last query
Huge Pages Used  hugepages  Number of hugepages being used
4.4.2.5. Disk/file system
Panels that display space used on disk.

Panel                       Unit     Description                              Notes
Disk Space Usage            percent  Total disk use at time of last query
Inode Usage                 percent  Total inode use at time of last query
Aggregate Disk Space Usage  bytes    Total disk space used and reserved       Because this query relies on the df plugin, temporary file systems that do not necessarily use disk space are included in the results. The query tries to filter out most of these, but it might not be exhaustive.
Disk Traffic                bytes/s  Shows rates for both reading and writing
Disk Load                   percent  Approximate percentage of total disk bandwidth being used. The weighted I/O time series includes the backlog that might be accumulating. For more information, see the collectd disk plugin docs.
Operations/s                ops/s    Operations done per second
Average I/O Operation Time  seconds  Average time each I/O operation took to complete. This average is not accurate, see the collectd disk plugin docs.
4.5. CONFIGURING MULTIPLE CLOUDS
You can configure multiple Red Hat OpenStack Platform clouds to target a single instance of Service Telemetry Framework (STF):

1. Plan the AMQP address prefixes that you want to use for each cloud. For more information, see Section 4.5.1, “Planning AMQP address prefixes”.

2. Deploy metrics and events consumer Smart Gateways for each cloud to listen on the corresponding address prefixes. For more information, see Section 4.5.2, “Deploying Smart Gateways”.

3. Configure each cloud to send its metrics and events to STF on the correct address. For more information, see Section 4.5.3, “Creating the OpenStack environment file”.
Figure 4.1. Two Red Hat OpenStack Platform clouds connect to STF
4.5.1. Planning AMQP address prefixes
By default, Red Hat OpenStack Platform nodes get data through two data collectors: collectd and Ceilometer. These components send telemetry data, or notifications, to the respective AMQP addresses, for example, collectd/telemetry, where STF Smart Gateways listen on those addresses for monitoring data.

To support multiple clouds, and to identify which cloud generated the monitoring data, configure each cloud to send data to a unique address. Prefix a cloud identifier to the second part of the address. The following list shows some example addresses and identifiers:
collectd/cloud1-telemetry

collectd/cloud1-notify

anycast/ceilometer/cloud1-event.sample

collectd/cloud2-telemetry

collectd/cloud2-notify
anycast/ceilometer/cloud2-event.sample

collectd/us-east-1-telemetry

collectd/us-west-3-telemetry
4.5.2. Deploying Smart Gateways
You must deploy a Smart Gateway for each of the data collection types for each cloud: one for collectd metrics, one for collectd events, and one for Ceilometer events. Configure each of the Smart Gateways to listen on the AMQP address that you define for the corresponding cloud.

When you deploy STF for the first time, Smart Gateway manifests are created that define the initial Smart Gateways for a single cloud. When you deploy Smart Gateways for multiple cloud support, you deploy multiple Smart Gateways for each of the data collection types that handle the metrics and the events data for each cloud. The initial Smart Gateways act as a template to create additional Smart Gateways, along with any authentication information required to connect to the data stores.
Procedure
1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

   $ oc project service-telemetry
3. Use the initially deployed Smart Gateways as a template for additional Smart Gateways. List the currently deployed Smart Gateways with the oc get smartgateways command. For example, if you deployed STF with metricsEnabled: true and eventsEnabled: true, the following Smart Gateways are displayed in the output:

   $ oc get smartgateways

   NAME                                  AGE
   stf-default-ceilometer-notification   14d
   stf-default-collectd-notification     14d
   stf-default-collectd-telemetry        14d
4. Retrieve the manifests for each Smart Gateway and store the contents in a temporary file, which you can modify later and use to create the new set of Smart Gateways:

   $ truncate --size 0 /tmp/cloud1-smartgateways.yaml && \
   for sg in $(oc get smartgateways -oname)
   do
     echo --- >> /tmp/cloud1-smartgateways.yaml
     oc get $sg -oyaml --export >> /tmp/cloud1-smartgateways.yaml
   done
5. Modify the Smart Gateway manifest in the /tmp/cloud1-smartgateways.yaml file. Adjust the metadata.name and spec.amqpUrl fields to include the cloud identifier from your schema. For more information, see Section 4.5.1, “Planning AMQP address prefixes”. To view example Smart Gateway manifests, see Section 4.5.2.1, “Example manifests”.

6. Deploy your new Smart Gateways:
   $ oc apply -f /tmp/cloud1-smartgateways.yaml
7. Verify that each Smart Gateway is running. This can take several minutes, depending on the number of Smart Gateways:

   $ oc get po -l app=smart-gateway
4.5.2.1. Example manifests
IMPORTANT
The content in the following examples might be different to the file content in your deployment. Copy the manifests in your deployment.

Ensure that the name and amqpUrl parameters of each Smart Gateway match the names that you want to use for your clouds. For more information, see Section 4.5.1, “Planning AMQP address prefixes”.
NOTE
Your output may have some additional metadata parameters that you can remove from the manifests that you load into OCP.
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-ceilometer-notification-cloud1  # (1)
spec:
  amqpDataSource: ceilometer
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/anycast/ceilometer/cloud1-event.sample  # (2)
  debug: false
  elasticPass: fkzfhghw
  elasticUrl: https://elasticsearch-es-http.service-telemetry.svc.cluster.local:9200
  elasticUser: elastic
  resetIndex: false
  serviceType: events
  size: 1
  tlsCaCert: /config/certs/ca.crt
  tlsClientCert: /config/certs/tls.crt
  tlsClientKey: /config/certs/tls.key
  tlsServerName: elasticsearch-es-http.service-telemetry.svc.cluster.local
  useBasicAuth: true
  useTls: true
---
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-collectd-notification-cloud1  # (3)
spec:
  amqpDataSource: collectd
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/cloud1-notify  # (4)
  debug: false
  elasticPass: fkzfhghw
  elasticUrl: https://elasticsearch-es-http.service-telemetry.svc.cluster.local:9200
  elasticUser: elastic
  resetIndex: false
  serviceType: events
  size: 1
  tlsCaCert: /config/certs/ca.crt
  tlsClientCert: /config/certs/tls.crt
  tlsClientKey: /config/certs/tls.key
  tlsServerName: elasticsearch-es-http.service-telemetry.svc.cluster.local
  useBasicAuth: true
  useTls: true
---
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-collectd-telemetry-cloud1  # (5)
spec:
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/cloud1-telemetry  # (6)
  debug: false
  prefetch: 15000
  serviceType: metrics
  size: 1
  useTimestamp: true

(1) Name for Ceilometer notifications for cloud1
(2) AMQP address for Ceilometer notifications for cloud1
(3) Name for collectd notifications for cloud1
(4) AMQP address for collectd notifications for cloud1
(5) Name for collectd telemetry for cloud1
(6) AMQP address for collectd telemetry for cloud1
4.5.3. Creating the OpenStack environment file
To label traffic according to the cloud of origin, you must create a configuration with cloud-specific instance names. Create an stf-connectors.yaml file and adjust the values of CeilometerQdrEventsConfig and CollectdAmqpInstances to match the AMQP address prefix scheme. For more information, see Section 4.5.1, “Planning AMQP address prefixes”.
WARNING

Remove the enable-stf.yaml and ceilometer-write-qdr.yaml environment file references from your overcloud deployment. This configuration is redundant and results in duplicate information being sent from each cloud node.
Procedure
1. Create the stf-connectors.yaml file and modify it to match the AMQP address that you want for this cloud deployment:

   resource_registry:
     OS::TripleO::Services::Collectd: /usr/share/openstack-tripleo-heat-templates/deployment/metrics/collectd-container-puppet.yaml
     OS::TripleO::Services::MetricsQdr: /usr/share/openstack-tripleo-heat-templates/deployment/metrics/qdr-container-puppet.yaml
     OS::TripleO::Services::CeilometerAgentCentral: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-central-container-puppet.yaml
     OS::TripleO::Services::CeilometerAgentNotification: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-notification-container-puppet.yaml
     OS::TripleO::Services::CeilometerAgentIpmi: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-ipmi-container-puppet.yaml
     OS::TripleO::Services::ComputeCeilometerAgent: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-compute-container-puppet.yaml
     OS::TripleO::Services::Redis: /usr/share/openstack-tripleo-heat-templates/deployment/database/redis-pacemaker-puppet.yaml

   parameter_defaults:
     EnableSTF: true

     EventPipelinePublishers: []
     CeilometerEnablePanko: false
     CeilometerQdrPublishEvents: true
     CeilometerQdrEventsConfig:
       driver: amqp
       topic: cloud1-event  # (1)

     CollectdConnectionType: amqp1
     CollectdAmqpInterval: 5
     CollectdDefaultPollingInterval: 5

     CollectdAmqpInstances:
       cloud1-notify:  # (2)
         notify: true
         format: JSON
         presettle: false
       cloud1-telemetry:  # (3)
         format: JSON
         presettle: true

     MetricsQdrAddresses:
       - prefix: collectd
         distribution: multicast
       - prefix: anycast/ceilometer
         distribution: multicast

     MetricsQdrSSLProfiles:
       - name: sslProfile

     MetricsQdrConnectors:
       - host: stf-default-interconnect-5671-service-telemetry.apps.infra.watch  # (4)
         port: 443
         role: edge
         verifyHostname: false
         sslProfile: sslProfile

   (1) Define the topic for Ceilometer events. This value is the address format of anycast/ceilometer/cloud1-event.sample.
   (2) Define the topic for collectd events. This value is the format of collectd/cloud1-notify.
   (3) Define the topic for collectd metrics. This value is the format of collectd/cloud1-telemetry.
   (4) Adjust the MetricsQdrConnectors host to the address of the STF route.
   Ensure that the naming convention in the stf-connectors.yaml file aligns with the spec.amqpUrl field in the Smart Gateway configuration. For example, configure the CeilometerQdrEventsConfig.topic field to a value of cloud1-event.

2. Save the file in a directory for custom environment files, for example /home/stack/custom_templates/.

3. Source the authentication file:

   [stack@undercloud-0 ~]$ source stackrc

   (undercloud) [stack@undercloud-0 ~]$

4. Include the stf-connectors.yaml file in the overcloud deployment command, along with any other environment files relevant to your environment:

   (undercloud) [stack@undercloud-0 ~]$ openstack overcloud deploy --templates /usr/share/openstack-tripleo-heat-templates -e /home/stack/custom_templates/stf-connectors.yaml
Additional resources

For information about validating the deployment, see Section 3.3.3, “Validating client-side installation”.
4.5.4. Querying metrics data from multiple clouds
Data stored in Prometheus has a service label attached according to the Smart Gateway it was scraped from. You can use this label to query data from a specific cloud.

To query data from a specific cloud, use a Prometheus PromQL query that matches the associated service label, for example: collectd_uptime{service="stf-default-collectd-telemetry-cloud1-smartgateway"}.
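For example, assuming a second cloud is configured with the cloud2 identifier, the following queries select the same metric per cloud; the cloud2 Smart Gateway name is illustrative and follows the naming pattern shown above:

   collectd_uptime{service="stf-default-collectd-telemetry-cloud1-smartgateway"}
   collectd_uptime{service="stf-default-collectd-telemetry-cloud2-smartgateway"}

To match every cloud at once, use a regular expression match on the label:

   collectd_uptime{service=~"stf-default-collectd-telemetry-.+-smartgateway"}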
4.6. EPHEMERAL STORAGE
Use ephemeral storage to run Service Telemetry Framework (STF) without persistently storing data in your Red Hat OpenShift Container Platform (OCP) cluster. Ephemeral storage is not recommended in a production environment due to the volatility of the data in the platform when operating correctly and as designed. For example, restarting a pod or rescheduling the workload to another node results in the loss of any local data written since the pod started.

If you enable ephemeral storage in STF, the Service Telemetry Operator does not add the relevant storage sections to the data storage components' manifests.
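For reference, the kind of section that is omitted from a data storage manifest looks like the following Prometheus Operator storage specification. This is a sketch for illustration only; the size value is an arbitrary example:

   storage:
     volumeClaimTemplate:
       spec:
         resources:
           requests:
             storage: 20Gi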
4.6.1. Configuring ephemeral storage
To configure STF for ephemeral storage, add storageEphemeralEnabled: true to the ServiceTelemetry object in OCP. You can add storageEphemeralEnabled: true at installation time or, if you already deployed STF, complete the following steps:
Procedure
1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

   $ oc project service-telemetry

3. Edit the ServiceTelemetry object:

   $ oc edit ServiceTelemetry stf-default

4. Add the storageEphemeralEnabled: true parameter to the spec section:

   spec:
     eventsEnabled: true
     metricsEnabled: true
     storageEphemeralEnabled: true

5. Save your changes and close the object.
APPENDIX A. COLLECTD PLUG-INS

This section contains a complete list of collectd plug-ins and configurations for Red Hat OpenStack Platform 16.0.
collectd-aggregation
  collectd::plugin::aggregation::aggregators
  collectd::plugin::aggregation::interval

collectd-amqp1

collectd-apache
  collectd::plugin::apache::instances (ex: localhost => url => http://localhost/mod_status?auto)
  collectd::plugin::apache::interval

collectd-apcups

collectd-battery
  collectd::plugin::battery::values_percentage
  collectd::plugin::battery::report_degraded
  collectd::plugin::battery::query_state_fs
  collectd::plugin::battery::interval

collectd-ceph
  collectd::plugin::ceph::daemons
  collectd::plugin::ceph::longrunavglatency
  collectd::plugin::ceph::convertspecialmetrictypes

collectd-cgroups
  collectd::plugin::cgroups::ignore_selected
  collectd::plugin::cgroups::interval

collectd-conntrack
  None

collectd-contextswitch
  collectd::plugin::contextswitch::interval

collectd-cpu
  collectd::plugin::cpu::reportbystate
  collectd::plugin::cpu::reportbycpu
  collectd::plugin::cpu::valuespercentage
  collectd::plugin::cpu::reportnumcpu
  collectd::plugin::cpu::reportgueststate
  collectd::plugin::cpu::subtractgueststate
  collectd::plugin::cpu::interval

collectd-cpufreq
  None

collectd-cpusleep

collectd-csv
  collectd::plugin::csv::datadir
  collectd::plugin::csv::storerates
  collectd::plugin::csv::interval

collectd-df
  collectd::plugin::df::devices
  collectd::plugin::df::fstypes
  collectd::plugin::df::ignoreselected
  collectd::plugin::df::mountpoints
  collectd::plugin::df::reportbydevice
  collectd::plugin::df::reportinodes
  collectd::plugin::df::reportreserved
  collectd::plugin::df::valuesabsolute
  collectd::plugin::df::valuespercentage
  collectd::plugin::df::interval

collectd-disk
  collectd::plugin::disk::disks
  collectd::plugin::disk::ignoreselected
  collectd::plugin::disk::udevnameattr
  collectd::plugin::disk::interval

collectd-entropy
  collectd::plugin::entropy::interval

collectd-ethstat
  collectd::plugin::ethstat::interfaces
  collectd::plugin::ethstat::maps
  collectd::plugin::ethstat::mappedonly
  collectd::plugin::ethstat::interval

collectd-exec
  collectd::plugin::exec::commands
  collectd::plugin::exec::commands_defaults
  collectd::plugin::exec::globals
  collectd::plugin::exec::interval

collectd-fhcount
  collectd::plugin::fhcount::valuesabsolute
  collectd::plugin::fhcount::valuespercentage
  collectd::plugin::fhcount::interval

collectd-filecount
  collectd::plugin::filecount::directories
  collectd::plugin::filecount::interval

collectd-fscache
  None

collectd-hddtemp
  collectd::plugin::hddtemp::host
  collectd::plugin::hddtemp::port
  collectd::plugin::hddtemp::interval

collectd-hugepages
  collectd::plugin::hugepages::report_per_node_hp
  collectd::plugin::hugepages::report_root_hp
  collectd::plugin::hugepages::values_pages
  collectd::plugin::hugepages::values_bytes
  collectd::plugin::hugepages::values_percentage
  collectd::plugin::hugepages::interval

collectd-intel_rdt

collectd-interface
  collectd::plugin::interface::interfaces
  collectd::plugin::interface::ignoreselected
  collectd::plugin::interface::reportinactive
  collectd::plugin::interface::interval

collectd-ipc
  None

collectd-ipmi
  collectd::plugin::ipmi::ignore_selected
  collectd::plugin::ipmi::notify_sensor_add
  collectd::plugin::ipmi::notify_sensor_remove
  collectd::plugin::ipmi::notify_sensor_not_present
  collectd::plugin::ipmi::sensors
  collectd::plugin::ipmi::interval

collectd-irq
  collectd::plugin::irq::irqs
  collectd::plugin::irq::ignoreselected
  collectd::plugin::irq::interval

collectd-load
  collectd::plugin::load::report_relative
  collectd::plugin::load::interval

collectd-logfile
  collectd::plugin::logfile::log_level
  collectd::plugin::logfile::log_file
  collectd::plugin::logfile::log_timestamp
  collectd::plugin::logfile::print_severity
  collectd::plugin::logfile::interval

collectd-madwifi

collectd-mbmon

collectd-md

collectd-memcached
  collectd::plugin::memcached::instances
  collectd::plugin::memcached::interval

collectd-memory
  collectd::plugin::memory::valuesabsolute
  collectd::plugin::memory::valuespercentage
  collectd::plugin::memory::interval

collectd-multimeter

collectd-mysql
  collectd::plugin::mysql::interval

collectd-netlink
  collectd::plugin::netlink::interfaces
  collectd::plugin::netlink::verboseinterfaces
  collectd::plugin::netlink::qdiscs
  collectd::plugin::netlink::classes
  collectd::plugin::netlink::filters
  collectd::plugin::netlink::ignoreselected
  collectd::plugin::netlink::interval

collectd-network
  collectd::plugin::network::timetolive
  collectd::plugin::network::maxpacketsize
  collectd::plugin::network::forward
  collectd::plugin::network::reportstats
  collectd::plugin::network::listeners
  collectd::plugin::network::servers
  collectd::plugin::network::interval

collectd-nfs
  collectd::plugin::nfs::interval

collectd-ntpd
  collectd::plugin::ntpd::host
  collectd::plugin::ntpd::port
  collectd::plugin::ntpd::reverselookups
  collectd::plugin::ntpd::includeunitid
  collectd::plugin::ntpd::interval

collectd-numa
  None

collectd-olsrd

collectd-openvpn
  collectd::plugin::openvpn::statusfile
  collectd::plugin::openvpn::improvednamingschema
  collectd::plugin::openvpn::collectcompression
  collectd::plugin::openvpn::collectindividualusers
  collectd::plugin::openvpn::collectusercount
  collectd::plugin::openvpn::interval

collectd-ovs_events
  collectd::plugin::ovs_events::address
  collectd::plugin::ovs_events::dispatch
  collectd::plugin::ovs_events::interfaces
  collectd::plugin::ovs_events::send_notification
  collectd::plugin::ovs_events::$port
  collectd::plugin::ovs_events::socket

collectd-ovs_stats
  collectd::plugin::ovs_stats::address
  collectd::plugin::ovs_stats::bridges
  collectd::plugin::ovs_stats::port
  collectd::plugin::ovs_stats::socket

collectd-ping
  collectd::plugin::ping::hosts
  collectd::plugin::ping::timeout
  collectd::plugin::ping::ttl
  collectd::plugin::ping::source_address
  collectd::plugin::ping::device
  collectd::plugin::ping::max_missed
  collectd::plugin::ping::size
  collectd::plugin::ping::interval

collectd-powerdns
  collectd::plugin::powerdns::servers
  collectd::plugin::powerdns::recursors
  collectd::plugin::powerdns::local_socket
  collectd::plugin::powerdns::interval

collectd-processes
  collectd::plugin::processes::processes
  collectd::plugin::processes::process_matches
  collectd::plugin::processes::collect_context_switch
  collectd::plugin::processes::collect_file_descriptor
  collectd::plugin::processes::collect_memory_maps
  collectd::plugin::processes::interval

collectd-protocols
  collectd::plugin::protocols::ignoreselected
  collectd::plugin::protocols::values

collectd-python

collectd-serial

collectd-smart
  collectd::plugin::smart::disks
  collectd::plugin::smart::ignoreselected
  collectd::plugin::smart::interval

collectd-snmp_agent

collectd-statsd
  collectd::plugin::statsd::host
  collectd::plugin::statsd::port
  collectd::plugin::statsd::deletecounters
  collectd::plugin::statsd::deletetimers
  collectd::plugin::statsd::deletegauges
  collectd::plugin::statsd::deletesets
  collectd::plugin::statsd::countersum
  collectd::plugin::statsd::timerpercentile
  collectd::plugin::statsd::timerlower
  collectd::plugin::statsd::timerupper
  collectd::plugin::statsd::timersum
  collectd::plugin::statsd::timercount
  collectd::plugin::statsd::interval

collectd-swap
  collectd::plugin::swap::reportbydevice
  collectd::plugin::swap::reportbytes
  collectd::plugin::swap::valuesabsolute
  collectd::plugin::swap::valuespercentage
  collectd::plugin::swap::reportio
  collectd::plugin::swap::interval

collectd-syslog
  collectd::plugin::syslog::log_level
  collectd::plugin::syslog::notify_level
  collectd::plugin::syslog::interval

collectd-table
  collectd::plugin::table::tables
  collectd::plugin::table::interval

collectd-tail
  collectd::plugin::tail::files
  collectd::plugin::tail::interval

collectd-tail_csv
  collectd::plugin::tail_csv::metrics
  collectd::plugin::tail_csv::files

collectd-tcpconns
  collectd::plugin::tcpconns::localports
  collectd::plugin::tcpconns::remoteports
  collectd::plugin::tcpconns::listening
  collectd::plugin::tcpconns::allportssummary
  collectd::plugin::tcpconns::interval

collectd-ted

collectd-thermal
  collectd::plugin::thermal::devices
  collectd::plugin::thermal::ignoreselected
  collectd::plugin::thermal::interval

collectd-threshold
  collectd::plugin::threshold::types
  collectd::plugin::threshold::plugins
  collectd::plugin::threshold::hosts
  collectd::plugin::threshold::interval

collectd-turbostat
  collectd::plugin::turbostat::core_c_states
  collectd::plugin::turbostat::package_c_states
  collectd::plugin::turbostat::system_management_interrupt
  collectd::plugin::turbostat::digital_temperature_sensor
  collectd::plugin::turbostat::tcc_activation_temp
  collectd::plugin::turbostat::running_average_power_limit
  collectd::plugin::turbostat::logical_core_names

collectd-unixsock

collectd-uptime
  collectd::plugin::uptime::interval

collectd-users
  collectd::plugin::users::interval

collectd-uuid
  collectd::plugin::uuid::uuid_file
  collectd::plugin::uuid::interval

collectd-virt
  collectd::plugin::virt::connection
  collectd::plugin::virt::refresh_interval
  collectd::plugin::virt::domain
  collectd::plugin::virt::block_device
  collectd::plugin::virt::interface_device
  collectd::plugin::virt::ignore_selected
  collectd::plugin::virt::hostname_format
  collectd::plugin::virt::interface_format
  collectd::plugin::virt::extra_stats
  collectd::plugin::virt::interval

collectd-vmem
  collectd::plugin::vmem::verbose
  collectd::plugin::vmem::interval

collectd-vserver

collectd-wireless

collectd-write_graphite
  collectd::plugin::write_graphite::carbons
  collectd::plugin::write_graphite::carbon_defaults
  collectd::plugin::write_graphite::globals

collectd-write_kafka
  collectd::plugin::write_kafka::kafka_host
  collectd::plugin::write_kafka::kafka_port
  collectd::plugin::write_kafka::kafka_hosts
  collectd::plugin::write_kafka::topics

collectd-write_log
  collectd::plugin::write_log::format

collectd-zfs_arc
  None
- Table of Contents
- CHAPTER 1 INTRODUCTION TO SERVICE TELEMETRY FRAMEWORK
-
- 11 SERVICE TELEMETRY FRAMEWORK ARCHITECTURE
- 12 INSTALLATION SIZE
-
- CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK
-
- 21 THE CORE COMPONENTS OF STF
- 22 PREPARING YOUR OCP ENVIRONMENT FOR STF
-
- 221 Persistent volumes
-
- 2211 Using ephemeral storage
-
- 222 Resource allocation
- 223 Node tuning operator
-
- 23 DEPLOYING STF TO THE OCP ENVIRONMENT
-
- 231 Deploying STF to the OCP environment with ElasticSearch
- 232 Deploying STF to the OCP environment without ElasticSearch
- 233 Creating a namespace
- 234 Creating an OperatorGroup
- 235 Enabling the OperatorHubio Community Catalog Source
- 236 Enabling Red Hat STF Operator Source
- 237 Subscribing to the AMQ Certificate Manager Operator
- 238 Subscribing to the Elastic Cloud on Kubernetes Operator
- 239 Subscribing to the Service Telemetry Operator
- 2310 Creating a ServiceTelemetry object in OCP
-
- 24 REMOVING STF FROM THE OCP ENVIRONMENT
-
- 241 Deleting the namespace
- 242 Removing the OperatorSource
-
- CHAPTER 3 COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION
-
- 31 CONNECTING RED HAT OPENSTACK PLATFORM TO SERVICE TELEMETRY FRAMEWORK
- 32 DEPLOYING TO NON-STANDARD NETWORK TOPOLOGIES
- 33 CONFIGURING RED HAT OPENSTACK PLATFORM OVERCLOUD FOR SERVICE TELEMETRY FRAMEWORK
-
- 331 Retrieving the AMQ Interconnect route address
- 332 Configuring the STF connection for the overcloud
- 333 Validating client-side installation
-
- CHAPTER 4 ADVANCED FEATURES
-
- 41 CUSTOMIZING THE DEPLOYMENT
-
- 411 Manifest override parameters
- 412 Overriding a managed manifest
-
- Procedure
-
- 42 ALERTS
-
- Additional resources
- 421 Creating an alert rule in Prometheus
-
- Procedure
- Additional resources
-
- 422 Configuring custom alerts
-
- Procedure
- Additional resources
-
- 423 Creating an alert route in Alertmanager
-
- Procedure
- Additional resources
-
- 43 HIGH AVAILABILITY
-
- 431 Configuring high availability
-
- Procedure
-
- 44 DASHBOARDS
-
- 441 Setting up Grafana to host the dashboard
-
- 4411 Viewing and editing queries
-
- 442 The Grafana infrastructure dashboard
-
- 4421 Top panels
- 4422 Networking panels
- 4423 CPU panels
- 4424 Memory panels
- 4425 Diskfile system
-
- 45 CONFIGURING MULTIPLE CLOUDS
-
- 451 Planning AMQP address prefixes
- 452 Deploying Smart Gateways
-
- Procedure
- 4521 Example manifests
-
- 453 Creating the OpenStack environment file
-
- Procedure
- Additional resources
-
- 454 Querying metrics data from multiple clouds
-
- 46 EPHEMERAL STORAGE
-
- 461 Configuring ephemeral storage
-
- APPENDIX A COLLECTD PLUG-INS
-
6 To verify that the output shows the rules loaded into the PrometheusRule object forexample the output contains the defined openstackrules exit from the pod
7 Clean up the environment by deleting the curl pod
Additional resourcesFor more information on alerting see httpsgithubcomcoreosprometheus-operatorblobmasterDocumentationuser-guidesalertingmd
422 Configuring custom alerts
You can add custom alerts to the PrometheusRule object that you created in Section 421ldquoCreating an alert rule in Prometheusrdquo
Procedure
1 Use the oc edit command
2 Edit the PrometheusRules manifest
3 Save and close
Additional resourcesFor more information about configuring alerting rules seehttpsprometheusiodocsprometheuslatestconfigurationalerting_rules
For more information about PrometheusRules objects seehttpsgithubcomcoreosprometheus-operatorblobmasterDocumentationuser-guidesalertingmd
423 Creating an alert route in Alertmanager
Use Alertmanager to deliver alerts to an external system such as email IRC or other notificationchannel The Prometheus Operator manages the Alertmanager configuration as anRed Hat OpenShift Container Platform (OCP) secret STF by default deploys a basic configurationthat results in no receivers
[nameopenstackrulesfileetcprometheusrulesprometheus-stf-default-rulefiles-0service-telemetry-prometheus-alarm-rulesyamlrules[nameMetric Listener downquerycollectd_qpid_router_status u003c 1duration0labelsannotationsalerts[]healthoktypealerting]interval30]
[ rootcurl ]$ exit
$ oc delete pod curl
pod curl deleted
oc edit prometheusrules prometheus-alarm-rules
alertmanageryaml |- global resolve_timeout 5m route
CHAPTER 4 ADVANCED FEATURES
31
To deploy a custom Alertmanager route with STF an alertmanagerConfigManifest parameter mustbe passed to the Service Telemetry Operator that results in an updated secret managed by thePrometheus Operator For more information see Section 412 ldquoOverriding a managed manifestrdquo
Procedure
1 Log in to Red Hat OpenShift Container Platform
2 Change to the service-telemetry namespace
3 Edit the ServiceTelemetry object for your STF deployment
4 Add a new parameter alertmanagerConfigManifest and the Secret object contents todefine the alertmanageryaml configuration for Alertmanager
NOTE
This loads the default template that is already managed by ServiceTelemetry Operator To validate the changes are populating correctlychange a value return the alertmanager-stf-default secret and verify thatthe new value is loaded into memory for example changing the value globalresolve_timeout from 5m to 10m
group_by [job] group_wait 30s group_interval 5m repeat_interval 12h receiver null receivers - name null
oc project service-telemetry
oc edit servicetelemetry stf-default
apiVersion infrawatchv1alpha1kind ServiceTelemetrymetadata name stf-default namespace service-telemetryspec metricsEnabled true alertmanagerConfigManifest | apiVersion v1 kind Secret metadata name alertmanager-stf-default namespace service-telemetry type Opaque stringData alertmanageryaml |- global resolve_timeout 10m route
Red Hat OpenStack Platform 160 Service Telemetry Framework
32
5 Verify that the configuration was applied to the secret
6 To verify the configuration has been loaded into Alertmanager create a pod with access to curl
7 Run curl against the alertmanager-operated service to retrieve the status and configYAMLcontents and review the supplied configuration matches the configuration loaded intoAlertmanager
8 Verify that the configYAML field contains the expected changes Exit from the pod
9 To clean up the environment delete the curl pod
group_by [job] group_wait 30s group_interval 5m repeat_interval 12h receiver null receivers - name null
$ oc get secret alertmanager-stf-default -o go-template=index data alertmanageryaml | base64decode
global resolve_timeout 10mroute group_by [job] group_wait 30s group_interval 5m repeat_interval 12h receiver nullreceivers- name null
oc run curl --generator=run-podv1 --image=radialbusyboxpluscurl -i --tty
[ rootcurl ]$ curl alertmanager-operated9093apiv1status
statussuccessdataconfigYAMLglobaln resolve_timeout 10mn http_config n smtp_hello localhostn smtp_require_tls truen pagerduty_url httpseventspagerdutycomv2enqueuen hipchat_api_url httpsapihipchatcomn opsgenie_api_url httpsapiopsgeniecomn wechat_api_url httpsqyapiweixinqqcomcgi-binn victorops_api_url httpsalertvictoropscomintegrationsgeneric20131114alertnrouten receiver nulln group_byn - jobn group_wait 30sn group_interval 5mn repeat_interval 12hnreceiversn- name nullntemplates []n
[ rootcurl ]$ exit
$ oc delete pod curl
pod curl deleted
CHAPTER 4 ADVANCED FEATURES
33
Additional resourcesFor more information about the Red Hat OpenShift Container Platform secret and the Prometheusoperator see httpsgithubcomcoreosprometheus-operatorblobmasterDocumentationuser-guidesalertingmd
43 HIGH AVAILABILITY
High availability is the ability of Service Telemetry Framework (STF) to rapidly recover fromfailures in its component services Although Red Hat OpenShift Container Platform (OCP) restartsa failed pod if nodes are available to schedule the workload this recovery process might take morethan one minute during which time events and metrics are lost A high availability configurationincludes multiple copies of STF components reducing recovery time to approximately 2 secondsTo protect against failure of an OCP node deploy STF to an OCP cluster with three or more nodes
NOTE
STF is not yet a fully fault tolerant system Delivery of metrics and events during therecovery period is not guaranteed
Enabling high availability has the following effects
Two AMQ Interconnect pods run instead of the default 1
Three ElasticSearch pods run instead of the default 1
Recovery time from a lost pod in either of these services reduces to approximately 2seconds
431 Configuring high availability
To configure STF for high availability add highAvailabilityEnabled true to the ServiceTelemetryobject in OCP You can this set this parameter at installation time or if you already deployed STFcomplete the following steps
Procedure
1 Log in to Red Hat OpenShift Container Platform
2 Change to the service-telemetry namespace
3 Use the oc command to edit the ServiceTelemetry object
4 Add highAvailabilityEnabled true to the spec section
5 Save your changes and close the object
oc project service-telemetry
$ oc edit ServiceTelemetry
spec eventsEnabled true metricsEnabled true highAvailabilityEnabled true
Red Hat OpenStack Platform 160 Service Telemetry Framework
34
44 DASHBOARDS
Use third-party application Grafana to visualize system-level metrics gathered by collectd for eachindividual host node For more information about configuring collectd see Section 33ldquoConfiguring Red Hat OpenStack Platform overcloud for Service Telemetry Frameworkrdquo
441 Setting up Grafana to host the dashboard
Grafana is not included in the default Service Telemetry Framework (STF) deployment so you mustdeploy the Grafana Operator from OperatorHubio
Procedure
1 Log in to Red Hat OpenShift Container Platform
2 Change to the service-telemetry namespace
3 Clone the dashboard repository
4 Deploy the Grafana operator
5 To verify that the operator launched successfully run the oc get csv command If the valueof the PHASE column is Succeeded the operator launched successfully
6 Launch a Grafana instance
7 Verify that the Grafana instance deployed
8 Create the datasource and dashboard resources
oc project service-telemetry
git clone httpsgithubcominfrawatchdashboardscd dashboards
oc create -f deploysubscriptionyaml
$ oc get csv
NAME DISPLAY VERSION REPLACES PHASEgrafana-operatorv320 Grafana Operator 320 Succeeded
$ oc create -f deploygrafanayaml
$ oc get pod -l app=grafana
NAME READY STATUS RESTARTS AGEgrafana-deployment-7fc7848b56-sbkhv 11 Running 0 1m
CHAPTER 4 ADVANCED FEATURES
35
9 Verify that the resources installed correctly
10 Navigate to httpsltgrafana-route-addressgt in a web browser Use the oc get routescommand to retrieve the Grafana route address
11 To view the dashboard click Dashboards and Manage
Additional resources
For more information about enabling the OperatorHubio catalog source see Section 235ldquoEnabling the OperatorHubio Community Catalog Sourcerdquo
4411 Viewing and editing queries
Procedure
1 Log in to Red Hat OpenShift Container Platform To view and edit queries log in as the admin user
2 Change to the service-telemetry namespace
3 To retrieve the default username and password describe the Grafana object using oc describe
442 The Grafana infrastructure dashboard
The infrastructure dashboard shows metrics for a single node at a time Select a node from theupper left corner of the dashboard
4421 Top panels
Title Unit Description
oc create -f deploydatasourceyaml -f deployrhos-dashboardyaml
$ oc get grafanadashboards
NAME AGErhos-dashboard 7d21h
$ oc get grafanadatasourcesNAME AGEservice-telemetry-grafanadatasource 1m
oc get routes
oc project service-telemetry
oc describe grafana service-telemetry-grafana
Red Hat OpenStack Platform 160 Service Telemetry Framework
36
Current Global Alerts - Current alerts fired byPrometheus
Recent Global Alerts - Recently fired alerts in 5m timesteps
Status Panel - Node status up downunavailable
Uptime smhdMY Total operational time of node
CPU Cores cores Total number of cores
Memory bytes Total memory
Disk Size bytes Total storage size
Processes processes Total number of processes listedby type
Load Average processes Load average represents theaverage number of running anduninterruptible processes residingin the kernel execution queue
4422 Networking panels
Panels that display the network interfaces of the node
Panel Unit Description
Physical Interfaces Ingress Errors errors Total errors with incoming data
Physical Interfaces Egress Errors errors Total errors with outgoing data
Physical Interfaces Ingress ErrorRates
errorss Rate of incoming data errors
Physical Interfaces egress ErrorRates
errorss Rate of outgoing data errors
Physical Interfaces PacketsIngress pps Incoming packets persecond
Physical Interfaces PacketsEgress
pps
Outgoing packets per second Physical Interfaces Data Ingress bytess
CHAPTER 4 ADVANCED FEATURES
37
Incoming data rates Physical Interfaces Data Egress bytess
Outgoing data rates Physical Interfaces Drop RateIngress
pps
Incoming packets drop rate Physical Interfaces Drop RateEgress
pps
4423 CPU panels
Panels that display CPU usage of the node
Panel Unit Description
Current CPU Usage percent Instantaneous usage at the timeof the last query
Aggregate CPU Usage percent Average non-idle CPU activity ofall cores on a node
Aggr CPU Usage by Type percent Shows time spent for each type ofthread averaged across all cores
4424 Memory panels
Panels that display memory usage on the node
Panel Unit Description
Memory Used percent Amount of memory being used attime of last query
Huge Pages Used hugepages Number of hugepages beingused
Memory
4425 Diskfile system
Panels that display space used on disk
Panel Unit Description Notes
Disk Space Usage percent Total disk use at time oflast query
Red Hat OpenStack Platform 160 Service Telemetry Framework
38
Inode Usage percent Total inode use at timeof last query
Aggregate Disk SpaceUsage
bytes Total disk space usedand reserved
Because this query relieson the df plugintemporary file systemsthat do not necessarilyuse disk space areincluded in the resultsThe query tries to filterout most of these but itmight not be exhaustive
Disk Traffic bytess Shows rates for bothreading and writing
Disk Load percent Approximatepercentage of total diskbandwidth being usedThe weighted IO timeseries includes thebacklog that might beaccumulating For moreinformation see thecollectd disk plugin docs
Operationss opss Operations done persecond
Average IO OperationTime
seconds Average time each IOoperation took tocomplete This averageis not accurate see thecollectd disk plugin docs
Panel Unit Description Notes
45 CONFIGURING MULTIPLE CLOUDS
You can configure multiple Red Hat OpenStack Platform clouds to target a single instance ofService Telemetry Framework (STF)
1 Plan the AMQP address prefixes that you want to use for each cloud For more informationsee Section 451 ldquoPlanning AMQP address prefixesrdquo
2 Deploy metrics and events consumer Smart Gateways for each cloud to listen on thecorresponding address prefixes For more information see Section 452 ldquoDeploying SmartGatewaysrdquo
3 Configure each cloud to send its metrics and events to STF on the correct address Formore information see Section 453 ldquoCreating the OpenStack environment filerdquo
CHAPTER 4 ADVANCED FEATURES
39
Figure 41 Two Red Hat OpenStack Platform clouds connect to STF
4.5.1. Planning AMQP address prefixes

By default, Red Hat OpenStack Platform nodes receive data through two data collectors: collectd and Ceilometer. These components send telemetry data, or notifications, to their respective AMQP addresses, for example, collectd/telemetry, and STF Smart Gateways listen on those addresses for monitoring data.

To support multiple clouds, and to identify which cloud generated the monitoring data, configure each cloud to send data to a unique address. Prefix a cloud identifier to the second part of the address. The following list shows some example addresses and identifiers:
collectd/cloud1-telemetry
collectd/cloud1-notify
anycast/ceilometer/cloud1-event.sample
collectd/cloud2-telemetry
collectd/cloud2-notify
anycast/ceilometer/cloud2-event.sample
collectd/us-east-1-telemetry
collectd/us-west-3-telemetry
4.5.2. Deploying Smart Gateways

You must deploy a Smart Gateway for each of the data collection types for each cloud: one for collectd metrics, one for collectd events, and one for Ceilometer events. Configure each of the Smart Gateways to listen on the AMQP address that you define for the corresponding cloud.

When you deploy STF for the first time, Smart Gateway manifests are created that define the initial Smart Gateways for a single cloud. When you deploy Smart Gateways for multiple cloud support, you deploy multiple Smart Gateways for each of the data collection types that handle the metrics and the events data for each cloud. The initial Smart Gateways act as a template to create additional Smart Gateways, along with any authentication information required to connect to the data stores.
Procedure
1 Log in to Red Hat OpenShift Container Platform
2 Change to the service-telemetry namespace
oc project service-telemetry
3 Use the initially deployed Smart Gateways as a template for additional Smart Gateways. List the currently deployed Smart Gateways with the oc get smartgateways command. For example, if you deployed STF with metricsEnabled: true and eventsEnabled: true, the following Smart Gateways are displayed in the output:
$ oc get smartgateways
NAME                                  AGE
stf-default-ceilometer-notification   14d
stf-default-collectd-notification     14d
stf-default-collectd-telemetry        14d
4 Retrieve the manifests for each Smart Gateway and store the contents in a temporary file, which you can modify later and use to create the new set of Smart Gateways:
truncate --size 0 /tmp/cloud1-smartgateways.yaml && for sg in $(oc get smartgateways -oname)
do
  echo --- >> /tmp/cloud1-smartgateways.yaml
  oc get $sg -oyaml --export >> /tmp/cloud1-smartgateways.yaml
done
5 Modify the Smart Gateway manifests in the /tmp/cloud1-smartgateways.yaml file. Adjust the metadata.name and spec.amqpUrl fields to include the cloud identifier from your schema. For more information, see Section 4.5.1, "Planning AMQP address prefixes". To view example Smart Gateway manifests, see Section 4.5.2.1, "Example manifests". A minimal sketch of the rename pattern follows this step.
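The following fragment is illustrative only: it shows the rename pattern for the collectd telemetry gateway, using the default names from this guide. Your base names and addresses can differ.

# Illustrative fragment: append the cloud identifier to each name and address
metadata:
  name: stf-default-collectd-telemetry-cloud1   # was: stf-default-collectd-telemetry
spec:
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/cloud1-telemetry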
6 Deploy your new Smart Gateways
oc apply -f /tmp/cloud1-smartgateways.yaml
7 Verify that each Smart Gateway is running. This can take several minutes, depending on the number of Smart Gateways:
oc get po -l app=smart-gateway
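If you prefer a command that waits until the pods report ready instead of polling manually, oc wait can watch the same label. The label selector and timeout here are assumptions based on the command above:

# Blocks until all Smart Gateway pods are Ready, or fails after 5 minutes
oc wait pod -l app=smart-gateway --for=condition=Ready --timeout=300s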
4.5.2.1. Example manifests
IMPORTANT
The content in the following examples might be different from the file content in your deployment. Copy the manifests in your deployment.
Ensure that the name and amqpUrl parameters of each Smart Gateway match the names that you want to use for your clouds. For more information, see Section 4.5.1, "Planning AMQP address prefixes".
NOTE
Your output might have some additional metadata parameters that you can remove from the manifests that you load into OCP.
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-ceilometer-notification-cloud1 1
spec:
  amqpDataSource: ceilometer
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/anycast/ceilometer/cloud1-event.sample 2
  debug: false
  elasticPass: fkzfhghw
  elasticUrl: https://elasticsearch-es-http.service-telemetry.svc.cluster.local:9200
  elasticUser: elastic
  resetIndex: false
  serviceType: events
  size: 1
  tlsCaCert: /config/certs/ca.crt
  tlsClientCert: /config/certs/tls.crt
  tlsClientKey: /config/certs/tls.key
  tlsServerName: elasticsearch-es-http.service-telemetry.svc.cluster.local
  useBasicAuth: true
  useTls: true
---
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-collectd-notification-cloud1 3
spec:
  amqpDataSource: collectd
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/cloud1-notify 4
  debug: false
  elasticPass: fkzfhghw
  elasticUrl: https://elasticsearch-es-http.service-telemetry.svc.cluster.local:9200
  elasticUser: elastic
  resetIndex: false
  serviceType: events
  size: 1
  tlsCaCert: /config/certs/ca.crt
  tlsClientCert: /config/certs/tls.crt
  tlsClientKey: /config/certs/tls.key
  tlsServerName: elasticsearch-es-http.service-telemetry.svc.cluster.local
  useBasicAuth: true
  useTls: true
---
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-collectd-telemetry-cloud1 5
spec:
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/cloud1-telemetry 6
  debug: false
  prefetch: 15000
  serviceType: metrics
  size: 1
  useTimestamp: true

1 Name for Ceilometer notifications for cloud1
2 AMQP address for Ceilometer notifications for cloud1
3 Name for collectd notifications for cloud1
4 AMQP address for collectd notifications for cloud1
5 Name for collectd telemetry for cloud1
6 AMQP address for collectd telemetry for cloud1
4.5.3. Creating the OpenStack environment file
To label traffic according to the cloud of origin, you must create a configuration with cloud-specific instance names. Create an stf-connectors.yaml file and adjust the values of CeilometerQdrEventsConfig and CollectdAmqpInstances to match the AMQP address prefix scheme. For more information, see Section 4.5.1, "Planning AMQP address prefixes".
WARNING
Remove the enable-stf.yaml and ceilometer-write-qdr.yaml environment file references from your overcloud deployment. This configuration is redundant and results in duplicate information being sent from each cloud node.
Procedure
1 Create the stf-connectors.yaml file and modify it to match the AMQP address that you want for this cloud deployment:
resource_registry:
  OS::TripleO::Services::Collectd: /usr/share/openstack-tripleo-heat-templates/deployment/metrics/collectd-container-puppet.yaml
  OS::TripleO::Services::MetricsQdr: /usr/share/openstack-tripleo-heat-templates/deployment/metrics/qdr-container-puppet.yaml
  OS::TripleO::Services::CeilometerAgentCentral: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-central-container-puppet.yaml
  OS::TripleO::Services::CeilometerAgentNotification: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-notification-container-puppet.yaml
  OS::TripleO::Services::CeilometerAgentIpmi: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-ipmi-container-puppet.yaml
  OS::TripleO::Services::ComputeCeilometerAgent: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-compute-container-puppet.yaml
  OS::TripleO::Services::Redis: /usr/share/openstack-tripleo-heat-templates/deployment/database/redis-pacemaker-puppet.yaml

parameter_defaults:
  EnableSTF: true

  EventPipelinePublishers: []
  CeilometerEnablePanko: false
  CeilometerQdrPublishEvents: true
  CeilometerQdrEventsConfig:
    driver: amqp
    topic: cloud1-event 1

  CollectdConnectionType: amqp1
  CollectdAmqpInterval: 5
  CollectdDefaultPollingInterval: 5

  CollectdAmqpInstances:
    cloud1-notify: 2
      notify: true
      format: JSON
      presettle: false
    cloud1-telemetry: 3
      format: JSON
      presettle: true

  MetricsQdrAddresses:
    - prefix: collectd
      distribution: multicast
    - prefix: anycast/ceilometer
      distribution: multicast

  MetricsQdrSSLProfiles:
    - name: sslProfile

  MetricsQdrConnectors:
    - host: stf-default-interconnect-5671-service-telemetry.apps.infra.watch 4
      port: 443
      role: edge
      verifyHostname: false
      sslProfile: sslProfile

1 Define the topic for Ceilometer events. This value is the address format of anycast/ceilometer/cloud1-event.sample.
2 Define the topic for collectd events. This value is the format of collectd/cloud1-notify.
3 Define the topic for collectd metrics. This value is the format of collectd/cloud1-telemetry.
4 Adjust the MetricsQdrConnectors host to the address of the STF route.
2 Ensure that the naming convention in the stf-connectors.yaml file aligns with the spec.amqpUrl field in the Smart Gateway configuration. For example, configure the CeilometerQdrEventsConfig.topic field to a value of cloud1-event.
3 Save the file in a directory for custom environment files, for example /home/stack/custom_templates/.
4 Source the authentication file:

[stack@undercloud-0 ~]$ source stackrc

(undercloud) [stack@undercloud-0 ~]$

5 Include the stf-connectors.yaml file in the overcloud deployment command, along with any other environment files relevant to your environment:

(undercloud) [stack@undercloud-0 ~]$ openstack overcloud deploy --templates /usr/share/openstack-tripleo-heat-templates -e /home/stack/custom_templates/stf-connectors.yaml
Additional resources
For information about validating the deployment, see Section 3.3.3, "Validating client-side installation".
4.5.4. Querying metrics data from multiple clouds

Data stored in Prometheus has a service label attached according to the Smart Gateway it was scraped from. You can use this label to query data from a specific cloud.

To query data from a specific cloud, use a Prometheus PromQL query that matches the associated service label, for example: collectd_uptime{service="stf-default-collectd-telemetry-cloud1-smartgateway"}.
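As a sketch of how you might run such a query from inside the cluster, the following assumes the prometheus-operated service that the Prometheus Operator creates, on port 9090; the service name and the curl pod image are assumptions, not a verified part of this procedure:

$ oc run curl --generator=run-podv1 --image=radial/busyboxplus:curl -i --tty
[ root@curl:/ ]$ curl -G 'http://prometheus-operated:9090/api/v1/query' \
    --data-urlencode 'query=collectd_uptime{service="stf-default-collectd-telemetry-cloud1-smartgateway"}'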
4.6. EPHEMERAL STORAGE
Use ephemeral storage to run Service Telemetry Framework (STF) without persistently storing data in your Red Hat OpenShift Container Platform (OCP) cluster. Ephemeral storage is not recommended in a production environment due to the volatility of the data in the platform, even when it is operating correctly and as designed. For example, restarting a pod or rescheduling the workload to another node results in the loss of any local data written since the pod started.

If you enable ephemeral storage in STF, the Service Telemetry Operator does not add the relevant storage sections to the data storage components' manifests.
4.6.1. Configuring ephemeral storage
To configure STF for ephemeral storage, add storageEphemeralEnabled: true to the ServiceTelemetry object in OCP. You can add storageEphemeralEnabled: true at installation time or, if you already deployed STF, complete the following steps:
Procedure
1 Log in to Red Hat OpenShift Container Platform
2 Change to the service-telemetry namespace
3 Edit the ServiceTelemetry object
4 Add the storageEphemeralEnabled: true parameter to the spec section
5 Save your changes and close the object
oc project service-telemetry
$ oc edit ServiceTelemetry stf-default
spec:
  eventsEnabled: true
  metricsEnabled: true
  storageEphemeralEnabled: true
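If you enable ephemeral storage at installation time instead, the complete ServiceTelemetry object might look like the following sketch, which reuses the object name, namespace, and API version shown elsewhere in this guide:

apiVersion: infra.watch/v1alpha1
kind: ServiceTelemetry
metadata:
  name: stf-default
  namespace: service-telemetry
spec:
  eventsEnabled: true
  metricsEnabled: true
  storageEphemeralEnabled: true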
APPENDIX A. COLLECTD PLUG-INS

This section contains a complete list of collectd plug-ins and configurations for Red Hat OpenStack Platform 16.0.
collectd-aggregation
collectd::plugin::aggregation::aggregators
collectd::plugin::aggregation::interval

collectd-amqp1

collectd-apache
collectd::plugin::apache::instances (ex: localhost ⇒ url ⇒ 'http://localhost/mod_status?auto')
collectd::plugin::apache::interval

collectd-apcups

collectd-battery
collectd::plugin::battery::values_percentage
collectd::plugin::battery::report_degraded
collectd::plugin::battery::query_state_fs
collectd::plugin::battery::interval

collectd-ceph
collectd::plugin::ceph::daemons
collectd::plugin::ceph::longrunavglatency
collectd::plugin::ceph::convertspecialmetrictypes

collectd-cgroups
collectd::plugin::cgroups::ignore_selected
collectd::plugin::cgroups::interval

collectd-conntrack
None

collectd-contextswitch
collectd::plugin::contextswitch::interval

collectd-cpu
collectd::plugin::cpu::reportbystate
collectd::plugin::cpu::reportbycpu
collectd::plugin::cpu::valuespercentage
collectd::plugin::cpu::reportnumcpu
collectd::plugin::cpu::reportgueststate
collectd::plugin::cpu::subtractgueststate
collectd::plugin::cpu::interval

collectd-cpufreq
None

collectd-cpusleep

collectd-csv
collectd::plugin::csv::datadir
collectd::plugin::csv::storerates
collectd::plugin::csv::interval

collectd-df
collectd::plugin::df::devices
collectd::plugin::df::fstypes
collectd::plugin::df::ignoreselected
collectd::plugin::df::mountpoints
collectd::plugin::df::reportbydevice
collectd::plugin::df::reportinodes
collectd::plugin::df::reportreserved
collectd::plugin::df::valuesabsolute
collectd::plugin::df::valuespercentage
collectd::plugin::df::interval

collectd-disk
collectd::plugin::disk::disks
collectd::plugin::disk::ignoreselected
collectd::plugin::disk::udevnameattr
collectd::plugin::disk::interval

collectd-entropy
collectd::plugin::entropy::interval

collectd-ethstat
collectd::plugin::ethstat::interfaces
collectd::plugin::ethstat::maps
collectd::plugin::ethstat::mappedonly
collectd::plugin::ethstat::interval

collectd-exec
collectd::plugin::exec::commands
collectd::plugin::exec::commands_defaults
collectd::plugin::exec::globals
collectd::plugin::exec::interval

collectd-fhcount
collectd::plugin::fhcount::valuesabsolute
collectd::plugin::fhcount::valuespercentage
collectd::plugin::fhcount::interval

collectd-filecount
collectd::plugin::filecount::directories
collectd::plugin::filecount::interval

collectd-fscache
None

collectd-hddtemp
collectd::plugin::hddtemp::host
collectd::plugin::hddtemp::port
collectd::plugin::hddtemp::interval

collectd-hugepages
collectd::plugin::hugepages::report_per_node_hp
collectd::plugin::hugepages::report_root_hp
collectd::plugin::hugepages::values_pages
collectd::plugin::hugepages::values_bytes
collectd::plugin::hugepages::values_percentage
collectd::plugin::hugepages::interval

collectd-intel_rdt

collectd-interface
collectd::plugin::interface::interfaces
collectd::plugin::interface::ignoreselected
collectd::plugin::interface::reportinactive
collectd::plugin::interface::interval

collectd-ipc
None

collectd-ipmi
collectd::plugin::ipmi::ignore_selected
collectd::plugin::ipmi::notify_sensor_add
collectd::plugin::ipmi::notify_sensor_remove
collectd::plugin::ipmi::notify_sensor_not_present
collectd::plugin::ipmi::sensors
collectd::plugin::ipmi::interval

collectd-irq
collectd::plugin::irq::irqs
collectd::plugin::irq::ignoreselected
collectd::plugin::irq::interval

collectd-load
collectd::plugin::load::report_relative
collectd::plugin::load::interval

collectd-logfile
collectd::plugin::logfile::log_level
collectd::plugin::logfile::log_file
collectd::plugin::logfile::log_timestamp
collectd::plugin::logfile::print_severity
collectd::plugin::logfile::interval
collectd-madwifi

collectd-mbmon

collectd-md

collectd-memcached
collectd::plugin::memcached::instances
collectd::plugin::memcached::interval

collectd-memory
collectd::plugin::memory::valuesabsolute
collectd::plugin::memory::valuespercentage
collectd::plugin::memory::interval

collectd-multimeter

collectd-mysql
collectd::plugin::mysql::interval

collectd-netlink
collectd::plugin::netlink::interfaces
collectd::plugin::netlink::verboseinterfaces
collectd::plugin::netlink::qdiscs
collectd::plugin::netlink::classes
collectd::plugin::netlink::filters
collectd::plugin::netlink::ignoreselected
collectd::plugin::netlink::interval

collectd-network
collectd::plugin::network::timetolive
collectd::plugin::network::maxpacketsize
collectd::plugin::network::forward
collectd::plugin::network::reportstats
collectd::plugin::network::listeners
collectd::plugin::network::servers
collectd::plugin::network::interval

collectd-nfs
collectd::plugin::nfs::interval

collectd-ntpd
collectd::plugin::ntpd::host
collectd::plugin::ntpd::port
collectd::plugin::ntpd::reverselookups
collectd::plugin::ntpd::includeunitid
collectd::plugin::ntpd::interval

collectd-numa
None

collectd-olsrd

collectd-openvpn
collectd::plugin::openvpn::statusfile
collectd::plugin::openvpn::improvednamingschema
collectd::plugin::openvpn::collectcompression
collectd::plugin::openvpn::collectindividualusers
collectd::plugin::openvpn::collectusercount
collectd::plugin::openvpn::interval

collectd-ovs_events
collectd::plugin::ovs_events::address
collectd::plugin::ovs_events::dispatch
collectd::plugin::ovs_events::interfaces
collectd::plugin::ovs_events::send_notification
collectd::plugin::ovs_events::$port
collectd::plugin::ovs_events::socket

collectd-ovs_stats
collectd::plugin::ovs_stats::address
collectd::plugin::ovs_stats::bridges
collectd::plugin::ovs_stats::port
collectd::plugin::ovs_stats::socket

collectd-ping
collectd::plugin::ping::hosts
collectd::plugin::ping::timeout
collectd::plugin::ping::ttl
collectd::plugin::ping::source_address
collectd::plugin::ping::device
collectd::plugin::ping::max_missed
collectd::plugin::ping::size
collectd::plugin::ping::interval

collectd-powerdns
collectd::plugin::powerdns::interval
collectd::plugin::powerdns::servers
collectd::plugin::powerdns::recursors
collectd::plugin::powerdns::local_socket

collectd-processes
collectd::plugin::processes::processes
collectd::plugin::processes::process_matches
collectd::plugin::processes::collect_context_switch
collectd::plugin::processes::collect_file_descriptor
collectd::plugin::processes::collect_memory_maps
collectd::plugin::processes::interval
collectd-protocols
collectd::plugin::protocols::ignoreselected
collectd::plugin::protocols::values

collectd-python

collectd-serial

collectd-smart
collectd::plugin::smart::disks
collectd::plugin::smart::ignoreselected
collectd::plugin::smart::interval

collectd-snmp_agent

collectd-statsd
collectd::plugin::statsd::host
collectd::plugin::statsd::port
collectd::plugin::statsd::deletecounters
collectd::plugin::statsd::deletetimers
collectd::plugin::statsd::deletegauges
collectd::plugin::statsd::deletesets
collectd::plugin::statsd::countersum
collectd::plugin::statsd::timerpercentile
collectd::plugin::statsd::timerlower
collectd::plugin::statsd::timerupper
collectd::plugin::statsd::timersum
collectd::plugin::statsd::timercount
collectd::plugin::statsd::interval

collectd-swap
collectd::plugin::swap::reportbydevice
collectd::plugin::swap::reportbytes
collectd::plugin::swap::valuesabsolute
collectd::plugin::swap::valuespercentage
collectd::plugin::swap::reportio
collectd::plugin::swap::interval

collectd-syslog
collectd::plugin::syslog::log_level
collectd::plugin::syslog::notify_level
collectd::plugin::syslog::interval

collectd-table
collectd::plugin::table::tables
collectd::plugin::table::interval

collectd-tail
collectd::plugin::tail::files
collectd::plugin::tail::interval

collectd-tail_csv
collectd::plugin::tail_csv::metrics
collectd::plugin::tail_csv::files

collectd-tcpconns
collectd::plugin::tcpconns::localports
collectd::plugin::tcpconns::remoteports
collectd::plugin::tcpconns::listening
collectd::plugin::tcpconns::allportssummary
collectd::plugin::tcpconns::interval

collectd-ted

collectd-thermal
collectd::plugin::thermal::devices
collectd::plugin::thermal::ignoreselected
collectd::plugin::thermal::interval

collectd-threshold
collectd::plugin::threshold::types
collectd::plugin::threshold::plugins
collectd::plugin::threshold::hosts
collectd::plugin::threshold::interval

collectd-turbostat
collectd::plugin::turbostat::core_c_states
collectd::plugin::turbostat::package_c_states
collectd::plugin::turbostat::system_management_interrupt
collectd::plugin::turbostat::digital_temperature_sensor
collectd::plugin::turbostat::tcc_activation_temp
collectd::plugin::turbostat::running_average_power_limit
collectd::plugin::turbostat::logical_core_names

collectd-unixsock

collectd-uptime
collectd::plugin::uptime::interval

collectd-users
collectd::plugin::users::interval

collectd-uuid
collectd::plugin::uuid::uuid_file
collectd::plugin::uuid::interval

collectd-virt
collectd::plugin::virt::connection
collectd::plugin::virt::refresh_interval
collectd::plugin::virt::domain
collectd::plugin::virt::block_device
collectd::plugin::virt::interface_device
collectd::plugin::virt::ignore_selected
collectd::plugin::virt::hostname_format
collectd::plugin::virt::interface_format
collectd::plugin::virt::extra_stats
collectd::plugin::virt::interval

collectd-vmem
collectd::plugin::vmem::verbose
collectd::plugin::vmem::interval

collectd-vserver

collectd-wireless

collectd-write_graphite
collectd::plugin::write_graphite::carbons
collectd::plugin::write_graphite::carbon_defaults
collectd::plugin::write_graphite::globals

collectd-write_kafka
collectd::plugin::write_kafka::kafka_host
collectd::plugin::write_kafka::kafka_port
collectd::plugin::write_kafka::kafka_hosts
collectd::plugin::write_kafka::topics

collectd-write_log
collectd::plugin::write_log::format

collectd-zfs_arc
None
Load Average processes Load average represents theaverage number of running anduninterruptible processes residingin the kernel execution queue
4422 Networking panels
Panels that display the network interfaces of the node
Panel Unit Description
Physical Interfaces Ingress Errors errors Total errors with incoming data
Physical Interfaces Egress Errors errors Total errors with outgoing data
Physical Interfaces Ingress ErrorRates
errorss Rate of incoming data errors
Physical Interfaces egress ErrorRates
errorss Rate of outgoing data errors
Physical Interfaces PacketsIngress pps Incoming packets persecond
Physical Interfaces PacketsEgress
pps
Outgoing packets per second Physical Interfaces Data Ingress bytess
CHAPTER 4 ADVANCED FEATURES
37
Incoming data rates Physical Interfaces Data Egress bytess
Outgoing data rates Physical Interfaces Drop RateIngress
pps
Incoming packets drop rate Physical Interfaces Drop RateEgress
pps
4423 CPU panels
Panels that display CPU usage of the node
Panel Unit Description
Current CPU Usage percent Instantaneous usage at the timeof the last query
Aggregate CPU Usage percent Average non-idle CPU activity ofall cores on a node
Aggr CPU Usage by Type percent Shows time spent for each type ofthread averaged across all cores
4424 Memory panels
Panels that display memory usage on the node
Panel Unit Description
Memory Used percent Amount of memory being used attime of last query
Huge Pages Used hugepages Number of hugepages beingused
Memory
4425 Diskfile system
Panels that display space used on disk
Panel Unit Description Notes
Disk Space Usage percent Total disk use at time oflast query
Red Hat OpenStack Platform 160 Service Telemetry Framework
38
Inode Usage percent Total inode use at timeof last query
Aggregate Disk SpaceUsage
bytes Total disk space usedand reserved
Because this query relieson the df plugintemporary file systemsthat do not necessarilyuse disk space areincluded in the resultsThe query tries to filterout most of these but itmight not be exhaustive
Disk Traffic bytess Shows rates for bothreading and writing
Disk Load percent Approximatepercentage of total diskbandwidth being usedThe weighted IO timeseries includes thebacklog that might beaccumulating For moreinformation see thecollectd disk plugin docs
Operationss opss Operations done persecond
Average IO OperationTime
seconds Average time each IOoperation took tocomplete This averageis not accurate see thecollectd disk plugin docs
Panel Unit Description Notes
45 CONFIGURING MULTIPLE CLOUDS
You can configure multiple Red Hat OpenStack Platform clouds to target a single instance ofService Telemetry Framework (STF)
1 Plan the AMQP address prefixes that you want to use for each cloud For more informationsee Section 451 ldquoPlanning AMQP address prefixesrdquo
2 Deploy metrics and events consumer Smart Gateways for each cloud to listen on thecorresponding address prefixes For more information see Section 452 ldquoDeploying SmartGatewaysrdquo
3 Configure each cloud to send its metrics and events to STF on the correct address Formore information see Section 453 ldquoCreating the OpenStack environment filerdquo
CHAPTER 4 ADVANCED FEATURES
39
Figure 41 Two Red Hat OpenStack Platform clouds connect to STF
451 Planning AMQP address prefixes
By default Red Hat OpenStack Platform nodes get data through two data collectors collectd andCeilometer These components send telemetry data or notifications to the respective AMQPaddresses for example collectdtelemetry where STF Smart Gateways listen on those addressesfor monitoring data
To support multiple clouds and to identify which cloud generated the monitoring data configureeach cloud to send data to a unique address Prefix a cloud identifier to the second part of theaddress The following list shows some example addresses and identifiers
collectdcloud1-telemetry
collectdcloud1-notify
anycastceilometercloud1-eventsample
collectdcloud2-telemetry
collectdcloud2-notify
Red Hat OpenStack Platform 160 Service Telemetry Framework
40
anycastceilometercloud2-eventsample
collectdus-east-1-telemetry
collectdus-west-3-telemetry
452 Deploying Smart Gateways
You must deploy a Smart Gateway for each of the data collection types for each cloud one forcollectd metrics one for collectd events and one for Ceilometer events Configure each of theSmart Gateways to listen on the AMQP address that you define for the corresponding cloud
When you deploy STF for the first time Smart Gateway manifests are created that define the initialSmart Gateways for a single cloud When deploying Smart Gateways for multiple cloud supportyou deploy multiple Smart Gateways for each of the data collection types that handle the metricsand the events data for each cloud The initial Smart Gateways act as a template to createadditional Smart Gateways along with any authentication information required to connect to thedata stores
Procedure
1 Log in to Red Hat OpenShift Container Platform
2 Change to the service-telemetry namespace
oc project service-telemetry
3 Use the initially deployed Smart Gateways as a template for additional Smart GatewaysList the currently deployed Smart Gateways with the oc get smartgateways command Forexample if you deployed STF with metricsEnabled true and eventsEnabled true thefollowing Smart Gateways are displayed in the output
$ oc get smartgateways
NAME AGEstf-default-ceilometer-notification 14dstf-default-collectd-notification 14dstf-default-collectd-telemetry 14d
4 Retrieve the manifests for each Smart Gateway and store the contents in a temporary filewhich you can modify later and use to create the new set of Smart Gateways
truncate --size 0 tmpcloud1-smartgatewaysyaml ampamp for sg in $(oc get smartgateways -oname)do echo --- gtgt tmpcloud1-smartgatewaysyaml oc get $sg -oyaml --export gtgt tmpcloud1-smartgatewaysyamldone
5 Modify the Smart Gateway manifest in the tmpcloud1-smartgatewaysyaml file Adjustthe metadataname and specamqpUrl fields to include the cloud identifier from yourschema For more information see ] To view example Smart Gateway manifests seeltltexample-manifests_advanced-features[
6 Deploy your new Smart Gateways
CHAPTER 4 ADVANCED FEATURES
41
oc apply -f tmpcloud1-smartgatewaysyaml
7 Verify that each Smart Gateway is running This can take several minutes depending on thenumber of Smart Gateways
oc get po -l app=smart-gateway
4521 Example manifests
IMPORTANT
The content in the following examples might be different to the file content in yourdeployment Copy the manifests in your deployment
Ensure that the name and amqpUrl parameters of each Smart Gateway match the names that youwant to use for your clouds For more information see Section 451 ldquoPlanning AMQP addressprefixesrdquo
NOTE
Your output may have some additional metadata parameters that you can removefrom the manifests you that load into OCP
apiVersion smartgatewayinfrawatchv2alpha1kind SmartGatewaymetadata name stf-default-ceilometer-notification-cloud1 1spec amqpDataSource ceilometer amqpUrl stf-default-interconnectservice-telemetrysvcclusterlocal5672anycastceilometercloud1-eventsample 2 debug false elasticPass fkzfhghw elasticUrl httpselasticsearch-es-httpservice-telemetrysvcclusterlocal9200 elasticUser elastic resetIndex false serviceType events size 1 tlsCaCert configcertscacrt tlsClientCert configcertstlscrt tlsClientKey configcertstlskey tlsServerName elasticsearch-es-httpservice-telemetrysvcclusterlocal useBasicAuth true useTls true---apiVersion smartgatewayinfrawatchv2alpha1kind SmartGatewaymetadata name stf-default-collectd-notification-cloud1 3spec amqpDataSource collectd amqpUrl stf-default-interconnectservice-telemetrysvcclusterlocal5672collectdcloud1-
Red Hat OpenStack Platform 160 Service Telemetry Framework
42
1
2
3
4
5
6
Name for Ceilometer notifications for cloud1
AMQP Address for Ceilometer notifications for cloud1
Name for collectd telemetry for cloud1
AMQP Address for collectd telemetry for cloud1
Name for collectd notifications for cloud1
AMQP Address for collectd notifications for cloud1
453 Creating the OpenStack environment file
To label traffic according to the cloud of origin you must create a configuration with cloud-specificinstance names Create an stf-connectorsyaml file and adjust the values of CeilometerQdrEventsConfig and CollectdAmqpInstances to match the AMQP address prefixscheme For more information see Section 451 ldquoPlanning AMQP address prefixesrdquo
notify 4 debug false elasticPass fkzfhghw elasticUrl httpselasticsearch-es-httpservice-telemetrysvcclusterlocal9200 elasticUser elastic resetIndex false serviceType events size 1 tlsCaCert configcertscacrt tlsClientCert configcertstlscrt tlsClientKey configcertstlskey tlsServerName elasticsearch-es-httpservice-telemetrysvcclusterlocal useBasicAuth true useTls true---apiVersion smartgatewayinfrawatchv2alpha1kind SmartGatewaymetadata name stf-default-collectd-telemetry-cloud1 5spec amqpUrl stf-default-interconnectservice-telemetrysvcclusterlocal5672collectdcloud1-telemetry 6 debug false prefetch 15000 serviceType metrics size 1 useTimestamp true
CHAPTER 4 ADVANCED FEATURES
43
WARNING
Remove enable-stfyaml and ceilometer-write-qdryaml environment filereferences from your overcloud deployment This configuration is redundantand results in duplicate information being sent from each cloud node
Procedure
1 Create the stf-connectorsyaml file and modify it to match the AMQP address that youwant for this cloud deployment
resource_registry OSTripleOServicesCollectd usrshareopenstack-tripleo-heat-templatesdeploymentmetricscollectd-container-puppetyaml OSTripleOServicesMetricsQdr usrshareopenstack-tripleo-heat-templatesdeploymentmetricsqdr-container-puppetyaml OSTripleOServicesCeilometerAgentCentral usrshareopenstack-tripleo-heat-templatesdeploymentceilometerceilometer-agent-central-container-puppetyaml OSTripleOServicesCeilometerAgentNotification usrshareopenstack-tripleo-heat-templatesdeploymentceilometerceilometer-agent-notification-container-puppetyaml OSTripleOServicesCeilometerAgentIpmi usrshareopenstack-tripleo-heat-templatesdeploymentceilometerceilometer-agent-ipmi-container-puppetyaml OSTripleOServicesComputeCeilometerAgent usrshareopenstack-tripleo-heat-templatesdeploymentceilometerceilometer-agent-compute-container-puppetyaml OSTripleOServicesRedis usrshareopenstack-tripleo-heat-templatesdeploymentdatabaseredis-pacemaker-puppetyaml
parameter_defaults EnableSTF true
EventPipelinePublishers [] CeilometerEnablePanko false CeilometerQdrPublishEvents true CeilometerQdrEventsConfig driver amqp topic cloud1-event 1
CollectdConnectionType amqp1 CollectdAmqpInterval 5 CollectdDefaultPollingInterval 5
CollectdAmqpInstances cloud1-notify 2 notify true format JSON presettle false cloud1-telemetry 3 format JSON presettle true
MetricsQdrAddresses
Red Hat OpenStack Platform 160 Service Telemetry Framework
44
+ lt1gt Define the topic for Ceilometer events This value is the address format of anycastceilometercloud1-eventsample lt2gt Define the topic for collectd events This value is theformat of collectdcloud1-notify lt3gt Define the topic for collectd metrics This value is the formatof collectdcloud1-telemetry lt4gt Adjust the MetricsQdrConnectors host to the address of the STFroute
1 Ensure that the naming convention in the stf-connectorsyaml file aligns with the specamqpUrl field in the Smart Gateway configuration For example configure the CeilometerQdrEventsConfigtopic field to a value of cloud1-event
2 Save the file in a directory for custom environment files for example homestackcustom_templates
3 Source the authentication file
4 Include the stf-connectorsyaml file in the overcloud deployment command along with anyother environment files relevant to your environment
Additional resourcesFor information about validating the deployment see Section 333 ldquoValidating client-sideinstallationrdquo
454 Querying metrics data from multiple clouds
Data stored in Prometheus has a service label attached according to the Smart Gateway it wasscraped from You can use this label to query data from a specific cloud
To query data from a specific cloud use a Prometheus promql query that matches the associatedservice label for example collectd_uptimeservice=stf-default-collectd-telemetry-cloud1-smartgateway
- prefix collectd distribution multicast - prefix anycastceilometer distribution multicast
MetricsQdrSSLProfiles - name sslProfile
MetricsQdrConnectors - host stf-default-interconnect-5671-service-telemetryappsinfrawatch 4 port 443 role edge verifyHostname false sslProfile sslProfile
[stackundercloud-0 ~]$ source stackrc
(undercloud) [stackundercloud-0 ~]$
(undercloud) [stackundercloud-0 ~]$ openstack overcloud deploy --templates usrshareopenstack-tripleo-heat-templates -e homestackcustom_templatesstf-connectorsyaml
CHAPTER 4 ADVANCED FEATURES
45
46 EPHEMERAL STORAGE
Use ephemeral storage to run Service Telemetry Framework (STF) without persistently storingdata in your Red Hat OpenShift Container Platform (OCP) cluster Ephemeral storage is notrecommended in a production environment due to the volatility of the data in the platform whenoperating correctly and as designed For example restarting a pod or rescheduling the workload toanother node results in the loss of any local data written since the pod started
If you enable ephemeral storage in STF the Service Telemetry Operator does not add the relevant storage sections to the data storage components manifests
461 Configuring ephemeral storage
To configure STF for ephemeral storage add storageEphemeralEnabled true to theServiceTelemetry object in OCP You can add storageEphemeralEnabled true at installation timeor if you already deployed STF complete the following steps
Procedure
1 Log in to Red Hat OpenShift Container Platform
2 Change to the service-telemetry namespace
3 Edit the ServiceTelemetry object
4 Add the storageEphemeralEnabled true parameter to the spec section
5 Save your changes and close the object
oc project service-telemetry
$ oc edit ServiceTelemetry stf-default
spec eventsEnabled true metricsEnabled true storageEphemeralEnabled true
Red Hat OpenStack Platform 160 Service Telemetry Framework
46
APPENDIX A COLLECTD PLUG-INSThis section contains a complete list of collectd plug-ins and configurations forRed Hat OpenStack Platform 160
collectd-aggregation
collectdpluginaggregationaggregators
collectdpluginaggregationinterval
collectd-amqp1
collectd-apache
collectdpluginapacheinstances (ex localhost rArr url rArr httplocalhostmod_statusauto)
collectdpluginapacheinterval
collectd-apcups
collectd-battery
collectdpluginbatteryvalues_percentage
collectdpluginbatteryreport_degraded
collectdpluginbatteryquery_state_fs
collectdpluginbatteryinterval
collectd-ceph
collectdplugincephdaemons
collectdplugincephlongrunavglatency
collectdplugincephconvertspecialmetrictypes
collectd-cgroups
collectdplugincgroupsignore_selected
collectdplugincgroupsinterval
collectd-conntrack
None
collectd-contextswitch
collectdplugincontextswitchinterval
collectd-cpu
APPENDIX A COLLECTD PLUG-INS
47
collectdplugincpureportbystate
collectdplugincpureportbycpu
collectdplugincpuvaluespercentage
collectdplugincpureportnumcpu
collectdplugincpureportgueststate
collectdplugincpusubtractgueststate
collectdplugincpuinterval
collectd-cpufreq
None
collectd-cpusleep
collectd-csv
collectdplugincsvdatadir
collectdplugincsvstorerates
collectdplugincsvinterval
collectd-df
collectdplugindfdevices
collectdplugindffstypes
collectdplugindfignoreselected
collectdplugindfmountpoints
collectdplugindfreportbydevice
collectdplugindfreportinodes
collectdplugindfreportreserved
collectdplugindfvaluesabsolute
collectdplugindfvaluespercentage
collectdplugindfinterval
collectd-disk
collectdplugindiskdisks
collectdplugindiskignoreselected
collectdplugindiskudevnameattr
Red Hat OpenStack Platform 160 Service Telemetry Framework
48
collectdplugindiskinterval
collectd-entropy
collectdpluginentropyinterval
collectd-ethstat
collectdpluginethstatinterfaces
collectdpluginethstatmaps
collectdpluginethstatmappedonly
collectdpluginethstatinterval
collectd-exec
collectdpluginexeccommands
collectdpluginexeccommands_defaults
collectdpluginexecglobals
collectdpluginexecinterval
collectd-fhcount
collectdpluginfhcountvaluesabsolute
collectdpluginfhcountvaluespercentage
collectdpluginfhcountinterval
collectd-filecount
collectdpluginfilecountdirectories
collectdpluginfilecountinterval
collectd-fscache
None
collectd-hddtemp
collectdpluginhddtemphost
collectdpluginhddtempport
collectdpluginhddtempinterval
collectd-hugepages
collectdpluginhugepagesreport_per_node_hp
APPENDIX A COLLECTD PLUG-INS
49
collectdpluginhugepagesreport_root_hp
collectdpluginhugepagesvalues_pages
collectdpluginhugepagesvalues_bytes
collectdpluginhugepagesvalues_percentage
collectdpluginhugepagesinterval
collectd-intel_rdt
collectd-interface
collectdplugininterfaceinterfaces
collectdplugininterfaceignoreselected
collectdplugininterfacereportinactive
Collectdplugininterfaceinterval
collectd-ipc
None
collectd-ipmi
collectdpluginipmiignore_selected
collectdpluginipminotify_sensor_add
collectdpluginipminotify_sensor_remove
collectdpluginipminotify_sensor_not_present
collectdpluginipmisensors
collectdpluginipmiinterval
collectd-irq
collectdpluginirqirqs
collectdpluginirqignoreselected
collectdpluginirqinterval
collectd-load
collectdpluginloadreport_relative
collectdpluginloadinterval
collectd-logfile
collectdpluginlogfilelog_level
Red Hat OpenStack Platform 160 Service Telemetry Framework
50
collectdpluginlogfilelog_level
collectdpluginlogfilelog_file
collectdpluginlogfilelog_timestamp
collectdpluginlogfileprint_severity
collectdpluginlogfileinterval
collectd-madwifi
collectd-mbmon
collectd-md
collectd-memcached
collectdpluginmemcachedinstances
collectdpluginmemcachedinterval
collectd-memory
collectdpluginmemoryvaluesabsolute
collectdpluginmemoryvaluespercentage
collectdpluginmemoryinterval collectd-multimeter
collectd-multimeter
collectd-mysql
collectdpluginmysqlinterval
collectd-netlink
collectdpluginnetlinkinterfaces
collectdpluginnetlinkverboseinterfaces
collectdpluginnetlinkqdiscs
collectdpluginnetlinkclasses
collectdpluginnetlinkfilters
collectdpluginnetlinkignoreselected
collectdpluginnetlinkinterval
collectd-network
collectdpluginnetworktimetolive
collectdpluginnetworkmaxpacketsize
APPENDIX A COLLECTD PLUG-INS
51
collectdpluginnetworkforward
collectdpluginnetworkreportstats
collectdpluginnetworklisteners
collectdpluginnetworkservers
collectdpluginnetworkinterval
collectd-nfs
collectdpluginnfsinterval
collectd-ntpd
collectdpluginntpdhost
collectdpluginntpdport
collectdpluginntpdreverselookups
collectdpluginntpdincludeunitid
collectdpluginntpdinterval
collectd-numa
None
collectd-olsrd
collectd-openvpn
collectdpluginopenvpnstatusfile
collectdpluginopenvpnimprovednamingschema
collectdpluginopenvpncollectcompression
collectdpluginopenvpncollectindividualusers
collectdpluginopenvpncollectusercount
collectdpluginopenvpninterval
collectd-ovs_events
collectdpluginovs_eventsaddress
collectdpluginovs_eventsdispatch
collectdpluginovs_eventsinterfaces
collectdpluginovs_eventssend_notification
Red Hat OpenStack Platform 160 Service Telemetry Framework
52
collectdpluginovs_events$port
collectdpluginovs_eventssocket
collectd-ovs_stats
collectdpluginovs_statsaddress
collectdpluginovs_statsbridges
collectdpluginovs_statsport
collectdpluginovs_statssocket
collectd-ping
collectdpluginpinghosts
collectdpluginpingtimeout
collectdpluginpingttl
collectdpluginpingsource_address
collectdpluginpingdevice
collectdpluginpingmax_missed
collectdpluginpingsize
collectdpluginpinginterval
collectd-powerdns
collectdpluginpowerdnsinterval
collectdpluginpowerdnsservers
collectdpluginpowerdnsrecursors
collectdpluginpowerdnslocal_socket
collectdpluginpowerdnsinterval
collectd-processes
collectdpluginprocessesprocesses
collectdpluginprocessesprocess_matches
collectdpluginprocessescollect_context_switch
collectdpluginprocessescollect_file_descriptor
collectdpluginprocessescollect_memory_maps
collectdpluginpowerdnsinterval
APPENDIX A COLLECTD PLUG-INS
53
collectd-protocols
collectdpluginprotocolsignoreselected
collectdpluginprotocolsvalues
collectd-python
collectd-serial
collectd-smart
collectdpluginsmartdisks
collectdpluginsmartignoreselected
collectdpluginsmartinterval
collectd-snmp_agent
collectd-statsd
collectdpluginstatsdhost
collectdpluginstatsdport
collectdpluginstatsddeletecounters
collectdpluginstatsddeletetimers
collectdpluginstatsddeletegauges
collectdpluginstatsddeletesets
collectdpluginstatsdcountersum
collectdpluginstatsdtimerpercentile
collectdpluginstatsdtimerlower
collectdpluginstatsdtimerupper
collectdpluginstatsdtimersum
collectdpluginstatsdtimercount
collectdpluginstatsdinterval
collectd-swap
collectdpluginswapreportbydevice
collectdpluginswapreportbytes
collectdpluginswapvaluesabsolute
Red Hat OpenStack Platform 160 Service Telemetry Framework
54
collectdpluginswapvaluespercentage
collectdpluginswapreportio
collectdpluginswapinterval
collectd-syslog
collectdpluginsysloglog_level
collectdpluginsyslognotify_level
collectdpluginsysloginterval
collectd-table
collectdplugintabletables
collectdplugintableinterval
collectd-tail
collectdplugintailfiles
collectdplugintailinterval
collectd-tail_csv
collectdplugintail_csvmetrics
collectdplugintail_csvfiles
collectd-tcpconns
collectdplugintcpconnslocalports
collectdplugintcpconnsremoteports
collectdplugintcpconnslistening
collectdplugintcpconnsallportssummary
collectdplugintcpconnsinterval
collectd-ted
collectd-thermal
collectdpluginthermaldevices
collectdpluginthermalignoreselected
collectdpluginthermalinterval
collectd-threshold
APPENDIX A COLLECTD PLUG-INS
55
collectdpluginthresholdtypes
collectdpluginthresholdplugins
collectdpluginthresholdhosts
collectdpluginthresholdinterval
collectd-turbostat
collectdpluginturbostatcore_c_states
collectdpluginturbostatpackage_c_states
collectdpluginturbostatsystem_management_interrupt
collectdpluginturbostatdigital_temperature_sensor
collectdpluginturbostattcc_activation_temp
collectdpluginturbostatrunning_average_power_limit
collectdpluginturbostatlogical_core_names
collectd-unixsock
collectd-uptime
collectdpluginuptimeinterval
collectd-users
collectdpluginusersinterval
collectd-uuid
collectdpluginuuiduuid_file
collectdpluginuuidinterval
collectd-virt
collectdpluginvirtconnection
collectdpluginvirtrefresh_interval
collectdpluginvirtdomain
collectdpluginvirtblock_device
collectdpluginvirtinterface_device
collectdpluginvirtignore_selected
collectdpluginvirthostname_format
Red Hat OpenStack Platform 160 Service Telemetry Framework
56
collectdpluginvirtinterface_format
collectdpluginvirtextra_stats
collectdpluginvirtinterval
collectd-vmem
collectdpluginvmemverbose
collectdpluginvmeminterval
collectd-vserver
collectd-wireless
collectd-write_graphite
collectdpluginwrite_graphitecarbons
collectdpluginwrite_graphitecarbon_defaults
collectdpluginwrite_graphiteglobals
collectd-write_kafka
collectdpluginwrite_kafkakafka_host
collectdpluginwrite_kafkakafka_port
collectdpluginwrite_kafkakafka_hosts
collectdpluginwrite_kafkatopics
collectd-write_log
collectdpluginwrite_logformat
collectd-zfs_arc
None
APPENDIX A COLLECTD PLUG-INS
57
- Table of Contents
- CHAPTER 1 INTRODUCTION TO SERVICE TELEMETRY FRAMEWORK
-
- 11 SERVICE TELEMETRY FRAMEWORK ARCHITECTURE
- 12 INSTALLATION SIZE
-
- CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK
-
- 21 THE CORE COMPONENTS OF STF
- 22 PREPARING YOUR OCP ENVIRONMENT FOR STF
-
- 221 Persistent volumes
-
- 2211 Using ephemeral storage
-
- 222 Resource allocation
- 223 Node tuning operator
-
- 23 DEPLOYING STF TO THE OCP ENVIRONMENT
-
- 231 Deploying STF to the OCP environment with ElasticSearch
- 232 Deploying STF to the OCP environment without ElasticSearch
- 233 Creating a namespace
- 234 Creating an OperatorGroup
- 235 Enabling the OperatorHubio Community Catalog Source
- 236 Enabling Red Hat STF Operator Source
- 237 Subscribing to the AMQ Certificate Manager Operator
- 238 Subscribing to the Elastic Cloud on Kubernetes Operator
- 239 Subscribing to the Service Telemetry Operator
- 2310 Creating a ServiceTelemetry object in OCP
-
- 24 REMOVING STF FROM THE OCP ENVIRONMENT
-
- 241 Deleting the namespace
- 242 Removing the OperatorSource
-
- CHAPTER 3 COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION
-
- 31 CONNECTING RED HAT OPENSTACK PLATFORM TO SERVICE TELEMETRY FRAMEWORK
- 32 DEPLOYING TO NON-STANDARD NETWORK TOPOLOGIES
- 33 CONFIGURING RED HAT OPENSTACK PLATFORM OVERCLOUD FOR SERVICE TELEMETRY FRAMEWORK
-
- 331 Retrieving the AMQ Interconnect route address
- 332 Configuring the STF connection for the overcloud
- 333 Validating client-side installation
-
- CHAPTER 4 ADVANCED FEATURES
-
- 41 CUSTOMIZING THE DEPLOYMENT
-
- 411 Manifest override parameters
- 412 Overriding a managed manifest
-
- Procedure
-
- 42 ALERTS
-
- Additional resources
- 421 Creating an alert rule in Prometheus
-
- Procedure
- Additional resources
-
- 422 Configuring custom alerts
-
- Procedure
- Additional resources
-
- 423 Creating an alert route in Alertmanager
-
- Procedure
- Additional resources
-
- 43 HIGH AVAILABILITY
-
- 431 Configuring high availability
-
- Procedure
-
- 44 DASHBOARDS
-
- 441 Setting up Grafana to host the dashboard
-
- 4411 Viewing and editing queries
-
- 442 The Grafana infrastructure dashboard
-
- 4421 Top panels
- 4422 Networking panels
- 4423 CPU panels
- 4424 Memory panels
- 4425 Diskfile system
-
- 45 CONFIGURING MULTIPLE CLOUDS
-
- 451 Planning AMQP address prefixes
- 452 Deploying Smart Gateways
-
- Procedure
- 4521 Example manifests
-
- 453 Creating the OpenStack environment file
-
- Procedure
- Additional resources
-
- 454 Querying metrics data from multiple clouds
-
- 46 EPHEMERAL STORAGE
-
- 461 Configuring ephemeral storage
-
- APPENDIX A COLLECTD PLUG-INS
-
5 Verify that the configuration was applied to the secret
6 To verify the configuration has been loaded into Alertmanager create a pod with access to curl
7 Run curl against the alertmanager-operated service to retrieve the status and configYAMLcontents and review the supplied configuration matches the configuration loaded intoAlertmanager
8 Verify that the configYAML field contains the expected changes Exit from the pod
9 To clean up the environment delete the curl pod
group_by: ['job']
group_wait: 30s
group_interval: 5m
repeat_interval: 12h
receiver: 'null'
receivers:
- name: 'null'

$ oc get secret alertmanager-stf-default -o go-template='{{index .data "alertmanager.yaml" | base64decode }}'

global:
  resolve_timeout: 10m
route:
  group_by: ['job']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h
  receiver: 'null'
receivers:
- name: 'null'

$ oc run curl --generator=run-pod/v1 --image=radial/busyboxplus:curl -i --tty

[ root@curl:/ ]$ curl alertmanager-operated:9093/api/v1/status

{"status":"success","data":{"configYAML":"global:\n  resolve_timeout: 10m\n  http_config: {}\n  smtp_hello: localhost\n  smtp_require_tls: true\n  pagerduty_url: https://events.pagerduty.com/v2/enqueue\n  hipchat_api_url: https://api.hipchat.com/\n  opsgenie_api_url: https://api.opsgenie.com/\n  wechat_api_url: https://qyapi.weixin.qq.com/cgi-bin/\n  victorops_api_url: https://alert.victorops.com/integrations/generic/20131114/alert/\nroute:\n  receiver: \"null\"\n  group_by:\n  - job\n  group_wait: 30s\n  group_interval: 5m\n  repeat_interval: 12h\nreceivers:\n- name: \"null\"\ntemplates: []\n"}}

[ root@curl:/ ]$ exit

$ oc delete pod curl

pod "curl" deleted
Additional resources

For more information about the Red Hat OpenShift Container Platform secret and the Prometheus operator, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md
4.3. HIGH AVAILABILITY
High availability is the ability of Service Telemetry Framework (STF) to rapidly recover from failures in its component services. Although Red Hat OpenShift Container Platform (OCP) restarts a failed pod if nodes are available to schedule the workload, this recovery process might take more than one minute, during which time events and metrics are lost. A high availability configuration includes multiple copies of STF components, which reduces recovery time to approximately 2 seconds. To protect against failure of an OCP node, deploy STF to an OCP cluster with three or more nodes.
NOTE
STF is not yet a fully fault tolerant system. Delivery of metrics and events during the recovery period is not guaranteed.
Enabling high availability has the following effects:

Two AMQ Interconnect pods run instead of the default one.

Three ElasticSearch pods run instead of the default one.

Recovery time from a lost pod in either of these services reduces to approximately 2 seconds.
4.3.1. Configuring high availability
To configure STF for high availability, add highAvailabilityEnabled: true to the ServiceTelemetry object in OCP. You can set this parameter at installation time or, if you already deployed STF, complete the following steps:
Procedure
1. Log in to Red Hat OpenShift Container Platform.
2. Change to the service-telemetry namespace:
3. Use the oc command to edit the ServiceTelemetry object:
4. Add highAvailabilityEnabled: true to the spec section:
5. Save your changes and close the object.
$ oc project service-telemetry
$ oc edit ServiceTelemetry
spec:
  eventsEnabled: true
  metricsEnabled: true
  highAvailabilityEnabled: true
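You can make the same change non-interactively with oc patch instead of oc edit. This is a minimal sketch, not a step from this procedure, and it assumes that your ServiceTelemetry object is named stf-default:

$ oc patch servicetelemetry stf-default --type=merge --patch '{"spec": {"highAvailabilityEnabled": true}}'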
4.4. DASHBOARDS
Use the third-party application Grafana to visualize system-level metrics gathered by collectd for each individual host node. For more information about configuring collectd, see Section 3.3, "Configuring Red Hat OpenStack Platform overcloud for Service Telemetry Framework".
4.4.1. Setting up Grafana to host the dashboard
Grafana is not included in the default Service Telemetry Framework (STF) deployment, so you must deploy the Grafana Operator from OperatorHub.io.
Procedure
1. Log in to Red Hat OpenShift Container Platform.
2. Change to the service-telemetry namespace:
3. Clone the dashboard repository:
4. Deploy the Grafana operator:
5. To verify that the operator launched successfully, run the oc get csv command. If the value of the PHASE column is Succeeded, the operator launched successfully:
6. Launch a Grafana instance:
7. Verify that the Grafana instance deployed:
8. Create the datasource and dashboard resources:
$ oc project service-telemetry

$ git clone https://github.com/infrawatch/dashboards
$ cd dashboards

$ oc create -f deploy/subscription.yaml

$ oc get csv

NAME                      DISPLAY            VERSION   REPLACES   PHASE
grafana-operator.v3.2.0   Grafana Operator   3.2.0                Succeeded

$ oc create -f deploy/grafana.yaml

$ oc get pod -l app=grafana

NAME                                  READY   STATUS    RESTARTS   AGE
grafana-deployment-7fc7848b56-sbkhv   1/1     Running   0          1m
9. Verify that the resources installed correctly:
10. Navigate to https://<grafana-route-address> in a web browser. Use the oc get routes command to retrieve the Grafana route address:
11. To view the dashboard, click Dashboards and Manage.
Additional resources
For more information about enabling the OperatorHub.io catalog source, see Section 2.3.5, "Enabling the OperatorHub.io Community Catalog Source".
4.4.1.1. Viewing and editing queries
Procedure
1. Log in to Red Hat OpenShift Container Platform. To view and edit queries, log in as the admin user.
2. Change to the service-telemetry namespace:
3. To retrieve the default username and password, describe the Grafana object using oc describe:
$ oc create -f deploy/datasource.yaml -f deploy/rhos-dashboard.yaml

$ oc get grafanadashboards

NAME             AGE
rhos-dashboard   7d21h

$ oc get grafanadatasources

NAME                                  AGE
service-telemetry-grafanadatasource   1m

$ oc get routes

$ oc project service-telemetry

$ oc describe grafana service-telemetry-grafana
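As a convenience, you can also read the default credentials with a JSONPath query instead of scanning the full oc describe output. This sketch assumes that the Grafana custom resource exposes them under spec.config.security, which is the layout used by Grafana Operator v3; verify the field names in your deployment first:

$ oc get grafana service-telemetry-grafana -o jsonpath='{.spec.config.security.admin_user}{"\n"}{.spec.config.security.admin_password}{"\n"}'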
4.4.2. The Grafana infrastructure dashboard

The infrastructure dashboard shows metrics for a single node at a time. Select a node from the upper left corner of the dashboard.

4.4.2.1. Top panels

Title                   Unit          Description
Current Global Alerts   -             Current alerts fired by Prometheus
Recent Global Alerts    -             Recently fired alerts in 5m time steps
Status Panel            -             Node status: up, down, unavailable
Uptime                  s/m/h/d/M/Y   Total operational time of node
CPU Cores               cores         Total number of cores
Memory                  bytes         Total memory
Disk Size               bytes         Total storage size
Processes               processes     Total number of processes listed by type
Load Average            processes     Load average represents the average number of running and uninterruptible processes residing in the kernel execution queue
4.4.2.2. Networking panels

Panels that display the network interfaces of the node.

Panel                                     Unit       Description
Physical Interfaces Ingress Errors        errors     Total errors with incoming data
Physical Interfaces Egress Errors         errors     Total errors with outgoing data
Physical Interfaces Ingress Error Rates   errors/s   Rate of incoming data errors
Physical Interfaces Egress Error Rates    errors/s   Rate of outgoing data errors
Physical Interfaces Packets Ingress       pps        Incoming packets per second
Physical Interfaces Packets Egress        pps        Outgoing packets per second
Physical Interfaces Data Ingress          bytes/s    Incoming data rates
Physical Interfaces Data Egress           bytes/s    Outgoing data rates
Physical Interfaces Drop Rate Ingress     pps        Incoming packets drop rate
Physical Interfaces Drop Rate Egress      pps        Outgoing packets drop rate
4.4.2.3. CPU panels

Panels that display CPU usage of the node.

Panel                    Unit      Description
Current CPU Usage        percent   Instantaneous usage at the time of the last query
Aggregate CPU Usage      percent   Average non-idle CPU activity of all cores on a node
Aggr CPU Usage by Type   percent   Shows time spent for each type of thread, averaged across all cores
4.4.2.4. Memory panels

Panels that display memory usage on the node.

Panel             Unit        Description
Memory Used       percent     Amount of memory being used at time of last query
Huge Pages Used   hugepages   Number of huge pages being used
Memory
4.4.2.5. Disk/file system

Panels that display space used on disk.

Panel                        Unit      Description
Disk Space Usage             percent   Total disk use at time of last query
Inode Usage                  percent   Total inode use at time of last query
Aggregate Disk Space Usage   bytes     Total disk space used and reserved. Note: because this query relies on the df plugin, temporary file systems that do not necessarily use disk space are included in the results. The query tries to filter out most of these, but it might not be exhaustive.
Disk Traffic                 bytes/s   Shows rates for both reading and writing
Disk Load                    percent   Approximate percentage of total disk bandwidth being used. The weighted I/O time series includes the backlog that might be accumulating. For more information, see the collectd disk plugin docs.
Operations/s                 ops/s     Operations done per second
Average I/O Operation Time   seconds   Average time each I/O operation took to complete. This average is not accurate; see the collectd disk plugin docs.
4.5. CONFIGURING MULTIPLE CLOUDS
You can configure multiple Red Hat OpenStack Platform clouds to target a single instance of Service Telemetry Framework (STF):

1. Plan the AMQP address prefixes that you want to use for each cloud. For more information, see Section 4.5.1, "Planning AMQP address prefixes".
2. Deploy metrics and events consumer Smart Gateways for each cloud to listen on the corresponding address prefixes. For more information, see Section 4.5.2, "Deploying Smart Gateways".
3. Configure each cloud to send its metrics and events to STF on the correct address. For more information, see Section 4.5.3, "Creating the OpenStack environment file".
Figure 4.1. Two Red Hat OpenStack Platform clouds connect to STF
4.5.1. Planning AMQP address prefixes
By default, Red Hat OpenStack Platform nodes get data through two data collectors: collectd and Ceilometer. These components send telemetry data, or notifications, to the respective AMQP addresses, for example, collectd/telemetry, where STF Smart Gateways listen on those addresses for monitoring data.

To support multiple clouds, and to identify which cloud generated the monitoring data, configure each cloud to send data to a unique address. Prefix a cloud identifier to the second part of the address. The following list shows some example addresses and identifiers:
collectd/cloud1-telemetry

collectd/cloud1-notify

anycast/ceilometer/cloud1-event.sample

collectd/cloud2-telemetry

collectd/cloud2-notify

anycast/ceilometer/cloud2-event.sample

collectd/us-east-1-telemetry

collectd/us-west-3-telemetry
4.5.2. Deploying Smart Gateways
You must deploy a Smart Gateway for each of the data collection types for each cloud: one for collectd metrics, one for collectd events, and one for Ceilometer events. Configure each of the Smart Gateways to listen on the AMQP address that you define for the corresponding cloud.

When you deploy STF for the first time, Smart Gateway manifests are created that define the initial Smart Gateways for a single cloud. When deploying Smart Gateways for multiple cloud support, you deploy multiple Smart Gateways for each of the data collection types that handle the metrics and the events data for each cloud. The initial Smart Gateways act as a template to create additional Smart Gateways, along with any authentication information required to connect to the data stores.
Procedure
1. Log in to Red Hat OpenShift Container Platform.
2. Change to the service-telemetry namespace:
$ oc project service-telemetry
3. Use the initially deployed Smart Gateways as a template for additional Smart Gateways. List the currently deployed Smart Gateways with the oc get smartgateways command. For example, if you deployed STF with metricsEnabled: true and eventsEnabled: true, the following Smart Gateways are displayed in the output:
$ oc get smartgateways
NAME                                  AGE
stf-default-ceilometer-notification   14d
stf-default-collectd-notification     14d
stf-default-collectd-telemetry        14d
4. Retrieve the manifests for each Smart Gateway and store the contents in a temporary file, which you can modify later and use to create the new set of Smart Gateways:
$ truncate --size 0 /tmp/cloud1-smartgateways.yaml && for sg in $(oc get smartgateways -oname)
do
  echo "---" >> /tmp/cloud1-smartgateways.yaml
  oc get $sg -oyaml --export >> /tmp/cloud1-smartgateways.yaml
done
5. Modify the Smart Gateway manifests in the /tmp/cloud1-smartgateways.yaml file. Adjust the metadata.name and spec.amqpUrl fields to include the cloud identifier from your schema. For more information, see Section 4.5.1, "Planning AMQP address prefixes". To view example Smart Gateway manifests, see Section 4.5.2.1, "Example manifests".
6. Deploy your new Smart Gateways:
$ oc apply -f /tmp/cloud1-smartgateways.yaml
7. Verify that each Smart Gateway is running. This can take several minutes, depending on the number of Smart Gateways:
$ oc get po -l app=smart-gateway
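If you run this verification from a script, you can block until the pods become ready instead of polling by hand. A minimal sketch; the label selector matches the previous command, and the five-minute timeout is an arbitrary choice:

$ oc wait --for=condition=Ready pod -l app=smart-gateway --timeout=300s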
4.5.2.1. Example manifests
IMPORTANT
The content in the following examples might be different from the file content in your deployment. Copy the manifests in your deployment.
Ensure that the name and amqpUrl parameters of each Smart Gateway match the names that you want to use for your clouds. For more information, see Section 4.5.1, "Planning AMQP address prefixes".
NOTE
Your output might have some additional metadata parameters that you can remove from the manifests that you load into OCP.
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-ceilometer-notification-cloud1 1
spec:
  amqpDataSource: ceilometer
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/anycast/ceilometer/cloud1-event.sample 2
  debug: false
  elasticPass: fkzfhghw
  elasticUrl: https://elasticsearch-es-http.service-telemetry.svc.cluster.local:9200
  elasticUser: elastic
  resetIndex: false
  serviceType: events
  size: 1
  tlsCaCert: /config/certs/ca.crt
  tlsClientCert: /config/certs/tls.crt
  tlsClientKey: /config/certs/tls.key
  tlsServerName: elasticsearch-es-http.service-telemetry.svc.cluster.local
  useBasicAuth: true
  useTls: true
---
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-collectd-notification-cloud1 3
spec:
  amqpDataSource: collectd
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/cloud1-notify 4
  debug: false
  elasticPass: fkzfhghw
  elasticUrl: https://elasticsearch-es-http.service-telemetry.svc.cluster.local:9200
  elasticUser: elastic
  resetIndex: false
  serviceType: events
  size: 1
  tlsCaCert: /config/certs/ca.crt
  tlsClientCert: /config/certs/tls.crt
  tlsClientKey: /config/certs/tls.key
  tlsServerName: elasticsearch-es-http.service-telemetry.svc.cluster.local
  useBasicAuth: true
  useTls: true
---
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-collectd-telemetry-cloud1 5
spec:
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/cloud1-telemetry 6
  debug: false
  prefetch: 15000
  serviceType: metrics
  size: 1
  useTimestamp: true

1 Name for Ceilometer notifications for cloud1
2 AMQP address for Ceilometer notifications for cloud1
3 Name for collectd notifications for cloud1
4 AMQP address for collectd notifications for cloud1
5 Name for collectd telemetry for cloud1
6 AMQP address for collectd telemetry for cloud1

4.5.3. Creating the OpenStack environment file

To label traffic according to the cloud of origin, you must create a configuration with cloud-specific instance names. Create an stf-connectors.yaml file and adjust the values of CeilometerQdrEventsConfig and CollectdAmqpInstances to match the AMQP address prefix scheme. For more information, see Section 4.5.1, "Planning AMQP address prefixes".
WARNING

Remove enable-stf.yaml and ceilometer-write-qdr.yaml environment file references from your overcloud deployment. This configuration is redundant and results in duplicate information being sent from each cloud node.
Procedure
1. Create the stf-connectors.yaml file and modify it to match the AMQP address that you want for this cloud deployment:
resource_registry:
  OS::TripleO::Services::Collectd: /usr/share/openstack-tripleo-heat-templates/deployment/metrics/collectd-container-puppet.yaml
  OS::TripleO::Services::MetricsQdr: /usr/share/openstack-tripleo-heat-templates/deployment/metrics/qdr-container-puppet.yaml
  OS::TripleO::Services::CeilometerAgentCentral: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-central-container-puppet.yaml
  OS::TripleO::Services::CeilometerAgentNotification: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-notification-container-puppet.yaml
  OS::TripleO::Services::CeilometerAgentIpmi: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-ipmi-container-puppet.yaml
  OS::TripleO::Services::ComputeCeilometerAgent: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-compute-container-puppet.yaml
  OS::TripleO::Services::Redis: /usr/share/openstack-tripleo-heat-templates/deployment/database/redis-pacemaker-puppet.yaml

parameter_defaults:
  EnableSTF: true

  EventPipelinePublishers: []
  CeilometerEnablePanko: false
  CeilometerQdrPublishEvents: true
  CeilometerQdrEventsConfig:
    driver: amqp
    topic: cloud1-event 1

  CollectdConnectionType: amqp1
  CollectdAmqpInterval: 5
  CollectdDefaultPollingInterval: 5

  CollectdAmqpInstances:
    cloud1-notify: 2
      notify: true
      format: JSON
      presettle: false
    cloud1-telemetry: 3
      format: JSON
      presettle: true

  MetricsQdrAddresses:
  - prefix: collectd
    distribution: multicast
  - prefix: anycast/ceilometer
    distribution: multicast

  MetricsQdrSSLProfiles:
  - name: sslProfile

  MetricsQdrConnectors:
  - host: stf-default-interconnect-5671-service-telemetry.apps.infra.watch 4
    port: 443
    role: edge
    verifyHostname: false
    sslProfile: sslProfile

1 Define the topic for Ceilometer events. This value is the address format of anycast/ceilometer/cloud1-event.sample.
2 Define the topic for collectd events. This value is the format of collectd/cloud1-notify.
3 Define the topic for collectd metrics. This value is the format of collectd/cloud1-telemetry.
4 Adjust the MetricsQdrConnectors host to the address of the STF route.

2. Ensure that the naming convention in the stf-connectors.yaml file aligns with the spec.amqpUrl field in the Smart Gateway configuration. For example, configure the CeilometerQdrEventsConfig.topic field to a value of cloud1-event.
3. Save the file in a directory for custom environment files, for example /home/stack/custom_templates/.
4. Source the authentication file:

[stack@undercloud-0 ~]$ source stackrc

(undercloud) [stack@undercloud-0 ~]$

5. Include the stf-connectors.yaml file in the overcloud deployment command, along with any other environment files relevant to your environment:

(undercloud) [stack@undercloud-0 ~]$ openstack overcloud deploy --templates /usr/share/openstack-tripleo-heat-templates -e /home/stack/custom_templates/stf-connectors.yaml

Additional resources

For information about validating the deployment, see Section 3.3.3, "Validating client-side installation".

4.5.4. Querying metrics data from multiple clouds

Data stored in Prometheus has a service label attached according to the Smart Gateway it was scraped from. You can use this label to query data from a specific cloud.

To query data from a specific cloud, use a Prometheus PromQL query that matches the associated service label, for example: collectd_uptime{service="stf-default-collectd-telemetry-cloud1-smartgateway"}.
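You can also run the same query from the command line through the Prometheus HTTP API. A sketch, assuming that you retrieve <prometheus-route-address> with oc get routes and that the route terminates TLS:

$ curl -k 'https://<prometheus-route-address>/api/v1/query' --data-urlencode 'query=collectd_uptime{service="stf-default-collectd-telemetry-cloud1-smartgateway"}'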
4.6. EPHEMERAL STORAGE
Use ephemeral storage to run Service Telemetry Framework (STF) without persistently storing data in your Red Hat OpenShift Container Platform (OCP) cluster. Ephemeral storage is not recommended in a production environment due to the volatility of the data in the platform, even when operating correctly and as designed. For example, restarting a pod or rescheduling the workload to another node results in the loss of any local data written since the pod started.

If you enable ephemeral storage in STF, the Service Telemetry Operator does not add the relevant storage sections to the data storage components' manifests.
4.6.1. Configuring ephemeral storage
To configure STF for ephemeral storage, add storageEphemeralEnabled: true to the ServiceTelemetry object in OCP. You can add storageEphemeralEnabled: true at installation time or, if you already deployed STF, complete the following steps:
Procedure
1. Log in to Red Hat OpenShift Container Platform.
2. Change to the service-telemetry namespace:
3. Edit the ServiceTelemetry object:
4. Add the storageEphemeralEnabled: true parameter to the spec section:
5. Save your changes and close the object.
$ oc project service-telemetry
$ oc edit ServiceTelemetry stf-default
spec:
  eventsEnabled: true
  metricsEnabled: true
  storageEphemeralEnabled: true
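After the Service Telemetry Operator reconciles the change, you can confirm that the data storage components no longer request persistent storage by listing the persistent volume claims in the namespace; a quick check, assuming the default service-telemetry namespace:

$ oc get pvc -n service-telemetry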
APPENDIX A. COLLECTD PLUG-INS

This section contains a complete list of collectd plug-ins and configurations for Red Hat OpenStack Platform 16.0.
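The entries that follow are hieradata keys that director passes to the collectd puppet module. As an illustration only, a hedged sketch of setting two of these parameters through ExtraConfig in a custom environment file; confirm the exact override mechanism for your release before using it:

parameter_defaults:
  ExtraConfig:
    collectd::plugin::cpu::interval: 10
    collectd::plugin::df::fstypes: ['xfs']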
collectd-aggregation
collectd::plugin::aggregation::aggregators
collectd::plugin::aggregation::interval

collectd-amqp1

collectd-apache
collectd::plugin::apache::instances (ex: localhost => url => "http://localhost/mod_status?auto")
collectd::plugin::apache::interval

collectd-apcups

collectd-battery
collectd::plugin::battery::values_percentage
collectd::plugin::battery::report_degraded
collectd::plugin::battery::query_state_fs
collectd::plugin::battery::interval

collectd-ceph
collectd::plugin::ceph::daemons
collectd::plugin::ceph::longrunavglatency
collectd::plugin::ceph::convertspecialmetrictypes

collectd-cgroups
collectd::plugin::cgroups::ignore_selected
collectd::plugin::cgroups::interval

collectd-conntrack
None

collectd-contextswitch
collectd::plugin::contextswitch::interval

collectd-cpu
collectd::plugin::cpu::reportbystate
collectd::plugin::cpu::reportbycpu
collectd::plugin::cpu::valuespercentage
collectd::plugin::cpu::reportnumcpu
collectd::plugin::cpu::reportgueststate
collectd::plugin::cpu::subtractgueststate
collectd::plugin::cpu::interval

collectd-cpufreq
None

collectd-cpusleep

collectd-csv
collectd::plugin::csv::datadir
collectd::plugin::csv::storerates
collectd::plugin::csv::interval

collectd-df
collectd::plugin::df::devices
collectd::plugin::df::fstypes
collectd::plugin::df::ignoreselected
collectd::plugin::df::mountpoints
collectd::plugin::df::reportbydevice
collectd::plugin::df::reportinodes
collectd::plugin::df::reportreserved
collectd::plugin::df::valuesabsolute
collectd::plugin::df::valuespercentage
collectd::plugin::df::interval

collectd-disk
collectd::plugin::disk::disks
collectd::plugin::disk::ignoreselected
collectd::plugin::disk::udevnameattr
collectd::plugin::disk::interval

collectd-entropy
collectd::plugin::entropy::interval

collectd-ethstat
collectd::plugin::ethstat::interfaces
collectd::plugin::ethstat::maps
collectd::plugin::ethstat::mappedonly
collectd::plugin::ethstat::interval

collectd-exec
collectd::plugin::exec::commands
collectd::plugin::exec::commands_defaults
collectd::plugin::exec::globals
collectd::plugin::exec::interval

collectd-fhcount
collectd::plugin::fhcount::valuesabsolute
collectd::plugin::fhcount::valuespercentage
collectd::plugin::fhcount::interval

collectd-filecount
collectd::plugin::filecount::directories
collectd::plugin::filecount::interval

collectd-fscache
None

collectd-hddtemp
collectd::plugin::hddtemp::host
collectd::plugin::hddtemp::port
collectd::plugin::hddtemp::interval

collectd-hugepages
collectd::plugin::hugepages::report_per_node_hp
collectd::plugin::hugepages::report_root_hp
collectd::plugin::hugepages::values_pages
collectd::plugin::hugepages::values_bytes
collectd::plugin::hugepages::values_percentage
collectd::plugin::hugepages::interval

collectd-intel_rdt

collectd-interface
collectd::plugin::interface::interfaces
collectd::plugin::interface::ignoreselected
collectd::plugin::interface::reportinactive
collectd::plugin::interface::interval

collectd-ipc
None

collectd-ipmi
collectd::plugin::ipmi::ignore_selected
collectd::plugin::ipmi::notify_sensor_add
collectd::plugin::ipmi::notify_sensor_remove
collectd::plugin::ipmi::notify_sensor_not_present
collectd::plugin::ipmi::sensors
collectd::plugin::ipmi::interval

collectd-irq
collectd::plugin::irq::irqs
collectd::plugin::irq::ignoreselected
collectd::plugin::irq::interval

collectd-load
collectd::plugin::load::report_relative
collectd::plugin::load::interval

collectd-logfile
collectd::plugin::logfile::log_level
collectd::plugin::logfile::log_file
collectd::plugin::logfile::log_timestamp
collectd::plugin::logfile::print_severity
collectd::plugin::logfile::interval

collectd-madwifi

collectd-mbmon

collectd-md

collectd-memcached
collectd::plugin::memcached::instances
collectd::plugin::memcached::interval

collectd-memory
collectd::plugin::memory::valuesabsolute
collectd::plugin::memory::valuespercentage
collectd::plugin::memory::interval

collectd-multimeter

collectd-mysql
collectd::plugin::mysql::interval

collectd-netlink
collectd::plugin::netlink::interfaces
collectd::plugin::netlink::verboseinterfaces
collectd::plugin::netlink::qdiscs
collectd::plugin::netlink::classes
collectd::plugin::netlink::filters
collectd::plugin::netlink::ignoreselected
collectd::plugin::netlink::interval

collectd-network
collectd::plugin::network::timetolive
collectd::plugin::network::maxpacketsize
collectd::plugin::network::forward
collectd::plugin::network::reportstats
collectd::plugin::network::listeners
collectd::plugin::network::servers
collectd::plugin::network::interval

collectd-nfs
collectd::plugin::nfs::interval

collectd-ntpd
collectd::plugin::ntpd::host
collectd::plugin::ntpd::port
collectd::plugin::ntpd::reverselookups
collectd::plugin::ntpd::includeunitid
collectd::plugin::ntpd::interval

collectd-numa
None

collectd-olsrd

collectd-openvpn
collectd::plugin::openvpn::statusfile
collectd::plugin::openvpn::improvednamingschema
collectd::plugin::openvpn::collectcompression
collectd::plugin::openvpn::collectindividualusers
collectd::plugin::openvpn::collectusercount
collectd::plugin::openvpn::interval

collectd-ovs_events
collectd::plugin::ovs_events::address
collectd::plugin::ovs_events::dispatch
collectd::plugin::ovs_events::interfaces
collectd::plugin::ovs_events::send_notification
collectd::plugin::ovs_events::$port
collectd::plugin::ovs_events::socket

collectd-ovs_stats
collectd::plugin::ovs_stats::address
collectd::plugin::ovs_stats::bridges
collectd::plugin::ovs_stats::port
collectd::plugin::ovs_stats::socket

collectd-ping
collectd::plugin::ping::hosts
collectd::plugin::ping::timeout
collectd::plugin::ping::ttl
collectd::plugin::ping::source_address
collectd::plugin::ping::device
collectd::plugin::ping::max_missed
collectd::plugin::ping::size
collectd::plugin::ping::interval

collectd-powerdns
collectd::plugin::powerdns::interval
collectd::plugin::powerdns::servers
collectd::plugin::powerdns::recursors
collectd::plugin::powerdns::local_socket
collectd::plugin::powerdns::interval

collectd-processes
collectd::plugin::processes::processes
collectd::plugin::processes::process_matches
collectd::plugin::processes::collect_context_switch
collectd::plugin::processes::collect_file_descriptor
collectd::plugin::processes::collect_memory_maps
collectd::plugin::powerdns::interval
collectd-protocols
collectd::plugin::protocols::ignoreselected
collectd::plugin::protocols::values

collectd-python

collectd-serial

collectd-smart
collectd::plugin::smart::disks
collectd::plugin::smart::ignoreselected
collectd::plugin::smart::interval

collectd-snmp_agent

collectd-statsd
collectd::plugin::statsd::host
collectd::plugin::statsd::port
collectd::plugin::statsd::deletecounters
collectd::plugin::statsd::deletetimers
collectd::plugin::statsd::deletegauges
collectd::plugin::statsd::deletesets
collectd::plugin::statsd::countersum
collectd::plugin::statsd::timerpercentile
collectd::plugin::statsd::timerlower
collectd::plugin::statsd::timerupper
collectd::plugin::statsd::timersum
collectd::plugin::statsd::timercount
collectd::plugin::statsd::interval

collectd-swap
collectd::plugin::swap::reportbydevice
collectd::plugin::swap::reportbytes
collectd::plugin::swap::valuesabsolute
collectd::plugin::swap::valuespercentage
collectd::plugin::swap::reportio
collectd::plugin::swap::interval

collectd-syslog
collectd::plugin::syslog::log_level
collectd::plugin::syslog::notify_level
collectd::plugin::syslog::interval

collectd-table
collectd::plugin::table::tables
collectd::plugin::table::interval

collectd-tail
collectd::plugin::tail::files
collectd::plugin::tail::interval

collectd-tail_csv
collectd::plugin::tail_csv::metrics
collectd::plugin::tail_csv::files

collectd-tcpconns
collectd::plugin::tcpconns::localports
collectd::plugin::tcpconns::remoteports
collectd::plugin::tcpconns::listening
collectd::plugin::tcpconns::allportssummary
collectd::plugin::tcpconns::interval

collectd-ted

collectd-thermal
collectd::plugin::thermal::devices
collectd::plugin::thermal::ignoreselected
collectd::plugin::thermal::interval

collectd-threshold
collectd::plugin::threshold::types
collectd::plugin::threshold::plugins
collectd::plugin::threshold::hosts
collectd::plugin::threshold::interval

collectd-turbostat
collectd::plugin::turbostat::core_c_states
collectd::plugin::turbostat::package_c_states
collectd::plugin::turbostat::system_management_interrupt
collectd::plugin::turbostat::digital_temperature_sensor
collectd::plugin::turbostat::tcc_activation_temp
collectd::plugin::turbostat::running_average_power_limit
collectd::plugin::turbostat::logical_core_names

collectd-unixsock

collectd-uptime
collectd::plugin::uptime::interval

collectd-users
collectd::plugin::users::interval

collectd-uuid
collectd::plugin::uuid::uuid_file
collectd::plugin::uuid::interval

collectd-virt
collectd::plugin::virt::connection
collectd::plugin::virt::refresh_interval
collectd::plugin::virt::domain
collectd::plugin::virt::block_device
collectd::plugin::virt::interface_device
collectd::plugin::virt::ignore_selected
collectd::plugin::virt::hostname_format
collectd::plugin::virt::interface_format
collectd::plugin::virt::extra_stats
collectd::plugin::virt::interval

collectd-vmem
collectd::plugin::vmem::verbose
collectd::plugin::vmem::interval

collectd-vserver

collectd-wireless

collectd-write_graphite
collectd::plugin::write_graphite::carbons
collectd::plugin::write_graphite::carbon_defaults
collectd::plugin::write_graphite::globals

collectd-write_kafka
collectd::plugin::write_kafka::kafka_host
collectd::plugin::write_kafka::kafka_port
collectd::plugin::write_kafka::kafka_hosts
collectd::plugin::write_kafka::topics

collectd-write_log
collectd::plugin::write_log::format

collectd-zfs_arc
None
- Table of Contents
- CHAPTER 1 INTRODUCTION TO SERVICE TELEMETRY FRAMEWORK
-
- 11 SERVICE TELEMETRY FRAMEWORK ARCHITECTURE
- 12 INSTALLATION SIZE
-
- CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK
-
- 21 THE CORE COMPONENTS OF STF
- 22 PREPARING YOUR OCP ENVIRONMENT FOR STF
-
- 221 Persistent volumes
-
- 2211 Using ephemeral storage
-
- 222 Resource allocation
- 223 Node tuning operator
-
- 23 DEPLOYING STF TO THE OCP ENVIRONMENT
-
- 231 Deploying STF to the OCP environment with ElasticSearch
- 232 Deploying STF to the OCP environment without ElasticSearch
- 233 Creating a namespace
- 234 Creating an OperatorGroup
- 235 Enabling the OperatorHubio Community Catalog Source
- 236 Enabling Red Hat STF Operator Source
- 237 Subscribing to the AMQ Certificate Manager Operator
- 238 Subscribing to the Elastic Cloud on Kubernetes Operator
- 239 Subscribing to the Service Telemetry Operator
- 2310 Creating a ServiceTelemetry object in OCP
-
- 24 REMOVING STF FROM THE OCP ENVIRONMENT
-
- 241 Deleting the namespace
- 242 Removing the OperatorSource
-
- CHAPTER 3 COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION
-
- 31 CONNECTING RED HAT OPENSTACK PLATFORM TO SERVICE TELEMETRY FRAMEWORK
- 32 DEPLOYING TO NON-STANDARD NETWORK TOPOLOGIES
- 33 CONFIGURING RED HAT OPENSTACK PLATFORM OVERCLOUD FOR SERVICE TELEMETRY FRAMEWORK
-
- 331 Retrieving the AMQ Interconnect route address
- 332 Configuring the STF connection for the overcloud
- 333 Validating client-side installation
-
- CHAPTER 4 ADVANCED FEATURES
-
- 41 CUSTOMIZING THE DEPLOYMENT
-
- 411 Manifest override parameters
- 412 Overriding a managed manifest
-
- Procedure
-
- 42 ALERTS
-
- Additional resources
- 421 Creating an alert rule in Prometheus
-
- Procedure
- Additional resources
-
- 422 Configuring custom alerts
-
- Procedure
- Additional resources
-
- 423 Creating an alert route in Alertmanager
-
- Procedure
- Additional resources
-
- 43 HIGH AVAILABILITY
-
- 431 Configuring high availability
-
- Procedure
-
- 44 DASHBOARDS
-
- 441 Setting up Grafana to host the dashboard
-
- 4411 Viewing and editing queries
-
- 442 The Grafana infrastructure dashboard
-
- 4421 Top panels
- 4422 Networking panels
- 4423 CPU panels
- 4424 Memory panels
- 4425 Diskfile system
-
- 45 CONFIGURING MULTIPLE CLOUDS
-
- 451 Planning AMQP address prefixes
- 452 Deploying Smart Gateways
-
- Procedure
- 4521 Example manifests
-
- 453 Creating the OpenStack environment file
-
- Procedure
- Additional resources
-
- 454 Querying metrics data from multiple clouds
-
- 46 EPHEMERAL STORAGE
-
- 461 Configuring ephemeral storage
-
- APPENDIX A COLLECTD PLUG-INS
-
Additional resourcesFor more information about the Red Hat OpenShift Container Platform secret and the Prometheusoperator see httpsgithubcomcoreosprometheus-operatorblobmasterDocumentationuser-guidesalertingmd
43 HIGH AVAILABILITY
High availability is the ability of Service Telemetry Framework (STF) to rapidly recover fromfailures in its component services Although Red Hat OpenShift Container Platform (OCP) restartsa failed pod if nodes are available to schedule the workload this recovery process might take morethan one minute during which time events and metrics are lost A high availability configurationincludes multiple copies of STF components reducing recovery time to approximately 2 secondsTo protect against failure of an OCP node deploy STF to an OCP cluster with three or more nodes
NOTE
STF is not yet a fully fault tolerant system Delivery of metrics and events during therecovery period is not guaranteed
Enabling high availability has the following effects
Two AMQ Interconnect pods run instead of the default 1
Three ElasticSearch pods run instead of the default 1
Recovery time from a lost pod in either of these services reduces to approximately 2seconds
431 Configuring high availability
To configure STF for high availability add highAvailabilityEnabled true to the ServiceTelemetryobject in OCP You can this set this parameter at installation time or if you already deployed STFcomplete the following steps
Procedure
1 Log in to Red Hat OpenShift Container Platform
2 Change to the service-telemetry namespace
3 Use the oc command to edit the ServiceTelemetry object
4 Add highAvailabilityEnabled true to the spec section
5 Save your changes and close the object
oc project service-telemetry
$ oc edit ServiceTelemetry
spec eventsEnabled true metricsEnabled true highAvailabilityEnabled true
Red Hat OpenStack Platform 160 Service Telemetry Framework
34
44 DASHBOARDS
Use third-party application Grafana to visualize system-level metrics gathered by collectd for eachindividual host node For more information about configuring collectd see Section 33ldquoConfiguring Red Hat OpenStack Platform overcloud for Service Telemetry Frameworkrdquo
441 Setting up Grafana to host the dashboard
Grafana is not included in the default Service Telemetry Framework (STF) deployment so you mustdeploy the Grafana Operator from OperatorHubio
Procedure
1 Log in to Red Hat OpenShift Container Platform
2 Change to the service-telemetry namespace
3 Clone the dashboard repository
4 Deploy the Grafana operator
5 To verify that the operator launched successfully run the oc get csv command If the valueof the PHASE column is Succeeded the operator launched successfully
6 Launch a Grafana instance
7 Verify that the Grafana instance deployed
8 Create the datasource and dashboard resources
oc project service-telemetry
git clone httpsgithubcominfrawatchdashboardscd dashboards
oc create -f deploysubscriptionyaml
$ oc get csv
NAME DISPLAY VERSION REPLACES PHASEgrafana-operatorv320 Grafana Operator 320 Succeeded
$ oc create -f deploygrafanayaml
$ oc get pod -l app=grafana
NAME READY STATUS RESTARTS AGEgrafana-deployment-7fc7848b56-sbkhv 11 Running 0 1m
CHAPTER 4 ADVANCED FEATURES
35
9 Verify that the resources installed correctly
10 Navigate to httpsltgrafana-route-addressgt in a web browser Use the oc get routescommand to retrieve the Grafana route address
11 To view the dashboard click Dashboards and Manage
Additional resources
For more information about enabling the OperatorHubio catalog source see Section 235ldquoEnabling the OperatorHubio Community Catalog Sourcerdquo
4411 Viewing and editing queries
Procedure
1 Log in to Red Hat OpenShift Container Platform To view and edit queries log in as the admin user
2 Change to the service-telemetry namespace
3 To retrieve the default username and password describe the Grafana object using oc describe
442 The Grafana infrastructure dashboard
The infrastructure dashboard shows metrics for a single node at a time Select a node from theupper left corner of the dashboard
4421 Top panels
Title Unit Description
oc create -f deploydatasourceyaml -f deployrhos-dashboardyaml
$ oc get grafanadashboards
NAME AGErhos-dashboard 7d21h
$ oc get grafanadatasourcesNAME AGEservice-telemetry-grafanadatasource 1m
oc get routes
oc project service-telemetry
oc describe grafana service-telemetry-grafana
Red Hat OpenStack Platform 160 Service Telemetry Framework
36
Current Global Alerts - Current alerts fired byPrometheus
Recent Global Alerts - Recently fired alerts in 5m timesteps
Status Panel - Node status up downunavailable
Uptime smhdMY Total operational time of node
CPU Cores cores Total number of cores
Memory bytes Total memory
Disk Size bytes Total storage size
Processes processes Total number of processes listedby type
Load Average processes Load average represents theaverage number of running anduninterruptible processes residingin the kernel execution queue
4422 Networking panels
Panels that display the network interfaces of the node
Panel Unit Description
Physical Interfaces Ingress Errors errors Total errors with incoming data
Physical Interfaces Egress Errors errors Total errors with outgoing data
Physical Interfaces Ingress ErrorRates
errorss Rate of incoming data errors
Physical Interfaces egress ErrorRates
errorss Rate of outgoing data errors
Physical Interfaces PacketsIngress pps Incoming packets persecond
Physical Interfaces PacketsEgress
pps
Outgoing packets per second Physical Interfaces Data Ingress bytess
CHAPTER 4 ADVANCED FEATURES
37
Incoming data rates Physical Interfaces Data Egress bytess
Outgoing data rates Physical Interfaces Drop RateIngress
pps
Incoming packets drop rate Physical Interfaces Drop RateEgress
pps
4423 CPU panels
Panels that display CPU usage of the node
Panel Unit Description
Current CPU Usage percent Instantaneous usage at the timeof the last query
Aggregate CPU Usage percent Average non-idle CPU activity ofall cores on a node
Aggr CPU Usage by Type percent Shows time spent for each type ofthread averaged across all cores
4424 Memory panels
Panels that display memory usage on the node
Panel Unit Description
Memory Used percent Amount of memory being used attime of last query
Huge Pages Used hugepages Number of hugepages beingused
Memory
4425 Diskfile system
Panels that display space used on disk
Panel Unit Description Notes
Disk Space Usage percent Total disk use at time oflast query
Red Hat OpenStack Platform 160 Service Telemetry Framework
38
Inode Usage percent Total inode use at timeof last query
Aggregate Disk SpaceUsage
bytes Total disk space usedand reserved
Because this query relieson the df plugintemporary file systemsthat do not necessarilyuse disk space areincluded in the resultsThe query tries to filterout most of these but itmight not be exhaustive
Disk Traffic bytess Shows rates for bothreading and writing
Disk Load percent Approximatepercentage of total diskbandwidth being usedThe weighted IO timeseries includes thebacklog that might beaccumulating For moreinformation see thecollectd disk plugin docs
Operationss opss Operations done persecond
Average IO OperationTime
seconds Average time each IOoperation took tocomplete This averageis not accurate see thecollectd disk plugin docs
Panel Unit Description Notes
45 CONFIGURING MULTIPLE CLOUDS
You can configure multiple Red Hat OpenStack Platform clouds to target a single instance ofService Telemetry Framework (STF)
1 Plan the AMQP address prefixes that you want to use for each cloud For more informationsee Section 451 ldquoPlanning AMQP address prefixesrdquo
2 Deploy metrics and events consumer Smart Gateways for each cloud to listen on thecorresponding address prefixes For more information see Section 452 ldquoDeploying SmartGatewaysrdquo
3 Configure each cloud to send its metrics and events to STF on the correct address Formore information see Section 453 ldquoCreating the OpenStack environment filerdquo
CHAPTER 4 ADVANCED FEATURES
39
Figure 41 Two Red Hat OpenStack Platform clouds connect to STF
451 Planning AMQP address prefixes
By default Red Hat OpenStack Platform nodes get data through two data collectors collectd andCeilometer These components send telemetry data or notifications to the respective AMQPaddresses for example collectdtelemetry where STF Smart Gateways listen on those addressesfor monitoring data
To support multiple clouds and to identify which cloud generated the monitoring data configureeach cloud to send data to a unique address Prefix a cloud identifier to the second part of theaddress The following list shows some example addresses and identifiers
collectdcloud1-telemetry
collectdcloud1-notify
anycastceilometercloud1-eventsample
collectdcloud2-telemetry
collectdcloud2-notify
Red Hat OpenStack Platform 160 Service Telemetry Framework
40
anycastceilometercloud2-eventsample
collectdus-east-1-telemetry
collectdus-west-3-telemetry
452 Deploying Smart Gateways
You must deploy a Smart Gateway for each of the data collection types for each cloud one forcollectd metrics one for collectd events and one for Ceilometer events Configure each of theSmart Gateways to listen on the AMQP address that you define for the corresponding cloud
When you deploy STF for the first time Smart Gateway manifests are created that define the initialSmart Gateways for a single cloud When deploying Smart Gateways for multiple cloud supportyou deploy multiple Smart Gateways for each of the data collection types that handle the metricsand the events data for each cloud The initial Smart Gateways act as a template to createadditional Smart Gateways along with any authentication information required to connect to thedata stores
Procedure
1 Log in to Red Hat OpenShift Container Platform
2 Change to the service-telemetry namespace
oc project service-telemetry
3 Use the initially deployed Smart Gateways as a template for additional Smart GatewaysList the currently deployed Smart Gateways with the oc get smartgateways command Forexample if you deployed STF with metricsEnabled true and eventsEnabled true thefollowing Smart Gateways are displayed in the output
$ oc get smartgateways
NAME AGEstf-default-ceilometer-notification 14dstf-default-collectd-notification 14dstf-default-collectd-telemetry 14d
4 Retrieve the manifests for each Smart Gateway and store the contents in a temporary filewhich you can modify later and use to create the new set of Smart Gateways
truncate --size 0 tmpcloud1-smartgatewaysyaml ampamp for sg in $(oc get smartgateways -oname)do echo --- gtgt tmpcloud1-smartgatewaysyaml oc get $sg -oyaml --export gtgt tmpcloud1-smartgatewaysyamldone
5 Modify the Smart Gateway manifest in the tmpcloud1-smartgatewaysyaml file Adjustthe metadataname and specamqpUrl fields to include the cloud identifier from yourschema For more information see ] To view example Smart Gateway manifests seeltltexample-manifests_advanced-features[
6 Deploy your new Smart Gateways
CHAPTER 4 ADVANCED FEATURES
41
oc apply -f tmpcloud1-smartgatewaysyaml
7 Verify that each Smart Gateway is running This can take several minutes depending on thenumber of Smart Gateways
oc get po -l app=smart-gateway
4521 Example manifests
IMPORTANT
The content in the following examples might be different to the file content in yourdeployment Copy the manifests in your deployment
Ensure that the name and amqpUrl parameters of each Smart Gateway match the names that youwant to use for your clouds For more information see Section 451 ldquoPlanning AMQP addressprefixesrdquo
NOTE
Your output may have some additional metadata parameters that you can removefrom the manifests you that load into OCP
apiVersion smartgatewayinfrawatchv2alpha1kind SmartGatewaymetadata name stf-default-ceilometer-notification-cloud1 1spec amqpDataSource ceilometer amqpUrl stf-default-interconnectservice-telemetrysvcclusterlocal5672anycastceilometercloud1-eventsample 2 debug false elasticPass fkzfhghw elasticUrl httpselasticsearch-es-httpservice-telemetrysvcclusterlocal9200 elasticUser elastic resetIndex false serviceType events size 1 tlsCaCert configcertscacrt tlsClientCert configcertstlscrt tlsClientKey configcertstlskey tlsServerName elasticsearch-es-httpservice-telemetrysvcclusterlocal useBasicAuth true useTls true---apiVersion smartgatewayinfrawatchv2alpha1kind SmartGatewaymetadata name stf-default-collectd-notification-cloud1 3spec amqpDataSource collectd amqpUrl stf-default-interconnectservice-telemetrysvcclusterlocal5672collectdcloud1-
Red Hat OpenStack Platform 160 Service Telemetry Framework
42
1
2
3
4
5
6
Name for Ceilometer notifications for cloud1
AMQP Address for Ceilometer notifications for cloud1
Name for collectd telemetry for cloud1
AMQP Address for collectd telemetry for cloud1
Name for collectd notifications for cloud1
AMQP Address for collectd notifications for cloud1
453 Creating the OpenStack environment file
To label traffic according to the cloud of origin you must create a configuration with cloud-specificinstance names Create an stf-connectorsyaml file and adjust the values of CeilometerQdrEventsConfig and CollectdAmqpInstances to match the AMQP address prefixscheme For more information see Section 451 ldquoPlanning AMQP address prefixesrdquo
notify 4 debug false elasticPass fkzfhghw elasticUrl httpselasticsearch-es-httpservice-telemetrysvcclusterlocal9200 elasticUser elastic resetIndex false serviceType events size 1 tlsCaCert configcertscacrt tlsClientCert configcertstlscrt tlsClientKey configcertstlskey tlsServerName elasticsearch-es-httpservice-telemetrysvcclusterlocal useBasicAuth true useTls true---apiVersion smartgatewayinfrawatchv2alpha1kind SmartGatewaymetadata name stf-default-collectd-telemetry-cloud1 5spec amqpUrl stf-default-interconnectservice-telemetrysvcclusterlocal5672collectdcloud1-telemetry 6 debug false prefetch 15000 serviceType metrics size 1 useTimestamp true
CHAPTER 4 ADVANCED FEATURES
43
WARNING
Remove enable-stfyaml and ceilometer-write-qdryaml environment filereferences from your overcloud deployment This configuration is redundantand results in duplicate information being sent from each cloud node
Procedure
1 Create the stf-connectorsyaml file and modify it to match the AMQP address that youwant for this cloud deployment
resource_registry OSTripleOServicesCollectd usrshareopenstack-tripleo-heat-templatesdeploymentmetricscollectd-container-puppetyaml OSTripleOServicesMetricsQdr usrshareopenstack-tripleo-heat-templatesdeploymentmetricsqdr-container-puppetyaml OSTripleOServicesCeilometerAgentCentral usrshareopenstack-tripleo-heat-templatesdeploymentceilometerceilometer-agent-central-container-puppetyaml OSTripleOServicesCeilometerAgentNotification usrshareopenstack-tripleo-heat-templatesdeploymentceilometerceilometer-agent-notification-container-puppetyaml OSTripleOServicesCeilometerAgentIpmi usrshareopenstack-tripleo-heat-templatesdeploymentceilometerceilometer-agent-ipmi-container-puppetyaml OSTripleOServicesComputeCeilometerAgent usrshareopenstack-tripleo-heat-templatesdeploymentceilometerceilometer-agent-compute-container-puppetyaml OSTripleOServicesRedis usrshareopenstack-tripleo-heat-templatesdeploymentdatabaseredis-pacemaker-puppetyaml
parameter_defaults EnableSTF true
EventPipelinePublishers [] CeilometerEnablePanko false CeilometerQdrPublishEvents true CeilometerQdrEventsConfig driver amqp topic cloud1-event 1
CollectdConnectionType amqp1 CollectdAmqpInterval 5 CollectdDefaultPollingInterval 5
CollectdAmqpInstances cloud1-notify 2 notify true format JSON presettle false cloud1-telemetry 3 format JSON presettle true
MetricsQdrAddresses
Red Hat OpenStack Platform 160 Service Telemetry Framework
44
+ lt1gt Define the topic for Ceilometer events This value is the address format of anycastceilometercloud1-eventsample lt2gt Define the topic for collectd events This value is theformat of collectdcloud1-notify lt3gt Define the topic for collectd metrics This value is the formatof collectdcloud1-telemetry lt4gt Adjust the MetricsQdrConnectors host to the address of the STFroute
1 Ensure that the naming convention in the stf-connectorsyaml file aligns with the specamqpUrl field in the Smart Gateway configuration For example configure the CeilometerQdrEventsConfigtopic field to a value of cloud1-event
2 Save the file in a directory for custom environment files for example homestackcustom_templates
3 Source the authentication file
4 Include the stf-connectorsyaml file in the overcloud deployment command along with anyother environment files relevant to your environment
Additional resourcesFor information about validating the deployment see Section 333 ldquoValidating client-sideinstallationrdquo
454 Querying metrics data from multiple clouds
Data stored in Prometheus has a service label attached according to the Smart Gateway it wasscraped from You can use this label to query data from a specific cloud
To query data from a specific cloud use a Prometheus promql query that matches the associatedservice label for example collectd_uptimeservice=stf-default-collectd-telemetry-cloud1-smartgateway
- prefix collectd distribution multicast - prefix anycastceilometer distribution multicast
MetricsQdrSSLProfiles - name sslProfile
MetricsQdrConnectors - host stf-default-interconnect-5671-service-telemetryappsinfrawatch 4 port 443 role edge verifyHostname false sslProfile sslProfile
[stackundercloud-0 ~]$ source stackrc
(undercloud) [stackundercloud-0 ~]$
(undercloud) [stackundercloud-0 ~]$ openstack overcloud deploy --templates usrshareopenstack-tripleo-heat-templates -e homestackcustom_templatesstf-connectorsyaml
CHAPTER 4 ADVANCED FEATURES
45
46 EPHEMERAL STORAGE
Use ephemeral storage to run Service Telemetry Framework (STF) without persistently storingdata in your Red Hat OpenShift Container Platform (OCP) cluster Ephemeral storage is notrecommended in a production environment due to the volatility of the data in the platform whenoperating correctly and as designed For example restarting a pod or rescheduling the workload toanother node results in the loss of any local data written since the pod started
If you enable ephemeral storage in STF the Service Telemetry Operator does not add the relevant storage sections to the data storage components manifests
461 Configuring ephemeral storage
To configure STF for ephemeral storage add storageEphemeralEnabled true to theServiceTelemetry object in OCP You can add storageEphemeralEnabled true at installation timeor if you already deployed STF complete the following steps
Procedure
1 Log in to Red Hat OpenShift Container Platform
2 Change to the service-telemetry namespace
3 Edit the ServiceTelemetry object
4 Add the storageEphemeralEnabled true parameter to the spec section
5 Save your changes and close the object
oc project service-telemetry
$ oc edit ServiceTelemetry stf-default
spec eventsEnabled true metricsEnabled true storageEphemeralEnabled true
Red Hat OpenStack Platform 160 Service Telemetry Framework
46
APPENDIX A COLLECTD PLUG-INSThis section contains a complete list of collectd plug-ins and configurations forRed Hat OpenStack Platform 160
collectd-aggregation
collectdpluginaggregationaggregators
collectdpluginaggregationinterval
collectd-amqp1
collectd-apache
collectdpluginapacheinstances (ex localhost rArr url rArr httplocalhostmod_statusauto)
collectdpluginapacheinterval
collectd-apcups
collectd-battery
collectdpluginbatteryvalues_percentage
collectdpluginbatteryreport_degraded
collectdpluginbatteryquery_state_fs
collectdpluginbatteryinterval
collectd-ceph
collectdplugincephdaemons
collectdplugincephlongrunavglatency
collectdplugincephconvertspecialmetrictypes
collectd-cgroups
collectdplugincgroupsignore_selected
collectdplugincgroupsinterval
collectd-conntrack
None
collectd-contextswitch
collectdplugincontextswitchinterval
collectd-cpu
APPENDIX A COLLECTD PLUG-INS
47
collectdplugincpureportbystate
collectdplugincpureportbycpu
collectdplugincpuvaluespercentage
collectdplugincpureportnumcpu
collectdplugincpureportgueststate
collectdplugincpusubtractgueststate
collectdplugincpuinterval
collectd-cpufreq
None
collectd-cpusleep
collectd-csv
collectdplugincsvdatadir
collectdplugincsvstorerates
collectdplugincsvinterval
collectd-df
collectdplugindfdevices
collectdplugindffstypes
collectdplugindfignoreselected
collectdplugindfmountpoints
collectdplugindfreportbydevice
collectdplugindfreportinodes
collectdplugindfreportreserved
collectdplugindfvaluesabsolute
collectdplugindfvaluespercentage
collectdplugindfinterval
collectd-disk
collectdplugindiskdisks
collectdplugindiskignoreselected
collectdplugindiskudevnameattr
Red Hat OpenStack Platform 160 Service Telemetry Framework
48
collectdplugindiskinterval
collectd-entropy
collectdpluginentropyinterval
collectd-ethstat
collectdpluginethstatinterfaces
collectdpluginethstatmaps
collectdpluginethstatmappedonly
collectdpluginethstatinterval
collectd-exec
collectdpluginexeccommands
collectdpluginexeccommands_defaults
collectdpluginexecglobals
collectdpluginexecinterval
collectd-fhcount
collectdpluginfhcountvaluesabsolute
collectdpluginfhcountvaluespercentage
collectdpluginfhcountinterval
collectd-filecount
collectdpluginfilecountdirectories
collectdpluginfilecountinterval
collectd-fscache
None
collectd-hddtemp
collectdpluginhddtemphost
collectdpluginhddtempport
collectdpluginhddtempinterval
collectd-hugepages
collectdpluginhugepagesreport_per_node_hp
APPENDIX A COLLECTD PLUG-INS
49
collectdpluginhugepagesreport_root_hp
collectdpluginhugepagesvalues_pages
collectdpluginhugepagesvalues_bytes
collectdpluginhugepagesvalues_percentage
collectdpluginhugepagesinterval
collectd-intel_rdt
collectd-interface
collectdplugininterfaceinterfaces
collectdplugininterfaceignoreselected
collectdplugininterfacereportinactive
Collectdplugininterfaceinterval
collectd-ipc
None
collectd-ipmi
collectdpluginipmiignore_selected
collectdpluginipminotify_sensor_add
collectdpluginipminotify_sensor_remove
collectdpluginipminotify_sensor_not_present
collectdpluginipmisensors
collectdpluginipmiinterval
collectd-irq
collectdpluginirqirqs
collectdpluginirqignoreselected
collectdpluginirqinterval
collectd-load
collectdpluginloadreport_relative
collectdpluginloadinterval
collectd-logfile
collectdpluginlogfilelog_level
Red Hat OpenStack Platform 160 Service Telemetry Framework
50
collectdpluginlogfilelog_level
collectdpluginlogfilelog_file
collectdpluginlogfilelog_timestamp
collectdpluginlogfileprint_severity
collectdpluginlogfileinterval
collectd-madwifi
collectd-mbmon
collectd-md
collectd-memcached
collectdpluginmemcachedinstances
collectdpluginmemcachedinterval
collectd-memory
collectdpluginmemoryvaluesabsolute
collectdpluginmemoryvaluespercentage
collectdpluginmemoryinterval collectd-multimeter
collectd-multimeter
collectd-mysql
collectdpluginmysqlinterval
collectd-netlink
collectdpluginnetlinkinterfaces
collectdpluginnetlinkverboseinterfaces
collectdpluginnetlinkqdiscs
collectdpluginnetlinkclasses
collectdpluginnetlinkfilters
collectdpluginnetlinkignoreselected
collectdpluginnetlinkinterval
collectd-network
collectdpluginnetworktimetolive
collectdpluginnetworkmaxpacketsize
APPENDIX A COLLECTD PLUG-INS
51
collectdpluginnetworkforward
collectdpluginnetworkreportstats
collectdpluginnetworklisteners
collectdpluginnetworkservers
collectdpluginnetworkinterval
collectd-nfs
collectdpluginnfsinterval
collectd-ntpd
collectdpluginntpdhost
collectdpluginntpdport
collectdpluginntpdreverselookups
collectdpluginntpdincludeunitid
collectdpluginntpdinterval
collectd-numa
None
collectd-olsrd
collectd-openvpn
collectdpluginopenvpnstatusfile
collectdpluginopenvpnimprovednamingschema
collectdpluginopenvpncollectcompression
collectdpluginopenvpncollectindividualusers
collectdpluginopenvpncollectusercount
collectdpluginopenvpninterval
collectd-ovs_events
collectdpluginovs_eventsaddress
collectdpluginovs_eventsdispatch
collectdpluginovs_eventsinterfaces
collectdpluginovs_eventssend_notification
Red Hat OpenStack Platform 160 Service Telemetry Framework
52
collectdpluginovs_events$port
collectdpluginovs_eventssocket
collectd-ovs_stats
collectdpluginovs_statsaddress
collectdpluginovs_statsbridges
collectdpluginovs_statsport
collectdpluginovs_statssocket
collectd-ping
collectdpluginpinghosts
collectdpluginpingtimeout
collectdpluginpingttl
collectdpluginpingsource_address
collectdpluginpingdevice
collectdpluginpingmax_missed
collectdpluginpingsize
collectdpluginpinginterval
collectd-powerdns
collectdpluginpowerdnsinterval
collectdpluginpowerdnsservers
collectdpluginpowerdnsrecursors
collectdpluginpowerdnslocal_socket
collectdpluginpowerdnsinterval
collectd-processes
collectdpluginprocessesprocesses
collectdpluginprocessesprocess_matches
collectdpluginprocessescollect_context_switch
collectdpluginprocessescollect_file_descriptor
collectdpluginprocessescollect_memory_maps
collectdpluginpowerdnsinterval
APPENDIX A COLLECTD PLUG-INS
53
collectd-protocols
collectdpluginprotocolsignoreselected
collectdpluginprotocolsvalues
collectd-python
collectd-serial
collectd-smart
collectdpluginsmartdisks
collectdpluginsmartignoreselected
collectdpluginsmartinterval
collectd-snmp_agent
collectd-statsd
collectdpluginstatsdhost
collectdpluginstatsdport
collectdpluginstatsddeletecounters
collectdpluginstatsddeletetimers
collectdpluginstatsddeletegauges
collectdpluginstatsddeletesets
collectdpluginstatsdcountersum
collectdpluginstatsdtimerpercentile
collectdpluginstatsdtimerlower
collectdpluginstatsdtimerupper
collectdpluginstatsdtimersum
collectdpluginstatsdtimercount
collectdpluginstatsdinterval
collectd-swap
collectdpluginswapreportbydevice
collectdpluginswapreportbytes
collectdpluginswapvaluesabsolute
Red Hat OpenStack Platform 160 Service Telemetry Framework
54
collectd::plugin::swap::valuespercentage
collectd::plugin::swap::reportio
collectd::plugin::swap::interval
collectd-syslog
collectd::plugin::syslog::log_level
collectd::plugin::syslog::notify_level
collectd::plugin::syslog::interval
collectd-table
collectd::plugin::table::tables
collectd::plugin::table::interval
collectd-tail
collectd::plugin::tail::files
collectd::plugin::tail::interval
collectd-tail_csv
collectd::plugin::tail_csv::metrics
collectd::plugin::tail_csv::files
collectd-tcpconns
collectd::plugin::tcpconns::localports
collectd::plugin::tcpconns::remoteports
collectd::plugin::tcpconns::listening
collectd::plugin::tcpconns::allportssummary
collectd::plugin::tcpconns::interval
collectd-ted
collectd-thermal
collectd::plugin::thermal::devices
collectd::plugin::thermal::ignoreselected
collectd::plugin::thermal::interval
collectd-threshold
collectd::plugin::threshold::types
collectd::plugin::threshold::plugins
collectd::plugin::threshold::hosts
collectd::plugin::threshold::interval
collectd-turbostat
collectd::plugin::turbostat::core_c_states
collectd::plugin::turbostat::package_c_states
collectd::plugin::turbostat::system_management_interrupt
collectd::plugin::turbostat::digital_temperature_sensor
collectd::plugin::turbostat::tcc_activation_temp
collectd::plugin::turbostat::running_average_power_limit
collectd::plugin::turbostat::logical_core_names
collectd-unixsock
collectd-uptime
collectd::plugin::uptime::interval
collectd-users
collectd::plugin::users::interval
collectd-uuid
collectd::plugin::uuid::uuid_file
collectd::plugin::uuid::interval
collectd-virt
collectd::plugin::virt::connection
collectd::plugin::virt::refresh_interval
collectd::plugin::virt::domain
collectd::plugin::virt::block_device
collectd::plugin::virt::interface_device
collectd::plugin::virt::ignore_selected
collectd::plugin::virt::hostname_format
collectd::plugin::virt::interface_format
collectd::plugin::virt::extra_stats
collectd::plugin::virt::interval
collectd-vmem
collectd::plugin::vmem::verbose
collectd::plugin::vmem::interval
collectd-vserver
collectd-wireless
collectd-write_graphite
collectd::plugin::write_graphite::carbons
collectd::plugin::write_graphite::carbon_defaults
collectd::plugin::write_graphite::globals
collectd-write_kafka
collectd::plugin::write_kafka::kafka_host
collectd::plugin::write_kafka::kafka_port
collectd::plugin::write_kafka::kafka_hosts
collectd::plugin::write_kafka::topics
collectd-write_log
collectd::plugin::write_log::format
collectd-zfs_arc
None
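The parameters in this appendix map to puppet-collectd hiera keys. As a minimal sketch of how values of this kind can be supplied, assuming the standard TripleO ExtraConfig mechanism in a custom environment file; the plug-in choice and the virt values shown here are illustrative assumptions, not defaults:

parameter_defaults:
  # Enable an additional collectd plug-in by name (example: virt)
  CollectdExtraPlugins:
    - virt
  # Hiera overrides for the plug-in parameters listed in this appendix
  ExtraConfig:
    collectd::plugin::virt::connection: "qemu:///system"
    collectd::plugin::virt::interval: 30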
4.4. DASHBOARDS
Use the third-party application Grafana to visualize system-level metrics gathered by collectd for each individual host node. For more information about configuring collectd, see Section 3.3, "Configuring Red Hat OpenStack Platform overcloud for Service Telemetry Framework".
4.4.1. Setting up Grafana to host the dashboard
Grafana is not included in the default Service Telemetry Framework (STF) deployment, so you must deploy the Grafana Operator from OperatorHub.io.
Procedure

1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

   oc project service-telemetry

3. Clone the dashboard repository:

   git clone https://github.com/infrawatch/dashboards
   cd dashboards

4. Deploy the Grafana operator:

   oc create -f deploy/subscription.yaml

5. To verify that the operator launched successfully, run the oc get csv command. If the value of the PHASE column is Succeeded, the operator launched successfully:

   $ oc get csv

   NAME                      DISPLAY            VERSION   REPLACES   PHASE
   grafana-operator.v3.2.0   Grafana Operator   3.2.0                Succeeded

6. Launch a Grafana instance:

   $ oc create -f deploy/grafana.yaml

7. Verify that the Grafana instance deployed:

   $ oc get pod -l app=grafana

   NAME                                  READY   STATUS    RESTARTS   AGE
   grafana-deployment-7fc7848b56-sbkhv   1/1     Running   0          1m

8. Create the datasource and dashboard resources:

   oc create -f deploy/datasource.yaml -f deploy/rhos-dashboard.yaml

9. Verify that the resources installed correctly:

   $ oc get grafanadashboards

   NAME             AGE
   rhos-dashboard   7d21h

   $ oc get grafanadatasources

   NAME                                  AGE
   service-telemetry-grafanadatasource   1m

10. Navigate to https://<grafana-route-address> in a web browser. Use the oc get routes command to retrieve the Grafana route address:

    oc get routes

11. To view the dashboard, click Dashboards and Manage.

Additional resources

For more information about enabling the OperatorHub.io catalog source, see Section 2.3.5, "Enabling the OperatorHub.io Community Catalog Source".

4.4.1.1. Viewing and editing queries

Procedure

1. Log in to Red Hat OpenShift Container Platform. To view and edit queries, log in as the admin user.

2. Change to the service-telemetry namespace:

   oc project service-telemetry

3. To retrieve the default username and password, describe the Grafana object using oc describe:

   oc describe grafana service-telemetry-grafana
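If you only need the credential values, a jsonpath query is a possible alternative to oc describe. This sketch assumes the defaults appear under spec.config.security in the Grafana custom resource, which can vary between Grafana Operator versions; confirm the field path in your deployment:

   # Assumed field path; verify with oc describe first
   oc get grafana service-telemetry-grafana -o jsonpath='{.spec.config.security.admin_user}'
   oc get grafana service-telemetry-grafana -o jsonpath='{.spec.config.security.admin_password}'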
4.4.2. The Grafana infrastructure dashboard
The infrastructure dashboard shows metrics for a single node at a time. Select a node from the upper left corner of the dashboard.

4.4.2.1. Top panels

Title | Unit | Description
Current Global Alerts | - | Current alerts fired by Prometheus
Recent Global Alerts | - | Recently fired alerts in 5m time steps
Status Panel | - | Node status: up, down, unavailable
Uptime | s/m/h/d/M/Y | Total operational time of node
CPU Cores | cores | Total number of cores
Memory | bytes | Total memory
Disk Size | bytes | Total storage size
Processes | processes | Total number of processes listed by type
Load Average | processes | Load average represents the average number of running and uninterruptible processes residing in the kernel execution queue
4.4.2.2. Networking panels
Panels that display the network interfaces of the node.

Panel | Unit | Description
Physical Interfaces Ingress Errors | errors | Total errors with incoming data
Physical Interfaces Egress Errors | errors | Total errors with outgoing data
Physical Interfaces Ingress Error Rates | errors/s | Rate of incoming data errors
Physical Interfaces Egress Error Rates | errors/s | Rate of outgoing data errors
Physical Interfaces Packets Ingress | pps | Incoming packets per second
Physical Interfaces Packets Egress | pps | Outgoing packets per second
Physical Interfaces Data Ingress | bytes/s | Incoming data rates
Physical Interfaces Data Egress | bytes/s | Outgoing data rates
Physical Interfaces Drop Rate Ingress | pps | Incoming packets drop rate
Physical Interfaces Drop Rate Egress | pps | Outgoing packets drop rate
4.4.2.3. CPU panels
Panels that display CPU usage of the node.

Panel | Unit | Description
Current CPU Usage | percent | Instantaneous usage at the time of the last query
Aggregate CPU Usage | percent | Average non-idle CPU activity of all cores on a node
Aggr. CPU Usage by Type | percent | Shows time spent for each type of thread averaged across all cores
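As an illustration of the kind of query behind these panels, the following PromQL sketch derives aggregate non-idle CPU activity. It assumes that the collectd cpu plug-in reports percentages through the Smart Gateway as a collectd_cpu_percent metric with a type label for the CPU state; verify the metric and label names in your own Prometheus instance before use:

100 - avg(collectd_cpu_percent{type="idle"})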
4.4.2.4. Memory panels
Panels that display memory usage on the node.

Panel | Unit | Description
Memory Used | percent | Amount of memory being used at time of last query
Huge Pages Used | hugepages | Number of hugepages being used
Memory | |
4.4.2.5. Disk/file system
Panels that display space used on disk.

Panel | Unit | Description | Notes
Disk Space Usage | percent | Total disk use at time of last query |
Inode Usage | percent | Total inode use at time of last query |
Aggregate Disk Space Usage | bytes | Total disk space used and reserved | Because this query relies on the df plugin, temporary file systems that do not necessarily use disk space are included in the results. The query tries to filter out most of these, but it might not be exhaustive.
Disk Traffic | bytes/s | Shows rates for both reading and writing |
Disk Load | percent | Approximate percentage of total disk bandwidth being used. The weighted I/O time series includes the backlog that might be accumulating. For more information, see the collectd disk plugin docs. |
Operations/s | ops/s | Operations done per second |
Average I/O Operation Time | seconds | Average time each I/O operation took to complete. This average is not accurate, see the collectd disk plugin docs. |
4.5. CONFIGURING MULTIPLE CLOUDS
You can configure multiple Red Hat OpenStack Platform clouds to target a single instance of Service Telemetry Framework (STF):

1. Plan the AMQP address prefixes that you want to use for each cloud. For more information, see Section 4.5.1, "Planning AMQP address prefixes".

2. Deploy metrics and events consumer Smart Gateways for each cloud to listen on the corresponding address prefixes. For more information, see Section 4.5.2, "Deploying Smart Gateways".

3. Configure each cloud to send its metrics and events to STF on the correct address. For more information, see Section 4.5.3, "Creating the OpenStack environment file".
Figure 4.1. Two Red Hat OpenStack Platform clouds connect to STF

4.5.1. Planning AMQP address prefixes
By default, Red Hat OpenStack Platform nodes get data through two data collectors: collectd and Ceilometer. These components send telemetry data or notifications to the respective AMQP addresses, for example, collectd/telemetry, where STF Smart Gateways listen on those addresses for monitoring data.
To support multiple clouds, and to identify which cloud generated the monitoring data, configure each cloud to send data to a unique address. Prefix a cloud identifier to the second part of the address. The following list shows some example addresses and identifiers:
collectd/cloud1-telemetry
collectd/cloud1-notify
anycast/ceilometer/cloud1-event.sample
collectd/cloud2-telemetry
collectd/cloud2-notify
anycast/ceilometer/cloud2-event.sample
collectd/us-east-1-telemetry
collectd/us-west-3-telemetry
4.5.2. Deploying Smart Gateways
You must deploy a Smart Gateway for each of the data collection types for each cloud: one for collectd metrics, one for collectd events, and one for Ceilometer events. Configure each of the Smart Gateways to listen on the AMQP address that you define for the corresponding cloud.
When you deploy STF for the first time, Smart Gateway manifests are created that define the initial Smart Gateways for a single cloud. When you deploy Smart Gateways for multiple cloud support, you deploy multiple Smart Gateways for each of the data collection types that handle the metrics and the events data for each cloud. The initial Smart Gateways act as a template to create additional Smart Gateways, along with any authentication information required to connect to the data stores.
Procedure

1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

   oc project service-telemetry
3. Use the initially deployed Smart Gateways as a template for additional Smart Gateways. List the currently deployed Smart Gateways with the oc get smartgateways command. For example, if you deployed STF with metricsEnabled: true and eventsEnabled: true, the following Smart Gateways are displayed in the output:

   $ oc get smartgateways

   NAME                                  AGE
   stf-default-ceilometer-notification   14d
   stf-default-collectd-notification     14d
   stf-default-collectd-telemetry        14d

4. Retrieve the manifests for each Smart Gateway and store the contents in a temporary file, which you can modify later and use to create the new set of Smart Gateways:

   truncate --size 0 /tmp/cloud1-smartgateways.yaml && for sg in $(oc get smartgateways -o name)
   do
     echo --- >> /tmp/cloud1-smartgateways.yaml
     oc get $sg -o yaml --export >> /tmp/cloud1-smartgateways.yaml
   done

5. Modify the Smart Gateway manifests in the /tmp/cloud1-smartgateways.yaml file. Adjust the metadata.name and spec.amqpUrl fields to include the cloud identifier from your schema. For more information, see Section 4.5.1, "Planning AMQP address prefixes". To view example Smart Gateway manifests, see Section 4.5.2.1, "Example manifests".
6. Deploy your new Smart Gateways:

   oc apply -f /tmp/cloud1-smartgateways.yaml

7. Verify that each Smart Gateway is running. This can take several minutes, depending on the number of Smart Gateways:

   oc get po -l app=smart-gateway
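Rather than polling oc get, you can optionally block until the pods report ready; the timeout value in this sketch is an arbitrary example:

   oc wait --for=condition=Ready pod -l app=smart-gateway --timeout=300s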
4.5.2.1. Example manifests

IMPORTANT
The content in the following examples might be different from the file content in your deployment. Copy the manifests in your deployment.

Ensure that the name and amqpUrl parameters of each Smart Gateway match the names that you want to use for your clouds. For more information, see Section 4.5.1, "Planning AMQP address prefixes".

NOTE
Your output might have some additional metadata parameters that you can remove from the manifests that you load into OCP.
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-ceilometer-notification-cloud1 1
spec:
  amqpDataSource: ceilometer
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/anycast/ceilometer/cloud1-event.sample 2
  debug: false
  elasticPass: fkzfhghw
  elasticUrl: https://elasticsearch-es-http.service-telemetry.svc.cluster.local:9200
  elasticUser: elastic
  resetIndex: false
  serviceType: events
  size: 1
  tlsCaCert: /config/certs/ca.crt
  tlsClientCert: /config/certs/tls.crt
  tlsClientKey: /config/certs/tls.key
  tlsServerName: elasticsearch-es-http.service-telemetry.svc.cluster.local
  useBasicAuth: true
  useTls: true
---
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-collectd-notification-cloud1 3
spec:
  amqpDataSource: collectd
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/cloud1-notify 4
  debug: false
  elasticPass: fkzfhghw
  elasticUrl: https://elasticsearch-es-http.service-telemetry.svc.cluster.local:9200
  elasticUser: elastic
  resetIndex: false
  serviceType: events
  size: 1
  tlsCaCert: /config/certs/ca.crt
  tlsClientCert: /config/certs/tls.crt
  tlsClientKey: /config/certs/tls.key
  tlsServerName: elasticsearch-es-http.service-telemetry.svc.cluster.local
  useBasicAuth: true
  useTls: true
---
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-collectd-telemetry-cloud1 5
spec:
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/cloud1-telemetry 6
  debug: false
  prefetch: 15000
  serviceType: metrics
  size: 1
  useTimestamp: true

1 Name for Ceilometer notifications for cloud1
2 AMQP address for Ceilometer notifications for cloud1
3 Name for collectd notifications for cloud1
4 AMQP address for collectd notifications for cloud1
5 Name for collectd telemetry for cloud1
6 AMQP address for collectd telemetry for cloud1

4.5.3. Creating the OpenStack environment file
To label traffic according to the cloud of origin, you must create a configuration with cloud-specific instance names. Create an stf-connectors.yaml file and adjust the values of CeilometerQdrEventsConfig and CollectdAmqpInstances to match the AMQP address prefix scheme. For more information, see Section 4.5.1, "Planning AMQP address prefixes".
WARNING
Remove the enable-stf.yaml and ceilometer-write-qdr.yaml environment file references from your overcloud deployment. This configuration is redundant and results in duplicate information being sent from each cloud node.
Procedure

1. Create the stf-connectors.yaml file and modify it to match the AMQP address that you want for this cloud deployment:

   resource_registry:
     OS::TripleO::Services::Collectd: /usr/share/openstack-tripleo-heat-templates/deployment/metrics/collectd-container-puppet.yaml
     OS::TripleO::Services::MetricsQdr: /usr/share/openstack-tripleo-heat-templates/deployment/metrics/qdr-container-puppet.yaml
     OS::TripleO::Services::CeilometerAgentCentral: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-central-container-puppet.yaml
     OS::TripleO::Services::CeilometerAgentNotification: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-notification-container-puppet.yaml
     OS::TripleO::Services::CeilometerAgentIpmi: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-ipmi-container-puppet.yaml
     OS::TripleO::Services::ComputeCeilometerAgent: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-compute-container-puppet.yaml
     OS::TripleO::Services::Redis: /usr/share/openstack-tripleo-heat-templates/deployment/database/redis-pacemaker-puppet.yaml

   parameter_defaults:
     EnableSTF: true

     EventPipelinePublishers: []
     CeilometerEnablePanko: false
     CeilometerQdrPublishEvents: true
     CeilometerQdrEventsConfig:
       driver: amqp
       topic: cloud1-event 1

     CollectdConnectionType: amqp1
     CollectdAmqpInterval: 5
     CollectdDefaultPollingInterval: 5

     CollectdAmqpInstances:
       cloud1-notify: 2
         notify: true
         format: JSON
         presettle: false
       cloud1-telemetry: 3
         format: JSON
         presettle: true

     MetricsQdrAddresses:
       - prefix: collectd
         distribution: multicast
       - prefix: anycast/ceilometer
         distribution: multicast

     MetricsQdrSSLProfiles:
       - name: sslProfile

     MetricsQdrConnectors:
       - host: stf-default-interconnect-5671-service-telemetry.apps.infra.watch 4
         port: 443
         role: edge
         verifyHostname: false
         sslProfile: sslProfile

   1 Define the topic for Ceilometer events. This value is the address format of anycast/ceilometer/cloud1-event.sample.
   2 Define the topic for collectd events. This value is the format of collectd/cloud1-notify.
   3 Define the topic for collectd metrics. This value is the format of collectd/cloud1-telemetry.
   4 Adjust the MetricsQdrConnectors host to the address of the STF route.

   Ensure that the naming convention in the stf-connectors.yaml file aligns with the spec.amqpUrl field in the Smart Gateway configuration. For example, configure the CeilometerQdrEventsConfig.topic field to a value of cloud1-event.

2. Save the file in a directory for custom environment files, for example /home/stack/custom_templates/.

3. Source the authentication file:

   [stack@undercloud-0 ~]$ source stackrc
   (undercloud) [stack@undercloud-0 ~]$

4. Include the stf-connectors.yaml file in the overcloud deployment command, along with any other environment files relevant to your environment:

   (undercloud) [stack@undercloud-0 ~]$ openstack overcloud deploy --templates /usr/share/openstack-tripleo-heat-templates -e /home/stack/custom_templates/stf-connectors.yaml

Additional resources

For information about validating the deployment, see Section 3.3.3, "Validating client-side installation".

4.5.4. Querying metrics data from multiple clouds
Data stored in Prometheus has a service label attached according to the Smart Gateway it was scraped from. You can use this label to query data from a specific cloud.
To query data from a specific cloud, use a Prometheus promql query that matches the associated service label, for example: collectd_uptime{service="stf-default-collectd-telemetry-cloud1-smartgateway"}.
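For example, assuming the Smart Gateway names used earlier in this section, the first query below returns uptime samples for cloud1 only, and the second counts the time series that each cloud contributes:

collectd_uptime{service="stf-default-collectd-telemetry-cloud1-smartgateway"}
count by (service) (collectd_uptime)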
4.6. EPHEMERAL STORAGE
Use ephemeral storage to run Service Telemetry Framework (STF) without persistently storing data in your Red Hat OpenShift Container Platform (OCP) cluster. Ephemeral storage is not recommended in a production environment due to the volatility of the data in the platform, even when it operates correctly and as designed. For example, restarting a pod or rescheduling the workload to another node results in the loss of any local data written since the pod started.
If you enable ephemeral storage in STF, the Service Telemetry Operator does not add the relevant storage sections to the data storage components' manifests.
4.6.1. Configuring ephemeral storage
To configure STF for ephemeral storage, add storageEphemeralEnabled: true to the ServiceTelemetry object in OCP. You can add storageEphemeralEnabled: true at installation time or, if you already deployed STF, complete the following steps.
Procedure

1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

   oc project service-telemetry

3. Edit the ServiceTelemetry object:

   $ oc edit ServiceTelemetry stf-default

4. Add the storageEphemeralEnabled: true parameter to the spec section:

   spec:
     eventsEnabled: true
     metricsEnabled: true
     storageEphemeralEnabled: true

5. Save your changes and close the object.
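If you prefer a non-interactive change, a merge patch against the same object is a possible alternative to oc edit; this sketch assumes the default object name stf-default shown above:

   oc patch ServiceTelemetry stf-default --type=merge --patch '{"spec": {"storageEphemeralEnabled": true}}'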
APPENDIX A COLLECTD PLUG-INSThis section contains a complete list of collectd plug-ins and configurations forRed Hat OpenStack Platform 160
collectd-aggregation
collectdpluginaggregationaggregators
collectdpluginaggregationinterval
collectd-amqp1
collectd-apache
collectdpluginapacheinstances (ex localhost rArr url rArr httplocalhostmod_statusauto)
collectdpluginapacheinterval
collectd-apcups
collectd-battery
collectdpluginbatteryvalues_percentage
collectdpluginbatteryreport_degraded
collectdpluginbatteryquery_state_fs
collectdpluginbatteryinterval
collectd-ceph
collectdplugincephdaemons
collectdplugincephlongrunavglatency
collectdplugincephconvertspecialmetrictypes
collectd-cgroups
collectdplugincgroupsignore_selected
collectdplugincgroupsinterval
collectd-conntrack
None
collectd-contextswitch
collectdplugincontextswitchinterval
collectd-cpu
APPENDIX A COLLECTD PLUG-INS
47
collectdplugincpureportbystate
collectdplugincpureportbycpu
collectdplugincpuvaluespercentage
collectdplugincpureportnumcpu
collectdplugincpureportgueststate
collectdplugincpusubtractgueststate
collectdplugincpuinterval
collectd-cpufreq
None
collectd-cpusleep
collectd-csv
collectdplugincsvdatadir
collectdplugincsvstorerates
collectdplugincsvinterval
collectd-df
collectdplugindfdevices
collectdplugindffstypes
collectdplugindfignoreselected
collectdplugindfmountpoints
collectdplugindfreportbydevice
collectdplugindfreportinodes
collectdplugindfreportreserved
collectdplugindfvaluesabsolute
collectdplugindfvaluespercentage
collectdplugindfinterval
collectd-disk
collectdplugindiskdisks
collectdplugindiskignoreselected
collectdplugindiskudevnameattr
Red Hat OpenStack Platform 160 Service Telemetry Framework
48
collectdplugindiskinterval
collectd-entropy
collectdpluginentropyinterval
collectd-ethstat
collectdpluginethstatinterfaces
collectdpluginethstatmaps
collectdpluginethstatmappedonly
collectdpluginethstatinterval
collectd-exec
collectdpluginexeccommands
collectdpluginexeccommands_defaults
collectdpluginexecglobals
collectdpluginexecinterval
collectd-fhcount
collectdpluginfhcountvaluesabsolute
collectdpluginfhcountvaluespercentage
collectdpluginfhcountinterval
collectd-filecount
collectdpluginfilecountdirectories
collectdpluginfilecountinterval
collectd-fscache
None
collectd-hddtemp
collectdpluginhddtemphost
collectdpluginhddtempport
collectdpluginhddtempinterval
collectd-hugepages
collectdpluginhugepagesreport_per_node_hp
APPENDIX A COLLECTD PLUG-INS
49
collectdpluginhugepagesreport_root_hp
collectdpluginhugepagesvalues_pages
collectdpluginhugepagesvalues_bytes
collectdpluginhugepagesvalues_percentage
collectdpluginhugepagesinterval
collectd-intel_rdt
collectd-interface
collectdplugininterfaceinterfaces
collectdplugininterfaceignoreselected
collectdplugininterfacereportinactive
Collectdplugininterfaceinterval
collectd-ipc
None
collectd-ipmi
collectdpluginipmiignore_selected
collectdpluginipminotify_sensor_add
collectdpluginipminotify_sensor_remove
collectdpluginipminotify_sensor_not_present
collectdpluginipmisensors
collectdpluginipmiinterval
collectd-irq
collectdpluginirqirqs
collectdpluginirqignoreselected
collectdpluginirqinterval
collectd-load
collectdpluginloadreport_relative
collectdpluginloadinterval
collectd-logfile
collectdpluginlogfilelog_level
Red Hat OpenStack Platform 160 Service Telemetry Framework
50
collectdpluginlogfilelog_level
collectdpluginlogfilelog_file
collectdpluginlogfilelog_timestamp
collectdpluginlogfileprint_severity
collectdpluginlogfileinterval
collectd-madwifi
collectd-mbmon
collectd-md
collectd-memcached
collectdpluginmemcachedinstances
collectdpluginmemcachedinterval
collectd-memory
collectdpluginmemoryvaluesabsolute
collectdpluginmemoryvaluespercentage
collectdpluginmemoryinterval collectd-multimeter
collectd-multimeter
collectd-mysql
collectdpluginmysqlinterval
collectd-netlink
collectdpluginnetlinkinterfaces
collectdpluginnetlinkverboseinterfaces
collectdpluginnetlinkqdiscs
collectdpluginnetlinkclasses
collectdpluginnetlinkfilters
collectdpluginnetlinkignoreselected
collectdpluginnetlinkinterval
collectd-network
collectdpluginnetworktimetolive
collectdpluginnetworkmaxpacketsize
APPENDIX A COLLECTD PLUG-INS
51
collectdpluginnetworkforward
collectdpluginnetworkreportstats
collectdpluginnetworklisteners
collectdpluginnetworkservers
collectdpluginnetworkinterval
collectd-nfs
collectdpluginnfsinterval
collectd-ntpd
collectdpluginntpdhost
collectdpluginntpdport
collectdpluginntpdreverselookups
collectdpluginntpdincludeunitid
collectdpluginntpdinterval
collectd-numa
None
collectd-olsrd
collectd-openvpn
collectdpluginopenvpnstatusfile
collectdpluginopenvpnimprovednamingschema
collectdpluginopenvpncollectcompression
collectdpluginopenvpncollectindividualusers
collectdpluginopenvpncollectusercount
collectdpluginopenvpninterval
collectd-ovs_events
collectdpluginovs_eventsaddress
collectdpluginovs_eventsdispatch
collectdpluginovs_eventsinterfaces
collectdpluginovs_eventssend_notification
Red Hat OpenStack Platform 160 Service Telemetry Framework
52
collectdpluginovs_events$port
collectdpluginovs_eventssocket
collectd-ovs_stats
collectdpluginovs_statsaddress
collectdpluginovs_statsbridges
collectdpluginovs_statsport
collectdpluginovs_statssocket
collectd-ping
collectdpluginpinghosts
collectdpluginpingtimeout
collectdpluginpingttl
collectdpluginpingsource_address
collectdpluginpingdevice
collectdpluginpingmax_missed
collectdpluginpingsize
collectdpluginpinginterval
collectd-powerdns
collectdpluginpowerdnsinterval
collectdpluginpowerdnsservers
collectdpluginpowerdnsrecursors
collectdpluginpowerdnslocal_socket
collectdpluginpowerdnsinterval
collectd-processes
collectdpluginprocessesprocesses
collectdpluginprocessesprocess_matches
collectdpluginprocessescollect_context_switch
collectdpluginprocessescollect_file_descriptor
collectdpluginprocessescollect_memory_maps
collectdpluginpowerdnsinterval
APPENDIX A COLLECTD PLUG-INS
53
collectd-protocols
collectdpluginprotocolsignoreselected
collectdpluginprotocolsvalues
collectd-python
collectd-serial
collectd-smart
collectdpluginsmartdisks
collectdpluginsmartignoreselected
collectdpluginsmartinterval
collectd-snmp_agent
collectd-statsd
collectdpluginstatsdhost
collectdpluginstatsdport
collectdpluginstatsddeletecounters
collectdpluginstatsddeletetimers
collectdpluginstatsddeletegauges
collectdpluginstatsddeletesets
collectdpluginstatsdcountersum
collectdpluginstatsdtimerpercentile
collectdpluginstatsdtimerlower
collectdpluginstatsdtimerupper
collectdpluginstatsdtimersum
collectdpluginstatsdtimercount
collectdpluginstatsdinterval
collectd-swap
collectdpluginswapreportbydevice
collectdpluginswapreportbytes
collectdpluginswapvaluesabsolute
Red Hat OpenStack Platform 160 Service Telemetry Framework
54
collectdpluginswapvaluespercentage
collectdpluginswapreportio
collectdpluginswapinterval
collectd-syslog
collectdpluginsysloglog_level
collectdpluginsyslognotify_level
collectdpluginsysloginterval
collectd-table
collectdplugintabletables
collectdplugintableinterval
collectd-tail
collectdplugintailfiles
collectdplugintailinterval
collectd-tail_csv
collectdplugintail_csvmetrics
collectdplugintail_csvfiles
collectd-tcpconns
collectdplugintcpconnslocalports
collectdplugintcpconnsremoteports
collectdplugintcpconnslistening
collectdplugintcpconnsallportssummary
collectdplugintcpconnsinterval
collectd-ted
collectd-thermal
collectdpluginthermaldevices
collectdpluginthermalignoreselected
collectdpluginthermalinterval
collectd-threshold
APPENDIX A COLLECTD PLUG-INS
55
collectd::plugin::threshold::types
collectd::plugin::threshold::plugins
collectd::plugin::threshold::hosts
collectd::plugin::threshold::interval
collectd-turbostat
collectd::plugin::turbostat::core_c_states
collectd::plugin::turbostat::package_c_states
collectd::plugin::turbostat::system_management_interrupt
collectd::plugin::turbostat::digital_temperature_sensor
collectd::plugin::turbostat::tcc_activation_temp
collectd::plugin::turbostat::running_average_power_limit
collectd::plugin::turbostat::logical_core_names
collectd-unixsock
collectd-uptime
collectd::plugin::uptime::interval
collectd-users
collectd::plugin::users::interval
collectd-uuid
collectd::plugin::uuid::uuid_file
collectd::plugin::uuid::interval
collectd-virt
collectd::plugin::virt::connection
collectd::plugin::virt::refresh_interval
collectd::plugin::virt::domain
collectd::plugin::virt::block_device
collectd::plugin::virt::interface_device
collectd::plugin::virt::ignore_selected
collectd::plugin::virt::hostname_format
collectd::plugin::virt::interface_format
collectd::plugin::virt::extra_stats
collectd::plugin::virt::interval
collectd-vmem
collectd::plugin::vmem::verbose
collectd::plugin::vmem::interval
collectd-vserver
collectd-wireless
collectd-write_graphite
collectd::plugin::write_graphite::carbons
collectd::plugin::write_graphite::carbon_defaults
collectd::plugin::write_graphite::globals
collectd-write_kafka
collectd::plugin::write_kafka::kafka_host
collectd::plugin::write_kafka::kafka_port
collectd::plugin::write_kafka::kafka_hosts
collectd::plugin::write_kafka::topics
collectd-write_log
collectd::plugin::write_log::format
collectd-zfs_arc
None
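These names are the hiera keys that the puppet-collectd module reads when director renders the collectd configuration. The following is a minimal sketch of how a parameter from this list can be set, assuming the standard director-deployed collectd service; CollectdExtraPlugins and ExtraConfig are tripleo parameters, and the ping target shown is a placeholder:

parameter_defaults:
  # enable an additional plug-in from the list above
  CollectdExtraPlugins:
    - ping
  # hiera keys from this appendix, set through the generic ExtraConfig hook;
  # 192.0.2.10 is a placeholder address
  ExtraConfig:
    collectd::plugin::ping::hosts:
      - 192.0.2.10
    collectd::plugin::ping::interval: 10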
4.4.2.1. Top panels
Panels that display top-level information about the node.
Panel Unit Description
Current Global Alerts - Current alerts fired by Prometheus
Recent Global Alerts - Recently fired alerts in 5m time steps
Status Panel - Node status: up, down, unavailable
Uptime s/m/h/d/M/Y Total operational time of node
CPU Cores cores Total number of cores
Memory bytes Total memory
Disk Size bytes Total storage size
Processes processes Total number of processes listed by type
Load Average processes Load average represents the average number of running and uninterruptible processes residing in the kernel execution queue
4.4.2.2. Networking panels
Panels that display the network interfaces of the node.
Panel Unit Description
Physical Interfaces Ingress Errors errors Total errors with incoming data
Physical Interfaces Egress Errors errors Total errors with outgoing data
Physical Interfaces Ingress Error Rates errors/s Rate of incoming data errors
Physical Interfaces Egress Error Rates errors/s Rate of outgoing data errors
Physical Interfaces Packets Ingress pps Incoming packets per second
Physical Interfaces Packets Egress pps Outgoing packets per second
Physical Interfaces Data Ingress bytes/s Incoming data rates
Physical Interfaces Data Egress bytes/s Outgoing data rates
Physical Interfaces Drop Rate Ingress pps Incoming packets drop rate
Physical Interfaces Drop Rate Egress pps Outgoing packets drop rate
4.4.2.3. CPU panels
Panels that display CPU usage of the node.
Panel Unit Description
Current CPU Usage percent Instantaneous usage at the time of the last query
Aggregate CPU Usage percent Average non-idle CPU activity of all cores on a node
Aggr CPU Usage by Type percent Shows time spent for each type of thread, averaged across all cores
4.4.2.4. Memory panels
Panels that display memory usage on the node.
Panel Unit Description
Memory Used percent Amount of memory being used at time of last query
Huge Pages Used hugepages Number of hugepages being used
4.4.2.5. Disk/file system panels
Panels that display space used on disk.
Panel Unit Description Notes
Disk Space Usage percent Total disk use at time of last query
Inode Usage percent Total inode use at time of last query
Aggregate Disk Space Usage bytes Total disk space used and reserved. Because this query relies on the df plugin, temporary file systems that do not necessarily use disk space are included in the results. The query tries to filter out most of these, but it might not be exhaustive.
Disk Traffic bytes/s Shows rates for both reading and writing
Disk Load percent Approximate percentage of total disk bandwidth being used. The weighted I/O time series includes the backlog that might be accumulating. For more information, see the collectd disk plugin docs.
Operations/s ops/s Operations done per second
Average I/O Operation Time seconds Average time each I/O operation took to complete. This average is not accurate; see the collectd disk plugin docs.
4.5. CONFIGURING MULTIPLE CLOUDS
You can configure multiple Red Hat OpenStack Platform clouds to target a single instance of Service Telemetry Framework (STF):
1. Plan the AMQP address prefixes that you want to use for each cloud. For more information, see Section 4.5.1, “Planning AMQP address prefixes”.
2. Deploy metrics and events consumer Smart Gateways for each cloud to listen on the corresponding address prefixes. For more information, see Section 4.5.2, “Deploying Smart Gateways”.
3. Configure each cloud to send its metrics and events to STF on the correct address. For more information, see Section 4.5.3, “Creating the OpenStack environment file”.
Figure 4.1: Two Red Hat OpenStack Platform clouds connect to STF
4.5.1. Planning AMQP address prefixes
By default, Red Hat OpenStack Platform nodes get data through two data collectors: collectd and Ceilometer. These components send telemetry data or notifications to the respective AMQP addresses, for example collectd/telemetry, and STF Smart Gateways listen on those addresses for monitoring data.
To support multiple clouds, and to identify which cloud generated the monitoring data, configure each cloud to send data to a unique address. Prefix a cloud identifier to the second part of the address. The following list shows some example addresses and identifiers:
collectd/cloud1-telemetry
collectd/cloud1-notify
anycast/ceilometer/cloud1-event.sample
collectd/cloud2-telemetry
collectd/cloud2-notify
anycast/ceilometer/cloud2-event.sample
collectd/us-east-1-telemetry
collectd/us-west-3-telemetry
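The identifier must appear consistently on both sides of the connection. The following sketch, which abbreviates manifests shown later in this chapter, pairs the cloud-side instance name with the Smart Gateway address that must match it, reusing the cloud1 identifier from the list above:

# cloud side (stf-connectors.yaml): the instance name carries the identifier
CollectdAmqpInstances:
  cloud1-telemetry:
    format: JSON
    presettle: true

# STF side (SmartGateway manifest): amqpUrl ends with the matching address
amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/cloud1-telemetry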
4.5.2. Deploying Smart Gateways
You must deploy a Smart Gateway for each of the data collection types for each cloud: one for collectd metrics, one for collectd events, and one for Ceilometer events. Configure each of the Smart Gateways to listen on the AMQP address that you define for the corresponding cloud.
When you deploy STF for the first time, Smart Gateway manifests are created that define the initial Smart Gateways for a single cloud. When you deploy Smart Gateways for multiple cloud support, you deploy multiple Smart Gateways for each of the data collection types that handle the metrics and the events data for each cloud. The initial Smart Gateways act as a template to create additional Smart Gateways, along with any authentication information required to connect to the data stores.
Procedure
1. Log in to Red Hat OpenShift Container Platform.
2. Change to the service-telemetry namespace:

oc project service-telemetry

3. Use the initially deployed Smart Gateways as a template for additional Smart Gateways. List the currently deployed Smart Gateways with the oc get smartgateways command. For example, if you deployed STF with metricsEnabled: true and eventsEnabled: true, the following Smart Gateways are displayed in the output:

$ oc get smartgateways
NAME                                  AGE
stf-default-ceilometer-notification   14d
stf-default-collectd-notification     14d
stf-default-collectd-telemetry        14d

4. Retrieve the manifests for each Smart Gateway and store the contents in a temporary file, which you can modify later and use to create the new set of Smart Gateways:

truncate --size 0 /tmp/cloud1-smartgateways.yaml && for sg in $(oc get smartgateways -oname)
do
  echo "---" >> /tmp/cloud1-smartgateways.yaml
  oc get $sg -oyaml --export >> /tmp/cloud1-smartgateways.yaml
done
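As an optional check that is not part of the documented procedure, you can confirm that the dump produced one YAML document separator per Smart Gateway by counting them:

grep -c -- '---' /tmp/cloud1-smartgateways.yaml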
5. Modify the Smart Gateway manifests in the /tmp/cloud1-smartgateways.yaml file. Adjust the metadata.name and spec.amqpUrl fields to include the cloud identifier from your schema. For more information, see Section 4.5.1, “Planning AMQP address prefixes”. To view example Smart Gateway manifests, see Section 4.5.2.1, “Example manifests”.
6. Deploy your new Smart Gateways:

oc apply -f /tmp/cloud1-smartgateways.yaml
7. Verify that each Smart Gateway is running. This can take several minutes, depending on the number of Smart Gateways:

oc get po -l app=smart-gateway
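If you prefer to block until the pods are ready rather than polling, the following equivalent check uses the same label; the timeout value is an arbitrary example:

oc wait --for=condition=Ready pod -l app=smart-gateway --timeout=300s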
4.5.2.1. Example manifests
IMPORTANT
The content in the following examples might be different from the file content in your deployment. Copy the manifests in your deployment.
Ensure that the name and amqpUrl parameters of each Smart Gateway match the names that you want to use for your clouds. For more information, see Section 4.5.1, “Planning AMQP address prefixes”.
NOTE
Your output might have some additional metadata parameters that you can remove from the manifests that you load into OCP.
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-ceilometer-notification-cloud1 1
spec:
  amqpDataSource: ceilometer
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/anycast/ceilometer/cloud1-event.sample 2
  debug: false
  elasticPass: fkzfhghw
  elasticUrl: https://elasticsearch-es-http.service-telemetry.svc.cluster.local:9200
  elasticUser: elastic
  resetIndex: false
  serviceType: events
  size: 1
  tlsCaCert: /config/certs/ca.crt
  tlsClientCert: /config/certs/tls.crt
  tlsClientKey: /config/certs/tls.key
  tlsServerName: elasticsearch-es-http.service-telemetry.svc.cluster.local
  useBasicAuth: true
  useTls: true
---
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-collectd-notification-cloud1 3
spec:
  amqpDataSource: collectd
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/cloud1-notify 4
  debug: false
  elasticPass: fkzfhghw
  elasticUrl: https://elasticsearch-es-http.service-telemetry.svc.cluster.local:9200
  elasticUser: elastic
  resetIndex: false
  serviceType: events
  size: 1
  tlsCaCert: /config/certs/ca.crt
  tlsClientCert: /config/certs/tls.crt
  tlsClientKey: /config/certs/tls.key
  tlsServerName: elasticsearch-es-http.service-telemetry.svc.cluster.local
  useBasicAuth: true
  useTls: true
---
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-collectd-telemetry-cloud1 5
spec:
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/cloud1-telemetry 6
  debug: false
  prefetch: 15000
  serviceType: metrics
  size: 1
  useTimestamp: true

1 Name for Ceilometer notifications for cloud1
2 AMQP address for Ceilometer notifications for cloud1
3 Name for collectd notifications for cloud1
4 AMQP address for collectd notifications for cloud1
5 Name for collectd telemetry for cloud1
6 AMQP address for collectd telemetry for cloud1

4.5.3. Creating the OpenStack environment file
To label traffic according to the cloud of origin, you must create a configuration with cloud-specific instance names. Create an stf-connectors.yaml file and adjust the values of CeilometerQdrEventsConfig and CollectdAmqpInstances to match the AMQP address prefix scheme. For more information, see Section 4.5.1, “Planning AMQP address prefixes”.
WARNING
Remove the enable-stf.yaml and ceilometer-write-qdr.yaml environment file references from your overcloud deployment. This configuration is redundant and results in duplicate information being sent from each cloud node.
Procedure
1. Create the stf-connectors.yaml file and modify it to match the AMQP address that you want for this cloud deployment:
resource_registry:
  OS::TripleO::Services::Collectd: /usr/share/openstack-tripleo-heat-templates/deployment/metrics/collectd-container-puppet.yaml
  OS::TripleO::Services::MetricsQdr: /usr/share/openstack-tripleo-heat-templates/deployment/metrics/qdr-container-puppet.yaml
  OS::TripleO::Services::CeilometerAgentCentral: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-central-container-puppet.yaml
  OS::TripleO::Services::CeilometerAgentNotification: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-notification-container-puppet.yaml
  OS::TripleO::Services::CeilometerAgentIpmi: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-ipmi-container-puppet.yaml
  OS::TripleO::Services::ComputeCeilometerAgent: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-compute-container-puppet.yaml
  OS::TripleO::Services::Redis: /usr/share/openstack-tripleo-heat-templates/deployment/database/redis-pacemaker-puppet.yaml

parameter_defaults:
  EnableSTF: true

  EventPipelinePublishers: []
  CeilometerEnablePanko: false
  CeilometerQdrPublishEvents: true
  CeilometerQdrEventsConfig:
    driver: amqp
    topic: cloud1-event 1

  CollectdConnectionType: amqp1
  CollectdAmqpInterval: 5
  CollectdDefaultPollingInterval: 5

  CollectdAmqpInstances:
    cloud1-notify: 2
      notify: true
      format: JSON
      presettle: false
    cloud1-telemetry: 3
      format: JSON
      presettle: true

  MetricsQdrAddresses:
    - prefix: collectd
      distribution: multicast
    - prefix: anycast/ceilometer
      distribution: multicast

  MetricsQdrSSLProfiles:
    - name: sslProfile

  MetricsQdrConnectors:
    - host: stf-default-interconnect-5671-service-telemetry.apps.infra.watch 4
      port: 443
      role: edge
      verifyHostname: false
      sslProfile: sslProfile

1 Define the topic for Ceilometer events. This value is the address format of anycast/ceilometer/cloud1-event.sample.
2 Define the topic for collectd events. This value is the format of collectd/cloud1-notify.
3 Define the topic for collectd metrics. This value is the format of collectd/cloud1-telemetry.
4 Adjust the MetricsQdrConnectors host to the address of the STF route.

2. Ensure that the naming convention in the stf-connectors.yaml file aligns with the spec.amqpUrl field in the Smart Gateway configuration. For example, configure the CeilometerQdrEventsConfig.topic field to a value of cloud1-event.

3. Save the file in a directory for custom environment files, for example /home/stack/custom_templates.

4. Source the authentication file:

[stack@undercloud-0 ~]$ source stackrc

(undercloud) [stack@undercloud-0 ~]$

5. Include the stf-connectors.yaml file in the overcloud deployment command, along with any other environment files relevant to your environment:

(undercloud) [stack@undercloud-0 ~]$ openstack overcloud deploy --templates /usr/share/openstack-tripleo-heat-templates -e /home/stack/custom_templates/stf-connectors.yaml

Additional resources
For information about validating the deployment, see Section 3.3.3, “Validating client-side installation”.

4.5.4. Querying metrics data from multiple clouds
Data stored in Prometheus has a service label attached according to the Smart Gateway it was scraped from. You can use this label to query data from a specific cloud.
To query data from a specific cloud, use a Prometheus PromQL query that matches the associated service label, for example: collectd_uptime{service="stf-default-collectd-telemetry-cloud1-smartgateway"}.
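The same query can also be issued against the Prometheus HTTP API, for example with curl. This is a sketch only: the route hostname below is a placeholder for the Prometheus route in your deployment, and the service label value matches the Smart Gateway name from Section 4.5.2.1, “Example manifests”:

# <prometheus-route-hostname> is a placeholder; -k skips TLS verification
curl -k -G 'https://<prometheus-route-hostname>/api/v1/query' \
  --data-urlencode 'query=collectd_uptime{service="stf-default-collectd-telemetry-cloud1-smartgateway"}'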
4.6. EPHEMERAL STORAGE
Use ephemeral storage to run Service Telemetry Framework (STF) without persistently storing data in your Red Hat OpenShift Container Platform (OCP) cluster. Ephemeral storage is not recommended in a production environment because of the volatility of the data in the platform, even when it operates correctly and as designed. For example, restarting a pod or rescheduling the workload to another node results in the loss of any local data written since the pod started.
If you enable ephemeral storage in STF, the Service Telemetry Operator does not add the relevant storage sections to the manifests of the data storage components.
4.6.1. Configuring ephemeral storage
To configure STF for ephemeral storage, add storageEphemeralEnabled: true to the ServiceTelemetry object in OCP. You can add storageEphemeralEnabled: true at installation time or, if you already deployed STF, complete the following steps:
Procedure
1. Log in to Red Hat OpenShift Container Platform.
2. Change to the service-telemetry namespace:

oc project service-telemetry

3. Edit the ServiceTelemetry object:

$ oc edit ServiceTelemetry stf-default

4. Add the storageEphemeralEnabled: true parameter to the spec section:

spec:
  eventsEnabled: true
  metricsEnabled: true
  storageEphemeralEnabled: true

5. Save your changes and close the object.
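Because the Service Telemetry Operator omits the storage sections when ephemeral storage is enabled, no persistent volume claims should exist for the data storage components. The following optional verification is an inference from that behaviour rather than a documented step:

# expect no PersistentVolumeClaims for the STF data stores
oc get pvc -n service-telemetry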
APPENDIX A COLLECTD PLUG-INSThis section contains a complete list of collectd plug-ins and configurations forRed Hat OpenStack Platform 160
collectd-aggregation
collectdpluginaggregationaggregators
collectdpluginaggregationinterval
collectd-amqp1
collectd-apache
collectdpluginapacheinstances (ex localhost rArr url rArr httplocalhostmod_statusauto)
collectdpluginapacheinterval
collectd-apcups
collectd-battery
collectdpluginbatteryvalues_percentage
collectdpluginbatteryreport_degraded
collectdpluginbatteryquery_state_fs
collectdpluginbatteryinterval
collectd-ceph
collectdplugincephdaemons
collectdplugincephlongrunavglatency
collectdplugincephconvertspecialmetrictypes
collectd-cgroups
collectdplugincgroupsignore_selected
collectdplugincgroupsinterval
collectd-conntrack
None
collectd-contextswitch
collectdplugincontextswitchinterval
collectd-cpu
APPENDIX A COLLECTD PLUG-INS
47
collectdplugincpureportbystate
collectdplugincpureportbycpu
collectdplugincpuvaluespercentage
collectdplugincpureportnumcpu
collectdplugincpureportgueststate
collectdplugincpusubtractgueststate
collectdplugincpuinterval
collectd-cpufreq
None
collectd-cpusleep
collectd-csv
collectdplugincsvdatadir
collectdplugincsvstorerates
collectdplugincsvinterval
collectd-df
collectdplugindfdevices
collectdplugindffstypes
collectdplugindfignoreselected
collectdplugindfmountpoints
collectdplugindfreportbydevice
collectdplugindfreportinodes
collectdplugindfreportreserved
collectdplugindfvaluesabsolute
collectdplugindfvaluespercentage
collectdplugindfinterval
collectd-disk
collectdplugindiskdisks
collectdplugindiskignoreselected
collectdplugindiskudevnameattr
Red Hat OpenStack Platform 160 Service Telemetry Framework
48
collectdplugindiskinterval
collectd-entropy
collectdpluginentropyinterval
collectd-ethstat
collectdpluginethstatinterfaces
collectdpluginethstatmaps
collectdpluginethstatmappedonly
collectdpluginethstatinterval
collectd-exec
collectdpluginexeccommands
collectdpluginexeccommands_defaults
collectdpluginexecglobals
collectdpluginexecinterval
collectd-fhcount
collectdpluginfhcountvaluesabsolute
collectdpluginfhcountvaluespercentage
collectdpluginfhcountinterval
collectd-filecount
collectdpluginfilecountdirectories
collectdpluginfilecountinterval
collectd-fscache
None
collectd-hddtemp
collectdpluginhddtemphost
collectdpluginhddtempport
collectdpluginhddtempinterval
collectd-hugepages
collectdpluginhugepagesreport_per_node_hp
APPENDIX A COLLECTD PLUG-INS
49
collectdpluginhugepagesreport_root_hp
collectdpluginhugepagesvalues_pages
collectdpluginhugepagesvalues_bytes
collectdpluginhugepagesvalues_percentage
collectdpluginhugepagesinterval
collectd-intel_rdt
collectd-interface
collectdplugininterfaceinterfaces
collectdplugininterfaceignoreselected
collectdplugininterfacereportinactive
Collectdplugininterfaceinterval
collectd-ipc
None
collectd-ipmi
collectdpluginipmiignore_selected
collectdpluginipminotify_sensor_add
collectdpluginipminotify_sensor_remove
collectdpluginipminotify_sensor_not_present
collectdpluginipmisensors
collectdpluginipmiinterval
collectd-irq
collectdpluginirqirqs
collectdpluginirqignoreselected
collectdpluginirqinterval
collectd-load
collectdpluginloadreport_relative
collectdpluginloadinterval
collectd-logfile
collectdpluginlogfilelog_level
Red Hat OpenStack Platform 160 Service Telemetry Framework
50
collectdpluginlogfilelog_level
collectdpluginlogfilelog_file
collectdpluginlogfilelog_timestamp
collectdpluginlogfileprint_severity
collectdpluginlogfileinterval
collectd-madwifi
collectd-mbmon
collectd-md
collectd-memcached
collectdpluginmemcachedinstances
collectdpluginmemcachedinterval
collectd-memory
collectdpluginmemoryvaluesabsolute
collectdpluginmemoryvaluespercentage
collectdpluginmemoryinterval collectd-multimeter
collectd-multimeter
collectd-mysql
collectdpluginmysqlinterval
collectd-netlink
collectdpluginnetlinkinterfaces
collectdpluginnetlinkverboseinterfaces
collectdpluginnetlinkqdiscs
collectdpluginnetlinkclasses
collectdpluginnetlinkfilters
collectdpluginnetlinkignoreselected
collectdpluginnetlinkinterval
collectd-network
collectdpluginnetworktimetolive
collectdpluginnetworkmaxpacketsize
APPENDIX A COLLECTD PLUG-INS
51
collectdpluginnetworkforward
collectdpluginnetworkreportstats
collectdpluginnetworklisteners
collectdpluginnetworkservers
collectdpluginnetworkinterval
collectd-nfs
collectdpluginnfsinterval
collectd-ntpd
collectdpluginntpdhost
collectdpluginntpdport
collectdpluginntpdreverselookups
collectdpluginntpdincludeunitid
collectdpluginntpdinterval
collectd-numa
None
collectd-olsrd
collectd-openvpn
collectdpluginopenvpnstatusfile
collectdpluginopenvpnimprovednamingschema
collectdpluginopenvpncollectcompression
collectdpluginopenvpncollectindividualusers
collectdpluginopenvpncollectusercount
collectdpluginopenvpninterval
collectd-ovs_events
collectdpluginovs_eventsaddress
collectdpluginovs_eventsdispatch
collectdpluginovs_eventsinterfaces
collectdpluginovs_eventssend_notification
Red Hat OpenStack Platform 160 Service Telemetry Framework
52
collectdpluginovs_events$port
collectdpluginovs_eventssocket
collectd-ovs_stats
collectdpluginovs_statsaddress
collectdpluginovs_statsbridges
collectdpluginovs_statsport
collectdpluginovs_statssocket
collectd-ping
collectdpluginpinghosts
collectdpluginpingtimeout
collectdpluginpingttl
collectdpluginpingsource_address
collectdpluginpingdevice
collectdpluginpingmax_missed
collectdpluginpingsize
collectdpluginpinginterval
collectd-powerdns
collectdpluginpowerdnsinterval
collectdpluginpowerdnsservers
collectdpluginpowerdnsrecursors
collectdpluginpowerdnslocal_socket
collectdpluginpowerdnsinterval
collectd-processes
collectdpluginprocessesprocesses
collectdpluginprocessesprocess_matches
collectdpluginprocessescollect_context_switch
collectdpluginprocessescollect_file_descriptor
collectdpluginprocessescollect_memory_maps
collectdpluginpowerdnsinterval
APPENDIX A COLLECTD PLUG-INS
53
collectd-protocols
collectdpluginprotocolsignoreselected
collectdpluginprotocolsvalues
collectd-python
collectd-serial
collectd-smart
collectdpluginsmartdisks
collectdpluginsmartignoreselected
collectdpluginsmartinterval
collectd-snmp_agent
collectd-statsd
collectdpluginstatsdhost
collectdpluginstatsdport
collectdpluginstatsddeletecounters
collectdpluginstatsddeletetimers
collectdpluginstatsddeletegauges
collectdpluginstatsddeletesets
collectdpluginstatsdcountersum
collectdpluginstatsdtimerpercentile
collectdpluginstatsdtimerlower
collectdpluginstatsdtimerupper
collectdpluginstatsdtimersum
collectdpluginstatsdtimercount
collectdpluginstatsdinterval
collectd-swap
collectdpluginswapreportbydevice
collectdpluginswapreportbytes
collectdpluginswapvaluesabsolute
Red Hat OpenStack Platform 160 Service Telemetry Framework
54
collectdpluginswapvaluespercentage
collectdpluginswapreportio
collectdpluginswapinterval
collectd-syslog
collectdpluginsysloglog_level
collectdpluginsyslognotify_level
collectdpluginsysloginterval
collectd-table
collectdplugintabletables
collectdplugintableinterval
collectd-tail
collectdplugintailfiles
collectdplugintailinterval
collectd-tail_csv
collectdplugintail_csvmetrics
collectdplugintail_csvfiles
collectd-tcpconns
collectdplugintcpconnslocalports
collectdplugintcpconnsremoteports
collectdplugintcpconnslistening
collectdplugintcpconnsallportssummary
collectdplugintcpconnsinterval
collectd-ted
collectd-thermal
collectdpluginthermaldevices
collectdpluginthermalignoreselected
collectdpluginthermalinterval
collectd-threshold
APPENDIX A COLLECTD PLUG-INS
55
collectdpluginthresholdtypes
collectdpluginthresholdplugins
collectdpluginthresholdhosts
collectdpluginthresholdinterval
collectd-turbostat
collectdpluginturbostatcore_c_states
collectdpluginturbostatpackage_c_states
collectdpluginturbostatsystem_management_interrupt
collectdpluginturbostatdigital_temperature_sensor
collectdpluginturbostattcc_activation_temp
collectdpluginturbostatrunning_average_power_limit
collectdpluginturbostatlogical_core_names
collectd-unixsock
collectd-uptime
collectdpluginuptimeinterval
collectd-users
collectdpluginusersinterval
collectd-uuid
collectdpluginuuiduuid_file
collectdpluginuuidinterval
collectd-virt
collectdpluginvirtconnection
collectdpluginvirtrefresh_interval
collectdpluginvirtdomain
collectdpluginvirtblock_device
collectdpluginvirtinterface_device
collectdpluginvirtignore_selected
collectdpluginvirthostname_format
Red Hat OpenStack Platform 160 Service Telemetry Framework
56
collectdpluginvirtinterface_format
collectdpluginvirtextra_stats
collectdpluginvirtinterval
collectd-vmem
collectdpluginvmemverbose
collectdpluginvmeminterval
collectd-vserver
collectd-wireless
collectd-write_graphite
collectdpluginwrite_graphitecarbons
collectdpluginwrite_graphitecarbon_defaults
collectdpluginwrite_graphiteglobals
collectd-write_kafka
collectdpluginwrite_kafkakafka_host
collectdpluginwrite_kafkakafka_port
collectdpluginwrite_kafkakafka_hosts
collectdpluginwrite_kafkatopics
collectd-write_log
collectdpluginwrite_logformat
collectd-zfs_arc
None
APPENDIX A COLLECTD PLUG-INS
57
- Table of Contents
- CHAPTER 1 INTRODUCTION TO SERVICE TELEMETRY FRAMEWORK
-
- 11 SERVICE TELEMETRY FRAMEWORK ARCHITECTURE
- 12 INSTALLATION SIZE
-
- CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK
-
- 21 THE CORE COMPONENTS OF STF
- 22 PREPARING YOUR OCP ENVIRONMENT FOR STF
-
- 221 Persistent volumes
-
- 2211 Using ephemeral storage
-
- 222 Resource allocation
- 223 Node tuning operator
-
- 23 DEPLOYING STF TO THE OCP ENVIRONMENT
-
- 231 Deploying STF to the OCP environment with ElasticSearch
- 232 Deploying STF to the OCP environment without ElasticSearch
- 233 Creating a namespace
- 234 Creating an OperatorGroup
- 235 Enabling the OperatorHubio Community Catalog Source
- 236 Enabling Red Hat STF Operator Source
- 237 Subscribing to the AMQ Certificate Manager Operator
- 238 Subscribing to the Elastic Cloud on Kubernetes Operator
- 239 Subscribing to the Service Telemetry Operator
- 2310 Creating a ServiceTelemetry object in OCP
-
- 24 REMOVING STF FROM THE OCP ENVIRONMENT
-
- 241 Deleting the namespace
- 242 Removing the OperatorSource
-
- CHAPTER 3 COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION
-
- 31 CONNECTING RED HAT OPENSTACK PLATFORM TO SERVICE TELEMETRY FRAMEWORK
- 32 DEPLOYING TO NON-STANDARD NETWORK TOPOLOGIES
- 33 CONFIGURING RED HAT OPENSTACK PLATFORM OVERCLOUD FOR SERVICE TELEMETRY FRAMEWORK
-
- 331 Retrieving the AMQ Interconnect route address
- 332 Configuring the STF connection for the overcloud
- 333 Validating client-side installation
-
- CHAPTER 4 ADVANCED FEATURES
-
- 41 CUSTOMIZING THE DEPLOYMENT
-
- 411 Manifest override parameters
- 412 Overriding a managed manifest
-
- Procedure
-
- 42 ALERTS
-
- Additional resources
- 421 Creating an alert rule in Prometheus
-
- Procedure
- Additional resources
-
- 422 Configuring custom alerts
-
- Procedure
- Additional resources
-
- 423 Creating an alert route in Alertmanager
-
- Procedure
- Additional resources
-
- 43 HIGH AVAILABILITY
-
- 431 Configuring high availability
-
- Procedure
-
- 44 DASHBOARDS
-
- 441 Setting up Grafana to host the dashboard
-
- 4411 Viewing and editing queries
-
- 442 The Grafana infrastructure dashboard
-
- 4421 Top panels
- 4422 Networking panels
- 4423 CPU panels
- 4424 Memory panels
- 4425 Diskfile system
-
- 45 CONFIGURING MULTIPLE CLOUDS
-
- 451 Planning AMQP address prefixes
- 452 Deploying Smart Gateways
-
- Procedure
- 4521 Example manifests
-
- 453 Creating the OpenStack environment file
-
- Procedure
- Additional resources
-
- 454 Querying metrics data from multiple clouds
-
- 46 EPHEMERAL STORAGE
-
- 461 Configuring ephemeral storage
-
- APPENDIX A COLLECTD PLUG-INS
-
Incoming data rates Physical Interfaces Data Egress bytess
Outgoing data rates Physical Interfaces Drop RateIngress
pps
Incoming packets drop rate Physical Interfaces Drop RateEgress
pps
4423 CPU panels
Panels that display CPU usage of the node
Panel Unit Description
Current CPU Usage percent Instantaneous usage at the timeof the last query
Aggregate CPU Usage percent Average non-idle CPU activity ofall cores on a node
Aggr CPU Usage by Type percent Shows time spent for each type ofthread averaged across all cores
4424 Memory panels
Panels that display memory usage on the node
Panel Unit Description
Memory Used percent Amount of memory being used attime of last query
Huge Pages Used hugepages Number of hugepages beingused
Memory
4425 Diskfile system
Panels that display space used on disk
Panel Unit Description Notes
Disk Space Usage percent Total disk use at time oflast query
Red Hat OpenStack Platform 160 Service Telemetry Framework
38
Inode Usage percent Total inode use at timeof last query
Aggregate Disk SpaceUsage
bytes Total disk space usedand reserved
Because this query relieson the df plugintemporary file systemsthat do not necessarilyuse disk space areincluded in the resultsThe query tries to filterout most of these but itmight not be exhaustive
Disk Traffic bytess Shows rates for bothreading and writing
Disk Load percent Approximatepercentage of total diskbandwidth being usedThe weighted IO timeseries includes thebacklog that might beaccumulating For moreinformation see thecollectd disk plugin docs
Operationss opss Operations done persecond
Average IO OperationTime
seconds Average time each IOoperation took tocomplete This averageis not accurate see thecollectd disk plugin docs
Panel Unit Description Notes
45 CONFIGURING MULTIPLE CLOUDS
You can configure multiple Red Hat OpenStack Platform clouds to target a single instance ofService Telemetry Framework (STF)
1 Plan the AMQP address prefixes that you want to use for each cloud For more informationsee Section 451 ldquoPlanning AMQP address prefixesrdquo
2 Deploy metrics and events consumer Smart Gateways for each cloud to listen on thecorresponding address prefixes For more information see Section 452 ldquoDeploying SmartGatewaysrdquo
3 Configure each cloud to send its metrics and events to STF on the correct address Formore information see Section 453 ldquoCreating the OpenStack environment filerdquo
CHAPTER 4 ADVANCED FEATURES
39
Figure 41 Two Red Hat OpenStack Platform clouds connect to STF
451 Planning AMQP address prefixes
By default Red Hat OpenStack Platform nodes get data through two data collectors collectd andCeilometer These components send telemetry data or notifications to the respective AMQPaddresses for example collectdtelemetry where STF Smart Gateways listen on those addressesfor monitoring data
To support multiple clouds and to identify which cloud generated the monitoring data configureeach cloud to send data to a unique address Prefix a cloud identifier to the second part of theaddress The following list shows some example addresses and identifiers
collectdcloud1-telemetry
collectdcloud1-notify
anycastceilometercloud1-eventsample
collectdcloud2-telemetry
collectdcloud2-notify
Red Hat OpenStack Platform 160 Service Telemetry Framework
40
anycastceilometercloud2-eventsample
collectdus-east-1-telemetry
collectdus-west-3-telemetry
452 Deploying Smart Gateways
You must deploy a Smart Gateway for each of the data collection types for each cloud one forcollectd metrics one for collectd events and one for Ceilometer events Configure each of theSmart Gateways to listen on the AMQP address that you define for the corresponding cloud
When you deploy STF for the first time Smart Gateway manifests are created that define the initialSmart Gateways for a single cloud When deploying Smart Gateways for multiple cloud supportyou deploy multiple Smart Gateways for each of the data collection types that handle the metricsand the events data for each cloud The initial Smart Gateways act as a template to createadditional Smart Gateways along with any authentication information required to connect to thedata stores
Procedure
1 Log in to Red Hat OpenShift Container Platform
2 Change to the service-telemetry namespace
oc project service-telemetry
3 Use the initially deployed Smart Gateways as a template for additional Smart GatewaysList the currently deployed Smart Gateways with the oc get smartgateways command Forexample if you deployed STF with metricsEnabled true and eventsEnabled true thefollowing Smart Gateways are displayed in the output
$ oc get smartgateways
NAME AGEstf-default-ceilometer-notification 14dstf-default-collectd-notification 14dstf-default-collectd-telemetry 14d
4 Retrieve the manifests for each Smart Gateway and store the contents in a temporary filewhich you can modify later and use to create the new set of Smart Gateways
truncate --size 0 tmpcloud1-smartgatewaysyaml ampamp for sg in $(oc get smartgateways -oname)do echo --- gtgt tmpcloud1-smartgatewaysyaml oc get $sg -oyaml --export gtgt tmpcloud1-smartgatewaysyamldone
5 Modify the Smart Gateway manifest in the tmpcloud1-smartgatewaysyaml file Adjustthe metadataname and specamqpUrl fields to include the cloud identifier from yourschema For more information see ] To view example Smart Gateway manifests seeltltexample-manifests_advanced-features[
6 Deploy your new Smart Gateways
CHAPTER 4 ADVANCED FEATURES
41
oc apply -f tmpcloud1-smartgatewaysyaml
7 Verify that each Smart Gateway is running This can take several minutes depending on thenumber of Smart Gateways
oc get po -l app=smart-gateway
4521 Example manifests
IMPORTANT
The content in the following examples might be different to the file content in yourdeployment Copy the manifests in your deployment
Ensure that the name and amqpUrl parameters of each Smart Gateway match the names that youwant to use for your clouds For more information see Section 451 ldquoPlanning AMQP addressprefixesrdquo
NOTE
Your output may have some additional metadata parameters that you can removefrom the manifests you that load into OCP
apiVersion smartgatewayinfrawatchv2alpha1kind SmartGatewaymetadata name stf-default-ceilometer-notification-cloud1 1spec amqpDataSource ceilometer amqpUrl stf-default-interconnectservice-telemetrysvcclusterlocal5672anycastceilometercloud1-eventsample 2 debug false elasticPass fkzfhghw elasticUrl httpselasticsearch-es-httpservice-telemetrysvcclusterlocal9200 elasticUser elastic resetIndex false serviceType events size 1 tlsCaCert configcertscacrt tlsClientCert configcertstlscrt tlsClientKey configcertstlskey tlsServerName elasticsearch-es-httpservice-telemetrysvcclusterlocal useBasicAuth true useTls true---apiVersion smartgatewayinfrawatchv2alpha1kind SmartGatewaymetadata name stf-default-collectd-notification-cloud1 3spec amqpDataSource collectd amqpUrl stf-default-interconnectservice-telemetrysvcclusterlocal5672collectdcloud1-
Red Hat OpenStack Platform 160 Service Telemetry Framework
42
1
2
3
4
5
6
Name for Ceilometer notifications for cloud1
AMQP Address for Ceilometer notifications for cloud1
Name for collectd telemetry for cloud1
AMQP Address for collectd telemetry for cloud1
Name for collectd notifications for cloud1
AMQP Address for collectd notifications for cloud1
453 Creating the OpenStack environment file
To label traffic according to the cloud of origin you must create a configuration with cloud-specificinstance names Create an stf-connectorsyaml file and adjust the values of CeilometerQdrEventsConfig and CollectdAmqpInstances to match the AMQP address prefixscheme For more information see Section 451 ldquoPlanning AMQP address prefixesrdquo
notify 4 debug false elasticPass fkzfhghw elasticUrl httpselasticsearch-es-httpservice-telemetrysvcclusterlocal9200 elasticUser elastic resetIndex false serviceType events size 1 tlsCaCert configcertscacrt tlsClientCert configcertstlscrt tlsClientKey configcertstlskey tlsServerName elasticsearch-es-httpservice-telemetrysvcclusterlocal useBasicAuth true useTls true---apiVersion smartgatewayinfrawatchv2alpha1kind SmartGatewaymetadata name stf-default-collectd-telemetry-cloud1 5spec amqpUrl stf-default-interconnectservice-telemetrysvcclusterlocal5672collectdcloud1-telemetry 6 debug false prefetch 15000 serviceType metrics size 1 useTimestamp true
CHAPTER 4 ADVANCED FEATURES
43
WARNING
Remove enable-stfyaml and ceilometer-write-qdryaml environment filereferences from your overcloud deployment This configuration is redundantand results in duplicate information being sent from each cloud node
Procedure
1 Create the stf-connectorsyaml file and modify it to match the AMQP address that youwant for this cloud deployment
resource_registry OSTripleOServicesCollectd usrshareopenstack-tripleo-heat-templatesdeploymentmetricscollectd-container-puppetyaml OSTripleOServicesMetricsQdr usrshareopenstack-tripleo-heat-templatesdeploymentmetricsqdr-container-puppetyaml OSTripleOServicesCeilometerAgentCentral usrshareopenstack-tripleo-heat-templatesdeploymentceilometerceilometer-agent-central-container-puppetyaml OSTripleOServicesCeilometerAgentNotification usrshareopenstack-tripleo-heat-templatesdeploymentceilometerceilometer-agent-notification-container-puppetyaml OSTripleOServicesCeilometerAgentIpmi usrshareopenstack-tripleo-heat-templatesdeploymentceilometerceilometer-agent-ipmi-container-puppetyaml OSTripleOServicesComputeCeilometerAgent usrshareopenstack-tripleo-heat-templatesdeploymentceilometerceilometer-agent-compute-container-puppetyaml OSTripleOServicesRedis usrshareopenstack-tripleo-heat-templatesdeploymentdatabaseredis-pacemaker-puppetyaml
parameter_defaults EnableSTF true
EventPipelinePublishers [] CeilometerEnablePanko false CeilometerQdrPublishEvents true CeilometerQdrEventsConfig driver amqp topic cloud1-event 1
CollectdConnectionType amqp1 CollectdAmqpInterval 5 CollectdDefaultPollingInterval 5
CollectdAmqpInstances cloud1-notify 2 notify true format JSON presettle false cloud1-telemetry 3 format JSON presettle true
MetricsQdrAddresses
Red Hat OpenStack Platform 160 Service Telemetry Framework
44
+ lt1gt Define the topic for Ceilometer events This value is the address format of anycastceilometercloud1-eventsample lt2gt Define the topic for collectd events This value is theformat of collectdcloud1-notify lt3gt Define the topic for collectd metrics This value is the formatof collectdcloud1-telemetry lt4gt Adjust the MetricsQdrConnectors host to the address of the STFroute
1 Ensure that the naming convention in the stf-connectorsyaml file aligns with the specamqpUrl field in the Smart Gateway configuration For example configure the CeilometerQdrEventsConfigtopic field to a value of cloud1-event
2 Save the file in a directory for custom environment files for example homestackcustom_templates
3 Source the authentication file
4 Include the stf-connectorsyaml file in the overcloud deployment command along with anyother environment files relevant to your environment
Additional resourcesFor information about validating the deployment see Section 333 ldquoValidating client-sideinstallationrdquo
454 Querying metrics data from multiple clouds
Data stored in Prometheus has a service label attached according to the Smart Gateway it wasscraped from You can use this label to query data from a specific cloud
To query data from a specific cloud use a Prometheus promql query that matches the associatedservice label for example collectd_uptimeservice=stf-default-collectd-telemetry-cloud1-smartgateway
- prefix collectd distribution multicast - prefix anycastceilometer distribution multicast
MetricsQdrSSLProfiles - name sslProfile
MetricsQdrConnectors - host stf-default-interconnect-5671-service-telemetryappsinfrawatch 4 port 443 role edge verifyHostname false sslProfile sslProfile
[stackundercloud-0 ~]$ source stackrc
(undercloud) [stackundercloud-0 ~]$
(undercloud) [stackundercloud-0 ~]$ openstack overcloud deploy --templates usrshareopenstack-tripleo-heat-templates -e homestackcustom_templatesstf-connectorsyaml
CHAPTER 4 ADVANCED FEATURES
45
46 EPHEMERAL STORAGE
Use ephemeral storage to run Service Telemetry Framework (STF) without persistently storingdata in your Red Hat OpenShift Container Platform (OCP) cluster Ephemeral storage is notrecommended in a production environment due to the volatility of the data in the platform whenoperating correctly and as designed For example restarting a pod or rescheduling the workload toanother node results in the loss of any local data written since the pod started
If you enable ephemeral storage in STF the Service Telemetry Operator does not add the relevant storage sections to the data storage components manifests
461 Configuring ephemeral storage
To configure STF for ephemeral storage add storageEphemeralEnabled true to theServiceTelemetry object in OCP You can add storageEphemeralEnabled true at installation timeor if you already deployed STF complete the following steps
Procedure
1 Log in to Red Hat OpenShift Container Platform
2 Change to the service-telemetry namespace
3 Edit the ServiceTelemetry object
4 Add the storageEphemeralEnabled true parameter to the spec section
5 Save your changes and close the object
oc project service-telemetry
$ oc edit ServiceTelemetry stf-default
spec eventsEnabled true metricsEnabled true storageEphemeralEnabled true
Red Hat OpenStack Platform 160 Service Telemetry Framework
46
APPENDIX A COLLECTD PLUG-INSThis section contains a complete list of collectd plug-ins and configurations forRed Hat OpenStack Platform 160
collectd-aggregation
collectdpluginaggregationaggregators
collectdpluginaggregationinterval
collectd-amqp1
collectd-apache
collectdpluginapacheinstances (ex localhost rArr url rArr httplocalhostmod_statusauto)
collectdpluginapacheinterval
collectd-apcups
collectd-battery
collectdpluginbatteryvalues_percentage
collectdpluginbatteryreport_degraded
collectdpluginbatteryquery_state_fs
collectdpluginbatteryinterval
collectd-ceph
collectdplugincephdaemons
collectdplugincephlongrunavglatency
collectdplugincephconvertspecialmetrictypes
collectd-cgroups
collectdplugincgroupsignore_selected
collectdplugincgroupsinterval
collectd-conntrack
None
collectd-contextswitch
collectdplugincontextswitchinterval
collectd-cpu
APPENDIX A COLLECTD PLUG-INS
47
collectdplugincpureportbystate
collectdplugincpureportbycpu
collectdplugincpuvaluespercentage
collectdplugincpureportnumcpu
collectdplugincpureportgueststate
collectdplugincpusubtractgueststate
collectdplugincpuinterval
collectd-cpufreq
None
collectd-cpusleep
collectd-csv
collectdplugincsvdatadir
collectdplugincsvstorerates
collectdplugincsvinterval
collectd-df
collectdplugindfdevices
collectdplugindffstypes
collectdplugindfignoreselected
collectdplugindfmountpoints
collectdplugindfreportbydevice
collectdplugindfreportinodes
collectdplugindfreportreserved
collectdplugindfvaluesabsolute
collectdplugindfvaluespercentage
collectdplugindfinterval
collectd-disk
collectdplugindiskdisks
collectdplugindiskignoreselected
collectdplugindiskudevnameattr
Red Hat OpenStack Platform 160 Service Telemetry Framework
48
collectdplugindiskinterval
collectd-entropy
collectdpluginentropyinterval
collectd-ethstat
collectdpluginethstatinterfaces
collectdpluginethstatmaps
collectdpluginethstatmappedonly
collectdpluginethstatinterval
collectd-exec
collectdpluginexeccommands
collectdpluginexeccommands_defaults
collectdpluginexecglobals
collectdpluginexecinterval
collectd-fhcount
collectdpluginfhcountvaluesabsolute
collectdpluginfhcountvaluespercentage
collectdpluginfhcountinterval
collectd-filecount
collectdpluginfilecountdirectories
collectdpluginfilecountinterval
collectd-fscache
None
collectd-hddtemp
collectdpluginhddtemphost
collectdpluginhddtempport
collectdpluginhddtempinterval
collectd-hugepages
collectdpluginhugepagesreport_per_node_hp
APPENDIX A COLLECTD PLUG-INS
49
collectdpluginhugepagesreport_root_hp
collectdpluginhugepagesvalues_pages
collectdpluginhugepagesvalues_bytes
collectdpluginhugepagesvalues_percentage
collectdpluginhugepagesinterval
collectd-intel_rdt
collectd-interface
collectdplugininterfaceinterfaces
collectdplugininterfaceignoreselected
collectdplugininterfacereportinactive
Collectdplugininterfaceinterval
collectd-ipc
None
collectd-ipmi
collectdpluginipmiignore_selected
collectdpluginipminotify_sensor_add
collectdpluginipminotify_sensor_remove
collectdpluginipminotify_sensor_not_present
collectdpluginipmisensors
collectdpluginipmiinterval
collectd-irq
collectdpluginirqirqs
collectdpluginirqignoreselected
collectdpluginirqinterval
collectd-load
collectdpluginloadreport_relative
collectdpluginloadinterval
collectd-logfile
collectdpluginlogfilelog_level
Red Hat OpenStack Platform 160 Service Telemetry Framework
50
collectdpluginlogfilelog_level
collectdpluginlogfilelog_file
collectdpluginlogfilelog_timestamp
collectdpluginlogfileprint_severity
collectdpluginlogfileinterval
collectd-madwifi
collectd-mbmon
collectd-md
collectd-memcached
collectdpluginmemcachedinstances
collectdpluginmemcachedinterval
collectd-memory
collectdpluginmemoryvaluesabsolute
collectdpluginmemoryvaluespercentage
collectdpluginmemoryinterval collectd-multimeter
collectd-multimeter
collectd-mysql
collectdpluginmysqlinterval
collectd-netlink
collectdpluginnetlinkinterfaces
collectdpluginnetlinkverboseinterfaces
collectdpluginnetlinkqdiscs
collectdpluginnetlinkclasses
collectdpluginnetlinkfilters
collectdpluginnetlinkignoreselected
collectdpluginnetlinkinterval
collectd-network
collectdpluginnetworktimetolive
collectdpluginnetworkmaxpacketsize
APPENDIX A COLLECTD PLUG-INS
51
collectdpluginnetworkforward
collectdpluginnetworkreportstats
collectdpluginnetworklisteners
collectdpluginnetworkservers
collectdpluginnetworkinterval
collectd-nfs
collectdpluginnfsinterval
collectd-ntpd
collectdpluginntpdhost
collectdpluginntpdport
collectdpluginntpdreverselookups
collectdpluginntpdincludeunitid
collectdpluginntpdinterval
collectd-numa
None
collectd-olsrd
collectd-openvpn
collectdpluginopenvpnstatusfile
collectdpluginopenvpnimprovednamingschema
collectdpluginopenvpncollectcompression
collectdpluginopenvpncollectindividualusers
collectdpluginopenvpncollectusercount
collectdpluginopenvpninterval
collectd-ovs_events
collectdpluginovs_eventsaddress
collectdpluginovs_eventsdispatch
collectdpluginovs_eventsinterfaces
collectdpluginovs_eventssend_notification
Red Hat OpenStack Platform 160 Service Telemetry Framework
52
collectdpluginovs_events$port
collectdpluginovs_eventssocket
collectd-ovs_stats
collectdpluginovs_statsaddress
collectdpluginovs_statsbridges
collectdpluginovs_statsport
collectdpluginovs_statssocket
collectd-ping
collectdpluginpinghosts
collectdpluginpingtimeout
collectdpluginpingttl
collectdpluginpingsource_address
collectdpluginpingdevice
collectdpluginpingmax_missed
collectdpluginpingsize
collectdpluginpinginterval
collectd-powerdns
collectdpluginpowerdnsinterval
collectdpluginpowerdnsservers
collectdpluginpowerdnsrecursors
collectdpluginpowerdnslocal_socket
collectdpluginpowerdnsinterval
collectd-processes
collectdpluginprocessesprocesses
collectdpluginprocessesprocess_matches
collectdpluginprocessescollect_context_switch
collectdpluginprocessescollect_file_descriptor
collectdpluginprocessescollect_memory_maps
collectdpluginpowerdnsinterval
APPENDIX A COLLECTD PLUG-INS
53
collectd-protocols
collectdpluginprotocolsignoreselected
collectdpluginprotocolsvalues
collectd-python
collectd-serial
collectd-smart
collectdpluginsmartdisks
collectdpluginsmartignoreselected
collectdpluginsmartinterval
collectd-snmp_agent
collectd-statsd
collectdpluginstatsdhost
collectdpluginstatsdport
collectdpluginstatsddeletecounters
collectdpluginstatsddeletetimers
collectdpluginstatsddeletegauges
collectdpluginstatsddeletesets
collectdpluginstatsdcountersum
collectdpluginstatsdtimerpercentile
collectdpluginstatsdtimerlower
collectdpluginstatsdtimerupper
collectdpluginstatsdtimersum
collectdpluginstatsdtimercount
collectdpluginstatsdinterval
collectd-swap
collectdpluginswapreportbydevice
collectdpluginswapreportbytes
collectdpluginswapvaluesabsolute
Red Hat OpenStack Platform 160 Service Telemetry Framework
54
collectdpluginswapvaluespercentage
collectdpluginswapreportio
collectdpluginswapinterval
collectd-syslog
collectdpluginsysloglog_level
collectdpluginsyslognotify_level
collectdpluginsysloginterval
collectd-table
collectdplugintabletables
collectdplugintableinterval
collectd-tail
collectdplugintailfiles
collectdplugintailinterval
collectd-tail_csv
collectdplugintail_csvmetrics
collectdplugintail_csvfiles
collectd-tcpconns
collectdplugintcpconnslocalports
collectdplugintcpconnsremoteports
collectdplugintcpconnslistening
collectdplugintcpconnsallportssummary
collectdplugintcpconnsinterval
collectd-ted
collectd-thermal
collectdpluginthermaldevices
collectdpluginthermalignoreselected
collectdpluginthermalinterval
collectd-threshold
APPENDIX A COLLECTD PLUG-INS
55
collectdpluginthresholdtypes
collectdpluginthresholdplugins
collectdpluginthresholdhosts
collectdpluginthresholdinterval
collectd-turbostat
collectdpluginturbostatcore_c_states
collectdpluginturbostatpackage_c_states
collectdpluginturbostatsystem_management_interrupt
collectdpluginturbostatdigital_temperature_sensor
collectdpluginturbostattcc_activation_temp
collectdpluginturbostatrunning_average_power_limit
collectdpluginturbostatlogical_core_names
collectd-unixsock
collectd-uptime
collectdpluginuptimeinterval
collectd-users
collectdpluginusersinterval
collectd-uuid
collectdpluginuuiduuid_file
collectdpluginuuidinterval
collectd-virt
collectdpluginvirtconnection
collectdpluginvirtrefresh_interval
collectdpluginvirtdomain
collectdpluginvirtblock_device
collectdpluginvirtinterface_device
collectdpluginvirtignore_selected
collectdpluginvirthostname_format
Red Hat OpenStack Platform 160 Service Telemetry Framework
56
collectdpluginvirtinterface_format
collectdpluginvirtextra_stats
collectdpluginvirtinterval
collectd-vmem
collectdpluginvmemverbose
collectdpluginvmeminterval
collectd-vserver
collectd-wireless
collectd-write_graphite
collectdpluginwrite_graphitecarbons
collectdpluginwrite_graphitecarbon_defaults
collectdpluginwrite_graphiteglobals
collectd-write_kafka
collectdpluginwrite_kafkakafka_host
collectdpluginwrite_kafkakafka_port
collectdpluginwrite_kafkakafka_hosts
collectdpluginwrite_kafkatopics
collectd-write_log
collectdpluginwrite_logformat
collectd-zfs_arc
None
APPENDIX A COLLECTD PLUG-INS
57
- Table of Contents
- CHAPTER 1 INTRODUCTION TO SERVICE TELEMETRY FRAMEWORK
-
- 11 SERVICE TELEMETRY FRAMEWORK ARCHITECTURE
- 12 INSTALLATION SIZE
-
- CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK
-
- 21 THE CORE COMPONENTS OF STF
- 22 PREPARING YOUR OCP ENVIRONMENT FOR STF
-
- 221 Persistent volumes
-
- 2211 Using ephemeral storage
-
- 222 Resource allocation
- 223 Node tuning operator
-
- 23 DEPLOYING STF TO THE OCP ENVIRONMENT
-
- 231 Deploying STF to the OCP environment with ElasticSearch
- 232 Deploying STF to the OCP environment without ElasticSearch
- 233 Creating a namespace
- 234 Creating an OperatorGroup
- 235 Enabling the OperatorHubio Community Catalog Source
- 236 Enabling Red Hat STF Operator Source
- 237 Subscribing to the AMQ Certificate Manager Operator
- 238 Subscribing to the Elastic Cloud on Kubernetes Operator
- 239 Subscribing to the Service Telemetry Operator
- 2310 Creating a ServiceTelemetry object in OCP
-
- 24 REMOVING STF FROM THE OCP ENVIRONMENT
-
- 241 Deleting the namespace
- 242 Removing the OperatorSource
-
- CHAPTER 3 COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION
-
- 31 CONNECTING RED HAT OPENSTACK PLATFORM TO SERVICE TELEMETRY FRAMEWORK
- 32 DEPLOYING TO NON-STANDARD NETWORK TOPOLOGIES
- 33 CONFIGURING RED HAT OPENSTACK PLATFORM OVERCLOUD FOR SERVICE TELEMETRY FRAMEWORK
-
- 331 Retrieving the AMQ Interconnect route address
- 332 Configuring the STF connection for the overcloud
- 333 Validating client-side installation
-
- CHAPTER 4 ADVANCED FEATURES
-
- 41 CUSTOMIZING THE DEPLOYMENT
-
- 411 Manifest override parameters
- 412 Overriding a managed manifest
-
- Procedure
-
- 42 ALERTS
-
- Additional resources
- 421 Creating an alert rule in Prometheus
-
- Procedure
- Additional resources
-
- 422 Configuring custom alerts
-
- Procedure
- Additional resources
-
- 423 Creating an alert route in Alertmanager
-
- Procedure
- Additional resources
-
- 43 HIGH AVAILABILITY
-
- 431 Configuring high availability
-
- Procedure
-
- 44 DASHBOARDS
-
- 441 Setting up Grafana to host the dashboard
-
- 4411 Viewing and editing queries
-
- 442 The Grafana infrastructure dashboard
-
- 4421 Top panels
- 4422 Networking panels
- 4423 CPU panels
- 4424 Memory panels
- 4425 Diskfile system
-
- 45 CONFIGURING MULTIPLE CLOUDS
-
- 451 Planning AMQP address prefixes
- 452 Deploying Smart Gateways
-
- Procedure
- 4521 Example manifests
-
- 453 Creating the OpenStack environment file
-
- Procedure
- Additional resources
-
- 454 Querying metrics data from multiple clouds
-
- 46 EPHEMERAL STORAGE
-
- 461 Configuring ephemeral storage
-
- APPENDIX A COLLECTD PLUG-INS
-
Inode Usage percent Total inode use at timeof last query
Aggregate Disk SpaceUsage
bytes Total disk space usedand reserved
Because this query relieson the df plugintemporary file systemsthat do not necessarilyuse disk space areincluded in the resultsThe query tries to filterout most of these but itmight not be exhaustive
Disk Traffic bytess Shows rates for bothreading and writing
Disk Load percent Approximatepercentage of total diskbandwidth being usedThe weighted IO timeseries includes thebacklog that might beaccumulating For moreinformation see thecollectd disk plugin docs
Operationss opss Operations done persecond
Average IO OperationTime
seconds Average time each IOoperation took tocomplete This averageis not accurate see thecollectd disk plugin docs
Panel Unit Description Notes
45 CONFIGURING MULTIPLE CLOUDS
You can configure multiple Red Hat OpenStack Platform clouds to target a single instance ofService Telemetry Framework (STF)
1 Plan the AMQP address prefixes that you want to use for each cloud For more informationsee Section 451 ldquoPlanning AMQP address prefixesrdquo
2 Deploy metrics and events consumer Smart Gateways for each cloud to listen on thecorresponding address prefixes For more information see Section 452 ldquoDeploying SmartGatewaysrdquo
3 Configure each cloud to send its metrics and events to STF on the correct address Formore information see Section 453 ldquoCreating the OpenStack environment filerdquo
CHAPTER 4 ADVANCED FEATURES
39
Figure 41 Two Red Hat OpenStack Platform clouds connect to STF
4.5.1. Planning AMQP address prefixes

By default, Red Hat OpenStack Platform nodes get data through two data collectors: collectd and Ceilometer. These components send telemetry data or notifications to the respective AMQP addresses, for example, collectd/telemetry, where STF Smart Gateways listen on those addresses for monitoring data.

To support multiple clouds and to identify which cloud generated the monitoring data, configure each cloud to send data to a unique address. Prefix a cloud identifier to the second part of the address. The following list shows some example addresses and identifiers:
collectd/cloud1-telemetry
collectd/cloud1-notify
anycast/ceilometer/cloud1-event.sample
collectd/cloud2-telemetry
collectd/cloud2-notify
anycast/ceilometer/cloud2-event.sample
collectd/us-east-1-telemetry
collectd/us-west-3-telemetry
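The mapping from a cloud identifier to its addresses is mechanical, as the following shell sketch illustrates (the identifier cloud1 is only an example):

# Illustration of the addressing scheme: derive the three per-cloud AMQP
# addresses from a single cloud identifier.
CLOUD_ID=cloud1
echo "collectd/${CLOUD_ID}-telemetry"               # collectd metrics
echo "collectd/${CLOUD_ID}-notify"                  # collectd events
echo "anycast/ceilometer/${CLOUD_ID}-event.sample"  # Ceilometer events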
4.5.2. Deploying Smart Gateways

You must deploy a Smart Gateway for each of the data collection types for each cloud: one for collectd metrics, one for collectd events, and one for Ceilometer events. Configure each of the Smart Gateways to listen on the AMQP address that you define for the corresponding cloud.

When you deploy STF for the first time, Smart Gateway manifests are created that define the initial Smart Gateways for a single cloud. When deploying Smart Gateways for multiple cloud support, you deploy multiple Smart Gateways for each of the data collection types that handle the metrics and the events data for each cloud. The initial Smart Gateways act as a template to create additional Smart Gateways, along with any authentication information required to connect to the data stores.
Procedure
1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

oc project service-telemetry
3. Use the initially deployed Smart Gateways as a template for additional Smart Gateways. List the currently deployed Smart Gateways with the oc get smartgateways command. For example, if you deployed STF with metricsEnabled: true and eventsEnabled: true, the following Smart Gateways are displayed in the output:
$ oc get smartgateways

NAME                                  AGE
stf-default-ceilometer-notification   14d
stf-default-collectd-notification     14d
stf-default-collectd-telemetry        14d
4. Retrieve the manifests for each Smart Gateway and store the contents in a temporary file, which you can modify later and use to create the new set of Smart Gateways:
truncate --size 0 /tmp/cloud1-smartgateways.yaml && \
for sg in $(oc get smartgateways -o name)
do
  echo "---" >> /tmp/cloud1-smartgateways.yaml
  oc get $sg -o yaml --export >> /tmp/cloud1-smartgateways.yaml
done
5. Modify the Smart Gateway manifests in the /tmp/cloud1-smartgateways.yaml file. Adjust the metadata.name and spec.amqpUrl fields to include the cloud identifier from your schema (see the sketch after this procedure). For more information, see Section 4.5.1, "Planning AMQP address prefixes". To view example Smart Gateway manifests, see Section 4.5.2.1, "Example manifests".
6. Deploy your new Smart Gateways:

oc apply -f /tmp/cloud1-smartgateways.yaml
7. Verify that each Smart Gateway is running. This can take several minutes, depending on the number of Smart Gateways:

oc get po -l app=smart-gateway
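The following is a minimal sketch of the substitution that step 5 describes, assuming the default single-cloud names and addresses and a cloud identifier of cloud1. The patterns are assumptions, so review the edited file by hand before you apply it:

# Sketch only: append the cloud identifier to each Smart Gateway name and
# rewrite the assumed default AMQP addresses. Verify these patterns against
# your exported manifests first.
sed -i \
  -e 's/^\( *name: stf-default-.*\)$/\1-cloud1/' \
  -e 's|5672/collectd/telemetry|5672/collectd/cloud1-telemetry|' \
  -e 's|5672/collectd/notify|5672/collectd/cloud1-notify|' \
  -e 's|5672/anycast/ceilometer/event\.sample|5672/anycast/ceilometer/cloud1-event.sample|' \
  /tmp/cloud1-smartgateways.yaml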
4.5.2.1. Example manifests
IMPORTANT
The content in the following examples might differ from the file content in your deployment. Copy the manifests from your deployment.
Ensure that the name and amqpUrl parameters of each Smart Gateway match the names that you want to use for your clouds. For more information, see Section 4.5.1, "Planning AMQP address prefixes".
NOTE
Your output might have some additional metadata parameters that you can remove from the manifests that you load into OCP.
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-ceilometer-notification-cloud1 1
spec:
  amqpDataSource: ceilometer
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/anycast/ceilometer/cloud1-event.sample 2
  debug: false
  elasticPass: fkzfhghw
  elasticUrl: https://elasticsearch-es-http.service-telemetry.svc.cluster.local:9200
  elasticUser: elastic
  resetIndex: false
  serviceType: events
  size: 1
  tlsCaCert: /config/certs/ca.crt
  tlsClientCert: /config/certs/tls.crt
  tlsClientKey: /config/certs/tls.key
  tlsServerName: elasticsearch-es-http.service-telemetry.svc.cluster.local
  useBasicAuth: true
  useTls: true
---
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-collectd-notification-cloud1 3
spec:
  amqpDataSource: collectd
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/cloud1-notify 4
  debug: false
  elasticPass: fkzfhghw
  elasticUrl: https://elasticsearch-es-http.service-telemetry.svc.cluster.local:9200
  elasticUser: elastic
  resetIndex: false
  serviceType: events
  size: 1
  tlsCaCert: /config/certs/ca.crt
  tlsClientCert: /config/certs/tls.crt
  tlsClientKey: /config/certs/tls.key
  tlsServerName: elasticsearch-es-http.service-telemetry.svc.cluster.local
  useBasicAuth: true
  useTls: true
---
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-collectd-telemetry-cloud1 5
spec:
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/cloud1-telemetry 6
  debug: false
  prefetch: 15000
  serviceType: metrics
  size: 1
  useTimestamp: true

1 Name for Ceilometer notifications for cloud1
2 AMQP address for Ceilometer notifications for cloud1
3 Name for collectd notifications for cloud1
4 AMQP address for collectd notifications for cloud1
5 Name for collectd telemetry for cloud1
6 AMQP address for collectd telemetry for cloud1

4.5.3. Creating the OpenStack environment file

To label traffic according to the cloud of origin, you must create a configuration with cloud-specific instance names. Create an stf-connectors.yaml file and adjust the values of CeilometerQdrEventsConfig and CollectdAmqpInstances to match the AMQP address prefix scheme. For more information, see Section 4.5.1, "Planning AMQP address prefixes".
WARNING
Remove the enable-stf.yaml and ceilometer-write-qdr.yaml environment file references from your overcloud deployment. This configuration is redundant and results in duplicate information being sent from each cloud node.
Procedure

1. Create the stf-connectors.yaml file and modify it to match the AMQP address that you want for this cloud deployment:
resource_registry:
  OS::TripleO::Services::Collectd: /usr/share/openstack-tripleo-heat-templates/deployment/metrics/collectd-container-puppet.yaml
  OS::TripleO::Services::MetricsQdr: /usr/share/openstack-tripleo-heat-templates/deployment/metrics/qdr-container-puppet.yaml
  OS::TripleO::Services::CeilometerAgentCentral: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-central-container-puppet.yaml
  OS::TripleO::Services::CeilometerAgentNotification: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-notification-container-puppet.yaml
  OS::TripleO::Services::CeilometerAgentIpmi: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-ipmi-container-puppet.yaml
  OS::TripleO::Services::ComputeCeilometerAgent: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-compute-container-puppet.yaml
  OS::TripleO::Services::Redis: /usr/share/openstack-tripleo-heat-templates/deployment/database/redis-pacemaker-puppet.yaml

parameter_defaults:
  EnableSTF: true

  EventPipelinePublishers: []
  CeilometerEnablePanko: false
  CeilometerQdrPublishEvents: true
  CeilometerQdrEventsConfig:
    driver: amqp
    topic: cloud1-event 1

  CollectdConnectionType: amqp1
  CollectdAmqpInterval: 5
  CollectdDefaultPollingInterval: 5

  CollectdAmqpInstances:
    cloud1-notify: 2
      notify: true
      format: JSON
      presettle: false
    cloud1-telemetry: 3
      format: JSON
      presettle: true

  MetricsQdrAddresses:
    - prefix: collectd
      distribution: multicast
    - prefix: anycast/ceilometer
      distribution: multicast

  MetricsQdrSSLProfiles:
    - name: sslProfile

  MetricsQdrConnectors:
    - host: stf-default-interconnect-5671-service-telemetry.apps.infra.watch 4
      port: 443
      role: edge
      verifyHostname: false
      sslProfile: sslProfile

1 Define the topic for Ceilometer events. This value is the address format of anycast/ceilometer/cloud1-event.sample.
2 Define the topic for collectd events. This value is the format of collectd/cloud1-notify.
3 Define the topic for collectd metrics. This value is the format of collectd/cloud1-telemetry.
4 Adjust the MetricsQdrConnectors host to the address of the STF route.
2. Ensure that the naming convention in the stf-connectors.yaml file aligns with the spec.amqpUrl field in the Smart Gateway configuration. For example, configure the CeilometerQdrEventsConfig.topic field to a value of cloud1-event.

3. Save the file in a directory for custom environment files, for example, /home/stack/custom_templates/.

4. Source the authentication file:

[stack@undercloud-0 ~]$ source stackrc

(undercloud) [stack@undercloud-0 ~]$

5. Include the stf-connectors.yaml file in the overcloud deployment command, along with any other environment files relevant to your environment:

(undercloud) [stack@undercloud-0 ~]$ openstack overcloud deploy --templates /usr/share/openstack-tripleo-heat-templates -e /home/stack/custom_templates/stf-connectors.yaml

Additional resources
For information about validating the deployment, see Section 3.3.3, "Validating client-side installation".
4.5.4. Querying metrics data from multiple clouds

Data stored in Prometheus has a service label attached according to the Smart Gateway it was scraped from. You can use this label to query data from a specific cloud.

To query data from a specific cloud, use a Prometheus PromQL query that matches the associated service label, for example: collectd_uptime{service="stf-default-collectd-telemetry-cloud1-smartgateway"}.
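As a minimal sketch, such a query can also be issued against the Prometheus HTTP API from the command line; the route host below is a placeholder, so substitute the Prometheus route from your own deployment:

# Query one cloud's uptime series through the Prometheus HTTP API.
# The host is an assumption; find your route with `oc get routes`
# in the service-telemetry namespace.
curl -k -G 'https://prometheus-service-telemetry.apps.infra.watch/api/v1/query' \
  --data-urlencode 'query=collectd_uptime{service="stf-default-collectd-telemetry-cloud1-smartgateway"}'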
4.6. EPHEMERAL STORAGE

Use ephemeral storage to run Service Telemetry Framework (STF) without persistently storing data in your Red Hat OpenShift Container Platform (OCP) cluster. Ephemeral storage is not recommended in a production environment due to the volatility of the data in the platform, even when it is operating correctly and as designed. For example, restarting a pod or rescheduling the workload to another node results in the loss of any local data written since the pod started.

If you enable ephemeral storage in STF, the Service Telemetry Operator does not add the relevant storage sections to the data storage components' manifests.

4.6.1. Configuring ephemeral storage

To configure STF for ephemeral storage, add storageEphemeralEnabled: true to the ServiceTelemetry object in OCP. You can add storageEphemeralEnabled: true at installation time or, if you already deployed STF, complete the following steps:
Procedure
1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

oc project service-telemetry

3. Edit the ServiceTelemetry object:

$ oc edit ServiceTelemetry stf-default

4. Add the storageEphemeralEnabled: true parameter to the spec section:

spec:
  eventsEnabled: true
  metricsEnabled: true
  storageEphemeralEnabled: true

5. Save your changes and close the object.
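For the installation-time case, a minimal sketch is to create the ServiceTelemetry object with the parameter already set; the apiVersion shown is an assumption, so confirm it against the CRD installed in your cluster:

# Sketch: create the ServiceTelemetry object with ephemeral storage enabled.
# The apiVersion is an assumption; confirm it with `oc get crd | grep infra.watch`.
oc apply -f - <<EOF
apiVersion: infra.watch/v1alpha1
kind: ServiceTelemetry
metadata:
  name: stf-default
  namespace: service-telemetry
spec:
  eventsEnabled: true
  metricsEnabled: true
  storageEphemeralEnabled: true
EOF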
APPENDIX A. COLLECTD PLUG-INS

This section contains a complete list of collectd plug-ins and configurations for Red Hat OpenStack Platform 16.0.
collectd-aggregation
collectd::plugin::aggregation::aggregators
collectd::plugin::aggregation::interval

collectd-amqp1

collectd-apache
collectd::plugin::apache::instances (ex: localhost ⇒ url ⇒ 'http://localhost/mod_status?auto')
collectd::plugin::apache::interval

collectd-apcups

collectd-battery
collectd::plugin::battery::values_percentage
collectd::plugin::battery::report_degraded
collectd::plugin::battery::query_state_fs
collectd::plugin::battery::interval

collectd-ceph
collectd::plugin::ceph::daemons
collectd::plugin::ceph::longrunavglatency
collectd::plugin::ceph::convertspecialmetrictypes

collectd-cgroups
collectd::plugin::cgroups::ignore_selected
collectd::plugin::cgroups::interval

collectd-conntrack
None

collectd-contextswitch
collectd::plugin::contextswitch::interval

collectd-cpu
collectd::plugin::cpu::reportbystate
collectd::plugin::cpu::reportbycpu
collectd::plugin::cpu::valuespercentage
collectd::plugin::cpu::reportnumcpu
collectd::plugin::cpu::reportgueststate
collectd::plugin::cpu::subtractgueststate
collectd::plugin::cpu::interval

collectd-cpufreq
None

collectd-cpusleep

collectd-csv
collectd::plugin::csv::datadir
collectd::plugin::csv::storerates
collectd::plugin::csv::interval

collectd-df
collectd::plugin::df::devices
collectd::plugin::df::fstypes
collectd::plugin::df::ignoreselected
collectd::plugin::df::mountpoints
collectd::plugin::df::reportbydevice
collectd::plugin::df::reportinodes
collectd::plugin::df::reportreserved
collectd::plugin::df::valuesabsolute
collectd::plugin::df::valuespercentage
collectd::plugin::df::interval

collectd-disk
collectd::plugin::disk::disks
collectd::plugin::disk::ignoreselected
collectd::plugin::disk::udevnameattr
collectd::plugin::disk::interval

collectd-entropy
collectd::plugin::entropy::interval

collectd-ethstat
collectd::plugin::ethstat::interfaces
collectd::plugin::ethstat::maps
collectd::plugin::ethstat::mappedonly
collectd::plugin::ethstat::interval

collectd-exec
collectd::plugin::exec::commands
collectd::plugin::exec::commands_defaults
collectd::plugin::exec::globals
collectd::plugin::exec::interval

collectd-fhcount
collectd::plugin::fhcount::valuesabsolute
collectd::plugin::fhcount::valuespercentage
collectd::plugin::fhcount::interval

collectd-filecount
collectd::plugin::filecount::directories
collectd::plugin::filecount::interval

collectd-fscache
None

collectd-hddtemp
collectd::plugin::hddtemp::host
collectd::plugin::hddtemp::port
collectd::plugin::hddtemp::interval

collectd-hugepages
collectd::plugin::hugepages::report_per_node_hp
collectd::plugin::hugepages::report_root_hp
collectd::plugin::hugepages::values_pages
collectd::plugin::hugepages::values_bytes
collectd::plugin::hugepages::values_percentage
collectd::plugin::hugepages::interval

collectd-intel_rdt

collectd-interface
collectd::plugin::interface::interfaces
collectd::plugin::interface::ignoreselected
collectd::plugin::interface::reportinactive
collectd::plugin::interface::interval

collectd-ipc
None

collectd-ipmi
collectd::plugin::ipmi::ignore_selected
collectd::plugin::ipmi::notify_sensor_add
collectd::plugin::ipmi::notify_sensor_remove
collectd::plugin::ipmi::notify_sensor_not_present
collectd::plugin::ipmi::sensors
collectd::plugin::ipmi::interval

collectd-irq
collectd::plugin::irq::irqs
collectd::plugin::irq::ignoreselected
collectd::plugin::irq::interval

collectd-load
collectd::plugin::load::report_relative
collectd::plugin::load::interval

collectd-logfile
collectd::plugin::logfile::log_level
collectd::plugin::logfile::log_file
collectd::plugin::logfile::log_timestamp
collectd::plugin::logfile::print_severity
collectd::plugin::logfile::interval

collectd-madwifi

collectd-mbmon

collectd-md

collectd-memcached
collectd::plugin::memcached::instances
collectd::plugin::memcached::interval

collectd-memory
collectd::plugin::memory::valuesabsolute
collectd::plugin::memory::valuespercentage
collectd::plugin::memory::interval

collectd-multimeter

collectd-mysql
collectd::plugin::mysql::interval

collectd-netlink
collectd::plugin::netlink::interfaces
collectd::plugin::netlink::verboseinterfaces
collectd::plugin::netlink::qdiscs
collectd::plugin::netlink::classes
collectd::plugin::netlink::filters
collectd::plugin::netlink::ignoreselected
collectd::plugin::netlink::interval

collectd-network
collectd::plugin::network::timetolive
collectd::plugin::network::maxpacketsize
collectd::plugin::network::forward
collectd::plugin::network::reportstats
collectd::plugin::network::listeners
collectd::plugin::network::servers
collectd::plugin::network::interval

collectd-nfs
collectd::plugin::nfs::interval

collectd-ntpd
collectd::plugin::ntpd::host
collectd::plugin::ntpd::port
collectd::plugin::ntpd::reverselookups
collectd::plugin::ntpd::includeunitid
collectd::plugin::ntpd::interval

collectd-numa
None

collectd-olsrd

collectd-openvpn
collectd::plugin::openvpn::statusfile
collectd::plugin::openvpn::improvednamingschema
collectd::plugin::openvpn::collectcompression
collectd::plugin::openvpn::collectindividualusers
collectd::plugin::openvpn::collectusercount
collectd::plugin::openvpn::interval

collectd-ovs_events
collectd::plugin::ovs_events::address
collectd::plugin::ovs_events::dispatch
collectd::plugin::ovs_events::interfaces
collectd::plugin::ovs_events::send_notification
collectd::plugin::ovs_events::$port
collectd::plugin::ovs_events::socket

collectd-ovs_stats
collectd::plugin::ovs_stats::address
collectd::plugin::ovs_stats::bridges
collectd::plugin::ovs_stats::port
collectd::plugin::ovs_stats::socket

collectd-ping
collectd::plugin::ping::hosts
collectd::plugin::ping::timeout
collectd::plugin::ping::ttl
collectd::plugin::ping::source_address
collectd::plugin::ping::device
collectd::plugin::ping::max_missed
collectd::plugin::ping::size
collectd::plugin::ping::interval

collectd-powerdns
collectd::plugin::powerdns::servers
collectd::plugin::powerdns::recursors
collectd::plugin::powerdns::local_socket
collectd::plugin::powerdns::interval

collectd-processes
collectd::plugin::processes::processes
collectd::plugin::processes::process_matches
collectd::plugin::processes::collect_context_switch
collectd::plugin::processes::collect_file_descriptor
collectd::plugin::processes::collect_memory_maps
collectd::plugin::processes::interval

collectd-protocols
collectd::plugin::protocols::ignoreselected
collectd::plugin::protocols::values

collectd-python

collectd-serial

collectd-smart
collectd::plugin::smart::disks
collectd::plugin::smart::ignoreselected
collectd::plugin::smart::interval

collectd-snmp_agent

collectd-statsd
collectd::plugin::statsd::host
collectd::plugin::statsd::port
collectd::plugin::statsd::deletecounters
collectd::plugin::statsd::deletetimers
collectd::plugin::statsd::deletegauges
collectd::plugin::statsd::deletesets
collectd::plugin::statsd::countersum
collectd::plugin::statsd::timerpercentile
collectd::plugin::statsd::timerlower
collectd::plugin::statsd::timerupper
collectd::plugin::statsd::timersum
collectd::plugin::statsd::timercount
collectd::plugin::statsd::interval

collectd-swap
collectd::plugin::swap::reportbydevice
collectd::plugin::swap::reportbytes
collectd::plugin::swap::valuesabsolute
collectd::plugin::swap::valuespercentage
collectd::plugin::swap::reportio
collectd::plugin::swap::interval

collectd-syslog
collectd::plugin::syslog::log_level
collectd::plugin::syslog::notify_level
collectd::plugin::syslog::interval

collectd-table
collectd::plugin::table::tables
collectd::plugin::table::interval

collectd-tail
collectd::plugin::tail::files
collectd::plugin::tail::interval

collectd-tail_csv
collectd::plugin::tail_csv::metrics
collectd::plugin::tail_csv::files

collectd-tcpconns
collectd::plugin::tcpconns::localports
collectd::plugin::tcpconns::remoteports
collectd::plugin::tcpconns::listening
collectd::plugin::tcpconns::allportssummary
collectd::plugin::tcpconns::interval

collectd-ted

collectd-thermal
collectd::plugin::thermal::devices
collectd::plugin::thermal::ignoreselected
collectd::plugin::thermal::interval

collectd-threshold
collectd::plugin::threshold::types
collectd::plugin::threshold::plugins
collectd::plugin::threshold::hosts
collectd::plugin::threshold::interval

collectd-turbostat
collectd::plugin::turbostat::core_c_states
collectd::plugin::turbostat::package_c_states
collectd::plugin::turbostat::system_management_interrupt
collectd::plugin::turbostat::digital_temperature_sensor
collectd::plugin::turbostat::tcc_activation_temp
collectd::plugin::turbostat::running_average_power_limit
collectd::plugin::turbostat::logical_core_names

collectd-unixsock

collectd-uptime
collectd::plugin::uptime::interval

collectd-users
collectd::plugin::users::interval

collectd-uuid
collectd::plugin::uuid::uuid_file
collectd::plugin::uuid::interval

collectd-virt
collectd::plugin::virt::connection
collectd::plugin::virt::refresh_interval
collectd::plugin::virt::domain
collectd::plugin::virt::block_device
collectd::plugin::virt::interface_device
collectd::plugin::virt::ignore_selected
collectd::plugin::virt::hostname_format
collectd::plugin::virt::interface_format
collectd::plugin::virt::extra_stats
collectd::plugin::virt::interval

collectd-vmem
collectd::plugin::vmem::verbose
collectd::plugin::vmem::interval

collectd-vserver

collectd-wireless

collectd-write_graphite
collectd::plugin::write_graphite::carbons
collectd::plugin::write_graphite::carbon_defaults
collectd::plugin::write_graphite::globals

collectd-write_kafka
collectd::plugin::write_kafka::kafka_host
collectd::plugin::write_kafka::kafka_port
collectd::plugin::write_kafka::kafka_hosts
collectd::plugin::write_kafka::topics

collectd-write_log
collectd::plugin::write_log::format

collectd-zfs_arc
None
- Table of Contents
- CHAPTER 1 INTRODUCTION TO SERVICE TELEMETRY FRAMEWORK
-
- 11 SERVICE TELEMETRY FRAMEWORK ARCHITECTURE
- 12 INSTALLATION SIZE
-
- CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK
-
- 21 THE CORE COMPONENTS OF STF
- 22 PREPARING YOUR OCP ENVIRONMENT FOR STF
-
- 221 Persistent volumes
-
- 2211 Using ephemeral storage
-
- 222 Resource allocation
- 223 Node tuning operator
-
- 23 DEPLOYING STF TO THE OCP ENVIRONMENT
-
- 231 Deploying STF to the OCP environment with ElasticSearch
- 232 Deploying STF to the OCP environment without ElasticSearch
- 233 Creating a namespace
- 234 Creating an OperatorGroup
- 235 Enabling the OperatorHubio Community Catalog Source
- 236 Enabling Red Hat STF Operator Source
- 237 Subscribing to the AMQ Certificate Manager Operator
- 238 Subscribing to the Elastic Cloud on Kubernetes Operator
- 239 Subscribing to the Service Telemetry Operator
- 2310 Creating a ServiceTelemetry object in OCP
-
- 24 REMOVING STF FROM THE OCP ENVIRONMENT
-
- 241 Deleting the namespace
- 242 Removing the OperatorSource
-
- CHAPTER 3 COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION
-
- 31 CONNECTING RED HAT OPENSTACK PLATFORM TO SERVICE TELEMETRY FRAMEWORK
- 32 DEPLOYING TO NON-STANDARD NETWORK TOPOLOGIES
- 33 CONFIGURING RED HAT OPENSTACK PLATFORM OVERCLOUD FOR SERVICE TELEMETRY FRAMEWORK
-
- 331 Retrieving the AMQ Interconnect route address
- 332 Configuring the STF connection for the overcloud
- 333 Validating client-side installation
-
- CHAPTER 4 ADVANCED FEATURES
-
- 41 CUSTOMIZING THE DEPLOYMENT
-
- 411 Manifest override parameters
- 412 Overriding a managed manifest
-
- Procedure
-
- 42 ALERTS
-
- Additional resources
- 421 Creating an alert rule in Prometheus
-
- Procedure
- Additional resources
-
- 422 Configuring custom alerts
-
- Procedure
- Additional resources
-
- 423 Creating an alert route in Alertmanager
-
- Procedure
- Additional resources
-
- 43 HIGH AVAILABILITY
-
- 431 Configuring high availability
-
- Procedure
-
- 44 DASHBOARDS
-
- 441 Setting up Grafana to host the dashboard
-
- 4411 Viewing and editing queries
-
- 442 The Grafana infrastructure dashboard
-
- 4421 Top panels
- 4422 Networking panels
- 4423 CPU panels
- 4424 Memory panels
- 4425 Diskfile system
-
- 45 CONFIGURING MULTIPLE CLOUDS
-
- 451 Planning AMQP address prefixes
- 452 Deploying Smart Gateways
-
- Procedure
- 4521 Example manifests
-
- 453 Creating the OpenStack environment file
-
- Procedure
- Additional resources
-
- 454 Querying metrics data from multiple clouds
-
- 46 EPHEMERAL STORAGE
-
- 461 Configuring ephemeral storage
-
- APPENDIX A COLLECTD PLUG-INS
-
Figure 41 Two Red Hat OpenStack Platform clouds connect to STF
451 Planning AMQP address prefixes
By default Red Hat OpenStack Platform nodes get data through two data collectors collectd andCeilometer These components send telemetry data or notifications to the respective AMQPaddresses for example collectdtelemetry where STF Smart Gateways listen on those addressesfor monitoring data
To support multiple clouds and to identify which cloud generated the monitoring data configureeach cloud to send data to a unique address Prefix a cloud identifier to the second part of theaddress The following list shows some example addresses and identifiers
collectdcloud1-telemetry
collectdcloud1-notify
anycastceilometercloud1-eventsample
collectdcloud2-telemetry
collectdcloud2-notify
Red Hat OpenStack Platform 160 Service Telemetry Framework
40
anycastceilometercloud2-eventsample
collectdus-east-1-telemetry
collectdus-west-3-telemetry
452 Deploying Smart Gateways
You must deploy a Smart Gateway for each of the data collection types for each cloud one forcollectd metrics one for collectd events and one for Ceilometer events Configure each of theSmart Gateways to listen on the AMQP address that you define for the corresponding cloud
When you deploy STF for the first time Smart Gateway manifests are created that define the initialSmart Gateways for a single cloud When deploying Smart Gateways for multiple cloud supportyou deploy multiple Smart Gateways for each of the data collection types that handle the metricsand the events data for each cloud The initial Smart Gateways act as a template to createadditional Smart Gateways along with any authentication information required to connect to thedata stores
Procedure
1 Log in to Red Hat OpenShift Container Platform
2 Change to the service-telemetry namespace
oc project service-telemetry
3 Use the initially deployed Smart Gateways as a template for additional Smart GatewaysList the currently deployed Smart Gateways with the oc get smartgateways command Forexample if you deployed STF with metricsEnabled true and eventsEnabled true thefollowing Smart Gateways are displayed in the output
$ oc get smartgateways
NAME AGEstf-default-ceilometer-notification 14dstf-default-collectd-notification 14dstf-default-collectd-telemetry 14d
4 Retrieve the manifests for each Smart Gateway and store the contents in a temporary filewhich you can modify later and use to create the new set of Smart Gateways
truncate --size 0 tmpcloud1-smartgatewaysyaml ampamp for sg in $(oc get smartgateways -oname)do echo --- gtgt tmpcloud1-smartgatewaysyaml oc get $sg -oyaml --export gtgt tmpcloud1-smartgatewaysyamldone
5 Modify the Smart Gateway manifest in the tmpcloud1-smartgatewaysyaml file Adjustthe metadataname and specamqpUrl fields to include the cloud identifier from yourschema For more information see ] To view example Smart Gateway manifests seeltltexample-manifests_advanced-features[
6 Deploy your new Smart Gateways
CHAPTER 4 ADVANCED FEATURES
41
oc apply -f tmpcloud1-smartgatewaysyaml
7 Verify that each Smart Gateway is running This can take several minutes depending on thenumber of Smart Gateways
oc get po -l app=smart-gateway
4521 Example manifests
IMPORTANT
The content in the following examples might be different to the file content in yourdeployment Copy the manifests in your deployment
Ensure that the name and amqpUrl parameters of each Smart Gateway match the names that youwant to use for your clouds For more information see Section 451 ldquoPlanning AMQP addressprefixesrdquo
NOTE
Your output may have some additional metadata parameters that you can removefrom the manifests you that load into OCP
apiVersion smartgatewayinfrawatchv2alpha1kind SmartGatewaymetadata name stf-default-ceilometer-notification-cloud1 1spec amqpDataSource ceilometer amqpUrl stf-default-interconnectservice-telemetrysvcclusterlocal5672anycastceilometercloud1-eventsample 2 debug false elasticPass fkzfhghw elasticUrl httpselasticsearch-es-httpservice-telemetrysvcclusterlocal9200 elasticUser elastic resetIndex false serviceType events size 1 tlsCaCert configcertscacrt tlsClientCert configcertstlscrt tlsClientKey configcertstlskey tlsServerName elasticsearch-es-httpservice-telemetrysvcclusterlocal useBasicAuth true useTls true---apiVersion smartgatewayinfrawatchv2alpha1kind SmartGatewaymetadata name stf-default-collectd-notification-cloud1 3spec amqpDataSource collectd amqpUrl stf-default-interconnectservice-telemetrysvcclusterlocal5672collectdcloud1-
Red Hat OpenStack Platform 160 Service Telemetry Framework
42
1
2
3
4
5
6
Name for Ceilometer notifications for cloud1
AMQP Address for Ceilometer notifications for cloud1
Name for collectd telemetry for cloud1
AMQP Address for collectd telemetry for cloud1
Name for collectd notifications for cloud1
AMQP Address for collectd notifications for cloud1
453 Creating the OpenStack environment file
To label traffic according to the cloud of origin you must create a configuration with cloud-specificinstance names Create an stf-connectorsyaml file and adjust the values of CeilometerQdrEventsConfig and CollectdAmqpInstances to match the AMQP address prefixscheme For more information see Section 451 ldquoPlanning AMQP address prefixesrdquo
notify 4 debug false elasticPass fkzfhghw elasticUrl httpselasticsearch-es-httpservice-telemetrysvcclusterlocal9200 elasticUser elastic resetIndex false serviceType events size 1 tlsCaCert configcertscacrt tlsClientCert configcertstlscrt tlsClientKey configcertstlskey tlsServerName elasticsearch-es-httpservice-telemetrysvcclusterlocal useBasicAuth true useTls true---apiVersion smartgatewayinfrawatchv2alpha1kind SmartGatewaymetadata name stf-default-collectd-telemetry-cloud1 5spec amqpUrl stf-default-interconnectservice-telemetrysvcclusterlocal5672collectdcloud1-telemetry 6 debug false prefetch 15000 serviceType metrics size 1 useTimestamp true
CHAPTER 4 ADVANCED FEATURES
43
WARNING
Remove enable-stfyaml and ceilometer-write-qdryaml environment filereferences from your overcloud deployment This configuration is redundantand results in duplicate information being sent from each cloud node
Procedure
1 Create the stf-connectorsyaml file and modify it to match the AMQP address that youwant for this cloud deployment
resource_registry OSTripleOServicesCollectd usrshareopenstack-tripleo-heat-templatesdeploymentmetricscollectd-container-puppetyaml OSTripleOServicesMetricsQdr usrshareopenstack-tripleo-heat-templatesdeploymentmetricsqdr-container-puppetyaml OSTripleOServicesCeilometerAgentCentral usrshareopenstack-tripleo-heat-templatesdeploymentceilometerceilometer-agent-central-container-puppetyaml OSTripleOServicesCeilometerAgentNotification usrshareopenstack-tripleo-heat-templatesdeploymentceilometerceilometer-agent-notification-container-puppetyaml OSTripleOServicesCeilometerAgentIpmi usrshareopenstack-tripleo-heat-templatesdeploymentceilometerceilometer-agent-ipmi-container-puppetyaml OSTripleOServicesComputeCeilometerAgent usrshareopenstack-tripleo-heat-templatesdeploymentceilometerceilometer-agent-compute-container-puppetyaml OSTripleOServicesRedis usrshareopenstack-tripleo-heat-templatesdeploymentdatabaseredis-pacemaker-puppetyaml
parameter_defaults EnableSTF true
EventPipelinePublishers [] CeilometerEnablePanko false CeilometerQdrPublishEvents true CeilometerQdrEventsConfig driver amqp topic cloud1-event 1
CollectdConnectionType amqp1 CollectdAmqpInterval 5 CollectdDefaultPollingInterval 5
CollectdAmqpInstances cloud1-notify 2 notify true format JSON presettle false cloud1-telemetry 3 format JSON presettle true
MetricsQdrAddresses
Red Hat OpenStack Platform 160 Service Telemetry Framework
44
+ lt1gt Define the topic for Ceilometer events This value is the address format of anycastceilometercloud1-eventsample lt2gt Define the topic for collectd events This value is theformat of collectdcloud1-notify lt3gt Define the topic for collectd metrics This value is the formatof collectdcloud1-telemetry lt4gt Adjust the MetricsQdrConnectors host to the address of the STFroute
1 Ensure that the naming convention in the stf-connectorsyaml file aligns with the specamqpUrl field in the Smart Gateway configuration For example configure the CeilometerQdrEventsConfigtopic field to a value of cloud1-event
2 Save the file in a directory for custom environment files for example homestackcustom_templates
3 Source the authentication file
4 Include the stf-connectorsyaml file in the overcloud deployment command along with anyother environment files relevant to your environment
Additional resourcesFor information about validating the deployment see Section 333 ldquoValidating client-sideinstallationrdquo
454 Querying metrics data from multiple clouds
Data stored in Prometheus has a service label attached according to the Smart Gateway it wasscraped from You can use this label to query data from a specific cloud
To query data from a specific cloud use a Prometheus promql query that matches the associatedservice label for example collectd_uptimeservice=stf-default-collectd-telemetry-cloud1-smartgateway
- prefix collectd distribution multicast - prefix anycastceilometer distribution multicast
MetricsQdrSSLProfiles - name sslProfile
MetricsQdrConnectors - host stf-default-interconnect-5671-service-telemetryappsinfrawatch 4 port 443 role edge verifyHostname false sslProfile sslProfile
[stackundercloud-0 ~]$ source stackrc
(undercloud) [stackundercloud-0 ~]$
(undercloud) [stackundercloud-0 ~]$ openstack overcloud deploy --templates usrshareopenstack-tripleo-heat-templates -e homestackcustom_templatesstf-connectorsyaml
CHAPTER 4 ADVANCED FEATURES
45
46 EPHEMERAL STORAGE
Use ephemeral storage to run Service Telemetry Framework (STF) without persistently storingdata in your Red Hat OpenShift Container Platform (OCP) cluster Ephemeral storage is notrecommended in a production environment due to the volatility of the data in the platform whenoperating correctly and as designed For example restarting a pod or rescheduling the workload toanother node results in the loss of any local data written since the pod started
If you enable ephemeral storage in STF the Service Telemetry Operator does not add the relevant storage sections to the data storage components manifests
461 Configuring ephemeral storage
To configure STF for ephemeral storage add storageEphemeralEnabled true to theServiceTelemetry object in OCP You can add storageEphemeralEnabled true at installation timeor if you already deployed STF complete the following steps
Procedure
1 Log in to Red Hat OpenShift Container Platform
2 Change to the service-telemetry namespace
3 Edit the ServiceTelemetry object
4 Add the storageEphemeralEnabled true parameter to the spec section
5 Save your changes and close the object
oc project service-telemetry
$ oc edit ServiceTelemetry stf-default
spec eventsEnabled true metricsEnabled true storageEphemeralEnabled true
Red Hat OpenStack Platform 160 Service Telemetry Framework
46
APPENDIX A COLLECTD PLUG-INSThis section contains a complete list of collectd plug-ins and configurations forRed Hat OpenStack Platform 160
collectd-aggregation
collectdpluginaggregationaggregators
collectdpluginaggregationinterval
collectd-amqp1
collectd-apache
collectdpluginapacheinstances (ex localhost rArr url rArr httplocalhostmod_statusauto)
collectdpluginapacheinterval
collectd-apcups
collectd-battery
collectdpluginbatteryvalues_percentage
collectdpluginbatteryreport_degraded
collectdpluginbatteryquery_state_fs
collectdpluginbatteryinterval
collectd-ceph
collectdplugincephdaemons
collectdplugincephlongrunavglatency
collectdplugincephconvertspecialmetrictypes
collectd-cgroups
collectdplugincgroupsignore_selected
collectdplugincgroupsinterval
collectd-conntrack
None
collectd-contextswitch
collectdplugincontextswitchinterval
collectd-cpu
APPENDIX A COLLECTD PLUG-INS
47
collectdplugincpureportbystate
collectdplugincpureportbycpu
collectdplugincpuvaluespercentage
collectdplugincpureportnumcpu
collectdplugincpureportgueststate
collectdplugincpusubtractgueststate
collectdplugincpuinterval
collectd-cpufreq
None
collectd-cpusleep
collectd-csv
collectdplugincsvdatadir
collectdplugincsvstorerates
collectdplugincsvinterval
collectd-df
collectdplugindfdevices
collectdplugindffstypes
collectdplugindfignoreselected
collectdplugindfmountpoints
collectdplugindfreportbydevice
collectdplugindfreportinodes
collectdplugindfreportreserved
collectdplugindfvaluesabsolute
collectdplugindfvaluespercentage
collectdplugindfinterval
collectd-disk
collectdplugindiskdisks
collectdplugindiskignoreselected
collectdplugindiskudevnameattr
Red Hat OpenStack Platform 160 Service Telemetry Framework
48
collectdplugindiskinterval
collectd-entropy
collectdpluginentropyinterval
collectd-ethstat
collectdpluginethstatinterfaces
collectdpluginethstatmaps
collectdpluginethstatmappedonly
collectdpluginethstatinterval
collectd-exec
collectdpluginexeccommands
collectdpluginexeccommands_defaults
collectdpluginexecglobals
collectdpluginexecinterval
collectd-fhcount
collectdpluginfhcountvaluesabsolute
collectdpluginfhcountvaluespercentage
collectdpluginfhcountinterval
collectd-filecount
collectdpluginfilecountdirectories
collectdpluginfilecountinterval
collectd-fscache
None
collectd-hddtemp
collectdpluginhddtemphost
collectdpluginhddtempport
collectdpluginhddtempinterval
collectd-hugepages
collectdpluginhugepagesreport_per_node_hp
APPENDIX A COLLECTD PLUG-INS
49
collectdpluginhugepagesreport_root_hp
collectdpluginhugepagesvalues_pages
collectdpluginhugepagesvalues_bytes
collectdpluginhugepagesvalues_percentage
collectdpluginhugepagesinterval
collectd-intel_rdt
collectd-interface
collectdplugininterfaceinterfaces
collectdplugininterfaceignoreselected
collectdplugininterfacereportinactive
Collectdplugininterfaceinterval
collectd-ipc
None
collectd-ipmi
collectdpluginipmiignore_selected
collectdpluginipminotify_sensor_add
collectdpluginipminotify_sensor_remove
collectdpluginipminotify_sensor_not_present
collectdpluginipmisensors
collectdpluginipmiinterval
collectd-irq
collectdpluginirqirqs
collectdpluginirqignoreselected
collectdpluginirqinterval
collectd-load
collectdpluginloadreport_relative
collectdpluginloadinterval
collectd-logfile
collectdpluginlogfilelog_level
Red Hat OpenStack Platform 160 Service Telemetry Framework
50
collectdpluginlogfilelog_level
collectdpluginlogfilelog_file
collectdpluginlogfilelog_timestamp
collectdpluginlogfileprint_severity
collectdpluginlogfileinterval
collectd-madwifi
collectd-mbmon
collectd-md
collectd-memcached
collectdpluginmemcachedinstances
collectdpluginmemcachedinterval
collectd-memory
collectdpluginmemoryvaluesabsolute
collectdpluginmemoryvaluespercentage
collectdpluginmemoryinterval collectd-multimeter
collectd-multimeter
collectd-mysql
collectdpluginmysqlinterval
collectd-netlink
collectdpluginnetlinkinterfaces
collectdpluginnetlinkverboseinterfaces
collectdpluginnetlinkqdiscs
collectdpluginnetlinkclasses
collectdpluginnetlinkfilters
collectdpluginnetlinkignoreselected
collectdpluginnetlinkinterval
collectd-network
collectdpluginnetworktimetolive
collectdpluginnetworkmaxpacketsize
APPENDIX A COLLECTD PLUG-INS
51
collectdpluginnetworkforward
collectdpluginnetworkreportstats
collectdpluginnetworklisteners
collectdpluginnetworkservers
collectdpluginnetworkinterval
collectd-nfs
collectdpluginnfsinterval
collectd-ntpd
collectdpluginntpdhost
collectdpluginntpdport
collectdpluginntpdreverselookups
collectdpluginntpdincludeunitid
collectdpluginntpdinterval
collectd-numa
None
collectd-olsrd
collectd-openvpn
collectdpluginopenvpnstatusfile
collectdpluginopenvpnimprovednamingschema
collectdpluginopenvpncollectcompression
collectdpluginopenvpncollectindividualusers
collectdpluginopenvpncollectusercount
collectdpluginopenvpninterval
collectd-ovs_events
collectdpluginovs_eventsaddress
collectdpluginovs_eventsdispatch
collectdpluginovs_eventsinterfaces
collectdpluginovs_eventssend_notification
Red Hat OpenStack Platform 160 Service Telemetry Framework
52
collectdpluginovs_events$port
collectdpluginovs_eventssocket
collectd-ovs_stats
collectdpluginovs_statsaddress
collectdpluginovs_statsbridges
collectdpluginovs_statsport
collectdpluginovs_statssocket
collectd-ping
collectdpluginpinghosts
collectdpluginpingtimeout
collectdpluginpingttl
collectdpluginpingsource_address
collectdpluginpingdevice
collectdpluginpingmax_missed
collectdpluginpingsize
collectdpluginpinginterval
collectd-powerdns
collectdpluginpowerdnsinterval
collectdpluginpowerdnsservers
collectdpluginpowerdnsrecursors
collectdpluginpowerdnslocal_socket
collectdpluginpowerdnsinterval
collectd-processes
collectdpluginprocessesprocesses
collectdpluginprocessesprocess_matches
collectdpluginprocessescollect_context_switch
collectdpluginprocessescollect_file_descriptor
collectdpluginprocessescollect_memory_maps
collectdpluginpowerdnsinterval
APPENDIX A COLLECTD PLUG-INS
53
collectd-protocols
collectdpluginprotocolsignoreselected
collectdpluginprotocolsvalues
collectd-python
collectd-serial
collectd-smart
collectdpluginsmartdisks
collectdpluginsmartignoreselected
collectdpluginsmartinterval
collectd-snmp_agent
collectd-statsd
collectdpluginstatsdhost
collectdpluginstatsdport
collectdpluginstatsddeletecounters
collectdpluginstatsddeletetimers
collectdpluginstatsddeletegauges
collectdpluginstatsddeletesets
collectdpluginstatsdcountersum
collectdpluginstatsdtimerpercentile
collectdpluginstatsdtimerlower
collectdpluginstatsdtimerupper
collectdpluginstatsdtimersum
collectdpluginstatsdtimercount
collectdpluginstatsdinterval
collectd-swap
collectdpluginswapreportbydevice
collectdpluginswapreportbytes
collectdpluginswapvaluesabsolute
Red Hat OpenStack Platform 160 Service Telemetry Framework
54
collectdpluginswapvaluespercentage
collectdpluginswapreportio
collectdpluginswapinterval
collectd-syslog
collectdpluginsysloglog_level
collectdpluginsyslognotify_level
collectdpluginsysloginterval
collectd-table
collectdplugintabletables
collectdplugintableinterval
collectd-tail
collectdplugintailfiles
collectdplugintailinterval
collectd-tail_csv
collectdplugintail_csvmetrics
collectdplugintail_csvfiles
collectd-tcpconns
collectdplugintcpconnslocalports
collectdplugintcpconnsremoteports
collectdplugintcpconnslistening
collectdplugintcpconnsallportssummary
collectdplugintcpconnsinterval
collectd-ted
collectd-thermal
collectdpluginthermaldevices
collectdpluginthermalignoreselected
collectdpluginthermalinterval
collectd-threshold
APPENDIX A COLLECTD PLUG-INS
55
collectdpluginthresholdtypes
collectdpluginthresholdplugins
collectdpluginthresholdhosts
collectdpluginthresholdinterval
collectd-turbostat
collectdpluginturbostatcore_c_states
collectdpluginturbostatpackage_c_states
collectdpluginturbostatsystem_management_interrupt
collectdpluginturbostatdigital_temperature_sensor
collectdpluginturbostattcc_activation_temp
collectdpluginturbostatrunning_average_power_limit
collectdpluginturbostatlogical_core_names
collectd-unixsock
collectd-uptime
collectdpluginuptimeinterval
collectd-users
collectdpluginusersinterval
collectd-uuid
collectdpluginuuiduuid_file
collectdpluginuuidinterval
collectd-virt
collectdpluginvirtconnection
collectdpluginvirtrefresh_interval
collectdpluginvirtdomain
collectdpluginvirtblock_device
collectdpluginvirtinterface_device
collectdpluginvirtignore_selected
collectdpluginvirthostname_format
Red Hat OpenStack Platform 160 Service Telemetry Framework
56
collectdpluginvirtinterface_format
collectdpluginvirtextra_stats
collectdpluginvirtinterval
collectd-vmem
collectdpluginvmemverbose
collectdpluginvmeminterval
collectd-vserver
collectd-wireless
collectd-write_graphite
collectdpluginwrite_graphitecarbons
collectdpluginwrite_graphitecarbon_defaults
collectdpluginwrite_graphiteglobals
collectd-write_kafka
collectdpluginwrite_kafkakafka_host
collectdpluginwrite_kafkakafka_port
collectdpluginwrite_kafkakafka_hosts
collectdpluginwrite_kafkatopics
collectd-write_log
collectdpluginwrite_logformat
collectd-zfs_arc
None
APPENDIX A COLLECTD PLUG-INS
57
- Table of Contents
- CHAPTER 1 INTRODUCTION TO SERVICE TELEMETRY FRAMEWORK
-
- 11 SERVICE TELEMETRY FRAMEWORK ARCHITECTURE
- 12 INSTALLATION SIZE
-
- CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK
-
- 21 THE CORE COMPONENTS OF STF
- 22 PREPARING YOUR OCP ENVIRONMENT FOR STF
-
- 221 Persistent volumes
-
- 2211 Using ephemeral storage
-
- 222 Resource allocation
- 223 Node tuning operator
-
- 23 DEPLOYING STF TO THE OCP ENVIRONMENT
-
- 231 Deploying STF to the OCP environment with ElasticSearch
- 232 Deploying STF to the OCP environment without ElasticSearch
- 233 Creating a namespace
- 234 Creating an OperatorGroup
- 235 Enabling the OperatorHubio Community Catalog Source
- 236 Enabling Red Hat STF Operator Source
- 237 Subscribing to the AMQ Certificate Manager Operator
- 238 Subscribing to the Elastic Cloud on Kubernetes Operator
- 239 Subscribing to the Service Telemetry Operator
- 2310 Creating a ServiceTelemetry object in OCP
-
- 24 REMOVING STF FROM THE OCP ENVIRONMENT
-
- 241 Deleting the namespace
- 242 Removing the OperatorSource
-
- CHAPTER 3 COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION
-
- 31 CONNECTING RED HAT OPENSTACK PLATFORM TO SERVICE TELEMETRY FRAMEWORK
- 32 DEPLOYING TO NON-STANDARD NETWORK TOPOLOGIES
- 33 CONFIGURING RED HAT OPENSTACK PLATFORM OVERCLOUD FOR SERVICE TELEMETRY FRAMEWORK
-
- 331 Retrieving the AMQ Interconnect route address
- 332 Configuring the STF connection for the overcloud
- 333 Validating client-side installation
-
- CHAPTER 4 ADVANCED FEATURES
-
- 41 CUSTOMIZING THE DEPLOYMENT
-
- 411 Manifest override parameters
- 412 Overriding a managed manifest
-
- Procedure
-
- 42 ALERTS
-
- Additional resources
- 421 Creating an alert rule in Prometheus
-
- Procedure
- Additional resources
-
- 422 Configuring custom alerts
-
- Procedure
- Additional resources
-
- 423 Creating an alert route in Alertmanager
-
- Procedure
- Additional resources
-
- 43 HIGH AVAILABILITY
-
- 431 Configuring high availability
-
- Procedure
-
- 44 DASHBOARDS
-
- 441 Setting up Grafana to host the dashboard
-
- 4411 Viewing and editing queries
-
- 442 The Grafana infrastructure dashboard
-
- 4421 Top panels
- 4422 Networking panels
- 4423 CPU panels
- 4424 Memory panels
- 4425 Diskfile system
-
- 45 CONFIGURING MULTIPLE CLOUDS
-
- 451 Planning AMQP address prefixes
- 452 Deploying Smart Gateways
-
- Procedure
- 4521 Example manifests
-
- 453 Creating the OpenStack environment file
-
- Procedure
- Additional resources
-
- 454 Querying metrics data from multiple clouds
-
- 46 EPHEMERAL STORAGE
-
- 461 Configuring ephemeral storage
-
- APPENDIX A COLLECTD PLUG-INS
-
anycastceilometercloud2-eventsample
collectdus-east-1-telemetry
collectdus-west-3-telemetry
452 Deploying Smart Gateways
You must deploy a Smart Gateway for each of the data collection types for each cloud one forcollectd metrics one for collectd events and one for Ceilometer events Configure each of theSmart Gateways to listen on the AMQP address that you define for the corresponding cloud
When you deploy STF for the first time Smart Gateway manifests are created that define the initialSmart Gateways for a single cloud When deploying Smart Gateways for multiple cloud supportyou deploy multiple Smart Gateways for each of the data collection types that handle the metricsand the events data for each cloud The initial Smart Gateways act as a template to createadditional Smart Gateways along with any authentication information required to connect to thedata stores
Procedure
1 Log in to Red Hat OpenShift Container Platform
2 Change to the service-telemetry namespace
oc project service-telemetry
3 Use the initially deployed Smart Gateways as a template for additional Smart GatewaysList the currently deployed Smart Gateways with the oc get smartgateways command Forexample if you deployed STF with metricsEnabled true and eventsEnabled true thefollowing Smart Gateways are displayed in the output
$ oc get smartgateways
NAME AGEstf-default-ceilometer-notification 14dstf-default-collectd-notification 14dstf-default-collectd-telemetry 14d
4 Retrieve the manifests for each Smart Gateway and store the contents in a temporary filewhich you can modify later and use to create the new set of Smart Gateways
truncate --size 0 tmpcloud1-smartgatewaysyaml ampamp for sg in $(oc get smartgateways -oname)do echo --- gtgt tmpcloud1-smartgatewaysyaml oc get $sg -oyaml --export gtgt tmpcloud1-smartgatewaysyamldone
5 Modify the Smart Gateway manifest in the tmpcloud1-smartgatewaysyaml file Adjustthe metadataname and specamqpUrl fields to include the cloud identifier from yourschema For more information see ] To view example Smart Gateway manifests seeltltexample-manifests_advanced-features[
6 Deploy your new Smart Gateways
CHAPTER 4 ADVANCED FEATURES
41
oc apply -f tmpcloud1-smartgatewaysyaml
7 Verify that each Smart Gateway is running This can take several minutes depending on thenumber of Smart Gateways
oc get po -l app=smart-gateway
4521 Example manifests
IMPORTANT
The content in the following examples might be different to the file content in yourdeployment Copy the manifests in your deployment
Ensure that the name and amqpUrl parameters of each Smart Gateway match the names that youwant to use for your clouds For more information see Section 451 ldquoPlanning AMQP addressprefixesrdquo
NOTE
Your output may have some additional metadata parameters that you can removefrom the manifests you that load into OCP
apiVersion smartgatewayinfrawatchv2alpha1kind SmartGatewaymetadata name stf-default-ceilometer-notification-cloud1 1spec amqpDataSource ceilometer amqpUrl stf-default-interconnectservice-telemetrysvcclusterlocal5672anycastceilometercloud1-eventsample 2 debug false elasticPass fkzfhghw elasticUrl httpselasticsearch-es-httpservice-telemetrysvcclusterlocal9200 elasticUser elastic resetIndex false serviceType events size 1 tlsCaCert configcertscacrt tlsClientCert configcertstlscrt tlsClientKey configcertstlskey tlsServerName elasticsearch-es-httpservice-telemetrysvcclusterlocal useBasicAuth true useTls true---apiVersion smartgatewayinfrawatchv2alpha1kind SmartGatewaymetadata name stf-default-collectd-notification-cloud1 3spec amqpDataSource collectd amqpUrl stf-default-interconnectservice-telemetrysvcclusterlocal5672collectdcloud1-
Red Hat OpenStack Platform 160 Service Telemetry Framework
42
1
2
3
4
5
6
Name for Ceilometer notifications for cloud1
AMQP Address for Ceilometer notifications for cloud1
Name for collectd telemetry for cloud1
AMQP Address for collectd telemetry for cloud1
Name for collectd notifications for cloud1
AMQP Address for collectd notifications for cloud1
453 Creating the OpenStack environment file
To label traffic according to the cloud of origin you must create a configuration with cloud-specificinstance names Create an stf-connectorsyaml file and adjust the values of CeilometerQdrEventsConfig and CollectdAmqpInstances to match the AMQP address prefixscheme For more information see Section 451 ldquoPlanning AMQP address prefixesrdquo
notify 4 debug false elasticPass fkzfhghw elasticUrl httpselasticsearch-es-httpservice-telemetrysvcclusterlocal9200 elasticUser elastic resetIndex false serviceType events size 1 tlsCaCert configcertscacrt tlsClientCert configcertstlscrt tlsClientKey configcertstlskey tlsServerName elasticsearch-es-httpservice-telemetrysvcclusterlocal useBasicAuth true useTls true---apiVersion smartgatewayinfrawatchv2alpha1kind SmartGatewaymetadata name stf-default-collectd-telemetry-cloud1 5spec amqpUrl stf-default-interconnectservice-telemetrysvcclusterlocal5672collectdcloud1-telemetry 6 debug false prefetch 15000 serviceType metrics size 1 useTimestamp true
CHAPTER 4 ADVANCED FEATURES
43
WARNING
Remove enable-stfyaml and ceilometer-write-qdryaml environment filereferences from your overcloud deployment This configuration is redundantand results in duplicate information being sent from each cloud node
Procedure
1 Create the stf-connectorsyaml file and modify it to match the AMQP address that youwant for this cloud deployment
resource_registry OSTripleOServicesCollectd usrshareopenstack-tripleo-heat-templatesdeploymentmetricscollectd-container-puppetyaml OSTripleOServicesMetricsQdr usrshareopenstack-tripleo-heat-templatesdeploymentmetricsqdr-container-puppetyaml OSTripleOServicesCeilometerAgentCentral usrshareopenstack-tripleo-heat-templatesdeploymentceilometerceilometer-agent-central-container-puppetyaml OSTripleOServicesCeilometerAgentNotification usrshareopenstack-tripleo-heat-templatesdeploymentceilometerceilometer-agent-notification-container-puppetyaml OSTripleOServicesCeilometerAgentIpmi usrshareopenstack-tripleo-heat-templatesdeploymentceilometerceilometer-agent-ipmi-container-puppetyaml OSTripleOServicesComputeCeilometerAgent usrshareopenstack-tripleo-heat-templatesdeploymentceilometerceilometer-agent-compute-container-puppetyaml OSTripleOServicesRedis usrshareopenstack-tripleo-heat-templatesdeploymentdatabaseredis-pacemaker-puppetyaml
parameter_defaults EnableSTF true
EventPipelinePublishers [] CeilometerEnablePanko false CeilometerQdrPublishEvents true CeilometerQdrEventsConfig driver amqp topic cloud1-event 1
CollectdConnectionType amqp1 CollectdAmqpInterval 5 CollectdDefaultPollingInterval 5
CollectdAmqpInstances cloud1-notify 2 notify true format JSON presettle false cloud1-telemetry 3 format JSON presettle true
MetricsQdrAddresses
Red Hat OpenStack Platform 160 Service Telemetry Framework
44
+ lt1gt Define the topic for Ceilometer events This value is the address format of anycastceilometercloud1-eventsample lt2gt Define the topic for collectd events This value is theformat of collectdcloud1-notify lt3gt Define the topic for collectd metrics This value is the formatof collectdcloud1-telemetry lt4gt Adjust the MetricsQdrConnectors host to the address of the STFroute
1 Ensure that the naming convention in the stf-connectorsyaml file aligns with the specamqpUrl field in the Smart Gateway configuration For example configure the CeilometerQdrEventsConfigtopic field to a value of cloud1-event
2 Save the file in a directory for custom environment files for example homestackcustom_templates
3 Source the authentication file
4 Include the stf-connectorsyaml file in the overcloud deployment command along with anyother environment files relevant to your environment
Additional resourcesFor information about validating the deployment see Section 333 ldquoValidating client-sideinstallationrdquo
454 Querying metrics data from multiple clouds
Data stored in Prometheus has a service label attached according to the Smart Gateway it wasscraped from You can use this label to query data from a specific cloud
To query data from a specific cloud use a Prometheus promql query that matches the associatedservice label for example collectd_uptimeservice=stf-default-collectd-telemetry-cloud1-smartgateway
- prefix collectd distribution multicast - prefix anycastceilometer distribution multicast
MetricsQdrSSLProfiles - name sslProfile
MetricsQdrConnectors - host stf-default-interconnect-5671-service-telemetryappsinfrawatch 4 port 443 role edge verifyHostname false sslProfile sslProfile
[stackundercloud-0 ~]$ source stackrc
(undercloud) [stackundercloud-0 ~]$
(undercloud) [stackundercloud-0 ~]$ openstack overcloud deploy --templates usrshareopenstack-tripleo-heat-templates -e homestackcustom_templatesstf-connectorsyaml
CHAPTER 4 ADVANCED FEATURES
45
46 EPHEMERAL STORAGE
Use ephemeral storage to run Service Telemetry Framework (STF) without persistently storingdata in your Red Hat OpenShift Container Platform (OCP) cluster Ephemeral storage is notrecommended in a production environment due to the volatility of the data in the platform whenoperating correctly and as designed For example restarting a pod or rescheduling the workload toanother node results in the loss of any local data written since the pod started
If you enable ephemeral storage in STF the Service Telemetry Operator does not add the relevant storage sections to the data storage components manifests
461 Configuring ephemeral storage
To configure STF for ephemeral storage add storageEphemeralEnabled true to theServiceTelemetry object in OCP You can add storageEphemeralEnabled true at installation timeor if you already deployed STF complete the following steps
Procedure
1 Log in to Red Hat OpenShift Container Platform
2 Change to the service-telemetry namespace
3 Edit the ServiceTelemetry object
4 Add the storageEphemeralEnabled true parameter to the spec section
5 Save your changes and close the object
oc project service-telemetry
$ oc edit ServiceTelemetry stf-default
spec eventsEnabled true metricsEnabled true storageEphemeralEnabled true
Red Hat OpenStack Platform 160 Service Telemetry Framework
46
APPENDIX A COLLECTD PLUG-INSThis section contains a complete list of collectd plug-ins and configurations forRed Hat OpenStack Platform 160
collectd-aggregation
collectdpluginaggregationaggregators
collectdpluginaggregationinterval
collectd-amqp1
collectd-apache
collectdpluginapacheinstances (ex localhost rArr url rArr httplocalhostmod_statusauto)
collectdpluginapacheinterval
collectd-apcups
collectd-battery
collectdpluginbatteryvalues_percentage
collectdpluginbatteryreport_degraded
collectdpluginbatteryquery_state_fs
collectdpluginbatteryinterval
collectd-ceph
collectdplugincephdaemons
collectdplugincephlongrunavglatency
collectdplugincephconvertspecialmetrictypes
collectd-cgroups
collectdplugincgroupsignore_selected
collectdplugincgroupsinterval
collectd-conntrack
None
collectd-contextswitch
collectdplugincontextswitchinterval
collectd-cpu
APPENDIX A COLLECTD PLUG-INS
47
collectdplugincpureportbystate
collectdplugincpureportbycpu
collectdplugincpuvaluespercentage
collectdplugincpureportnumcpu
collectdplugincpureportgueststate
collectdplugincpusubtractgueststate
collectdplugincpuinterval
collectd-cpufreq
None
collectd-cpusleep
collectd-csv
collectdplugincsvdatadir
collectdplugincsvstorerates
collectdplugincsvinterval
collectd-df
collectdplugindfdevices
collectdplugindffstypes
collectdplugindfignoreselected
collectdplugindfmountpoints
collectdplugindfreportbydevice
collectdplugindfreportinodes
collectdplugindfreportreserved
collectdplugindfvaluesabsolute
collectdplugindfvaluespercentage
collectdplugindfinterval
collectd-disk
collectdplugindiskdisks
collectdplugindiskignoreselected
collectdplugindiskudevnameattr
Red Hat OpenStack Platform 160 Service Telemetry Framework
48
collectd::plugin::disk::interval
collectd-entropy
collectd::plugin::entropy::interval
collectd-ethstat
collectd::plugin::ethstat::interfaces
collectd::plugin::ethstat::maps
collectd::plugin::ethstat::mappedonly
collectd::plugin::ethstat::interval
collectd-exec
collectd::plugin::exec::commands
collectd::plugin::exec::commands_defaults
collectd::plugin::exec::globals
collectd::plugin::exec::interval
collectd-fhcount
collectd::plugin::fhcount::valuesabsolute
collectd::plugin::fhcount::valuespercentage
collectd::plugin::fhcount::interval
collectd-filecount
collectd::plugin::filecount::directories
collectd::plugin::filecount::interval
collectd-fscache
None
collectd-hddtemp
collectd::plugin::hddtemp::host
collectd::plugin::hddtemp::port
collectd::plugin::hddtemp::interval
collectd-hugepages
collectd::plugin::hugepages::report_per_node_hp
collectd::plugin::hugepages::report_root_hp
collectd::plugin::hugepages::values_pages
collectd::plugin::hugepages::values_bytes
collectd::plugin::hugepages::values_percentage
collectd::plugin::hugepages::interval
collectd-intel_rdt
collectd-interface
collectd::plugin::interface::interfaces
collectd::plugin::interface::ignoreselected
collectd::plugin::interface::reportinactive
collectd::plugin::interface::interval
collectd-ipc
None
collectd-ipmi
collectd::plugin::ipmi::ignore_selected
collectd::plugin::ipmi::notify_sensor_add
collectd::plugin::ipmi::notify_sensor_remove
collectd::plugin::ipmi::notify_sensor_not_present
collectd::plugin::ipmi::sensors
collectd::plugin::ipmi::interval
collectd-irq
collectd::plugin::irq::irqs
collectd::plugin::irq::ignoreselected
collectd::plugin::irq::interval
collectd-load
collectd::plugin::load::report_relative
collectd::plugin::load::interval
collectd-logfile
collectd::plugin::logfile::log_level
collectd::plugin::logfile::log_file
collectd::plugin::logfile::log_timestamp
collectd::plugin::logfile::print_severity
collectd::plugin::logfile::interval
collectd-madwifi
collectd-mbmon
collectd-md
collectd-memcached
collectd::plugin::memcached::instances
collectd::plugin::memcached::interval
collectd-memory
collectd::plugin::memory::valuesabsolute
collectd::plugin::memory::valuespercentage
collectd::plugin::memory::interval
collectd-multimeter
collectd-mysql
collectd::plugin::mysql::interval
collectd-netlink
collectd::plugin::netlink::interfaces
collectd::plugin::netlink::verboseinterfaces
collectd::plugin::netlink::qdiscs
collectd::plugin::netlink::classes
collectd::plugin::netlink::filters
collectd::plugin::netlink::ignoreselected
collectd::plugin::netlink::interval
collectd-network
collectd::plugin::network::timetolive
collectd::plugin::network::maxpacketsize
collectd::plugin::network::forward
collectd::plugin::network::reportstats
collectd::plugin::network::listeners
collectd::plugin::network::servers
collectd::plugin::network::interval
collectd-nfs
collectd::plugin::nfs::interval
collectd-ntpd
collectd::plugin::ntpd::host
collectd::plugin::ntpd::port
collectd::plugin::ntpd::reverselookups
collectd::plugin::ntpd::includeunitid
collectd::plugin::ntpd::interval
collectd-numa
None
collectd-olsrd
collectd-openvpn
collectd::plugin::openvpn::statusfile
collectd::plugin::openvpn::improvednamingschema
collectd::plugin::openvpn::collectcompression
collectd::plugin::openvpn::collectindividualusers
collectd::plugin::openvpn::collectusercount
collectd::plugin::openvpn::interval
collectd-ovs_events
collectd::plugin::ovs_events::address
collectd::plugin::ovs_events::dispatch
collectd::plugin::ovs_events::interfaces
collectd::plugin::ovs_events::send_notification
collectd::plugin::ovs_events::port
collectd::plugin::ovs_events::socket
collectd-ovs_stats
collectd::plugin::ovs_stats::address
collectd::plugin::ovs_stats::bridges
collectd::plugin::ovs_stats::port
collectd::plugin::ovs_stats::socket
collectd-ping
collectd::plugin::ping::hosts
collectd::plugin::ping::timeout
collectd::plugin::ping::ttl
collectd::plugin::ping::source_address
collectd::plugin::ping::device
collectd::plugin::ping::max_missed
collectd::plugin::ping::size
collectd::plugin::ping::interval
collectd-powerdns
collectd::plugin::powerdns::interval
collectd::plugin::powerdns::servers
collectd::plugin::powerdns::recursors
collectd::plugin::powerdns::local_socket
collectd-processes
collectd::plugin::processes::processes
collectd::plugin::processes::process_matches
collectd::plugin::processes::collect_context_switch
collectd::plugin::processes::collect_file_descriptor
collectd::plugin::processes::collect_memory_maps
collectd::plugin::processes::interval
collectd-protocols
collectd::plugin::protocols::ignoreselected
collectd::plugin::protocols::values
collectd-python
collectd-serial
collectd-smart
collectd::plugin::smart::disks
collectd::plugin::smart::ignoreselected
collectd::plugin::smart::interval
collectd-snmp_agent
collectd-statsd
collectd::plugin::statsd::host
collectd::plugin::statsd::port
collectd::plugin::statsd::deletecounters
collectd::plugin::statsd::deletetimers
collectd::plugin::statsd::deletegauges
collectd::plugin::statsd::deletesets
collectd::plugin::statsd::countersum
collectd::plugin::statsd::timerpercentile
collectd::plugin::statsd::timerlower
collectd::plugin::statsd::timerupper
collectd::plugin::statsd::timersum
collectd::plugin::statsd::timercount
collectd::plugin::statsd::interval
collectd-swap
collectd::plugin::swap::reportbydevice
collectd::plugin::swap::reportbytes
collectd::plugin::swap::valuesabsolute
collectd::plugin::swap::valuespercentage
collectd::plugin::swap::reportio
collectd::plugin::swap::interval
collectd-syslog
collectd::plugin::syslog::log_level
collectd::plugin::syslog::notify_level
collectd::plugin::syslog::interval
collectd-table
collectd::plugin::table::tables
collectd::plugin::table::interval
collectd-tail
collectd::plugin::tail::files
collectd::plugin::tail::interval
collectd-tail_csv
collectd::plugin::tail_csv::metrics
collectd::plugin::tail_csv::files
collectd-tcpconns
collectd::plugin::tcpconns::localports
collectd::plugin::tcpconns::remoteports
collectd::plugin::tcpconns::listening
collectd::plugin::tcpconns::allportssummary
collectd::plugin::tcpconns::interval
collectd-ted
collectd-thermal
collectd::plugin::thermal::devices
collectd::plugin::thermal::ignoreselected
collectd::plugin::thermal::interval
collectd-threshold
collectd::plugin::threshold::types
collectd::plugin::threshold::plugins
collectd::plugin::threshold::hosts
collectd::plugin::threshold::interval
collectd-turbostat
collectd::plugin::turbostat::core_c_states
collectd::plugin::turbostat::package_c_states
collectd::plugin::turbostat::system_management_interrupt
collectd::plugin::turbostat::digital_temperature_sensor
collectd::plugin::turbostat::tcc_activation_temp
collectd::plugin::turbostat::running_average_power_limit
collectd::plugin::turbostat::logical_core_names
collectd-unixsock
collectd-uptime
collectd::plugin::uptime::interval
collectd-users
collectd::plugin::users::interval
collectd-uuid
collectd::plugin::uuid::uuid_file
collectd::plugin::uuid::interval
collectd-virt
collectd::plugin::virt::connection
collectd::plugin::virt::refresh_interval
collectd::plugin::virt::domain
collectd::plugin::virt::block_device
collectd::plugin::virt::interface_device
collectd::plugin::virt::ignore_selected
collectd::plugin::virt::hostname_format
collectd::plugin::virt::interface_format
collectd::plugin::virt::extra_stats
collectd::plugin::virt::interval
collectd-vmem
collectd::plugin::vmem::verbose
collectd::plugin::vmem::interval
collectd-vserver
collectd-wireless
collectd-write_graphite
collectd::plugin::write_graphite::carbons
collectd::plugin::write_graphite::carbon_defaults
collectd::plugin::write_graphite::globals
collectd-write_kafka
collectd::plugin::write_kafka::kafka_host
collectd::plugin::write_kafka::kafka_port
collectd::plugin::write_kafka::kafka_hosts
collectd::plugin::write_kafka::topics
collectd-write_log
collectd::plugin::write_log::format
collectd-zfs_arc
None
collectd-turbostat
collectdpluginturbostatcore_c_states
collectdpluginturbostatpackage_c_states
collectdpluginturbostatsystem_management_interrupt
collectdpluginturbostatdigital_temperature_sensor
collectdpluginturbostattcc_activation_temp
collectdpluginturbostatrunning_average_power_limit
collectdpluginturbostatlogical_core_names
collectd-unixsock
collectd-uptime
collectdpluginuptimeinterval
collectd-users
collectdpluginusersinterval
collectd-uuid
collectdpluginuuiduuid_file
collectdpluginuuidinterval
collectd-virt
collectdpluginvirtconnection
collectdpluginvirtrefresh_interval
collectdpluginvirtdomain
collectdpluginvirtblock_device
collectdpluginvirtinterface_device
collectdpluginvirtignore_selected
collectdpluginvirthostname_format
Red Hat OpenStack Platform 160 Service Telemetry Framework
56
collectdpluginvirtinterface_format
collectdpluginvirtextra_stats
collectdpluginvirtinterval
collectd-vmem
collectdpluginvmemverbose
collectdpluginvmeminterval
collectd-vserver
collectd-wireless
collectd-write_graphite
collectdpluginwrite_graphitecarbons
collectdpluginwrite_graphitecarbon_defaults
collectdpluginwrite_graphiteglobals
collectd-write_kafka
collectdpluginwrite_kafkakafka_host
collectdpluginwrite_kafkakafka_port
collectdpluginwrite_kafkakafka_hosts
collectdpluginwrite_kafkatopics
collectd-write_log
collectdpluginwrite_logformat
collectd-zfs_arc
None
APPENDIX A COLLECTD PLUG-INS
57
The stf-connectors.yaml environment file defines the AMQP addresses, the SSL profile, and the AMQ Interconnect connector for STF. It also defines the Ceilometer and collectd topics that callouts <1> to <3> describe:

    parameter_defaults:
      MetricsQdrAddresses:
        - prefix: collectd
          distribution: multicast
        - prefix: anycast/ceilometer
          distribution: multicast

      MetricsQdrSSLProfiles:
        - name: sslProfile

      MetricsQdrConnectors:
        - host: stf-default-interconnect-5671-service-telemetry.apps.infra.watch <4>
          port: 443
          role: edge
          verifyHostname: false
          sslProfile: sslProfile

<1> Define the topic for Ceilometer events. This value is the address format of anycast/ceilometer/cloud1-event.sample.
<2> Define the topic for collectd events. This value is the format of collectd/cloud1-notify.
<3> Define the topic for collectd metrics. This value is the format of collectd/cloud1-telemetry.
<4> Adjust the MetricsQdrConnectors host to the address of the STF route.

1. Ensure that the naming convention in the stf-connectors.yaml file aligns with the spec.amqpUrl field in the Smart Gateway configuration. For example, configure the CeilometerQdrEventsConfig.topic field to a value of cloud1-event.

2. Save the file in a directory for custom environment files, for example /home/stack/custom_templates/.

3. Source the authentication file:

    [stack@undercloud-0 ~]$ source stackrc
    (undercloud) [stack@undercloud-0 ~]$

4. Include the stf-connectors.yaml file in the overcloud deployment command, along with any other environment files relevant to your environment:

    (undercloud) [stack@undercloud-0 ~]$ openstack overcloud deploy --templates /usr/share/openstack-tripleo-heat-templates -e /home/stack/custom_templates/stf-connectors.yaml

Additional resources
For information about validating the deployment, see Section 3.3.3, "Validating client-side installation".

4.5.4. Querying metrics data from multiple clouds
Data stored in Prometheus has a service label attached according to the Smart Gateway it was scraped from. You can use this label to query data from a specific cloud.
To query data from a specific cloud, use a Prometheus PromQL query that matches the associated service label, for example: collectd_uptime{service="stf-default-collectd-telemetry-cloud1-smartgateway"}.
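For example, with a second cloud configured to use the cloud2-telemetry address prefix, each cloud can be queried separately by its service label. The cloud2 names below are illustrative and follow the naming convention described above; they are not taken from this guide:

    # Uptime metrics scraped from the cloud1 Smart Gateway
    collectd_uptime{service="stf-default-collectd-telemetry-cloud1-smartgateway"}

    # Uptime metrics scraped from an assumed second cloud, cloud2
    collectd_uptime{service="stf-default-collectd-telemetry-cloud2-smartgateway"}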
4.6. EPHEMERAL STORAGE
Use ephemeral storage to run Service Telemetry Framework (STF) without persistently storing data in your Red Hat OpenShift Container Platform (OCP) cluster. Ephemeral storage is not recommended in a production environment because the data is volatile even when the platform operates correctly and as designed. For example, restarting a pod or rescheduling the workload to another node results in the loss of any local data written since the pod started.
If you enable ephemeral storage in STF, the Service Telemetry Operator does not add the relevant storage sections to the manifests of the data storage components.

4.6.1. Configuring ephemeral storage
To configure STF for ephemeral storage, add storageEphemeralEnabled: true to the ServiceTelemetry object in OCP. You can add storageEphemeralEnabled: true at installation time or, if you already deployed STF, complete the following steps:

Procedure
1. Log in to Red Hat OpenShift Container Platform.

2. Change to the service-telemetry namespace:

    $ oc project service-telemetry

3. Edit the ServiceTelemetry object:

    $ oc edit ServiceTelemetry stf-default

4. Add the storageEphemeralEnabled: true parameter to the spec section:

    spec:
      eventsEnabled: true
      metricsEnabled: true
      storageEphemeralEnabled: true

5. Save your changes and close the object.
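To confirm that STF runs without persistent storage, one quick check (a suggested verification, not a step from the official procedure) is to verify that no persistent volume claims exist in the service-telemetry namespace:

    $ oc get pvc -n service-telemetry
    No resources found in service-telemetry namespace.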
APPENDIX A. COLLECTD PLUG-INS
This section contains a complete list of collectd plug-ins and configurations for Red Hat OpenStack Platform 16.0.
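Each parameter in this list corresponds to a puppet-collectd hiera key that you can set from a Red Hat OpenStack Platform environment file. The following sketch shows one way to enable the df plug-in and tune two of its parameters; it assumes the standard CollectdExtraPlugins and ExtraConfig hooks, so verify both against your tripleo-heat-templates version:

    parameter_defaults:
      CollectdExtraPlugins:
        - df
      ExtraConfig:
        # Restrict df to XFS file systems and sample every 30 seconds
        collectd::plugin::df::fstypes: ['xfs']
        collectd::plugin::df::interval: 30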
collectd-aggregation
  collectd::plugin::aggregation::aggregators
  collectd::plugin::aggregation::interval
collectd-amqp1
collectd-apache
  collectd::plugin::apache::instances (example: localhost => url => http://localhost/mod_status?auto)
  collectd::plugin::apache::interval
collectd-apcups
collectd-battery
  collectd::plugin::battery::values_percentage
  collectd::plugin::battery::report_degraded
  collectd::plugin::battery::query_state_fs
  collectd::plugin::battery::interval
collectd-ceph
  collectd::plugin::ceph::daemons
  collectd::plugin::ceph::longrunavglatency
  collectd::plugin::ceph::convertspecialmetrictypes
collectd-cgroups
  collectd::plugin::cgroups::ignore_selected
  collectd::plugin::cgroups::interval
collectd-conntrack
  None
collectd-contextswitch
  collectd::plugin::contextswitch::interval
collectd-cpu
  collectd::plugin::cpu::reportbystate
  collectd::plugin::cpu::reportbycpu
  collectd::plugin::cpu::valuespercentage
  collectd::plugin::cpu::reportnumcpu
  collectd::plugin::cpu::reportgueststate
  collectd::plugin::cpu::subtractgueststate
  collectd::plugin::cpu::interval
collectd-cpufreq
  None
collectd-cpusleep
collectd-csv
  collectd::plugin::csv::datadir
  collectd::plugin::csv::storerates
  collectd::plugin::csv::interval
collectd-df
  collectd::plugin::df::devices
  collectd::plugin::df::fstypes
  collectd::plugin::df::ignoreselected
  collectd::plugin::df::mountpoints
  collectd::plugin::df::reportbydevice
  collectd::plugin::df::reportinodes
  collectd::plugin::df::reportreserved
  collectd::plugin::df::valuesabsolute
  collectd::plugin::df::valuespercentage
  collectd::plugin::df::interval
collectd-disk
  collectd::plugin::disk::disks
  collectd::plugin::disk::ignoreselected
  collectd::plugin::disk::udevnameattr
  collectd::plugin::disk::interval
collectd-entropy
  collectd::plugin::entropy::interval
collectd-ethstat
  collectd::plugin::ethstat::interfaces
  collectd::plugin::ethstat::maps
  collectd::plugin::ethstat::mappedonly
  collectd::plugin::ethstat::interval
collectd-exec
  collectd::plugin::exec::commands
  collectd::plugin::exec::commands_defaults
  collectd::plugin::exec::globals
  collectd::plugin::exec::interval
collectd-fhcount
  collectd::plugin::fhcount::valuesabsolute
  collectd::plugin::fhcount::valuespercentage
  collectd::plugin::fhcount::interval
collectd-filecount
  collectd::plugin::filecount::directories
  collectd::plugin::filecount::interval
collectd-fscache
  None
collectd-hddtemp
  collectd::plugin::hddtemp::host
  collectd::plugin::hddtemp::port
  collectd::plugin::hddtemp::interval
collectd-hugepages
  collectd::plugin::hugepages::report_per_node_hp
  collectd::plugin::hugepages::report_root_hp
  collectd::plugin::hugepages::values_pages
  collectd::plugin::hugepages::values_bytes
  collectd::plugin::hugepages::values_percentage
  collectd::plugin::hugepages::interval
collectd-intel_rdt
collectd-interface
  collectd::plugin::interface::interfaces
  collectd::plugin::interface::ignoreselected
  collectd::plugin::interface::reportinactive
  collectd::plugin::interface::interval
collectd-ipc
  None
collectd-ipmi
  collectd::plugin::ipmi::ignore_selected
  collectd::plugin::ipmi::notify_sensor_add
  collectd::plugin::ipmi::notify_sensor_remove
  collectd::plugin::ipmi::notify_sensor_not_present
  collectd::plugin::ipmi::sensors
  collectd::plugin::ipmi::interval
collectd-irq
  collectd::plugin::irq::irqs
  collectd::plugin::irq::ignoreselected
  collectd::plugin::irq::interval
collectd-load
  collectd::plugin::load::report_relative
  collectd::plugin::load::interval
collectd-logfile
  collectd::plugin::logfile::log_level
  collectd::plugin::logfile::log_file
  collectd::plugin::logfile::log_timestamp
  collectd::plugin::logfile::print_severity
  collectd::plugin::logfile::interval
collectd-madwifi
collectd-mbmon
collectd-md
collectd-memcached
  collectd::plugin::memcached::instances
  collectd::plugin::memcached::interval
collectd-memory
  collectd::plugin::memory::valuesabsolute
  collectd::plugin::memory::valuespercentage
  collectd::plugin::memory::interval
collectd-multimeter
collectd-mysql
  collectd::plugin::mysql::interval
collectd-netlink
  collectd::plugin::netlink::interfaces
  collectd::plugin::netlink::verboseinterfaces
  collectd::plugin::netlink::qdiscs
  collectd::plugin::netlink::classes
  collectd::plugin::netlink::filters
  collectd::plugin::netlink::ignoreselected
  collectd::plugin::netlink::interval
collectd-network
  collectd::plugin::network::timetolive
  collectd::plugin::network::maxpacketsize
  collectd::plugin::network::forward
  collectd::plugin::network::reportstats
  collectd::plugin::network::listeners
  collectd::plugin::network::servers
  collectd::plugin::network::interval
collectd-nfs
  collectd::plugin::nfs::interval
collectd-ntpd
  collectd::plugin::ntpd::host
  collectd::plugin::ntpd::port
  collectd::plugin::ntpd::reverselookups
  collectd::plugin::ntpd::includeunitid
  collectd::plugin::ntpd::interval
collectd-numa
  None
collectd-olsrd
collectd-openvpn
  collectd::plugin::openvpn::statusfile
  collectd::plugin::openvpn::improvednamingschema
  collectd::plugin::openvpn::collectcompression
  collectd::plugin::openvpn::collectindividualusers
  collectd::plugin::openvpn::collectusercount
  collectd::plugin::openvpn::interval
collectd-ovs_events
  collectd::plugin::ovs_events::address
  collectd::plugin::ovs_events::dispatch
  collectd::plugin::ovs_events::interfaces
  collectd::plugin::ovs_events::send_notification
  collectd::plugin::ovs_events::port
  collectd::plugin::ovs_events::socket
collectd-ovs_stats
  collectd::plugin::ovs_stats::address
  collectd::plugin::ovs_stats::bridges
  collectd::plugin::ovs_stats::port
  collectd::plugin::ovs_stats::socket
collectd-ping
  collectd::plugin::ping::hosts
  collectd::plugin::ping::timeout
  collectd::plugin::ping::ttl
  collectd::plugin::ping::source_address
  collectd::plugin::ping::device
  collectd::plugin::ping::max_missed
  collectd::plugin::ping::size
  collectd::plugin::ping::interval
collectd-powerdns
  collectd::plugin::powerdns::servers
  collectd::plugin::powerdns::recursors
  collectd::plugin::powerdns::local_socket
  collectd::plugin::powerdns::interval
collectd-processes
  collectd::plugin::processes::processes
  collectd::plugin::processes::process_matches
  collectd::plugin::processes::collect_context_switch
  collectd::plugin::processes::collect_file_descriptor
  collectd::plugin::processes::collect_memory_maps
  collectd::plugin::processes::interval
collectd-protocols
  collectd::plugin::protocols::ignoreselected
  collectd::plugin::protocols::values
collectd-python
collectd-serial
collectd-smart
  collectd::plugin::smart::disks
  collectd::plugin::smart::ignoreselected
  collectd::plugin::smart::interval
collectd-snmp_agent
collectd-statsd
  collectd::plugin::statsd::host
  collectd::plugin::statsd::port
  collectd::plugin::statsd::deletecounters
  collectd::plugin::statsd::deletetimers
  collectd::plugin::statsd::deletegauges
  collectd::plugin::statsd::deletesets
  collectd::plugin::statsd::countersum
  collectd::plugin::statsd::timerpercentile
  collectd::plugin::statsd::timerlower
  collectd::plugin::statsd::timerupper
  collectd::plugin::statsd::timersum
  collectd::plugin::statsd::timercount
  collectd::plugin::statsd::interval
collectd-swap
  collectd::plugin::swap::reportbydevice
  collectd::plugin::swap::reportbytes
  collectd::plugin::swap::valuesabsolute
  collectd::plugin::swap::valuespercentage
  collectd::plugin::swap::reportio
  collectd::plugin::swap::interval
collectd-syslog
  collectd::plugin::syslog::log_level
  collectd::plugin::syslog::notify_level
  collectd::plugin::syslog::interval
collectd-table
  collectd::plugin::table::tables
  collectd::plugin::table::interval
collectd-tail
  collectd::plugin::tail::files
  collectd::plugin::tail::interval
collectd-tail_csv
  collectd::plugin::tail_csv::metrics
  collectd::plugin::tail_csv::files
collectd-tcpconns
  collectd::plugin::tcpconns::localports
  collectd::plugin::tcpconns::remoteports
  collectd::plugin::tcpconns::listening
  collectd::plugin::tcpconns::allportssummary
  collectd::plugin::tcpconns::interval
collectd-ted
collectd-thermal
  collectd::plugin::thermal::devices
  collectd::plugin::thermal::ignoreselected
  collectd::plugin::thermal::interval
collectd-threshold
  collectd::plugin::threshold::types
  collectd::plugin::threshold::plugins
  collectd::plugin::threshold::hosts
  collectd::plugin::threshold::interval
collectd-turbostat
  collectd::plugin::turbostat::core_c_states
  collectd::plugin::turbostat::package_c_states
  collectd::plugin::turbostat::system_management_interrupt
  collectd::plugin::turbostat::digital_temperature_sensor
  collectd::plugin::turbostat::tcc_activation_temp
  collectd::plugin::turbostat::running_average_power_limit
  collectd::plugin::turbostat::logical_core_names
collectd-unixsock
collectd-uptime
  collectd::plugin::uptime::interval
collectd-users
  collectd::plugin::users::interval
collectd-uuid
  collectd::plugin::uuid::uuid_file
  collectd::plugin::uuid::interval
collectd-virt
  collectd::plugin::virt::connection
  collectd::plugin::virt::refresh_interval
  collectd::plugin::virt::domain
  collectd::plugin::virt::block_device
  collectd::plugin::virt::interface_device
  collectd::plugin::virt::ignore_selected
  collectd::plugin::virt::hostname_format
  collectd::plugin::virt::interface_format
  collectd::plugin::virt::extra_stats
  collectd::plugin::virt::interval
collectd-vmem
  collectd::plugin::vmem::verbose
  collectd::plugin::vmem::interval
collectd-vserver
collectd-wireless
collectd-write_graphite
  collectd::plugin::write_graphite::carbons
  collectd::plugin::write_graphite::carbon_defaults
  collectd::plugin::write_graphite::globals
collectd-write_kafka
  collectd::plugin::write_kafka::kafka_host
  collectd::plugin::write_kafka::kafka_port
  collectd::plugin::write_kafka::kafka_hosts
  collectd::plugin::write_kafka::topics
collectd-write_log
  collectd::plugin::write_log::format
collectd-zfs_arc
  None
collectdpluginnetlinkfilters
collectdpluginnetlinkignoreselected
collectdpluginnetlinkinterval
collectd-network
collectdpluginnetworktimetolive
collectdpluginnetworkmaxpacketsize
APPENDIX A COLLECTD PLUG-INS
51
collectdpluginnetworkforward
collectdpluginnetworkreportstats
collectdpluginnetworklisteners
collectdpluginnetworkservers
collectdpluginnetworkinterval
collectd-nfs
collectdpluginnfsinterval
collectd-ntpd
collectdpluginntpdhost
collectdpluginntpdport
collectdpluginntpdreverselookups
collectdpluginntpdincludeunitid
collectdpluginntpdinterval
collectd-numa
None
collectd-olsrd
collectd-openvpn
collectdpluginopenvpnstatusfile
collectdpluginopenvpnimprovednamingschema
collectdpluginopenvpncollectcompression
collectdpluginopenvpncollectindividualusers
collectdpluginopenvpncollectusercount
collectdpluginopenvpninterval
collectd-ovs_events
collectdpluginovs_eventsaddress
collectdpluginovs_eventsdispatch
collectdpluginovs_eventsinterfaces
collectdpluginovs_eventssend_notification
Red Hat OpenStack Platform 160 Service Telemetry Framework
52
collectdpluginovs_events$port
collectdpluginovs_eventssocket
collectd-ovs_stats
collectdpluginovs_statsaddress
collectdpluginovs_statsbridges
collectdpluginovs_statsport
collectdpluginovs_statssocket
collectd-ping
collectdpluginpinghosts
collectdpluginpingtimeout
collectdpluginpingttl
collectdpluginpingsource_address
collectdpluginpingdevice
collectdpluginpingmax_missed
collectdpluginpingsize
collectdpluginpinginterval
collectd-powerdns
collectdpluginpowerdnsinterval
collectdpluginpowerdnsservers
collectdpluginpowerdnsrecursors
collectdpluginpowerdnslocal_socket
collectdpluginpowerdnsinterval
collectd-processes
collectdpluginprocessesprocesses
collectdpluginprocessesprocess_matches
collectdpluginprocessescollect_context_switch
collectdpluginprocessescollect_file_descriptor
collectdpluginprocessescollect_memory_maps
collectdpluginpowerdnsinterval
APPENDIX A COLLECTD PLUG-INS
53
collectd-protocols
collectdpluginprotocolsignoreselected
collectdpluginprotocolsvalues
collectd-python
collectd-serial
collectd-smart
collectdpluginsmartdisks
collectdpluginsmartignoreselected
collectdpluginsmartinterval
collectd-snmp_agent
collectd-statsd
collectdpluginstatsdhost
collectdpluginstatsdport
collectdpluginstatsddeletecounters
collectdpluginstatsddeletetimers
collectdpluginstatsddeletegauges
collectdpluginstatsddeletesets
collectdpluginstatsdcountersum
collectdpluginstatsdtimerpercentile
collectdpluginstatsdtimerlower
collectdpluginstatsdtimerupper
collectdpluginstatsdtimersum
collectdpluginstatsdtimercount
collectdpluginstatsdinterval
collectd-swap
collectdpluginswapreportbydevice
collectdpluginswapreportbytes
collectdpluginswapvaluesabsolute
Red Hat OpenStack Platform 160 Service Telemetry Framework
54
collectdpluginswapvaluespercentage
collectdpluginswapreportio
collectdpluginswapinterval
collectd-syslog
collectdpluginsysloglog_level
collectdpluginsyslognotify_level
collectdpluginsysloginterval
collectd-table
collectdplugintabletables
collectdplugintableinterval
collectd-tail
collectdplugintailfiles
collectdplugintailinterval
collectd-tail_csv
collectdplugintail_csvmetrics
collectdplugintail_csvfiles
collectd-tcpconns
collectdplugintcpconnslocalports
collectdplugintcpconnsremoteports
collectdplugintcpconnslistening
collectdplugintcpconnsallportssummary
collectdplugintcpconnsinterval
collectd-ted
collectd-thermal
collectdpluginthermaldevices
collectdpluginthermalignoreselected
collectdpluginthermalinterval
collectd-threshold
APPENDIX A COLLECTD PLUG-INS
55
collectdpluginthresholdtypes
collectdpluginthresholdplugins
collectdpluginthresholdhosts
collectdpluginthresholdinterval
collectd-turbostat
collectdpluginturbostatcore_c_states
collectdpluginturbostatpackage_c_states
collectdpluginturbostatsystem_management_interrupt
collectdpluginturbostatdigital_temperature_sensor
collectdpluginturbostattcc_activation_temp
collectdpluginturbostatrunning_average_power_limit
collectdpluginturbostatlogical_core_names
collectd-unixsock
collectd-uptime
collectdpluginuptimeinterval
collectd-users
collectdpluginusersinterval
collectd-uuid
collectdpluginuuiduuid_file
collectdpluginuuidinterval
collectd-virt
collectdpluginvirtconnection
collectdpluginvirtrefresh_interval
collectdpluginvirtdomain
collectdpluginvirtblock_device
collectdpluginvirtinterface_device
collectdpluginvirtignore_selected
collectdpluginvirthostname_format
Red Hat OpenStack Platform 160 Service Telemetry Framework
56
collectdpluginvirtinterface_format
collectdpluginvirtextra_stats
collectdpluginvirtinterval
collectd-vmem
collectdpluginvmemverbose
collectdpluginvmeminterval
collectd-vserver
collectd-wireless
collectd-write_graphite
collectdpluginwrite_graphitecarbons
collectdpluginwrite_graphitecarbon_defaults
collectdpluginwrite_graphiteglobals
collectd-write_kafka
collectdpluginwrite_kafkakafka_host
collectdpluginwrite_kafkakafka_port
collectdpluginwrite_kafkakafka_hosts
collectdpluginwrite_kafkatopics
collectd-write_log
collectdpluginwrite_logformat
collectd-zfs_arc
None
APPENDIX A COLLECTD PLUG-INS
57
- Table of Contents
- CHAPTER 1 INTRODUCTION TO SERVICE TELEMETRY FRAMEWORK
-
- 11 SERVICE TELEMETRY FRAMEWORK ARCHITECTURE
- 12 INSTALLATION SIZE
-
- CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK
-
- 21 THE CORE COMPONENTS OF STF
- 22 PREPARING YOUR OCP ENVIRONMENT FOR STF
-
- 221 Persistent volumes
-
- 2211 Using ephemeral storage
-
- 222 Resource allocation
- 223 Node tuning operator
-
- 23 DEPLOYING STF TO THE OCP ENVIRONMENT
-
- 231 Deploying STF to the OCP environment with ElasticSearch
- 232 Deploying STF to the OCP environment without ElasticSearch
- 233 Creating a namespace
- 234 Creating an OperatorGroup
- 235 Enabling the OperatorHubio Community Catalog Source
- 236 Enabling Red Hat STF Operator Source
- 237 Subscribing to the AMQ Certificate Manager Operator
- 238 Subscribing to the Elastic Cloud on Kubernetes Operator
- 239 Subscribing to the Service Telemetry Operator
- 2310 Creating a ServiceTelemetry object in OCP
-
- 24 REMOVING STF FROM THE OCP ENVIRONMENT
-
- 241 Deleting the namespace
- 242 Removing the OperatorSource
-
- CHAPTER 3 COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION
-
- 31 CONNECTING RED HAT OPENSTACK PLATFORM TO SERVICE TELEMETRY FRAMEWORK
- 32 DEPLOYING TO NON-STANDARD NETWORK TOPOLOGIES
- 33 CONFIGURING RED HAT OPENSTACK PLATFORM OVERCLOUD FOR SERVICE TELEMETRY FRAMEWORK
-
- 331 Retrieving the AMQ Interconnect route address
- 332 Configuring the STF connection for the overcloud
- 333 Validating client-side installation
-
- CHAPTER 4 ADVANCED FEATURES
-
- 41 CUSTOMIZING THE DEPLOYMENT
-
- 411 Manifest override parameters
- 412 Overriding a managed manifest
-
- Procedure
-
- 42 ALERTS
-
- Additional resources
- 421 Creating an alert rule in Prometheus
-
- Procedure
- Additional resources
-
- 422 Configuring custom alerts
-
- Procedure
- Additional resources
-
- 423 Creating an alert route in Alertmanager
-
- Procedure
- Additional resources
-
- 43 HIGH AVAILABILITY
-
- 431 Configuring high availability
-
- Procedure
-
- 44 DASHBOARDS
-
- 441 Setting up Grafana to host the dashboard
-
- 4411 Viewing and editing queries
-
- 442 The Grafana infrastructure dashboard
-
- 4421 Top panels
- 4422 Networking panels
- 4423 CPU panels
- 4424 Memory panels
- 4425 Diskfile system
-
- 45 CONFIGURING MULTIPLE CLOUDS
-
- 451 Planning AMQP address prefixes
- 452 Deploying Smart Gateways
-
- Procedure
- 4521 Example manifests
-
- 453 Creating the OpenStack environment file
-
- Procedure
- Additional resources
-
- 454 Querying metrics data from multiple clouds
-
- 46 EPHEMERAL STORAGE
-
- 461 Configuring ephemeral storage
-
- APPENDIX A COLLECTD PLUG-INS
-
collectdpluginhugepagesreport_root_hp
collectdpluginhugepagesvalues_pages
collectdpluginhugepagesvalues_bytes
collectdpluginhugepagesvalues_percentage
collectdpluginhugepagesinterval
collectd-intel_rdt
collectd-interface
collectdplugininterfaceinterfaces
collectdplugininterfaceignoreselected
collectdplugininterfacereportinactive
Collectdplugininterfaceinterval
collectd-ipc
None
collectd-ipmi
collectdpluginipmiignore_selected
collectdpluginipminotify_sensor_add
collectdpluginipminotify_sensor_remove
collectdpluginipminotify_sensor_not_present
collectdpluginipmisensors
collectdpluginipmiinterval
collectd-irq
collectdpluginirqirqs
collectdpluginirqignoreselected
collectdpluginirqinterval
collectd-load
collectdpluginloadreport_relative
collectdpluginloadinterval
collectd-logfile
collectdpluginlogfilelog_level
Red Hat OpenStack Platform 160 Service Telemetry Framework
50
collectdpluginlogfilelog_level
collectdpluginlogfilelog_file
collectdpluginlogfilelog_timestamp
collectdpluginlogfileprint_severity
collectdpluginlogfileinterval
collectd-madwifi
collectd-mbmon
collectd-md
collectd-memcached
collectdpluginmemcachedinstances
collectdpluginmemcachedinterval
collectd-memory
collectdpluginmemoryvaluesabsolute
collectdpluginmemoryvaluespercentage
collectdpluginmemoryinterval collectd-multimeter
collectd-multimeter
collectd-mysql
collectdpluginmysqlinterval
collectd-netlink
collectdpluginnetlinkinterfaces
collectdpluginnetlinkverboseinterfaces
collectdpluginnetlinkqdiscs
collectdpluginnetlinkclasses
collectdpluginnetlinkfilters
collectdpluginnetlinkignoreselected
collectdpluginnetlinkinterval
collectd-network
collectdpluginnetworktimetolive
collectdpluginnetworkmaxpacketsize
APPENDIX A COLLECTD PLUG-INS
51
collectdpluginnetworkforward
collectdpluginnetworkreportstats
collectdpluginnetworklisteners
collectdpluginnetworkservers
collectdpluginnetworkinterval
collectd-nfs
collectdpluginnfsinterval
collectd-ntpd
collectdpluginntpdhost
collectdpluginntpdport
collectdpluginntpdreverselookups
collectdpluginntpdincludeunitid
collectdpluginntpdinterval
collectd-numa
None
collectd-olsrd
collectd-openvpn
collectdpluginopenvpnstatusfile
collectdpluginopenvpnimprovednamingschema
collectdpluginopenvpncollectcompression
collectdpluginopenvpncollectindividualusers
collectdpluginopenvpncollectusercount
collectdpluginopenvpninterval
collectd-ovs_events
collectdpluginovs_eventsaddress
collectdpluginovs_eventsdispatch
collectdpluginovs_eventsinterfaces
collectdpluginovs_eventssend_notification
Red Hat OpenStack Platform 160 Service Telemetry Framework
52
collectdpluginovs_events$port
collectdpluginovs_eventssocket
collectd-ovs_stats
collectdpluginovs_statsaddress
collectdpluginovs_statsbridges
collectdpluginovs_statsport
collectdpluginovs_statssocket
collectd-ping
collectdpluginpinghosts
collectdpluginpingtimeout
collectdpluginpingttl
collectdpluginpingsource_address
collectdpluginpingdevice
collectdpluginpingmax_missed
collectdpluginpingsize
collectdpluginpinginterval
collectd-powerdns
collectdpluginpowerdnsinterval
collectdpluginpowerdnsservers
collectdpluginpowerdnsrecursors
collectdpluginpowerdnslocal_socket
collectdpluginpowerdnsinterval
collectd-processes
collectdpluginprocessesprocesses
collectdpluginprocessesprocess_matches
collectdpluginprocessescollect_context_switch
collectdpluginprocessescollect_file_descriptor
collectdpluginprocessescollect_memory_maps
collectdpluginpowerdnsinterval
APPENDIX A COLLECTD PLUG-INS
53
collectd-protocols
collectdpluginprotocolsignoreselected
collectdpluginprotocolsvalues
collectd-python
collectd-serial
collectd-smart
collectdpluginsmartdisks
collectdpluginsmartignoreselected
collectdpluginsmartinterval
collectd-snmp_agent
collectd-statsd
collectdpluginstatsdhost
collectdpluginstatsdport
collectdpluginstatsddeletecounters
collectdpluginstatsddeletetimers
collectdpluginstatsddeletegauges
collectdpluginstatsddeletesets
collectdpluginstatsdcountersum
collectdpluginstatsdtimerpercentile
collectdpluginstatsdtimerlower
collectdpluginstatsdtimerupper
collectdpluginstatsdtimersum
collectdpluginstatsdtimercount
collectdpluginstatsdinterval
collectd-swap
collectdpluginswapreportbydevice
collectdpluginswapreportbytes
collectdpluginswapvaluesabsolute
Red Hat OpenStack Platform 160 Service Telemetry Framework
54
collectdpluginswapvaluespercentage
collectdpluginswapreportio
collectdpluginswapinterval
collectd-syslog
collectdpluginsysloglog_level
collectdpluginsyslognotify_level
collectdpluginsysloginterval
collectd-table
collectdplugintabletables
collectdplugintableinterval
collectd-tail
collectdplugintailfiles
collectdplugintailinterval
collectd-tail_csv
collectdplugintail_csvmetrics
collectdplugintail_csvfiles
collectd-tcpconns
collectdplugintcpconnslocalports
collectdplugintcpconnsremoteports
collectdplugintcpconnslistening
collectdplugintcpconnsallportssummary
collectdplugintcpconnsinterval
collectd-ted
collectd-thermal
collectdpluginthermaldevices
collectdpluginthermalignoreselected
collectdpluginthermalinterval
collectd-threshold
APPENDIX A COLLECTD PLUG-INS
55
collectdpluginthresholdtypes
collectdpluginthresholdplugins
collectdpluginthresholdhosts
collectdpluginthresholdinterval
collectd-turbostat
collectdpluginturbostatcore_c_states
collectdpluginturbostatpackage_c_states
collectdpluginturbostatsystem_management_interrupt
collectdpluginturbostatdigital_temperature_sensor
collectdpluginturbostattcc_activation_temp
collectdpluginturbostatrunning_average_power_limit
collectdpluginturbostatlogical_core_names
collectd-unixsock
collectd-uptime
collectdpluginuptimeinterval
collectd-users
collectdpluginusersinterval
collectd-uuid
collectdpluginuuiduuid_file
collectdpluginuuidinterval
collectd-virt
collectdpluginvirtconnection
collectdpluginvirtrefresh_interval
collectdpluginvirtdomain
collectdpluginvirtblock_device
collectdpluginvirtinterface_device
collectdpluginvirtignore_selected
collectdpluginvirthostname_format
Red Hat OpenStack Platform 160 Service Telemetry Framework
56
collectdpluginvirtinterface_format
collectdpluginvirtextra_stats
collectdpluginvirtinterval
collectd-vmem
collectdpluginvmemverbose
collectdpluginvmeminterval
collectd-vserver
collectd-wireless
collectd-write_graphite
collectdpluginwrite_graphitecarbons
collectdpluginwrite_graphitecarbon_defaults
collectdpluginwrite_graphiteglobals
collectd-write_kafka
collectdpluginwrite_kafkakafka_host
collectdpluginwrite_kafkakafka_port
collectdpluginwrite_kafkakafka_hosts
collectdpluginwrite_kafkatopics
collectd-write_log
collectdpluginwrite_logformat
collectd-zfs_arc
None
APPENDIX A COLLECTD PLUG-INS
57
- Table of Contents
- CHAPTER 1 INTRODUCTION TO SERVICE TELEMETRY FRAMEWORK
-
- 11 SERVICE TELEMETRY FRAMEWORK ARCHITECTURE
- 12 INSTALLATION SIZE
-
- CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK
-
- 21 THE CORE COMPONENTS OF STF
- 22 PREPARING YOUR OCP ENVIRONMENT FOR STF
-
- 221 Persistent volumes
-
- 2211 Using ephemeral storage
-
- 222 Resource allocation
- 223 Node tuning operator
-
- 23 DEPLOYING STF TO THE OCP ENVIRONMENT
-
- 231 Deploying STF to the OCP environment with ElasticSearch
- 232 Deploying STF to the OCP environment without ElasticSearch
- 233 Creating a namespace
- 234 Creating an OperatorGroup
- 235 Enabling the OperatorHubio Community Catalog Source
- 236 Enabling Red Hat STF Operator Source
- 237 Subscribing to the AMQ Certificate Manager Operator
- 238 Subscribing to the Elastic Cloud on Kubernetes Operator
- 239 Subscribing to the Service Telemetry Operator
- 2310 Creating a ServiceTelemetry object in OCP
-
- 24 REMOVING STF FROM THE OCP ENVIRONMENT
-
- 241 Deleting the namespace
- 242 Removing the OperatorSource
-
- CHAPTER 3 COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION
-
- 31 CONNECTING RED HAT OPENSTACK PLATFORM TO SERVICE TELEMETRY FRAMEWORK
- 32 DEPLOYING TO NON-STANDARD NETWORK TOPOLOGIES
- 33 CONFIGURING RED HAT OPENSTACK PLATFORM OVERCLOUD FOR SERVICE TELEMETRY FRAMEWORK
-
- 331 Retrieving the AMQ Interconnect route address
- 332 Configuring the STF connection for the overcloud
- 333 Validating client-side installation
-
- CHAPTER 4 ADVANCED FEATURES
-
- 41 CUSTOMIZING THE DEPLOYMENT
-
- 411 Manifest override parameters
- 412 Overriding a managed manifest
-
- Procedure
-
- 42 ALERTS
-
- Additional resources
- 421 Creating an alert rule in Prometheus
-
- Procedure
- Additional resources
-
- 422 Configuring custom alerts
-
- Procedure
- Additional resources
-
- 423 Creating an alert route in Alertmanager
-
- Procedure
- Additional resources
-
- 43 HIGH AVAILABILITY
-
- 431 Configuring high availability
-
- Procedure
-
- 44 DASHBOARDS
-
- 441 Setting up Grafana to host the dashboard
-
- 4411 Viewing and editing queries
-
- 442 The Grafana infrastructure dashboard
-
- 4421 Top panels
- 4422 Networking panels
- 4423 CPU panels
- 4424 Memory panels
- 4425 Diskfile system
-
- 45 CONFIGURING MULTIPLE CLOUDS
-
- 451 Planning AMQP address prefixes
- 452 Deploying Smart Gateways
-
- Procedure
- 4521 Example manifests
-
- 453 Creating the OpenStack environment file
-
- Procedure
- Additional resources
-
- 454 Querying metrics data from multiple clouds
-
- 46 EPHEMERAL STORAGE
-
- 461 Configuring ephemeral storage
-
- APPENDIX A COLLECTD PLUG-INS
-
collectdpluginlogfilelog_level
collectdpluginlogfilelog_file
collectdpluginlogfilelog_timestamp
collectdpluginlogfileprint_severity
collectdpluginlogfileinterval
collectd-madwifi
collectd-mbmon
collectd-md
collectd-memcached
collectdpluginmemcachedinstances
collectdpluginmemcachedinterval
collectd-memory
collectdpluginmemoryvaluesabsolute
collectdpluginmemoryvaluespercentage
collectdpluginmemoryinterval collectd-multimeter
collectd-multimeter
collectd-mysql
collectdpluginmysqlinterval
collectd-netlink
collectdpluginnetlinkinterfaces
collectdpluginnetlinkverboseinterfaces
collectdpluginnetlinkqdiscs
collectdpluginnetlinkclasses
collectdpluginnetlinkfilters
collectdpluginnetlinkignoreselected
collectdpluginnetlinkinterval
collectd-network
collectdpluginnetworktimetolive
collectdpluginnetworkmaxpacketsize
APPENDIX A COLLECTD PLUG-INS
51
collectdpluginnetworkforward
collectdpluginnetworkreportstats
collectdpluginnetworklisteners
collectdpluginnetworkservers
collectdpluginnetworkinterval
collectd-nfs
collectdpluginnfsinterval
collectd-ntpd
collectdpluginntpdhost
collectdpluginntpdport
collectdpluginntpdreverselookups
collectdpluginntpdincludeunitid
collectdpluginntpdinterval
collectd-numa
None
collectd-olsrd
collectd-openvpn
collectdpluginopenvpnstatusfile
collectdpluginopenvpnimprovednamingschema
collectdpluginopenvpncollectcompression
collectdpluginopenvpncollectindividualusers
collectdpluginopenvpncollectusercount
collectdpluginopenvpninterval
collectd-ovs_events
collectdpluginovs_eventsaddress
collectdpluginovs_eventsdispatch
collectdpluginovs_eventsinterfaces
collectdpluginovs_eventssend_notification
Red Hat OpenStack Platform 160 Service Telemetry Framework
52
collectdpluginovs_events$port
collectdpluginovs_eventssocket
collectd-ovs_stats
collectdpluginovs_statsaddress
collectdpluginovs_statsbridges
collectdpluginovs_statsport
collectdpluginovs_statssocket
collectd-ping
collectdpluginpinghosts
collectdpluginpingtimeout
collectdpluginpingttl
collectdpluginpingsource_address
collectdpluginpingdevice
collectdpluginpingmax_missed
collectdpluginpingsize
collectdpluginpinginterval
collectd-powerdns
collectdpluginpowerdnsinterval
collectdpluginpowerdnsservers
collectdpluginpowerdnsrecursors
collectdpluginpowerdnslocal_socket
collectdpluginpowerdnsinterval
collectd-processes
collectdpluginprocessesprocesses
collectdpluginprocessesprocess_matches
collectdpluginprocessescollect_context_switch
collectdpluginprocessescollect_file_descriptor
collectdpluginprocessescollect_memory_maps
collectdpluginpowerdnsinterval
APPENDIX A COLLECTD PLUG-INS
53
collectd-protocols
collectdpluginprotocolsignoreselected
collectdpluginprotocolsvalues
collectd-python
collectd-serial
collectd-smart
collectdpluginsmartdisks
collectdpluginsmartignoreselected
collectdpluginsmartinterval
collectd-snmp_agent
collectd-statsd
collectdpluginstatsdhost
collectdpluginstatsdport
collectdpluginstatsddeletecounters
collectdpluginstatsddeletetimers
collectdpluginstatsddeletegauges
collectdpluginstatsddeletesets
collectdpluginstatsdcountersum
collectdpluginstatsdtimerpercentile
collectdpluginstatsdtimerlower
collectdpluginstatsdtimerupper
collectdpluginstatsdtimersum
collectdpluginstatsdtimercount
collectdpluginstatsdinterval
collectd-swap
collectdpluginswapreportbydevice
collectdpluginswapreportbytes
collectdpluginswapvaluesabsolute
Red Hat OpenStack Platform 160 Service Telemetry Framework
54
collectdpluginswapvaluespercentage
collectdpluginswapreportio
collectdpluginswapinterval
collectd-syslog
collectdpluginsysloglog_level
collectdpluginsyslognotify_level
collectdpluginsysloginterval
collectd-table
collectdplugintabletables
collectdplugintableinterval
collectd-tail
collectdplugintailfiles
collectdplugintailinterval
collectd-tail_csv
collectdplugintail_csvmetrics
collectdplugintail_csvfiles
collectd-tcpconns
collectdplugintcpconnslocalports
collectdplugintcpconnsremoteports
collectdplugintcpconnslistening
collectdplugintcpconnsallportssummary
collectdplugintcpconnsinterval
collectd-ted
collectd-thermal
collectdpluginthermaldevices
collectdpluginthermalignoreselected
collectdpluginthermalinterval
collectd-threshold
APPENDIX A COLLECTD PLUG-INS
55
collectdpluginthresholdtypes
collectdpluginthresholdplugins
collectdpluginthresholdhosts
collectdpluginthresholdinterval
collectd-turbostat
collectdpluginturbostatcore_c_states
collectdpluginturbostatpackage_c_states
collectdpluginturbostatsystem_management_interrupt
collectdpluginturbostatdigital_temperature_sensor
collectdpluginturbostattcc_activation_temp
collectdpluginturbostatrunning_average_power_limit
collectdpluginturbostatlogical_core_names
collectd-unixsock
collectd-uptime
collectdpluginuptimeinterval
collectd-users
collectdpluginusersinterval
collectd-uuid
collectdpluginuuiduuid_file
collectdpluginuuidinterval
collectd-virt
collectdpluginvirtconnection
collectdpluginvirtrefresh_interval
collectdpluginvirtdomain
collectdpluginvirtblock_device
collectdpluginvirtinterface_device
collectdpluginvirtignore_selected
collectdpluginvirthostname_format
Red Hat OpenStack Platform 160 Service Telemetry Framework
56
collectdpluginvirtinterface_format
collectdpluginvirtextra_stats
collectdpluginvirtinterval
collectd-vmem
collectdpluginvmemverbose
collectdpluginvmeminterval
collectd-vserver
collectd-wireless
collectd-write_graphite
collectdpluginwrite_graphitecarbons
collectdpluginwrite_graphitecarbon_defaults
collectdpluginwrite_graphiteglobals
collectd-write_kafka
collectdpluginwrite_kafkakafka_host
collectdpluginwrite_kafkakafka_port
collectdpluginwrite_kafkakafka_hosts
collectdpluginwrite_kafkatopics
collectd-write_log
collectdpluginwrite_logformat
collectd-zfs_arc
None
APPENDIX A COLLECTD PLUG-INS
57
- Table of Contents
- CHAPTER 1 INTRODUCTION TO SERVICE TELEMETRY FRAMEWORK
-
- 11 SERVICE TELEMETRY FRAMEWORK ARCHITECTURE
- 12 INSTALLATION SIZE
-
- CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK
-
- 21 THE CORE COMPONENTS OF STF
- 22 PREPARING YOUR OCP ENVIRONMENT FOR STF
-
- 221 Persistent volumes
-
- 2211 Using ephemeral storage
-
- 222 Resource allocation
- 223 Node tuning operator
-
- 23 DEPLOYING STF TO THE OCP ENVIRONMENT
-
- 231 Deploying STF to the OCP environment with ElasticSearch
- 232 Deploying STF to the OCP environment without ElasticSearch
- 233 Creating a namespace
- 234 Creating an OperatorGroup
- 235 Enabling the OperatorHubio Community Catalog Source
- 236 Enabling Red Hat STF Operator Source
- 237 Subscribing to the AMQ Certificate Manager Operator
- 238 Subscribing to the Elastic Cloud on Kubernetes Operator
- 239 Subscribing to the Service Telemetry Operator
- 2310 Creating a ServiceTelemetry object in OCP
-
- 24 REMOVING STF FROM THE OCP ENVIRONMENT
-
- 241 Deleting the namespace
- 242 Removing the OperatorSource
-
- CHAPTER 3 COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION
-
- 31 CONNECTING RED HAT OPENSTACK PLATFORM TO SERVICE TELEMETRY FRAMEWORK
- 32 DEPLOYING TO NON-STANDARD NETWORK TOPOLOGIES
- 33 CONFIGURING RED HAT OPENSTACK PLATFORM OVERCLOUD FOR SERVICE TELEMETRY FRAMEWORK
-
- 331 Retrieving the AMQ Interconnect route address
- 332 Configuring the STF connection for the overcloud
- 333 Validating client-side installation
-
- CHAPTER 4 ADVANCED FEATURES
-
- 41 CUSTOMIZING THE DEPLOYMENT
-
- 411 Manifest override parameters
- 412 Overriding a managed manifest
-
- Procedure
-
- 42 ALERTS
-
- Additional resources
- 421 Creating an alert rule in Prometheus
-
- Procedure
- Additional resources
-
- 422 Configuring custom alerts
-
- Procedure
- Additional resources
-
- 423 Creating an alert route in Alertmanager
-
- Procedure
- Additional resources
-
- 43 HIGH AVAILABILITY
-
- 431 Configuring high availability
-
- Procedure
-
- 44 DASHBOARDS
-
- 441 Setting up Grafana to host the dashboard
-
- 4411 Viewing and editing queries
-
- 442 The Grafana infrastructure dashboard
-
- 4421 Top panels
- 4422 Networking panels
- 4423 CPU panels
- 4424 Memory panels
- 4425 Diskfile system
-
- 45 CONFIGURING MULTIPLE CLOUDS
-
- 451 Planning AMQP address prefixes
- 452 Deploying Smart Gateways
-
- Procedure
- 4521 Example manifests
-
- 453 Creating the OpenStack environment file
-
- Procedure
- Additional resources
-
- 454 Querying metrics data from multiple clouds
-
- 46 EPHEMERAL STORAGE
-
- 461 Configuring ephemeral storage
-
- APPENDIX A COLLECTD PLUG-INS
-
collectdpluginnetworkforward
collectdpluginnetworkreportstats
collectdpluginnetworklisteners
collectdpluginnetworkservers
collectdpluginnetworkinterval
collectd-nfs
collectdpluginnfsinterval
collectd-ntpd
collectdpluginntpdhost
collectdpluginntpdport
collectdpluginntpdreverselookups
collectdpluginntpdincludeunitid
collectdpluginntpdinterval
collectd-numa
None
collectd-olsrd
collectd-openvpn
collectdpluginopenvpnstatusfile
collectdpluginopenvpnimprovednamingschema
collectdpluginopenvpncollectcompression
collectdpluginopenvpncollectindividualusers
collectdpluginopenvpncollectusercount
collectdpluginopenvpninterval
collectd-ovs_events
collectdpluginovs_eventsaddress
collectdpluginovs_eventsdispatch
collectdpluginovs_eventsinterfaces
collectdpluginovs_eventssend_notification
Red Hat OpenStack Platform 160 Service Telemetry Framework
52
collectdpluginovs_events$port
collectdpluginovs_eventssocket
collectd-ovs_stats
collectdpluginovs_statsaddress
collectdpluginovs_statsbridges
collectdpluginovs_statsport
collectdpluginovs_statssocket
collectd-ping
collectdpluginpinghosts
collectdpluginpingtimeout
collectdpluginpingttl
collectdpluginpingsource_address
collectdpluginpingdevice
collectdpluginpingmax_missed
collectdpluginpingsize
collectdpluginpinginterval
collectd-powerdns
collectdpluginpowerdnsinterval
collectdpluginpowerdnsservers
collectdpluginpowerdnsrecursors
collectdpluginpowerdnslocal_socket
collectdpluginpowerdnsinterval
collectd-processes
collectdpluginprocessesprocesses
collectdpluginprocessesprocess_matches
collectdpluginprocessescollect_context_switch
collectdpluginprocessescollect_file_descriptor
collectdpluginprocessescollect_memory_maps
collectdpluginpowerdnsinterval
APPENDIX A COLLECTD PLUG-INS
53
collectd-protocols
collectdpluginprotocolsignoreselected
collectdpluginprotocolsvalues
collectd-python
collectd-serial
collectd-smart
collectdpluginsmartdisks
collectdpluginsmartignoreselected
collectdpluginsmartinterval
collectd-snmp_agent
collectd-statsd
collectdpluginstatsdhost
collectdpluginstatsdport
collectdpluginstatsddeletecounters
collectdpluginstatsddeletetimers
collectdpluginstatsddeletegauges
collectdpluginstatsddeletesets
collectdpluginstatsdcountersum
collectdpluginstatsdtimerpercentile
collectdpluginstatsdtimerlower
collectdpluginstatsdtimerupper
collectdpluginstatsdtimersum
collectdpluginstatsdtimercount
collectdpluginstatsdinterval
collectd-swap
collectdpluginswapreportbydevice
collectdpluginswapreportbytes
collectdpluginswapvaluesabsolute
Red Hat OpenStack Platform 160 Service Telemetry Framework
54
collectdpluginswapvaluespercentage
collectdpluginswapreportio
collectdpluginswapinterval
collectd-syslog
collectdpluginsysloglog_level
collectdpluginsyslognotify_level
collectdpluginsysloginterval
collectd-table
collectdplugintabletables
collectdplugintableinterval
collectd-tail
collectdplugintailfiles
collectdplugintailinterval
collectd-tail_csv
collectdplugintail_csvmetrics
collectdplugintail_csvfiles
collectd-tcpconns
collectdplugintcpconnslocalports
collectdplugintcpconnsremoteports
collectdplugintcpconnslistening
collectdplugintcpconnsallportssummary
collectdplugintcpconnsinterval
collectd-ted
collectd-thermal
collectdpluginthermaldevices
collectdpluginthermalignoreselected
collectdpluginthermalinterval
collectd-threshold
APPENDIX A COLLECTD PLUG-INS
55
collectdpluginthresholdtypes
collectdpluginthresholdplugins
collectdpluginthresholdhosts
collectdpluginthresholdinterval
collectd-turbostat
collectdpluginturbostatcore_c_states
collectdpluginturbostatpackage_c_states
collectdpluginturbostatsystem_management_interrupt
collectdpluginturbostatdigital_temperature_sensor
collectdpluginturbostattcc_activation_temp
collectdpluginturbostatrunning_average_power_limit
collectdpluginturbostatlogical_core_names
collectd-unixsock
collectd-uptime
collectdpluginuptimeinterval
collectd-users
collectdpluginusersinterval
collectd-uuid
collectdpluginuuiduuid_file
collectdpluginuuidinterval
collectd-virt
collectdpluginvirtconnection
collectdpluginvirtrefresh_interval
collectdpluginvirtdomain
collectdpluginvirtblock_device
collectdpluginvirtinterface_device
collectdpluginvirtignore_selected
collectdpluginvirthostname_format
Red Hat OpenStack Platform 160 Service Telemetry Framework
56
collectdpluginvirtinterface_format
collectdpluginvirtextra_stats
collectdpluginvirtinterval
collectd-vmem
collectdpluginvmemverbose
collectdpluginvmeminterval
collectd-vserver
collectd-wireless
collectd-write_graphite
collectdpluginwrite_graphitecarbons
collectdpluginwrite_graphitecarbon_defaults
collectdpluginwrite_graphiteglobals
collectd-write_kafka
collectdpluginwrite_kafkakafka_host
collectdpluginwrite_kafkakafka_port
collectdpluginwrite_kafkakafka_hosts
collectdpluginwrite_kafkatopics
collectd-write_log
collectdpluginwrite_logformat
collectd-zfs_arc
None
APPENDIX A COLLECTD PLUG-INS
57
- Table of Contents
- CHAPTER 1 INTRODUCTION TO SERVICE TELEMETRY FRAMEWORK
-
- 11 SERVICE TELEMETRY FRAMEWORK ARCHITECTURE
- 12 INSTALLATION SIZE
-
- CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK
-
- 21 THE CORE COMPONENTS OF STF
- 22 PREPARING YOUR OCP ENVIRONMENT FOR STF
-
- 221 Persistent volumes
-
- 2211 Using ephemeral storage
-
- 222 Resource allocation
- 223 Node tuning operator
-
- 23 DEPLOYING STF TO THE OCP ENVIRONMENT
-
- 231 Deploying STF to the OCP environment with ElasticSearch
- 232 Deploying STF to the OCP environment without ElasticSearch
- 233 Creating a namespace
- 234 Creating an OperatorGroup
- 235 Enabling the OperatorHubio Community Catalog Source
- 236 Enabling Red Hat STF Operator Source
- 237 Subscribing to the AMQ Certificate Manager Operator
- 238 Subscribing to the Elastic Cloud on Kubernetes Operator
- 239 Subscribing to the Service Telemetry Operator
- 2310 Creating a ServiceTelemetry object in OCP
-
- 24 REMOVING STF FROM THE OCP ENVIRONMENT
-
- 241 Deleting the namespace
- 242 Removing the OperatorSource
-
- CHAPTER 3 COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION
-
- 31 CONNECTING RED HAT OPENSTACK PLATFORM TO SERVICE TELEMETRY FRAMEWORK
- 32 DEPLOYING TO NON-STANDARD NETWORK TOPOLOGIES
- 33 CONFIGURING RED HAT OPENSTACK PLATFORM OVERCLOUD FOR SERVICE TELEMETRY FRAMEWORK
-
- 331 Retrieving the AMQ Interconnect route address
- 332 Configuring the STF connection for the overcloud
- 333 Validating client-side installation
-
- CHAPTER 4 ADVANCED FEATURES
-
- 41 CUSTOMIZING THE DEPLOYMENT
-
- 411 Manifest override parameters
- 412 Overriding a managed manifest
-
- Procedure
-
- 42 ALERTS
-
- Additional resources
- 421 Creating an alert rule in Prometheus
-
- Procedure
- Additional resources
-
- 422 Configuring custom alerts
-
- Procedure
- Additional resources
-
- 423 Creating an alert route in Alertmanager
-
- Procedure
- Additional resources
-
- 43 HIGH AVAILABILITY
-
- 431 Configuring high availability
-
- Procedure
-
- 44 DASHBOARDS
-
- 441 Setting up Grafana to host the dashboard
-
- 4411 Viewing and editing queries
-
- 442 The Grafana infrastructure dashboard
-
- 4421 Top panels
- 4422 Networking panels
- 4423 CPU panels
- 4424 Memory panels
- 4425 Diskfile system
-
- 45 CONFIGURING MULTIPLE CLOUDS
-
- 451 Planning AMQP address prefixes
- 452 Deploying Smart Gateways
-
- Procedure
- 4521 Example manifests
-
- 453 Creating the OpenStack environment file
-
- Procedure
- Additional resources
-
- 454 Querying metrics data from multiple clouds
-
- 46 EPHEMERAL STORAGE
-
- 461 Configuring ephemeral storage
-
- APPENDIX A COLLECTD PLUG-INS
-
collectdpluginovs_events$port
collectdpluginovs_eventssocket
collectd-ovs_stats
collectdpluginovs_statsaddress
collectdpluginovs_statsbridges
collectdpluginovs_statsport
collectdpluginovs_statssocket
collectd-ping
collectdpluginpinghosts
collectdpluginpingtimeout
collectdpluginpingttl
collectdpluginpingsource_address
collectdpluginpingdevice
collectdpluginpingmax_missed
collectdpluginpingsize
collectdpluginpinginterval
collectd-powerdns
collectdpluginpowerdnsinterval
collectdpluginpowerdnsservers
collectdpluginpowerdnsrecursors
collectdpluginpowerdnslocal_socket
collectdpluginpowerdnsinterval
collectd-processes
collectdpluginprocessesprocesses
collectdpluginprocessesprocess_matches
collectdpluginprocessescollect_context_switch
collectdpluginprocessescollect_file_descriptor
collectdpluginprocessescollect_memory_maps
collectdpluginpowerdnsinterval
APPENDIX A COLLECTD PLUG-INS
53
collectd-protocols
collectdpluginprotocolsignoreselected
collectdpluginprotocolsvalues
collectd-python
collectd-serial
collectd-smart
collectdpluginsmartdisks
collectdpluginsmartignoreselected
collectdpluginsmartinterval
collectd-snmp_agent
collectd-statsd
collectdpluginstatsdhost
collectdpluginstatsdport
collectdpluginstatsddeletecounters
collectdpluginstatsddeletetimers
collectdpluginstatsddeletegauges
collectdpluginstatsddeletesets
collectdpluginstatsdcountersum
collectdpluginstatsdtimerpercentile
collectdpluginstatsdtimerlower
collectdpluginstatsdtimerupper
collectdpluginstatsdtimersum
collectdpluginstatsdtimercount
collectdpluginstatsdinterval
collectd-swap
collectdpluginswapreportbydevice
collectdpluginswapreportbytes
collectdpluginswapvaluesabsolute
Red Hat OpenStack Platform 160 Service Telemetry Framework
54
collectdpluginswapvaluespercentage
collectdpluginswapreportio
collectdpluginswapinterval
collectd-syslog
collectdpluginsysloglog_level
collectdpluginsyslognotify_level
collectdpluginsysloginterval
collectd-table
collectdplugintabletables
collectdplugintableinterval
collectd-tail
collectdplugintailfiles
collectdplugintailinterval
collectd-tail_csv
collectdplugintail_csvmetrics
collectdplugintail_csvfiles
collectd-tcpconns
collectdplugintcpconnslocalports
collectdplugintcpconnsremoteports
collectdplugintcpconnslistening
collectdplugintcpconnsallportssummary
collectdplugintcpconnsinterval
collectd-ted
collectd-thermal
collectdpluginthermaldevices
collectdpluginthermalignoreselected
collectdpluginthermalinterval
collectd-threshold
APPENDIX A COLLECTD PLUG-INS
55
collectdpluginthresholdtypes
collectdpluginthresholdplugins
collectdpluginthresholdhosts
collectdpluginthresholdinterval
collectd-turbostat
collectdpluginturbostatcore_c_states
collectdpluginturbostatpackage_c_states
collectdpluginturbostatsystem_management_interrupt
collectdpluginturbostatdigital_temperature_sensor
collectdpluginturbostattcc_activation_temp
collectdpluginturbostatrunning_average_power_limit
collectdpluginturbostatlogical_core_names
collectd-unixsock
collectd-uptime
collectdpluginuptimeinterval
collectd-users
collectdpluginusersinterval
collectd-uuid
collectdpluginuuiduuid_file
collectdpluginuuidinterval
collectd-virt
collectdpluginvirtconnection
collectdpluginvirtrefresh_interval
collectdpluginvirtdomain
collectdpluginvirtblock_device
collectdpluginvirtinterface_device
collectdpluginvirtignore_selected
collectdpluginvirthostname_format
Red Hat OpenStack Platform 160 Service Telemetry Framework
56
collectdpluginvirtinterface_format
collectdpluginvirtextra_stats
collectdpluginvirtinterval
collectd-vmem
collectdpluginvmemverbose
collectdpluginvmeminterval
collectd-vserver
collectd-wireless
collectd-write_graphite
collectdpluginwrite_graphitecarbons
collectdpluginwrite_graphitecarbon_defaults
collectdpluginwrite_graphiteglobals
collectd-write_kafka
collectdpluginwrite_kafkakafka_host
collectdpluginwrite_kafkakafka_port
collectdpluginwrite_kafkakafka_hosts
collectdpluginwrite_kafkatopics
collectd-write_log
collectdpluginwrite_logformat
collectd-zfs_arc
None
APPENDIX A COLLECTD PLUG-INS
57
- Table of Contents
- CHAPTER 1 INTRODUCTION TO SERVICE TELEMETRY FRAMEWORK
-
- 11 SERVICE TELEMETRY FRAMEWORK ARCHITECTURE
- 12 INSTALLATION SIZE
-
- CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK
-
- 21 THE CORE COMPONENTS OF STF
- 22 PREPARING YOUR OCP ENVIRONMENT FOR STF
-
- 221 Persistent volumes
-
- 2211 Using ephemeral storage
-
- 222 Resource allocation
- 223 Node tuning operator
-
- 23 DEPLOYING STF TO THE OCP ENVIRONMENT
-
- 231 Deploying STF to the OCP environment with ElasticSearch
- 232 Deploying STF to the OCP environment without ElasticSearch
- 233 Creating a namespace
- 234 Creating an OperatorGroup
- 235 Enabling the OperatorHubio Community Catalog Source
- 236 Enabling Red Hat STF Operator Source
- 237 Subscribing to the AMQ Certificate Manager Operator
- 238 Subscribing to the Elastic Cloud on Kubernetes Operator
- 239 Subscribing to the Service Telemetry Operator
- 2310 Creating a ServiceTelemetry object in OCP
-
- 24 REMOVING STF FROM THE OCP ENVIRONMENT
-
- 241 Deleting the namespace
- 242 Removing the OperatorSource
-
- CHAPTER 3 COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION
-
- 31 CONNECTING RED HAT OPENSTACK PLATFORM TO SERVICE TELEMETRY FRAMEWORK
- 32 DEPLOYING TO NON-STANDARD NETWORK TOPOLOGIES
- 33 CONFIGURING RED HAT OPENSTACK PLATFORM OVERCLOUD FOR SERVICE TELEMETRY FRAMEWORK
-
- 331 Retrieving the AMQ Interconnect route address
- 332 Configuring the STF connection for the overcloud
- 333 Validating client-side installation
-
- CHAPTER 4 ADVANCED FEATURES
-
- 41 CUSTOMIZING THE DEPLOYMENT
-
- 411 Manifest override parameters
- 412 Overriding a managed manifest
-
- Procedure
-
- 42 ALERTS
-
- Additional resources
- 421 Creating an alert rule in Prometheus
-
- Procedure
- Additional resources
-
- 422 Configuring custom alerts
-
- Procedure
- Additional resources
-
- 423 Creating an alert route in Alertmanager
-
- Procedure
- Additional resources
-
- 43 HIGH AVAILABILITY
-
- 431 Configuring high availability
-
- Procedure
-
- 44 DASHBOARDS
-
- 441 Setting up Grafana to host the dashboard
-
- 4411 Viewing and editing queries
-
- 442 The Grafana infrastructure dashboard
-
- 4421 Top panels
- 4422 Networking panels
- 4423 CPU panels
- 4424 Memory panels
- 4425 Diskfile system
-
- 45 CONFIGURING MULTIPLE CLOUDS
-
- 451 Planning AMQP address prefixes
- 452 Deploying Smart Gateways
-
- Procedure
- 4521 Example manifests
-
- 453 Creating the OpenStack environment file
-
- Procedure
- Additional resources
-
- 454 Querying metrics data from multiple clouds
-
- 46 EPHEMERAL STORAGE
-
- 461 Configuring ephemeral storage
-
- APPENDIX A COLLECTD PLUG-INS
-
collectd-protocols
collectdpluginprotocolsignoreselected
collectdpluginprotocolsvalues
collectd-python
collectd-serial
collectd-smart
collectdpluginsmartdisks
collectdpluginsmartignoreselected
collectdpluginsmartinterval
collectd-snmp_agent
collectd-statsd
collectdpluginstatsdhost
collectdpluginstatsdport
collectdpluginstatsddeletecounters
collectdpluginstatsddeletetimers
collectdpluginstatsddeletegauges
collectdpluginstatsddeletesets
collectdpluginstatsdcountersum
collectdpluginstatsdtimerpercentile
collectdpluginstatsdtimerlower
collectdpluginstatsdtimerupper
collectdpluginstatsdtimersum
collectdpluginstatsdtimercount
collectdpluginstatsdinterval
collectd-swap
collectdpluginswapreportbydevice
collectdpluginswapreportbytes
collectdpluginswapvaluesabsolute
Red Hat OpenStack Platform 160 Service Telemetry Framework
54
collectdpluginswapvaluespercentage
collectdpluginswapreportio
collectdpluginswapinterval
collectd-syslog
collectdpluginsysloglog_level
collectdpluginsyslognotify_level
collectdpluginsysloginterval
collectd-table
collectdplugintabletables
collectdplugintableinterval
collectd-tail
collectdplugintailfiles
collectdplugintailinterval
collectd-tail_csv
collectdplugintail_csvmetrics
collectdplugintail_csvfiles
collectd-tcpconns
collectdplugintcpconnslocalports
collectdplugintcpconnsremoteports
collectdplugintcpconnslistening
collectdplugintcpconnsallportssummary
collectdplugintcpconnsinterval
collectd-ted
collectd-thermal
collectdpluginthermaldevices
collectdpluginthermalignoreselected
collectdpluginthermalinterval
collectd-threshold
APPENDIX A COLLECTD PLUG-INS
55
collectdpluginthresholdtypes
collectdpluginthresholdplugins
collectdpluginthresholdhosts
collectdpluginthresholdinterval
collectd-turbostat
collectdpluginturbostatcore_c_states
collectdpluginturbostatpackage_c_states
collectdpluginturbostatsystem_management_interrupt
collectdpluginturbostatdigital_temperature_sensor
collectdpluginturbostattcc_activation_temp
collectdpluginturbostatrunning_average_power_limit
collectdpluginturbostatlogical_core_names
collectd-unixsock
collectd-uptime
collectdpluginuptimeinterval
collectd-users
collectdpluginusersinterval
collectd-uuid
collectdpluginuuiduuid_file
collectdpluginuuidinterval
collectd-virt
collectdpluginvirtconnection
collectdpluginvirtrefresh_interval
collectdpluginvirtdomain
collectdpluginvirtblock_device
collectdpluginvirtinterface_device
collectdpluginvirtignore_selected
collectdpluginvirthostname_format
Red Hat OpenStack Platform 160 Service Telemetry Framework
56
collectdpluginvirtinterface_format
collectdpluginvirtextra_stats
collectdpluginvirtinterval
collectd-vmem
collectdpluginvmemverbose
collectdpluginvmeminterval
collectd-vserver
collectd-wireless
collectd-write_graphite
collectdpluginwrite_graphitecarbons
collectdpluginwrite_graphitecarbon_defaults
collectdpluginwrite_graphiteglobals
collectd-write_kafka
collectdpluginwrite_kafkakafka_host
collectdpluginwrite_kafkakafka_port
collectdpluginwrite_kafkakafka_hosts
collectdpluginwrite_kafkatopics
collectd-write_log
collectdpluginwrite_logformat
collectd-zfs_arc
None
APPENDIX A COLLECTD PLUG-INS
57
- Table of Contents
- CHAPTER 1 INTRODUCTION TO SERVICE TELEMETRY FRAMEWORK
-
- 11 SERVICE TELEMETRY FRAMEWORK ARCHITECTURE
- 12 INSTALLATION SIZE
-
- CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK
-
- 21 THE CORE COMPONENTS OF STF
- 22 PREPARING YOUR OCP ENVIRONMENT FOR STF
-
- 221 Persistent volumes
-
- 2211 Using ephemeral storage
-
- 222 Resource allocation
- 223 Node tuning operator
-
- 23 DEPLOYING STF TO THE OCP ENVIRONMENT
-
- 231 Deploying STF to the OCP environment with ElasticSearch
- 232 Deploying STF to the OCP environment without ElasticSearch
- 233 Creating a namespace
- 234 Creating an OperatorGroup
- 235 Enabling the OperatorHubio Community Catalog Source
- 236 Enabling Red Hat STF Operator Source
- 237 Subscribing to the AMQ Certificate Manager Operator
- 238 Subscribing to the Elastic Cloud on Kubernetes Operator
- 239 Subscribing to the Service Telemetry Operator
- 2310 Creating a ServiceTelemetry object in OCP
-
- 24 REMOVING STF FROM THE OCP ENVIRONMENT
-
- 241 Deleting the namespace
- 242 Removing the OperatorSource
-
- CHAPTER 3 COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION
-
- 31 CONNECTING RED HAT OPENSTACK PLATFORM TO SERVICE TELEMETRY FRAMEWORK
- 32 DEPLOYING TO NON-STANDARD NETWORK TOPOLOGIES
- 33 CONFIGURING RED HAT OPENSTACK PLATFORM OVERCLOUD FOR SERVICE TELEMETRY FRAMEWORK
-
- 331 Retrieving the AMQ Interconnect route address
- 332 Configuring the STF connection for the overcloud
- 333 Validating client-side installation
-
- CHAPTER 4 ADVANCED FEATURES
-
- 41 CUSTOMIZING THE DEPLOYMENT
-
- 411 Manifest override parameters
- 412 Overriding a managed manifest
-
- Procedure
-
- 42 ALERTS
-
- Additional resources
- 421 Creating an alert rule in Prometheus
-
- Procedure
- Additional resources
-
- 422 Configuring custom alerts
-
- Procedure
- Additional resources
-
- 423 Creating an alert route in Alertmanager
-
- Procedure
- Additional resources
-
- 43 HIGH AVAILABILITY
-
- 431 Configuring high availability
-
- Procedure
-
- 44 DASHBOARDS
-
- 441 Setting up Grafana to host the dashboard
-
- 4411 Viewing and editing queries
-
- 442 The Grafana infrastructure dashboard
-
- 4421 Top panels
- 4422 Networking panels
- 4423 CPU panels
- 4424 Memory panels
- 4425 Diskfile system
-
- 45 CONFIGURING MULTIPLE CLOUDS
-
- 451 Planning AMQP address prefixes
- 452 Deploying Smart Gateways
-
- Procedure
- 4521 Example manifests
-
- 453 Creating the OpenStack environment file
-
- Procedure
- Additional resources
-
- 454 Querying metrics data from multiple clouds
-
- 46 EPHEMERAL STORAGE
-
- 461 Configuring ephemeral storage
-
- APPENDIX A COLLECTD PLUG-INS
-
collectdpluginswapvaluespercentage
collectdpluginswapreportio
collectdpluginswapinterval
collectd-syslog
collectdpluginsysloglog_level
collectdpluginsyslognotify_level
collectdpluginsysloginterval
collectd-table
collectdplugintabletables
collectdplugintableinterval
collectd-tail
collectdplugintailfiles
collectdplugintailinterval
collectd-tail_csv
collectdplugintail_csvmetrics
collectdplugintail_csvfiles
collectd-tcpconns
collectdplugintcpconnslocalports
collectdplugintcpconnsremoteports
collectdplugintcpconnslistening
collectdplugintcpconnsallportssummary
collectdplugintcpconnsinterval
collectd-ted
collectd-thermal
collectdpluginthermaldevices
collectdpluginthermalignoreselected
collectdpluginthermalinterval
collectd-threshold
APPENDIX A COLLECTD PLUG-INS
55
collectdpluginthresholdtypes
collectdpluginthresholdplugins
collectdpluginthresholdhosts
collectdpluginthresholdinterval
collectd-turbostat
collectdpluginturbostatcore_c_states
collectdpluginturbostatpackage_c_states
collectdpluginturbostatsystem_management_interrupt
collectdpluginturbostatdigital_temperature_sensor
collectdpluginturbostattcc_activation_temp
collectdpluginturbostatrunning_average_power_limit
collectdpluginturbostatlogical_core_names
collectd-unixsock
collectd-uptime
collectdpluginuptimeinterval
collectd-users
collectdpluginusersinterval
collectd-uuid
collectdpluginuuiduuid_file
collectdpluginuuidinterval
collectd-virt
collectdpluginvirtconnection
collectdpluginvirtrefresh_interval
collectdpluginvirtdomain
collectdpluginvirtblock_device
collectdpluginvirtinterface_device
collectdpluginvirtignore_selected
collectdpluginvirthostname_format
Red Hat OpenStack Platform 160 Service Telemetry Framework
56
collectdpluginvirtinterface_format
collectdpluginvirtextra_stats
collectdpluginvirtinterval
collectd-vmem
collectdpluginvmemverbose
collectdpluginvmeminterval
collectd-vserver
collectd-wireless
collectd-write_graphite
collectdpluginwrite_graphitecarbons
collectdpluginwrite_graphitecarbon_defaults
collectdpluginwrite_graphiteglobals
collectd-write_kafka
collectdpluginwrite_kafkakafka_host
collectdpluginwrite_kafkakafka_port
collectdpluginwrite_kafkakafka_hosts
collectdpluginwrite_kafkatopics
collectd-write_log
collectdpluginwrite_logformat
collectd-zfs_arc
None
APPENDIX A COLLECTD PLUG-INS
57
- Table of Contents
- CHAPTER 1 INTRODUCTION TO SERVICE TELEMETRY FRAMEWORK
-
- 11 SERVICE TELEMETRY FRAMEWORK ARCHITECTURE
- 12 INSTALLATION SIZE
-
- CHAPTER 2 INSTALLING THE CORE COMPONENTS OF SERVICE TELEMETRY FRAMEWORK
-
- 21 THE CORE COMPONENTS OF STF
- 22 PREPARING YOUR OCP ENVIRONMENT FOR STF
-
- 221 Persistent volumes
-
- 2211 Using ephemeral storage
-
- 222 Resource allocation
- 223 Node tuning operator
-
- 23 DEPLOYING STF TO THE OCP ENVIRONMENT
-
- 231 Deploying STF to the OCP environment with ElasticSearch
- 232 Deploying STF to the OCP environment without ElasticSearch
- 233 Creating a namespace
- 234 Creating an OperatorGroup
- 235 Enabling the OperatorHubio Community Catalog Source
- 236 Enabling Red Hat STF Operator Source
- 237 Subscribing to the AMQ Certificate Manager Operator
- 238 Subscribing to the Elastic Cloud on Kubernetes Operator
- 239 Subscribing to the Service Telemetry Operator
- 2310 Creating a ServiceTelemetry object in OCP
-
- 24 REMOVING STF FROM THE OCP ENVIRONMENT
-
- 241 Deleting the namespace
- 242 Removing the OperatorSource
-
- CHAPTER 3 COMPLETING THE SERVICE TELEMETRY FRAMEWORK CONFIGURATION
-
- 31 CONNECTING RED HAT OPENSTACK PLATFORM TO SERVICE TELEMETRY FRAMEWORK
- 32 DEPLOYING TO NON-STANDARD NETWORK TOPOLOGIES
- 33 CONFIGURING RED HAT OPENSTACK PLATFORM OVERCLOUD FOR SERVICE TELEMETRY FRAMEWORK
-
- 331 Retrieving the AMQ Interconnect route address
- 332 Configuring the STF connection for the overcloud
- 333 Validating client-side installation
-
- CHAPTER 4 ADVANCED FEATURES
-
- 41 CUSTOMIZING THE DEPLOYMENT
-
- 411 Manifest override parameters
- 412 Overriding a managed manifest
-
- Procedure
-
- 42 ALERTS
-
- Additional resources
- 421 Creating an alert rule in Prometheus
-
- Procedure
- Additional resources
-
- 422 Configuring custom alerts
-
- Procedure
- Additional resources
-
- 423 Creating an alert route in Alertmanager
-
- Procedure
- Additional resources
-
- 43 HIGH AVAILABILITY
-
- 431 Configuring high availability
-
- Procedure
-
- 44 DASHBOARDS
-
- 441 Setting up Grafana to host the dashboard
-
- 4411 Viewing and editing queries
-
- 442 The Grafana infrastructure dashboard
-
- 4421 Top panels
- 4422 Networking panels
- 4423 CPU panels
- 4424 Memory panels
- 4425 Diskfile system
-
- 45 CONFIGURING MULTIPLE CLOUDS
-
- 451 Planning AMQP address prefixes
- 452 Deploying Smart Gateways
-
- Procedure
- 4521 Example manifests
-
- 453 Creating the OpenStack environment file
-
- Procedure
- Additional resources
-
- 454 Querying metrics data from multiple clouds
-
- 46 EPHEMERAL STORAGE
-
- 461 Configuring ephemeral storage
-
- APPENDIX A COLLECTD PLUG-INS
-
collectdpluginthresholdtypes
collectdpluginthresholdplugins
collectdpluginthresholdhosts
collectdpluginthresholdinterval
collectd-turbostat
collectdpluginturbostatcore_c_states
collectdpluginturbostatpackage_c_states
collectdpluginturbostatsystem_management_interrupt
collectdpluginturbostatdigital_temperature_sensor
collectdpluginturbostattcc_activation_temp
collectdpluginturbostatrunning_average_power_limit
collectdpluginturbostatlogical_core_names
collectd-unixsock
collectd-uptime
collectdpluginuptimeinterval
collectd-users
collectdpluginusersinterval
collectd-uuid
collectdpluginuuiduuid_file
collectdpluginuuidinterval
collectd-virt
collectdpluginvirtconnection
collectdpluginvirtrefresh_interval
collectdpluginvirtdomain
collectdpluginvirtblock_device
collectdpluginvirtinterface_device
collectdpluginvirtignore_selected
collectdpluginvirthostname_format
Red Hat OpenStack Platform 160 Service Telemetry Framework
56
collectdpluginvirtinterface_format
collectdpluginvirtextra_stats
collectdpluginvirtinterval
collectd-vmem
collectdpluginvmemverbose
collectdpluginvmeminterval
collectd-vserver
collectd-wireless
collectd-write_graphite
collectdpluginwrite_graphitecarbons
collectdpluginwrite_graphitecarbon_defaults
collectdpluginwrite_graphiteglobals
collectd-write_kafka
collectdpluginwrite_kafkakafka_host
collectdpluginwrite_kafkakafka_port
collectdpluginwrite_kafkakafka_hosts
collectdpluginwrite_kafkatopics
collectd-write_log
collectdpluginwrite_logformat
collectd-zfs_arc
None
APPENDIX A COLLECTD PLUG-INS
57