Private, Public, Hybrid: Unique Challenges with OpenStack
Gathering Info for a CMDB
Ryszard Chojnacki, an OPS CMDB blueprint worker
27 October 2015
Covering what?
• Define a scenario for collecting data
• Show actual payload headers used in one application
• Set the stage for a CMDB blueprint's direction: ETL vs. Federation approaches
Extract, Transform & Load
Send data as appropriate
• Allows for:
– Complex and low-cost queries
– Can be built to accommodate loss
– History: what changed last week?
Federation
Access the sources of information in "real time"
• Allows for:
– What is the situation NOW!
– Works well for OpenStack APIs, depending on use-case
Set the scene
Imagine this Scenario
You have hardware, data, and applications spread over multiple locations. How can you aggregate metadata into one place?
Local source: provisioning as an example
{
  "fqdn": "compute-0001.env1.adomain.com",
  "serial": "USE1234567",
  "os_vendor": "Ubuntu",
  "os_release": "12.04",
  "role": "compute-hypervisor"
}
Suppose provisioning systems are created such that there is one for each environment
• Each system has a limited scope
• Each system must be uniquely identifiable to permit data aggregation
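A provisioning agent could emit such a record with a sketch like this (Python used for illustration; the helper name is hypothetical and the field values are the illustrative ones from the slide, which a real agent would read from the host):

```python
import json

def provisioning_record():
    """Build the provisioning payload shown above.

    Values are illustrative; a real agent would discover them
    on the host (DMI serial, OS release file, assigned role).
    """
    return {
        "fqdn": "compute-0001.env1.adomain.com",
        "serial": "USE1234567",
        "os_vendor": "Ubuntu",
        "os_release": "12.04",
        "role": "compute-hypervisor",
    }

print(json.dumps(provisioning_record(), indent=2))
```

The FQDN doubles as the unique identifier within this provisioning system's limited scope, which is what later permits aggregation across environments.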
Global source: asset management
Payload type rack_info:
{
  "rack_id": "r00099",
  "datacenter": "Frankfurt",
  "tile": "0404"
}

Payload type rack_contents:
{
  "in_rack": "r00099",
  "serial": "USE1234567",
  "u": 22,
  "pid": "qy799a"
}
Suppose that there is a single asset management tool that covers all environments
• Scope is global
• Unique ID still employed
• The example has more than one type of data:
– Each rack: rack_info
– Each asset: rack_contents
Snapshot Header
Message formats: payload
{"payload": { . . . }}
Separate logically by encapsulating the data in a payload document
For example, put here:
• Provisioning data
• Rack data
• Asset data
Message formats: version
{
  "version": {
    "major": 1
    // provider extension possible here: minor, tiny, sha1, …
  },
  "payload": { . . . }
}
"Schema" version for the payload
• The same major version indicates that no incompatible changes have been made to the schema
• Where versions are compatible, the snapshot-to-live process will occur
Note: JSON documents don't have formal schemas, but there must be some required plus optional key/value pairs, so that consumers of the data can rely on it programmatically
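A consumer-side compatibility check can be sketched as follows (an assumption based on the slide: only `major` is required, and equality of major versions means the payload can be processed):

```python
def compatible(message, supported_major=1):
    """Return True when the payload schema is compatible.

    The same major version means no incompatible changes were
    made, so the consumer can process the payload as usual.
    """
    version = message.get("version", {})
    return version.get("major") == supported_major

# A matching major version passes; a bumped one does not.
assert compatible({"version": {"major": 1}, "payload": {}})
assert not compatible({"version": {"major": 2}, "payload": {}})
```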
Message formats: when was that?
{
  "batch_ts": <epoch_time>,
  "record_ts": <epoch_time>,
  "batch_isodate": "2015-01-16 16:07:21.503680",
  . . .
  "version": { "major": 1 },
  "payload": { . . . }
}
Useful for understanding how old the data I'm seeing is
• The batch timestamp must be constant for all records in the same batch
• The record timestamp is when the record was exported/the message was created; it may be the same as the batch timestamp
• Updated, if available, is when the data was last changed in the source system
Note the human-readable _isodate forms, which are not used in processing
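One way to populate these fields (a sketch: the batch fields are generated once and reused for every record in the batch, while record_ts is stamped per message; helper names are hypothetical):

```python
import time
from datetime import datetime

def stamp_batch():
    """Create the shared batch fields: one epoch timestamp plus
    its human-readable _isodate twin (not used in processing)."""
    now = time.time()
    return {
        "batch_ts": now,
        "batch_isodate": str(datetime.fromtimestamp(now)),
    }

def stamp_record(batch_fields):
    """Per-record header: the batch fields stay constant across
    the batch; record_ts marks when this message was created."""
    header = dict(batch_fields)
    header["record_ts"] = time.time()
    return header
```

`str(datetime.fromtimestamp(...))` yields the `2015-01-16 16:07:21.503680` form shown on the slide.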
Message formats: source
{
  "source": {
    "system": "rackreader.adomain.net",
    "type": "rack_info",
    "location": "env"
  },
  "batch_ts": <epoch_time>,
  "record_ts": <epoch_time>,
  "import_ts": <epoch_time>,
  "version": { "major": 1 },
  "payload": { . . . }
}
Provides where the data came from and the type of data
• system: usually the FQDN of the source system
• location: the scope of the system
• type: describes the payload content, and is tied to the schema
Message formats: record_id and msg_type
{
  "record_id": "r00099-rackreader.adomain.net",
  "msg_type": "snapshot",
  "source": {
    "system": "rackreader.adomain.net",
    "type": "rack_info",
    "location": "env"
  },
  "batch_ts": <epoch_time>,
  "record_ts": <epoch_time>,
  "import_ts": <epoch_time>,
  "version": { "major": 1 },
  "payload": { . . . }
}
Mark the content with a unique ID for that record, and with how to process it
• A combination of an identifier in the source system plus an FQDN makes for a globally unique value
• This value is the primary key for all data operations on the record
• This is a "snapshot"; how that is processed is described shortly
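The record_id from the example can be derived with a one-liner following the convention described above (source-system identifier joined to the source system's FQDN; the helper name is hypothetical):

```python
def make_record_id(source_key, source_system):
    """Combine the identifier from the source system with that
    system's FQDN; the FQDN scopes the key globally, so two
    systems can safely use the same local identifier."""
    return "%s-%s" % (source_key, source_system)

# The rack example from the slides:
assert make_record_id("r00099", "rackreader.adomain.net") == \
    "r00099-rackreader.adomain.net"
```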
Implementation
Philosophy employed
• Operating at large scale, expect to have issues
– Small % error × a big number = some degree of loss
• Tolerant of loss
• Considerate of resources
• Wanted history
– Need easy access to the very latest
• Need a flexible [document] schema – this is JSON
– Provider/Agent is the owner of the schema for its data
– Need a way to converge: communities of practice
Snapshot versus event-based updates
Example
• Snapshot updates every 8h
– Larger data set, but not very frequent
• Live updates as they occur
– Tiny data, as they occur
• Result
– Minimal network utilization
– Small overhead on source
• Use the combination that best suits the need
We run two collections
• Snapshot
– Has history
• Live
– Has only the latest
Snapshots update Live
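A minimal in-memory model of the two collections (a sketch only; a real deployment would use a document store, and the class and method names are hypothetical):

```python
class Collections:
    """Snapshot keeps every record ever received (history);
    Live keeps only the latest document per record_id."""

    def __init__(self):
        self.snapshot = []   # append-only history
        self.live = {}       # record_id -> latest document

    def add_snapshot_record(self, message):
        # Any records received are always placed in Snapshot.
        self.snapshot.append(message)

    def promote_batch_to_live(self, batch):
        # "Snapshots update Live": a complete batch replaces
        # the Live view of each record it contains.
        for message in batch:
            self.live[message["record_id"]] = message
```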
Message type overview
Snapshot
• Snapshot
– Defines a snapshot record
• Batch record count
– Defines how many items are in a batch
– Only if sizes match is Live updated
– Required to know what to delete

Live
• Overwrite
– Overwrites in Live a complete doc for a single record
• Delete
– Deletes a single record from Live
– Never affects Snapshot
Message formats: snapshot_size
{
  "msg_type": "snapshot_size",
  "source": {
    "system": "rackreader.adomain.net",
    "type": "rack_info",
    "location": "env"
  },
  "size": 3,
  "batch_ts": <epoch_time>
}
If the consumer receives the number of messages indicated by size, then the update of Live is possible
Any records received are always placed in the snapshot collection
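The consumer-side size check might look like this (a sketch; it assumes the received records have already been grouped by source and batch timestamp, and the function name is hypothetical):

```python
def live_update_allowed(snapshot_size_msg, received_records):
    """Only if the declared size matches the count of records
    actually received for this batch may Live be updated; the
    complete record set is also what tells us what to delete."""
    return snapshot_size_msg["size"] == len(received_records)

msg = {"msg_type": "snapshot_size", "size": 3, "batch_ts": 1}
assert live_update_allowed(msg, ["r1", "r2", "r3"])   # complete batch
assert not live_update_allowed(msg, ["r1", "r2"])     # a record was lost
```

When the check fails, the records stay in Snapshot anyway; the stale Live view is simply refreshed by the next successful batch.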
Message formats: overwrite
{
  "msg_type": "overwrite",
  "record_id": "r00099-rackreader.adomain.net",
  "source": {
    "system": "rackreader.adomain.net",
    "type": "rack_contents",
    "location": "env"
  },
  "version": { "major": 1 },
  "record_ts": <epoch_time>,
  "payload": { . . . }
}
• Separate the header info from the payload data
Message formats: delete
{
  "msg_type": "delete",
  "record_id": "r00099-rackreader.adomain.net",
  "source": {
    "system": "rackreader.adomain.net",
    "type": "rack_contents",
    "location": "env"
  },
  "version": { "major": 1 },
  "record_ts": <epoch_time>
}
• Separate the header info from the payload data; a delete carries no payload
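Applying the two live event types to the Live collection can be sketched as a small dispatcher (hypothetical function; note that Snapshot is never touched by these events, as the slides state):

```python
def apply_live_event(live, message):
    """Mutate the Live collection (a dict keyed by record_id)
    according to msg_type; the Snapshot collection is never
    affected by overwrite or delete events."""
    record_id = message["record_id"]
    if message["msg_type"] == "overwrite":
        live[record_id] = message          # complete doc replacement
    elif message["msg_type"] == "delete":
        live.pop(record_id, None)          # drop a single record

live = {}
apply_live_event(live, {"msg_type": "overwrite", "record_id": "x",
                        "payload": {"u": 22}})
apply_live_event(live, {"msg_type": "delete", "record_id": "x"})
assert live == {}
```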
Direction
Noteworthy
• If we lose an event, we catch up in the next batch update
• If we lose a batch, data is just one batch cycle stale
• Several companies have arrived at this position
• Records are fairly small
– RabbitMQ-friendly
– Easy to search in your data store
CMDB blueprint
• Set the stage for a CMDB blueprint's direction:
– Collect
– Store
– Query
• Focus on the Collection framework
• Community of Practice
– Share common stuff; hopefully an ever-expanding domain
– Permit ad-hoc sources, for what you have now
Thank you!
Message processing
There are two collections of data: Snapshot and Live
• Snapshot always keeps growing
• Live only has one entry per record
[Diagram: a live update to record "B" goes straight into Live, and is later updated again by the snapshot]