In Pursuit of Complete Visibility within Cloud Foundry (Cloud Foundry Summit 2014)

In Pursuit of Complete Visibility Enterprise BOSH & Cloud Foundry

Description

Technical track presented by Wayne E. Seguin of Stark & Wayne. When working with Cloud Foundry, AppFirst's data collection provides more visibility than traditional tools, and that full-stack visibility spans the entire PaaS and its services. In this talk, Wayne E. Seguin of Stark & Wayne discusses how his team found AppFirst for PaaS and used it to gain that visibility. Drawing on years of experience in PaaS operations, the team has combined lessons learned from operations workflows into a deep understanding of both the space and its needs. This year, we will roll out an integration that gives Cloud Foundry operators full-scale visibility across all resources.

Transcript of In Pursuit of Complete Visibility within Cloud Foundry (Cloud Foundry Summit 2014)

1. In Pursuit of Complete Visibility: Enterprise BOSH & Cloud Foundry
2. HELLO EVERYONE
3. S&W 1
4. TRY CLOUD FOUNDRY: First, an announcement
5. http://trycf.starkandwayne.com
6. BACKGROUND
7. Our Backgrounds: Years in the PaaS space focused on Operations, Metrics & Data Collection, Aggregation, Correlation, Alerting; Working for/with Dr. Nic
8. THE GOAL
9. The Goal: Enterprise Operations Monitoring
10. ENTERPRISE REQUIREMENTS
11. Enterprise Requirements: Complete Operational Overview, Cross-Systems Correlation, Root Cause Analysis
12. Enterprise Requirements: Historical Auditing, Capacity Planning, Billing Systems Verification & Validation
13. Enterprise Requirements: Forensics, especially in the event of systemic failure
14. Enterprise Requirements: Security (Who, What, When, Where, How); Why => RCA
15. DATA COLLECTION REQUIREMENTS
16. Data Collection Requirements: System metrics, including CPU, RAM, disk, network in/out, etc.
17. Data Collection Requirements: Process resource metrics, including CPU & RAM utilization, file & socket read/write TPS, threads, stolen time (living in a virtual world), etc.
18. Data Collection Requirements: Logs (obviously :p)
19. Data Collection Requirements: Polling-style metrics collection, such as NRPE, JMX, WPC, etc.
20. Data Collection Requirements: Application-specific metrics, StatsD-style / namespaced; ingestion of APM summaries
21. Data Collection Requirements: Business metrics (KPIs)
22. Data Collection Requirements: Physical devices, including switches, routers, SAN, and compute servers (IPMI)
23. Data Collection Requirements: External (3rd-party) data APIs, e.g. CloudWatch => correlate with all other data
24. SUMMARY REQUIREMENTS
25. Summary Requirements: Operational dashboards (BOSH subsystems, CF subsystems), per-application dashboards, business (KPI) dashboards
26. Summary Requirements: Ability to export collected data for representation via other means
27. EXPERIENCE-BASED REQUIREMENTS
28. Experience-Based Requirements: Ability to detect and alert on file systems remounted read-only and file systems mounted in place of another
29. INTEGRATION REQUIREMENTS
30. Integration Requirements: Primitives allowing us to identify and link BOSH subsystems with VMs, CF subsystems with VMs, and applications running in DEAs
31. HAVE CAKE & EAT IT
32. Have Cake & Eat It: BOSH, Services, CF Applications, Legacy Systems
33. Have Cake & Eat It: SaaS and On-Prem
34. Have Cake & Eat It: Minimize CapEx & OpEx through consolidation of systems; allow Ops to run lean & mean. This is one reason we love BOSH+CF, after all!
35. THE APPROACH
36. Vendor Partner: We have found that AppFirst fulfills, or has on its roadmap to fulfill, all of our integration requirements.
37. ROADMAP: Roadmap towards effective and efficient operations
38. Phase I: BOSH Integration. We are currently working with Pivotal Web Services and AppFirst on the BOSH-layer integration.
39. Phase I: BOSH Integration. This gives us the essential basic data blocks: system metrics, process metrics, and log metrics.
40. Phase II: Targeted Alerting. Alerting based on detected failure cases: zombie containers, file systems remounted read-only, file systems remounted incorrectly, network saturation, DEA pool saturation, etc.
41. Phase III: Subsystems Detection. Detection of BOSH/CF subsystems for automatic, targeted metrics/logs collection; operations dashboards per subsystem (DEAs, message bus, router, Cloud Controller, etc.)
42. Phase IV: Applications View. Aggregated process health and log view for any given application running across all DEAs
43. QUESTIONS?
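The experience-based requirement on slide 28 (detecting file systems remounted read-only or mounted in place of another), which returns as a Phase II alerting case, can be made concrete with a small check. The sketch below is an illustrative assumption, not AppFirst's or BOSH's implementation. It parses text in the Linux `/proc/mounts` format and flags expected mount points that are missing, read-only, or backed by a different device than expected. The function names `parse_mounts` and `check_mounts`, the expected-mount mapping, and the device names are hypothetical. `/var/vcap/data` and `/var/vcap/store` are the usual BOSH ephemeral and persistent disk mount points.

```python
import re
from typing import Dict, List, NamedTuple


class MountEntry(NamedTuple):
    device: str
    mount_point: str
    fs_type: str
    options: List[str]


def _unescape(field: str) -> str:
    # /proc/mounts encodes space, tab, newline and backslash as octal escapes (e.g. "\040").
    return re.sub(r"\\([0-7]{3})", lambda m: chr(int(m.group(1), 8)), field)


def parse_mounts(text: str) -> Dict[str, MountEntry]:
    """Map each mount point to its visible (last-listed) entry."""
    entries: Dict[str, MountEntry] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mount_point, fs_type, opts = (_unescape(f) for f in fields[:4])
        # A later entry for the same mount point is stacked on top of earlier ones.
        entries[mount_point] = MountEntry(device, mount_point, fs_type, opts.split(","))
    return entries


def check_mounts(text: str, expected: Dict[str, str]) -> List[str]:
    """Return alerts for expected mounts that are missing, read-only, or on the wrong device.

    Hypothetical convention: `expected` maps mount point -> device that should be
    mounted there read-write.
    """
    alerts: List[str] = []
    mounts = parse_mounts(text)
    for mount_point, want_device in sorted(expected.items()):
        entry = mounts.get(mount_point)
        if entry is None:
            alerts.append(f"{mount_point}: not mounted")
            continue
        if "ro" in entry.options:  # exact option match, not a substring test
            alerts.append(f"{mount_point}: mounted read-only")
        if entry.device != want_device:
            alerts.append(f"{mount_point}: expected {want_device}, found {entry.device}")
    return alerts


if __name__ == "__main__":
    # Hypothetical snapshot: the ephemeral disk came back read-only and the
    # persistent disk path is backed by tmpfs instead of its expected device.
    sample = (
        "/dev/sda1 / ext4 rw,relatime 0 0\n"
        "/dev/sdb1 /var/vcap/data ext4 ro,relatime 0 0\n"
        "tmpfs /var/vcap/store tmpfs rw 0 0\n"
    )
    expected = {"/": "/dev/sda1", "/var/vcap/data": "/dev/sdb1", "/var/vcap/store": "/dev/sdc1"}
    for alert in check_mounts(sample, expected):
        print(alert)
```

On a live VM, the same check could be fed `open("/proc/mounts").read()` on a schedule, with the returned strings forwarded to whatever alerting channel the monitoring system provides. Comparing against an explicit expected-device map is what catches the "mounted in place of another" case, which a plain read-only test would miss.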