CouchConf SF 2012 Lightning Talk - Operational Excellence

Laine Campbell, Owner/Principal, laine@palominodb.com
Charlie Killian, Director of Engineering, charlie@palominodb.com

Scaling and Performance for Operational Excellence

Who we are

● A boutique consultancy offering custom solutions.

● An operations support team providing a combined 100+ years of experience in distributed, performant and scalable solutions.

● A team of architects, engineers and operators who have worked at some of the most trafficked sites, games and companies since 1999.

Operational Excellence

● Configuration management and documentation.
● Change management.
● Availability management.
● Incident and problem management.
● Backup, recovery and business continuity.
● Monitoring and trending.

Configuration Management

● Consistent Couchbase configurations.
○ GUIs are great, but don't meet automation needs.

● Self-documenting environments.

● Incorporating your infrastructure into your application to leverage Couchbase's ease of scaling.

● Chef, Puppet, Ansible or "roll your own" using the Couchbase API.
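
As a minimal sketch of the "roll your own" option, the Python below checks that per-node settings are consistent across a cluster. It assumes the JSON shape returned by the cluster REST API (`GET /pools/default` on the admin port), with a `nodes` list carrying `hostname`, `version` and `memoryQuota` entries; verify those field names against your Couchbase version. The helper `find_drift` and the sample data are hypothetical.

```python
import json
from collections import defaultdict

# Assumed sample of the "nodes" section of a GET /pools/default response.
# Hostnames, versions and quotas are made-up illustration values.
SAMPLE = json.loads("""
{"nodes": [
  {"hostname": "10.0.0.1:8091", "version": "1.8.1", "memoryQuota": 4096},
  {"hostname": "10.0.0.2:8091", "version": "1.8.1", "memoryQuota": 4096},
  {"hostname": "10.0.0.3:8091", "version": "1.8.0", "memoryQuota": 4096}
]}
""")

def find_drift(pool, keys=("version", "memoryQuota")):
    """Return {key: {value: [hostnames]}} for every key whose value differs between nodes."""
    drift = {}
    for key in keys:
        seen = defaultdict(list)
        for node in pool["nodes"]:
            seen[node.get(key)].append(node["hostname"])
        if len(seen) > 1:  # more than one distinct value means drift
            drift[key] = dict(seen)
    return drift

if __name__ == "__main__":
    print(find_drift(SAMPLE))
```

In practice the `pool` argument would come from the live API response rather than a literal, and a non-empty result could fail a Chef, Puppet or Ansible run.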

Change and Release Management

● Schemaless is great, but data governance is key.

● Your code needs to build a data dictionary or confusion reigns.

● DevOps-style relationships build collaboration that can overcome the wild-west mentality of schemaless environments.
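
One way to let code build the data dictionary, sketched here under the assumption that documents are plain JSON objects: walk a sample of documents and record which value types appear at each field path. The function names and the sample documents are hypothetical.

```python
from collections import defaultdict

# Made-up sample documents; note that "age" is stored with two different types.
SAMPLE_DOCS = [
    {"type": "user", "name": "ann", "age": 31, "address": {"city": "SF"}},
    {"type": "user", "name": "bob", "age": "42"},
]

def build_data_dictionary(docs):
    """Map each dotted field path to the sorted Python type names seen for it."""
    seen = defaultdict(set)

    def walk(value, path):
        if isinstance(value, dict):
            for key, child in value.items():
                walk(child, f"{path}.{key}" if path else key)
        else:
            seen[path].add(type(value).__name__)

    for doc in docs:
        walk(doc, "")
    return {path: sorted(types) for path, types in sorted(seen.items())}

def mixed_types(dictionary):
    """Return only the fields that were seen with more than one type."""
    return {path: types for path, types in dictionary.items() if len(types) > 1}

if __name__ == "__main__":
    dictionary = build_data_dictionary(SAMPLE_DOCS)
    for path, types in dictionary.items():
        print(path, types)
    print("needs review:", mixed_types(dictionary))
```

Fields reported by `mixed_types` are the ones where two code paths disagree about the data, which is where governance conversations tend to start.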

Availability Management

● Moxi provides availability during node failures, supporting reads and writes.

● XDCR support in Couchbase 2.0 provides availability across datacenters and regions in an active/active topology.

● Cloud deployments must plan for availability zone (AZ) and region failovers.
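
A rough sketch of one AZ check, assuming you keep a mapping of node names to availability zones: find the single-AZ failure that removes the most nodes and see how many are left. The function name, node names and zone names are hypothetical, and the check covers node counts only, not replica placement.

```python
from collections import Counter

def worst_case_az_loss(node_az):
    """node_az maps node name -> AZ. Return (az, nodes_left) for the most damaging single-AZ failure."""
    if not node_az:
        raise ValueError("no nodes given")
    per_az = Counter(node_az.values())
    az, lost = per_az.most_common(1)[0]
    return az, len(node_az) - lost

if __name__ == "__main__":
    # Made-up layout: three of four nodes share one zone.
    layout = {"cb1": "us-east-1a", "cb2": "us-east-1a",
              "cb3": "us-east-1a", "cb4": "us-east-1b"}
    print(worst_case_az_loss(layout))
```

If the remaining node count cannot carry the working set, the layout needs rebalancing across zones before a failure forces the issue.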

Incident and Problem Management

● While not Couchbase-specific, this is crucial to maintaining any highly available architecture.

● Appropriate alerting, response and communication processes ensure that isolated issues don't cascade into massive failures.

● Failing hardware, network problems and design issues can all cause failures that cascade until an entire cluster is down.

● Tracking recurring problems helps drive continuous improvement in meeting SLAs.

Backup and Recovery

● Define your recovery SLAs.
● Track how long backups take.
● Test restores and track how long they take.
● Recognize all failure scenarios:
○ Node failure
○ Physical data corruption
○ Logical data corruption
○ Audits and forensics
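
Tracking backup and restore times against the recovery SLA can be sketched as below. The helpers `timed` and `check_sla`, the run labels and the one-hour SLA are all hypothetical; the timed step would be whatever command or function actually performs your backup or restore.

```python
import time

def timed(step, *args):
    """Run step(*args) and return (result, elapsed_seconds)."""
    start = time.monotonic()
    result = step(*args)
    return result, time.monotonic() - start

def check_sla(durations, sla_seconds):
    """durations is a list of (label, seconds). Return the runs that exceeded the SLA."""
    return [(label, secs) for label, secs in durations if secs > sla_seconds]

if __name__ == "__main__":
    # sum() stands in for a real restore; the earlier durations are made-up history.
    _, secs = timed(sum, range(1000))
    history = [("restore-mon", 1500.0), ("restore-tue", 4000.0), ("restore-now", secs)]
    print(check_sla(history, sla_seconds=3600))
```

Keeping the history lets you see restore times creep toward the SLA as data grows, well before a real recovery misses it.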

Backup and Recovery 1.8

● In 1.8, per-node backup is supported. Replica sets are also backed up, which can cause long or non-completing backups.

● SQLite3 can be used as a logical dump to ease backups.

● Cluster-wide consistency cannot be guaranteed.
● No incremental backups available.

Backup and Recovery 2.0

● Cluster-wide backups are now available, as are incremental backups.

● EBS snapshots (or LVM, hardware, etc.) work well due to log-style writes to disk.

● With incremental, it is easier to meet SLAs without breaking the bank on storage.

Monitoring and Alerting

● Use logs! Centralized syslogs, Splunk and custom scripts can identify and track error types and rates.

● Track your app! The latency of web pages, forms and API calls is a key indicator.

● Define key alerts, make them actionable and tie them to documentation.

● Palomino builds plugins and templates to provide proper alerts that are useful and work!
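
A custom script for tracking error types and rates might look like the sketch below. The log line format (`<timestamp> <LEVEL> <error_type>: <message>`) is a hypothetical one chosen for illustration, not Couchbase's own log format, and the sample lines are made up.

```python
import re
from collections import Counter

# Hypothetical format: "<timestamp> <LEVEL> <error_type>: <message>"
LINE = re.compile(r"^(?P<ts>\S+) (?P<level>[A-Z]+) (?P<kind>\w+): ")

SAMPLE_LOG = [
    "2012-09-21T10:00:01 ERROR timeout: get user::1 took 2.5s",
    "2012-09-21T10:00:02 INFO request: ok",
    "2012-09-21T10:00:03 ERROR timeout: set cart::9 took 3.1s",
    "2012-09-21T10:00:04 ERROR connection_refused: node 10.0.0.3",
]

def error_counts(lines):
    """Count ERROR lines by error type; lines that do not match the format are ignored."""
    counts = Counter()
    for line in lines:
        match = LINE.match(line)
        if match and match.group("level") == "ERROR":
            counts[match.group("kind")] += 1
    return counts

def error_rate(lines):
    """Share of all lines that are ERROR lines (0.0 when there is no input)."""
    total = len(lines)
    return sum(error_counts(lines).values()) / total if total else 0.0

if __name__ == "__main__":
    print(error_counts(SAMPLE_LOG).most_common())
    print(error_rate(SAMPLE_LOG))
```

Feeding the counts per time window into your alerting system turns raw logs into a rate you can set thresholds on.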

Trending and Diagnostics

● Alerts aren't enough; you must track usage and internal metrics to understand trends, workloads and bottlenecks.

● Graph everything! All exposed metrics, trend health checks.

● Overlay graphs of internal metrics with external factors: code pushes and application metrics (logins, purchases, API calls).
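
Relating a metric to an external event such as a code push can be sketched without a graphing tool: compare the metric's average in a window before and after the event. The function name, the sample latencies and the window size are hypothetical.

```python
from statistics import mean

def before_after(samples, event_ts, window):
    """samples is a list of (timestamp, value) pairs.

    Return (mean_before, mean_after) for the `window` seconds on each side
    of event_ts; a side with no samples yields None.
    """
    before = [v for t, v in samples if event_ts - window <= t < event_ts]
    after = [v for t, v in samples if event_ts <= t < event_ts + window]
    return (mean(before) if before else None,
            mean(after) if after else None)

if __name__ == "__main__":
    # Made-up latency samples (seconds since start, milliseconds), code push at t=180.
    latency = [(0, 10.0), (60, 12.0), (120, 11.0),
               (180, 30.0), (240, 32.0), (300, 31.0)]
    print(before_after(latency, event_ts=180, window=180))
```

A large jump between the two averages right after a push is a prompt to look at that release first.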

Care and Feeding

● Regular performance reviews.
● Defragmentation.
● Incorporate recovery tests into building test and dev environments.
● Scale-up/Scale-down, preferably via automated processes.
● Rolling upgrades.
● Coffee, pie, beer.

Partnering with Couchbase

Providing remote Architecture, Engineering and DBA services to clients.

Vendor-neutral operations and scaling expertise for Couchbase clients in need of operators.

Remote Architecture and Engineering Services

● Architecture review and recommendations
● Data modeling
● Data model migration
● Data migration
● Cluster sizing
● Tools development

DBA and Operations Services

● Infrastructure builds and management
● Proactive operational support
● 24x7 operational support with 30-minute SLA
● System health checks
● Backup and recovery
● Tuning for performance and scale
● Query reviews, indexing, benchmarking
● Capacity reviews

How we can help

● Support your proof of concept
● Migrate you to Couchbase Server
● Support your Couchbase Server clusters

Is Couchbase Server a good fit?

● Architecture review
● Data model review
● Recommendation on moving to Couchbase Server
● Data access best practices

Migrating from a RDBMS to Couchbase Server?

● Data model migration from relational to document
● Data migration from SQL Server to Couchbase Server
● Couchbase Server cluster sizing
● Infrastructure builds

Do you need operational experts?

● 24x7 operational support with 30-minute SLA
● Multiple Couchbase Server 1.8 clusters
● Wanted Couchbase operational experts
● Escalate to Couchbase for software support

Contact Info

Laine Campbell, laine@palominodb.com
Charlie Killian, charlie@palominodb.com

www.palominodb.com
@palominodb on Twitter
