CouchConf-SF-Couchbase-Server-2.0-in-Production-24x7-Cluster-Ops-and-Maintenance
Couchbase Server in Production
Perry Krug, Sr. Solutions Architect
Typical Couchbase production environment
Application users
Load Balancer
Application Servers
Servers
We’ll focus on App-Couchbase interaction …
… at each step of the application lifecycle
Dev/Test Size Deploy Monitor Manage
KEY CONCEPTS
Reading, Writing and Arithmetic
Reading Data
Application Server: “Give me document A”
Server: “Here is document A”
Writing Data
Application Server: “Please store document A”
Server: “OK, I stored document A”
(We’ll save the arithmetic for the sizing section :)
Reading data
Application Server: “Give me document A”
Server: “Here is document A”
If document A is in memory (RAM):
    return document A to the application
Else:
    add the document to the read queue
    the reader eventually loads the document from disk into memory
    return document A to the application
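The read path on this slide can be sketched as a toy model. This is only an illustration under stated assumptions, not Couchbase's implementation: the class `ReadPathSketch` and its methods are hypothetical names, and a miss is surfaced to the caller as an empty result so both branches are visible (the slide describes the server returning the document once the reader has loaded it).

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Queue;

/** Toy model of the read path on the slide; all names here are hypothetical. */
class ReadPathSketch {
    private final Map<String, String> ram = new HashMap<>();     // documents resident in memory
    private final Map<String, String> disk = new HashMap<>();    // every persisted document
    private final Queue<String> readQueue = new ArrayDeque<>();  // keys waiting for a disk fetch

    /** Put a document on disk only (not resident in RAM). */
    void persist(String key, String doc) {
        disk.put(key, doc);
    }

    /** Fast path: answer from RAM. Otherwise queue a disk fetch and report a miss. */
    Optional<String> get(String key) {
        String doc = ram.get(key);
        if (doc != null) {
            return Optional.of(doc);
        }
        readQueue.add(key);
        return Optional.empty();
    }

    /** The "reader": drains the read queue, loading each document from disk into RAM. */
    int drainReadQueue() {
        int loaded = 0;
        String key;
        while ((key = readQueue.poll()) != null) {
            String doc = disk.get(key);
            if (doc != null) {
                ram.put(key, doc);
                loaded++;
            }
        }
        return loaded;
    }

    public static void main(String[] args) {
        ReadPathSketch server = new ReadPathSketch();
        server.persist("A", "document A");                 // on disk, not in RAM
        System.out.println(server.get("A").isPresent());   // miss: a disk fetch is queued
        server.drainReadQueue();                           // reader loads A into RAM
        System.out.println(server.get("A").get());         // now served from RAM
    }
}
```

The point of the model is the asymmetry: the RAM branch answers immediately, while the "else" branch has to wait for whatever is already sitting in the read queue.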
Keeping the working data set in RAM is key to read performance
Your application’s working set should fit in RAM…
… or else! You don’t want the “else” branch happening very often: it is MUCH slower than a memory read, and the request may wait in line for an indeterminate amount of time before the disk read happens.
Working set ratio depends on your application
Late-stage social game: many users no longer active; few logged in at any given time. working/total set = .01
Ad network: any cookie can show up at any time. working/total set = 1
Business application: users logged in during the day; day moves around the globe. working/total set = .33
Couchbase in operation: Writing data
Application Server: “Store document A”
Server: “OK, it is stored”
If there is room for the document in RAM:
    store the document in RAM
Else:
    eject other document(s) from RAM
    store the document in RAM
Add the document to the replication queue
    the replicator eventually transmits the document
Add the document to the write queue
    the writer eventually writes the document to disk
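A minimal sketch of the write path described on this slide, under loud assumptions: `WritePathSketch` is a hypothetical class, RAM capacity is counted in documents, and the oldest-first ejection policy is a placeholder rather than what the server actually does. The queues carry the document itself so that ejecting from RAM never loses a pending write.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Queue;

/** Toy model of the write path on the slide; all names and policies are hypothetical. */
class WritePathSketch {
    private final int ramCapacity;  // max documents held in RAM (illustrative unit)
    private final Map<String, String> ram = new LinkedHashMap<>();  // insertion order = ejection order
    final Queue<Map.Entry<String, String>> replicationQueue = new ArrayDeque<>();
    final Queue<Map.Entry<String, String>> diskWriteQueue = new ArrayDeque<>();
    final Map<String, String> disk = new HashMap<>();

    WritePathSketch(int ramCapacity) {
        this.ramCapacity = ramCapacity;
    }

    /** Store in RAM (ejecting if full), queue replication and persistence, then acknowledge. */
    boolean set(String key, String doc) {
        if (!ram.containsKey(key) && ram.size() >= ramCapacity) {
            Iterator<String> oldest = ram.keySet().iterator();
            oldest.next();
            oldest.remove();  // eject another document to make room
        }
        ram.put(key, doc);
        replicationQueue.add(Map.entry(key, doc));  // replicator transmits it later
        diskWriteQueue.add(Map.entry(key, doc));    // writer persists it later
        return true;                                // "OK, it is stored"
    }

    /** The "writer": drains the disk write queue. Returns how many documents were written. */
    int flushToDisk() {
        int written = 0;
        Map.Entry<String, String> e;
        while ((e = diskWriteQueue.poll()) != null) {
            disk.put(e.getKey(), e.getValue());
            written++;
        }
        return written;
    }

    boolean inRam(String key) {
        return ram.containsKey(key);
    }

    public static void main(String[] args) {
        WritePathSketch server = new WritePathSketch(2);
        server.set("A", "a");
        server.set("B", "b");
        server.set("C", "c");                             // RAM is full: A is ejected
        System.out.println(server.inRam("A"));            // false
        System.out.println(server.diskWriteQueue.size()); // 3 writes still pending
        System.out.println(server.flushToDisk());         // 3
    }
}
```

Note that the acknowledgement is returned before either queue has drained, which is why the depth of those queues matters operationally.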
Flow of data when writing
[Diagram: three Application Servers and one Server connected by the network, with three flows:]
• Applications writing to Couchbase
• Couchbase writing to disk
• Couchbase transmitting replicas
Queues build if aggregate arrival rate exceeds drain rates
[Diagram: three Application Servers write over the network to one Server, which holds a Replication queue and a Disk write queue.]
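The arithmetic behind this slide is simple enough to sketch. The model below assumes constant arrival and drain rates, and all the numbers in `main` are made up for illustration; `QueueGrowthSketch` is a hypothetical name.

```java
/** Back-of-the-envelope queue model: backlog grows whenever arrivals outpace the drain. */
class QueueGrowthSketch {
    /** Items left in a queue after {@code seconds}, given constant rates in items per second. */
    static long backlogAfter(long seconds, long arrivalPerSec, long drainPerSec, long initialBacklog) {
        long backlog = initialBacklog;
        for (long t = 0; t < seconds; t++) {
            // A queue cannot go negative: spare drain capacity is simply unused.
            backlog = Math.max(0, backlog + arrivalPerSec - drainPerSec);
        }
        return backlog;
    }

    public static void main(String[] args) {
        // Hypothetical load: 3 application servers writing 10,000 items/sec each.
        long arrival = 3 * 10_000;
        // One node draining 20,000 items/sec falls behind by 10,000 items every second.
        System.out.println(backlogAfter(60, arrival, 20_000, 0));      // 600000
        // Spread over 3 nodes, each node sees 10,000 items/sec and keeps up.
        System.out.println(backlogAfter(60, arrival / 3, 20_000, 0));  // 0
    }
}
```

The second call is the scale-out case: the same aggregate load divided across more nodes brings each node's arrival rate under its drain rate, so the queue stays empty.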
Scaling out permits matching of aggregate flow rates so queues do not grow
[Diagram: three Application Servers write over the network to three Servers.]
DEVELOPMENT
SDKs
Java Client SDK
.Net SDK
PHP SDK
Ruby SDK
Python SDK
Couchbase Java Library (spymemcached)
Java client API
User Code
Couchbase Server
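// listURIs is a List<URI> of cluster nodes; "aBucket" is the bucket name, "letmein" its password
CouchbaseClient cb = new CouchbaseClient(listURIs, "aBucket", "letmein");
cb.set("hello", 0, "world");  // key, expiry in seconds (0 = no expiry), value
cb.get("hello");              // fetch the value stored under "hello"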
Data
• Couchbase uses (and is completely compatible with) the memcached protocol.
• While you can use any standard memcached library, Couchbase also provides its own libraries for a variety of languages.
• Couchbase is document-oriented
• See http://www.couchbase.com/develop
Client Deployment
(“smart”) library
Application server: Farm Town Wars App Code + Couchbase Java Client library
Couchbase Server ports: 8091, 11210
OR
Client-side Moxi
Application server: Farm Town Wars App Code + Memcached Client + Moxi (Couchbase proxy)
Couchbase Server ports: 8091, 11210
SERVER AND CLUSTER SIZING (TIME FOR THE ARITHMETIC)
Size Couchbase Server
Sizing == performance
• Serve reads out of RAM
• Enough IO for writes
• Mitigate inevitable failures
How many nodes?
4 key factors determine the number of nodes needed:
1) RAM
2) Disk
3) Network
4) Data Distribution/Safety
Couchbase Servers
Web application server
Application user
RAM sizing
1) RAM
• Working set
• Metadata
• Buffer/overhead
• Active+Replica(s)
Keep working set in RAM for best read performance
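To make the RAM factors concrete, here is a rough sizing sketch. The formula is an illustrative assumption that combines the four bullets above (working set, metadata, buffer/overhead, active plus replicas); it is not an official sizing rule, and every input value in `main` (document counts, sizes, per-document metadata, usable RAM fraction, node RAM) is hypothetical.

```java
/** Rough RAM-based node count; the formula and all inputs are illustrative assumptions. */
class RamSizingSketch {
    static int nodesForRam(long totalDocs, long avgDocBytes, long metadataBytesPerDoc,
                           double workingSetRatio, int replicas,
                           double usableRamFraction, long ramPerNodeBytes) {
        long copies = 1L + replicas;  // Active + Replica(s)
        // Assumes metadata for every document stays in RAM, working set or not.
        double metadata = (double) totalDocs * metadataBytesPerDoc * copies;
        // Only the working-set share of document values needs to be resident.
        double workingSet = (double) totalDocs * avgDocBytes * workingSetRatio * copies;
        // Hold back part of each node's RAM as buffer/overhead.
        double usablePerNode = ramPerNodeBytes * usableRamFraction;
        return (int) Math.ceil((metadata + workingSet) / usablePerNode);
    }

    public static void main(String[] args) {
        long docs = 100_000_000L;            // hypothetical dataset
        long ram32g = 32L * 1024 * 1024 * 1024;
        // Same dataset, three working/total set ratios (.01, .33 and 1).
        System.out.println(nodesForRam(docs, 1_000, 100, 0.01, 1, 0.7, ram32g));  // 1
        System.out.println(nodesForRam(docs, 1_000, 100, 0.33, 1, 0.7, ram32g));  // 4
        System.out.println(nodesForRam(docs, 1_000, 100, 1.0, 1, 0.7, ram32g));   // 10
    }
}
```

The same dataset needs anywhere from one node to ten on RAM alone depending on the working set ratio, which is why that ratio has to be estimated per application. RAM is also only one of the four factors, so the final node count is the largest of the per-factor answers.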
Disk sizing: Space and I/O
2) Disk (I/O and Space)
• Sustained write rate
• Rebalance capacity
• Backups
• Total dataset
• Active+Replicas
Network sizing
3) Network
• Client traffic
• Replication (writes)
• Rebalancing
Reads+Writes
Replication and/or Rebalancing
Data Distribution
4) Data Distribution / Safety (assuming one replica):
• 1 node = BAD
• 2 nodes = …better…
• 3+ nodes = BEST!
Note: Many applications will need more than 3 nodes
Servers fail; be prepared. The more nodes, the less impact a failure will have.
Data Distribution
[Diagram: APP SERVER 1 and APP SERVER 2 each run the COUCHBASE CLIENT LIBRARY with a CLUSTER MAP and send Read/Write/Update requests to the COUCHBASE SERVER CLUSTER. SERVER 1, SERVER 2 and SERVER 3 each hold Active Docs and Replica Docs (Doc 1 to Doc 9 spread across them).]
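The role of the cluster map in the client library can be sketched roughly as follows. This is a simplified illustration, not the library's code: the class `ClusterMapSketch`, the CRC32-modulo hashing, the partition count of 1024 and the round-robin layout are all assumptions chosen to keep the example small. It expects at least two servers so that a replica never lands on the same server as its active copy.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

/** Simplified client-side routing: hash a key to a partition, then look it up in the map. */
class ClusterMapSketch {
    private final String[] activeServer;   // partition -> server holding the active copy
    private final String[] replicaServer;  // partition -> server holding the replica copy

    ClusterMapSketch(int partitions, String... servers) {
        activeServer = new String[partitions];
        replicaServer = new String[partitions];
        for (int p = 0; p < partitions; p++) {
            // Hypothetical round-robin layout; the replica goes to the next server along.
            activeServer[p] = servers[p % servers.length];
            replicaServer[p] = servers[(p + 1) % servers.length];
        }
    }

    /** The same key always maps to the same partition, on every application server. */
    int partitionFor(String key) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        return (int) (crc.getValue() % activeServer.length);
    }

    String activeFor(String key) {
        return activeServer[partitionFor(key)];
    }

    String replicaFor(String key) {
        return replicaServer[partitionFor(key)];
    }

    public static void main(String[] args) {
        ClusterMapSketch map = new ClusterMapSketch(1024, "SERVER 1", "SERVER 2", "SERVER 3");
        for (String key : new String[] {"Doc 1", "Doc 2", "Doc 3"}) {
            System.out.println(key + ": active on " + map.activeFor(key)
                    + ", replica on " + map.replicaFor(key));
        }
    }
}
```

Because every application server holds the same map, each one sends a read or write for a given document straight to the server that owns it, with no lookup hop in between.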
How many nodes? (recap)
4 key factors determine the number of nodes needed:
1) RAM
2) Disk
3) Network
4) Data Distribution
Couchbase Servers
Web application server
Application user
MONITORING
Key resources: RAM, Disk, Network
[Diagram: three Application Servers connected over the NETWORK to three Servers, each with RAM and DISK.]
Monitoring
Once in production, the heart of operations is monitoring
- RAM usage
- Disk write queues / read activity
- Network bandwidth, replication queues
- Data distribution (balance, replicas)
How do you know when your working set is not in RAM?
Cache Miss Ratio
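As a worked example of this metric: the cache miss ratio is the share of reads that could not be served from RAM and had to be fetched from disk. The sketch below assumes you can sample two counters over the same interval, disk fetches and total get operations; the class name, parameter names and the alert threshold are hypothetical.

```java
/** Cache miss ratio from two sampled counters; names and threshold are hypothetical. */
class CacheMissSketch {
    /** Percentage of reads that had to go to disk. */
    static double missRatioPercent(long diskFetches, long totalGets) {
        if (totalGets == 0) {
            return 0.0;  // no reads in the interval, so nothing missed
        }
        return 100.0 * diskFetches / totalGets;
    }

    /** True when the miss ratio exceeds a threshold you have chosen for your application. */
    static boolean workingSetLikelyOutgrewRam(long diskFetches, long totalGets, double thresholdPercent) {
        return missRatioPercent(diskFetches, totalGets) > thresholdPercent;
    }

    public static void main(String[] args) {
        System.out.println(missRatioPercent(50, 10_000));                  // 0.5
        System.out.println(workingSetLikelyOutgrewRam(500, 10_000, 1.0));  // true
    }
}
```

A ratio that climbs over time is the signal to watch: it means a growing share of requests is taking the slow disk path.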
How do you know when you don’t have enough disk I/O?
Disk Write Queue
How do you know when you don’t have enough network I/O?
TAP Replication Queue
MANAGEMENT AND MAINTENANCE
Growth
Going from 5 million to 100 million users…
– RAM usage is growing:
    • Cache misses increasing
    • Resident item ratios decreasing
    • Disk fetches increasing
– Disk write queue growing higher than usual
Need to add a few more nodes...
Now we have more RAM and more disk throughput without any downtime
Add Nodes
[Diagram: APP SERVER 1 and APP SERVER 2, each with the COUCHBASE CLIENT LIBRARY and a CLUSTER MAP, send Read/Write/Update requests to the COUCHBASE SERVER CLUSTER. SERVER 4 and SERVER 5 join SERVER 1, SERVER 2 and SERVER 3; every server holds Active Docs and Replica Docs.]
Backup
As simple as running a packaged script (cbbackup)
Done on a live system with minimal to no performance impact
Restore
1) Replace the data files with the backup files; the server will automatically “warm up” from the disk files upon restart
– Traditional RDBMS performance is acceptable while slowly populating cache
– Our applications demand a different level of performance
– Couchbase Server pre-loads as much as possible into RAM
Restore
2) Use “cbrestore” to restore data into a live or different cluster
Upgrade
1. Add nodes of new version, rebalance…
2. Remove nodes of old version, rebalance…
3. Done!
No disruption
General use for software upgrade, hardware refresh, planned maintenance
Upgrade existing Membase 1.7 to Couchbase Server 1.8
Disk fragmentation
Current use of SQLite causes performance degradation as DB files get fragmented
- “vacuum” is available (but not as an online operation)
- Best practice: repeat rebalance to “clean” disk files
Under development: “Maintenance mode” to allow safely taking a node offline to perform vacuuming in place.
Couchbase Server 2.0 has much improved behavior
Failures Happen!
Hardware
Network
Bugs
Easy to Manage failures with Couchbase
• Failover (automatic or manual):
– Replica data promoted for immediate access
– Replicas not recreated
– Do NOT failover healthy node
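The failover bullets above can be illustrated with a small model. This is a sketch under assumptions, not the server's algorithm: `FailoverSketch`, its round-robin layout with one replica per partition, and the server names are all hypothetical. It shows why replicas being promoted but not recreated matters, and why failing over a healthy node only reduces protection.

```java
import java.util.Arrays;

/** Toy failover model: one active and one replica copy per partition; names are hypothetical. */
class FailoverSketch {
    final String[] active;   // partition -> server with the active copy
    final String[] replica;  // partition -> server with the replica copy (null = none left)

    FailoverSketch(int partitions, String... servers) {
        active = new String[partitions];
        replica = new String[partitions];
        for (int p = 0; p < partitions; p++) {
            active[p] = servers[p % servers.length];
            replica[p] = servers[(p + 1) % servers.length];
        }
    }

    /** Fail over one server: promote replicas of its active partitions, recreate nothing. */
    void failover(String failed) {
        for (int p = 0; p < active.length; p++) {
            if (failed.equals(active[p])) {
                active[p] = replica[p];  // replica data promoted for immediate access
                replica[p] = null;       // the promoted copy now has no replica
            } else if (failed.equals(replica[p])) {
                replica[p] = null;       // this partition's replica went away with the node
            }
        }
    }

    /** Partitions that would be lost if one more server failed. */
    long unprotectedPartitions() {
        return Arrays.stream(replica).filter(r -> r == null).count();
    }

    public static void main(String[] args) {
        FailoverSketch cluster = new FailoverSketch(6, "S1", "S2", "S3");
        cluster.failover("S2");
        System.out.println(Arrays.toString(cluster.active));  // no partition is served by S2
        System.out.println(cluster.unprotectedPartitions());  // 4 of 6 partitions lack a replica
    }
}
```

In this model the data stays available right after the failover, but two thirds of the partitions are left with a single copy until replicas are rebuilt elsewhere, which is the cost a needless failover of a healthy node would impose.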
Fail Over
[Diagram: the COUCHBASE SERVER CLUSTER of SERVER 1 to SERVER 5, each holding Active Docs and Replica Docs, with APP SERVER 1 and APP SERVER 2 routing requests through the COUCHBASE CLIENT LIBRARY and its CLUSTER MAP.]
Easy to maintain Couchbase
• Use remove+rebalance on “malfunctioning” node:
– Protects data distribution and “safety”
– Replicas recreated
– Best to “swap” with new node to maintain capacity
QUESTIONS?