Ebay: DB Capacity planning at eBay
Feng Qu, Sr MTS
Bass Chorng, Principal Capacity Engineer
DB Capacity Planning at eBay
#CassandraSummit2015
Who Am I?
Bass Chorng – Principal Capacity Engineer @ eBay Specializes in database performance, availability & scalability in a large website. Established DB capacity team at eBay in 2003. Loves mountain biking.
eBay Site DB Traffic At A Glance
Ø NoSQL Total – 52 B/Day
  o Cassandra – 15 B
  o Mongo – 15 B
  o CouchBase – 12 B
  o PushVM – 10 B
Ø RDBMS Total – 350 B/Day
  o MySQL – 10 B
  o Oracle – 340 B
Ø Peak Traffic – 8M/sec
Ø Site Total DB Calls – 400B/Day across 2,000 NoSQL Nodes + 450 Oracle Nodes
Ø Hosting 800M Active Items & 120M Active Users
Ø Y-o-Y Growth – 30% ~ 35%
[Chart: Billion SQL Calls per Day. Cassandra 15, Mongo 15, CouchBase 12, PushVM 10, MySQL 10, Oracle 340]
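The figures on this slide can be cross-checked with a little arithmetic. The sketch below sums the per-engine daily call counts and compares the average rate implied by 400B calls/day with the quoted 8M/sec peak. The variable names are illustrative only, and the per-engine figures are rounded, so the sum (402B) only approximately matches the quoted site total.

```python
# Daily call volumes in billions, as quoted on the slide (rounded figures).
nosql = {"Cassandra": 15, "Mongo": 15, "CouchBase": 12, "PushVM": 10}
rdbms = {"MySQL": 10, "Oracle": 340}

nosql_total = sum(nosql.values())        # matches the quoted 52 B/day
rdbms_total = sum(rdbms.values())        # matches the quoted 350 B/day
site_total = nosql_total + rdbms_total   # close to the quoted 400 B/day

SECONDS_PER_DAY = 24 * 60 * 60
avg_calls_per_sec = 400e9 / SECONDS_PER_DAY   # average rate implied by 400B/day
peak_calls_per_sec = 8e6                      # peak quoted on the slide
peak_to_avg = peak_calls_per_sec / avg_calls_per_sec

print(f"NoSQL {nosql_total}B, RDBMS {rdbms_total}B, total {site_total}B per day")
print(f"average ~{avg_calls_per_sec / 1e6:.2f}M calls/sec, "
      f"peak/average ~{peak_to_avg:.2f}")
```

The peak-to-average ratio is the kind of number that later feeds the "peak + headroom" rule: sizing for the daily average alone would leave the site short at peak.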
Capacity Planning - Simply Put
Ø Analyze Traffic
  o Data
Ø Analyze Utilization
  o Data
Ø Analyze The Relationship Of The Above Two
  o Same Data
Ø Forecast Growth
  o Simple Models, Then Impress Your Boss
Ø Convert Resource Need Into $
  o A Calculator, Then Impress Your CIO
BTW, You Also Need To Know …
• Platform Domain Knowledge – Server, DB Engine, IO Subsystem, Networks …
• Relationship Between System Overhead & Utilization
• Seasonality & Workload Characteristics
• Bottlenecks – Components, Systems, Platforms, Architecture, Site & Apps
• New Technologies
Domain Knowledge Stack
aka Whom To Blame Stack
APPS
DB
UNIX
STORAGE
CAPACITY (bottom of food chain)
Data
Ø What To Collect?
  o Apps, Database, Sessions, CPU, Memory, Connections, IOPS, IO Time, NIC, HBA, Array
Ø How To Collect?
  o Time Resolution, Aggregation Level, Retention
Ø How To Use It?
  o Average, Max, 95th Percentile, Dashboard, Reporting, Trending
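As an illustration of the "How To Use It?" bullet, the sketch below reduces a series of utilization samples to the three statistics named on the slide. The sample data and function names are hypothetical, and the nearest-rank percentile is just one of several common definitions, chosen here for simplicity.

```python
import math
from statistics import mean

def percentile_nearest_rank(samples, pct):
    """Return the pct-th percentile using the nearest-rank method."""
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)   # 1-based rank
    return ordered[max(rank, 1) - 1]

def summarize(samples):
    """Reduce raw samples to average, max and 95th percentile."""
    return {
        "avg": mean(samples),
        "max": max(samples),
        "p95": percentile_nearest_rank(samples, 95),
    }

# Hypothetical CPU utilization samples (percent), one per minute.
cpu = [30] * 18 + [45, 90]
stats = summarize(cpu)
print(stats)
```

On this made-up series the three numbers differ widely, which is why a dashboard that shows only one of them can mislead.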
[Charts: two daily time series. One covers 5/1/2015 to 5/27/2015 (y-axis 0.0 to 4.0); the other covers 1/26/2015 to 3/1/2015 (y-axis 0 to 40,000,000). Axis titles are not recoverable.]
Forecast
Ø Model Traffic, Not Resources
Ø Need One Year Trend
Ø Forecast At Daily Level
Ø Eliminate Outliers
Ø No Data Is Better Than Wrong Data
Ø Convert Traffic To Resource Usage
Ø Linear Extrapolation Only (CPU Utilization, Not IO Time)
Ø Simple Excel Formula Works Well
Ø For Long Term Resource Planning Only
Ø Use Average, Not Max
Ø Not All Workloads Are Predictable
[Chart: CATY Traffic Forecast. Y-axis: Billion Calls (0 to 70); x-axis: 01/01/2012 to 01/01/2015; series: Forecast, Actual, Capacity]
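The forecasting recipe on this slide (model traffic, drop outliers, extrapolate linearly, then convert traffic to resource usage) can be sketched in a few lines. Everything specific here is an assumption for illustration: the synthetic history, the median-based outlier rule, its 50% tolerance, and the CPU-per-traffic ratio are not from the talk, which only says a simple Excel formula works well.

```python
from statistics import median

def drop_outliers(points, tolerance=0.5):
    """Discard days whose traffic is more than `tolerance` (a fraction)
    away from the median. A deliberately crude, hypothetical rule."""
    mid = median(y for _, y in points)
    return [(x, y) for x, y in points if abs(y - mid) <= tolerance * mid]

def fit_line(points):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical one-year history: (day index, daily average in billion calls).
history = [(d, 40.0 + 0.03 * d) for d in range(365)]
history[100] = (100, 5.0)            # a day where data collection broke

clean = drop_outliers(history)       # the bad day is removed, not "repaired"
slope, intercept = fit_line(clean)

forecast_day = 365 + 180             # six months past the end of the history
traffic = slope * forecast_day + intercept

# Convert traffic to resource usage, assuming CPU scales linearly with traffic.
cpu_pct_per_billion_calls = 0.9      # hypothetical measured ratio
cpu_forecast = traffic * cpu_pct_per_billion_calls
print(f"forecast: {traffic:.2f}B calls/day, ~{cpu_forecast:.1f}% CPU")
```

Dropping the broken day rather than guessing its value follows the slide's "No Data Is Better Than Wrong Data"; fitting daily averages follows "Use Average, Not Max".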
Things To Watch For
Myths
Ø More CPU Makes Apps Run Faster
Ø More Data Makes Apps Run Slower
Ø Apps Run Twice As Fast On A CPU With Twice The Speed
Ø High Session = High Load
Pitfalls
Ø Cause vs. Symptom
Ø Time Resolution Masks Issues
Ø Look At The Whole Picture
Ø Slow Down In Order To Go Faster < Throttle >
Challenges
Ø Data Quality – Data Missing, Data Source Changes, F/O Data Residency, Data Errors …
Ø Varieties of Data Formats & Resolutions
Ø Data Collection In Secured Zones
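The "Time Resolution Masks Issues" pitfall is easy to demonstrate. The sketch below uses a made-up hour of per-minute CPU samples in which the last five minutes are saturated; the numbers are purely hypothetical.

```python
from statistics import mean

# Hypothetical per-minute CPU utilization (percent) for one hour:
# 55 quiet minutes followed by 5 minutes at saturation.
per_minute = [20] * 55 + [100] * 5

hourly_avg = mean(per_minute)        # what an hourly rollup would report
five_min_buckets = [mean(per_minute[i:i + 5]) for i in range(0, 60, 5)]
worst_bucket = max(five_min_buckets)
worst_minute = max(per_minute)

print(f"hourly average: {hourly_avg:.1f}%")
print(f"worst 5-minute bucket: {worst_bucket}%, worst minute: {worst_minute}%")
```

At hourly resolution the host looks lightly loaded, while the finer series shows it pinned for five minutes. The same data supports opposite conclusions depending on the aggregation level chosen at collection time.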
Me: Everything NoSQL
CassandraSummit2015 | #CassandraSummit
Ø Prior to 2011: Worked on Oracle at DoubleClick/Yahoo/Intuit
Ø Worked on NoSQL at eBay Database Infrastructure team:
  o Cassandra since 2011
  o MongoDB since 2012
  o Couchbase since 2014
Ø Cassandra Summit speaker for 2013, 2014, 2015
Ø DataStax Cassandra MVP for 2014, 2015
For Cassandra
Ø Capacity Measurements
  o Throughput
  o Latency
  o E.g. 30,000 reads/sec with SLA of P99 at 5ms
Ø Hardware SKU Example
  o CPU: 20 cores
  o Memory: 128GB RAM
  o Storage: 1.5TB local SSD
  o Network: 10G NIC
Benchmarking
Ø Benchmarking for different hardware
  o High I/O SKU
  o High memory SKU
  o High storage SKU
  o Bare metal or cloud
Ø Benchmarking for different software releases
Ø Benchmarking for different workloads
  o 100% Writes
  o 50% Writes, 50% Reads
  o 5% Writes, 95% Reads
  o 100% Reads
Ø Benchmarking Tools
  o YCSB
  o Cassandra-stress
Ø Proactive and repeated process using near real-time traffic in a prod-like environment
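One way to keep such a benchmark campaign repeatable is to enumerate the test matrix explicitly. The sketch below crosses the three SKU types with the four workload mixes listed above. The dictionary layout and SKU labels are hypothetical, and the step that would actually drive YCSB or cassandra-stress is deliberately left out.

```python
from itertools import product

# SKU types and read/write mixes as listed on the slide.
skus = ["high-io", "high-memory", "high-storage"]
workloads = [(100, 0), (50, 50), (5, 95), (0, 100)]   # (write %, read %)

# One planned run per (SKU, workload) combination.
runs = [
    {"sku": sku, "write_pct": write_pct, "read_pct": read_pct}
    for sku, (write_pct, read_pct) in product(skus, workloads)
]

for run in runs:
    print(f"{run['sku']:>12}: {run['write_pct']:>3}% writes / "
          f"{run['read_pct']:>3}% reads")
```

Adding software releases or "bare metal vs. cloud" as further dimensions multiplies the number of runs, which is one reason the process benefits from automation.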
Capacity Planning
Ø Key to avoiding surprises in production
Ø The concept behind capacity planning is simple, but the mechanics are harder.
Ø Business requirements may increase, so we need to forecast how much resource must be added to the system to ensure that the user experience continues uninterrupted
Ø Input: a clearly defined capacity goal from business requirements and a performance baseline from benchmark tests
Ø Output: the resources to be added, such as memory, CPU, storage, I/O, network
Ø Always prepare for peak + headroom
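The "peak + headroom" rule combines with a benchmark baseline in a straightforward way. In the sketch below, the 30,000 reads/sec per-node figure is the example baseline quoted earlier in the talk; the 600,000 reads/sec peak and the 30% headroom are hypothetical inputs, and the function name is illustrative. It also assumes the per-node baseline was measured with the production replication factor and consistency level, so no further adjustment is made for replication.

```python
import math

def nodes_for_throughput(peak_ops_per_sec, per_node_ops_per_sec, headroom_pct=30):
    """Nodes needed to serve peak traffic plus headroom, given a
    per-node throughput baseline measured while meeting the latency SLA."""
    required = peak_ops_per_sec * (100 + headroom_pct)
    return math.ceil(required / (100 * per_node_ops_per_sec))

# Hypothetical goal: 600,000 reads/sec at peak with 30% headroom,
# on nodes benchmarked at 30,000 reads/sec within the P99 SLA.
nodes = nodes_for_throughput(
    peak_ops_per_sec=600_000,
    per_node_ops_per_sec=30_000,
    headroom_pct=30,
)
print(nodes)
```

The same calculation would be repeated for each constrained resource (storage, network, and so on), with the largest resulting node count winning.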
Capacity Planning Process
Ø Initial Sizing
  o Storage size vs. data size
  o Compaction overhead, compression ratio, RF, indexes
  o Cost-effective configuration to meet capacity/latency SLA
Ø Routine Review
  o System utilization on I/O, storage, network, CPU, memory etc.
  o Cassandra metrics on GC, compaction, latency, throughput etc.
  o nodetool compactionstats, cfhistograms, tpstats etc.
Ø Forecasting
  o Historical comparison
  o Traffic projection
  o Flex up or flex down
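For the initial-sizing step, the relationship between data size and storage size can be written down directly. The sketch below is a back-of-the-envelope model under stated assumptions: the compression ratio is defined as compressed size divided by uncompressed size, a fixed fraction of each disk is reserved for compaction, and index overhead is ignored. The 1.5TB disk comes from the example SKU in the talk; all other numbers are hypothetical.

```python
import math

def nodes_for_storage(raw_data_tb, replication_factor, compression_ratio,
                      compaction_headroom, disk_tb_per_node):
    """Nodes needed to hold the data set on disk.

    compression_ratio   -- compressed size / uncompressed size (assumption)
    compaction_headroom -- fraction of each disk kept free for compaction
    """
    on_disk_tb = raw_data_tb * replication_factor * compression_ratio
    usable_tb_per_node = disk_tb_per_node * (1 - compaction_headroom)
    return math.ceil(on_disk_tb / usable_tb_per_node)

# Hypothetical: 10TB of raw data, RF=3, 2:1 compression,
# half of each 1.5TB SSD kept free for compaction.
nodes = nodes_for_storage(
    raw_data_tb=10,
    replication_factor=3,
    compression_ratio=0.5,
    compaction_headroom=0.5,
    disk_tb_per_node=1.5,
)
print(nodes)
```

The storage-driven node count is then compared with the throughput-driven one; the cluster has to satisfy both, so the larger number sets the initial size.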
Scale Up vs. Scale Out
Ø Scale Up (vertical)
  o Pros
    • Smaller data center footprint, such as space, power, cooling
    • Lower license cost
  o Cons
    • Likely costs more, using proprietary hardware
    • Less fault tolerant
    • Limited upgradability in future
Ø Scale Out (horizontal)
  o Pros
    • Cheaper, using commodity hardware
    • More fault tolerant
    • (Unlimited) upgradability
  o Cons
    • Bigger data center footprint
    • Higher license cost
    • Likely needs more network equipment
Questions?
eBay is hiring experienced NoSQL professionals, please send resume to [email protected]