Post on 26-Jan-2015
An Insider’s Guide to ODA Performance
Prepared by: Alex Gorbachev, Pythian CTO & Gwen Shapira
Presented by: Gwen Shapira, Senior Pythian Consultant
Gwen Shapira, Senior Consultant, Pythian
Oracle ACE Director
Alex Gorbachev, CTO, Pythian
President, Oracle RAC SIG
Why Companies Trust Pythian
Recognized Leader:
• Global industry leader in remote database administration services and consulting for Oracle, Oracle Applications, MySQL and SQL Server
• Works with over 150 multinational companies such as Western Union, Fox Interactive Media, and MDS Inc. to help manage their complex IT deployments
Expertise:
• One of the world’s largest concentrations of dedicated, full-time DBA expertise
Global Reach & Scalability:
• 24/7/365 global remote support for DBA and consulting, systems administration, special projects or emergency response
© 2012 Pythian
Oracle Database Appliance
•Simple RAC-In-A-Box
•2 database servers + shared storage + interconnect
• Inexpensive
We will talk about:
•Node Hardware
•Interconnect
•Storage
•Benchmark results
•Capacity planning tips
What’s in a Server Node?
ODA Front View
ODA Rear View
System Controller View
Server Node (SN) / System Controller (SC)
•Two Intel Xeon X5675 CPUs - 3.06 GHz, 6 cores each
•96 GB RAM
•Two SATA 7500 RPM, 500 GB disks
•Lots of network ports, both 1 GbE and 10 GbE
•Identical to an X2-2 Exadata node
Oracle Database Appliance Storage
•20 SAS 15000 RPM 600GB
•4 SAS SSD 73GB
•Each SN – 2 HBA
•Each SN – 2 Expanders
•Each Expander – 12 disks
•Each disk – 2 SAS ports
Only $50K
Sound of a Single Node Scaling
Cluster Interconnect
Where’s the Interconnect?
[root@odaorcl1 ~]# /u01/app/11.2.0.3/grid/bin/oifcfg getif
eth0 192.168.16.0 global cluster_interconnect
eth1 192.168.17.0 global cluster_interconnect
bond0 172.20.31.0 global public
[root@odaorcl1 ~]# ifconfig eth0
eth0 Link encap:Ethernet HWaddr 00:21:28:E7:C3:72
inet addr:192.168.16.24 Bcast:192.168.16.255
inet6 addr: fe80::221:28ff:fee7:c372/64
UP BROADCAST RUNNING MULTICAST MTU:9000
[root@odaorcl1 ~]# ethtool eth0
Settings for eth0:
Supported ports: [ FIBRE ]
Supported link modes: 1000baseT/Full
Supports auto-negotiation: Yes
Advertised link modes: 1000baseT/Full
Advertised auto-negotiation: Yes
Speed: 1000Mb/s
Duplex: Full
Port: FIBRE
PHYAD: 0
Transceiver: external
Auto-negotiation: on
Supports Wake-on: pumbg
Wake-on: d
Current message level: 0x00000001 (1)
Link detected: yes
Interconnect Performance
Is 1GbE a problem?
•Dedicated 2 x 1 GbE Fibre links
•No switches
•IC latency ~ 0.5 ms.
•Like Exadata over IB
•Only 2 nodes
•Workload matters
Throughput – 400 VUsers
But Wait!
Event                        Waits  Time(s)  Avg wait (ms)  % DB time  Wait Class
-----------------------  ---------  -------  -------------  ---------  -----------
DB CPU                                6,459                      29.9
buffer busy waits          123,162    3,725             30       17.3  Concurrency
gc buffer busy release       8,871    3,383            381       15.7  Cluster
gc current block 2-way   3,282,774    1,969              1        9.1  Cluster
gc buffer busy acquire      11,073    1,364            123        6.3  Cluster
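The average-wait column is simply total wait time divided by the number of waits. The short Python sketch below recomputes it from the figures in the table above. It is only an arithmetic check, not part of the original test harness, and the function name `avg_wait_ms` is made up for illustration. It shows that a plain 2-way block transfer costs well under a millisecond, while the "gc buffer busy" waits run into hundreds of milliseconds.

```python
# Rows copied from the table above: (event, number of waits, total wait time in seconds).
rows = [
    ("buffer busy waits", 123_162, 3_725),
    ("gc buffer busy release", 8_871, 3_383),
    ("gc current block 2-way", 3_282_774, 1_969),
    ("gc buffer busy acquire", 11_073, 1_364),
]


def avg_wait_ms(waits: int, time_s: float) -> float:
    """Average time per wait in milliseconds (total time / number of waits)."""
    return time_s / waits * 1000.0


for event, waits, time_s in rows:
    print(f"{event:<24} {avg_wait_ms(waits, time_s):8.2f} ms")
```

Read this way, the interconnect transfer itself is not the slow part; the time goes to sessions queuing for a buffer that another session is busy with.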
But Wait!
Event                         Waits  Time(s)  Avg wait (ms)  % DB time  Wait Class
------------------------  ---------  -------  -------------  ---------  -------------
enq: US - contention      1,123,271   33,733             30       38.2  Other
enq: HW - contention         42,551   17,317            407       19.6  Configuration
buffer busy waits           156,152   11,550             74       13.1  Concurrency
latch: row cache objects    798,648    6,181              8        7.0  Concurrency
DB CPU                                 5,796                       6.6
I need that buffer.
I’m busy!
381 ms later:
Here’s the buffer!
Waiting
Interconnect Again
Used By           Send Mbytes/sec  Receive Mbytes/sec
----------------  ---------------  ------------------
Global Cache                48.94               43.04
Parallel Query                .00                 .00
DB Locks                     4.99                5.23
DB Streams                    .00                 .00
Other                         .00                 .01
Instance  Latency 500B msg  Latency 8K msg
--------  ----------------  --------------
1                     0.14            0.13
2                     0.58            0.69
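To put the traffic figures in context, the Python sketch below compares the total send and receive rates with the raw capacity of the two 1 GbE links. It assumes 1 Gbit/s equals 125 MB/s per direction with no protocol overhead, and that traffic is spread across both links; neither assumption comes from the slides, so treat the percentages as rough.

```python
# (send MB/s, receive MB/s) per traffic type, copied from the table above.
traffic = {
    "Global Cache": (48.94, 43.04),
    "Parallel Query": (0.00, 0.00),
    "DB Locks": (4.99, 5.23),
    "DB Streams": (0.00, 0.00),
    "Other": (0.00, 0.01),
}

LINKS = 2                 # two dedicated 1 GbE links (from the slides)
LINK_MB_PER_S = 1000 / 8  # assumption: 1 Gbit/s = 125 MB/s raw, overhead ignored

sent = sum(send for send, _ in traffic.values())
received = sum(recv for _, recv in traffic.values())
capacity = LINKS * LINK_MB_PER_S

print(f"sent     {sent:6.2f} MB/s = {sent / capacity:5.1%} of {capacity:.0f} MB/s")
print(f"received {received:6.2f} MB/s = {received / capacity:5.1%} of {capacity:.0f} MB/s")
print(f"sent, if only one link carried it: {sent / LINK_MB_PER_S:5.1%}")
```

Under these assumptions the links are far from saturated in either direction, which fits the observation that the waits came from buffer contention, not from interconnect bandwidth.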
Storage Performance - REDO LOG
No Storage Cache
Implications:
•Excessive IO will impact latency
•Online redo logs are on SSD
•Tune DBWR processes (MTTR target)
SSD
•4x 73GB
•Dedicated to redo logs
•Reminder:
•0.025ms read
•0.250ms write (best case)
•Writes are not just writes
•Over-provisioning
SSD for Redo
•Not a general recommendation
•Consistent low latency
•Works well for multiple databases
•Leftover space
ODA: SSD Performance for LGWR
More LGWR Performance
Saturating LGWR Test
• 3200 writes, 2 nodes, 0.2ms latency
• LGWR spent 70% of time on CPU
SwingBench Order Entry
• 4500 TPS
• Bottleneck was buffer busy contention
Big data load
• 100K size write, several ms latency
• Data warehouse load – bad fit for ODA
Storage Performance - DATA
HDD Performance
We tested:
•HDD Scalability
•Effects of disk placement
•Backups!
ODA Small Random Reads - HDDs Scalability
ODA Write IO impact - Minimal
ODA Small Random Reads: Data Placement
Co-locating data onto the outer 40% of a disk adds 50% more IOPS
ODA Sequential Reads Scalability (Single node)
I could reach 2.4 GB/s with 24 parallel reads for a single stream
RMAN Backup Performance (1)
Backup to FRA:
• Optimal number of channels - 8
• 42 GB of data in 1 min 45 seconds = 400 MB/s
• 1.6 TB full backup in about 1 hour
RMAN Backup Performance (2)
Backup to external location:
• BACKUP VALIDATE with 8 channels
• 42 GB of data in 45 seconds = 1 GB/s
• Theoretical maximum wire speed for one 10 GbE link
• 4 TB database in 1 hour 15 minutes
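The backup figures on these two slides follow from simple throughput arithmetic, which the Python sketch below reproduces. It assumes decimal units (1 GB = 1000 MB, 1 TB = 1,000,000 MB) and a constant rate for the whole backup; the function names are illustrative only.

```python
def throughput_mb_s(gigabytes: float, seconds: float) -> float:
    """Sustained rate in MB/s, assuming decimal units (1 GB = 1000 MB)."""
    return gigabytes * 1000 / seconds


def backup_hours(terabytes: float, mb_per_s: float) -> float:
    """Time to move the given volume at a constant rate, in hours."""
    return terabytes * 1_000_000 / mb_per_s / 3600


fra_rate = throughput_mb_s(42, 105)      # backup to FRA: 42 GB in 1 min 45 s
external_rate = throughput_mb_s(42, 45)  # BACKUP VALIDATE: 42 GB in 45 s
ten_gbe_raw = 10_000 / 8                 # raw 10 GbE line rate in MB/s

print(f"FRA:      {fra_rate:.0f} MB/s, 1.6 TB in {backup_hours(1.6, fra_rate):.2f} h")
print(f"External: {external_rate:.0f} MB/s, 4 TB in {backup_hours(4, external_rate):.2f} h")
print(f"External rate is {external_rate / ten_gbe_raw:.0%} of raw 10 GbE line rate")
```

The estimates land close to the reported times. At roughly three quarters of the raw line rate, the external read rate would come close to what a single 10 GbE link can carry once protocol overhead is counted.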
Configurations of note:
Capacity Planning for Migration or Consolidation
Choosing Consolidation Candidates
•Vendor limitations
•SLAs
•Dependencies
•CPU utilization
•Workload type
Big Question: Will it fit?
Collect metrics
•CPU utilization
•Memory usage – SGA + PGA
•Storage requirements
•Workload types
•I/O requirements – IOPS, throughput
•RAC – current interconnect load
CPU
Build a time-based model of utilization on existing servers:
Time   S1 (8 core)  S2 (4 core)  S3 (32 core)  Total cores in use
00:00  50%          25%          10%           8*0.5 + 4*0.25 + 32*0.1 = 8.2
00:15  30%          50%          10%           7.6
00:30  100%         25%          10%           12.2
We calculated 12.2 cores in use at peak time.
ODA’s 24 cores give plenty of spare capacity.
You can get more accurate results by taking core speed into account. This is a rough model.
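The same rough model can be written as a few lines of Python, which helps once there are more than a handful of servers or samples. The server names, core counts and utilization samples are the ones from the table; the code is a sketch of the calculation, not a tool used in the original analysis, and like the table it ignores differences in core speed.

```python
# Core counts of the existing servers (from the table).
cores = {"S1": 8, "S2": 4, "S3": 32}

# CPU utilization samples per server, as fractions of that server's capacity.
samples = {
    "00:00": {"S1": 0.50, "S2": 0.25, "S3": 0.10},
    "00:15": {"S1": 0.30, "S2": 0.50, "S3": 0.10},
    "00:30": {"S1": 1.00, "S2": 0.25, "S3": 0.10},
}

ODA_CORES = 24  # 2 nodes x 2 sockets x 6 cores


def cores_in_use(utilization: dict) -> float:
    """Cores busy across all servers for one sample time."""
    return sum(cores[name] * busy for name, busy in utilization.items())


totals = {time: cores_in_use(util) for time, util in samples.items()}
peak_time = max(totals, key=totals.get)

for time, total in totals.items():
    print(f"{time}  {total:5.1f} cores in use")
print(f"Peak at {peak_time}: {totals[peak_time]:.1f} of {ODA_CORES} ODA cores")
```

Summing per timestamp, and not simply adding each server's individual peak, matters because the servers rarely peak at the same moment.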
Memory
•Easiest way: Sum memory on existing servers
•Actually: Sum SGA and PGA sizes, and leave 20-30% spare
Use advisors:
•OEM gives graphs with SGA and PGA size recommendations.
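As a minimal sketch of the "sum SGA and PGA, leave 20-30% spare" rule, the Python below checks a set of databases against one server node's 96 GB of RAM. The database names and sizes are invented for illustration. Treating "spare" as a fraction of physical RAM, and sizing for all instances running on a single node, are interpretations that do not come from the slides.

```python
NODE_RAM_GB = 96  # RAM per server node, from the hardware slide


def memory_fits(databases: dict, ram_gb: float = NODE_RAM_GB, spare: float = 0.25):
    """Return (needed_gb, budget_gb, fits) for the summed SGA + PGA targets.

    `spare` is the fraction of RAM left unallocated (0.20 to 0.30 per the rule).
    """
    needed = sum(sga + pga for sga, pga in databases.values())
    budget = ram_gb * (1 - spare)
    return needed, budget, needed <= budget


# Hypothetical consolidation candidates: name -> (SGA GB, PGA GB).
databases = {"erp": (16, 4), "crm": (24, 8), "reporting": (8, 2)}

needed, budget, fits = memory_fits(databases)
print(f"needed {needed} GB, budget {budget:.0f} GB, fits: {fits}")
```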
IO Capacity
•OLTP and DWH go in separate boxes
•Each can be standby of the other
•Consider throughput and latency requirements
•According to our tests:
• 12K redo IOPS at 0.5 ms latency
• Over 3000 data file IOPS at 15 ms latency
• Almost 6000 if using only the outer part of the disks
• Can reach 2.4 GB/s
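One way to use these test results is as ceilings when checking whether a consolidated workload will fit. The Python sketch below flags any metric whose combined peak demand exceeds a chosen fraction of the measured limit. The 80% safety factor and the demand figures are assumptions made up for illustration; only the limits come from the tests above.

```python
# Limits measured in the tests reported above.
MEASURED_LIMITS = {
    "redo_iops": 12_000,     # at 0.5 ms latency
    "data_iops": 3_000,      # at 15 ms latency, whole disks in use
    "throughput_gb_s": 2.4,  # sequential reads
}


def io_shortfalls(demand: dict, limits: dict = MEASURED_LIMITS, safety: float = 0.8) -> list:
    """Return the metrics whose demand exceeds safety * measured limit."""
    return [metric for metric, need in demand.items() if need > safety * limits[metric]]


# Hypothetical combined peak demand of the consolidation candidates.
demand = {"redo_iops": 4_000, "data_iops": 2_700, "throughput_gb_s": 0.9}

print(io_shortfalls(demand))
```

Here the data file IOPS demand is flagged, which would be the cue to look at data placement on the outer part of the disks or at moving a database elsewhere.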
Disk Space
•High redundancy – triple data usage
•Can use external storage if needed
•ZFS supports HCC
•Take backups into account
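A rough estimate of usable space follows from the disk count and the triple mirroring. The Python sketch below uses the 20 x 600 GB SAS disks listed earlier. It ignores ASM metadata, the free space needed to rebalance after a disk failure, and how space is split between data and backups, so the real figure will be lower. The database and backup sizes are hypothetical.

```python
DISKS = 20         # SAS HDDs in the shared storage (from the storage slide)
DISK_GB = 600      # size of each disk
MIRROR_COPIES = 3  # high redundancy keeps three copies of every block


def usable_gb(disks: int = DISKS, disk_gb: float = DISK_GB, copies: int = MIRROR_COPIES) -> float:
    """Upper bound on usable capacity: raw capacity divided by the number of copies."""
    return disks * disk_gb / copies


def space_fits(database_gb: float, backup_gb: float) -> bool:
    """True if the data plus on-box backups stay within the usable upper bound."""
    return database_gb + backup_gb <= usable_gb()


print(f"raw {DISKS * DISK_GB} GB, usable at most {usable_gb():.0f} GB")
print(space_fits(database_gb=1_600, backup_gb=1_600))  # hypothetical sizes
```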
Testing
•Always test
•Bad tests are still better than no tests
•Replicating production load:
• RAT (Real Application Testing)
• “Brewing Benchmarks”
• JMeter, LoadRunner, etc.
•Especially test:
• Migration strategy and times
• Non-RAC applications going to RAC
• Upgrades
Oracle Database Appliance Requires 11.2.0.2
We will upgrade and migrate your DB to ODA for free
Thank you and Q&A
To contact us…
1-877-PYTHIAN sales@pythian.com
Gwen Shapira – shapira@pythian.com
Alex Gorbachev – gorbachev@pythian.com
To follow us…
http://www.pythian.com/news/
http://on.fb.me/pythianfacebook
http://linkd.in/pythian
@pythian
@pythianjobs