Date posted: 19-Dec-2015
Page 1: Designing Highly Scalable OLTP Systems

Thomas Kejser: Principal Program Manager

Ewan Fairweather: Program Manager

Microsoft

Page 2: Agenda

• Windows Server 2008R2 and SQL Server 2008R2 improvements
• Scale architecture
  • Customer requirements
  • Hardware setup
  • Transaction log essentials
• Getting the code right
  • Application server essentials
  • Database design
• Tuning data modification
  • UPDATE statements
  • INSERT statements
  • Management of LOB data
• The problem with NUMA and what to do about it
• Final results and thoughts

Page 3: Top statistics

Category                                  Metric
Largest single database                   80 TB
Largest table                             20 TB
Biggest total data, 1 customer            2.5 PB
Highest transactions per second, 1 db     36,000
Fastest I/O subsystem in production       18 GB/sec
Fastest "real time" cube                  15 sec latency
Data load for 1 TB                        20 minutes
Largest cube                              4.2 TB

Page 4: Upping the Limits

• Previously (before 2008R2), Windows was limited to 64 cores
  • Kernel tuned for this config
• With Windows Server 2008R2 this limit is now upped to 1024 cores
  • New concept: Kernel Groups
  • A bit like NUMA, but an extra layer in the hierarchy
• SQL Server generally follows suit, but for now 256 cores is the limit on R2
  • Currently, the largest x64 machine is 128 cores
  • And the largest IA-64 is 256 hyperthreads (at 128 cores)

Page 5: The Path to the Sockets

[Diagram. Windows OS: Kernel Groups 0 to 3, each containing eight NUMA nodes (NUMA 0 to 31). Hardware (NUMA 6 and NUMA 7 shown): each NUMA node contains two CPU sockets, each socket contains two CPU cores, and each core has two hyperthreads (HT).]

Page 6: And we measure it like this

• Sysinternals CoreInfo: http://technet.microsoft.com/en-us/sysinternals/cc835722.aspx
• Nehalem-EX
  • Every socket is a NUMA node
  • How fast is your interconnect...

Page 7: And it Looks Like This...

Page 8: Customer Scenarios

Core Banking
• Workload: Credit card transactions from ATMs and branches
• Scale requirements: 10,000 business transactions / sec
• Technology: App tier .NET 3.5/WCF; SQL 2008R2; Windows 2008R2
• Server: HP Superdome, HP DL785G6

Healthcare System
• Workload: Sharing patient information across multiple healthcare trusts
• Scale requirements: 37,500 concurrent users
• Technology: App tier .NET; SQL 2008R2; Windows 2008R2
• Server: IBM 3950 and HP DL 980

POS
• Workload: World record deployment of ISV POS application across 8,000 US stores
• Scale requirements: Handle peak holiday load of 228 checks/sec
• Technology: Virtualized app tier Com+, Windows 2003; SQL 2008, Windows 2008
• Server: DL785

Page 9: Hardware Setup – Database files

• Database files
  • Number of files should be at least 25% of CPU cores
    • This alleviates PFS contention (PAGELATCH_UP)
  • There is no significant point of diminishing returns up to 100% of CPU cores
    • But manageability is an issue...
    • Though Windows 2008R2 is much easier
• TempDb
  • PFS contention is a larger problem here, as it is an instance-wide resource
    • Deallocations and allocations, RCSI version store, triggers, temp tables
  • Number of files should be exactly 100% of CPU threads
  • Presize at 2 x physical memory
• Data files and TempDb on same LUNs
  • It's all random anyway – don't sub-optimize
  • IOPS is a global resource for the machine. Goal is to avoid PAGEIOLATCH on any data file
• Example: Dedicated XP24K SAN
  • ~500 spindles in 64 LUNs (RAID5 7+1)
  • No more than 4 HBAs per LUN via MPIO
• Key Takeaway: Script it! At this scale, manual work WILL drive you insane
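In the spirit of "script it", here is a minimal Python sketch that turns the file-count rules above into generated T-SQL. The helper names, the `T:\tempdb` path, and the `tempdev2...` file names are hypothetical, and it assumes that "2 x physical memory" means the total across all TempDb files, which the slide does not spell out.

```python
import math

def data_file_count(cpu_cores, fraction=0.25):
    """Data files for a user database: at least 25% of CPU cores."""
    return max(1, math.ceil(cpu_cores * fraction))

def tempdb_file_statements(cpu_threads, physical_memory_gb, data_dir):
    """One TempDb data file per CPU thread, presized to 2 x physical memory in total."""
    # Assumption: the 2 x memory budget is split evenly across the files.
    size_mb = (2 * physical_memory_gb * 1024) // cpu_threads
    # The first file (default logical name tempdev) already exists, so it is resized.
    statements = [
        f"ALTER DATABASE tempdb MODIFY FILE (NAME = tempdev, SIZE = {size_mb}MB);"
    ]
    for n in range(2, cpu_threads + 1):
        name = f"tempdev{n}"  # hypothetical naming scheme
        statements.append(
            f"ALTER DATABASE tempdb ADD FILE (NAME = {name}, "
            f"FILENAME = '{data_dir}\\{name}.ndf', SIZE = {size_mb}MB);"
        )
    return statements

if __name__ == "__main__":
    print(data_file_count(64))
    for statement in tempdb_file_statements(8, 16, r"T:\tempdb"):
        print(statement)
```

Review the generated statements before running them; file growth settings and LUN placement are left out on purpose.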

Page 10: Special Consideration: Transaction Log

• Transaction log is a set of 127 linked buffers with max 32 outstanding I/Os
  • Each buffer is 60KB
  • Multiple transactions can fit in one buffer
  • BUT: Buffer must flush before log manager can signal a commit OK
• Pre-allocate log file
  • Use dbcc loginfo for existing systems
• Transaction log throughput was ~80MB/sec
  • But we consistently got <1ms latency, no spikes!
• Initial setup: 2 x HBA on dedicated storage port on RAID10 with 4+4
• When tuning for peak: SSD on internal PCI bus (latency: a few µs)
• Key Takeaway: For transaction log, dedicate storage components and optimize for low latency
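To see why latency matters more than bandwidth here, a back-of-envelope sketch in Python follows. It uses only the two numbers from the slide (60KB buffers, 32 outstanding I/Os) and assumes, optimistically, that every I/O carries a full buffer and completes in exactly the stated latency. The function name is made up for illustration.

```python
LOG_BUFFER_KB = 60          # from the slide
MAX_OUTSTANDING_IOS = 32    # from the slide

def max_log_throughput_mb_per_sec(latency_ms):
    """Upper bound on log throughput when 32 full buffers are always in flight."""
    in_flight_kb = LOG_BUFFER_KB * MAX_OUTSTANDING_IOS   # 1920 KB in flight
    completions_per_sec = 1000.0 / latency_ms            # rounds of I/O per second
    return in_flight_kb * completions_per_sec / 1024.0

if __name__ == "__main__":
    for latency in (1.0, 5.0):
        print(latency, "ms ->", max_log_throughput_mb_per_sec(latency), "MB/sec")
```

The ceiling scales inversely with latency: going from 1 ms to 5 ms cuts it by a factor of five, regardless of how much bandwidth the storage has.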

Page 11: Network Cards – Rule of Thumb

• At scale, network traffic will generate a LOT of interrupts for the CPU
• These must be handled by CPU cores
  • Must distribute packets to cores for processing
• Rule of thumb (OLTP): 1 NIC / 16 cores
  • Watch the DPC activity in Task Manager
  • In Windows 2003, remove SQL Server (with affinity mask) from the NIC cores

Page 12: Lab: Network Tuning Approaches

1. Tune the configuration options of a single NIC to provide the maximum throughput.
2. Improve the application code to compress LOB data before sending it to SQL Server.
3. Team a pair of 1 Gb/s NICs to provide more bandwidth (transparent to the app).
4. Add multiple NICs (better for scale).

Page 13: Tuning a Single NIC – POS system

• Enable RSS to allow multiple CPUs to process receive indications: http://www.microsoft.com/whdc/device/network/NDIS_RSS.mspx
• The next step was to disable the Base Filtering Service in Windows and explicitly enable TCP Chimney Offload
• Be careful with Chimney Offload, as per KB 942861

Page 14: Before and After Tuning Single NIC

1. Before any network changes, the workload was CPU bound on CPU0.

2. After tuning RSS, disabling the Base Filtering Service and explicitly enabling TCP Chimney Offload, CPU time on CPU0 was reduced. The base CPU for RSS was successfully moved from CPU0 to another CPU.

Page 15: SQL Server Memory Setup

• For a large CPU/memory box, Lock Pages in Memory really matters
  • We saw more than double the performance
  • Use gpedit.msc to grant it to the SQL Server service account
• Consider TF834 (large page allocations)
  • On Windows 2008R2, previous issues with this TF are fixed
  • Around 5-10% throughput increase
  • Increases startup time
• Beware of NUMA node memory distribution
• Set max memory close to box max if a dedicated box is available

Page 16: SQL Server Configuration Changes

• As we increased the number of connections to around 6000 (users had think time) we started seeing waits on THREADPOOL
  • Solution: increase sp_configure 'max worker threads'
  • Probably don't want to go higher than 4096
  • Gradually increase it; default max is 980
  • Avoid killing yourself in thread management – the bottleneck is likely somewhere else
• Use affinity mask to keep SQL Server off the cores running NIC traffic
• Well tuned, pure play OLTP
  • No need to consider parallel plans
  • sp_configure 'max degree of parallelism', 1

Page 17: Getting the Code Right

Page 18: To DTC or not to DTC: POS System

• Com+ transactional applications are still prevalent today
• This results in all database calls enlisting in a DTC transaction: 45% performance overhead
• Scenario in the lab involved two Resource Managers, MSMQ and SQL:

wait_type                 total_wait_time_ms   total_waiting_tasks_count   average_wait_ms
DTC_STATE                 5,477,997,934        4,523,019                   1,211
PREEMPTIVE_TRANSIMPORT    2,852,073,282        3,672,147                   776
PREEMPTIVE_DTC_ENLIST     2,718,413,458        3,670,307                   740

• Tuning approaches
  1. Optimize DTC TM configuration (transparent to app)
  2. Remove DTC transactions (requires app changes)
     • Utilize System.Transactions, which will only promote to DTC if more than one RM is involved
     • See Lightweight transactions: http://msdn.microsoft.com/en-us/magazine/cc163847.aspx#S5

Page 19: Optimizing DTC Configuration

• By default, application servers use the local TM (MSDTC coordinator)
  • Introduces RPC communication between SQL TM and app server TM
  • App virtualization layer incurs 'some' delay
• Configuring application servers to use a remote coordinator removes RPC communication
• See Mike Ruthruff's paper on SQLCAT.COM: http://sqlcat.com/msdnmirror/archive/2010/05/11/resolving-dtc-related-waits-and-tuning-scalability-of-dtc.aspx

Page 20: Things to Double Check

• Connection pooling enabled?
• How much connection memory are we using?
  • Monitor perfmon: MSSQL: Memory Manager
• Obvious memory or handle leaks?
  • Check Process counters in perfmon for the .NET app
  • Server side processes will keep memory unless under pressure
• Can the application handle the load?
  • Call into dummy procedures that do nothing
  • Check measured application throughput
  • Typical case: application breaks before SQL

Page 21: Remote Calling from WCF

• Original client code: synchronous calls in WCF
• Each thread must wait for network latency before proceeding
  • Around 1ms waiting
  • Very similar to disk I/O – thread will fall asleep
• Lots of sleeping threads
  • Limited to around 50 client simulations per machine
• Instead, use IAsyncInterface

Page 22: Fully Qualified Calls To Stored Procedures

• Developer uses EXEC myproc instead of EXEC dbo.myproc
• SQL acquires an exclusive lock (LCK_M_X) and prepares to compile the procedure; this includes calculating the object ID
• dm_exec_requests revealed almost all the sessions were waiting on LCK_M_X to compile a stored procedure
• SOS_CACHESTORE spins – GetOwnerBySID
• Workaround: make app user db_owner

Page 23: Tuning Data Modification

Page 24: Database Schema – Credit Cards

Tables
• Transaction (10**10 rows): Transaction_ID, Customer_ID, ATM_ID, Account_ID, TransactionDate, Amount, …
• Account (10**5 rows): Account_ID, LastUpdateDate, Balance, …
• ATM (10**3 rows): ID_ATM, ID_Branch, LastTransactionDate, LastTransaction_ID, …

Statements against each table
• Transaction: INSERT .. VALUES (@amount) and INSERT .. VALUES (-1 * @amount)
• ATM: UPDATE .. SET LastTransaction_ID = @ID + 1, LastTransactionDate = GETDATE()
• Account: UPDATE … SET Balance

Page 25: Summary of Concerns

• Transaction table is hot
  • Lots of INSERT
  • How to handle ID numbers?
  • Allocation structures in database
• Account table must be transactionally consistent with Transaction
  • Do I trust the developers to do this?
  • Cannot release lock until BOTH are in sync
  • What about latency of round trips for this?
• Potentially hot rows in Account
  • Are some accounts touched more than others?
• ATM table has hot rows
  • Each row is on average touched at least ten times per second
  • E.g. 10**3 rows with 10**4 transactions/sec

Page 26: Generating a Unique ID

Why won't this work?

CREATE PROCEDURE GetID
    @ID INT OUTPUT,
    @ATM_ID INT
AS
DECLARE @LastTransaction_ID INT

-- Read the current value
SELECT @LastTransaction_ID = LastTransaction_ID
FROM ATM
WHERE ATM_ID = @ATM_ID

SET @ID = @LastTransaction_ID + 1

-- Write the new value back in a separate statement
UPDATE ATM
SET LastTransaction_ID = @ID
WHERE ATM_ID = @ATM_ID

Page 27: Concurrency is Fun

ATM row: ID_ATM = 13, LastTransaction_ID = 42, …

Two sessions run the same code at the same time:

SELECT @LastTransaction_ID = LastTransaction_ID
FROM ATM
WHERE ATM_ID = 13

SET @ID = @LastTransaction_ID + 1

UPDATE ATM
SET LastTransaction_ID = @ID
WHERE ATM_ID = 13

Both sessions read @LastTransaction_ID = 42 before either UPDATE runs, so both end up with the same @ID (43).
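The same race can be replayed outside the database. The Python sketch below is only an illustration: `AtmRow` is a hypothetical in-memory stand-in for one ATM row, the racy schedule is written out step by step rather than run on real threads, and a `threading.Lock` plays the role of a single atomic read-and-increment.

```python
import threading

class AtmRow:
    """In-memory stand-in for one ATM row (hypothetical, for illustration only)."""

    def __init__(self, last_transaction_id):
        self.last_transaction_id = last_transaction_id
        self._lock = threading.Lock()

    def next_id_atomic(self):
        # Read and increment under one lock, so no other caller can slip in between.
        with self._lock:
            self.last_transaction_id += 1
            return self.last_transaction_id

def racy_interleaving(row):
    """Replay the schedule where both sessions SELECT before either one UPDATEs."""
    read_1 = row.last_transaction_id   # session 1: SELECT
    read_2 = row.last_transaction_id   # session 2: SELECT (same value)
    id_1 = read_1 + 1
    id_2 = read_2 + 1
    row.last_transaction_id = id_1     # session 1: UPDATE
    row.last_transaction_id = id_2     # session 2: UPDATE (overwrites with same value)
    return id_1, id_2

if __name__ == "__main__":
    print(racy_interleaving(AtmRow(42)))                 # duplicate IDs: (43, 43)
    row = AtmRow(42)
    print(row.next_id_atomic(), row.next_id_atomic())    # distinct IDs: 43 44
```

The fix is not a faster read but removing the gap between the read and the write.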

Page 28: Generating a Unique ID – The Right way

CREATE PROCEDURE GetID
    @ID INT OUTPUT,
    @ATM_ID INT
AS
-- Increment the column and capture the new value in one atomic statement
UPDATE ATM
SET @ID = LastTransaction_ID = LastTransaction_ID + 1
WHERE ATM_ID = @ATM_ID

And it is simple too...

Page 29: Hot rows in ATM

• Initial runs with a few hundred ATMs show excessive waits for LCK_M_U
  • Diagnosed in sys.dm_os_wait_stats
  • Drill down to individual locks using sys.dm_tran_locks
  • Inventive readers may wish to use XEvents
    • Event objects: sqlserver.lock_acquired and sqlos.wait_info
    • Bucketize them
• As concurrency increases, lock waits keep increasing
  • While throughput stays constant
  • Until...

Page 30: Spinning around

[Chart: lg(Spins) and Throughput plotted against Requests (0 to 100000). Spins axis is logarithmic, 1.00E+00 to 1.00E+14; Throughput axis runs from 0 to 20000.]

• Diagnosed using sys.dm_os_spinlock_stats
  • Pre SQL2008 this was DBCC SQLPERF(spinlockstats)
• Can dig deeper using XEvents with the sqlos.spinlock_backoff event
• We are spinning for LOCK_HASH

Page 31: LOCK_HASH – what is it?

[Diagram labels: ROW, Lock Manager, Thread, More Threads, LOCK_HASH, LCK_U]

• Why not go to sleep?

Page 32: Locking at Scale

• Ratio between ATM machines and transactions generated was too low
  • Can only sustain a limited number of locks/unlocks per second
  • Depends a LOT on NUMA hardware, memory speeds and CPU caches
  • Each ATM was generating 200 transactions / sec in the test harness
• Solution: increase the number of ATM machines
  • Key Takeaway: If a locked resource is contended, create more of it
  • Notice: This is not SQL Server specific; any piece of code will be bound by memory speeds when access to a region must be serialized

Page 33: Hot rows in Account

• Three ways to update the Account table
  1) Let application servers invoke a transaction that both INSERTs into Transaction and UPDATEs Account
  2) Set a trigger on Transaction
  3) Create a stored proc that handles the entire transaction
• Option 1 has two issues:
  • App developers may forget it in some code paths
  • Latency of roundtrip: around 1ms, i.e. no more than 1000 locks/sec possible on a single row
• Option 2 is the better choice!
• Option 3 must be used in all places in the app to be better than option 2
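The "1ms means at most 1000 locks/sec" figure is plain arithmetic, sketched below in Python. The function names are invented for illustration, and the model assumes the row lock is held for the whole round trip with no overlap between holders.

```python
def max_serialized_ops_per_sec(hold_time_ms):
    """Upper bound on operations/sec when each one holds the same lock for hold_time_ms."""
    return 1000.0 / hold_time_ms

def per_row_rate(total_ops_per_sec, rows):
    """Average operations/sec hitting each row when load is spread evenly."""
    return total_ops_per_sec / rows

if __name__ == "__main__":
    # A 1 ms round trip while holding the lock caps a single row at 1000 updates/sec.
    print(max_serialized_ops_per_sec(1.0))
    # 10**4 transactions/sec over 10**3 ATM rows is 10 updates/sec per row on average.
    print(per_row_rate(10**4, 10**3))
```

Anything that shortens the time the lock is held, such as keeping the whole transaction on the server, raises the ceiling proportionally.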

Page 34: Hot Latches!

• LCK waits are gone, but we are seeing very high waits for PAGELATCH_EX
  • High = more than 1ms
• What are we contending on?
• Latch: a lightweight semaphore
  • Locks are logical (transactional consistency)
  • Latches are internal to the SQL engine (memory consistency)
• Because rows are small (many fit on a page), multiple locks may compete for one PAGELATCH

[Diagram: one 8K page holding four rows; two LCK_U row locks compete for a single PAGELATCH_EX on the page.]

Page 35: Row Padding

• In the case of the ATM table, our rows are small and few
• We can "waste" a bit of space to get more performance
• Solution: pad rows with a CHAR column to make each row take a full page
  • 1 LCK = 1 PAGELATCH

[Diagram: one 8K page holding a single row padded with CHAR(5000); one LCK_U maps to one PAGELATCH_EX.]

ALTER TABLE ATM ADD Padding CHAR(5000) NOT NULL DEFAULT ('X')
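A rough Python sketch of why CHAR(5000) is enough: an 8K page has a 96-byte header, and each row costs its own bytes plus a 2-byte slot entry, so any row larger than half the usable space leaves room for only one row per page. The 40-byte and 5040-byte row sizes are assumed example values, and per-row overhead inside the row is ignored.

```python
PAGE_BYTES = 8192
PAGE_HEADER_BYTES = 96
SLOT_BYTES = 2   # one slot-array entry per row

def rows_per_page(row_bytes):
    """Approximate number of rows of a given size that fit on one 8K page."""
    usable = PAGE_BYTES - PAGE_HEADER_BYTES   # 8096 bytes for rows and slot array
    return usable // (row_bytes + SLOT_BYTES)

if __name__ == "__main__":
    print(rows_per_page(40))     # small unpadded row (assumed size): many rows share a page
    print(rows_per_page(5040))   # same row plus CHAR(5000) padding: one row per page
```

With roughly a thousand ATM rows, the cost of the padding is on the order of a thousand pages, which is small next to the contention it removes.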

Page 36: INSERT throughput

• Transaction table is by far the most active table
• Fortunately, only INSERT
  • No need to lock rows
  • But several rows must still fit on a single page
• Cannot pad pages – there are 10**10 rows in the table
• A new page will eventually be allocated, but until it is, every insert goes to the same page
• Expect: PAGELATCH_EX waits
  • And this is the observation

Page 37: Hot page at the end of B-tree with increasing index

[Chart: Inserts/sec (0 to 35000) against number of client threads (1 to 150).]

Page 38: Waits & Latches

• Dig into details with:
  • sys.dm_os_wait_stats
  • sys.dm_os_latch_stats

wait_type           % Wait Time
PAGELATCH_SH        86.4%
PAGELATCH_EX        8.2%
LATCH_SH            1.5%
LATCH_EX            1.0%
LOGMGR_QUEUE        0.9%
CHECKPOINT_QUEUE    0.8%
ASYNC_NETWORK_IO    0.8%
WRITELOG            0.4%

latch_class                         wait_time_ms
ACCESS_METHODS_HOBT_VIRTUAL_ROOT    156,818
LOG_MANAGER                         103,316

Page 39: How to Solve INSERT hotspot

• Hash partition the table
  • Create multiple B-trees
  • Round robin between the B-trees creates more resources and less contention
• Do not use a sequential key
  • Distribute the inserts all over the B-tree

[Diagram, left: INSERTs routed by hashID (0 to 7) into eight B-trees holding IDs 0,8,16 / 1,9,17 / 2,10,18 / 3,11,19 / 4,12,20 / 5,13,21 / 6,14,22 / 7,15,23. Right: INSERTs spread across key ranges 0-1000, 1001-2000, 2001-3000 and 3001-4000.]

Page 40: Design Pattern: Table "Hash" Partitioning

• Create new filegroup or use existing to hold the partitions
  • Equally balance over LUNs using optimal layout
• Use CREATE PARTITION FUNCTION command
  • Partition the tables into #cores partitions
• Use CREATE PARTITION SCHEME command
  • Bind partition function to filegroups
• Add hash column to table (tinyint or smallint)
  • Calculate a good hash distribution
  • For example, use hashbytes with modulo or binary_checksum

[Diagram: hash partitions numbered 0 to 255.]

Page 41: Table Partitioning Example

-- Create the partition function and scheme
CREATE PARTITION FUNCTION [pf_hash16] (tinyint) AS RANGE LEFT FOR VALUES
(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15)

CREATE PARTITION SCHEME [ps_hash16] AS PARTITION [pf_hash16] ALL TO ( [ALL_DATA] )

-- Add the computed column to the existing table (this is an OFFLINE operation if done the simple way)
-- Consider using bulk loading techniques to speed it up.
ALTER TABLE [dbo].[Transaction]
ADD [HashValue] AS (CONVERT([tinyint], abs(binary_checksum([uidMessageID])%(16)),(0))) PERSISTED NOT NULL

-- Create the index on the new partitioning scheme
CREATE UNIQUE CLUSTERED INDEX [IX_Transaction_ID] ON [dbo].[Transaction] ([Transaction_ID], [HashValue]) ON ps_hash16(HashValue)

Note: Requires application changes. Ensure SELECT/UPDATE/DELETE have appropriate partition elimination.
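To make the effect of the hash column concrete, here is a small Python sketch of how sequential IDs spread over partitions. It swaps in a plain modulo on the ID as the hash, which is a simplification chosen for readability; the slide's example uses binary_checksum on a column instead. The function names are hypothetical.

```python
from collections import defaultdict

def hash_partition(transaction_id, partitions):
    """Stand-in hash: plain modulo on the ID (the slide uses binary_checksum % 16)."""
    return transaction_id % partitions

def distribute(ids, partitions):
    """Group IDs by the partition (B-tree) they would be inserted into."""
    buckets = defaultdict(list)
    for transaction_id in ids:
        buckets[hash_partition(transaction_id, partitions)].append(transaction_id)
    return dict(buckets)

if __name__ == "__main__":
    # 24 sequential IDs over 8 partitions: consecutive inserts hit different B-trees.
    for partition, ids in sorted(distribute(range(24), 8).items()):
        print(partition, ids)
```

Each partition still receives increasing keys, so every B-tree keeps one hot last page, but there are now as many hot pages as partitions instead of one.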

Page 42: Lab Example: Before Partitioning

Latch waits of approximately 36 ms at baseline of 99 checks/sec.

Page 43: Lab Example: After Partitioning*

*Other optimizations were applied

Latch waits of approximately 0.6 ms at highest throughput of 249 checks/sec.

Page 44: B-Tree Root Split

[Diagram labels: Virtual Root with SH LATCH (ACCESS_METHODS_HOBT_VIRTUAL_ROOT); LCK X; PAGELATCH SH and PAGELATCH EX on the pages involved in the split; Prev/Next page links.]

Page 45: NUMA and What to do

• Remember those PAGELATCH waits for UPDATE statements?
  • Our solution: add more pages
  • Improvement: get out of the PAGELATCH fast so the next one can work on it
• On NUMA systems, going to a foreign memory node is at least 4-10 times more expensive
  • Use the Sysinternals CoreInfo tool

Page 46: How does NUMA work in SQL Server?

• The first NUMA node to request a page will "own" that page
  • Ownership continues until the page is evicted from the buffer pool
• Every other NUMA node that needs that page will have to do foreign memory access
• Additional (SQL 2008) feature is SuperLatch
  • Useful when a page is read a lot but written rarely
  • Only kicks in on 32 cores or more
  • The "this page is latched" information is copied to all NUMA nodes
  • Acquiring a PAGELATCH_SH only requires local NUMA access
  • But: acquiring PAGELATCH_EX must signal all NUMA nodes
  • Perfmon object: MSSQL:Latches
    • Number of SuperLatches
    • SuperLatch demotions / sec
    • SuperLatch promotions / sec
  • See CSS blog post

Page 47: Effect of UPDATE on NUMA traffic

[Diagram: app servers (4 RS servers) issue UPDATE ATM SET LastTransaction_ID for ATM_ID 0 to 3 against a server with NUMA nodes 0 to 3.]

Page 48: Using NUMA affinity

[Diagram: four groups of 4 RS servers connect on ports 8000, 8001, 8002 and 8003; each port is mapped to one NUMA node (0 to 3), and each node handles UPDATE ATM SET LastTransaction_ID for one of ATM_ID 0 to 3.]

How to: Map TCP/IP Ports to NUMA Nodes
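On the client side, the idea boils down to always sending the same ATM to the same port. The Python sketch below shows one possible routing rule; the modulo scheme and the function name are assumptions made for illustration, the slide only shows ATM_ID 0 to 3 and ports 8000 to 8003. Mapping each port to a NUMA node is done in the SQL Server network configuration and is not shown here.

```python
BASE_PORT = 8000   # first port in the diagram
NUMA_NODES = 4     # NUMA 0 to 3 in the diagram

def port_for_atm(atm_id, base_port=BASE_PORT, numa_nodes=NUMA_NODES):
    """Pick the TCP port for an ATM so its row is always updated via the same port."""
    # Assumed routing rule: ATM_ID modulo the number of NUMA nodes.
    return base_port + (atm_id % numa_nodes)

if __name__ == "__main__":
    for atm_id in (0, 1, 2, 3, 13):
        print(atm_id, "->", port_for_atm(atm_id))
```

Because a given ATM row is then only ever touched from one node, the page holding it stays local to that node instead of bouncing between nodes.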

Page 49: Final Results and thoughts

• 120,000 Batch Requests / sec
• 100,000 SQL Transactions / sec
• 50,000 SQL Write Transactions / sec
• 12,500 Business Transactions / sec
• CPU load: 34 CPU cores busy
• Given more time, we would get the CPUs to 100%, tune the NICs more, and work on balancing NUMA more
• As for NICs, we only had two and they were loading two CPUs at 100%

Page 50: Q&A

Page 51: Coming up…

P/X001    How to Get Full Access to a Database Backup in 3 Minutes or Less (Idera)
P/L001    End-to-end database development has arrived (Red Gate)
P/L002    Weird, Deformed, and Grotesque – Horrors Stories from the World of IT (Quest)
P/L005    Expert Query Analysis with SQL Sentry (SQLSentry)
P/T007    Attunity Change Data Capture for SSIS (Attunity)

#SQLBITS

Page 52:

© 2008 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.

The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.