Designing Highly Scalable OLTP Systems

Thomas Kejser: Principal Program Manager, Microsoft
Ewan Fairweather: Program Manager, Microsoft


Transcript of Designing Highly Scalable OLTP Systems

#1 Designing Highly Scalable OLTP Systems
Thomas Kejser: Principal Program Manager
Ewan Fairweather: Program Manager
Microsoft

#2 Agenda
- Windows Server 2008R2 and SQL Server 2008R2 improvements
- Scale architecture
- Customer Requirements
- Hardware setup
- Transaction log essentials
- Getting the code right
- Application Server Essentials
- Database Design
- Tuning Data Modification: UPDATE statements, INSERT statements, management of LOB data
- The problem with NUMA and what to do about it
- Final results and Thoughts

#3 Top statistics

Category                                      Metric
Largest single database                       80 TB
Largest table                                 20 TB
Biggest total data, one customer              2.5 PB
Highest transactions per second, one database 36,000
Fastest I/O subsystem in production           18 GB/sec
Fastest real-time cube                        15 sec latency
Data load for 1 TB                            20 minutes
Largest cube                                  4.2 TB

#4 Upping the Limits
- Before 2008R2, Windows was limited to 64 cores, and the kernel was tuned for that configuration
- With Windows Server 2008R2 this limit is now upped to 1024 cores
- New concept: Kernel Groups, a bit like NUMA but an extra layer in the hierarchy
- SQL Server generally follows suit, but for now 256 cores is the limit on R2
- Currently, the largest x64 machine is 128 cores, and the largest IA-64 is 256 hyperthreads (at 128 cores)
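SQL Server's view of this layout is queryable. As a minimal sketch (our query, not from the deck), the sys.dm_os_nodes DMV shows one row per SQLOS node, and on 2008R2 it includes the kernel (processor) group each node belongs to:

    -- One row per SQLOS node (roughly one per NUMA node); processor_group
    -- (new in SQL Server 2008R2) shows the kernel group the node sits in.
    -- The dedicated admin connection (DAC) node is filtered out.
    SELECT node_id,
           node_state_desc,
           online_scheduler_count,
           processor_group
    FROM   sys.dm_os_nodes
    WHERE  node_state_desc NOT LIKE '%DAC%';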

#5 The Path to the Sockets

Windows OS
  Kernel Group 0: NUMA nodes 0-7
  Kernel Group 1: NUMA nodes 8-15
  Kernel Group 2: NUMA nodes 16-23
  Kernel Group 3: NUMA nodes 24-31
Hardware (detail of NUMA nodes 6 and 7): each NUMA node holds CPU sockets, each socket holds CPU cores, and each core runs two hyperthreads (HT).

#6 And we measure it like this
- Sysinternals CoreInfo: http://technet.microsoft.com/en-us/sysinternals/cc835722.aspx
- Nehalem-EX: every socket is a NUMA node
- How fast is your interconnect?
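As a cross-check from inside SQL Server, a small query over sys.dm_os_schedulers (again our sketch, not from the deck) should report the same scheduler-per-NUMA-node counts that CoreInfo derives from the hardware:

    -- Count visible online schedulers per NUMA node; the totals should
    -- match the core counts CoreInfo reports for each node.
    SELECT parent_node_id AS numa_node,
           COUNT(*)       AS online_schedulers
    FROM   sys.dm_os_schedulers
    WHERE  status = 'VISIBLE ONLINE'
    GROUP  BY parent_node_id
    ORDER  BY parent_node_id;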

#7 And it Looks Like This...
[Slide content is an image that is not captured in this transcript.]

#8 Customer Scenarios

Core Banking
  Workload: Credit card transactions from ATMs and branches
  Scale requirements: 10,000 business transactions/sec
  Technology: App tier on .NET 3.5/WCF; SQL Server 2008R2 on Windows Server 2008R2
  Hardware: HP Superdome

Healthcare System
  Workload: Sharing patient information across multiple healthcare trusts
  Scale requirements: 37,500 concurrent users
  Technology: App tier on .NET (virtualized); SQL Server 2008R2 on Windows Server 2008R2
  Hardware: HP DL785 G6

POS
  Workload: World-record deployment of an ISV POS application across 8,000 US stores
  Scale requirements: Handle peak holiday load of 228 checks/sec
  Technology: App tier on COM+ and Windows 2003; SQL Server 2008 on Windows Server 2008
  Hardware: IBM 3950 and HP DL980

#9 Hardware Setup: Database Files
- The number of database files should be at least 25% of CPU cores; this alleviates PFS contention (PAGELATCH_UP)
- There is no significant point of diminishing returns up to 100% of CPU cores, but manageability is an issue, though Windows 2008R2 makes it much easier
- TempDb: PFS contention is a larger problem here, as it is an instance-wide resource (deallocations and allocations, RCSI version store, triggers, temp tables)
  - The number of files should be exactly 100% of CPU threads
  - Presize at 2 x physical memory
- Data files and TempDb on the same LUNs: it is all random anyway, so don't sub-optimize; IOPS is a global resource for the machine, and the goal is to avoid PAGEIOLATCH waits on any data file
- Example: a dedicated XP24K SAN, ~500 spindles in 64 LUNs (RAID5 7+1), no more than 4 HBAs per LUN via MPIO
- Key takeaway: Script it! At this scale, manual work WILL drive you insane (a sketch of such a script follows after the transaction log notes below)

#10 Special Consideration: Transaction Log
- The transaction log is a set of 127 linked buffers, with at most 32 outstanding I/Os
- Each buffer is 60KB, and multiple transactions can fit in one buffer
- BUT: a buffer must flush before the log manager can signal a commit OK
- Pre-allocate the log file; use DBCC LOGINFO to check existing systems
- Transaction log throughput was ~80MB/sec, but we consistently got
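A minimal sketch of the two log practices the slide calls out, checking the VLF layout and pre-allocating the log in one step; the database and logical file names here are placeholders:

    -- List the virtual log files (VLFs) in the log; a long list of small
    -- VLFs usually means the log grew through many small autogrowths.
    DBCC LOGINFO ('MyDatabase');

    -- Pre-allocate the log to its working size in one operation rather
    -- than relying on repeated autogrowth (names and size are examples).
    ALTER DATABASE MyDatabase
    MODIFY FILE (NAME = MyDatabase_log, SIZE = 64GB);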
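And for the Hardware Setup slide's "Script it!" takeaway, a hedged sketch that follows the one-file-per-CPU-thread guidance for TempDb; the drive path, naming scheme, and per-file size are assumptions to adapt (the deck's sizing rule is 2 x physical memory for TempDb as a whole):

    -- Add tempdb data files until there is one per CPU thread reported
    -- by SQLOS. File 1 (tempdev) already exists, so start at 2.
    DECLARE @cpus int, @i int, @sql nvarchar(max);
    SELECT @cpus = cpu_count FROM sys.dm_os_sys_info;
    SET @i = 2;
    WHILE @i <= @cpus
    BEGIN
        SET @sql = N'ALTER DATABASE tempdb ADD FILE '
                 + N'(NAME = tempdev' + CAST(@i AS nvarchar(10))
                 + N', FILENAME = N''T:\TempDb\tempdev'
                 + CAST(@i AS nvarchar(10)) + N'.ndf'', SIZE = 8GB);';
        EXEC sys.sp_executesql @sql;
        SET @i = @i + 1;
    END;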