Post on 18-Jan-2018
Storage in Microsoft Exchange Server 2010
Matt Gossage
Senior Program Manager
Microsoft Corporation
UNC321
Agenda
Exchange storage background
Storage technology 2010+
Large mailbox value
E2010 storage architecture
Store innovations
ESE database innovations
E2010 storage design
Summary
Exchange 2003 HA/Storage Design
MSIT 4+3 SCC SAN example
4 Active Nodes
3 Passive Nodes
8 Processor cores
4 GB of RAM
4000 Users/Server
250 MB Mailboxes
Backups:
Daily Full
Stream to disk/tape
SAN Fabric B
SAN Fabric A
+1 IOPS/Mailbox
RAID10 3.5” 10K FC Disks
Storage is single point of failure
Exchange 2007 HA/Storage Design
MSIT CCR + DAS example
File Share Witness
Hub Transport
Server:
Transport Dumpster
Public Network
Private Network
Active Node Passive Node
CCR
RAID
Transaction Log Shipping
Replay
RAID
RAID
RAID
RAID5 2.5” 10K SAS Disks
.33 IOPS/Mailbox
No single points of failure!
~4000 Mailboxes/Cluster
8 Processor cores
16 GB of RAM
2 GB Mailboxes
Backups: DPM
15 min Incremental
Daily Express Full
Disk Technology
Disk capacity trend predicted to continue
2TB desktop-class SATA disks available
1TB Nearline/Midline SAS disks available
Sequential throughput increasing linearly based on areal density
2010 SATA = ~250MB/sec
Random I/O performance not expected to improve substantially
15K RPM is the ceiling
Random vs. Sequential Disk IO
Random IO
Disk head has to move to process subsequent IO
Head movement = High IO latency
Seek latency limits IOPS
Sequential IO
Disk head does not move to process subsequent IO
Stationary head = Low IO latency
Disk RPM speed limits IOPS
7.2K SATA Disk (20ms latency)
Random = 50 IOPS
Sequential = +300 IOPS!
IOPS = Input/Output operations (IOs) per second
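The 50 IOPS figure follows directly from the 20ms latency quoted for the 7.2K SATA disk. A minimal sketch of that arithmetic, assuming one outstanding IO at a time (the function names are illustrative, not from the deck):

```python
def iops_from_latency(latency_ms: float) -> float:
    """IOs per second a single disk completes if each IO takes latency_ms."""
    return 1000.0 / latency_ms


def latency_from_iops(iops: float) -> float:
    """Average per-IO latency (ms) implied by a given IOPS rate."""
    return 1000.0 / iops


print(iops_from_latency(20.0))            # random IO at 20ms per IO
print(round(latency_from_iops(300), 2))   # latency implied by 300 sequential IOPS
```

Read the other way, the "+300 IOPS" sequential figure implies an average service time of roughly 3.3ms per IO or better.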
FLASH/SSD: E2010 Scenarios
Cache
SSD
PCM
Enterprise SAN ArrayHybrid HDD
SATA SSD
NAND
NAND HBA / RAID
Flash best utilized by E2010 when used as a cache within storage stack
E2010 Mailbox Server
?
E-mail Trends
The average corporate user, today, can expect to send and receive about 156 messages a day, and this number is expected to grow to about 233 messages a day by 2012. An increase of 33% over the four-year period. (Radicati, 2008)
Business users report that they currently spend 19% of their work day, or close to 2 hours/day, on email. (Radicati, 2007)
[Chart: Messages Sent/Received Per User/Day for 2008, 2010 and 2012; y-axis 0 to 250]
Large Mailbox Value
Large Mailbox = 1-10GB+
“Aggregate Mailbox” = Primary mailbox + Archive Mailbox
~1 Year of mail (minimum)
Increased knowledge worker productivity
Reduced mailbox management
Client Accessibility (Outlook/OWA/Mobile)
Eliminate/Reduce PST’s
Eliminate/Reduce 3rd Party Archive
Time      Items     Mailbox Size (MB)
1 Day     200       10
1 Month   4,000     200
1 Year    48,000    2,400
4 Years   192,000   9,600
*Very Heavy Profile = 150 Receive + 50 Send/Day, 50KB, no deletions
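The growth table can be reproduced from the Very Heavy Profile definition. The sketch below assumes 20 working days per month and decimal megabytes (1 MB = 1000 KB), which is what the slide's numbers imply; neither assumption is stated explicitly in the deck.

```python
ITEMS_PER_DAY = 150 + 50      # received + sent (Very Heavy Profile)
AVG_ITEM_KB = 50              # average message size from the slide
WORK_DAYS_PER_MONTH = 20      # assumption implied by 4,000 items/month


def mailbox_growth(work_days: int) -> tuple[int, float]:
    """Return (item count, mailbox size in MB) after work_days with no deletions."""
    items = ITEMS_PER_DAY * work_days
    size_mb = items * AVG_ITEM_KB / 1000   # decimal MB, as the slide uses
    return items, size_mb


for label, days in [("1 Day", 1), ("1 Month", 20), ("1 Year", 240), ("4 Years", 960)]:
    items, size_mb = mailbox_growth(days)
    print(f"{label:8} {items:>8,} items {size_mb:>8,.0f} MB")
```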
Large Mailbox Challenges & Solutions
Client Experience
Performance Improvements: Office 2007 SP2 (KB953195)
Updated OST sizing guidance (10GB)
Utilize the E2010 Archive Mailbox to reduce data cached to OST
E2010 Store/ESE changes
Outlook 2007 Performance (Cached Mode)
Outlook 2007 (Online)/OWA Performance
Items/folder Limitations
View Creation Performance
Client Search Performance
E2010 Store/ESE changes
E2010 Search Performance Improvements
Real-time result views
2x increase in indexing performance
E2010 Store/ESE changes
Large Mailbox Challenges & Solutions
Deployment/Operations
Backup off passive copies
Daily Incremental/Weekly Full backups
DPM Express Full Backups
E2010 HA + Hold Policy is your backup
Long Backup Times
Fast Recovery Requirements (RTO)
High Storage Costs
IOPS (efficiently utilizing low performance/high capacity disks)
RAID overhead
E2010 HA
E2010 Store/ESE changes
Move Mailbox Downtime E2010 Online Move Mailbox
Database Maintenance
Online Maintenance Duration (OLD)
DB corruption (-1018) pain point
DB re-seed performance hit on active copy
E2010 Store/ESE changes
Exchange 2010 Storage Vision
IO Reduction
Sequential IO
Large, Fast, Low-cost Mailboxes
SATA/Tier 2 Disk Optimization
Storage Design Flexibility
RAID’less Storage (JBOD)
IOPS Reductions: Store Schema Changes
Store Schema = The way the Store organizes data in the ESE Database
E2010: One simple theme
Move away from doing many, random, small size, disk IOs to doing fewer, sequential, large size, disk IOs.
Significant Benefits
Fast/Efficient…
OWA/Outlook Online Mode
…end user viewing for “cold” states/first time view creation
…Calendar Operations
…Search performance
Outlook Cached Mode/Exchange Active Sync
OST sync = sequential IO
EAS sync = sequential IO
Server Management
…Move mailbox
…Content Index Crawls
IOPS Reduction: Store Table Architecture
E2007
Message/Folder Table (MFT)
Joe:Inbox:H3
Joe:Inbox:H2
Joe:Inbox:H1
Per Database Per Folder
Mailbox Table
Jeff’s Mbx
Ann’s Mbx
Joe’s Mbx
Attachments Table
Jeff:Excel.xls
Ann:Pic.bmp
Joe:Help.doc
Message Table (Msg)
Joe:Msg10
Jeff:Msg32
Ann:Msg180
Folders Table
Jeff:Inbox
Ann:Drafts
Joe:Unread
E2010
View Tables (e.g. From)
Joe:H920
Joe:H302
Joe:H10
Secondary Indexes used for Views
Per Mailbox
Mailbox Table
Jeff’s Mbx
Ann’s Mbx
Joe’s Mbx
Body
Joe:Msg10
Joe:Help.doc
Joe:Msg302
Message Header Table
Joe:H10
Joe:H302
Joe:H920
Folders Table
Joe:Inbox
Joe:Drafts
Joe:Unread
Per Database
New Store Schema = no more single instance storage within a DB
Per View
Store Schema Changes: Physical Contiguity
1078
B+ Tree
92 4577 6 872 7210 3278 21 9346
1078
B+ Tree
1079 1080 1081 1082 1083 3456 3457 3458
E2007
E2010
Many, small size, IOs (1 per 8K page)
Fewer, larger size, sequential IOs
DB Pages (Page Numbers)
B+Tree = Table
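The two rows of page numbers in the diagram show why contiguity matters. As a rough sketch, assume every run of consecutive page numbers can be fetched with one IO (this ignores any maximum IO size, and the helper name is illustrative):

```python
def count_ios(pages: list[int]) -> int:
    """Number of IOs needed if each run of consecutive page numbers is one IO."""
    ios = 0
    previous = None
    for page in sorted(pages):
        if previous is None or page != previous + 1:
            ios += 1          # a new run starts here
        previous = page
    return ios


e2007_leaf_pages = [92, 4577, 6, 872, 7210, 3278, 21, 9346]
e2010_leaf_pages = [1079, 1080, 1081, 1082, 1083, 3456, 3457, 3458]

print(count_ios(e2007_leaf_pages))   # scattered pages: one IO each
print(count_ios(e2010_leaf_pages))   # two contiguous runs
```

With the page numbers shown on the slide, the scattered E2007 layout needs eight IOs while the E2010 layout needs two.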
Store Schema Changes: Logical Contiguity
E2007
E2010
Many, small size, IOs
Fewer, large size, IOs
Inbox
M1
Calendar
M3
Drafts
M5
For Follow-up
M4
DL Mail
M2
Mailbox
DL Mail M1
Calendar M2
Drafts M3
For Follow-up M4
Inbox M5
Mailbox
Random
Sequential
Store Schema Changes: Lazy View Updates
E2007
E2010
Many, random, IOs (1 per update)
Fewer, sequential, IOs (1 per view)
All Unread or Flagged items (view)
TimeM1 arrives M2 arrives M1 flagged M3 arrives M2 deleted
User uses OWA/Outlook Online and switches to this view
All Unread or Flagged items (view)
M1 M2 M1 M3 M2
M1 M2 M1 M3 M2
Nickel & Dime Approach
Pay to Play Approach
DB I/O
Reducing IO by deferring view updates
View updates utilize sequential IO
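The difference between the "Nickel & Dime" and "Pay to Play" approaches can be sketched with the five events from the timeline. This is a toy model that counts one view IO per update pass; the class names are hypothetical and it does not represent the Store's actual implementation.

```python
EVENTS = ["M1 arrives", "M2 arrives", "M1 flagged", "M3 arrives", "M2 deleted"]


class EagerView:
    """E2007-style: the view is updated on every event (one IO per update)."""

    def __init__(self):
        self.view_ios = 0

    def on_event(self, event):
        self.view_ios += 1

    def open_view(self):
        pass  # the view is already current


class LazyView:
    """E2010-style: events are queued and applied in one pass when the view is opened."""

    def __init__(self):
        self.view_ios = 0
        self.pending = []

    def on_event(self, event):
        self.pending.append(event)

    def open_view(self):
        if self.pending:
            self.view_ios += 1   # one sequential pass applies all queued updates
            self.pending.clear()


def simulate(view) -> int:
    for event in EVENTS:
        view.on_event(event)
    view.open_view()
    return view.view_ios


print(simulate(EagerView()), simulate(LazyView()))
```

If the user never switches to the view, the lazy variant performs no view IO at all.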
Outlook 2007 SP2 Large Mailbox Performance on E2010
demo
IOPS Reduction: ESE Changes
Optimize for new Store Schema
Allocate database space in contiguous manner
Maintain database contiguity over time
Utilize space efficiently (Database compression)
Increase IO Sizes
DB page size increased from 8KB to 32KB
Improved read/write IO coalescing (Gap coalescing)
Provide improved async read capability (Pre-read)
Increase Cache Effectiveness
100MB Checkpoint Depth (HA configurations only)
DB Cache Compression (aka Dehydration)
DB Cache Priority (aka Fast Evict)
IOPS Reduction: Space Management
Allocate space based on contiguity
Page 1
Used
Page 3
Used
Disk
Database Space Allocation Hints:
• Allocate DB space based on either data compactness or data contiguity (usage pattern)
DB Cache
Page X
Msg Header
Page Y
Msg Header
Page Z
Event History
Contiguity
Space Contiguity
Space Compactness
Page 4
Msg Header
Page 5
Msg Header
Page 2
Event History
Sequential/Bloat
Random/Compact
IOPS Reduction: Maintain Contiguity
New database maintenance architecture
ESE Function: Cleanup (deleted items/mailboxes)
  E2007 SP1: Cleanup performed during Online Defrag (OLD), which occurs during the Online Maintenance (OLM) time window.
  E2010: Cleanup performed at run time (when hard delete occurs). Happens during Store dumpster cleanup (OLM); pages are zeroed by default.
ESE Function: Space Compaction
  E2007 SP1: Database is compacted and space reclaimed during Online Defrag (OLD).
  E2010: Database is compacted and space reclaimed at run time. Auto-throttled.
ESE Function: Maintain Contiguity
  E2007 SP1: N/A. Contiguity is compromised by space compaction.
  E2010: Database is analyzed for contiguity and space at run time and is defragmented in the background (B+Tree Defrag/OLD2). Auto-throttled.
ESE Function: Database Checksum
  E2007 SP1: When configured, ½ of OLD maintenance window reserved for sequential scan (Checksum), manual throttle. Active DB copy only.
  E2010: Two options (both Active and Passive copies):
    1. Run DB Checksum in the background 24x7 (default). Sequential IO
    2. Run DB Checksum during OLM window. Sequential IO
IOPS Reduction: DB Contiguity Results
E2007 Message Folder Table (aka MFT)
E2010 Message Header Table (aka MsgHeader)
Blue = contiguous (good)
Red = fragmented (bad)
*Production database analysis
Random Deletes at the tail
FRAGMENTED
CONTIGUOUS
DB Page Numbers
Mitigate DB Space Growth: Database Compression
Store Schema change, Space Hints, B+Tree Defrag & 32KB page size combine to increase DB file size by 20%.
Growth is 100% mitigated by Database Compression
7bit/XPRESS Compression for message headers and text/html bodies (Long Values)
DB Space Analysis: DB File Size Comparison
[Chart: relative DB file size. E2007/RTF = 1.00, E2010/RTF = 1.20, E2010/Mix = 1.00, E2010/HTML = 0.88. Callouts: Msg Views, 32KB Pages]
Counts                 E2007 SP1      E2010
Mailbox Count          750            750
Tables                 14754          92435
Secondary Indexes      85784          4557
Pages                  28,486,144     5,814,032
Used Pages (%)         85.7%          86.7%
Available Pages (%)    14.3%          13.3%
1 Database, 750 x 250MB mailboxes, RTF = RTF Compressed, Mix = 77% HTML, 15% RTF, 8% Text, Avg. Message size = ~50KB
IOPS Reduction: DB Page Size Increased to 32KB
Page 1
Msg Header
Page 2
X
Page 3
Msg Body
Disk
Page 4
X
Page 5
MsgBody
DBCache
Page 1
Msg Header
Page 3
Msg Body
Page 5
MsgBody
3 Read IO’s
Page 1 (32KB)
Msg Header, Msg Body
Disk
DBCache
1 Read IO
E2007 DB Read 20KB Message
E2010 DB Read 20KB Message
~20KB Message
8 KB Pages
32 KB Pages
Page 2 (32KB)
X
Page 1 (32KB)
Msg Header, Msg Body
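The 3-versus-1 read count in the diagram is a ceiling division of message size by page size. A minimal sketch, assuming each page that holds part of the message costs one read IO because the pages are not adjacent on disk (as in the E2007 picture):

```python
import math


def read_ios(message_kb: int, page_kb: int) -> int:
    """Pages (and therefore separate read IOs) needed to hold one message."""
    return math.ceil(message_kb / page_kb)


print(read_ios(20, 8))    # E2007: 8KB pages
print(read_ios(20, 32))   # E2010: 32KB pages
```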
IOPS Reduction: IO Gap CoalescingRead Case
Page 1
Msg Header
Page 2
X
Page 3
Msg Body
Disk
Page 4
X
Page 5
Msg Body
E2007 DB Read Behavior
E2010 DB Read Behavior
DBCache
Page 1
Msg Header
Page 3
Msg Body
Page 5
Msg Body
3 Read IO’s
Page 1
Msg Header
Page 2
X
Page 3
Msg Body
Disk
Page 4
X
Page 5
Msg Body
DBCache
Page 1
Msg Header
Page 3
Msg Body
Page 5
Msg Body
Page 2
Temp Buffer
Page 4
TempBuffer
1 Read IO
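Gap coalescing can be sketched as merging the wanted pages into ranges while bridging small gaps; the unwanted pages inside a range are read into a temporary buffer and discarded. The `max_gap` parameter is a hypothetical knob for illustration; the deck does not state the threshold ESE uses.

```python
def coalesce(pages: list[int], max_gap: int) -> list[tuple[int, int]]:
    """Group wanted page numbers into (first, last) IO ranges, bridging gaps of up to max_gap pages."""
    ranges: list[list[int]] = []
    for page in sorted(pages):
        if ranges and page - ranges[-1][1] - 1 <= max_gap:
            ranges[-1][1] = page          # extend the current IO across the gap
        else:
            ranges.append([page, page])   # start a new IO
    return [(first, last) for first, last in ranges]


def discarded_pages(pages: list[int], ranges: list[tuple[int, int]]) -> list[int]:
    """Pages read only to bridge a gap (the 'Temp Buffer' pages in the diagram)."""
    wanted = set(pages)
    return [p for first, last in ranges for p in range(first, last + 1) if p not in wanted]


wanted = [1, 3, 5]
print(coalesce(wanted, max_gap=0))                            # E2007-style: three IOs
print(coalesce(wanted, max_gap=1))                            # E2010-style: one IO
print(discarded_pages(wanted, coalesce(wanted, max_gap=1)))   # pages 2 and 4
```

The trade-off is a little wasted transfer (pages 2 and 4) in exchange for fewer, larger IOs.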
IOPS Reduction: 100MB Checkpoint Depth
Checkpoint Depth = The amount of data that is waiting to be committed to the database file (edb).
E2010 default Checkpoint Depth Max is increasing from 20MB to 100MB only on databases protected by E2010 HA (standalone still 20MB).
Loadgen Test: 3000 Mailbox, 12 DB, Outlook 2007 Online Very Heavy Profile
[Chart: Database Pages Repeatedly Written/sec and DB Writes/sec (avg) vs. Checkpoint Depth (MB), for depths of 20, 40, 60, 80, 100 and 200 MB]
100MB Checkpoint Depth = 40% DB write IO reduction
Deep Checkpoint Benefit = Efficient DB writes (~40% reduction)
Deep Checkpoint Risks = long store shutdown times, long crash recovery times.
Risk Mitigation: shut down databases in parallel, failover on store crash
IOPS Reduction: DB Cache Compression
Problem: New Store Schema + 32KB pages can reduce efficiency of cache. E.g. a page with 8KB of data consumes 32KB of memory in the DB Cache.
Solution: Implement DB Cache Compression to shrink partially used cached pages in memory, allowing a more effective cache.
Page 1 (32KB)
8KB
Disk
DBCache
Page 1 (32KB)
8KB
1. 32KB Page with only 8KB of data is read off disk
2. 32KB page is compressed to a 8KB in-memory image
Up to 30% more cache/mailbox server
More Cache = Less DB IO!
Page 1 (8KB)
8KB
IOPS Reduction: DB Cache Priority
Problem: Background and recovery DB operations can pollute the cache. E.g. DB checksumming, OLD2, HA log replay.
Solution: Implement DB Cache Priority to allow lower cache priorities for background/replay operations.
Now Past Future
DB Cache Time
Outlook Message Read
HA Log Replay (Passive)
DB Maintenance
Cache Eviction Cache Entry
ESE Caching Algorithm = LRU-K (Least Recently Used)
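The idea of giving background operations a lower cache priority can be illustrated with a toy cache. The sketch below uses plain LRU (not LRU-K) and a hypothetical `low_priority` flag that places a page at the eviction end of the queue; it is an illustration of the concept, not ESE's algorithm.

```python
from collections import OrderedDict


class PriorityCache:
    """Toy LRU cache where low-priority pages are placed next in line for eviction."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.pages = OrderedDict()   # first key = next page to evict

    def touch(self, page: str, low_priority: bool = False) -> None:
        if page in self.pages:
            if not low_priority:
                self.pages.move_to_end(page)    # user access refreshes the page
            return                              # background access leaves position unchanged
        if len(self.pages) >= self.capacity:
            self.pages.popitem(last=False)      # evict from the cold end
        self.pages[page] = True                 # new page enters at the hot end
        if low_priority:
            self.pages.move_to_end(page, last=False)   # ...unless it is background work


def run(low_priority_scan: bool) -> list[str]:
    cache = PriorityCache(capacity=3)
    for page in ["user1", "user2", "user3"]:
        cache.touch(page)
    for page in ["scan1", "scan2", "scan3", "scan4", "scan5"]:   # e.g. a checksum pass
        cache.touch(page, low_priority=low_priority_scan)
    return list(cache.pages)


print(run(low_priority_scan=False))   # the scan flushes every user page
print(run(low_priority_scan=True))    # the scan recycles a single cache slot
```

In this model a background scan at normal priority evicts all user pages, while the same scan at low priority only ever occupies one slot.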
Exchange 2010 Storage Speeds and Feeds
Mailbox IO Characteristics: E2007 vs. E2010
DB IO                     E2007         E2010
IO Type                   Random        “Sequentialish”
Read:Write                1:1           3:2
Avg Read IO Size (KB)     12            52
Avg Write IO Size (KB)    8             60
DB IO Sizes increase by 5x!!
Log IO                    E2007         E2010
IO Type                   Sequential    Sequential
Read:Write                0:1           0:1
Avg Read IO Size (KB)     n/a           n/a
Avg Write IO Size (KB)    10            10
Log IO Write Size is the same...
3000 Mailboxes, 12 DB’s, 4MB DBCache/Mailbox, Loadgen Outlook 2007 Online Very Heavy Profile, 250MB Mailbox Size
IOPS Reduction: E2007 vs. E2010 Results
[Chart: DB IOPS Comparison, E2007 vs. E2010. Series: DB Read IO/Sec, DB Write IO/Sec, DB IO/Sec; y-axis 0 to 500]
+70% Reduction!
3000 Mailboxes, 3MB DB Cache/user, Loadgen Outlook 2007 Online Very Heavy Profile, 250MB Mailbox Size, E2010 Beta
Exchange IOPS Trend
[Chart: DB IOPS/Mailbox for Exchange 2003, Exchange 2007 and Exchange 2010; y-axis 0 to 1]
+90% Reduction!
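The trend can be checked against the per-mailbox design figures quoted elsewhere in this deck (about 1.0 IOPS/mailbox for the Exchange 2003 SAN example, .33 for Exchange 2007 and .11 for Exchange 2010). A small sketch of the percentage arithmetic; treating those three figures as the chart's data points is an assumption:

```python
IOPS_PER_MAILBOX = {
    "Exchange 2003": 1.0,
    "Exchange 2007": 0.33,
    "Exchange 2010": 0.11,
}


def reduction_pct(before: float, after: float) -> float:
    """Percentage reduction going from `before` to `after`."""
    return (1 - after / before) * 100


print(round(reduction_pct(IOPS_PER_MAILBOX["Exchange 2003"], IOPS_PER_MAILBOX["Exchange 2010"])))
print(round(reduction_pct(IOPS_PER_MAILBOX["Exchange 2007"], IOPS_PER_MAILBOX["Exchange 2010"])))
```

Those figures work out to roughly 89% since Exchange 2003 and roughly 67% since Exchange 2007, in line with the rounded "+90%" and "+70%" headlines.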
Optimize for SATA/Tier 2 Disks
DB Write IO “Burstiness”
Problem: Bursty DB writes negatively affect DB read and Log write latency
• The more write IO’s issued at a time, the more disk contention.
[Chart: IO Latency Based on Max DB Write IO’s (ms). x-axis: Maximum DB Write IO's Issued (2, 4, 8, 16, 32, 64); y-axis: Latency (ms), 0 to 120; series: DB Read IO, Log Write IO]
Single 7.2k SATA disk, logs/db on same spindle, Loadgen load generating 250 RPC Operations/second, ~50 IOPS
Solution: Throttle DB writes based on Checkpoint target (QoS), DB Write Smoothing
DB Write Smoothing: Results
3000 Mailboxes, 3MB DB Cache/user, 12 x 7.2k SATA disks (DB/Logs on same spindles), Loadgen Outlook 2007 Online Very Heavy Profile
E2010 Smooth DB IO Benefit
Metric                    Exchange 2010 Baseline    Exchange 2010 Smooth DB IO
DB Read Latency (ms)      49                        34
Log Write Latency (ms)    3.7                       0.7
RPC Average Latency       10.1                      5.1
50% Reduction!
Putting It All Together: Mailboxes/Disk
[Chart: Mailboxes/Disk. Exchange 2007 = 125, Exchange 2010 = +500]
+4X Mailboxes/Disk!
E2010 Storage improvements cannot be quantified in IOPS reductions alone
250MB Mailbox Size, 3MB DB Cache/user, 12 x 7.2k SATA disks (DB/Logs on same spindles), Loadgen Outlook 2007 Online Very Heavy Profile, measured at <20ms RPC Average latency
JBOD/RAID'less Storage: Now an option!
JBOD: 1 disk = 1 Database/Log
Requires E2010 HA (3+ DB Copies)
Annual Disk Failure Rate (AFR) = ~5%
JBOD Advantages
Reducing Storage Costs/Complexity
Eliminates unnecessary DB copies: Server and Storage redundancy can be symmetrical
Reduces Disk IO: Eliminates RAID write penalty
Enables Simple Storage Design: 1 disk = 1 database
Enables Simple Storage Failure Recovery
JBOD Challenges
Exchange HA/Storage must replace RAID functionality
Disk Striping performance (e.g. RAID10) cannot be leveraged
Disk Failure = Database Failover (~30 second outage)
Re-enabling Resiliency = Spare disk assignment/partitioning/format/DB re-seed (scriptable)
Soft Disk Errors (bad blocks) must be detected and repaired
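To see why disk failure handling has to be routine in a JBOD design, the ~5% AFR can be applied to the 30 disks per node used in the design example later in this deck. The sketch assumes disk failures are independent, which is a simplification.

```python
def expected_failures_per_year(disks: int, afr: float = 0.05) -> float:
    """Expected number of disk failures per year for a population of disks."""
    return disks * afr


def prob_at_least_one_failure(disks: int, afr: float = 0.05) -> float:
    """Probability that at least one disk fails in a year, assuming independent failures."""
    return 1 - (1 - afr) ** disks


print(expected_failures_per_year(30))            # one node with 30 disks
print(expected_failures_per_year(90))            # a 3-node DAG
print(round(prob_at_least_one_failure(30), 3))   # chance a node sees a failure in a year
```

At this rate a 30-disk node should expect one to two disk failures a year, so each one triggering a ~30 second database failover plus a scripted re-seed is a normal operating event rather than an exception.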
JBOD/RAID'less Storage: E2010 Optimizations
Optimize HA Failovers/Switchovers
  Failovers < 30 seconds
  ESE tuned to maintain DB cache after failover (Cache warming)
Improve HA storage failure detection and failover
  HA now detects storage failures and automatically fails over
Improve storage failure detection (bad blocks/corruption)
  Active/Passive copy background scan (Checksum)
  Active/Passive copy Lost Write Detection
Improve Database Seeding/Repair
  Utilize DB passive copy for seeding source
  Seed capability for Content Index Catalog
  Reduce re-seeds by using Single Page Restore (Active and Passive)
JBOD/RAID'less Storage: Single Page Restore (Active)
1. Page corruption detected on Active Copy (e.g. -1018)
2. Active DB places marker in log stream to notify passive copies to ship up-to-date page
3. Passive receives log and replays up to marker, retrieves good page, invokes Replay Service callback and ships page
4. Active receives good page, writes page to DB. Page is restored.
5. Subsequent page repair from additional copies ignored
[Diagram: Database Availability Group (DAG) with Mailbox Server Nodes 1, 2 and 3 hosting DB1-Active, DB1-CopyA and DB1-CopyB; each copy has a Database (Page1, Page2, Page3) and a Log]
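The five restore steps for the active copy can be sketched as a toy simulation. The `DatabaseCopy` class and `restore_page` function are hypothetical, and the log-stream marker and replay are reduced to a simple loop; this only illustrates that the first good page patches the active copy and later responses are ignored.

```python
class DatabaseCopy:
    """A database copy modeled as a mapping of page number to page contents."""

    def __init__(self, name: str, pages: dict):
        self.name = name
        self.pages = dict(pages)


def restore_page(active: DatabaseCopy, passives: list, page_no: int):
    """Patch a corrupt page on the active copy from the first passive that ships a good page."""
    repaired_from = None
    for passive in passives:
        # Steps 2-3: each passive replays up to the marker and ships its copy of the page.
        good_page = passive.pages[page_no]
        if repaired_from is None:
            # Step 4: the first good page received is written to the active database.
            active.pages[page_no] = good_page
            repaired_from = passive.name
        # Step 5: pages shipped by additional copies are ignored.
    return repaired_from


active = DatabaseCopy("DB1-Active", {1: "A", 2: None, 3: "C"})   # step 1: page 2 is corrupt
copy_a = DatabaseCopy("DB1-CopyA", {1: "A", 2: "B", 3: "C"})
copy_b = DatabaseCopy("DB1-CopyB", {1: "A", 2: "B", 3: "C"})

source = restore_page(active, [copy_a, copy_b], page_no=2)
print(source, active.pages)
```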
E2010 HA Storage Design Flexibility
SAN
• HA = Shared Storage Clustering
• +1.0 IOPS/Mailbox
• 3.5” 15K 146GB FC Disks
• RAID10 for DB & Logs
• Dedicated Spindles
• Multi-path (HBA’s, FC Switches, SAN array controllers)
• Backup = Streaming off active
• Fast Recovery = Hardware VSS (Snapshots/Clones)
DAS (SAS)
• HA = CCR
• .33 IOPS/Mailbox
• 2.5” 146GB 10K SAS Disks
• RAID5 for DB
• RAID10 for Logs
• SAS Array Controller (/w BBU)
• Backup = VSS Snapshot
• Fast Recovery = CCR
DAS (SATA/Tier2)
• HA = DAG (2+ DB copies)
• .11 IOPS/Mailbox
• 3.5” 1TB 7.2K SATA/Tier2 Disks
• RAID10 for DB & Logs
• SAS Array Controller (/w BBU)
• Backup = VSS Snapshot/Optional
• Fast Recovery = Database Failover
JBOD (SATA/Tier2)
• HA = DAG (3+ DB copies)
• .11 IOPS/Mailbox
• 3.5” 1TB 7.2K SATA/Tier2 Disks
• 1 DB = 1 Disk
• SAS Array Controller (/w BBU)
• Backup = VSS Snapshot/Optional
• Fast Recovery = Database Failover
More options to reduce storage cost
E2010 Storage Design Flexibility
Exchange Online Archive provides mailbox storage flexibility
One Mailbox per user or two
E2010 optimized for DAS storage, SAN storage is fully supported
IOPS reductions/SATA optimizations enable lower performing storage
E2010 HA architected for DAS (simpler)
JBOD* and RAID storage support
E2010 optimized for Tier 2 (SATA) disks, Enterprise disks are fully supported
SSD storage supported but not recommended for mainstream due to high $/GB
Storage Groups are gone; Max 100 Databases/Server
Max recommended DB Size = 2TB*
Max recommended Folder Item Count = 100K**
*2+ copy E2010 HA only
** Assuming no 3rd party applications
E2010 Storage Requirements
Storage Guidance: Stand Alone | E2010 HA (2 copies) | E2010 HA (3+ copies)
Storage Type DAS, SAN (Fibre Channel, iSCSI)
Disk Type SAS, Fibre Channel, SATA/Tier2 , SSD
RAID RAID recommended RAID optional
RAID Type RAID-1/0, RAID-5, RAID-6 JBOD
DB/Log Isolation Best Practice Not required
Windows Disk Type Basic (recommended), Dynamic (supported)
Partition Type GPT (recommended), MBR (supported)
Partition Alignment Windows 2008/R2 Default (1MB)
File System NTFS
NTFS Allocation Unit Size 64KB for both database and log volumes
Encryption Support Outlook Protection Rules, Bitlocker
See Appendix for full details
E2010 HA/JBOD Storage Example
Single Site, 3 Node, 3 Copy DAG
DB1 DB1
DB1 DB2 DB3 DB4 DB5 DB6
DB7 DB8 DB9 DB10 DB11 DB12
DB13 DB14 DB15 DB16 DB17 DB18
DB19 DB20 DB21 DB22 DB23 DB24
DB25 DB26 DB27 DB28 DB29 DB30
Legend
Active copy Passive copy Spare Disk
DB1 DB1
DB1 DB2 DB3 DB4 DB5 DB6
DB7 DB8 DB9 DB10 DB11 DB12
DB13 DB14 DB15 DB16 DB17 DB18
DB19 DB20 DB21 DB22 DB23 DB24
DB25 DB26 DB27 DB28 DB29 DB30
DB1 DB1
DB1 DB2 DB3 DB4 DB5 DB6
DB7 DB8 DB9 DB10 DB11 DB12
DB13 DB14 DB15 DB16 DB17 DB18
DB19 DB20 DB21 DB22 DB23 DB24
DB25 DB26 DB27 DB28 DB29 DB30
Mbx Server 1
10,000 Mailboxes
3,333 Active Mailboxes/Server
3 Nodes, 3 Copies = double disk failure resiliency
8 Cores, 32GB RAM per Mbx Server
2GB Mailbox Size
.11 IOPS/Mailbox
1TB 7.2k disks (SAS/SATA/Tier2)
Online Spares
Battery Backed Caching Array Controller
Heavy Profile: 120 Messages/day
JBOD: 30 Disks/node
Database Availability Group (DAG)
Mbx Server 2 Mbx Server 3
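A quick sanity check of this example: with 10,000 mailboxes spread over 30 databases and one database per 1TB disk, the per-disk capacity and IOPS load can be estimated as below. This is a rough sketch that ignores database overhead, logs, content index space and passive-copy replay IO, and it reuses the ~50 random IOPS figure quoted earlier for a 7.2K SATA disk.

```python
MAILBOXES = 10_000
DATABASES = 30            # one database (and its logs) per disk
MAILBOX_GB = 2
IOPS_PER_MAILBOX = 0.11
DISK_GB = 1000            # 1TB 7.2k disk
DISK_RANDOM_IOPS = 50     # 7.2K SATA random IOPS from the earlier slide


def per_database_load() -> tuple[float, float, float]:
    """Return (mailboxes, size in GB, IOPS) for one active database."""
    mailboxes = MAILBOXES / DATABASES
    return mailboxes, mailboxes * MAILBOX_GB, mailboxes * IOPS_PER_MAILBOX


mailboxes, size_gb, iops = per_database_load()
print(f"{mailboxes:.0f} mailboxes/DB, {size_gb:.0f} GB of {DISK_GB} GB, "
      f"{iops:.1f} IOPS of ~{DISK_RANDOM_IOPS}")
```

Under these assumptions each active database needs about 667GB and about 37 IOPS, so both dimensions fit on a single 1TB 7.2K disk with headroom.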
Key Takeaways
Exchange Server 2010...
Reduces DB IOPS by +70%...again!
Optimizes for large mailboxes (+10GB) and 100K Item counts
Optimizes for large/slow/low-cost disks (SATA/Tier2)
Makes JBOD/RAID'less storage a viable option
Enables unmatched storage flexibility to reduce costs
question & answer
www.microsoft.com/teched
Sessions On-Demand & Community
http://microsoft.com/technet
Resources for IT Professionals
http://microsoft.com/msdn
Resources for Developers
www.microsoft.com/learning
Microsoft Certification & Training Resources
Resources
Related Content
UNC314 – Information Protection and Control in Microsoft Exchange Server 2010
UNC315 – Federation in Microsoft Exchange Server 2010
UNC312 – Archiving and Retention in Microsoft Exchange Server 2010
UNC320 – Microsoft Exchange Server Outlook Web Access 2010: The Future of Web-Based E-mail
UNC317 – Microsoft Exchange Server 2010 Management Tools
UNC318 – Microsoft Exchange Server 2010 Transition and Deployment
UNC313 – High Availability in Microsoft Exchange Server 2010
UNC321 – Storage in Microsoft Exchange Server 2010
UNC324 – What's New in Exchange Web Services in Microsoft Exchange Server 2010
UNC319 – Unified Messaging in Microsoft Exchange Server 2010
Call to Action
Learn More!
Related Content at TechEd on “Related Content” Slide
Attend in-person or consume post-event at TechEd Online
Check out online learning/training resources
http://technet.microsoft.com/exchange/2010
http://technet.microsoft.com/office/ocs
Try It Out!
Download the Exchange Server 2010 Beta Evaluation
http://www.microsoft.com/exchange/2010/try-it
Get a 5-Day Trial of Office Communications Server 2007 R2
https://r2.uctrial.com/
appendix
IOPS Reductions: Store Schema Elements
Element E2007 E2010
Physical Contiguity (ESE)
Poor physical contiguity of leaf pages. Hence many, small size, IOs (1 for each page)
Excellent physical contiguity of leaf pages. So fewer, large size IOs, spanning N pages (N ≈100)
Logical Contiguity (Store)
Headers for each folder kept in separate table. So many, small size, IOs spread over many tables
Headers for an entire mailbox kept in a single table. Hence fewer, large sized, IOs on a single table
Temporal Contiguity (View)
All views and indexes updated each time a mail is delivered. So many, small size, IOs spread over time
Views and indexes updated only when they are accessed by user. So fewer, large sized, IOs done together
How do you move from random IO to Sequential IO?
IOPS Reduction: Maintain Contiguity Over Time
Mailbox Messages
1. Delivery 2. Random Delete 3. Defragmentation
M1
M2
M3
M4
M5
M6
M7
M8
M9
M10
Mailbox Messages
M1
M3
M5
M7
M10
Mailbox Messages
M1
M2
M3
M4
M5
M6
M7
M8
M9
M10
Contiguous
Contiguous
Fragmented
M11
M12
M13
M14
M15
New E2010 behavior…
IOPS Reduction: Write IO Gap Coalescing
E2007 DB Write Behavior
DB Cache: Page 1 (Dirty), Page 2 (Clean), Page 3 (Dirty), Page 4 (Clean), Page 5 (Dirty)
Disk: 3 Write IO’s
E2010 DB Write Behavior
DB Cache: Page 1 (Dirty), Page 2 (Clean), Page 3 (Dirty), Page 4 (Clean), Page 5 (Dirty)
Disk: 1 Write IO
Writes spaced out over time
Big IO: How Big is Too Big?
[Chart: Random DB IO Latency Based on Size. x-axis: IO Size (KB), 0 to 1024; y-axis: IO Latency (ms), 0 to 25; series: Write, Read]
SqlIO Test, 1x 750GB 7.2k SATA, no caching array controller
E2010 Max IO Size = 256KB for Read, 384KB for Write
IO Latency increases with IO size
Optimize for SATA/Tier 2 Disks
Solution: Smooth DB Write IO
Throttle DB writes based on Checkpoint target (QoS)
• When Checkpoint Depth equals 1x -> 1.24x of Checkpoint target, limit Max Outstanding DB writes/LUN to 1
• When Checkpoint Depth meets or exceeds 1.25x of Checkpoint target, ratchet up Max Outstanding DB writes/LUN
• The further behind on checkpoint, the more aggressively we raise the Max Outstanding DB writes/LUN (Maximum = 512/LUN)
Works for everything from JBOD SATA through RAID10 SAN
[Chart: Max Outstanding DB Writes vs. Checkpoint Depth, 20MB Max Checkpoint example. x-axis: Log Checkpoint Depth (MB), 25.5 to 43.5; y-axis: Max Outstanding DB Writes, 0 to 40]
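The throttling rules above can be sketched as a function of checkpoint depth. Only the thresholds (1x, 1.25x) and the 512/LUN cap come from the slide; the linear ramp, the `writes_per_mb_behind` slope and the treatment of depths below the target are assumptions made for illustration.

```python
MAX_OUTSTANDING_PER_LUN = 512   # cap stated on the slide


def max_outstanding_writes(depth_mb: float, target_mb: float,
                           writes_per_mb_behind: float = 2.0) -> int:
    """Max outstanding DB writes per LUN for a given checkpoint depth.

    Below 1.25x of the checkpoint target, writes are limited to 1 at a time
    (depths below 1x are treated the same way; that is an assumption).
    At or beyond 1.25x the limit is ratcheted up; the linear ramp used here
    is hypothetical, since the slide does not give the actual curve.
    """
    threshold_mb = 1.25 * target_mb
    if depth_mb < threshold_mb:
        return 1
    behind_mb = depth_mb - threshold_mb
    return min(MAX_OUTSTANDING_PER_LUN, 2 + int(behind_mb * writes_per_mb_behind))


for depth in (20, 24, 25, 30, 43.5, 1000):
    print(depth, max_outstanding_writes(depth, target_mb=20))
```

Keeping a single outstanding write while the checkpoint is close to target is what smooths the bursts; the ramp only engages when the store falls behind.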
JBOD/RAID'less Storage: Lost Flush Detection
What is a lost flush?
A DB write IO that the disk subsystem/OS returned as completed but that did not actually get written to media, or was written in the wrong location (aka lost write).
Why are they so bad?
Your database may be logically corrupt and you do not know it!
How can they be detected in E2010?
Two methods:
1. In Memory Flush Map (Active & Passive): memory overhead of 2 bits/page. Event ID 530 is fired when detected (-1119) and page can be patched.
2. Database Recovery: Event is fired (ID 516: timestamp mismatch, (-567)) and database must be re-seeded.
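The 2 bits/page overhead of the in-memory flush map is small even for the largest recommended database. A minimal sketch of the arithmetic, assuming 32KB pages and binary units (2TB taken as 2 x 1024^4 bytes):

```python
def flush_map_bytes(db_bytes: int, page_bytes: int = 32 * 1024, bits_per_page: int = 2) -> int:
    """Memory needed for a flush map that tracks bits_per_page bits for every database page."""
    pages = db_bytes // page_bytes
    return pages * bits_per_page // 8


two_tb = 2 * 1024 ** 4
print(flush_map_bytes(two_tb) / (1024 ** 2), "MB")   # flush map size for a 2TB database
```

Under these assumptions a 2TB database has about 67 million pages and needs roughly 16MB of flush map.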
Mailbox Server
Exchange 2010 High Availability
Evolution of Continuous Replication technology (Database Mobility)
Easier than traditional clustering to deploy and manage
Allows each database to have 16 replicated copies
Provides full redundancy of Exchange roles on as few as two servers
Simplified Mailbox High Availability and Disaster Recovery with New Unified Platform
DB1
DB3DB2
DB4DB5
Recover quickly from disk and database failures
Mailbox Server
DB1DB2
DB4DB5
DB3
Mailbox Server
DB1DB2
DB4DB5
DB3
Replicate databases to remote datacenter
San Jose New York
Client
DB2
DB3
DB2
DB3
DB4
DB4
DB5
CAS/HUB
Mailbox Server 1
Mailbox Server 2
Mailbox Server 3
Mailbox Server 6
Mailbox Server 4
AD site: Dallas
AD site: San Jose
Mailbox Server 5
DB5
DB2
DB3
DB4
DB5
DB1
DB3
DB5
DB1
DB1DB1
DB1
Database Availability Group (DAG)
E2010 High Availability Architecture
JBOD/RAID'less Storage: Single Page Restore (Passive)
1. Page corruption detected on DB Copy (e.g. -1018)
2. Passive copy pauses log replay (log copying continues)
3. Passive retrieves the corrupted page # from the active using DB seeding infrastructure
4. Passive copy waits until the log file which meets the max required generation requirement is copied/inspected, then patches page
5. Passive resumes log replay
[Diagram: Database Availability Group (DAG) with Mailbox Server Nodes 1, 2 and 3 hosting DB1-Active, DB1-CopyA and DB1-CopyB; each copy has a Database (Page1, Page2, Page3) and a Log]
Exchange 2010 Storage Guidance
Columns: Stand Alone | Database Availability Group: 2 nodes, 2 Database copies | Database Availability Group: 3+ nodes, 3+ Database copies
Storage Type
  Direct Attached Storage (DAS): Supported | Supported | Supported
  Storage Area Network (SAN): iSCSI: Supported. Best Practice = Do not share physical disks backing Exchange data with other applications. (all three configurations)
  Storage Area Network (SAN): Fibre Channel (FC):
    Stand Alone: Supported. Best Practice = Do not share physical disks backing Exchange data with other applications.
    DAG, 2 copies: Supported. Best Practice = Do not share physical disks backing Exchange data with other applications. Best Practice = Do not place both database copies on the same physical spindles.
    DAG, 3+ copies: Supported. Best Practice = Do not share physical disks backing Exchange data with other applications. Best Practice = Do not place both database copies on the same physical spindles.
  Network Attached Storage (NAS): SMB: Not Supported | Not Supported | Not Supported
Physical Disk Type
  SATA: Supported, requires battery backed caching array controller for data integrity (all three configurations)
  SAS: Supported | Supported | Supported
  FC: Supported | Supported | Supported
  SSD (Flash Disk): Supported | Supported | Supported
  Physical Disk Write Caching (enabled): Not Supported | Not Supported | Not Supported
Storage RAID
  RAID: RAID recommended | RAID recommended | RAID optional
  EDB Volume: RAID5/6, RAID10, RAID1 | RAID5/6, RAID10, RAID1 | JBOD, RAID5/6, RAID10, RAID1
  Log Volume: RAID1, RAID10 | RAID1, RAID10 | JBOD, RAID1, RAID10
  Disk Array RAID Stripe Size (KB): 256KB | 256KB | 256KB
  Storage Array Cache Settings: 75% Write Cache, 25% Read Cache (with Battery Backed Cache) (all three configurations)
Database/Log file placement
  Database/Log Isolation:
    Stand Alone: Best Practice (for recoverability) = separate database file (.edb) and logs from same Database on to different volumes backed by different physical disks
    DAG, 2 copies: Database file (.edb) and logs from same Database can share same volume and same physical disk.
    DAG, 3+ copies: Database file (.edb) and logs from same Database can share same volume and same physical disk. This is a best practice for JBOD/RAID'less storage scenario where one or more volumes store the edb and log files backed by the same physical disk.
  Database Files/Volume: Based on backup methodology | Based on backup methodology | RAID = based on backup methodology, JBOD = one DB file/volume is recommended
  Log Streams/Volume: Based on backup methodology | Based on backup methodology | RAID = based on backup methodology, JBOD = one log stream/volume is recommended
Windows Disk Type
  Basic Disk: Recommended | Recommended | Recommended
  Dynamic Disk: Supported | Supported | Supported
Partition Type
  GUID Partition Table (GPT): Recommended | Recommended | Recommended
  Master Boot Record (MBR): Supported | Supported | Supported
Partition Alignment: Windows 2008 Default: 1MB (all three configurations)
Volume Path: Drive Letter or Mount Point (mount point host volume must be RAIDed) (all three configurations)
File System
  NTFS support only (all three configurations)
  NTFS Defragmentation: Not required, not recommended (all three configurations)
  NTFS Allocation Unit Size: 64KB for both edb and log volumes (all three configurations)
  NTFS Compression: Not Supported for Exchange Database files (all three configurations)
  NTFS Encrypted File System (EFS): Not Supported for Exchange Database files (all three configurations)
  Windows Bitlocker (volume encryption): Supported for all Exchange database and log files (all three configurations)
Preliminary Storage Guidance: Subject to Change!
© 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.
The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.