Post on 14-Dec-2015
Exchange Deployment Planning Services
Exchange 2010 High Availability
Exchange 2010 High Availability
The Exchange 2007 High Availability has the following goals: Review of Exchange Server 2007 Availability
solutions Overview of Exchange Server 2010 High
Availability Exchange Server 2010 High Availability
fundamentals Exchange Server 2010 High Availability architecture
scenarios Exchange Server 2010 site resilience
Ideal audience for this workshop Messaging SME Network SME Security SME
Exchange 2010 High Availability
Exchange 2010 High Availability
During this session focus on the following : How will we leverage this functionality
in our organization? What availability and service level
requirements do we have around our messaging solution?
Agenda Review of Exchange Server 2007
Availability solutions Overview of Exchange Server 2010 High
Availability Exchange Server 2010 High Availability
fundamentals Exchange Server 2010 High Availability
architecture scenarios Exchange Server 2010 site resilience
Exchange Server 2007 Single Copy Clustering• Single Copy Cluster (SCC) out-of-box provides little high
availability value− On Store failure, SCC restarts store on the same machine;
no CMS failover− SCC does not automatically recover from storage failures− SCC does not protect your data, your most valuable asset− SCC does not protect against site failures− SCC redundant network is not leveraged by CMS
• Conclusion− SCC only provides protection from server hardware
failures and bluescreens, the relatively easy components to recover
− Supports rolling upgrades without losing redundancy
1. Copy logs
E00.logE0000000012.logE0000000011.log
2. Inspect logs
3. Replay logs
LogLog LogLog
Log shipping to a local disk
LocalFileShare
Log shipping within a cluster
Cluster
Log shipping to a standby server or
cluster
Standby
Database Database
Exchange Server 2007 Continuous Replication
DB1
Client Access Server
CCR #1
Node A
CCR #1
Node B
CCR #2 Node B
CCR #2
Node A
SCR
Outlook (MAPI) client
Windows cluster Windows cluster
OWA, ActiveSync, or Outlook Anywhere
AD site: San Jose
AD site: Dallas
Client Access Server
Standby Server
SCR managed
separately; no GUI
Manual “activation” of
remote mailbox server
Clustering knowledge required
DB2
DB3
DB1
DB2
DB3
DB4
DB5
DB6
DB4
DB5
DB6
Database failure requires server failover
DB4
DB5
DB6Mailbox server can’t co-exist
with other roles
Today’s Exchange Server 2007 HA Solution (CCR + SCR)
Core Architectural Shift
Windows Failover Cluster
Default Cluster Group
• Cluster IP Address• Cluster Name
• Cluster Quorum
Cluster Database
Clustered Mailbox Server (CMS)
• CMS IP Address• CMS Name
• CMS resources (exres.dll)• CMS disk resources
Cluster Networks
Database Availability Group
Core Architectural Shift
Windows Failover Cluster
Default Cluster Group
• Cluster IP Address• Cluster Name
• Cluster Quorum
Cluster Database
Active Manager
• PAM• SAM
DAG Networks
Database Availability Group
Core Architectural Shift
Mailbox Server
Get-
MailboxDatabaseCopyStatus
Primary Active Manager
Move-ActiveMailboxDatabase
Storage
Mailbox Server
Get-MailboxDatabaseCopyStatus
Standby Active Manager
Move-ActiveMailboxDatabase
Storage
Mailbox Server
Get-
MailboxDatabaseCopyStatus
Standby Active Manager
Move-ActiveMailboxDatabase
Storage
Agenda
• Review of Exchange Server 2007 Availability solutions
• Overview of Exchange Server 2010 High Availability
• Exchange Server 2010 High Availability fundamentals
• Exchange Server 2010 High Availability architecture Scenarios
• Exchange Server 2010 site resilience
Exchange Server 2010 HA Goals
• Reduce complexity • Reduce cost • Native solution - no single point of
failure• Improve recovery times• Support larger mailboxes
Make High Availability Exchange deployments mainstream!
•Improved failover granularity•Simplified administration•Incremental deployment•Unification of CCR + SCR•Easy stretching across sites•Up to 16 replicated copies
Easier and cheaper to deploy
Easier and cheaper to manage
Better Service Level Agreements (SLAs)
Reduced storage costs
Larger mailboxes
• Further IO reductions • RAID-less/JBOD support
Exchange Server Improvements Key Benefits
• Online mailbox moves• Improved transport resiliency
Easier and cheaper to manage
Better SLAs
Improved mailbox uptime
More storage flexibility
Better end-to-end availability
Client
DB2DB3
DB2
DB3
DB4
DB4
DB5
Client Access Server
Mailbox
Server 1
Mailbox
Server 2
Mailbox
Server 3
Mailbox Server 6
Mailbox
Server 4
AD site: Dallas
AD site: San Jose
Mailbox
Server 5
DB5
DB2
DB3DB4DB5DB1
DB1DB1
DB1
Exchange Server 2010 High Availability Architecture
Failover managed
within Exchange
Database centric failover
Easy to stretch
across sites
Client Access Server
All clients connect via CAS
serversDB3DB5
DB1
Agenda
• Review of Exchange Server 2007 High Availability solutions
• Overview of Exchange Server 2010 High Availability
• Exchange Server 2010 High Availability fundamentals
• Exchange Server 2010 High Availability architecture scenarios
• Exchange Server 2010 site resilience
Exchange Server 2010 Active Directory Schema Organization
Exchange Administrati
ve Group
Server 1
ServersDatabase
Availability Groups
Databases
DAG 1 Database 1
Database Copy 1
Host Server link / backlink
DAG link / backlink
Exchange Server 2010Active Directory Schema Organization
Exchange Server 2010 HA Fundamentals• Database Availability
Group (DAG)• Server• Database• Database Copy• Active Manager (AM)• RPC Client Access
Service
DAG
copy
copy
AMSVR
copy
copy
AM
SVR
DB DB
RPC CAS
RPC CAS
Exchange Server 2010 HA FundamentalsDatabase Availability Group (DAG)• A group of up to 16 servers hosting a set of
replicated databases• Wraps a Windows Failover Cluster
− Manages servers’ membership in the group− Heartbeats servers, quorum, cluster database
• Defines the boundary of database replication• Defines the boundary of failover/switchover (*over)• Defines boundary for DAG’s Active Manager
Mailbox Server
1
Mailbox Server
2
Mailbox Server
3
Mailbox Server
4
Mailbox Server
16
Exchange Server 2010 HA FundamentalsServer• Unit of membership for a DAG• Hosts the active and passive copies of multiple mailbox
databases• Executes Information Store, CI, Assistants, etc., services on
active mailbox database copies• Executes replication services on passive mailbox database
copies
DB2DB3
DB4
Mailbox Server
1
Mailbox Server
2
Mailbox Server
3
DB1
DB1 DB3
DB4
DB2
Exchange Server 2010 HA FundamentalsServer (Continued)• Provides connection point between Information Store and RPC Client
Access• Very few server-level properties relevant to HA
− Server’s Database Availability Group− Server’s Activation Policy
DB2DB3
DB4
Mailbox Server
1
Mailbox Server
2
Mailbox Server
3
DB1
DB1 DB3
DB4
DB2
RCA*Client Access Server
Exchange Server 2010 HA FundamentalsMailbox Database• Unit of *over• A database has 1 active copy – active copy can
be mounted or dismounted• Maximum # of passive copies == # servers in
DAG – 1 active
DB2DB3
DB4
Mailbox Server
1
Mailbox Server
2
Mailbox Server
3
DB1
DB1 DB3
DB4
DB2 DB1
Exchange Server 2010 HA FundamentalsMailbox Database (Continued)
−~30 seconds database *overs−Server failover/switchover involves moving
all active databases to one or more other servers
−Database names are unique across a forest−Defines properties relevant at the database
level− GUID: a Database’s unique ID− EdbFilePath: path at which copies are located− Servers: list of servers hosting copies
Exchange Server 2010 HA Fundamentals Active/Passive vs. Source/Target
• Availability Terms− Active: Selected to provide
email services to clients− Passive: Available to provide
email services to clients if active fails
• Replication Terms− Source: Provides data for
copying to a separate location− Target: Receives data from the
source
DB1 DB1
Exchange Server 2010 HA FundamentalsMailbox Database CopyDefines properties applicable to an individual database copy
− Copy status: Healthy, Initializing, Failed, Mounted, Dismounted, Disconnected, Suspended, FailedandSuspended, Resynchronizing, Seeding
− CopyQueueLength− ReplayQueueLength
ActiveCopyActivationSuspended
• Exchange-aware resource manager (high availability’s brain)− Runs on every server in the DAG− Manages which copies should be active and
which should be passive− Definitive source of information on where a
database is active or mounted− Provides this information to other Exchange
components (e.g., RPC Client Access and Hub Transport)
− Information stored in cluster database
Exchange Server 2010 HA FundamentalsActive Manager
• Active Directory is still primary source for configuration info
• Active Manager is primary source for changeable state information (such as active and mounted)
• Replication service monitors health of all mounted databases, and monitors ESE for IO errors or failure
Exchange Server 2010 HA FundamentalsActive Manager
Exchange Server 2010 HA FundamentalsActive Manager
• Primary Active Manager (PAM)− Runs on the node that owns the default cluster
group (quorum resource)− Gets topology change notifications− Reacts to server failures− Selects the best database copy on *overs
• Standby Active Manager (SAM)− Runs on every other node in the DAG− Responds to queries from other Exchange
components for which server hosts the active copy of the mailbox database
Exchange Server 2010 HA FundamentalsContinuous Replication
• Continuous replication has the following basic steps:− Database copy seeding of target− Log copying from source to target− Log inspection at target− Log replay into database copy
Exchange Server 2010 HA FundamentalsDatabase Seeding
• There are three ways to seed the target instance:− Automatic Seeding
− Requires 1st log file containing CreateDB record
− Update-MailboxDatabaseCopy cmdlet− Can be performed from active or passive
copies− Manually copy the database
Exchange Server 2010 HA FundamentalsLog Shipping• Log shipping in Exchange Server 2010 leverages
TCP sockets− Supports encryption and compression− Administrator can set TCP port to be used
• Replication service on target notifies the active instance the next log file it expects − Based on last log file which it inspected
• Replication service on source responds by sending the required log file(s)
• Copied log files are placed in the target’s Inspector directory
Exchange Server 2010 HA FundamentalsLog Inspection• The following actions are performed to
verify the log file before replay:− Physical integrity inspection− Header inspection− Move any Exx.log files to ExxOutofDate
folder that exist on target if it was previously a source
• If inspection fails, the file will be recopied and inspected (up to 3 times)
• If the log file passes inspection it is moved into the database copy’s log directory
Exchange Server 2010 HA FundamentalsLog Replay
• Log replay has moved to Information Store• The following validation tests are performed prior
to log replay:− Recalculate the required log generations by inspecting the
database header− Determine the highest generation that is present in the
log directory to ensure that a log file exists− Compare the highest log generation that is present in the
directory to the highest log file that is required− Make sure the logs form the correct sequence− Query the checkpoint file, if one exists
• Replay the log file using a special recovery mode (undo phase is skipped)
Exchange Server 2010 HA FundamentalsLossy Failure Process• In the event of failure, the following steps will occur
for the failed database:− Active Manager will determine the best copy to activate− The Replication service on the target server will attempt
to copy missing log files from the source - ACLL− If successful, then the database will mount with zero data loss− If unsuccessful (lossy failure), then the database will mount based on
the AutoDatabaseMountDial setting
− The mounted database will generate new log files (using the same log generation sequence)
− Transport Dumpster requests will be initiated for the mounted database to recover lost messages
− When original server or database recovers, it will run through divergence detection and perform an incremental reseed or require a full reseed
Exchange Server 2010 HA FundamentalsActive Manager Selection of Active Database Copy
• Active Manager selects the “best” copy to become active when existing active fails
Catalog HealthyCopy status Healthy, DisconnectedAndHealthy,
DisconnectedAndResynchronizing, orSeedingSource
CopyQueueLength < 10ReplayQueueLength < 50
Catalog CrawlingCopy status Healthy, DisconnectedAndHealthy,
DisconnectedAndResynchronizing, orSeedingSource
CopyQueueLength < 10ReplayQueueLength < 50
Catalog HealthyCopy status Healthy, DisconnectedAndHealthy,
DisconnectedAndResynchronizing, orSeedingSource
ReplayQueueLength < 50
Catalog CrawlingCopy status Healthy, DisconnectedAndHealthy,
DisconnectedAndResynchronizing, orSeedingSource
ReplayQueueLength < 50
5Copy status Healthy, DisconnectedAndHealthy,
DisconnectedAndResynchronizing, orSeedingSource
ReplayQueueLength < 50
6Catalog HealthyCopy status Healthy, DisconnectedAndHealthy,
DisconnectedAndResynchronizing, orSeedingSource
CopyQueueLength < 10
7Catalog CrawlingCopy status Healthy, DisconnectedAndHealthy,
DisconnectedAndResynchronizing, orSeedingSource
CopyQueueLength < 10
8Catalog HealthyCopy status Healthy, DisconnectedAndHealthy,
DisconnectedAndResynchronizing, orSeedingSource
9Catalog CrawlingCopy status Healthy, DisconnectedAndHealthy,
DisconnectedAndResynchronizing, orSeedingSource
10Copy status Healthy, DisconnectedAndHealthy,
DisconnectedAndResynchronizing, orSeedingSource
DB1
Exchange Server 2010 HA FundamentalsIncremental Resync• Incremental reseed scenario
− Active DB1 on server1 fails− Passive DB1 on server3 takes over service− Sometime later, failed DB1 on server1 comes back as passive –
contains inconsistent data− Make DB1 on server1 consistent with new active
• Transaction logs of active and failed copy are compared to find divergence point
• Determines from logs the database pages that changed after divergent point
• Copies database pages from active to failed copy, then play new logs, until in-sync
• Replaces Exchange Server 2007’s Lost Log Resilience (LLR)− LLR is set to 1
DB1
Mailbox Server
1
Mailbox Server
2
Mailbox Server
3
DB1X
Exchange Server 2010 HA FundamentalsBackups• Streaming backup APIs for public use have been cut, must use
Volume Shadow Copy Service (VSS) for backups− Backup from any copy of the database/logs− Always choose Passive (or Active) copy− Backup an entire server − Designate a dedicated backup server for a given database
• Restore from any of these backups scenarios
DB2DB3
DB2
DB3
DB1
DB3
DB1 DB1
VSS requestor
DB2
Database Availability Group
Mailbox Server 1
Mailbox Server 2
Mailbox Server 3
Multiple Database Copies Enable Backupless Configurations
• Exchange Server 2010 HA• E-mail archive• Extended/protected dumpster
retention
7-14 day lag copy
X
Database Availability Group
Mailbox Server 1
Mailbox Server 2
Mailbox Server 3
DB1DB2DB3
DB1DB2DB3
DB1DB2DB3
Site/server/disk failureArchiving/complianceRecover deleted items
Agenda Review of Exchange Server 2007
Availability solutions Overview of Exchange Server 2010 High
Availability Exchange Server 2010 High Availability
fundamentals Exchange Server 2010 High Availability
architecture scenarios Exchange Server 2010 site resilience
AD: Dublin
CCR CCR
CCR CCR
DAG
DAG
DAG
FileShare
FileShare
FileShare
FileShare
DAG DA
G
FileShare
Not an upgrade path – A design path
High Availability architect scenariosCCR Design -> DAG Design
Single Site
3 HA Copies
Database Availability Group (DAG)
DB1 DB2 DB3
DB5 DB6
DB1 DB2 DB3
DB4 DB5 DB6
DB1 DB2 DB3
DB4 DB5 DB6DB4
MailboxServer 1
MailboxServer 2
MailboxServer 3
3 Nodes
X
CAS NLB Farm
AD: Dublin
XJBOD -> 3 physical Copies
2 servers out -> manual activation of server 3
In 3 server DAG, quorum is lostDAGs with more servers sustain more failures – greater resiliency
High Availability architect scenariosDouble Resilience – Maintenance + DB Failure
CAS/HUB/
MAILBOX 1
CAS/HUB/
MAILBOX 2
Member servers of DAG can host other server roles
Hardware Load Balancer
DB1
DB2
DB3
DB2
DB1
DB2
DB3
2 server DAGs, with server roles combined or not, should use RAID
High Availability architect scenariosBranch Office or Smaller Deployment
Agenda
• Review of Exchange Server 2007 High Availability solutions
• Overview of Exchange Server 2010 High Availability
• Exchange Server 2010 High Availability fundamentals
• Exchange Server 2010 High Availability architecture scenarios
• Exchange Server 2010 site resilience
Exchange Server 2010 *over Cases• Within a datacenter
− Database *over− Server *over
• Between datacenters− Single database *over− Server *over
• Datacenter failover (which is really a switchover)
Single DB Cross- Datacenter *Over− Database mounted in another datacenter
and another Active Directory site− Serviced by “new” Hub Transport servers − “Different OwningServer” – for routing
− Transport dumpster re-delivery now from both Active Directory sites
− Serviced by “new” CAS− “Different CAS URL” – for protocol access− Outlook Web Access (OWA) now re-directs
connection to second CAS farm− Other protocols proxy or redirect (varies)
Key
Prim
ary
Dat
acen
ter Secondary D
atacenter
MBX-B
CAS-Pri
MBX-D
CAS-Sec HT2010
MBX-CMBX-A
HT2010
DAG
Outlook 2010Outlook 2007
Active
Passive
Outlook 2003
Cross-Site DB Failover (Direct Connect)
RPCClientAccessServer = CAS-PRI
Key
Prim
ary
Dat
acen
ter Secondary D
atacenter
MBX-B
CAS-Pri
MBX-D
CAS-Sec HT2010
MBX-CMBX-A
HT2010
DAG
Outlook 2010Outlook 2007
Active
Passive
Outlook 2003
Cross-Site DB Failover (Redirect)
RPCClientAccessServer = CAS-SEC
Autodiscover detects profile change and
requires restart of Client
Outlook 2003 fails to connect due to CAS-PRI
failure
Connection fails due to CAS-PRI
failure. Autodiscover
detects profile change and
requires restart of Client
Datacenter Failover− Customers can evolve to site resilience− Standalone local redundancy site
resilience− Consider name space design at first
deployment− Keep extending the DAG!− Monitoring and many other
concepts/skills just re-applied− Normal administration remains
unchanged− Disaster recovery not HA event
Split Brain Management• Two datacenter *overs have a risk of split brain• Primary datacenter power outage is classic
example• Exchange Server 2010 datacenter failovers
maintain DAG membership but shrink cluster membership to create a new, “available topology” in the standby datacenter
• Exchange Server 2010 provides a safe answer with “datacenter activation coordination” (DAC) mode− Requires a DAG with three nodes− Requires activation in partial datacenter failure
cases is “done right” − Mailbox servers must be “stopped” or powered off
− Implements a “Mommy may I protocol” before active manager mounts databases
Split Brain Management (Cont’d)• If DAC is not enabled, the DAG will not
restart and mount databases until a majority of servers are restored
• If DAC is enabled, the “Mommy May I Protocol” is used to coordinate with Active Managers in DAG to determine state and recoverability
• There are several requirements that must be satisfied to prevent split brain between datacenters after datacenter failover
DAG1DAG1
172.17.x.x “Private” Network
172.16.x.x “Public” Network
172.19.x.x “Private” Network
172.18.x.x “Public” Network
2.2.x.x Perimeter Network
AD Site Redmond AD Site Bel Air
Edge-BProxy-B2.1.x.x Perimeter Network
Edge-A Proxy-A
MBX-B-3 MBX-B-4MBX-A-1 MBX-A-2
DC-A HT-A CAS-A CAS-B HT-B DC-B
DB1 DB2
DB3 DB4
DB1 DB2
DB3 DB4
DB1 DB2
DB3 DB4
DB1 DB2
DB3 DB4
DB1 DB2
DB3 DB4
Failure Scenario: Database Failure1. MBX-A-1 DB1 fails2. Automatic failover to MBX-A-
23. MBX-A-1 DB1 is fixed and
becomes a copy
DB1 DB2
DB3 DB4
DB1 DB2
DB3 DB4
Failure Scenario: Server Failure1. MBX-A-1 fails2. Automatic failover to MBX-A-
23. MBX-A-1 is fixed
DB1 DB2
DB3 DB4
DB1 DB2
DB3 DB4
DB1 DB2
DB3 DB4
Failure Scenario: Data Center Failure1. Primary data center fails2. Adjust DNS records for SMTP and HTTPS access and adjust CAS configuration (if necessary)3. Run Stop-DatabaseAvailabilityGroup DAG1 –ActiveDirectorySite Redmond –ConfigurationOnly (in
both data centers)4. Restore-DatabaseAvailabilityGroup DAG1 –ActiveDirectorySite “Bel Air” –
AlternateFileShareWitnessShare \\ht-b\fsw5. Databases mount (no activation block scenario)
DB1 DB2
DB3 DB4
DB1 DB2
DB3 DB4
DB1 DB2
DB3 DB4
Legend Active DatabaseDatabase CopyUnhealthy? DatabaseContoso.com (MX
Record)Autodiscover.contoso.comMail.contoso.comLoad Balance Array Records
Outlook 2007/14 (MBX on DB1)
Recovering Primary Data Center1. Verify primary data center is capable of hosting service2. Add primary data center servers back to DAG: Start-DatabaseAvailabilityGroup DAG1 –ActiveDirectorySite Redmond3. Reconfigure DAG to use File Share Witness in primary data center: Set-DatabaseAvailabilityGroup DAG1 –
FileShareWitnessShare \\ht-a\fsw4. Reseed data or allow replication to occur and update copies in primary data center 5. Schedule downtime for the mailbox databases and dismount them6. Change MX records and HTTP access back to primary data center7. Move databases back to primary data center: Move-ActiveMailboxDatabase DB1 –ActivateOnServer MBX-A-18. Mount databases in primary data center
DB1 DB2
DB3 DB4
End of Exchange 2010 High Availability Module
For More Information
• Exchange Server Tech Centerhttp://technet.microsoft.com/en-us/exchange/default.aspx
• Planning serviceshttp://technet.microsoft.com/en-us/library/cc261834.aspx
• Microsoft IT Showcase Webcasts http://www.microsoft.com/howmicrosoftdoesitwebcasts
• Microsoft TechNet http://www.microsoft.com/technet/itshowcase
© 2008 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.
The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after
the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.