
Exchange Server 2003 Design and Architecture at Microsoft

    Technical White Paper Published: August 2003


CONTENTS

Executive Summary 5
Introduction 6
  Overview of Current Network Infrastructure 6
  Overview of Current Messaging Infrastructure 7
Exchange 2000 Legacy Architecture 11
  Overview of Exchange 2000 Infrastructure 11
Reasons for Microsoft IT to Upgrade 14
  Site and Server Consolidation 14
  Availability/Reliability/Manageability Enhancements 14
  Improved Cluster Support 16
  Improved Security 16
  Improved Recoverability Technologies to Better Meet SLA Requirements 18
  Mobility Features/Enhancements 18
  Office 2003 Integration 20
Exchange 2003 Architecture Design Decisions 23
  Topology 23
  Mobility Design and Configuration 23
  Server Design and Configuration 25
  Storage Design and Configuration 27
  Backup and Recovery 31
  Management and Monitoring using Microsoft Operations Manager (MOM) 2000 37
Best Practices and Lessons Learned 40
  Topology Best Practices 40
  Server Configuration Best Practices 41
  Storage Design Best Practices 42
  Management and Monitoring Best Practices 45
  Operational Best Practices 46
Conclusion 47
For More Information 48


EXECUTIVE SUMMARY

The Microsoft IT group recently deployed Microsoft Exchange Server 2003, the latest edition of the company's industry-leading enterprise messaging application. Microsoft IT not only serves the company by running the IT utility for its myriad employees and locations, but also serves as the first and best customer for the various enterprise product development groups at Microsoft, deploying Microsoft software within the company before it is available to outside customers.

The migration from Microsoft Exchange 2000 Server to Microsoft Exchange Server 2003 led to some significant changes in the messaging architecture at Microsoft. Microsoft IT has moved toward a fully clustered mailbox server environment. Each of these server clusters is connected to one or more Storage Area Network (SAN) enclosures for its data storage. The use of clustering technology has improved reliability, increased availability, and improved the process of performing rolling upgrades.

The benefits of deploying Exchange 2003, especially when combined with the benefits derived from the deployments of both Microsoft Windows Server 2003 and Microsoft Office 2003, have enabled Microsoft to consolidate its messaging infrastructure. Microsoft IT has begun implementing its plans to consolidate 113 mailbox servers in 75 locations worldwide to just 38 mailbox servers in seven locations. Exchange 2003 also supports all mobility messaging services, such as Outlook Web Access (OWA), Outlook Mobile Access (OMA), and Exchange ActiveSync (EAS), on the same server, enabling Microsoft IT to additionally consolidate its worldwide front-end server infrastructure.

The messaging data storage infrastructure has also been updated. Data storage, once a combination of direct attached Small Computer System Interface (SCSI) storage arrays at remote locations and SAN solutions in the Redmond, Washington headquarters data center, has been replaced by SANs at all locations. These changes have enabled Microsoft IT to increase the number of mailboxes per server and have thoroughly enhanced the performance and capability of backup and recovery solutions as well.

As of this writing, Microsoft IT has significantly reduced administrative overhead for Exchange, improved system performance and service availability, and improved its own ability to meet its Service Level Agreement (SLA) obligations. Those benefits should become even more dramatic as the company moves closer to its consolidation goal.

    Note: For security reasons, the sample names of forests, domains, internal resources, and organizations used in this paper do not represent real resource names used within Microsoft and are for illustration purposes only.

    Exchange 2003 Deployment and Architecture Page 4

Situation
The messaging infrastructure at Microsoft was quite varied. There were over 100 mailbox servers running in 75 locations worldwide, using a variety of hardware configurations that were not scalable.

Solution
Microsoft IT upgraded its messaging infrastructure worldwide to use Exchange Server 2003 on clustered Windows Server 2003 servers attached to Storage Area Network (SAN) systems.

Benefits
• Consolidation. The use of Windows Server 2003's improved clustering technology enabled Microsoft IT to implement a major mailbox server consolidation.
• Mobility Improvements. Exchange 2003 integrates Outlook Mobile Access and Exchange ActiveSync with Outlook Web Access to improve mobile messaging.
• Improved SLA Performance. The use of SANs enabled Microsoft IT to increase the number of mailboxes per server and enhance Microsoft IT's ability to back up and restore mailbox data in a timely manner.

Products & Technologies
• Microsoft Windows Server 2003
• Microsoft Exchange Server 2003
• Microsoft Office 2003
• Microsoft Office Outlook 2003
• Microsoft Operations Manager
• Storage Area Networks


INTRODUCTION

Microsoft Exchange Server 2003 represents an important, continuing investment in enterprise technology for Microsoft. Exchange 2003 offers improvements required by enterprise messaging and collaboration customers. Many of the largest companies in the world run their messaging systems on Microsoft Exchange, including Microsoft itself.

The purpose of this document is to provide an overview of the architecture and design decisions made during the upgrade to Exchange Server 2003 at Microsoft. The paper focuses on the hardware selection and configuration aspects of the project. It also includes discussions of the key technology wins and best practices that emerged from the upgrade. Since Microsoft IT is a leading-edge implementer of Microsoft technologies and products, the organization brings a unique set of requirements as well as innovative approaches to meeting the needs of its customers. This paper describes these requirements and approaches, as well as the way they affected design decisions for the deployment. The intended audience for this white paper includes technical decision makers, system architects, IT implementers, and messaging system managers.

Microsoft IT based its mission for migrating from Exchange 2000 to Exchange 2003 on achieving several objectives:

• To test and improve the product before Microsoft offered it to its customers.
• To consolidate Exchange server sites worldwide to reduce server maintenance and administration costs and workload.
• To simplify the messaging infrastructure based on standardized server and storage hardware for all deployment locations.
• To improve the ability of Microsoft IT to meet its SLA obligations for data backup and restore.
• To significantly improve the end-user experience with messaging services at Microsoft.

Microsoft IT met all these objectives when it deployed Exchange 2003.

Overview of Current Network Infrastructure

With all of the beta-level and test version software used in its production environment, the Microsoft corporate network is the world's largest experimental computer network. The network is a confederation of functional backbones spanning the globe. Each backbone is defined on regional boundaries, with connectivity focused on the Main corporate campus located in the Puget Sound metropolitan area.

The network is architected following a multi-domain routing model. It is divided into four regional networks, with each network functioning as a single Open Shortest Path First (OSPF) routing and addressing domain. The four regions cover the following areas: 1. the Puget Sound metropolitan area in western Washington State; 2. Europe, Africa, and the Middle East; 3. Japan, the Pacific Rim, and the South Pacific; and 4. the remainder of North America and South America.

Each regional network consists of a backbone area (Area 0) and multiple additional areas to ensure scalability of each regional network. External Border Gateway Protocol (EBGP) is used to exchange routes between the regional networks to ensure the scalability of the network as a whole.


The Puget Sound Metropolitan Area Network (MAN) supports the bulk of data traffic on the global enterprise network, providing gigabit-rate connectivity between buildings and the main datacenters located in the area. The current campus comprises 70 separate buildings and two datacenters, with a network infrastructure providing access to corporate resources, developer lab networks, and Internet connectivity from any location within the campus.

This network relies on Gigabit Ethernet and Packet over Synchronous Optical Network (SONET), using privately owned or leased dark fiber as the transport medium. In the metro area, efficient use of limited fiber resources is realized by leveraging Wave Division Multiplexing (WDM) technologies to provision multiple circuits across a single physical link.

The available network bandwidth is significant for applications like Exchange Server 2003 and site-to-site connectivity. As of June 2003, the network had grown to encompass:

• Three enterprise data centers and nineteen regional data centers worldwide
• 310 sites in approximately 230 cities in 77 countries
• The largest wireless LAN (802.1x EAP-TLS) in the world
• More than 24,000 wireless devices
• More than 4,000 wireless access points
• More than 250 wide area network (WAN) circuits
• More than 200 WAN sites in more than 70 countries
• More than 3,300 IP subnets
• More than 2,000 routers
• More than 2,600 network layer 2 switches
• More than 275 ATM switches
• More than 10,000 servers worldwide
• More than 350,000 LAN ports

Overview of Current Messaging Infrastructure

Managing the complex messaging infrastructure at Microsoft is a team effort that involves many different groups within Microsoft IT. Organizationally, Microsoft IT comprises more than 2,500 staff members responsible for operations spanning more than 400 IT locations worldwide. In addition to providing the IT utility for the company, Microsoft IT plays a key role in helping Microsoft meet its main business objective of software development and marketing. As the first and best customer of Microsoft, Microsoft IT serves as an early adopter of new Microsoft software, such as Windows Server 2003, Microsoft Office 2003, and Exchange Server 2003. The result of this process is known in the industry as "eating your own dog food."

In the dog food messaging environment of Microsoft IT, servers regularly receive software patches, operating system test releases and upgrades, Exchange server test releases and upgrades, and more. Each Exchange server is touched by Microsoft IT for these software upgrades an average of two times each month. The changes to software are implemented to test new scenarios, meet specific requirements, and continually run the latest application concepts through real-world, enterprise-level testing. The rate of change is very high in Microsoft IT.


Microsoft employees place a significant load on the messaging infrastructure. The average employee at Microsoft possesses three computers, typically all of which synchronize with Exchange. In addition, a significant portion of that population also carries Pocket PC and Smartphone devices that also synchronize with Exchange. The average Remote Procedure Call (RPC) operations per second (a measurement of work) at Microsoft is significantly higher than at any other company known to Microsoft IT. Microsoft often works with customers and partners to benchmark their messaging infrastructures. The workload managed by the Exchange servers at Microsoft is typically more than double the load measured at these companies.

At the time of this writing, the messaging environment at Microsoft consists of more than 200 servers, including 190 Exchange 2003 servers (113 of which are mailbox servers) in 75 locations worldwide, including servers in additional cross-forest test environments. This environment supports:

• Global mail flow of 6,000,000 messages per day, with 2,500,000 Internet e-mail messages per day on average, 70 percent of which is filtered out as unwanted spam, virus-infected, or addressed to invalid e-mail addresses. Comparing bytes over the wire, the size ratio of blocked message content versus accepted message content received at Microsoft is 40:1. The average size of a typical e-mail message is 44 KB.

• Approximately 85,000 mailboxes, each being increased from a 100 MB to a 200 MB limit. The average 100 MB mailbox was only 44 MB in size.

• More than 85,500 distribution groups.

• More than 230,000 unique public folders managed on public folder servers.
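To put the mailbox figures above in perspective, a back-of-the-envelope sizing sketch follows. The numbers come straight from the bullets above; the helper function is purely illustrative and not part of any Exchange tool.

```python
# Rough mailbox storage sizing from the figures quoted above:
# 85,000 mailboxes, 44 MB average actual size, 200 MB new limit.

def total_storage_gb(mailboxes: int, avg_mb: float) -> float:
    """Total mailbox data in GB at a given average mailbox size."""
    return mailboxes * avg_mb / 1024

current = total_storage_gb(85_000, 44)      # data at actual average usage
worst_case = total_storage_gb(85_000, 200)  # every mailbox at the new limit

print(f"Current data: ~{current:,.0f} GB")        # roughly 3.6 TB
print(f"At the 200 MB limit: ~{worst_case:,.0f} GB")  # roughly 16.6 TB
```

The gap between the two figures illustrates why the mailbox-limit increase, not current usage, drove the storage and backup design.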

The Microsoft IT server infrastructure includes:

• Corporate standard client configuration comprised of Windows XP Professional and Microsoft Office Outlook 2003.

• Legacy, stand-alone mailbox server configurations of 500, 1,000, or 1,500 mailboxes. Stand-alone servers are being replaced by clustered SAN solutions worldwide, scaled per server to support 2,700 user mailboxes in regional locations and 4,000 user mailboxes in the headquarters data center.

• One centrally located support organization in headquarters that supports all Exchange servers worldwide.

• In addition to the Main corporate Exchange Active Directory forest, three additional forests are used to host Exchange mailbox servers at Microsoft:

  • A Level A Test forest dedicated to running development and test code for Exchange, operating in a frequently changing server software environment.

  • A specialized Level B Test forest, serving as a limited-use production environment for one product division that hosts a limited number of user mailboxes. Specialized hardware configurations and test scenarios can be run in this environment. Level B Test uses a two-node server cluster connected to a SAN scaled to support 5,000 user mailboxes.

  • A legacy test environment forest used for testing Windows server operating system versions one version back from the currently released version (specifically Windows 2000 Service Pack-specific testing) with Exchange.


    Note: Microsoft IT uses both Level A Test and Level B Test forests to test cross-forest behavior and support with the Main Microsoft corporate production forest.

The Microsoft IT service levels include:

• The global service availability Service Level Agreement (SLA) goal in the Main corporate forest, calculated as the availability of mailbox databases per minute (including both planned and unplanned outages), was 99.9 percent for stand-alone server designs. This was increased to 99.99 percent for the new clustered server designs used with Exchange 2003.

• Worldwide e-mail delivery in less than 90 seconds, 95 percent of the time.

• Backup and restore operation SLA of less than one hour per database.
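As a point of reference, the difference between the two availability targets above can be computed directly. This sketch simply converts the SLA percentages into an annual downtime budget; it is illustrative arithmetic, not a Microsoft IT tool.

```python
# Annual downtime allowed by the 99.9% (stand-alone) and 99.99%
# (clustered) availability SLA goals quoted above.

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes

def downtime_budget_minutes(availability_pct: float) -> float:
    """Minutes of downtime per year permitted at a given availability."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)

print(f"99.9%:  {downtime_budget_minutes(99.9):.1f} min/yr")   # ~8.8 hours
print(f"99.99%: {downtime_budget_minutes(99.99):.1f} min/yr")  # under an hour
```

Moving from 99.9 to 99.99 percent cuts the allowable downtime by a factor of ten, which is what made clustering (with rolling upgrades) attractive given the twice-monthly software churn described earlier.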

Note: For security reasons, the sample names of forests, domains, internal resources, and organizations used in this paper are fictitious. They do not represent real resource names used within Microsoft and are in this document for illustration purposes only.

    Sites and Locations

Following the lead of the Exchange 2000 deployment, Microsoft IT continued the strategy of deploying Exchange servers in dedicated roles. Table 1 shows the distribution of Exchange 2003 servers by server role. Microsoft IT grouped the Exchange 2003 servers into 37 Exchange routing groups that were interconnected with 79 site connectors.

Table 1. Exchange 2003 Server Distribution by Server Role at Microsoft

Server Role          Exchange 2000   Exchange 2003 (post-consolidation goal *)
Mailbox              113             38
Public Folder        20              11
Messaging Hub        12              7 **
Instant Messaging    4               0 ***
Internet Gateway     22              18
Dedicated Free/Busy  6               0 ****
Front-End *****      14              12
Antivirus            9               7

* The mailbox server consolidation project is slated to be completed by the end of calendar year 2003.
** Microsoft IT will set up seven messaging hubs and four additional dual-purpose servers that will provide messaging hub services.
*** Exchange Instant Messaging servers will be eliminated as the messaging service is migrated to Windows Real Time Communications (WinRTC) servers.
**** All of the Free/Busy server services will be provided by existing Public Folder servers. Microsoft IT will not set up any dedicated Free/Busy servers at Microsoft.
***** Front-End servers were consolidated with the deployment of Exchange 2003, since the technology formerly included in the Mobile Information Server (MIS) 2002 product was added into Exchange 2003. To increase system availability, each Exchange 2003 front-end server deployment site was configured with a pair of load-balanced servers.


    Routing Group and Administrative Group Structure

In all Exchange deployments prior to Exchange 2000 (including versions 4.0, 5.0, and 5.5), Microsoft IT grouped Exchange servers into sites based on the network topology. For Exchange 5.5, Microsoft IT designed the environment to strike a balance between the need for large sites and the limitations of network bandwidth within those sites because of directory and public folder replication and message routing traffic.

Since the release of Exchange 2000 on Windows 2000, the limits and boundaries imposed by the Exchange 5.5 model were no longer a concern. The ability to place servers in routing groups independent of their administrative group membership allowed Microsoft IT to optimize the routing topology without losing the advantages of large administrative groups.

Directory replication is now a function of Active Directory and is an operating system-level issue that is no longer a key concern of the Exchange deployment. Since routing groups and administrative groups need not be the same (as was the case in Exchange 5.5 and earlier versions), the Microsoft IT Messaging operations staff is free to place Exchange 2003 servers into administrative groups that match their administrative and operational structure, and into routing groups that match the WAN topology. This leaves directory replication concerns to another Microsoft IT team specifically focused in that area. As of this writing, Microsoft IT maintains 31 Exchange Server 2003 routing groups and 11 administrative groups.


EXCHANGE 2000 LEGACY ARCHITECTURE

Microsoft IT began its deployment of Exchange 2003 when the product was still in an early beta version. To fully grasp the scope of this project, let us review the previous messaging infrastructure under Exchange 2000, the compelling reasons why Microsoft IT had to upgrade to Exchange 2003, and what Microsoft IT did to make the upgrade a success. Various challenges and discoveries made by Microsoft IT during this experience are included to provide some guidance and considerations as you plan your Exchange 2003 deployment.

Overview of Exchange 2000 Infrastructure

The Microsoft Exchange Server platform is the fastest-selling Microsoft server product in history. Since 1996, when Exchange 4.0 was released, Exchange Server has sold more than 50 million seats. Table 2 provides an overview of the evolution of the internal deployment of Exchange Server at Microsoft since 1996, when Microsoft first released Exchange Server.

Table 2. The Evolution of Exchange Server Deployment at Microsoft

                           Exchange 4.0  Exchange 5.0  Exchange 5.5  Exchange 2000  Exchange 2003
Mailboxes/Server           305           305           1,024         3,000          4,000
Mailbox Size/User          50 MB         50 MB         50 MB         100 MB         200 MB
Restore Time/Database      ~12 hours     ~12 hours     ~8 hours      ~1 hour        ~25 minutes *
Total Number of Mailboxes  ~32,000       ~40,000       ~50,000       ~71,000        ~85,000

* It takes 25 minutes to restore a database from backup disks.

    Legacy Server and Storage Design

Microsoft IT used stand-alone servers in both the headquarters data center and in all regional deployments. The servers were categorized into four basic mailbox server configurations, as shown in Table 3.

Table 3. Microsoft IT Exchange 2000 Server Configurations

Exchange 2000 Server Configuration             Mailboxes
Small Configuration Regional Mailbox Server    500
Medium Configuration Regional Mailbox Server   1,000
Large Configuration Regional Mailbox Server    1,500
Data Center Configuration Mailbox Server       3,000

The storage design varied depending upon the requirements of each server configuration. All Exchange 2000 mailbox servers supported 100 MB mailboxes. The regional server configurations used direct attached SCSI storage disk arrays that were backed up over the 100 Mbps LAN. The data center configuration servers used three SAN arrays, each one comprising one storage group (SG). They were backed up over the Gigabit LAN.

Microsoft IT used best practice guidelines when designing its original Exchange servers, with consideration towards maximizing system performance and availability with both the server and storage hardware. To optimize the disk input/output (I/O), each volume of an SG


was designated as a Logical Unit Number (LUN). Since each LUN was assigned a drive letter, each server, hosting three SGs comprised of three LUNs each, used nine drive letters.

Microsoft IT configured each SG to maintain three separate LUNs. The mailbox data LUN, using 24 18-GB disks, and the log LUN, using six 18-GB disks, were both configured using a striped mirror configuration, known as Redundant Array of Independent Disks (RAID)-10. The SAN also maintained a dedicated backup LUN utilizing 12 36-GB disks in a RAID-5 configuration. This LUN was used to support two days of online, disk-to-disk backup retention.

Each SG supported five databases, and each database supported 200 mailboxes, meaning that each server could support up to 1,000 mailboxes per SG and 3,000 mailboxes per server.
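The capacity math described above can be sketched in a few lines. Disk counts, disk sizes, and the databases-per-SG figures are taken from the text; the helper functions use the standard RAID capacity rules and are illustrative only.

```python
# Per-server capacity sketch for the Exchange 2000 data center design:
# three SGs, each with five databases of 200 mailboxes, backed by
# RAID-10 data/log LUNs and a RAID-5 backup LUN.

def raid10_usable_gb(disks: int, disk_gb: int) -> int:
    # RAID-10 mirrors every stripe, so half the raw capacity is usable.
    return disks * disk_gb // 2

def raid5_usable_gb(disks: int, disk_gb: int) -> int:
    # RAID-5 loses one disk's worth of capacity to parity.
    return (disks - 1) * disk_gb

mailboxes_per_sg = 5 * 200                    # 5 databases x 200 = 1,000
mailboxes_per_server = 3 * mailboxes_per_sg   # 3 SGs = 3,000

data_lun = raid10_usable_gb(24, 18)    # 216 GB usable mailbox data per SG
log_lun = raid10_usable_gb(6, 18)      # 54 GB usable log space per SG
backup_lun = raid5_usable_gb(12, 36)   # 396 GB usable disk-to-disk backup
```

Note how the 216 GB data LUN comfortably holds 1,000 mailboxes at the 100 MB limit only because the average mailbox was far smaller than its quota.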

    Performance, Scalability, and Supportability Challenges

Exchange 2000 was a major upgrade from previous versions of Exchange. However, as powerful as Exchange 2000 was, Microsoft IT still had to work around some limitations.

    Number of Servers to Manage Too High

Due to an inability to consolidate servers and sites effectively, the number of sites with servers drove support costs significantly higher and added complexity to the messaging environment. Some of the more common cost factors associated with the distributed environment included:

• More systems to back up
• Additional maintenance of backup systems at a larger number of sites
• More personnel added to administer backup processes
• Greater power and cooling resources required at additional sites
• More onsite support staff added for hardware maintenance at multiple sites

From a complexity perspective, the larger number of systems meant more moving parts in a complex machine; that is, more backup jobs, even with the same success rate, mean a higher number of failures to troubleshoot and resolve. The planned 90 percent reduction in the number of sites with servers dramatically reduces the number of moving parts in the messaging machine, thereby reducing the exposure to failure on a number of fronts.

    Recoverability of Databases within Service Level Agreement (SLA) Time Difficult

Even small efforts to consolidate resulted in higher scaling on servers in a number of sites. As the number of mailboxes on a server continued to increase with scalability improvements in the product, database sizes grew as well. More significantly, the initiative to increase the maximum mailbox size from 100 MB to 200 MB promised an immediate doubling in the size of databases.

Since Exchange 2000 does not offer support for newer recovery options such as Recovery Storage Group (RSG) functionality or Volume Shadow Copy Service (VSS), a database outage due to corruption on an Exchange 2000 server meant that the process of database restoration would result in an extended outage. In many sites, backups were managed across multiple computers in a datacenter, which resulted in backups and restores occurring over the 100 Mbps LAN, for which restore times averaged, at best, 16 GB per hour. The original restore SLA was a full database restore in one hour, a goal that was quickly becoming unattainable.
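The SLA pressure described above is easy to quantify: at 16 GB per hour, the one-hour restore SLA effectively caps a database at 16 GB, so doubling mailbox limits (and hence database sizes) pushes typical restores past the window. A minimal sketch, with hypothetical database sizes:

```python
# Restore-time check against the one-hour SLA, using the 16 GB/hour
# LAN restore rate quoted above. The database sizes are hypothetical.

RESTORE_RATE_GB_PER_HOUR = 16

def restore_hours(db_size_gb: float) -> float:
    """Hours needed to restore a database at the observed LAN rate."""
    return db_size_gb / RESTORE_RATE_GB_PER_HOUR

for size in (8, 16, 32):  # 32 GB models a database after mailboxes double
    ok = restore_hours(size) <= 1.0
    print(f"{size} GB -> {restore_hours(size):.2f} h, meets SLA: {ok}")
```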


    Cluster Scalability Limitations

Windows 2000 Advanced Server supported two-node clusters, and Windows 2000 Datacenter Server supported four-node clusters. With Exchange 2000 running on Windows 2000 Advanced Server, for an optimized configuration, Microsoft IT needed to have multiple drive letter volumes associated with each SG. There were also additional drive letters used in the server configuration, such as the Simple Mail Transfer Protocol (SMTP) drive (a dedicated inbound/outbound queue device). As a result, each virtual Exchange server within the cluster, after accounting for the collective SGs and the SMTP drive, used ten extended drive letters. This does not account for the required, reserved drive letters used by the server node itself, such as for the floppy disk, operating system volumes, and a CD drive. Microsoft IT could only use two servers in a cluster before it exhausted the supply of available letters assignable to disk volumes. The lack of available drive letters prevented Microsoft IT from adding additional instances of Exchange servers into a clustered environment.
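The drive-letter ceiling can be sketched numerically: 26 letters total, ten consumed per Exchange virtual server, and a handful reserved for the node itself. The exact reserved count below is an assumption for illustration; the text names the floppy, operating system volumes, and CD drive but does not give a number.

```python
# Drive-letter arithmetic behind the two-node cluster ceiling.

TOTAL_LETTERS = 26       # A: through Z:
RESERVED = 4             # floppy, CD, OS volumes (assumed count)
LETTERS_PER_EVS = 10     # 3 SGs x 3 LUNs + the dedicated SMTP queue drive

def max_virtual_servers(total: int = TOTAL_LETTERS,
                        reserved: int = RESERVED,
                        per_evs: int = LETTERS_PER_EVS) -> int:
    """Exchange virtual servers that fit in the remaining drive letters."""
    return (total - reserved) // per_evs

print(max_virtual_servers())  # 2 -- a third virtual server cannot fit
```

This is the constraint that Windows Server 2003 volume mount points later removed, which is what made larger clusters practical.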

    Backup Infrastructure Inflexible

Microsoft IT processed a single-stage backup for regional servers. The regional servers used the 100 Mbps LAN to perform a direct, disk-to-tape backup. In Redmond, servers performed a two-stage backup process: first disk-to-disk within the SAN, and then disk-to-tape. To ensure that the backup process completed during non-business hours, Microsoft IT needed to deploy Gigabit Ethernet network adapters in each Exchange server to get the throughput necessary to push the data across the LAN and onto tape.

Data restoration required the creation of a temporary restoration server to serve as a staging server for retrieving data from tape. Microsoft IT learned that, in addition to the time it took to restore the data, before that process could start, a tape drive had to read and seek the starting point of that particular database on a tape. This process often entailed a wait of 90 minutes or more before any data actually transferred to disk. The typical throughput for data restoration (once data began to flow) on the Microsoft IT 100 Mbps network was approximately 300-350 MB per minute. With a selective restoration of a sample 15 GB database, the total time needed to complete the job was often more than two hours, far in excess of the SLA.
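Plugging in the figures above (a 90-minute tape seek before any data flows, then roughly 300-350 MB per minute) shows why the sample 15 GB restore routinely exceeded two hours. A rough sketch, assuming the midpoint of the quoted transfer range:

```python
# Tape-restore estimate: a fixed seek delay before data flows, then a
# sustained transfer rate. The 325 MB/min rate is an assumed midpoint
# of the 300-350 MB/min range quoted in the text.

def tape_restore_minutes(db_gb: float, seek_min: float = 90,
                         rate_mb_per_min: float = 325) -> float:
    """Total minutes to restore a database of db_gb gigabytes from tape."""
    return seek_min + (db_gb * 1024) / rate_mb_per_min

total = tape_restore_minutes(15)            # the sample 15 GB database
print(f"~{total / 60:.1f} hours")           # well over the one-hour SLA
```

Notice that the fixed seek delay alone consumes the entire one-hour SLA before a single byte reaches disk, which is why the architecture moved to disk-to-disk staging.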

In the end, Microsoft IT based its entire Exchange 2000 architecture on the technical requirements for meeting backup and restore efforts within the allotted SLA time window.


REASONS FOR MICROSOFT IT TO UPGRADE

Microsoft IT had many compelling reasons to upgrade to Exchange 2003. Of course, in its special role as a group running Microsoft product group dog food software, Microsoft IT was committed to deploying Exchange 2003. This deployment was an effort to improve the product with real-world, enterprise experience and feedback, long before any customers would receive the product.

In addition, Exchange 2003 resolved the Exchange 2000 challenges for Microsoft IT as described earlier. The deployment of Exchange 2003 enabled Microsoft IT to improve service to its customers and to reduce operations requirements. Microsoft realized the following business benefits:

• Reduced number of servers
• Improved server availability, reliability, and manageability
• Improved clustering support
• Improved security
• Improved data backup and recovery
• Improved support for mobile users
• Improved integration with Office 2003

Site and Server Consolidation

As of this writing, with the deployment of Exchange 2003 completed, Microsoft IT is in the process of implementing a long-planned consolidation of regional mailbox servers and locations. Microsoft IT had 113 mailbox servers in 75 locations around the world. The end goal of the consolidation plan is to reduce the number of locations by 90 percent, down to seven worldwide, using 38 clustered Exchange virtual mailbox servers. This level of server reduction will significantly reduce the administrative workload required of the messaging infrastructure in the Microsoft IT group.

Normally, an increased number of mailboxes per server and a greater amount of data per SG would present an increased risk in the event of failure. Indeed, Microsoft IT measures database service availability as a factor of downtime multiplied by the number of databases affected. For example, a one-minute outage affecting a single SG of five databases on a server containing three SGs (containing 15 databases) is measured as five minutes of downtime. In addition, Microsoft IT studied its downtime incidents and learned that its planned downtime exceeded its unplanned downtime by a factor of 6:1.

Despite the fact that the number of mailboxes per server is growing, and that mailboxes are doubling in size, the site and server consolidation project is expected to improve Microsoft IT's overall availability as well as its backup and restore performance SLAs. It is also expected to reduce the Microsoft IT server management workload significantly, thereby reducing costs.

For more information about Microsoft IT's Exchange Server 2003 site consolidation plan, see the IT Showcase technical white paper titled Exchange 2003 Site Consolidation at http://www.microsoft.com/technet/itshowcase.

Availability/Reliability/Manageability Enhancements

Exchange 2003 offers a variety of enhancements that make it a compelling upgrade.

    Exchange 2003 Deployment and Architecture Page 13


    Virtual Memory Management

    The virtual memory improvements in Exchange 2003 reduce memory fragmentation and increase server availability. Specifically, Exchange is much more efficient in the way it reuses blocks of virtual memory. These design improvements reduce fragmentation and increase availability for higher-end servers that host a large number of mailboxes.

    Virtual memory management for clustered Exchange servers is also improved. In Exchange 2003, when an Exchange virtual server is either moved manually or failed over to another node, the MSExchangeIS service on that node is stopped. Then, when an Exchange virtual server is moved or failed back to that node, a new MSExchangeIS service is started and, consequently, a fresh block of virtual memory is allocated to the service.

    Exchange System Manager (ESM)

    Administrator functionality using ESM has been enhanced in Exchange 2003 with these key updates:

    • Improved method for moving mailboxes. The Exchange Task Wizard now allows you to select as many mailboxes as you want and then, using the task scheduler, to schedule the move to occur at some point in the future. You can also use the scheduler to cancel any unfinished moves at a selected time. Using the wizard's multi-threading capabilities, you can move up to four mailboxes simultaneously.

    • Improved public folder interfaces. To make public folders easier to manage, Exchange 2003 includes several new public folder interfaces in the form of tabs:

    - The Content tab displays the contents of a public folder in Exchange System Manager.
    - The Find tab enables searches for public folders within the selected public folder or public folder hierarchy. A variety of search criteria can be specified, such as the folder name or age. This tab is available at the top-level hierarchy as well as at the folder level.
    - The Status tab displays the status of a public folder, including information about servers that have a replica of the folder and the number of items in the folder.
    - The Replication tab displays replication information about the folder.

    • New Mailbox Recovery Center. Using the new Mailbox Recovery Center, you can simultaneously perform recovery or export operations on multiple disconnected mailboxes.

    • Enhanced Queue Viewer. The Queue Viewer improves the monitoring of message queues. Enhancements include:

    - The X.400 and SMTP queues are displayed in Queue Viewer rather than under their respective protocol nodes.
    - The Disable Outbound Mail option allows you to disable outbound mail from all SMTP queues.
    - The refresh rate of the queues can be set using the Settings option.
    - Messages are searchable by sender, recipient, and message state using Find Messages.
    - Queues are clickable to display additional information about the queue.
    - Previously hidden queues (DSN messages pending submission, Failed message retry queue, and Messages queued for deferred delivery) have been exposed.


    • Enhanced control of message tracking log files. When using Exchange System Manager, you have greater control over your message tracking log files. Exchange 2003 automatically creates a shared directory for the message tracking logs and allows you to change their location.

    • Improved error reporting. Error reporting allows server administrators to easily report errors to Microsoft. Although error reporting was included in Exchange 2000 SP2 and SP3, its implementation is improved in Exchange 2003. For example, if users do not want to view the standard error reporting dialog box, they can configure Exchange to send service-related error reports to Microsoft automatically.

    Improved Cluster Support

    Clustering in Windows Server 2003 provides a number of improvements that allow Microsoft IT to take full advantage of this technology and establish a solid clustered server standard to support its global Exchange mailbox server consolidation initiative. The new standard provides a better level of scalability and availability than any previous deployment methodology used for corporate Exchange deployment at Microsoft.

    Support for Up to Eight Nodes

    Exchange has added support for up to 8-node active/passive clusters when using Windows Server 2003 Enterprise Edition or Windows Server 2003 Datacenter Edition. This enabled Microsoft IT to boost the number of servers in its Exchange Server 2003 clusters, thereby substantially improving server availability and reliability while reducing the number of Exchange deployments necessary to manage the Microsoft corporate messaging environment.

    Support for Volume Mount Points

    Exchange now supports the use of volume mount points when using Windows Server 2003 Enterprise Edition or Windows Server 2003 Datacenter Edition.

    A volume mount point is a feature of the NTFS file system that allows linking of multiple disk volumes into a single tree, similar to the way the Distributed File System (DFS) of a server links remote network shares. Administrators can link many disk volumes together with only a single drive letter pointing to the root volume. The combination of an NTFS junction and a volume mount point can be used to graft multiple volumes into the namespace of a host NTFS volume.

    Improved Failover Performance

    Exchange has improved clustering performance by reducing the amount of time it takes a server to fail over to a new node. Exchange specifically optimized the process of shutting down services on the running active node, expediting the failover and the startup of services on an alternative node, thereby improving overall system performance.

    Improved Security

    When Microsoft prioritized security as its first order of business, Exchange 2003 realized many benefits.

    Kerberos

    Exchange 2003 uses Kerberos delegation when sending user credentials between an Exchange front-end server and Exchange back-end servers. In previous versions of Exchange, when users opened applications such as Outlook Web Access (OWA), Exchange used Basic authentication to send the user's credentials between an Exchange front-end server and Exchange back-end servers. As a result, companies had to use a security mechanism such as IPSec to encrypt the information.

    Exchange 2003 also uses Kerberos when authenticating users of Microsoft Office Outlook 2003.

    Forms-Based Authentication in OWA

    Exchange 2003 enables a new logon page for OWA that stores the user's name and password in a cookie instead of in the browser. When a user closes the browser, the cookie is cleared. Additionally, after a predefined period of inactivity, the cookie is cleared automatically. The new logon page requires users to enter either their domain, network user name, and password, or their full user principal name (UPN)/e-mail address and password, to access their e-mail. This feature is also known as cookie authentication.

    User Selectable Security Options in OWA

    The OWA logon page allows users to select the security option that best fits their needs. Based on the cookie authentication technology, the Public or shared computer option (selected by default) provides a short default timeout of 15 minutes. Alternatively, OWA users who are using computers in their offices or homes, where they are the sole operators, can select the Private computer option. When selected, the Private computer option allows a much longer period of inactivity before automatically ending the session; its internal default value is 24 hours. To match enterprise security needs, an Exchange 2003 administrator can customize the inactivity timeout values for both option settings.
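    The two inactivity windows can be sketched as simple expiry logic. This is a hypothetical illustration of the behavior described above, not OWA's actual implementation; the function and constant names are invented:

```python
from datetime import datetime, timedelta

# Default inactivity windows described above (administrators can customize both).
TIMEOUTS = {
    "public": timedelta(minutes=15),   # "Public or shared computer" (the default)
    "private": timedelta(hours=24),    # "Private computer" internal default
}

def session_expired(last_activity: datetime, mode: str, now: datetime) -> bool:
    """True once the inactivity window for the chosen logon mode has elapsed."""
    return now - last_activity >= TIMEOUTS[mode]

start = datetime(2003, 8, 1, 9, 0)
print(session_expired(start, "public", start + timedelta(minutes=20)))   # True
print(session_expired(start, "private", start + timedelta(minutes=20)))  # False
```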

    Blocking Attachments in OWA

    Similar to existing functionality found in Microsoft Outlook 2002 and later, the OWA feature of Exchange 2003 can be configured to block users from accessing certain file-type attachments. This feature is useful in stopping untrustworthy attachments from potentially compromising corporate security.

    Secure/Multipurpose Internet Mail Extensions (S/MIME) Support in OWA

    S/MIME increases the security of Internet e-mail by enabling digital signing of messages as well as message encryption. Digital signatures provide authentication, non-repudiation, and data integrity. Message encryption provides confidentiality and data integrity. Within Microsoft IT's configuration, when OWA is configured to use S/MIME, private keys are stored in a roaming profile, which is made available when the user logs onto a computer connected to the corporate network. All S/MIME encryption, decryption, and message-signing operations are performed on the local computer using the private key. All public keys, necessary for signature verification (non-repudiation) and message encryption, are stored in Active Directory. User private keys are never passed, in any form, between the user's computer and the Exchange server.

    Restricted Distribution Lists

    In Exchange 2003, you can place restrictions on who can send e-mail messages to an individual user or a distribution list. Submissions can be restricted to specific users, groups, or all authenticated users. Restricting submissions on a distribution list prevents non-trusted senders, such as unauthorized Internet users, from sending mail to an internal-only distribution list.


    Improved Security with Clustering

    Exchange 2003 clustering, when run on Windows Server 2003, includes the following security features:

    • Permission improvements: the Windows Cluster Service no longer requires Exchange Full Administrator rights to create, delete, or modify an Exchange virtual server.
    • The Kerberos authentication protocol is enabled by default.
    • Internet Protocol security (IPSec) is supported between front-end and back-end servers.
    • Internet Message Access Protocol 4 (IMAP4) and Post Office Protocol 3 (POP3) services are no longer included by default when creating virtual servers.

    Improved Recoverability Technologies to Better Meet SLA Requirements

    Backing up and restoring large databases or SGs takes a long time, even over the fastest network connections. However, the coupling of Exchange 2003 with Windows Server 2003 offers an alternative solution that takes a small fraction of the time needed by tape media methodologies for backup and restore.

    Volume Shadow Copy Service (VSS) Integration Framework

    VSS, a feature of Windows Server 2003, provided Microsoft IT with the ability to perform online snap and clone functions on the databases, giving Microsoft IT a mirror copy of the data at a single point in time. VSS enables Microsoft IT to take either a mirror copy or a snap copy of the production data. Depending upon the type of failure, be it a mailbox store, an SG, or multiple SGs affected by corrupted data, or a massive spindle failure where the entire data structure is lost, Microsoft IT can recover upwards of 800 GB of data in minutes, as opposed to standard restoration methodologies that would take many hours to recover that amount of data.

    Recovery Storage Group (RSG)

    The new RSG is a specialized, offline SG that can be created alongside the standard SGs on the production Exchange server. The RSG provides added flexibility for quickly restoring mailboxes and databases. With this new feature, a damaged Exchange database can be quickly restored to a production server in an offline state. Once the database has been restored to the RSG, the Exchange tool ExMerge can be used to export the contents of one or more mailboxes back into production. The RSG eliminates the need for dedicated restore servers for single-mailbox restore operations, thereby reducing server downtime.

    Mailbox Recovery Center

    The new Mailbox Recovery Center makes it easy to perform simultaneous recovery or export operations on multiple disconnected mailboxes. This is a significant improvement over Exchange 2000, where such operations had to be performed individually on each disconnected mailbox. With this new feature, you can quickly restore Exchange mailboxes and thereby reduce downtime.

    Mobility Features/Enhancements

    Significant enhancements were made in Exchange 2003 for the mobile, client-side experience. All of the mobility features previously found in Mobile Information Server 2002 (MIS), a separate, adjunct solution to Exchange 2000, were incorporated into Exchange 2003.

    Outlook Web Access (OWA)

    The new version of OWA in Exchange Server 2003 represents a significant upgrade from OWA in Exchange 2000. The new version is a full-featured e-mail client, with support for rules, a spelling checker, signed and encrypted e-mail, and many other improvements. A redesigned interface provides an enhanced user experience similar to that of Outlook 2003, including a new Reading Pane (previously called the Preview Pane in Outlook) and an improved navigation pane.

    For OWA users connecting by means of dial-up, low-bandwidth wireless networks, or Secure Sockets Layer (SSL), Exchange 2003's new use of data compression technology provides substantial overall performance improvements compared to those realized with previous versions of Exchange Server. Additional performance improvements were attained by eliminating all ActiveX controls required to use OWA on client computers connecting to Exchange 2003. With earlier versions of Exchange Server, these controls, when not available in the client computer's Internet Explorer cache, had to be downloaded each time OWA was run.

    Outlook Mobile Access (OMA)

    Exchange 2003 now includes the OMA application previously offered in MIS. OMA allows users with browser-based mobile devices to access their e-mail, Contacts, Calendar, and Tasks, and to search the global address list. Any mobile device with a mobile browser can use OMA.

    MIS had to be installed in every network domain where these services were needed. Because Exchange 2003 comes with built-in mobile services, installation in each network domain is no longer necessary.

    Furthermore, Exchange 2000 users were limited to using only the MIS servers located in their home domains. Users from a domain within the Microsoft corporate network in which the MIS server was offline could not use the MIS servers of other sub-domains to access these services.

    Exchange 2003 has eliminated the domain boundary limitations for OMA. Any user enabled for OMA can use mobile services on any of the front-end servers, regardless of their network domain. As an added benefit for Microsoft IT, if one region's Exchange front-end servers had to be taken offline for service, users could still access those services from the remaining servers on the network, thereby all but eliminating downtime for this service.

    Exchange ActiveSync (EAS)

    The Exchange ActiveSync feature previously offered in MIS, which enables users to securely and remotely synchronize their mobile devices directly with the Exchange server, has also been incorporated into Exchange 2003 and is enabled by default. By synchronizing a mobile device to an Exchange server, users can access their Exchange information without having to be constantly connected to a mobile network. In addition, users are no longer subject to the same EAS domain boundary limitations that affected OMA in MIS.


    Up-To-Date Notifications

    Exchange 2003 introduces a new feature within EAS called up-to-date notifications. In the past, the push notifications featured in MIS used the Short Messaging Service (SMS) of a wireless carrier to send text messages consisting of the first 160 characters of a redirected e-mail. Because SMS transmitted its messages as non-encrypted text, the security of message content was a major concern. Instead of transmitting the first 160 characters of the actual message, up-to-date notifications transmit only a binary command to the mobile device that causes it to start securely synchronizing e-mail over the SSL-protected EAS link. This way, the binary command never contains any portion of the message body, yet the user still receives the latest e-mail.

    To reduce the amount of traffic a device might receive for a user who regularly receives large quantities of e-mail, Windows Mobile 2003 devices offer the user the option either to specify time ranges during the day, called Peak Time, in which synchronization occurs only at specified intervals, or to synchronize continuously at all times. During Off Peak Time, however, the mobile device is synchronized by up-to-date notifications every time a message arrives. Support for up-to-date notifications requires the use of Windows Mobile 2003 devices such as Pocket PC Phone Edition devices or Smartphones.
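    The Peak Time/Off Peak Time decision described above can be sketched as follows. This is an illustrative model only, not Windows Mobile 2003 code; the window, interval, and function names are invented, and on a real device the user configures these values:

```python
from datetime import time

# Hypothetical Peak Time window and sync interval, configured by the user.
PEAK_START, PEAK_END = time(8, 0), time(18, 0)
PEAK_INTERVAL_MINUTES = 30

def sync_behavior(now: time) -> str:
    """During Peak Time, sync on a fixed interval; off-peak, sync per notification."""
    if PEAK_START <= now < PEAK_END:
        return f"interval every {PEAK_INTERVAL_MINUTES} min"
    return "sync on each up-to-date notification"

print(sync_behavior(time(12, 0)))  # interval every 30 min
print(sync_behavior(time(23, 0)))  # sync on each up-to-date notification
```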

    Office 2003 Integration

    Exchange 2003 is more tightly integrated than ever with its primary client application, Outlook 2003. The combination of the two offers users many enhancements.

    Exchange Cached Mode

    Exchange cached mode, a feature of Microsoft Office Outlook 2003, enables the user to work in a messaging environment with a perceived constant connection between the Outlook 2003 client and the Exchange server. Exchange cached mode isolates the client from most network and server latencies that, in the past, have caused Outlook to appear as if it had stopped responding. Outlook, using Exchange cached mode, connects to the Exchange server and automatically downloads all incoming content, such as e-mail, meeting requests, and tasks, to a dedicated .OST file, which serves as a local cache on the client computer. Once the download has completed, the user can read, reply to, create, and delete e-mail, as well as send tasks and meeting requests. Outlook, working continuously in the background, synchronizes the local cache file with the Exchange server to upload new outgoing content and download any new incoming content. Users typically do not notice any difference in messaging performance when using Exchange cached mode, other than the clear benefit of being insulated from slow network connections or poor server performance.

    Exchange cached mode is supported under both Exchange 2000 and Exchange 2003, but several performance improvements have been implemented specifically to enhance the performance of Outlook 2003 clients when used in conjunction with Exchange 2003.

    Exchange cached mode is considered a key requirement for the Exchange server consolidation effort. Exchange cached mode will prevent regionally located users from suffering the effects of system latency when working with Outlook over WAN links connected to remote mailbox servers.


    Data Compression

    To reduce the amount of information sent between the Outlook 2003 client and Exchange 2003 servers, Exchange 2003 and Outlook 2003, working in tandem, perform data compression that significantly reduces network traffic. Microsoft IT found that this reduced the total Exchange 2003-to-Outlook 2003 network traffic by an average of 40 percent. Exchange 2003 also reduces the total number of requests for information between the client and server, thereby optimizing the communication between the two.

    This significant level of data compression between client and server helped Microsoft IT mitigate the effect of the additional WAN usage generated when local servers were consolidated onto regional servers. What was formerly local SMTP network traffic has now become Messaging Application Programming Interface (MAPI) Remote Procedure Call (RPC) network traffic across the WAN, but the quantity of that traffic was significantly reduced compared to the traffic generated by previous versions of Exchange and Outlook.
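    The principle behind the bandwidth savings can be demonstrated generically. Note the assumption: Outlook and Exchange use their own wire compression, not zlib, and real MAPI traffic differs from this toy payload; the sketch only shows how well repetitive, text-heavy message data compresses:

```python
import zlib

# A crude stand-in for text-heavy mail traffic, not actual MAPI payload.
payload = ("Subject: status update\r\nFrom: user@example.com\r\n"
           "Body: weekly report follows...\r\n") * 100

compressed = zlib.compress(payload.encode())
ratio = len(compressed) / len(payload)
print(f"{len(payload)} bytes -> {len(compressed)} bytes ({ratio:.0%} of original)")
```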

    Remote Procedure Call (RPC) over Hypertext Transfer Protocol (HTTP)

    Exchange 2003 and Outlook 2003, combined with Windows Server 2003, support the use of RPC over HTTP to access Exchange. The Microsoft Windows RPC over HTTP feature enables the secure use of Outlook 2003 over the Internet without setting up a virtual private network (VPN) tunnel with remote access or using OWA. Outlook always communicates with the Exchange server using RPC. When Outlook is configured to use this new feature, it will, by default, first attempt to connect to its corporate Exchange mailbox server by means of RPC over Transmission Control Protocol/Internet Protocol (TCP/IP), as it would in a corporate network setting. If the server cannot be located this way, Outlook then attempts to connect by means of RPC over a secure HTTP link on the Internet using SSL. RPC over HTTP comes through the same Exchange front-end servers that serve users of OWA, OMA, and EAS. To the Exchange back-end servers, this service is effectively identical to OWA, except that the e-mail client is Outlook 2003 instead of Internet Explorer. Similar to OWA, if the RPC connection is made through the Internet, users are prompted to enter their network logon credentials before access to the Exchange server data is granted.

    Note: The feature named RPC over HTTP actually uses HTTP secured with SSL (HTTPS).
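    The try-TCP-first, fall-back-to-HTTPS ordering described above can be modeled as a simple transport fallback. This is an illustrative sketch, not Outlook's implementation; the function, endpoint names, and host names are all hypothetical:

```python
def connect(endpoints, try_connect):
    """Attempt each transport in order, returning the first that succeeds.
    `try_connect` stands in for a real connection attempt."""
    for transport, endpoint in endpoints:
        if try_connect(transport, endpoint):
            return transport
    raise ConnectionError("mailbox server unreachable on all transports")

# Hypothetical endpoints: direct RPC over TCP/IP first, then RPC over HTTPS
# through the same front-end servers that host OWA.
endpoints = [
    ("rpc-over-tcp",   "mailbox01.corp.example.com:135"),
    ("rpc-over-https", "https://mail.example.com/rpc"),
]

# Simulate being outside the corporate network: only HTTPS gets through.
print(connect(endpoints, lambda transport, endpoint: transport == "rpc-over-https"))
```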

    Users who use notebooks as their primary Outlook computer will find this feature especially useful. Users who travel to customer sites and often end up waiting for the opportunity to make presentations can use RPC over HTTP to keep in touch with their corporate Exchange server without the need for a VPN connection. RPC over HTTP enables a user to make a connection through firewalls at customer sites (which typically block VPN connections) to the corporate Exchange server, thereby improving accessibility and productivity.

    Unlike with OWA, the contents of locally stored personal folder files are available in Outlook on a remote connection in exactly the same way they would be while connected to the corporate network in the office.


    Note: Unlike OWA, RPC over HTTP downloads e-mail information when the user connects to the Exchange server (assuming the use of Outlook cached mode). Therefore, RPC over HTTP should only be used on computers the user controls, such as corporate notebooks, instead of on shared computers or public kiosks.

    Microsoft IT is optimistic that the use of RPC over HTTP will reduce the number of VPN servers required to meet the needs of the company. Most employees use VPN to connect to the corporate network primarily to use Outlook. To quantify the level of VPN usage, Microsoft IT is analyzing the matter to better understand employee needs, in an effort to reduce the number of VPN servers deployed without reducing needed connectivity services.


    EXCHANGE 2003 ARCHITECTURE DESIGN DECISIONS

    The successful Microsoft IT deployment of Exchange 2003 required the integration of many disparate elements. Not only was the Exchange server software new, but gaining the maximum benefit from the deployment also required other new technologies: server and storage hardware from third-party sources, Microsoft Windows Server 2003, and Microsoft Office 2003. Design considerations for the network, including bandwidth requirements and SLAs for backup and restore, were also weighed. The resulting design decisions also led to operational changes in Microsoft IT.

    Topology

    Microsoft IT used the topology from Exchange 2000 on Windows 2000 Server as the basis for designing the topology of the Exchange 2003 deployment. Active Directory was a key element in the organizational structure and administrative requirements for Exchange 2000, and Microsoft IT was able to use the existing Active Directory structure for the Exchange 2003 deployment.

    Microsoft IT was already deeply involved in the deployment of Windows Server 2003 in its worldwide network infrastructure when the initial deployments of Exchange 2003 began. This development was critical, for while Exchange 2003 can run on Windows 2000 Server, Exchange 2000 cannot run on Windows Server 2003. Running Exchange 2003 on Windows Server 2003 presents many additional benefits to Exchange, which are discussed in detail later in this paper. Those benefits enabled Microsoft IT to begin implementing plans for consolidating the number of servers in the messaging infrastructure worldwide, which drove the design of the Exchange 2003 topology.

    For more information about Microsoft IT's Exchange Server 2003 topology, see the IT Showcase technical white paper titled Exchange 2003 Site Consolidation at http://www.microsoft.com/technet/itshowcase.

    Mobility Design and Configuration

    The definition of mobility at Microsoft has grown to include systems not typically associated with mobile technologies. Devices using Microsoft IT's mobile infrastructure include more than just Pocket PCs and Smartphones. Microsoft employees using notebook computers or Tablet PCs running Outlook 2003 can use RPC over HTTP to access the Microsoft corporate Exchange servers with just an Internet connection, and any remote, Internet-accessible computer can serve as an OWA client for Microsoft employees. All of these technologies go through the same mobile infrastructure to access Exchange 2003.

    The mobility enhancements in Exchange 2003 enabled Microsoft IT to modify the design of its mobile messaging infrastructure with additional server consolidations and improved security. The mobility infrastructure in Microsoft IT includes such services as OWA, OMA, EAS, RPC over HTTP, and up-to-date notifications.

    Consolidation of Front-End Servers

    In addition to the mailbox server site and server consolidation project, Exchange 2003 has also enabled Microsoft IT to consolidate its mobility server infrastructure (also known as Exchange front-end servers). Microsoft IT no longer has to deploy a multiple-server infrastructure within each domain to provide mobility services. Deploying OWA and MIS with Exchange 2000, on the other hand, required an Exchange front-end server dedicated to OWA and separate servers for MIS. By using Exchange 2003, all the mobile messaging features reside on one physical front-end server, enabling Microsoft IT to consolidate the number of front-end servers dedicated to hosting mobility features.

    Microsoft IT reduced its server population from seven OWA servers and seven MIS servers (one set for each domain in the Microsoft corporate network) to seven Exchange front-end sites hosting OWA, OMA, EAS, and RPC over HTTP services. Each Exchange front-end site worldwide hosts a pair of non-clustered, network load balanced Exchange front-end servers. While Microsoft IT theoretically could have consolidated to a single set of Exchange front-end servers, the project team decided to retain the larger number because of the network latency caused by the great geographic distances between Exchange front-end servers and regional Exchange mailbox servers. If Microsoft IT had consolidated to a single set, user performance would have suffered; network latency would have been particularly evident among users with slow Internet connections or mobile devices.

    Mobile Security Enhancements

    Microsoft IT also used the enhanced OWA security features offered in Exchange 2003 for its front-end server deployment, such as time-based logoff and forms-based authentication. Unlike OWA under Exchange 2000, a secure, HTML forms-based authentication screen appears when a user navigates to a front-end server, instead of an NTLM-based dialog box. In addition to logon credentials, the form asks two additional questions:

    1. Is the user logging on from a public kiosk/shared computer or from a private home computer?

    2. Does the user want to use the basic or the premium OWA user interface (UI) feature set? (The answer typically depends on whether the connection is a fast or a slow data link.)

    All of the UI elements displayed in the OWA logon page are customizable, enabling the inclusion of company logos, specific URLs to regional front-end servers, custom usage instruction text, and more. Microsoft IT created its customized OWA page using these features.

    Once the form has been filled out and the user clicks Log On, the data is encapsulated and sent by means of an SSL connection to the front-end server the user navigated to in order to bring up the authentication form. Once the logon credentials have been sent over the Web, a special time-out cookie is created on the local client computer. Depending upon whether the user indicated the client is a public or private computer, the time-out cookie starts counting up to a threshold of inactivity. Once that threshold is met, with no activity having taken place for that duration, the session connection is automatically closed, and reauthentication is required if the user wants to regain access to the Exchange mailbox. Microsoft IT configured the time-out cookie to close inactive sessions on public or shared computers after 15 minutes, whereas inactive sessions on a user's private home computer were configured to last for two hours of inactivity before closing. The session time-out periods are customizable by the enterprise to meet any security requirements.

    In order to provide an additional level of security, Microsoft IT chose to deploy Internet Security and Acceleration (ISA) servers to act as the reverse proxy for all Exchange front-end servers. This allowed the Exchange 2003 front-end servers to be placed behind the firewall, safely within the corporate network, and no longer directly connected to the Internet.

    Server Design and Configuration

    In designing the server platform for its Exchange 2003 deployment, Microsoft IT considered a variety of factors. Aside from the normal hardware issues of system reliability and vendor support, the key technical issues considered included new processor technology, cluster implementations, server designs, and mobility issues. As a result, Microsoft IT has moved all its Exchange mailbox servers to a clustered environment.

    Processors

    Processor technology continues to advance: processing speeds improve, on-board caches grow in number and size, and more tasks can be processed in parallel. Most of the servers in the Exchange 2000 infrastructure were based on Intel Pentium II and Pentium III processors running in the 500 to 700 MHz range, with a 100 or 133 MHz front-side bus (FSB).

    Given the advances in processor technology since its deployment of Exchange 2000, Microsoft IT chose to deploy Exchange 2003 on new systems based on the Intel Xeon Processor MP with Hyper-Threading technology and a 400 MHz FSB.

    Hyper-Threading enables a single processor to process information as if it were two separate processors sharing the same memory bus and cache. In effect, the four-processor, Hyper-Threading servers implemented by Microsoft IT function as virtual eight-processor servers. However, a processor equipped with Hyper-Threading technology does not offer the same performance benefit as a genuine dual-processor system: because Hyper-Threading shares the same on-chip memory cache and main memory bus, Microsoft IT has measured an actual Exchange performance increase of approximately 25 percent over a normal, non-Hyper-Threading processor of the same clock speed.

    Clustered Server Design

    All the new servers Microsoft IT purchased to host Exchange 2003 mailbox servers were set up as clusters and equipped with Xeon Processor MP microprocessors.

    Through a combination of Exchange Server 2003, Windows Server 2003, third-party SAN technology, and faster servers, Microsoft IT created a clustered server design that offers greater operational reliability and a reduction in administrative overhead. This design choice delivered the following specific benefits:

    • Reduced service outages by having active-node mailbox servers automatically fail over to passive-node servers.

    • Achieved clustered Exchange Virtual Server (EVS) failover performance of just two minutes, regardless of the amount of mailbox data contained within the SAN attached to the failed node.

    • Increased the number of EVSs as well as the number of supported SGs per EVS within the cluster. Each SG was configured to use three LUNs; Volume Mount Points were used with these LUNs to minimize the number of drive letters used.

    • Enabled server consolidation by hosting many more mailboxes per server.

    • Reduced administration and maintenance overhead by consolidating more than 113 mailbox servers in 75 locations into 38 servers in seven locations.


    • Reduced the potential impact of a server outage on users (previously six hours or more per user) from a database restoration.

    • Improved backup and restore times to less than one hour.

    • Achieved server availability of 99.9 percent, with a fiscal year 2004 SLA goal of 99.99 percent.

    • Enabled rolling upgrades to minimize the impact of service outages while speeding up server operating system and application upgrades and patching.

    • Doubled the user mailbox limit (to 200 MB).

    Microsoft IT's design goal was to support 8,000 mailboxes per SAN, with 200 MB mailbox limits, 99.99 percent cluster server availability, and less than one hour per database for backup and restore. The data center EVSs in the Main corporate forest were designed to scale to 4,000 mailboxes each.

    Multi-Node Cluster Design

    Microsoft IT chose a multi-node cluster design using multiple active (online) and passive (offline) nodes. This design enables a failed active node to be immediately replaced by an identically configured passive node, with the resources of the failed active node, such as storage, immediately transferred to the passive node, thereby ensuring that the impact of the failover on the end-user experience is minimized.

    Microsoft IT implemented two separate types of passive nodes: primary passive nodes and alternate passive nodes. A primary passive node is a server with hardware equivalent to the active-node servers, which allows for full functionality upon an active-node failover. An alternate passive node is a server equipped with lower-scaled hardware that is used primarily for tasks such as streaming backup data from disk to tape; it also serves as a reduced-performance failover server. Both types of passive nodes are leveraged for rolling software upgrades.

    Microsoft IT uses all of the passive nodes in the cluster when rolling upgrades of the operating system or Exchange are required. Instead of failing over one active node to the primary passive node, upgrading the offline node, restoring it to active status, and repeating this cycle for every active node in the cluster, Microsoft IT uses the alternate passive nodes alongside the primary passive node to speed up the process. Microsoft IT first patches all the offline passive nodes, then fails over as many active nodes as there are available passive nodes. These now-offline nodes are upgraded in parallel and restored to service when ready. This process is repeated once to upgrade the one remaining active node.
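    The rolling-upgrade sequencing described above can be sketched in a few lines of Python. This is an illustrative model only, not Microsoft IT tooling; the function and node names are our own:

    ```python
    def rolling_upgrade_waves(active_nodes, passive_nodes):
        """Plan upgrade waves for a multi-node cluster.

        All offline passive nodes are patched first; then, in each wave,
        as many active nodes as there are passive nodes fail over, are
        patched in parallel while offline, and return to service.
        """
        waves = [("patch offline", list(passive_nodes))]
        remaining = list(active_nodes)
        batch = max(1, len(passive_nodes))
        while remaining:
            wave, remaining = remaining[:batch], remaining[batch:]
            waves.append(("fail over, patch, restore", wave))
        return waves

    # Regional AAAPp cluster: three active nodes, two passive nodes.
    plan = rolling_upgrade_waves(["A1", "A2", "A3"], ["P1", "p1"])
    # First both passive nodes are patched, then two active nodes in
    # parallel, then the one remaining active node.
    ```

    Under the same logic, the headquarters AAAAPpp configuration (four active, three passive nodes) patches three active nodes in one wave and repeats once for the remaining node, matching the process described above.
    
    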

    Microsoft IT Cluster Designs

    Microsoft IT implemented two primary cluster designs for the Exchange 2003 deployment in the Main corporate forest: a regional design and a headquarters data center design. A separate, scaled validation design was also deployed in the Level B Test limited-use production forest. All used the multi-node, active/passive cluster design. Table 4 shows the Microsoft IT cluster configurations.


    Table 4. Cluster design specifications per deployment

                                                   Regional   Headquarters   Level B Test
    Number of four-processor active nodes              3            4              1
    Number of four-processor primary passive nodes     1            1              1
    Number of two-processor alternate passive nodes    1            2              0
    Number of SGs per active node                      4            4              4
    Number of mailboxes per active node            2,700        4,000          5,000
    Number of databases per active node               20           20             20
    Number of mailboxes per database                 135          200            250
    Maximum size of database                       27 GB        40 GB          50 GB
    Number of mailboxes per cluster                8,000       16,000          5,000

    • Regional Design. The server specification for the regional cluster implementation consists of one SAN enclosure per cluster, with three active nodes, one primary passive node, and one alternate passive node (designated as AAAPp).

    • Headquarters Design. The headquarters clustered implementation is similar in design. It consists of two SAN enclosures, four active nodes, one primary passive node, and two alternate passive nodes (designated as AAAAPpp).

    • Level B Test Forest Design. The Level B Test server specification is similar to the regional cluster in design but with greater mailbox capacity per node. It consists of one SAN enclosure, one active node, and one primary passive node (designated as AP).

    To get the best performance at the best price point, Microsoft IT standardized on four-processor, 1.9 GHz Intel Xeon Processor MP servers for its active and primary passive cluster nodes in both the regional and headquarters data center deployments. For alternate passive cluster nodes, Microsoft IT uses two-processor, 2.4 GHz Intel Xeon Processor MP servers. With this new processing platform, Microsoft IT has seen substantial performance improvements in its Exchange 2003 infrastructure.

    Microsoft IT's cluster design supports a significant increase in both the number and size of mailboxes per Exchange server. It also helps eliminate the performance impact to users during the second-stage backup process, because that stage is offloaded to non-active servers within the cluster, thereby maintaining the SLA.

    Storage Design and Configuration

    The entire design of Microsoft IT's storage configuration was based on effectively managing peak-time disk I/O. Microsoft IT studied the usage trends of its Exchange 2000 messaging storage infrastructure and learned that the peak period of usage is typically Monday morning. Microsoft IT used that data as the baseline for designing the Exchange 2003 SAN solution: it calculated the average peak-time disk I/Os per second attributed to each mailbox, then calculated the total I/O rate for a server as the number of mailboxes multiplied by the per-mailbox I/O rate.


    For example, on a server supporting 4,000 mailboxes with a peak-time I/O rate of 1.2 per mailbox per second, the total I/O rate for that server equates to 4,800 I/Os per second. Each I/O transfer in Exchange is 4 KB, which at that rate equates to nearly 20 MB of I/O per second. Because each SAN enclosure serves two hosts in the headquarters data center configuration, the enclosure I/O rate doubles to nearly 10,000 I/Os per second.
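    The arithmetic above can be captured in a short sizing helper. This is a sketch for illustration using the document's figures; the function name is our own, not Microsoft IT code:

    ```python
    def san_load(mailboxes, iops_per_mailbox, hosts_per_enclosure=1,
                 transfer_kb=4):
        """Estimate peak load: per-server I/Os per second, throughput in
        MB per second (4 KB per Exchange I/O), and the combined I/O rate
        for an enclosure shared by several hosts."""
        server_iops = mailboxes * iops_per_mailbox
        throughput_mb = server_iops * transfer_kb / 1024
        return server_iops, throughput_mb, server_iops * hosts_per_enclosure

    # The worked example: 4,000 mailboxes at 1.2 I/Os per second each,
    # two hosts per enclosure in the headquarters configuration.
    iops, mb_per_s, enclosure_iops = san_load(4000, 1.2, hosts_per_enclosure=2)
    # -> 4800.0 I/Os per second per server, 18.75 MB per second,
    #    9600.0 I/Os per second on the shared enclosure.
    ```
    
    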

    To meet this demand, each SAN enclosure selected by Microsoft IT can support up to 12,000 I/Os per second, affording a margin of headroom for unusual spikes in activity while performing adequately during normal peak periods. Any significant load beyond this would likely result in disk read and write latencies, which would adversely affect the performance of all the mailboxes attached to that SAN. Microsoft IT system architects deemed this an acceptable risk, given anticipated conditions, the cost of additional hardware, and the monitoring and alerting improvements in Microsoft Operations Manager.

    To determine the messaging storage requirements for any enterprise, one must measure the average peak-time I/Os per mailbox per second, the maximum mailbox size, the length of time items are kept in deleted item retention, and the typical e-mail turnover rate in the organization. These are the factors Microsoft IT considered when designing its Exchange 2003 SAN solution.

    Microsoft IT allocated additional capacity to each LUN supporting mailbox stores to mitigate any requirement for future resizing based on unexpected growth. Each LUN was sized to support six and a half production databases with a fluff factor of 1.4.

    Fluff factor is what Microsoft IT calls the average capacity allocated to support a given mailbox on disk, accounting for deleted item retention, database overhead, non-limited mailboxes, and so on. For example, creating 100 MB mailboxes for users on Exchange 2000 actually required reserving 140 MB of space per user. The value of 1.4 was trended over the years on production Exchange servers supporting 100 MB mailboxes and was retained as a basis for designing the new solution with support for 200 MB mailboxes.

    Microsoft IT's 100 MB mailbox size limit was a hard disk quota set and enforced at the Exchange level by means of policy, but a user who consumed the entire 100 MB of available space often exceeded that amount on the back end. This usually happened when a user deleted e-mail from a mailbox: the e-mail was not immediately deleted from the mailbox database on the server, but was temporarily retained in the database, held in a space known as deleted item retention. Only after three days was deleted e-mail actually purged from a mailbox database. Microsoft IT needed to account for that level of usage overhead when planning its storage needs for Exchange 2003.

    Additionally, Microsoft IT sized each data LUN to support six and a half databases even though it would hold only five in production. This allowed Microsoft IT to duplicate a single corrupted database on the same LUN and then run an integrity check on it. Working on the same LUN enabled Microsoft IT to provide the fastest possible response to database corruption.
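    Under these assumptions, a data LUN's size follows directly from the mailbox quota, the fluff factor, and the six-and-a-half-database allocation. A sketch using the document's figures (the helper itself is our own, not Microsoft IT tooling):

    ```python
    def data_lun_gb(mailboxes_per_db, quota_mb, fluff=1.4, databases=6.5):
        """Per-database footprint is mailboxes x quota x fluff factor;
        the LUN holds six and a half databases' worth of capacity
        (five production databases plus room to copy one for repair)."""
        db_gb = mailboxes_per_db * quota_mb * fluff / 1024
        return databases * db_gb

    # Headquarters figures: 200 mailboxes per database, 200 MB quota.
    size = data_lun_gb(200, 200)
    # -> roughly 355 GB, in line with the 350 GB data LUNs used in the design.
    ```
    
    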


    A representation of the drive letter allocation for the first node, and the corresponding allocation to support the online backup devices, is given in Figure 1. For Node 1, each SG has a 350 GB data volume with its own drive letter (M:\ for SG1 through P:\ for SG4) and a 40 GB log volume attached beneath it as a volume mount point (M:\SG1_Logs through P:\SG4_Logs), plus a 50 GB SMTP volume mounted at M:\Exsrvr03. The corresponding backup allocation presents a single 350 GB Z:\ drive with three further 350 GB backup volumes (Z:\Backup_2 through Z:\Backup_4) attached as volume mount points.

    Figure 1. Drive letter allocation per node.

    Note: In the context of Figure 1, VMP represents a volume mount point.

    In all, a total of 53 physical LUNs are addressable using 21 drive letters within the clustered design. This allows for easy disk subsystem optimization, with LUNs distributed across controllers and Fibre Channel Adapters (FCAs) to ensure that peak disk transfer requirements are met within the Microsoft production environment.

    Redundant Storage System Paths Using Secure Path

    Microsoft IT's deployment of SAN technology includes an I/O design that not only provides redundancy but also uses that redundancy for optimal data flow.

    Microsoft IT uses HP StorageWorks Secure Path for Windows within its SAN infrastructure. Secure Path provides three key benefits:

    1. Eliminates the risk of a single point of failure in the server-to-SAN interconnect.

    2. Allows for LUN distribution to maintain the optimized I/O required on a busy Exchange host, reducing peak read/write disk latency and substantially improving online backup throughput to disk.

    3. Ensures a single LUN presentation, independent of the number of paths to the host.

    Microsoft IT's implementation of Secure Path uses two FCAs per host, two fibre channel data switches, and two storage controllers. Each FCA, switch, and controller group makes up what is known as a fabric. Secure Path allows the use of two separate fabrics per SAN, and each element of a fabric is interconnected with subordinate elements from both fabrics. More precisely, each active-node host in a cluster connects to each switch by means of the two FCAs installed in the host (one FCA per switch). Each switch takes inbound data from each host and has two outbound data connections, one to each controller. Each controller has two inbound data connections, one from each switch, and one outbound data connection to the SAN enclosure. Secure Path enables Microsoft IT to tolerate a single component failure in an FCA, a connecting cable, a switch, or a controller: service performance would be affected by the failure, but operations would continue seamlessly.

    Secure Path also assists with eliminating many single points of failure between the nodes and the connected SAN storage. Microsoft IT can maintain service in the event of a component failure affecting a single FCA per host, multiple fibre cables, fibre channel switches, or a single storage controller in the SAN fabric. The component failure is detected by Secure Path, which maintains I/O by moving LUNs from the failed path to an available path. This process, called failover, requires no resource downtime and maintains LUN availability. Once failed components have been replaced, failed-over LUNs can be failed back using HP's Secure Path Manager to restore optimized I/O.
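    The failover behavior just described can be illustrated with a toy model of path ownership. This is a deliberate simplification for illustration only; Secure Path performs this work transparently in the storage driver:

    ```python
    def fail_over_path(path_luns, failed_path):
        """Move all LUNs from a failed fabric path to a surviving path,
        mirroring the Secure Path failover behavior described above.
        `path_luns` maps a path name to the LUNs it currently presents."""
        survivors = [p for p in path_luns if p != failed_path]
        if not survivors:
            raise RuntimeError("no surviving path: storage is unreachable")
        target = survivors[0]
        path_luns[target] = path_luns[target] + path_luns.pop(failed_path)
        return path_luns

    # Two fabrics, A and B; fabric A fails and its LUNs move to fabric B.
    paths = {"Fabric A": ["LUN1", "LUN2"], "Fabric B": ["LUN3"]}
    fail_over_path(paths, "Fabric A")
    # -> {"Fabric B": ["LUN3", "LUN1", "LUN2"]}
    ```

    Failback after repair is simply the reverse mapping, which Secure Path Manager performs to restore the optimized LUN distribution.
    
    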

    The headquarters data center cluster implementation, which uses Secure Path to connect to a 16,000-mailbox SAN deployment, is shown in Figure 2. The figure depicts four active nodes, one primary passive node, and two alternate passive nodes, each with dual FCAs, connected through two fabric paths (Switch A on Fabric Path A and Switch B on Fabric Path B) to two SAN controller pairs, each controller pair fronting twelve 1,472 GB disk enclosures. A Storage Management Appliance manages the fabric, the nodes connect to the public network over 100 MB full-duplex links, and the two SANs together host 16,000 mailboxes with a 200 MB limit.

    Figure 2. Secure Path Connecting a Data Center Cluster to a Pair of SANs

    Backup and Recovery

    With the implementation of Exchange 2003 in a clustered server environment, Microsoft IT designed a two-stage backup process (disk-to-disk and disk-to-tape) to better meet its SLAs.


    This process prevents the tape backup from affecting production server performance and provides greater flexibility in managing the data restoration process. The solution is based on a combination of:

    • Exchange Server 2003

    • Microsoft Windows Server 2003, Enterprise Edition

    • Windows NT Backup for disk-to-disk backup

    • Veritas storage management solution for disk-to-tape backup

    In the past, it was challenging to meet the one-hour backup and restore SLA on direct-attached SCSI storage server implementations. These server designs used a one-step backup process (disk-to-tape), in which backups were performed to tape libraries over the Gigabit LAN. Microsoft IT's experience showed that it could move data at a rate of approximately 36-37 MB per second, or about 33+ GB per hour. Backups were limited to non-business hours to minimize any impact on clients with mailboxes hosted on these servers. However, if a backup failed to complete by 7 A.M., it had to be canceled; otherwise, the continuing backup process would significantly degrade the system performance of the messaging infrastructure for clients.

    Recovering a mailbox store affected by corruption in Exchange 2000 meant that 1,000 mailboxes were out of service for six or more hours during the restore operation. This represented a cost in lost productivity of $60-$80 per hour per user. Single mailbox restore operations required dedicated restore servers. This configuration is shown in Figure 3.

    In the regional configuration shown in Figure 3, three 1,000-mailbox Exchange 2000 servers and a dedicated Exchange 2000 restore server were connected to the tape library over a 100 MB LAN.

    Figure 3. Previous Regional Messaging Backup Environment

    Two-Stage Backup Solution

    To solve these problems and support server consolidation, Microsoft IT designed a flexible, two-stage process to back up data within a multi-node clustered configuration: disk-to-disk (stage 1) and disk-to-tape (stage 2).

    Microsoft IT leveraged the fact that resources within a cluster resource group can move within that resource group independent of other resource groups. For example, an active node of a clustered Exchange server is attached to a separate cluster resource group of dedicated backup LUNs in addition to the resource groups used for storing production data.

    In the first stage, backup runs on all active nodes within the cluster to complete an online, disk-to-disk backup from the LUNs in the production data resource groups to the LUNs in the backup resource group over a direct-attached fibre channel. The backup resource group has the capacity to support two days of online retention. Once that process has completed, control of the LUNs in the backup resource group is transferred to an alternate passive node. At this point, the passive node initiates the second stage, a disk-to-tape backup from the backup resource group to the tape library over a direct-attached fibre channel. This frees the active nodes from the time-consuming disk-to-tape data transfer, thereby minimizing the time the active nodes spend processing data backup operations. This process is shown in Figure 4.

    Figure 4. Two-stage Backup Process
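    The division of labor in the two-stage process can be outlined in code. This is an illustrative sketch of the sequencing only, not the actual backup tooling:

    ```python
    def two_stage_backup_plan(active_nodes):
        """Outline of the two-stage backup: each active node performs a
        fast online disk-to-disk backup to the backup LUNs, ownership of
        the backup resource group moves to the alternate passive node,
        and only that node runs the slow disk-to-tape stage."""
        plan = [(node, "stage 1: online disk-to-disk to backup LUNs")
                for node in active_nodes]
        plan.append(("alternate passive node",
                     "take ownership of backup resource group"))
        plan.append(("alternate passive node",
                     "stage 2: disk-to-tape to library"))
        return plan
    ```

    Because stage 2 runs entirely on a node that hosts no mailboxes, a tape-library disconnection, and the reboot it forces, never takes production mailboxes offline.
    
    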

    Microsoft IT elected to use this two-stage process rather than a single-stage, disk-to-tape backup over a direct fibre attachment to a tape library. While the single-stage process would eliminate the need for backup LUNs in the SAN, freeing additional storage capacity for more mailboxes, Microsoft IT could not take the risk of losing valuable production time if a node in the cluster became disconnected from the tape library. If that happened, the node server would have to reboot to reattach to the library; if the active node were performing this work, Microsoft IT would be required to fail over the node so it could reboot and reconnect. Microsoft IT considered that an unacceptable risk to system availability. Instead, by placing the burden of backing up to tape on a passive node that does not support users, no loss of production service occurs when the passive node needs to be rebooted to restore the server-to-library connection.

    Per-database online backups are scheduled at regular intervals that let Microsoft IT back up each entire server between 8:00 P.M. and 1:30 A.M. The databases are backed up concurrently per SG. An important feature here is that Exchange 2003 allows parallel backup and restore operations on a per-SG basis; therefore, backup operations for each database can be interleaved.


    Recovery Solution

    With Microsoft IT's new clustering solution, a server hardware failure is simply a matter of an automatic cluster node failover; service is negligibly affected. If there is a disk failure, different recovery scenarios are implemented, depending upon the scope of the failure and the time of day at which it occurs.

    Methodology is No Longer Scenario-Dependent

    The method of recovery employed used to be based on the type and scope of the failure and on business priorities. With Exchange 2000, organizations had a choice between restoring their messaging service quickly while giving up immediate access to old mailbox data, or restoring full access to their service but taking more time to do it.

    For example, if a single database was lost, up to 200 people could have been affected. Because up to two days of backup data was available on disk and could be restored online in less than an hour (restore rates of up to 2 GB per minute were achieved), regular Exchange restore procedures were used to get user mailboxes quickly back online with their data.

    Note: Each Exchange database consists of two files: the Exchange Database (EDB) file and the Streaming Media (STM) file.
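    At the restore rate quoted above, the per-database restore time is easy to bound. A sketch using the document's figures (the helper is our own):

    ```python
    def restore_minutes(database_gb, rate_gb_per_min=2.0):
        """Time to restore one database from the disk-based backup at the
        observed rate of up to 2 GB per minute."""
        return database_gb / rate_gb_per_min

    # A maximum-size headquarters database (40 GB) restores in about
    # 20 minutes, comfortably inside the one-hour SLA; a maximum-size
    # regional database (27 GB) takes about 13.5 minutes.
    ```
    
    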

    With Exchange 2000, if an entire SG was lost, the time of day of the failure was often the deciding factor in how to proceed. If the failure occurred during the business day, restoration of service usually took precedence over restoration of data, which could be restored later. In that scenario, the damaged databases were deleted and recreated (a process known as stubbing a database).

    If the failure occurred in late, non-business hours, Microsoft IT chose to sacrifice the immediate return of service in favor of a faster restoration of all lost data. In that situation, it performed the restoration without stubbing the affected databases.


    The decision tree used by Microsoft IT to determine whether to restore service first and data later, or to restore data and service simultaneously, is illustrated in Figure 5.

    Figure 5 decision tree (partial): 1. Is the problem known and resolvable? If so, fix the problem; if not, move mailboxes and investigate further. ... 3. Is it between 8 A.M. and 4 P.M. on a business