Capacity Planning for Office Communications Server 2007 Speech Server Deployments

White Paper

Contents

Introduction
Estimating Channel Capacity Requirements
    Identification of Performance Goals
Estimating expected call load – incoming call applications
Estimating expected call load – outbound calling applications
Estimating Average Application load (application complexity)
Disk Space Considerations for Application Tuning and Reporting
    Application Tuning Disk Space Requirements
    Reporting Requirements
Estimating Hardware Requirements
    “Reference” Server Hardware Specification
    Suggested Load (channels) per computer per application type
    Additional Factors that Affect Capacity
    Deployment Topology
Bandwidth considerations
Example Application Capacity planning
    Simple Application – Store Locator
    Medium Complexity Application – Flight Booking
    High Complexity Application – “How May I Help You” call director
Optimizing Speech Application Performance – tips and tricks
    Consider converting Conversational Understanding grammars to GRXML
    Minimize the amount of logging done
    Prefer recorded prompts over TTS
    Performance of SALT applications migrated from MSS V1 (R2)
Office Communications Server 2007, Speech Server Capacity Planning Tool

Copyright © 2007 Microsoft Corporation

Introduction

By carefully planning the capacity of your Microsoft® Office Communications Server 2007 Speech Server deployment, you can ensure that your system's telephony and speech recognition (SR) resources are adequately sized to meet expected demands. You can estimate the number of computers you will need by factoring in information about the type of application you are deploying, the system performance goals, and the estimated call volume. As soon as you have estimated the number of servers needed, you can fine-tune the deployment by testing the system under the load of expected call volume to discover the actual capacity and performance numbers that you need.

In general, your system's capacity is constrained by the performance characteristics you expect from each computer in your deployment. As channel density (that is, the number of simultaneous calls) increases, application performance tends to decrease, eventually resulting in declined calls. To ensure a good experience for callers interacting with your speech application, Speech Server declines new calls if response times get too slow.

Estimating Channel Capacity Requirements

When estimating channel capacity requirements for your Speech Server deployment, you must consider the following three key factors:

System performance goals

Current and future call workloads

Application complexity

The business objectives for your Speech Server deployment, such as optimal call-handling service levels or high customer satisfaction scores, should weigh heavily in determining your system performance goals. Aside from the typical computer hardware elements (such as processor speed, the amount of memory, and hard disk capacity), call workloads and application complexity are the two main factors that most affect the performance of a Speech Server deployment. These factors play a significant role in determining the scale of your deployment. It is important to estimate these as accurately as possible to meet or exceed your performance objectives.

Identification of Performance Goals

Your Speech Server deployment might replace or supplement a current business function, such as call center agents handling routine or low-complexity inbound phone calls. Alternately, it might provide new capabilities that are attractive to the business, such as enhancing customer service by providing outbound automated appointment reminders. To ensure that your deployment delivers on your business objectives, you need to establish performance goals for the deployment early in the design process. This provides a clear benchmark for success and helps guide your solution design and deployment.

Performance goals for Speech Server deployments typically include the following measurements:

User-perceived latency (UPL) — The length of time between the end of a caller's input and the response from Speech Server.

Call pass rate — The percentage of calls successfully answered by Speech Server without being dropped or unanswered (including busy-out).


Resource utilization — The amount of server resources that the application consumes, such as CPU, memory, and hard disk.

User-Perceived Latency

A low average UPL is important to maintain caller satisfaction as well as to prevent runaway error conditions during an automated call. Users of your speech applications expect the timing of the application's responses to closely mimic that of human conversations. If users are forced to wait too long, they can become impatient and opt out of using the system. In other cases, they might start to repeat inputs that cause the application to reprocess or misrecognize responses. Because neither of these situations is desirable, always factor a reasonable UPL into your performance objectives:

A UPL of less than two seconds is generally acceptable; less than one second is ideal.*

Call Pass Rate

Most call centers already have metrics in place for acceptable call pass rates, usually between 95 and 99 percent. An increase in the number of unanswered, dropped, or blocked calls would almost certainly cause customer satisfaction ratings to drop, with a detrimental effect on the original business goals. Thus, the performance goals for your Speech Server deployment should include meeting or exceeding the current call pass rate.

Target a call pass rate of at least 95 percent, preferably higher.

CPU Utilization

While application resilience and network issues can influence both call pass rates and UPL, maximizing and controlling system resources is an excellent way to optimize your deployment for meeting your UPL and call pass rate goals. Resources within the computer include the main subsystems of CPU, memory, and disk/storage. Of all these server resources, CPU utilization measurement is the best indicator of resource availability because resource utilization problems elsewhere, such as a shortage of memory, result in excessive use of CPU resources. Thus, to ensure optimal system performance:

Set your maximum CPU utilization target at 70 percent or lower.

Performance Goals Summary

To recap, a good set of guidelines for your performance objectives should include the following three metrics:

UPL < 2 seconds

Call pass rate >= 95%

CPU utilization <= 70%
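The three guideline targets can be expressed as a simple check against measurements taken during a load test. The following Python sketch is illustrative only; the `Measurement` type and its field names are assumptions made for this example and are not part of Speech Server or the capacity planning workbook.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    """Values observed during a load test (hypothetical field names)."""
    upl_seconds: float      # average user-perceived latency, in seconds
    call_pass_rate: float   # fraction of calls answered, 0.0 to 1.0
    cpu_utilization: float  # CPU utilization under load, 0.0 to 1.0


def meets_goals(m: Measurement) -> bool:
    """Return True when all three guideline targets are satisfied."""
    return (
        m.upl_seconds < 2.0            # UPL < 2 seconds
        and m.call_pass_rate >= 0.95   # call pass rate >= 95%
        and m.cpu_utilization <= 0.70  # CPU utilization <= 70%
    )


print(meets_goals(Measurement(upl_seconds=1.2, call_pass_rate=0.97, cpu_utilization=0.65)))
```

If your own business objectives call for stricter targets, such as a UPL under one second, adjust the thresholds accordingly.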

Having established your performance objectives, you can now move to the next step in assessing your capacity needs: estimating current and future call workloads.

Estimating Expected Call Load – Incoming Call Applications

As previously mentioned, the workload placed on the computer affects the ability of your Speech Server deployment to achieve its performance objectives. The higher the workload, the more resources you need. You can determine the call workload using actual statistics from current operations, such as the number of calls currently coming into your call center or helpdesk. If actual statistics are not available, you need to estimate these call statistics.† The statistics most important in determining call workload are as follows:

* In scenarios that legitimately require longer wait times, for example a complex database query, set the user's expectation by providing them with immediate feedback that the requested process is going to take a few moments. This is often implemented with a spoken prompt such as "One moment please," sometimes followed by an audio icon that suggests a process is executing. This way the user does not perceive the processing time as unacceptable system response latency.

† If you expect workload to increase over time, design your deployment to accommodate these future changes. Either start with a deployment that has more available capacity than is required at present, or architect it so you can easily add more capacity in the future.


Number of calls

Call concurrency

Call length

These call measurements help you identify the number of telephony channels required in the deployment. For example, given that one channel handles one call at a time, if each call takes ten minutes and you assume each call begins as the previous one ends, one channel can handle six successful calls in an hour. Therefore, if you expect 19 to 24 calls each hour, you need four channels to enable four concurrent calls. However, real call patterns are never so evenly distributed and another method of calculating the number of required channels must be used.
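The idealized arithmetic in the preceding paragraph can be written out as a short Python sketch. The function name is an assumption made for illustration, and the calculation deliberately keeps the unrealistic premise that calls arrive back to back with no overlap or gaps.

```python
import math


def channels_if_evenly_spaced(calls_per_hour: int, call_minutes: float) -> int:
    """Channels needed when each call starts exactly as the previous one ends."""
    calls_per_channel_per_hour = 60 / call_minutes  # for example, 6 calls at 10 minutes each
    return math.ceil(calls_per_hour / calls_per_channel_per_hour)


print(channels_if_evenly_spaced(24, 10))  # 4
print(channels_if_evenly_spaced(19, 10))  # 4
```

Because real arrivals cluster, this figure is only a lower bound on the number of channels actually required.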

Estimating Required Telephony Ports Using an Erlang Calculation

One of the best ways to estimate the number of channels needed to handle your call workload is to use an Erlang traffic model. This well-known call-statistics calculation method helps you to determine how many channels you need to support call center traffic and gives you a good starting point for investigating your deployment requirements.‡

Erlang calculations use a unit of measurement called the Erlang. An Erlang describes traffic through telephony equipment (such as incoming calls into a phone switch or PBX).

1 Erlang = 1 continuous 3600 second (60 minute) call

To determine the maximum number of Erlangs that you need to accommodate, multiply the number of calls during the busy hour (B)§ by the average call length in seconds (L), and then divide by 3600:

Erlangs = (B*L)/3600

For example, if you estimate that your deployment will receive 1000 calls during the busy hour and the average length of a call is two minutes (120 seconds), the estimated number of Erlangs is determined as:

(1000*120)/3600 = 33.3 Erlangs

When you have established the number of Erlangs your deployment will process, you can use an Erlang traffic calculation** to determine the number of required telephony channels. The Erlang traffic formula is a mathematical algorithm that incorporates the number of Erlangs you estimated plus a percentage of blocked calls (or busy signals) that you think is acceptable. For example, if you take the 33.3 Erlangs from above and you can tolerate 1 percent call blockage, the Erlang traffic calculation determines that your deployment needs to scale to 45 concurrent channels.
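As a rough illustration of the calculation described above, the following Python sketch computes the offered traffic in Erlangs and then searches for the smallest number of channels whose blocking probability stays within the acceptable rate. It assumes the Erlang B model (blocked calls are dropped rather than queued) and uses the standard recursive form of that formula; the function names are chosen for this example and do not come from the capacity planning workbook.

```python
def erlangs(busy_hour_calls: float, avg_call_seconds: float) -> float:
    """Offered traffic: Erlangs = (B * L) / 3600."""
    return busy_hour_calls * avg_call_seconds / 3600


def erlang_b(traffic: float, channels: int) -> float:
    """Blocking probability for `traffic` Erlangs offered to `channels` channels.

    Uses the recurrence B(0) = 1, B(n) = A*B(n-1) / (n + A*B(n-1)),
    which avoids the large factorials of the closed-form expression.
    """
    blocking = 1.0  # with zero channels every call is blocked
    for n in range(1, channels + 1):
        blocking = traffic * blocking / (n + traffic * blocking)
    return blocking


def channels_needed(traffic: float, max_blocking: float) -> int:
    """Smallest channel count whose blocking probability is within the limit."""
    channels = 0
    while erlang_b(traffic, channels) > max_blocking:
        channels += 1
    return channels


traffic = erlangs(busy_hour_calls=1000, avg_call_seconds=120)
print(round(traffic, 1), channels_needed(traffic, max_blocking=0.01))
```

With the figures from the example (33.3 Erlangs and 1 percent acceptable blockage), this sketch should agree with the 45 channels quoted above. Treat the result as a starting estimate to be confirmed by load testing.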

With insight into how to calculate call workloads and thus the number of concurrent channels required of your solution, the next factor you need to consider is the complexity of your application.

Use the Peak Call Load sheet of the Speech Server Capacity Planning workbook to calculate these numbers for your own deployment.

Estimating Expected Call Load – Outbound Calling Applications

Estimating the call load for an outbound calling application is quite different from an application that accepts inbound calls, where you have no control over the call pattern. For an outbound application, the channel capacity that you need to scale to depends on:

‡ More information about Erlang methods, including calculators, is available online at http://www.erlang.com/.

§ The busy hour (B) is the hour of operations during which the highest number of phone calls arrive at your call center.

** http://www.erlang.com/

How many calls need to be placed.

The time frame within which the calls must be placed.

The average call duration.

The proportion of unanswered calls.

For example, consider a video rental company that makes reminder calls to customers with overdue videos. They have to place up to 10,000 calls per day, between 9 A.M. and 5 P.M. – this equates to 1250 calls per hour. Assuming an average call duration of 45 seconds, the Erlang calculation is as follows:

Erlangs = (1250*45)/3600

= 56250/3600

= 15.625

For an outbound calling application, the call pattern is controlled by the application and is steady and predictable, so the number of Erlangs maps well to the number of concurrent channels you need to scale to. Rounding up, this gives a scale goal of 16 concurrent channels.
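The outbound estimate can be sketched in Python as follows. The function name and parameters are assumptions made for illustration. Like the worked example, the sketch ignores the proportion of unanswered calls, which would need to be factored in separately for a real deployment.

```python
import math


def outbound_channels(calls_per_day: int, window_hours: float, avg_call_seconds: float) -> int:
    """Concurrent channels for a steady, application-controlled outbound call pattern."""
    calls_per_hour = calls_per_day / window_hours
    traffic_erlangs = calls_per_hour * avg_call_seconds / 3600
    return math.ceil(traffic_erlangs)  # round up to whole channels


# 10,000 calls between 9 A.M. and 5 P.M. (8 hours), 45 seconds per call
print(outbound_channels(10_000, 8, 45))  # 16
```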

Estimating Average Application Load (Application Complexity)

A Speech Server application can range from simple to complex and the demand that the application makes on Speech Server resources increases with the complexity of the application. To help you identify the complexity level of your application, the following sections describe the three broad application types: simple, average, and complex.

It should be noted that there is very little difference in performance of VoiceXML applications versus .NET managed code speech applications. Performance should not be a factor influencing your choice of application authoring style.

Simple Applications

Simple applications include dual tone multi-frequency (DTMF) applications and those with small amounts of simple automatic speech recognition (ASR). DTMF applications play pre-recorded prompts and collect touchtone key presses from callers in response to prompts. This application type might use text-to-speech (TTS) to play back small amounts of dynamic data, such as bank balances or callers' names. DTMF applications place a relatively low burden on the speech recognition engine and on system resources. The speech recognition capability of simple applications is typically limited to rudimentary single-token responses, such as “Yes/No” or digits. Whether featuring touchtone or a combination of touchtone and speech, simple applications use grammars that are sparse and only contain a small number of entries.

An example of a simple application might be a department store locator application. Customers call a toll-free number and obtain the operating hours and address of a local store. The application requires customers to input their postal code using the telephone keypad. It then performs a database lookup and reads back the operating hours and address of the nearest store using TTS.

Average Applications

Average applications typically play a number of pre-recorded and/or TTS prompts and perform moderate levels of speech recognition using medium-sized or variable grammars. This application type might also use text-to-speech to play back dynamic data, such as bank balances or users' names. Average applications create greater load on the speech recognition engine than DTMF or simple applications, but because of the moderately sized grammars, the overall load is less than that of applications with large complex grammars. Average applications typically recognize more complex speech inputs, such as dates and place names.

An example of an average application is a bank account management application. Customers can carry out account management tasks over a telephone, such as checking balances, transferring money, and requesting statements. The application understands speech input, such as the account type (for example, checking or savings) and the task that they want to perform (for example, check balance, transfer funds, and get statement). The customers can use either the telephone keypad or speech to input account numbers and money amounts. The application uses text-to-speech to play back statements and balances.

Another example of an average complexity application is purchasing tickets for air travel. The application prompts for information, such as the number of tickets, departure and arrival dates, locations, the class of ticket, and food requirements. The application receives and processes speech responses to these prompts. After the tickets are booked, the customer's credit card details are used to reserve the tickets. The user can also check that booked flights are on time and make lost luggage inquiries.

Complex Applications

Application complexity has a profound effect on the performance of Speech Server because complex applications put the highest burden on ASR and TTS resources. Complex applications perform a lot of speech recognition using complex grammars (such as conversational grammars with many nodes and training sentences or grammars with more than 25,000 items). This application type spends little time doing less CPU intensive activities such as recording voice mails or using text-to-speech to play back large amounts of text, such as e-mail messages.

An example of a complex application is the typical “How May I Help You” application that redirects calls to appropriate queues. This application uses a conversational grammar with many concepts and hundreds (or more) of training sentences per concept. Not only is this a complex grammar, but a large proportion of a call duration is actually spent doing speech recognition, as opposed to less intensive tasks such as prompt output or recording.

Another example of a complex speech application is a survey application that spends the entire call asking questions of the user and little or no time presenting information back to the user.

Leeway in Identifying Application Complexity

Choosing which category your solution fits into might not be easy because each application is unique, with various dialog turns being more or less complex than others. Yet, while the relative accuracy with which you assess your application's complexity is important, exactitude is not. The amount of extra performance headroom provided by your system performance goals (such as < 70% CPU utilization rather than 90%) builds in some flexibility for accommodating variations in application complexity and call workload. So as long as you are "in the ballpark," you should have a good starting point from which to plan your deployment.

While estimating capacity is an important part of the deployment planning process, no deployment should be put into final production without performing a smaller scale pilot of the deployment with the actual application against real load. This piloting stage of the final rollout will help shake out any discrepancies, bugs, and potential issues that planning and estimating alone cannot uncover.

Use the Application Load sheet of the Speech Server Capacity Planning workbook to help determine the expected level of complexity of your application. The information below serves as a guide to answering the questions on this worksheet.

Using the Application Load Worksheet in the Speech Server Capacity Planning Tool

This worksheet asks you to rate (on a scale of low-medium-high) three aspects of your application to help determine the overall application complexity (in terms of CPU cost).

How much time is spent recognizing versus speaking or recording?

How complex are the grammars?

How much additional processing does the application have to do?


The first question takes into account the fact that the most expensive activity within a speech application is the actual recognition of speech. The amount of speech recognition an application does can vary significantly from one to another. For example, an information service (such as a weather report or e-mail checking) might ask very few questions but spend most of the call reading back information to the caller. For such an application, where less than 10 percent of the call duration is spent recognizing spoken input, the response to this question is “low”. For an application such as a survey that is asking a lot of questions but not presenting much information back to the user, the time spent recognizing might be 30 percent or more of the call duration and this is classified as “high.”

The second question – “How complex are the grammars?” – asks you to consider the complexity of the average grammar used by your application. For example, most of the grammars in an application can be used to recognize responses to simple menus or Yes/No questions.

While conversational grammars with concept answers are very powerful, they use a statistical language model built on the set of training sentences provided. The CPU resources required to evaluate an utterance against this model are greater than those used for evaluating conventional grammars.

Use the following table as a guide to identifying the average complexity of grammars in your application.

Grammar Examples                                                                        Complexity
DTMF                                                                                    Low
Yes/No                                                                                  Low
Simple Menu                                                                             Low
Date, currency                                                                          Medium
City Names (1000s of nodes)                                                             Medium
Directory Assistance (10,000s of nodes)                                                 High
Concept Answer in Conversational Grammar with 100s of training sentences per concept    High

While date and currency might seem like simple grammars, they are far more complex than simple grammars such as those used in a menu or Yes/No question. This is because of the many ways in which dates (such as “February sixth nineteen sixty seven”, “twenty first of October nineteen sixty eight”, “Tuesday September fifth”, and “August fifth”) and currencies (such as “one hundred and ten dollars”, “five dollars and seventy five cents”, and “eleven thirty three”) can be spoken.

The third question – “How much additional processing does the application have to do?” – takes into account any work the application has to do on top of regular speech dialog activity. Most applications likely make queries to a database or Web service on a separate computer where additional work can be done. In this case, the additional processing on the computer running Speech Server is low. If you have an application that needs to do some complex processing, such as converting voice mail recordings into a different codec or running a speaker verification algorithm, that can impose significant extra demands on the CPU resources of the computer running Speech Server.

Disk Space Considerations for Application Tuning and Reporting

In addition to whatever disk space is needed for the operating system, the Speech Server installation, and your application and its resources, the amount of disk space required to support tuning and reporting activities can be quite significant. Disk space is needed on each computer running Speech Server for log files and on the computer running SQL Server where the logs will be imported into databases. The actual size of the log and databases depends on the amount of call traffic, the number of dialog turns per call, the level of audio logging, and the length of logged utterances.

Application Tuning Disk Space Requirements

Tuning an application is usually done prior to its deployment (or in the early stages of deployment). To support tuning exercises, it is recommended to log Recognizer audio for several thousand calls. To do this, you need to configure Speech Server to log an appropriate percentage of calls using the Speech Server Administrator console (shown in the following diagram), and then allow Speech Server to collect the call data until sufficient calls have been captured. For example, if you generally get 15,000 calls per day, setting the capture level to 10 percent for a day results in logs containing Recognizer audio for around 1500 calls. The following diagram shows how to configure the Trace Logging tab in Server Properties to capture Recognizer audio in 10 percent of the calls.

This results in the following approximate disk space requirement.

Description                                                                         Example
Number of calls per day                                                             15,000
Number of calls with Recognizer audio logged                                        10% (= 1500)
Number of turns per call                                                            8
Daily disk space required for logs on the computer running Speech Server            1500 * 8 * 50 KB = 0.57 GB
Database space required on the computer running SQL Server for tuning activities    1500 * 8 * 87.8 KB = 1 GB

Note that this assumes that the average log space per speech recognition is 50 KB, based on an average utterance duration of around 8 seconds. If your average question response is much shorter, your log and database space requirements might be somewhat smaller; if it is much longer, they will be somewhat larger.

Reporting Requirements

When an application is fully tuned and ready for full production, logging is typically configured to only collect data that is needed for reporting purposes. The following diagram shows the settings in the Administrator console Server Properties to log data for reporting purposes only.

This logging configuration results in the following approximate disk space requirement.

Description                                                                                           Example
Number of calls per day                                                                               15,000
Number of turns per call                                                                              8
Number of days of data to retain in the reporting database                                            100
Daily disk space required for logs on the computer running Speech Server                              15,000 * 8 * 5 KB = 0.57 GB
Database space required on the computer running SQL Server for reporting activities (for 100 days)    15,000 * 8 * 100 * 2.46 KB = 28.1 GB

Use the Disk Space sheet of the Speech Server Capacity Planning workbook to calculate these numbers for your own deployment.
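The disk space figures in the tuning and reporting tables follow the same pattern: calls multiplied by turns per call, multiplied by an average size per turn, and for the reporting database by the number of days retained. The Python sketch below reproduces that arithmetic. The function name is an assumption made for this example, and the per-turn sizes (50 KB, 87.8 KB, 5 KB, and 2.46 KB) are the averages quoted in the tables, which will vary with utterance length.

```python
KB_PER_GB = 1024 * 1024


def log_space_gb(calls: float, turns_per_call: int, kb_per_turn: float, days: int = 1) -> float:
    """Approximate space in GB for `calls` logged calls per day over `days` days."""
    return calls * turns_per_call * kb_per_turn * days / KB_PER_GB


# Tuning: 10 percent of 15,000 daily calls have Recognizer audio logged.
print(round(log_space_gb(1500, 8, 50), 2))     # daily log files on Speech Server
print(round(log_space_gb(1500, 8, 87.8), 2))   # tuning database on SQL Server

# Reporting: all 15,000 daily calls are logged, without audio.
print(round(log_space_gb(15_000, 8, 5), 2))               # daily log files
print(round(log_space_gb(15_000, 8, 2.46, days=100), 2))  # database, 100 days retained
```

The results are approximate and depend on rounding; the workbook remains the reference for your own deployment.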

Estimating Hardware Requirements

Based on the expected maximum number of simultaneous calls that you need to support and the estimated channel capacity per computer, you can arrive at the anticipated number of servers you will need to handle the load while meeting your performance goals.

“Reference” Server Hardware Specification

The performance calculations in this white paper are based on the following hardware configuration:

Two Dual-core Intel Xeon 5150 processors (Woodcrest) running at 2.66 GHz

333 MHz Front Side Bus

1333 MHz Bus Speed

4096 KB Level 2 Cache

4 GB of RAM

Suggested Load (Channels) per Computer per Application Type

Application Complexity    Examples                                                  Expected Channel Capacity per Computer Running Speech Server
Simple                    DTMF, reading e-mail, simple Yes/No questions             Up to 400 channels
Medium                    Appointment Scheduler, Flight Booking                     Up to 133 channels
Highly complex            “How May I Help You” call router, Directory Assistance    Up to 67 channels

Additional Factors that Affect Capacity

In addition to channel capacity, other speech deployment factors to consider include the following:

Low memory can result in a drop in call pass rate and an increase in UPL.

Each individual connection requires additional memory.

Low memory can result in a failure to start new sessions.

The main gating factor for applications with small grammars is CPU utilization.


Larger grammars require more memory.

The language of the grammar affects the grammar size.

Generally, the resource most heavily used by a speech application is CPU cycles; using faster hardware than described above should allow you to scale to a greater number of channels per server.

For more complex applications (such as particularly large grammars or other high-memory demands), you might find that 4 GB of memory becomes a system throughput constraint (rather than CPU) and the application might benefit from upgrading the hardware to 8 GB of RAM.

Speech Server provides the capability of increasing the channel capacity of a deployment by adding additional computers running Speech Server and distributing the load across them. Typically, VOIP gateways provide the ability to distribute call traffic across multiple SIP peers.

Use the Servers sheet of the Speech Server Capacity Planning workbook to calculate the suggested number of servers for your own deployment.
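A first approximation of the server count divides the required number of concurrent channels by the suggested per-computer capacity for the application type and rounds up. The Python sketch below assumes the reference hardware and the "up to" capacities listed earlier in this section; the dictionary keys and function name are chosen for illustration. It does not account for memory limits, fault tolerance, or spare capacity.

```python
import math

# Suggested upper-bound channel capacity per reference server, by application complexity.
CHANNELS_PER_SERVER = {"simple": 400, "medium": 133, "complex": 67}


def servers_needed(peak_channels: int, complexity: str) -> int:
    """Minimum number of reference servers for the given concurrent channel count."""
    return math.ceil(peak_channels / CHANNELS_PER_SERVER[complexity])


print(servers_needed(200, "medium"))   # 2
print(servers_needed(200, "complex"))  # 3
```

Because the capacities are upper bounds, consider the result a minimum and validate it with a pilot under real load.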


Deployment Topology

The following diagram shows a typical Office Communications Server 2007 Speech Server deployment topology.

Bandwidth Considerations

In addition to the amount of call traffic received, the other primary factor affecting bandwidth utilization is the particular codec being used to encode voice traffic between Speech Server and SIP peers. The following table shows the average bandwidth utilization for each codec, per channel.

Codec             Payload Bit Rate    Actual Bit Rate
G.711             80 kbps             80 kbps
G.723.1           6.3 kbps            17 kbps
RTAudio 16 kHz    29 kbps             45 kbps
RTAudio 8 kHz     11.8 kbps           27.8 kbps

A 1-Gb network is recommended for handling audio traffic. Note also that the number of hops between Speech Server and SIP peers should be kept to a minimum because latencies can have a negative effect on application performance.
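To relate the per-channel figures to a network budget, multiply the actual bit rate for the codec by the number of concurrent channels. The Python sketch below uses the actual bit rates from the codec table; the function name is an assumption made for this example, and it treats the table value as the total rate per channel, which you should confirm for your own gateway configuration.

```python
# Actual bit rate per channel in kbps, taken from the codec table.
ACTUAL_KBPS = {
    "G.711": 80.0,
    "G.723.1": 17.0,
    "RTAudio 16 kHz": 45.0,
    "RTAudio 8 kHz": 27.8,
}


def audio_bandwidth_mbps(channels: int, codec: str) -> float:
    """Aggregate audio bandwidth in Mbps for `channels` concurrent calls."""
    return channels * ACTUAL_KBPS[codec] / 1000


print(audio_bandwidth_mbps(400, "G.711"))  # 32.0
```

Even at the maximum suggested density for a simple application, the audio traffic under this assumption is a small fraction of a 1-Gb link, which leaves room for database and Web service traffic.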

The RTAudio codecs are provided for interoperation with Microsoft Office Communicator and give better audio quality and compression than the other codecs, but at the expense of greater CPU usage. RTAudio is enabled for receiving only, allowing higher quality voice mail or other recordings from Office Communicator. The speech recognition engine does not benefit from the higher audio quality in this scenario.


Example Application Capacity Planning

Now that we have covered all aspects of the process for estimating capacity for a Speech Server-based deployment, let's put these principles and methods into practice by taking a look at how to apply these to three different application scenarios.

Simple Application – Store Locator

A department store has branches throughout the country. To help customers locate the branch near them and hear its hours of operation, the company is implementing a speech-enabled IVR application. Customers call a toll-free number, enter their postal code using their touchtone keypad, and the system responds with a text-to-speech readout of the local store’s hours and address.

The required performance is:

Call pass rate > 95%

UPL < 2 seconds

Processor utilization < 70%

The deployment considerations are:

Completing the application takes approximately 1.6 minutes.

The average call consists of three dialog turns.

The application is not mission critical; if the application fails, customers can locate the store in other ways, such as a telephone directory.

Fault tolerance is not required.

The application is relatively simple with low resource consumption.

At most 300 calls an hour are expected, based on current calls to the department store office.

The required call pass rate is 95 percent, so acceptable call blockage (at peak) is 5 percent or 0.05 in decimal.

Solution

Using the Peak Call Load worksheet in the Speech Server Capacity Planning tool

Enter the expected peak calls per hour, average call duration, and acceptable call blocking rate.

Expected Peak Calls per hour: 300

Average Call Duration (in seconds): 96

Acceptable call blocking (busy signal) rate: 0.05

Erlangs: 8.0

Number of channels needed (based on Erlang B formula): 13

This worksheet shows that your peak traffic is equivalent to 8 Erlangs, for which you need to be able to handle 13 concurrent channels (calls).
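The worksheet figures above can be reproduced with a short calculation. The following is a minimal sketch in Python, not the tool's actual implementation. It assumes the worksheet computes offered traffic as calls per hour multiplied by the average call duration in hours, and then picks the smallest channel count whose Erlang B blocking probability does not exceed the acceptable rate. The function names are illustrative only.

```python
def offered_traffic_erlangs(calls_per_hour, avg_call_seconds):
    """Offered traffic in Erlangs: call arrivals per hour times hours per call."""
    return calls_per_hour * avg_call_seconds / 3600.0


def erlang_b_blocking(erlangs, channels):
    """Erlang B blocking probability, computed with the standard recurrence
    B(0) = 1, B(n) = A*B(n-1) / (n + A*B(n-1)), which avoids large factorials."""
    blocking = 1.0
    for n in range(1, channels + 1):
        blocking = erlangs * blocking / (n + erlangs * blocking)
    return blocking


def channels_needed(erlangs, max_blocking):
    """Smallest number of channels whose blocking probability is <= max_blocking."""
    channels = 0
    while erlang_b_blocking(erlangs, channels) > max_blocking:
        channels += 1
    return channels


# Store Locator inputs taken from the worksheet above.
store_locator_erlangs = offered_traffic_erlangs(300, 96)
store_locator_channels = channels_needed(store_locator_erlangs, 0.05)
print(store_locator_erlangs, store_locator_channels)
```

For the Store Locator inputs (300 calls per hour, 96 seconds per call, 0.05 acceptable blocking) this gives 8.0 Erlangs and 13 channels, which agrees with the worksheet.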

Using the Application Load worksheet in the Speech Server Capacity Planning tool

Because this application is only doing DTMF recognition, the first question – “How much time is spent recognizing” – should be answered “Low”. Recognition of DTMF grammars requires very little CPU resource.

Because the complexity of DTMF grammars is also very low, the second question – “How complex are the grammars?” – should be answered “Low”.

The third question – “How much additional processing does the application have to do?” – takes into account any work the application has to do on top of regular speech dialog activity. In this case, all the application has to do is a simple database lookup of a ZIP code to get a resulting store ID, which can then be used to reference a prompt for playback; this question would be answered “Low”.

How much time is spent recognizing versus speaking or recording? 1. Low (e.g. recording voicemail, reading emails)

How complex are the grammars? 1. Low (e.g. digits, yes/no, simple menus)

How much additional processing does the application have to do? 1. Low (< 5% CPU)

Resulting Application complexity: Low

Expected Channel Capacity per server*: 400

So this application gets an overall classification of “Low” complexity and the resulting expected channel capacity per server for this application is up to 288 concurrent channels.

Using the Disk Space worksheet in the Speech Server Capacity Planning tool

Enter the expected average number of dialog turns per call, the expected average number of calls per day, the expected peak number of calls in a day, and the number of days’ data you want to retain in the database.

Average number of turns per call: 3

Expected number of calls per day: 10000

Expected peak number of calls in a day: 20000

Number of days data to retain for reporting: 365

Number of calls with recognizer audio logged for tuning: 1500

Requirements for Tuning scenarios

Space needed on MSS Servers for log files: 0.4 GB

Space needed for Tuning DB: 0.4 GB

Requirements for Reporting scenarios

Space needed on MSS Servers for log files: 0.3 GB

Space needed for Reporting DB: 26 GB

This shows that for tuning purposes you need to allow 0.4 GB of disk space on the computers running Speech Server for each day’s log files (this space would be spread across the computers if you have more than one computer running Speech Server) and 0.4 GB of disk space for your tuning database. When you have completed the tuning phase of your deployment, you need about 0.3 GB of disk space for each day’s log files on the computers running Speech Server and 26 GB of disk space for your reporting database.

Using the Servers worksheet in the Speech Server Capacity Planning tool


The Servers worksheet gives you the resulting forecast number of the computers running Speech Server required to support the estimated call load and application complexity of your deployment. You can specify additional servers for the optional roles to arrive at a total number of servers required for your deployment.

In this case, only a single computer running Speech Server is needed to meet the capacity needs of the application, and a second server is needed to run SQL Server for the tuning and reporting databases.

Speech Server servers: 1

Add additional servers for redundancy (optional): 0

Reporting/Tuning Servers (optional): 1

TIM/TIMC Server (optional): 0

MOM Server (optional): 0

Total Servers: 2
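The server total for the Store Locator can be checked by hand. The sketch below is one plausible reading of the Servers worksheet, not its actual formula. It assumes the number of computers running Speech Server is the required channel count divided by the per-server channel capacity, rounded up, and that the optional roles are simply added on top. The function and parameter names are hypothetical, and the per-server figure used is the 288 channels quoted in the text for a low-complexity application.

```python
import math


def speech_servers_needed(channels_needed, channels_per_server):
    """Assumed rule: divide required channels by per-server capacity and round up."""
    return math.ceil(channels_needed / channels_per_server)


def total_servers(speech, redundancy=0, reporting_tuning=0, tim_timc=0, mom=0):
    """Sum of the Speech Server computers and the optional server roles."""
    return speech + redundancy + reporting_tuning + tim_timc + mom


# Store Locator: 13 channels needed, low-complexity application.
speech = speech_servers_needed(13, 288)
total = total_servers(speech, reporting_tuning=1)
print(speech, total)
```

With these inputs the sketch gives one computer running Speech Server and two servers in total, in line with the worksheet above.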

Medium Complexity Application – Flight Booking

A major travel agency has a speech-enabled application that allows a caller to find and purchase tickets for air travel. The application prompts for departure and arrival dates and locations and reads back to the user a list of flight options. A significant proportion of the call is spent reading information back to the user. When the user identifies a flight he wants to book, additional questions such as class of ticket, food requirements, and payment details are asked.

The deployment considerations are:

Completing the application takes approximately 8 minutes.

The average call consists of 20 dialog turns.

The application is mission critical and requires a high level of availability.

The application processing overhead is low – figuring out which flights are available is done by calling a remote web service.

9500 calls an hour are expected at peak.

The average daily traffic is 20,000 calls.





This worksheet shows that your peak traffic is equivalent to 1266.7 Erlangs, for which you need to be able to handle 1296 concurrent channels (calls).
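The worksheet inputs for this scenario are not reproduced here, so the acceptable blocking rate that was entered is not shown. As a sanity check, the following sketch recomputes the offered traffic from the deployment considerations (9500 calls per hour, 8 minutes per call) and evaluates the Erlang B blocking probability that 1296 channels would give at that load. It is an illustration under the standard Erlang B model, not the tool's own code, and the variable names are illustrative.

```python
def erlang_b_blocking(erlangs, channels):
    """Erlang B blocking probability via the iterative recurrence
    B(0) = 1, B(n) = A*B(n-1) / (n + A*B(n-1)).
    The recurrence stays numerically stable for large channel counts,
    where evaluating the factorial form directly would overflow."""
    blocking = 1.0
    for n in range(1, channels + 1):
        blocking = erlangs * blocking / (n + erlangs * blocking)
    return blocking


peak_calls_per_hour = 9500
avg_call_seconds = 8 * 60  # "approximately 8 minutes" per call

erlangs = peak_calls_per_hour * avg_call_seconds / 3600.0
blocking = erlang_b_blocking(erlangs, 1296)

print(f"Offered traffic: {erlangs:.1f} Erlangs")
print(f"Blocking probability with 1296 channels: {blocking:.4f}")
```

The same function can be called with other channel counts to see how quickly blocking rises when fewer channels are provisioned.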

Using the Application Load worksheet in the Speech Server Capacity Planning tool

This application does a reasonable amount of recognition but the majority (~75 percent) of the call time is spent providing information back to the caller. The first question – “How much time is spent recognizing” – should be answered “Medium”.


This application uses some moderately complex grammars, such as a city names grammar of several thousand nodes and the date grammar (dates can be expressed in many different ways). The second question – “How complex are the grammars?” – should be answered “Medium”.

In this case the application simply calls a remote Web service that does all the work of identifying the available flights on given dates between given cities – there is little processing other than the dialog itself that the application has to do on the computer running Speech Server. The question “How much additional processing does the application have to do?” would be answered “Low”.

How much time is spent recognizing versus speaking or recording? 2. Medium

How complex are the grammars? 2. Medium (100-1000 nodes; e.g. city name)

How much additional processing does the application have to do? 1. Low (< 5% CPU)

Resulting Application complexity: Medium

Expected Channel Capacity per server: 133

So this application gets an overall classification of “Medium” complexity and the resulting expected channel capacity per server for this application is up to 96 concurrent channels.


Enter the expected average number of dialog turns per call, the expected average number of calls per day, the expected peak number of calls in a day, and the number of days’ data you want to retain in the database.










This shows that for tuning purposes you need to allow 2.0 GB of disk space on the computers running Speech Server for each day’s log files and 2.5 GB of disk space for your tuning database. When you have completed the tuning phase of your deployment, you need about 1.9 GB of disk space for each day’s log files on the computers running Speech Server and 172 GB of disk space for your reporting database.



The Servers worksheet gives you the resulting forecast of 27 computers running Speech Server to support the estimated call load and application complexity of this deployment.


Reporting/Tuning Servers (optional): 1

TIM/TIMC Servers (optional): 0

MOM Server (optional): 1

Total Servers: 30

High Complexity Application – “How May I Help You” Call Director

This application is simple in that it asks just a single question, but complex in that the question is a sophisticated conversational grammar with many training sentences and many keyword nodes. In addition, a large proportion of the call time is spent recognizing speech from the caller. When the user input has been mapped to a particular keyword (such as billing, technical support, returns, sales, close account, or add service), the call is redirected to a particular call queue and the application is complete.

The deployment considerations are:

Completing the application takes approximately 20 seconds.

Each call consists of a single dialog turn.

The application is mission critical and requires a high level of availability.

The application processing overhead is low – simply mapping a keyword to a call queue number.

2500 calls an hour are expected at peak.

The average daily traffic is 12,000 calls; peak daily traffic is 18,000 calls.





This worksheet shows that your peak traffic is equivalent to 13.9 Erlangs, for which you need to be able to handle 23 concurrent channels (calls).

Using the Application Load worksheet in the Speech Server Capacity Planning tool

The first question – “How much time is spent recognizing?” – should in this case be answered “High” because the application is doing little else.

The second question – “How complex are the grammars?” – should be answered “High” given that it is using a fairly extensive conversational grammar.

The question “How much additional processing does the application have to do?” should be answered “Low” for this application.

How much time is spent recognizing versus speaking or recording? 3. High (e.g. booking a flight, HMIHY)

How complex are the grammars? 3. High (e.g. directory assistance, how may I help you)

How much additional processing does the application have to do? 1. Low (< 5% CPU)

Resulting Application complexity: High

Expected Channel Capacity per server*: 67

This application gets an overall classification of “High” complexity and the resulting expected channel capacity per server for this application is up to 48 concurrent channels.


Enter the expected average number of dialog turns per call, the expected average number of calls per day, the expected peak number of calls in a day, and the number of days’ data you want to retain in the database.










This shows that for tuning purposes you need to allow 0.2 GB of disk space on the computers running Speech Server for each day’s log files and 0.1 GB of disk space for your tuning database. When you have completed the tuning phase of your deployment, you need about 0.1 GB of disk space for each day’s log files on the computers running Speech Server and 10 GB of disk space for your reporting database.


The Servers worksheet gives you the resulting forecast of 27 computers running Speech Server to support the estimated call load and application complexity of this deployment.


Reporting/Tuning Servers (optional): 1

TIM/TIMC Servers (optional): 0

MOM Server (optional): 1

Total Servers: 30

Optimizing Speech Application Performance – Tips and Tricks

Consider Converting Conversational Understanding Grammars to GRXML

A conversational grammar, while much easier to develop and more robust to varied user input, is usually more expensive to process than a regular GRXML grammar. Think carefully about where conversational grammars are used in your application to be sure that their use is appropriate. For example, if a question is likely to have a simple Yes or No response, there is little benefit (and some computational cost) to using a conversational grammar.
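To give a sense of how small a regular grammar for a Yes or No question can be, the sketch below holds a minimal grammar in the W3C SRGS XML form and uses Python's standard library to confirm that it is well-formed and to list the phrases it accepts. The rule name YesNo and the helper function are hypothetical, and any Speech Server specific semantic tags are omitted, so treat this as an illustration rather than a drop-in grammar.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the W3C Speech Recognition Grammar Specification.
SRGS_NS = "http://www.w3.org/2001/06/grammar"

# Hypothetical minimal grammar: the caller may say exactly "yes" or "no".
YES_NO_GRXML = """
<grammar xmlns="http://www.w3.org/2001/06/grammar"
         version="1.0" xml:lang="en-US" mode="voice" root="YesNo">
  <rule id="YesNo" scope="public">
    <one-of>
      <item>yes</item>
      <item>no</item>
    </one-of>
  </rule>
</grammar>
"""


def accepted_phrases(grxml):
    """Parse the grammar (raises ParseError if malformed) and list its item texts."""
    root = ET.fromstring(grxml)
    return [item.text.strip() for item in root.iter(f"{{{SRGS_NS}}}item")]


phrases = accepted_phrases(YES_NO_GRXML)
print(phrases)
```

A grammar of this size has only two alternatives for the recognizer to consider, which is why it is cheaper to process than a conversational grammar for the same question.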

Minimize the Amount of Logging Done

If you no longer need to do any tuning of your application, make sure that audio sampling and other logging options required for tuning are turned off.

If you have no reporting requirements of your application, you can turn off logging altogether.

Prefer Recorded Prompts over TTS

Not only do prerecorded prompts result in a more pleasant sounding application, but the CPU resources required to render prerecorded prompts are lower than for synthesizing speech. Converting TTS prompts to recorded prompts is one way of fine tuning the performance of your application.

Test Your Application for Memory Leaks

Any memory leaks caused by your application will cause performance to degrade over time and eventually result in a recycle of the speech service.

Office Communications Server 2007, Speech Server Capacity Planning Tool

Note that you will need to enable macros to be able to use this workbook in Excel.

