
Telecommun Syst (2006) 31: 141–161

DOI 10.1007/s11235-006-6517-7

On SCTP multi-homing performance

Andreas Jungmaier · Erwin P. Rathgeb

Received: February 14, 2006 / Accepted: February 15, 2006 / © Springer Science + Business Media, LLC 2006

Abstract The Stream Control Transmission Protocol (SCTP) is a general purpose transport protocol featuring multi-homing support, message oriented and more flexible data delivery mechanisms than TCP, and an increased protection against well-known attacks. Originally developed for the transport of Signaling System No. 7 messages, e.g. MTP level 3 user primitives, over IP networks, SCTP has evolved to a general purpose transport protocol with a wide field of applications. With respect to multi-homing, the current SCTP standard uses this feature for network level redundancy only. Therefore we propose and evaluate in this contribution mechanisms for the application-specific optimisation of the SCTP protocol behaviour with respect to its multi-homing capabilities. To satisfy the extremely strict performance requirements for signalling transport, efficient load sharing among all active links is also highly desirable in SCTP scenarios. To this end, we propose a novel, improved load sharing algorithm for SCTP with path based selective acknowledgements which avoids some of the drawbacks of the existing algorithms and achieves an increase in throughput. Results of a comparative simulation study are presented to demonstrate the benefits of our algorithm.

Keywords SCTP · Transport protocol · Multi-homing · Load sharing · Signalling transport

1. Introduction

The Stream Control Transmission Protocol (SCTP) [14] is the third IP-based transport protocol defined by the IETF, besides TCP and UDP, and became a full standards track RFC (i.e. a valid internet standard) in October 2000.

The reason for introducing SCTP was the expected migration of public voice services from

circuit switched ISDN platforms onto enhanced IP-based next generation networks (NGN)

providing Voice-over-IP (VoIP) services. In such a migration scenario, where classical and

A. Jungmaier · E. P. Rathgeb, Computer Networking Technology Group, IEM, University of Duisburg-Essen, Ellernstr. 29, D-45326 Essen, e-mail: {ajung,rathgeb}@iem.uni-due.de

Springer


new networks will coexist for an extended period of time, seamless interworking of the two

concepts is a crucial issue. More important and demanding than transcoding of user data is

the aspect of signalling interworking. In situations where ISDN islands are interconnected via IP-based networks, or where upper layer servers or databases, e.g. for Intelligent Network (IN) functions or mobility support for cellular networks, are located in the IP domain, a fully functional, highly reliable and high-performance signalling transport via IP is required [10].

Designed by the Signalling Transport working group of the IETF in particular for the transport of signalling data, SCTP is – nevertheless – a general purpose transport protocol providing a more flexible data delivery service than TCP by using a specific SCTP stream layer, and increased fault tolerance by allowing network level redundancy (SCTP multi-homing). Therefore, SCTP is also a candidate transport protocol for applications beyond signalling transport. One example is the ‘Reliable Server Pooling’ [17] concept of the IETF RSerPool working group which requires the use of SCTP. Other working groups also included SCTP as an option in their documents, e.g. for the new AAA protocol Diameter. Two major SCTP extensions have been defined so far, providing a partially reliable delivery option (PR-SCTP [13]) on one hand and the ability to dynamically add and drop IP addresses from a multi-homed SCTP association (Dynamic Address Reconfiguration [12]) on the other. With these extensions, there are even more areas in which the use of SCTP is promising, including e.g. novel concepts for IP mobility support [2].

Many of the defined – as well as potential – SCTP applications, and in particular signalling

transport, have both strict performance and reliability requirements which must be specifically

supported by SCTP. In this respect, SCTP multi-homing scenarios providing multiple, redundant

paths through the IP network are of particular interest. However, the current SCTP standard only

uses redundant network paths as backup for retransmission of lost packets and to provide fast

switchover in case of network failures. Under normal operating conditions, one path carries

the full traffic load while the others remain unused except for heartbeat messages probing their

availability. To improve performance, it is natural to also use the redundant links for load sharing, both to achieve a more balanced network load and to increase application throughput in case of bandwidth limited links. However, simply distributing the load evenly to all redundant links is not sufficient, since the systematic violation of packet sequence integrity resulting from different end-to-end delays on the redundant paths will severely interfere with the window based SCTP flow control and the acknowledgment based error control and recovery mechanisms. Therefore, corresponding adaptations have to be introduced in these areas so that load sharing actually yields a benefit.

After an overview of the relevant SCTP features and mechanisms and a review of the

tools and scenarios used to investigate SCTP multi-homing and SCTP-based load sharing,

we will look into possible optimisations to SCTP multi-homing algorithms and especially discuss SCTP-based load sharing in some detail. Based on a review of the already existing SCTP

load sharing proposals, we will propose a modified and improved algorithm based on path

specific acknowledgements and confirm its benefits by providing a quantitative performance

comparison.

2. An overview of the stream control transmission protocol

SCTP connections (named associations in SCTP parlance) are established after a 4-way handshake

between two SCTP endpoints (i.e. protocol instances), usually a client and a server.


The association setup request contains a list of valid transport addresses¹. The set of possible combinations of n transport addresses of the server with m transport addresses of the client defines all possible n × m path identifiers of one association. Thus, SCTP explicitly supports multi-homed endpoints, allowing the use of multiple paths through the IP network to provide end-to-end redundancy and fault tolerance.
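The n × m relation can be illustrated with a short sketch. The addresses are hypothetical values from the documentation ranges, and the function name `enumerate_paths` is an assumption made for this example only; since the port number is common to all transport addresses of an endpoint, it is left out here.

```python
from itertools import product


def enumerate_paths(server_addrs, client_addrs):
    """Return every (server address, client address) pair of one association."""
    return list(product(server_addrs, client_addrs))


# Hypothetical example addresses (documentation ranges), not taken from the paper.
server = ["192.0.2.1", "198.51.100.1"]                   # n = 2
client = ["203.0.113.7", "203.0.113.8", "203.0.113.9"]   # m = 3

paths = enumerate_paths(server, client)
for path_id, (srv, cli) in enumerate(paths):
    print(path_id, srv, "<->", cli)
```

With n = 2 and m = 3 the sketch lists 6 candidate paths. Which of them are physically disjoint depends on the network topology and cannot be derived from the address lists alone.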

SCTP is a message-oriented, reliable transport protocol which, unlike TCP, preserves

message boundaries. The protocol may multiplex several short messages into one SCTP packet

(subsequently transmitted as IP payload) to reduce transmission overhead for small (signalling)

messages. By using MTU discovery, SCTP avoids IP fragmentation.

2.1. The SCTP packet format

SCTP packets consist of a common header, followed by a variable number of information units,

which are named chunks. There are two types of chunks: control and data chunks. Data chunks

contain the actual user messages, while control chunks are used to support the peer-to-peer

protocol. Control chunks are provided, e.g., for selective acknowledgements, monitoring of

peer reachability with heartbeats, setup and termination of associations, error messages and,

optionally, protocol extensions.

The SCTP common header contains source and destination port numbers, similarly to TCP

and UDP, and a 32 bit checksum. Moreover, it carries a 32 bit value named tag which is a

randomly chosen value exchanged with the peer endpoint at association start up. The tag protects

associations from ‘blind attacks’, i.e. where the attacker tries to blindly insert forged SCTP

packets into an association. As SCTP is a transport layer protocol (much like TCP), it does not

protect communication from man-in-the-middle attacks.
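As a rough illustration of the common header and of the tag check, the following sketch packs the four fields into 12 bytes. The field order (ports, tag, checksum) reflects our reading of the SCTP specification and should be treated as an assumption here; the checksum computation is omitted and the field is simply left at zero. The helper names are hypothetical.

```python
import struct

# Assumed layout: source port (16 bit), destination port (16 bit),
# tag (32 bit), checksum (32 bit), all in network byte order.
COMMON_HEADER = struct.Struct("!HHII")


def pack_common_header(src_port, dst_port, tag, checksum=0):
    """Build the 12-byte common header; the checksum is a placeholder."""
    return COMMON_HEADER.pack(src_port, dst_port, tag, checksum)


def accept_packet(raw, expected_tag):
    """Return True if the tag matches the value agreed at association start-up."""
    _src, _dst, tag, _checksum = COMMON_HEADER.unpack_from(raw)
    return tag == expected_tag


header = pack_common_header(5000, 6000, tag=0x1234ABCD)
print(len(header), accept_packet(header, 0x1234ABCD), accept_packet(header, 0x0BADF00D))
```

A blind attacker who does not know the 32 bit tag has to guess it, so a forged packet is rejected by a check of this kind with high probability.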

2.2. SCTP data transmission

Reliable data transmission involves two chunk types: the data chunk and the selective acknowledgement chunk (SACK). Data chunks carry higher layer user data and a 32 bit transmission sequence number (TSN). Each data chunk must be acknowledged by the receiver. If two packets with data chunks arrive within tsack (usually, tsack = 200 ms), the receiver returns a SACK immediately after the second packet.

SCTP uses multiple selective repeat mechanisms for error recovery. The SACK contains a

cumulative TSN to be acknowledged (CTSNA), indicating the highest TSN that has been received

in sequence without interruption. Additionally, the SACK acknowledges all other data chunks

with higher TSNs that have been successfully received in a so-called gap report structure. The

sender interprets the CTSNA and the gap reports and when a TSN has been reported missing for

the fourth time, a fast retransmission is triggered.
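The sender-side bookkeeping described above can be sketched as follows. This is a simplified illustration, not the algorithm of any particular implementation: TSN wrap-around is ignored, a TSN only counts as missing if the SACK acknowledges some higher TSN, and the class and parameter names are assumptions.

```python
from collections import defaultdict

FAST_RTX_THRESHOLD = 4  # "reported missing for the fourth time"


class MissingReportTracker:
    """Counts how often each outstanding TSN has been reported missing."""

    def __init__(self):
        self.miss_count = defaultdict(int)

    def on_sack(self, ctsna, gap_acked, outstanding):
        """Process one SACK and return the TSNs to fast-retransmit.

        ctsna       -- cumulative TSN acknowledged
        gap_acked   -- set of TSNs above ctsna reported as received
        outstanding -- set of TSNs sent but not yet acknowledged
        """
        highest_acked = max(gap_acked, default=ctsna)
        to_retransmit = []
        for tsn in sorted(outstanding):
            if tsn <= ctsna or tsn in gap_acked:
                self.miss_count.pop(tsn, None)      # acknowledged, forget it
            elif tsn < highest_acked:
                self.miss_count[tsn] += 1           # one more missing report
                if self.miss_count[tsn] == FAST_RTX_THRESHOLD:
                    to_retransmit.append(tsn)
        return to_retransmit


tracker = MissingReportTracker()
outstanding = {11, 12, 13, 14, 15}
# TSN 11 is lost; each later arrival produces a SACK with a growing gap report.
results = [tracker.on_sack(10, set(range(12, 12 + k)), outstanding) for k in range(1, 5)]
print(results)
```

In this example the first three SACKs return nothing to retransmit, and the fourth SACK that reports TSN 11 as missing triggers its fast retransmission.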

The amount of data that may be outstanding (i.e. sent but not yet acknowledged) is limited by

the receiver window. The current value of the receiver window is also contained in the SACK

chunk. Flow control parameters are separately computed for each path (specific pair of sender

and receiver transport addresses) to the peer. These parameters are the congestion window cwnd,

and the slow start threshold ssthresh. Similar to TCP, when the cwnd for a path is less than the

current ssthresh, the path is said to be in ‘slow start’, else in ‘congestion avoidance’ mode.

¹ A transport address is defined as the combination of one of the endpoint’s (multiple) host IP addresses with a port number that is common to all transport addresses belonging to an endpoint.



The cwnd for a path is additively increased (by at most one MTU) when new data has

been acknowledged and the CTSNA value has advanced. It is decreased (halved) when fast

retransmissions are triggered, or a timeout has occurred (in which case the cwnd is reset to

one MTU). Thus, SCTP uses an Additive Increase, Multiplicative Decrease (AIMD) algorithm,

similarly to TCP, and its flow and congestion control is therefore TCP-compatible.
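A minimal sketch of this per-path AIMD behaviour is given below. Several details are assumptions not stated in the text: the initial window of two MTUs, the lower bound of two MTUs for ssthresh, and the byte counter that limits growth in congestion avoidance to one MTU per window of acknowledged data. The class name is hypothetical.

```python
class PathCongestionState:
    """Simplified cwnd/ssthresh bookkeeping for one path."""

    def __init__(self, mtu=1500, initial_ssthresh=65536):
        self.mtu = mtu
        self.cwnd = 2 * mtu                  # assumed initial window
        self.ssthresh = initial_ssthresh
        self.partial_bytes_acked = 0

    @property
    def mode(self):
        return "slow start" if self.cwnd < self.ssthresh else "congestion avoidance"

    def on_new_data_acked(self, bytes_acked):
        """Call when a SACK acknowledges new data and advances the CTSNA."""
        if self.cwnd < self.ssthresh:
            # Slow start: grow by the acknowledged bytes, at most one MTU.
            self.cwnd += min(bytes_acked, self.mtu)
        else:
            # Congestion avoidance: one MTU per full window acknowledged.
            self.partial_bytes_acked += bytes_acked
            if self.partial_bytes_acked >= self.cwnd:
                self.partial_bytes_acked -= self.cwnd
                self.cwnd += self.mtu

    def on_fast_retransmit(self):
        """Multiplicative decrease: halve the window."""
        self.ssthresh = max(self.cwnd // 2, 2 * self.mtu)
        self.cwnd = self.ssthresh
        self.partial_bytes_acked = 0

    def on_timeout(self):
        """Retransmission timeout: reset the window to one MTU."""
        self.ssthresh = max(self.cwnd // 2, 2 * self.mtu)
        self.cwnd = self.mtu
        self.partial_bytes_acked = 0


path = PathCongestionState(mtu=1500, initial_ssthresh=6000)
for acked in (3000, 1000, 1500):
    path.on_new_data_acked(acked)
print(path.cwnd, path.mode)
```

Because one such state object exists per path, a loss on one path reduces only the window of that path, which is the property the load sharing discussion later relies on.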

2.3. Message streams

SCTP provides its user with flexible methods of data delivery by separating the reliable transfer

of messages between endpoints (see Section 2.2) from the actual delivery to the application.

This is achieved at the cost of introducing an internal multiplexing layer for so-called streams identified by 16 bit stream identifiers. To be able to perform resequencing and delivery on a

per-stream basis, 16 bit stream sequence numbers and stream identifiers are provided in addition

to TSNs, and are transported within data chunks.

SCTP streams are effectively unidirectional channels, within which messages are usually

transported in sequence. The application may also request a message to be delivered by an

unordered service, which can reduce blocking effects in case of message loss, since the reordering

mechanism of one stream is not affected by that of another stream that may have to wait for a

retransmission of a previously lost data chunk.
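The per-stream resequencing described above can be sketched as follows. The sketch assumes that stream sequence numbers start at 0 and wrap at 16 bit; the class and method names are hypothetical.

```python
from collections import defaultdict


class StreamDemultiplexer:
    """Delivers messages in sequence per stream, independently of other streams."""

    def __init__(self):
        self.next_ssn = defaultdict(int)   # next expected SSN per stream (assumed to start at 0)
        self.pending = defaultdict(dict)   # stream id -> {ssn: message}

    def on_data_chunk(self, stream_id, ssn, message, unordered=False):
        """Return the list of messages that can now be delivered to the application."""
        if unordered:
            return [message]               # unordered delivery bypasses resequencing
        queue = self.pending[stream_id]
        queue[ssn] = message
        deliverable = []
        while self.next_ssn[stream_id] in queue:
            deliverable.append(queue.pop(self.next_ssn[stream_id]))
            self.next_ssn[stream_id] = (self.next_ssn[stream_id] + 1) % 65536
        return deliverable


demux = StreamDemultiplexer()
held = demux.on_data_chunk(0, 1, "a1")            # SSN 0 of stream 0 is still missing
other = demux.on_data_chunk(1, 0, "b0")           # stream 1 is not blocked by stream 0
urgent = demux.on_data_chunk(0, 7, "u", unordered=True)
released = demux.on_data_chunk(0, 0, "a0")        # the retransmission closes the gap
print(held, other, urgent, released)
```

The example shows the reduced blocking effect: while stream 0 waits for a retransmission, messages on stream 1 and unordered messages are delivered immediately.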

2.4. Multi-homing

Support for multi-homing refers to the capability of SCTP to establish communication between

hosts that have one or more IP addresses, and to make use of multiple paths between these

hosts. To ensure compatibility with established transport protocols, such as TCP, only one path is

chosen as primary path and subsequently carries the main traffic load. The multi-homing concept

is explained in some more detail in Section 4.

2.5. Extensions to SCTP

SCTP is extensible through the use of new control chunks (cf. Section 2.1). In the following we

present two important protocol extensions, one of which has already been standardised within

the IETF, while the other is still being actively developed in standardisation.

2.5.1. Partially reliable message transfer (PR-SCTP)

The extension for partial reliability [13] specifies a mechanism an endpoint can use to indicate

to its associated endpoint that some lost data chunks will not be retransmitted. A new parameter,

called fwtsn, is defined for that purpose together with a new control chunk type, the Forward TSN chunk, to transport it. The sender of this chunk does not need to retransmit any data chunk with

a TSN less than the fwtsn value. The receiver of the Forward TSN chunk advances its CTSNA

(see Section 2.2) to fwtsn, and further if possible, and stops indicating the skipped data chunks

as missing.

This mechanism is advantageous, e.g. in the following scenarios:

• In times of congestion, retransmitted data may add to the congestion. By skipping the retransmission, the network is relieved of additional traffic caused by retransmissions.

• User data may have a limited time of validity (e.g. packetized voice samples, or sensor data when a more recent sensor reading is available). After that time, there is no point in retransmitting this obsolete data.

SCTP implementations conveniently allow for specifying a lifetime for data that is to be sent.

After expiry of this lifetime, data may not be sent or may not be retransmitted again.
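The receiver-side handling of a Forward TSN chunk described in this section can be sketched in a few lines. TSN wrap-around is ignored, and the function and parameter names are assumptions made for illustration.

```python
def apply_forward_tsn(ctsna, received, fwtsn):
    """Return the new CTSNA after a Forward TSN chunk carrying fwtsn.

    ctsna    -- current cumulative TSN acknowledged
    received -- set of TSNs above the old CTSNA that have already arrived
    fwtsn    -- value carried in the Forward TSN chunk
    """
    if fwtsn > ctsna:
        ctsna = fwtsn                  # skip the chunks that will not be retransmitted
    while ctsna + 1 in received:       # advance further if possible
        ctsna += 1
    return ctsna


# TSNs 11 and 12 were abandoned by the sender; 13 and 14 have already arrived.
print(apply_forward_tsn(10, {13, 14, 16}, 12))
```

In the example the CTSNA moves from 10 to 14: first to fwtsn = 12, then across the already received TSNs 13 and 14, while TSN 15 remains a gap.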

2.5.2. Dynamic address reconfiguration

The “SCTP Dynamic Address Reconfiguration” extension proposed in [12] may be used

to dynamically add or remove addresses from an established SCTP association. Moreover, this extension allows an endpoint to signal to its peer which IP address is preferable as the primary address. Thus, in a handover situation where connectivity to two networks may

be given, a mobile device can signal to its peer which network is the preferred destination to

send data to and thus improve data throughput.

This is achieved by the use of the new Address Configuration Change (ASCONF) control

chunks, which contain a variable number of request parameters for the peer. These signal requests for either

• an addition of an address,

• a removal of an address, or

• setting a primary address.

This mechanism can also be used in cases where the originating source IP address of the ASCONF request does not match any known SCTP association (when addresses have changed before this could be signalled to the peer endpoint). Usually, the association to which a packet belongs is determined by the combination of source and destination IP addresses and ports of a received SCTP packet. Thus, for an ASCONF request it may not be possible to find the proper association. To this end, the ASCONF contains an additional address parameter which allows the receiver of the ASCONF to determine the association. This parameter must contain an address that was known to belong to the concerned association beforehand.
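The lookup fallback described above might be sketched as follows. The table layout, the class `AssociationTable` and its methods are purely hypothetical and only illustrate the idea of a second lookup keyed by the address parameter carried in the ASCONF; real implementations organise their association tables differently.

```python
class AssociationTable:
    """Hypothetical association lookup with an ASCONF address fallback."""

    def __init__(self):
        self._by_path = {}   # (src_ip, dst_ip, src_port, dst_port) -> association
        self._by_peer = {}   # (peer_ip, peer_port, local_port) -> association

    def register(self, assoc, local_ips, local_port, peer_ips, peer_port):
        for peer_ip in peer_ips:
            self._by_peer[(peer_ip, peer_port, local_port)] = assoc
            for local_ip in local_ips:
                self._by_path[(peer_ip, local_ip, peer_port, local_port)] = assoc

    def lookup(self, src_ip, dst_ip, src_port, dst_port, asconf_address=None):
        """Normal lookup first; fall back to the ASCONF address parameter."""
        assoc = self._by_path.get((src_ip, dst_ip, src_port, dst_port))
        if assoc is None and asconf_address is not None:
            assoc = self._by_peer.get((asconf_address, src_port, dst_port))
        return assoc


table = AssociationTable()
table.register("assoc-1", ["192.0.2.1"], 5000, ["198.51.100.1"], 6000)

known = table.lookup("198.51.100.1", "192.0.2.1", 6000, 5000)
unknown = table.lookup("203.0.113.9", "192.0.2.1", 6000, 5000)
via_param = table.lookup("203.0.113.9", "192.0.2.1", 6000, 5000,
                         asconf_address="198.51.100.1")
print(known, unknown, via_param)
```

A packet from the new, still unknown source address is matched only when the ASCONF carries an address that already belongs to the association.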

3. Implementation and deployment

This section briefly discusses some SCTP implementations available to date (May 2005), and looks into some tools we developed specifically for protocol testing purposes. In order to be able to evaluate SCTP in a wider setting, we also created an OPNET based discrete event simulation model discussed below.

3.1. SCTP kernel implementations

Since SCTP is a transport protocol based on IP, the usual place for a working implementation is the kernel of an operating system (as is the case for TCP). Within the SCTP community,

a number of kernel implementations have been developed in the past. As we cannot discuss all

implementations here, we will focus on two kernel implementations that were developed for

open source operating systems, namely FreeBSD and Linux.

The FreeBSD implementation comes with the KAME network stack readily available for

most BSD based unices (e.g. FreeBSD, OpenBSD and NetBSD). It was mainly developed by

Randall Stewart, one of the main authors of the SCTP RFC [14], and Peter Lei (both of Cisco),



and is the most feature complete, stable and up-to-date implementation available to date. It

supports both the SCTP extension for partially reliable transfer of messages and the dynamic

address reconfiguration (cf. Section 2.5).

The Linux kernel SCTP implementation has been sponsored by IBM, Motorola, Nokia and

Intel, and is currently being maintained by Sridhar Samudrala (of IBM). It is part of the latest

2.6 Linux kernel, quite stable, and features around 25000 lines of C code.

Both implementations offer a standardized, TCP- or UDP-compatible socket-based interface

for the use of most SCTP functions (cf. [15]), and come with a library that can be linked to

applications to use the more advanced SCTP features.

3.2. SCTPLIB – An open source SCTP implementation

As part of a cooperation with an industrial partner, SCTPLIB, a full standards-compliant SCTP

implementation – available for free as open source [9] – together with a suite of test applications

was created and further developed by our group and tested at several international interoperability

meetings. SCTPLIB is written in C, runs on Linux, FreeBSD, MacOS X, Solaris, and Windows, supports the PR-SCTP and the dynamic address reconfiguration extensions (cf. Section 2.5), and has around 12,000 lines of code.

The portability of SCTPLIB comes at the price that it is not a kernel but a userland implementation that relies on a privileged server process for handling SCTP network events (SCTP server) and non-privileged applications that link to the SCTPLIB library. The data and primitive exchange between SCTP server and SCTP applications is realized by a local interprocess communication mechanism also implemented in the SCTPLIB. The privileged server handles network

events (incoming/outgoing SCTP packets, ICMP packets), local IPC events for communication

with user processes, and timers, and distributes events and data to the proper user application

processes. The user programs need to register with the SCTP server, and implement the base

SCTP protocol (cf. Figure 1). All data that is exchanged between applications and server uses the

local IPC mechanism, which may adversely affect performance in certain high load scenarios.

3.3. Simulation environment

In order to be able to evaluate SCTP in a greater parameter space and within more elaborate

network topologies, a discrete event simulation model was developed based on the simulation

tool OPNET modeler [16].

To evaluate the SCTP performance for multi-homed endpoints in the case of path failures,

and for investigating different load sharing algorithms, we decided to create our own SCTP

simulation model, since at the time only an NS-2 SCTP model was available as open source

which implemented SCTP as an extension to TCP and did not readily allow for modeling native

multi-homing. Our OPNET [16] based SCTP simulation model is loosely based on the SCTPLIB

implementation mentioned above and interfaces with the native OPNET network layer models

of IP. Therefore, our SCTP model can be used with all IP-based OPNET node models, and

extensions of this SCTP model for use with more than two IP addresses are trivial.

4. Investigation of SCTP multi-homing

Fig. 1 SCTPLIB implementation structure

The SCTP support for multi-homing, which is enforced by the endpoints end-to-end, is, besides the support for independent message streams, one of the key features of SCTP that distinguish this transport protocol from other reliable transport protocols such as TCP. Multi-homing refers to the capability of SCTP to establish communication

between hosts that have one or more IP addresses, and to make use of multiple paths between

these hosts. For an endpoint with an established association, the notion of a path is equal to that

of the transmission route towards one destination transport address of its peer endpoint. Thus, a

multi-homed endpoint may reach the peer via a number of different paths. One of the paths to

the peer is chosen as primary path and subsequently carries the main traffic load. The user or

application may, however, explicitly request to use a path other than the primary for transmission.

When the primary path carries the main load, growth of the congestion window cwnd only

occurs for this path which is desirable for achieving fairness to TCP. Other paths are then only

used for data retransmissions and heartbeat control chunks. Compared to protocols that do not

support multi-homing, sending retransmissions on paths that are not congested will have an

advantageous effect on recovery from packet loss [8].

As shown in the following sections, SCTP multi-homing can be used for providing either

• network level redundancy by ensuring the use of physically separate network paths,

• higher performance compared to, e.g., TCP especially for demanding applications such as signalling transport, and

• with certain modifications to standard SCTP, a distribution of the traffic load which ensures an optimal use of existing network resources.

4.1. Path and peer monitoring

By default, SCTP endpoints monitor peer reachability and path states by regularly sending

heartbeat control chunks to all of their destination addresses. These are immediately answered



by the peer with heartbeat acknowledgement control chunks. For each path, the endpoint keeps an error counter that is incremented whenever the endpoint does not receive an acknowledgement before a timer elapses. If the error counter exceeds a threshold (which is a configurable parameter), the state of the path is set to unreachable. The endpoint then continues to send heartbeats to this address, so that the path status can be reinstated to reachable later.

Since endpoints should send their acknowledgements of data and heartbeat control chunks

back to the originating peer destination address [14], paths that are actively used for data transmission need not be monitored by heartbeat chunks.

An SCTP endpoint also keeps track of the number of consecutive retransmissions of data

or heartbeat chunks sent to the peer endpoint on an association level (as opposed to the path

level). Each time a chunk is acknowledged in time, the corresponding association error counter

is cleared. Once the counter exceeds the association error limit, the peer endpoint is considered

unreachable, and the association is closed.
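The interplay of the per-path and the association-level error counters can be sketched as follows. The default limits of 5 and 10 are the recommended values quoted in Section 4.2; the class and attribute names are assumptions, and timers and the actual sending of heartbeats are left out.

```python
class ReachabilityMonitor:
    """Tracks path and association error counters for one association."""

    def __init__(self, paths, path_limit=5, assoc_limit=10):
        self.path_limit = path_limit
        self.assoc_limit = assoc_limit
        self.path_errors = {p: 0 for p in paths}
        self.reachable = {p: True for p in paths}
        self.assoc_errors = 0
        self.closed = False

    def on_timeout(self, path):
        """A data or heartbeat chunk sent on `path` was not acknowledged in time."""
        self.path_errors[path] += 1
        self.assoc_errors += 1
        if self.path_errors[path] > self.path_limit:
            self.reachable[path] = False       # heartbeats continue on this path
        if self.assoc_errors > self.assoc_limit:
            self.closed = True                 # peer endpoint considered unreachable

    def on_ack(self, path):
        """An acknowledgement arrived: clear both counters, reinstate the path."""
        self.path_errors[path] = 0
        self.assoc_errors = 0
        self.reachable[path] = True


monitor = ReachabilityMonitor(["path 1", "path 2"])
for _ in range(6):
    monitor.on_timeout("path 1")
print(monitor.reachable, monitor.closed)
```

After six consecutive timeouts the first path exceeds its limit of 5 and is marked unreachable, while the association stays open because the association counter has not yet exceeded 10.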

4.2. SCTP behaviour in case of path failure

Assuming a signalling session between two dual-homed hosts, A and B, we investigated the

behaviour of SCTP in case of a failure of the primary path. The relevant SCTP parameters in this

case are the protocol parameters

• RTOmax, the maximum retransmission timeout; defines the maximum time after which a retransmission occurs, if no SACK has been received after a data chunk was sent.

• RTOmin, the minimum retransmission timeout; defines the minimum time after which a retransmission occurs.

• PRL, the path retransmission limit; integer value that indicates the threshold for the number of retransmissions that must be exceeded before a path is considered out of service.

• ARL, the association retransmission limit; integer value that indicates the threshold for the number of retransmissions that must be exceeded before an association is considered out of service, and subsequently closed.

The current RTO and path error counters are computed separately for each path from an association endpoint to its peer and the association error counter is computed once per association.

Error counters are reset whenever a data or heartbeat chunk has been acknowledged (for the

association and for the path concerned).

For ensuring TCP compatibility, the recommended parameter settings for standard SCTP are

an RTOmin of 1 s, an RTOmax of 60 s, a PRL of 5 and an ARL of 10 [14]. With these values,

a path failure is only recognized and indicated to the upper layer (i.e. the application) after

1 + 2 + 4 + 8 + 16 + 32 = 63 s. On the other hand, requirements for transport of signalling

data in the Message Transfer Part (MTP) of the Common Channel Signalling System No. 7 [4]

in case of an MTP 2 link failure are such that a change-over process to a backup MTP 2 link must

take no longer than 800 ms. The change-over procedure is the process of reporting the MTP 2

link failure to the upper layer (MTP 3), retrieving all messages not yet sent, and re-sending these

messages on a secondary (active) MTP 2 link. Therefore, the SCTP parameters obviously need

some tuning in order for SCTP to be applicable to signalling transport of MTP 2 messages over

IP-based networks, as, e.g., for the MTP2-User Peer-to-Peer Adaptation Layer [3].
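The 63 s figure follows from the exponential back-off of the retransmission timer. The sketch below reproduces the sum under the assumptions that the RTO equals RTOmin when the failure occurs, doubles after every timeout, and is capped at RTOmax; the function name is hypothetical.

```python
def failure_detection_time(rto_start, rto_max, retransmission_limit):
    """Sum the back-off timeouts until the retransmission limit is exceeded."""
    total = 0.0
    rto = rto_start
    for _ in range(retransmission_limit + 1):   # the limit must be exceeded
        total += rto
        rto = min(2 * rto, rto_max)             # exponential back-off, capped
    return total


standard = failure_detection_time(rto_start=1.0, rto_max=60.0, retransmission_limit=5)
tuned = failure_detection_time(rto_start=0.04, rto_max=0.1, retransmission_limit=1)
print(standard, round(tuned, 3))
```

With the recommended values the function returns 63.0 s. With an RTOmin of 40 ms, an RTOmax of 100 ms and a limit of 1 it returns about 0.12 s, which only covers the time until the failure is recognised, not the full change-over.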

Possible suitable parameter settings were evaluated using the OPNET-based SCTP simulation

model (cf. Section 3.3) based on a simple topology with two dual-homed hosts, A and B, that were

connected by two distinct transmission links. These had a fixed delay of approximately 10 ms,

and a link bandwidth of 2.048 MBit/s. The host application mimics the relevant behaviour of an

MTP 3 instance with an underlying M2PA stack relying on SCTP for the message transport. The



application performs a unidirectional data transfer of signalling messages from A to B featuring

an exponentially distributed traffic pattern with a mean message arrival rate of 100 messages

per second with 500 bytes per message. The parameters were chosen to model a lightly loaded

broadband signalling relation where an IP/SCTP/M2PA-based signalling endpoint is connected

to a signalling gateway located less than 100 km away. For a simple IP network with few hops,

the chosen link delay time is then appropriate. Two possible scenarios are investigated:

1. The MTP 3 has only one link, and the underlying M2PA relies on the dual-homed SCTP association to provide redundancy. In the following, this is named change-over scenario 1.

2. The MTP 3 has two links, and the underlying M2PA relies on MTP 3 to handle link layer failures. Therefore, two single-homed SCTP associations are sufficient for providing redundancy. In the following, this is named change-over scenario 2.

Figure 2 visualizes the behaviour for the two different scenarios by showing the message

delay as perceived by the receiver as a transient function, as well as a moving average of the

message delay as a function of time. After the link failure and when native SCTP dual-homing

is used as in scenario 1, the receiver gets the first messages earlier, whenever data is re-sent

over the secondary backup path. Also, the situation is back to normal fairly quickly, as the first

successful retransmissions reduce the send queue sizes already. When the failure recognition is

handled by the upper layer, and each association is single-homed only, the transmission queue

size of the first association starts to build up after the first path fails. Subsequent retransmissions

Fig. 2 Typical change-over behaviour in scenarios 1 (top, panel a) and 2 (bottom, panel b). Both panels plot the message delay [ms] over time (ms) for individual messages and as a moving average over 10 values.


are unsuccessful, as they are also sent over the failed path. Subsequently, failure recognition

may happen slightly faster compared to scenario 1 (retransmission timer for path 1 is started

earlier, and therefore elapses earlier). Once the failure has been recognized, the second SCTP

association must go through slow start first, and can then quickly send all queued messages over

path 2.

The parameters that were investigated and varied throughout the following simulation runs

are the PRL values for scenario 1 and ARL values for scenario 2 (between 1 and 5). The results

are plotted depending on the configurable RTOmax value (between 100 ms and 500 ms), and all figures show 99% confidence intervals over 20 simulation runs. RTOmin is assumed to be 40 ms,

i.e. twice the RTT. It should be noted that a low RTOmin setting is a requirement for achieving

a low change-over time. Figure 3 shows the values for the maximum message delay during

the change-over process in both scenarios. Interestingly, both scenarios achieve comparable

values for the maximum message delay over the simulated parameter space, although scenario

Fig. 3 Message delay during the change-over process (scenario 1 top and 2 bottom). Panel (a) plots the message delay [ms] against RTOmax [s] for PRL = 1 to 5, panel (b) for ARL = 1 to 5; both panels mark the 400 ms threshold.

Fig. 4 Duration of the change-over process (scenario 1 top and 2 bottom). Panel (a) plots the failover duration against RTOmax [s] for PRL = 1 to 5, panel (b) for ARL = 1 to 5; both panels mark the 800 ms threshold.

2 performs slightly worse (by 50–100 ms). To stay safely below a 400 ms delay threshold, only small values of ARL/PRL can be used (i.e. ARL/PRL = 1); alternatively, for PRL = 2 in scenario 1, RTOmax has to stay below 200 ms.

Figure 4 shows the duration of the change-over process in both scenarios. This process is

assumed to have terminated after the size of the sending queue of the remaining active SCTP

association has gone back to a normal state after the path failure was recognized and after the

change-over procedure has started. Both figures also show the 800 ms threshold that corresponds

to the limit imposed by [4] on the duration of the MTP change-over procedure.

From the results presented in Fig. 4 it becomes clear that for both scenarios, the change-over

procedure can successfully be finished within the 800 ms limit, provided either the RTOmax

parameter is set sufficiently low (i.e. well below 150 ms) or the ARL/PRL parameter is set to

a low value (i.e. ARL/PRL ≤ 2). Also, while scenario 1 achieves slightly shorter maximum

message delay, the overall duration of the change-over is slightly shorter for scenario 2.



Fig. 5 Dual-homed simulation scenario with satellite and backup link

4.3. An optimisation for SCTP multi-homing

In the following section we present a simulation study of a modification of the SCTP retransmission algorithm, comparing it to standard SCTP. This modification can lead to a vastly

improved protocol behaviour, relying on the fact that a communication endpoint can choose

an optimal path depending on the situation. This leads to significant improvements when path

delay characteristics differ by an order of magnitude. While we first proposed this optimization

in [8] and presented some initial investigations of the behaviour within a testbed environment,

here more detailed simulation studies have been performed and are presented below.

We assume a unidirectional communication between two dual-homed hosts A and B as shown

in Fig. 5. The two hosts are connected by a primary path which is a broadband T1 satellite link with a bandwidth of W′ = 1.544 MBit/s, and a secondary backup link based on a dedicated ISDN channel with a bandwidth of W″ = 64 kBit/s. The satellite link features a long transmission delay D′ = 250 ms, while the secondary ISDN link has a short delay of D″ = 10 ms. Furthermore, we

assume a fixed message loss probability on the primary path over a period of several seconds,

e.g. due to bad weather conditions. Host A sends messages with an arrival rate of approximately

136 messages per second, with a negative exponential distribution of interarrival times (i.e.

λ = 0.00736). The message length has a triangle distribution ranging from 20 to 1400 bytes,

with an average of 710 bytes. This results in the source at host A creating an average load of

96.467 KByte/s which makes up for an average link load of approximately 50% on the primary

link. We investigated the SCTP behaviour for different bit error rates (BER) on the primary

link, ranging from 0 to 10−5 (the latter results in one out of 16 messages being dropped due

to transmission errors). For clarity, we present a range for the BER from 0 to 2 × 10−6 in the

following figures only, which corresponds to an average of 1 message out of 82 being dropped

in this scenario. All Figures show a 99% confidence interval over 20 simulation runs.
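The scenario parameters can be cross-checked with a few lines of arithmetic. The following Python sketch is illustrative only: it treats 0.00736 s as the mean interarrival time, assumes independent bit errors, and assumes a hypothetical per-message header overhead of 48 bytes (20 bytes IP, 12 bytes SCTP common header, 16 bytes DATA chunk header). None of these assumptions is stated in the scenario description.

```python
# Back-of-the-envelope check of the scenario parameters (illustrative sketch).
MEAN_INTERARRIVAL_S = 0.00736   # mean of the exponential interarrival time
MEAN_MSG_BYTES = 710            # mean of the triangular message length distribution
PRIMARY_BW_BIT_S = 1.544e6      # T1 satellite link
HEADER_BYTES = 48               # ASSUMPTION: IP (20) + SCTP (12) + DATA chunk (16)


def offered_load_bytes_per_s() -> float:
    """Average application load generated by host A."""
    return MEAN_MSG_BYTES / MEAN_INTERARRIVAL_S


def primary_link_utilisation() -> float:
    """Offered payload load relative to the primary link capacity."""
    return offered_load_bytes_per_s() / (PRIMARY_BW_BIT_S / 8)


def message_loss_probability(ber: float, payload_bytes: int = MEAN_MSG_BYTES) -> float:
    """Probability that at least one bit of a packet is corrupted,
    assuming independent bit errors (an assumption of this sketch)."""
    bits = 8 * (payload_bytes + HEADER_BYTES)
    return 1.0 - (1.0 - ber) ** bits


print(f"arrival rate : {1 / MEAN_INTERARRIVAL_S:.1f} messages/s")
print(f"offered load : {offered_load_bytes_per_s():.0f} bytes/s")
print(f"link load    : {primary_link_utilisation():.1%}")
for ber in (2e-6, 1e-5):
    p = message_loss_probability(ber)
    print(f"BER {ber:.0e}: about 1 message in {1 / p:.0f} lost")
```

The loss ratios depend on the assumed overhead and on the message length distribution, so the printed values only approximate the ratios quoted in the text.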

The behaviour of standard SCTP in the face of message loss is that the receiver will notice a gap in the sequence of received data chunks, and subsequently reports the missing TSN in all returned SACK chunks. Furthermore, the receiver will start returning one SACK chunk for each incoming packet that contains a data chunk (as opposed to one SACK chunk for every second incoming packet containing data chunks in the normal case), until the gap is closed again. As per RFC 2960, the

[Plot: average SCTP throughput in bytes/s (0 to 100000) versus bit error rate (0 to 2e-06); curves: SCTP (optimized), SCTP (standard)]

Fig. 6 Dual-homed simulation scenario: Throughput vs. BER

receiver returns any packet with a SACK chunk (including those indicating a gap) to the source

address of the incoming data packet that triggered the SACK. Once the sender has received four

SACK chunks with gap reports reporting the same TSN missing, it will immediately re-schedule

the missing data chunk, and retransmit it as soon as possible (i.e. before any new data chunk) using

an alternative path. Assuming that the satellite link is the primary SCTP path, the receiver would

send the SACK chunks back via the satellite path as well, so that the sender can only react to the

message loss after a full RTT over the long delay path (and after having received four SACKs).
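The fast retransmission trigger described above can be sketched as a per-TSN counter of gap reports. This is a simplified Python illustration, not the code of any SCTP implementation; the class and method names are hypothetical, and details such as resetting the counter once a TSN is acknowledged are omitted.

```python
from collections import defaultdict
from typing import Iterable, List

FAST_RTX_THRESHOLD = 4  # number of SACKs reporting the same TSN missing


class MissingReportCounter:
    """Counts, per TSN, how many SACKs have reported it as missing."""

    def __init__(self, threshold: int = FAST_RTX_THRESHOLD) -> None:
        self.threshold = threshold
        self.reports = defaultdict(int)

    def on_sack(self, missing_tsns: Iterable[int]) -> List[int]:
        """Process the gap reports of one SACK.

        Returns the TSNs that have just reached the threshold and
        should therefore be fast-retransmitted now.
        """
        due = []
        for tsn in missing_tsns:
            self.reports[tsn] += 1
            if self.reports[tsn] == self.threshold:
                due.append(tsn)
        return due
```

With this sketch, the first three SACKs reporting TSN 17 as missing return an empty list, and the fourth returns `[17]`. On the satellite path each of these SACKs needs a full long-delay RTT to reach the sender.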

The proposed modification, named Fast-SACK, uses the link with the shortest link delay (or with the shortest RTT when the link delay is unknown) not only for heartbeat messages and for the actual retransmission of previously lost data packets, but also for returning SACK chunks. This speeds up both the growth of the congestion window on the primary path and the recovery from lost packets. As shown in Fig. 6, the throughput of the modified SCTP remains constant even for BER values of up to 3 × 10^−7, which in our simulation scenario approximately corresponds to an average of 1 out of 318 packets being dropped due to bit errors. This is because, after a loss event, recovery and growth of the congestion window are much faster than for standard SCTP. The throughput of standard SCTP, on the other hand, decreases sharply with increasing BER, since any lost packet halves the congestion window and reduces the slow start threshold of the primary path, and it takes several (long) RTTs to reach a similar state again. As can be seen in Fig. 7, the optimisation of the acknowledgement process in error cases also greatly reduces the maximum message delay in the presence of transmission errors.
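A rough estimate shows why the return path of the SACK chunks matters in this scenario. The Python sketch below only adds propagation delays; it ignores serialisation and queueing delays as well as the wait for four gap reports, and it assumes that both links have the same delay in both directions.

```python
# One-way propagation delays from the scenario description, in seconds.
D_SATELLITE = 0.250
D_ISDN = 0.010


def sack_feedback_delay(data_path_delay: float, sack_path_delay: float) -> float:
    """Earliest time after sending a packet at which a SACK triggered by
    that packet can reach the sender (propagation delays only)."""
    return data_path_delay + sack_path_delay


standard = sack_feedback_delay(D_SATELLITE, D_SATELLITE)  # SACK returns via satellite
fast_sack = sack_feedback_delay(D_SATELLITE, D_ISDN)      # SACK returns via ISDN

print(f"standard SCTP: {standard * 1000:.0f} ms")
print(f"Fast-SACK    : {fast_sack * 1000:.0f} ms")
```

Under these assumptions the feedback loop shrinks from about 500 ms to about 260 ms, which shortens every step of congestion window growth and loss recovery accordingly.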

5. SCTP-based load sharing

SCTP load sharing potentially provides significant increases in transport protocol performance

(i.e. higher application level throughput) and network efficiency. Since a number of proposals

[Plot: maximum message delay in s (0 to 1.8) versus bit error rate (0 to 2e-06); curves: SCTP (standard), SCTP (optimized)]

Fig. 7 Dual-homed simulation scenario: Max. message delay vs. BER

for load sharing extensions to SCTP exist already, we will give a short review first. Based on

this discussion, we will propose a novel algorithm which enhances load sharing performance by

using path specific acknowledgements.

5.1. Existing load sharing proposals

In the following, we will not address network or link layer load sharing algorithms, such as the use of multiple links between routers with a round-robin distribution of packets over these links (ECMP), or the PPP multilink protocol [11], since these algorithms operate under different assumptions and typically not in an end-to-end fashion.

5.1.1. SCTP with loadsharing extensions (LS-SCTP)

In their internet draft [1], Abd El Al et al. suggest the introduction of new SCTP chunk types and

additional association setup parameters for their load sharing extension to SCTP. They propose to

introduce additional, path related sequence numbers and time stamps in new SCTP data chunks

and acknowledgments. While this is likely to simplify the handling of path related congestion

control parameters when load sharing is used, it also introduces extensions to SCTP that are

not wire compatible with existing SCTP implementations. Moreover, the additional meta-data

carried in the proposed LS-DATA and LS-SACK chunks is not necessary, as this information

can be derived from sender information and corresponding interpretation of SCTP selective

acknowledgements received by the sender.

5.1.2. Concurrent multipath transfer (CMT)

In [7], Iyengar et al. propose an algorithm for avoiding the unfair overgrowth of the congestion

control window for an SCTP path that occurs when a change-over is triggered by the application

layer. SCTP load sharing may be thought of as a cyclic change-over that is triggered (by the

application layer, or by a protocol implementation itself) whenever the congestion window for a

given path does not allow sending any more new data. At that time, alternate paths may still allow

sending if the sender is not limited by the receiver window of the peer. In this case, the sender

performs change-overs to the alternate paths, and continues to send data. Therefore, sending

continues until the congestion windows of all available paths or the receiver window are fully

exploited.
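The view of load sharing as a cyclic change-over can be illustrated with a small path selection routine. This Python sketch is a simplification under stated assumptions: the names `Path`, `select_path` and `rwnd_free` are hypothetical, paths are tried in a fixed order, and the receiver window is reduced to a single number of bytes the peer can still accept.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Path:
    name: str
    cwnd: int         # congestion window in bytes
    outstanding: int  # bytes sent on this path and not yet acknowledged


def select_path(paths: List[Path], rwnd_free: int, chunk_len: int) -> Optional[Path]:
    """Return the first path whose congestion window still has room for
    chunk_len bytes, or None if the sender is blocked."""
    if chunk_len > rwnd_free:
        return None          # limited by the receiver window of the peer
    for path in paths:
        if path.outstanding + chunk_len <= path.cwnd:
            return path      # change-over to a path that can still send
    return None              # all congestion windows are fully exploited
```

For example, with a full congestion window on the first path and room on the second, `select_path` returns the second path; once both windows are full, or the receiver window is exhausted, it returns `None` and sending stops.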

When change-overs are periodically triggered, significant reordering can be observed by the

receiver [6], which is reported to the sender in SCTP SACK chunks containing gap reports. In this

case, standard SCTP reacts with unnecessary fast retransmissions, and needlessly reduces the

congestion window which limits the overall throughput. Also, since standard SCTP only increases

the congestion window for a path when an incoming SACK advances the highest cumulative

TSN acknowledged so far (CTSNA), the congestion window grows too slowly. When chunks are

delivered to the receiver out of sequence over multiple paths, a standard SCTP implementation

will send back too many SACK chunks (one SACK for every new incoming data chunk) even

if packet loss is not occurring. Therefore, the rate of returned SACK chunks can and should be

reduced when load sharing is applied. In [6], Iyengar et al. propose the concurrent multipath

transfer (CMT) which aims at

1. avoiding unnecessary fast retransmissions by proposing an algorithm that only increases the

gap counter for data chunks in the retransmission queue when (i) they were reported missing

by an incoming SACK chunk, and (ii) when higher TSNs were already acknowledged for the

path to which the data chunk had been sent.

2. allowing more (and fairer) updates of the congestion window. In standard SCTP, the congestion window for a path is only increased when the CTSNA value of the association is advanced by a new incoming SACK. If load sharing was applied, this would lead to a stronger growth of the congestion window for the slower path only. Therefore, a path CTSNA variable is introduced which stores the highest TSN that was acknowledged for this path without discontinuity. The congestion window is then advanced for paths on which new data chunks were acknowledged, and for which the path CTSNA has advanced.

3. delaying selective acknowledgements appropriately so that unnecessary SACKs need not be

sent. With CMT, flags are used to ensure that retransmissions are still triggered in time (even

by fewer than four SACKs).

5.2. Path based selective acknowledgements

Although the algorithms discussed in Section 5.1.2 adapt the behaviour of the SCTP flow control

and error recovery mechanisms fairly well to the specifics of a load sharing scenario – in particular

in homogeneous environments with respect to link capacity and path delay – there is still room for

improvement. We therefore propose path based selective acknowledgement, in short PB-SACK.

The basic idea of our proposal is that the load sharing receiver maintains a SACK counter d(i).path_sack_count for each path d(i) (and not per association as in [6]) and increases this counter by one whenever a packet containing data chunks is received on the corresponding path. Whenever d(i).path_sack_count = 2, a SACK chunk is immediately sent to d(i) and the counter is reset to d(i).path_sack_count = 0. As a result, unlike with the combination of algorithms discussed in Section 5.1.2, two successive SCTP packets with data chunks arriving on different paths can trigger two SACKs on these paths. It should be emphasized that the PB-SACK algorithm nevertheless still sends one SACK chunk for every two data packets on average, which is in accordance with the requirement in [14] (also cf. Section 2.2).
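The receiver side of this idea fits in a few lines. The following Python sketch only models the per-path counter; the class and method names are hypothetical, and the delayed-SACK timer as well as the immediate SACKs sent while a gap is open are left out.

```python
from collections import defaultdict
from typing import Optional

SACK_EVERY = 2  # packets with data chunks per SACK, counted per path


class PbSackReceiver:
    def __init__(self) -> None:
        # Corresponds to the per-path SACK counter of the proposal.
        self.path_sack_count = defaultdict(int)

    def on_data_packet(self, path: str) -> Optional[str]:
        """Called for each packet with data chunks arriving on `path`.

        Returns the path on which a SACK should be sent now, or None
        if no SACK is due yet.
        """
        self.path_sack_count[path] += 1
        if self.path_sack_count[path] == SACK_EVERY:
            self.path_sack_count[path] = 0
            return path
        return None
```

If packets arrive alternately on two paths (first path, second path, first path, second path), the first two arrivals trigger no SACK, and the third and fourth each trigger a SACK on their own path. A counter kept per association would instead have sent its SACKs after the second and fourth packet, both in response to packets received on the second path.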

Upon receipt of a SACK chunk, the sender performs the following actions:

• For all paths d(i), set the flag d(i).saw_new_path_sack = FALSE.
• For all paths d(i), set the flag d(i).new_pCTSNA = FALSE.
• For any path d(i) for which a data chunk has been newly acknowledged, set the flag d(i).saw_new_path_sack = TRUE.
• For any d(i) for which d(i).saw_new_path_sack = TRUE, find the highest TSN newly acknowledged. Store this value in d(i).highest_path_tsn_acked.
• For any d(i) for which d(i).saw_new_path_sack = TRUE, store the number of bytes newly acknowledged in d(i).newly_acked_bytes.
• For any d(i) for which d(i).saw_new_path_sack = TRUE, find the corresponding d(i).pCTSNA. If d(i).pCTSNA was advanced by the SACK that is being processed, set the flag d(i).new_pCTSNA = TRUE.
• For any d(i) for which d(i).new_pCTSNA = TRUE, and for which the number of outstanding bytes is higher than the congestion window d(i).cwnd, increase the congestion window as required in Sections 7.2.1 and 7.2.2 of RFC 2960 [14]; e.g., in slow start, if the number of outstanding bytes on d(i) exceeds d(i).cwnd, the congestion window is increased by d(i).cwnd += min(d(i).newly_acked_bytes, d(i).pMTU).
• If the SACK chunk contains gap reports, check for any data chunk t remaining in the retransmission queue that is reported missing and was sent to path d_t, whether d_t.saw_new_path_sack = TRUE and t < d_t.highest_path_tsn_acked. If so, increase the gap counter for t. If this counter reaches the threshold (e.g., 4), perform a fast retransmission as per Section 7.2.4 of RFC 2960.
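The sender-side steps listed above can be sketched in Python as follows. This is a simplified illustration under several assumptions, not the simulation model itself: a SACK is reduced to a cumulative TSN plus a set of gap-acknowledged TSNs, chunks are sent in TSN order and never retransmitted, only the slow start rule is modelled, and the window check uses `>=` so that a fully used window also qualifies. All class and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Set

GAP_THRESHOLD = 4  # "e.g., 4" in the description


@dataclass
class Chunk:
    tsn: int
    path: str        # path the chunk was sent to
    length: int      # bytes
    acked: bool = False
    gap_count: int = 0


@dataclass
class PathState:
    cwnd: int
    ssthresh: int
    pmtu: int
    outstanding: int = 0
    pctsna: int = 0  # highest TSN acked on this path without discontinuity
    highest_path_tsn_acked: int = 0
    newly_acked_bytes: int = 0
    saw_new_path_sack: bool = False
    new_pctsna: bool = False


class PbSackSender:
    def __init__(self, paths: Dict[str, PathState]) -> None:
        self.paths = paths
        self.queue: List[Chunk] = []  # all sent chunks, in TSN order

    def send(self, chunk: Chunk) -> None:
        self.queue.append(chunk)
        self.paths[chunk.path].outstanding += chunk.length

    def on_sack(self, cum_tsn: int, gap_acked: Set[int]) -> List[int]:
        """Process one SACK; return the TSNs due for fast retransmission."""
        for p in self.paths.values():  # reset the per-SACK flags
            p.saw_new_path_sack = p.new_pctsna = False
            p.newly_acked_bytes = 0
        before = {n: p.outstanding for n, p in self.paths.items()}

        for c in self.queue:  # mark newly acknowledged chunks per path
            if not c.acked and (c.tsn <= cum_tsn or c.tsn in gap_acked):
                p = self.paths[c.path]
                c.acked = True
                p.highest_path_tsn_acked = (
                    max(p.highest_path_tsn_acked, c.tsn) if p.saw_new_path_sack else c.tsn
                )
                p.saw_new_path_sack = True
                p.newly_acked_bytes += c.length
                p.outstanding -= c.length

        blocked = set()  # paths whose pCTSNA is stopped by an unacked chunk
        for c in self.queue:
            p = self.paths[c.path]
            if c.path in blocked or not p.saw_new_path_sack:
                continue
            if not c.acked:
                blocked.add(c.path)
            elif c.tsn > p.pctsna:
                p.pctsna = c.tsn
                p.new_pctsna = True

        for n, p in self.paths.items():  # slow start growth only
            if p.new_pctsna and before[n] >= p.cwnd and p.cwnd <= p.ssthresh:
                p.cwnd += min(p.newly_acked_bytes, p.pmtu)

        due: List[int] = []
        if gap_acked:  # path-aware gap counters
            highest_reported = max(gap_acked)
            for c in self.queue:
                p = self.paths[c.path]
                missing = not c.acked and cum_tsn < c.tsn < highest_reported
                if missing and p.saw_new_path_sack and c.tsn < p.highest_path_tsn_acked:
                    c.gap_count += 1
                    if c.gap_count == GAP_THRESHOLD:
                        due.append(c.tsn)
        return due
```

In this sketch, a chunk that is merely delayed on a slow path does not accumulate gap reports as long as no newer chunk sent on the same path has been acknowledged, which is the effect the path-based bookkeeping is meant to achieve.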

By using this algorithm, the sender receives one SACK chunk on a path for any two packets

with data chunks sent over this path. This strict allocation of SACK chunks to their paths is

used for the so-called SACK-clocking, where each incoming SACK triggers an update of the

outstanding bytes counter, and the receiver and congestion windows.

5.3. Simulation environment

For evaluating the results of the above mentioned algorithms we added the CMT algorithm as

well as the PB-SACK algorithm to our OPNET SCTP simulation model. The simulation scenario

reflects a typical case in which two dual-homed IP-based signaling end points are connected to

routers via a fast LAN technology (gigabit ethernet). The routers in turn are interconnected

by broadband WAN links, as shown in Fig. 8. Typically, these WAN links use transmission

systems found in public networks, e.g. PDH (E3/T3) or SDH/Sonet (STM-x) providing data

rates ranging from 34 MBit/s (E3) to 155 MBit/s (STM-1) and beyond. In order to estimate the

maximum throughput that can be achieved by SCTP with different load sharing implementations,

we assumed a unidirectional data transmission initiated by a saturated traffic source sending

data chunks of constant length towards the sink. Without loss of generality, we assume long

SIP signalling messages, or SS7 broadband MTP3 messages [5] with 1000 bytes payload, and

one data chunk per SCTP packet. The results presented in the following section are averaged

throughput values as perceived by the application and averaged values for the congestion window

as perceived by the SCTP protocol entity. To isolate the effects due to the load sharing algorithm

from those induced by competing traffic in the network, a scenario without interfering background

traffic has been used. Due to the fairly deterministic scenario, the confidence intervals calculated

from the repeated simulation runs are insignificantly small and have been omitted.

Fig. 8 Simulation scenario

5.4. Simulation results

In order to evaluate scenarios that are relevant to the purpose of signalling transport, we assumed

that the two bottleneck links are typical E3 broadband links with a bandwidth of W = 34.368

MBit/s. The delay of Path 1, d1, was configured to be 10 ms, and the delay of Path 2, d2, was

varied between 10 ms and 200 ms (cf. Figure 8). The properties of SS7 signalling links are well

within this range, with delays typically below 100 ms.

For comparison, the throughput of a single-homed SCTP endpoint (which is the same as for a multi-homed SCTP endpoint using standard SCTP without load sharing) is also given in Fig. 9. Two cases are shown: when only the slower path 2 is used, the graph is labelled “only path 2”; when only path 1 is used, the throughput remains constant since it does not depend on d2. As indicated by the standard SCTP curve for path 2 only, an SCTP association can fully exploit the bandwidth of the bottleneck link until the bandwidth delay product limits the achievable throughput, as is commonly known for all window based transport protocols. Thus, up to a delay d2 of 50 ms, the throughput is limited by the link bandwidth, and for d2 > 50 ms, the throughput is limited by the receiver window and the link delay. Note that the throughput in Fig. 9 is the throughput perceived by the application layer and takes into account the overhead of IP and SCTP headers.
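The window limitation can be made concrete with the usual bound for window based protocols, throughput ≤ min(link rate, window / RTT). The Python sketch below is a hedged illustration: the receiver window size is not given in the text, so the value of 430,000 bytes is an assumption chosen only so that the crossover falls near d2 = 50 ms as reported. The RTT is approximated as twice the one-way delay, and header overhead is ignored.

```python
LINK_BW_BIT_S = 34.368e6  # E3 bottleneck link
RWND_BYTES = 430_000      # ASSUMPTION: receiver window, not stated in the text


def single_path_throughput(one_way_delay_s: float,
                           rwnd_bytes: int = RWND_BYTES,
                           link_bw_bit_s: float = LINK_BW_BIT_S) -> float:
    """Upper bound in bytes/s for one window-limited association:
    min(link rate, window / RTT), with RTT taken as twice the one-way delay."""
    rtt = 2.0 * one_way_delay_s
    return min(link_bw_bit_s / 8.0, rwnd_bytes / rtt)


for d2_ms in (10, 50, 100, 200):
    kbyte_s = single_path_throughput(d2_ms / 1000.0) / 1000.0
    print(f"d2 = {d2_ms:3d} ms -> at most {kbyte_s:.0f} KByte/s")
```

With these assumed values the bound equals the link rate up to d2 = 50 ms and then falls in inverse proportion to the delay, which is the qualitative shape described for the single-path curve.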

Moreover, Fig. 9 shows the throughput of both load sharing algorithms, CMT and PB-SACK.

The throughput for both algorithms is limited by the bandwidth of the bottleneck links and

fully exploits the link capacity for values of d2 ≤ 20 ms, and in the case of PB-SACK, even for

d2 ≤ 30 ms. Due to the interdependence between transmissions on both links in the loadsharing

scenario, the throughput for higher delays of link 2 decreases, even though d1 remains constant

at 10 ms.

Let rd be the ratio of d2/d1. For values of rd > 3, i.e. where the difference of link delays is

substantial, the higher delay of Path 2 affects the overall association throughput, which becomes

limited by the receiver window. Path 1 is blocked in this case. The receiver cannot free its buffers

at that time, as it needs to wait for earlier messages arriving on Path 2. Were we to use multiple

independent message streams, the throughput would be higher, as messages on different streams

could be delivered independently without blocking, thus freeing up the receiver window.

For high values of rd , e.g. rd ≥ 9, an association does not benefit from load sharing in the

case of two equal bandwidth links, as the throughput approaches the limit for a standard SCTP

association without loadsharing. Therefore, SCTP implementations should generally refrain from

[Plot: throughput in KByte/s (0 to 9000) versus delay of Path 2 in ms (0 to 200); curves: Limit (2 Associations), PB-SACK, CMT, SCTP (only path 1), SCTP (only path 2)]

Fig. 9 Application layer throughput of two load sharing algorithms

[Plot: maximum message delay in s (0 to 0.6) versus delay of Path 2 in ms (0 to 200); curves: f(d2) = 3d2, f(d2) = 2d2, f(d2) = d2, CMT, PB-SACK]

Fig. 10 Maximum message delay of two load sharing algorithms

using load sharing in this case. Still, over the whole parameter range, our path specific SACK

algorithm yields a higher throughput than the CMT algorithm.

Figure 10 shows the values of the maximum message delay of the simulation for both load

sharing algorithms. These were also determined after eliminating the initial transient effects of

the simulation. It is obvious that the increase in throughput of the PB-SACK algorithm has not

been achieved at the cost of a substantially higher maximum message delay. Indeed, for values

of d2 < 50 ms, the PB-SACK performs somewhat better than the CMT algorithm. This is due to

the fact that for the PB-SACK algorithm, the growth of the congestion windows for both paths is

more in line with the delay exhibited by each path. The CMT algorithm can cause transmissions

of successive SACK chunks over path 2, which has a higher delay. In this case, the congestion windows of both paths are likely to grow somewhat more slowly.

[Plot: aggregate CWND (0 to 400000) versus delay of Path 2 in ms (0 to 200); curves: PB-SACK, CMT]

Fig. 11 Development of the aggregate CWND for two load sharing algorithms

For rd = 10, i.e. d2 = 100 ms, both algorithms reach a state where effective traffic load

distribution cannot be guaranteed any more (for this the difference in link characteristics has

become too significant). At that point, both algorithms allocate traffic almost exclusively to Path

1, and the maximum message delay equals the link delay of the slower path, i.e. it is 100 ms. For

higher values of d2, outstanding data on the slower path blocks the sender from sending more data

(i.e. the receiver window is fully used). Therefore for increasing values of d2, the throughput of

both variants approaches that of a single-homed association using only path 2, and the maximum

message delay increases strongly.

Finally, Fig. 11 shows the development of the aggregate congestion window for both algorithms. As expected (the throughput of PB-SACK is higher, its delay lower), the value of the aggregate congestion window is higher for PB-SACK than for CMT for d2 < 70 ms. However, this is not the case beyond d2 = 100 ms, and around d2 = 150 ms the value is substantially smaller than that of the CMT algorithm. This seems counterintuitive, as in this region the absolute throughput of the PB-SACK algorithm exceeds that of the CMT algorithm by almost 500 KByte/s. It clearly indicates that, contrary to what the results in [6] suggest, the growth of the aggregate congestion window is not the only major criterion that should be optimized to achieve efficient load sharing. Moreover, for high values of rd, i.e. rd > 10, the blocking effects of the limited receiver window become more significant, whereas the sizes of the congestion windows are of lesser importance.

The more inhomogeneous the scenarios become, the more important it gets that the load

sharing algorithm adapts well to the respective characteristics of the individual links in terms

of bandwidth and delay. This has been achieved by introducing path specific selective acknowledgements. The argument also holds for heterogeneity caused by links with different capacity

within the network. Simulations have shown that also in these cases the path specific selective

acknowledgments result in a higher throughput compared to just optimizing the value for the

aggregate congestion window.

6. Conclusion and outlook

With a more widespread usage of SCTP and a growing variety of SCTP applications, the issue

of optimizing SCTP multi-homing performance and protocol variants that allow for effective

end-to-end load sharing will become increasingly interesting. In this respect, a wide range of

optimisations can lead to significant improvement of the transport protocol performance, but

also requires application specific tuning. Simulation results were presented for a scenario with

diverse path delay characteristics, in which the default behaviour of standard SCTP leads to

suboptimal results. By introducing a modification of the acknowledgement algorithms, a substantial improvement both in terms of throughput and message delay characteristics could be

achieved.

Assuming network scenarios with a certain degree of homogeneity with respect to link capacity and delay characteristics on redundant SCTP paths, load sharing mechanisms can yield significant benefits. Therefore, such mechanisms have already been proposed to IETF standardization. While LS-SCTP introduces incompatible protocol extensions, CMT tries to adapt SCTP flow control mechanisms to the specific requirements of a load sharing scenario with the goal to optimize the value for the aggregate congestion window of the association. However, in less homogeneous load sharing scenarios, it is advantageous to adapt the congestion windows on a per path basis instead. This reasoning led to the definition of the novel load sharing variant using path based selective acknowledgements (PB-SACK) presented in this paper. The simulation results presented confirm that this algorithm provides better throughput and end-to-end delay characteristics than CMT. Furthermore, with respect to standardization requirements and implementation complexity, it is quite similar to CMT. Therefore, when load sharing extensions to SCTP are further discussed in the IETF working group, PB-SACK is one of the candidate algorithms that should be considered.

While the simulations performed so far have confirmed the benefits of load sharing and

the superior performance of our PB-SACK algorithm with respect to the maximum achievable

throughput for given delay bandwidth combinations, additional simulations quantifying the average gain in more dynamic scenarios (bursty, non-saturated senders and interfering background

traffic in the network) should be performed. In addition, a study on how the extension of the

scenario to more than two network paths influences the results could provide some additional

insight.

References

1. A. Abd El Al, T. Saadawi and M. Lee. Load Sharing in Stream Control Transmission Protocol, May 2003. draft-ahmed-lssctp-00.txt, Internet Draft, Work in Progress.

2. T. Dreibholz, A. Jungmaier and M. Tuxen. A new scheme for ip-based internet-mobility. In Proceedings of the IEEE Conference on Local Computer Networks (LCN2003), Bonn, October 2003.

3. T. George, B. Bidulock et al. SS7 MTP2-User Peer-to-Peer Adaptation Layer. IETF, Network Working Group, September 2005. RFC 4165.

4. International Telecommunication Union. Signalling System No. 7 – Message Transfer Part Signaling Performance, March 1993. ITU-T Recommendation Q.706.

5. International Telecommunication Union. Message Transfer Part Level 3 functions and messages using the services of ITU Recommendation Q.2140, July 1996. ITU-T Recommendation Q.2210 (07/96).

6. J.R. Iyengar et al. Concurrent multipath transfer using sctp multihoming. In SPECTS 2004, San Jose, July 2004.

7. J.R. Iyengar et al. Preventing SCTP Congestion Window Overgrowth During Changeover, February 2004. draft-iyengar-sctp-cacc-02.txt, Internet Draft, Work in Progress.

8. A. Jungmaier, E.P. Rathgeb, M. Schopp and M. Tuxen. Sctp – a multi-link end-to-end protocol for ip-based networks. AE – International Journal of Electronics and Communications, 55(1) (2001) 46–54.

9. Andreas Jungmaier et al. SCTPLIB – an SCTP implementation, April 2005. For reference, see http://freshmeat.net/projects/sctplib.

10. L. Ong, I. Rytina et al. Framework Architecture for Signaling Transport. IETF, Signaling Transport Working Group, October 1999. RFC 2719.

11. K. Sklower et al. The PPP Multilink Protocol (MP). IETF, Network Working Group, August 1996. RFC 1990.

12. R. Stewart et al. SCTP Dynamic Address Reconfiguration. IETF, Network Working Group, November 2005. draft-ietf-tsvwg-addip-sctp-13.txt, work in progress.

13. R. Stewart, M. Ramalho, Q. Xie, M. Tuxen and P. Conrad. Stream Control Transmission Protocol (SCTP) Partial Reliability Extension. IETF, Network Working Group, May 2004. RFC 3758.

14. R. Stewart, Q. Xie et al. Stream Control Transmission Protocol. IETF, Signaling Transport Working Group, October 2000. RFC 2960.

15. R. Stewart, Q. Xie, L. Yarroll, J. Wood, K. Poon, K. Fujita and M. Tuxen. Sockets API Extensions for Stream Control Transmission Protocol. IETF, Network Working Group, September 2005. draft-ietf-tsvwg-sctpsocket-11.txt, work in progress.

16. OPNET Technologies. OPNET Modeler, April 2005. Commercial simulation tool. See http://www.opnet.com/products/modeler/home.html for further reference.

17. M. Tuxen et al. Requirements for Reliable Server Pooling. IETF, Network Working Group, January 2002. RFC 3237.
