Security Framework for IP Telephony White Paper

Document Number: TR-41.4-02-02-12

TR-41.4.4-02-02-08

STANDARDS PROJECT:

TITLE: Security Framework for IP Telephony White Paper

SOURCE: Polycom 1000 West 14th Street North Vancouver, B.C. V7P 3P3 Canada

CONTACTS: N. Dadoun

Phone: 604-697-9336 Internet: [email protected]

DATE: February 16, 2002

DISTRIBUTION TO: TIA TR-41.4 and TR-41.4.4

Disclaimer:

This document has been prepared to assist the TIA standards development process. It is offered as a basis for discussion and is not a binding proposal on Polycom. The contents are subject to change in form, numerical value or both after further study. Polycom specifically reserves the right to add to, revise or otherwise amend the information contained herein.

Intellectual Property Statement:

The individual preparing this contribution is unaware of patents, the use of which may be essential to a standard resulting in whole or in part from this contribution.

Copyright Notice: The contributor grants a free, irrevocable non-exclusive license to the Telecommunications Industry Association (TIA) to incorporate text contained in this contribution and any modification thereof in the creation of a TIA standards publication; to copyright in TIA's name any standards publication even though it may include portions of this contribution; and at TIA's sole discretion to permit others to reproduce in whole or in part the resulting TIA standards publication.


Security Framework for

IP Telephony

White Paper

Document: PC-VD-WP-Security-001 DRAFT 15 February 2002

N. Dadoun

Sr. Software Engineer

Voice Communications Division

Polycom, Inc.

604-697-9336

[email protected]


Table of Contents

1.0 Introduction
2.0 Goals for a Security Framework
3.0 Systemic vulnerabilities: The nature of current and future threats
3.1. Denial of Service Attacks I: General
3.2. Denial of Service Attacks II: VoIP Specific
3.3. RTP attacks: Hijack, Replay
3.4. Signaling attacks: Hijack, Replay
3.5. Spoofing
3.6. Security Holes – Soft Clients
3.7. Security Holes – Additional Client/Servers Applications: Telnet/Shell access
3.8. Security Holes – Provisioning Information/Provisioning Server/XML Downloading
3.9. Security Holes – Computer port plug-in
3.10. Security Holes – 802.11b Wireless device security
4.0 General Mechanisms for Assuring Secure Communications
4.1. Physical/Logical Security: Network Infrastructure
4.2. Authentication/Authorization schemes
4.3. Privacy/Encryption schemes
4.4. Other Issues: Firewalls and NAT
4.5. Application specific firewalls
5.0 Specific Security Mechanisms and their suitability for VoIP
5.1. TLS/SSL
5.2. SRTP
5.3. IPSec
5.4. Kerberos
5.5. Certificates
6.0 VoIP Security Considerations and Possible Strategies
6.1. Security Strategies
6.1.1. Denial of Service
6.1.2. Authentication
6.1.3. Encryption/Privacy
6.1.4. Encryption Approaches
6.1.5. Key Management
6.2. Overall Security Levels
7.0 References


1.0 Introduction

IP telephony is at roughly the level of trust and (assumed) collegiality that the Internet enjoyed in the late 1980s, before the first automated attack (the Morris worm) disabled over 10% of the computers then on the Internet. Most deployed systems and their developers have given little if any thought to security, and may not do so until a cathartic attack comparable to the Morris worm forces them to. Although the security best practices identified by the Internet community (e.g. [CERT]) are a necessary starting point, the VoIP domain introduces a wealth of new vulnerabilities.

According to Bell [Bell1], security within a VoIP system is based on:

• Physical and logical security of infrastructures (servers, endpoints, routers etc.)
• Authentication of system elements and users
• Authorization of system usage
• Privacy of signaling and user information.

There is a basic level of logical security afforded to systems deployed on a secure intranet that is protected by firewalls and the general hardening associated with secure installations. This is not absolute protection: such systems can still be attacked from the inside, either by a malicious internal user or by a pervasive worm that has breached the firewall (e.g. the Nimda worm).

Increasingly, network installations are incorporating security measures by default. For telephony platforms to co-exist on those networks, they will have to address security concerns or risk being written off as too great a vulnerability.


2.0 Goals for a Security Framework

It is inherently difficult to construct a comprehensive security framework in a development environment which attempts to be protocol agnostic, since many approaches to security and encryption operate at, and attempt to secure, the signaling level. Stimulus and functional protocols also differ fundamentally in which entities communicate and how. Although much good security work has been done in the context of specific protocols (see [H.235] for H-series multimedia terminals as an example), the foremost goal of this white paper is to consider security and formulate a framework for secure communication which is, as far as possible, independent of the signaling protocol.

Before examining other issues, it’s worth setting out some general goals that a security framework should address. The main thrust is that a security framework should enhance communication services, allowing users to feel confident that they are in control of the privacy and authentication of their interactions; in other words, a security framework should enhance communications rather than impede them. Some of the following general goals have been paraphrased from the H.235 document on security and encryption for H-series multimedia terminals [H.235]:

• The framework should be able to operate in a mixed environment of secured and unsecured entities and should allow security configuration decisions to be made on an entity-by-entity basis;

• A security architecture should be developed as an extensible and flexible framework for implementing a security system, supplied through flexible and independent services and the functionality they provide. This includes the ability to negotiate, and to be selective about, the cryptographic techniques used and the manner in which they are used;

• Provide the desired level of security for all communications, including all aspects of connection establishment, call control and media exchange between all entities. This includes confidential communication (privacy) and may require peer authentication as well as protection of users’ environments from attack;

• The framework should not preclude integration of other security functions which may protect entities against attacks in the network. A secondary goal should be to avoid the overhead of redundancy, e.g. applying encryption/decryption more than once, either in series or in parallel, between two communicating entities;

• The framework should not limit the ability of any system configuration to scale as appropriate, both in the number of secured users and in the levels of security provided;

• Where appropriate, all mechanisms and facilities for signaling security should be provided independently of any underlying transport or topology. Separate mechanisms and facilities may be required to handle security at those lower levels;

• The framework should provide facilities for managing and distributing session keys associated with the cryptography used.
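The negotiation and mixed-environment goals above can be illustrated with a minimal sketch. It assumes each entity advertises a list of supported security profiles; the profile names are hypothetical labels, not identifiers from H.235 or any other standard. The initiator picks its most-preferred profile that the peer also supports, and falls back to an unsecured exchange only if its per-entity policy allows that.

```python
# Hypothetical profile labels, in the initiator's order of preference.
# "none" stands for an unsecured exchange with a legacy entity.
LOCAL_PREFERENCES = ["srtp-aes128", "ipsec-esp", "none"]


def negotiate(local_prefs, peer_supported, allow_unsecured):
    """Return the first locally preferred profile the peer supports, or None.

    allow_unsecured is a per-entity policy flag, reflecting the goal of
    making security configuration decisions entity by entity.
    """
    peer = set(peer_supported)
    for profile in local_prefs:
        if profile == "none" and not allow_unsecured:
            continue  # policy forbids unsecured calls with this entity
        if profile in peer:
            return profile
    return None  # no acceptable common profile: refuse the call


print(negotiate(LOCAL_PREFERENCES, ["ipsec-esp", "none"], allow_unsecured=False))  # ipsec-esp
print(negotiate(LOCAL_PREFERENCES, ["none"], allow_unsecured=True))                 # none
print(negotiate(LOCAL_PREFERENCES, ["none"], allow_unsecured=False))                # None
```

Keeping the policy flag outside the preference list lets the same device talk securely to capable peers while still interoperating with unsecured ones where an administrator permits it.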

These general goals are provided here to help evaluate the design of a security infrastructure and security mechanisms from a user (not a technical) point of view.


3.0 Systemic vulnerabilities: The nature of current and future threats

One standard definition of security is “freedom from danger”. Let’s start by understanding the nature of the dangers involved by identifying systemic vulnerabilities. The fact that data, voice stream and signaling traffic are all carried in the same network segment means that a compromised computer can be used to launch an automated attack on various portions of the VoIP infrastructure. Even if general data network segments are isolated from voice and signaling traffic (as Bell [Bell2] has suggested), the fact that voice and signaling traffic still share a network segment is a return to ‘in-band’ signaling which Schneier points out is the vulnerable pre-SS7 mode of operation [Schneier1]. Although the stated focus here is on IP telephony, most of this discussion can be straightforwardly extended to video and other real-time communications.

3.1. Denial of Service Attacks I: General

A denial of service (DoS) attack attempts to make a particular entity unavailable so that it cannot provide its intended service. This is a concern for the logical security of infrastructures. This type of attack may be directed at the endpoints, the servers/MGCs/call managers, or other portions of the VoIP infrastructure. DoS attacks generally take one of four forms: bandwidth consumption, resource starvation, programming flaws, and routing/DNS attacks.

Briefly, bandwidth consumption attacks attempt to consume all of the network bandwidth available to the victim, either directly by attacking from a network with more bandwidth than the victim or with an amplifying attack in which other sites are enlisted to overwhelm the bandwidth available to the victim.

Resource starvation attacks focus on consuming system resources (e.g. CPU, memory, disk space) rather than network resources. This may take the form of making so many spurious processing requests that legitimate requests can no longer be serviced.

Programming flaw attacks occur when the attacker tries to make the victim crash or enter an unstable state by trying to exploit improper handling of exceptional conditions. These exceptional conditions might include badly formed or otherwise non-compliant input, or extremely long input strings which might cause a buffer overflow.

Routing/DNS attacks occur when attackers attempt to fool a victim server into storing incorrect address information, so that a particular victim site does not receive its intended traffic; that is, its traffic is redirected to an incorrect or non-existent site.

3.2. Denial of Service Attacks II: VoIP Specific

All four of the DoS attacks identified above have simple extensions to VoIP.

A bandwidth consumption attack could be launched in several ways; the most obvious is simply to direct a high volume of traffic to the phone’s address. The traffic need not even be legitimate signaling or voice traffic, provided it prevents legitimate traffic from getting through. Examples include ICMP ping floods (network layer 3) and UDP floods (transport layer 4).

Similarly, a resource starvation attack could be launched in several ways, generally by sending service requests to the phone, either through the signaling protocol used by the phone or through other protocols it supports for Internet services, such as TCP-based services or HTTP. For example, if the phone supports TCP, a standard SYN flood attack (in which only the first step of the three-way handshake used to set up a TCP connection is performed, over and over) could create more ‘pending’ connections than the system is able to handle.
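The SYN flood described above can be modelled with a toy simulation of a fixed-size table of half-open connections. The backlog size of 8, the addresses and the class itself are assumptions for illustration; real TCP stacks differ in backlog limits and also time out stale entries, which this sketch omits.

```python
class HalfOpenTable:
    """Toy model of a TCP listener's table of half-open (SYN received) connections."""

    def __init__(self, backlog):
        self.backlog = backlog   # assumed maximum number of pending connections
        self.pending = set()

    def on_syn(self, src):
        """First handshake step: record a pending connection, or drop it if full."""
        if len(self.pending) >= self.backlog:
            return False  # resource exhausted: the SYN is dropped
        self.pending.add(src)
        return True

    def on_ack(self, src):
        """Final handshake step: the connection completes and frees its slot."""
        self.pending.discard(src)


table = HalfOpenTable(backlog=8)

# The attacker sends SYNs from spoofed addresses and never completes the handshake.
for i in range(8):
    table.on_syn(("198.51.100.%d" % i, 5060))

# A legitimate client now cannot even begin its handshake.
print(table.on_syn(("192.0.2.10", 40000)))  # False
```

Because each spoofed SYN costs the attacker almost nothing while occupying a slot until it times out, a modest packet rate is enough to keep the table full.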


Programming flaw attacks are difficult to predict (otherwise everything would be fixed in advance) but the major vulnerability would seem to be the robustness of any parser used to receive and decode the signaling traffic. Variations of this attack include a ‘jolt’ attack in which malformed IP packets are deliberately sent to the IP processing stack to cause a processing error or to force the stack into an unstable state [JOLT]. Ideally, parsing errors or other exceptional situations should attempt to leave the phone in a stable (known) state, preferably with the ability to recover and/or resume operation without intervention.
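As an illustration of parsing that fails safely, the sketch below validates a SIP-style request line before acting on it. The length limit, the accepted method set and the function names are assumptions for illustration, not taken from any particular phone; the point is that every malformed input is rejected with an error and leaves no partial state behind.

```python
MAX_LINE = 512  # assumed upper bound on a request line, in bytes
KNOWN_METHODS = {"INVITE", "ACK", "BYE", "CANCEL", "REGISTER", "OPTIONS"}


class ParseError(Exception):
    """Raised for any input the parser refuses to accept."""


def parse_request_line(raw: bytes):
    """Parse 'METHOD URI SIP/2.0' and return (method, uri), or raise ParseError."""
    if len(raw) > MAX_LINE:
        raise ParseError("line too long")  # refuse oversized input before copying it
    try:
        text = raw.decode("ascii")
    except UnicodeDecodeError:
        raise ParseError("non-ASCII bytes")
    parts = text.rstrip("\r\n").split(" ")
    if len(parts) != 3:
        raise ParseError("wrong number of fields")
    method, uri, version = parts
    if method not in KNOWN_METHODS:
        raise ParseError("unknown method")
    if version != "SIP/2.0" or not uri.startswith("sip:"):
        raise ParseError("bad version or URI")
    return method, uri


def handle(raw: bytes):
    """Top-level handler: a parse failure drops the message instead of crashing."""
    try:
        return parse_request_line(raw)
    except ParseError:
        return None  # the phone stays in its previous, known state
```

For example, `handle(b"INVITE sip:alice@example.com SIP/2.0\r\n")` returns the method and URI, while an oversized or garbled line simply returns `None`.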

A Routing/DNS attack targets the routing infrastructure that supports the phone environment rather than the phone itself, but it is easy to see how call signaling or media could be misdirected if such an attack succeeded. There is ongoing work in the IETF on DNS security, covering both operational security and secure DNS transactions (see [DNSSec] for a roadmap).

3.3. RTP attacks: Hijack, Replay

These types of attack are generally privacy attacks, although in some instances they can be used as an authentication attack.

In the absence of encryption, RTP media traffic is relatively easy to intercept for anyone with the appropriate tools and access to a network segment carrying it. For example, there is a publicly available ‘hack tool’ (see [VOMIT]) which allows someone with access to the network segment of an SCCP call manager to intercept RTP from an active conversation (a privacy attack) and to insert RTP into it (an authentication attack). This is a form of ‘man in the middle’ attack which could easily go undetected by the participants in the call.

Inserting previously played (hence validly formed and potentially authenticated) RTP data is an example of a replay attack. RTP sequence numbers are meant to be a basic ‘sanity check’ that RTP is being received properly. In practice, many systems do not check for jumps in the sequence number, and those that do might not be hard to fool by using a plausible sequence number taken from recently captured RTP packets.
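A receiver that does enforce sequence numbers typically keeps a sliding window of recently seen values and rejects duplicates and packets that are too old. The sketch below is modelled loosely on the anti-replay windows used by IPSec and SRTP, but it is a simplified illustration: it assumes sequence numbers have already been extended to a non-wrapping counter (RTP’s own 16-bit field wraps around) and uses an arbitrary 64-packet window. As the text notes, such a check only helps if the sequence number is itself authenticated; otherwise an attacker can simply choose a plausible fresh value.

```python
WINDOW = 64  # assumed window size, in packets


class ReplayWindow:
    """Sliding-window replay check over extended (non-wrapping) sequence numbers."""

    def __init__(self):
        self.highest = -1  # highest sequence number accepted so far
        self.bitmap = 0    # bit i set means (highest - i) has already been seen

    def accept(self, seq):
        """Return True for a fresh packet, False for a duplicate or stale one."""
        if seq > self.highest:
            shift = seq - self.highest
            # Slide the window forward; bits that fall off the end are forgotten.
            self.bitmap = ((self.bitmap << shift) | 1) & ((1 << WINDOW) - 1)
            self.highest = seq
            return True
        offset = self.highest - seq
        if offset >= WINDOW:
            return False  # too old to judge: treat as a replay
        if self.bitmap & (1 << offset):
            return False  # already seen: a replay
        self.bitmap |= 1 << offset  # late but new: accept and remember it
        return True


w = ReplayWindow()
print([w.accept(s) for s in [1, 2, 3, 2, 5, 4, 1]])
# [True, True, True, False, True, True, False]
```

Late but previously unseen packets (4 above) are still accepted, which tolerates normal network reordering while rejecting exact replays.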

3.4. Signaling attacks: Hijack, Replay

This type of attack is generally an authentication attack, although in some instances it can also be used as an authorization and/or privacy attack. In the absence of encryption it is relatively easy to perform by anyone with access to a network segment which is carrying the signaling traffic. This could be used in a number of ways to compromise the security of the phone and its infrastructure.

A phone might accept plausible signaling from a server other than the one authorized to send it. For example, MGCP has a NotifiedEntity parameter which changes the server with which the endpoint interacts (see [MGCP]). This is useful for MGC failover but could be misused to hijack subsequent endpoint communications.

An authorization attack could use signaling to set up long-distance or pay-per-call services. Signaling the phone to initiate a one-way connection could turn it into a desktop listening device with no external indication that this was happening. A process could also hijack the phone, or another processing element within the VoIP infrastructure, with a spoofing attack, that is, by changing the source IP address to masquerade as a different entity; the attacker would, however, need to take care that acknowledgements sent to the genuine entity did not disrupt the exchange.

As with the RTP stream description above, signaling traffic is also vulnerable to a replay attack. Although most signaling protocols do employ some kind of sequence number, the numbering rarely appears to be enforced.


3.5. Spoofing

There are a variety of spoofing attacks, all of which are related to authentication. Spoofing is the act of misrepresenting the source of a communication; examples of spoofing for both the media stream and signaling are given in the sections above.

3.6. Security Holes – Soft Clients

Although not a ‘type’ of attack per se, soft clients pose a security hole by defining an entity which can be used to launch automated attacks. The existence of soft clients violates the separation of data network segments from voice/signaling network segments advocated by [Bell2] and discussed below. This is a particular problem in situations where trust is extended to the device rather than to a particular session (e.g. with IPSec); although the application participating in a session may be worthy of that trust, another rogue application on the same device may not be.

3.7. Security Holes – Additional Client/Servers Applications: Telnet/Shell access

Many phones support remote access through telnet for debugging and development. This access often uses a simple user/password authentication. In the absence of encryption and/or any other kind of authentication, this could expose a vulnerability for capturing the password or sending unauthorized commands to a telnet session.

Similarly, many phones support FTP and/or TFTP for provisioning. In some signaling environments, micro-browsers are used for accessing directories and 3rd party services such as weather or stock quotations. Some vendors also incorporate web servers in the phone for additional services such as localized directories, accessing phone statistics/diagnostics, and phone state control. This potentially opens the phone to many standard web server attacks.

3.8. Security Holes – Provisioning Information/Provisioning Server/XML Downloading

The information required to provision the phone is sensitive and provides another potential security vulnerability. The provisioning information is often stored in XML (plaintext) and could potentially be intercepted either when downloading (i.e. provisioning) or if there were a breach of security on the server. Potentially the provisioning information could include passwords, server and other reference addresses, parameters for enabling/disabling test environments etc.

3.9. Security Holes – Computer port plug-in

One of the selling features of most IP telephones is that only one Ethernet ‘drop’ is required to support both phone and data. This is usually accomplished by providing a network jack on the phone into which a computer can be plugged, giving the computer access to the data network through a switch in the phone. This is a security risk for phones in public areas: unless that computer port can be disabled, anyone can walk up with a laptop and a network cable and access the data network. Note that some vendors provide a software mechanism for disabling the port on their phones, presumably to avoid this problem.


3.10. Security Holes – 802.11b Wireless device security

To take the public phone port access issue one step further, there is also the use of 802.11, the wireless network access standard. Concerns have been raised that WEP (Wired Equivalent Privacy) appears to be vulnerable to attack and cracking. There is also concern over installations not changing default passwords and otherwise leaving themselves open to easy security breaches.


4.0 General Mechanisms for Assuring Secure Communications

Incorporation of security into IP telephony seems to be subdivided into the areas of physical/logical security, authentication, authorization and privacy.

4.1. Physical/Logical Security: Network Infrastructure

The comment has been made that the only way to make IP telephony secure is to put it on its own dedicated network. This would defeat much of the incentive to move to IP telephony, but it does suggest a direction for improving telephony security: identify the types of traffic on the network (with respect to telephony), try to separate them (physically or logically) and control their interaction wherever possible. With this in mind, Bell [Bell2] identifies seven network segments along with a mnemonic colour scheme:

• Blue Network: Central Call Control Segment
• Yellow Network: Peripheral VoIP Elements Segment
• Green Network: Voice associated Work Station Segment
• Black Network: Administrative Data Segment
• Orange Network: General Intranet Data Segment
• White Network: Bastion Segment
• Red Network: Internet Segment

Bell argues that separating these network segments, either physically or through VLANs, and controlling the access points between them can significantly improve the security of the overall deployment. The inter-segment access points could be controlled with a physical or logical gateway, or monitored according to his rules for inter-segment access. This approach potentially provides many security benefits; in particular, it may be the best line of defense identified to date against DoS attacks and against worms/viruses which have breached a firewall.
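One way to express "controlling the access points between segments" is as a default-deny table of permitted inter-segment flows. The segment names below follow Bell’s colour scheme as listed above, but the specific permitted pairs are hypothetical examples chosen for illustration; they are not Bell’s published inter-segment rules [Bell2].

```python
from enum import Enum


class Segment(Enum):
    BLUE = "Central Call Control"
    YELLOW = "Peripheral VoIP Elements"
    GREEN = "Voice associated Work Station"
    BLACK = "Administrative Data"
    ORANGE = "General Intranet Data"
    WHITE = "Bastion"
    RED = "Internet"


# Hypothetical permitted (source, destination) flows; everything else is denied.
ALLOWED = {
    (Segment.YELLOW, Segment.BLUE),  # phones signal to call control
    (Segment.BLUE, Segment.YELLOW),  # call control signals back to phones
    (Segment.BLACK, Segment.BLUE),   # administrators manage call control
    (Segment.RED, Segment.WHITE),    # Internet traffic terminates at the bastion
    (Segment.WHITE, Segment.BLUE),   # the bastion relays vetted traffic inward
}


def permitted(src: Segment, dst: Segment) -> bool:
    """Traffic within a segment is allowed; across segments only if listed."""
    return src == dst or (src, dst) in ALLOWED


print(permitted(Segment.YELLOW, Segment.BLUE))    # True
print(permitted(Segment.ORANGE, Segment.YELLOW))  # False
```

Under this hypothetical policy, a worm on a general intranet PC (orange) has no direct path to the phones (yellow), which is the kind of containment the segmentation approach aims for.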

4.2. Authentication/Authorization schemes

A fundamental concern in communications is authentication: is this message authentic, that is, did it originate from whom it claims, and was it sent in the manner intended? Authorization simply asks: is the entity sending this message authorized to request this service (e.g. make a long-distance phone call or listen to voice mail)? We assume the fundamental problem to be authentication; once a message can be authenticated, it should be relatively straightforward to check the authorization.

One way to authenticate a message is with a digital signature. A digital signature is usually produced by encrypting a piece of data in such a way that it uniquely identifies the sender and associates that sender with the message. If that piece of data includes a time stamp or sequence information then it can also authenticate the sender’s intention.

A common way of producing such a digital signature is to encrypt a message digest value derived from the message; this has less computational overhead than encrypting the entire message, as the digest is usually much shorter than the message itself. A message digest value can be calculated using a one-way hash function that produces a fixed-length value from an arbitrary-length message. The hash function must satisfy several properties: the hash is easy to calculate; given only the hash (message digest) value, it is hard to find the message which produced that value; and given a message, it is hard to find another message that hashes to the same value. Although most common message digest algorithms are described as unkeyed schemes, it is not difficult to incorporate a key by combining it with the message before computing the digest; if a secret (shared) key is used in this way, further encryption is not necessary to provide a signature.
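Simply prepending a secret key to the message before hashing is vulnerable to length-extension attacks with MD5 and SHA-1, so in practice the key is usually combined using the HMAC construction (RFC 2104), which Python’s standard `hmac` module implements. The sketch below assumes a pre-shared key distributed out of band and uses SHA-1 to match the text; the message contents are made up for illustration.

```python
import hashlib
import hmac

SHARED_KEY = b"example pre-shared key"  # assumption: distributed out of band


def sign(message: bytes) -> bytes:
    """Keyed digest (HMAC-SHA1) that only holders of SHARED_KEY can produce."""
    return hmac.new(SHARED_KEY, message, hashlib.sha1).digest()


def verify(message: bytes, tag: bytes) -> bool:
    """Recompute the keyed digest and compare it in constant time."""
    return hmac.compare_digest(sign(message), tag)


# Including a sequence number lets the tag also vouch for the sender's intention.
msg = b"REGISTER sip:phone1@example.com seq=42"
tag = sign(msg)
print(len(tag) * 8)             # 160
print(verify(msg, tag))         # True
print(verify(msg + b"0", tag))  # False: any change invalidates the tag
```

Because both ends hold the same key, such a tag authenticates the message to the receiver but, unlike a private-key signature, cannot prove to a third party which of the two produced it.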

The digest value may be used in several ways. An entity challenged by another may be able to authenticate its identity only by hashing a given value in an identifiable way, perhaps by combining it with a shared key. Often, for authentication and privacy, the hash value produced from a message is combined with the message and encrypted separately using a shared key algorithm. In this way the message is ‘signed’ with the hash value, and being able to reproduce the hashed ‘message digest’ value indicates that the message has not been tampered with in transit. Furthermore, if an asymmetric (public key) scheme is used and the message has been signed with a ‘private’ key, the message cannot be repudiated: only the possessor of the private key could have sent it.
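The challenge-response use of a keyed digest can be sketched as follows, assuming both sides already share a key. The verifier sends a fresh random nonce so that a captured response cannot simply be replayed later. The function names are hypothetical, and HMAC-SHA1 stands in for "combining the challenge with a shared key".

```python
import hashlib
import hmac
import secrets

SHARED_KEY = b"example pre-shared key"  # assumption: provisioned on both ends


def make_challenge() -> bytes:
    """Verifier side: a fresh, unpredictable nonce for every attempt."""
    return secrets.token_bytes(16)


def respond(key: bytes, challenge: bytes) -> bytes:
    """Claimant side: prove knowledge of the key without sending the key itself."""
    return hmac.new(key, challenge, hashlib.sha1).digest()


def check(key: bytes, challenge: bytes, response: bytes) -> bool:
    """Verifier side: recompute the expected response and compare in constant time."""
    return hmac.compare_digest(respond(key, challenge), response)


c = make_challenge()
r = respond(SHARED_KEY, c)
print(check(SHARED_KEY, c, r))                 # True
print(check(SHARED_KEY, make_challenge(), r))  # False: an old response fails a new challenge
```

The key never crosses the network; an eavesdropper sees only a nonce and its keyed digest, neither of which helps answer the next challenge.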

A number of unkeyed one-way hash functions have been used in this area; a current popular choice is MD5 (Message Digest 5), which produces a 128-bit hash value. An equivalent version which incorporates a session key (as described above) is known as MD5-sess. Unfortunately, like its predecessor MD4, MD5/MD5-sess is no longer considered secure given current technologies and processor speeds, and has been superseded by other hash schemes. The Secure Hash Algorithm, SHA-1, is a scheme which can be considered secure for the foreseeable future. SHA-1 was published by the US government (NIST) and produces a 160-bit digest value.
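Python’s standard `hashlib` module can confirm the fixed output lengths quoted above: whatever the length of the input, MD5 yields 128 bits and SHA-1 yields 160 bits. SHA-1 has since been shown vulnerable to practical collision attacks (a collision was published in 2017), so SHA-256 is included for comparison as the usual modern choice.

```python
import hashlib

messages = [b"", b"short", b"x" * 1_000_000]  # arbitrary-length inputs

for name in ("md5", "sha1", "sha256"):
    # Digest sizes in bits for every input; a set collapses identical sizes.
    sizes = {len(hashlib.new(name, m).digest()) * 8 for m in messages}
    print(name, sizes)
# md5 {128}
# sha1 {160}
# sha256 {256}
```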

4.3. Privacy/Encryption schemes

If there is a possibility that data can be intercepted between communicating entities (using a packet sniffer or other eavesdropping mechanism), the only way that privacy can be ensured is by using encryption/decryption. Briefly, encryption is the act of applying a transformation function to a message in such a way that someone with the appropriate piece of information about the transformation can apply an inverse function (decryption, which reconstructs the original message) and anyone else cannot. Generally speaking there are two approaches to encryption: symmetric (shared key) and asymmetric (public/private key).

In symmetric encryption, the value used for decryption can be calculated easily from the value used for encryption. As a simple example, a substitution algorithm (e.g. a decoder ring) is a symmetric algorithm: if you know the substitution applied, it is easy to reverse the substitution and reconstruct the original message. Assuming a more sophisticated (robust) encryption algorithm, the security of the scheme depends on the secrecy of the key.
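The decoder-ring example can be made concrete with a toy substitution cipher; a Caesar shift over the letters A-Z is used here as an illustrative choice. The decryption table is derived directly from the encryption key by inverting the mapping, which is exactly what makes the scheme symmetric. A cipher this simple is, of course, trivially breakable.

```python
import string

ALPHABET = string.ascii_uppercase


def make_tables(shift: int):
    """Build the encryption table and derive its inverse from the same key."""
    enc = {a: ALPHABET[(i + shift) % 26] for i, a in enumerate(ALPHABET)}
    dec = {v: k for k, v in enc.items()}  # knowing enc immediately yields dec
    return enc, dec


def substitute(table, text: str) -> str:
    """Replace each letter via the table; leave other characters unchanged."""
    return "".join(table.get(ch, ch) for ch in text)


enc, dec = make_tables(shift=3)
ciphertext = substitute(enc, "CALL ME AT NOON")
print(ciphertext)                   # FDOO PH DW QRRQ
print(substitute(dec, ciphertext))  # CALL ME AT NOON
```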

In asymmetric encryption, the value used for decryption cannot be calculated easily from the value used for encryption. Calculating the value used for decryption usually depends on solving some mathematically difficult problem such as finding the prime factors of a very large number or calculating discrete logarithms. In this case, the encryption value can be made public without worrying about it being used to decrypt any message associated with it.
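A textbook RSA example with deliberately tiny primes shows the asymmetry: the public pair (n, e) can be published, while computing the private exponent d requires the prime factors of n. These are the standard classroom numbers and offer no security; real keys use primes hundreds of digits long, and practical RSA also requires padding, which is omitted here.

```python
p, q = 61, 53            # secret primes (toy-sized)
n = p * q                # 3233, published as part of the public key
phi = (p - 1) * (q - 1)  # 3120, easy to compute only if p and q are known
e = 17                   # public exponent, coprime with phi
d = pow(e, -1, phi)      # private exponent via modular inverse (Python 3.8+)

message = 65
ciphertext = pow(message, e, n)    # anyone can encrypt with (n, e)
recovered = pow(ciphertext, d, n)  # only the holder of d can decrypt
print(d, ciphertext, recovered)    # 2753 2790 65
```

An eavesdropper who knows only n and e would have to factor 3233 to obtain phi and hence d; with realistic key sizes that factoring step is what is believed to be infeasible.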

Similarly, exploiting the difficulty of calculating a mathematically difficult inverse, two parties wishing to communicate privately can negotiate secret keys even if someone is listening in. This is the concept behind the Diffie-Hellman key exchange algorithm which enables two peers (e.g. phones) to negotiate a session key without the need for any other entities.
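A toy Diffie-Hellman exchange is sketched below. Each phone keeps a random private value and sends only g^x mod p; both arrive at the same secret, while an eavesdropper who sees only the public values would have to solve a discrete logarithm. The small prime is for illustration only (real deployments use much larger primes, today typically 2048 bits or more). Note also that unauthenticated Diffie-Hellman is open to a man-in-the-middle attack, which is why practical protocols such as IKE combine it with authentication.

```python
import secrets

p = 23  # toy public prime
g = 5   # public generator (a primitive root modulo 23)

# Each phone picks a private value in 1..p-2 and never transmits it.
a = secrets.randbelow(p - 2) + 1  # phone A's secret
b = secrets.randbelow(p - 2) + 1  # phone B's secret

A = pow(g, a, p)  # sent in the clear from A to B
B = pow(g, b, p)  # sent in the clear from B to A

shared_at_a = pow(B, a, p)  # A combines B's public value with its own secret
shared_at_b = pow(A, b, p)  # B combines A's public value with its own secret
print(shared_at_a == shared_at_b)  # True
```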

Fundamental to most encryption schemes is key management, either pair-wise between communicating entities as part of a Security Association (SA), or centrally amongst a community of communicating entities through a key distribution centre (KDC) or certificate scheme. There are a number of key management schemes, such as the one associated with IPSec: the Internet Key Exchange (IKE), a combination of the Internet Security Association and Key Management Protocol (ISAKMP) and OAKLEY, used for establishing shared keys.

4.4. Other Issues: Firewalls and NAT

Firewalls are a critical mechanism for providing a general first line of defense for computers and networks. Firewalls protect networks by enforcing a restrictive packet filtering policy, usually through statically-defined rules that allow traffic flows to and from trusted addresses and restrict all others. This conflicts with the dynamic nature of IP telephony, which allows spontaneous communication to and from unknown, potentially mobile entities with dynamic addresses and ports. The difficulty is often exacerbated by the additional use of network address translation (NAT) behind firewalls.
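The mismatch between static filtering and dynamic media ports shows up in even a minimal packet-filter sketch. The rule set, addresses and port numbers are hypothetical: signaling from a known call-server network to UDP port 5060 is allowed, everything else is denied, so RTP arriving on a port negotiated at call time is dropped even though the call itself was set up legitimately.

```python
from ipaddress import ip_address, ip_network
from typing import NamedTuple


class Rule(NamedTuple):
    src: str       # permitted source network, in CIDR form
    dst_port: int  # permitted destination UDP port


# Hypothetical static policy: only signaling from the call-server network.
RULES = [Rule(src="203.0.113.0/24", dst_port=5060)]


def allowed(src_ip: str, dst_port: int) -> bool:
    """First-match static filter with an implicit default-deny at the end."""
    for rule in RULES:
        if ip_address(src_ip) in ip_network(rule.src) and dst_port == rule.dst_port:
            return True
    return False


print(allowed("203.0.113.5", 5060))    # True: signaling from the call server
print(allowed("198.51.100.7", 49170))  # False: RTP from a remote phone on a dynamic port
```

Opening a wide range of ports in advance would let the media through, but only by giving up most of the protection the filter was meant to provide.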

NAT is the widely-deployed ‘short-term’ solution to IP address space depletion. In this scheme, machines on isolated networks can be configured with locally unique IP addresses which are used directly for communication within the network. (A family of address blocks is designated for this usage and will not be allocated by IANA for use on the public Internet: 10.0.0.0/8, 172.16.0.0/12 and 192.168.0.0/16, as specified in RFC 1918.)

When a machine on the isolated network needs to communicate with an external machine or device, it needs to supply a globally unique address. This is accomplished by allowing an externally visible machine (that is, a machine with a globally unique IP address) to act as a NAT device and a proxy for that machine. The source machine sends a packet for the external destination through the NAT device; the NAT proxy replaces the packet’s source address with its own IP address combined with an unused port number, and sends the packet on with the modified source IP/port pair. On receiving the first packet, the NAT proxy also creates an entry in its NAT table to record this translation. Thereafter, any data going out from the source machine is sent with the externally visible IP/port pair, and any data received on that port has its destination address rewritten to the internal address and is forwarded to the source machine.

There are a number of implications of this scheme, security and otherwise. First of all, any device behind a NAT is not visible to any external device wishing to initiate communications. For example, an IP phone behind a NAT could not receive an external call unless it had a prearranged NAT table entry and the external device knew that external IP/port entry. There are security benefits to this in that a phone can’t be attacked if it’s not visible. On the other hand, this implies that to make phone calls on the Internet at least one of the phones (the recipient of the call) needs to have an externally visible (i.e. globally unique) IP address potentially behind a non-NAT firewall.
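The translation steps and the visibility consequence described above can be sketched as a small NAT table. The public address, the starting port of 40000 and the class interface are assumptions for illustration; real NAT devices also age out idle entries, may restrict which remote hosts can use a mapping, and handle many protocols.

```python
class Nat:
    """Toy outbound NAT: maps (internal IP, port) to a port on one public address."""

    def __init__(self, public_ip, first_port=40000):
        self.public_ip = public_ip
        self.next_port = first_port  # assumed start of the pool of external ports
        self.out_map = {}            # (internal ip, internal port) -> external port
        self.in_map = {}             # external port -> (internal ip, internal port)

    def outbound(self, src):
        """Rewrite the source of an outgoing packet, creating an entry on first use."""
        if src not in self.out_map:
            self.out_map[src] = self.next_port
            self.in_map[self.next_port] = src
            self.next_port += 1
        return (self.public_ip, self.out_map[src])

    def inbound(self, dst_port):
        """Rewrite the destination of an incoming packet, or return None to drop it.

        In this simplified model any remote host may use an existing mapping.
        """
        return self.in_map.get(dst_port)


nat = Nat("198.51.100.1")
phone = ("192.168.1.20", 5060)
print(nat.outbound(phone))  # ('198.51.100.1', 40000)
print(nat.inbound(40000))   # ('192.168.1.20', 5060): a reply reaches the phone
print(nat.inbound(40001))   # None: an unsolicited incoming call attempt is dropped
```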

The visibility issue is partially mitigated by the use of DNS: if communications take place between named entities, a name server can provide external visibility by mapping each name to an appropriate IP address/port combination.

Once encryption becomes part of the security environment, the situation gets more complicated. Authentication schemes will often produce a hash message digest of the packet information as a ‘crypto-checksum’ to demonstrate that the packet has not been modified in transit. Doing a NAT address/port substitution will invalidate this digest and cause the packet to be discarded.

One company has proposed a solution to this problem with Dynamic Network Address Translation, a mechanism for doing external/internal IP and port association dynamically. Another proposed mechanism, which allows end-to-end security, is Realm-Specific IP (RSIP). RSIP differs subtly from ordinary address translation. It still performs address translation, but an RSIP server tells the originating application in advance what that translation will be. The translated address is therefore inserted in the message at the endpoint, eliminating the need for the address translation process to make substitutions within the message sent. Thus the message can be encrypted at the originating endpoint and will still incorporate the correct externally visible address.
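The RSIP idea can be sketched in the same terms: the endpoint learns its external address before protecting the message, so nothing downstream needs to rewrite it. This is a minimal sketch only. `RsipServer`, `assign`, the key and the addresses are hypothetical, and the lease exchange and the tunnelling of packets to the RSIP gateway are not modelled. HMAC-SHA-256 stands in for the endpoint's keyed hash.

```python
import hashlib
import hmac
from itertools import count
from typing import Dict

class RsipServer:
    """Hypothetical RSIP server leasing a public address/port to an inner host."""
    def __init__(self, public_ip: str, first_port: int = 40000) -> None:
        self.public_ip = public_ip
        self._ports = count(first_port)
        self.leases: Dict[str, str] = {}        # internal endpoint -> public endpoint

    def assign(self, internal: str) -> str:
        if internal not in self.leases:
            self.leases[internal] = f"{self.public_ip}:{next(self._ports)}"
        return self.leases[internal]

def protect(key: bytes, src: str, dst: str, payload: bytes) -> bytes:
    return hmac.new(key, f"{src}|{dst}|".encode() + payload, hashlib.sha256).digest()

key = b"shared-secret"                          # hypothetical end-to-end key
server = RsipServer("203.0.113.1")
# 1. Before sending, the endpoint asks the RSIP server for its public endpoint.
public_src = server.assign("10.0.0.5:16384")
# 2. The endpoint writes that public address into the message and protects it.
dst, payload = "198.51.100.7:16384", b"RTP audio frame"
tag = protect(key, public_src, dst, payload)
# 3. No middlebox rewrites the protected fields, so the receiver's check passes.
print(hmac.compare_digest(protect(key, public_src, dst, payload), tag))  # True
```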


4.5. Application specific firewalls

One proposal to mitigate the problems of packet communication in a filtering or translation (middlebox) environment is the use of Application Level Gateways (ALGs). From IETF draft [MidCom], “Application Level Gateways (ALGs) are entities that possess the application specific intelligence and knowledge of an associated middlebox function. An ALG examines application traffic in transit and assists middlebox in carrying out its function.

“An ALG may be co-resident with a middlebox or reside externally, communicating through a middlebox communication protocol. It interacts with a middlebox to set up state, access control filters, use middlebox state information, modify application specific payload or perform whatever else is necessary to enable the application to run through the middlebox.”
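As an illustration of an ALG that "modifies application specific payload", here is a minimal sketch of what a SIP ALG might do to the SDP body of a call setup message. It replaces the private media address and port with the middlebox's external binding. SIP/SDP is used only as an example protocol. The function name and the mapping format are hypothetical, and a real SIP ALG would also rewrite other headers (e.g. Via, Contact) and fix up Content-Length. The approach depends on the signaling being readable in transit.

```python
import re
from typing import Dict, Tuple

Binding = Dict[Tuple[str, int], Tuple[str, int]]  # private (ip, port) -> public (ip, port)

def alg_rewrite_sdp(sdp: str, mapping: Binding) -> str:
    """Rewrite the c= address and audio m= port using the middlebox's NAT bindings."""
    conn = re.search(r"^c=IN IP4 (\S+)", sdp, re.M)
    media = re.search(r"^m=audio (\d+) ", sdp, re.M)
    if not conn or not media:
        return sdp                      # nothing this ALG understands: pass through
    key = (conn.group(1), int(media.group(1)))
    if key not in mapping:
        return sdp                      # no binding for this media stream
    pub_ip, pub_port = mapping[key]
    sdp = re.sub(r"^c=IN IP4 \S+", f"c=IN IP4 {pub_ip}", sdp, flags=re.M)
    sdp = re.sub(r"^m=audio \d+ ", f"m=audio {pub_port} ", sdp, flags=re.M)
    return sdp

body = "v=0\r\nc=IN IP4 10.0.0.5\r\nm=audio 16384 RTP/AVP 0\r\n"
print(alg_rewrite_sdp(body, {("10.0.0.5", 16384): ("203.0.113.1", 40000)}))
```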

The IETF has set up a Middlebox Communication (MidCom) working group to try to address some of these issues. The group aims to identify situations which require middlebox communication and to develop an appropriate protocol to facilitate this approach. In parallel to this, IpTel has proposed and is working on a Firewall Communication Protocol (FCP) [FCP] to try to address some of these issues.


5.0 Specific Security Mechanisms and their suitability for VoIP

Moving from the general to the specific, we examine specific mechanisms which have been proposed for providing data communications security and assess their suitability for the VoIP domain.

5.1. TLS/SSL

TLS (Transport Layer Security) is an IETF open standard outgrowth of SSL (Secure Sockets Layer), originally developed by Netscape; TLS v1.0 is closely related to SSL v3.0. TLS actually sits above the transport layer and provides application level security for communications. TLS provides facilities for authentication, integrity and privacy between communicating entities.

It operates in two phases and layers:

• TLS Handshake protocol is used for establishing a secure session. Entities are authenticated using asymmetric (public key) cryptography (e.g. RSA), and a shared secret key is then negotiated securely and reliably (e.g. by RSA key transport or Diffie-Hellman key agreement);

• TLS Record protocol provides privacy using symmetric cryptography with a shared secret key (potentially agreed on using the TLS Handshake Protocol). It provides reliability using a keyed message authentication code (with a hash function such as MD5 or SHA).
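The keyed MAC used by the Record protocol can be sketched as follows. The field layout follows our reading of the TLS 1.0 specification [TLS]: HMAC over a 64-bit sequence number, the content type, the protocol version, the fragment length and the fragment. The MAC secret here is a placeholder rather than a key derived from a real handshake, and encryption and padding are omitted.

```python
import hashlib
import hmac
import struct

CONTENT_APPLICATION_DATA = 23   # ContentType value for application data
TLS_1_0 = (3, 1)                # ProtocolVersion {3, 1} identifies TLS 1.0

def record_mac(mac_secret: bytes, seq_num: int, content_type: int,
               fragment: bytes, hash_name: str = "sha1") -> bytes:
    """HMAC over seq_num || type || version || length || fragment."""
    # 8-byte implicit sequence number, 1-byte type, 2-byte version, 2-byte length
    header = struct.pack("!QBBBH", seq_num, content_type, *TLS_1_0, len(fragment))
    return hmac.new(mac_secret, header + fragment, hash_name).digest()

secret = bytes(20)                          # placeholder, not a handshake-derived key
msg = b"SIP signaling message"
sent = record_mac(secret, 0, CONTENT_APPLICATION_DATA, msg)
# The receiver keeps its own sequence counter, so the same record
# replayed or reordered into position 1 no longer verifies.
print(hmac.compare_digest(sent, record_mac(secret, 0, CONTENT_APPLICATION_DATA, msg)))  # True
print(hmac.compare_digest(sent, record_mac(secret, 1, CONTENT_APPLICATION_DATA, msg)))  # False
```

Because the sequence number is implicit (never sent), the scheme assumes in-order, lossless delivery. This is one way to see why TLS needs TCP underneath.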

The use of TLS requires a reliable transport mechanism such as TCP, and thus TLS will not work over UDP. The obvious implication for IP telephony is that TCP-based signaling and other ‘out-of-band’ communications can make use of TLS, but non-TCP-based signaling and UDP-based media streams cannot.

Advantages:

• Protocol addresses key management issues within a simple framework.

• Relatively lightweight approach to signaling security.

• Can provide security within the notion of a session. That is, trust is granted to the specific application, not to the device, which may be running applications that are not to be trusted.

Disadvantages:

• Since it requires a reliable underlying transport layer (i.e. not UDP), it cannot secure the media stream, which requires a separate mechanism such as SRTP.

• Similarly, some signaling protocols use TCP (e.g. SCCP), some can use TCP or UDP (e.g. SIP), and yet others use UDP exclusively or in part (e.g. MGCP, and the RAS channel of H.323). Because TLS requires a reliable transport layer, it cannot be used for signaling carried over UDP.

• Since TLS operates just above the transport layer, each application layer protocol needs to negotiate its own secure TCP channel (e.g. FTP, Telnet, HTTP, signaling protocols that use TCP, etc.).

• Will require programming changes to the phone protocol stack, necessitating changes to every protocol stack the phone supports.

Note that FTP used in conjunction with TLS has been proposed as a mechanism to provide secure file downloads (i.e. for provisioning etc.) and similarly, HTTP used in conjunction with TLS could be used as a mechanism to provide a secure minibrowser etc.


5.2. SRTP

SRTP (Secure Real-time Transport Protocol) [SRTP] is a profile and enhancement of RTP to provide privacy, message integrity, message authentication and replay protection. The goals for these security enhancements within a real-time context include speed, parallelizability, stream-based operation (so that transmission bit errors are not propagated), and limited packet expansion. It also defines some default (mandatory to implement) mechanisms for providing these security features, such as a played packet list to provide replay protection. Note that a played packet list in the absence of authentication is still vulnerable to an attack in which plausibly numbered packets are sent earlier than the real packets with the same numbering.
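A played packet list is commonly kept as a sliding window: a bitmap anchored at the highest packet index seen so far. The sketch below is illustrative, not the [SRTP] reference design. The 64-packet window is an assumption (we understand it to be SRTP's minimum). Checking and recording are split on purpose. Recording an index only after the packet's authentication tag has verified is what defeats the early-forgery attack described above.

```python
class ReplayWindow:
    """Sliding-window 'played packet list' over SRTP packet indices (sketch)."""

    def __init__(self, size: int = 64) -> None:
        self.size = size
        self.mask = (1 << size) - 1
        self.highest = -1          # highest packet index recorded so far
        self.bitmap = 0            # bit k set => index (highest - k) already played

    def is_fresh(self, index: int) -> bool:
        """True if the index is new and not too old to judge."""
        if index > self.highest:
            return True
        offset = self.highest - index
        return offset < self.size and not (self.bitmap >> offset) & 1

    def mark(self, index: int) -> None:
        """Record an index as played. Call only after is_fresh() returned True
        and only after the packet's authentication tag has verified."""
        if index > self.highest:
            self.bitmap = ((self.bitmap << (index - self.highest)) | 1) & self.mask
            self.highest = index
        else:
            self.bitmap |= 1 << (self.highest - index)

window = ReplayWindow()
window.mark(10)
window.mark(12)
print(window.is_fresh(11), window.is_fresh(12))  # True False
```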

SRTP should be implemented as a ‘bump in the stack’, in which encryption/decryption is done without affecting any other transport mechanism either above or below the RTP processing. The draft recommends an encryption algorithm based on AES in counter mode. This builds an XOR mask (keystream), which is a function of the key and the packet's sequence number¹, and XORs it with the packet's contents. In this way, any individual packet can be operated upon quickly and independently of any other packet. The optional appended authentication tag (the only addition to the standard RTP packet) is constructed by calculating a keyed hash digest of the packet contents and sequence number using the session key, then truncating it to the required tag length.
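The per-packet processing just described can be sketched as below. The IV layout (salt, SSRC and packet index XORed into a 128-bit counter block) follows our reading of SRTP's AES counter mode. The 80-bit tag is HMAC-SHA1 over header, encrypted payload and roll-over counter, which we understand to be the SRTP default. Two substitutions are deliberate and labelled. First, Python's standard library has no AES, so SHA-256 stands in as the keystream block function. Second, SSRC and sequence number are passed as arguments rather than parsed from the header. Keys, salt and key derivation are hypothetical or omitted.

```python
import hashlib
import hmac
import struct
from typing import Optional

TAG_LEN = 10  # 80-bit authentication tag

def packet_index(roc: int, seq: int) -> int:
    """48-bit index: roll-over counter (ROC) extended by the 16-bit RTP sequence number."""
    return (roc << 16) | seq

def keystream(enc_key: bytes, salt: int, ssrc: int, index: int, length: int) -> bytes:
    """Counter-mode XOR mask; SHA-256 is a stand-in for AES (NOT the real cipher)."""
    iv = (salt << 16) ^ (ssrc << 64) ^ (index << 16)
    blocks = []
    for j in range((length + 15) // 16):
        counter_block = ((iv + j) % (1 << 128)).to_bytes(16, "big")
        blocks.append(hashlib.sha256(enc_key + counter_block).digest()[:16])
    return b"".join(blocks)[:length]

def auth_tag(auth_key: bytes, header: bytes, body: bytes, roc: int) -> bytes:
    """Truncated HMAC-SHA1 over header || encrypted payload || ROC."""
    mac = hmac.new(auth_key, header + body + struct.pack("!I", roc), hashlib.sha1)
    return mac.digest()[:TAG_LEN]

def protect(enc_key: bytes, auth_key: bytes, salt: int, header: bytes,
            ssrc: int, roc: int, seq: int, payload: bytes) -> bytes:
    ks = keystream(enc_key, salt, ssrc, packet_index(roc, seq), len(payload))
    body = bytes(p ^ k for p, k in zip(payload, ks))
    return header + body + auth_tag(auth_key, header, body, roc)

def unprotect(enc_key: bytes, auth_key: bytes, salt: int, packet: bytes,
              ssrc: int, roc: int, seq: int, header_len: int = 12) -> Optional[bytes]:
    header, body, tag = packet[:header_len], packet[header_len:-TAG_LEN], packet[-TAG_LEN:]
    if not hmac.compare_digest(auth_tag(auth_key, header, body, roc), tag):
        return None                       # authentication failed: discard the packet
    ks = keystream(enc_key, salt, ssrc, packet_index(roc, seq), len(body))
    return bytes(c ^ k for c, k in zip(body, ks))
```

Each packet is handled using only its own index, so packets can be processed in parallel and a lost packet affects nothing else.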

Advantages:

• Tailored to media stream requirements for speed, parallel processing, streaming, and limited expansion.

• Appears simple to implement: essentially a call to an encryption/decryption library routine between the RTP and UDP layers, plus some simple mechanisms to detect replayed packets.

Disadvantages/Risks:

• Does not address any signaling security issues; will require a separate mechanism for all other communications.

• Requires separate key management; may need to use a mechanism like IKE, ISAKMP/Oakley or Kerberos, or a peer-to-peer mechanism like Diffie-Hellman, to manage keys.

• Will require programming changes to the phone protocol stack.

¹ The sequence number as referred to here is a slight generalization of the normal packet sequence number. This generalization incorporates a ‘roll over counter’ (ROC) to expand the range of possible sequence numbers. Once the entire range has been exhausted, the session key expires and must be renegotiated.

5.3. IPSec

IPSec (IP Security) is a collection of interoperating standards to provide IP level security between communicating entities. It incorporates a number of mechanisms to provide authentication, data integrity and confidentiality (encryption), but does not mandate particular algorithms or key management policies. Highlights of the standards are:

• Communicating entities negotiate and maintain a security association (SA) which defines how security issues are to be managed between the entities. This can include the types of security desired, algorithms for encryption (e.g. AES or DES etc.) and/or algorithms for integrity/signing (MD5, SHA etc.), session keys for encryption or keyed hashing, how those keys are to be derived, digital certificates etc.

• Two new IP packet formats: authentication header (AH) and encapsulating security payload (ESP). Both packet formats contain a security parameter index (SPI) which identifies the applicable security association as defined above. The AH authenticates the packet contents by including a ‘crypto-checksum’ calculated as a keyed hash function of the message content and a shared key. The ESP uses its SPI to determine the encryption scheme and keys, and the packet payload is encrypted using these parameters. Encryption is optional in the sense that the encryption algorithm may be NULL. ESP optionally includes authentication data as a ‘crypto-checksum’ similar to that defined in the AH but applied to the encrypted payload. Note that the ESP RFC states that at least one of encryption or authentication must be included, i.e. they cannot both be NULL.

• Two modes of operation: transport mode, in which the IP payload is encrypted and the original headers are left intact, and tunnel mode, in which the entire packet is encrypted and becomes the payload of a new IP packet with a new IP header.
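To illustrate the ESP format and the role of the SPI, here is a minimal sketch of ESP with NULL encryption and authentication only. The field order (SPI, sequence number, payload, padding, pad length, next header, authentication data) and the default padding bytes follow our reading of the ESP RFC. HMAC-SHA1 truncated to 96 bits is assumed as the authentication algorithm. The "security association database" is reduced to a hypothetical dictionary from SPI to key, and the anti-replay sequence number check is omitted.

```python
import hashlib
import hmac
import struct
from typing import Dict, Optional

NEXT_HEADER_UDP = 17   # IP protocol number of the protected payload
ICV_LEN = 12           # HMAC-SHA1 truncated to 96 bits

def esp_encapsulate(spi: int, seq: int, payload: bytes, auth_key: bytes,
                    next_header: int = NEXT_HEADER_UDP) -> bytes:
    """Build an ESP packet with NULL encryption and HMAC-SHA1-96 authentication."""
    pad_len = -(len(payload) + 2) % 4            # trailer must end on a 4-byte boundary
    padding = bytes(range(1, pad_len + 1))       # default padding bytes 1, 2, 3, ...
    body = payload + padding + bytes([pad_len, next_header])  # NULL cipher: sent as-is
    header = struct.pack("!II", spi, seq)        # SPI identifies the security association
    icv = hmac.new(auth_key, header + body, hashlib.sha1).digest()[:ICV_LEN]
    return header + body + icv

def esp_decapsulate(packet: bytes, sad: Dict[int, bytes]) -> Optional[bytes]:
    """Find the SA by SPI, check the ICV, then strip header, trailer and ICV."""
    spi, _seq = struct.unpack("!II", packet[:8])
    auth_key = sad.get(spi)                      # toy SAD: SPI -> authentication key
    if auth_key is None:
        return None                              # no such security association: discard
    signed, icv = packet[:-ICV_LEN], packet[-ICV_LEN:]
    expected = hmac.new(auth_key, signed, hashlib.sha1).digest()[:ICV_LEN]
    if not hmac.compare_digest(expected, icv):
        return None                              # authentication failed: discard
    pad_len = signed[-2]
    return signed[8:len(signed) - 2 - pad_len]

sad = {0x1001: b"k" * 20}                        # hypothetical SA table
pkt = esp_encapsulate(0x1001, 1, b"RTP!", sad[0x1001])
print(len(pkt), esp_decapsulate(pkt, sad))       # 28 b'RTP!'
```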

The general advantage is that security associations can be negotiated and managed ‘below’ the application layer, that is, transparently to end users and applications. This is in contrast to TLS/SSL, which is associated with a given application such as a browser. The associated disadvantage is that trust is extended to a device rather than a session or application. This opens a vulnerability to a ‘rogue’ process which might be running on a trusted host. Other features might be summarized as follows:

Advantages:

• One-stop solution to many security concerns: IPSec can be used to secure the IP layer, thus securing both the signaling and the media stream at once.

• The authentication (e.g. AH) and privacy/encryption (e.g. ESP) services can be used separately or in combination.

• Enforcing authentication might mitigate some DoS attacks, specifically those which originate from an untrusted host, since the unauthenticated packets would be discarded. Of course, rogue processes (e.g. worms) on a trusted host would still get through due to the ‘trusted device’ model of IPSec. It is also likely that the volume of traffic generated within a DoS attack might overwhelm the processing capabilities of the phone regardless of authentication. Nor can IPSec screen traffic which operates below the IP/IPSec layer (layer two). ICMP (ping) and similar requests, although carried in IP, are screened only if the security policy explicitly covers them.

• Typically uses Internet Key Exchange (IKE) to negotiate security associations, but could use Kerberos or some simpler key management scheme.

Disadvantages:

• The IPSec suite is quite complicated, and its interactions may be difficult to implement and maintain properly in a fresh domain (IP telephony). For example, the standard calls for a Security Policy Database and a Security Association Database. Both will require additional infrastructure if the site installation doesn't already use IPSec;

• The security association scheme of IPSec basically associates its ‘trust’ with the device/IP address, that is, it has no sense of session or application. This is a disadvantage in that a particular device may have a number of sessions or applications running, some of which may be trusted and some of which may not. IPSec has no way of differentiating between them;

• The resources (processing/memory) required to negotiate and maintain a security association may be taxing on the phone architecture and may only be feasible on a ‘higher end’ phone;

• A centralized element in a telephony network topology (a call-manager/MGC in a stimulus architecture or a gate-keeper in a functional architecture) could be heavily burdened in maintaining a large number of security associations. This could be catastrophic in a system initialization or fail-over scenario, when a large number of endpoints could be trying to negotiate security associations simultaneously;

• Depending on where the security associations are maintained, end-to-end traffic may traverse several security associations, each with its own decrypt/encrypt pair, which may increase communication latency;

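• IPSec does not interoperate well with NAT. AH's crypto-checksum covers the IP addresses, so a NAT address substitution invalidates it. With ESP, the TCP/UDP headers (including the port numbers a port-translating NAT must rewrite) are encrypted, so NAT cannot perform its required substitutions. In tunnel mode the original source and destination IP addresses are encrypted as well.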

5.4. Kerberos

Kerberos is a system to perform centralized authentication services and privacy mainly in closed domains such as local area networks. It is designed around an authentication server which is used to authenticate both users and services. To use a service on the network, a user must have an encrypted ‘ticket’ which is presented to the server authorizing the user.

The basic scheme is that the user identifies himself to the authentication server (AS) using a shared secret (a password); the scheme uses the password as a (symmetric) key for encrypting the exchange so that the password is never explicitly transmitted over the network. Once the user has been authenticated, he is granted a ticket-granting ticket (TGT) which he can present to a ticket-granting service (TGS) to get tickets for all other services; hence the user need be authenticated only once to make use of the services available. Although the AS and TGS are distinct entities, they are often located on the same machine and referred to together as the Key Distribution Centre (KDC). Security is maintained through a combination of session keys (used for symmetric cryptography so that no information is visible in plaintext on the network), expiration times based on time stamps, and a used ticket list maintained by each server to prevent replays.
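The server-side checks described above (expiration times and a list used to reject replays) can be sketched as follows. This is a minimal sketch under stated assumptions. Ticket decryption is not modelled, and the class and field names are hypothetical. The 5-minute clock skew is an assumed value. Following Kerberos V5 practice as we understand it, the replay cache records client authenticators (client, timestamp) rather than whole tickets.

```python
import time
from dataclasses import dataclass
from typing import Optional, Set, Tuple

@dataclass(frozen=True)
class Ticket:
    """Ticket fields as seen by a service after decrypting it (decryption not modeled)."""
    client: str
    service: str
    session_key: bytes
    end_time: float            # expiry time, seconds since the epoch

@dataclass(frozen=True)
class Authenticator:
    """Fresh proof from the client, normally encrypted under the session key."""
    client: str
    timestamp: float

class KerberizedService:
    """Hypothetical service-side ticket check with a replay cache."""
    def __init__(self, name: str, max_skew: float = 300.0) -> None:
        self.name = name
        self.max_skew = max_skew                   # tolerated clock skew (assumed 5 min)
        self.replay_cache: Set[Tuple[str, float]] = set()

    def accept(self, ticket: Ticket, auth: Authenticator,
               now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now  # the local clock decides validity
        if ticket.service != self.name or ticket.client != auth.client:
            return False
        if now > ticket.end_time:                  # expired ticket
            return False
        if abs(now - auth.timestamp) > self.max_skew:
            return False                           # stale or future-dated authenticator
        if (auth.client, auth.timestamp) in self.replay_cache:
            return False                           # replayed authenticator
        self.replay_cache.add((auth.client, auth.timestamp))
        return True

svc = KerberizedService("sip/callserver")
t = Ticket("alice", "sip/callserver", session_key=b"k", end_time=1000.0)
print(svc.accept(t, Authenticator("alice", 900.0), now=900.0))    # True
print(svc.accept(t, Authenticator("alice", 900.0), now=900.0))    # False (replay)
print(svc.accept(t, Authenticator("alice", 1100.0), now=1100.0))  # False (expired)
```

Because `now` comes from the service's own clock, an attacker who can set that clock back could make the last call succeed (the SNTP risk listed below). A production replay cache would also purge entries older than the skew window.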

Advantages:

• Could simplify security management for phone deployment; seems to be lighter weight than any of the other popular keying schemes such as IKE.

• Source code is freely available from MIT with little restriction (an acknowledgement in any source code or documentation), with sample code for ‘Kerberizing’ individual applications.

Disadvantages/Risks:

• Kerberos has had some security problems in recent times. For example, ticket expiration is vulnerable to SNTP insecurity: an attacker could reset the time to ‘revalidate’ an expired ticket. In addition, the centralized password facility has no ‘second’ line of defense: compromising the password database gives full access to all services and requires changing all passwords.

• Individual applications need to be ‘Kerberized’, or adapted to participate properly in the Kerberos ticket scheme. There are Kerberized versions of some of the necessary application layer services (e.g. Telnet, FTP), but there's no assurance that these will be compatible with our phone requirements. Other application layer services (e.g. signaling) would have to be Kerberized individually.

• Large deployments of Kerberos applications and services don't scale well. To alleviate this drawback, the concept of realms has been introduced (in Kerberos version 5), which complicates the basic Kerberos model somewhat. Realms are organized hierarchically (similar to Certification Authorities, see below), and authentication between ‘peer’ realms is performed up to a common parent. Pairwise relations between ‘peer’ realms (known as shortcuts) are encouraged and, in practice, are used more than hierarchical authentication.

5.5. Certificates

The use of certificates is a mechanism which allows users to authenticate (and provides a framework for public key/asymmetric encryption) across networks. It requires the existence of a trusted Certification Authority (CA) whose public key is used to verify the certificate's signature (made with the CA's private key), thus verifying the validity of the certificate. The contents of the certificate can then provide details like public keys, algorithms, parameters, validity dates etc. (see [Schneier2] Section 24.9 for details).

The complication arises when the two users are certified by two different CAs. The certification process involves traversing the certification hierarchy to find a common CA (which is therefore trusted by both parties). The CA keys can then be applied to each user's chain of ancestors (downward from the common CA) to verify each user's certificate. I am assuming that this process is probably more trouble than it's worth compared to Diffie-Hellman or some other key negotiation scheme, although it is more robust with respect to man-in-the-middle attacks. This could probably be radically simplified with a universally recognized Certification Authority which specialized in phone certificates.
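The hierarchy traversal just described can be sketched as a search for the nearest common ancestor, followed by the downward path whose keys are applied. This is a minimal sketch. The CA names and hierarchy are hypothetical, the hierarchy is assumed to be a tree, and the actual signature checks at each step are not shown.

```python
from typing import Dict, List, Optional

Hierarchy = Dict[str, Optional[str]]   # CA name -> issuing (parent) CA; root maps to None

def chain_to_root(ca: str, parent: Hierarchy) -> List[str]:
    """[ca, parent(ca), ..., root]; assumes the hierarchy is a tree (no cycles)."""
    chain = [ca]
    nxt = parent.get(ca)
    while nxt is not None:
        chain.append(nxt)
        nxt = parent.get(nxt)
    return chain

def common_ca(issuer_a: str, issuer_b: str, parent: Hierarchy) -> Optional[str]:
    """Nearest CA above both users' issuers, i.e. one trusted by both parties."""
    above_a = set(chain_to_root(issuer_a, parent))
    for ca in chain_to_root(issuer_b, parent):
        if ca in above_a:
            return ca
    return None

def verification_path(common: str, issuer: str, parent: Hierarchy) -> List[str]:
    """CAs whose keys are applied, from the common CA down to the user's issuer."""
    chain = chain_to_root(issuer, parent)
    return chain[: chain.index(common) + 1][::-1]

# Hypothetical hierarchy for illustration only.
tree: Hierarchy = {"Root": None, "TelcoCA": "Root", "VendorCA": "Root",
                   "EastCA": "TelcoCA", "WestCA": "TelcoCA"}
print(common_ca("EastCA", "WestCA", tree))            # TelcoCA
print(common_ca("EastCA", "VendorCA", tree))          # Root
print(verification_path("TelcoCA", "EastCA", tree))   # ['TelcoCA', 'EastCA']
```

A single phone-specific root CA, as suggested above, would collapse every lookup to one step.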


6.0 VoIP Security Considerations and Possible Strategies

As outlined above, there are numerous possibilities for security compromises, and it seems unlikely that any single tack will protect against all (or even most) vulnerabilities. The additional constraint in the VoIP domain is to ensure low resource utilization (both in terms of memory/filesystem footprint and computation) due to the performance requirements of a real-time embedded device.

6.1. Security Strategies

To discuss and demonstrate strategies for handling denial of service, authentication, encryption/privacy and key management, we examine their associated requirements separately.

6.1.1. Denial of Service

It is not clear how to defend against denial of service attacks, particularly on a phone platform with (comparatively) limited processing capabilities. Attempts to isolate voice network segments from data network segments using combinations of hardware (firewalls) and software (VLANs etc.) are a good precautionary measure. The best that might be expected from a phone exposed to a DoS attack is that the phone somehow detects that an attack is underway and removes itself from service for a random period of time. Care should also be taken to ensure that the phone cannot be hijacked through a vulnerability in its applications or services and then itself be used as a platform for launching a denial of service attack.
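One way to realize "detect an attack and remove itself from service for a random period" is a sliding-window packet-rate check with a randomized back-off. The sketch below is hypothetical throughout: the class name and every threshold (packets per window, window length, 5 to 30 second back-off) are assumed values, not recommendations. Randomizing the back-off keeps many phones hit by the same attack from returning to service at the same instant.

```python
import random
from collections import deque
from typing import Deque, Optional

class FloodGuard:
    """Hypothetical rate-based DoS detector for a phone's packet input path."""
    def __init__(self, max_packets: int = 500, window: float = 1.0,
                 min_off: float = 5.0, max_off: float = 30.0,
                 rng: Optional[random.Random] = None) -> None:
        self.max_packets = max_packets        # assumed threshold per window
        self.window = window                  # seconds
        self.min_off, self.max_off = min_off, max_off
        self.rng = rng or random.Random()
        self.arrivals: Deque[float] = deque()
        self.out_of_service_until = 0.0

    def on_packet(self, now: float) -> bool:
        """Return True if the packet should be processed, False if dropped."""
        if now < self.out_of_service_until:
            return False                      # still out of service
        self.arrivals.append(now)
        while self.arrivals and self.arrivals[0] <= now - self.window:
            self.arrivals.popleft()           # forget arrivals outside the window
        if len(self.arrivals) > self.max_packets:
            # Attack suspected: leave service for a random period, then start fresh.
            self.out_of_service_until = now + self.rng.uniform(self.min_off, self.max_off)
            self.arrivals.clear()
            return False
        return True

guard = FloodGuard(max_packets=3, window=1.0, rng=random.Random(7))
print([guard.on_packet(t) for t in (0.0, 0.1, 0.2, 0.3, 0.4)])
# [True, True, True, False, False]: the 4th packet within one second trips the guard
```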

6.1.2. Authentication

Authentication can take several forms, in order of increasing trustworthiness:

• No authentication

• Endpoint – only endpoint device identity has been verified

• Sessional authentication – endpoint and user identity have been verified.

These relate to increasing levels of ‘trust’: no trust, device trust, and user trust.

Within a VoIP session, there are (at least) several authentication and related trust issues:

• Server/signaling needs to be authenticated (i.e. is this signal coming from the right server?)

• Signaling contents need to be authenticated/verified (i.e. is this signaling what they sent and how they intended to send it?)

• RTP source needs to be authenticated (i.e. is this media coming from the right sender?)

• RTP contents need to be authenticated/verified (i.e. is this media what they sent and how they intended to send it?)

6.1.3. Encryption/Privacy

Encryption can take several forms:

• No encryption [current for most platforms]

• Media encryption [privacy of communication]

• Signaling encryption [privacy of calls/connections]

• Total encryption [complete privacy]


6.1.4. Encryption Approaches

The issues for using TLS for encryption can be summarized as follows:

• Requires TCP (not UDP), not appropriate for RTP media and UDP based signaling protocols

• Uses its own pairwise negotiated keys, no external management scheme is required

• Provides privacy application by application, session by session

• Allows different trust levels for different services and different applications, on a session by session basis.

Although TLS does provide ‘sessional authentication’, its reliance on TCP makes it unsuitable for media and many signaling protocols.

The issues for using IPSec for encryption can be summarized as follows:

• Requires separate key management facility (Kerberos or IKE etc.)

• Low layer approach (AH and ESP headers on IP packets), optional security extension to IPv4, standard part of IPv6

• Transparent to application

• All applications and services operate at the same level of trust. This is dangerous on a PC/softphone, where a malicious application (e.g. a virus) or any application that runs other code (e.g. VB macros or Java applets) is trusted as much as the telephony software

Although IPSec does not provide ‘sessional authentication’, its transparency and independence of signaling protocol and its inclusion in the IPv6 standard make it a strong contender for security adoption.

The issues for using SRTP for encryption can be summarized as follows:

• Comparatively simple implementation.

• Design optimized for securing RTP – limited packet expansion, robust in the face of lost packets etc.

• Does not address any signaling security issues, requires separate mechanism for all other communications.

• Requires separate key management mechanism such as IKE, ISAKMP/Oakley or Kerberos to manage keys.

In terms of examining the difficulty of incorporating security mechanisms into existing platforms, the following implementation strategies for IPSec/SRTP have been proposed:

• Native stack implementation – this implies access to the IP stack source code or a complete protocol stack which includes the desired facilities (IPSec and/or SRTP).

• Bump in the Stack (BITS) implementation, which does not require access to the protocol stack but inserts a code layer between the protocol stack and the drivers which does the necessary translation (intercepting and translating RTP in the case of SRTP, or intercepting and incorporating the appropriate AH/ESP headers in the case of IPSec).

• Bump in the Wire (BITW) implementation, which uses specialized hardware to perform the translation equivalent to the BITS strategy. This is generally discussed with respect to IPSec and not SRTP.
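The BITS strategy can be pictured as a shim that the unmodified stack calls in place of the driver. The shim transforms only the traffic it is responsible for and passes everything else through. The sketch below shows the SRTP case. `Packet`, `BumpInTheStack`, the 16384–32767 RTP port range and the placeholder transform are all hypothetical. A real shim hooks a platform-specific driver interface and would also fix UDP lengths and checksums when a transform changes the payload size.

```python
from dataclasses import dataclass, replace
from typing import Callable, List

@dataclass(frozen=True)
class Packet:
    """Hypothetical outgoing packet as handed from the IP stack to the driver."""
    proto: str
    dst_port: int
    payload: bytes

class BumpInTheStack:
    """Shim installed between an unmodified protocol stack and the network driver."""
    def __init__(self, driver_send: Callable[[Packet], None],
                 protect: Callable[[bytes], bytes],
                 rtp_ports: range = range(16384, 32768)) -> None:
        self.driver_send = driver_send   # driver routine the stack used to call directly
        self.protect = protect           # e.g. an SRTP transform, supplied by the caller
        self.rtp_ports = rtp_ports       # assumed RTP port range

    def send(self, pkt: Packet) -> None:
        """Called by the stack in place of the driver: transform RTP, pass the rest."""
        if pkt.proto == "udp" and pkt.dst_port in self.rtp_ports:
            pkt = replace(pkt, payload=self.protect(pkt.payload))
        self.driver_send(pkt)

# Usage with a stand-in driver and a placeholder (NOT secure) transform.
sent: List[Packet] = []
shim = BumpInTheStack(sent.append, protect=lambda b: b[::-1])
shim.send(Packet("udp", 16400, b"voice"))   # RTP: transformed
shim.send(Packet("udp", 5060, b"INVITE"))   # signaling: untouched
print([p.payload for p in sent])            # [b'eciov', b'INVITE']
```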


6.1.5. Key Management

For various forms of both authentication and encryption/privacy, some form of key management strategy will be necessary. This may be achieved through peer-to-peer negotiation such as IKE, ISAKMP/Oakley (which incorporates exchange algorithms such as Diffie-Hellman or RSA) or some centralized management scheme such as Kerberos.
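As an illustration of the peer-to-peer option, here is a toy Diffie-Hellman exchange. Each side publishes `G**x mod P` and combines the other side's public value with its own private exponent, arriving at the same secret without ever transmitting it. The parameters are a deliberately small, unvetted toy group (the Mersenne prime 2**127 − 1 with base 3). Real IKE/ISAKMP/Oakley deployments use standardized, much larger groups. As noted in Section 5.5, an unauthenticated exchange like this is exposed to man-in-the-middle attacks. A central key management facility never sees the resulting key, which is the complication raised in Section 6.2.

```python
import secrets
from typing import Tuple

# Toy group for illustration only: far too small and not a vetted group.
P = 2**127 - 1
G = 3

def keypair() -> Tuple[int, int]:
    """Return (private exponent, public value G**x mod P)."""
    private = secrets.randbelow(P - 3) + 2      # 2 <= private <= P - 2
    return private, pow(G, private, P)

def shared_secret(my_private: int, their_public: int) -> int:
    """Both sides compute G**(a*b) mod P without ever sending it."""
    return pow(their_public, my_private, P)

phone_priv, phone_pub = keypair()     # phone_pub is sent to the peer in the clear
peer_priv, peer_pub = keypair()       # peer_pub is sent to the phone in the clear
print(shared_secret(phone_priv, peer_pub) == shared_secret(peer_priv, phone_pub))  # True
```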

6.2. Overall Security Levels

Various combinations of authentication and encryption mechanisms can be used to provide varying levels of security in the phone. Following are levels in order of increasing security and combinations of mechanisms which could provide that level of security:

1.0 No authentication, no encryption [Current]

2.0 Authentication, no encryption [IPSec AH]

3.0 Authentication, media encryption (no signaling encryption) [IPSec AH + SRTP]

4.0 Authentication, full encryption (media and signaling) [IPSec AH + ESP, or IPSec AH + ESP + SRTP, or IPSec ESP (with authentication) + SRTP]

Incidentally, the US Communications Assistance for Law Enforcement Act (CALEA) is likely to require access at various levels which may correspond to these. For example, when law enforcement agencies subpoena call records (3.0) or obtain a full wiretap (4.0), the key management facility may provide the necessary session keys to law enforcement personnel. This would give service providers a mechanism for compliance under subpoena. It may be complicated somewhat by the use of keys which have been negotiated peer-to-peer, such as with Diffie-Hellman.

In addition to this, it may be useful to define additional mechanisms (e.g. TLS) for securing individual sessions which are required for the phone such as provisioning the phone securely using FTP or setting up diagnostic or regression test sessions. Note that TFTP uses UDP as a transport layer and hence provisioning using TFTP cannot use TLS as a security mechanism.


7.0 References

[Bell1] Authentication, Authorization and Privacy, B. Bell, submitted to TIA 41.4 committee IP telephony Infrastructure, Ottawa, Canada, August 2001.

[Bell2] VoIP Telephone Network Security Architectural Considerations, B. Bell, submitted to TIA 41.4 committee IP telephony Infrastructure, Greensboro NC, November 2001.

[Burian] Burian, Geoff, personal communication, November 2001.

[CERT] Allen, Julia H., “The CERT Guide to System and Network Security Practices”, Addison-Wesley 2001.

[CiscoIPSec] Cisco Reference Guide: Deploying IPSec, http://www.cisco.com/warp/public/cc/so/neso/sqso/eqso/dplip_in.htm

[DNSSec] DNS Security Document Roadmap, http://www.ietf.org/internet-drafts/draft-ietf-dnsext-dnssec-roadmap-04.txt

[FCP] Firewall Communication Protocol, http://iptel.org/fcp/

[JOLT] Jolt2 - a new Windows DoS attack; http://www.securiteam.com/exploits/Jolt2_-_a_new_Windows_DoS_attack.html

[MGCP] Media Gateway Control Protocol (MGCP), http://www.ietf.org/rfc/rfc2705.txt?number=2705

[MidCom] IETF MidCom Working Group Charter with further pointers to drafts, http://www.ietf.org/html.charters/midcom-charter.html

[OpenH323] Open H.323 Gateway GUI,

[SRTP] The Secure Real Time Transport Protocol, Baugher et al, http://www.ietf.org/internet-drafts/draft-ietf-avt-srtp-02.txt

[Schneier1] Schneier, Bruce, “Phone Hacking: The Next Generation”, http://www.counterpane.com/crypto-gram-0107.html#1

[Schneier2] Schneier, Bruce, “Applied Cryptography 2nd Edition”, Wiley 1996.

[TLS] The TLS Protocol Version 1.0, http://www.ietf.org/rfc/rfc2246.txt?number=2246

[VOMIT] Voice over Misconfigured Internet Telephone, http://vomit.xtdnet.nl/
