Theory

What is IPsec?

IPsec is an extension to the IP protocol which provides security to IP and the upper-layer protocols. It was first developed for the new IPv6 standard and then "backported" to IPv4. The IPsec architecture is described in RFC 2401. The following few paragraphs will give you a short introduction to IPsec.

IPsec uses two different protocols - AH and ESP - to ensure the authentication, integrity and confidentiality of the communication. It can protect either the entire IP datagram or only the upper-layer protocols. The corresponding modes are called tunnel mode and transport mode. In tunnel mode the IP datagram is fully encapsulated by a new IP datagram using the IPsec protocol. In transport mode only the payload of the IP datagram is handled by the IPsec protocol, which inserts the IPsec header between the IP header and the upper-layer protocol header (see Figure 1).

Figure 1. IPsec tunnel and transport mode

To protect the integrity of the IP datagrams the IPsec protocols use hash-based message authentication codes (HMAC). To derive this HMAC the IPsec protocols use hash algorithms like MD5 and SHA to calculate a hash based on a secret key and the contents of the IP datagram. This HMAC is then included in the IPsec protocol header, and the receiver of the packet can check the HMAC if it has access to the secret key.
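The HMAC calculation can be sketched with Python's standard library. This is only an illustration: the key and the datagram bytes are made-up placeholders, HMAC-SHA1 is picked as one possible algorithm, and the truncation to 96 bits is assumed to match the 96 bit HMAC field described later in this chapter.

```python
import hashlib
import hmac

def compute_hmac(key: bytes, data: bytes) -> bytes:
    """Return HMAC-SHA1 over data, truncated to 96 bits (12 bytes)."""
    return hmac.new(key, data, hashlib.sha1).digest()[:12]

def check_hmac(key: bytes, data: bytes, received: bytes) -> bool:
    """Recompute the HMAC and compare it in constant time."""
    return hmac.compare_digest(compute_hmac(key, data), received)

# Hypothetical values, for illustration only.
secret_key = b"shared-secret-key"
datagram = b"contents of the IP datagram"

tag = compute_hmac(secret_key, datagram)
print(check_hmac(secret_key, datagram, tag))            # receiver with the right key
print(check_hmac(secret_key, datagram + b"!", tag))     # modified datagram
print(check_hmac(b"some-other-key", datagram, tag))     # receiver without the key
```

Only a peer holding the same secret key can produce or verify the tag, so a modified datagram or a wrong key makes the check fail.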

To protect the confidentiality of the IP datagrams the IPsec protocols use standard symmetric encryption algorithms. The IPsec standard requires the implementation of NULL and DES. Today stronger algorithms like 3DES, AES and Blowfish are usually used.

To protect against denial of service attacks based on replayed packets the IPsec protocols use a sliding window. Each packet gets assigned a sequence number and is only accepted if the packet's number is within the window or newer. Older packets are immediately discarded. This protects against replay attacks where the attacker records the original packets and replays them later.
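The sliding window can be sketched as follows. This is a minimal model, not the code of any particular IPsec implementation: the class name and the window size are assumptions, and the bitmap used to reject duplicates inside the window is one common way to do it.

```python
class ReplayWindow:
    """Minimal anti-replay sliding window (illustrative sketch)."""

    def __init__(self, size: int = 32):
        self.size = size
        self.highest = 0   # highest sequence number accepted so far
        self.bitmap = 0    # bit i set => sequence number (highest - i) was seen

    def accept(self, seq: int) -> bool:
        if seq == 0:
            return False                      # sequence numbers start at 1 here
        if seq > self.highest:
            # Newer than anything seen: slide the window forward.
            shift = seq - self.highest
            mask = (1 << self.size) - 1
            self.bitmap = ((self.bitmap << shift) | 1) & mask
            self.highest = seq
            return True
        offset = self.highest - seq
        if offset >= self.size:
            return False                      # older than the window: discard
        if self.bitmap & (1 << offset):
            return False                      # already seen: replayed packet
        self.bitmap |= 1 << offset            # inside the window and new
        return True

window = ReplayWindow(size=4)
for seq in (1, 3, 2, 2, 10, 6, 7):
    print(seq, window.accept(seq))
```

In this run the second packet with number 2 is rejected as a replay, and packet 6 is rejected because it has fallen out of the four-packet window once packet 10 has arrived.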

For the peers to be able to encapsulate and decapsulate the IPsec packets they need a way to store the secret keys, algorithms and IP addresses involved in the communication. All these parameters needed for the protection of the IP datagrams are stored in a security association (SA). The security associations are in turn stored in a security association database (SAD).

Each security association defines the following parameters:

Some implementations of the security association database allow further parameters to be stored:

Since the security association defines the source and destination IP addresses, it can only protect one direction of the traffic in a full duplex IPsec communication. To protect both directions IPsec requires two unidirectional security associations.
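A security association database can be pictured as a simple lookup table. The sketch below is an assumption-laden model: the field names, the example addresses, SPIs and keys are invented for illustration, and indexing the entries by destination address, protocol and SPI is a design choice of this sketch rather than a description of any specific implementation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SecurityAssociation:
    # Hypothetical field names; they mirror the parameters named in the text.
    src: str          # source IP address
    dst: str          # destination IP address
    protocol: str     # "ah" or "esp"
    spi: int          # Security Parameter Index
    algorithm: str    # algorithm used to protect the traffic
    key: bytes        # secret key

class SecurityAssociationDatabase:
    def __init__(self):
        self._entries = {}

    def add(self, sa: SecurityAssociation) -> None:
        self._entries[(sa.dst, sa.protocol, sa.spi)] = sa

    def lookup(self, dst: str, protocol: str, spi: int):
        """Return the matching SA, or None if there is no such entry."""
        return self._entries.get((dst, protocol, spi))

sad = SecurityAssociationDatabase()
# One SA per direction: two entries protect a full duplex connection.
sad.add(SecurityAssociation("192.0.2.1", "198.51.100.1", "esp", 0x1001, "aes", b"k" * 16))
sad.add(SecurityAssociation("198.51.100.1", "192.0.2.1", "esp", 0x2001, "aes", b"j" * 16))

print(sad.lookup("198.51.100.1", "esp", 0x1001))
print(sad.lookup("192.0.2.1", "esp", 0x1001))   # no SA for this direction and SPI
```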

The security associations only specify how IPsec is supposed to protect the traffic. Additional information is needed to define which traffic to protect when. This information is stored in the security policy (SP) which in turn is stored in the security policy database (SPD).

A security policy usually specifies the following parameters:

The manual setup of the security association is quite error-prone and not very secure. The secret keys and encryption algorithms must be shared between all peers in the virtual private network. The exchange of the keys in particular poses a critical problem for the system administrator: How to exchange secret symmetric keys when no encryption is yet in place?

To solve this problem the Internet Key Exchange protocol (IKE) was developed. This protocol authenticates the peers in the first phase. In the second phase the security associations are negotiated and the secret symmetric keys are chosen using a Diffie-Hellman key exchange. The IKE protocol then even takes care of periodically rekeying the secret keys to ensure their confidentiality.
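The idea behind the Diffie-Hellman exchange can be shown with a few lines of modular arithmetic. This is a toy sketch only: the modulus and generator below are assumptions chosen for readability and are not suitable for real use, and the sketch leaves out the authentication that IKE adds around the exchange.

```python
import secrets

def dh_public(private: int, g: int, p: int) -> int:
    """Public value sent to the peer: g^private mod p."""
    return pow(g, private, p)

def dh_shared(peer_public: int, private: int, p: int) -> int:
    """Shared secret: peer_public^private mod p."""
    return pow(peer_public, private, p)

# Toy parameters (assumption): a Mersenne prime and a small generator.
P = 2**127 - 1
G = 3

# Each peer picks a private value and never sends it over the wire.
a = secrets.randbelow(P - 2) + 1
b = secrets.randbelow(P - 2) + 1

# Only the public values are exchanged.
A = dh_public(a, G, P)
B = dh_public(b, G, P)

# Both peers arrive at the same secret without ever transmitting it.
print(dh_shared(B, a, P) == dh_shared(A, b, P))
```

An eavesdropper sees only the two public values, which is why the exchange can take place before any encryption is set up.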

IPsec Protocols

The IPsec protocol family consists of two protocols: Authentication Header (AH) and Encapsulating Security Payload (ESP). Both are independent IP protocols. AH is IP protocol 51 and ESP is IP protocol 50 (see /etc/protocols). The following two sections will briefly cover their properties.

AH - Authentication Header

The AH protocol protects the integrity of the IP datagram. To achieve this, the AH protocol calculates an HMAC based on the secret key, the payload of the packet and the immutable parts of the IP header like the IP addresses. It then adds the AH header to the packet. The AH header is shown in Figure 2.

Figure 2. The AH header protects the integrity of the packet

The AH header is 24 bytes long. The first byte is the Next Header field. This field specifies the protocol of the following header. In tunnel mode a complete IP datagram is encapsulated; therefore the value of this field is 4. When encapsulating a TCP datagram in transport mode the corresponding value is 6. The next byte is the Payload Length field, which despite its name gives the length of the AH header itself (in 32-bit words, minus 2). This field is followed by two reserved bytes. The next double word specifies the 32-bit Security Parameter Index (SPI). The SPI specifies the security association to use for the decapsulation of the packet. The 32-bit Sequence Number protects against replay attacks. Finally, the last 96 bits hold the hash message authentication code (HMAC). This HMAC protects the integrity of the packets since only the peers knowing the secret key can create and check the HMAC.
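The field layout just described can be unpacked with Python's struct module. This is a sketch under assumptions: the function and dictionary key names are invented, the sample header is made up, and the value 4 in the payload length byte assumes the RFC 2402 convention of counting the header in 32-bit words minus 2.

```python
import struct

# Next Header (1 byte), Payload Length (1 byte), Reserved (2 bytes),
# SPI (4 bytes), Sequence Number (4 bytes), all in network byte order.
AH_FIXED = struct.Struct("!BBHII")
AH_LENGTH = 24   # fixed part (12 bytes) plus 96 bit HMAC (12 bytes)

def parse_ah(header: bytes) -> dict:
    """Split a 24 byte AH header into its fields."""
    if len(header) < AH_LENGTH:
        raise ValueError("AH header too short")
    next_header, payload_len, _reserved, spi, seq = AH_FIXED.unpack_from(header)
    return {
        "next_header": next_header,
        "payload_len": payload_len,
        "spi": spi,
        "sequence_number": seq,
        "hmac": header[AH_FIXED.size:AH_LENGTH],
    }

# Made-up sample: tunnel mode (next header 4), SPI 0x1001, sequence number 1,
# and an all-zero placeholder where the HMAC would be.
sample = AH_FIXED.pack(4, 4, 0, 0x1001, 1) + bytes(12)
fields = parse_ah(sample)
print(fields["next_header"], hex(fields["spi"]), fields["sequence_number"])
```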

Since the AH protocol protects the IP datagram including immutable parts of the IP header like the IP addresses, the AH protocol does not allow NAT. Network address translation (NAT) replaces an IP address in the IP header (usually the source IP) by a different IP address. After this replacement the HMAC is not valid anymore. The NAT-Traversal extension of the IPsec protocol implements ways around this restriction.

ESP - Encapsulating Security Payload

The ESP protocol can both ensure the integrity of the packet using a HMAC and the confidentiality using encryption. After encrypting the packet and calculating the HMAC the ESP header is generated and added to the packet. The ESP header consists of two parts and is shown in Figure 3.

Figure 3. The ESP header

The first doubleword in the ESP header specifies the Security Parameter Index (SPI). This SPI specifies the SA to use for the decapsulation of the ESP packet. The second doubleword holds the Sequence Number. This sequence number is used to protect against replay attacks. The third doubleword specifies the Initialization Vector (IV) which is used in the encryption process. Symmetric encryption algorithms are susceptible to a frequency attack if no IV is used. The IV ensures that two identical payloads lead to different encrypted payloads.

IPsec uses block ciphers for the encryption process. Therefore the payload may need to be padded if the length of the payload is not a multiple of the block length. The length of the pad is then added. Following the pad length, the 1 byte long Next Header field specifies the next header. Lastly the 96 bit long HMAC is added to the ESP header, ensuring the integrity of the packet. This HMAC only takes the payload of the packet into account. The IP header is not included in the calculation process.
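The padding step can be sketched like this. The sketch assumes the RFC 2406 trailer layout, in which the pad length and the Next Header field are one byte each and are encrypted together with the payload, so payload, padding and these two bytes together must fill whole cipher blocks. The function names and the 8 byte block size are assumptions for illustration.

```python
from typing import Tuple

def esp_pad(payload: bytes, next_header: int, block_size: int = 8) -> bytes:
    """Append padding, pad length and next header to fill whole blocks."""
    # Two extra bytes (pad length, next header) follow the padding.
    pad_len = -(len(payload) + 2) % block_size
    padding = bytes(range(1, pad_len + 1))      # pad bytes 1, 2, 3, ...
    return payload + padding + bytes([pad_len, next_header])

def esp_unpad(data: bytes) -> Tuple[bytes, int]:
    """Strip the trailer again and return (payload, next header)."""
    pad_len, next_header = data[-2], data[-1]
    return data[: len(data) - 2 - pad_len], next_header

padded = esp_pad(b"hello", next_header=6)       # 6 = TCP
print(len(padded))                              # a multiple of the block size
print(esp_unpad(padded))
```

The receiver reads the pad length from the end of the decrypted data, which is why the pad length has to be stored at all.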

The usage of NAT therefore does not break the ESP protocol. Still, in most cases NAT is not possible in combination with IPsec. NAT-Traversal offers a solution in this case by encapsulating the ESP packets within UDP packets.

IKE Protocol

The IKE protocol solves the most prominent problem in the setup of secure communication: the authentication of the peers and the exchange of the symmetric keys. It then creates the security associations and populates the SAD. The IKE protocol usually requires a user space daemon and is not implemented in the operating system. The IKE protocol uses 500/udp for its communication.

The IKE protocol functions in two phases. The first phase establishes an Internet Security Association and Key Management Protocol Security Association (ISAKMP SA). In the second phase the ISAKMP SA is used to negotiate and set up the IPsec SAs.

The authentication of the peers in the first phase can usually be based on pre-shared keys (PSK), RSA keys and X.509 certificates (racoon even supports Kerberos).

The first phase usually supports two different modes: main mode and aggressive mode. Both modes authenticate the peer and set up an ISAKMP SA, but the aggressive mode uses only half the number of messages to achieve this goal. This has its drawbacks, though: the aggressive mode does not support identity protection and transfers the identity of the client in the clear, so it is susceptible to a man-in-the-middle attack if used in conjunction with pre-shared keys. On the other hand, this lack of identity protection is the only reason to use the aggressive mode. Because of its internal workings, the main mode does not support the usage of different pre-shared keys with unknown peers. In aggressive mode the peers know each other's identity before the authentication takes place, so different pre-shared keys can be used for different peers.

In the second phase the IKE protocol exchanges security association proposals and negotiates the security associations based on the ISAKMP SA. The ISAKMP SA provides the authentication to protect against a man-in-the-middle attack. This second phase uses the quick mode.

Usually two peers negotiate only one ISAKMP SA, which is then used to negotiate several (at least two) unidirectional IPsec SAs.

NAT-Traversal

What is NAT-Traversal and why is it needed?

Often one peer in the VPN is behind a NAT-device. I just assume Source-NAT devices here. Whenever I talk about NAT I mean Source-NAT or Masquerading. What does this mean concerning the VPN? Well, first of all the original IP address of the peer is hidden by the NAT-device. The NAT-device conceals the original source IP address and replaces it by its own IP address.

This makes the IPsec AH protocol immediately unusable. But ESP can still be used if both sides are configured correctly.

So why do you need NAT-Traversal? Because as soon as two machines behind the same NAT device try to build a tunnel to the outside, both will fail.

Why is this happening? The NAT device needs to keep track of the "natted" connections to be able to "de-nat" the reply packets back to the original client. Therefore the NAT device maintains an internal table where all "natted" connections are stored. Let's assume one client connects to a web server on the Internet. The NAT device conceals the original address by replacing it with its own address. It then makes a note in its internal table that all packets coming back on the chosen client port have to be sent to the original client. As soon as the second client starts a connection, the NAT device handles that connection identically. If the second client happens to choose the same client port, the NAT device will also modify the client port to keep the entries unambiguous. This works very well using TCP and UDP because those protocols provide ports. ESP does not use ports. Therefore the NAT device can only use the protocol to distinguish the packets. When the first client connects, the NAT device stores the information in the table that all ESP packets have to be "de-natted" to the first client. When the second client connects, it will overwrite this entry with the corresponding entry for the second one, thus breaking at least the first connection.
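The behaviour of the NAT table can be simulated in a few lines. This is a deliberately simplified model, not the code of any real NAT device: the class, its method names, the addresses and the port numbers are all made up for illustration.

```python
class NatDevice:
    """Toy Source-NAT table (illustrative sketch only)."""

    def __init__(self):
        self.table = {}             # (protocol, public port) -> (client ip, client port)
        self.esp_client = None      # ESP has no ports: only one slot available
        self.next_free_port = 40000

    def outbound(self, proto: str, client_ip: str, client_port=None):
        if proto == "esp":
            self.esp_client = client_ip      # overwrites any earlier entry
            return None
        owner = (client_ip, client_port)
        public_port = client_port
        # Pick another public port if this one already belongs to a different client.
        while self.table.get((proto, public_port), owner) != owner:
            public_port = self.next_free_port
            self.next_free_port += 1
        self.table[(proto, public_port)] = owner
        return public_port

    def inbound(self, proto: str, public_port=None):
        """Return the internal client a reply packet is sent to."""
        if proto == "esp":
            return self.esp_client
        return self.table.get((proto, public_port))

nat = NatDevice()
# Two clients use the same UDP source port: the second one gets remapped.
port1 = nat.outbound("udp", "10.0.0.1", 500)
port2 = nat.outbound("udp", "10.0.0.2", 500)
print(nat.inbound("udp", port1), nat.inbound("udp", port2))

# Two clients send ESP: the second entry replaces the first.
nat.outbound("esp", "10.0.0.1")
nat.outbound("esp", "10.0.0.2")
print(nat.inbound("esp"))   # replies now reach only the second client
```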

What does NAT traversal do to help? NAT traversal encapsulates the ESP packets once more, this time in UDP packets. These can easily be handled by a NAT device since they provide ports. By default port 4500/udp is used. NAT traversal was first specified in several drafts and has since been standardized in RFC 3947 and RFC 3948. A nice feature of NAT traversal is the fact that once activated the peers automatically use it when needed.
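The encapsulation itself amounts to putting a UDP header in front of the ESP packet. The following sketch shows only that step, under assumptions: the function names are invented, the ESP bytes are a placeholder, and the UDP checksum is simply left at zero as a simplification.

```python
import struct

NAT_T_PORT = 4500
UDP_HEADER = struct.Struct("!HHHH")   # source port, destination port, length, checksum

def udp_encapsulate(esp_packet: bytes,
                    src_port: int = NAT_T_PORT,
                    dst_port: int = NAT_T_PORT) -> bytes:
    """Prepend a UDP header so that a NAT device can track the flow by port."""
    length = UDP_HEADER.size + len(esp_packet)
    return UDP_HEADER.pack(src_port, dst_port, length, 0) + esp_packet

def udp_decapsulate(datagram: bytes):
    """Return (source port, destination port, ESP packet)."""
    src_port, dst_port, length, _checksum = UDP_HEADER.unpack_from(datagram)
    return src_port, dst_port, datagram[UDP_HEADER.size:length]

esp_packet = b"\x00\x00\x10\x01" + b"placeholder ESP data"   # made-up bytes
datagram = udp_encapsulate(esp_packet)
print(udp_decapsulate(datagram) == (4500, 4500, esp_packet))
```

Because the outer packet is ordinary UDP, the NAT device can rewrite the source port per client, which solves the problem of two clients behind the same NAT device.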