October 8, 2017

Internet 101: IP

The Internet Protocol is responsible for moving data across network boundaries. How does it do this?

Addressing

To route data on a multi node path from one machine to another, it is necessary to uniquely and deterministically identify the source and destination machine (it should be noted that this idea is abused a lot in the wild). IP does this by defining a numerical IP address.

For IPv4, this is a 32 bit number (e.g. 172.16.254.1). Since the number of possible IPv4 addresses is grossly insufficient, IPv6 defines a 128 bit number. IPv6 addresses are typically shown using eight four-digit hexadecimal numbers delimited by colons (e.g. 2001:db8:0:1234:0:567:8:1).

On the public internet, these addresses are managed by the IANA and five regional internet registries. Ultimately, an ISP ends up with a pool of IP addresses, and can assign them to customers as they please.

Packets

When routing data across networks, each node must be aware of the destination address along with some other useful metadata. This leads to the definition of an IP packet. A packet encapsulates a segment of data by adding an IP header.

This header contains the source and destination address, in addition to a number of other fields such as the IP protocol version, the transport layer protocol, and packet length. It also contains an eight bit TTL, or hop limit. This value is decremented at each node, and the packet is discarded when it reaches zero. This ensures that packets do not circle around the internet forever.

A standard IPv4 header includes a number of extra fields and options, such as packet fragmentation and a checksum for the header. However, most of these features were deemed unnecessary, and IPv6 only leaves the bare essentials (although it does specify a number of optional extensions). Both versions of IP specify the length of the packet as a 16 bit number, thereby limiting the packet length to 65535 octets.

Routing

IP is designed to bridge across networks. Local networks will vary widely in terms of hardware, protocols and actual physical connections used. As long as all nodes speak IP, packets can be wrapped according to the local network interface, transmitted across the local network, and be received and stripped at the other end. This node can repeat this process on another network, until the packet reaches its destination address.

This begs the question of how an individual node routes packets. If it does not host the destination address, and is not directly connected to a machine that does, how does the node know where to forward an incoming packet to? IP provides the building blocks for routing, but it does not define the implementation of routing itself. For this, other standards and protocols are required.

IP is a best effort service. Each packet is considered independently. Packets may be duplicated, dropped silently, corrupted during transmission or delivered out of sequence, and IP does not provide any mechanisms to augment this. There is no notion of a connection.

For this reason, other protocols such as TCP are built on top of IP to provide reliability and/or other services. This separation is advantageous for machines that forward packets, such as routers. Routers do not need to keep state in relation to the packets that they forward, nor do they need to know anything about the upper layer protocols that depend on IP (in practice, this is not always true).

[1] Internet Protocol. RFC 791, 1981. https://tools.ietf.org/html/rfc791

[2] Internet Protocol, Version 6 (IPv6) Specification. RFC 8200, 2017. https://tools.ietf.org/html/rfc8200