In an earlier post, I talked about Yggdrasil’s use of a tree-based metric space for name-dependent routing. The purpose of this post is to explain how Yggdrasil’s name-independent routing is implemented.
Name-dependent routing means that addressing follows network topology: if the state of the network changes, then a machine’s address can change. For example, in the CIDR-based internet, at least if you’re the kind of person who reads random blogs about experimental routing schemes, then you’ve probably run into the dynamic addressing problem. If you’re offline for too long, or your ISP does any non-trivial maintenance or upgrade work on their network, or sometimes for no visible reason, IP addresses tend to change. Sure, you can pay extra for a static address, but this is one of the cases where ISPs have a genuinely good excuse to charge extra: assigning a static IP may require the ISPs routers to keep an additional route to you, which would consume excessive resources if everyone needed static addresses.
If addresses are dynamic, and you want to host some kind of service, then you probably want to attach a static identifier to the host. This usually means Dynamic DNS. This also tends to wrap the problem up with the separate problem of assigning human-readable identifiers and securing this layer of abstraction complicates things. The real issue is that we simultaneously use an IP address as an address, specifying where we want to send a packet, and as a name, specifying who the packet is for.
So, what if we didn’t do that?
Yggdrasil uses name-independent routing, which means identifiers (IPv6 addresses) and network state are independent.
As a consequence, we can treat addresses as some flat / opaque identifier.
Yggdrasil uses the
200::/7 region of the IPv6 network space.
This region was set aside for NSAP-mapped IPv6 addresses in RFC1888, which was (AFAIK) never used and eventually deprecated in RFC4048 (with further explanation in RFC4548).
The address space is currently unused and, we believe, unlikely to be reassigned for the foreseeable future.
As such, we think it’s reasonable to use this address space for Yggdrasil, to avoid conflicts with existing use of the more obvious ULA range (which we used previously).
200::/7, we partition the network into
Each node running Yggdrasil as assigned a
/128 address in
200::/8, and an optional matching
/64 prefix in
Other than the change of 1 bit for
2->3, the rest of the first 64 bits of the address and prefix are identical.
/64 address range allows a Yggdrasil node to operate as a router, where it advertises a route (using normal IPv6 tools, like radvd) allowing the rest of its LAN to reach Yggdrasil without needing to run the code.
This allows unsupported and legacy devices (phones, network printers, game consoles, old OSs, etc.) to reach the network, or low-power / battery-powered devices (IoT) to off-load cryptography and routing logic onto something with fewer constraints.
A node’s address is constructed from its
NodeID, described in the next section, but for now just think of it as a 512-bit unsigned integer.
The process is pretty simple: the first byte is always
0x03 for a
/64 prefix), to fall within the allowed address range.
The next byte is an unsigned 8 bit integer, and equals the number of leading
1 bits in the
NodeID (if you brute force a
NodeID with many leading 1 bits, this number will be larger).
The rest of the address (or
/64 prefix) is equal to the
NodeID, but with the leading
1 bits and the first leading
0 bit removed (since you know the first bit after the last
1 must be a
0), and truncated to a maximum of
/128 bits for an address (or
/64 for a prefix).
By optionally brute-forcing a large number of leading
1 bits, a node can increase the difficulty of address and prefix collisions.
This is basically a workaround for the fact that 128 bit IPv6 addresses (and much moreso, 64-bit prefixes) are too short really to be immune to collisions.
Fixing that is on my wish-list for IPv7, if for some reason that ever happens.
Other than when packets enter and leave the network, Yggdrasil is oblivious to IPv6 addresses.
The untruncated equivalent, a
NodeID, is used instead.
NodeID is equal to a sha512sum of the node’s curve25519 public cryptographic key.
Traffic, including protocol traffic, is encrypted in Yggdrasil, so spoofing an ID is equivalent to generating a private key for which the derived public key happens to hash to the right
NodeID – presumably a somewhat harder problem than finding a sha512sum collision.
This is mostly influenced by cjdns’s use of a double-sha512sum truncated to 128 bits as an addressing scheme, but with a single round of sha512 and a more complicated truncation rule.
Meanwhile, to actually route a packet through the network, we need to know not only to whom it is going, but where it is going – name-independent routing implies some form of locator identifier separation (similar in purpose to LISP).
In Yggdrasil, a locator consists of a node’s position in the spanning tree.
This includes two parts: the root of the tree, and a path from the root to the destination node.
Since the same root is implied, only the path portion, referred to as
coords in the code, needs to be attached to each packet.
This path is represented by a list of interface numbers.
Each node assigns an interface to each of their peers, with 0 corresponding to their self in the current implementation, which are stored as a list of 64-bit unsigned integers.
This list is then encoded as a list of 8-bit unsigned integers, with each 64-bit integer represented by a variable-length quantity – in practice, this means 1 byte per hop in the path from the root to the destination.
coords are then used as distance labels, wherein the
coords of any two nodes are sufficient to compute their distance apart on the tree.
Now that each node has a
NodeID and a locator with a set of
coords, we need a way to use them.
Specifically, when given a node’s
200::/7 IPv6 address, we need to figure out the node’s coords.
To do that, we first unpack the IPv6 address into a
NodeID and a bitmask signifying which bits of the
NodeID are known (since some were truncated away when compressing it into an IPv6 address).
Then, to map a
coords, we use a Distributed Hash Table (DHT).
In particular, we use a Kademlia-like DHT with no proper key:value store – we only store information about nodes in each keyspace bucket.
In practice, what this means is the following.
Each node has a 512-bit
For each number of possible bits that could match in the address (0 to 511), the node stores a list of other nodes (called a bucket), which includes node keys (to derive
coords (so you can actually send packets to them).
These buckets store information about nodes for which exactly that number of leading bits match, so bucket 3 stores info about nodes where exactly the first 3 bits match.
More generally, we can define the distance between
NodeIDs, in some abstract keyspace, as being the
xor of the two
NodeIDs, and each bucket covers some range of keyspace, but it turns out to be easiest to align these things so the bucket edges match bits of the
Then, for every possible
NodeID that could be in the network, either a node has that
NodeID, or else it should know someone closer to that
NodeID in one of its buckets.
Nodes can then search for a given
NodeID by iteratively asking nodes they know about for a list of nodes that are closer to the destination in keyspace.
Like any good Kademlia-based DHT, this requires a logarithmic number of steps, and uses a logarithmic amount of memory for each node’s local table.
1 suppression from address generation tends to lead to some additional steps, as it encourages nodes to cluster in the same region of keyspace, which acts as a form of sybil attack mitigation (it becomes computationally more expensive to become a keyspace neighbor any particular node – let alone all the keyspace neighbors of that node – if everyone’s address starts with a bunch of extra
The currently implementation’s bucket size is 2 nodes, plus any directly connected peers (always tracking all peers helps from a bootstrapping perspective, and we know about them already from the switch layer, so we can include them in the DHT at basically no cost).
It’s important to note that packets are routed based on
coords, with no regard to the structure of the DHT.
This is in contrast to cjdns’s old dht-based pathfinder, where source routing and switch label splicing cause the path a packet takes to mirror the path followed in the DHT (and if the DHT learns paths from other DHT entires, then extra stretch may compound as the network grows).
Once a node learns the destination’s
coords and key from the DHT lookup, it can initiate a session exchange.
The session exchange involves sending a session ping packet the destination, which includes the sender’s public key,
coords, and an ephemeral session public key (used for forward secrecy).
The destination then responds with a pong, in the same format.
The nodes cache this information, and use the cached
coords and ephemeral derived secret key to communicate.
So full connection process looks something like the following:
NodeID, and initially fills it with information from the node’s own portion of the DHT.
NodeID(or hits a dead end).
NodeIDand bitmask match the
NodeIDof the closest node (if not, then it means the destination node doesn’t exist / is offline / is unreachable while the network re-converges due some disruption).
coordsto avoid future lookups (unless the session is unresponsive for too long, in which case any new sends will also trigger a ping, or a new DHT lookup if the pings fail).
Yggdrasil provides name-independent routing with cryptographically generated addresses.
Static addressing means you can keep the same address as long as you want, even if you move to a new location, or generate a new address (by generating new keys) whenever you want.
Cryptographically generated addressing means it’s impractical (if not outright impossible) to spoof addresses, so you get end-to-end encryption, with automatic key exchange and forward secrecy, and with some degree of collision resistance, without any dependence on public key infrastructure.
This also means reflection and amplification attacks, which are only possible because of IP address spoofing, or attacks based on route hijacking, should be at least as hard as finding a truncated sha512sum collision or breaking the cryptographic primitives.
Oh, and the routing tables are ridiculously small – we could exhaust the entire
200::/8 part of the address space and the average routing table would only be ~240 nodes + directly connected peers.
The main cost you pay for this is an extra delay during the initial DHT lookup, which is sort-of like having a slower (because it’s
O(log(n))-ish steps) DNS lookup, except the names aren’t human readable.
The DHT takes a little while to converge, so incoming connections also tend to fail immediately after node startup, until your keyspace neighbors have a chance to hear about you, but the lookups for outgoing connections tend to work almost immediately.
Also, while the current DHT should be fine for peer-to-peer type applications, I wouldn’t want to be the keyspace neighbor of an especially popular server, as this would mean a large number of DHT lookups would go through my node.
A caching DHT could, in principle, be able to address that issue without changes to the protocol, but that’s not a very high priority given the size of the current network.
Lastly, IPv6 anycast may work with the current implementation, but which anycast node the lookup directs you to probably has nothing to do with the network topology, so it would make sense to add an extra lookup at the end of a search to return all
coords where a particular
NodeID can be found.
This would require slightly the DHT to include identical
NodeIDs, and adding an additional DHT lookup step at the end to get a list of all
coords (presumably selecting the closest
coords returned by the lookup).
I’m not certain if this can be done in a backwards compatible way, so this will require further investigation at some point.