(m)TLS concept guide

One of the most important jobs of a service mesh is providing secure communications at a platform level across the entire application. By “secure”, we mean that we provide authenticity, confidentiality, and integrity for all communications in the mesh.

Authenticity gives us confidence that we’re really talking to who we think we are.
Confidentiality lets us trust that our messages are private, and that no one else can read them.
Integrity lets us be confident that no one is editing our messages in transit.

These are old problems: they’ve been around for as long as people have been communicating, and a great many different techniques have been developed to address them. Linkerd relies on industry-standard mTLS to provide these guarantees.

TLS and mTLS

Transport Layer Security, or TLS, is currently defined by RFC 8446. This is the technology we all use every day on the Web to securely communicate with banks, shopping sites, our governments, and everything else. TLS has been around since 1999, which is important because it means that cryptanalysts and researchers have had plenty of time to scour it looking for real-world weaknesses, and implementers and security experts have had plenty of time to correct everything they’ve found. TLS is battle-hardened in a way that few other bits of software are.

There are two distinct ways of using TLS, which we’ll call “normal” TLS and mutual TLS (mTLS). Both apply only to connection-based transports like TCP, and both involve a client and a server, with the client being the one that initiates the connection. Both handle encryption and integrity checking exactly the same way; the difference is only in how they handle authentication.

“Normal” TLS

“Normal” TLS is the variant used all over the Web – it’s what you’re probably using right now to read this page. In this variant, the client verifies the server’s identity, but the server doesn’t verify the client’s identity. This is fine for the Web, where the server is a public resource and the client is a random person on the Internet, but in a microservices application it’s not great.

Mutual TLS

Mutual TLS is what you want for microservices: the client verifies the server’s identity and the server verifies the client’s identity, so that each end of the connection can be confident of the identity of the other. (The name “mutual TLS” is from the fact that the two ends mutually authenticate each other.)
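
In code, the difference between the two variants often comes down to a single server-side setting. The sketch below uses Python's standard ssl module to show it. The helper names are made up for illustration, and loading certificates and keys is deliberately left out so that the snippet runs on its own. It is not a description of how Linkerd's proxy is implemented.

```python
import ssl

def server_context(mutual: bool) -> ssl.SSLContext:
    """Build a server-side TLS context (hypothetical helper)."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # "Normal" TLS: the server does not ask the client for a certificate.
    # Mutual TLS: the server requires one and rejects clients without it.
    ctx.verify_mode = ssl.CERT_REQUIRED if mutual else ssl.CERT_NONE
    return ctx

def client_context() -> ssl.SSLContext:
    """Build a client-side context; the client verifies the server in both variants."""
    # PROTOCOL_TLS_CLIENT turns on certificate and hostname checks by default.
    return ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)

normal = server_context(mutual=False)
mtls = server_context(mutual=True)
client = client_context()
```

A real deployment would also call load_cert_chain to present its own certificate and load_verify_locations to say which roots it trusts.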

Cryptography in TLS and mTLS

TLS and mTLS lean very heavily on cryptography. In particular, they use asymmetric or public-key cryptography as a building block for basically everything else.

In asymmetric cryptography, each party has a pair of keys: a public key and a private key. The public key can - and should! - be shared with anyone, but the private key must be kept secret. If you encrypt something with the public key, only the private key can decrypt it, and vice versa, with the result that:

If Alice encrypts data with Bob’s public key, she knows that only Bob can read it, because only Bob has the private key that can decrypt it. The public key therefore gives you a way of communicating privately with the owner of the key.
If Alice encrypts data with her own private key, then anyone can decrypt it, because her public key is public. However, only Alice could have created the encrypted message, because she’s the only one with her private key.

In short, the public key is the basis of privacy and the private key is the basis of identity, and TLS and mTLS make very heavy use of these concepts.

Of course, actually implementing this in the real world is far more complex than this simple description, but the concepts hold true even if the details are more complicated. One important detail to know, though, is that if Alice is going to use her private key to prove that she created a message, she mustn’t encrypt the message data itself: instead, she’ll encrypt a hash of the data, mainly because public-key operations are slow and only work on small inputs. This operation is so common in public-key cryptosystems that encrypting a hash of some data with your private key is called signing the data; the encrypted hash is called a digital signature or simply a signature.
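
To make the two directions and the hash-then-sign idea concrete, here is a toy sketch using textbook RSA with two small, well-known Mersenne primes. Everything in it (the key sizes, the function names, the messages) is an assumption chosen for readability. It has no padding and far too little key material to be secure, so treat it as an illustration of the concept and never as a usable implementation.

```python
import hashlib

# Toy textbook RSA. Illustration only: real systems use much larger keys,
# padding schemes, and vetted cryptographic libraries.
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q                            # public modulus
E = 65537                            # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))    # private exponent (needs Python 3.8+)

def encrypt_with_public(m: int) -> int:
    """Anyone can do this; only the private key holder can undo it."""
    return pow(m, E, N)

def decrypt_with_private(c: int) -> int:
    return pow(c, D, N)

def digest(message: bytes) -> int:
    """Hash the message and reduce it so it fits under the toy modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N

def sign(message: bytes) -> int:
    """Signing: apply the private key to a hash of the message."""
    return pow(digest(message), D, N)

def verify(message: bytes, signature: int) -> bool:
    """Anyone holding the public key can check the signature."""
    return pow(signature, E, N) == digest(message)

secret = 424242
ciphertext = encrypt_with_public(secret)
signature = sign(b"hello from Alice")
```

Decrypting ciphertext with the private key recovers secret, and verify accepts the signature only for the exact message that was signed.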

Confidentiality and integrity in TLS and mTLS

Confidentiality in TLS (and therefore mTLS) is handled simply by encrypting the data in transit. Integrity is handled by attaching keyed message digests (a form of cryptographic checksum that only the two endpoints can compute) to the data, so that the recipient can verify that nothing was altered in flight. These functions are typically part of the low-level implementation of TLS, and you as a user will almost never need to think about them.
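
The integrity check can be sketched with a keyed checksum from Python's standard library. This is a simplified stand-in: TLS 1.3 folds encryption and integrity together in its AEAD ciphers and derives its keys during the handshake, so the hard-coded key and the helper names below are assumptions made purely for illustration.

```python
import hashlib
import hmac

def tag(key: bytes, message: bytes) -> bytes:
    """Compute a keyed checksum (HMAC-SHA256) over the message."""
    return hmac.new(key, message, hashlib.sha256).digest()

def is_intact(key: bytes, message: bytes, received_tag: bytes) -> bool:
    """Recompute the checksum and compare it in constant time."""
    return hmac.compare_digest(tag(key, message), received_tag)

key = b"shared-session-key"   # hypothetical; never hard-code real keys
message = b"transfer 10 credits"
sent_tag = tag(key, message)
```

If anything in the message changes in flight, the recomputed checksum no longer matches the one that was sent, and the recipient rejects the data.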

Authentication in TLS and mTLS

Authentication in TLS and mTLS is a bit more complex than confidentiality and integrity. Glossing over massive amounts of detail, the basic idea is that:

each end of the connection signs something magic with their private key, and
the other end then uses the matching public key to check that the magic thing was correctly signed.

In practice, keeping track of all the keys rapidly becomes annoying. TLS and mTLS use X.509 certificates to manage this annoyance.

X.509 certificates

X.509 certificates, named for the ITU-T standard that defines them (first published in 1988!), have a reputation for being extremely complex, but they’re actually very simple in concept:

First, certificates provide a way to use one certificate to vouch for another certificate’s validity, using a digital signature. We say that the first certificate issues or signs the second.
Second, certificates let us associate certain metadata - like a name and an expiration date - with a keypair.

The trust hierarchy

The way certificates sign other certificates means that certificates naturally exist in a trust hierarchy:

At the top of the hierarchy is the root or anchor certificate, which always signs itself. At the bottom are leaf certificates, which represent single entities. In between can be any number of intermediate certificates (including zero). The root certificate and the intermediate certificates are often called certificate authority certificates (CA certificates), because their job is to certify the validity of other certificates.

The hierarchy is important because it gives microservices a way to easily verify the identity of other microservices: if you get a certificate that claims to identify a workload, and you have access to the public keys of every certificate in the hierarchy, you can quickly verify all the signatures and be confident that you’re looking at a valid identity. In practice, many systems will simply transmit all the intermediate certificates along with the leaf certificate, but the recipient always needs to be told which root certificates to trust.
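
The verification walk described above can be sketched as follows. This is a toy model and not real X.509: certificates are plain records, signatures are textbook RSA over small Mersenne primes, and every name in it (Cert, issue, verify_chain, the subjects, and the trust_bundle parameter that holds the trusted roots) is hypothetical. The point is only the shape of the check: follow each Issuer upward, verify each signature with the parent's public key, and stop when you reach a root you were told to trust.

```python
import hashlib
from dataclasses import dataclass

E = 65537  # public exponent shared by every toy key

@dataclass(frozen=True)
class KeyPair:
    n: int  # public modulus
    d: int  # private exponent

def make_keypair(p: int, q: int) -> KeyPair:
    return KeyPair(n=p * q, d=pow(E, -1, (p - 1) * (q - 1)))

@dataclass(frozen=True)
class Cert:
    subject: str
    issuer: str
    public_n: int   # the subject's public key
    signature: int  # made with the issuer's private key

def to_be_signed(subject: str, issuer: str, public_n: int, n: int) -> int:
    """Hash the certificate's contents, reduced to fit the signer's modulus."""
    data = f"{subject}|{issuer}|{public_n}".encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n

def issue(subject: str, subject_key: KeyPair, issuer: str, issuer_key: KeyPair) -> Cert:
    h = to_be_signed(subject, issuer, subject_key.n, issuer_key.n)
    return Cert(subject, issuer, subject_key.n, pow(h, issuer_key.d, issuer_key.n))

def signed_by(cert: Cert, parent: Cert) -> bool:
    h = to_be_signed(cert.subject, cert.issuer, cert.public_n, parent.public_n)
    return pow(cert.signature, E, parent.public_n) == h

def verify_chain(leaf: Cert, intermediates: list, trust_bundle: list) -> bool:
    parents = {c.subject: c for c in intermediates}
    roots = {c.subject: c for c in trust_bundle}
    current = leaf
    for _ in range(len(intermediates) + 1):
        if current.issuer in roots:              # reached a trusted root
            return signed_by(current, roots[current.issuer])
        parent = parents.get(current.issuer)
        if parent is None or not signed_by(current, parent):
            return False                         # broken or unknown link
        current = parent
    return False                                 # never reached a trusted root

root_key = make_keypair(2**107 - 1, 2**127 - 1)
issuer_key = make_keypair(2**61 - 1, 2**89 - 1)
leaf_key = make_keypair(2**19 - 1, 2**31 - 1)
rogue_key = make_keypair(2**13 - 1, 2**17 - 1)

root = issue("root", root_key, "root", root_key)            # self-signed
issuer = issue("issuer", issuer_key, "root", root_key)
leaf = issue("workload", leaf_key, "issuer", issuer_key)
forged = issue("workload", leaf_key, "issuer", rogue_key)   # signed by the wrong key
```

Note that the root's own self-signature is never checked: a root is trusted because it was configured as trusted, not because of anything it says about itself.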

In practice, also, a given entity will almost always need to handle multiple root certificates at once: even if you’re only talking to one service, you’ll still need to handle multiple root certificates to support certificate rotation, which we’ll discuss below. This is commonly dealt with by always supporting a trust bundle which can contain multiple root certificates, rather than just supporting a single root certificate.

Anatomy of an X.509 certificate

There is a lot of internal structure to an X.509 certificate, and many of the tools for working with certificates assume some familiarity with the structure. Here are the most important parts:

Subject: The Subject is the certificate’s name.
Issuer: The Issuer is the name of the certificate that signed this certificate.
Validity times: The certificate has a not before and not after time which give the time period during which the certificate is valid. (They’re specified this way to avoid questions of whether the moment of the timestamp is included in the validity period or not: they both are.)
Fingerprint: The fingerprint is a hash of the entire encoded certificate (the private key is never part of the certificate); it’s a good way to check whether or not two certificates are the same.

Both the Subject and Issuer are X.509 Distinguished Names, which we’re not actually going to define here: for Linkerd (and most other meshes), the important bit in the Subject and Issuer is the CN field. CN stands for Common Name; for Linkerd, it will be a hostname or (if you’re using mesh expansion) a SPIFFE ID.
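
As a small illustration of the validity times described above, the check below treats both endpoints as part of the validity period. The function name and the dates are hypothetical.

```python
from datetime import datetime, timedelta, timezone

def is_within_validity(not_before: datetime, not_after: datetime, when: datetime) -> bool:
    """Both the not-before and not-after instants count as valid."""
    return not_before <= when <= not_after

# Hypothetical one-year validity period.
not_before = datetime(2024, 1, 1, tzinfo=timezone.utc)
not_after = datetime(2025, 1, 1, tzinfo=timezone.utc)
```

With these values, the certificate is valid at exactly not_after and invalid one second later.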

Rotating certificates

The reason that X.509 certificates have a validity period is that they contain keys, and a basic axiom of digital security is that the longer you use a key, the more valuable it becomes to an adversary. Changing out a certificate’s private key is called rotating the certificate, and it’s a basic security practice. (“Rotating” is a bit of a misnomer: you actually create an entirely new certificate that has a new key but mostly the same metadata as the old certificate, so it’s less “rotating” and more “recreating”.)

Note that rotating a certificate also means rotating every certificate issued by the certificate being rotated, because even if you didn’t change the key for those certificates lower down in the trust hierarchy, the signature would change! So in practice, rotating a certificate is more of a headache the closer you get to the root certificate: leaf certificates are trivial, root certificates require redoing the entire world.

This means that the only way to do zero-downtime certificate rotation is to allow the old certificate to coexist with the new one until all the certificates below yours have been rotated: if you rotate an intermediate or root certificate, your workloads might get connections that are using old certificates while the process of rotating the leaf certificates is still ongoing. This is why the trust bundle is typically important: when rotating the root certificate, you’ll need to keep the old root in the trust bundle until all the intermediate and leaf certificates have been rotated, which can take a little time.
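
The ordering of a zero-downtime root rotation can be sketched as a small simulation. Signature checking is left out entirely here: a workload certificate is modeled only by the name of the root it chains up to, and the names (WorkloadCert, rotate_root, root-2023, root-2024) are all made up. The sketch only shows why the old root has to stay in the trust bundle until the last leaf has been reissued.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class WorkloadCert:
    workload: str
    root: str  # name of the root certificate this leaf chains up to

def all_accepted(certs, trust_bundle) -> bool:
    """True if every workload certificate chains to a root in the bundle."""
    return all(c.root in trust_bundle for c in certs)

def rotate_root(certs, old_root: str, new_root: str):
    """Yield a (certs, bundle) snapshot after each step of the rotation."""
    certs = list(certs)
    bundle = {old_root, new_root}          # 1. trust both roots
    yield list(certs), set(bundle)
    for i, cert in enumerate(certs):       # 2. reissue leaves one at a time
        certs[i] = WorkloadCert(cert.workload, new_root)
        yield list(certs), set(bundle)
    bundle.discard(old_root)               # 3. drop the old root last
    yield list(certs), set(bundle)

fleet = [WorkloadCert("web", "root-2023"), WorkloadCert("api", "root-2023")]
snapshots = list(rotate_root(fleet, "root-2023", "root-2024"))
```

Every snapshot leaves all workloads accepted, whereas swapping the bundle straight to the new root would reject every workload that still holds an old certificate.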

Linkerd and X.509 certificates

Linkerd uses X.509 certificates to provide the authenticity part of mTLS, with a specific trust hierarchy:

At the top of the hierarchy is the Linkerd trust anchor.
The trust anchor issues the Linkerd identity issuer.
Linkerd uses the identity issuer to issue workload certificates to each workload in the mesh.

The most important thing to remember is that Linkerd manages workload certificates for you, but you need to manage the trust anchor and the identity issuer! While it’s possible to do this manually, we recommend using Venafi’s cert-manager, along with its companion trust-manager, instead.
