Buoyant Enterprise for Linkerd

enterprise-2.16.0

August 13, 2024

Linkerd 2.16 is a new major release that adds new retry, timeout, and per-route metrics to HTTPRoute and GRPCRoute types, bringing Linkerd’s Gateway API implementation to feature parity with ServiceProfiles and addressing some long-standing wrinkles with these features. Linkerd 2.16 also adds support for IPv6 and introduces an audit mode for Linkerd’s zero trust network policies.

Buoyant Enterprise for Linkerd 2.16.0 also introduces new external workload automation to ease the management of VMs and other off-cluster workloads, and a “send a flare” remote diagnostics CLI command.

See the Linkerd 2.16 announcement blog post for more details.

Who should upgrade?

This is a feature release. We recommend upgrading to BEL 2.16.0 for customers who:

  • Need to manage meshed VM or off-cluster applications at scale;
  • Need to run Linkerd on IPv6 networks;
  • Are introducing authorization policy to live environments and want to derisk adoption; or
  • Are making significant use of retries, timeouts, and circuit breaking, and want to have a consistent model for composing these features.

Note that while BEL 2.16.0 includes several significant bugfixes, these have all been backported to earlier BEL 2.15.x point releases.

Supported Kubernetes versions

For this release, the minimum supported Kubernetes version is 1.22, and the maximum supported Kubernetes version is 1.29.

Upgrade guidance

To upgrade with BEL’s lifecycle automation operator, you will need Buoyant Extension version v0.32.0 or later.

There are several important notes before upgrading.

Gateway API resource management

Linkerd 2.16 requires the Gateway API CRDs to be installed on the cluster. These CRDs can either be installed and managed by Linkerd, or they can be installed and managed by another component on the system. (For example, GCP clusters may already have these CRDs installed by default.)

If you want these CRDs to be managed by Linkerd (the default): proceed as normal. Linkerd will install these CRDs for you and upgrade them as appropriate.

If the CRDs are managed by another component: Set the enableHttpRoutes setting to “false” when upgrading or installing Linkerd. In this mode, Linkerd will not touch these CRDs. Note that if the CRDs correspond to an earlier version of the Gateway API that does not include the GRPCRoute CRD, Linkerd’s GRPCRoute-related functionality will not be available, but Linkerd will otherwise operate normally.
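To decide which of the two cases applies, it can help to check which Gateway API route CRDs are already on the cluster. The following is a minimal sketch rather than an official tool: gateway_api_status is a hypothetical helper name, and the linkerd command shown in the comments is an assumption about a CLI-based install (Helm or operator-based installs would pass the same setting through their own values).

```shell
#!/bin/sh
# gateway_api_status is a hypothetical helper, not part of the Linkerd CLI.
# It reads CRD names on stdin, one per line, in the form printed by
# `kubectl get crd -o name`, and reports whether each route CRD is present.
gateway_api_status() {
  crds=$(cat)
  for kind in httproutes grpcroutes; do
    if printf '%s\n' "$crds" | grep -q "/${kind}\.gateway\.networking\.k8s\.io\$"; then
      echo "${kind}: present"
    else
      echo "${kind}: missing"
    fi
  done
}

# Example usage against a live cluster (requires kubectl access):
#   kubectl get crd -o name | gateway_api_status
#
# If the CRDs are managed elsewhere, the setting is passed at install or
# upgrade time. The exact invocation depends on how Linkerd is installed;
# with the CLI it would look roughly like:
#   linkerd upgrade --crds --set enableHttpRoutes=false | kubectl apply -f -
```

If grpcroutes is reported as missing and the CRDs are managed by another component, GRPCRoute-related functionality will be unavailable, as noted above.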

Breaking changes to shutdown endpoint

To mitigate CVE-2024-40632, in which a meshed application that is already vulnerable to an SSRF attack may also leave the proxy open to shutdown, the /shutdown endpoint is now disabled by default. This endpoint is used to terminate the proxy programmatically, typically by the linkerd-await command as part of a meshed Job or CronJob workload.

While native sidecar support has reduced the need for this endpoint, it can be re-enabled by setting proxy.enableShutdownEndpoint to “true”.
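For Jobs that do not use linkerd-await or native sidecars, the endpoint can also be called directly once it has been re-enabled. The sketch below is an illustration, not the mechanism linkerd-await itself uses: the function names are hypothetical, it assumes curl is available in the container, and it assumes the proxy's admin server is reachable on localhost port 4191.

```shell
#!/bin/sh
# Assumed address of the proxy's admin server inside the pod.
PROXY_ADMIN_URL="${PROXY_ADMIN_URL:-http://localhost:4191}"

# Ask the sidecar proxy to terminate. This only works when
# proxy.enableShutdownEndpoint has been set to "true".
request_proxy_shutdown() {
  curl -fsS -X POST "${PROXY_ADMIN_URL}/shutdown" >/dev/null
}

# Hypothetical wrapper: run the Job's command, then request proxy shutdown,
# and return the command's own exit status so the Job result is preserved.
run_then_shutdown() {
  status=0
  "$@" || status=$?
  request_proxy_shutdown || echo "warning: proxy shutdown request failed" >&2
  return "$status"
}

# Example usage as a container entrypoint:
#   run_then_shutdown /app/batch-task --once
```

Returning the wrapped command's status, rather than curl's, keeps a failed shutdown request from masking whether the workload itself succeeded.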

Docker runtime version incompatibilities

Due to an incompatibility between modern versions of glibc and old versions of the Docker runtime engine, Linkerd 2.16 no longer supports Docker runtime versions earlier than 20.10.10. Attempting to run Linkerd with an older Docker runtime will result in a proxy crash.

To determine whether you have an older Docker runtime, run the following command:

kubectl get node -o jsonpath="{.items[*].status.nodeInfo.containerRuntimeVersion}"

If the output is of the form docker://20.x.y, ensure the version is 20.10.10 or later. If the output is of the form containerd://..., this issue should not affect you.
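On clusters with many nodes, the comparison can be scripted. The following is a minimal sketch, assuming a sort implementation that supports version ordering with -V (as GNU coreutils does); runtime_supported is a hypothetical helper name and is not part of any Linkerd tooling.

```shell
#!/bin/sh
# Minimum Docker runtime version supported by Linkerd 2.16.
MIN_DOCKER_VERSION="20.10.10"

# runtime_supported takes one containerRuntimeVersion string and returns 0
# if the runtime is unaffected or new enough, and 1 if it is a Docker
# runtime older than the minimum.
runtime_supported() {
  case "$1" in
    docker://*)
      version="${1#docker://}"
      # sort -V orders version strings numerically. If the minimum sorts
      # first (or both are equal), the node's version is new enough.
      lowest=$(printf '%s\n%s\n' "$MIN_DOCKER_VERSION" "$version" | sort -V | head -n 1)
      [ "$lowest" = "$MIN_DOCKER_VERSION" ]
      ;;
    *)
      # Other runtimes, such as containerd, are not affected.
      return 0
      ;;
  esac
}

# Example usage against a live cluster (requires kubectl access):
#   for rt in $(kubectl get node -o jsonpath="{.items[*].status.nodeInfo.containerRuntimeVersion}"); do
#     runtime_supported "$rt" || echo "unsupported runtime: $rt"
#   done
```

Version strings with vendor suffixes may not compare as expected, so any node the script flags is worth confirming by hand.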

Changes to log output

Prior to Linkerd 2.16, HTTP headers were logged by the proxy when the log level was set to debug or trace. These headers may contain sensitive information such as access tokens. As of Linkerd 2.16, these headers are no longer part of log output in debug or trace modes. Header output can be reenabled by setting the logHTTPHeaders configuration value to “insecure”.

Changelog

New features

  • Timeouts and retries can now be configured with Gateway API HTTPRoute and GRPCRoute resources. Timeout and retry configuration can be placed as annotations on HTTPRoutes, GRPCRoutes, or their parent Services to control client behavior matching these resources.
  • Retries and timeouts configured with Gateway API resources have improved semantics compared to those configured with ServiceProfiles: they can be combined with circuit breaking, and requests that time out are eligible for retries. Learn more.
  • Linkerd now emits traffic metrics for HTTPRoute and GRPCRoute resources, providing granular per-route telemetry about traffic matching those resources.
  • A new “audit mode” has been introduced that allows safer introduction of authorization policy to live systems. When enabled, corresponding policy violations will be logged but not enforced. Audit mode can be enabled on specific policy resources and the policy generation command now uses audit mode as the default. Learn more.
  • Linkerd now features an IPv6 mode that supports both IPv6-only and dual-stack clusters (using only IPv6 endpoints in the latter case). This mode is disabled by default; to enable it, set disableIPv6 to “false”. Learn more.
  • BEL now provides significant automation to ease the management of external workloads (e.g. applications running on VMs): This includes a “harness” that runs alongside the application and handles the mechanics of network configuration, registration, and probes; an ExternalGroup CRD that provides a principled way to manage multiple similar external applications (e.g. multiple replicas); and an autoregistration control plane component that ties the two together.

Other notable changes

  • HTTP/2 keep-alive messages are now enabled by default for all meshed communication, allowing proxies to proactively detect connections that have been lost by the underlying network or OS. (linkerd2#12498 and linkerd2#12504)
  • Linkerd CLI commands that output Kubernetes resources now support JSON output.
  • The proxy’s /shutdown endpoint is now disabled by default, unless explicitly enabled. (linkerd2#12705)
  • HTTP headers are no longer logged in debug or trace output, unless explicitly enabled. (linkerd2#12665)
  • Resource requests for proxy-init now simply use those of the proxy, removing unnecessary configuration. (linkerd2#12741)
  • A new control plane component, linkerd-enterprise, now handles BEL-specific functionality, and, if enabled, a new linkerd-autoregistration component handles automation of external workloads.

Bug fixes

  • Fix a bug in the destination controller causing incorrect local traffic policy (linkerd2#12254)
  • Fix a bug in the destination controller where it could drop updates when there were a large number of Server resources (linkerd2#12013)
  • Fix panic in service mirror controller (linkerd2#12406)
  • Fix bug where policy controller would continuously update HTTPRoute status (linkerd2#12454)
  • Fix a bug where backend_not_found route status was being set incorrectly (linkerd2#12565)
  • Remove an internal limit on the number of concurrent gRPC streams to the control plane, leaving available memory as the only constraint (linkerd2#12598)
  • Fix bug where the policy controller could get stuck with stale data when services change (linkerd2#12635)
  • Fix bug where the policy controller could return incorrect initial data for HttpRoutes in a producer namespace (linkerd2#12619)

Other changes

Control plane

CLI

  • Modify linkerd policy generate to use audit-mode Server resources by default. Previous behavior can be enabled by using the --disable-audit flag.
  • Introduce a new linkerd policy generate --concurrency flag to decrease time for policy generation
  • Fix typo in diagnostics command (linkerd2#12723)

Helm manifests

  • Simplify license configuration. Setting additionalEnv values for custom licenses is no longer necessary.
  • Relax Helm chart release name constraints. Helm releases are no longer required to be named linkerd-crds and linkerd-control-plane.
  • Default Policy controller resources to destination resources in manifest (linkerd2#12191)

CVE remediations and updates

  • Update the Go Docker dependency in the controller, for both non-FIPS and FIPS builds, to remediate CVE-2024-41110