enterprise-2.16.0
August 13, 2024
Linkerd 2.16 is a new major release that adds retries, timeouts, and per-route metrics to HTTPRoute and GRPCRoute resources. This brings Linkerd’s Gateway API implementation to feature parity with ServiceProfiles and addresses some long-standing wrinkles with these features. Linkerd 2.16 also adds support for IPv6 and introduces an audit mode for Linkerd’s zero trust network policies.
Buoyant Enterprise for Linkerd 2.16.0 also introduces new external workload automation to ease the management of VMs and other off-cluster workloads, and a “send a flare” remote diagnostics CLI command.
See the Linkerd 2.16 announcement blog post for more details.
Who should upgrade?
This is a feature release. We recommend upgrading to BEL 2.16.0 for customers who:
- Need to manage meshed VM or off-cluster applications at scale;
- Need to run Linkerd on IPv6 networks;
- Are introducing authorization policy to live environments and want to derisk adoption; or
- Are making significant use of retries, timeouts, and circuit breaking, and want to have a consistent model for composing these features.
Note that while BEL 2.16.0 includes several significant bugfixes, these have all been backported to earlier BEL 2.15.x point releases.
Supported Kubernetes versions
For this release, the minimum supported Kubernetes version is 1.22, and the maximum supported Kubernetes version is 1.29.
Upgrade guidance
To upgrade with BEL’s lifecycle automation operator, you will need Buoyant Extension version v0.32.0 or later.
There are several important notes before upgrading.
Gateway API resource management
Linkerd 2.16 requires the Gateway API CRDs to be installed on the cluster. These CRDs can either be installed and managed by Linkerd, or they can be installed and managed by another component on the system. (For example, GCP clusters may already have these CRDs installed by default.)
If you want these CRDs to be managed by Linkerd (the default): proceed as normal. Linkerd will install these CRDs for you and upgrade them as appropriate.
If the CRDs are managed by another component: set the enableHttpRoutes value to “false” when upgrading or installing Linkerd. In this mode, Linkerd will not touch these CRDs. Note that if the CRDs correspond to an earlier version of the Gateway API that does not include the GRPCRoute CRD, Linkerd’s GRPCRoute-related functionality will not be available, but Linkerd will otherwise operate normally.
Breaking changes to shutdown endpoint
To mitigate CVE-2024-40632, in which a meshed application that is already vulnerable to a server-side request forgery (SSRF) attack could also be used to shut down its proxy, the /shutdown endpoint is now disabled by default. This endpoint is used to terminate the proxy programmatically, typically by the linkerd-await command as part of a meshed Job or CronJob workload.
While native sidecar support has reduced the need for this endpoint, it can be re-enabled by setting proxy.enableShutdownEndpoint to “true”.
Docker runtime version incompatibilities
Due to an incompatibility between modern versions of glibc and old versions of the Docker runtime engine, Linkerd 2.16 no longer supports Docker runtime versions earlier than 20.10.10. Attempting to run Linkerd with an older Docker runtime will cause the proxy to crash.
To determine whether you have an older Docker runtime, run the command
kubectl get node -o jsonpath="{.items[*].status.nodeInfo.containerRuntimeVersion}"
If the output contains entries of the form docker://x.y.z, ensure that each version is 20.10.10 or later. If the output is of the form containerd://..., this issue should not affect you.
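When a cluster has many nodes, checking each runtime string by hand is error-prone. The following is a minimal Go sketch that reads the whitespace-separated output of the kubectl command above from standard input and flags any Docker runtime older than 20.10.10. It assumes version strings look like docker://20.10.7 (optionally with a suffix such as -ce); the program and its function names are illustrative, not part of any Linkerd tooling.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// minDocker is the oldest Docker runtime version that Linkerd 2.16 supports.
var minDocker = [3]int{20, 10, 10}

// parseVersion turns "20.10.7" into [20 10 7]. Missing components default to
// 0, and any pre-release or build suffix (e.g. "-ce") is ignored.
func parseVersion(s string) ([3]int, error) {
	var v [3]int
	s = strings.SplitN(s, "-", 2)[0]
	s = strings.SplitN(s, "+", 2)[0]
	parts := strings.Split(s, ".")
	for i := 0; i < len(parts) && i < len(v); i++ {
		n, err := strconv.Atoi(parts[i])
		if err != nil {
			return v, fmt.Errorf("unparseable version %q: %w", s, err)
		}
		v[i] = n
	}
	return v, nil
}

// affected reports whether a containerRuntimeVersion value names a Docker
// runtime older than 20.10.10. Other runtimes, such as containerd, are
// reported as unaffected.
func affected(runtime string) (bool, error) {
	if !strings.HasPrefix(runtime, "docker://") {
		return false, nil
	}
	v, err := parseVersion(strings.TrimPrefix(runtime, "docker://"))
	if err != nil {
		return false, err
	}
	// Compare major, minor, patch in order; the first difference decides.
	for i := range v {
		if v[i] != minDocker[i] {
			return v[i] < minDocker[i], nil
		}
	}
	return false, nil // exactly 20.10.10, which is supported
}

func main() {
	sc := bufio.NewScanner(os.Stdin)
	sc.Split(bufio.ScanWords) // jsonpath output is space-separated
	for sc.Scan() {
		rt := sc.Text()
		bad, err := affected(rt)
		switch {
		case err != nil:
			fmt.Printf("%s: %v\n", rt, err)
		case bad:
			fmt.Printf("%s: older than Docker 20.10.10, not supported\n", rt)
		default:
			fmt.Printf("%s: ok\n", rt)
		}
	}
}
```

One way to use it is to pipe the kubectl output into the program, for example `kubectl get node -o jsonpath="{.items[*].status.nodeInfo.containerRuntimeVersion}" | go run checkruntime.go` (the file name is arbitrary).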
Changes to log output
Prior to Linkerd 2.16, HTTP headers were logged by the proxy when the log level was set to debug or trace. These headers may contain sensitive information such as access tokens. As of Linkerd 2.16, these headers are no longer included in debug or trace log output. Header logging can be re-enabled by setting the logHTTPHeaders configuration value to “insecure”.
Changelog
New features
- Timeouts and retries can now be configured with Gateway API HTTPRoute and GRPCRoute resources. Timeout and retry configuration can be placed as annotations on HTTPRoutes, GRPCRoutes, or their parent Services to control the behavior of clients whose traffic matches these resources.
- Retries and timeouts configured with Gateway API resources have improved semantics compared to those configured with ServiceProfiles: they can be combined with circuit breaking, and requests that time out are eligible for retries.
- Linkerd now emits traffic metrics for HTTPRoute and GRPCRoute resources, providing granular per-route telemetry about traffic matching those resources.
- A new “audit mode” has been introduced that allows safer introduction of authorization policy to live systems. When enabled, corresponding policy violations will be logged but not enforced. Audit mode can be enabled on specific policy resources, and the policy generation command (linkerd policy generate) now uses audit mode by default.
- Linkerd now features an IPv6 mode that supports both IPv6-only and dual-stack clusters (using only IPv6 endpoints in the latter case). This mode is disabled by default; to enable it, set disableIPv6 to false.
- BEL now provides significant automation to ease the management of external workloads (e.g. applications running on VMs). This includes a “harness” that runs alongside the application and handles the mechanics of network configuration, registration, and probes; an ExternalGroup CRD that provides a principled way to manage multiple similar external applications (e.g. multiple replicas); and an autoregistration control plane component that ties the two together.
Other notable changes
- HTTP/2 keep-alive messages are now enabled by default for all meshed communication, allowing proxies to proactively detect connections that have been lost by the underlying network or OS. (linkerd2#12498 and linkerd2#12504)
- Linkerd CLI commands that output Kubernetes resources now support JSON output.
- The proxy’s /shutdown endpoint is now disabled by default unless explicitly enabled. (linkerd2#12705)
- HTTP headers are no longer logged in debug or trace output unless explicitly enabled. (linkerd2#12665)
- Resource requests for proxy-init now simply use those of the proxy, removing unnecessary configuration. (linkerd2#12741)
- A new control plane component, linkerd-enterprise, now handles BEL-specific functionality, and, if enabled, a new linkerd-autoregistration component handles automation of external workloads.
Bug fixes
- Fix a bug in the destination controller causing incorrect local traffic policy (linkerd2#12254)
- Fix a bug in the destination controller where it could drop updates when there were a large number of Server resources (linkerd2#12013)
- Fix panic in service mirror controller (linkerd2#12406)
- Fix bug where policy controller would continuously update HTTPRoute status (linkerd2#12454)
- Fix a bug where the backend_not_found route status was being set incorrectly (linkerd2#12565)
- Remove an internal limit on the number of concurrent gRPC streams to the control plane, leaving available memory as the only constraint (linkerd2#12598)
- Fix bug where the policy controller could get stuck with stale data when services change (linkerd2#12635)
- Fix bug where the policy controller could return incorrect initial data for HTTPRoutes in a producer namespace (linkerd2#12619)
Other changes
Control plane
- Add support for loadBalancerClass to multi-cluster gateway service (linkerd2#12116)
- Use the correct resources attribute values for repair-controller (linkerd2#12180)
- Parameterize PodDisruptionBudget config for Linkerd Control Plane components (linkerd2#11687)
- Change log level when err is http.ErrServerClosed (linkerd2#12167)
- Fix excessive logging when encountering “unimplemented resource type” errors in injector webhook (linkerd2#12254)
- Add default values to proxy-*-connect-timeout annotations docs (linkerd2#12155)
- Set proxy-injector, tap-injector and jaeger-injector mutating webhook rules scope to Namespaced (linkerd2#12195)
- Add support for metric_relabel_configs (linkerd2#12248)
- Fix function names in comments (linkerd2#12396, linkerd2#12512)
- Allow setting revisionHistoryLimit (linkerd2#12234)
- Avoid unnecessary headless endpoint mirrors cleanups during GC (linkerd2#12500)
- Make group ID configurable (linkerd2#11924)
- Fix crash in destination controller (linkerd2#12689)
- Clarify documentation on connectAddr (helm chart) (linkerd2#12827)
CLI
- Modify linkerd policy generate to use audit-mode Server resources by default. The previous behavior can be restored with the --disable-audit flag.
- Introduce a new linkerd policy generate --concurrency flag to reduce policy generation time
- Fix typo in diagnostics command (linkerd2#12723)
Helm manifests
- Simplify license configuration. Setting additionalEnv values for custom licenses is no longer necessary.
- Relax Helm chart release name constraints. Helm releases are no longer required to be named linkerd-crds and linkerd-control-plane.
- Default policy controller resources to destination resources in manifest (linkerd2#12191)
CVE remediations and updates
- Update Go Docker dependency in controller for both non-FIPS and FIPS to remediate CVE-2024-41110