Fix linkerd policy generate command to work with BEL proxies that have
custom image names.
Helm changes
Add support for configuring the timeout and failure thresholds for health
probes of the multicluster gateway
(linkerd2#13061)
Fix ability to set tolerations for the linkerd-autoregistration and
linkerd-enterprise workloads
Docker images and Helm packages are now signed. Learn
more.
Proxy changes
Fix a bug in which the linkerd2-proxy may panic if a response was received
before a request frame with the END_STREAM flag was sent
(linkerd2-proxy#3216)
For this release, the minimum supported Kubernetes version remains 1.22, and the
maximum supported Kubernetes version has been increased to 1.31.
Who should upgrade?
2.15.x users who use gRPC with retries should upgrade to this version, or to
2.16.1. All other users may upgrade at their convenience or skip this release.
Upgrade guidance
This is a stable point release designed to introduce minimal change. Please see
the instructions in Upgrading BEL for how to upgrade.
Fix linkerd policy generate command to work with BEL proxies that have
custom image names.
Helm changes
Add support for configuring the timeout and failure thresholds for health
probes of the multicluster gateway
(linkerd2#13061)
Proxy changes
Fix a bug in which the linkerd2-proxy may panic if a response was received
before a request frame with the END_STREAM flag was sent
(linkerd2-proxy#3216)
CVE remediations and updates
Update Go Docker dependency in controller for both non-FIPS and FIPS to
remediate CVE-2024-41110
Linkerd 2.16 is a new major release that adds new retry, timeout, and per-route
metrics to HTTPRoute and GRPCRoute types, bringing Linkerd’s Gateway API
implementation to feature parity with ServiceProfiles and addressing some
long-standing wrinkles with these features. Linkerd 2.16 also adds support for
IPv6 and introduces an audit mode for Linkerd’s zero trust network policies.
Buoyant Enterprise for Linkerd 2.16.0 also introduces new external workload
automation to ease the management of VMs and other off-cluster workloads, and a
“send a flare” remote diagnostics CLI command.
There are several important notes before upgrading.
Gateway API resource management
Linkerd 2.16 requires the Gateway API CRDs to be installed on the cluster. These
CRDs can either be installed and managed by Linkerd, or they can be installed
and managed by another component on the system. (For example, GCP clusters may
already have these CRDs installed by default.)
If you want these CRDs to be managed by Linkerd (the default): proceed as
normal. Linkerd will install these CRDs for you and upgrade them as appropriate.
If the CRDs are managed by another component: Set the enableHttpRoutes
setting to “false” when upgrading or installing Linkerd. In this mode, Linkerd
will not touch these CRDs. Note that if the CRDs correspond to an earlier
version of the Gateway API that does not include the GRPCRoute CRD, Linkerd’s
GRPCRoute-related functionality will not be available, but Linkerd will
otherwise operate normally.
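As a sketch, an install or upgrade with Linkerd-managed Gateway API CRDs turned off might look like the following, using the linkerd CLI (adapt the same enableHttpRoutes value to a Helm-based workflow if you use one):

```shell
# Render and apply with Gateway API CRD management disabled; Linkerd then
# leaves the HTTPRoute/GRPCRoute CRDs to whichever component owns them.
linkerd install --crds --set enableHttpRoutes=false | kubectl apply -f -
linkerd install --set enableHttpRoutes=false | kubectl apply -f -
```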
Breaking changes to shutdown endpoint
To mitigate CVE-2024-40632,
in which a meshed application that is already vulnerable to an SSRF attack may
also leave the proxy open to shutdown, the /shutdown endpoint is now disabled
by default. This endpoint is used to terminate the proxy programmatically,
typically by the linkerd-await command as part of a meshed Job or CronJob
workload.
While native sidecar support has reduced the need for this endpoint, it can be
re-enabled by setting proxy.enableShutdownEndpoint to “true”.
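A minimal sketch of re-enabling the endpoint via the CLI (the same value can be set through Helm):

```shell
# Opt back in to the /shutdown admin endpoint, e.g. for Job workloads that
# still rely on linkerd-await to terminate the proxy.
linkerd upgrade --set proxy.enableShutdownEndpoint=true | kubectl apply -f -
```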
Docker runtime version incompatibilities
Due to an incompatibility between modern versions of glibc and old versions of
the Docker runtime engine, Linkerd 2.16 no longer supports Docker runtime
earlier than version 20.10.10. Attempting to run Linkerd with an old Docker
runtime will result in a proxy crash.
To determine whether you have an older Docker runtime, run the command
kubectl get node -o jsonpath="{.items[*].status.nodeInfo.containerRuntimeVersion}"
If the output is of the form docker://20.x.y, ensure the version is 20.10.10
or newer. If the output is of the form containerd://..., this issue
should not affect you.
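A minimal sketch of the version check described above: given a node's containerRuntimeVersion string, report whether it falls in the affected range (docker:// older than 20.10.10). It uses sort -V for the version comparison.

```shell
runtime_affected() {
  case "$1" in
    docker://*)
      v=${1#docker://}
      # The version is affected when it sorts strictly before 20.10.10.
      [ "$(printf '%s\n%s\n' "$v" 20.10.10 | sort -V | head -n1)" != "20.10.10" ]
      ;;
    *)
      # containerd:// and other runtimes are not affected.
      return 1
      ;;
  esac
}
```

For example, feed it the output of the kubectl command above: for rt in $(kubectl get node -o jsonpath="{.items[*].status.nodeInfo.containerRuntimeVersion}"); do runtime_affected "$rt" && echo "affected: $rt"; done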
Changes to log output
Prior to Linkerd 2.16, HTTP headers were logged by the proxy when the log level
was set to debug or trace. These headers may contain sensitive information
such as access tokens. As of Linkerd 2.16, these headers are no longer part of
log output in debug or trace modes. Header output can be re-enabled by
setting the logHTTPHeaders configuration value to “insecure”.
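As a sketch, opting back in via the CLI might look like this (the same value can be set through Helm):

```shell
# Re-enable HTTP header logging at debug/trace levels; the value is the
# string "insecure" to reflect the risk of logging sensitive headers.
linkerd upgrade --set proxy.logHTTPHeaders=insecure | kubectl apply -f -
```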
Changelog
New features
Timeouts and retries can now be configured with Gateway API HTTPRoute and
GRPCRoute resources. Timeout and retry configuration can be placed as
annotations on HTTPRoutes, GRPCRoutes, or their parent Services to control
client behavior matching these resources.
Retries and timeouts configured with Gateway API resources have improved
semantics compared to those configured with ServiceProfiles: they can be
combined with circuit breaking, and requests that timeout are eligible for
retries. Learn more.
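For example, configuring retries and a request timeout for traffic to a Service might look like the sketch below (per the notes above, the same annotations can go on an HTTPRoute or GRPCRoute; the annotation names follow the Linkerd 2.16 documentation, so verify them against your installed version):

```shell
# Retry 5xx responses up to twice, and time out requests after 5 seconds,
# for clients of the "web" Service.
kubectl annotate service web \
  retry.linkerd.io/http=5xx \
  retry.linkerd.io/limit=2 \
  timeout.linkerd.io/request=5s
```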
Linkerd now emits traffic metrics for HTTPRoute and GRPCRoute resources,
providing granular per-route telemetry about traffic matching those resources.
A new “audit mode” has been introduced that allows safer introduction of
authorization policy to live systems. When enabled, corresponding policy
violations will be logged but not enforced. Audit mode can be enabled on
specific policy resources and the policy generation command now uses audit
mode as the default.
Learn more.
Linkerd now features an IPv6 mode that supports both IPv6-only and dual-stack
clusters (using only IPv6 endpoints in the latter case). This mode is disabled
by default; to enable it, set disableIPv6 to false.
Learn more.
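A minimal sketch of enabling it at install time (the same value can be set through Helm):

```shell
# Enable IPv6 support (off by default). On dual-stack clusters the proxy
# will then use only the IPv6 endpoints of each destination.
linkerd install --set disableIPv6=false | kubectl apply -f -
```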
BEL now provides significant automation to ease the management of external
workloads (e.g. applications running on VMs): This includes a “harness” that
runs alongside the application and handles the mechanics of network
configuration, registration, and probes; an ExternalGroup CRD that provides
a principled way to manage multiple similar external applications (e.g.
multiple replicas); and an autoregistration control plane component that ties
the two together.
Other notable changes
HTTP/2 keep-alive messages are now enabled by default for all meshed
communication, allowing proxies to proactively detect connections that have
been lost by the underlying network or OS.
(linkerd2#12498 and
linkerd2#12504)
Linkerd CLI commands that output Kubernetes resources now support JSON output.
The proxy’s /shutdown endpoint is now disabled by default, unless explicitly
enabled. (linkerd2#12705)
HTTP headers are no longer logged in debug or trace output, unless explicitly
enabled (linkerd2#12665)
Resource requests for proxy-init now simply use those of the proxy, removing
unnecessary configuration.
(linkerd2#12741)
A new control plane component, linkerd-enterprise, now handles BEL-specific
functionality, and, if enabled, a new linkerd-autoregistration component
handles automation of external workloads.
Bug fixes
Fix a bug in the destination controller causing incorrect local traffic policy
(linkerd2#12254)
Fix a bug in the destination controller where it could drop updates when there
were a large number of Server resources
(linkerd2#12013)
Fix bug where policy controller would continuously update HTTPRoute status
(linkerd2#12454)
Fix a bug where backend_not_found route status was being set incorrectly
(linkerd2#12565)
Remove an internal limit on the number of concurrent gRPC streams to the
control plane, leaving available memory as the only constraint
(linkerd2#12598)
Fix bug where the policy controller could get stuck with stale data when
services change
(linkerd2#12635)
Fix bug where the policy controller could return incorrect initial data for
HttpRoutes in a producer namespace
(linkerd2#12619)
Other changes
Control plane
Add support for loadBalancerCluster to multi-cluster gateway service
(linkerd2#12116)
Use the correct resources attribute values for repair-controller
(linkerd2#12180)
Parameterize PodDisruptionBudget config for Linkerd Control Plane components
(linkerd2#11687)
Change log level when err is http.ErrServerClosed
(linkerd2#12167)
Fix excessive logging when encountering “unimplemented resource type” errors
in injector webhook
(linkerd2#12254)
Add default values to proxy-*-connect-timeout annotations docs
(linkerd2#12155)
Set proxy-injector, tap-injector and jaeger-injector mutating webhook rules
scope to Namespaced
(linkerd2#12195)
Users who are experiencing panics in the destination controller, or who want
to run the CLI without setting the BUOYANT_LICENSE envvar, should upgrade.
Users who want to further secure their Linkerd installation by disabling the
/shutdown endpoint or by removing HTTP header content from debug logging,
should upgrade.
All other users may upgrade at their convenience or skip this release.
Remove requirement that CLI users must always set the BUOYANT_LICENSE
environment variable. Note that a license must still be provided to commands
that require it (e.g. install), either via the environment variable or the
--set license=... flag.
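As a sketch, the two equivalent ways to provide the license ($LICENSE is a placeholder for your license key):

```shell
# Via the environment variable:
BUOYANT_LICENSE="$LICENSE" linkerd install | kubectl apply -f -
# Or via the flag:
linkerd install --set license="$LICENSE" | kubectl apply -f -
```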
Improve error handling and timeout behavior in the linkerd license command
Control plane changes
Fix panic in the destination controller when reading endpoint hostname
(backported from
linkerd2#12689)
Proxy changes
Add config to disable proxy /shutdown admin endpoint (backported from
linkerd2#12705). When
enabled, this remediates
CVE-2024-40632.
Add config to disable outputting HTTP headers by default in proxy debug logs
(backported from
linkerd2#12665)
Mesh expansion changes
Remove empty shortnames from ExternalWorkload (backported from
linkerd2#12793)
CVE remediations and updates
Update extension-init, policy-controller, and proxy base images to remediate
CVE-2023-5678 (first fixed
in hotpatch enterprise-2.15.4-1)
Update extension-init, policy-controller, and proxy base images to remediate
CVE-2023-6129 (first fixed
in hotpatch enterprise-2.15.4-1)
Update extension-init, policy-controller, and proxy base images to remediate
CVE-2024-0727 (first fixed
in hotpatch enterprise-2.15.4-1)
Users who are seeing OOMKills in the linkerd-destination service at scale
should upgrade. This release improves the memory consumption of the
destination controller at scale.
Users who are using HTTPRoutes should upgrade. This release fixes several
issues, including issues that may cause routing to fail sporadically.
Users who have to unset an existing ENVIRONMENT environment variable to use
the Linkerd CLI may upgrade to avoid this issue.
All other users may upgrade at their convenience or skip this release.
Fix an issue where linkerd install-cni was outputting an invalid image URL
Fix an issue where the CLI was reading configuration information from an
ENVIRONMENT envvar, which was sometimes already set in customer
environments. The CLI no longer uses this variable.
Add a new --token flag to the linkerd diagnostics policy command, to allow
users to see the policy from the perspective of a specific Kubernetes
context (backported from
linkerd2#12613)
Control plane changes
Remove unnecessary stream concurrency limits (backported from
linkerd2#12598)
Allow control plane components to specify concurrency (backported from
linkerd2#12643)
Fix issue where initial outbound policy did not contain producer routes
(backported from
linkerd2#12619)
Set backend_not_found route status when any backends are not found
(backported from
linkerd2#12565)
Reindex outbound policy backends when a service changes (backported from
linkerd2#12635)
CVE remediations and updates
Update busybox in proxy-init Docker image to remediate
CVE-2023-42364
Update busybox in proxy-init Docker image to remediate
CVE-2023-42365
Update the default Docker image user to be non-root, which was occasionally
being flagged by overly pedantic vulnerability scanners
The 2.15.3 stable point release includes a variety of bug fixes, usability
improvements, and new diagnostic and configuration features. It also adjusts the
default configuration of the HAZL load balancer to be more aggressive
in shifting load to other zones.
Users who are using native sidecars should upgrade. This release contains
several bugfixes related to native sidecars.
Users who are making heavy use of HTTPRoutes, or who are experiencing high
memory usage in the policy controller accompanied by “Failed to patch
HTTPRoute” error messages, should upgrade. This release fixes an issue with
how the policy controller was interacting with the Kubernetes API for
HTTPRoutes.
Users who are using multicluster should upgrade. This release fixes a panic in
the service mirror controller as well as another minor issue.
Note that in this release, we’ve moved the on-cluster storage for license keys
from ConfigMaps to Secrets. Users with license keys in ConfigMaps will be
automatically upgraded to a Secret. For more information on managing licenses,
see Configuring license secret
installation.
Print license information to stderr instead of stdout
Install version edge-24.2.4 of viz and jaeger extensions, rather than pointing
to non-existent BEL versions
Remove the need to include the --set license= flag on install commands
Add a diagnostics profile command (backported from
linkerd2#12383)
Helm chart changes
Correct the minimum supported Kubernetes version in the BEL Helm charts to
1.22 (not 1.21)
Support arbitrary proxy parameters in Helm values (backported from
linkerd2#12493)
Control plane changes
Move license storage from a ConfigMap to a Secret
Revert HAZL default load band parameters to the configuration used in BEL
2.15.1 and earlier, allowing HAZL to be more aggressive in shifting to other
zones by default
Update HTTPRoutes CRD to include a port field in the route status parent ref
(backported from
linkerd2#12454)
Fix multiple issues with native sidecars (backported from
linkerd2#12453)
Update policy controller to rename “patchs” metric to “patches” (backported
from linkerd2#12533)
Extension changes
Fix panic in multicluster service mirror controller (backported from
linkerd2#12406)
Avoid unnecessary headless endpoint mirrors cleanups during GC (backported
from linkerd2#12500)
Proxy changes
Clear balancer endpoint gauges on teardown (backported from
linkerd2-proxy#2928)
The 2.15.2 stable point release includes bug fixes, CVE remediations, and some
minor feature updates. It merges HAZL into the main proxy build (previous
releases required a separate build), improves certain metrics, and fixes a
memory leak in the policy controller.
Updated Helm charts to synchronize with the newer
AdditionalEnv and
AdditionalArgs values,
allowing users to enable features such as HAZL (when available) with the newer
terminology.
Linkerd 2.15 is a new major release that adds support for workloads outside of
Kubernetes. This new “mesh expansion” feature allows Linkerd users to bring
applications running on VMs, physical machines, and other non-Kubernetes
locations into the mesh.
Linkerd 2.15 also introduces support for SPIFFE, a standard for workload
identity which allows Linkerd to provide cryptographic identity and
authentication to off-cluster workloads.
Finally, Linkerd 2.15 adds support for native sidecar containers, a new
Kubernetes feature that eases some of the long-standing annoyances of the
sidecar model in Kubernetes, especially with Job workloads.
This is a feature release that unlocks new capabilities. Users with
non-Kubernetes workloads that they want to add to the mesh, or users who want to
use Kubernetes 1.29, should upgrade.
Users with Job workloads, init container race conditions, or other situations
that would benefit from native sidecar support, can upgrade to simplify their
usage of Linkerd. Native sidecar support can obviate the need for
linkerd-await in Job workloads
and can allow Linkerd to work well with other init containers.
This release changes the minimum supported Kubernetes version to 1.22, and
updates the maximum supported Kubernetes version to 1.29.
Changelog
Native sidecar containers
Introduced support for native sidecar containers, a feature that entered beta
in Kubernetes 1.29, improving the startup and shutdown ordering for the proxy
relative to other containers, and fixing the long-standing shutdown issue with
injected Jobs. Furthermore, traffic from other initContainers can now be
proxied by Linkerd (#11465;
fixes #11461).
Mesh expansion
Introduced a new ExternalWorkload CRD to support enrolling VMs into a meshed
Kubernetes cluster
Introduced a new controller in the destination service that will manage
EndpointSlice resources for Service objects that select external workloads
Control Plane
Fixed policy controller error when deleting a Gateway API HTTPRoute resource
(#11471)
Fixed an issue where the Destination controller could stop processing service
profile updates, if a proxy subscribed to those updates stops reading them
(#11546)
Fixed an issue in the destination controller where the metadata API would not
initialize a Job informer. The destination controller uses the metadata API
to retrieve Job metadata, and relies mostly on informers. Without an
initialized informer, an error message would be logged, and the controller
relied on direct API calls
(#11541; fixes
#11531)
Fixed an issue in the destination controller that could cause outbound
connections to hang indefinitely.
(#11540 and
#11556)
In the Destination controller, added informer lag histogram metrics to track
whenever the Kubernetes objects watched by the controller are falling behind
the state in the kube-apiserver
(#11534)
Changed how the policy controller updates HTTPRoute status so that it doesn’t
affect statuses from other non-linkerd controllers
(#11705; fixes
#11659)
Added a control plane metric to count errors talking to the Kubernetes API
(#11774)
Fixed an issue causing spurious destination controller error messages for
profile lookups on unmeshed pods with port in default opaque list
(#11550)
Changed how Server updates are handled in the destination service. The
change will ensure that during a cluster resync, consumers won’t be overloaded
by redundant updates
(#11907)
Updated the Destination controller to return INVALID_ARGUMENT status codes
properly when a ServiceProfile is requested for a service that does not
exist. (#11980)
Changed how updates to a Server selector are handled in the destination
service. When a Server that marks a port as opaque no longer selects a
resource, the resource’s opaqueness will be reverted to default settings
(#12031; fixes
#11995)
Removed uses of OpenSSL v1.1.1 in favor of OpenSSL v3
Fixed a bug in the GetProfile API where the destination controller could
become stuck and stop serving discovery responses.
Improved proxy logging so that all service discovery updates are logged at
INFO.
Added externalWorkloadSelector to the Server resource to facilitate
policy for ExternalWorkloads
#11899
Added queue metrics to endpoints controller workqueue
#11958
Implemented handling of EndpointSlices that point to ExternalWorkload
resources #11939
Enabled support for SPIFFE IDs in MeshTLSAuthentication #11882
Fixed a race condition in the destination service that could cause panics
under very specific conditions
#12022; fixes
#12010
Proxy
Improved the load balancer so that service discovery updates are processed
eagerly, ensuring that low-traffic services do not retain connections to
defunct endpoints.
The proxy’s control plane clients now limit the time they maintain discovery
streams to ensure that load is more evenly distributed across control plane
instances.
The proxy’s control plane clients now limit idle discovery streams to help
prevent staleness.
Added a variety of load-balancer specific metrics to measure the queue’s state
Updated the hyper and h2 dependencies to address bugs
Fixed an issue where the default queue sizes were too small and could cause
issues in high-traffic services
Fixed an issue where the ‘classification’ label was incorrectly applied on
inbound gRPC traffic
Multi-cluster extension
Fixed a "duplicate metrics" warning in the multicluster service-mirror
component #11875; fixes
#11839
Fixed broken affinity rules for the multicluster service-mirror when running
in HA mode
Added a new check to linkerd check that ensures all extension namespaces are
configured properly
Fixed a bug where the linkerd multicluster link command’s
--gateway-addresses flag was not respected when a remote gateway exists
Fixed an issue where an empty remoteDiscoverySelector field in a
multicluster link would cause all services to be mirrored
Jaeger extension
Extended linkerd-jaeger’s imagePullSecrets Helm value to also apply to the
namespace-metadata ServiceAccount
#11504
Viz extension
Improved linkerd viz check to attempt to validate that the Prometheus scrape
interval will work well with the CLI and Web query parameters
(#11376)
Service Profiles
Fixed an issue in the ServiceProfile CRD schema. The schema incorrectly
required that a not response match should be an array, which the service
profile validator rejected since it expected an object. The schema has been
updated to properly indicate that not values should be an object
(#11510; fixes
#11483)
Fixed an issue where trailing slashes wouldn’t be stripped when generating
ServiceProfile resources through linkerd profile --open-api
(#11519)
CLI
Improved CLI error handling to print differentiated error information when
versioncheck.linkerd.io cannot be resolved
(#11377; fixes
#11349)
Introduced a new multicluster check --timeout flag to limit the time allowed
for Kubernetes API calls
(#11420; fixes
#11266)
Changed linkerd install error output to add a newline when a Kubernetes
client cannot be successfully initialised
(#11917)
Manifests config / Helm
Introduced Helm configuration values for liveness and readiness probe timeouts
and delays (#11458; fixes
#11453)
Added probes to the debug container to appease environments requiring probes
for all containers (#11308;
fixes #11307)
Added a prometheusUrl field for the heartbeat job in the control plane Helm
chart (#11343; fixes
#11342)
Added a createNamespaceMetadataJob Helm value to control whether the
namespace-metadata job is run during install
(#11782)
Added a podAnnotations Helm value to allow adding additional annotations to
the Linkerd-Viz Prometheus Deployment
(#11374; fixes
#11365)
Added namespaceSelector fields for the tap-injector and jaeger-injector
webhooks. The webhooks are now configured to skip kube-system by default
(#11649; fixes
#11647)