enterprise-2.15.0
February 21, 2024
Linkerd 2.15 is a new major release that adds support for workloads outside of Kubernetes. This new “mesh expansion” feature allows Linkerd users to bring applications running on VMs, physical machines, and other non-Kubernetes locations into the mesh.
Linkerd 2.15 also introduces support for SPIFFE, a standard for workload identity which allows Linkerd to provide cryptographic identity and authentication to off-cluster workloads.
Finally, Linkerd 2.15 adds support for native sidecar containers, a new Kubernetes feature that eases some of the long-standing annoyances of the sidecar model in Kubernetes, especially with Job workloads.
See the BEL 2.15 announcement blog post for more details.
Who should upgrade?
This is a feature release that unlocks new capabilities. Users with non-Kubernetes workloads that they want to add to the mesh, or users who want to use Kubernetes 1.29, should upgrade.
Users with Job workloads, init container race conditions, or other situations that would benefit from native sidecar support can upgrade to simplify their usage of Linkerd. Native sidecar support can obviate the need for linkerd-await in Job workloads and can allow Linkerd to work well with other init containers.
How to upgrade
To upgrade with BEL’s lifecycle automation operator, you will need Buoyant Extension version v0.27.1 or later.
Kubernetes version support
This release changes the minimum supported Kubernetes version to 1.22 and updates the maximum supported Kubernetes version to 1.29.
Changelog
Native sidecar containers
- Introduced support for native sidecar containers, which entered beta in Kubernetes 1.29. This improves the startup and shutdown ordering of the proxy relative to other containers and fixes the long-standing shutdown issue with injected `Job`s. Furthermore, traffic from other `initContainer`s can now be proxied by Linkerd (#11465; fixes #11461)
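Native sidecars are opt-in. Below is a minimal sketch of enabling them for a single Job via a pod-template annotation; the annotation name and the Helm alternative (`proxy.nativeSidecar`) should be verified against the 2.15 documentation, and the workload image is hypothetical.

```yaml
# Illustrative sketch: opting a Job into the native sidecar proxy.
# Requires a cluster running Kubernetes 1.29+ with native sidecars enabled.
apiVersion: batch/v1
kind: Job
metadata:
  name: migrate-db
spec:
  template:
    metadata:
      annotations:
        linkerd.io/inject: enabled
        # Run the proxy as a native sidecar (an init container with
        # restartPolicy: Always), so the Job's pod can terminate cleanly
        # without linkerd-await. Annotation name assumed from the 2.15 docs.
        config.alpha.linkerd.io/proxy-enable-native-sidecar: "true"
    spec:
      restartPolicy: Never
      containers:
      - name: migrate
        image: example.com/migrate:v1   # hypothetical workload image
        command: ["/bin/migrate"]
```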
Mesh expansion
- Introduced a new `ExternalWorkload` CRD to support enrolling VMs into a meshed Kubernetes cluster (see the sketch after this list)
- Introduced a new controller in the destination service that will manage `EndpointSlice` resources for `Service` objects that select external workloads
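To make the shape of the new CRD concrete, here is a hedged sketch of an `ExternalWorkload` resource for a single VM. The field names follow the mesh expansion documentation for this release, but the API group/version, labels, addresses, and the SPIFFE identity are illustrative and should be checked against the CRD installed on the cluster.

```yaml
# Illustrative ExternalWorkload for a VM running a meshed proxy outside Kubernetes.
apiVersion: workload.linkerd.io/v1beta1   # assumed group/version; confirm via `kubectl api-resources`
kind: ExternalWorkload
metadata:
  name: legacy-vm-01
  namespace: mixed-env
  labels:
    app: legacy-orders          # Services selecting this label get EndpointSlices for the VM
spec:
  meshTLS:
    identity: "spiffe://root.linkerd.cluster.local/legacy-vm-01"   # hypothetical SPIFFE ID
    serverName: "legacy-vm-01.cluster.local"
  workloadIPs:
  - ip: 10.0.12.34              # the VM's routable address
  ports:
  - port: 8080
    name: http
```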
Control Plane
- Fixed policy controller error when deleting a Gateway API HTTPRoute resource (#11471)
- Fixed an issue where the Destination controller could stop processing service profile updates, if a proxy subscribed to those updates stops reading them (#11546)
- Fixed an issue in the destination controller where the metadata API would not initialize a `Job` informer. The destination controller uses the metadata API to retrieve `Job` metadata, and relies mostly on informers. Without an initialized informer, an error message would be logged and the controller relied on direct API calls (#11541; fixes #11531)
- Fixed an issue in the destination controller that could cause outbound connections to hang indefinitely (#11540 and #11556)
- In the Destination controller, added informer lag histogram metrics to track whenever the Kubernetes objects watched by the controller are falling behind the state in the kube-apiserver (#11534)
- Changed how the policy controller updates HTTPRoute status so that it doesn’t affect statuses from other non-linkerd controllers (#11705; fixes #11659)
- Added a control plane metric to count errors talking to the Kubernetes API (#11774)
- Fixed an issue causing spurious destination controller error messages for profile lookups on unmeshed pods with a port in the default opaque list (#11550)
- Changed how `Server` updates are handled in the destination service. The change will ensure that during a cluster resync, consumers won’t be overloaded by redundant updates (#11907)
- Updated the Destination controller to return `INVALID_ARGUMENT` status codes properly when a `ServiceProfile` is requested for a service that does not exist (#11980)
- Changed how updates to a `Server` selector are handled in the destination service. When a `Server` that marks a port as opaque no longer selects a resource, the resource’s opaqueness will be reverted to default settings (#12031; fixes #11995)
- Removed uses of OpenSSL v1.1.1 in favor of OpenSSL v3
- Fixed a bug in the GetProfile API where the destination controller could become stuck and stop serving discovery responses.
- Improved proxy logging so that all service discovery updates are logged at INFO.
- Added `externalWorkloadSelector` to the `Server` resource to facilitate policy for `ExternalWorkload`s (#11899); see the sketch after this list
- Added queue metrics to the endpoints controller workqueue (#11958)
- Implemented handling of `EndpointSlice`s that point to `ExternalWorkload` resources (#11939)
- Enabled support for SPIFFE IDs in `MeshTLSAuthentication` (#11882)
- Fixed a race condition in the destination service that could cause panics under very specific conditions (#12022; fixes #12010)
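The two policy-facing additions above can be illustrated together. Below is a hedged sketch of a `Server` that selects external workloads by label and a `MeshTLSAuthentication` that authorizes a SPIFFE identity; the resource names, labels, and SPIFFE URI are hypothetical, and the API versions should be confirmed against the CRDs shipped with this release.

```yaml
# Illustrative Server selecting ExternalWorkloads instead of Pods.
apiVersion: policy.linkerd.io/v1beta2      # assumed version; confirm on-cluster
kind: Server
metadata:
  name: legacy-orders-http
  namespace: mixed-env
spec:
  externalWorkloadSelector:                # new in this release
    matchLabels:
      app: legacy-orders
  port: 8080
  proxyProtocol: HTTP/1
---
# Illustrative MeshTLSAuthentication using a SPIFFE ID (also new in this release).
apiVersion: policy.linkerd.io/v1alpha1
kind: MeshTLSAuthentication
metadata:
  name: orders-clients
  namespace: mixed-env
spec:
  identities:
  - "spiffe://root.linkerd.cluster.local/orders-client"   # hypothetical SPIFFE URI
```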
Proxy
- Improved the load balancer so that service discovery updates are processed eagerly, ensuring that low-traffic services do not retain connections to defunct endpoints.
- The proxy’s control plane clients now limit the time they maintain discovery streams to ensure that load is more evenly distributed across control plane instances.
- The proxy’s control plane clients now limit idle discovery streams to help prevent staleness.
- Added a variety of load-balancer-specific metrics to measure the queue’s state
- Updated the hyper and h2 dependencies to address bugs
- Fixed an issue where the default queue sizes were too small and could cause issues in high-traffic services
- Fixed an issue where the ‘classification’ label was incorrectly applied on inbound gRPC traffic
Multi-cluster extension
- Fixed a "duplicate metrics" warning in the multicluster service-mirror component (#11875; fixes #11839)
- Fixed broken affinity rules for the multicluster service-mirror when running in HA mode
- Added a new check to `linkerd check` that ensures all extension namespaces are configured properly
- Fixed a bug where the `linkerd multicluster link` command’s `--gateway-addresses` flag was not respected when a remote gateway exists
- Fixed an issue where an empty `remoteDiscoverySelector` field in a multicluster link would cause all services to be mirrored
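For context on the last item, here is a hedged sketch of the `remoteDiscoverySelector` field on a multicluster `Link`. Links are normally generated by `linkerd multicluster link`; the label shown mirrors the conventional `mirror.linkerd.io/exported: remote-discovery` convention, but the exact defaults and API version should be confirmed against the multicluster documentation for this release.

```yaml
# Illustrative Link excerpt: only services carrying this label are mirrored
# in remote-discovery mode; an empty selector previously mirrored everything.
apiVersion: multicluster.linkerd.io/v1alpha1   # assumed version; confirm on-cluster
kind: Link
metadata:
  name: east
  namespace: linkerd-multicluster
spec:
  targetClusterName: east
  remoteDiscoverySelector:
    matchLabels:
      mirror.linkerd.io/exported: remote-discovery
```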
Jaeger extension
- Extended `linkerd-jaeger`’s `imagePullSecrets` Helm value to also apply to the `namespace-metadata` ServiceAccount (#11504)
Viz extension
- Improved `linkerd viz check` to attempt to validate that the Prometheus scrape interval will work well with the CLI and Web query parameters (#11376)
Service Profiles
- Fixed an issue in the `ServiceProfile` CRD schema. The schema incorrectly required that a `not` response match should be an array, which the service profile validator rejected since it expected an object. The schema has been updated to properly indicate that `not` values should be an object (#11510; fixes #11483); see the example after this list
- Fixed an issue where trailing slashes wouldn’t be stripped when generating `ServiceProfile` resources through `linkerd profile --open-api` (#11519)
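To show the corrected schema shape, here is a hedged sketch of a `ServiceProfile` whose response class uses `not` as a single response-match object (rather than an array); the service and route names are hypothetical.

```yaml
apiVersion: linkerd.io/v1alpha2
kind: ServiceProfile
metadata:
  name: webapp.default.svc.cluster.local   # hypothetical service
  namespace: default
spec:
  routes:
  - name: GET /books
    condition:
      method: GET
      pathRegex: /books
    responseClasses:
    - condition:
        not:                 # a single response-match object, not an array
          status:
            min: 500
            max: 599
      isFailure: false       # responses outside 5xx are treated as successes
```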
CLI
- Improved CLI error handling to print differentiated error information when versioncheck.linkerd.io cannot be resolved (#11377; fixes #11349)
- Introduced a new `multicluster check --timeout` flag to limit the time allowed for Kubernetes API calls (#11420; fixes #11266); see the usage sketch below
- Changed `linkerd install` error output to add a newline when a Kubernetes client cannot be successfully initialised (#11917)
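A usage sketch for the new flag; the 30s value is arbitrary, and the accepted duration syntax should be checked with `linkerd multicluster check --help`.

```
linkerd multicluster check --timeout 30s
```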
Manifests config / Helm
- Introduced Helm configuration values for liveness and readiness probe timeouts and delays (#11458; fixes #11453)
- Added probes to the debug container to appease environments requiring probes for all containers (#11308; fixes #11307)
- Added a `prometheusUrl` field for the heartbeat job in the control plane Helm chart (#11343; fixes #11342)
- Added a `createNamespaceMetadataJob` Helm value to control whether the namespace-metadata job is run during install (#11782)
- Added a `podAnnotations` Helm value to allow adding additional annotations to the Linkerd-Viz Prometheus Deployment (#11374; fixes #11365)
- Added `namespaceSelector` fields for the tap-injector and jaeger-injector webhooks. The webhooks are now configured to skip `kube-system` by default (#11649; fixes #11647); see the values sketch after this list
- Added a validating webhook config for httproutes.gateway.networking.k8s.io resources (#11150; fixes #11116)
- Added support for config merge and Deployment environment to `opentelemetry-collector` in the jaeger extension (#11283)
- Introduced support for arbitrary labels in the `podMonitors` field in the control plane Helm chart (#11222; fixes #11175)
- Introduced `PodDisruptionBudgets` in the linkerd-viz Helm chart for tap and tap-injector (#11628; fixes #11248)
- Allowed the `MutatingWebhookConfig` timeout value to be configured (#12028; fixes #12011)
- Added `nodeAffinity` to `deployment` templates in the `linkerd-viz` and `linkerd-jaeger` Helm charts (#11464; fixes #10680)
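As an illustration of the new webhook selector knobs, here is a hedged values sketch for the linkerd-viz chart. The `tapInjector.namespaceSelector` nesting is an assumption and should be checked against the chart’s published values; the `kube-system` default it overrides is the one described above, and the extra namespace is hypothetical.

```yaml
# Illustrative linkerd-viz values excerpt (key nesting assumed; verify against
# the chart reference). Skips tap injection for kube-system and a hypothetical
# "legacy" namespace.
tapInjector:
  namespaceSelector:
    matchExpressions:
    - key: kubernetes.io/metadata.name
      operator: NotIn
      values: ["kube-system", "legacy"]
```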