Buoyant Enterprise for Linkerd

enterprise-2.15.0

February 21, 2024

Linkerd 2.15 is a new major release that adds support for workloads outside of Kubernetes. This new “mesh expansion” feature allows Linkerd users to bring applications running on VMs, physical machines, and other non-Kubernetes locations into the mesh.

Linkerd 2.15 also introduces support for SPIFFE, a standard for workload identity which allows Linkerd to provide cryptographic identity and authentication to off-cluster workloads.
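
As a rough illustration of what a SPIFFE identity looks like, the sketch below splits a SPIFFE ID into its trust domain and workload path. It relies only on the general `spiffe://<trust-domain>/<path>` URI form; the function name `parse_spiffe_id` and the example ID are hypothetical and are not part of Linkerd.

```python
from typing import Tuple
from urllib.parse import urlsplit


def parse_spiffe_id(spiffe_id: str) -> Tuple[str, str]:
    """Split a SPIFFE ID into (trust_domain, path).

    Hypothetical helper for illustration; it performs only basic checks.
    """
    parts = urlsplit(spiffe_id)
    if parts.scheme != "spiffe":
        raise ValueError(f"not a SPIFFE ID: {spiffe_id!r}")
    if not parts.netloc:
        raise ValueError("SPIFFE ID is missing a trust domain")
    if parts.query or parts.fragment:
        raise ValueError("SPIFFE IDs must not carry a query or fragment")
    return parts.netloc, parts.path


# Example ID is made up for this sketch.
trust_domain, path = parse_spiffe_id("spiffe://example.org/ns/default/sa/web")
print(trust_domain, path)
```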

Finally, Linkerd 2.15 adds support for native sidecar containers, a new Kubernetes feature that eases some of the long-standing annoyances of the sidecar model in Kubernetes, especially with Job workloads.

See the BEL 2.15 announcement blog post for more details.

Who should upgrade?

This is a feature release that unlocks new capabilities. Users with non-Kubernetes workloads that they want to add to the mesh, or users who want to use Kubernetes 1.29, should upgrade.

Users with Job workloads, init container race conditions, or other situations that would benefit from native sidecar support can upgrade to simplify their use of Linkerd. Native sidecar support can remove the need for linkerd-await in Job workloads and can allow Linkerd to work well with other init containers.

How to upgrade

To upgrade with BEL’s lifecycle automation operator, you will need Buoyant Extension version v0.27.1 or later.

Kubernetes version support

This release changes the minimum supported Kubernetes version to 1.22 and updates the maximum supported Kubernetes version to 1.29.
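
A quick way to reason about the supported range is to compare a cluster's major and minor version against the two bounds stated above. The following is a minimal sketch; the helper names `parse_minor` and `is_supported` are hypothetical, and the version string is assumed to look like what a cluster typically reports (for example `v1.27.3`).

```python
import re

# Bounds taken from the release notes for this version.
MIN_SUPPORTED = (1, 22)
MAX_SUPPORTED = (1, 29)


def parse_minor(version: str) -> tuple:
    """Extract (major, minor) from strings such as 'v1.27.3' or '1.29+'."""
    match = re.match(r"v?(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized Kubernetes version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def is_supported(version: str) -> bool:
    """True when the version falls inside the supported range, inclusive."""
    return MIN_SUPPORTED <= parse_minor(version) <= MAX_SUPPORTED


for candidate in ("v1.21.14", "v1.22.0", "1.29.1", "v1.30.0"):
    print(candidate, is_supported(candidate))
```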

Changelog

Native sidecar containers

  • Introduced support for native sidecar containers, which enter beta in Kubernetes 1.29. This improves the startup and shutdown ordering of the proxy relative to other containers and fixes the long-standing shutdown issue with injected Jobs. Furthermore, traffic from other initContainers can now be proxied by Linkerd (#11465; fixes #11461).
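
In Kubernetes, a native sidecar is an init container whose restartPolicy is set to Always. The sketch below shows how such containers can be picked out of a pod spec. The pod spec is a simplified, hypothetical example (the image names are placeholders), and `native_sidecars` is an illustrative helper rather than anything shipped with Linkerd.

```python
def native_sidecars(pod_spec: dict) -> list:
    """Return the names of init containers that Kubernetes treats as native sidecars."""
    return [
        container["name"]
        for container in pod_spec.get("initContainers", [])
        if container.get("restartPolicy") == "Always"
    ]


# Hypothetical, trimmed-down pod spec for a Job with an injected proxy.
job_pod = {
    "initContainers": [
        {"name": "linkerd-init", "image": "example/init"},
        # restartPolicy: Always is what marks an init container as a native sidecar.
        {"name": "linkerd-proxy", "image": "example/proxy", "restartPolicy": "Always"},
    ],
    "containers": [{"name": "batch-task", "image": "example/task"}],
}

print(native_sidecars(job_pod))
```

Because the sidecar is declared as an init container, it starts before the main containers and is stopped after they exit, which is why a Job can complete without extra tooling to shut the proxy down.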

Mesh expansion

  • Introduced a new ExternalWorkload CRD to support enrolling VMs into a meshed Kubernetes cluster
  • Introduced a new controller in the destination service that will manage EndpointSlice resources for Service objects that select external workloads
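
To give a sense of what enrolling a VM might involve, the sketch below builds an ExternalWorkload manifest as a plain dictionary and prints it as JSON, which kubectl accepts alongside YAML. The API version and field names (`meshTLS`, `workloadIPs`, `ports`) are assumptions based on the upstream Linkerd documentation and should be verified against the CRD installed in the cluster. All names, labels, and addresses are made up.

```python
import json


def external_workload_manifest(name, namespace, ip, port, spiffe_id, server_name,
                               labels, api_version="workload.linkerd.io/v1beta1"):
    """Build an ExternalWorkload manifest as a dict.

    ASSUMPTION: the apiVersion default and the spec field names mirror the
    upstream Linkerd docs; check them against your installed CRD before use.
    """
    return {
        "apiVersion": api_version,
        "kind": "ExternalWorkload",
        "metadata": {"name": name, "namespace": namespace, "labels": labels},
        "spec": {
            "meshTLS": {"identity": spiffe_id, "serverName": server_name},
            "workloadIPs": [{"ip": ip}],
            "ports": [{"name": "http", "port": port}],
        },
    }


# Every value below is a placeholder chosen for this example.
manifest = external_workload_manifest(
    name="legacy-billing",
    namespace="mixed-env",
    ip="192.0.2.10",
    port=8080,
    spiffe_id="spiffe://example.org/legacy-billing",
    server_name="legacy-billing.example.org",
    labels={"app": "legacy-billing"},
)
print(json.dumps(manifest, indent=2))
```

A Service whose selector matches the labels on the external workload is what the new controller would then back with EndpointSlice resources.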

Control Plane

  • Fixed policy controller error when deleting a Gateway API HTTPRoute resource (#11471)
  • Fixed an issue where the Destination controller could stop processing service profile updates if a proxy subscribed to those updates stopped reading them (#11546)
  • Fixed an issue in the destination controller where the metadata API would not initialize a Job informer. The destination controller uses the metadata API to retrieve Job metadata, and relies mostly on informers. Without an initialized informer, an error message would be logged, and the controller relied on direct API calls (#11541; fixes #11531)
  • Fixed an issue in the destination controller that could cause outbound connections to hang indefinitely (#11540 and #11556)
  • In the Destination controller, added informer lag histogram metrics to track whenever the Kubernetes objects watched by the controller are falling behind the state in the kube-apiserver (#11534)
  • Changed how the policy controller updates HTTPRoute status so that it doesn’t affect statuses from other non-linkerd controllers (#11705; fixes #11659)
  • Added a control plane metric to count errors talking to the Kubernetes API (#11774)
  • Fixed an issue causing spurious destination controller error messages for profile lookups on unmeshed pods with a port in the default opaque list (#11550)
  • Changed how Server updates are handled in the destination service. The change will ensure that during a cluster resync, consumers won’t be overloaded by redundant updates (#11907)
  • Updated the Destination controller to return INVALID_ARGUMENT status codes properly when a ServiceProfile is requested for a service that does not exist (#11980)
  • Changed how updates to a Server selector are handled in the destination service. When a Server that marks a port as opaque no longer selects a resource, the resource’s opaqueness will be reverted to default settings (#12031; fixes #11995)
  • Removed uses of OpenSSL v1.1.1 in favor of OpenSSL v3
  • Fixed a bug in the GetProfile API where the destination controller could become stuck and stop serving discovery responses.
  • Improved proxy logging so that all service discovery updates are logged at INFO.
  • Added externalWorkloadSelector to the Server resource to facilitate policy for ExternalWorkloads (#11899)
  • Added queue metrics to the endpoints controller workqueue (#11958)
  • Implemented handling of EndpointSlices that point to ExternalWorkload resources (#11939)
  • Enabled support for SPIFFE IDs in MeshTLSAuthentication (#11882)
  • Fixed a race condition in the destination service that could cause panics under very specific conditions (#12022; fixes #12010)

Proxy

  • Improved the load balancer so that service discovery updates are processed eagerly, ensuring that low-traffic services do not retain connections to defunct endpoints.
  • The proxy’s control plane clients now limit the time they maintain discovery streams to ensure that load is more evenly distributed across control plane instances.
  • The proxy’s control plane clients now limit idle discovery streams to help prevent staleness.
  • Added a variety of load-balancer-specific metrics to measure the queue’s state
  • Updated the hyper and h2 dependencies to address bugs
  • Fixed an issue where the default queue sizes were too small and could cause issues in high-traffic services
  • Fixed an issue where the ‘classification’ label was incorrectly applied on inbound gRPC traffic

Multi-cluster extension

  • Fixed a "duplicate metrics" warning in the multicluster service-mirror component (#11875; fixes #11839)
  • Fixed broken affinity rules for the multicluster service-mirror when running in HA mode
  • Added a new check to linkerd check that ensures all extension namespaces are configured properly
  • Fixed a bug where the linkerd multicluster link command’s --gateway-addresses flag was not respected when a remote gateway exists
  • Fixed an issue where an empty remoteDiscoverySelector field in a multicluster link would cause all services to be mirrored

Jaeger extension

  • Extended linkerd-jaeger’s imagePullSecrets Helm value to also apply to the namespace-metadata ServiceAccount (#11504)

Viz extension

  • Improved linkerd viz check to attempt to validate that the Prometheus scrape interval will work well with the CLI and Web query parameters (#11376)

Service Profiles

  • Fixed an issue in the ServiceProfile CRD schema. The schema incorrectly required that a not response match should be an array, which the service profile validator rejected since it expected an object. The schema has been updated to properly indicate that not values should be an object (#11510; fixes #11483)
  • Fixed an issue where trailing slashes wouldn’t be stripped when generating ServiceProfile resources through linkerd profile --open-api (#11519)
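
The trailing-slash fix can be pictured with a small sketch that turns an OpenAPI-style path into a route regular expression. This is not Linkerd's implementation (the CLI is written in Go); `path_to_regex` is a hypothetical helper, and the choice of `[^/]*` as the wildcard for path parameters is an assumption made for illustration.

```python
import re


def path_to_regex(path: str) -> str:
    """Convert an OpenAPI-style path into a regex, ignoring trailing slashes.

    Illustrative only; the parameter wildcard is an assumption of this sketch.
    """
    # Strip trailing slashes, but keep the root path as "/".
    trimmed = path.rstrip("/") or "/"
    # Escape regex metacharacters; "{id}" becomes "\{id\}" at this point.
    escaped = re.escape(trimmed)
    # Replace each escaped "{param}" placeholder with a single-segment wildcard.
    return re.sub(r"\\\{[^/{}]+\\\}", "[^/]*", escaped)


for route in ("/books/{id}/", "/authors/", "/"):
    print(route, "->", path_to_regex(route))
```

Without the stripping step, `/authors/` and `/authors` would produce two different route patterns for what is usually the same endpoint.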

CLI

  • Improved CLI error handling to print differentiated error information when versioncheck.linkerd.io cannot be resolved (#11377; fixes #11349)
  • Introduced a new multicluster check --timeout flag to limit the time allowed for Kubernetes API calls (#11420; fixes #11266)
  • Changed linkerd install error output to add a newline when a Kubernetes client cannot be successfully initialised (#11917)

Manifests config / Helm

  • Introduced Helm configuration values for liveness and readiness probe timeouts and delays (#11458; fixes #11453)
  • Added probes to the debug container to appease environments requiring probes for all containers (#11308; fixes #11307)
  • Added a prometheusUrl field for the heartbeat job in the control plane Helm chart (#11343; fixes #11342)
  • Added a createNamespaceMetadataJob Helm value to control whether the namespace-metadata job is run during install (#11782)
  • Added a podAnnotations Helm value to allow adding additional annotations to the Linkerd-Viz Prometheus Deployment (#11374; fixes #11365)
  • Added namespaceSelector fields for the tap-injector and jaeger-injector webhooks. The webhooks are now configured to skip kube-system by default (#11649; fixes #11647)
  • Added a validating webhook config for httproutes.gateway.networking.k8s.io resources (#11150; fixes #11116)
  • Added support for config merge and Deployment environment to opentelemetry-collector in the jaeger extension (#11283)
  • Introduced support for arbitrary labels in the podMonitors field in the control plane Helm chart (#11222; fixes #11175)
  • Introduced PodDisruptionBudgets in the linkerd-viz Helm chart for tap and tap-injector (#11628; fixes #11248)
  • Allowed the MutatingWebhookConfig timeout value to be configured (#12028; fixes #12011)
  • Added nodeAffinity to deployment templates in the linkerd-viz and linkerd-jaeger Helm charts (#11464; fixes #10680)