Troubleshooting BEL's lifecycle automation

This page describes how to resolve common problems with the lifecycle automation in Buoyant Enterprise for Linkerd (BEL).

The linkerd-control-plane-operator Deployment, which runs in the linkerd-buoyant namespace, manages installs and updates of the BEL control plane. It reads a ControlPlane resource and works to reconcile the current state of BEL with the desired state declared in that resource.

If an installation or update fails, the ControlPlane resource changes to a Failed state, and the linkerd-control-plane-operator stops applying the update, retrying it later with exponential backoff. The steps below help you diagnose and resolve the issue.

To quickly check the status of the ControlPlane resource, run kubectl get controlplane. Here is an example of a resource in a failed state:

$ kubectl get controlplane
NAME                    STATUS   DESIRED               CURRENT             AGE
linkerd-control-plane   Failed   bad-version-0.00.00   enterprise-2.15.5   9m48s
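If you want to detect this condition from a script, for example in a CI job, one option is to filter the kubectl get controlplane output. The following is a minimal POSIX shell sketch, not part of BEL: the function name failed_controlplanes is hypothetical, and it assumes the column layout shown above, with NAME in the first column and STATUS in the second.

```shell
# failed_controlplanes is a hypothetical helper, not part of BEL or kubectl.
# It reads "kubectl get controlplane" output on stdin, skips the header row,
# and prints the NAME of every resource whose STATUS column (assumed to be
# the second column, as in the example output) is Failed.
# Exit status: 0 if at least one Failed resource was found, 1 otherwise.
failed_controlplanes() {
  awk 'NR > 1 && $2 == "Failed" { print $1; found = 1 } END { exit !found }'
}

# Example usage:
#   kubectl get controlplane | failed_controlplanes && echo "a ControlPlane is Failed"
```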

For more detailed information, check the status field of the ControlPlane resource. The output below is trimmed to the contents of that field:

$ kubectl get controlplane/linkerd-control-plane --output=yaml
  current: enterprise-2.15.5
  desired: bad-version-0.00.00
  lastUpdateAttempt: "2024-01-01T00:00:00Z"
  lastUpdateAttemptMessage: 'control-plane update failed: target version [bad-version-0.00.00]
    parsing error: invalid Linkerd version: bad-version-0.00.00'
  lastUpdateAttemptResult: Failed
  status: Failed

Note that the lastUpdateAttemptMessage field contains the error message from the most recent update attempt. In this example, the ControlPlane resource specifies an invalid version string, bad-version-0.00.00. To fix the problem, update the ControlPlane resource with a valid version string.
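To print only the fields that matter while troubleshooting, you can pass kubectl a JSONPath template instead of reading the full YAML. This sketch assumes, based on the indentation in the output above, that the fields sit under .status of the resource; show_last_update_attempt is a hypothetical name.

```shell
# show_last_update_attempt is a hypothetical helper, not part of BEL.
# It prints the result and then the message of the most recent update attempt.
# Assumption: these fields sit under .status of the ControlPlane resource,
# as the indentation of the YAML output suggests.
show_last_update_attempt() {
  kubectl get controlplane/linkerd-control-plane \
    --output=jsonpath='{.status.lastUpdateAttemptResult}{"\n"}{.status.lastUpdateAttemptMessage}{"\n"}'
}

# Example usage:
#   show_last_update_attempt
# To correct the desired version, one option is to edit the resource in place:
#   kubectl edit controlplane/linkerd-control-plane
```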

Under the hood, the linkerd-control-plane-operator uses Helm to install BEL. As with any Helm-based install of Linkerd, you should see two Helm releases in the linkerd namespace, linkerd-crds and linkerd-control-plane. To list them, run:

helm --namespace=linkerd list

You should see output similar to this:

NAME                 	NAMESPACE	REVISION	UPDATED                                	STATUS  	CHART                                  	APP VERSION
linkerd-control-plane	linkerd  	1       	2024-01-01 00:00:00.123456789 +0000 UTC	deployed	linkerd-enterprise-control-plane-2.15.x	enterprise-2.15.x
linkerd-crds         	linkerd  	1       	2024-01-01 00:00:00.123456789 +0000 UTC	deployed	linkerd-enterprise-crds-2.15.x         	enterprise-2.15.x
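To script the same check, you can list only deployed releases by name and confirm that both expected releases appear. This sketch assumes Helm 3, whose list command accepts the --deployed and --short flags; missing_linkerd_releases is a hypothetical helper name.

```shell
# missing_linkerd_releases is a hypothetical helper, not part of Helm or BEL.
# It reads release names, one per line, on stdin and prints each of the two
# releases expected for a Helm-based Linkerd install that does not appear.
# Exit status: 0 if both releases are present, 1 otherwise.
missing_linkerd_releases() {
  names=$(cat)
  missing=0
  for release in linkerd-crds linkerd-control-plane; do
    # -x: match the whole line, -F: treat the name as a fixed string
    if ! printf '%s\n' "$names" | grep -qxF "$release"; then
      echo "$release"
      missing=1
    fi
  done
  return "$missing"
}

# Example usage (deployed releases only, names only):
#   helm --namespace=linkerd list --deployed --short | missing_linkerd_releases
```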

To view the state of the Linkerd control plane directly, check the pods running in the linkerd namespace:

kubectl --namespace=linkerd get po

You should see output similar to this:

NAME                                     READY   STATUS    RESTARTS   AGE
linkerd-destination-9547b64d4-wwvqw      4/4     Running   0          54m
linkerd-identity-5b65c58b77-97l6m        2/2     Running   0          54m
linkerd-proxy-injector-f84bf6bfb-x5g6f   2/2     Running   0          54m
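To flag unhealthy control plane pods from a script, one approach is to compare the two numbers in the READY column and check that STATUS is Running. This sketch assumes the default kubectl get output with a header row, as shown above; unready_pods is a hypothetical helper name. The trailing comment also shows kubectl wait, which blocks until the pods report Ready.

```shell
# unready_pods is a hypothetical helper, not part of kubectl or BEL.
# It reads "kubectl get po" output (with header) on stdin and prints the
# name of every pod whose READY count is incomplete (e.g. 3/4) or whose
# STATUS is not Running.
# Exit status: 0 if every pod is fully ready and Running, 1 otherwise.
unready_pods() {
  awk 'NR > 1 {
         split($2, ready, "/")
         if (ready[1] != ready[2] || $3 != "Running") { print $1; bad = 1 }
       }
       END { exit (bad ? 1 : 0) }'
}

# Example usage:
#   kubectl --namespace=linkerd get po | unready_pods
# Alternatively, wait up to two minutes for every pod to become Ready:
#   kubectl --namespace=linkerd wait --for=condition=Ready pod --all --timeout=120s
```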

For more detailed information about a failed operation, check the logs of the linkerd-control-plane-operator:

kubectl --namespace=linkerd-buoyant logs deploy/linkerd-control-plane-operator --container=linkerd-control-plane-operator

You may find log lines similar to this:

time="2024-01-01T00:00:00Z" level=error msg="target version [bad-version-0.00.00] parsing error: invalid Linkerd version: bad-version-0.00.00"
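The example log line uses a key=value layout. Assuming the operator's other log lines follow the same layout, the sketch below keeps only error-level lines and prints their msg values. The helper name operator_errors is hypothetical, and the pattern does not handle messages that contain double quotes.

```shell
# operator_errors is a hypothetical helper, not part of BEL.
# It reads operator logs on stdin and prints the msg value of every
# level=error line. Assumes the key=value layout of the example log line
# and a msg value without embedded double quotes.
operator_errors() {
  sed -n 's/.*level=error msg="\([^"]*\)".*/\1/p'
}

# Example usage:
#   kubectl --namespace=linkerd-buoyant logs deploy/linkerd-control-plane-operator \
#     --container=linkerd-control-plane-operator | operator_errors
```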

The linkerd-control-plane-operator retries a failed update with exponential backoff, so the next attempt may not happen for some time. If you have resolved the issue and want to retry the update immediately, restart the operator:

kubectl --namespace=linkerd-buoyant rollout restart deploy/linkerd-control-plane-operator
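After a restart, you can wait for the new operator pod to roll out and then check the ControlPlane status again. In this sketch, restart_operator is a hypothetical helper name and the 120-second rollout timeout is an arbitrary choice. The retried update may still be in progress when the status is printed, so rerun kubectl get controlplane until it settles.

```shell
# restart_operator is a hypothetical helper, not part of BEL.
# It restarts the operator, waits (up to an arbitrary 120s) for the rollout
# to finish, then prints the ControlPlane status. Each step runs only if the
# previous one succeeded.
restart_operator() {
  ns=linkerd-buoyant
  deploy=deploy/linkerd-control-plane-operator
  kubectl --namespace="$ns" rollout restart "$deploy" &&
    kubectl --namespace="$ns" rollout status "$deploy" --timeout=120s &&
    kubectl get controlplane
}

# Example usage:
#   restart_operator
```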

For more information about using these components, see the Lifecycle automation reference page.