Monitoring Linkerd

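Central to monitoring Linkerd’s health is monitoring its metrics. Since the Linkerd control plane components are themselves part of the mesh (each runs alongside its own linkerd-proxy container), they report the same metrics as any other meshed workload, and you can use the same metrics pipeline you’ve already set up.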

As a starting point, we recommend monitoring:

  • Existence of control plane components. Each component needs to be running in order for Linkerd to function.
  • Success rate of control plane components. This should never drop below 100%; any failure responses are a sign that something is going wrong.
  • Latency of control plane components. Alert thresholds should be set empirically, and unexpected changes should be investigated.
  • Optionally, resource consumption of control plane components. This also requires tuning, as some components scale in memory and CPU usage with the overall level of traffic passing through the mesh. However, rapid changes are worth investigating, and consumption that approaches any resource limit should be addressed before it becomes a problem.
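The first three checks in this list can be sketched as a small evaluation function. This is only a minimal illustration, assuming the metrics have already been pulled from your pipeline into plain values. The ComponentSample type, its field names, and the latency tolerance are hypothetical and not part of Linkerd.

```python
from dataclasses import dataclass


@dataclass
class ComponentSample:
    """Hypothetical snapshot of one control plane component's metrics."""
    name: str
    ready_replicas: int
    success_count: int
    failure_count: int
    p99_latency_ms: float


def check_component(sample, baseline_p99_ms, latency_tolerance=2.0):
    """Return a list of human-readable problems for one component.

    baseline_p99_ms is an empirically chosen baseline; latency_tolerance
    is an illustrative multiplier, not a recommended value.
    """
    problems = []

    # Existence: the component must have at least one ready replica.
    if sample.ready_replicas < 1:
        problems.append(f"{sample.name}: no ready replicas")

    # Success rate: any failure at all means the rate is below 100%.
    total = sample.success_count + sample.failure_count
    if total > 0 and sample.failure_count > 0:
        rate = sample.success_count / total
        problems.append(f"{sample.name}: success rate {rate:.2%} is below 100%")

    # Latency: flag large deviations from the empirical baseline.
    if sample.p99_latency_ms > baseline_p99_ms * latency_tolerance:
        problems.append(
            f"{sample.name}: p99 latency {sample.p99_latency_ms} ms "
            f"exceeds {latency_tolerance}x the baseline"
        )
    return problems
```

In practice these rules would more likely live in your alerting system than in application code; the function only makes the logic of each check explicit.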

Monitoring of Linkerd’s proxies should focus primarily on resource usage, since the golden metrics they report will be those of the application pod. As with control plane components, the exact thresholds depend on the traffic to the pod, so alerting should focus on rapid changes or on situations where consumption approaches resource limits.
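The two alert conditions described here (rapid change, and consumption approaching a limit) can be expressed as a short check. The sketch below assumes you already have two consecutive usage samples for one resource. The function name and the near_limit and max_growth defaults are hypothetical and would need tuning for real traffic.

```python
def resource_alerts(previous, current, limit, near_limit=0.8, max_growth=0.5):
    """Return alerts for one resource, for example memory in bytes.

    previous, current: usage at two consecutive sample times.
    limit: the configured resource limit, or None if no limit is set.
    near_limit, max_growth: illustrative thresholds, not recommendations.
    """
    alerts = []

    # Consumption approaching the configured limit.
    if limit is not None and current >= near_limit * limit:
        alerts.append("approaching limit")

    # Rapid relative change between samples, in either direction.
    if previous > 0 and abs(current - previous) / previous > max_growth:
        alerts.append("rapid change")

    return alerts
```

Comparing against the previous sample rather than a fixed value reflects the point that absolute thresholds depend on each pod's traffic.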

You can view logs from Linkerd’s control plane or data plane through the usual kubectl logs command. For the control plane, both the main container and the linkerd-proxy container for each pod may deliver usable information.
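As a small illustration of selecting either container, the sketch below builds the kubectl logs invocation for a control plane pod. It assumes the control plane is installed in the linkerd namespace; the helper names are hypothetical, and the pod name must be supplied by the caller.

```python
import subprocess


def logs_command(pod, namespace="linkerd", container=None):
    """Build the argument list for `kubectl logs` on one pod.

    The "linkerd" namespace default is an assumption about the install.
    """
    cmd = ["kubectl", "logs", "--namespace", namespace, pod]
    if container is not None:
        # For example "linkerd-proxy" to read the proxy's logs.
        cmd += ["--container", container]
    return cmd


def fetch_logs(pod, namespace="linkerd", container=None):
    """Run kubectl and return the log text (requires cluster access)."""
    result = subprocess.run(
        logs_command(pod, namespace, container),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

Calling logs_command with container="linkerd-proxy" targets the proxy container, while naming the pod's main container instead targets the component itself.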

By default, the control plane logs at the INFO level, which surfaces various events of interest, plus warnings and errors. For diagnostic purposes, it may be helpful to raise the log level to DEBUG.

Similarly, by default, the proxy’s log level is set to INFO. The log level of a proxy can be modified at runtime if necessary. Note that the DEBUG level can be extremely verbose, especially for high-traffic proxies. Take care to change the level back to INFO after debugging, especially in environments where increased log volume may have a financial impact!
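One way a runtime change might look is sketched below. It assumes that the proxy's admin port has been forwarded to localhost:4191 (for example with kubectl port-forward) and that the admin server accepts a PUT to /proxy-log-level with the new level as the request body. Treat the port, the path, and the level strings as assumptions to verify against the Linkerd documentation for your version; the helper names are hypothetical.

```python
import urllib.request


def log_level_request(level, admin_url="http://localhost:4191"):
    """Build, but do not send, a PUT request carrying the new log level.

    admin_url and the /proxy-log-level path are assumptions; check them
    against the documentation for your Linkerd version.
    """
    return urllib.request.Request(
        url=f"{admin_url}/proxy-log-level",
        data=level.encode("utf-8"),
        method="PUT",
    )


def set_log_level(level, admin_url="http://localhost:4191"):
    """Send the request and return the HTTP status code."""
    request = log_level_request(level, admin_url)
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status
```

Under these assumptions, a debugging session would call set_log_level with a more verbose level, reproduce the problem, and then call it again to restore the original level so that log volume returns to normal.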

Buoyant Cloud users can use the Send Diagnostics feature to send metrics and log information directly to Buoyant for debugging purposes.