Enabling High Availability Zonal Load Balancing (HAZL)

Buoyant Enterprise for Linkerd provides an opt-in High Availability Zonal Load Balancing (HAZL) option that balances HTTP and gRPC traffic with built-in zone awareness in environments with multiple availability zones.

This feature is disabled by default and must be explicitly enabled in the configuration.

  • To preserve zone locality in multicluster deployments, HAZL currently requires pod-to-pod multicluster; it does not work with gateway-based multicluster deployments.
  • HAZL-enabled workloads dynamically shift traffic out of the current zone whenever the incoming request load exceeds what the in-zone endpoints can handle and threatens the success rate of requests.

Your cluster will need a minimum of three nodes, each carrying one of the following labels:

  • topology.kubernetes.io/zone=zone-0
  • topology.kubernetes.io/zone=zone-1
  • topology.kubernetes.io/zone=zone-2

At least one node must carry each of these labels for the colorz application to deploy properly.
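A quick way to check the label requirement is to list the zones that no node carries yet. The following is a minimal sketch, not part of the Buoyant tooling: the helper name missing_zones is hypothetical, and it assumes the zone value is the last column of the node listing, which holds for kubectl get nodes -L when every node has the label.

```shell
# Print each expected zone (zone-0 .. zone-2) that no node currently carries.
# Reads `kubectl get nodes -L topology.kubernetes.io/zone --no-headers` on stdin,
# where the zone label value is the last column of each row.
missing_zones() {
    awk '
        { seen[$NF] = 1 }
        END {
            for (i = 0; i <= 2; i++) {
                zone = "zone-" i
                if (!(zone in seen)) print zone
            }
        }'
}

# Usage against a live cluster (prints nothing when all three zones are covered):
#   kubectl get nodes -L topology.kubernetes.io/zone --no-headers | missing_zones
# To label a node, replace the <node-name> placeholder with a real node name:
#   kubectl label node <node-name> topology.kubernetes.io/zone=zone-0
```

If the helper prints a zone name, label a node for that zone before deploying colorz.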

If you need a cluster with Buoyant Enterprise for Linkerd installed, you can use k3d to deploy one using the assets and instructions in the Deploying BEL with HAZL GitHub repository. Following the instructions in the repository, the cluster will have properly labeled nodes for the three zones, with Buoyant Enterprise for Linkerd deployed.

If you do not already have meshed workloads on your cluster, you can install and mesh the sample colorz app to test HAZL:

kubectl apply -k https://github.com/BuoyantIO/service-mesh-academy/deploying-bel-with-hazl/colorz/

Confirm all pods are running, meshed, and distributed across different nodes by running the following command:

kubectl get pods -n colorz -o wide

You should see the color pods distributed one-per-node, with the brush pod sharing a node with one of the color pods.
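The "running and meshed" check can also be scripted rather than read by eye. This sketch assumes each colorz pod has a single application container, so a meshed pod reports 2/2 in the READY column (as in the sample listing later on this page); the helper name not_ready_pods is hypothetical.

```shell
# Print the name of every pod that is not Running, not fully ready,
# or has fewer than two containers (and so is presumably not meshed).
# Reads `kubectl get pods -n colorz -o wide --no-headers` on stdin.
not_ready_pods() {
    awk '{
        split($2, ready, "/")    # READY column, e.g. 2/2
        if ($3 != "Running" || ready[1] + 0 != ready[2] + 0 || ready[2] + 0 < 2)
            print $1
    }'
}

# Usage against a live cluster (prints nothing when every pod is healthy and meshed):
#   kubectl get pods -n colorz -o wide --no-headers | not_ready_pods
```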

Before we enable HAZL, we need to gather baseline zone distribution metrics to compare with the metrics after HAZL activation. To do so, run the following command:

linkerd dg proxy-metrics deploy/brush -n colorz | \
grep 'request_total' | \
awk -F'[ ,}]' '{
    pod="";
    zone="";
    requests=0;
    for(i=1;i<=NF;i++) {
        if($i ~ /^dst_pod=/) pod=substr($i, 9);
        else if($i ~ /^dst_zone=/) zone=substr($i, 10);
        else if($i ~ /^[0-9]+$/) requests=$i
    }
    if($0 ~ /direction="outbound"/ && $0 ~ /tls="true"/ && pod != "" && zone != "")
        print "Pod: " pod " | Zone: " zone ": " requests
}'

You should see output similar to this:

Pod: "green-8f8d79cb8-wmnr4" | Zone: "zone-1": 72483
Pod: "blue-657dcc787c-kgpsb" | Zone: "zone-2": 69010
Pod: "red-6d6b7c4f8f-ktg65" | Zone: "zone-0": 66588

Note: If you’re using your own application for testing you will need to adjust the deployment name and namespace in the command above to reference your client’s deployment name and namespace.
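To make the baseline easier to compare later, the per-pod counts can be reduced to a percentage per zone. The helper below is an illustrative sketch (zone_share is a hypothetical name) that consumes the "Pod: ... | Zone: ...: N" lines printed by the command above.

```shell
# Summarize per-pod request counts into a percentage of requests per zone.
# Reads lines such as:  Pod: "green-..." | Zone: "zone-1": 72483
zone_share() {
    awk '
        NF >= 2 {
            zone = $(NF - 1)            # e.g. "zone-1":
            gsub(/[":]/, "", zone)      # strip the quotes and trailing colon
            count[zone] += $NF
            total += $NF
        }
        END {
            for (z in count) printf "%s %.1f%%\n", z, 100 * count[z] / total
        }' | sort
}

# Usage: pipe the output of the metrics command above into the helper, e.g.
#   <metrics command> | zone_share
```

For the sample output above, this reports roughly one third of the requests going to each zone, which is the expected picture without HAZL.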

To enable HAZL you will need to modify your Helm values file (if deployed with Helm) or your ControlPlane CRD (if deployed using the Buoyant Lifecycle Operator) like so:

controlPlaneConfig:
  ...
  destinationController:
    additionalArgs:
    - -ext-endpoint-zone-weights # Enables HAZL. Both dashes are intentional: the first is the YAML list marker, the second belongs to the flag
  proxy:
    image:
      version: enterprise-2.15.2
    additionalEnv:
      - name: BUOYANT_BALANCER_LOAD_LOW
        value: "0.8"
      - name: BUOYANT_BALANCER_LOAD_HIGH
        value: "2.0"

Save the file and apply it to your cluster. Once the new configuration is applied, wait for the updated control plane to come up, then run the following to restart the application proxies and reset metrics:

kubectl rollout restart deployment blue red green -n colorz && sleep 30 && kubectl rollout restart deploy brush -n colorz

You will not have to restart your application proxies in an actual production environment after the HAZL flag is enabled and the control plane is rolled. We restart them here, in a specific order, only to make the difference in request distribution clearly visible in the metrics we’re comparing; it is not a requirement for HAZL itself.
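If the fixed 30-second pause proves too short or too long on your cluster, one alternative is to wait on rollout status before restarting the client. This is a sketch of that variation, not the documented procedure: the function name restart_in_order and the default timeout are assumptions, while kubectl rollout restart and kubectl rollout status are standard kubectl subcommands.

```shell
# Restart the colorz backends, wait until each rollout completes,
# then restart the brush client so its metrics start from zero.
# $1: namespace (defaults to colorz); $2: per-deployment timeout (illustrative default).
restart_in_order() {
    namespace="${1:-colorz}"
    timeout="${2:-120s}"
    kubectl rollout restart deployment blue red green -n "$namespace" || return 1
    for name in blue red green; do
        kubectl rollout status "deployment/$name" -n "$namespace" --timeout="$timeout" || return 1
    done
    kubectl rollout restart deployment brush -n "$namespace"
}

# Usage:
#   restart_in_order colorz 120s
```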

To confirm the settings have been applied, verify that new control plane components have been created, then check the running configuration for the ext-endpoint-zone-weights flag like so:

kubectl get configmap linkerd-config -n linkerd -o yaml | grep 'ext-endpoint-zone-weights'

After confirming that the configuration has been applied, check the new application pod distribution with the following command:

kubectl get pods -n colorz -o wide

Note which color pod shares a node with the new brush pod this time.

NAME                     READY   STATUS    RESTARTS   AGE    IP           NODE                       NOMINATED NODE   READINESS GATES
blue-5bbc87784c-bt4nd    2/2     Running   0          100s   10.42.2.11   k3d-demo-cluster-agent-2   <none>           <none>
red-6774947bbc-kvjn9     2/2     Running   0          100s   10.42.0.16   k3d-demo-cluster-agent-0   <none>           <none>
green-f4dccb54b-rpxpv    2/2     Running   0          100s   10.42.1.15   k3d-demo-cluster-agent-1   <none>           <none>
brush-6dd6db689f-x4474   2/2     Running   0          70s    10.42.0.17   k3d-demo-cluster-agent-0   <none>           <none>
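Finding the pod that shares a node with brush can be automated as well. The sketch below assumes the default -o wide column order, with NODE as the seventh column and RESTARTS shown as a bare number; pods_sharing_node_with is a hypothetical helper name.

```shell
# Print the pods that run on the same node as the given client deployment.
# $1: client name prefix, e.g. brush
# Reads `kubectl get pods -n colorz -o wide --no-headers` on stdin.
pods_sharing_node_with() {
    awk -v client="$1" '
        { node[$1] = $7 }                                   # pod name -> node name
        index($1, client "-") == 1 { client_node = $7 }     # remember the client node
        END {
            for (pod in node)
                if (node[pod] == client_node && index(pod, client "-") != 1)
                    print pod
        }' | sort
}

# Usage against a live cluster:
#   kubectl get pods -n colorz -o wide --no-headers | pods_sharing_node_with brush
```

For the listing above, this prints the red pod, which runs on k3d-demo-cluster-agent-0 alongside brush.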

We can now dump the metrics again and compare them with the pre-HAZL baseline we gathered previously:

linkerd dg proxy-metrics deploy/brush -n colorz | \
grep 'request_total' | \
awk -F'[ ,}]' '{
    pod="";
    zone="";
    requests=0;
    for(i=1;i<=NF;i++) {
        if($i ~ /^dst_pod=/) pod=substr($i, 9);
        else if($i ~ /^dst_zone=/) zone=substr($i, 10);
        else if($i ~ /^[0-9]+$/) requests=$i
    }
    if($0 ~ /direction="outbound"/ && $0 ~ /tls="true"/ && pod != "" && zone != "")
        print "Pod: " pod " | Zone: " zone ": " requests
}'

You should see output similar to this:

Pod: "red-6774947bbc-kvjn9" | Zone: "zone-0": 3749

With HAZL enabled, all requests now route to the in-zone backend endpoint (here the red pod in zone-0, which shares a node with brush), and cross-zone traffic has been eliminated.
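To put a number on that claim, you can total the requests that left the client's zone. This is a small sketch under the assumption that you know the client's zone (zone-0 in the example output); cross_zone_requests is a hypothetical helper that reads the same "Pod: ... | Zone: ...: N" lines as before.

```shell
# Sum the requests sent to pods outside the client's own zone.
# $1: the client's zone, e.g. zone-0
# Reads lines such as:  Pod: "red-..." | Zone: "zone-0": 3749
cross_zone_requests() {
    awk -v local_zone="$1" '
        NF >= 2 {
            zone = $(NF - 1)
            gsub(/[":]/, "", zone)      # strip the quotes and trailing colon
            if (zone != local_zone) cross += $NF
        }
        END { print cross + 0 }'
}

# Usage: pipe the output of the metrics command above into the helper, e.g.
#   <metrics command> | cross_zone_requests zone-0
```

With HAZL active and the in-zone endpoint healthy, the result should be 0; a non-zero value after the restart would suggest that traffic is still leaving the zone.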

For more information about using this feature, see the HAZL reference page.