High Availability Zonal Load Balancing (HAZL)

High Availability Zonal Load Balancing (HAZL) is a dynamic request-level load balancer in Buoyant Enterprise for Linkerd that balances HTTP and gRPC traffic in environments with multiple availability zones. For Kubernetes clusters deployed across multiple zones, HAZL can dramatically reduce cloud spend by minimizing cross-zone traffic.

Unlike other zone-aware options, including open source Linkerd, that rely on Kubernetes’s native Topology Aware Routing feature, HAZL never sacrifices reliability to achieve this cost reduction.

In multi-zone environments, HAZL can:

  • Cut cloud spend by eliminating cross-zone traffic both within and across cluster boundaries;
  • Improve system reliability by distributing traffic to additional zones as the system comes under stress;
  • Prevent failures before they happen by quickly reacting to increases in latency before the system begins to fail; and
  • Preserve zone affinity for cross-cluster calls, allowing for cost reduction in multi-cluster environments.

Like Linkerd itself, HAZL is designed to “just work”. It can be applied to any Kubernetes service that speaks HTTP or gRPC, regardless of the number of endpoints or how workloads and traffic are distributed across zones, and in the majority of cases requires no tuning or configuration.
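For illustration, the manifest below is a minimal sketch of the kind of unmodified workload HAZL can balance: an ordinary Deployment and Service meshed with Linkerd through the standard linkerd.io/inject annotation. All names and the image are hypothetical, and enabling HAZL itself is a Buoyant Enterprise for Linkerd control-plane setting that is not shown here.

```yaml
# A hypothetical HTTP workload, meshed with Linkerd via the standard
# proxy-injection annotation. Nothing zone-specific appears in the
# manifest itself.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payments                          # hypothetical name
spec:
  replicas: 6                             # replicas may land in any zone
  selector:
    matchLabels:
      app: payments
  template:
    metadata:
      labels:
        app: payments
      annotations:
        linkerd.io/inject: enabled        # standard Linkerd mesh injection
    spec:
      containers:
        - name: payments
          image: example.com/payments:latest   # hypothetical image
          ports:
            - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: payments
spec:
  selector:
    app: payments
  ports:
    - port: 80
      targetPort: 8080
```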

HAZL was designed in response to limitations seen by customers using Kubernetes’s native Topology Aware Routing feature (formerly Topology Aware Hints). These limitations are shared by native Kubernetes load balancing (kube-proxy) as well as systems such as open source Linkerd and Istio that rely on topology hints for their routing decisions.

Within these systems, the endpoints for each service are allocated ahead of time to specific zones by the Topology Aware Routing mechanism. This allocation is done at the Kubernetes API level, and attempts to assign each zone’s traffic to endpoints in the same zone (note that this behavior isn’t guaranteed, and the Topology Aware Routing mechanism may allocate endpoints from other zones). Once this allocation is done, it is static until endpoints are added or removed. It does not take into account traffic volumes, latency, or service health (except indirectly, if failing endpoints get removed via health checks).
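As a concrete point of comparison, Topology Aware Routing is opted into per Service with the service.kubernetes.io/topology-mode annotation (formerly service.kubernetes.io/topology-aware-hints), and the resulting allocation surfaces as static hints on the Service’s EndpointSlices. The sketch below uses hypothetical names and addresses.

```yaml
# Opting a Service into Kubernetes Topology Aware Routing.
apiVersion: v1
kind: Service
metadata:
  name: payments                          # hypothetical name
  annotations:
    service.kubernetes.io/topology-mode: Auto
spec:
  selector:
    app: payments
  ports:
    - port: 80
      targetPort: 8080
---
# A simplified EndpointSlice as the control plane might populate it. The
# "hints" block is the ahead-of-time, per-zone allocation described above;
# it only changes when endpoints are added or removed.
apiVersion: discovery.k8s.io/v1
kind: EndpointSlice
metadata:
  name: payments-abc12                    # hypothetical name
  labels:
    kubernetes.io/service-name: payments
addressType: IPv4
ports:
  - name: http
    port: 8080
    protocol: TCP
endpoints:
  - addresses: ["10.0.1.17"]              # hypothetical address
    zone: us-east-1a
    hints:
      forZones:
        - name: us-east-1a                # traffic is steered to this
                                          # endpoint only from us-east-1a
```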

Systems that make use of Topology Aware Routing, including Linkerd and Istio, use this allocation to decide where to send traffic. This accomplishes the goal of keeping traffic within a zone, but at the expense of reliability: Topology Aware Routing itself provides no mechanism for sending traffic across zones if reliability demands it. The closest approximation in (some of) these systems is a set of manual failover controls that allow the operator to fail traffic over to a new zone.
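One common shape for such a manual control is an SMI TrafficSplit whose weights an operator (or a failover controller) rewrites when a zone or cluster degrades. The sketch below is illustrative only: the service names are hypothetical, and the exact mechanism varies between systems.

```yaml
# Manual failover expressed as an SMI TrafficSplit: all traffic goes to
# the local backend until an operator (or failover controller) shifts the
# weights toward the remote backend.
apiVersion: split.smi-spec.io/v1alpha2
kind: TrafficSplit
metadata:
  name: payments-failover
spec:
  service: payments              # apex service that clients address
  backends:
    - service: payments-local    # in-zone / in-cluster backend
      weight: 100
    - service: payments-remote   # backend in another zone or cluster
      weight: 0                  # raised manually during failover
```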

Finally, Topology Aware Routing has a set of well-known constraints, including:

  • It does not work well for services where a large proportion of traffic originates from a subset of zones.
  • It does not take into account tolerations, unready nodes, or nodes that are marked as control plane or master nodes.
  • It does not work well with autoscaling. The autoscaler may not respond to increases in traffic, or may respond by adding endpoints in other zones (see the sketch after this list).
  • No affordance is made for cross-cluster traffic.
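
To make the autoscaling point concrete, the sketch below shows an ordinary HorizontalPodAutoscaler: it scales on an aggregate metric with no notion of zones, so the replicas it adds may be scheduled anywhere, not necessarily in the zones where the traffic increase originated. Names and thresholds are hypothetical.

```yaml
# A standard HPA scales purely on aggregate metrics; it has no zone
# awareness, so new replicas may be scheduled in any zone regardless of
# where the additional traffic is coming from.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: payments                 # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: payments
  minReplicas: 6
  maxReplicas: 30
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```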