AWS Load Balancer Controller (ALB)¶
Requirements¶
- AWS Load Balancer Controller v1.1.5 or greater
Overview¶
AWS Load Balancer Controller (also known as AWS ALB Ingress Controller) enables traffic management through an Ingress object, which configures an AWS Application Load Balancer (ALB) to route traffic to one or more Kubernetes services. ALBs provide advanced traffic splitting capability through the concept of weighted target groups. This feature is supported by the AWS Load Balancer Controller through annotations made to the Ingress object to configure "actions."
How it works¶
ALBs are configured via Listeners and Rules, which contain Actions. Listeners define how traffic from a client comes in, and Rules define how to handle those requests with various Actions. One type of Action allows users to forward traffic to multiple TargetGroups (with each being defined as a Kubernetes service). You can read more about ALB concepts here.
An Ingress which is managed by the AWS Load Balancer Controller controls an ALB's Listeners and Rules through the Ingress' annotations and spec. In order to split traffic among multiple target groups (e.g. different Kubernetes services), the AWS Load Balancer Controller looks for a specific "action" annotation on the Ingress, alb.ingress.kubernetes.io/actions.<service-name>. This annotation is injected and updated automatically by a Rollout during an update according to the desired traffic weights.
Usage¶
To configure a Rollout to use the ALB integration and split traffic between the canary and stable services during updates, the Rollout should be configured with the following fields:
apiVersion: argoproj.io/v1alpha1
kind: Rollout
...
spec:
  strategy:
    canary:
      # canaryService and stableService are references to Services which the Rollout will modify
      # to target the canary ReplicaSet and stable ReplicaSet respectively (required).
      canaryService: canary-service
      stableService: stable-service
      trafficRouting:
        alb:
          # The referenced ingress will be injected with a custom action annotation, directing
          # the AWS Load Balancer Controller to split traffic between the canary and stable
          # Service, according to the desired traffic weight (required).
          ingress: ingress
          # If you want to control multiple ingress resources you can use the ingresses field;
          # if ingresses is specified, the ingress field must be omitted.
          ingresses:
          - ingress-1
          - ingress-2
          # Reference to a Service that the Ingress must target in one of the rules (optional).
          # If omitted, uses canary.stableService.
          rootService: root-service
          # Service port is the port which the Service listens on (required).
          servicePort: 443
The referenced Ingress should be deployed with an ingress rule that matches the Rollout service:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ingress
  annotations:
    kubernetes.io/ingress.class: alb
spec:
  rules:
  - http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            # serviceName must match either: canary.trafficRouting.alb.rootService (if specified),
            # or canary.stableService (if rootService is omitted)
            name: root-service
            # servicePort must be the value: use-annotation
            # This instructs AWS Load Balancer Controller to look to annotations on how to direct traffic
            port:
              name: use-annotation
During an update, the rollout controller injects the alb.ingress.kubernetes.io/actions.<SERVICE-NAME> annotation, containing a JSON payload understood by the AWS Load Balancer Controller, directing it to split traffic between the canaryService and stableService according to the current canary weight.
The following shows the example Ingress after the rollout has injected the custom action annotation that splits traffic between the canary-service and stable-service, with traffic weights of 10 and 90 respectively:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ingress
  annotations:
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/actions.root-service: |
      {
        "Type":"forward",
        "ForwardConfig":{
          "TargetGroups":[
            {
              "Weight":10,
              "ServiceName":"canary-service",
              "ServicePort":"80"
            },
            {
              "Weight":90,
              "ServiceName":"stable-service",
              "ServicePort":"80"
            }
          ]
        }
      }
spec:
  rules:
  - http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: root-service
            port:
              name: use-annotation
Note
Argo Rollouts additionally injects an annotation, rollouts.argoproj.io/managed-alb-actions, into the Ingress for bookkeeping purposes. The annotation indicates which actions are being managed by the Rollout object (since multiple Rollouts can reference one Ingress). Upon rollout deletion, the rollout controller looks to this annotation to understand that the action is no longer managed, and resets it to point only to the stable service with a weight of 100.
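For illustration, such a bookkeeping annotation might look like the following. The exact value format is internal to the controller and version-dependent, and the Rollout and action names here are hypothetical:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ingress
  annotations:
    # Illustrative only: records that the root-service action is managed by the
    # Rollout named example-rollout (exact format is internal to the controller)
    rollouts.argoproj.io/managed-alb-actions: '{"root-service": "example-rollout"}'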
rootService¶
By default, a rollout will inject the alb.ingress.kubernetes.io/actions.<SERVICE-NAME> annotation using the service/action name specified under spec.strategy.canary.stableService. However, it may be desirable to specify an explicit service/action name different from the stableService. For example, one pattern is to use a single Ingress containing three different rules to reach the canary, stable, and root service separately (e.g. for testing purposes; a sketch of such an Ingress follows the example below). In this case, you may want to specify a "root" service as the service/action name instead of stable. To do so, reference a service under rootService under the alb specification:
apiVersion: argoproj.io/v1alpha1
kind: Rollout
spec:
  strategy:
    canary:
      canaryService: guestbook-canary
      stableService: guestbook-stable
      trafficRouting:
        alb:
          rootService: guestbook-root
...
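For completeness, here is a sketch of the single-Ingress, three-rule pattern mentioned above. The /canary and /stable paths and port 80 are illustrative assumptions; only the rule targeting the root service uses the use-annotation port managed by the Rollout:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ingress
  annotations:
    kubernetes.io/ingress.class: alb
spec:
  rules:
  - http:
      paths:
      # Direct access to the canary stack (illustrative, e.g. for testing)
      - path: /canary
        pathType: Prefix
        backend:
          service:
            name: guestbook-canary
            port:
              number: 80
      # Direct access to the stable stack (illustrative, e.g. for testing)
      - path: /stable
        pathType: Prefix
        backend:
          service:
            name: guestbook-stable
            port:
              number: 80
      # Weighted traffic split, managed by the Rollout through the root-service action
      - path: /
        pathType: Prefix
        backend:
          service:
            name: guestbook-root
            port:
              name: use-annotation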
Sticky session¶
Because at least two target groups (canary and stable) are used, target group stickiness requires additional configuration: sticky sessions must be activated on the target group via the stickinessConfig field:
apiVersion: argoproj.io/v1alpha1
kind: Rollout
spec:
  strategy:
    canary:
      ...
      trafficRouting:
        alb:
          stickinessConfig:
            enabled: true
            durationSeconds: 3600
...
More information can be found in the AWS ALB API documentation.
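For reference, the stickiness configuration is carried inside the injected action annotation. A minimal sketch of the resulting forward action; the field names follow the AWS ELBv2 ForwardConfig API, but the exact payload generated by the controller may differ:
alb.ingress.kubernetes.io/actions.root-service: |
  {
    "Type":"forward",
    "ForwardConfig":{
      "TargetGroupStickinessConfig":{
        "Enabled":true,
        "DurationSeconds":3600
      },
      "TargetGroups":[
        {"Weight":10,"ServiceName":"canary-service","ServicePort":"80"},
        {"Weight":90,"ServiceName":"stable-service","ServicePort":"80"}
      ]
    }
  }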
Zero-Downtime Updates with AWS TargetGroup Verification¶
Argo Rollouts contains two features to help ensure zero-downtime updates when used with the AWS LoadBalancer controller: TargetGroup IP verification and TargetGroup weight verification. Both features involve the Rollout controller performing additional safety checks against AWS, to verify that the changes made to the Ingress object are reflected in the underlying AWS TargetGroup.
TargetGroup IP Verification¶
Note
Target Group IP verification available since Argo Rollouts v1.1
The AWS LoadBalancer controller can run in one of two modes:
- Instance mode
- IP mode
TargetGroup IP Verification is only applicable when the AWS LoadBalancer controller is running in IP mode. When using the AWS LoadBalancer controller in IP mode (e.g. using the AWS CNI), the ALB LoadBalancer targets individual Pod IPs, as opposed to K8s node instances. Targeting Pod IPs comes with an increased risk of downtime during an update, because the Pod IPs behind the underlying AWS TargetGroup can more easily become outdated from the actual availability and status of pods, causing HTTP 502 errors when the TargetGroup points to pods which have already been scaled down.
To mitigate this risk, AWS recommends the use of pod readiness gate injection when running the AWS LoadBalancer in IP mode. Readiness gates allow for the AWS LoadBalancer controller to verify that TargetGroups are accurate before marking newly created Pods as "ready", preventing premature scale down of the older ReplicaSet.
Pod readiness gate injection uses a mutating webhook which decides to inject readiness gates when a pod is created based on the following conditions:
- There exists a service matching the pod labels in the same namespace
- There exists at least one target group binding that refers to the matching service
Another way to describe this is: the AWS LoadBalancer controller injects readiness gates onto Pods only if they are "reachable" from an ALB Ingress at the time of pod creation. A pod is considered reachable if an (ALB) Ingress references a Service which matches the pod labels. It ignores all other Pods.
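In AWS Load Balancer Controller v2.x, readiness gate injection is opted into per namespace via a label on the namespace. A minimal sketch (the namespace name is hypothetical):
apiVersion: v1
kind: Namespace
metadata:
  name: my-app  # hypothetical namespace running the Rollout pods
  labels:
    # Instructs the AWS Load Balancer Controller webhook to inject readiness gates
    # into eligible pods created in this namespace
    elbv2.k8s.aws/pod-readiness-gate-inject: enabled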
One challenge with this manner of pod readiness gate injection is that modifications to the Service selector labels (spec.selector) do not allow for the AWS LoadBalancer controller to inject the readiness gates, because by that time the Pod was already created (and readiness gates are immutable). Note that this is an issue when you change Service selectors of any ALB Service, not just ones involved in Argo Rollouts.
Because Argo Rollouts' blue-green strategy works by modifying the activeService selector to the new ReplicaSet labels during promotion, it suffers from the issue where readiness gates for the spec.strategy.blueGreen.activeService fail to be injected. This means there is a possibility of downtime in the following problematic scenario during an update from V1 to V2:
1. Update is triggered and the V2 ReplicaSet stack is scaled up
2. V2 ReplicaSet pods become fully available and ready to be promoted
3. Rollout promotes V2 by updating the label selectors of the active service to point to the V2 stack (from V1)
4. Due to unknown issues (e.g. AWS load balancer controller downtime, AWS rate limiting), registration of the V2 Pod IPs to the TargetGroup does not happen or is delayed
5. V1 ReplicaSet is scaled down to complete the update
After step 5, when the V1 ReplicaSet is scaled down, the outdated TargetGroup would still be pointing to the V1 Pod IPs which no longer exist, causing downtime.
To allow for zero-downtime updates, Argo Rollouts has the ability to perform TargetGroup IP verification as an additional safety measure during an update. When this feature is enabled, whenever a service selector modification is made, the Rollout controller blocks progression of the update until it can verify the TargetGroup is accurately targeting the new Pod IPs of the bluegreen.activeService. Verification is achieved by querying AWS APIs to describe the underlying TargetGroup, iterating its registered IPs, and ensuring all Pod IPs of the activeService's Endpoints list are registered in the TargetGroup. Verification must succeed before running postPromotionAnalysis or scaling down the old ReplicaSet.
Similarly for the canary strategy, after updating the canary.stableService selector labels to point to the new ReplicaSet, the TargetGroup IP verification feature allows the controller to block the scale down of the old ReplicaSet until it verifies the Pod IPs behind the stableService TargetGroup are accurate.
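Conceptually, this check corresponds to describing the TargetGroup's health and comparing the registered IP targets against the Service's Endpoints. An abbreviated, hypothetical ELBv2 DescribeTargetHealth response in IP mode (the Pod IPs are illustrative):
{
  "TargetHealthDescriptions": [
    {
      "Target": { "Id": "10.0.1.23", "Port": 80 },
      "TargetHealth": { "State": "healthy" }
    },
    {
      "Target": { "Id": "10.0.2.45", "Port": 80 },
      "TargetHealth": { "State": "healthy" }
    }
  ]
}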
TargetGroup Weight Verification¶
Note
TargetGroup weight verification available since Argo Rollouts v1.0
TargetGroup weight verification addresses a similar problem to TargetGroup IP verification, but instead of verifying that the Pod IPs of a service are reflected accurately in the TargetGroup, the controller verifies that the traffic weights are accurate with respect to what was set in the ingress annotations. Weight verification is applicable to AWS LoadBalancer controllers which are running either in IP mode or Instance mode.
After Argo Rollouts adjusts a canary weight by updating the Ingress annotation, it moves on to the next step. However, due to external factors (e.g. AWS rate limiting, AWS load balancer controller downtime) it is possible that the weight modifications made to the Ingress did not take effect in the underlying TargetGroup. This is potentially dangerous as the controller will believe it is safe to scale down the old stable stack when in reality, the outdated TargetGroup may still be pointing to it.
Using the TargetGroup weight verification feature, the rollout controller will additionally verify the canary weight after a setWeight canary step. It accomplishes this by querying AWS LoadBalancer APIs directly, to confirm that the Rules, Actions, and TargetGroups reflect the intent of the Ingress annotation.
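Conceptually, this corresponds to describing the listener Rules and checking the weighted forward action. An abbreviated, hypothetical ELBv2 DescribeRules response (ARNs elided; the exact fields inspected by the controller may differ):
{
  "Rules": [
    {
      "Actions": [
        {
          "Type": "forward",
          "ForwardConfig": {
            "TargetGroups": [
              { "TargetGroupArn": "arn:aws:elasticloadbalancing:...", "Weight": 10 },
              { "TargetGroupArn": "arn:aws:elasticloadbalancing:...", "Weight": 90 }
            ]
          }
        }
      ]
    }
  ]
}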
Usage¶
To enable AWS target group verification, add the --aws-verify-target-group flag to the rollout-controller flags:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: argo-rollouts
spec:
  template:
    spec:
      containers:
      - name: argo-rollouts
        args: [--aws-verify-target-group]
        # NOTE: in v1.0, the --alb-verify-weight flag should be used instead
For this feature to work, the argo-rollouts deployment requires the following AWS API permissions under the Elastic Load Balancing API:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "elasticloadbalancing:DescribeTargetGroups",
        "elasticloadbalancing:DescribeLoadBalancers",
        "elasticloadbalancing:DescribeListeners",
        "elasticloadbalancing:DescribeRules",
        "elasticloadbalancing:DescribeTags",
        "elasticloadbalancing:DescribeTargetHealth"
      ],
      "Resource": "*",
      "Effect": "Allow"
    }
  ]
}
There are various ways of granting AWS privileges to the argo-rollouts pods, which is highly dependent on your cluster's AWS environment, and out of scope for this documentation. Some solutions include:
- AWS access and secret keys
- kiam
- kube2iam
- EKS ServiceAccount IAM Roles
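For example, with EKS IAM Roles for Service Accounts (IRSA), the policy above can be attached to an IAM role that the argo-rollouts ServiceAccount assumes. A sketch, in which the account ID and role name are hypothetical:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: argo-rollouts
  namespace: argo-rollouts
  annotations:
    # IRSA: associates the ServiceAccount with an IAM role carrying the ELB Describe* permissions
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/argo-rollouts-elb-describe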
Zero-Downtime Updates with Ping-Pong feature¶
The previous section described the approach recommended by AWS to solve the zero-downtime issue: pod readiness gate injection when running the AWS LoadBalancer in IP mode. The challenge with that approach is that modifications to the Service selector labels (spec.selector) do not allow the AWS LoadBalancer controller to mutate the readiness gates. The Ping-Pong feature helps to deal with that challenge. At any given moment, one of the services (e.g. ping) "wears the hat" of the stable service while the other (e.g. pong) "wears the hat" of the canary. At the end of the promotion step, 100% of traffic is sent to the "canary" (e.g. pong). Then the Rollout swaps the hats of the ping and pong services, so that pong becomes the stable one. The Rollout status object records which of ping or pong is currently stable (status.canary.currentPingPong). This approach allows the rollout to use pod readiness gate injection, as the services do not change their selector labels at the end of the rollout progress.
Important
Ping-Pong feature available since Argo Rollouts v1.2
Example¶
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: example-rollout
spec:
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.15.4
        ports:
        - containerPort: 80
  strategy:
    canary:
      pingPong: # Indicates that ping-pong services are enabled
        pingService: ping-service
        pongService: pong-service
      trafficRouting:
        alb:
          ingress: alb-ingress
          servicePort: 80
      steps:
      - setWeight: 20
      - pause: {}
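With ping-pong, the injected action annotation references the ping and pong Services directly. A hypothetical sketch, assuming the Ingress rule targets a root-service action via use-annotation (as in the earlier examples) and the pong service currently plays the canary role at the 20 weight set above:
alb.ingress.kubernetes.io/actions.root-service: |
  {
    "Type":"forward",
    "ForwardConfig":{
      "TargetGroups":[
        {"Weight":20,"ServiceName":"pong-service","ServicePort":"80"},
        {"Weight":80,"ServiceName":"ping-service","ServicePort":"80"}
      ]
    }
  }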
Custom annotations-prefix¶
The AWS Load Balancer Controller allows users to customize the annotation prefix used by the Ingress controller using a flag to the controller, --annotations-prefix (from the default of alb.ingress.kubernetes.io). If your AWS Load Balancer Controller is customized to use a different annotation prefix, the annotationPrefix field should be specified such that the Ingress object will be annotated in a manner understood by the cluster's AWS Load Balancer Controller.
apiVersion: argoproj.io/v1alpha1
kind: Rollout
spec:
  strategy:
    canary:
      trafficRouting:
        alb:
          annotationPrefix: custom.alb.ingress.kubernetes.io
Custom Ingress Class¶
By default, Argo Rollouts will operate on Ingresses with the annotation:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: alb
Or with the ingressClassName:
apiVersion: networking.k8s.io/v1
kind: Ingress
spec:
  ingressClassName: alb
To configure the controller to operate on Ingresses with a different class name, you can specify a different value through the --alb-ingress-classes flag in the controller command line arguments.
Note that the --alb-ingress-classes flag can be specified multiple times if the Argo Rollouts controller should operate on multiple values. This may be desired when a cluster has multiple Ingress controllers that operate on different kubernetes.io/ingress.class or spec.ingressClassName values.
If the controller needs to operate on any Ingress without the kubernetes.io/ingress.class annotation or spec.ingressClassName, the flag can be specified with an empty string (e.g. --alb-ingress-classes '').
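For example, to have the Argo Rollouts controller also operate on a custom class (my-alb-class is illustrative), the flag can be added alongside the other controller arguments:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: argo-rollouts
spec:
  template:
    spec:
      containers:
      - name: argo-rollouts
        args: [--alb-ingress-classes, my-alb-class]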