Horizontal Pod Autoscaling¶
Argo Rollouts supports autoscaling using the standard Kubernetes Horizontal Pod Autoscaler (HPA) to manage the number of pods during a progressive rollout based on application load.
HPA Support in Argo Rollouts¶
Argo Rollouts works with the Horizontal Pod Autoscaler (HPA) through the autoscaling/v2beta2 API or the stable autoscaling/v2 API (available in Kubernetes 1.23+). These APIs provide the functionality for the HPA to target and scale custom resources, including the Argo Rollout object.
How HPA Works with Argo Rollouts¶
Since version v0.3.0, Argo Rollouts exposes its /scale subresource in the same way as a standard Kubernetes Deployment does. This allows the Horizontal Pod Autoscaler (HPA) to discover the Rollout resource. The HPA accesses the /scale subresource to get the current number of replicas from the status.replicas field of the Rollout.
Based on the metrics it monitors (e.g., CPU, memory, or custom metrics), the HPA decides whether scaling is needed. If it is, the HPA writes the new desired replica count to the spec.replicas field of the Rollout via the same /scale subresource.
The HPA does not directly manage the number of replicas or ReplicaSets. Instead, it modifies the spec.replicas field of the Rollout resource to set the desired number of replicas. The Argo Rollouts controller then detects the change to spec.replicas and instructs the ReplicaSets to create or delete pods accordingly.
In short, the HPA makes scaling decisions by determining the total desired number of pods, while Argo Rollouts carries out the deployment changes, allocating pods according to the configured rollout strategy and properties.
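For illustration, this is roughly what the Scale object exposed by a Rollout's /scale subresource contains (the name rollout-hpa-example is taken from the example configuration below; the exact contents depend on your cluster and controller version):

# Sketch of the Scale object served by the Rollout's /scale subresource,
# e.g. via: kubectl get --raw /apis/argoproj.io/v1alpha1/namespaces/default/rollouts/rollout-hpa-example/scale
apiVersion: autoscaling/v1
kind: Scale
metadata:
  name: rollout-hpa-example
  namespace: default
spec:
  replicas: 3          # desired replica count -- the field the HPA writes
status:
  replicas: 3          # currently observed replicas reported by the Rollout controller
  selector: app=demo   # label selector the HPA uses to find the Rollout's pods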
Example Configuration:¶
The following YAML provides a base configuration that includes a Rollout, two Services (for blue/green or canary), and an HPA for autoscaling. Adapt this configuration for different deployment strategies and scenarios as explained in the subsequent sections.
1. Rollout¶
The core Argo Rollout object defines deployment strategy (e.g., blue/green or canary) and replaces the Kubernetes Deployment. The HPA targets this object to manage the total desired replica count.
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: rollout-hpa-example
spec:
  replicas: 3 # HPA will scale this between min/max replicas
  selector:
    matchLabels:
      app: demo
  template:
    metadata:
      labels:
        app: demo
    spec:
      containers:
        - name: demo
          image: argoproj/rollouts-demo:blue
          ports:
            - containerPort: 8080
  strategy:
    blueGreen: # Replace with either blueGreen or canary (configured in the sections below)
      autoPromotionEnabled: false
2. Service¶
Two Kubernetes Services are required: a stable service for live traffic and a preview service to test the new version before a full promotion.
apiVersion: v1
kind: Service
metadata:
  name: argo-rollouts-stable-service
spec:
  ports:
    - port: 80
      targetPort: 8080
  selector:
    app: demo
---
apiVersion: v1
kind: Service
metadata:
  name: argo-rollouts-preview-service
spec:
  ports:
    - port: 80
      targetPort: 8080
  selector:
    app: demo
3. HorizontalPodAutoscaler¶
The HPA targets a Rollout for scaling, monitors its pods' average metrics, and adjusts the total desired pod count.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: demo-hpa
  labels:
    app: demo
spec:
  minReplicas: 1  # The min and max number of pods the HPA can scale between
  maxReplicas: 10
  scaleTargetRef: # The HPA targets the Rollout object for scaling
    apiVersion: argoproj.io/v1alpha1
    kind: Rollout
    name: rollout-hpa-example
  metrics: # Defines the scaling trigger
    - type: Resource
      resource:
        name: memory
        target:
          type: AverageValue
          averageValue: 16Mi
Blue/Green Deployments with HPA¶
During a Blue/Green deployment, the Rollout controller manages two ReplicaSets: an active one (Blue) for the current version and a preview one (Green) for the new version. When the HPA scales the deployment up or down, it updates the total desired replica count in the Rollout's spec.replicas field. The Rollout controller detects this change and applies it to both the active and preview ReplicaSets, so both versions scale in unison and maintain equal pod counts. For example, if the HPA sets spec.replicas to 6, the active and preview ReplicaSets each run 6 pods (12 pods in total) until promotion.
After full promotion, the preview ReplicaSet becomes the new active ReplicaSet, and the old active ReplicaSet scales to zero. From then on, the Rollout manages only the single active ReplicaSet, and any further HPA scaling adjustments will be applied only to this active ReplicaSet.
To implement a Blue/Green deployment, add the following strategy configuration to the base Rollout resource:
strategy:
  blueGreen:
    previewService: argo-rollouts-preview-service
    activeService: argo-rollouts-stable-service
    autoPromotionEnabled: false
To pin the preview ReplicaSet to a fixed size instead of scaling it together with the active one, use the previewReplicaCount property described in the next section.
Blue/Green with previewReplicaCount¶
When using the previewReplicaCount field in the Blue/Green strategy, the Rollout controller will change the stable ReplicaSet count as instructed by the HPA, while keeping the preview ReplicaSet pinned to the number of pods specified in the previewReplicaCount field of the Rollout manifest.
After the rollout is fully promoted, the preview version becomes the new stable ReplicaSet. At that point, the HPA can scale the number of pods in the stable version up or down.
strategy:
  blueGreen:
    previewService: argo-rollouts-preview-service
    activeService: argo-rollouts-stable-service
    previewReplicaCount: 1 # Pins the number of pods in the preview
    autoPromotionEnabled: false
Setting previewReplicaCount prevents the HPA from scaling the preview ReplicaSet, keeping it fixed during testing; only the stable ReplicaSet responds to HPA scaling decisions. Omitting previewReplicaCount allows the HPA to scale both the stable and preview ReplicaSets in unison, which incurs additional resource usage and cost.
Warning: Use previewReplicaCount with caution, as it always takes precedence regardless of the current load or HPA scaling decisions.
Canary Deployments with HPA¶
Unlike a Blue/Green deployment, autoscaling a Canary deployment is more complex because traffic is split between two active versions: canary and stable.
Canary without Traffic Manager (setWeight only)¶
Without a traffic manager, Kubernetes Services distribute traffic evenly across all available pods, so the Rollout controller uses the pod count to split traffic between the preview (canary) and stable versions. As a result, if 20% of the pods belong to the canary, they receive approximately 20% of the traffic.
As the application load changes, the HPA scales the total number of pods up or down. During promotion, when the setWeight increases (e.g., 20% -> 50%), the Rollout controller decreases the stable pods and increases the canary pods to match the new ratio.
The HPA identifies which pods to monitor through a label selector (e.g., app: demo). Since the pods of both the stable and canary ReplicaSets share this label, the HPA sees them as a single group and calculates the average metric value across all of these pods combined.
The HPA calculates the desired number of replicas using this formula:
desired_replicas = ⌈ current_replicas x (current_metric_value / desired_metric_value) ⌉
For example, with 5 current replicas, a measured average of 32Mi, and a target of 16Mi (as configured in the HPA above), the HPA requests ⌈ 5 x (32 / 16) ⌉ = 10 replicas.
The Rollout controller then decides how the total desired replicas should be distributed based on the setWeight defined in the Rollout strategy. It calculates the number of canary and stable pods using these formulas:
canary_replicas = ⌈ HPA_total_desired_replicas x (setWeight / 100)⌉
stable_replicas = HPA_total_desired_replicas - canary_replicas
Example with setWeight: 20¶
When the HPA decides that 10 pods are needed to handle the current load, the Rollout controller sets:
- Canary pods: ceil(10 * 0.2) = 2 pods
- Stable pods: 10 - 2 = 8 pods
If the load grows and the HPA raises the total to 20 pods, the controller sets:
- Canary pods: ceil(20 * 0.2) = 4 pods
- Stable pods: 20 - 4 = 16 pods
The setWeight property is defined within the strategy.canary.steps field of the Rollout manifest.
strategy:
  canary:
    canaryService: canary-service
    stableService: stable-service
    steps:
      - setWeight: 20
      - pause: {}
      - setWeight: 50
      - pause: {}
      - setWeight: 100
      - pause: {}
Canary with Traffic Manager¶
When using a traffic manager (e.g., Traefik, Istio, or an Ingress controller), the responsibility for traffic splitting shifts from Argo Rollouts to the traffic manager. Instead of adjusting pod counts, the Rollouts controller updates the traffic manager's configuration at each setWeight step, specifying what percentage of traffic should go to the new canary version.
At each setWeight step, the controller edits the traffic manager's custom configuration resource (e.g., a TraefikService for Traefik or a VirtualService for Istio) to update the weight field. The traffic manager detects this configuration change and immediately begins routing the specified percentage of traffic to the canary service, while the remaining traffic continues flowing to the stable service.
For example, setWeight: 20 results in 20% of traffic going to the canary and 80% to the stable version, no matter how many pods are running for each version.
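As a rough sketch (assuming the TraefikService and Service names used later in this example), the weighted services section managed by the controller would end up looking something like this at setWeight: 20; the weight values are written by the Rollouts controller and are not meant to be edited by hand:

# Sketch: TraefikService weights as the controller might set them at setWeight: 20
spec:
  weighted:
    services:
      - name: stable-service  # stable Kubernetes Service
        port: 80
        weight: 80            # remaining traffic stays on the stable version
      - name: canary-service  # canary Kubernetes Service
        port: 80
        weight: 20            # setWeight percentage routed to the canary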
The key difference from the default behavior (without a traffic manager) is that Argo Rollouts would achieve a 20% traffic shift by adjusting pod counts to a 20/80 ratio. With a traffic manager, traffic is split independently of pod counts. To manage both traffic distribution and pod counts, see the next section.
Warning: Lack of Scaling Isolation
Using a single HPA for both stable and canary pods has a drawback: scaling is not decoupled between the two versions. If the canary has a performance issue (e.g., a memory leak or CPU spike), the HPA will see a high average metric across all pods and scale up the entire application, so the stable version is also scaled up because of a problem in the canary. Keeping the application free of memory leaks and performance regressions is therefore important for smooth canary releases.
Example Configuration:¶
In this example, Traefik is used as the traffic manager. The setup requires an IngressRoute to expose the service through Traefik, a TraefikService to handle the weighted load balancing between canary and stable, and a Rollout configured to use Traefik's trafficRouting.
The HPA still targets the Rollout resource, as in the previous examples.
1. IngressRoute¶
The IngressRoute resource exposes the application to the outside world through Traefik and forwards incoming traffic to the TraefikService.
# ingress-route.yaml
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: demo-ingress
  namespace: default
spec:
  entryPoints:
    - web
  routes:
    - match: PathPrefix(`/`)
      kind: Rule
      services:
        - name: traefik-service # Points to the TraefikService
          namespace: default
          kind: TraefikService
2. TraefikService¶
The TraefikService custom resource defines how Traefik distributes traffic between two Kubernetes Services (e.g., stable and canary).
# traefik-service.yaml
apiVersion: traefik.containo.us/v1alpha1
kind: TraefikService
metadata:
  name: traefik-service
spec:
  weighted:
    services:
      - name: stable-service # Must match the Rollout's stableService
        port: 80
      - name: canary-service # Must match the Rollout's canaryService
        port: 80
3. Rollout with Traefik trafficRouting¶
Include the trafficRouting property in the Rollout manifest. The trafficRouting.traefik.weightedTraefikServiceName value must match the TraefikService name so that traffic is routed according to the Rollout strategy.
strategy:
  canary:
    canaryService: canary-service
    stableService: stable-service
    trafficRouting: # Add trafficRouting in the Rollout manifest
      traefik:
        weightedTraefikServiceName: traefik-service # This name MUST match the TraefikService
    steps:
      - setWeight: 20
      - pause: {}
Decoupling Canary with Traffic Manager (setCanaryScale)¶
The setCanaryScale field decouples canary scaling from the HPA by pinning the canary ReplicaSet to a fixed number of pods at each rollout step. The HPA continues to manage the total replica count, while the Rollout controller keeps the canary pod count fixed, so HPA scaling is applied only to the stable pods:
stable_replicas = HPA_total_desired_replicas - pinned_canary_replicas
Traffic distribution is still controlled by setWeight. Pinning the canary replica count prevents the canary from consuming excessive resources or triggering unnecessary scaling of the stable environment due to faulty canary behavior during tests.
This strategy decouples traffic weight from pod counts:
- With setWeight: 20 and setCanaryScale.replicas: 1, the traffic manager sends 20% of traffic to the single pinned canary pod, while the remaining 80% goes to the stable service (backed by all the autoscaled stable pods).
- With setWeight: 80 and setCanaryScale.replicas: 1, the traffic manager sends 80% of total traffic to that single pinned canary pod, no matter how many stable pods are running.
This lets you control traffic distribution while keeping the canary's resource usage minimal.
strategy:
  canary:
    canaryService: canary-service
    stableService: stable-service
    trafficRouting:
      traefik:
        weightedTraefikServiceName: traefik-service
    steps:
      - setWeight: 20
      - setCanaryScale:
          replicas: 1 # one canary pod receives 20% of traffic
      - pause: {}
      - setWeight: 50
      - setCanaryScale:
          replicas: 3 # three canary pods receive 50% of traffic
      - pause: {}
      - setWeight: 90
      - setCanaryScale:
          replicas: 5 # five canary pods receive 90% of traffic
      - pause: {}
      - setWeight: 100
      - setCanaryScale:
          replicas: 8 # eight canary pods receive 100% of traffic
      - pause: {}
Best Practices¶
- Choose the right strategy: use standard Blue/Green for simple deployments, add previewReplicaCount for cost optimization, and consider canary with setCanaryScale for maximum control and isolation.
- Monitor both versions during deployments: make sure your monitoring covers both stable and canary/preview versions to detect any performance anomalies early.
- Set appropriate HPA thresholds: configure your HPA min/max replicas and target metrics to align with your application’s specific performance characteristics.
- Test rollback scenarios: in some canary scenarios, manual scaling back is required.
- Implement proper resource requests and limits: set appropriate resource requests and limits on your pods to help the HPA make accurate scaling decisions.
- Use traffic managers for controlled deployments rather than relying on pod ratios alone.