Rollout Pod Restarts¶
For various reasons, applications sometimes need a restart. Since the restart is not changing the version, the application should not have to go through the entire BlueGreen or canary deployment process. The Rollout object supports restarting an application by having the controller do a rolling recreate of all the Pods in a Rollout without going through all the regular BlueGreen or Canary deployments. The controller kills one Pod at a time and relies on the ReplicaSet to scale up new Pods until all the Pods are newer than the restarted time.
How it works¶
The Rollout object has a field called
.spec.restartAt that takes in a RFC 3339 formatted string (ie. 2020-03-30T21:19:35Z). If the current time is past the
restartAt time, the controller knows it needs to restart all the Pods in the Rollout. The controller goes through each ReplicaSet to see if all the Pods have a creation timestamp newer than the
restartAt time. To prevent too many Pods from restarting at once, the controller limits itself to deleting one Pod at a time and checks that no other Pods in that ReplicaSet have a deletion timestamp and that ReplicaSet is fully available. The controller checks the ReplicaSets in the following order: 1. stable ReplicaSet, 2. new ReplicaSet, and 3rd. all the other ReplicaSets starting with the oldest. Once the controller has confirmed all the Pods are newer than the
restartAt time, the controller sets the
status.restartedAt field to indicate that the Rollout has been successfully restarted. If a change occurs to the
spec.template during a restart, the restart is canceled (by setting the
status.restartedAt to the
spec.restartAt) since the Rollout has to bring up a new stack.
Note: Unlike deployments, whose restarts are essentially a normal rolling upgrade which happened to be triggered by a timestamp in the pod spec annotation, Rollouts facilitates restarts by sequentially and individually terminating pods one by one. This design choice was made in order to allow a restart to occur even when a Rollout was in the middle of a long-running blue-green/canary update. However, some consequences of this are:
- Restarting a Rollout with a single replica will cause downtime since the Rollout would only have a single one Pod which would be brought down during the restart.
- Restarting a rollout occurs slower than a deployment's rolling update, since the sequential restarting of pods is more conservative than the maxSurge, maxUnavailable fields of a rolling update.
The Argo Rollouts kubectl plugin has a command for restarting Rollouts. Check it out here.
Users can reschedule a restart on their Rollout by setting the
.spec.restartAt field to a time in the future. The controller only starts the restart after the current time is after the restartAt time.