HA/DR Recommendations¶
EventBus¶
A simple EventBus used for non-prod deployment or testing purpose could be:
apiVersion: argoproj.io/v1alpha1
kind: EventBus
metadata:
name: default
spec:
nats:
native:
auth: token
However this is not good enough to run your production deployment, following settings are recommended to make it more reliable, and achieve high availability.
Persistent Volumes¶
Even though the EventBus PODs already have data sync mechanism between them, persistent volumes are still recommended to be used to avoid any events data lost when the PODs crash.
An EventBus with persistent volumes looks like below:
spec:
nats:
native:
auth: token
persistence:
storageClassName: standard
accessMode: ReadWriteOnce
volumeSize: 20Gi
Anti-Affinity¶
You can run the EventBus PODs with anti-affinity, to avoid the situation that all PODs are gone when a disaster happens.
An EventBus with best effort node anti-affinity:
spec:
nats:
native:
auth: token
affinity:
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- podAffinityTerm:
labelSelector:
matchLabels:
controller: eventbus-controller
eventbus-name: default
topologyKey: kubernetes.io/hostname
weight: 100
An EventBus with hard requirement node anti-affinity:
spec:
nats:
native:
auth: token
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchLabels:
controller: eventbus-controller
eventbus-name: default
topologyKey: kubernetes.io/hostname
To do AZ (Availability Zone) anti-affinity, change the value of topologyKey
from kubernetes.io/hostname
to topology.kubernetes.io/zone
.
Besides affinity
,
nodeSelector
and
tolerations
also could be set through spec.nats.native.nodeSelector
and
spec.nats.native.tolerations
.
POD Priority¶
Setting POD Priority could reduce the chance of PODs being evicted.
Priority could be set through spec.nats.native.priorityClassName
or
spec.nats.native.priority
.
PDB¶
EventBus service is essential to EventSource and Sensor Pods, it would be better to have a PodDisruptionBudget
to prevent it from Pod Disruptions. The following PDB object states maxUnavailable
is 1, which is suitable for a 3 replica EventBus object.
If your EventBus has a name other than default
, change it accordingly in the yaml.
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
name: eventbus-default-pdb
spec:
maxUnavailable: 1
selector:
matchLabels:
controller: eventbus-controller
eventbus-name: default
EventSources¶
Replicas¶
EventSources can run with HA by setting spec.replicas
to a number >1
, see
more detail here.
EventSource POD Node Selection¶
EventSource POD affinity
, nodeSelector
and tolerations
could be set
through spec.template.affinity
, spec.template.nodeSelector
and
spec.template.tolerations
.
EventSource POD Priority¶
Priority could be set through spec.template.priorityClassName
or
spec.template.priority
.
Sensors¶
Replicas¶
Sensors can run with HA by setting spec.replicas
to a number >1
, see more
detail here.
Sensor POD Node Selection¶
Sensor POD affinity
, nodeSelector
and tolerations
could also be set
through spec.template.affinity
, spec.template.nodeSelector
and
spec.template.tolerations
.
Sensor POD Priority¶
Priority could be set through spec.template.priorityClassName
or
spec.template.priority
.