Prometheus Metrics
v1.3 and later
User Metrics
Each generated EventSource, Sensor, and EventBus Pod exposes an HTTP endpoint for its metrics, such as how many events were generated and how many actions were triggered. To let your Prometheus server discover these user metrics, add the following to your configuration.
- job_name: 'argo-events'
  kubernetes_sd_configs:
    - role: pod
      selectors:
        - role: pod
          label: 'controller in (eventsource-controller,sensor-controller,eventbus-controller)'
  relabel_configs:
    # Copy the eventbus_name Pod label onto EventBus targets.
    - source_labels: [__meta_kubernetes_pod_label_eventbus_name, __meta_kubernetes_pod_label_controller]
      action: replace
      regex: (.+);eventbus-controller
      replacement: $1
      target_label: 'eventbus_name'
    # Set the namespace label on EventBus targets.
    - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_pod_label_controller]
      action: replace
      regex: (.+);eventbus-controller
      replacement: $1
      target_label: 'namespace'
    # Drop EventBus targets whose port is four digits ending in 222 (for example 4222, 6222, 8222).
    - source_labels: [__address__, __meta_kubernetes_pod_label_controller]
      action: drop
      regex: (.+):(\d222);eventbus-controller
Also make sure your Prometheus Service Account has permission to discover Pods. Add or merge a ClusterRole like the sample below, and grant it to your Service Account.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: pod-discovery
rules:
  - apiGroups: [""]
    resources:
      - pods
    verbs: ["get", "list", "watch"]
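Granting the ClusterRole to a Service Account is typically done with a ClusterRoleBinding. The following is a minimal sketch, not a definitive manifest: the binding name, the Service Account name `prometheus`, and the namespace `monitoring` are assumptions, so substitute the values used by your Prometheus installation.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: pod-discovery            # hypothetical binding name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: pod-discovery            # the sample ClusterRole shown above
subjects:
  - kind: ServiceAccount
    name: prometheus             # assumption: your Prometheus Service Account
    namespace: monitoring        # assumption: the namespace it runs in
```

If your Prometheus only needs to discover Pods in a single namespace, a namespaced Role and RoleBinding with the same rules may be a narrower alternative.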
EventSource
argo_events_event_service_running_total
How many configured events in the EventSource object are actively running.
argo_events_events_sent_total
How many events have been sent successfully.
argo_events_events_sent_failed_total
How many events failed to send to the EventBus.
argo_events_events_processing_failed_total
How many events failed to process for any reason. This count includes argo_events_events_sent_failed_total.
argo_events_event_processing_duration_milliseconds
Event processing duration, from receiving the event to sending it to the EventBus, in milliseconds.
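To make the EventSource counters easier to graph, they can be turned into per-second rates with Prometheus recording rules. The sketch below is illustrative only: the group name, the recorded series names, and the 5-minute window are assumptions, and only the source metric names come from this page.

```yaml
groups:
  - name: argo-events-eventsource                      # hypothetical group name
    rules:
      # Events sent to the EventBus per second, averaged over 5 minutes.
      - record: argo_events:events_sent:rate5m         # hypothetical series name
        expr: sum(rate(argo_events_events_sent_total[5m]))
      # Events that failed processing per second (includes send failures).
      - record: argo_events:events_processing_failed:rate5m   # hypothetical series name
        expr: sum(rate(argo_events_events_processing_failed_total[5m]))
```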
Sensor
argo_events_action_triggered_total
How many actions have been triggered successfully.
argo_events_action_failed_total
How many actions failed.
argo_events_action_retries_failed_total
How many actions failed after the retries were exhausted. This is also incremented if no retryStrategy is specified.
argo_events_action_duration_milliseconds
Action triggering duration in milliseconds.
EventBus
For the native NATS EventBus, check this link for an explanation of the metrics.
Controller Metrics
If you are interested in Argo Events controller metrics, add the following to your Prometheus configuration.
- job_name: 'argo-events-controllers'
  kubernetes_sd_configs:
    - role: pod
      selectors:
        - role: pod
          label: 'app in (eventsource-controller,sensor-controller,eventbus-controller)'
  relabel_configs:
    - source_labels: [__address__, __meta_kubernetes_pod_label_app]
      action: replace
      regex: (.+);(eventsource-controller|sensor-controller|eventbus-controller)
      replacement: $1:7777
      target_label: '__address__'
    - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_pod_label_app]
      action: replace
      regex: (.+);(eventsource-controller|sensor-controller|eventbus-controller)
      replacement: $1
      target_label: 'namespace'
Golden Signals
The following metrics are considered the Golden Signals for monitoring applications running with Argo Events.
- Latency
  - argo_events_event_processing_duration_milliseconds
  - argo_events_action_duration_milliseconds
- Traffic
  - argo_events_events_sent_total
  - argo_events_action_triggered_total
- Errors
  - argo_events_events_processing_failed_total
  - argo_events_events_sent_failed_total
  - argo_events_action_failed_total
  - argo_events_action_retries_failed_total
- Saturation
  - argo_events_event_service_running_total
  - Other Kubernetes metrics such as CPU or memory.
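The Errors signals listed above lend themselves to alerting. The following Prometheus rule file is a minimal sketch under stated assumptions: the group name, alert names, time windows, `for` duration, and severity labels are all hypothetical and should be tuned to your workload. Only the metric names are taken from this page.

```yaml
groups:
  - name: argo-events-errors                           # hypothetical group name
    rules:
      # Fires when EventSources keep failing to process events.
      - alert: ArgoEventsEventProcessingFailures       # hypothetical alert name
        expr: sum(rate(argo_events_events_processing_failed_total[5m])) > 0
        for: 10m                                       # assumed hold time
        labels:
          severity: warning                            # assumed severity
        annotations:
          summary: EventSources are failing to process events.
      # Fires when any Sensor action failed after its retries were exhausted.
      - alert: ArgoEventsActionRetriesExhausted        # hypothetical alert name
        expr: sum(increase(argo_events_action_retries_failed_total[15m])) > 0
        labels:
          severity: critical                           # assumed severity
        annotations:
          summary: Sensor actions failed after retries were exhausted.
```

Because argo_events_action_retries_failed_total is also incremented when no retryStrategy is specified, the second alert may be noisy for Sensors that do not configure retries.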