Prometheus Metrics

A Prometheus query can be used to obtain measurements for analysis.

apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate
spec:
  args:
  - name: service-name
  metrics:
  - name: success-rate
    interval: 5m
    # NOTE: Prometheus instant queries return results as a vector,
    # so it is common to access index 0 of the returned array to obtain the value
    successCondition: result[0] >= 0.95
    failureLimit: 3
    provider:
      prometheus:
        address: http://prometheus.example.com:9090
        # timeout is expressed in seconds
        timeout: 40
        headers:
        - key: X-Scope-OrgID
          value: tenant_a
        query: |
          sum(irate(
            istio_requests_total{reporter="source",destination_service=~"{{args.service-name}}",response_code!~"5.*"}[5m]
          )) /
          sum(irate(
            istio_requests_total{reporter="source",destination_service=~"{{args.service-name}}"}[5m]
          ))

The example uses Istio metrics, but you can use any metric available to your Prometheus instance. We suggest validating your PromQL expression in the Prometheus GUI first.
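To make the "vector" note in the template concrete, here is a minimal Python sketch, not Argo Rollouts code. It assumes the standard Prometheus HTTP API instant-query endpoint (/api/v1/query) and its JSON response shape. The helper names build_instant_query, vector_values and success are hypothetical. The sketch builds a request carrying the X-Scope-OrgID header from the example, extracts the sample values from a vector response, and applies the same check as successCondition: result[0] >= 0.95.

```python
import json
import urllib.parse
import urllib.request


# Hypothetical helper: builds a GET request for the Prometheus instant-query
# endpoint (/api/v1/query) and attaches custom headers such as X-Scope-OrgID.
def build_instant_query(address, query, headers):
    url = address.rstrip("/") + "/api/v1/query?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, headers=dict(headers), method="GET")


# Hypothetical helper: turns an instant-query response body into the list of
# sample values that a condition such as result[0] >= 0.95 would index into.
def vector_values(body):
    payload = json.loads(body)
    if payload["data"]["resultType"] != "vector":
        raise ValueError("expected a vector result")
    # Each element looks like {"metric": {...}, "value": [timestamp, "value as string"]}.
    return [float(sample["value"][1]) for sample in payload["data"]["result"]]


def success(result):
    # Python equivalent of the template's successCondition: result[0] >= 0.95
    return result[0] >= 0.95


if __name__ == "__main__":
    req = build_instant_query(
        "http://prometheus.example.com:9090",
        "sum(rate(http_requests_total[5m]))",
        {"X-Scope-OrgID": "tenant_a"},
    )
    print(req.full_url)
    # Made-up response body for illustration only.
    sample = '{"status":"success","data":{"resultType":"vector","result":[{"metric":{},"value":[1700000000,"0.97"]}]}}'
    result = vector_values(sample)
    print(result, success(result))  # [0.97] True
```

Sending the request is left out on purpose; with the standard library it would be urllib.request.urlopen(req, timeout=40), mirroring the template's 40-second timeout.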

See the Analysis Overview page for more details on the available options.

Range queries

apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: range-query-example
spec:
  args:
  - name: service-name
  - name: lookback-duration
    value: 5m
  metrics:
  - name: success-rate
    # checks that all returned values are under 1000ms
    successCondition: "all(result, # < 1000)"
    failureLimit: 3
    provider:
      prometheus:
        rangeQuery:
          # See https://expr-lang.org/docs/language-definition#date-functions
          # for valid date functions
          # The start point to query from
          start: 'now() - duration("{{args.lookback-duration}}")'
          # The end point to query to
          end: 'now()'
          # Query resolution step width
          step: 1m
        address: http://prometheus.example.com:9090
        query: http_latency_ms{service="{{args.service-name}}"}
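The start, end and step fields map onto the parameters of a Prometheus range query. The following Python sketch is an illustration only; it assumes the standard /api/v1/query_range endpoint and uses hypothetical helpers (range_query_params, evaluation_times) with a fixed "now" so the result is reproducible. It shows that a 5m lookback with a 1m step yields six evaluation points, because Prometheus evaluates at start, start + step, and so on up to and including end.

```python
import urllib.parse
from datetime import datetime, timedelta, timezone


# Hypothetical helper: builds the query string for /api/v1/query_range
# from the rangeQuery fields (start/end as Unix timestamps, step in seconds).
def range_query_params(query, start, end, step):
    return urllib.parse.urlencode({
        "query": query,
        "start": start.timestamp(),
        "end": end.timestamp(),
        "step": int(step.total_seconds()),
    })


def evaluation_times(start, end, step):
    # Prometheus evaluates the expression at start, start + step, ... while <= end.
    times = []
    t = start
    while t <= end:
        times.append(t)
        t += step
    return times


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)  # fixed stand-in for now()
lookback = timedelta(minutes=5)  # lookback-duration: 5m
step = timedelta(minutes=1)      # step: 1m

print(range_query_params('http_latency_ms{service="checkout"}', now - lookback, now, step))
points = evaluation_times(now - lookback, now, step)
print(len(points))  # 6
```

Each returned series therefore carries up to six samples here, which is why the success condition has to consider every value rather than only the first.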

Range query and successCondition/failureCondition

Because range queries usually return multiple values from Prometheus, it is important to assert on every value returned. Compare the following conditions:

❌ result[0] < 1000 - only checks the first value returned
✅ all(result, # < 1000) - checks every value returned from Prometheus

See expr for more expression options.
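The difference between the two conditions can be seen with a small Python analogue. This is not the expr engine: it assumes, for illustration, that result holds the values of a single latency series as a flat list, and the sample numbers are made up. The any check shows one way a matching failureCondition could be phrased (in expr, any(result, # >= 1000)).

```python
# Made-up latency samples (ms) from one series returned by a range query.
result = [420.0, 380.0, 1250.0, 510.0]

first_only = result[0] < 1000                # like: result[0] < 1000
every_value = all(v < 1000 for v in result)  # like: all(result, # < 1000)
any_breach = any(v >= 1000 for v in result)  # like: any(result, # >= 1000)

print(first_only, every_value, any_breach)  # True False True
```

The first check passes even though one sample is 1250 ms, which is exactly the spike the all-based condition catches.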

Authorization

Utilizing Amazon Managed Prometheus

Amazon Managed Prometheus (AMP) can be used as the Prometheus data source for analysis. To do this, the namespace where your analysis runs must have the appropriate IRSA (IAM Roles for Service Accounts) attached to allow Prometheus queries. Once the proper permissions are in place to access AMP, you can use an AMP workspace URL in your provider block and add a SigV4 config for SigV4 signing:

provider:
  prometheus:
    address: https://aps-workspaces.$REGION.amazonaws.com/workspaces/$WORKSPACEID
    query: |
      sum(irate(
        istio_requests_total{reporter="source",destination_service=~"{{args.service-name}}",response_code!~"5.*"}[5m]
      )) /
      sum(irate(
        istio_requests_total{reporter="source",destination_service=~"{{args.service-name}}"}[5m]
      ))
    authentication:
      sigv4:
        region: $REGION
        profile: $PROFILE
        roleArn: $ROLEARN
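The controller performs the signing for you; the sketch below only illustrates what "SigV4 signing" involves. It is a minimal Python rendering of the signing-key derivation step described in the AWS Signature Version 4 documentation, assuming "aps" as the SigV4 service name for AMP. Canonical-request construction is omitted, and the helper names and credentials are hypothetical.

```python
import hashlib
import hmac


def _hmac_sha256(key, msg):
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def sigv4_signing_key(secret_access_key, date_stamp, region, service="aps"):
    """Derive a SigV4 signing key.

    date_stamp is YYYYMMDD. "aps" is assumed to be the service name
    used when signing requests to Amazon Managed Prometheus.
    """
    k_date = _hmac_sha256(("AWS4" + secret_access_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


def sign(signing_key, string_to_sign):
    # The request signature is the hex-encoded HMAC-SHA256 of the string to sign.
    return hmac.new(signing_key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()


key = sigv4_signing_key("example-secret-key", "20240101", "us-east-1")
print(len(key))  # 32
```

Because the key is scoped to a date, region and service, the region in the sigv4 block has to match the region of the AMP workspace URL.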

With OAuth2

You can set up an OAuth2 client credentials flow using the following values:

apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate
spec:
  args:
  - name: service-name
  # from secret
  - name: oauthSecret  # This is the OAuth2 shared secret
    valueFrom:
      secretKeyRef:
        name: oauth-secret
        key: secret
  metrics:
  - name: success-rate
    interval: 5m
    # NOTE: Prometheus instant queries return results as a vector,
    # so it is common to access index 0 of the returned array to obtain the value
    successCondition: result[0] >= 0.95
    failureLimit: 3
    provider:
      prometheus:
        address: http://prometheus.example.com:9090
        # timeout is expressed in seconds
        timeout: 40
        authentication:
          oauth2:
            tokenUrl: https://my-oauth2-provider/token
            clientId: my-client-id
            clientSecret: "{{ args.oauthSecret }}"
            scopes: [
              "my-oauth2-scope"
            ]
        query: |
          sum(irate(
            istio_requests_total{reporter="source",destination_service=~"{{args.service-name}}",response_code!~"5.*"}[5m]
          )) /
          sum(irate(
            istio_requests_total{reporter="source",destination_service=~"{{args.service-name}}"}[5m]
          ))

The AnalysisRun first obtains an access token using this information, then sends it as an Authorization: Bearer header on the metric provider call.
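The two steps can be sketched in Python following the client credentials grant of RFC 6749 (section 4.4). This is an illustration, not the controller's implementation: the helpers token_request and bearer_header are hypothetical, the credentials are placeholders, and credentials are sent in the form body here, although some token endpoints expect HTTP Basic authentication instead.

```python
import json
import urllib.parse
import urllib.request


def token_request(token_url, client_id, client_secret, scopes):
    # Client credentials grant (RFC 6749, section 4.4); scopes are space-separated.
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": " ".join(scopes),
    }).encode("ascii")
    return urllib.request.Request(
        token_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def bearer_header(token_response_body):
    # The token endpoint replies with JSON that contains "access_token".
    token = json.loads(token_response_body)["access_token"]
    return {"Authorization": "Bearer " + token}


req = token_request("https://my-oauth2-provider/token", "my-client-id",
                    "placeholder-secret", ["my-oauth2-scope"])
print(req.get_method(), req.full_url)
print(bearer_header('{"access_token": "abc123", "token_type": "Bearer"}'))
```

The returned header is what gets attached to the Prometheus query; the secret itself never reaches Prometheus.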

Additional Metadata

Any additional metadata from the Prometheus controller, such as the resolved query after the template's arguments are substituted, appears under the Metadata map in the MetricsResult object of the AnalysisRun.
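A "resolved query" is the template's query with its {{args.NAME}} placeholders filled in. The Python sketch below is a simplified, hypothetical substitution (the resolve helper and the regular expression are assumptions) and may differ from Argo Rollouts' own templating in details such as whitespace handling or escaping.

```python
import re

# Hypothetical pattern for {{args.NAME}} and {{ args.NAME }} placeholders.
_ARG = re.compile(r"\{\{\s*args\.([A-Za-z0-9_-]+)\s*\}\}")


def resolve(template, args):
    def repl(match):
        name = match.group(1)
        if name not in args:
            raise KeyError("missing argument: " + name)
        return args[name]
    return _ARG.sub(repl, template)


query = 'sum(irate(istio_requests_total{destination_service=~"{{args.service-name}}"}[5m]))'
print(resolve(query, {"service-name": "checkout"}))
# sum(irate(istio_requests_total{destination_service=~"checkout"}[5m]))
```

Inspecting the resolved query in the metadata is a quick way to confirm that an argument value produced the PromQL you expected.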

Skip TLS verification

You can skip TLS verification of the provided Prometheus host by setting the option insecure: true.

provider:
  prometheus:
    address: https://prometheus.example.com
    insecure: true
    query: |
      sum(irate(
        istio_requests_total{reporter="source",destination_service=~"{{args.service-name}}",response_code!~"5.*"}[5m]
      )) /
      sum(irate(
        istio_requests_total{reporter="source",destination_service=~"{{args.service-name}}"}[5m]
      ))
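For intuition about what insecure: true gives up, here is a Python analogue using the standard ssl module; it is not how the controller is implemented, and the tls_context helper is hypothetical. With verification disabled, any certificate and hostname are accepted, so the connection is still encrypted but the server's identity is no longer checked.

```python
import ssl


def tls_context(insecure):
    ctx = ssl.create_default_context()  # verifies certificates and hostnames by default
    if insecure:
        # Analogue of insecure: true. check_hostname must be turned off
        # before verify_mode can be set to CERT_NONE.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    return ctx


print(tls_context(True).verify_mode == ssl.CERT_NONE)  # True
```

Such a context would be passed as urllib.request.urlopen(url, context=ctx). Prefer trusting the server's CA where possible and reserve this option for test setups.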