This is for internal use by the PaaS team. Public-facing documentation is located at docs.cloud.service.gov.uk.

Testing Alertmanager

This guide explains how to check Alertmanager’s routing and how to send a test alert through to PagerDuty.

We use Alertmanager for broadcasting and silencing our Prometheus alerts.

Requirements

  • AWS access
  • CredHub CLI
  • amtool - the Alertmanager CLI

Install amtool with go install github.com/prometheus/alertmanager/cmd/amtool@latest (go get no longer installs binaries as of Go 1.18)

Before you start

1. Get credentials to talk to Alertmanager

2. Navigate to Alertmanager’s web UI

For example:

Env      URL
Dev      https://alertmanager-1.tlwr.cloudpipeline.digital
Staging  https://alertmanager-1.london.staging.cloudpipeline.digital
London   https://alertmanager-1.london.cloud.service.gov.uk
Ireland  https://alertmanager-1.cloud.service.gov.uk

3. Check you can talk to Alertmanager

amtool \
  --alertmanager.url=https://admin:<PASSWORD>@alertmanager-1.tlwr.dev.cloudpipeline.digital \
  config show

where <PASSWORD> is the Alertmanager password from CredHub

The following amtool commands assume that you specify --alertmanager.url
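To avoid retyping the URL for every command, you could wrap amtool in a small shell function. This is only a sketch: am_url, am and the AM_URL variable are hypothetical names, not part of amtool, and the password is not URL-encoded, so a password containing characters such as @ or / would need escaping first.

```shell
# Hypothetical helper: build the Alertmanager URL from a host and a password.
# Usage: am_url <host> <password>
am_url() {
  printf 'https://admin:%s@%s' "$2" "$1"
}

# Hypothetical wrapper: run amtool against the URL stored in $AM_URL.
# Stops with an error message if AM_URL has not been set.
am() {
  amtool --alertmanager.url="${AM_URL:?set AM_URL first}" "$@"
}

# Example (not run here):
#   AM_URL="$(am_url alertmanager-1.tlwr.dev.cloudpipeline.digital '<PASSWORD>')"
#   am config show
```

With AM_URL set, am config routes show is equivalent to running amtool config routes show with --alertmanager.url specified.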

Checking where alerts go

Alertmanager uses “routes” to decide which receiver each alert is sent to

You can list routes:

amtool config routes show

which will print:

Routing tree:
.
└── default-route  receiver: critical-receiver
    ├── {severity="warning"}  continue: true  receiver: warning-receiver
    ├── {severity="critical"}  continue: true  receiver: critical-receiver
    ├── {notify="pagerduty-24-7"}  continue: true  receiver: pagerduty-24-7-receiver
    └── {notify="pagerduty-in-hours"}  receiver: pagerduty-in-hours-receiver

You can check which receiver a hypothetical alert with given labels would be routed to:

amtool config routes test severity=warning otherlabel=mylabelvalue

prints:

warning-receiver

and

amtool config routes test notify=pagerduty-24-7 otherlabel=foo

prints:

pagerduty-24-7-receiver
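If you want to check several label sets at once, the routing tests above can be wrapped in a small assertion helper. This is a sketch rather than an established team script: expect_receiver is a hypothetical name, and it assumes amtool already knows the Alertmanager URL (for example via --alertmanager.url in a wrapper).

```shell
# Hypothetical helper: fail unless the given labels route to the expected receiver.
# Usage: expect_receiver <receiver> <label=value>...
expect_receiver() {
  expected="$1"
  shift
  # Ask Alertmanager's routing tree which receiver these labels resolve to.
  actual="$(amtool config routes test "$@")" || return 1
  if [ "$actual" = "$expected" ]; then
    echo "ok: $* -> $actual"
  else
    echo "FAIL: $* -> $actual (expected $expected)" >&2
    return 1
  fi
}

# Example (not run here):
#   expect_receiver warning-receiver severity=warning otherlabel=mylabelvalue
#   expect_receiver pagerduty-24-7-receiver notify=pagerduty-24-7 otherlabel=foo
```

The helper returns a non-zero status on a mismatch, so it can be chained with && or used in a script that stops on the first failure.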

Simulating an alert

1. Notify your colleagues that you are going to be testing alerts

2. Add an override in PagerDuty

You may wish to override both the engineer and the comms schedules, so that only you are called.

3. Create an alert that will expire in 5 minutes

amtool alert add \
  this-is-a-test-alert \
  notify=pagerduty-in-hours \
  --end="$(date -v +5M -u '+%Y-%m-%dT%H:%M:%SZ')"

4. Wait for the phone call from PagerDuty, then resolve the incident
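The date -v +5M syntax used for --end in step 3 is specific to BSD/macOS date and fails with GNU date on Linux. A portable sketch, assuming one of those two date implementations is installed (minutes_from_now is a hypothetical helper name):

```shell
# Hypothetical helper: print a UTC timestamp N minutes from now,
# in the same format as step 3, using whichever date syntax is available.
# Usage: minutes_from_now <N>
minutes_from_now() {
  if date -v +1M >/dev/null 2>&1; then
    # BSD/macOS date: -v adjusts the current time
    date -u -v "+${1}M" '+%Y-%m-%dT%H:%M:%SZ'
  else
    # GNU date: -d accepts a relative time description
    date -u -d "+${1} minutes" '+%Y-%m-%dT%H:%M:%SZ'
  fi
}

# Example (not run here):
#   amtool alert add \
#     this-is-a-test-alert \
#     notify=pagerduty-in-hours \
#     --end="$(minutes_from_now 5)"
```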