This is for internal use by the PaaS team. Public-facing documentation is located at docs.cloud.service.gov.uk.

Comms lead role - PaaS Incidents

What to do first:

Learn about PagerDuty

  1. PagerDuty calls/alerts you
  2. Go to the PaaS incident channel on Slack (and the PaaS internal channel)
  3. Set up a hangout for you and the engineer (add the tenant only if it will be useful)
  4. Make a copy of the incident report, name the document and start to fill it in
  5. Record the timeline of events as they happen in the incident report

Statuspage - creating an incident:

Learn about Statuspage

  1. Log in to Statuspage (StatusPage.io) and create an incident, giving it a name
  2. Select ‘apply template’ (for example, possible issue being investigated/we’re having an incident)
  3. Fill in the template with relevant details
  4. You can choose the components (for example, API, apps, billing) that are affected
  5. Select ‘send notifications’ to email tenants. If it’s a small issue, you may not want to send them a notification (in this case, only our status page will be updated)
  6. Subscribers on Statuspage will receive the notifications

If you need to escalate to SMT on call:

  1. If the incident needs escalating (for example, if it's affecting coronavirus services), go to the rotas app and select the current on-call individual to get their contact info

Don’t forget:

  1. Your aim is to do just enough support out of hours to get through to working hours :)
  2. You can update the x-gov Slack PaaS channel if relevant

Response times for P1 incidents

During working hours (9am to 5pm Monday to Friday)

Start work and respond: 20 minutes

Tenant updated: 1 hour

Outside working hours

Start work and respond: 40 minutes

Tenant updated: 1 hour
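
If it helps to see the P1 targets above as concrete deadlines, here is a minimal Python sketch that turns an alert time into "respond by" and "update tenant by" times. The function and variable names are hypothetical and are not part of any PaaS tooling. The sketch assumes the alert time is already in UK local time, treats 5pm exactly as outside working hours, and ignores bank holidays, none of which the targets above specify.

```python
from datetime import datetime, timedelta

# P1 targets copied from the response times listed above.
WORKING_HOURS_TARGETS = {
    "respond": timedelta(minutes=20),
    "tenant_update": timedelta(hours=1),
}
OUT_OF_HOURS_TARGETS = {
    "respond": timedelta(minutes=40),
    "tenant_update": timedelta(hours=1),
}


def in_working_hours(alert_time: datetime) -> bool:
    """Return True for 9am to 5pm, Monday to Friday (assumption: local time)."""
    is_weekday = alert_time.weekday() < 5  # Monday is 0, Friday is 4
    # Assumption: 5pm exactly counts as outside working hours.
    return is_weekday and 9 <= alert_time.hour < 17


def p1_deadlines(alert_time: datetime) -> dict:
    """Return the deadline for each P1 target, measured from the alert time."""
    if in_working_hours(alert_time):
        targets = WORKING_HOURS_TARGETS
    else:
        targets = OUT_OF_HOURS_TARGETS
    return {name: alert_time + delta for name, delta in targets.items()}


if __name__ == "__main__":
    # Example: an alert received on a Monday morning.
    for name, deadline in p1_deadlines(datetime(2021, 3, 1, 10, 0)).items():
        print(f"{name}: {deadline:%a %H:%M}")
```

Measuring both targets from the alert time is itself an assumption. If the team counts the tenant update from when work starts, adjust the calculation accordingly.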