Support roles and responsibilities

What each role does on support and when it’s needed.

Refer to the Incident Process for information on how to manage incidents and incident communication.

Rota

The support rota is on PagerDuty.

We have two rotas: in hours engineering and in hours comms.

Our escalation rota is GaaP SCS Escalation.

If you are the engineer on support, add your shift to the #paas Slack channel description so your colleagues can see who is on support and that you are currently looking at the issue. If you need to swap shifts, ask your colleagues if they can swap or cover, then update PagerDuty using an override.

In hours engineering support role and responsibility

The in hours support lead is responsible for:

monitoring system alerts and system health
recording the number and nature of build failures in Pivotal (label: fixing the build) so we can identify the higher impact or higher frequency ones
ensuring that each ticket in Zendesk is picked up and responded to appropriately
initial triage: providing or confirming the initial assessment of priority, which may require some discussion with the initiator (tenant) to clarify impact
involving other people as needed, supported by the delivery or product manager
assigning an incident lead if the item is an incident
picking up ‘small tasks’ in Pivotal if there is time in between support tasks
having a support handover meeting at the beginning and end of the support week

When you need to take a break, for example for lunch or an essential meeting, tell people ahead of time and arrange for a colleague to cover for you. All other members of the team are responsible for providing assistance to the support person as needed. If you don’t feel comfortable asking your colleagues, talk to the delivery manager.

In hours comms lead role and responsibility

Checks the on call engineer is OK and finds out if they need any additional support.
Responsible for any communications (internal or external) required throughout the duration of the incident.
Protects the support engineer from unnecessary distractions or questions.
Opens an incident template with view permissions for all GDS staff and posts it in the #paas-incident channel.
Records a timeline of events through the incident.
Drafts and sends regular updates to tenants via Statuspage.

GaaP SCS Escalation role and responsibility

If you are on out of hours first line support and decide that you need help with tenant communications so you can focus on fixing the issue, contact the person on the GaaP SCS Escalation rota. They will then be responsible for:

Making decisions and assisting with tenant communications (if appropriate). This includes the #paas-incident Slack channel and Statuspage.
Providing leadership-level backup for engineers if an incident requires leadership decision making or a broader response involving updating comms or activating other engineers.

You can find more information in the PaaS Incident Process.