How to Integrate with PagerDuty
This document provides a step-by-step guide to integrate Versus Incident with PagerDuty for on-call management. The integration enables automated escalation of alerts to on-call teams when incidents are not acknowledged within a specified time.
We'll cover setting up PagerDuty, configuring the integration with Versus, deploying Versus, and testing the integration with practical examples.
Prerequisites
Before you begin, ensure you have:
- A PagerDuty account (you can start with a free trial if needed)
- Versus Incident deployed (instructions provided later)
- Prometheus Alert Manager set up to monitor your systems
Setting Up PagerDuty for On-Call
Let's configure a practical example in PagerDuty with services, schedules, and escalation policies.
1. Create Users in PagerDuty
First, we need to set up the users who will be part of the on-call rotation:
- Log in to your PagerDuty account
- Navigate to People > Users > Add User
- For each user, enter:
- Name (e.g., "Natsu Dragneel")
- Email address
- Role (User)
- Time Zone
- PagerDuty will send an email invitation to each user
- Users should complete their profiles by:
- Setting up notification rules (SMS, email, push notifications)
- Downloading the PagerDuty mobile app
- Setting contact details
Repeat to create multiple users (e.g., Natsu, Zeref, Igneel, Gray, Gajeel, Laxus).
2. Create On-Call Schedules
Now, let's create schedules for who is on-call and when:
- Navigate to People > Schedules > Create Schedule
- Name the schedule (e.g., "Team A Primary")
- Set up the rotation:
- Choose a rotation type (Weekly is common)
- Add users to the rotation (e.g., Natsu, Zeref, Igneel)
- Set handoff time (e.g., Mondays at 9:00 AM)
- Set time zone
- Save the schedule
- Create a second schedule (e.g., "Team B Secondary") for other team members
3. Create Escalation Policies
An escalation policy defines who gets notified when an incident occurs:
- Navigate to Configuration > Escalation Policies > New Escalation Policy
- Name the policy (e.g., "Critical Incident Response")
- Add escalation rules:
- Level 1: Select the "Team A Primary" schedule with a 5-minute timeout
- Level 2: Select the "Team B Secondary" schedule
- Optionally, add a Level 3 to notify a manager
- Save the policy
4. Create a PagerDuty Service
A service is what receives incidents from monitoring systems:
- Navigate to Configuration > Services > New Service
- Name the service (e.g., "Versus Incident Integration")
- Select "Events API V2" as the integration type
- Select the escalation policy you created in step 3
- Configure incident settings (Auto-resolve, urgency, etc.)
- Save the service
5. Get the Integration Key
After creating the service, you'll need the integration key (also called routing key):
- Navigate to Configuration > Services
- Click on your newly created service
- Go to the Integrations tab
- Find the "Events API V2" integration
- Copy the Integration Key (it looks something like:
12345678abcdef0123456789abcdef0
) - Keep this key secure - you'll need it for Versus configuration
Deploy Versus Incident
Now let's deploy Versus with PagerDuty integration. You can use Docker or Kubernetes.
Docker Deployment
Create a directory for your configuration files:
mkdir -p ./config
Create config/config.yaml
with the following content:
name: versus
host: 0.0.0.0
port: 3000
public_host: https://your-ack-host.example
alert:
debug_body: true
slack:
enable: true
token: ${SLACK_TOKEN}
channel_id: ${SLACK_CHANNEL_ID}
template_path: "config/slack_message.tmpl"
oncall:
enable: true
wait_minutes: 3
provider: pagerduty
pagerduty:
routing_key: ${PAGERDUTY_ROUTING_KEY} # The Integration Key from step 5
other_routing_keys:
infra: ${PAGERDUTY_OTHER_ROUTING_KEY_INFRA}
app: ${PAGERDUTY_OTHER_ROUTING_KEY_APP}
db: ${PAGERDUTY_OTHER_ROUTING_KEY_DB}
redis: # Required for on-call functionality
insecure_skip_verify: true # dev only
host: ${REDIS_HOST}
port: ${REDIS_PORT}
password: ${REDIS_PASSWORD}
db: 0
Create a Slack template in config/slack_message.tmpl
:
🔥 *{{ .commonLabels.severity | upper }} Alert: {{ .commonLabels.alertname }}*
🌐 *Instance*: `{{ .commonLabels.instance }}`
🚨 *Status*: `{{ .status }}`
{{ range .alerts }}
📝 {{ .annotations.description }}
{{ end }}
{{ if .AckURL }}
----------
<{{.AckURL}}|Click here to acknowledge>
{{ end }}
About the ACK URL Generation
- When an incident is created (e.g., via a POST to
/api/incidents
), Versus generates an acknowledgment URL if on-call is enabled. - The URL is constructed using the
public_host
value:https://your-host.example/api/incidents/ack/<incident-id>
. - This URL is injected into the alert data as
.AckURL
for use in templates.
Create the docker-compose.yml
file:
version: '3.8'
services:
versus:
image: ghcr.io/versuscontrol/versus-incident
ports:
- "3000:3000"
environment:
- SLACK_TOKEN=your_slack_token
- SLACK_CHANNEL_ID=your_channel_id
- PAGERDUTY_ROUTING_KEY=your_pagerduty_integration_key
- REDIS_HOST=redis
- REDIS_PORT=6379
- REDIS_PASSWORD=your_redis_password
depends_on:
- redis
redis:
image: redis:6.2-alpine
command: redis-server --requirepass your_redis_password
ports:
- "6379:6379"
volumes:
- redis_data:/data
volumes:
redis_data:
Run Docker Compose:
docker-compose up -d
Alert Manager Routing Configuration
Now, let's configure Alert Manager to route alerts to Versus with different behaviors:
Send Alert Only (No On-Call)
receivers:
- name: 'versus-no-oncall'
webhook_configs:
- url: 'http://versus-service:3000/api/incidents?oncall_enable=false'
send_resolved: false
route:
receiver: 'versus-no-oncall'
group_by: ['alertname', 'service']
routes:
- match:
severity: warning
receiver: 'versus-no-oncall'
Send Alert with Acknowledgment Wait
receivers:
- name: 'versus-with-ack'
webhook_configs:
- url: 'http://versus-service:3000/api/incidents?oncall_wait_minutes=5'
send_resolved: false
route:
routes:
- match:
severity: critical
receiver: 'versus-with-ack'
This configuration waits 5 minutes for acknowledgment before triggering PagerDuty if the user doesn't click the ACK link in Slack.
Send Alert with Immediate On-Call Trigger
receivers:
- name: 'versus-immediate'
webhook_configs:
- url: 'http://versus-service:3000/api/incidents?oncall_wait_minutes=0'
send_resolved: false
route:
routes:
- match:
severity: urgent
receiver: 'versus-immediate'
This will trigger PagerDuty immediately without waiting.
Override the PagerDuty Routing Key per Alert
You can configure Alert Manager to use different PagerDuty services for specific alerts by using named routing keys instead of exposing sensitive routing keys directly in URLs:
receivers:
- name: 'versus-pagerduty-infra'
webhook_configs:
- url: 'http://versus-service:3000/api/incidents?pagerduty_other_routing_key=infra'
send_resolved: false
route:
routes:
- match:
team: infrastructure
receiver: 'versus-pagerduty-infra'
This routes infrastructure team alerts to a different PagerDuty service using the named routing key "infra", which is securely mapped to the actual integration key in your configuration file:
oncall:
provider: pagerduty
pagerduty:
routing_key: ${PAGERDUTY_ROUTING_KEY}
other_routing_keys:
infra: ${PAGERDUTY_OTHER_ROUTING_KEY_INFRA}
app: ${PAGERDUTY_OTHER_ROUTING_KEY_APP}
db: ${PAGERDUTY_OTHER_ROUTING_KEY_DB}
This approach keeps your sensitive PagerDuty integration keys secure by never exposing them in URLs or logs.
Testing the Integration
Let's test the complete workflow:
-
Trigger an Alert:
- Simulate a critical alert in Prometheus to match the Alert Manager rule.
-
Verify Versus:
- Check that Versus receives the alert and sends it to Slack.
- You should see a message with an acknowledgment link.
-
Check Escalation:
- Option 1: Click the ACK link to acknowledge the incident - PagerDuty should not be notified.
- Option 2: Wait for the acknowledgment timeout (e.g., 5 minutes) without clicking the link.
- In PagerDuty, verify that an incident is created and the on-call person is notified.
- Confirm that escalation happens according to your policy if the incident remains unacknowledged.
-
Immediate Trigger Test:
- Send an urgent alert and confirm that PagerDuty is triggered instantly.
How It Works Under the Hood
When Versus integrates with PagerDuty, the following occurs:
- Versus receives an alert from Alert Manager
- If on-call is enabled and the acknowledgment period passes without an ACK, Versus:
- Constructs a PagerDuty Events API v2 payload
- Sends a "trigger" event to PagerDuty with your routing key
- Includes incident details as custom properties
The PagerDuty service processes this event according to your escalation policy, notifying the appropriate on-call personnel.
Conclusion
You've now integrated Versus Incident with PagerDuty for on-call management! Alerts from Prometheus Alert Manager can trigger notifications via Versus, with escalations handled by PagerDuty based on your escalation policy.
This integration provides:
- A delay period for engineers to acknowledge incidents before escalation
- Slack notifications with one-click acknowledgment
- Structured escalation through PagerDuty's robust notification system
- Multiple notification channels to ensure critical alerts reach the right people
Adjust configurations as needed for your environment and incident response processes. If you encounter any issues or have further questions, feel free to reach out!