Configuration

Centralised reference for config/config.yaml, environment variables, and per-request overrides used by Versus Incident.

A sample configuration file is located at config/config.yaml:

name: versus
host: 0.0.0.0
port: 3000
public_host: https://your-ack-host.example # Required for on-call ack & dashboard links

# Shared secret required for ALL admin endpoints (`/api/admin/*` and
# `/api/agent/*`) and the embedded admin dashboard. Sent by clients
# (and the dashboard) in the `X-Gateway-Secret` header. When empty,
# admin endpoints are not registered and the agent refuses to start.
gateway_secret: ${GATEWAY_SECRET}

# Storage backend used by BOTH the agent (catalog, shadow log, services)
# and the incident service (history shown in the UI). Only `file` is
# implemented today; `redis` and `database` are config stubs.
storage:
  type: file              # file | redis | database (env: STORAGE_TYPE)
  file:
    data_dir: ./data
    max_incidents: 1000   # rolling cap on persisted incidents

# Optional global proxy applied per-channel via `use_proxy: true` below
# (Telegram, Viber, Lark). Unset to disable.
proxy:
  url: ${PROXY_URL}           # HTTP/HTTPS/SOCKS5, e.g. http://proxy.example.com:8080
  username: ${PROXY_USERNAME}
  password: ${PROXY_PASSWORD}

alert:
  debug_body: true  # Default value, will be overridden by DEBUG_BODY env var

  slack:
    enable: false  # Default value, will be overridden by SLACK_ENABLE env var
    token: ${SLACK_TOKEN}            # From environment
    channel_id: ${SLACK_CHANNEL_ID}  # From environment
    template_path: "config/slack_message.tmpl"
    message_properties:
      button_text: "Acknowledge Alert" # Custom text for the acknowledgment button
      button_style: "primary" # Button style: "primary" (default blue), "danger" (red), or empty for default gray
      disable_button: false # Set to true to disable the button, if you want to handle acknowledgment differently

  telegram:
    enable: false  # Default value, will be overridden by TELEGRAM_ENABLE env var
    bot_token: ${TELEGRAM_BOT_TOKEN} # From environment
    chat_id: ${TELEGRAM_CHAT_ID} # From environment
    template_path: "config/telegram_message.tmpl"
    use_proxy: false # Set to true to use the global proxy block above

  viber:
    enable: false  # Default value, will be overridden by VIBER_ENABLE env var
    api_type: ${VIBER_API_TYPE} # From environment - "channel" (default) or "bot"
    bot_token: ${VIBER_BOT_TOKEN} # From environment (token for bot or channel)
    # Channel API (recommended for incident management)
    channel_id: ${VIBER_CHANNEL_ID} # From environment (required for channel API)
    # Bot API (for individual user notifications)
    user_id: ${VIBER_USER_ID} # From environment (required for bot API)
    template_path: "config/viber_message.tmpl"
    use_proxy: false

  email:
    enable: false # Default value, will be overridden by EMAIL_ENABLE env var
    smtp_host: ${SMTP_HOST} # From environment
    smtp_port: ${SMTP_PORT} # From environment
    username: ${SMTP_USERNAME} # From environment
    password: ${SMTP_PASSWORD} # From environment
    to: ${EMAIL_TO} # From environment
    subject: ${EMAIL_SUBJECT} # From environment
    template_path: "config/email_message.tmpl"

  msteams:
    enable: false # Default value, will be overridden by MSTEAMS_ENABLE env var
    power_automate_url: ${MSTEAMS_POWER_AUTOMATE_URL} # Power Automate HTTP trigger URL (required)
    template_path: "config/msteams_message.tmpl"
    other_power_urls: # Optional: Define additional Power Automate URLs for multiple MS Teams channels
      qc: ${MSTEAMS_OTHER_POWER_URL_QC} # Power Automate URL for QC team
      ops: ${MSTEAMS_OTHER_POWER_URL_OPS} # Power Automate URL for Ops team
      dev: ${MSTEAMS_OTHER_POWER_URL_DEV} # Power Automate URL for Dev team

  lark:
    enable: false # Default value, will be overridden by LARK_ENABLE env var
    webhook_url: ${LARK_WEBHOOK_URL} # Lark webhook URL (required)
    template_path: "config/lark_message.tmpl"
    use_proxy: false
    other_webhook_urls: # Optional: Enable overriding the default webhook URL using query parameters, eg /api/incidents?lark_other_webhook_url=dev
      dev: ${LARK_OTHER_WEBHOOK_URL_DEV}
      prod: ${LARK_OTHER_WEBHOOK_URL_PROD}

queue:
  enable: true
  debug_body: true

  # AWS SNS
  sns:
    enable: false
    https_endpoint_subscription_path: /sns # URI to receive SNS messages, e.g. ${host}:${port}/sns or ${https_endpoint_subscription}/sns
    # Optional: set the fields below to have the SNS subscription created automatically
    https_endpoint_subscription: ${SNS_HTTPS_ENDPOINT_SUBSCRIPTION} # If the user configures an HTTPS endpoint, then an SNS subscription will be automatically created, e.g. https://your-domain.com
    topic_arn: ${SNS_TOPIC_ARN}

  # AWS SQS
  sqs:
    enable: false
    queue_url: ${SQS_QUEUE_URL}

  # GCP Pub Sub (config stub — not yet implemented)
  pubsub:
    enable: false

  # Azure Service Bus (config stub — not yet implemented)
  azbus:
    enable: false

oncall:
  ### Enable overriding using query parameters
  # /api/incidents?oncall_enable=false => Set to `true` or `false` to enable or disable on-call for a specific alert
  # /api/incidents?oncall_wait_minutes=0 => Set the number of minutes to wait for acknowledgment before triggering on-call. Set to `0` to trigger immediately
  initialized_only: true  # Initialize on-call feature but don't enable by default; use query param oncall_enable=true to enable for specific requests
  enable: false # Use this to enable or disable on-call for all alerts
  wait_minutes: 3 # If you set it to 0, it means there's no need to check for an acknowledgment, and the on-call will trigger immediately
  provider: aws_incident_manager # Valid values: "aws_incident_manager" or "pagerduty"

  aws_incident_manager: # Used when provider is "aws_incident_manager"
    response_plan_arn: ${AWS_INCIDENT_MANAGER_RESPONSE_PLAN_ARN}
    other_response_plan_arns: # Optional: Enable overriding the default response plan ARN using query parameters, eg /api/incidents?awsim_other_response_plan=prod
      prod: ${AWS_INCIDENT_MANAGER_OTHER_RESPONSE_PLAN_ARN_PROD}
      dev: ${AWS_INCIDENT_MANAGER_OTHER_RESPONSE_PLAN_ARN_DEV}
      staging: ${AWS_INCIDENT_MANAGER_OTHER_RESPONSE_PLAN_ARN_STAGING}

  pagerduty: # Used when provider is "pagerduty"
    routing_key: ${PAGERDUTY_ROUTING_KEY} # Integration/Routing key for Events API v2 (REQUIRED)
    other_routing_keys: # Optional: Enable overriding the default routing key using query parameters, eg /api/incidents?pagerduty_other_routing_key=infra
      infra: ${PAGERDUTY_OTHER_ROUTING_KEY_INFRA}
      app: ${PAGERDUTY_OTHER_ROUTING_KEY_APP}
      db: ${PAGERDUTY_OTHER_ROUTING_KEY_DB}

redis: # Required for on-call functionality and the AI agent
  insecure_skip_verify: true # dev only
  host: ${REDIS_HOST}
  port: ${REDIS_PORT}
  password: ${REDIS_PASSWORD}
  db: 0

# -----------------------------------------------------------------------------
# AI agent (training | shadow | detect) — opt-in.
# When agent.enable=false (the default) nothing extra runs.
# Source list lives in a separate file (sources_path).
# -----------------------------------------------------------------------------
agent:
  enable: false              # master switch (env: AGENT_ENABLE)
  mode: training             # training | shadow | detect (env: AGENT_MODE)
  poll_interval: 30s         # how often each source is pulled
  lookback: 5m               # initial backfill window on startup
  batch_max: 5000            # safety cap per tick
  signal_max_bytes: 65536    # cap on Signal.Raw

  # Path to the YAML file listing log sources (resolved relative to this
  # config file). Override via env: AGENT_SOURCES_PATH.
  sources_path: ./agent_sources.yaml

  # Grace period for newly discovered services in shadow/detect mode.
  # During grace, signals are observed and clustered but never surfaced
  # as would-have-alerted (shadow) or sent to the AI analyzer (detect).
  # Set to "0" to disable.
  new_service_grace: 30m

  # Regexes used to extract a service name from each log message. The
  # first capture group of the first matching pattern wins. Empty list
  # disables service detection (everything attributed to "_unknown").
  service_patterns:
    - '(?i)\bservice[._-]?name["\s:=]+"?([A-Za-z0-9._-]+)'
    - '(?i)\b(?:service|svc|app|component)\s*=\s*"?([A-Za-z0-9._-]+)'
    - '(?i)"(?:service|svc|app|component)"\s*:\s*"([A-Za-z0-9._-]+)"'
    - '\[([A-Za-z0-9._-]+)\]'

  redaction:
    enable: true
    redact_ips: false        # IPs are usually useful context; opt-in
    extra_patterns:
      - "(?i)password=\\S+"
      - "Authorization:\\s*Bearer\\s+\\S+"

  catalog:
    persist_interval: 30s
    auto_promote_after: 50    # in detect mode, this many sightings = "known"
    # Spike detection: a known pattern is re-flagged when its tick-level
    # frequency exceeds the EWMA baseline by `spike_multiplier`.
    spike_multiplier: 5.0
    spike_min_frequency: 5
    spike_min_baseline_count: 20

  miner:
    similarity_threshold: 0.4
    tree_depth: 4
    max_children: 100

  regex:
    # Pre-filter: only signals matching at least one rule (named or
    # default) are forwarded to the miner. Set to ".*" to train on
    # every line, or leave empty to require an explicit named match.
    default_pattern: "(?i).*error.*"
    rules:
      - name: oom-killer
        pattern: "Out of memory: Killed process"
      - name: panic
        pattern: "(?i)panic:"
      - name: 5xx-burst
        pattern: "HTTP/[0-9.]+\\s+5\\d\\d"

  # AI analyzer — used in detect mode to assess unknown/spiking patterns.
  ai:
    enable: false                     # master switch (env: AGENT_AI_ENABLE)
    base_url: ${AGENT_AI_BASE_URL}    # OpenAI-compatible chat/completions endpoint
    api_key: ${AGENT_AI_API_KEY}
    model: "gpt-4o-mini"
    temperature: 0.2
    max_tokens: 512
    max_calls_per_hour: 60            # 0 = unlimited
    cache_ttl: "1h"
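
The agent.regex block in the sample above describes a pre-filter: a signal reaches the miner only if it matches a named rule or the default pattern. The following is a minimal Python sketch of that rule, not the project's implementation. The function name match_rule, the "_default" label, and the evaluation order (named rules first) are assumptions made for illustration; the patterns are copied from the sample config.

```python
import re

# Rules copied from the sample config. Checking named rules before the
# default pattern is an assumption of this sketch.
RULES = [
    ("oom-killer", r"Out of memory: Killed process"),
    ("panic", r"(?i)panic:"),
    ("5xx-burst", r"HTTP/[0-9.]+\s+5\d\d"),
]
DEFAULT_PATTERN = r"(?i).*error.*"


def match_rule(line, rules=RULES, default_pattern=DEFAULT_PATTERN):
    """Return the first matching rule name, "_default" (hypothetical label)
    if only the default pattern matches, or None if the line is dropped."""
    for name, pattern in rules:
        if re.search(pattern, line):
            return name
    # An empty default pattern means only explicit named matches pass.
    if default_pattern and re.search(default_pattern, line):
        return "_default"
    return None


lines = [
    "kernel: Out of memory: Killed process 4242 (java)",
    "GET /checkout HTTP/1.1 503 12ms",
    "ERROR failed to connect to upstream",
    "INFO request completed",
]
for line in lines:
    print(match_rule(line), "<-", line)
```

Setting default_pattern to ".*" in this sketch forwards every line, and setting it to an empty string forwards only lines that hit a named rule, which mirrors the two options described in the config comment.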

The runtime list of agent sources lives in the file referenced by agent.sources_path (default ./agent_sources.yaml):

sources:
  - name: prod-app
    type: elasticsearch
    enable: true
    elasticsearch:
      addresses:
        - https://es.example.internal:9200
      username: ${ES_USERNAME}
      password: ${ES_PASSWORD}
      index: "logs-app-*"
      time_field: "@timestamp"
      query: 'log.level:(error OR warn)'
      message_field: message
      page_size: 500

  - name: sample-app
    type: file
    enable: true
    file:
      path: ./local/resource/sample-app.log
      format: text
      from_beginning: true

Environment Variables

The application relies on several environment variables to configure alerting services. Below is an explanation of each variable:

Common

| Variable | Description |
|---|---|
| DEBUG_BODY | Set to true to print the body of each request sent to Versus Incident. |

Admin & Gateway

| Variable | Description |
|---|---|
| GATEWAY_SECRET | Shared secret required to access the admin dashboard and every /api/admin/* and /api/agent/* endpoint. Sent by clients in the X-Gateway-Secret header. When unset the admin endpoints are not registered at all. |

Storage

| Variable | Description |
|---|---|
| STORAGE_TYPE | Storage backend for incidents and agent state. One of file (default and the only implemented backend today), redis, database. |
| STORAGE_FILE_DATA_DIR | Directory for the file backend. Default ./data. Files written: incidents.json, patterns.json, shadow.json. |

Slack Configuration

| Variable | Description |
|---|---|
| SLACK_ENABLE | Set to true to enable Slack notifications. |
| SLACK_TOKEN | The authentication token for your Slack bot. |
| SLACK_CHANNEL_ID | The ID of the Slack channel where alerts will be sent. Can be overridden per request using the slack_channel_id query parameter. |

Slack also supports interactive acknowledgment buttons that can be configured using the following properties in the config.yaml file:

alert:
  slack:
    # ...other slack configuration...
    message_properties:
      button_text: "Acknowledge Alert" # Custom text for the acknowledgment button
      button_style: "primary" # Button style: "primary" (default blue), "danger" (red), or empty for default gray
      disable_button: false # Set to true to disable the button, if you want to handle acknowledgment differently

These properties allow you to:

  • Customize the text of the acknowledgment button (button_text)
  • Change the style of the button (button_style) - options are "primary" (blue), "danger" (red), or leave empty for default gray
  • Disable the interactive button entirely (disable_button) if you want to handle acknowledgment through other means

Telegram Configuration

| Variable | Description |
|---|---|
| TELEGRAM_ENABLE | Set to true to enable Telegram notifications. |
| TELEGRAM_BOT_TOKEN | The authentication token for your Telegram bot. |
| TELEGRAM_CHAT_ID | The chat ID where alerts will be sent. Can be overridden per request using the telegram_chat_id query parameter. |

Viber Configuration

Viber supports two types of API integrations:

  • Channel API (default): Send messages to Viber channels for team notifications
  • Bot API: Send messages to individual users for personal notifications

When to use Channel API:

  • ✅ Broadcasting to team channels
  • ✅ Public incident notifications
  • ✅ Automated system alerts
  • ✅ Better for most incident management scenarios
  • ✅ No individual user setup required

When to use Bot API:

  • ✅ Personal notifications to specific users
  • ✅ Direct messaging for individual alerts
  • ⚠️ Limited to individual users only
  • ⚠️ Requires users to interact with bot first
  • ⚠️ User IDs can be hard to obtain

| Variable | Description |
|---|---|
| VIBER_ENABLE | Set to true to enable Viber notifications. |
| VIBER_BOT_TOKEN | The authentication token for your Viber bot or channel. |
| VIBER_API_TYPE | API type: "channel" (default) for team notifications or "bot" for individual messaging. |
| VIBER_CHANNEL_ID | The channel ID where alerts will be posted (required for channel API). Can be overridden per request using the viber_channel_id query parameter. |
| VIBER_USER_ID | The user ID where alerts will be sent (required for bot API). Can be overridden per request using the viber_user_id query parameter. |

Email Configuration

| Variable | Description |
|---|---|
| EMAIL_ENABLE | Set to true to enable email notifications. |
| SMTP_HOST | The SMTP server hostname (e.g., smtp.gmail.com). |
| SMTP_PORT | The SMTP server port (e.g., 587 for TLS). |
| SMTP_USERNAME | The username/email for SMTP authentication. |
| SMTP_PASSWORD | The password or app-specific password for SMTP authentication. |
| EMAIL_TO | The recipient email address(es) for incident notifications. Can be multiple addresses separated by commas. Can be overridden per request using the email_to query parameter. |
| EMAIL_SUBJECT | The subject line for email notifications. Can be overridden per request using the email_subject query parameter. |

Microsoft Teams Configuration

The Microsoft Teams integration now supports both legacy Office 365 webhooks and modern Power Automate workflows with a single configuration option:

alert:
  msteams:
    enable: true
    power_automate_url: ${MSTEAMS_POWER_AUTOMATE_URL}
    template_path: "config/msteams_message.tmpl"

Automatic URL Detection (April 2025 Update)

As of the April 2025 update, Versus Incident automatically detects the type of URL provided in the power_automate_url setting:

  • Legacy Office 365 Webhook URLs: If the URL contains "webhook.office.com" (e.g., https://yourcompany.webhook.office.com/...), the system will use the legacy format with a simple "text" field containing your rendered Markdown.

  • Power Automate Workflow URLs: For newer Power Automate HTTP trigger URLs, the system converts your Markdown template to an Adaptive Card with rich formatting features.

This automatic detection provides backward compatibility while supporting newer features, eliminating the need for separate configuration options.
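
The detection rule described above can be sketched in a few lines of Python. This is an illustration only, not the project's code: the function name teams_payload_kind and the two return labels are hypothetical, and the sample URLs are placeholders. The only fact taken from the text is the substring check for "webhook.office.com".

```python
LEGACY_MARKER = "webhook.office.com"  # substring named in the text above


def teams_payload_kind(url):
    """Classify a power_automate_url value.

    Returns "legacy_text" for legacy Office 365 webhook URLs (payload is a
    simple "text" field) and "adaptive_card" for everything else.
    Both labels are hypothetical names used only in this sketch.
    """
    if LEGACY_MARKER in url.lower():
        return "legacy_text"
    return "adaptive_card"


# Placeholder URLs for illustration.
print(teams_payload_kind("https://yourcompany.webhook.office.com/webhookb2/abc"))
print(teams_payload_kind("https://example.invalid/workflows/abc/triggers/manual/paths/invoke"))
```

Because the choice depends only on the URL, switching a channel from a legacy webhook to a Power Automate workflow needs no change beyond the value of MSTEAMS_POWER_AUTOMATE_URL.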

| Variable | Description |
|---|---|
| MSTEAMS_ENABLE | Set to true to enable Microsoft Teams notifications. |
| MSTEAMS_POWER_AUTOMATE_URL | The Power Automate HTTP trigger URL for your Teams channel. Automatically works with both Power Automate workflow URLs and legacy Office 365 webhooks. |
| MSTEAMS_OTHER_POWER_URL_QC | (Optional) Power Automate URL for the QC team channel. Can be selected per request using the msteams_other_power_url=qc query parameter. |
| MSTEAMS_OTHER_POWER_URL_OPS | (Optional) Power Automate URL for the Ops team channel. Can be selected per request using the msteams_other_power_url=ops query parameter. |
| MSTEAMS_OTHER_POWER_URL_DEV | (Optional) Power Automate URL for the Dev team channel. Can be selected per request using the msteams_other_power_url=dev query parameter. |

Lark Configuration

| Variable | Description |
|---|---|
| LARK_ENABLE | Set to true to enable Lark notifications. |
| LARK_WEBHOOK_URL | The webhook URL for your Lark channel. |
| LARK_OTHER_WEBHOOK_URL_DEV | (Optional) Webhook URL for the development team. Can be selected per request using the lark_other_webhook_url=dev query parameter. |
| LARK_OTHER_WEBHOOK_URL_PROD | (Optional) Webhook URL for the production team. Can be selected per request using the lark_other_webhook_url=prod query parameter. |

Queue Services Configuration

| Variable | Description |
|---|---|
| SNS_ENABLE | Set to true to receive alert messages from AWS SNS. |
| SNS_HTTPS_ENDPOINT_SUBSCRIPTION | The HTTPS endpoint to which SNS sends messages, e.g. https://your-domain.com. When an HTTPS endpoint is configured, an SNS subscription is automatically created. If no endpoint is configured, you must create the SNS subscription manually using the CLI or AWS Console. |
| SNS_TOPIC_ARN | AWS ARN of the SNS topic to subscribe to. |
| SQS_ENABLE | Set to true to receive alert messages from AWS SQS. |
| SQS_QUEUE_URL | URL of the AWS SQS queue to receive messages from. |

On-Call Configuration

| Variable | Description |
|---|---|
| ONCALL_ENABLE | Set to true to enable on-call functionality for all incidents by default. Can be overridden per request using the oncall_enable query parameter. |
| ONCALL_INITIALIZED_ONLY | Set to true to initialize the on-call feature but keep it disabled by default. When set to true, on-call is triggered only for requests that explicitly include ?oncall_enable=true in the URL. |
| ONCALL_WAIT_MINUTES | Time in minutes to wait for acknowledgment before escalating (default: 3). Can be overridden per request using the oncall_wait_minutes query parameter. |
| ONCALL_PROVIDER | Specify the on-call provider to use ("aws_incident_manager" or "pagerduty"). |
| AWS_INCIDENT_MANAGER_RESPONSE_PLAN_ARN | The ARN of the AWS Incident Manager response plan to use for on-call escalations. Required if on-call provider is "aws_incident_manager". |
| AWS_INCIDENT_MANAGER_OTHER_RESPONSE_PLAN_ARN_PROD | (Optional) AWS Incident Manager response plan ARN for production environment. Can be selected per request using the awsim_other_response_plan=prod query parameter. |
| AWS_INCIDENT_MANAGER_OTHER_RESPONSE_PLAN_ARN_DEV | (Optional) AWS Incident Manager response plan ARN for development environment. Can be selected per request using the awsim_other_response_plan=dev query parameter. |
| AWS_INCIDENT_MANAGER_OTHER_RESPONSE_PLAN_ARN_STAGING | (Optional) AWS Incident Manager response plan ARN for staging environment. Can be selected per request using the awsim_other_response_plan=staging query parameter. |
| PAGERDUTY_ROUTING_KEY | Integration/Routing key for PagerDuty Events API v2. Required if on-call provider is "pagerduty". |
| PAGERDUTY_OTHER_ROUTING_KEY_INFRA | (Optional) PagerDuty routing key for infrastructure team. Can be selected per request using the pagerduty_other_routing_key=infra query parameter. |
| PAGERDUTY_OTHER_ROUTING_KEY_APP | (Optional) PagerDuty routing key for application team. Can be selected per request using the pagerduty_other_routing_key=app query parameter. |
| PAGERDUTY_OTHER_ROUTING_KEY_DB | (Optional) PagerDuty routing key for database team. Can be selected per request using the pagerduty_other_routing_key=db query parameter. |

Enabling On-Call for Specific Incidents with initialized_only

When you have initialized_only: true in your configuration (rather than enable: true), on-call is only triggered for incidents that explicitly request it. This is useful when:

  1. You want the on-call feature ready but not active for all alerts
  2. You need to selectively enable on-call only for high-priority services or incidents
  3. You want to let your monitoring system decide which alerts should trigger on-call

Example configuration:

oncall:
  enable: false
  initialized_only: true  # feature ready but not active by default
  wait_minutes: 3
  provider: aws_incident_manager
  # ... provider configuration ...

With this configuration, on-call is only triggered when requested via query parameter:

# This alert will send notifications but NOT trigger on-call escalation
curl -X POST "http://localhost:3000/api/incidents" \
  -H "Content-Type: application/json" \
  -d '{
    "Logs": "[WARNING] Non-critical database latency increase.",
    "ServiceName": "database-monitoring",
    "UserID": "U12345"
  }'

# This alert WILL trigger on-call escalation because of the query parameter
curl -X POST "http://localhost:3000/api/incidents?oncall_enable=true" \
  -H "Content-Type: application/json" \
  -d '{
    "Logs": "[CRITICAL] Production database is down.",
    "ServiceName": "core-database",
    "UserID": "U12345"
  }'

Understanding On-Call Modes:

| Mode | Configuration | Behavior |
|---|---|---|
| Disabled | enable: false, initialized_only: false | On-call feature is not initialized. No on-call functionality is available. |
| Always Enabled | enable: true | On-call is active for all incidents by default. Can be disabled per request with ?oncall_enable=false. |
| Opt-In Only | enable: false, initialized_only: true | On-call feature is initialized but inactive by default. Must be explicitly enabled per request with ?oncall_enable=true. |
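
The three modes can be read as a small decision function. The Python sketch below is one way to express the table, not the project's implementation. The function name oncall_triggers is hypothetical, and treating enable: true together with initialized_only: true as "Always Enabled" is an assumption, since the table does not list that combination.

```python
def oncall_triggers(enable, initialized_only, query_oncall_enable=None):
    """Decide whether an incident triggers on-call.

    query_oncall_enable is the parsed ?oncall_enable= value: True, False,
    or None when the parameter is absent from the request.
    """
    initialized = enable or initialized_only
    if not initialized:
        # Disabled: the feature is not initialized, so the query
        # parameter cannot turn it on.
        return False
    if query_oncall_enable is not None:
        # A per-request value wins over the configured default.
        return query_oncall_enable
    # No per-request value: fall back to the global default.
    return enable


# Opt-In Only: inactive unless the request asks for it.
print(oncall_triggers(enable=False, initialized_only=True))
print(oncall_triggers(enable=False, initialized_only=True, query_oncall_enable=True))
```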

Redis Configuration

| Variable | Description |
|---|---|
| REDIS_HOST | The hostname or IP address of the Redis server. Required if on-call is enabled. |
| REDIS_PORT | The port number of the Redis server. Required if on-call is enabled. |
| REDIS_PASSWORD | The password for authenticating with the Redis server. Required if on-call is enabled and Redis requires authentication. |

AI Agent Configuration

| Variable | Description |
|---|---|
| AGENT_ENABLE | Set to true to start the AI agent worker. When false (default) no agent goroutines, files, or Redis keys are created. |
| AGENT_MODE | Worker mode: training (observe and learn only), shadow (classify and log "would-have-alerted" events), or detect (classify + emit). |
| AGENT_SOURCES_PATH | Path to the YAML file listing the agent's log sources. Resolved relative to the main config file. Default ./agent_sources.yaml. |
| AGENT_NEW_SERVICE_GRACE | Duration a newly discovered service stays in implicit training before detect-mode AI analysis begins (e.g. 30m). 0 disables the grace window. |
| AGENT_SERVICE_PATTERNS | Comma-separated list of regexes used to extract the service name from each log line. Each pattern must contain at least one capture group. Overrides the YAML list when set. |
| AGENT_AI_ENABLE | Set to true to call the configured LLM in detect mode. When false, detect mode classifies but never calls the model (dry-run). |
| AGENT_AI_BASE_URL | OpenAI-compatible chat/completions endpoint, e.g. https://api.openai.com/v1. |
| AGENT_AI_API_KEY | Bearer token sent in the Authorization header when calling the LLM. |
| AGENT_AI_MODEL | Model identifier, e.g. gpt-4o-mini. |

The agent also requires the root-level GATEWAY_SECRET (see Admin & Gateway) and the root-level redis block — Redis is used to remember per-source cursors so the agent resumes from where it left off after a restart.
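
To make the service-name extraction concrete (agent.service_patterns in the sample config, or AGENT_SERVICE_PATTERNS), here is a small Python sketch of "the first capture group of the first matching pattern wins". It is an illustration under assumptions: the function name extract_service is hypothetical, and Python's re module stands in for whatever regex engine the agent uses. The patterns are copied from the sample config.

```python
import re

# Patterns copied from the sample config, in the same order.
SERVICE_PATTERNS = [
    r'(?i)\bservice[._-]?name["\s:=]+"?([A-Za-z0-9._-]+)',
    r'(?i)\b(?:service|svc|app|component)\s*=\s*"?([A-Za-z0-9._-]+)',
    r'(?i)"(?:service|svc|app|component)"\s*:\s*"([A-Za-z0-9._-]+)"',
    r'\[([A-Za-z0-9._-]+)\]',
]


def extract_service(line, patterns=SERVICE_PATTERNS):
    """Return the first capture group of the first pattern that matches,
    or "_unknown" when nothing matches (or the pattern list is empty)."""
    for pattern in patterns:
        match = re.search(pattern, line)
        if match:
            return match.group(1)
    return "_unknown"


for line in [
    'level=error service=checkout msg="timeout"',
    '{"service":"billing-api","msg":"boom"}',
    "2024-01-01 [payments] panic: nil pointer",
    "plain line with no hints",
]:
    print(extract_service(line), "<-", line)
```

Order matters in this sketch: the bracket pattern is last because it is the least specific and would otherwise capture any bracketed token, such as a bracketed timestamp.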

Ensure these environment variables are properly set before running the application.

Dynamic Configuration with Query Parameters

Configuration values can be overridden with query parameters, allowing you to send alerts to different channels and customize notification behavior on a per-request basis.

| Query Parameter | Description |
|---|---|
| slack_channel_id | The ID of the Slack channel where alerts will be sent. Use: /api/incidents?slack_channel_id=<your_value>. |
| telegram_chat_id | The chat ID where Telegram alerts will be sent. Use: /api/incidents?telegram_chat_id=<your_chat_id>. |
| viber_channel_id | The channel ID where Viber alerts will be posted (for Channel API). Use: /api/incidents?viber_channel_id=<your_channel_id>. |
| viber_user_id | The user ID where Viber alerts will be sent (for Bot API). Use: /api/incidents?viber_user_id=<your_user_id>. |
| email_to | Overrides the default recipient email address for email notifications. Use: /api/incidents?email_to=<recipient_email>. |
| email_subject | Overrides the default subject line for email notifications. Use: /api/incidents?email_subject=<custom_subject>. |
| msteams_other_power_url | Overrides the default Microsoft Teams Power Automate flow by specifying an alternative key (e.g., qc, ops, dev). Use: /api/incidents?msteams_other_power_url=qc. |
| lark_other_webhook_url | Overrides the default Lark webhook URL by specifying an alternative key (e.g., dev, prod). Use: /api/incidents?lark_other_webhook_url=dev. |
| oncall_enable | Set to true or false to enable or disable on-call for a specific alert. Use: /api/incidents?oncall_enable=false. |
| oncall_wait_minutes | Set the number of minutes to wait for acknowledgment before triggering on-call. Set to 0 to trigger immediately. Use: /api/incidents?oncall_wait_minutes=0. |
| awsim_other_response_plan | Overrides the default AWS Incident Manager response plan ARN by specifying an alternative key (e.g., prod, dev, staging). Use: /api/incidents?awsim_other_response_plan=prod. |
| pagerduty_other_routing_key | Overrides the default PagerDuty routing key by specifying an alternative key (e.g., infra, app, db). Use: /api/incidents?pagerduty_other_routing_key=infra. |
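
When these parameters are assembled by a script rather than typed by hand, values such as email subjects or Viber IDs need URL-encoding. The Python sketch below builds the request URL with the standard library. The helper name incident_url is hypothetical; the /api/incidents path and the parameter names come from the table above.

```python
from urllib.parse import quote, urlencode


def incident_url(base_url, **overrides):
    """Build an /api/incidents URL with percent-encoded query overrides."""
    # quote_via=quote encodes spaces as %20 rather than "+".
    query = urlencode(overrides, quote_via=quote)
    url = f"{base_url.rstrip('/')}/api/incidents"
    return f"{url}?{query}" if query else url


# Pass booleans as the strings "true"/"false": Python's True would be
# rendered as "True".
print(incident_url(
    "http://localhost:3000",
    email_to="network-team@yourdomain.com",
    email_subject="Urgent Network Issue",
    oncall_enable="true",
))
```

The resulting URL can be passed to curl or any HTTP client in place of a hand-written query string.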

Examples for Each Query Parameter

Slack Channel Override

To send an alert to a specific Slack channel (e.g., a dedicated channel for database issues):

curl -X POST "http://localhost:3000/api/incidents?slack_channel_id=C01DB2ISSUES" \
  -H "Content-Type: application/json" \
  -d '{
    "Logs": "[ERROR] Database connection pool exhausted.",
    "ServiceName": "database-service",
    "UserID": "U12345"
  }'

Telegram Chat Override

To send an alert to a different Telegram chat (e.g., for network monitoring):

curl -X POST "http://localhost:3000/api/incidents?telegram_chat_id=-1001234567890" \
  -H "Content-Type: application/json" \
  -d '{
    "Logs": "[ERROR] Network latency exceeding thresholds.",
    "ServiceName": "network-monitor",
    "UserID": "U12345"
  }'

Viber Channel Override

To send an alert to a specific Viber channel (recommended for team notifications):

curl -X POST "http://localhost:3000/api/incidents?viber_channel_id=01234567890A=" \
  -H "Content-Type: application/json" \
  -d '{
    "Logs": "[ERROR] Mobile service experiencing high error rates.",
    "ServiceName": "mobile-api",
    "UserID": "U12345"
  }'

Viber User Override

To send an alert to a specific Viber user (for individual notifications):

curl -X POST "http://localhost:3000/api/incidents?viber_user_id=01234567890A=" \
  -H "Content-Type: application/json" \
  -d '{
    "Logs": "[ERROR] Personal alert for mobile service issue.",
    "ServiceName": "mobile-api",
    "UserID": "U12345"
  }'

Email Recipient Override

To send an email alert to a specific recipient with a custom subject:

curl -X POST "http://localhost:3000/api/incidents?email_to=network-team@yourdomain.com&email_subject=Urgent%20Network%20Issue" \
  -H "Content-Type: application/json" \
  -d '{
    "Logs": "[ERROR] Load balancer failing health checks.",
    "ServiceName": "load-balancer",
    "UserID": "U12345"
  }'

Microsoft Teams Channel Override

You can configure multiple Microsoft Teams channels using the other_power_urls setting:

alert:
  msteams:
    enable: true
    power_automate_url: ${MSTEAMS_POWER_AUTOMATE_URL}
    template_path: "config/msteams_message.tmpl"
    other_power_urls:
      qc: ${MSTEAMS_OTHER_POWER_URL_QC}
      ops: ${MSTEAMS_OTHER_POWER_URL_OPS}
      dev: ${MSTEAMS_OTHER_POWER_URL_DEV}

Then, to send an alert to the QC team's Microsoft Teams channel:

curl -X POST "http://localhost:3000/api/incidents?msteams_other_power_url=qc" \
  -H "Content-Type: application/json" \
  -d '{
    "Logs": "[ERROR] Quality check failed for latest deployment.",
    "ServiceName": "quality-service",
    "UserID": "U12345"
  }'

Lark Webhook Override

You can configure multiple Lark webhook URLs using the other_webhook_urls setting:

alert:
  lark:
    enable: true
    webhook_url: ${LARK_WEBHOOK_URL}
    template_path: "config/lark_message.tmpl"
    other_webhook_urls:
      dev: ${LARK_OTHER_WEBHOOK_URL_DEV}
      prod: ${LARK_OTHER_WEBHOOK_URL_PROD}

Then, to send an alert to the development team's Lark channel:

curl -X POST "http://localhost:3000/api/incidents?lark_other_webhook_url=dev" \
  -H "Content-Type: application/json" \
  -d '{
    "Logs": "[ERROR] Development server crash detected.",
    "ServiceName": "dev-server",
    "UserID": "U12345"
  }'

On-Call Controls

To disable on-call escalation for a non-critical alert:

curl -X POST "http://localhost:3000/api/incidents?oncall_enable=false" \
  -H "Content-Type: application/json" \
  -d '{
    "Logs": "[WARNING] This is a minor issue that doesn't require on-call response.",
    "ServiceName": "monitoring-service",
    "UserID": "U12345"
  }'

To trigger on-call immediately without the normal wait period for a critical issue:

curl -X POST "http://localhost:3000/api/incidents?oncall_wait_minutes=0" \
  -H "Content-Type: application/json" \
  -d '{
    "Logs": "[CRITICAL] Payment processing system down.",
    "ServiceName": "payment-service",
    "UserID": "U12345"
  }'

AWS Incident Manager Response Plan Override

You can configure multiple AWS Incident Manager response plans using the other_response_plan_arns setting:

oncall:
  enable: true
  wait_minutes: 3
  provider: aws_incident_manager
  
  aws_incident_manager:
    response_plan_arn: ${AWS_INCIDENT_MANAGER_RESPONSE_PLAN_ARN}  # Default response plan
    other_response_plan_arns:
      prod: ${AWS_INCIDENT_MANAGER_OTHER_RESPONSE_PLAN_ARN_PROD}  # Production environment
      dev: ${AWS_INCIDENT_MANAGER_OTHER_RESPONSE_PLAN_ARN_DEV}    # Development environment
      staging: ${AWS_INCIDENT_MANAGER_OTHER_RESPONSE_PLAN_ARN_STAGING}  # Staging environment

Then, to use a specific AWS Incident Manager response plan for a production environment issue:

curl -X POST "http://localhost:3000/api/incidents?awsim_other_response_plan=prod" \
  -H "Content-Type: application/json" \
  -d '{
    "Logs": "[CRITICAL] Production database cluster failure.",
    "ServiceName": "prod-database",
    "UserID": "U12345"
  }'

PagerDuty Routing Key Override

You can configure multiple PagerDuty routing keys using the other_routing_keys setting:

oncall:
  enable: true
  wait_minutes: 3
  provider: pagerduty
  
  pagerduty:
    routing_key: ${PAGERDUTY_ROUTING_KEY}  # Default routing key
    other_routing_keys:
      infra: ${PAGERDUTY_OTHER_ROUTING_KEY_INFRA}  # Infrastructure team
      app: ${PAGERDUTY_OTHER_ROUTING_KEY_APP}      # Application team
      db: ${PAGERDUTY_OTHER_ROUTING_KEY_DB}        # Database team

Then, to use a specific PagerDuty routing key for the infrastructure team:

curl -X POST "http://localhost:3000/api/incidents?pagerduty_other_routing_key=infra" \
  -H "Content-Type: application/json" \
  -d '{
    "Logs": "[ERROR] Server load balancer failure in us-west-2.",
    "ServiceName": "infrastructure",
    "UserID": "U12345"
  }'

Combining Multiple Parameters

You can combine multiple query parameters to customize exactly how an incident is handled:

curl -X POST "http://localhost:3000/api/incidents?slack_channel_id=C01PROD&telegram_chat_id=-987654321&oncall_enable=true&oncall_wait_minutes=1" \
  -H "Content-Type: application/json" \
  -d '{
    "Logs": "[CRITICAL] Multiple service failures detected in production environment.",
    "ServiceName": "core-infrastructure",
    "UserID": "U12345",
    "Severity": "CRITICAL"
  }'

This will:

  1. Send the alert to a specific Slack channel (C01PROD)
  2. Send the alert to a specific Telegram chat (-987654321)
  3. Enable on-call escalation with a shortened 1-minute wait time
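
Long override URLs are easy to mistype. As a minimal sketch, a small shell helper can assemble the query string from key=value pairs. incident_url is a hypothetical name, not part of Versus, and it assumes values that need no URL encoding, which holds for every value shown above.

```shell
# Hypothetical helper: joins key=value overrides into an /api/incidents URL.
# It does not URL-encode values, so it only suits simple values such as
# channel IDs, chat IDs, booleans, and integers.
incident_url() {
  base="$1"
  shift
  query=""
  for pair in "$@"; do
    # Prefix with "&" only when the query already has content
    query="${query:+$query&}$pair"
  done
  # Emit "?" only when at least one override was given
  printf '%s/api/incidents%s\n' "$base" "${query:+?$query}"
}

url=$(incident_url "http://localhost:3000" \
  "slack_channel_id=C01PROD" \
  "telegram_chat_id=-987654321" \
  "oncall_enable=true" \
  "oncall_wait_minutes=1")
echo "$url"
```

The resulting "$url" can be passed to curl -X POST in place of the literal URL, with the same headers and JSON body as in the example above.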

SNS Listener

Versus can subscribe to an SNS topic and treat each message as an incoming incident. This is useful for CloudWatch Alarms, which can publish to SNS on state changes.

docker run -d \
  -p 3000:3000 \
  -e SLACK_ENABLE=true \
  -e SLACK_TOKEN=your_slack_token \
  -e SLACK_CHANNEL_ID=your_channel_id \
  -e SNS_ENABLE=true \
  -e SNS_TOPIC_ARN=$SNS_TOPIC_ARN \
  -e SNS_HTTPS_ENDPOINT_SUBSCRIPTION=https://your-domain.com \
  -e AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY \
  -e AWS_SECRET_ACCESS_KEY=$AWS_SECRET_KEY \
  --name versus \
  ghcr.io/versuscontrol/versus-incident

Test with the AWS CLI:

aws sns publish \
  --topic-arn $SNS_TOPIC_ARN \
  --message '{"ServiceName":"test-service","Logs":"[ERROR] Test error","UserID":"U12345"}' \
  --region $AWS_REGION

A common real-world setup: CloudWatch Alarms → SNS topic → Versus → Slack/Telegram/Email with a custom CloudWatch-aware template.

AI Agent

Versus supports an opt-in AI SRE agent that reads your logs, metrics, and traces, learns what normal looks like, and only alerts you when something new and unexpected appears.

Configuration example with agent features:

name: versus
host: 0.0.0.0
port: 3000

# ... existing alert configurations ...

# Shared secret required for ALL admin endpoints (`/api/admin/*` and
# `/api/agent/*`). Sent by clients in the `X-Gateway-Secret` header.
gateway_secret: ${GATEWAY_SECRET}

# Storage backend for the pattern catalog, shadow log, and incident
# history. Only `file` is implemented today; `redis` and `database`
# are config stubs.
storage:
  type: file              # file | redis | database (env: STORAGE_TYPE)
  file:
    data_dir: ./data
    max_incidents: 1000   # rolling cap on persisted incidents

agent:
  enable: false # Use this to enable or disable the agent for all sources
  mode: training # Valid values: "training", "shadow", or "detect"
  poll_interval: 30s

  # Sources are kept in a separate file so they can be managed independently
  # (e.g. swap fixtures, per-environment lists). Path is resolved relative to
  # this config file. Override via env: AGENT_SOURCES_PATH.
  sources_path: ./agent_sources.yaml

  catalog:
    persist_interval: 30s
    auto_promote_after: 100 # In detect mode, this many sightings = "known"

  redaction:
    enable: true
    redact_ips: false
    extra_patterns: # Optional: extra regex rules to scrub before clustering
      - "(?i)password=\\S+"
      - "Authorization:\\s*Bearer\\s+\\S+"

  miner:
    similarity_threshold: 0.4
    tree_depth: 4
    max_children: 100

  regex:
    # Optional: tag any signal whose message matches this pattern
    # if none of the named rules below hit. Leave empty to disable.
    default_pattern: "(?i)error|exception|fatal|panic"
    # Named rules are tried first, in order. The first match wins.
    rules:
      - name: oom
        pattern: "(?i)out of memory|OOMKilled|java\\.lang\\.OutOfMemoryError"
      - name: db-timeout
        pattern: "(?i)(connection|query) timeout|deadlock detected"
      - name: auth-failure
        pattern: "(?i)401 unauthorized|invalid credentials|permission denied"

redis: # Required for the agent to persist source cursors across restarts
  host: ${REDIS_HOST}
  port: ${REDIS_PORT}
  password: ${REDIS_PASSWORD}
  db: 0

Explanation:

The agent section includes:

  1. enable: Turn the agent on or off (default: false). When disabled, nothing extra runs.
  2. mode: What the agent does with the log patterns it observes:
    • training: observation only — the agent learns patterns and saves them, but sends no alerts.
    • shadow: same as training, but also logs a note every time it would have sent an alert. Good for reviewing before going live.
    • detect: the agent actively sends alerts for any pattern it has never seen before.
  3. poll_interval: How often the agent checks your log sources for new entries.
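  4. catalog: How often the agent writes the list of known patterns to storage (persist_interval) and how many sightings promote a pattern to "known" in detect mode (auto_promote_after). The storage location itself is selected by the root storage: block.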

  5. redaction: Rules for automatically removing sensitive information (passwords, tokens, emails, etc.) from logs before the agent processes them.
  6. miner: Controls how aggressively the agent groups similar log lines together. The defaults work well for most setups.
  7. regex: Acts as a pre-filter for the agent. Only signals whose message matches at least one rule (a named entry under rules or default_pattern) are forwarded to the pattern miner and stored in the catalog.
    • Named rules are tried in order; the first match wins and tags the signal with that name (stored as rule_name on the pattern).
    • If no named rule hits, default_pattern is tried. Matches there are tagged with name=default.
    • To learn from every line, set default_pattern: ".*".
    • To filter aggressively, set default_pattern: "" (empty) and rely on your named rules.
  8. sources_path: Path to a separate YAML file that lists the log sources the agent should read from. Resolved relative to the main config file. Override via AGENT_SOURCES_PATH.

Admin secret. All admin endpoints (/api/admin/* and /api/agent/*) are protected by the root-level gateway_secret (env GATEWAY_SECRET). Set it to any value you choose; clients send the same value in the X-Gateway-Secret header. When no secret is configured the admin endpoints are not registered and the agent refuses to start.
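
The regex rule order described in the list above (named rules first, in order, then default_pattern) can be dry-run against sample log lines before deploying a config. The following is a hedged sketch, not Versus code: tag_signal, matches, and the "dropped" label are hypothetical, and grep's extended regex syntax stands in for the agent's own regex engine, so the (?i) prefix is replaced by grep's -i flag. Patterns that rely on engine-specific syntax may behave differently in the real agent.

```shell
# Hypothetical dry-run of the rule order; patterns copied from the sample config.
DEFAULT_PATTERN='error|exception|fatal|panic'

matches() {
  # $1 = message, $2 = extended regex; an empty pattern never matches
  [ -n "$2" ] && printf '%s\n' "$1" | grep -Eiq -- "$2"
}

tag_signal() {
  msg="$1"
  # Named rules are tried first, in order; the first match wins
  if matches "$msg" 'out of memory|OOMKilled|java\.lang\.OutOfMemoryError'; then
    echo "oom"
  elif matches "$msg" '(connection|query) timeout|deadlock detected'; then
    echo "db-timeout"
  elif matches "$msg" '401 unauthorized|invalid credentials|permission denied'; then
    echo "auth-failure"
  # Fall back to default_pattern only when no named rule hit
  elif matches "$msg" "$DEFAULT_PATTERN"; then
    echo "default"
  else
    echo "dropped"
  fi
}

tag_signal "pod restarted: OOMKilled"            # oom
tag_signal "ERROR: deadlock detected on orders"  # db-timeout
tag_signal "unhandled exception in worker"       # default
tag_signal "GET /healthz 200"                    # dropped
```

The second line also matches default_pattern, but the named db-timeout rule is tried first and wins. Setting DEFAULT_PATTERN to an empty string mimics the aggressive-filter setup, in which the third line would be dropped as well.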

The sources file (default ./agent_sources.yaml) has a single top-level sources: list. Each entry needs name, type (file or elasticsearch), enable, plus a matching file: or elasticsearch: block:

sources:
  - name: prod-app
    type: elasticsearch
    enable: true
    elasticsearch:
      addresses:
        - https://es.example.internal:9200
      username: ${ES_USERNAME}
      password: ${ES_PASSWORD}
      index: "logs-app-*"
      time_field: "@timestamp"
      query: 'log.level:(error OR warn)'
      message_field: message
      page_size: 500

  - name: sample-app
    type: file
    enable: true
    file:
      path: ./local/resource/sample-app.log
      format: text
      from_beginning: true

The redis section is required when agent.enable is true. Redis stores the per-source cursor so the agent picks up where it left off after a restart.

For full integration walkthroughs see Enable AI Agent.

On-Call

Versus supports On-Call integrations with AWS Incident Manager and PagerDuty. Configuration example with on-call features:

name: versus
host: 0.0.0.0
port: 3000
public_host: https://your-ack-host.example # Required for on-call ack

# ... existing alert configurations ...

oncall:
  ### Enable overriding using query parameters
  # /api/incidents?oncall_enable=false => Set to `true` or `false` to enable or disable on-call for a specific alert
  # /api/incidents?oncall_wait_minutes=0 => Set the number of minutes to wait for acknowledgment before triggering on-call. Set to `0` to trigger immediately
  enable: false
  wait_minutes: 3 # If you set it to 0, on-call triggers immediately without checking for an acknowledgment
  provider: aws_incident_manager # Valid values: "aws_incident_manager" or "pagerduty"

  aws_incident_manager:
    response_plan_arn: ${AWS_INCIDENT_MANAGER_RESPONSE_PLAN_ARN}

redis: # Required for on-call functionality
  insecure_skip_verify: true # dev only
  host: ${REDIS_HOST}
  port: ${REDIS_PORT}
  password: ${REDIS_PASSWORD}
  db: 0

The oncall section includes:

  1. enable: A boolean to toggle on-call functionality for all incidents (default: false).
  2. initialized_only: Initialize the on-call subsystem but keep it disabled by default. When set to true, on-call is triggered only for requests that explicitly include ?oncall_enable=true.
  3. wait_minutes: Time in minutes to wait for an acknowledgment before escalating (default: 3). Set to 0 to trigger immediately.
  4. provider: Which on-call provider to use ("aws_incident_manager" or "pagerduty").
  5. aws_incident_manager: Configuration for AWS Incident Manager when selected, including response_plan_arn and other_response_plan_arns.
  6. pagerduty: Configuration for PagerDuty when selected, including routing_key and other_routing_keys.

The redis section is required when oncall.enable or oncall.initialized_only is true. It stores the open-incident state needed for ack-or-escalate.
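
How enable, initialized_only, and the ?oncall_enable query parameter interact can be modelled as a small decision function. This is a sketch of the behaviour described above, not Versus source code. oncall_active is a hypothetical name, and the handling of the case where both flags are false is an assumption: the text implies the subsystem is then never initialised, so a per-request override cannot switch it on.

```shell
# Hypothetical model of the on-call decision; not Versus source code.
# $1 = oncall.enable, $2 = oncall.initialized_only,
# $3 = value of ?oncall_enable ("" when the parameter is absent)
oncall_active() {
  enable="$1"
  initialized_only="$2"
  override="$3"
  # Assumption: with both flags false the subsystem is never initialised
  if [ "$enable" != "true" ] && [ "$initialized_only" != "true" ]; then
    echo "off"
    return
  fi
  case "$override" in
    true)  echo "on" ;;   # explicit per-request opt-in
    false) echo "off" ;;  # explicit per-request opt-out
    *)                    # no override: fall back to the global enable flag
      if [ "$enable" = "true" ]; then echo "on"; else echo "off"; fi
      ;;
  esac
}

oncall_active true  false ""     # on:  enabled for all incidents
oncall_active true  false false  # off: request opted out
oncall_active false true  ""     # off: initialised but idle by default
oncall_active false true  true   # on:  request opted in
```

When the result is "on", wait_minutes (or the oncall_wait_minutes override) then decides how long Versus waits for an acknowledgment before escalating.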

For provider-specific walkthroughs see On-Call setup with Versus.