Get started with incidents

An incident is a record of a service disruption or performance issue. It serves as the central point for alerting, collaboration, and resolution. When a service fails, an incident is created to track the problem through its entire lifecycle, from detection to post-mortem.

How are incidents created?

Incidents can be triggered in several ways:

  • Automatically from monitors: When a monitor, heartbeat, or integration detects a problem, an incident is created automatically. This is the most common way incidents start.
  • Manually by users: Anyone on your team can report an incident from the dashboard or through a secret URL. This is useful for problems that automated monitoring might not catch, such as "payment processing is failing."
  • Via the API: You can create incidents programmatically through our API, which lets you build custom integrations with your internal tools.
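As a rough illustration of API-based creation, the sketch below builds an authenticated POST request with Python's standard library. The endpoint URL, the token, and the `name` and `summary` field names are placeholders, not this product's actual API; take the real values from the API reference. The request is built but not sent.

```python
import json
import urllib.request

# Hypothetical endpoint and token: placeholders, not the real API values.
API_URL = "https://api.example.com/v1/incidents"
API_TOKEN = "YOUR_API_TOKEN"


def build_incident_request(name: str, summary: str) -> urllib.request.Request:
    """Build (but do not send) a POST request that would create an incident.

    The JSON field names `name` and `summary` are assumptions for illustration.
    """
    payload = json.dumps({"name": name, "summary": summary}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=payload,
        method="POST",
        headers={
            "Authorization": f"Bearer {API_TOKEN}",
            "Content-Type": "application/json",
        },
    )


req = build_incident_request("Payment processing is failing", "Checkout returns HTTP 500")
# To actually send it against a real endpoint: urllib.request.urlopen(req)
```

Keeping request construction separate from sending makes it easy to inspect or log the payload before wiring it into an internal tool.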

The incident lifecycle

Every incident follows a clear path from creation to resolution, ensuring your team is always in sync.

1. Started

When an incident is created, we immediately notify the right people based on your on-call schedule and escalation policies.

2. Acknowledged

A team member takes ownership by acknowledging the incident. This action stops further escalations, letting the rest of the team know that someone is working on the issue.

3. Resolved

Once the underlying problem is fixed, the incident is marked as resolved. Resolution can happen automatically (when a monitor check passes again) or manually by a team member.
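The lifecycle above behaves like a small state machine. The following Python sketch models it under two assumptions not stated above: an incident can move from Started directly to Resolved (for example, when a monitor recovers before anyone acknowledges), and a resolved incident cannot be reopened. The `Incident` class, its methods, and the `escalating` property are hypothetical names for illustration.

```python
from enum import Enum


class State(Enum):
    STARTED = "started"
    ACKNOWLEDGED = "acknowledged"
    RESOLVED = "resolved"


# Allowed transitions. STARTED -> RESOLVED models automatic resolution before
# acknowledgment, and RESOLVED has no exits (both are assumptions).
TRANSITIONS = {
    State.STARTED: {State.ACKNOWLEDGED, State.RESOLVED},
    State.ACKNOWLEDGED: {State.RESOLVED},
    State.RESOLVED: set(),
}


class Incident:
    def __init__(self) -> None:
        # A new incident begins in STARTED; sending notifications is out of scope here.
        self.state = State.STARTED

    @property
    def escalating(self) -> bool:
        # Escalations continue only while nobody has acknowledged or resolved it.
        return self.state is State.STARTED

    def _move(self, new_state: State) -> None:
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"cannot go from {self.state.value} to {new_state.value}")
        self.state = new_state

    def acknowledge(self) -> None:
        self._move(State.ACKNOWLEDGED)

    def resolve(self) -> None:
        self._move(State.RESOLVED)
```

Raising on an invalid transition, such as acknowledging an incident that is already resolved, keeps the recorded state consistent with the lifecycle.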

Collaborating on incidents

The incident detail page is your central hub for collaboration.

  • View details: Access critical information like error messages, response bodies, and screenshots to quickly diagnose the problem. Learn more about incident details.
  • Leave comments: Use comments to share updates, ask questions, or coordinate with your team. You can use Markdown and attach files.
  • Write a post-mortem: After an incident is resolved, document what happened, what the impact was, and what steps will be taken to prevent it from happening again. Read our guide on writing post-mortems.

Advanced incident management

As your team grows, you can use more advanced features to manage incidents effectively:

  • Incident Grouping: Automatically group related alerts into a single incident to reduce noise and alert fatigue.
  • Incident Silencing: Use machine learning to silence repetitive, low-impact alerts so your team can focus on what matters.
  • Incident Metadata: Attach custom data such as severity, service, or team ownership to incidents for smarter routing and filtering in escalation policies.
  • Runbooks: Create step-by-step guides within your escalation policies to ensure a consistent, efficient response to common incidents.
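To make the idea behind Incident Grouping concrete, here is a hypothetical grouping rule in Python: an alert joins the open incident for its `group_key` if it arrives within a fixed window of when that incident opened; otherwise it opens a new incident. The key, the 300-second window, and the rule itself are assumptions for illustration, not the product's documented grouping logic.

```python
from dataclasses import dataclass, field


@dataclass
class Alert:
    group_key: str    # hypothetical grouping key, e.g. a service name
    timestamp: float  # seconds since epoch


@dataclass
class GroupedIncident:
    group_key: str
    opened_at: float
    alerts: list = field(default_factory=list)


def group_alerts(alerts: list, window_seconds: float = 300.0) -> list:
    """Fold alerts into incidents, processing them in timestamp order."""
    open_by_key = {}  # group_key -> most recent incident for that key
    incidents = []
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        current = open_by_key.get(alert.group_key)
        # Open a new incident if none exists for this key or the window has passed.
        if current is None or alert.timestamp - current.opened_at > window_seconds:
            current = GroupedIncident(alert.group_key, alert.timestamp)
            open_by_key[alert.group_key] = current
            incidents.append(current)
        current.alerts.append(alert)
    return incidents


alerts = [Alert("checkout", 0.0), Alert("checkout", 60.0), Alert("search", 90.0), Alert("checkout", 900.0)]
incidents = group_alerts(alerts)
print([(i.group_key, len(i.alerts)) for i in incidents])
# → [('checkout', 2), ('search', 1), ('checkout', 1)]
```

Measuring the window from when the incident opened, rather than from the latest alert, bounds how long a single incident can keep absorbing alerts; a sliding window is an equally plausible alternative.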