Cron and heartbeat monitor

A heartbeat monitor expects periodic requests to its unique URL. As long as the requests keep arriving on schedule, the monitor stays quiet; when an expected request is missed, it creates a new incident.

Example usage

We have a background job that makes a daily database backup of our primary PostgreSQL database. We want to get alerted if the background job doesn't run successfully.

Creating a heartbeat monitor:

  • Go to Integrations → Create heartbeat.
  • Name your heartbeat, for example "Daily database backup".
  • Set "Expect a heartbeat every" to "1 day".
  • Optionally configure the On-call escalation settings.
  • Click Save heartbeat.
  • Copy the secret URL on the Heartbeat detail page; you will need it later.

This heartbeat expects a GET or POST request to its URL once a day, counting from the first request it receives.
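As a quick check that the new heartbeat accepts both request methods, the calls can be sketched with curl as below. The function names are hypothetical, the URL is the placeholder used throughout this guide, and the timeout value is an arbitrary assumption.

```shell
#!/usr/bin/env bash

# Placeholder URL from this guide; replace it with your heartbeat's secret URL.
HEARTBEAT_URL="https://uptime.betterstack.com/api/v1/heartbeat/XYZ1234"

# Send a heartbeat with a GET request.
# --fail makes curl exit non-zero on HTTP errors; --max-time caps the wait (seconds).
send_heartbeat_get() {
  curl --fail --silent --show-error --max-time 10 "$HEARTBEAT_URL"
}

# Send the same heartbeat with a POST request instead.
send_heartbeat_post() {
  curl --fail --silent --show-error --max-time 10 --request POST "$HEARTBEAT_URL"
}
```

Calling either function once starts the schedule; after that, one call per day keeps the monitor quiet.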

Set up a cron task and a background job

  1. Add a cron task that runs the backup_database.sh background job every day at midnight:

    Cron setup for DB backup
    0 0 * * * bash /home/deploy/backup_database.sh >/dev/null 2>&1
    
  2. Include the curl call to the heartbeat URL at the end of your backup script:

    Example DB backup script with heartbeat call
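    #!/usr/bin/env bash
    
    # Stop at the first failing command and print each command as it runs.
    set -o errexit
    set -o xtrace
    
    # Timestamped name for the dump file.
    date=$(date "+%Y-%m-%d_%H:%M:%S")
    file="/dumps/uptime.betterstack.$date.dump"
    
    # Export the database to the dump file and time the export.
    time dokku postgres:export uptime > "$file"
    
    # Upload the dump to S3, then remove the local copy.
    /usr/local/bin/aws s3 cp "$file" s3://uptime-dbbackups/
    
    rm "$file"
    
    # Report success; copy this URL from the Heartbeat detail page.
    curl "https://uptime.betterstack.com/api/v1/heartbeat/XYZ1234"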
    

The heartbeat created above now expects a GET or POST request every day, counting from the first request it receives.

Because the script uses set -o errexit, any failing step stops the script before it reaches the curl call, so no request is made to the heartbeat URL. When the expected request is missed, the heartbeat creates an incident and alerts the current on-call person.

Reporting failures

In addition to raising an incident when an expected heartbeat is missed, you can trigger a new incident by explicitly reporting a failure in your request to the heartbeat URL.

To do this, append /fail to your heartbeat URL:

Report failure using curl
curl "https://uptime.betterstack.com/api/v1/heartbeat/XYZ1234/fail"

This immediately raises an incident, which is resolved automatically when the next regular heartbeat is received.
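One way to report failures from a script that uses set -o errexit is an EXIT trap, sketched below. The function name report_heartbeat is hypothetical, the URL is this guide's placeholder, and the timeout is an arbitrary assumption; the trap runs whether the script finishes normally or stops on a failing command.

```shell
#!/usr/bin/env bash
set -o errexit

# Placeholder URL from this guide; replace it with your heartbeat's secret URL.
HEARTBEAT_URL="https://uptime.betterstack.com/api/v1/heartbeat/XYZ1234"

# Runs when the script exits: send a regular heartbeat on success,
# or report a failure by appending /fail when the exit status is non-zero.
report_heartbeat() {
  local status=$?
  if [ "$status" -eq 0 ]; then
    curl --silent --max-time 10 "$HEARTBEAT_URL"
  else
    curl --silent --max-time 10 "$HEARTBEAT_URL/fail"
  fi
}
trap report_heartbeat EXIT

# ... backup steps go here; any failing step stops the script and fires the trap ...
```

With this arrangement the incident is raised as soon as the job fails, rather than when the next daily heartbeat is found to be missing.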

Exit codes

Instead of /fail, you can append the exit code of a program or script to the URL. Any non-zero exit code raises an incident.

On Unix-like systems, $? holds the exit code of the last command. For example:

Send exit code of DB backup script
./run-backups.sh
curl "https://uptime.betterstack.com/api/v1/heartbeat/XYZ1234/$?"

This sends a successful heartbeat if the exit code is 0 (the backup ran OK) and raises an incident if the backup fails with a non-zero exit code.

Note that this example does not report a failure if the calling script uses set -o errexit: the script exits as soon as ./run-backups.sh fails, before the curl command runs.
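If the calling script does need set -o errexit, the exit code can still be captured by running the command on the left-hand side of ||, which errexit ignores. The sketch below assumes a hypothetical helper named report_exit_code and this guide's placeholder URL.

```shell
#!/usr/bin/env bash
set -o errexit

# Placeholder URL from this guide; replace it with your heartbeat's secret URL.
HEARTBEAT_URL="https://uptime.betterstack.com/api/v1/heartbeat/XYZ1234"

# Run the given command, then send its exit code to the heartbeat URL.
# A command on the left of || does not trigger errexit, so curl always runs.
report_exit_code() {
  local status=0
  "$@" || status=$?
  curl --silent --max-time 10 "$HEARTBEAT_URL/$status"
  # Pass the original exit code on to the caller.
  return "$status"
}

# Hypothetical usage:
# report_exit_code ./run-backups.sh
```

Because the helper returns the original exit code, a failing backup still stops an errexit script, but only after the failure has been reported.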

Capturing output

When reporting a failure, you can send a message as plain text or JSON in the request body to attach it to the raised incident. Combined with stdout/stderr capture and the exit code example above, this includes the output of your command in the incident when the command fails:

Send exit code and output of DB backup script
# $? below is the exit code of run-backups.sh, carried through the assignment
backup_output=$(./run-backups.sh 2>&1)
curl -d "$backup_output" \
  "https://uptime.betterstack.com/api/v1/heartbeat/XYZ1234/$?"
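The same idea can be wrapped in a reusable function that attaches the output only when the command fails. This is a sketch under assumptions: run_and_report is a hypothetical name, the URL is this guide's placeholder, and curl's --data-raw option is used in place of -d so that output beginning with "@" is not read as a file name.

```shell
#!/usr/bin/env bash

# Placeholder URL from this guide; replace it with your heartbeat's secret URL.
HEARTBEAT_URL="https://uptime.betterstack.com/api/v1/heartbeat/XYZ1234"

# Run the given command, capture stdout and stderr together, and report the result.
run_and_report() {
  local output
  local status=0
  # Declared separately above so that "local" does not mask the exit code.
  output=$("$@" 2>&1) || status=$?
  if [ "$status" -eq 0 ]; then
    # Success: send the zero exit code without a body.
    curl --silent --max-time 10 "$HEARTBEAT_URL/0"
  else
    # Failure: send the captured output as the request body.
    curl --silent --max-time 10 --data-raw "$output" "$HEARTBEAT_URL/$status"
  fi
  return "$status"
}

# Hypothetical usage:
# run_and_report ./run-backups.sh
```

Keeping the capture and the report in one function avoids relying on $? still holding the right value by the time curl runs.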