What is Cron Job Monitoring?

Better Stack Team
Updated on May 4, 2022

Cron job monitoring, also called heartbeat monitoring, is an automated way of checking whether scheduled tasks run correctly. When a cron job fails, the monitor spots the issue and alerts the right person on the development team. If your service performs a vital process periodically, this kind of monitoring is a good fit.

In this article, you will learn the following:

  • What cron job and heartbeat monitoring are and how they work.
  • An overview of on-call alerting and the incident management process for cron job incidents.
  • The best practices for cron job monitoring.
  • The benefits and drawbacks of cron job monitoring.
  • How to set up basic cron job monitoring.

How does cron job monitoring work?

Cron job monitoring works by setting up a remote monitoring service with a dedicated URL, to which the scheduled task sends a GET, HEAD, or POST request after it has run correctly. Tracking a system's health through regular requests (heartbeats) is also called heartbeat monitoring, and the terms cron job monitoring and heartbeat monitoring are often used interchangeably.

The heartbeat monitor is set up to expect a heartbeat once every x minutes, hours, or days. There is also a grace period, which ensures that alerting doesn't start immediately if the job is delayed.

When the monitor receives a heartbeat within the pre-set time window, no action is taken, and the monitoring continues. However, when an expected heartbeat has not arrived by the end of the grace period, the monitor starts what is called an incident and begins alerting according to the on-call calendar.
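The decision a heartbeat monitor makes can be sketched in a few lines of shell. This is only an illustration of the idea, not how any particular monitoring service is implemented. The function name heartbeat_status and its arguments are hypothetical, and all times are plain integer seconds.

```shell
#!/usr/bin/env bash
# heartbeat_status LAST PERIOD GRACE NOW
# Hypothetical helper: all arguments are integer seconds.
#   LAST   - time the last heartbeat was received
#   PERIOD - how often a heartbeat is expected
#   GRACE  - extra time allowed before an incident starts
#   NOW    - current time
# Prints "ok" while NOW is not past LAST + PERIOD + GRACE, else "incident".
heartbeat_status() {
  local last=$1 period=$2 grace=$3 now=$4
  local deadline=$(( last + period + grace ))
  if [ "$now" -le "$deadline" ]; then
    echo "ok"
  else
    echo "incident"
  fi
}

# A daily job (86400 s) with a 15-minute grace period (900 s):
heartbeat_status 0 86400 900 86000   # still inside the expected period
heartbeat_status 0 86400 900 87000   # late, but inside the grace period
heartbeat_status 0 86400 900 90000   # past the deadline
```

The first two calls print ok and the third prints incident, because only the third falls after the deadline of 87300 seconds.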

What is a cron job incident?

A cron job incident is a period of time during which the given monitor doesn’t receive heartbeats from the monitored service. This means that the monitored service didn’t run correctly, because every correct run sends a heartbeat to the monitor before finishing, which keeps the monitor from creating an incident.

How to receive cron job incident alerts?

After an incident is spotted by the cron job monitor, it needs to be communicated to the service admins. This process is called incident alerting or on-call alerting. In case of an incident, the team member who is currently on call (has scheduled duty) receives the incident alert.

The most common ways of getting alerted by a cron job monitor include automated phone calls, SMS, Slack, and Microsoft Teams messages. The choice of alerting channel depends on factors like the importance of the monitored service, the time of day, and team preference.

What information do incident alerts include?

The incident alert for cron jobs, and heartbeats in general, is very basic because the monitoring provides only simple up/down information. Adding logging to the monitored services and forwarding those logs to a log aggregation tool is a great way of getting in-depth insights into any scheduled job incidents.
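As a rough illustration of the logging idea, a scheduled job can be wrapped so that every run leaves a trace that a log forwarder could pick up. The helper name log_run and the log line format are assumptions made for this sketch, not part of any specific tool.

```shell
#!/usr/bin/env bash
# log_run LOGFILE COMMAND [ARGS...]
# Hypothetical wrapper: runs COMMAND, appends its output to LOGFILE,
# then appends one summary line with a UTC timestamp, the exit code,
# and the command. Returns the command's exit code.
log_run() {
  local logfile=$1
  shift
  local status=0
  "$@" >>"$logfile" 2>&1 || status=$?
  printf '%s exit=%d cmd=%s\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$status" "$*" >>"$logfile"
  return "$status"
}

# Example (the log path is an assumption):
#   log_run /var/log/db-backup.log bash /database/backup/script
```

When the heartbeat monitor reports an incident, the exit= lines show whether the job ran at all and how it ended.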

What happens after receiving an alert? The cron job incident resolution process

After an alert is received, it should be acknowledged immediately. If the alert is not acknowledged within a specified time frame (usually 3 minutes), the next person in line on the on-call schedule is alerted. This process can continue until the whole team is alerted. However, the best practice is to set up the on-call schedule so that the first team member is always ready to solve incoming incidents.

Once the incident is acknowledged, the escalation process is paused and the team can fully focus on solving it. The time it takes to acknowledge an alert is called Time to Acknowledge (TTA). Its average across incidents, called Mean Time to Acknowledge (MTTA), is a widely used incident management metric.
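The escalation chain described above can be pictured as simple integer arithmetic. The sketch below assumes one escalation step per unacknowledged timeout and a fixed, ordered on-call list. Real escalation policies are often more elaborate, and the function name escalation_level is hypothetical.

```shell
#!/usr/bin/env bash
# escalation_level ELAPSED TIMEOUT TEAM_SIZE
# Hypothetical helper: prints the zero-based position in the on-call list
# of the person being alerted ELAPSED seconds after an unacknowledged
# incident started, with one escalation step every TIMEOUT seconds.
# The result is capped at the last team member.
escalation_level() {
  local elapsed=$1 timeout=$2 team_size=$3
  local level=$(( elapsed / timeout ))
  if [ "$level" -ge "$team_size" ]; then
    level=$(( team_size - 1 ))
  fi
  echo "$level"
}

# 3-minute acknowledgement window (180 s), team of three:
escalation_level 60 180 3    # 0: first responder is still being alerted
escalation_level 200 180 3   # 1: escalated to the second person
escalation_level 1000 180 3  # 2: capped at the last person
```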
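To make the metric concrete, MTTA is the arithmetic mean of the individual TTA values. A minimal sketch, assuming the TTA values are available as whole seconds (the function name mean_tta is hypothetical):

```shell
#!/usr/bin/env bash
# mean_tta SECONDS...
# Hypothetical helper: prints the mean of the given time-to-acknowledge
# values in whole seconds (integer division, so rounded down).
mean_tta() {
  local sum=0 count=0 tta
  for tta in "$@"; do
    sum=$(( sum + tta ))
    count=$(( count + 1 ))
  done
  if [ "$count" -eq 0 ]; then
    echo "mean_tta: no incidents given" >&2
    return 1
  fi
  echo $(( sum / count ))
}

# Three incidents acknowledged after 60, 120, and 240 seconds:
mean_tta 60 120 240   # prints 140
```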

The following steps in the downtime resolution process are individual to different teams and apps. For larger teams, they can include collaborations between a few developers or even teams of developers, delegations of incidents to dedicated team members, and more. There are some best practices that all teams managing incidents should use. These include incident communication (both internal and external) and incident post-mortems.

What are the best practices for cron job monitoring?

Human alert tolerance

The heartbeat monitor will create an alert whenever it detects an issue. However, if the monitor sends an alert (for example, SMS or email) to all team members about the same incident ten times every day, they will very likely ignore it.

This situation, in which alerts are ignored or not treated with the necessary care, is called alert fatigue, and it poses a serious issue. To prevent alert fatigue, only vital services should be connected to on-call alerting and notify the team immediately.

Grace time configuration

Grace time is a short period after the expected heartbeat time during which no incident is started. This prevents delayed jobs from causing incidents and also helps to decrease the possibility of alert fatigue. However, when the grace period is too long, it also delays alerting in case of an actual incident, so it needs to be set up carefully.

Synchronise monitor and cron job timezone

In many cases, the server running your cron jobs will not be in the same timezone as the monitoring service. To prevent faulty alerting caused by timezone differences, both should be set to the same timezone. The command-line utility timedatectl shows the server timezone, and monitors typically offer the option to change timezones, so both can be synced.
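To see why a mismatch matters, it helps to translate the cron job's local start hour into the timezone the monitor uses. The sketch below assumes the monitor works in UTC and that the server's offset is a whole number of hours. The helper name local_hour_to_utc is hypothetical.

```shell
#!/usr/bin/env bash
# local_hour_to_utc LOCAL_HOUR UTC_OFFSET_HOURS
# Hypothetical helper: converts an hour of day (0-23, written without a
# leading zero) in a zone that is UTC_OFFSET_HOURS ahead of UTC into the
# corresponding UTC hour. Only whole-hour offsets are handled.
local_hour_to_utc() {
  local hour=$1 offset=$2
  echo $(( ( (hour - offset) % 24 + 24 ) % 24 ))
}

# A job scheduled at midnight on a server running at UTC+2
# actually starts at 22:00 UTC on the previous day:
local_hour_to_utc 0 2    # prints 22
```

If the monitor expected the heartbeat at midnight UTC in this example, the check would be misaligned with the job by two hours.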

Encrypt communication between monitor and cron job

The communication between the service and the heartbeat monitor typically uses the HTTP GET or POST methods. Each request from the cron job usually includes a unique token assigned by the monitor. The token is an authorisation measure: if no token were required, anyone could send a fake heartbeat and your monitor wouldn't detect an incident. For the token to stay secret, the cron job must send the request over TLS (HTTPS). Otherwise, anyone who can observe the network traffic can capture your authorisation token.
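One way to guard against accidentally sending the token in clear text is to check the URL scheme before making the request. The following is a minimal sketch under the assumption that curl is installed. The function name send_heartbeat is hypothetical.

```shell
#!/usr/bin/env bash
# send_heartbeat URL
# Hypothetical helper: refuses to send the heartbeat unless URL starts
# with https://, so the token in the URL is never sent unencrypted.
send_heartbeat() {
  local url=$1
  case "$url" in
    https://*) ;;
    *)
      echo "refusing to send heartbeat over an unencrypted URL: $url" >&2
      return 2
      ;;
  esac
  # -f: treat HTTP errors as failures, -sS: quiet but still print errors,
  # --max-time: give up after 10 seconds instead of hanging the job.
  curl -fsS --max-time 10 "$url" >/dev/null
}

# Example, using the placeholder URL format from this article:
#   send_heartbeat "https://betteruptime.com/api/v1/heartbeat/<your-heartbeat-monitor-id>"
```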

What are the main benefits and drawbacks of cron job monitoring?

Benefits

  • Automated and running continuously: A heartbeat monitoring tool listens on its dedicated URL continuously, and once set up it needs little to no maintenance while still providing the same valuable information.
  • Simple to set up and use: Heartbeat monitors for any service can be set up in minutes and provide incident information right from the start. Since heartbeat monitoring provides simple up/down information, it can be applied widely across different services and use cases.

Drawbacks

  • Limited incident cause reporting: Heartbeat monitoring lacks the information that could answer why the incident happened, since it only monitors the final output and not the actual workings of the service. To get a better idea about the root cause, application performance management (APM) or a log management service needs to be used.
  • Custom code dependency: Since the sending of the heartbeat needs to be custom coded into a given script or app, there is a possibility for error and misconfiguration. This is why any heartbeat setup needs to be checked properly.

Where does cron job monitoring fit in the synthetic monitoring setup?

Cron job monitoring is a great addition to the synthetic monitoring toolbox. Ideally, it’s combined with regular uptime checks as well as SSL certificate checks and domain expiration checks, to prevent security issues and the loss of valuable business assets, respectively.

Synthetic monitoring also offers options such as API, DNS, and transaction monitoring.

How to start cron job monitoring in 5 minutes with Better Uptime?

Better Uptime is an infrastructure monitoring tool that offers cron job monitoring. To show how to get notified whenever a service fails to run correctly, let’s set it up to alert us whenever a database backup fails.

Creating a heartbeat monitor

  • Once signed up, head to Heartbeats → Create heartbeat
  • Enter a name for the heartbeat, let’s say Daily database backup
  • Set the expected heartbeat every selection to 24 hours
  • Set the grace period to the time you expect the database backup to run; let’s make it 15 minutes
  • Select how you want to get alerted, be it a phone call, Slack notification or an email
  • Click create monitor

For more information, explore Better Uptime docs.

Configuring cron job

Let’s say that to do the database backup you would run the following script:

$ bash /database/backup/script

Now, you can create a cron job by running the crontab utility with the -e option:

$ crontab -e

The -e option opens your crontab file in the default text editor of your environment. At the end of the file, append the following line (make sure to replace the placeholder with your own heartbeat URL):

0 0 * * * bash /database/backup/script && curl https://betteruptime.com/api/v1/heartbeat/<your-heartbeat-monitor-id>

We set the heartbeat interval to 1 day, so the cron job must run on the same schedule. The cron expression 0 0 * * * runs the job every day at midnight. Because of the && operator, the curl utility sends the heartbeat only if the backup script exits successfully.

Once the cron job sends the first heartbeat to the monitor, the monitoring starts and the monitor expects the next request within 24 hours.
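If the one-line crontab entry becomes hard to read, the same "ping only on success" logic can be moved into a small wrapper function. This is a sketch rather than an official Better Uptime script: the function name run_then_ping is hypothetical, and the curl options are one reasonable choice among several.

```shell
#!/usr/bin/env bash
# run_then_ping HEARTBEAT_URL COMMAND [ARGS...]
# Hypothetical wrapper: runs COMMAND and sends the heartbeat only when
# it exits with status 0. On failure, no heartbeat is sent and the
# command's exit code is returned, so the monitor will raise an incident.
run_then_ping() {
  local url=$1
  shift
  if "$@"; then
    # -f: fail on HTTP errors, -sS: quiet but show errors,
    # --max-time / --retry: tolerate short network problems.
    curl -fsS --max-time 10 --retry 3 "$url" >/dev/null
  else
    local status=$?
    echo "job failed with exit code $status; heartbeat not sent" >&2
    return "$status"
  fi
}

# Example, using the script and placeholder URL from this article:
#   run_then_ping "https://betteruptime.com/api/v1/heartbeat/<your-heartbeat-monitor-id>" \
#     bash /database/backup/script
```

The retry and timeout options are there because a brief network problem at the moment of the ping would otherwise look like a failed backup.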

For more information, explore Better Uptime docs.

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.