Side note: Monitor your cron jobs with Better Stack
Scheduled jobs are easy to forget about until they silently stop running. With Better Stack, you can monitor cron jobs using Heartbeats and get alerted the moment a task misses a check-in.
In our daily life and work, certain tasks need to be repeated over a specific period. For example, you may need to back up your databases and files, check the availability of a service, or generate reports of certain activities. Since these tasks need to be repeated based on a schedule, it is better to automate them using a task scheduler. Many programming languages offer their task scheduling solution, and in this tutorial, we will discuss how to schedule tasks using Python.
To get started with this tutorial, ensure that you have a computer with Linux and the latest version of Python installed. You can either set up a PC, a virtual machine, a virtual private server, or WSL (if you are using Windows). Also, make sure you log in as a user with root privileges, and you need to have some basic knowledge about Python and using command-line utilities on Linux systems.
There are two main ways to schedule tasks using Python. The first method
involves using Python scripts to create jobs that are executed using the cron
command, while the second involves scheduling the task directly with Python. We
will explore both methods in this tutorial.
Start by creating a new working directory on your machine:
To start creating Cron Jobs with Python, you need to use a package called
python-crontab. It allows you to
read, write, and access system cron jobs in a Python script using a simplified
syntax. You can install the package with the following command.
Once installed, create a cron.py file in your working directory. This is where
the code to schedule various tasks will be placed.
Here is an example of how python-crontab can be used used to create Cron Jobs:
First, the CronTab class is imported and initializes a cron object. Setting
the user argument to True ensures that the current user's crontab file is
read and manipulated. You can also manipulate other users' crontab file, but
you need the proper permissions to do so.
A new Cron Job is created by calling the new() method on the cron object,
and its command parameter specifies the shell command you wish to execute.
After creating the job, you need to specify its schedule. In this example, the
job is scheduled to run once every minute. Finally, you must save the job using
the write() method to write it to the corresponding crontab file.
Go ahead and execute the program using the following command:
You can check if the Cron Job has been created by running this command:
You should observe the following line at the bottom of the file:
Notice how the readable Python scheduling syntax gets translated to Cron's
cryptic syntax. This is one of the main advantages of using the python-crontab
package instead of editing the crontab file yourself.
Let's take a closer look at the scheduling options that the python-crontab
package exposes for automating tasks. Recall that a Cron expression has the
following syntax:
The minute() method that we used in the previous example corresponds to the
first field. Each of the other fields (except command) has their corresponding
method as shown in the list below:
minute: minute()hour: hour()day_of_month: day()month: month()day_of_week: dow()The command field corresponds to the command parameter in the new()
method.
Once you've specified the unit of time that should be used for scheduling (minute, hour, etc), you must define how often the job should be repeated. This could be a time interval, a frequency, or specific values. There are three different methods to help you with this.
on(): defines specific values for the task to be repeated and it takes
different values for different units. For instance, if the unit is minute,
integer values between 0-59 may be supplied as arguments. If the unit is day
of week (dow), integer values between 0-6 or string values SUN-SAT may
be provided. Below is a summary of how the on() method works for various units, and the
corresponding crontab output:
You can also specify multiple values in the on() method to form a list. This
corresponds to the comma character in a Cron expression.
every(): defines the frequency of repetition. Corresponds to the forward
slash (/) in a Cron expression.during(): specifies a time interval, which corresponds to the dash (-)
character in a Cron expression. It takes two values to form an interval, and
just like the on() method, the allowable set of values varies according to
the unit. You can also combine during() with every(), which allows you to define a
range and then specify the frequency of repetition. For example:
You need to remember that every time you set a schedule, the previous schedule (if any) will be cleared. For instance:
However, if you need to combine multiple schedules for a simple task, you must
append use the also() method as shown below:
If you are comfortable using Cron expressions,
there is also a setall() method that allows you to use either Cron expressions
or Python datetime objects
like this:
In this section, you will create a Python scrapper that scrapes the Dev.to Community for the latest Python articles, sorts them according to their reactions, and saves them to a markdown file. Afterward, you will schedule this scrapper to run once every week using the concepts introduced in prior sections.
Create a scrapper.py file with the following command:
This scrapper first uses the requests package to retrieve the desired webpage.
Next, the BeautifulSoup package parses the resulting HTML and extracts the
title, link, number of reactions, and the publication date of each article.
Afterward, the scrapper filters out articles that have less than five reactions
or is published over a week ago, and finally, it writes all the remaining
articles into the python-latest.md file.
Before you execute the program, install the required dependencies using the command below:
Then run the program as follows:
A python-latest.md file will be generated in the current directory. You can
view its contents to verify that the program works:
Now that the script is ready and proven to work, let's discuss how you can
schedule this script to run periodically, for example, every Saturday at 7:00
AM. Modify your cron.py file as follows:
Execute the cron.py file, and confirm that a corresponding Cron Job was
created using the command below:
You can verify the validity of this Cron expression using a handy website called Crontab Guru which allows you to check if your Cron expression is correct.
Scheduled jobs are easy to forget about until they silently stop running. With Better Stack, you can monitor cron jobs using Heartbeats and get alerted the moment a task misses a check-in.
Imagine this scenario: you've created several Cron Jobs on your machine, and you
need to update one of them. How can you locate that particular Cron Job and make
the update? While you can edit the crontab file directly, the python-crontab
package also provides a few different ways to update existing jobs.
First, you can iterate over all the Cron Job lines like this:
This method is highly inefficient since you'll have to iterate over many Jobs or
lines just to find the one you're looking for. The second way to do this is by
using the find_time() method:
You need to pass the schedule of the Cron Job to the find_time() method, and
Python will find that Cron Job for you. However, this method also has problems.
You are unlikely to remember the exact schedule when you have tons of Cron Jobs
on your machine.
The best way to locate a Cron Job is by utilizing comments. To do this, you need to associate each Cron Job with a comment like this:
Execute the cron.py file again, and a new Cron Job with a comment will be
created.
And then, you can locate this Cron Job using the find_comment() method. If you
don't remember the exact comment, you can fuzzy match with
regular expression like this:
One thing to note is that cron.find_comment() will return a set of objects,
and you need to create a loop to access each of them.
After you've located the Cron Job, you can modify its command or the comment:
You can also clear a job of its schedule and set a new one:
It's also possible to disable the Cron Job (by commenting it out), or enable it like this:
You can also remove a Cron Job entirely:
Support for batch deleting Cron Jobs is also provided with the remove_all()
method:
The second way to schedule tasks with Python is by utilizing a package called
schedule. This package does not create Cron Jobs on your system, but it
requires you to create a Python script that runs continuously. The advantage of
using schedule is that you can use it on any operating system, including
Windows, as long as Python is installed on your computer.
To schedule tasks using the schedule package, create another file called
scheduler.py:
Place the following code into the file:
Notice that there is an infinite loop at the end of this program. This means
that the script will not stop until you manually interrupt it (with Ctrl-C).
The time.sleep(1) line will also suspend the execution of the current thread
for one second, meaning this scheduler will run once every second. That's how
the schedule package is able to determine how many seconds have passed and
when it is time to execute the scheduled tasks. In this example, the task() is
scheduled to run every minute.
Use the following command to start this scheduler:
Wait for the task to execute and you'll get the output:
To prevent the script from terminating when the terminal is closed, you can opt
to run it in the background. First, ensure you are not using print() to
display your output because it will only run in the foreground. Instead, you can
save the program's output to a file like this:
Next, execute this script with an ampersand (&) at the end of the command:
This will start scheduler.py as a new process in the background, and you will
receive its process ID (PID) afterward. Then, wait for the scheduled task to
execute, and you should observe that a task.txt file is now present in your
working directory.
You can use the returned PID to kill the process like this:
Besides every minute, you have several options to choose from when it comes to how often the task should be executed.
You can also schedule the task to run every n seconds/minutes/hours/days/weeks
like this (notice that the units need to be in plural form in this case):
Sometimes, it is more convenient to schedule tasks based on the day of the week. For example:
By default, these tasks will run at 00:00 on the scheduled day, but it is
possible to define a specific time by chaining an at() method like this:
Lastly, you can set an end repeat time using the until() method. The method
takes either a string or a DateTime:
Now that we've covered the basics of schedule, let's see how we can use it to
schedule our previous example, scrapper.py.
As you can see, schedule has a much simpler API, as it only takes one line of
code to create the same schedule, allowing the scrapper to run automatically
every Saturday at 7:00.
The schedule package also offers a few different methods to manage previously
scheduled tasks. For example:
Or you can cancel a specific job by assigning it to a variable and then use the
cancel_job() method:
If you need to create a job that only runs once, make the scheduled job return
CancelJob. After task() has been executed the first time, it will be
canceled automatically.
If you need to manage multiple jobs together, you can give them tags using the
tag() method. Notice that you can assign multiple tags to one job.
Then you can manage them using the get_jobs() method, or cancel them using the
clear() method:
Regardless of the method you choose to schedule your recurring tasks, you need to set up a monitoring system that will notify you when a job fails to run as scheduled. There are several ways to do this, but one of the most accessible options involves using a cloud monitoring tool, and that is what we'll explore in this section.
Better Uptime is a server monitoring tool that offers scheduled task monitoring services. It allows you to set up an alert for your scheduled jobs, and when the job is down for some reason, Better Uptime will notify you and your team through emails, SMS texts, or phone calls based on your selection. This section will discuss using Better Uptime to monitor your scheduled Python tasks.
First, you must create a free Better Uptime account if you don't have one already. Once signed in, go to Heartbeats, which is Better Uptime's Cron monitoring service, and create a new heartbeat.
Then choose an appropriate name for your monitor and select how often you expect this scheduled job to be repeated. Next, in the On-call escalation section, pick how you wish to be notified when the scheduled task fails to execute. After you are done, click Create heartbeat.
Next, you should see this page:
Notice the highlighted section in the middle. This URL is what Better Uptime uses to monitor your scheduled jobs. Every time a scheduled task executes, you should ensure that a HEAD, GET, or POST request is made to this URL.
For example, if you are using python-crontab, head back to the cron.py file
and edit the command part of the Cron Job (ensure curl is installed on your
machine):
Now, execute the cron.py file and go to the Better Uptime monitor you just
created. Wait for a minute, and once Better Uptime starts receiving requests,
the monitor will be marked as "Up", which means the Cron Job is up and running.
If you disable this Cron Job, which simulates an incident:
Execute the script again and wait for a few minutes. If Better Uptime does not receive a request within the time frame you just configured, the monitor will be marked as "Down", which means an incident occurred.
You will also receive an alert in the configured channels:
Since this particular Cron Job will execute a Python script, you can also make a request to Better Uptime in the Python script instead:
This way, when you are scheduling multiple scripts in one Cron Job, you will be able to monitor each script separately. So that you'll know exactly which script is not working when something goes wrong.
If you are using the schedule package, you need to make a request to
betteruptime.com within the task() method:
Or you can make a request in the scrapper.py file, as discussed before.
Cron jobs and background workers can fail without anyone noticing. With Better Stack Heartbeats, you can get alerted if a scheduled task does not run on time, so you catch missed backups, stuck workers, and broken automation early.
In this tutorial, we discussed scheduling Python scripts using python-crontab
and schedule. The python-crontab package utilizes Cron under the hood, which
is only available on Unix-like systems. On the other hand, schedule has a much
simpler API, and it will work on any operating system with Python installed. It
does have a few
limitations
of its own, making it less powerful overall than python-crontab.
To dig deeper into task automation using Cron, we recommend learning more about job scheduling on Linux systems. If you are unsure about the best database match check our PostgreSQL guide. Thanks for reading, and happy scheduling!
We use cookies to authenticate users, improve the product user experience, and for personalized ads. Learn more.