Log levels are labels that indicate the severity or urgency of a log entry. Their primary purpose is to separate messages that are merely informational (meaning that the system is working normally) from those that describe a problem or potential problem, such as when recurring errors are detected in the system. Log levels also provide a way to dynamically control your application's volume of log output (more on this later).
In this article, we will discuss the following concepts that should help you get a handle on what log levels are and how to use them to log more effectively:
- What log levels are and how they work.
- The history of log levels.
- Common log levels and how to use them.
- Using log levels for filtering purposes.
- Configuring alerts based on log levels.
The history of log levels
Syslog, a logging solution initially developed for the Sendmail project, first introduced the concept of log levels in the 1980s. It came with severity levels that are attached to each log entry to describe the severity of the event in question:
- Emergency (emerg): system is unusable.
- Alert (alert): immediate action required.
- Critical (crit): critical conditions.
- Error (error): error conditions.
- Warning (warn): warning conditions.
- Notice (notice): normal but significant conditions.
- Informational (info): informational messages.
- Debug (debug): messages helpful for debugging.
In the following years, Syslog was adopted by various software applications and eventually became a standard for message logging on Unix-like systems. Its severity levels were also adapted and refined by various application logging frameworks such as log4net and log4j, evolving into the various log levels that are commonplace today.
Common log levels and their use cases
The log levels available to you will vary depending on the programming language,
framework, or service in use. Still, most will include some or all of the
following levels: FATAL, ERROR, WARN, INFO, DEBUG, and TRACE. You can usually
override the defaults in your logging framework of choice with custom log
levels, but we recommend sticking to the ones discussed below. They are arranged
in decreasing order of urgency:
The FATAL log level annotates messages with the greatest severity. It usually
means that something critical is broken, and the application cannot continue to
do any more useful work without the intervention of an engineer. Typically, such
entries are logged just before the application is shut down (with a non-zero
exit code) to prevent further data corruption. If you use a log management
service, you can configure it to send instant alerts when such entries are
logged so that someone can react to them as quickly as possible.
Examples of situations that may be logged as FATAL errors include the following:
- Crucial configuration information is missing without fallback defaults.
- Unable to connect to a service crucial to the application's primary function (such as the database).
- Running out of disk space on the server.
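As a rough illustration of the first case above, here is a minimal Python sketch. The setting name DATABASE_URL, the logger name, and the exit code of 1 are assumptions made for the example. Note that Python's standard logging module calls this level CRITICAL and keeps FATAL only as an alias.

```python
import logging
import os
import sys

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
logger = logging.getLogger("app")


def main(env=os.environ):
    """Load required configuration; log at the highest severity and exit if it is missing."""
    db_url = env.get("DATABASE_URL")  # hypothetical setting with no fallback default
    if db_url is None:
        # logging.critical is Python's equivalent of a FATAL entry.
        logger.critical("DATABASE_URL is not set and has no default; shutting down")
        sys.exit(1)  # assumed exit code; any non-zero value signals failure
    logger.info("configuration loaded")
    return db_url
```

The entry is written before the process exits, so whoever is alerted can see the reason for the shutdown in the logs.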
The ERROR log level is used to represent error conditions in an application
that prevent a specific operation from running, but the application itself can
continue working even if it is at a reduced level of functionality or
performance. ERROR logs should be investigated as soon as possible, but they
don't carry the same urgency as FATAL messages since the application can
continue working.
The occurrence of an error condition in the application does not necessarily
mean that it should be logged at the ERROR level. For example, if an exception
is expected behavior and does not indicate degradation in application
functionality or performance, it can be logged as INFO. Also, errors with a
possibility of recovery (such as network connectivity errors) can be labeled as
INFO if an automatic recovery strategy is in place (e.g., retries). Such
conditions can be promoted to the ERROR level if recovery isn't possible after
a predetermined time.
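The retry-then-promote idea can be sketched roughly as follows. The helper name call_with_retries and its parameters are hypothetical, and the sketch gives up after a fixed number of attempts rather than after a time limit, purely to keep it short.

```python
import logging
import time

logger = logging.getLogger("app.retry")


def call_with_retries(operation, attempts=3, delay_seconds=0.0):
    """Run operation(); log recoverable failures at INFO and the final one at ERROR."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except ConnectionError as exc:
            if attempt == attempts:
                # Automatic recovery failed, so the condition is promoted to ERROR.
                logger.error("giving up after %d attempts: %s", attempts, exc)
                raise
            # A retry is still pending, so this is not yet worth an ERROR entry.
            logger.info("attempt %d/%d failed (%s); retrying", attempt, attempts, exc)
            time.sleep(delay_seconds)
```

A caller would pass in the network operation, for example `call_with_retries(fetch_profile, attempts=5, delay_seconds=2)`, where fetch_profile stands in for whatever function talks to the external resource.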
Logging significant error conditions is also useful for generating metrics such
as Mean Time Between Failures (MTBF), which can be used to assess the quality of
the application or to compare different systems or designs. Examples of
situations that are typically logged at the ERROR level include the following:
- A persistent connection failure to some external resource (after automated recovery attempts have failed).
- Failure to create or update a resource in the system.
- An unexpected error (e.g., failed to decode a JSON object).
Messages logged at the WARN level typically indicate that something unexpected
happened, but the application can recover and continue to function normally.
This level is mainly used to draw attention to situations that should be
addressed soon, before they pose a problem for the application.
Events that may be logged at the WARN level include the following:
- The disk usage on the server is above a configured threshold.
- Memory usage is above a configured threshold.
- The application is taking longer than usual to complete some important tasks (degraded performance).
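The disk usage case above might look something like this in Python. The function name, the 80% threshold, and the optional usage parameter (which exists only so the check can be exercised without touching a real disk) are assumptions for the sketch. Python's logging module spells this level WARNING.

```python
import logging
import shutil

logger = logging.getLogger("app.health")


def disk_usage_exceeds(path="/", threshold=0.8, usage=None):
    """Return True (and log at WARN) when the used fraction of the disk exceeds threshold."""
    if usage is None:
        usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    used_fraction = usage.used / usage.total
    if used_fraction > threshold:
        logger.warning(
            "disk usage on %s is %.0f%%, above the %.0f%% threshold",
            path, used_fraction * 100, threshold * 100,
        )
        return True
    return False
```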
INFO-level messages indicate events in the system that are significant to the
business purpose of the application. Such events are logged to show that the
system is operating normally. For example, a service was started or stopped,
some resource was created, accessed, updated, or deleted in the database, and so
on. Production systems typically default to logging at this level so that a
summary of the application's normal behavior is visible to anyone reading the
logs.
Other events that are typically logged at the INFO level include the following:
- The state of an operation has changed (e.g., from "PENDING" to "IN PROGRESS").
- The application is listening on a specific port.
- A scheduled job was completed successfully.
The DEBUG level is used for logging messages that help developers find out
what went wrong during a debugging session. While the specifics of what to log
at the DEBUG level depend on your application, you generally want to include
detailed information that can help developers troubleshoot an issue quickly.
This can include variable state in the surrounding scope or relevant error
codes. Unlike TRACE, DEBUG level logging can be turned on in production without
making the application unusable, but to keep the system performing well, it
should not be left on indefinitely.
The TRACE level is used for tracing the path of code execution in a program.
For example, you may use it to trace the processing of an incoming request or an
algorithm's steps to solve a problem. Generally, TRACE is used for showing the
flow of the program, and to provide a detailed breakdown of the sequence of
events that led to a crash, a silent failure, an error, or some other event
logged at a different level. Concrete examples of messages that should be logged
at the TRACE level include the following:
- Entered or exited a function or method, perhaps with the processing duration.
- Calculation x + y produced output z.
- Starting or ending an operation and any intermediate state changes.
As you can see, the information logged at this level generally tries to capture
every possible detail about the program's execution. Therefore, TRACE should
only be enabled for short periods due to the significant performance
degradation that it often causes. You will typically enable it only in
development and testing environments.
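To make the "entered or exited a function" example concrete, here is a small Python sketch. Python's logging module has no built-in TRACE level, so the numeric value 5 (below DEBUG, which is 10) and the decorator name traced are assumptions of this example rather than anything standard.

```python
import functools
import logging
import time

TRACE = 5  # assumed custom level below DEBUG (10); not built into Python
logging.addLevelName(TRACE, "TRACE")
logger = logging.getLogger("app.trace")


def traced(func):
    """Log entry to and exit from func at TRACE, including the processing duration."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        logger.log(TRACE, "entering %s args=%r kwargs=%r", func.__name__, args, kwargs)
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000
            logger.log(TRACE, "exiting %s after %.3f ms", func.__name__, elapsed_ms)
    return wrapper


@traced
def add(x, y):
    return x + y
```

Every call to a decorated function now produces two extra entries, which shows how quickly the log volume grows at this level.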
Controlling your application's log volume
Log levels are the primary way to control your application's volume of log
entries. Once you select your default level, all log entries that are labeled
with a severity lower than the default will not be recorded. For example,
logging at the WARN level will cause INFO, DEBUG, and TRACE messages to be
ignored.
As you go down in default severity, the number of entries that are produced will
increase, so it's a good idea to turn on only what is necessary to avoid being
flooded with too much information. A typical default for production environments
is INFO, which records messages logged at the INFO level or higher priority
(WARN, ERROR, and FATAL). You can change this to WARN if you only want to
record events that indicate problems or potential problems.
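The threshold behavior described above can be observed with Python's standard logging module. In this sketch, ListHandler and emit_all_levels are illustrative names. Python names the levels WARNING and CRITICAL rather than WARN and FATAL.

```python
import logging


class ListHandler(logging.Handler):
    """Collect the level names of recorded entries so the threshold's effect is visible."""

    def __init__(self):
        super().__init__()
        self.levels = []

    def emit(self, record):
        self.levels.append(record.levelname)


def emit_all_levels(threshold):
    """Log one message per level and return the levels that were actually recorded."""
    logger = logging.getLogger(f"demo.{threshold}")
    logger.setLevel(threshold)
    logger.propagate = False  # keep the demo output out of the root logger
    handler = ListHandler()
    logger.addHandler(handler)
    logger.debug("debug message")
    logger.info("info message")
    logger.warning("warning message")
    logger.error("error message")
    logger.critical("critical message")
    logger.removeHandler(handler)
    return handler.levels


print(emit_all_levels(logging.WARNING))  # ['WARNING', 'ERROR', 'CRITICAL']
print(emit_all_levels(logging.INFO))     # ['INFO', 'WARNING', 'ERROR', 'CRITICAL']
```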
When troubleshooting a problem in production, you might want to reduce the
default severity of recorded messages to DEBUG. This level will typically
produce voluminous output with enough context to help developers debug the
issue, but it should be turned off afterward to prevent flooding the system
with irrelevant log entries during normal operation of the application.
The TRACE level produces even more logs than DEBUG, so it shouldn't be used
in production for sustained periods. It's better utilized in a development or
testing environment where system performance degradation isn't a critical
concern.
Controlling your default log level is best done through an environment variable so that you can change it without modifying the code. However, you might need to restart the application each time the log level needs to be updated. There are also several ways to update the log level at runtime, but the specific technique will depend on the application environment and framework used. Be sure to investigate the options available if this is something that interests you.
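Reading the level from the environment could look roughly like this in Python. The variable name LOG_LEVEL, the accepted spellings, and the fallback to INFO are assumptions of this sketch.

```python
import logging
import os

# Accepted names, including the WARN and FATAL spellings used in this article.
LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARN": logging.WARNING,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
    "FATAL": logging.CRITICAL,
    "CRITICAL": logging.CRITICAL,
}


def level_from_env(env=os.environ, default=logging.INFO):
    """Return the numeric level named by LOG_LEVEL, or default if unset or unknown."""
    name = env.get("LOG_LEVEL", "").strip().upper()
    return LEVELS.get(name, default)


# Apply the configured level to the root logger at startup.
logging.getLogger().setLevel(level_from_env())
```

With this in place, starting the process with `LOG_LEVEL=debug` set in the environment raises the verbosity without any code change.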
How to use log levels for monitoring and analysis
After you've configured your application to produce logs with the severity levels included, you might be wondering how to use the recorded labels to make sense of the log messages. The three main ways to use log levels for post-logging analysis are discussed below:
1. Filtering your logs
Log levels allow you to quickly sift through your logs so that only the relevant
ones are displayed. If you use a cloud log management service like Logtail, it's
easy to specify filters that display only the ERROR level entries that occurred
in a time period.
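If your logs are stored locally, the same kind of filter can be approximated in a few lines of Python. This sketch assumes one JSON object per line with "timestamp" (ISO 8601) and "level" fields. That layout is an assumption of the example, not a format required by any particular service.

```python
import json
from datetime import datetime


def filter_entries(lines, level, start, end):
    """Yield parsed entries at `level` whose timestamp falls within [start, end)."""
    for line in lines:
        entry = json.loads(line)
        timestamp = datetime.fromisoformat(entry["timestamp"])
        if entry["level"] == level and start <= timestamp < end:
            yield entry


SAMPLE = [
    '{"timestamp": "2024-05-01T10:00:00", "level": "INFO", "message": "service started"}',
    '{"timestamp": "2024-05-01T10:05:00", "level": "ERROR", "message": "failed to update order"}',
    '{"timestamp": "2024-05-01T12:30:00", "level": "ERROR", "message": "failed to decode JSON"}',
]

matches = filter_entries(SAMPLE, "ERROR", datetime(2024, 5, 1, 10), datetime(2024, 5, 1, 11))
print([entry["message"] for entry in matches])  # ['failed to update order']
```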
2. Creating alerts
Another useful way to use log levels is for creating alerts in various
scenarios. You can notify relevant members of your team if a notable event
occurs on the system, or if an expected event didn't occur within a specified
time frame. For example, you can send an alert to configured email addresses
when more than five ERROR entries are logged within a 30-second period.
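A log management service evaluates rules like this for you, but the underlying logic is a sliding window over entry timestamps. The Python sketch below uses a hypothetical class name and leaves the delivery of the notification out of scope.

```python
from collections import deque


class ErrorRateAlert:
    """Signal when more than `threshold` ERROR entries fall within `window_seconds`."""

    def __init__(self, threshold=5, window_seconds=30.0):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._timestamps = deque()

    def observe(self, level, timestamp):
        """Record one entry (timestamp in seconds); return True if an alert should fire."""
        if level != "ERROR":
            return False
        self._timestamps.append(timestamp)
        # Drop ERROR entries that are no longer inside the window.
        while timestamp - self._timestamps[0] >= self.window_seconds:
            self._timestamps.popleft()
        return len(self._timestamps) > self.threshold
```

Feeding it six ERROR entries one second apart makes the sixth call return True, while one ERROR every ten seconds never triggers it.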
Aside from sending alerts to email addresses, you can configure various integrations so that you can receive alerts in Slack or other services in your stack.
3. Calculating various metrics
Log levels are also a useful tool for generating various metrics about the
application, especially those that help gauge its reliability. For example, the
number of FATAL entries recorded in a specific period is valuable data that
could help decide whether some sort of "bug squashing sprint" should be next up
on the calendar.
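As a rough sketch of such metrics, the Python functions below count entries per level and compute the average gap between consecutive entries at a given level. The entry layout (dictionaries with "timestamp" and "level" keys) is assumed, and the average gap is only a crude stand-in for a properly measured reliability metric.

```python
from collections import Counter
from datetime import datetime
from statistics import mean


def count_by_level(entries):
    """Return how many entries were recorded at each level."""
    return Counter(entry["level"] for entry in entries)


def mean_seconds_between(entries, level="FATAL"):
    """Average gap in seconds between consecutive entries at `level`, or None if fewer than two."""
    times = sorted(
        datetime.fromisoformat(entry["timestamp"])
        for entry in entries
        if entry["level"] == level
    )
    gaps = [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]
    return mean(gaps) if gaps else None
```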
Using the right log level is a crucial step for effective log management. If your log levels are sound, it will be easy to filter your logs by priority, and you can create alerts for notable events. We hope this article has provided enough information to help you understand log levels and when to use them. For more details on logging techniques and practices to follow, check out the other articles in our logging guide.
Thanks for reading, and happy logging!