Log levels are labels that indicate the severity or urgency of a log entry.
Their primary purpose is to separate messages that are merely informational
(meaning that the system is working normally) from those that describe a problem
or potential problem, such as when recurring errors are detected in the system.
Log levels also provide a way to dynamically control your application's volume
of log output (more on this later).
In this article, we will cover the following topics to help you understand what
log levels are and how to use them to log more effectively:
What log levels are and how they work.
The history of log levels.
Common log levels and how to use them.
Using log levels for filtering purposes.
Configuring alerts based on log levels.
The history of log levels
Syslog, a logging solution initially developed for the Sendmail project, first
introduced the concept of log levels in the 1980s. It defined eight severity
levels, one of which is attached to each log entry to describe the severity of
the event in question:
Emergency (emerg): system is unusable.
Alert (alert): immediate action required.
Critical (crit): critical conditions.
Error (err): error conditions.
Warning (warning): warning conditions.
Notice (notice): normal but significant conditions.
Informational (info): informational messages.
Debug (debug): messages helpful for debugging.
In the following years, Syslog was adopted by various software applications and
eventually became a standard for message logging on Unix-like systems. Its
severity levels were also adapted and refined by various application logging
frameworks such as log4net and log4j, evolving into the various log levels that
are commonplace today.
Common log levels and their use cases
The log levels available to you will vary depending on the programming language,
framework, or service in use. Still, most will include some or all of the
following levels: FATAL, ERROR, WARN, INFO, DEBUG, and TRACE. You
can usually override the defaults in your logging framework of choice with
custom log levels, but we recommend sticking to the ones discussed below. They
are arranged in decreasing order of urgency:
FATAL
The FATAL log level annotates messages with the greatest severity. It usually
means that something critical is broken, and the application cannot continue to
do any more useful work without the intervention of an engineer. Typically, such
entries are logged just before the application shuts down (with a non-zero exit
code) to prevent further data corruption. If you use a log management service,
you can configure it to send instant alerts when such entries are logged so that
someone can react to them as quickly as possible.
Examples of situations that may be logged as FATAL errors include the
following:
Crucial configuration information is missing without fallback defaults.
Unable to connect to a service crucial to the application's primary function
(such as the database).
Running out of disk space on the server.
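The first situation in the list above can be sketched with Python's standard
logging module. This is only an illustration: the DATABASE_URL setting and the
load_database_url and main helpers are hypothetical names, and Python calls
this level CRITICAL (logging.FATAL is an alias for it).

```python
import logging
import os

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
logger = logging.getLogger("app")


def load_database_url(env):
    """Return the database URL from `env`, or None when it is missing."""
    return env.get("DATABASE_URL")  # hypothetical setting name


def main(env=os.environ):
    """Return the process exit code: 0 on success, 1 on a fatal condition."""
    db_url = load_database_url(env)
    if db_url is None:
        # No fallback default exists, so the application cannot do useful work.
        logger.critical("DATABASE_URL is not set and has no fallback; shutting down")
        return 1  # non-zero exit code signals failure to the caller
    logger.info("configuration loaded")
    return 0

# In a real program the entry point would end with:
#     import sys; sys.exit(main())
```

Returning the exit code from main instead of calling sys.exit inside it keeps
the function easy to test.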
ERROR
The ERROR log level is used to represent error conditions in an application
that prevent a specific operation from running, but the application itself can
continue working even if it is at a reduced level of functionality or
performance. Generally, ERROR logs should be investigated as soon as possible
but they don't carry the same urgency as FATAL messages since the application
can continue working.
The occurrence of an error condition in the application does not necessarily
mean that it should be logged at the ERROR level. For example, if an exception
is expected behavior and does not indicate degradation in application
functionality or performance, it can be logged as INFO. Also, errors with a
possibility of recovery (such as network connectivity errors) can be labeled as
INFO if an automatic recovery strategy is in place (e.g., retries). Such
conditions can be promoted to the ERROR level if recovery isn't possible after
a predetermined time.
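A minimal sketch of this promotion strategy in Python is shown below. It
assumes a fixed number of attempts rather than a time limit, and the
call_with_retries helper is a hypothetical name, not part of any library.

```python
import logging
import time

logger = logging.getLogger("app.retry")


def call_with_retries(operation, attempts=3, delay_seconds=0.0):
    """Run `operation`, retrying on ConnectionError up to `attempts` times."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except ConnectionError as exc:
            if attempt == attempts:
                # Recovery failed: promote the condition to ERROR and re-raise.
                logger.error("giving up after %d attempts: %s", attempts, exc)
                raise
            # Recovery is still possible, so this is not yet an ERROR.
            logger.info("attempt %d/%d failed (%s); retrying", attempt, attempts, exc)
            time.sleep(delay_seconds)
```

Only the final failure produces an ERROR entry, so a handful of transient
network blips will not trigger error-based alerts.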
Logging significant error conditions is also useful for generating metrics such
as Mean Time Between Failures (MTBF), which can be used to assess the quality of
the application or to compare different systems or designs. Examples of
situations that are typically logged at the ERROR level include the following:
A persistent connection failure to some external resource (after automated
recovery attempts have failed).
Failure to create or update a resource in the system.
An unexpected error (e.g., failed to decode a JSON object).
WARN
Messages logged at the WARN level typically indicate that something unexpected
happened, but the application can recover and continue to function normally. It
is mainly used to draw attention to situations that should be addressed soon
before they pose a problem for the application.
Events that may be logged at the WARN level include the following:
The disk usage on the server is above a configured threshold.
Memory usage is above a configured threshold.
The application is taking longer than usual to complete some important tasks
(degraded performance).
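As a rough illustration of the disk usage case, the Python sketch below logs at
the WARN level (named WARNING in Python) when usage crosses a threshold. The
80% threshold and the helper names are assumptions made for this example.

```python
import logging
import shutil

logger = logging.getLogger("app.health")


def disk_usage_ratio(path="."):
    """Return the used fraction (0.0 to 1.0) of the disk that holds `path`."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def check_disk(ratio, warn_threshold=0.80):
    """Log a warning when `ratio` reaches the threshold; return True if it did."""
    if ratio >= warn_threshold:
        logger.warning(
            "disk usage at %.0f%% (threshold %.0f%%)",
            ratio * 100,
            warn_threshold * 100,
        )
        return True
    logger.debug("disk usage at %.0f%%", ratio * 100)
    return False
```

A periodic health check could call check_disk(disk_usage_ratio()) so that the
warning shows up well before the disk actually fills.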
INFO
INFO-level messages indicate events in the system that are significant to the
business purpose of the application. Such events are logged to show that the
system is operating normally. For example, a service was started or stopped,
some resource was created, accessed, updated, or deleted in the database, and so
on. Production systems typically default to logging at this level so that a
summary of the application's normal behavior is visible to anyone reading the
logs.
Other events that are typically logged at the INFO level include the
following:
The state of an operation has changed (e.g., from "PENDING" to "IN PROGRESS").
The application is listening on a specific port.
A scheduled job was completed successfully.
DEBUG
The DEBUG level is used for logging messages that help developers find out
what went wrong during a debugging session. While exactly which messages to log
at the DEBUG level depends on your application, you generally want to include
detailed information that can help developers troubleshoot an issue quickly.
This can include variable state in the surrounding scope, or relevant error
codes. Unlike TRACE (below), DEBUG level logging can be turned on in production
without making the application unusable, but it should not be left on
indefinitely because the extra output can degrade the system's performance.
TRACE
The TRACE level is used for tracing the path of code execution in a program.
For example, you may use it to trace the processing of an incoming request or an
algorithm's steps to solve a problem. Generally, TRACE is used for showing the
flow of the program, and to provide a detailed breakdown of the sequence of
events that led to a crash, a silent failure, an error, or some other event
logged at a different level. Concrete examples of messages that should be logged
at the TRACE level include the following:
Entered or exited a function or method, perhaps with the processing duration.
Calculation x + y produced output z.
Starting or ending an operation and any intermediate state changes.
As you can see, the information logged at this level generally tries to capture
every possible detail about the program's execution. Therefore, TRACE logging
should only be enabled for short periods due to the significant performance
degradation that it often causes. You will typically enable it only in
development and testing environments.
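Not every framework ships a TRACE level. Python's standard logging module, for
instance, stops at DEBUG, but a custom level can be registered. In the sketch
below, the numeric value 5 is an assumption (any value below logging.DEBUG,
which is 10, would work), and the add function is a made-up example.

```python
import logging

TRACE = 5  # assumed value; it only needs to be lower than logging.DEBUG (10)
logging.addLevelName(TRACE, "TRACE")

logging.basicConfig(format="%(levelname)s %(name)s %(message)s")
logger = logging.getLogger("app.trace")
logger.setLevel(TRACE)  # record everything, including TRACE entries


def add(x, y):
    """Add two numbers while tracing entry and the intermediate result."""
    logger.log(TRACE, "entered add(x=%r, y=%r)", x, y)
    z = x + y
    logger.log(TRACE, "calculation %r + %r produced output %r", x, y, z)
    return z
```

Two log entries for a single addition shows why this level gets expensive
quickly in real code paths.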
Controlling your application's log volume
Log levels are the primary way to control your application's volume of log
entries. Once you select your default level, all log entries that are labeled
with a severity lower than the default will not be recorded. For example,
logging at the WARN level will cause INFO, DEBUG and TRACE messages to
be ignored.
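The Python sketch below illustrates this threshold behavior. The ListHandler
class is a small helper written for this example that collects entries in
memory so the effect is easy to inspect.

```python
import logging


class ListHandler(logging.Handler):
    """Collects log entries in memory instead of writing them anywhere."""

    def __init__(self):
        super().__init__()
        self.lines = []

    def emit(self, record):
        self.lines.append(f"{record.levelname}: {record.getMessage()}")


handler = ListHandler()
logger = logging.getLogger("app.volume")
logger.addHandler(handler)
logger.propagate = False          # keep the demo output out of the root logger
logger.setLevel(logging.WARNING)  # minimum severity that will be recorded

logger.debug("cache miss for key user:42")     # ignored
logger.info("request handled")                 # ignored
logger.warning("disk usage above threshold")   # recorded
logger.error("failed to update resource")      # recorded

print(handler.lines)
# ['WARNING: disk usage above threshold', 'ERROR: failed to update resource']
```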
As you go down in default severity, the number of entries that are produced will
increase, so it's a good idea to turn on only what is necessary to avoid being
flooded with too much information. A typical default for production environments
is INFO, which records messages logged at the INFO level or higher priority
(WARN, ERROR and FATAL). You can change this to WARN if you only want to
record events that indicate problems or potential problems.
When troubleshooting a problem in production, you might want to lower the
minimum severity of recorded messages to DEBUG. This level typically produces
voluminous output with enough context to help developers debug the issue, but
it should be turned off afterward to prevent flooding the system with
irrelevant log entries during normal operation of the application.
The TRACE level produces even more logs than DEBUG so it shouldn't be used
in production for sustained periods. It's better utilized in a development or
testing environment where system performance degradation isn't a critical
consideration.
Controlling your default log level is best done through an environment variable
so that you can change it without modifying the code. However, you might need to
restart the application each time the log level is updated. There are also
several ways to update the log level at runtime, but the specific technique
depends on the application environment and framework used, so be sure to
investigate the options available if this is something that interests you.
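As one possible approach in Python, the sketch below reads the level from an
environment variable. The variable name LOG_LEVEL and the level_from_env helper
are assumptions for this example; your framework may already define its own.

```python
import logging
import os

LEVELS = {
    "FATAL": logging.CRITICAL,
    "CRITICAL": logging.CRITICAL,
    "ERROR": logging.ERROR,
    "WARN": logging.WARNING,
    "WARNING": logging.WARNING,
    "INFO": logging.INFO,
    "DEBUG": logging.DEBUG,
}


def level_from_env(env=os.environ, default=logging.INFO):
    """Map the LOG_LEVEL variable to a logging level, falling back to `default`."""
    name = env.get("LOG_LEVEL", "").strip().upper()
    return LEVELS.get(name, default)


# Applied once at startup; changing LOG_LEVEL later requires a restart.
logging.getLogger().setLevel(level_from_env())
```

Calling setLevel again on a running logger is also how Python's logging module
changes the level at runtime, for example from a signal handler or an admin
endpoint.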
How to use log levels for monitoring and analysis
After you've configured your application to produce logs with the severity
levels included, you might be wondering how to use the recorded labels to make
sense of the log messages. The three main ways to use log levels for
post-logging analysis are discussed below:
1. Filtering
Log levels allow you to quickly sift through your logs so that only the relevant
ones are displayed. If you use a cloud log management service like Logtail, it's
easy to specify filters that display only the ERROR level entries that occurred
in a given time period.
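The same idea can be sketched locally in Python. The example assumes that logs
are stored as one JSON object per line with "timestamp", "level", and "message"
fields; that layout and the sample entries are made up for illustration and are
not the format of any particular service.

```python
import json
from datetime import datetime

SAMPLE_LINES = [
    '{"timestamp": "2023-05-01T10:00:05", "level": "INFO", "message": "job started"}',
    '{"timestamp": "2023-05-01T10:00:09", "level": "ERROR", "message": "failed to update resource"}',
    '{"timestamp": "2023-05-01T11:30:00", "level": "ERROR", "message": "failed to decode JSON"}',
]


def filter_entries(lines, level, start, end):
    """Return parsed entries with the given level whose timestamp is in [start, end]."""
    matches = []
    for line in lines:
        entry = json.loads(line)
        timestamp = datetime.fromisoformat(entry["timestamp"])
        if entry["level"] == level and start <= timestamp <= end:
            matches.append(entry)
    return matches


errors = filter_entries(
    SAMPLE_LINES,
    level="ERROR",
    start=datetime(2023, 5, 1, 10, 0, 0),
    end=datetime(2023, 5, 1, 10, 59, 59),
)
```

Here errors contains only the "failed to update resource" entry, since the
other ERROR falls outside the chosen hour.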
2. Alerting
Another useful way to use log levels is for creating alerts in various
scenarios. You can notify relevant members of your team if a notable event
occurs on the system, or if an expected event didn't occur within a specified
time frame. For example, you could send an alert to configured email addresses
when more than five ERROR entries are logged within a 30-second period.
Aside from sending alerts to email addresses, you can configure various
integrations so that you can receive alerts in Slack or other services in your
stack.
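The rule described above (more than five ERROR entries within 30 seconds) can
be sketched in Python as a sliding window counter. The ErrorRateAlert class is
a hypothetical helper, and timestamps are plain numbers of seconds; a log
management service would normally evaluate such a rule for you.

```python
from collections import deque


class ErrorRateAlert:
    """Fires when more than `max_errors` ERROR entries fall within the window."""

    def __init__(self, max_errors=5, window_seconds=30.0):
        self.max_errors = max_errors
        self.window_seconds = window_seconds
        self._timestamps = deque()

    def record(self, level, timestamp):
        """Record one entry; return True when the alert should fire."""
        if level != "ERROR":
            return False
        self._timestamps.append(timestamp)
        # Discard errors that are older than the window.
        while timestamp - self._timestamps[0] > self.window_seconds:
            self._timestamps.popleft()
        return len(self._timestamps) > self.max_errors
```

The caller decides what firing means, such as sending an email or posting to a
chat channel.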
3. Calculating various metrics
Log levels are also a useful tool for generating various metrics about the
application, especially those that help gauge its reliability. For example, the
number of ERROR or FATAL entries recorded in a specific period is valuable
data that could help inform if some sort of "bug squashing sprint" should be
next up on the calendar.
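A small Python sketch of such metrics follows. It assumes entries are
dictionaries with a "level" key, and it approximates MTBF as the observation
period divided by the number of ERROR and FATAL entries, which ignores repair
time and treats every such entry as a separate failure.

```python
from collections import Counter


def count_by_level(entries):
    """Return how many entries were recorded at each level."""
    return Counter(entry["level"] for entry in entries)


def approximate_mtbf_seconds(entries, period_seconds, failure_levels=("ERROR", "FATAL")):
    """Rough MTBF: observation period divided by the number of failure entries."""
    failures = sum(1 for entry in entries if entry["level"] in failure_levels)
    if failures == 0:
        return None  # no failures observed in this period
    return period_seconds / failures


entries = [
    {"level": "INFO"},
    {"level": "ERROR"},
    {"level": "ERROR"},
    {"level": "FATAL"},
]
counts = count_by_level(entries)
mtbf = approximate_mtbf_seconds(entries, period_seconds=3600)
```

With three failure entries in one hour, mtbf comes out to 1200.0 seconds.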
Final thoughts
Using the right log level is a crucial step for effective log management. If
your log levels are sound, it will be easy to filter your logs by priority, and
you can create alerts for notable events. We hope this article has provided
enough information to help you understand log levels and when to use them. For
more details on logging techniques and practices to follow, check out the other
articles in our logging guide.