Log levels are essentially labels that indicate the severity or urgency of the
various events in your application. Their primary purpose is to separate
messages that are merely informational (meaning that the system is working
normally) from those that describe a problem or potential problem, such as when
recurring errors are detected in the system. Log levels also provide a way to
regulate the amount of log data generated by your application so that only the
relevant and necessary information is recorded, minimizing the storage
requirements and ensuring efficient log management.
Developing a comprehensive production logging strategy for your applications
hinges on a thorough understanding and proper utilization of log levels.
Therefore, this article aims to provide valuable insights into the concept of
log levels and equip you with the necessary techniques for using them
effectively. Here are some of the key concepts you will learn from this
article:
The significance of log levels.
A brief history of log levels.
Common log levels and how to use them effectively.
Employing levels for regulating generated log volume.
How to use log levels for post-logging analysis.
A brief history of log levels
Log levels for software applications have a rich history dating back to the
1980s. One of the earliest and most influential logging solutions for Unix
systems, Syslog, introduced a range of
severity levels, which provided the first standardized framework for
categorizing log entries based on their impact or urgency.
The following are the levels defined by Syslog in descending order of severity:
Emergency (emerg): indicates that the system is unusable and requires
immediate attention.
Alert (alert): indicates that immediate action is necessary to resolve
a critical issue.
Critical (crit): signifies critical conditions in the program that
demand intervention to prevent system failure.
Error (error): indicates error conditions that impair some operation
but are less severe than critical situations.
Warning (warn): signifies potential issues that may lead to errors or
unexpected behavior in the future if not addressed.
Notice (notice): applies to normal but significant conditions that may
require monitoring.
Informational (info): includes messages that provide a record of the
normal operation of the system.
Debug (debug): intended for logging detailed information about the
system for debugging purposes.
Various application logging frameworks, such as
Log4net and
Log4j, recognized the significance of severity
levels and refined the concept further, adapting the Syslog levels to the
specific needs of various applications and environments. The log
levels we commonly encounter today have evolved from this iterative process, and
they have now become a fundamental aspect of effective logging across various
software disciplines.
The specific log levels available to you may differ depending on the programming
language, logging framework, or service in use. However, in
most cases, you can expect to encounter levels such as FATAL, ERROR, WARN,
INFO, DEBUG, and TRACE. While many logging frameworks allow you to
override the default log levels and
define custom ones, we generally advise
sticking to the levels outlined below. They are arranged in decreasing order of
urgency.
🔭 Want to centralize and monitor your application logs?
Head over to Better Stack and start ingesting
your logs in 5 minutes.
1. FATAL
The FATAL log level is reserved for recording the most severe issues in an
application. When an entry is logged at this level, it indicates a critical
failure that prevents the application from doing any further useful work.
Typically, such entries are logged just before shutting down the application to
prevent data corruption or other detrimental effects.
To ensure timely awareness and swift response to fatal errors, you can configure
a log management service, such as Better Stack,
to provide instant alerts whenever such entries are logged. This allows you or
relevant team members to react promptly and take necessary actions to address
the critical situation before it affects your customers.
Examples of events that may be logged as FATAL errors include the following:
Crucial configuration information is missing without fallback defaults.
Loss of essential external dependencies or services required for core
application operations (such as the database).
Running out of disk space or memory on the server, causing the application to
halt or become unresponsive.
When a security breach or unauthorized access to sensitive data is detected.
By recording such severe incidents at the FATAL log level, you'll ensure they
receive the utmost attention and prompt resolution
from the relevant stakeholders. Here's a minimal example of how to log at the
FATAL level using the Pino framework for Node.js
applications:
const pino = require('pino');
const logger = pino({ base: { env: 'production' } });

logger.fatal(
  new Error('no space available for write operations'),
  'Disk space critically low'
);
process.exit(1);
Output
{"level":"fatal","time":1685998234741,"env":"production","err":{"type":"Error","message":"no space available for write operations","stack":"Error: no space available for write operations\n at Object.<anonymous> (/home/ayo/dev/betterstack/demo/nodejs-logging/index.js:21:3)\n at Module._compile (node:internal/modules/cjs/loader:1254:14)\n at Module._extensions..js (node:internal/modules/cjs/loader:1308:10)\n at Module.load (node:internal/modules/cjs/loader:1117:32)\n at Module._load (node:internal/modules/cjs/loader:958:12)\n at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:81:12)\n at node:internal/main/run_main_module:23:47"},"msg":"Disk space critically low"}
When logging at the FATAL level, include as much information as possible to
help diagnose and fix the problem quickly. At the bare minimum, include a stack
trace, as demonstrated in the example above. Including other relevant details
that can assist in troubleshooting and resolution, such as error messages,
input data, or additional contextual information, is also highly encouraged!
2. ERROR
The ERROR log level indicates error conditions within an application that
hinder the execution of a specific operation. While the application can continue
functioning at a reduced level of functionality or performance, ERROR logs
signify issues that should be investigated promptly. Unlike FATAL messages,
ERROR logs do not have the same sense of urgency, as the application can
continue to do useful work.
However, not all occurrences of errors or exceptions in your application should
be logged at the ERROR level. For instance, if an exception is expected
behavior and does not result in a degradation of application functionality or
performance, it can be logged at a lower level, such as DEBUG. Similarly,
errors with a potential for recovery, such as network connectivity issues with
automated retry mechanisms, can be logged at the WARN level. Such conditions
may be elevated to the ERROR level if recovery is unsuccessful after several
attempts.
Some examples of situations that are typically logged at the ERROR level
include the following:
External API or service failures impacting the application's functionality
(after automated recovery attempts have failed).
Network communication errors, such as connection timeouts or DNS resolution
failures.
Failure to create or update a resource in the system.
An unexpected error, such as the failure to decode a JSON object.
The example below demonstrates how you might record an error condition in your
application. It uses the Zap logging framework for Go:
// assumes a logger created with zap.NewProduction(), and that
// doSomethingThatCanFail() is a placeholder for any fallible operation
if err := doSomethingThatCanFail(); err != nil {
	logger.Error(
		"an error occurred while doing something that can fail",
		zap.Error(err),
	)
}
Output
{"level":"error","ts":1686002351.601908,"caller":"slog/main.go:20","msg":"an error occurred while doing something that can fail","error":"Unhandled exception: division by zero","stacktrace":"main.main\n\t/home/ayo/dev/demo/slog/main.go:20\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250"}
Notice how the stack trace is also included here to help you quickly determine
where the error originates in your program. If your current framework does not
automatically include stack traces in error logs, configure it to do so, or
consider switching to a logging framework that offers this functionality; our
roundup of eight Go logging libraries is a good place to start.
3. WARN
Events logged at the WARN level typically indicate that something unexpected
has occurred, but the application can continue to function normally for the time
being. It is also used to signify conditions that should be promptly addressed
before they escalate into problems for the application.
Some examples of events that may be logged at the WARN level include the
following:
Resource consumption nearing predefined thresholds (such as memory, CPU, or
bandwidth).
Errors that the application can recover from without any significant impact.
Outdated configuration settings that are not in line with recommended
practices.
An excessive number of failed login attempts indicating potential security
threats.
External API response times exceeding acceptable thresholds.
Before you can log an event as a warning, you need to establish predefined
thresholds for the conditions that should trigger warning logs. For example,
you can set an acceptable threshold for disk usage or failed login attempts and
log a warning whenever it is exceeded.
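Here's a minimal sketch of threshold-based warning logs using Go's standard log/slog package. The disk-usage framing, the overThreshold helper, and the 90% figure are all illustrative values, not prescriptions:

```go
package main

import (
	"log/slog"
	"os"
)

// overThreshold reports whether usage exceeds a percentage threshold.
func overThreshold(used, total, thresholdPct float64) bool {
	return used / total * 100 >= thresholdPct
}

func main() {
	logger := slog.New(slog.NewJSONHandler(os.Stdout, nil))

	usedGB, totalGB := 92.0, 100.0
	// emit a WARN entry once usage crosses the predefined 90% threshold
	if overThreshold(usedGB, totalGB, 90) {
		logger.Warn("disk usage nearing capacity",
			slog.Float64("used_gb", usedGB),
			slog.Float64("total_gb", totalGB),
		)
	}
}
```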
4. INFO
The INFO level captures events in the system that are significant to the
application's business purpose. Such events are logged to show that the system
is operating normally. Production systems typically default to logging at this
level so that a summary of the application's normal behavior is visible to
anyone reviewing the logs.
Some events that are typically logged at the INFO level include the following:
Changes in the state of an operation, such as transitioning from "PENDING" to
"IN PROGRESS".
The successful completion of scheduled jobs or tasks.
Starting or stopping a service or application component.
Records of important milestones or significant events within the application.
Progress updates during long-running processes or tasks.
These examples demonstrate the broad range of events that can be logged at the
INFO level. While it is good to log the events that are significant to the
business purpose, ensure that you do not log excessive details that may clutter
the logs or compromise security.
Here's an example of how to log at the INFO level using the
Semantic Logger framework
for Ruby programs (a minimal setup is shown; adjust the appender to suit your
application):
require 'semantic_logger'

SemanticLogger.add_appender(io: $stdout, formatter: :json)
logger = SemanticLogger['MyApp']

logger.info(
  'API request to /api/v1/users completed successfully',
  method: 'GET',
  status_code: 200,
  elapsed_ms: 212,
  endpoint: '/api/v1/users'
)
Output
{"host":"fedora","application":"Semantic Logger","timestamp":"2023-06-05T22:59:17.869874Z","level":"info","level_index":2,"pid":3824579,"thread":"60","name":"MyApp","message":"API request to /api/v1/users completed successfully","payload":{"method":"GET","status_code":200,"elapsed_ms":212,"endpoint":"/api/v1/users"}}
5. DEBUG
The DEBUG level is used for logging messages that aid developers in
identifying issues during a debugging session. The content of the messages
logged at the DEBUG level will vary depending on your application, but they
typically contain detailed information that assists developers in
troubleshooting problems efficiently, such as the state of variables in the
surrounding scope or relevant error codes.
Unlike the other levels discussed so far, DEBUG logs should typically not be
enabled in production environments chiefly due to the volume of the records
produced, as this results in increased disk I/O and storage requirements,
especially under heavy load. If you must use DEBUG in production for a
debugging session, turn it off immediately afterward to minimize the performance
impact.
Some everyday events that are usually logged at the DEBUG level include the
following:
Database queries, which can aid in identifying performance bottlenecks or
issues related to data retrieval or manipulation.
The details of external API calls and their responses.
Configuration values to aid in troubleshooting misconfigured or mismatched
settings.
Timing information such as the duration of specific operations or method
executions.
Here's an example of how to write DEBUG logs:
logger.debug(
{
query: 'SELECT * FROM users WHERE age >= 18',
time_taken_ms: 2.57,
},
'Select 18+ users from the database'
);
When DEBUG logging is enabled
(see below), you will observe the
following output:
{"level":"debug","time":1686043946948,"env":"production","query":"SELECT * FROM users WHERE age >= 18","time_taken_ms":2.57,"msg":"Select 18+ users from the database"}
Please be aware that DEBUG logs often contain sensitive information such as
usernames, passwords, application secrets, and more, so ensure that access is
restricted to authorized personnel only. You can employ log redaction or similar
techniques to reduce the risk of exposing such details in your logs.
6. TRACE
The TRACE level is designed specifically for tracing the path of code
execution within a program. It is primarily used to provide a detailed breakdown
of the events leading up to a crash, error, or other logged events at higher
levels. For example, you may use it when analyzing complex algorithms or
decision-making processes. Logging the intermediate steps, input values, and
output results allows you to validate and evaluate the algorithm's behavior to
confirm if it works as intended.
Concrete examples of events that should be logged at the TRACE level include
the following:
When entering or exiting from a function or method, along with any relevant
parameters or return values.
The iteration details within loops, such as the current index or the values
being processed.
Calculations or operations that produce specific output or intermediate
results.
As you can see, the TRACE level provides a comprehensive view of the program's
execution flow. Consequently, enabling TRACE logging generates a significant output volume,
which can substantially increase log file size, I/O operations, and
computational overhead. It can also obscure relevant information and make it
harder to identify critical events.
Therefore, TRACE logging should only be reserved for development and other
non-production environments where the performance impact isn't a primary
concern. This way, you can leverage its detailed insights without compromising
the performance or efficiency of the application in live deployments.
Here's an example that uses the Zerolog framework for Go to produce
TRACE logs within a function that does some calculations:
package main

import (
	logger "github.com/rs/zerolog/log"
)

func complexCalculation(input int) int {
	logger.Trace().Msg("Entering complexCalculation() function")
	logger.Trace().Int("input", input).Msgf("Received input: %d", input)

	// step 1
	result := input * 2
	logger.Trace().
		Int("step1-result", result).
		Msg("Intermediate value after step 1")

	// step 2
	result += 10
	logger.Trace().
		Int("step2-result", result).
		Msg("Intermediate value after step 2")

	// final result
	result *= 3
	logger.Trace().
		Int("final-result", result).
		Msgf("Final result is: %d", result)

	logger.Trace().Msg("Exiting complexCalculation() function")
	return result
}

func main() {
	result := complexCalculation(5)
	logger.Info().Int("result", result).Msg("Calculation completed")
}
Output
{"level":"trace","time":"2023-06-06T11:01:55+01:00","message":"Entering complexCalculation() function"}
{"level":"trace","input":5,"time":"2023-06-06T11:01:55+01:00","message":"Received input: 5"}
{"level":"trace","step1-result":10,"time":"2023-06-06T11:01:55+01:00","message":"Intermediate value after step 1"}
{"level":"trace","step2-result":20,"time":"2023-06-06T11:01:55+01:00","message":"Intermediate value after step 2"}
{"level":"trace","final-result":60,"time":"2023-06-06T11:01:55+01:00","message":"Final result is: 60"}
{"level":"trace","time":"2023-06-06T11:01:55+01:00","message":"Exiting complexCalculation() function"}
{"level":"info","result":60,"time":"2023-06-06T11:01:55+01:00","message":"Calculation completed"}
Now that we've examined the most common log levels and their typical usage,
let's dive into another crucial aspect of log levels: controlling the volume of
your logs.
Controlling your application's log volume
Log levels provide a means to regulate the amount of log data produced by your
application. By setting the appropriate log level, you can determine which log
entries should be recorded and which should be ignored. This allows you to
balance capturing essential information and avoiding an overwhelming flood of
logs.
Controlling log volume is essential for a few reasons. First, excessive logging
can lead to performance degradation and increased storage requirements,
impacting the overall efficiency of your application. Second, recording an
overwhelming number of logs can make identifying and analyzing critical events
or issues difficult. Third, log management services usually charge based on
log volume, so excessive logging can significantly increase your costs.
Once you select your default level, all log entries with a severity lower than
the default will be excluded. For example, in production environments, it's
common to set the default level to INFO, capturing only messages of INFO
level or higher (WARN, ERROR, and FATAL). If you want to focus solely on
problem indicators, you can set the default level to WARN:
package main

import (
	"os"

	"github.com/rs/zerolog"
)

func main() {
	// this logger is set to the WARN level
	logger := zerolog.New(os.Stdout).
		Level(zerolog.WarnLevel).
		With().
		Timestamp().
		Logger()

	logger.Trace().Msg("Trace message")
	logger.Debug().Msg("Debug message")
	logger.Info().Msg("Info message")
	logger.Warn().Msg("Warn message")
	logger.Error().Msg("Error message")
	logger.Fatal().Msg("Fatal message")
}
Notice how the TRACE, DEBUG, and INFO logs are missing here:
During a troubleshooting session, you may temporarily reduce the default
severity level to DEBUG so that you can have the necessary context to debug
the issue. Ensure to turn off DEBUG logging afterward to prevent an influx of
irrelevant log entries during normal application operation.
Lowering the default level to TRACE produces an even greater amount of logs
compared to DEBUG and is generally unsuitable for production use. It is better
suited for development or testing environments where performance degradation is
not a critical concern.
Controlling the default log level is commonly achieved through environment
variables, allowing you to modify it without altering the code. However, keep in
mind that restarting the application may be necessary to update the level to a
different value.
// set the log level from the environment
logLevel, err := zerolog.ParseLevel(os.Getenv("APP_LOG_LEVEL"))
if err != nil {
// default to INFO if log level is not set in the environment
logLevel = zerolog.InfoLevel
}
logger := zerolog.New(os.Stdout).
Level(logLevel).
With().
Timestamp().
Logger()
So far, we've discussed the typical log levels that you're likely to encounter
by default in most logging frameworks. While you should generally stick to these
levels to maintain consistency and adhere to established logging conventions,
they sometimes do not fully address specific business needs.
In such cases, creating custom log levels can prove valuable. For example, you
can create a custom SECURITY level for logging events related to authentication
and access control so that notable security events can be quickly identified and
investigated.
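As a sketch of what this might look like with Go's standard log/slog package, you can define a numeric level between WARN (4) and ERROR (8) and relabel it in the handler. The LevelSecurity name and its value here are illustrative choices, not part of the slog API:

```go
package main

import (
	"context"
	"log/slog"
	"os"
)

// LevelSecurity slots between slog.LevelWarn (4) and slog.LevelError (8);
// both the name and the numeric value are illustrative choices.
const LevelSecurity = slog.Level(6)

func main() {
	logger := slog.New(slog.NewJSONHandler(os.Stdout, &slog.HandlerOptions{
		// Rename the default numeric label ("WARN+2") to a readable
		// SECURITY label in the emitted JSON.
		ReplaceAttr: func(groups []string, a slog.Attr) slog.Attr {
			if a.Key == slog.LevelKey {
				if lvl, ok := a.Value.Any().(slog.Level); ok && lvl == LevelSecurity {
					a.Value = slog.StringValue("SECURITY")
				}
			}
			return a
		},
	}))

	logger.Log(context.Background(), LevelSecurity, "failed login attempt",
		slog.String("username", "ayo"),
		slog.String("source_ip", "203.0.113.7"),
	)
}
```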
Before introducing custom log levels to your project, carefully evaluate the
necessity and impact of customization in terms of standardization, consistency,
readability, and operational overhead. It's advisable to stick to established
conventions and only leverage custom log levels sparingly.
How to use log levels for monitoring and analysis
Once your application is configured to include severity levels in its generated
logs, you may be curious about how to utilize these labels for effectively
analyzing the log records. In this section, we will explore three main
approaches for leveraging log levels in post-logging analysis:
1. Filtering and searching
Log levels serve as a valuable filtering mechanism to narrow your analysis to
specific severity levels. By filtering logs based on severity, you can reduce
the scope of analysis to only the relevant events and prioritize areas that
require attention. With Better Stack, you can
easily
specify filters
that display only entries of a specific severity in the live tail interface.
2. Alerting
Log levels can trigger alerting and notification mechanisms to address critical
events proactively. You can receive immediate notifications when these events
occur by setting up alerts based on specific log levels, such as FATAL or
ERROR. The example below sends an alert to configured email addresses when
more than five ERROR entries are logged within a 30-second period, indicating
a persistent problem that needs attention.
3. Calculating various metrics
Log levels are also a useful tool for generating various metrics about the
application, especially in the absence of specialized tools like
Prometheus. For example, a significant
increase in ERROR or FATAL logs may indicate underlying issues or bugs that
require immediate attention. This information can help you prioritize debugging
efforts or plan a dedicated "bug squashing sprint" to address the identified
issues.
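In the absence of a metrics pipeline, simple counts can be derived directly from structured logs. This Go sketch assumes JSON-formatted entries with a level field, as produced by the frameworks shown earlier; the countLevels helper is illustrative:

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// countLevels tallies the "level" field across JSON-formatted log lines;
// the field name matches the examples in this article but may differ
// depending on your framework's configuration.
func countLevels(logs string) map[string]int {
	counts := make(map[string]int)
	scanner := bufio.NewScanner(strings.NewReader(logs))
	for scanner.Scan() {
		var entry map[string]any
		if err := json.Unmarshal(scanner.Bytes(), &entry); err != nil {
			continue // skip malformed lines
		}
		if level, ok := entry["level"].(string); ok {
			counts[level]++
		}
	}
	return counts
}

func main() {
	logs := `{"level":"info","msg":"ok"}
{"level":"error","msg":"boom"}
{"level":"error","msg":"boom again"}`
	fmt.Println(countLevels(logs)) // map[error:2 info:1]
}
```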
Final thoughts
Understanding and using log levels correctly is a fundamental aspect of
effective log management. With well-defined log levels, you can easily filter
and prioritize your logs based on their severity, allowing you to focus on
critical events and create meaningful alerts. I hope that this article has
equipped you with the necessary knowledge to grasp the concept of log levels and
their appropriate usage.
For further insights into logging techniques and recommended practices, we
encourage you to explore the additional articles available in our comprehensive
logging guide.
Thanks for reading, and happy logging!
Article by
Ayooluwa Isaiah
Ayo is the Head of Content at Better Stack. His passion is simplifying and communicating complex technical ideas effectively. His work was featured on several esteemed publications including LWN.net, Digital Ocean, and CSS-Tricks. When he’s not writing or coding, he loves to travel, bike, and play tennis.