Log Levels Explained and How to Use Them
Log levels are essentially labels that indicate the severity or urgency of the various events in your application. Their primary purpose is to separate messages that are merely informational (meaning that the system is working normally) from those that describe a problem or potential problem, such as when recurring errors are detected in the system. Log levels also provide a way to regulate the amount of log data generated by your application so that only the relevant and necessary information is recorded, minimizing the storage requirements and ensuring efficient log management.
Developing a comprehensive production logging strategy for your applications hinges on a thorough understanding and proper utilization of log levels. Therefore, this article aims to provide valuable insights into the concept of log levels and equip you with the necessary techniques for using them effectively. Here are some of the key concepts you will learn by following through with this article:
- The significance of log levels.
- A brief history of log levels.
- Common log levels and how to use them effectively.
- Employing levels for regulating generated log volume.
- How to use log levels for post-logging analysis.
A brief history of log levels
Log levels for software applications have a rich history dating back to the 1980s. One of the earliest and most influential logging solutions for Unix systems, Syslog, introduced a range of severity levels, which provided the first standardized framework for categorizing log entries based on their impact or urgency.
The following are the levels defined by Syslog in descending order of severity:
- Emergency (emerg): indicates that the system is unusable and requires immediate attention.
- Alert (alert): indicates that immediate action is necessary to resolve a critical issue.
- Critical (crit): signifies critical conditions in the program that demand intervention to prevent system failure.
- Error (err): indicates error conditions that impair some operation but are less severe than critical situations.
- Warning (warning): signifies potential issues that may lead to errors or unexpected behavior in the future if not addressed.
- Notice (notice): applies to normal but significant conditions that may require monitoring.
- Informational (info): includes messages that provide a record of the normal operation of the system.
- Debug (debug): intended for logging detailed information about the system for debugging purposes.
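To make the ordering concrete, here is a minimal Python sketch of the Syslog severities and their numeric codes, where 0 is the most severe and 7 the least. The `SyslogSeverity` enum and the `is_at_least` helper are names made up for this illustration and are not part of any library.

```python
from enum import IntEnum


class SyslogSeverity(IntEnum):
    """Syslog severity codes: a lower number means a more severe event."""

    EMERG = 0
    ALERT = 1
    CRIT = 2
    ERR = 3
    WARNING = 4
    NOTICE = 5
    INFO = 6
    DEBUG = 7


def is_at_least(event: SyslogSeverity, threshold: SyslogSeverity) -> bool:
    """Return True when `event` is as severe as `threshold` or more severe."""
    # Hypothetical helper: with Syslog, "more severe" means a smaller number.
    return event <= threshold
```

Note that the numbering runs in the opposite direction to many application frameworks, where a higher number usually means a more severe event.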
Various application logging frameworks, such as Log4net and Log4j, recognized the significance of severity and adapted the Syslog levels to cater to the specific needs of various applications and environments. The log levels we commonly encounter today evolved from this iterative process and have become a fundamental aspect of effective logging across various software disciplines.
The specific log levels available to you may differ depending on the programming
language, logging framework, or service in use. However, in
most cases, you can expect to encounter levels such as FATAL, ERROR, WARN,
INFO, DEBUG, and TRACE. While many logging frameworks allow you to
override the default log levels and
define custom ones, we generally advise
sticking to the levels outlined below. They are arranged in decreasing order of
urgency.
1. FATAL
The FATAL log level is reserved for recording the most severe issues in an
application. When an entry is logged at this level, it indicates a critical
failure that prevents the application from doing any further useful work.
Typically, such entries are logged just before shutting down the application to
prevent data corruption or other detrimental effects.
To ensure timely awareness and swift response to fatal errors, you can configure a log management service, such as Better Stack, to provide instant alerts whenever such entries are logged. This allows you or relevant team members to react promptly and take necessary actions to address the critical situation before it affects your customers.
Examples of events that may be logged as FATAL errors include the following:
- Crucial configuration information is missing without fallback defaults.
- Loss of essential external dependencies or services required for core application operations (such as the database).
- Running out of disk space or memory on the server, causing the application to halt or become unresponsive.
- Detection of a security breach or unauthorized access to sensitive data.
By recording such severe incidents at the FATAL log level, you'll ensure they
receive the utmost attention and prompt resolution from the relevant
stakeholders. Here's a minimal example of how to log at the FATAL level using
the Pino framework for Node.js applications:
const pino = require('pino');
const logger = pino({
  formatters: {
    bindings: (bindings) => {
      return {
        env: process.env.NODE_ENV || 'production',
        server: process.env.server,
      };
    },
    level: (label) => {
      return { level: label };
    },
  },
});
logger.fatal(
  new Error('no space available for write operations'),
  'Disk space critically low'
);
process.exit(1);
{"level":"fatal","time":1685998234741,"env":"production","err":{"type":"Error","message":"no space available for write operations","stack":"Error: no space available for write operations\n    at Object.<anonymous> (/home/ayo/dev/betterstack/demo/nodejs-logging/index.js:21:3)\n    at Module._compile (node:internal/modules/cjs/loader:1254:14)\n    at Module._extensions..js (node:internal/modules/cjs/loader:1308:10)\n    at Module.load (node:internal/modules/cjs/loader:1117:32)\n    at Module._load (node:internal/modules/cjs/loader:958:12)\n    at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:81:12)\n    at node:internal/main/run_main_module:23:47"},"msg":"Disk space critically low"}
When logging at the FATAL level, include as much information as possible to
help diagnose and fix the problem quickly. At the bare minimum, include a stack
trace, as demonstrated in the example above. However, including other relevant
details that can assist in the troubleshooting and resolution process such as
error messages, input data, or additional contextual information is highly
encouraged!
2. ERROR
The ERROR log level indicates error conditions within an application that
hinder the execution of a specific operation. While the application can continue
functioning at a reduced level of functionality or performance, ERROR logs
signify issues that should be investigated promptly. Unlike FATAL messages,
ERROR logs do not have the same sense of urgency, as the application can
continue to do useful work.
However, not all occurrences of errors or exceptions in your application should
be logged at the ERROR level. For instance, if an exception is expected
behavior and does not result in a degradation of application functionality or
performance, it can be logged at a lower level, such as DEBUG. Similarly,
errors with a potential for recovery, such as network connectivity issues with
automated retry mechanisms, can be logged at the WARN level. Such conditions
may be elevated to the ERROR level if recovery is unsuccessful after several
attempts.
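The escalation described above can be sketched with Python's standard `logging` module. This is only an illustration under assumptions: the `call_with_retries` helper, its parameters, and the choice of `ConnectionError` as the recoverable failure are all hypothetical.

```python
import logging
import time

logger = logging.getLogger("retry_demo")


def call_with_retries(operation, max_attempts=3, delay_seconds=0.0):
    """Run `operation`, logging a WARNING for each recoverable failure.

    The failure is escalated to ERROR only when every attempt has failed.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except ConnectionError as exc:
            if attempt == max_attempts:
                # Recovery was unsuccessful, so the operation is now impaired.
                logger.error(
                    "operation failed after %d attempts", max_attempts, exc_info=exc
                )
                raise
            # Still recoverable: record it at a lower severity and try again.
            logger.warning(
                "attempt %d of %d failed, retrying: %s", attempt, max_attempts, exc
            )
            time.sleep(delay_seconds)
```

With three attempts that all fail, this produces two WARNING entries followed by a single ERROR entry, which keeps transient problems from inflating your error counts.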
Some examples of situations that are typically logged at the ERROR level
include the following:
- External API or service failures impacting the application's functionality (after automated recovery attempts have failed).
- Network communication errors, such as connection timeouts or DNS resolution failures.
- Failure to create or update a resource in the system.
- An unexpected error, such as the failure to decode a JSON object.
The example below demonstrates how you might record an error condition in your application. It uses the Zap logging framework for Go:
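package main

import (
    "errors"

    "go.uber.org/zap"
)

// someThingThatCanFail is a placeholder for any operation that may return an error.
func someThingThatCanFail() error {
    return errors.New("Unhandled exception: division by zero")
}

func main() {
    logger := zap.Must(zap.NewProduction())
    defer logger.Sync()
    err := someThingThatCanFail()
    if err != nil {
        logger.Error(
            "an error occurred while doing something that can fail",
            zap.Error(err),
        )
    }
}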
{"level":"error","ts":1686002351.601908,"caller":"slog/main.go:20","msg":"an error occurred while doing something that can fail","error":"Unhandled exception: division by zero","stacktrace":"main.main\n\t/home/ayo/dev/demo/slog/main.go:20\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250"}
Notice how the stack trace is also included here to help you quickly determine where the error is coming from in your program. If your current framework does not automatically generate stack traces in error logs, make sure to configure it to do so. Alternatively, consider choosing a logging framework that offers this functionality, such as one of the 8 go-to logging libraries tested by us.
3. WARN
Events logged at the WARN level typically indicate that something unexpected
has occurred, but the application can continue to function normally for the time
being. This level is also used to signify conditions that should be promptly
addressed before they escalate into problems for the application.
Some examples of events that may be logged at the WARN level include the
following:
- Resource consumption nearing predefined thresholds (such as memory, CPU, or bandwidth).
- Errors that the application can recover from without any significant impact.
- Outdated configuration settings that are not in line with recommended practices.
- An excessive number of failed login attempts indicating potential security threats.
- External API response times exceeding acceptable thresholds.
Before you can log an event as a warning, you need to establish predefined thresholds for various conditions that, when exceeded, trigger warning logs. For example, you can set an acceptable threshold for disk usage or failed login attempts and then log a warning on cue.
Here's an example of a warning log created using Python's logging module coupled with python-json-logger:
import logging
from pythonjsonlogger import jsonlogger
logHandler = logging.StreamHandler()
logHandler.setFormatter(
    jsonlogger.JsonFormatter(
        "%(name)s %(asctime)s %(levelname)s %(filename)s %(lineno)s %(message)s",
        rename_fields={"levelname": "level", "asctime": "timestamp"},
    )
)
logger = logging.getLogger(__name__)
logger.addHandler(logHandler)
logger.warning("Disk usage warning", extra={"disk_usage": 85.2, "threshold": 80})
{"name": "__main__", "filename": "main.py", "lineno": 38, "message": "Disk usage warning", "disk_usage": 85.2, "threshold": 80, "level": "WARNING", "timestamp": "2023-06-07 12:48:35,251"}
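The threshold check that decides when to emit such a warning could be sketched as follows. This is a rough illustration using only the standard library; the 80% threshold, the `usage_percent` and `check_disk_usage` helpers, and the `disk_monitor` logger name are assumptions made for the example.

```python
import logging
import shutil

logger = logging.getLogger("disk_monitor")

# Hypothetical threshold: warn once usage exceeds 80% of capacity.
DISK_USAGE_WARN_PERCENT = 80.0


def usage_percent(used_bytes: int, total_bytes: int) -> float:
    """Return disk usage as a percentage rounded to one decimal place."""
    return round(used_bytes / total_bytes * 100, 1)


def check_disk_usage(path: str = ".", threshold: float = DISK_USAGE_WARN_PERCENT) -> bool:
    """Log a warning when disk usage at `path` exceeds `threshold`.

    Returns True when a warning was logged and False otherwise.
    """
    usage = shutil.disk_usage(path)
    percent = usage_percent(usage.used, usage.total)
    if percent > threshold:
        logger.warning(
            "Disk usage warning",
            extra={"disk_usage": percent, "threshold": threshold},
        )
        return True
    return False
```

Keeping the threshold in a single named constant (or in configuration) makes it easier to tune once you learn which values produce useful warnings rather than noise.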
4. INFO
The INFO level captures events in the system that are significant to the
application's business purpose. Such events are logged to show that the system
is operating normally. Production systems typically default to logging at this
level so that a summary of the application's normal behavior is visible to
anyone reviewing the logs.
Some events that are typically logged at the INFO level include the following:
- Changes in the state of an operation, such as transitioning from "PENDING" to "IN PROGRESS".
- The successful completion of scheduled jobs or tasks.
- Starting or stopping a service or application component.
- Records of important milestones or significant events within the application.
- Progress updates during long-running processes or tasks.
- Information about system health checks or status reports.
These examples demonstrate the broad range of events that can be logged at the
INFO level. While it is good to log the events that are significant to the
business purpose, ensure that you do not log excessive details that may clutter
the logs or compromise security.
Here's an example of how to log at the INFO level using the
Semantic Logger framework
for Ruby programs:
require 'semantic_logger'
SemanticLogger.add_appender(io: $stdout, formatter: :json)
logger = SemanticLogger['MyApp']
logger.info('API request to /api/v1/users completed successfully',
            method: 'GET', status_code: 200, elapsed_ms: 212, endpoint: '/api/v1/users')
{"host":"fedora","application":"Semantic Logger","timestamp":"2023-06-05T22:59:17.869874Z","level":"info","level_index":2,"pid":3824579,"thread":"60","name":"MyApp","message":"API request to /api/v1/users completed successfully","payload":{"method":"GET","status_code":200,"elapsed_ms":212,"endpoint":"/api/v1/users"}}
5. DEBUG
The DEBUG level is used for logging messages that aid developers in
identifying issues during a debugging session. The content of the messages
logged at the DEBUG level will vary depending on your application, but they
typically contain detailed information that assists its developers in
troubleshooting problems efficiently. This can include variables' state within
the surrounding scope or relevant error codes.
Unlike the other levels discussed so far, DEBUG logs should typically not be
enabled in production environments chiefly due to the volume of the records
produced, as this results in increased disk I/O and storage requirements,
especially under heavy load. If you must use DEBUG in production for a
debugging session, turn it off immediately afterward to minimize the performance
impact.
Some everyday events that are usually logged at the DEBUG level include the
following:
- Database queries, which can aid in identifying performance bottlenecks or issues related to data retrieval or manipulation.
- The details of external API calls and their responses.
- Configuration values to aid in troubleshooting misconfigured or mismatched settings.
- Timing information such as the duration of specific operations or method executions.
Here's an example of how to write DEBUG logs:
logger.debug(
  {
    query: 'SELECT * FROM users WHERE age >= 18',
    time_taken_ms: 2.57,
  },
  'Select 18+ users from the database'
);
When DEBUG logging is enabled (see the section on controlling log volume
below), you will observe the following output:
{"level":"debug","time":1686043946948,"env":"production","query":"SELECT * FROM users WHERE age >= 18","time_taken_ms":2.57,"msg":"Select 18+ users from the database"}
Please be aware that DEBUG logs often contain sensitive information such as
usernames, passwords, application secrets, and more, so ensure that access is
restricted to authorized personnel only. You can employ log redaction or similar
techniques to reduce the risk of exposing such details in your logs.
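As one possible approach to redaction, the sketch below uses a filter from Python's standard `logging` module to mask values in `key=value` pairs before a record is emitted. The list of sensitive keys, the `redact` function, and the `RedactingFilter` class are hypothetical names chosen for this example, and a real application would likely need broader patterns.

```python
import logging
import re

# Hypothetical list of keys whose values should never reach the logs.
SENSITIVE_KEYS = ("password", "secret", "token")
_PATTERN = re.compile(
    r"(?P<key>" + "|".join(SENSITIVE_KEYS) + r")=(?P<value>\S+)", re.IGNORECASE
)


def redact(text: str) -> str:
    """Mask the value of every `key=value` pair whose key is sensitive."""
    return _PATTERN.sub(lambda m: f"{m.group('key')}=[REDACTED]", text)


class RedactingFilter(logging.Filter):
    """Logging filter that redacts sensitive values before a record is emitted."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Format the message first so that values passed as arguments are covered.
        record.msg = redact(record.getMessage())
        record.args = ()  # the message is already fully formatted
        return True
```

You would attach it with `logger.addFilter(RedactingFilter())`. Redaction of this kind reduces risk but does not replace restricting access to the logs themselves.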
6. TRACE
The TRACE level is designed specifically for tracing the path of code
execution within a program. It is primarily used to provide a detailed breakdown
of the events leading up to a crash, error, or other logged events at higher
levels. For example, you may use it when analyzing complex algorithms or
decision-making processes. Logging the intermediate steps, input values, and
output results allows you to validate and evaluate the algorithm's behavior to
confirm if it works as intended.
Concrete examples of events that should be logged at the TRACE level include
the following:
- When entering or exiting from a function or method, along with any relevant parameters or return values.
- The iteration details within loops, such as the current index or the values being processed.
- Calculations or operations that produce specific output or intermediate results.
As you can see, the TRACE level provides a comprehensive view of the program's execution.
Consequently, enabling TRACE logging generates a significant output volume,
which can substantially increase log file size, I/O operations, and
computational overhead. It can also obscure relevant information and make it
harder to identify critical events.
Therefore, TRACE logging should be reserved for development and other
non-production environments where the performance impact isn't a primary
concern. This way, you can leverage its detailed insights without compromising
the performance or efficiency of the application in live deployments.
Here's an example that uses the Zerolog framework for Go to produce
TRACE logs within a function that does some calculations:
package main
import (
    logger "github.com/rs/zerolog/log"
)
func complexCalculation(input int) int {
    logger.Trace().Msg("Entering complexCalculation() function")
    logger.Trace().Int("input", input).Msgf("Received input: %d", input)
    // step 1
    result := input * 2
    logger.Trace().
        Int("step1-result", result).
        Msg("Intermediate value after step 1")
    // step 2
    result += 10
    logger.Trace().
        Int("step2-result", result).
        Msg("Intermediate value after step 2")
    // final result
    result *= 3
    logger.Trace().
        Int("final-result", result).
        Msgf("Final result is: %d", result)
    logger.Trace().Msg("Exiting complexCalculation() function")
    return result
}
func main() {
    result := complexCalculation(5)
    logger.Info().Int("result", result).Msg("Calculation completed")
}
{"level":"trace","time":"2023-06-06T11:01:55+01:00","message":"Entering complexCalculation() function"}
{"level":"trace","input":5,"time":"2023-06-06T11:01:55+01:00","message":"Received input: 5"}
{"level":"trace","step1-result":10,"time":"2023-06-06T11:01:55+01:00","message":"Intermediate value after step 1"}
{"level":"trace","step2-result":20,"time":"2023-06-06T11:01:55+01:00","message":"Intermediate value after step 2"}
{"level":"trace","final-result":60,"time":"2023-06-06T11:01:55+01:00","message":"Final result is: 60"}
{"level":"trace","time":"2023-06-06T11:01:55+01:00","message":"Exiting complexCalculation() function"}
{"level":"info","result":60,"time":"2023-06-06T11:01:55+01:00","message":"Calculation completed"}
Now that we've examined the most common log levels and their typical usage, let's dive into another crucial aspect of log levels: controlling the volume of your logs.
Controlling your application's log volume
Log levels provide a means to regulate the amount of log data produced by your application. By setting the appropriate log level, you can determine which log entries should be recorded and which should be ignored. This allows you to balance capturing essential information and avoiding an overwhelming flood of logs.
Controlling log volume is essential for a few reasons. First, excessive logging can lead to performance degradation and increased storage requirements, impacting the overall efficiency of your application. Second, recording an overwhelming number of logs can make identifying and analyzing critical events or issues difficult. Third, log management services usually charge based on log volume, so excessive logging can significantly increase your costs.
Once you select your default level, all log entries with a severity lower than
the default will be excluded. For example, in production environments, it's
common to set the default level to INFO, capturing only messages of INFO
level or higher (WARN, ERROR, and FATAL). If you want to focus solely on
problem indicators, you can set the default level to WARN instead. The
following example sets the level to INFO:
package main

import (
    "os"

    "github.com/rs/zerolog"
)

func main() {
    // this logger is set to the INFO level
    logger := zerolog.New(os.Stdout).
        Level(zerolog.InfoLevel).
        With().
        Timestamp().
        Logger()
    logger.Trace().Msg("Trace message")
    logger.Debug().Msg("Debug message")
    logger.Info().Msg("Info message")
    logger.Warn().Msg("Warn message")
    logger.Error().Msg("Error message")
    logger.Fatal().Msg("Fatal message")
}
Notice how the DEBUG and TRACE logs are missing here:
{"level":"info","time":"2023-06-06T11:54:28+01:00","message":"Info message"}
{"level":"warn","time":"2023-06-06T11:54:28+01:00","message":"Warn message"}
{"level":"error","time":"2023-06-06T11:54:28+01:00","message":"Error message"}
{"level":"fatal","time":"2023-06-06T11:54:28+01:00","message":"Fatal message"}
During a troubleshooting session, you may temporarily lower the default
severity level to DEBUG so that you can have the necessary context to debug
the issue. Be sure to turn off DEBUG logging afterward to prevent an influx of
irrelevant log entries during normal application operation.
Lowering the default level to TRACE produces an even greater amount of logs
compared to DEBUG and is generally unsuitable for production use. It is better
suited for development or testing environments where performance degradation is
not a critical concern.
Controlling the default log level is commonly achieved through environment variables, allowing you to modify it without altering the code. However, keep in mind that restarting the application may be necessary to update the level to a different value.
// set the log level from the environment
logLevel, err := zerolog.ParseLevel(os.Getenv("APP_LOG_LEVEL"))
if err != nil || logLevel == zerolog.NoLevel {
    // default to INFO if the log level is unset or invalid
    // (an empty string is parsed as zerolog.NoLevel without an error)
    logLevel = zerolog.InfoLevel
}
logger := zerolog.New(os.Stdout).
    Level(logLevel).
    With().
    Timestamp().
    Logger()
Additionally, various techniques exist to update the log level at runtime, but the specific method to use depends on the application environment and framework in use. Please explore the available options if runtime log level modification is of interest to you.
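As one illustration of runtime level changes, here is a minimal sketch using Python's standard `logging` module. The `set_log_level` helper and the `app` logger name are assumptions made for this example; how the function gets triggered (a signal handler, an admin-only endpoint, a configuration watcher) depends on your environment.

```python
import logging

logger = logging.getLogger("app")
logger.setLevel(logging.INFO)

# Level names accepted by the hypothetical helper below.
LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
    "CRITICAL": logging.CRITICAL,
}


def set_log_level(level_name: str) -> bool:
    """Switch the logger to `level_name` at runtime.

    Returns False and leaves the level unchanged when the name is unknown.
    """
    level = LEVELS.get(level_name.upper())
    if level is None:
        return False
    logger.setLevel(level)
    return True
```

Calling `set_log_level("debug")` during an incident and `set_log_level("info")` afterward avoids a restart, at the cost of needing a safe way to invoke it.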
Creating and using custom levels
So far, we've discussed the typical log levels that you're likely to encounter by default in most logging frameworks. While you should generally stick to these levels to maintain consistency and adhere to established logging conventions, they sometimes do not fully address specific business needs.
In such cases, creating custom log levels can prove valuable. For example, you
can create a custom SECURITY level for logging events related to authentication
and access control so that notable security events can be quickly identified and
investigated.
const pino = require('pino');
const levels = {
  fatal: 60,
  error: 50,
  warn: 40,
  security: 35,
  info: 30,
  debug: 20,
  trace: 10,
};
const logger = pino({
  level: 'debug',
  customLevels: levels,
  useOnlyCustomLevels: true,
  formatters: {
    bindings: (bindings) => {
      return {
        env: process.env.NODE_ENV || 'production',
        server: process.env.server,
      };
    },
    level: (label) => {
      return { level: label };
    },
  },
});
logger.security('A notable security event');
{"level":"security","time":1686051401495,"env":"production","msg":"A notable security event"}
Before introducing custom log levels to your project, carefully evaluate the necessity and impact of customization in terms of standardization, consistency, readability, and operational overhead. It's advisable to stick to established conventions and only leverage custom log levels sparingly.
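For comparison, a similar custom level can be sketched with Python's standard `logging` module. The numeric value 25 is an assumption chosen to sit between Python's INFO (20) and WARNING (30), and the `log_security_event` helper and `auth` logger name are hypothetical.

```python
import logging

# Hypothetical custom level placed between INFO (20) and WARNING (30).
SECURITY = 25
logging.addLevelName(SECURITY, "SECURITY")

logger = logging.getLogger("auth")
logger.setLevel(logging.INFO)


def log_security_event(message: str, **context) -> None:
    """Record a notable security event at the custom SECURITY level.

    Keys in `context` become attributes on the log record, so they must not
    clash with built-in attribute names such as `name` or `message`.
    """
    logger.log(SECURITY, message, extra=context)
```

Because the level's numeric value decides whether it passes the configured threshold, choose it deliberately: at 25, these entries are dropped as soon as the logger is set to WARNING.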
How to use log levels for monitoring and analysis
Once your application is configured to include severity levels in its generated logs, you may be curious about how to utilize these labels for effectively analyzing the log records. In this section, we will explore three main approaches for leveraging log levels in post-logging analysis:
1. Filtering and searching
Log levels serve as a valuable filtering mechanism to narrow your analysis to specific severity levels. By filtering logs based on severity, you can reduce the scope of analysis to only the relevant events and prioritize areas that require attention. With Better Stack, you can easily specify filters that display only entries of a specific severity in the live tail interface.
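If you are working with raw JSON log files rather than a log management service, the same kind of filtering can be sketched in a few lines of Python. The `LEVEL_RANK` mapping and the `filter_by_level` function are assumptions made for this example; the numeric ranks mirror the Pino values shown earlier, and the code expects a lowercase or uppercase `level` field in each entry.

```python
import json

# Severity ranking for the levels discussed in this article.
LEVEL_RANK = {"trace": 10, "debug": 20, "info": 30, "warn": 40, "error": 50, "fatal": 60}


def filter_by_level(lines, minimum="warn"):
    """Yield parsed JSON log entries whose level is at least `minimum`."""
    threshold = LEVEL_RANK[minimum]
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if not isinstance(entry, dict):
            continue
        rank = LEVEL_RANK.get(str(entry.get("level", "")).lower())
        if rank is not None and rank >= threshold:
            yield entry
```

For example, `list(filter_by_level(open("app.log"), minimum="error"))` would return only the ERROR and FATAL entries from a hypothetical `app.log` file.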
2. Alerting
Log levels can trigger alerting and notification mechanisms to address critical
events proactively. You can receive immediate notifications when these events
occur by setting up alerts based on specific log levels, such as FATAL or
ERROR. For example, you could send an alert to configured email addresses when
more than five ERROR entries are logged within a 30-second period, indicating
that there could be a persistent problem that needs attention.
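The rule of more than five ERROR entries within 30 seconds can be sketched as a sliding window. This is only an illustration: the `ErrorRateAlert` class is hypothetical, it assumes entries arrive in timestamp order, and delivering the notification itself (email or otherwise) is left out.

```python
from collections import deque


class ErrorRateAlert:
    """Signal an alert when more than `max_errors` ERROR entries fall within `window_seconds`."""

    def __init__(self, max_errors=5, window_seconds=30.0):
        self.max_errors = max_errors
        self.window_seconds = window_seconds
        self._timestamps = deque()

    def record(self, level: str, timestamp: float) -> bool:
        """Register one log entry and return True when the alert should fire."""
        if level.lower() != "error":
            return False
        self._timestamps.append(timestamp)
        # Drop errors that have fallen out of the time window.
        while timestamp - self._timestamps[0] > self.window_seconds:
            self._timestamps.popleft()
        return len(self._timestamps) > self.max_errors
```

Feeding it one entry per log record, the sixth ERROR inside the window is the first call that returns `True`.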
3. Calculating various metrics
Log levels are also a useful tool for generating various metrics about the application, especially in the absence of specialized tools like Prometheus. For example, a significant increase in ERROR or FATAL logs may indicate underlying issues or bugs that require immediate attention. This information can help you prioritize debugging efforts or plan a dedicated "bug squashing sprint" to address the identified issues.
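As a rough sketch of such a metric, the snippet below counts JSON log entries per level and derives the share of ERROR and FATAL entries. The `count_levels` and `error_ratio` functions are hypothetical names, and the code assumes each entry carries a `level` field like the examples in this article.

```python
import json
from collections import Counter


def count_levels(lines):
    """Count log entries per level from an iterable of JSON log lines."""
    counts = Counter()
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore lines that are not valid JSON
        if isinstance(entry, dict) and "level" in entry:
            counts[str(entry["level"]).lower()] += 1
    return counts


def error_ratio(counts):
    """Return the share of entries logged at ERROR or FATAL (0.0 when empty)."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return (counts["error"] + counts["fatal"]) / total
```

Tracking this ratio per deployment or per hour gives you a simple trend line, so a sudden jump stands out even without a dedicated metrics system.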
Final thoughts
Understanding and using log levels correctly is a fundamental aspect of effective log management. With well-defined log levels, you can easily filter and prioritize your logs based on their severity, allowing you to focus on critical events and create meaningful alerts. I hope that this article has equipped you with the necessary knowledge to grasp the concept of log levels and their appropriate usage.
For the next logging article, see microservices logging, or dive further into logging techniques and recommended practices with the additional articles available in our comprehensive logging guide.
Thanks for reading, and happy logging!
