Side note: Get a Python logs dashboard
Save hours of sifting through Python logs. Centralize with Better Stack and start visualizing your log data in minutes.
See the Python demo dashboard live.
Logging is an essential component of software development, playing a vital role in monitoring and debugging applications. It also serves as a valuable method for gaining a thorough understanding of application performance in real-world scenarios. By utilizing logs, developers can easily monitor, debug, and identify patterns that can inform product decisions.
As your application collects more data, adopting proper logging practices becomes crucial for swiftly and efficiently comprehending the overall functionality. This enables you to address issues before they impact end-users.
In this article, we will delve into the best practices for logging in Python. By following these practices, you can ensure that the logs generated are informative, actionable, and scalable. Let's get started!
This article assumes that you have a basic understanding of Python and its logging concepts.
Save hours of sifting through Python logs. Centralize with Better Stack and start visualizing your log data in minutes.
See the Python demo dashboard live.
The root logger is the default logger in the Python logging
module. While it
can be tempting to use the root logger to simplify logging code, there are
several reasons why it should be avoided:
Lack of control: When you use the root logger, you have limited control over how log messages are handled. This can lead to issues where log messages are sent to unexpected destinations, or where log levels are set incorrectly.
Difficulty in managing loggers: When using the root logger, it can be challenging to manage multiple loggers in a complex application. This can lead to issues where log messages are duplicated, or where loggers are not properly configured.
Inability to separate log data: The root logger is shared across all modules and components in an application. This can make it challenging to separate log data by module or component, which can be important when analyzing log data.
Security risks: The root logger can be modified by any module in an application, which can create security risks if an attacker is able to modify log settings.
Instead of using the root logger, it is recommended to create a logger for each module or component in an application. This allows you to control log settings for each logger independently, and makes it easier to separate log data for analysis.
To create a logger for each module in Python, you can use the
logging.getLogger()
method, which returns a logger object that you can use to
log messages for that module. Here is an example of how to create a logger for a
module called my_module
:
logger = logging.getLogger("my_module")
The getLogger()
method takes a name argument, which is used to identify the
logger. Typically, you would use the name of the module as the logger name to
make it easy to identify which module is generating log messages.
You can also write it as follows:
logger = logging.getLogger(__name__)
Once you have created a logger for a module, you can use the standard logging
methods to log messages, such as debug()
, info()
, warning()
, error()
,
and critical()
.
By default, loggers propagate messages up to the root logger, so it is important
to set the propagate attribute to False
for each logger that you create. This
prevents log messages from being duplicated or handled by unexpected loggers.
Here is an example of how to disable propagation for a logger:
logger.propagate = False
By creating a separate logger for each module in your application, you can control log settings independently and organize log data in a way that makes it easy to analyze and troubleshoot issues.
As your application grows and becomes more complex, managing logging configurations can become more challenging. Centralizing the configurations can help ensure that logging is handled consistently and efficiently as your application scales.
It can also allow you to customize the logging settings based on the deployment environment. For example, you may want to log more information in development or testing environments, but only log essential information in production.
Thus, logging configuration should be done at the application level rather than in individual modules. This can ensure that all log messages are handled consistently across your application. This can help improve the readability and maintainability of your codebase, as well as make it easier to troubleshoot issues.
Here are some steps to centralize logging configuration in Python:
Create a separate module for logging configuration: Create a new Python module that will contain all the logging configuration code. This module should import the logging module and contain all the necessary configurations.
Define your logging settings: Define your desired logging settings, such as log format, log level, and log output destination. You can also define additional handlers and formatters as needed.
Import logging settings into your application: Import the logging configuration module into your main application code. This will allow you to use the same logging settings across all modules in your application.
Set the logging configuration: Set the logging configuration by calling
the logging.config.dictConfig()
method and passing in your logging settings
dictionary. This method will configure the logging module with your specified
settings.
Here's an example of a centralized logging configuration for a Python project that uses the python-json-logger library to output structured logs:
import logging.config
from pythonjsonlogger import jsonlogger
LOGGING = {
"version": 1,
"disable_existing_loggers": False,
"formatters": {
"json": {
"format": "%(asctime)s %(levelname)s %(message)s",
"class": "pythonjsonlogger.jsonlogger.JsonFormatter",
}
},
"handlers": {
"stdout": {
"class": "logging.StreamHandler",
"stream": "ext://sys.stdout",
"formatter": "json",
}
},
"loggers": {"": {"handlers": ["stdout"], "level": "DEBUG"}},
}
logging.config.dictConfig(LOGGING)
In the above example, we defined a dictionary called LOGGING
that contains all
the logging configuration settings such as log format, log level, and log output
destination. The configure_logging()
function uses the
logging.config.dictConfig()
method to configure the logging module with our
specified settings.
To use this centralized logging configuration in your Python application, you
just need to import the configure_logging()
function and call it at the start
of your application:
import logging_config
import logging
logger = logging.getLogger(__name__)
logger.info("An info")
logger.warning("A warning")
{"asctime": "2023-04-27 14:50:35,894", "levelname": "INFO", "message": "An info"}
{"asctime": "2023-04-27 14:50:35,895", "levelname": "WARNING", "message": "A warning"}
Log levels are used to indicate the severity of a log message in a software application. They are a way to categorize log messages based on their importance or significance. In Python, each log level is associated with a numerical value or a constant name that represents a certain level of severity.
Python’s logging module supports five different logging levels, from highest to lowest severity:
For example, if an error occurs while connecting to the database, we can catch the exception and log the error using the critical log level so that we can investigate the issue and resolve it before the application crashes or causes data loss:
def connect_to_database():
try:
# connect here
except Exception as e:
logger.critical("Failed to connect to database: %s", e, exc_info=True)
exit(1)
return conn
def process_request(request):
try:
# Process the request
except Exception as e:
logger.error('Error processing request: %s', e, exc_info=True)
# Return an error message to the user
Here's an example:
def low_memory_check():
available_memory = get_available_memory()
if available_memory < 1024:
logger.warning('Low memory detected')
# Send an alert to the you
For example, you might use INFO
logging to track how often certain features
are used or a notable event in your application lifecycle. Here's an example:
def some_function(record_id):
# Do some processing
logger.info('New record created in the database. ID: %s', record_id)
def get_user_info(user_id):
logger.debug('Retrieving user info for user with ID: %s', user_id)
# Fetch user info from the database
user_info = database.get_user_info(user_id)
logger.debug('Retrieved user info: %s', user_info)
return user_info
Setting the appropriate log level also allows you to control which messages are
displayed in the log output. For example, if the log level is set to INFO
,
only log messages with an INFO
level or higher (i.e. WARNING
, ERROR
, and
CRITICAL
) will be recorded. This can be useful in production environments
where you only want to see messages that indicate an issue that needs immediate
attention.
Here’s an example of how you can configure the logging level to ERROR
in
Python:
LOGGING = {
# the rest of your config
"loggers": {"": {"handlers": ["stdout"], "level": "ERROR"}},
}
logging.config.dictConfig(LOGGING)
Logging can impact application performance, so its important to be mindful of how often and how much you are logging. Be sure to log enough information to enable you diagnose issues, but not so much that it impacts the performance of the application.
Writing meaningful log messages is important because they help you understand what is happening within an application at any given time. When an issue or error occurs, logs can be used to diagnose the problem and fix it quickly. Meaningful log messages can make this process much easier and faster. We have also compiled a list of the most common Python errors with their respective fixes.
Moreover, in production environments, logs are often monitored to ensure the application is running smoothly. Meaningful log messages can help operators quickly identify any issues that may arise.
Also as applications grow and evolve, it can be difficult to remember how different parts of the system work. Meaningful log messages can serve as a form of documentation, reminding you of what happened in the past and how the application has evolved over time.
To ensure that your log messages are meaningful, easy to understand, provide context and help you diagnose and fix issues quickly, here are few tips that you can follow:
Be clear and concise: Log messages should be easy to understand and to the point. Avoid using technical jargon or complex sentences.
Provide context: Include information about the context of the log message. This could include the function or module where the log message was generated, the user who initiated the action, input parameters, or any relevant data that will help understand the message.
Be consistent: Use a consistent format for log messages across your application. This makes it easier to read and understand them, especially when you have many log messages such as in a production environment.
Use placeholders: Use placeholders for values that will be dynamically inserted into the log message. This makes it easier to read and understand the message, and also prevents sensitive data from being logged.
Here's an example of using placeholders in a log statement to generate a meaningful message:
name = 'Alice'
age = 30
salary = 50000
logger.info("Employee name: %s, age: %s, salary: %s", name, age, salary)
The above log statement includes three placeholders %s
, which are replaced
with the values of the name, age, and salary variables when the log message is
generated.
The resulting log message would look something like this (assuming you're using the logging configuration shown here:
{"asctime": "2023-04-27 20:31:54,737", "levelname": "INFO", "message": "Employee name: Alice, age: 30, salary: 50000"}
By using placeholders in this way, we can create meaningful log messages that include dynamic values. This makes it easier to understand and troubleshoot issues in our application, as we can see the specific values that were involved in a given operation or event.
Here's an example where the log message is designed to provide actionable insights to whoever is reading the logs, by suggesting to consider collecting more data before proceeding with the data processing:
data = [1, 2, 3, 4, 5]
if len(data) < 6:
logging.warning("Data is too small. Consider collecting more data before proceeding.")
else:
# process data
pass
To summarize our understanding, let's look at some examples of meaningful and not-so-meaningful log messages:
Good log messages:
In the above examples, the log messages are clear, concise, and provide useful information for debugging and troubleshooting. They indicate what action was taken, whether it was successful or not, and any relevant details that may be helpful in identifying the root cause of a problem.
Bad log messages:
In the above examples, the log messages are too vague and do not provide any useful information for debugging or troubleshooting. They do not indicate what action was taken, what caused the error, or what steps should be taken to resolve the issue.
%
vs f-strings for string formatting in logsIn Python, there are two main ways to format strings: using % formatting and f-strings. However, there are some differences between the two methods that may make one more suitable than the other in certain cases.
Here are some considerations for when to use each approach for string formatting in logs:
Use %
formatting when:
%
formatting for formatting complex data types.%
formatting can be more performant than f-strings, especially for large
numbers of log messages.Use f-strings when:
Ultimately, the choice between %
formatting and f-strings for string
formatting in logs depends on your personal preference, the requirements of your
application, and the Python version you are using. Both approaches can be
effective for formatting log messages, and you should choose the one that works
best for your specific use case.
However, it is generally recommended to be consistent in your use of string formatting across your codebase to improve readability and maintainability.
Traditional text-based logging formats, while useful, can be difficult to read and analyze, particularly as applications become more complex. Structured logging provides a solution to this problem by using a standardized format, such as JSON, to log data in a structured way.
By using a structured logging format, developers can easily parse and analyze log data, making it easier to identify and troubleshoot issues.
Here are some of the benefits of using structured JSON logging:
Improved readability and searchability: Structured JSON logs are easier to read and search compared to traditional text logs. By using a standardized JSON format, you can easily parse and analyze log data using tools like Elasticsearch or Kibana.
Consistency across components: When different components of an application use different logging formats, it can be challenging to analyze logs across the entire application stack. By using a standardized JSON format, you can ensure that all components use the same format, making it easier to analyze logs across the entire application.
Better context and metadata: With structured JSON logging, you can add additional metadata to your logs, such as request IDs, user IDs, or timestamps. This metadata can provide valuable context when troubleshooting issues or analyzing log data.
Support for structured data: JSON is a flexible format that supports structured data, making it easy to log complex data structures like dictionaries or lists. By using a structured format, you can avoid the need for parsing text logs, which can be error-prone and time-consuming.
Scalability: As your application grows, the volume of logs generated by your application can increase significantly. By using a structured JSON format, you can easily scale your logging infrastructure to handle large volumes of log data.
There are several Python logging libraries that support structured JSON logging, such as python-json-logger , loguru, and structlog.
Once you have installed any of these libraries and configured the logger, you
can use it to write logs in a structured JSON format. To do this, you can call
the logger.info()
method (or any other logging method) and pass in a
dictionary of key-value pairs representing the log message.
Here's an example using loguru
:
import sys
from loguru import logger
logger.remove(0)
logger.add(
sys.stdout,
format="{time:MMMM D, YYYY > HH:mm:ss!UTC} | {level} | {message}",
serialize=True,
)
logger.info("Incoming API request: GET /api/users/123")
This will write a JSON log message to the standard output with the following structure:
{"text": "April 27, 2023 > 19:50:33 | INFO | Incoming API request: GET /api/users/123\n", "record": {"elapsed": {"repr": "0:00:00.017884", "seconds": 0.017884}, "exception": null, "extra": {}, "file": {"name": "main.py", "path": "/home/betterstack/dev/demo/python-logging/main.py"}, "function": "<module>", "level": {"icon": "ℹ️", "name": "INFO", "no": 20}, "line": 21, "message": "Incoming API request: GET /api/users/123", "module": "main", "name": "__main__", "process": {"id": 407115, "name": "MainProcess"}, "thread": {"id": 140129253443392, "name": "MainThread"}, "time": {"repr": "2023-04-27 20:50:33.843118+01:00", "timestamp": 1682625033.843118}}}
You can also use the logging library's built-in features to add additional context to your logs, such as timestamps, log levels, and exception stack traces.
Logging without a timestamp is only marginally better than not knowing about the event at all. By including a timestamp in your logs, you can make the lives of those who use logs for troubleshooting much easier. Moreover, timestamps allow you to analyze log entries to gain insights and analytics about user and program behavior over time.
When different systems or components need to exchange timestamp data, it is essential that they all use the same format to ensure interoperability. Choosing an inappropriate format for timestamps can create chaos by conflicting with other services that are currently being used or may be used in the future for log management or application monitoring.
To avoid this, it is best to adopt a standard format for timestamps. One such standard format is ISO-8601, which is an internationally recognized standard for exchanging date- and time-related data.
By adhering to this standard, you can ensure that your timestamps are compatible with a wide range of tools and services, reducing the likelihood of conflicts or issues in the future.
Here’s how a timestamp expressed in ISO-8601 format:
2022-06-15T04:32:19.955Z
This is a basic example of how to configure the formatting to allow ISO-8601 timestamps:
LOGGING = {
"formatters": {
"json": {
"format": "%(asctime)s %(levelname)s %(message)s",
"datefmt": "%Y-%m-%dT%H:%M:%SZ",
"class": "pythonjsonlogger.jsonlogger.JsonFormatter",
}
},
}
logging.config.dictConfig(LOGGING)
Sensitive data should be kept out of logs because logs are often used for troubleshooting and debugging purposes, and can contain sensitive information such as user passwords, credit card numbers, or other private data. If logs are not properly secured or managed, they can become a target for hackers and other malicious actors who may attempt to gain access to this sensitive data.
Additionally, logs are often stored in plain text files or other unencrypted formats, making them vulnerable to unauthorized access or disclosure. By keeping sensitive data out of logs, you can help protect their users' privacy and reduce the risk of data breaches or other security incidents.
Here are some general tips to keep sensitive data out of logs and reduce the risk of sensitive data being exposed:
Avoid logging sensitive data: The simplest way to keep sensitive data out of logs is not to log it in the first place. Ensure that your logging system is configured to exclude sensitive data.
Mask or redact sensitive data: If sensitive data needs to be logged, you can mask or redact it. For example, you can replace credit card numbers or passwords with a series of asterisks or replace them with a hash value.
For example, if the credit card number is "1234-5678-9012-3456", it can be masked or redacted. Here's how to use filters to implement log redaction in Python:
import logging
import logging.config
from pythonjsonlogger import jsonlogger
import re
class SensitiveDataFilter(logging.Filter):
pattern = re.compile(r"\d{4}-\d{4}-\d{4}-\d{4}")
def filter(self, record):
# Modify the log record to mask sensitive data
record.msg = self.mask_sensitive_data(record.msg)
return True
def mask_sensitive_data(self, message):
# Implement your logic to mask or modify sensitive data
# For example, redact credit card numbers like this
message = self.pattern.sub("[REDACTED]", message)
return message
LOGGING = {
"version": 1,
"disable_existing_loggers": False,
"filters": {
"sensitive_data_filter": {
"()": SensitiveDataFilter,
}
},
"formatters": {
"json": {
"format": "%(asctime)s %(levelname)s %(message)s",
"datefmt": "%Y-%m-%dT%H:%M:%SZ",
"class": "pythonjsonlogger.jsonlogger.JsonFormatter",
}
},
"handlers": {
"stdout": {
"class": "logging.StreamHandler",
"stream": "ext://sys.stdout",
"formatter": "json",
"filters": ["sensitive_data_filter"],
}
},
"loggers": {"": {"handlers": ["stdout"], "level": "INFO"}},
}
logging.config.dictConfig(LOGGING)
import logging_config
import logging
logger = logging.getLogger(__name__)
credit_card_number = "1234-5678-9012-3456"
logger.info(f"User made a payment with credit card number: {credit_card_number}")
{"asctime": "2023-04-27T21:36:39Z", "levelname": "INFO", "message": "User made a payment with credit card number: [REDACTED]"}
Use environment variables: You can store sensitive data, such as API keys or database credentials, in environment variables instead of hardcoding them into your code. This way, the values will not be logged.
Limit the scope of logs: You can limit the scope of logs by only logging what is necessary. This means you can log only errors or critical events and not all events. Additionally, you can also limit the amount of information logged by your application.
Encrypt log data: You can encrypt log data to keep sensitive information secure. This will ensure that only authorized personnel can access and read the logs.
Use a secure log management solution: Ensure that the logging system you are using is secure and has appropriate controls in place to prevent unauthorized access to sensitive data.
Rotating log files means creating new log files periodically and archiving or deleting the old ones. The purpose of rotating log files is to manage log file size, improve performance, preserve log data, simplify debugging, and enhance security. If log files are not rotated, they can consume valuable disk space and cause performance issues.
There are several strategies for rotating log files, including:
Time-based rotation: Create a new log file at fixed time intervals (e.g., daily or weekly) and archive or delete old log files.
Size-based rotation: Create a new log file when the current log file reaches a certain size limit (e.g., 10 MB) and archive or delete old log files.
Hybrid rotation: Combine time-based and size-based rotation strategies to create new log files at fixed time intervals and archive or delete old log files based on size limits.
In Python, we can rotate log files using the built-in logging module. The
logging module provides a RotatingFileHandler
class that allows you to create
log files that are rotated based on a specified size or time interval.
Here's an example of how to use the RotatingFileHandler
class to rotate log
files based on size:
# Create a logger
logger = logging.getLogger('my_logger')
logger.setLevel(logging.DEBUG)
# Create a rotating file handler
handler = logging.handlers.RotatingFileHandler(
'my_log.log', maxBytes=1000000, backupCount=5)
# Set the formatter for the handler
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
handler.setFormatter(formatter)
# Add the handler to the logger
logger.addHandler(handler)
# Test the logger
logger.debug('Debug message')
In this example, we create a logger named my_logger and set the log level to
DEBUG. We then create a RotatingFileHandler
with a maximum file size of 1 MB
and a backup count of 5.
This means that once the log file reaches 1 MB, a new log file will be created and the old log file will be archived. The backup count specifies the number of archived log files to keep.
We set the formatter for the handler to include the timestamp, logger name, log level, and log message. Finally, we add the handler to the logger and log a debug message.
This is just a simple example of how to rotate log files using the logging module in Python. We generally recommend leaving log rotation concerns to an external tool like logrotate which can help you enforce consistency in log rotation policies across multiple applications or services running on the same machine.
As soon as your application is deployed to production, it will immediately start generating logs, which are usually stored on the host server. While logging into one or two servers may be practical enough for log viewing and analysis, this practice becomes tedious and ineffective as your application scales across dozens of servers.
By centralizing logs, you can simplify log management by consolidating logs from multiple sources into a single location. This makes it easier to search, analyze, and monitor logs and reduces the need to manage logs across multiple systems.
Centralizing logs at one place has several advantages such as the following:
Improve troubleshooting: Centralizing logs makes it easier to troubleshoot issues by providing a single source of truth for log data. This allows you to correlate events across different systems and identify the root cause of issues more quickly.
Enhance security: Centralizing logs can enhance security by providing a centralized location for monitoring and detecting security threats. By analyzing logs from multiple systems, you can identify patterns and anomalies that may indicate a security breach.
Increase scalability: Centralizing logs can increase scalability by allowing you to collect and store large volumes of log data in a centralized location. This makes it easier to scale your log infrastructure as your system grows.
Facilitate compliance: Centralizing logs can facilitate compliance with regulatory requirements by providing a centralized location for storing and auditing log data. This makes it easier to demonstrate compliance with regulations and standards.
When choosing a cloud logging solution, there are several factors to consider:
Features: Look for a solution that provides the features you need, such as real-time log streaming, search and analysis capabilities, and alerts and notifications.
Scalability: Make sure the solution can handle your current log volume and can scale as your log volume grows.
Integration: Ensure the solution can integrate with your existing systems and tools, such as your logging frameworks, monitoring and alerting tools, and cloud platforms.
Security: Look for a solution that provides robust security features, such as encryption, access controls, and data retention policies.
Cost: Consider the cost of the solution, including any upfront fees, ongoing subscription costs, and any additional costs for features like storage or data processing.
Support: Check the level of support provided by the vendor, including documentation, technical support, and community forums.
Ease of use: Look for a solution that is easy to set up, configure, and use, with an intuitive interface and clear documentation.
By considering these factors, you can choose a cloud logging solution that meets your needs and helps you better manage and analyze your log data.
Logtail provides a comprehensive set of features and more, including effortless log data collection, filtering, and correlation from multiple sources, all of which can be analyzed in a single location.
In addition, Logtail offers built-in collaboration tools that allow users to share insights with team members or draw attention to specific issues. To begin using Logtail, simply sign up for a free account and review the documentation to explore integration options with your application.
In conclusion, implementing the best logging practices in Python can greatly improve the maintainability, performance, and security of your application. By following these practices, you can ensure that your logs are well-structured, properly formatted, and easy to search and analyze. You can also reduce the risk of sensitive data being exposed in your logs and minimize the impact of log file size on your system's performance.
To achieve the best logging practices, it is important to use appropriate log levels and message formats, and implement proper error handling and exception logging. Additionally, you should consider implementing log rotation and retention policies to ensure that your logs are properly managed and archived.
By prioritizing logging as a key aspect of your development process, you can gain valuable insights into your application's behavior, diagnose problems quickly, and ultimately improve the overall quality and reliability of your software.
Thanks for reading, and happy logging!
Are you a developer and love writing and sharing your knowledge with the world? Join our guest writing program and get paid for writing amazing technical guides. We'll get them to the right readers that will appreciate them.
Write for usWrite a script, app or project on top of Better Stack and share it with the world. Make a public repository and share it with us at our email.
community@betterstack.comor submit a pull request and help us build better products for everyone.
See the full list of amazing projects on github