Side note: Get an application logs dashboard
Save hours of sifting through your application logs. Centralize with Better Stack and start visualizing your log data in minutes.
See the live demo.
Deciding what to log is one of the most challenging aspects of application development since it's difficult to foresee which pieces of information will prove critical during troubleshooting.
Many developers resort to logging everything, generating a tremendous amount of log data, which can be cumbersome to manage and expensive to store and process.
To maximize the effectiveness of your logging efforts and prevent excessive logging, it's crucial to follow well-established logging best practices.
These guidelines are designed not only to improve the quality of your log data but also to minimize the impact of logging on system performance.
By implementing the following logging strategies, you'll ensure that your logs are both informative and manageable, leading to quicker issue resolution and lower costs!
| Best practice | Impact | Difficulty |
|---|---|---|
| Establish clear logging objectives | ⭐⭐⭐⭐⭐ | ⭐⭐ |
| Use log levels correctly | ⭐⭐⭐⭐⭐ | ⭐ |
| Structure your logs | ⭐⭐⭐⭐⭐ | ⭐⭐ |
| Write meaningful log entries | ⭐⭐⭐ | ⭐⭐⭐⭐ |
| Sample your logs | ⭐⭐⭐⭐ | ⭐⭐ |
| Aggregate and centralize your logs | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ |
| Establish log retention policies | ⭐⭐⭐ | ⭐⭐ |
| Don't log sensitive data | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| Don't ignore the performance cost of logging | ⭐⭐⭐ | ⭐⭐⭐ |
| Don't use logs for monitoring | ⭐⭐⭐ | ⭐ |
To prevent noisy logs that don't add any value, it's crucial to define the objectives of your logging strategy. Ask yourself: what are the overarching business or operational goals? What function is your application designed to perform?
Once you've pinpointed these objectives, you can determine the key performance indicators (KPIs) that will help you track your advancement towards these goals.
With a clear understanding of your aims and KPIs, you'll be in a better position to make informed decisions about which events to log and which are best tracked through other means (such as metrics and traces) instead of trying to do everything through your logs.
It's hard to get this stuff right from the get-go, so you'll want to err on the side of over-logging and establish a regular review process to refine the signal-to-noise ratio (SNR) of your logs.
Take error logging, for instance: the objective is not just to record errors, but to enable their resolution. You should log the error details and the events leading up to the error, providing a narrative that helps diagnose the underlying issues.
The fix is usually straightforward once you know what's broken and why.
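To make this concrete, here's a minimal sketch using Go's log/slog package; the field names and error are hypothetical stand-ins for real request context:

```go
package main

import (
	"errors"
	"log/slog"
	"os"
)

func main() {
	logger := slog.New(slog.NewJSONHandler(os.Stdout, nil))

	// Hypothetical values standing in for real request context.
	orderID, customerID := "ord-1001", "cus-42"
	err := errors.New("card declined by issuer")

	// Record the error together with the context needed to act on it,
	// not just the fact that something failed.
	logger.Error("failed to process payment",
		slog.String("order_id", orderID),
		slog.String("customer_id", customerID),
		slog.String("error", err.Error()),
	)
}
```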
Log levels are the most basic signal for indicating the severity of the event being logged. They let you distinguish routine events from those that require further scrutiny.
Here's a summary of common levels and how they're typically used:
- INFO: Significant and noteworthy business events.
- WARN: Abnormal situations that may indicate future problems.
- ERROR: Unrecoverable errors that affect a specific operation.
- FATAL: Unrecoverable errors that affect the entire program.

Other levels like TRACE and DEBUG aren't really about event severity but about the level of detail the application should produce. In development or testing environments, you might default to DEBUG to capture extensive detail, but production environments typically default to INFO to prevent noisy logs, dropping down to DEBUG occasionally to troubleshoot specific issues.
Modifying log verbosity is typically done through static config files or environment variables. However, a more agile solution is to implement a mechanism for adjusting log levels on the fly.
Some languages and frameworks provide the flexibility to alter log levels for specific components or services within the application rather than globally. This targeted approach allows for more granular control and minimizes unnecessary log output.
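In Go, for instance, the standard library's log/slog package supports this through a slog.LevelVar whose value can be changed at runtime. Here's a minimal sketch; the trigger for the change (an admin endpoint, a signal handler, and so on) is left out:

```go
package main

import (
	"log/slog"
	"os"
)

func main() {
	// A LevelVar holds the handler's minimum level and is safe to update at runtime.
	var level slog.LevelVar // zero value is INFO
	logger := slog.New(slog.NewJSONHandler(os.Stdout, &slog.HandlerOptions{
		Level: &level,
	}))

	logger.Debug("suppressed: the minimum level is still INFO")

	// Lower the level on the fly while troubleshooting, without a restart.
	level.Set(slog.LevelDebug)
	logger.Debug("emitted: the minimum level was lowered at runtime")
}
```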
Learn more: Log Levels Explained and How to Use Them
Historical logging practices were oriented toward creating logs that are readable by humans, often resulting in entries like these:
[2023-11-03 08:45:33,123] ERROR: Database connection failed: Timeout exceeded.
Nov 3 08:45:10 myserver kernel: USB device 3-2: new high-speed USB device number 4 using ehci_hcd
ERROR: relation "custome" does not exist at character 15
These types of logs lack a uniform format that machines can parse efficiently, which can hinder automated analysis and extend the time needed for diagnosing issues.
To streamline this process, consider the following steps:
Firstly, adopt a logging framework that allows you to log in a structured format like JSON if your language does not provide such capabilities in its standard library.
Learn more: How to Choose a Logging Framework
Secondly, configure your application dependencies to output structured data where possible. For example, PostgreSQL produces plaintext logs by default but as of version 15, it can be configured to emit logs in JSON format.
Thirdly, you can use log shippers to parse and transform unstructured logs into structured formats before they are shipped to long-term storage.
As an example, consider Nginx error logs. At the time of writing, they don't support native structuring, but with a tool like Vector you can convert an unstructured log from this:
172.17.0.1 - alice [01/Apr/2021:12:02:31 +0000] "POST /not-found HTTP/1.1" 404 153 "http://localhost/somewhere" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.119 Safari/537.36" "2.75"
To structured JSON like this:
{
"agent": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.119 Safari/537.36",
"client": "172.17.0.1",
"compression": "2.75",
"referer": "http://localhost/somewhere",
"request": "POST /not-found HTTP/1.1",
"size": 153,
"status": 404,
"timestamp": "2021-04-01T12:02:31Z",
"user": "alice"
}
With your logs in a structured format, it becomes significantly easier to set up custom parsing rules for monitoring, alerting, and visualization using log management tools like Better Stack.
For those times in development when you prefer logs that are easier on the eyes, you can use tools designed to colorize and prettify logs, such as humanlog, or check if your framework offers built-in solutions for log beautification.
The utility of logs is directly tied to the quality of the information they contain. Entries filled with irrelevant or unclear information will inevitably be ignored, undermining the entire purpose of logging.
Approach log message creation with consideration for the reader, who might be your future self. Write clear and informative messages that precisely document the event being captured.
Including ample contextual fields within each log entry helps you understand the circumstances in which the event was captured and lets you link related entries to see the bigger picture. Essential details include identifiers such as the user ID, session ID, and source IP, along with the service or component that emitted the event.
Here's an example of a log entry without sufficient context:
{
"timestamp": "2023-11-06T14:52:43.123Z",
"level": "INFO",
"message": "Login attempt failed"
}
And here's one with just enough details to piece together who performed the action, why the failure occurred, and other meaningful contextual data.
{
"timestamp": "2023-11-06T14:52:43.123Z",
"level": "INFO",
"message": "Login attempt failed due to incorrect password",
"userId": "12345",
"sourceIP": "192.168.1.25",
"attemptNumber": 3,
"sessionID": "xyz-session-456",
"service": "user-authentication",
"deviceInfo": "iPhone 12; iOS 16.1",
"location": "New York, NY"
}
Do explore the Open Web Application Security Project’s (OWASP) compilation of recommended event attributes for additional insights into enriching your log entries.
Learn more: Log Formatting Best Practices
For systems that generate voluminous amounts of log data daily, reaching into gigabytes or terabytes, log sampling is an invaluable cost-control strategy. It involves capturing only a representative subset of logs so that the remainder can be safely discarded without compromising your ability to troubleshoot.
This targeted retention significantly lowers the demands on log storage and processing, yielding a far more cost-effective logging process.
A basic log sampling approach is capturing a predetermined proportion of logs at set intervals. For instance, with a sampling rate of 20%, out of 10 occurrences of an identical event within one second, only two would be recorded, and the rest discarded.
package main

import (
	"os"

	"github.com/rs/zerolog"
)

func main() {
	// BasicSampler{N: 5} keeps one in every five events, i.e. a 20% sampling rate.
	log := zerolog.New(os.Stdout).
		With().
		Timestamp().
		Logger().
		Sample(&zerolog.BasicSampler{N: 5})

	for i := 1; i <= 10; i++ {
		log.Info().Msgf("a log message: %d", i)
	}
}
For more nuanced control, advanced sampling methods can be deployed, such as adjusting sampling rates based on the content within the logs, varying rates according to the severity of log levels or selectively bypassing sampling for logs that contain specific fields deemed critical.
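As an illustration of level-based sampling (a sketch assuming the rs/zerolog API used above), you might sample low-severity events aggressively while letting warnings and errors through untouched:

```go
package main

import (
	"os"
	"time"

	"github.com/rs/zerolog"
)

func main() {
	// Sample low-severity events aggressively; keep every WARN and ERROR.
	sampler := zerolog.LevelSampler{
		DebugSampler: &zerolog.BurstSampler{
			Burst:       3,                            // allow the first 3 debug events...
			Period:      time.Second,                  // ...in each one-second window
			NextSampler: &zerolog.BasicSampler{N: 10}, // then keep 1 in 10
		},
		InfoSampler: &zerolog.BasicSampler{N: 5}, // keep 1 in 5 info events
		// WarnSampler and ErrorSampler are left nil, so those levels are never dropped.
	}

	log := zerolog.New(os.Stdout).With().Timestamp().Logger().Sample(sampler)

	for i := 0; i < 20; i++ {
		log.Debug().Int("i", i).Msg("debug event")
		log.Info().Int("i", i).Msg("info event")
		log.Warn().Int("i", i).Msg("warn event")
	}
}
```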
Log sampling can be implemented directly within the application, provided that the logging framework accommodates such a feature. Alternatively, the sampling process can be incorporated into your logging pipeline when the logs are aggregated and centralized.
It's crucial to introduce log sampling into your logging process sooner rather than later, before costs become an issue.
Most modern applications are composed of various services dispersed across numerous servers and cloud environments, with each one contributing to a massive, multifaceted stream of log data.
In such systems, aggregating and centralizing logs is not just a necessity but a strategic approach to gaining holistic insight into your application's performance and health.
By funneling all logs into a centralized log management system, you'll create a singular, searchable source of truth that simplifies monitoring, analysis, and debugging efforts across your entire infrastructure.
A robust log aggregation and management system lets you correlate events across services, accelerates incident response through quicker root cause analysis, and helps you meet regulatory requirements for data handling and retention, all while reducing storage and infrastructure costs by consolidating multiple logging systems.
With the right tools and strategies, proper log management turns a deluge of log data into actionable insights, promoting a more resilient and performant application ecosystem.
Learn more: What is Log Aggregation? Getting Started and Best Practices
When aggregating and centralizing your logs, a crucial cost-controlling measure is configuring a retention policy.
Log management platforms often set their pricing structures based on the volume of log data ingested and its retention period.
Without periodically expiring or archiving your logs, costs can quickly spiral, especially when dealing with hundreds of gigabytes or terabytes of data. To mitigate this, establish a retention policy that aligns with your organizational needs and regulatory requirements.
This policy should specify how long logs must be kept active for immediate analysis and at what point they can be compressed and moved to long-term, cost-effective storage solutions or purged entirely.
You can apply different policies to different categories of logs. The most important thing is to consider the value of the logs over time and ensure that your policy reflects the balance between accessibility, compliance, and cost.
Remember also to set up an appropriate log rotation strategy to keep log file sizes in check on your application hosts.
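Rotation can be handled by the operating system (for example with logrotate) or in-process. Here's a minimal sketch using the third-party lumberjack package together with Slog; the file path and limits are illustrative assumptions:

```go
package main

import (
	"log/slog"

	"gopkg.in/natefinch/lumberjack.v2"
)

func main() {
	// Rotate the log file so it never grows unbounded on the host.
	rotatingFile := &lumberjack.Logger{
		Filename:   "/var/log/myapp/app.log", // illustrative path
		MaxSize:    100,  // megabytes before a rotation is triggered
		MaxBackups: 5,    // rotated files to keep
		MaxAge:     28,   // days to retain rotated files
		Compress:   true, // gzip rotated files
	}

	logger := slog.New(slog.NewJSONHandler(rotatingFile, nil))
	logger.Info("application started")
}
```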
The mishandling of sensitive information in logs can have severe repercussions, as exemplified by the incidents at Twitter and GitHub in 2018.
Twitter inadvertently stored plaintext passwords in internal logs, leading to a massive password reset initiative. GitHub also encountered a less extensive but similar issue where user passwords were exposed in internal logs.
Although there was no indication of exploitation or unauthorized access in these cases, they underscore the critical importance of ensuring sensitive information is never logged.
To prevent the accidental inclusion of sensitive data in your logs, vigilance in reviewing every logging statement is vital.
A practical approach is to hide sensitive information at the application level such that even if an object containing sensitive fields is logged, the confidential information is either omitted or anonymized.
For instance, in Go's Slog package, this is achievable by implementing the LogValuer interface to control which struct fields are included in logs:
package main
import (
"log/slog"
"os"
)
type User struct {
ID string `json:"id"`
FirstName string `json:"first_name"`
LastName string `json:"last_name"`
Email string `json:"email"`
Password string `json:"password"`
}
func (u *User) LogValue() slog.Value {
return slog.StringValue(u.ID)
}
func main() {
handler := slog.NewJSONHandler(os.Stdout, nil)
logger := slog.New(handler)
u := &User{
ID: "user-12234",
FirstName: "Jan",
LastName: "Doe",
Email: "jan@example.com",
Password: "pass-12334",
}
logger.Info("info", "user", u)
}
Implementing the LogValuer interface above prevents all the fields of the User struct from being logged. Instead, only the ID field is logged:
{"time":"2023-11-08T08:10:54.942944689+02:00","level":"INFO","msg":"info","user":"user-12234"}
It's a good practice to always implement such interfaces for any custom objects you create. Even if an object doesn't contain sensitive fields today, they may be introduced in the future, resulting in leaks if it's being logged somewhere.
Redacting sensitive data can also be done outside the application through your logging pipeline to address cases that slip through initial filters.
You can catch a broader variety of patterns and establish a unified redaction strategy for all your applications, even if they're developed in different programming languages.
The main disadvantage here is that you will face a performance penalty since pattern matching can be pretty expensive, especially when done through regular expressions.
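To illustrate the idea in Go terms (a sketch only; in practice this step usually lives in a shipper such as Vector or Logstash rather than in your application), a regex-based redacting writer might look like this:

```go
package main

import (
	"io"
	"log/slog"
	"os"
	"regexp"
)

// redactingWriter masks email addresses in every log line before passing it on.
// The pattern is deliberately simple; real pipelines apply much broader rule sets.
type redactingWriter struct {
	next io.Writer
	re   *regexp.Regexp
}

func (w *redactingWriter) Write(p []byte) (int, error) {
	redacted := w.re.ReplaceAll(p, []byte("[REDACTED]"))
	if _, err := w.next.Write(redacted); err != nil {
		return 0, err
	}
	// Report the original length so callers don't treat this as a short write.
	return len(p), nil
}

func main() {
	emailRe := regexp.MustCompile(`[\w.+-]+@[\w-]+\.[\w.]+`)
	logger := slog.New(slog.NewJSONHandler(&redactingWriter{next: os.Stdout, re: emailRe}, nil))

	// The email value is replaced with "[REDACTED]" in the emitted JSON.
	logger.Info("password reset requested", "email", "jan@example.com")
}
```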
Learn more: Best Logging Practices for Safeguarding Sensitive Data
Logging is essential, but it's important to recognize that it always incurs a performance cost in your application. This cost can be exacerbated by excessive logging, using an inefficient framework, or maintaining a suboptimal pipeline.
To illustrate, let's consider a basic Go application server:
package main
import (
"fmt"
"log"
"net/http"
)
func main() {
http.HandleFunc("/login", func(w http.ResponseWriter, r *http.Request) {
fmt.Fprintf(w, "Login successful")
})
fmt.Println("Starting server at port 8080")
log.Fatal(http.ListenAndServe(":8080", nil))
}
This server, when tested without logging, handles around 192k requests per second:
wrk -t 1 -c 10 -d 10s --latency http://localhost:8080/login
. . .
Requests/sec: 191534.19
Transfer/sec: 23.75MB
However, introducing logging with Logrus, a popular Go logging library, leads to a roughly 20% drop in throughput:
func main() {
l := logrus.New()
l.Out = io.Discard
l.Level = logrus.InfoLevel
l.SetFormatter(&logrus.JSONFormatter{})
http.HandleFunc("/login", func(w http.ResponseWriter, r *http.Request) {
l.WithFields(logrus.Fields{
"user_id": 42,
"event": "login",
"ip_address": "192.168.1.100",
}).Info("User login event recorded")
fmt.Fprintf(w, "Login successful")
})
. . .
}
. . .
Requests/sec: 152572.85
Transfer/sec: 18.92MB
In contrast, adopting the newly introduced Slog package results in a much more modest performance reduction of about 3%:
func main() {
l := slog.New(slog.NewJSONHandler(io.Discard, nil))
http.HandleFunc("/login", func(w http.ResponseWriter, r *http.Request) {
l.Info(
"User login event recorded",
slog.Int("user_id", 42),
slog.String("event", "login"),
slog.String("ip_address", "192.168.1.100"),
)
fmt.Fprintf(w, "Login successful")
})
. . .
}
. . .
Requests/sec: 187070.78
Transfer/sec: 23.19MB
These examples highlight the importance of choosing efficient logging tools so you can meet your logging needs without sacrificing application performance.
While logs are crucial for observing and troubleshooting system behavior, they shouldn't be used for monitoring. Since they only capture predefined events and errors, they aren't suitable for trend analysis or anomaly detection.
Metrics, on the other hand, excel in areas where logs fall short. They provide a continuous and efficient stream of data regarding various application behaviors and help define thresholds that necessitate intervention or further scrutiny.
They can help you answer questions such as how many requests your application handles per second, what fraction of them fail, and how response times are trending.
Metrics allow you to track these parameters over time, presenting a dynamic and evolving picture of system behavior, health, and performance.
Their inherent structure and lightweight nature make them ideal for aggregation and real-time analysis. This quality is crucial for creating dashboards that help identify trends and patterns in system behavior.
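For illustration, here's a minimal sketch of counting an event as a metric rather than deriving the rate from logs, using the third-party Prometheus Go client (an assumption; any metrics library works similarly):

```go
package main

import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// Count failed logins as a metric instead of parsing logs to compute the rate.
var loginFailures = promauto.NewCounter(prometheus.CounterOpts{
	Name: "login_failures_total",
	Help: "Total number of failed login attempts.",
})

func main() {
	http.HandleFunc("/login", func(w http.ResponseWriter, r *http.Request) {
		loginFailures.Inc()
		http.Error(w, "invalid credentials", http.StatusUnauthorized)
	})

	// Expose the counter for a scraper to aggregate, graph, and alert on.
	http.Handle("/metrics", promhttp.Handler())
	http.ListenAndServe(":8080", nil)
}
```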
Starting with these 10 logging practices is a solid step towards better application logs. However, continual monitoring and periodic reviews are essential to ensure that your logs continue to fulfill ever-evolving business needs.
Thanks for reading, and happy logging!