Benchmarking in Go: A Comprehensive Handbook

Ayooluwa Isaiah
Updated on February 25, 2025

Performance optimization is crucial for building efficient applications, but without proper measurement, optimization becomes mere guesswork. As Donald Knuth famously stated, "premature optimization is the root of all evil." This is where benchmarking comes in.

Go stands out among programming languages by providing built-in benchmarking as part of its standard library. This native support reflects Go's philosophy of making performance testing accessible to all developers, not just performance specialists.

Benchmarking in Go allows you to:

  • Measure code performance with nanosecond precision.
  • Compare implementation alternatives.
  • Detect performance regressions.
  • Understand memory allocation patterns.
  • Make data-driven optimization decisions.

This guide will walk you through everything you need to know about benchmarking in Go, from basic concepts to advanced techniques.

Getting started with Go benchmarks

Go benchmarks are functions that live in *_test.go files, just like unit tests. While tests begin with Test, benchmarks follow a specific naming convention:

 
func BenchmarkXxx(b *testing.B) {
    // benchmark code
}

The benchmark function must:

  1. Start with Benchmark.
  2. Accept a *testing.B parameter.
  3. Be in a file with a _test.go suffix.

The testing.B type provides the benchmarking infrastructure, including timing, iteration control, and reporting facilities.

Let's create a simple benchmark for a string concatenation function:

concat.go
package concat

func JoinStrings(strs []string) string {
    var result string
    for _, s := range strs {
        result += s
    }
    return result
}
concat_test.go
package concat

import "testing"

func BenchmarkJoinStrings(b *testing.B) {
    strs := []string{"Hello", ", ", "world", "!"}

    // The benchmark runner will call this function b.N times
    for i := 0; i < b.N; i++ {
        JoinStrings(strs)
    }
}

To run a benchmark, use the go test command with the -bench flag:

 
go test -bench=.
Output
goos: linux
goarch: amd64
pkg: github.com/betterstack-community/golang-benchmarks
cpu: 11th Gen Intel(R) Core(TM) i7-11850H @ 2.50GHz
BenchmarkJoinStrings-16          9762195               123.0 ns/op
PASS
ok      github.com/betterstack-community/golang-benchmarks      1.330s

This means:

  • The -16 suffix is the value of GOMAXPROCS (typically the number of available CPU cores) when the benchmark ran.
  • The benchmark loop executed 9762195 times.
  • Each operation took approximately 123 nanoseconds.
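
Note that go test runs any unit tests in the package as well by default. A few standard flags help when benchmarking: -run=^$ skips the tests, -benchtime controls how long each benchmark runs, and -count repeats each benchmark several times:

 
go test -bench=JoinStrings -benchtime=3s -count=5 -run=^$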

Understanding b.N

The benchmark framework automatically determines the value of b.N by running your benchmark repeatedly with increasing iteration counts until the measurement is reliable.

The framework starts with a small value (usually 1) and increases it until the benchmark runs for a sufficient duration (one second by default, adjustable with -benchtime). This is why your benchmark function must execute the code under test b.N times:

 
func BenchmarkSomething(b *testing.B) {
    // Optional setup code

    b.ResetTimer() // Reset the timer if setup took significant time

    for i := 0; i < b.N; i++ {
        // Code you want to measure
    }
}

Often, benchmarks require setup and teardown code that shouldn't be included in the timing measurements:

 
func BenchmarkComplexOperation(b *testing.B) {
    // Setup
    data := createLargeDataset()

    // Reset the timer to exclude setup time
    b.ResetTimer()

    for i := 0; i < b.N; i++ {
        processData(data)
    }

    // Optionally pause timer during cleanup
    b.StopTimer()
    cleanupResources()
}

The key timing control methods include:

  • b.ResetTimer(): Resets the timer to zero.
  • b.StartTimer(): Resumes the timer after it was stopped.
  • b.StopTimer(): Temporarily stops the timer.
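
For example, b.StopTimer() and b.StartTimer() can bracket per-iteration setup that shouldn't count toward the measurement. Here is a minimal sketch, where generateInput and process are hypothetical stand-ins for your own code:

 
func BenchmarkPerIterationSetup(b *testing.B) {
    for i := 0; i < b.N; i++ {
        b.StopTimer()
        input := generateInput() // per-iteration setup, excluded from timing
        b.StartTimer()

        process(input) // only this call contributes to the measurement
    }
}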

Note that the Go compiler might optimize away code that doesn't have observable effects, potentially invalidating your benchmark:

 
func BenchmarkMightBeOptimizedAway(b *testing.B) {
    for i := 0; i < b.N; i++ {
        // This computation might be eliminated by the compiler
        // since its result is never used
        math.Sqrt(float64(i))
    }
}

To prevent this, ensure the result is used:

 
func BenchmarkPreventOptimization(b *testing.B) {
    var result float64
    for i := 0; i < b.N; i++ {
        result += math.Sqrt(float64(i))
    }
    // Use the result to prevent optimization
    if result < 0 {
        b.Fatalf("negative result: %f", result)
    }
}
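
Another common idiom is to store the result in a package-level variable, since the compiler must treat that assignment as an observable effect. A sketch of the same benchmark using a sink:

 
var sink float64

func BenchmarkWithSink(b *testing.B) {
    var result float64
    for i := 0; i < b.N; i++ {
        result += math.Sqrt(float64(i))
    }
    // Assigning to a package-level variable keeps the work observable
    sink = result
}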

Introducing b.Loop

Go 1.24 introduces a cleaner, more efficient approach to benchmarking with the testing.B.Loop method, which addresses several nuances and potential pitfalls of the traditional b.N loop. First, consider the traditional pattern:

 
func BenchmarkStringConversion(b *testing.B) {
    // Setup - prepare a large integer to convert to string
    number := 9876543210
    b.ResetTimer()

    // We need a result variable to prevent optimization
    var result string

    for i := 0; i < b.N; i++ {
        // The operation we want to benchmark
        result = strconv.Itoa(number)
    }

    // Prevent compiler from optimizing away the unused result
    if len(result) == 0 {
        b.Fatal("unexpected empty string")
    }
}

Several issues arise with this approach:

  1. The benchmark function runs multiple times, causing setup code to execute repeatedly.
  2. You must remember to call b.ResetTimer() to exclude setup time from measurements.
  3. You need to use a result variable and ensure it's used somehow to prevent the compiler from optimizing away your benchmark code.

The new b.Loop() approach eliminates these concerns:

 
func BenchmarkStringConversion(b *testing.B) {
    // Setup - prepare a large integer to convert to string
    number := 9876543210

    // No need for b.ResetTimer() - everything outside the loop is excluded
    // No need for a result variable to prevent optimization

    for b.Loop() {
        // The operation we want to benchmark
        strconv.Itoa(number)
    }
}

Key advantages of b.Loop():

  1. The benchmark function executes only once per -count, so setup code runs just once.
  2. Code outside the b.Loop() loop doesn't affect benchmark timing, eliminating the need for b.ResetTimer().
  3. The compiler won't optimize away function calls within a b.Loop() body, even if results aren't used.

This results in benchmarks that are easier to write, less error-prone, and potentially more accurate by avoiding repeated setup overhead.

Note that your benchmarks should use either b.Loop() or a b.N-style loop, but not both in the same benchmark function.

Benchmarking different types of code

Go's benchmarking framework is versatile enough to handle various code patterns and structures. Whether you're benchmarking simple functions, methods on structs, concurrent operations, or memory-intensive processes, the framework provides appropriate tools and approaches.

With the introduction of the b.Loop() method in Go 1.24, benchmarking becomes even more straightforward and less error-prone across these different scenarios. Let's explore how to effectively benchmark various types of Go code using this improved approach.

Function benchmarks

We've already seen simple function benchmarks. For functions that take parameters, be sure to create representative inputs:

 
func BenchmarkCalculate(b *testing.B) {
    // Prepare realistic input data
    input := generateRepresentativeData()

    for b.Loop() {
        Calculate(input)
    }
}

Method benchmarks

Method benchmarks are similar to function benchmarks but involve struct instances:

 
func BenchmarkProcessor_Process(b *testing.B) {
    processor := NewProcessor(/* config */)
    data := generateTestData()

    for b.Loop() {
        processor.Process(data)
    }
}

Concurrent code benchmarks

For benchmarking concurrent code, you may need to synchronize goroutines:

 
func BenchmarkConcurrentOperation(b *testing.B) {
    for b.Loop() {
        var wg sync.WaitGroup
        wg.Add(10)

        for j := 0; j < 10; j++ {
            go func() {
                defer wg.Done()
                // Concurrent operation
                processItem()
            }()
        }

        wg.Wait()
    }
}
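
The testing package also provides b.RunParallel for measuring performance under concurrent load; it distributes b.N iterations across GOMAXPROCS goroutines. A minimal sketch, reusing the hypothetical processItem from above:

 
func BenchmarkParallelOperation(b *testing.B) {
    b.RunParallel(func(pb *testing.PB) {
        // Each goroutine loops until the shared iteration count is exhausted
        for pb.Next() {
            processItem()
        }
    })
}

If the workload spends most of its time blocked rather than computing, b.SetParallelism can multiply the number of goroutines beyond GOMAXPROCS.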

Memory allocation benchmarks

Go allows benchmarking memory allocations as well as execution time:

 
func BenchmarkMemoryIntensive(b *testing.B) {
    // Report memory allocations
    b.ReportAllocs()

    for b.Loop() {
        createLargeData()
    }
}

Running with -benchmem flag provides allocation statistics:

 
go test -bench=MemoryIntensive -benchmem

Output includes bytes allocated and allocations per operation:

 
BenchmarkMemoryIntensive-8    100000    15234 ns/op    8192 B/op    16 allocs/op
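
These numbers often point directly at optimization opportunities. For instance, the JoinStrings function from the beginning of this guide allocates a new string on every +=; a version built on strings.Builder, sketched below, would typically reduce both B/op and allocs/op:

 
package concat

import "strings"

// JoinStrings appends each string to a single growing buffer
// instead of allocating a new string on every concatenation.
func JoinStrings(strs []string) string {
    var sb strings.Builder
    for _, s := range strs {
        sb.WriteString(s)
    }
    return sb.String()
}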

As you've seen, the same core principles apply whether you're benchmarking a simple function or complex concurrent operations. The b.Loop() method simplifies all these cases by handling iteration count automatically and excluding setup code from timing measurements.

Now that we've covered the basics of benchmarking different code types, let's explore more advanced techniques that allow for more sophisticated performance analysis and comparative benchmarking.

Advanced benchmarking techniques

While basic benchmarks provide valuable insights, Go's benchmarking framework offers advanced capabilities that enable more sophisticated performance analysis.

These techniques help you benchmark across different parameters, compare multiple implementations, and gain deeper insights into performance characteristics under varying conditions.

The following approaches will help you create comprehensive benchmark suites that can identify subtle performance differences and guide your optimization efforts more effectively.

Subbenchmarks

Subbenchmarks allow running variants of a benchmark with different parameters:

 
func BenchmarkSort(b *testing.B) {
   sizes := []int{100, 1000, 10000, 100000}

   for _, size := range sizes {
       b.Run(fmt.Sprintf("Size-%d", size), func(b *testing.B) {
           data := generateRandomSlice(size)

           for b.Loop() {
               // Create a copy to avoid measuring the sorting of already sorted data
               dataCopy := make([]int, len(data))
               copy(dataCopy, data)
               sort.Ints(dataCopy)
           }
       })
   }
}
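
Subbenchmark names become part of the benchmark's path, so you can run a single variant by matching it with the -bench regular expression:

 
go test -bench='Sort/Size-1000'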

Benchmark tables

Similar to table-driven tests, table-driven benchmarks help test multiple scenarios:

 
func BenchmarkHashFunctions(b *testing.B) {
   benchmarks := []struct {
       name    string
       input   []byte
       hashFn  func([]byte) []byte
   }{
       {"MD5", []byte("test data"), md5Sum},
       {"SHA1", []byte("test data"), sha1Sum},
       {"SHA256", []byte("test data"), sha256Sum},
   }

   for _, bm := range benchmarks {
       b.Run(bm.name, func(b *testing.B) {
           for b.Loop() {
               bm.hashFn(bm.input)
           }
       })
   }
}
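
The md5Sum, sha1Sum, and sha256Sum helpers aren't shown above; hypothetical implementations wrapping the standard library hash packages might look like this:

 
import (
    "crypto/md5"
    "crypto/sha1"
    "crypto/sha256"
)

func md5Sum(data []byte) []byte    { h := md5.Sum(data); return h[:] }
func sha1Sum(data []byte) []byte   { h := sha1.Sum(data); return h[:] }
func sha256Sum(data []byte) []byte { h := sha256.Sum256(data); return h[:] }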

Parameterized input sizes

To understand how an algorithm performs with different input sizes:

 
func BenchmarkSliceOperations(b *testing.B) {
   for _, size := range []int{10, 100, 1000, 10000} {
       slice := make([]int, size)
       for i := range slice {
           slice[i] = i
       }

       b.Run(fmt.Sprintf("Sum-%d", size), func(b *testing.B) {
           for b.Loop() {
               sum := 0
               for _, v := range slice {
                   sum += v
               }
               // Use sum to prevent optimization
               if sum < 0 {
                   b.Fatalf("negative sum")
               }
           }
       })
   }
}
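
Relatedly, when each iteration processes a known number of bytes, b.SetBytes makes the framework report throughput in MB/s alongside ns/op. A sketch, with processBuffer standing in for your own function:

 
func BenchmarkProcessBuffer(b *testing.B) {
    buf := make([]byte, 64*1024)

    // Report MB/s based on 64 KiB processed per iteration
    b.SetBytes(int64(len(buf)))

    for b.Loop() {
        processBuffer(buf)
    }
}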

Custom timing

For precise control over what gets timed:

 
func BenchmarkWithPreciseControl(b *testing.B) {
   // Setup code outside the loop is excluded from timing
   data := prepareData()

   for b.Loop() {
       // Everything inside the loop body is timed, including the
       // validation below, so keep the body focused on the work
       // you actually want to measure
       result := process(data)
       validate(result)
   }
}

Analyzing benchmark results

Standard benchmark output provides a wealth of information, though it appears deceptively simple. Consider this typical benchmark result:

 
BenchmarkJoinStrings-8    5000000    264 ns/op    48 B/op    2 allocs/op

This condensed line tells us several important things about the benchmark execution:

The first section, BenchmarkJoinStrings-8, identifies the benchmark name followed by the GOMAXPROCS value in effect during the run (by default, the number of available CPUs). This hyphenated suffix helps when comparing results across different machines.

The second figure, 5,000,000, represents the number of iterations the benchmark ran. The Go testing framework determines this number automatically by rerunning your benchmark with increasing iteration counts until the total run time reaches the target duration (one second by default).

The third figure, 264 ns/op, is the average time per operation in nanoseconds. This is your primary performance metric, telling you how long, on average, each execution of your benchmarked code took.

When memory statistics are enabled with the -benchmem flag, you'll see two additional metrics: 48 B/op shows average memory allocated per operation (48 bytes in this case), and 2 allocs/op indicates the average number of distinct memory allocations per operation.

Comparing benchmarks with benchstat

Raw benchmark numbers can be difficult to interpret, especially when comparing different implementations or tracking performance changes over time.

The benchstat tool, part of the Go performance measurement toolkit (golang.org/x/perf), applies statistical analysis to benchmark results to provide more meaningful comparisons.

To use benchstat, first install it:

 
go install golang.org/x/perf/cmd/benchstat@latest

Then, capture benchmark results from different versions of your code:

 
go test -bench=. -count=10 > old.txt

When you make changes to your code, capture the new benchmark results in a different file:

 
go test -bench=. -count=10 > new.txt

Then compare both results with:

 
benchstat old.txt new.txt
Output
goos: linux
goarch: amd64
pkg: github.com/betterstack-community/golang-benchmarks
cpu: 11th Gen Intel(R) Core(TM) i7-11850H @ 2.50GHz
               │   old.txt    │            new.txt             │
               │    sec/op    │    sec/op     vs base          │
JoinStrings-16   74.27n ± 15%   73.98n ± 11%  ~ (p=0.684 n=10)

Here, the result shows:

  • Old implementation: 74.27 nanoseconds per operation with 15% variability.
  • New implementation: 73.98 nanoseconds per operation with 11% variability.

For the statistical analysis:

  • The tilde (~) indicates no statistically significant difference between the old and new implementations.
  • The p-value of 0.684 is well above the typical threshold of 0.05, confirming that the difference is not statistically significant.
  • "n=10" indicates that 10 samples were used for this statistical analysis.

In practical terms, this means that despite the small nominal improvement from 74.27ns to 73.98ns (about 0.4% faster), the high variability in the measurements (15% and 11%) and the high p-value (0.684) indicate that this difference is likely just random variation. The two implementations should be considered equivalent in performance.

This is a good example of why proper statistical analysis matters in benchmarking: looking at just the raw numbers might lead you to conclude incorrectly that the new implementation is faster, when in fact there is no meaningful performance difference.

Final thoughts

Benchmarking in Go is more than just a development practice—it's a mindset that encourages performance-conscious programming. Go's testing package provides a robust framework for measuring, analyzing, and optimizing code performance without requiring external tools or complex setups.

Performance optimization without measurement is guesswork, but with Go's benchmarking tools, you can make data-driven decisions. By integrating benchmarking into your development workflow—whether through manual testing during development or automated performance monitoring in CI pipelines—you establish a foundation for maintaining and improving application performance over time.

Remember that the goal of benchmarking isn't just to make code faster—it's to understand the performance implications of your design choices and to ensure that your application meets its performance requirements consistently. A well-crafted benchmark suite serves as both documentation of your performance expectations and a safeguard against unexpected regressions.

Armed with these benchmarking techniques and best practices, you're well-equipped to build Go applications that are not only correct and maintainable but also performant and efficient.

Thanks for reading!

Licensed under CC-BY-NC-SA

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
