
A Comprehensive Guide to Profiling in Python

Stanley Ulili
Updated on March 10, 2025

Python provides profiling tools that allow you to identify performance bottlenecks and optimize your code.

The standard library offers powerful profiling modules like cProfile and profile, which, combined with third-party visualization tools like snakeviz, provide comprehensive insights into your application's execution flow.

This article will guide you through creating and implementing a profiling strategy for your Python applications.

Prerequisites

Before continuing, make sure you have a recent version of Python (version 3.13 or higher) installed on your local machine. This guide assumes you're already comfortable with basic Python concepts.

Step 1 — Getting started with Python profiling

For the best learning experience, set up a fresh Python project to experiment directly with the concepts introduced in this tutorial.

Begin by creating a new directory and setting up a virtual environment:

 
mkdir python-profiling && cd python-profiling
 
python3 -m venv venv

Activate the virtual environment:

 
source venv/bin/activate

Let's start with a common recursive algorithm you'll use throughout this article to demonstrate different profiling techniques - the Fibonacci sequence.

Create a file named main.py with the following content:

main.py
def fibonacci(n):
    if n <= 1:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)

if __name__ == "__main__":
    result = fibonacci(30)
    print(f"Fibonacci(30) = {result}")

This function calculates the n-th Fibonacci number recursively. While simple to understand, this implementation has exponential time complexity - it recalculates the same Fibonacci numbers repeatedly, making it inefficient for larger values of n.

Let's see just how inefficient it is with some manual timing:

main.py
import time

def fibonacci(n):
    if n <= 1:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)

if __name__ == "__main__":
    start_time = time.time()
    result = fibonacci(30)
    end_time = time.time()
    print(f"Fibonacci(30) = {result}")
    print(f"Time taken: {end_time - start_time:.4f} seconds")

Here, you use the built-in time module to measure manually how long the recursive Fibonacci calculation takes.

Run your script with the following command:

 
python main.py

You'll see output like:

Output
Fibonacci(30) = 832040
Time taken: 0.1044 seconds

While this basic timing approach tells us how long the function takes to run, it doesn't provide insights into what's happening inside it.

For complex applications with many functions, measuring total execution time doesn't help identify bottlenecks. This is where profiling comes in.

Step 2 — Basic profiling with the built-in cProfile module

While manual timing gives us a general idea of overall execution time, it has several limitations:

  • It only shows total execution time, not where time is spent within the function
  • It provides no call count information, so you can't see how many times each function was called
  • It reveals nothing about the relationships between function calls
  • It requires you to add timing code around every function you want to measure

This is where Python's built-in profiling tools become invaluable. The standard library includes a powerful module called cProfile that provides detailed insights without requiring you to modify your code extensively.
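
In fact, for a quick first look you don't need to touch the script at all: cProfile can be invoked from the command line. A minimal example, run against the main.py from Step 1 and sorted by cumulative time:

 
python -m cProfile -s cumulative main.py

This prints the same statistics table to the console without modifying your code, though the programmatic approach below gives you finer control over exactly what gets profiled.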

Let's use cProfile programmatically to analyze our Fibonacci function. Update main.py with the following:

main.py
import cProfile
import pstats
import io

def fibonacci(n):
    if n <= 1:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)

if __name__ == "__main__":
    # Create and start the profiler
    profiler = cProfile.Profile()
    profiler.enable()

    # Run the code we want to profile
    result = fibonacci(30)
    print(f"Fibonacci(30) = {result}")

    # Disable the profiler and print stats
    profiler.disable()

    # Format and display the results
    s = io.StringIO()
    stats = pstats.Stats(profiler, stream=s).sort_stats('cumulative')
    stats.print_stats(20)  # Print top 20 functions
    print(s.getvalue())

In this version, you first import three modules from Python's standard library:

  • cProfile for gathering detailed performance data
  • pstats for organizing and analyzing that data
  • io for conveniently formatting the output

Next, you create a profiling instance (profiler) and activate it using profiler.enable(). With profiling active, you run the fibonacci(30) function to capture detailed execution metrics. Once the function finishes executing, you deactivate the profiler (profiler.disable()).

Finally, you use pstats.Stats to process and sort the profiling data based on cumulative execution time, outputting the top 20 function calls to quickly identify the most time-consuming parts of your code.

Run this script:

 
python main.py

You should observe output similar to the following:

Output
Fibonacci(30) = 832040
         2692537 function calls (30 primitive calls) in 0.412 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
2692537/1    0.412    0.000    0.412    0.412 /Users/username/python-profiling/main_profiled.py:5(fibonacci)
        1    0.000    0.000    0.412    0.412 {built-in method builtins.exec}
        1    0.000    0.000    0.412    0.412 /Users/username/python-profiling/main_profiled.py:1(<module>)

This output reveals something striking - our fibonacci function was called 2,692,537 times to calculate the 30th Fibonacci number!

Let's understand what each column in the output means:

  • ncalls: The number of calls to each function. For entries showing two numbers (e.g., 2692537/1), it represents total calls versus primitive (non-recursive) calls. Specifically, 2692537/1 means the function was invoked over 2.6 million times, but only 1 of those was a primitive (non-recursive) call, the initial call from your script.
  • tottime: Total time spent in the function, excluding time spent in calls to other functions.
  • percall: Average time spent per call (tottime/ncalls).
  • cumtime: Cumulative time spent in the function, including time spent in calls to other functions.
  • filename:lineno(function): The location and name of the function.

With this information, you can immediately identify algorithm inefficiencies, prioritize optimization efforts, and make data-driven decisions about where to focus your performance tuning work.
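
The sort order is also configurable. If you'd rather rank functions by tottime (time spent inside the function body itself), a small variation on the stats lines from the listing above works. This sketch assumes the same profiler and s objects and uses the pstats.SortKey enum available since Python 3.7:

from pstats import SortKey

# Rank by time spent inside each function itself, excluding callees
stats = pstats.Stats(profiler, stream=s).sort_stats(SortKey.TIME)  # equivalent to 'tottime'
stats.print_stats(10)  # Show only the top 10 entries
print(s.getvalue())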

The profile data clearly shows that our naive recursive implementation is extremely inefficient due to the millions of redundant function calls.

However, working with the profiling data this way can still be cumbersome, especially for larger applications. Let's make profiling easier with a reusable solution.

Step 3 — Creating a reusable profiling decorator

While directly using cProfile works well for one-off profiling, you'll often want to profile multiple functions in real projects.

Instead of repeating the same profiling code, let's create a reusable decorator that can be applied to any function.

Create a new file named profiler.py:

profiler.py
import cProfile
import pstats
import io
from functools import wraps

def profile(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        # Create and start profiler
        pr = cProfile.Profile()
        pr.enable()

        # Call the original function
        result = func(*args, **kwargs)

        # Stop profiling
        pr.disable()

        # Format and print results
        s = io.StringIO()
        ps = pstats.Stats(pr, stream=s).sort_stats('cumulative')
        ps.print_stats(20)
        print(s.getvalue())

        return result
    return wrapper

This decorator wraps any function with profiling code, making it easy to profile any part of your application.

The @wraps(func) decorator from functools preserves the original function's metadata, which is essential for debugging and documentation.
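
You can verify this for yourself with a quick check. This is an illustrative sketch with a hypothetical function, not part of the main example:

from profiler import profile

@profile
def compute_total():
    """Sum the first million integers."""
    return sum(range(1_000_000))

# Thanks to @wraps, the decorated function keeps its own metadata
print(compute_total.__name__)  # prints "compute_total", not "wrapper"
print(compute_total.__doc__)   # prints the original docstring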

For our Fibonacci example, you shouldn't apply the decorator directly to the recursive Fibonacci function itself, as this would cause issues with nested profiling.

Instead, let's create a wrapper function:

main.py
from profiler import profile

def fibonacci(n):
    if n <= 1:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)

@profile
def run():
    result = fibonacci(30)
    print(f"Fibonacci(30) = {result}")

if __name__ == "__main__":
    run()

Now you can add the @profile decorator to any non-recursive function you want to profile. Run this script:

 
python main.py
Output
Fibonacci(30) = 832040
         2692540 function calls (4 primitive calls) in 0.464 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.464    0.464 /Users/username/profiling-python/python-profiling/main.py:10(run)
2692537/1    0.463    0.000    0.463    0.463 /Users/username/profiling-python/python-profiling/main.py:4(fibonacci)
        1    0.000    0.000    0.000    0.000 {built-in method builtins.print}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

You'll see the same profiling information as before, but now you have a reusable tool that can be applied to any function with a single line of code. This approach is much more maintainable for larger projects where you must profile different parts of your codebase.

Step 4 — Saving profile data to files

To analyze profiling data more thoroughly or share it with team members, we should enhance our profiler to save data to files. Let's modify our profiler.py:

profiler.py
import cProfile
import pstats
import io
from functools import wraps

def profile(func=None, output_file=None):
    def decorator(f):
        @wraps(f)
        def wrapper(*args, **kwargs):
            # Create and start profiler
            pr = cProfile.Profile()
            pr.enable()

            # Call the original function
            result = f(*args, **kwargs)

            # Stop profiling
            pr.disable()

            # Print formatted results to console
            s = io.StringIO()
            ps = pstats.Stats(pr, stream=s).sort_stats('cumulative')
            ps.print_stats(20)
            print(s.getvalue())

            # Save to file if requested
            if output_file:
                ps.dump_stats(output_file)
                print(f"Profile data saved to {output_file}")

            return result
        return wrapper

    # Handle both @profile and @profile(output_file='stats.prof') syntax
    if func is None:
        return decorator
    return decorator(func)

Here, you've updated the function signature of profile to include an optional output_file parameter, enabling you to specify a file location for saving profiling results.

You've also introduced new file-saving functionality: the added if output_file: block writes profiling data to the specified file using pstats.Stats.dump_stats(). Afterward, a confirmation message informs you that the file was created successfully.

The most significant structural change is the introduction of nested decorators. Previously, your decorator was straightforward—it always accepted a function directly. Now, it must handle two scenarios: being used directly (@profile) or with arguments (@profile(output_file='profile.stats')).
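
Both call styles now work side by side. Here's a small illustrative sketch (the function names are hypothetical, not part of the main example):

from profiler import profile

@profile  # no arguments: prints stats to the console only
def quick_check():
    return sum(range(1_000_000))

@profile(output_file='quick_check.prof')  # with arguments: also saves to a file
def saved_check():
    return sum(range(1_000_000))

if __name__ == "__main__":
    quick_check()
    saved_check()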

This enhancement allows you to save profiling data in a binary format for later analysis. Now, let's update your main.py file to take advantage of this new feature:

main.py
from profiler import profile

def fibonacci(n):
    if n <= 1:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)

@profile(output_file='fibonacci.prof')
def run():
    result = fibonacci(30)
    print(f"Fibonacci(30) = {result}")

if __name__ == "__main__":
    run()

In the decorator line, you've used the enhanced @profile decorator with the new output_file argument.

This modification instructs the profiler to save profiling results directly into a binary file named fibonacci.prof.

Now, run your script to generate and save the profiling data:

 
python main.py

After the script runs, you'll receive a confirmation message indicating the profile data was successfully saved to this file:

Output
Fibonacci(30) = 832040
         2692540 function calls (4 primitive calls) in 0.440 seconds

   Ordered by: cumulative time

   ....

Profile data saved to fibonacci.prof

You can verify the file exists:

 
ls -l fibonacci.prof
Output
-rw-r--r--@ 1 stanley  staff  523 Mar  7 14:23 fibonacci.prof

This .prof file contains all your profiling data in a binary format. The benefit of saving to a file is that you can:

  1. Analyze profiles offline without re-running the code
  2. Compare profiles from different runs
  3. Share profile data with team members
  4. Use external tools to visualize and analyze the data

Saving profile data becomes especially valuable when profiling production systems or long-running processes where you can't quickly examine the immediate console output.
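
For example, to analyze a saved profile later without re-running your program, you can load it back with pstats. A minimal sketch, assuming fibonacci.prof sits in the current directory:

import pstats
from pstats import SortKey

# Load the saved binary profile and print the ten most expensive entries
stats = pstats.Stats('fibonacci.prof')
stats.sort_stats(SortKey.CUMULATIVE).print_stats(10)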

Step 5 — Visualizing profile data with snakeviz

With the profile data saved to a file, you can visualize it using specialized tools. One of the most popular is snakeviz, which creates interactive graphical representations of Python profiling data.

First, install snakeviz:

 
pip install snakeviz

Now, visualize the profile data you saved in the previous step:

 
snakeviz fibonacci.prof
Output

snakeviz web server started on 127.0.0.1:8080; enter Ctrl-C to exit
http://127.0.0.1:8080/snakeviz/<path-to-the-file/fibonacci.prof

This command opens your default web browser with an interactive visualization:

Snakeviz visualization of Fibonacci
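
If you're profiling on a remote machine where a browser can't open automatically, snakeviz also offers flags such as -s (server-only mode) and -p (to choose a port); check snakeviz --help on your installed version, as the available options may differ:

 
snakeviz -s -p 8080 fibonacci.prof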

When SnakeViz loads, you'll see several key components:

  • Control buttons at the top, including "Reset Root" and "Reset Zoom" for navigation

  • Visualization configuration dropdowns for Style, Depth, and Cutoff

  • The main visualization area in the center, showing your profiling data

  • A data table at the bottom with detailed function statistics

The default view shows an "Icicle" visualization, which displays your code's execution as nested rectangles. Each rectangle represents a function, with the width indicating the proportion of execution time.

At the top of the page, you'll find the "Style" dropdown to switch between visualization types.

Switch to the Sunburst view to quickly identify functions taking up significant execution time:

Screenshot of the Sunburst visualization

The Sunburst visualization creates a circular chart where each function call radiates outward from the center, clearly illustrating the hierarchical relationship and the proportion of time spent in each call.

The Sunburst view is particularly valuable for understanding recursive functions. In this visualization:

  • Each ring represents a deeper level of recursion
  • The size of each segment shows the proportion of time spent
  • The center represents the entry point of your program
  • Moving outward shows deeper function calls

The detailed statistics table at the bottom provides the same information you saw in the text-based profile output, but in a sortable, interactive format. Click on any column header to sort by that metric, helping you quickly identify the most expensive functions by different criteria.

Screenshot of the statistics table

Combining intuitive visual representations and detailed statistics, SnakeViz clearly illustrates your application's performance, helping you quickly pinpoint areas for optimization.

Step 6 — Optimizing code based on profiling results

Now that you've identified performance bottlenecks using profiling and visualization, it's time to apply this knowledge to optimize your code.

Based on our profiling data, you can see that the recursive implementation makes millions of redundant function calls.

Let's improve the Fibonacci implementation using an iterative approach:

main.py
from profiler import profile
import time

def fibonacci(n):
    if n <= 1:
        return n
    a, b = 0, 1
    for _ in range(2, n + 1):
        a, b = b, a + b
    return b

@profile(output_file='fibonacci_iterative.prof')
def run():
    n = 30
    start = time.time()
    result = fibonacci(n)
    end = time.time()
    print(f"Fibonacci({n}) = {result}")
    print(f"Time taken: {end - start:.6f} seconds")

if __name__ == "__main__":
    run()

The iterative implementation eliminates recursion entirely. Instead of calling itself repeatedly, it uses a simple loop to calculate each Fibonacci number once, storing only the two most recent values (a and b) at any time.
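
The loop isn't the only way to fix the problem the profile exposed. Since the bottleneck was redundant calls, caching previously computed results also works. Here's an alternative sketch (not used in the rest of this article) that keeps the recursive shape but memoizes it with functools.lru_cache:

from functools import lru_cache

@lru_cache(maxsize=None)  # Cache every previously computed result
def fibonacci(n):
    if n <= 1:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)

print(fibonacci(30))  # 832040, computed with only ~31 distinct calls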

Now run the code:

 
python main.py
Output
Fibonacci(30) = 832040
Time taken: 0.000003 seconds
         7 function calls in 0.000 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 /Users/stanley/python-profiling/main.py:15(run)
        2    0.000    0.000    0.000    0.000 {built-in method builtins.print}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        1    0.000    0.000    0.000    0.000 /Users/stanley/python-profiling/main.py:5(fibonacci)
        2    0.000    0.000    0.000    0.000 {built-in method time.time}

Profile data saved to fibonacci_iterative.prof

The difference in performance is striking. Not only is the computation nearly instantaneous, but look at the profiling data:

  • Only 7 total function calls compared to over 2.6 million in the recursive version

  • Negligible execution time (microseconds instead of seconds)

  • Linear O(n) time complexity instead of exponential O(2ⁿ)

Visualize the profile with snakeviz:

 
snakeviz fibonacci_iterative.prof

You'll see a dramatically simpler visualization with far fewer function calls and much shorter execution time:

Screenshot of the visualization profile

The visualization is dramatically different from the recursive implementation. Instead of the deep, complex call tree that we saw previously, the visualization shows a simple, flat structure:

  • The main run() function at the top level

  • A single call to fibonacci() that takes minimal time

  • A few built-in function calls like print and time.time

The visualization confirms what your profiling data showed - the optimized version runs in linear time with negligible overhead. The print operations consume more time than the Fibonacci calculation itself!

The statistics table at the bottom of the visualization provides additional confirmation, showing that the actual fibonacci function now consumes just microseconds of processing time - an improvement of over 100,000x compared to the recursive version.

This exercise illustrates the power of profiling-driven optimization: pinpointing the exact performance issue (redundant recursive calls) allowed you to implement a targeted solution that drastically improved performance.

Step 7 — Conclusion

This article has guided you through a comprehensive Python profiling workflow, from simple timing to sophisticated visualization. We explored cProfile, custom decorators, data persistence, and graphical analysis with SnakeViz.

The practical examples demonstrated how profiling can transform inefficient code into highly optimized implementations. This approach applies equally to complex applications - always profile before optimizing to ensure you're targeting actual bottlenecks.

Thanks for reading and happy coding!

Article by
Stanley Ulili
Stanley Ulili is a technical educator at Better Stack based in Malawi. He specializes in backend development and has freelanced for platforms like DigitalOcean, LogRocket, and AppSignal. Stanley is passionate about making complex topics accessible to developers.