Python Monitoring with Prometheus (Beginner's Guide)
This article provides a detailed guide on integrating Prometheus metrics into your Python application.
It explores key concepts, including instrumenting your application with various metric types, monitoring HTTP request activity, and exposing metrics for Prometheus to scrape.
The complete source code for this tutorial is available in this GitHub repository.
Let's get started!
Prerequisites
- Prior experience with Python and Flask, along with a recent version of Python installed.
- Familiarity with Docker and Docker Compose.
- Basic understanding of how Prometheus works.
Step 1 — Setting up the demo project
To demonstrate Prometheus instrumentation in Python applications, let's set up a simple "Hello World" Flask application along with the Prometheus server.
First, clone the repository to your local machine and navigate into the project directory:
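```bash
# Replace <repository-url> and <repository-directory> with the
# repository linked in the introduction above.
git clone <repository-url>
cd <repository-directory>
```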
Here's the Flask application you'll be instrumenting:
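```python
# main.py (minimal sketch; the file in the repository may differ slightly)
from flask import Flask

app = Flask(__name__)


@app.route("/")
def hello():
    return "Hello world!"


@app.route("/metrics")
def metrics():
    # Placeholder for now; this will expose Prometheus metrics in later steps.
    return ""


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8000)
```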
This app exposes two endpoints: / returns a simple "Hello world!" message, and /metrics will eventually expose the instrumented metrics.
This project also includes a compose.yaml file, which defines two services:
The app service is the Flask application running on port 8000, while
prometheus configures a Prometheus server to scrape the Flask app via the
prometheus.yml file:
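```yaml
# compose.yaml (minimal sketch; the repository's file may differ)
services:
  app:
    build: .
    env_file: .env
    ports:
      - 8000:8000

  prometheus:
    image: prom/prometheus:latest
    ports:
      - 9090:9090
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
```

```yaml
# prometheus.yml (minimal sketch)
scrape_configs:
  - job_name: app
    metrics_path: /metrics
    static_configs:
      - targets:
          - app:8000
```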
Before starting the services, rename .env.example to .env. This file
contains the application's PORT setting:
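```text
PORT=8000
```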
Rename it with:
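```bash
mv .env.example .env
```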
Then launch both services in detached mode with:
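```bash
docker compose up -d
```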
You should see output confirming that both containers have been created and started.
To confirm that the Flask application is running, send a request to the root endpoint:
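```bash
curl http://localhost:8000/
```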
This should return:
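```text
Hello world!
```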
To verify that Prometheus is able to access the exposed /metrics endpoint,
visit http://localhost:9090/targets in your browser:
With everything up and running, you're ready to integrate Prometheus in your Python application in the next step.
Step 2 — Installing the Prometheus Client
Before instrumenting your Flask application with Prometheus, you need to install the official Prometheus client for Python applications.
Open your requirements.txt file and include the latest version of the
prometheus_client package:
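```text
prometheus-client
```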
Then rebuild the app service by running the command below to ensure that the
prometheus_client dependency is installed:
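```bash
docker compose up -d --build app
```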
Once the app service restarts, you can integrate Prometheus into your application by modifying main.py as follows:
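```python
# main.py (sketch; builds on the app from Step 1)
from flask import Flask, Response
from prometheus_client import CONTENT_TYPE_LATEST, generate_latest

app = Flask(__name__)


@app.route("/")
def hello():
    return "Hello world!"


@app.route("/metrics")
def metrics():
    # Collect every metric registered in the default registry and return
    # it in the text exposition format that Prometheus scrapes.
    return Response(generate_latest(), mimetype=CONTENT_TYPE_LATEST)
```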
This modification introduces the prometheus_client package and its
generate_latest() function, which collects and returns metrics in a format
that Prometheus can scrape.
Once you've saved the file, visit http://localhost:8000/metrics in your
browser or use curl to see the default Prometheus metrics:
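```bash
curl http://localhost:8000/metrics
```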
By default, the Prometheus Python client uses a global registry that automatically includes standard Python runtime and process-level metrics, abridged here (your values will differ):
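```text
# HELP python_gc_objects_collected_total Objects collected during gc
# TYPE python_gc_objects_collected_total counter
python_gc_objects_collected_total{generation="0"} 543.0
...
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 3.2768e+07
```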
If you want to disable these and expose only specific metrics, you need to create a custom registry:
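```python
from flask import Flask, Response
from prometheus_client import CONTENT_TYPE_LATEST, CollectorRegistry, generate_latest

app = Flask(__name__)

# A fresh registry that contains only the collectors you explicitly add to it.
registry = CollectorRegistry()


@app.route("/metrics")
def metrics():
    # Generate output from the custom registry instead of the global one.
    return Response(generate_latest(registry), mimetype=CONTENT_TYPE_LATEST)
```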
Since no custom metrics are registered yet, the /metrics endpoint will now return an empty response. If you'd like to retain the default metrics, you can import and register all the default collectors as follows:
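```python
from prometheus_client import CollectorRegistry
from prometheus_client.gc_collector import GCCollector
from prometheus_client.platform_collector import PlatformCollector
from prometheus_client.process_collector import ProcessCollector

registry = CollectorRegistry()

# Attach the default runtime and process collectors to the custom registry.
GCCollector(registry=registry)
PlatformCollector(registry=registry)
ProcessCollector(registry=registry)
```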
With these modifications, the default metrics will be exposed along with any custom metrics you register later on.
In the following sections, you will instrument the application with different metric types, including Counters, Gauges, Histograms, and Summaries.
Step 3 — Instrumenting a Counter metric
Let's start with a fundamental metric that tracks the total number of HTTP requests made to the server. Since this value always increases, it is best represented as a Counter.
Edit your main.py file to include counter instrumentation:
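```python
# main.py (sketch; registered against the default registry for brevity.
# Pass registry=... to Counter() if you kept the custom registry from Step 2.)
from flask import Flask, Response, request
from prometheus_client import CONTENT_TYPE_LATEST, Counter, generate_latest

app = Flask(__name__)

# Exposed as http_requests_total; the client appends _total to counters.
http_requests_counter = Counter(
    "http_requests",
    "Total number of HTTP requests received",
    labelnames=["status", "path", "method"],
)


@app.route("/")
def hello():
    return "Hello world!"


@app.after_request
def count_requests(response):
    # Record one observation per completed request, labeled with the
    # response status, request path, and HTTP method.
    http_requests_counter.labels(
        status=response.status_code, path=request.path, method=request.method
    ).inc()
    return response


@app.route("/metrics")
def metrics():
    return Response(generate_latest(), mimetype=CONTENT_TYPE_LATEST)
```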
This implementation creates a Counter metric named http_requests_total with
labels for status code, path, and HTTP method. It uses Flask's after_request()
hook to automatically count all HTTP requests by incrementing the counter after
each request is processed and capturing the actual response status.
If you refresh http://localhost:8000/metrics several times, you'll see output
like:
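```text
# HELP http_requests_total Total number of HTTP requests received
# TYPE http_requests_total counter
http_requests_total{method="GET",path="/",status="200"} 5.0
http_requests_total{method="GET",path="/metrics",status="200"} 3.0
# HELP http_requests_created Total number of HTTP requests received
# TYPE http_requests_created gauge
http_requests_created{method="GET",path="/",status="200"} 1.7091392e+09
```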
For each counter metric in your application, the Prometheus Python client creates two metrics:
- The actual counter (http_requests_total)
- A creation timestamp gauge (http_requests_created)
If you want to disable this behavior, you can use the
disable_created_metrics() function:
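```python
from prometheus_client import disable_created_metrics

# Call once at startup, before any requests are served.
disable_created_metrics()
```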
With this setup, you'll no longer see the _created metrics for any of your counters.
You can view your metrics in the Prometheus web interface by heading to
http://localhost:9090. Then type http_requests_total into the query box and
click Execute to see the raw values:
You can switch to the Graph tab to visualize the counter increasing over time:
In the next section, we'll explore how to instrument a Gauge metric!
Step 4 — Instrumenting a Gauge metric
A Gauge represents a value that can fluctuate up or down, making it ideal for tracking real-time values such as active connections, queue sizes, or memory usage.
In this section, we'll use a Prometheus Gauge to monitor the number of active requests being processed by the service.
Modify your main.py file to include the following:
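```python
# Additions to main.py (sketch)
from flask import Flask
from prometheus_client import Gauge

app = Flask(__name__)

active_requests_gauge = Gauge(
    "active_requests",
    "Number of HTTP requests currently being processed",
)


@app.before_request
def track_request_start():
    # A request has entered the application.
    active_requests_gauge.inc()


@app.after_request
def track_request_end(response):
    # The request has finished processing.
    active_requests_gauge.dec()
    return response
```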
The active_requests_gauge metric is created using Gauge() to track the
number of active HTTP requests at any given moment.
In the before_request() hook, the gauge is incremented when a new request
starts processing. In after_request(), the gauge is decremented when the
request is completed.
To observe the metric, you can add some random delay to the / route as
follows:
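```python
import random
import time


@app.route("/")
def hello():
    # Sleep for an arbitrary 0.5 to 2 seconds so that concurrent requests
    # stay "in flight" long enough for the gauge to register them.
    time.sleep(random.uniform(0.5, 2.0))
    return "Hello world!"
```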
Then use a load testing tool like wrk to generate
requests to the / route:
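```bash
# 10 threads and 200 open connections for 30 seconds; tune the numbers
# to whatever your machine can handle.
wrk -t 10 -c 200 -d 30s http://localhost:8000/
```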
Visiting the /metrics endpoint on your browser will show something like:
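```text
# HELP active_requests Number of HTTP requests currently being processed
# TYPE active_requests gauge
active_requests 101.0
```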
This indicates that there are currently 101 active requests being processed by your service.
You can also view the changing gauge values over time in Prometheus's Graph view
at http://localhost:9090:
Tracking absolute values
If you need a Gauge that tracks absolute but fluctuating values, you can set the value directly instead of incrementing or decrementing it.
For example, to track the current memory usage of the Flask application, you can define a gauge and update it with the process's memory usage like this:
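```python
# Sketch: uses the psutil package to read resident memory. psutil is an
# extra dependency assumed here; add it to requirements.txt if you use it.
import threading
import time

import psutil
from prometheus_client import Gauge

memory_usage_gauge = Gauge(
    "memory_usage_bytes",
    "Resident memory usage of the process in bytes",
)


def collect_memory_metrics():
    process = psutil.Process()
    while True:
        # set() overwrites the gauge with an absolute value instead of
        # adjusting it relative to its previous reading.
        memory_usage_gauge.set(process.memory_info().rss)
        time.sleep(1)


# Run the collector in a daemon thread so it never blocks shutdown.
threading.Thread(target=collect_memory_metrics, daemon=True).start()
```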
The collect_memory_metrics() function runs in a background thread to continuously update the memory_usage_gauge metric every second. Here, set() is used instead of inc() or dec() because the gauge records an absolute value.
Here's the output you'll see in your /metrics endpoint:
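```text
# HELP memory_usage_bytes Resident memory usage of the process in bytes
# TYPE memory_usage_bytes gauge
memory_usage_bytes 4.4040192e+07
```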
Next up, you'll instrument a Histogram metric to track HTTP request latency.
Step 5 — Instrumenting a Histogram metric
Histograms are useful for tracking the distribution of measurements, such as
HTTP request durations. In Python, creating a Histogram metric is
straightforward with the Histogram class from prometheus_client.
Modify your main.py file to include the following:
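```python
# Additions to main.py (sketch; the metric name is an example)
import time

from flask import Flask, g, request
from prometheus_client import Histogram

app = Flask(__name__)

latency_histogram = Histogram(
    "http_request_duration_seconds",
    "Duration of HTTP requests in seconds",
    labelnames=["path", "method"],
)


@app.before_request
def start_timer():
    # Remember when this request started.
    g.start_time = time.perf_counter()


@app.after_request
def record_latency(response):
    # Observe how long the request took, labeled by path and method.
    duration = time.perf_counter() - g.start_time
    latency_histogram.labels(path=request.path, method=request.method).observe(duration)
    return response
```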
The latency_histogram metric is created to track the duration of each request
to the server. With such a metric, you can:
- Track response time distributions,
- Calculate percentiles (like p95, p99),
- Identify slow endpoints,
- Monitor performance trends over time.
Before a request is processed, the middleware stores the request start time. After the request completes, the middleware calculates the total duration and records it in the histogram.
After saving the file and refreshing the application a few times, visiting
http://localhost:8000/metrics will display the recorded histogram data:
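```text
# HELP http_request_duration_seconds Duration of HTTP requests in seconds
# TYPE http_request_duration_seconds histogram
http_request_duration_seconds_bucket{le="0.005",method="GET",path="/"} 2.0
http_request_duration_seconds_bucket{le="0.01",method="GET",path="/"} 3.0
http_request_duration_seconds_bucket{le="0.025",method="GET",path="/"} 4.0
...
http_request_duration_seconds_bucket{le="+Inf",method="GET",path="/"} 6.0
http_request_duration_seconds_sum{method="GET",path="/"} 0.513
http_request_duration_seconds_count{method="GET",path="/"} 6.0
```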
Let's understand what this output means:
- Each _bucket line represents the number of requests that took less than or equal to a specific duration. For example, le="0.025"} 4 means four requests completed within 25 milliseconds.
- The _sum value is the total of all observed durations.
- The _count value is the total number of observations.
The histogram uses these default buckets (in seconds):
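```text
.005, .01, .025, .05, .075, .1, .25, .5, .75, 1.0, 2.5, 5.0, 7.5, 10.0, +Inf
```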
If these buckets don't suit your needs, you can specify custom ones:
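```python
latency_histogram = Histogram(
    "http_request_duration_seconds",
    "Duration of HTTP requests in seconds",
    labelnames=["path", "method"],
    # Example boundaries for an API where most responses finish within 1 second.
    buckets=[0.01, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5],
)
```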
The real power of histograms comes when analyzing them in Prometheus. For example, to calculate the 99th percentile latency over a 1-minute window, you can use:
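```promql
# The metric name assumes the http_request_duration_seconds sketch above.
histogram_quantile(0.99, rate(http_request_duration_seconds_bucket[1m]))
```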
This query will show you the response time that 99% of requests fall under, which is more useful than averages for understanding real user experience.
With the histogram metric successfully instrumented, the next step is to explore how to track additional insights using a Summary metric.
Step 6 — Instrumenting a Summary metric
A Summary metric in Prometheus is useful for capturing pre-aggregated quantiles, such as the median, 95th percentile, or 99th percentile, while also providing overall counts and sums for observed values.
Unlike a histogram, which allows aggregation across instances on the Prometheus server, a Summary metric calculates quantiles directly on the client side. This makes it valuable when quantile calculations need to be performed independently per instance without relying on Prometheus for aggregation.
To set up a Summary metric for monitoring request latency, update your main.py
file as follows:
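```python
# Additions to main.py (sketch; the external API URL is an example)
import time

import requests
from flask import Flask, jsonify
from prometheus_client import Summary

app = Flask(__name__)

posts_latency_summary = Summary(
    "post_request_duration_seconds",
    "Duration of requests to the external posts API in seconds",
)


@app.route("/posts")
def posts():
    start_time = time.perf_counter()
    response = requests.get("https://jsonplaceholder.typicode.com/posts")
    duration = time.perf_counter() - start_time
    # Record the observed latency in the Summary.
    posts_latency_summary.observe(duration)
    return jsonify(response.json())
```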
The posts_latency_summary metric tracks the duration of requests to an
external API. In the /posts endpoint, the start time of the request is
recorded before sending a GET request to the API.
Once the request completes, the duration is calculated and recorded in the
Summary metric using posts_latency_summary.observe(duration).
After saving your changes, add the requests package to your requirements.txt file:
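```text
requests
```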
Then rebuild the app service with:
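```bash
docker compose up -d --build app
```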
Once the service is up, send requests to the /posts endpoint using a tool like
wrk to generate latency data:
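```bash
wrk -t 2 -c 10 -d 30s http://localhost:8000/posts
```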
The metrics endpoint will show output like:
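```text
# HELP post_request_duration_seconds Duration of requests to the external posts API in seconds
# TYPE post_request_duration_seconds summary
post_request_duration_seconds_count 278.0
post_request_duration_seconds_sum 96.31
```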
Unfortunately, the prometheus_client package does not currently support quantiles, which makes this output much less useful. To fix this, you can use the prometheus-summary package instead. It is fully compatible with the native client's Summary class and adds support for configurable quantiles:
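```python
# Only the import changes; prometheus_summary.Summary is a drop-in
# replacement for prometheus_client.Summary. Remember to add
# prometheus-summary to requirements.txt and rebuild the app service.
from prometheus_summary import Summary

posts_latency_summary = Summary(
    "post_request_duration_seconds",
    "Duration of requests to the external posts API in seconds",
)
```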
Once you rebuild the app service and send some load to the /posts endpoint again, you will see the post_request_duration_seconds metric with the following precomputed quantiles:
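```text
# HELP post_request_duration_seconds Duration of requests to the external posts API in seconds
# TYPE post_request_duration_seconds summary
post_request_duration_seconds{quantile="0.5"} 0.341
post_request_duration_seconds{quantile="0.9"} 0.355
post_request_duration_seconds{quantile="0.99"} 0.498
post_request_duration_seconds_count 278.0
post_request_duration_seconds_sum 96.31
```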
The median request time is about 341 milliseconds (0.341 seconds), 90% of requests complete within 355 milliseconds (0.355 seconds), and 99% complete within 498 milliseconds (0.498 seconds).
If you'd like to customize the quantiles, you can provide the invariants argument with quantile-precision pairs. The default is:
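```python
Summary(
    "post_request_duration_seconds",
    "Duration of requests to the external posts API in seconds",
    # Each pair is (quantile, allowed error); these mirror the
    # package's documented defaults.
    invariants=((0.50, 0.05), (0.90, 0.01), (0.99, 0.001)),
)
```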
In the Prometheus web interface, entering the metric name will display recorded latency values:
Final thoughts
In this tutorial, we explored setting up and using Prometheus metrics in a Python application.
We covered how to define and register different types of metrics - counters for tracking cumulative values, gauges for fluctuating measurements, histograms for understanding value distributions, and summaries for calculating client-side quantiles.
To build on this foundation, you might want to:
- Set up Prometheus Alertmanager to create alerts based on your metrics.
- Connect your metrics to Grafana or Better Stack for powerful visualization and dashboarding.
- Explore PromQL to write more sophisticated queries for analysis.
Don't forget to see the final code used in this tutorial on GitHub.
Thanks for reading, and happy monitoring!