A Beginner's Guide to the OpenTelemetry Demo
The OpenTelemetry Demo provides a practical, hands-on environment for exploring and implementing OpenTelemetry.
It demonstrates how OpenTelemetry can be used to instrument and monitor a microservice-based application, with realistic workflows and fault simulations that illustrate effective use of the OpenTelemetry SDK, API, Collector, and other components.
In this article, we'll delve into the architecture of the OpenTelemetry Demo, examine how its services interact, and explain how telemetry data is collected and analyzed to help you better understand and adopt OpenTelemetry in your own systems.
Let's get started!
Prerequisites
- Some familiarity with OpenTelemetry's basic concepts.
- Docker and Docker Compose installed. I recommend using OrbStack on macOS instead of Docker Desktop.
What is the OpenTelemetry Demo?
The OpenTelemetry Demo simulates an online astronomy-themed retail shop. This fictional e-commerce application serves as a realistic environment for showcasing everything OpenTelemetry has to offer for instrumenting and monitoring a distributed system.
The shop features astronomy-related items, such as telescopes, star charts, and space-themed merchandise. It also includes a shopping workflow involving browsing products, adding items to a cart, checking out, and receiving an order confirmation, the typical interactions in an e-commerce platform.
It was developed with the following primary goals:
- To showcase how OpenTelemetry operates in a complex environment with multiple interconnected services built in different technologies.
- To allow you to observe distributed tracing, metrics, and logs in action for monitoring and troubleshooting an application, and to learn how to implement them in your own services.
- To serve as a foundational platform for vendors and tooling developers to showcase their OpenTelemetry integrations.
- To give OpenTelemetry contributors a functional, living environment for testing and validating new versions of the API, SDK, or other OpenTelemetry components before updates are released to the broader community.
Components
The services that comprise the demo can be divided into four broad categories:
- Core services: Microservices written in different programming languages that talk to each other over gRPC and HTTP. These services are all instrumented with OpenTelemetry to produce traces, metrics, and logs.
- Dependencies: Services that support the application services, such as Redis, Kafka, and Valkey.
- Telemetry services: Components that deal with the telemetry data generated by the above services, like the OpenTelemetry Collector, Prometheus, Grafana, OpenSearch, and Jaeger.
- Utility services: Additional services that provide specific functionality to support the core application, such as the load-generator, flagd, and flagdui services.
Let's look at how you can set up the demo on your local machine next.
Setting up the OpenTelemetry Demo
Setting up the OpenTelemetry Demo is straightforward with Docker, and there's also an option to deploy it through Kubernetes if that's your preference.
Begin by cloning its GitHub repository to your local machine:
git clone https://github.com/open-telemetry/opentelemetry-demo.git
Then navigate into the cloned repository with:
cd opentelemetry-demo
You can now start the demo with docker compose:
docker compose up --force-recreate --remove-orphans --detach
Since the demo involves a large number of containers and services, downloading and building all the necessary Docker images might take several minutes depending on your internet speed.
Once the setup is complete, you should see output similar to the following:
. . .
[+] Running 26/26
✔ Network opentelemetry-demo Created 0.2s
✔ Container flagd Started 2.2s
✔ Container grafana Started 2.5s
✔ Container opensearch Healthy 12.6s
✔ Container valkey-cart Started 2.5s
✔ Container prometheus Started 2.5s
✔ Container kafka Healthy 17.1s
✔ Container jaeger Started 2.1s
✔ Container otel-collector Started 12.7s
✔ Container cart-service Started 14.5s
✔ Container flagdui Started 14.5s
✔ Container quote-service Started 13.4s
✔ Container frauddetection-service Started 17.4s
✔ Container currency-service Started 14.1s
✔ Container email-service Started 14.1s
✔ Container imageprovider Started 13.5s
✔ Container ad-service Started 14.8s
✔ Container accounting Started 17.4s
✔ Container product-catalog-service Started 14.1s
✔ Container shipping-service Started 14.8s
✔ Container payment-service Started 13.3s
✔ Container recommendation-service Started 14.6s
✔ Container checkout-service Started 17.2s
✔ Container frontend Started 17.5s
✔ Container load-generator Started 18.0s
✔ Container frontend-proxy Started 18.5s
The Docker Compose setup includes 25 containers, most of which represent the microservices in the demo application. Additionally, the setup includes:
- prometheus, grafana, opensearch, and jaeger: For inspecting and visualizing telemetry data.
- load-generator: Uses Locust to simulate user traffic.
- flagd and flagdui: Provide support for changing feature flags through a user interface.
Once all services are up and running, open your browser and navigate to http://localhost:8080 to interact with the demo application:
Scrolling down reveals the available products for purchase:
Clicking on a product takes you to its details page, where you can select a quantity and add it to your shopping cart:
The shopping cart displays the selected products, calculated total, and demo payment options:
Scroll to the bottom of the cart and click the Place Order button:
You'll be directed to an order confirmation page, verifying that the demo is functioning as expected:
As you interact with the application, multiple microservices communicate and work together to handle your actions.
These interactions are fully instrumented with OpenTelemetry, allowing you to collect and analyze traces, metrics, and logs with the included telemetry tools or with a vendor you're currently evaluating.
Exploring the telemetry instrumentation and coverage
Each service in the demo application uses its respective OpenTelemetry SDKs to collect telemetry data—traces, metrics, and logs. However, the extent of coverage for each telemetry type varies by service, reflecting differences in functionality and maturity of instrumentation.
Trace coverage
Tracing is the most comprehensively implemented telemetry type in the demo. Most services feature robust trace instrumentation, including:
- Automatic and manual span creation
- Span enrichment
- Context propagation
Features like baggage and span links are selectively implemented in services with complex interactions, such as the Checkout and Fraud Detection services.
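To give a sense of what this looks like in practice, here is a minimal sketch of manual span creation and enrichment using the JavaScript SDK. The tracer name and the processCharge helper are illustrative assumptions, not code taken from the demo's source.
const { trace, SpanStatusCode } = require('@opentelemetry/api');

// Acquire a tracer; the name typically identifies the instrumented module.
const tracer = trace.getTracer('payment');

async function chargeCard(request) {
  // startActiveSpan creates a span and makes it current for the callback.
  return tracer.startActiveSpan('charge', async (span) => {
    try {
      // Enrich the span with application-specific attributes.
      span.setAttribute('app.payment.amount', request.amount);
      return await processCharge(request); // processCharge is a hypothetical helper
    } catch (err) {
      span.recordException(err);
      span.setStatus({ code: SpanStatusCode.ERROR });
      throw err;
    } finally {
      span.end();
    }
  });
}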
Metric coverage
The metric coverage varies significantly, with most services having incomplete instrumentation. Advanced features like multiple instruments, views, and exemplars are largely missing across the board.
However, the existing metrics are sufficient to demonstrate core OpenTelemetry metric collection capabilities.
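As a rough illustration of that core capability, a counter instrument created with the JavaScript metrics API might look like the following sketch; the meter and metric names are made up for the example.
const { metrics } = require('@opentelemetry/api');

// Acquire a meter and create a counter instrument.
const meter = metrics.getMeter('checkout');
const orderCounter = meter.createCounter('app.orders.placed', {
  description: 'Number of orders placed',
});

// Record a measurement with attributes when an order completes.
orderCounter.add(1, { 'app.order.currency': 'USD' });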
Log coverage
Logs are the least developed aspect of telemetry instrumentation in the demo as only a few services have implemented OpenTelemetry Protocol (OTLP) log export.
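For reference, exporting logs over OTLP from a Node.js service might look roughly like the sketch below. The exact packages and setup vary by SDK version, so treat this as an assumption-laden outline rather than the demo's actual code.
const { SeverityNumber } = require('@opentelemetry/api-logs');
const { LoggerProvider, BatchLogRecordProcessor } = require('@opentelemetry/sdk-logs');
const { OTLPLogExporter } = require('@opentelemetry/exporter-logs-otlp-grpc');

// Send log records to the Collector's OTLP/gRPC endpoint in batches.
const provider = new LoggerProvider();
provider.addLogRecordProcessor(new BatchLogRecordProcessor(new OTLPLogExporter()));

// Emit a structured log record that can be correlated with traces.
const logger = provider.getLogger('checkout');
logger.emit({
  severityNumber: SeverityNumber.INFO,
  severityText: 'INFO',
  body: 'order placed',
  attributes: { 'app.order.id': 'order-123' },
});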
Context propagation
. . .
// `context` and `propagation` are imported from the '@opentelemetry/api' package
// Check baggage for synthetic_request=true, and add charged attribute accordingly
const baggage = propagation.getBaggage(context.active());
if (baggage && baggage.getEntry('synthetic_request') && baggage.getEntry('synthetic_request').value === 'true') {
span.setAttribute('app.payment.charged', false);
} else {
span.setAttribute('app.payment.charged', true);
}
. . .
Context propagation is a key feature in OpenTelemetry that enables the correlation of telemetry signals regardless of where they are generated. Here's how it's implemented in the demo:
- Trace headers (such as traceparent and tracestate) are passed along with requests as they travel between services.
- Baggage is used to carry additional context across service boundaries. In the demo, it is used to annotate synthetic requests from the load generator (a sketch of the producer side follows this list).
- Some metrics collected in the demo include trace exemplars, which are detailed samples of individual traces associated with specific metrics.
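The snippet above shows the consumer side reading baggage in the payment service. The producer side might look something like this JavaScript sketch; in the demo the baggage is actually set by the Locust-based load generator, so this is illustrative only.
const { context, propagation } = require('@opentelemetry/api');

// Create a baggage entry marking this request as synthetic traffic.
const baggage = propagation.createBaggage({
  synthetic_request: { value: 'true' },
});

// Activate the baggage so the configured propagators inject it into
// outgoing request headers alongside traceparent/tracestate.
context.with(propagation.setBaggage(context.active(), baggage), () => {
  // ...issue the outbound HTTP or gRPC call here...
});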
Telemetry collection and export
The generated traces, metrics, and logs are sent to the OpenTelemetry Collector via gRPC and exported to the following services:
- Prometheus: Scrapes the metrics and exemplars generated by the services.
- Grafana: Visualizes metric data in customizable dashboards.
- Jaeger: Processes and displays distributed traces.
- OpenSearch: Used to centralize logging data from services.
The configuration for the Collector defines how the telemetry data is received, processed, and exported. Below is a snippet of the configuration:
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: ${env:OTEL_COLLECTOR_HOST}:${env:OTEL_COLLECTOR_PORT_GRPC}
      http:
        endpoint: ${env:OTEL_COLLECTOR_HOST}:${env:OTEL_COLLECTOR_PORT_HTTP}
        cors:
          allowed_origins:
            - "http://*"
            - "https://*"
. . .
exporters:
  debug:
  otlp:
    endpoint: "jaeger:4317"
    tls:
      insecure: true
  otlphttp/prometheus:
    endpoint: "http://prometheus:9090/api/v1/otlp"
    tls:
      insecure: true
  opensearch:
    logs_index: otel
    http:
      endpoint: "http://opensearch:9200"
      tls:
        insecure: true
processors:
  batch:
. . .
connectors:
  spanmetrics:
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [transform, batch]
      exporters: [otlp, debug, spanmetrics]
    metrics:
      receivers: [hostmetrics, docker_stats, httpcheck/frontendproxy, otlp, prometheus, redis, spanmetrics]
      processors: [batch]
      exporters: [otlphttp/prometheus, debug]
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [opensearch, debug]
For traces, a single OTLP receiver is configured to accept data over both gRPC and HTTP. The trace data is enriched using a transform processor, batched for efficient handling, and then exported to Jaeger for visualization, the debug exporter for troubleshooting, and spanmetrics for generating trace-derived metrics.
Metrics are collected from various sources, including system-level data with hostmetrics and docker_stats, and service-level data through the OTLP and Prometheus receivers. The data is processed in batches and exported to Prometheus.
Finally, logs are collected via the OTLP receiver, then batched and exported to OpenSearch for centralized storage and querying.
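On the sending side, each service points its OTLP exporter at the Collector. As a rough Node.js sketch, assuming the Collector is reachable at otel-collector:4317 on the Docker network, the setup might look like this; in the demo the endpoint is typically supplied through environment variables rather than hard-coded.
const { NodeSDK } = require('@opentelemetry/sdk-node');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-grpc');

// Export spans to the Collector's OTLP/gRPC receiver. The URL here is an
// assumption; it is usually taken from OTEL_EXPORTER_OTLP_ENDPOINT instead.
const sdk = new NodeSDK({
  serviceName: 'example-service', // illustrative name, not one of the demo services
  traceExporter: new OTLPTraceExporter({ url: 'http://otel-collector:4317' }),
});

sdk.start();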
Within the Grafana instance (exposed at http://localhost:8080/grafana/), you can select the OpenTelemetry Collector Data Flow dashboard to monitor both egress and ingress metrics, along with the observability data flow within the system:
Viewing trace data in Jaeger
To explore the trace data generated by the application's services, open the Jaeger UI in your browser at http://localhost:8080/jaeger/ui/.
On the System Architecture page, you can view a high-level visualization of how the various components in the system interact with one another.
The DAG (Directed Acyclic Graph) tab provides insights into the flow of calls between services by showing the number of calls from one service to another.
In the Force Directed Graph tab, you can click on a service and highlight all the components it interacts with, which helps simplify the debugging process by narrowing down dependencies.
Switching to the Search page, you'll find a list of recorded traces corresponding to service interactions once you click Find Traces:
By clicking on a trace, you can drill down into the details of a specific request. This reveals the services involved in processing the request and the time each service took, allowing you to identify potential performance bottlenecks.
The application activity observed is due to the load-generator component, which simulates user traffic using Locust. You can observe its activity at http://localhost:8080/loadgen/:
If you're interested in tracking specific user actions in Jaeger, I recommend stopping the load generator temporarily.
This ensures that only the traces corresponding to your actions in the application are recorded, making it easier to identify and analyze them.
Viewing metric data in Prometheus and Grafana
Service metrics in the OpenTelemetry Demo are collected and stored in Prometheus, whose UI is accessible at http://localhost:9090:
While Prometheus provides raw data and query capabilities, visualizing the metrics is much easier and more effective with the pre-built dashboards in Grafana.
These dashboards display key metrics like latency, request rates, and resource usage for each service in a user-friendly format.
To access the Grafana dashboards, navigate to http://localhost:8080/grafana/dashboards in your browser. You'll be greeted with the following set of default dashboards:
Click on the Demo Dashboard entry and select a specific service to view detailed metric graphs. For example, selecting the adservice will show visualizations for metrics like response time, request count, and CPU usage:
Simulating service faults with feature flags
The OpenTelemetry Demo uses flagd to implement feature flagging based on the OpenFeature specification. This setup allows dynamic control over application behavior without redeployment, enabling the simulation of various scenarios and fault conditions.
It includes a range of feature flags that can simulate realistic application behaviors and faults, such as:
- paymentServiceFailure: Simulates an error when the charge method is invoked in the Payment Service.
- cartServiceFailure: Causes the EmptyCart method to fail in the Cart Service.
- kafkaQueueProblems: Overloads the Kafka queue and introduces consumer-side delays, leading to lag spikes.
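To give a sense of how a service consumes one of these flags, here is a hedged JavaScript sketch using the OpenFeature SDK, assuming a flagd provider has been registered elsewhere. The demo's cart service is written in .NET, so this illustrates the pattern rather than its real code.
const { OpenFeature } = require('@openfeature/server-sdk');

// Assumes a flagd provider has already been registered with OpenFeature.
const client = OpenFeature.getClient();

async function emptyCart(userId) {
  // Evaluate the feature flag before performing the operation.
  const shouldFail = await client.getBooleanValue('cartServiceFailure', false);
  if (shouldFail) {
    throw new Error('simulated cart service failure'); // mirrors the flag's intent
  }
  // ...normal empty-cart logic would go here...
}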
To manage these feature flags, access the Flagd Configurator at http://localhost:8080/feature:
The configurator offers two views:
- Basic: Allows you to toggle predefined values for each flag, such as on or off.
- Advanced: Displays the raw JSON configuration for direct editing, providing greater flexibility for customization.
By default, all feature flags are disabled. To simulate a failure, you can enable the cartServiceFailure flag by toggling it on and clicking Save:
With the flag activated, the Empty Cart button in the application will stop functioning as expected:
These simulated faults are also reflected in the telemetry data. In Jaeger, errors will begin appearing in new traces:
Examining the trace spans in Jaeger will pinpoint the source of the errors to the cartservice, correlating directly to the feature flag change:
Since this is only simulated behavior, you can revert the feature flag change to restore normal operation.
In a real-world scenario, however, the same telemetry would point you to the underlying bug in the code, or at least tell you what to investigate further.
Final thoughts
The OpenTelemetry Demo offers a great way to explore the capabilities of OpenTelemetry while learning how to effectively use its various components.
Even running the demo application for a few minutes generates a significant amount of telemetry data, which helps you understand how the services interact within a distributed system.
For a deeper understanding, you can dive into the code for services written in your preferred programming languages to see how they are instrumented and how the collector ties everything together.
The flexibility of the OpenTelemetry Collector also makes the demo an excellent tool for evaluating and comparing different observability backends. You can specify multiple backends and see how each vendor handles the telemetry data.
With the help of feature flags, you can also simulate faults to see which tools help you identify and resolve issues the fastest.
Thanks for reading!