A Beginner's Guide to the OpenTelemetry Collector
The first step towards observability with OpenTelemetry is instrumenting your application to enable it to generate essential telemetry signals such as traces, logs, and metrics.
Once telemetry data is being generated, it must be sent to a backend tool that may perform many functions, including analysis, visualization, and alerting.
While you could send this data directly to the observability backend, using an intermediary tool between your services and the backend offers significant advantages.
In this article, we'll examine the reasons behind the growing popularity of the OpenTelemetry Collector, and why it is often the recommended intermediary tool for building observability pipelines.
Prerequisites
Before proceeding with this article, ensure that you're familiar with basic OpenTelemetry concepts.
What is the OpenTelemetry Collector?
The Collector is a core element of the OpenTelemetry observability framework, acting as a neutral intermediary for collecting, processing, and forwarding telemetry signals (traces, metrics, and logs) to an observability backend.
It aims to simplify your observability setup by eliminating the need for multiple agents for different telemetry types. Instead, it consolidates everything into a single, unified collection point.
This approach not only streamlines your setup but also acts as a buffer between your applications and your observability backends to provide a layer of abstraction and flexibility.
It natively supports the OpenTelemetry Protocol (OTLP) but also accommodates other formats like Jaeger, Prometheus, Fluent Bit, and others. Its vendor-neutral design also lets you export your data to various open-source or commercial backends.
Built on Go and licensed under Apache 2.0, the OpenTelemetry Collector encourages you to extend its functionality by incorporating custom components. This flexibility is invaluable when you need to extend its capabilities beyond standard use cases.
Benefits of using OpenTelemetry Collector
While sending telemetry data directly to an observability backend might seem convenient at first, using the OpenTelemetry Collector as a middleman between your services and the backend offers significant advantages for building a more flexible and resilient observability pipeline.
Let's delve into a few of the most compelling reasons:
1. Preventing vendor lock-in
Direct telemetry reporting or using a vendor-specific agent can create a tight coupling between your services and the specific backend you're using. This makes it challenging to switch backends in the future or even experiment with multiple backends simultaneously.
With the OpenTelemetry Collector, you can effectively decouple your applications from any specific observability backend. By configuring the collector to send data to various backends, or even multiple backends at once, you have the freedom to choose the best tools for your needs without being locked into a single platform.
If you ever decide to migrate to a different backend, you only need to update the collector's configuration, and not your entire application codebase.
2. Consolidation of observability tooling
Using the OpenTelemetry Collector can simplify your observability stack by acting as a unified collection point for telemetry data from various sources. By supporting various open-source and commercial protocols and formats for logs, traces, and metrics, it eliminates the need for multiple agents and shippers which reduces complexity and cognitive load for your engineering teams.
3. Filtering sensitive data
A common challenge in observability is the inadvertent logging of sensitive information, such as API keys or user data like credit card numbers, by monitored services. Without a collector, this data could be exposed within your observability system, posing a significant security risk.
The Collector addresses this by allowing you to filter and sanitize your telemetry data before it's exported. This ensures compliance and strengthens your security posture by preventing sensitive information from reaching the backend.
4. Reliable and efficient data delivery
The OpenTelemetry Collector optimizes telemetry data transmission through efficient batching and retries to minimize network overhead and ensure reliable data delivery even in the face of network disruptions.
5. Managing costs
Through features like filtering, sampling, and aggregation, the Collector can help you move away from a "spray and pray" approach to signal collection by selectively reducing the amount of data transmitted. This allows you to focus on the most relevant information, minimizing unnecessary storage and analysis costs.
6. The OpenTelemetry Collector is observable
A core strength of the OpenTelemetry Collector lies in its inherent observability. It doesn't just collect and process telemetry data from your applications; it also meticulously monitors its own performance and health by emitting logs, metrics, and traces, to allow you to track key performance indicators, resource utilization, and potential bottlenecks.
This level of transparency fosters confidence in your observability pipeline, guaranteeing that the very tool responsible for gathering insights also remains under close observation.
How the OpenTelemetry Collector works
At a high level, the OpenTelemetry Collector operates in three primary stages:
Data reception: It collects telemetry data from a variety of sources, including instrumented applications, agents, and other collectors. This is done through
receiver
components.Data processing: It uses
processors
to process the collected data, performing tasks like filtering, transforming, enriching, and batching to optimize it for storage and analysis.Data transmission: It sends the processed data to various backend systems, such as observability platforms, databases, or cloud services, through
exporters
for storage, visualization, and further analysis.
By combining receivers, processors, and exporters in the Collector configuration, you can create pipelines which serve as a separate processing lane for logs, traces, or metrics. Data enters from various sources, undergoes transformations via processors, and is ultimately delivered to one or more backends through exporters.
Connector components can also link one pipeline's output to another's input allowing you to use the processed data from one pipeline as the starting point for another. This enables more complex and interconnected data flows within the Collector.
Installing the OpenTelemetry Collector
There are several ways to install the OpenTelemetry Collector, and each release comes with pre-built binaries for Linux, macOS, and Windows. For the complete list of options, refer to the official docs.
The key decision is choosing the appropriate distribution to install.
Core: This contains only the most essential components along with frequently used extras like filter and attribute processors, and popular exporters such as Prometheus, Kafka, and others. It's distributed under the
otelcol
binary name.Contrib: This is the comprehensive version, including almost everything from both the core and contrib repositories, except for components that are still under development. It's distributed under the
otelcol-contrib
binary name.Kubernetes: This distribution is tailored for use within a Kubernetes cluster to monitor the Kubernetes infrastructure and the various services deployed within it. It's distributed under the
otelcol-k8s
binary name.
There are also third-party distributions provided by various vendors, which are tailored for easier deployment and integration with their specific backends.
The contrib distribution generally recommend for most users since it includes a wider range of components and out-of-the-box functionality to address various observability needs.
The easiest way to get started with the Collector is through the official Docker images which you can download using:
docker pull otel/opentelemetry-collector:latest # OpenTelemetry core
docker pull otel/opentelemetry-collector-contrib:latest # OpenTelemetry contrib
docker pull otel/opentelemetry-collector-k8s:latest # OpenTelemetry K8s
For more advanced users, the OpenTelemetry Collector Builder offers the ability to create a custom distribution containing only the components you need from the core, contrib, or even third-party repositories. While beyond the scope of this article, we'll be sure to explore this in a future tutorial.
Configuring the OpenTelemetry Collector
The Collector's configuration is managed through a YAML file. On Linux, this
file is typically found at /etc/<otel-directory>/config.yaml
, where
<otel-directory>
varies based on the specific Collector version or
distribution you're using (e.g., otelcol
, otelcol-contrib
).
You can also provide a custom configuration file when starting the Collector
using the --config
option:
otelcol --config=/path/to/otelcol.yaml
For Docker, mount your custom configuration file as a volume when launching the container with:
docker run -v $(pwd)/otelcol.yaml:/etc/otelcol-contrib/config.yaml otel/opentelemetry-collector-contrib:latest
The configuration can also be loaded from other sources, such as environmental variables, YAML strings, or even external URLs, offering great flexibility in how you choose to manage your settings.
otelcol --config=env:OTEL_COLLECTOR_CONFIG
otelcol --config=https://example.com/otelcol.yaml
otelcol --config="yaml:exporters::debug::verbosity: normal"
If multiple --config
flags are provided, they will be merged into a final
configuration.
The configuration also automatically expands environment variables within the configuration so that you can keep sensitive data, like API secrets, secure outside of the version-controlled configuration files.
processors:
attributes/example:
actions:
- key: ${env:API_SECRET}
action: ${env:OPERATION}
Here's a quick overview of the basic structure of a Collector configuration file:
receivers:
otlp:
protocols:
http:
endpoint: 0.0.0.0:4318
processors:
batch:
exporters:
otlp:
endpoint: jaeger:4317
extensions:
health_check:
service:
extensions: [health_check]
pipelines:
traces:
receivers: [otlp]
processors: [batch]
exporters: [otlp]
This configuration sets up an OpenTelemetry Collector that receives trace data
via the OTLP protocol over HTTP on port 4318, applies batch
processing, and
then exports the processed traces to a Jaeger endpoint located at jaeger:4317
.
It also includes a health_check
extension for monitoring the collector's
status.
Each component within the configuration is assigned a unique identifier using
the format type/<name>
. The <name>
part is optional if you only have a
single instance of a particular component type.
However, when you need to define multiple components of the same type, providing
a distinct <name>
for each one becomes necessary:
processors:
batch:
batch/2:
send_batch_size: 10000
timeout: 10s
batch/test:
timeout: 1s
The service
section is also crucial, as it controls which configured
components are enabled. Any component not mentioned there is silently ignored,
even if it's configured in other sections.
Once you're done configuring your Collector instance, ensure to validate the
configuration with the validate
command:
otelcol validate --config=/path/to/config.yaml
In the next section, we'll dive deeper into the individual components of the Collector configuration.
Exploring the OpenTelemetry Collector components
Let's now delve into the heart of the OpenTelemetry Collector: its components. In this section, we'll explore the building blocks that enable the Collector to receive, process, and export telemetry data.
We'll cover receivers, processors, exporters, extensions, and connectors, understanding their roles and how they work together to create a powerful and flexible observability pipeline.
Let's begin with the receivers first.
Receivers
Receivers are the components responsible for collecting telemetry data from various sources, serving as the entry points into the Collector.
They gather traces, metrics, and logs from instrumented applications, agents, or other systems, and translate the incoming data into OpenTelemetry's internal format, preparing it for further processing and export.
For the Collector to work properly, your configuration needs to include and enable at least one receiver. The core distribution includes the versatile OTLP receiver, which can be used in trace, metric, and log pipelines:
receivers:
otlp:
The oltp
receiver here starts an HTTP and gRPC server at localhost:4318
and
localhost:4317
respectively, then waits for the instrumented services to
connect and start transmitting data in the OTLP format.
Similarly, many other receivers come with default settings, so specifying the receiver's name is enough to configure it. To change the default configuration, you may override the default values. For example, you may disable the gRPC protocol by simply not specifying it in the list of protocols:
receivers:
otlp:
protocols:
http:
You can also change the default endpoint through http.endpoint
:
receivers:
otlp:
protocols:
http:
endpoint: 0.0.0.0:4318
The contrib repository boasts over 90 additional receivers, catering to a wide array of data formats and protocols, including popular sources like Jaeger, Prometheus, Apache Kafka, PostgreSQL, Redis, AWS X-Ray, GCP PubSub, and many more.
Processors
Processors are components that modify or enhance telemetry data as it flows through the pipeline. They perform various operations on the collected telemetry data, such as filtering, transforming, enriching, and batching so that it is ready to be exported.
While no processors are enabled by default, you'll typically want to include the
batch
processor:
processors:
batch:
This processor groups spans, metrics, or logs into time-based and size-based batches, enhancing efficiency. Additionally, it supports sharding data based on client metadata, allowing for effective multi-tenant data processing even with high volumes.
Another processor in the otelcol
core distribution is the
memory_limiter
which helps prevent out-of-memory errors by periodically checking service memory
usage against defined limits:
processors:
memory_limiter:
check_interval: 5s
limit_mib: 4000 # 4 mebibytes hard limit
spike_limit_mib: 800 # soft limit is `limit_mib` minus `spike_limit_mib` (3200)
It operates with a soft and a hard limit. Exceeding the soft limit results in new data rejection until memory is freed up. Breaching the hard limit triggers garbage collection so that memory usage drops below the soft limit.
This mechanism adds back pressure to the Collector, making it resilient to overload. However, it requires receivers to handle data rejections gracefully, usually through retries with exponential backoff.
Beyond these, the contrib repository offers several other processors for tasks like filtering sensitive data, adding geolocation details, appending Kubernetes metadata, and more.
Exporters
Exporters serve as the final stage in the Collector's pipeline and are responsible for sending processed telemetry data to various backend systems such as observability platforms, databases, or cloud services, where the data is stored, visualized, and analyzed.
To operate, the Collector requires at least one exporter configured through the
exporters
property.
Here's an sample configuration exporting trace data to a local Jaeger instance:
exporters:
otlp/jaeger:
endpoint: jaeger:4317
tls:
insecure: true
This configuration defines an exporter named otlp/jaeger
that targets a local
Jaeger instance listening on port 4317 via gRPC. The insecure: true
setting
disables encryption, which is not recommended for production environments.
For a broader range of destinations, the contrib repository provides various other exporters, supporting diverse observability platforms, databases, and cloud services.
Extensions
Extensions add supplementary features to the OpenTelemetry Collector beyond the core data collection, processing, and export function. They offer features like health checks, performance profiling, authentication, and integration with external systems.
Here's a sample configuration for extensions:
extensions:
pprof:
health_check:
zpages:
The pprof
extension here enables Go's net/http/pprof
endpoint on
http://localhost:1777
so that you can collect performance profiles and
investigate issues with the service.
The health_check
extension offers an HTTP URL (http://localhost:13133/
by
default) that can be used to monitor the collector's status.
You can use this URL to implement liveness checks (to check if the collector is
running) and readiness checks (to confirm if the collector is ready to accept
data).
A new and improved health check extension is currently being developed to enable individual components within the collector (like receivers, processors, and exporters) to provide their own health status updates.
The zPages
extension is equally useful. It provides various HTTP endpoints for
monitoring and debugging the Collector without relying on any backend. This
enables you to inspect traces, metrics, and the collector's internal state
directly, assisting in troubleshooting and performance optimization.
Authentication extensions also play a vital role in security by allowing you to authenticate both incoming connections at the receiver level and outgoing requests at the exporter level.
Beyond these examples, the contrib repository offers a wide array of extensions to further expand the Collector's capabilities.
Connectors
Connectors are specialized components that bridge the different pipelines within the OpenTelemetry Collector.
They function as both an exporter for one pipeline and a receiver for another, allowing telemetry data to flow seamlessly between pipelines, even if they handle different types of data.
Some use cases for connectors are:
Conditional routing: Direct telemetry data to specific pipelines based on predefined rules, ensuring that the right data reaches the appropriate destination for processing or analysis.
Data replication: Create copies of data and send them to multiple pipelines, enabling diverse processing or analysis approaches.
Data summarization: Condense large volumes of telemetry data into concise overviews for easier comprehension.
Data transformation: Convert one type of telemetry data into another, such as transforming raw traces into metrics for simplified aggregation and alerting.
The connectors
section in your Collector configuration file is where you
define these connections. Note that each connector is designed to work with
specific data types and can only connect pipelines that handle those types.
connectors:
count:
logs:
app.event.count:
description: "Log count by event"
attributes:
- key: event
For instance, the count
connector can count various telemetry data types. In
the above example, it groups incoming logs based on the event
attribute and
counts the occurrences of each event type. The result is exported as the metric
app.event.count
, allowing you to track the frequency of different events in
your logs.
Services
The service
section specifies which components, such as receivers, processors,
exporters, connectors, and extensions, are active and how they are
interconnected through pipelines. If a component is configured but not defined
within the service
section, it will be silently ignored.
It consists of three subsections which are:
1. Extensions
The service.extensions
subsection determines which of the configured
extensions will be enabled:
service:
extensions: [health_check, pprof, zpages]
2. Pipelines
The service.pipelines
subsection configures the data processing pathways
within the Collector. These pipelines are categorized into three types:
traces
, metrics
, and logs
.
service:
pipelines:
traces:
receivers: [otlp]
processors: [batch]
exporters: [otlp]
metrics:
receivers: [otlp]
processors: [batch]
exporters: [otlp]
logs:
receivers: [otlp]
processors: [batch]
exporters: [otlp]
Each pipeline comprises a collection of receivers, processors, and exporters.
Note that each component must be configured in their respective sections
(receivers
, processors
, exporters
) before incorporating them into a
pipeline.
Pipelines can have multiple receivers feeding data to the first processor. Each processor processes and passes data to the next, potentially dropping some if sampling or filtering is applied. The final processor distributes data to all exporters in the pipeline, ensuring each receives a copy of the processed data.
3. Telemetry
The service.telemetry
section within the Collector configuration focuses on
controlling the telemetry data generated by the Collector itself.
Metrics
service:
telemetry:
metrics:
address: 0.0.0.0:8888
level: detailed
Metrics are exposed through a Prometheus interface, which defaults to port
8888
and there are four verbosity levels:
none
: No metrics are collected.basic
: The most essential service telemetry.normal
: The default level which adds a few more standard indicators tobasic
-level metrics.detailed
: The most verbose level which emits additional low-level metrics like HTTP and RPC statistics.
You can also configure the Collector to scrape its metrics with a Prometheus receiver and send them through configured pipelines, but this could put your telemetry data at risk if the Collector isn't performing optimally.
Metrics cover resource consumption, data rates, drop rates, throttling states, connection counts, queue sizes, latencies, and more. For the full list, refer to the internal metrics page.
Logs
service:
telemetry:
logs:
OpenTelemetry Collector logs are outputted to the standard error by default and
you can use the operating environment's logging mechanisms (journalctl
,
docker logs
, etc) to view and manage the logs.
Logs provide insights into Collector events like startups, shutdowns, data
drops, and crashes. Just like with metrics, you can configure a verbosity
level (defaults to INFO
) as well as log sampling
policy, static metadata fields, and whether to encode the logs
in JSON format.
Under the hood, the Collector uses Uber's highly regarded Zap library to write the logs.
Traces
While the Collector doesn't currently expose traces by default, there's ongoing work to change that. This involves adding the ability to configure the OpenTelemetry SDK used for the Collector's internal telemetry. For now, this functionality is controlled by the following feature gate:
otelcol --config=config.yaml --feature-gates=telemetry.useOtelWithSDKConfigurationForInternalTelemetry
Once enabled, you can then register a service.telemetry.traces
section like
this:
service:
telemetry:
traces:
processors:
batch:
exporter:
otlp:
protocol: grpc/protobuf
endpoint: jaeger:4317
Understanding feature gates
The OpenTelemetry Collector's feature gates offer a valuable way to manage the adoption of new features by allowing them to be easily turned on or off. This provides a safe environment for testing and experimenting with new functionalities in production without fully committing to them.
Each feature gate typically progresses through a lifecycle similar to Kubernetes:
- Alpha: The feature is initially disabled by default and requires explicit activation.
- Beta: The feature becomes enabled by default but can be deactivated if necessary.
- Stable: The feature is considered fully integrated and generally available, and the feature gate is removed, leaving it permanently enabled.
In some cases, features might be deprecated if they prove unworkable. Such features remain available for a limited time (typically two additional releases) before being removed completely.
You can control feature gates using the --feature-gates
flag:
otelcol --config=config.yaml --feature-gates=transform.flatten.logs
To disable a feature gate, prefix its identifier with a -
:
otelcol --config=config.yaml --feature-gates=-transform.flatten.logs
If you use the
zPages extension,
you can see all the feature gates you have enabled by going to
http://localhost:55679/debug/featurez
:
Final thoughts
Throughout this article, we've explored the key concepts of OpenTelemetry Collector, so you should now have a good grasp of its capabilities and how it can help you build effective observability pipelines.
For a deeper dive into configuring the OpenTelemetry Collector, I recommend exploring the opentelemetry-collector and opentelemetry-collector-contrib repositories on GitHub and their official docs. These contain extensive documentation and examples that will guide you through setting up and tailoring the Collector to your specific requirements.
The best way to follow the development of the Collector is through its GitHub repo. In particular, you will find the changes that are being planned for upcoming releases on the roadmap page on GitHub. An official #otel-collector channel on the CNCF Slack also exists for community discussions.
Thanks for reading, and until next time!
Make your mark
Join the writer's program
Are you a developer and love writing and sharing your knowledge with the world? Join our guest writing program and get paid for writing amazing technical guides. We'll get them to the right readers that will appreciate them.
Write for usBuild on top of Better Stack
Write a script, app or project on top of Better Stack and share it with the world. Make a public repository and share it with us at our email.
community@betterstack.comor submit a pull request and help us build better products for everyone.
See the full list of amazing projects on github