Essential OpenTelemetry Collector Components for Observability Pipelines
Modern applications generate massive amounts of operational data. Capturing, processing, and routing this telemetry data efficiently presents significant challenges for engineering teams. The OpenTelemetry Collector addresses these challenges by providing a unified, vendor-neutral way to handle traces, metrics, and logs.
At its core, the OpenTelemetry Collector functions as a middleware layer between your applications and observability backends. Rather than instrumenting your code to send data to multiple destinations, the Collector acts as a central hub, receiving data from various sources and distributing it to multiple backends while handling tasks like batching, filtering, and transformation.
Understanding the OpenTelemetry Collector Architecture
The OpenTelemetry Collector's pipeline architecture consists of five fundamental building blocks:
- Receivers: Entry points that ingest telemetry data through various protocols and formats
- Processors: Components that modify, enhance, filter, or aggregate telemetry data
- Exporters: Components that send data to observability backends and storage systems
- Connectors: Elements that bridge different pipelines and convert between signal types
- Extensions: Add-ons that enhance core functionality with capabilities like health monitoring and authentication
These components connect through pipelines defined in the Collector's configuration, allowing for customizable data flows tailored to specific observability requirements.
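To make this concrete, here is a minimal sketch of a configuration that wires a single receiver, processor, and exporter into a traces pipeline, with a health check extension enabled. The endpoints and backend address are placeholders rather than a prescribed setup:
extensions:
  health_check:
    endpoint: 0.0.0.0:13133
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
processors:
  batch:
exporters:
  otlp:
    endpoint: backend.example.com:4317
service:
  extensions: [health_check]
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]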
Let's explore the most valuable components you should consider for your observability pipeline. You can find a comprehensive set of components in the OpenTelemetry Contrib repository.
Data Ingestion Components
The OTLP Receiver: Your Universal Data Ingress
The OpenTelemetry Protocol (OTLP) receiver serves as the standard entry point for telemetry data in the OpenTelemetry ecosystem. Supporting both gRPC and HTTP protocols, it provides a consistent, efficient way to ingest traces, metrics, and logs from applications instrumented with any OpenTelemetry SDK.
Key Features:
- Protocol flexibility with both gRPC and HTTP support
- Comprehensive signal type support (traces, metrics, logs)
- Efficient binary encoding for reduced network overhead
- Standardized data format that preserves semantic meaning
Configuration Example:
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
Filelog Receiver
While many observability tools focus primarily on traces and metrics, logs remain essential for troubleshooting and understanding application behavior. The Filelog receiver bridges this gap by collecting logs directly from files on disk.
Key Features:
- Real-time log file tailing
- Support for various log formats including JSON and multiline logs
- Container log parsing (Docker, CRI-O, ContainerD)
- Kubernetes metadata extraction from container logs
- Flexible include/exclude path patterns
Configuration Example:
receivers:
  filelog:
    include:
      - /var/log/applications/*.log
    exclude:
      - /var/log/applications/debug*.log
    operators:
      - type: regex_parser
        regex: '^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?P<severity>[A-Z]+) (?P<message>.*)$'
Prometheus Receiver
For organizations already using Prometheus, the Prometheus receiver provides a seamless integration path. Despite its name, this component actively scrapes metrics from targets using Prometheus configurations, rather than passively receiving data.
Key Features:
- Compatible with existing Prometheus scrape configurations
- Support for service discovery mechanisms (including Kubernetes)
- Metric relabeling and filtering
- OpenMetrics format support with exemplars
Configuration Example:
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: 'api-service'
          scrape_interval: 10s
          static_configs:
            - targets: ['api-service:8080']
        - job_name: 'kubernetes-pods'
          kubernetes_sd_configs:
            - role: pod
          relabel_configs:
            - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
              action: keep
              regex: true
Kubeletstats Receiver
For Kubernetes environments, the Kubeletstats receiver provides specialized metrics collection directly from the kubelet API on each node, offering detailed resource utilization data for nodes, pods, and containers.
Key Features:
- Comprehensive Kubernetes resource metrics
- Direct metrics collection without requiring Prometheus
- Hierarchical data from node to pod to container
- Volume metrics for storage analysis
Configuration Example:
receivers:
  kubeletstats:
    collection_interval: 20s
    auth_type: serviceAccount
    endpoint: ${K8S_NODE_NAME}:10250
    insecure_skip_verify: true
Data Processing Components
Batch Processor
The Batch processor significantly improves collector performance by grouping individual telemetry items into batches before forwarding them to exporters, reducing network overhead and backend load.
Key Features:
- Configurable batch size and timeout parameters
- Reduced network traffic and API calls
- Improved throughput for high-volume telemetry
- Lower backend database load
Configuration Example:
processors:
  batch:
    send_batch_size: 8192
    timeout: 5s
    send_batch_max_size: 0
K8sattributes Processor: Enriching Telemetry with Kubernetes Context
In Kubernetes environments, understanding which pods, deployments, and namespaces generate telemetry is crucial. The K8sattributes processor automatically enriches telemetry data with Kubernetes metadata.
Key Features:
- Pod, namespace, and deployment identification
- Node and cluster information
- Label and annotation extraction
- Owner references (ReplicaSets, DaemonSets, etc.)
Configuration Example:
processors:
  k8sattributes:
    auth_type: serviceAccount
    passthrough: false
    extract:
      metadata:
        - k8s.pod.name
        - k8s.pod.uid
        - k8s.deployment.name
        - k8s.namespace.name
        - k8s.node.name
        - k8s.pod.start_time
Attributes Processor: Customizing Your Telemetry Data
The Attributes processor provides fine-grained control over the attributes attached to your telemetry data, allowing you to insert, update, delete, or hash specific attributes based on your requirements.
Key Features:
- Attribute manipulation (insert, update, delete, hash)
- Support for all telemetry types (traces, metrics, logs)
- Regular expression matching for attribute selection
- Conditional attribute processing
Configuration Example:
processors:
  attributes:
    actions:
      - key: db.statement
        action: hash
      - key: environment
        value: production
        action: insert
      - key: http.url
        action: delete
Filter Processor: Reducing Data Volume
The Filter processor allows you to selectively include or exclude telemetry data based on specific criteria, helping reduce data volume while preserving the most valuable signals.
Key Features:
- Independent filtering for traces, metrics, and logs
- Regular expression and strict matching
- OTTL-based filtering expressions
- Metric name and value-based filtering
Configuration Example:
processors:
  filter:
    metrics:
      include:
        match_type: regexp
        metric_names: ["^(system|application).*"]
    spans:
      exclude:
        match_type: strict
        attributes:
          - key: http.route
            value: /health
Tail Sampling Processor
In high-volume tracing environments, storing every trace quickly becomes impractical. The Tail Sampling processor makes sampling decisions only after complete traces have been received, allowing you to keep the traces that match specific criteria.
Key Features:
- Sampling based on trace attributes and properties
- Multiple policy types (rate limiting, numeric attribute, probabilistic)
- Error and latency-based sampling
- Complex policy combinations
Configuration Example:
processors:
  tail_sampling:
    decision_wait: 10s
    num_traces: 100
    expected_new_traces_per_sec: 10
    policies:
      - name: error-policy
        type: status_code
        status_code:
          status_codes: [ERROR]
      - name: latency-policy
        type: latency
        latency:
          threshold_ms: 500
Transform Processor: Advanced Data Manipulation
The Transform processor leverages the OpenTelemetry Transformation Language (OTTL) to perform complex data transformations beyond what the Attributes processor can achieve.
Key Features:
- Conditional logic with OTTL expressions
- Advanced string and numeric operations
- Field and attribute manipulations
- Access to nested data structures
Configuration Example:
processors:
  transform:
    metric_statements:
      - context: datapoint
        statements:
          - set(value_double, value_double * 1000) where metric.name == "duration_seconds"
    trace_statements:
      - context: span
        statements:
          - set(attributes["http.route"], Concat(["/api/v1/", attributes["operation"]], ""))
          - replace_pattern(attributes["user.email"], "(.*)@example.com", "$$1@redacted.com")
Data Export Components
OTLP Exporters: Standardized Backend Communication
The OTLP exporters (both gRPC and HTTP variants) send processed telemetry data to any backend that supports the OpenTelemetry Protocol, providing a vendor-neutral way to transmit data.
Key Features:
- Standardized data format for vendor-neutral transmission
- Choice between gRPC and HTTP transport
- TLS/mTLS support for secure communication
- Compression options for reduced bandwidth usage
Configuration Example:
exporters:
  otlp:
    endpoint: otel-collector:4317
    tls:
      insecure: false
      cert_file: /certs/client.crt
      key_file: /certs/client.key
  otlphttp:
    endpoint: https://backend.example.com:4318
    headers:
      Authorization: "Bearer ${OTLP_TOKEN}"
Building Complete Pipelines
The true power of the OpenTelemetry Collector comes from combining these components into complete pipelines. Here's an example configuration that brings together many of the components we've discussed:
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
  prometheus:
    config:
      scrape_configs:
        - job_name: 'kubernetes-pods'
          kubernetes_sd_configs:
            - role: pod
  kubeletstats:
    collection_interval: 20s
    auth_type: serviceAccount
  filelog:
    include:
      - /var/log/pods/*/*/*.log

processors:
  batch:
    send_batch_size: 10000
    timeout: 5s
  k8sattributes:
    auth_type: serviceAccount
    extract:
      metadata:
        - k8s.pod.name
        - k8s.namespace.name
        - k8s.deployment.name
  filter:
    metrics:
      include:
        match_type: regexp
        metric_names: ["^(system|application).*"]

exporters:
  otlp:
    endpoint: backend.example.com:4317
    tls:
      insecure: false

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [k8sattributes, batch]
      exporters: [otlp]
    metrics:
      receivers: [otlp, prometheus, kubeletstats]
      processors: [k8sattributes, filter, batch]
      exporters: [otlp]
    logs:
      receivers: [otlp, filelog]
      processors: [k8sattributes, batch]
      exporters: [otlp]
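Assuming this configuration is saved as config.yaml, you can hand it to the Contrib distribution directly:
otelcol-contrib --config=config.yaml
Or, as a rough sketch, run it via the upstream Docker image (the mount path below reflects the image's default config location; verify it for the version you use):
docker run --rm \
  -v $(pwd)/config.yaml:/etc/otelcol-contrib/config.yaml \
  -p 4317:4317 -p 4318:4318 \
  otel/opentelemetry-collector-contrib:latest
Keep in mind that the kubeletstats, k8sattributes, and filelog components in this example expect to run inside a Kubernetes cluster with service account permissions and host log mounts, so in practice you would typically deploy this through the OpenTelemetry Helm chart or Operator rather than a standalone container.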
Validating and Visualizing Your Configuration
When working with complex OpenTelemetry Collector configurations, visualization tools can be invaluable for understanding data flows and identifying potential issues. Tools like OTelBin allow you to validate your configuration against various collector distributions and visualize your pipelines graphically.
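The collector binary itself can also check a configuration without starting any pipelines. On recent releases, a quick sanity check looks like this (assuming the Contrib binary and a file named config.yaml):
otelcol-contrib validate --config=config.yaml
This catches YAML syntax errors and unknown component names before you roll the configuration out.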
Final thoughts
The OpenTelemetry Collector provides a powerful framework for building customized observability pipelines. By understanding the key components described in this article, you can create efficient, scalable telemetry processing systems that meet your specific needs.
Starting with the Contrib distribution is recommended, as it includes all the components we've covered. Remember that different distributions may support different components, so always verify compatibility with your chosen distribution.
As your observability needs evolve, you can incrementally enhance your collector configuration, adding new components and capabilities to address emerging requirements. This flexibility is one of the OpenTelemetry Collector's greatest strengths, allowing your observability infrastructure to grow alongside your applications.