Essential OpenTelemetry Collector Components for Observability Pipelines
Modern applications generate massive amounts of operational data. Capturing, processing, and routing this telemetry data efficiently presents significant challenges for engineering teams. The OpenTelemetry Collector addresses these challenges by providing a unified, vendor-neutral way to handle traces, metrics, and logs.
At its core, the OpenTelemetry Collector functions as a middleware layer between your applications and observability backends. Rather than instrumenting your code to send data to multiple destinations, the Collector acts as a central hub, receiving data from various sources and distributing it to multiple backends while handling tasks like batching, filtering, and transformation.
Understanding the OpenTelemetry Collector Architecture
The OpenTelemetry Collector's pipeline architecture consists of five fundamental building blocks:
- Receivers: Entry points that ingest telemetry data through various protocols and formats
- Processors: Components that modify, enhance, filter, or aggregate telemetry data
- Exporters: Components that send data to observability backends and storage systems
- Connectors: Elements that bridge different pipelines and convert between signal types
- Extensions: Add-ons that enhance core functionality with capabilities like health monitoring and authentication
These components connect through pipelines defined in the Collector's configuration, allowing for customizable data flows tailored to specific observability requirements.
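To make this concrete, here is a minimal sketch of a configuration that wires a single receiver, processor, and exporter into a traces pipeline, with a health check extension enabled. The endpoints and backend address are placeholders rather than a prescribed setup:
extensions:
  health_check:
    endpoint: 0.0.0.0:13133
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
processors:
  batch:
exporters:
  otlp:
    endpoint: backend.example.com:4317
service:
  extensions: [health_check]
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]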
Let's explore the most valuable components you should consider for your observability pipeline. You can find a comprehensive set of components in the OpenTelemetry Contrib repository.
Data Ingestion Components
The OTLP Receiver: Your Universal Data Ingress
The OpenTelemetry Protocol (OTLP) receiver serves as the standard entry point for telemetry data in the OpenTelemetry ecosystem. Supporting both gRPC and HTTP protocols, it provides a consistent, efficient way to ingest traces, metrics, and logs from applications instrumented with any OpenTelemetry SDK.
Key Features:
- Protocol flexibility with both gRPC and HTTP support
- Comprehensive signal type support (traces, metrics, logs)
- Efficient binary encoding for reduced network overhead
- Standardized data format that preserves semantic meaning
Configuration Example:
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
Filelog Receiver
While many observability tools focus primarily on traces and metrics, logs remain essential for troubleshooting and understanding application behavior. The Filelog receiver bridges this gap by collecting logs directly from files on disk.
Key Features:
- Real-time log file tailing
- Support for various log formats including JSON and multiline logs
- Container log parsing (Docker, CRI-O, ContainerD)
- Kubernetes metadata extraction from container logs
- Flexible include/exclude path patterns
Configuration Example:
receivers:
  filelog:
    include:
      - /var/log/applications/*.log
    exclude:
      - /var/log/applications/debug*.log
    operators:
      - type: regex_parser
        regex: '^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?P<severity>[A-Z]+) (?P<message>.*)$'
Prometheus Receiver
For organizations already using Prometheus, the Prometheus receiver provides a seamless integration path. Despite its name, this component actively scrapes metrics from targets using Prometheus configurations, rather than passively receiving data.
Key Features:
- Compatible with existing Prometheus scrape configurations
- Support for service discovery mechanisms (including Kubernetes)
- Metric relabeling and filtering
- OpenMetrics format support with exemplars
Configuration Example:
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: 'api-service'
          scrape_interval: 10s
          static_configs:
            - targets: ['api-service:8080']
        - job_name: 'kubernetes-pods'
          kubernetes_sd_configs:
            - role: pod
          relabel_configs:
            - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
              action: keep
              regex: true
Kubeletstats Receiver
For Kubernetes environments, the Kubeletstats receiver provides specialized metrics collection directly from the kubelet API on each node, offering detailed resource utilization data for nodes, pods, and containers.
Key Features:
- Comprehensive Kubernetes resource metrics
- Direct metrics collection without requiring Prometheus
- Hierarchical data from node to pod to container
- Volume metrics for storage analysis
Configuration Example:
receivers:
  kubeletstats:
    collection_interval: 20s
    auth_type: serviceAccount
    endpoint: ${K8S_NODE_NAME}:10250
    insecure_skip_verify: true
Data Processing Components
Batch Processor
The Batch processor significantly improves collector performance by grouping individual telemetry items into batches before forwarding them to exporters, reducing network overhead and backend load.
Key Features:
- Configurable batch size and timeout parameters
- Reduced network traffic and API calls
- Improved throughput for high-volume telemetry
- Lower backend database load
Configuration Example:
processors:
  batch:
    send_batch_size: 8192
    timeout: 5s
    send_batch_max_size: 0
K8sattributes Processor: Enriching Telemetry with Kubernetes Context
In Kubernetes environments, understanding which pods, deployments, and namespaces generate telemetry is crucial. The K8sattributes processor automatically enriches telemetry data with Kubernetes metadata.
Key Features:
- Pod, namespace, and deployment identification
- Node and cluster information
- Label and annotation extraction
- Owner references (ReplicaSets, DaemonSets, etc.)
Configuration Example:
processors:
  k8sattributes:
    auth_type: serviceAccount
    passthrough: false
    extract:
      metadata:
        - k8s.pod.name
        - k8s.pod.uid
        - k8s.deployment.name
        - k8s.namespace.name
        - k8s.node.name
        - k8s.pod.start_time
Attributes Processor: Customizing Your Telemetry Data
The Attributes processor provides fine-grained control over the attributes attached to your telemetry data, allowing you to insert, update, delete, or hash specific attributes based on your requirements.
Key Features:
- Attribute manipulation (insert, update, delete, hash)
- Support for all telemetry types (traces, metrics, logs)
- Regular expression matching for attribute selection
- Conditional attribute processing
Configuration Example:
processors:
  attributes:
    actions:
      - key: db.statement
        action: hash
      - key: environment
        value: production
        action: insert
      - key: http.url
        action: delete
Filter Processor: Reducing Data Volume
The Filter processor allows you to selectively include or exclude telemetry data based on specific criteria, helping reduce data volume while preserving the most valuable signals.
Key Features:
- Independent filtering for traces, metrics, and logs
- Regular expression and strict matching
- OTTL-based filtering expressions
- Metric name and value-based filtering
Configuration Example:
processors:
  filter:
    metrics:
      include:
        match_type: regexp
        metric_names: ["^(system|application).*"]
    spans:
      exclude:
        match_type: strict
        attributes:
          - key: http.route
            value: /health
Tail Sampling Processor
In high-volume tracing environments, storing every trace quickly becomes impractical. The Tail Sampling processor makes sampling decisions only after complete traces have been received, allowing you to keep the traces that match specific criteria.
Key Features:
- Sampling based on trace attributes and properties
- Multiple policy types (rate limiting, numeric attribute, probabilistic)
- Error and latency-based sampling
- Complex policy combinations
Configuration Example:
processors:
  tail_sampling:
    decision_wait: 10s
    num_traces: 100
    expected_new_traces_per_sec: 10
    policies:
      - name: error-policy
        type: status_code
        status_code:
          status_codes: [ERROR]
      - name: latency-policy
        type: latency
        latency:
          threshold_ms: 500
Transform Processor: Advanced Data Manipulation
The Transform processor leverages the OpenTelemetry Transformation Language (OTTL) to perform complex data transformations beyond what the Attributes processor can achieve.
Key Features:
- Conditional logic with OTTL expressions
- Advanced string and numeric operations
- Field and attribute manipulations
- Access to nested data structures
Configuration Example:
processors:
  transform:
    metric_statements:
      - context: datapoint
        statements:
          - set(value_double, value_double * 1000) where metric.name == "duration_seconds"
    trace_statements:
      - context: span
        statements:
          - set(attributes["http.route"], Concat(["/api/v1/", attributes["operation"]], ""))
          - replace_pattern(attributes["user.email"], "(.*)@example.com", "$$1@redacted.com")
Data Export Components
OTLP Exporters: Standardized Backend Communication
The OTLP exporters (both gRPC and HTTP variants) send processed telemetry data to any backend that supports the OpenTelemetry Protocol, providing a vendor-neutral way to transmit data.
Key Features:
- Standardized data format for vendor-neutral transmission
- Choice between gRPC and HTTP transport
- TLS/mTLS support for secure communication
- Compression options for reduced bandwidth usage
Configuration Example:
exporters:
  otlp:
    endpoint: otel-collector:4317
    tls:
      insecure: false
      cert_file: /certs/client.crt
      key_file: /certs/client.key
  otlphttp:
    endpoint: https://backend.example.com:4318
    headers:
      Authorization: "Bearer ${OTLP_TOKEN}"
Building Complete Pipelines
The true power of the OpenTelemetry Collector comes from combining these components into complete pipelines. Here's an example configuration that brings together many of the components we've discussed:
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
  prometheus:
    config:
      scrape_configs:
        - job_name: 'kubernetes-pods'
          kubernetes_sd_configs:
            - role: pod
  kubeletstats:
    collection_interval: 20s
    auth_type: serviceAccount
  filelog:
    include:
      - /var/log/pods/*/*/*.log

processors:
  batch:
    send_batch_size: 10000
    timeout: 5s
  k8sattributes:
    auth_type: serviceAccount
    extract:
      metadata:
        - k8s.pod.name
        - k8s.namespace.name
        - k8s.deployment.name
  filter:
    metrics:
      include:
        match_type: regexp
        metric_names: ["^(system|application).*"]

exporters:
  otlp:
    endpoint: backend.example.com:4317
    tls:
      insecure: false

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [k8sattributes, batch]
      exporters: [otlp]
    metrics:
      receivers: [otlp, prometheus, kubeletstats]
      processors: [k8sattributes, filter, batch]
      exporters: [otlp]
    logs:
      receivers: [otlp, filelog]
      processors: [k8sattributes, batch]
      exporters: [otlp]
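Assuming this configuration is saved as config.yaml, you can hand it to the Contrib distribution directly:
otelcol-contrib --config=config.yaml
Or, as a rough sketch, run it via the upstream Docker image (the mount path below reflects the image's default config location; verify it for the version you use):
docker run --rm \
  -v $(pwd)/config.yaml:/etc/otelcol-contrib/config.yaml \
  -p 4317:4317 -p 4318:4318 \
  otel/opentelemetry-collector-contrib:latest
Keep in mind that the kubeletstats, k8sattributes, and filelog components in this example expect to run inside a Kubernetes cluster with service account permissions and host log mounts, so in practice you would typically deploy this through the OpenTelemetry Helm chart or Operator rather than a standalone container.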
Validating and Visualizing Your Configuration
When working with complex OpenTelemetry Collector configurations, visualization tools can be invaluable for understanding data flows and identifying potential issues. Tools like OTelBin allow you to validate your configuration against various collector distributions and visualize your pipelines graphically.
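The collector binary itself can also check a configuration without starting any pipelines. On recent releases, a quick sanity check looks like this (assuming the Contrib binary and a file named config.yaml):
otelcol-contrib validate --config=config.yaml
This catches YAML syntax errors and unknown component names before you roll the configuration out.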
Final thoughts
The OpenTelemetry Collector provides a powerful framework for building customized observability pipelines. By understanding the key components described in this article, you can create efficient, scalable telemetry processing systems that meet your specific needs.
Starting with the Contrib distribution is recommended, as it includes all the components we've covered. Remember that different distributions may support different components, so always verify compatibility with your chosen distribution.
As your observability needs evolve, you can incrementally enhance your collector configuration, adding new components and capabilities to address emerging requirements. This flexibility is one of the OpenTelemetry Collector's greatest strengths, allowing your observability infrastructure to grow alongside your applications.