
Essential OpenTelemetry Collector Components for Observability Pipelines

Ayooluwa Isaiah
Updated on March 17, 2025

Modern applications generate massive amounts of operational data. Capturing, processing, and routing this telemetry data efficiently presents significant challenges for engineering teams. The OpenTelemetry Collector addresses these challenges by providing a unified, vendor-neutral way to handle traces, metrics, and logs.

At its core, the OpenTelemetry Collector functions as a middleware layer between your applications and observability backends. Rather than instrumenting your code to send data to multiple destinations, the Collector acts as a central hub, receiving data from various sources and distributing it to multiple backends while handling tasks like batching, filtering, and transformation.

Understanding the OpenTelemetry Collector Architecture

The OpenTelemetry Collector's pipeline architecture consists of five fundamental building blocks:

  1. Receivers: Entry points that ingest telemetry data through various protocols and formats
  2. Processors: Components that modify, enhance, filter, or aggregate telemetry data
  3. Exporters: Components that send data to observability backends and storage systems
  4. Connectors: Elements that bridge different pipelines and convert between signal types
  5. Extensions: Add-ons that enhance core functionality with capabilities like health monitoring and authentication

These components connect through pipelines defined in the Collector's configuration, allowing for customizable data flows tailored to specific observability requirements.
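As a minimal sketch of how this wiring looks in practice (the component choices here are illustrative; the `debug` exporter simply prints telemetry to the collector's console), a pipeline in the `service` section references components defined in the sections above it:

```yaml
receivers:
  otlp:              # entry point for OTLP data
    protocols:
      grpc:
processors:
  batch:             # group items before export
exporters:
  debug:             # print telemetry to the collector's console
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug]
```

A component only becomes active when a pipeline names it; defining it in its own section is not enough.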

Let's explore the most valuable components you should consider for your observability pipeline. You can find a comprehensive set of components in the OpenTelemetry Contrib repository.


Data Ingestion Components

The OTLP Receiver: Your Universal Data Ingress

The OpenTelemetry Protocol (OTLP) receiver serves as the standard entry point for telemetry data in the OpenTelemetry ecosystem. Supporting both gRPC and HTTP protocols, it provides a consistent, efficient way to ingest traces, metrics, and logs from applications instrumented with any OpenTelemetry SDK.

Key Features:

  • Protocol flexibility with both gRPC and HTTP support
  • Comprehensive signal type support (traces, metrics, logs)
  • Efficient binary encoding for reduced network overhead
  • Standardized data format that preserves semantic meaning

Configuration Example:

otelcol.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

Filelog Receiver

While many observability tools focus primarily on traces and metrics, logs remain essential for troubleshooting and understanding application behavior. The Filelog receiver bridges this gap by collecting logs directly from files on disk.

Key Features:

  • Real-time log file tailing
  • Support for various log formats including JSON and multiline logs
  • Container log parsing (Docker, CRI-O, ContainerD)
  • Kubernetes metadata extraction from container logs
  • Flexible include/exclude path patterns

Configuration Example:

otelcol.yaml
receivers:
  filelog:
    include:
      - /var/log/applications/*.log
    exclude:
      - /var/log/applications/debug*.log
    operators:
      - type: regex_parser
        regex: '^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?P<severity>[A-Z]+) (?P<message>.*)$'

Prometheus Receiver

For organizations already using Prometheus, the Prometheus receiver provides a seamless integration path. Despite its name, this component actively scrapes metrics from targets using Prometheus configurations, rather than passively receiving data.

Key Features:

  • Compatible with existing Prometheus scrape configurations
  • Support for service discovery mechanisms (including Kubernetes)
  • Metric relabeling and filtering
  • OpenMetrics format support with exemplars

Configuration Example:

otelcol.yaml
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: 'api-service'
          scrape_interval: 10s
          static_configs:
            - targets: ['api-service:8080']
        - job_name: 'kubernetes-pods'
          kubernetes_sd_configs:
            - role: pod
          relabel_configs:
            - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
              action: keep
              regex: true

Kubeletstats Receiver

For Kubernetes environments, the Kubeletstats receiver provides specialized metrics collection directly from the kubelet API on each node, offering detailed resource utilization data for nodes, pods, and containers.

Key Features:

  • Comprehensive Kubernetes resource metrics
  • Direct metrics collection without requiring Prometheus
  • Hierarchical data from node to pod to container
  • Volume metrics for storage analysis

Configuration Example:

otelcol.yaml
receivers:
  kubeletstats:
    collection_interval: 20s
    auth_type: serviceAccount
    endpoint: ${K8S_NODE_NAME}:10250
    # The kubelet often serves a self-signed certificate; only skip
    # verification where that risk is acceptable
    insecure_skip_verify: true

Data Processing Components

Batch Processor

The Batch processor significantly improves collector performance by grouping individual telemetry items into batches before forwarding them to exporters, reducing network overhead and backend load.

Key Features:

  • Configurable batch size and timeout parameters
  • Reduced network traffic and API calls
  • Improved throughput for high-volume telemetry
  • Lower backend database load

Configuration Example:

otelcol.yaml
processors:
  batch:
    send_batch_size: 8192     # batch size that triggers an immediate send
    timeout: 5s               # flush whatever has accumulated after 5s
    send_batch_max_size: 0    # 0 imposes no upper limit on batch size

K8sattributes Processor: Enriching Telemetry with Kubernetes Context

In Kubernetes environments, understanding which pods, deployments, and namespaces generate telemetry is crucial. The K8sattributes processor automatically enriches telemetry data with Kubernetes metadata.

Key Features:

  • Pod, namespace, and deployment identification
  • Node and cluster information
  • Label and annotation extraction
  • Owner references (ReplicaSets, DaemonSets, etc.)

Configuration Example:

otelcol.yaml
processors:
  k8sattributes:
    auth_type: serviceAccount
    passthrough: false
    extract:
      metadata:
        - k8s.pod.name
        - k8s.pod.uid
        - k8s.deployment.name
        - k8s.namespace.name
        - k8s.node.name
        - k8s.pod.start_time

Attributes Processor: Customizing Your Telemetry Data

The Attributes processor provides fine-grained control over the attributes attached to your telemetry data, allowing you to insert, update, delete, or hash specific attributes based on your requirements.

Key Features:

  • Attribute manipulation (insert, update, delete, hash)
  • Support for all telemetry types (traces, metrics, logs)
  • Regular expression matching for attribute selection
  • Conditional attribute processing

Configuration Example:

otelcol.yaml
processors:
  attributes:
    actions:
      - key: db.statement
        action: hash
      - key: environment
        value: production
        action: insert
      - key: http.url
        action: delete

Filter Processor: Reducing Data Volume

The Filter processor allows you to selectively include or exclude telemetry data based on specific criteria, helping reduce data volume while preserving the most valuable signals.

Key Features:

  • Independent filtering for traces, metrics, and logs
  • Regular expression and strict matching
  • OTTL-based filtering expressions
  • Metric name and value-based filtering

Configuration Example:

otelcol.yaml
processors:
  filter:
    metrics:
      include:
        match_type: regexp
        metric_names: ["^(system|application).*"]
    spans:
      exclude:
        match_type: strict
        attributes:
          - key: http.route
            value: /health

Tail Sampling Processor

For high-volume tracing, examining every trace becomes impractical. The Tail Sampling processor applies sampling decisions after receiving complete traces, allowing you to keep traces that match specific criteria.

Key Features:

  • Sampling based on trace attributes and properties
  • Multiple policy types (rate limiting, numeric attribute, probabilistic)
  • Error and latency-based sampling
  • Complex policy combinations

Configuration Example:

otelcol.yaml
processors:
  tail_sampling:
    decision_wait: 10s               # how long to buffer spans before deciding
    num_traces: 100                  # traces held in memory awaiting a decision
    expected_new_traces_per_sec: 10
    policies:
      - name: error-policy
        type: status_code
        status_code:
          status_codes: [ERROR]
      - name: latency-policy
        type: latency
        latency:
          threshold_ms: 500

Transform Processor: Advanced Data Manipulation

The Transform processor leverages the OpenTelemetry Transformation Language (OTTL) to perform complex data transformations beyond what the Attributes processor can achieve.

Key Features:

  • Conditional logic with OTTL expressions
  • Advanced string and numeric operations
  • Field and attribute manipulations
  • Access to nested data structures

Configuration Example:

otelcol.yaml
processors:
  transform:
    metric_statements:
      - context: datapoint
        statements:
          - set(value, value * 1000) where metric.name == "duration_seconds"
    trace_statements:
      - context: span
        statements:
          - set(attributes["http.route"], Concat(["/api/v1/", attributes["operation"]], ""))
          - replace_pattern(attributes["user.email"], "(.*)@example.com", "$$1@redacted.com")

Data Export Components

OTLP Exporters: Standardized Backend Communication

The OTLP exporters (both gRPC and HTTP variants) send processed telemetry data to any backend that supports the OpenTelemetry Protocol, providing a vendor-neutral way to transmit data.

Key Features:

  • Standardized data format for vendor-neutral transmission
  • Choice between gRPC and HTTP transport
  • TLS/mTLS support for secure communication
  • Compression options for reduced bandwidth usage

Configuration Example:

otelcol.yaml
exporters:
  otlp:
    endpoint: otel-collector:4317
    tls:
      insecure: false
      cert_file: /certs/client.crt
      key_file: /certs/client.key
  otlphttp:
    endpoint: https://backend.example.com:4318
    headers:
      Authorization: "Bearer ${OTLP_TOKEN}"

Building Complete Pipelines

The true power of the OpenTelemetry Collector comes from combining these components into complete pipelines. Here's an example configuration that brings together many of the components we've discussed:

otelcol.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
  prometheus:
    config:
      scrape_configs:
        - job_name: 'kubernetes-pods'
          kubernetes_sd_configs:
            - role: pod
  kubeletstats:
    collection_interval: 20s
    auth_type: serviceAccount
  filelog:
    include:
      - /var/log/pods/*/*/*.log

processors:
  batch:
    send_batch_size: 10000
    timeout: 5s
  k8sattributes:
    auth_type: serviceAccount
    extract:
      metadata:
        - k8s.pod.name
        - k8s.namespace.name
        - k8s.deployment.name
  filter:
    metrics:
      include:
        match_type: regexp
        metric_names: ["^(system|application).*"]

exporters:
  otlp:
    endpoint: backend.example.com:4317
    tls:
      insecure: false

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [k8sattributes, batch]
      exporters: [otlp]
    metrics:
      receivers: [otlp, prometheus, kubeletstats]
      processors: [k8sattributes, filter, batch]
      exporters: [otlp]
    logs:
      receivers: [otlp, filelog]
      processors: [k8sattributes, batch]
      exporters: [otlp]

Validating and Visualizing Your Configuration


When working with complex OpenTelemetry Collector configurations, visualization tools can be invaluable for understanding data flows and identifying potential issues. Tools like OTelBin allow you to validate your configuration against various collector distributions and visualize your pipelines graphically.
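The collector binary itself can also check a configuration before deployment. As a quick sketch (assuming an `otelcol` or `otelcol-contrib` binary on your PATH and a configuration file named `otelcol.yaml`):

```shell
# Parse the configuration and verify component references without
# starting any pipelines; exits with a non-zero status if the config
# is invalid
otelcol validate --config=otelcol.yaml
```

Running this in CI catches schema errors and references to undefined components before they reach production.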

Final thoughts

The OpenTelemetry Collector provides a powerful framework for building customized observability pipelines. By understanding the key components described in this article, you can create efficient, scalable telemetry processing systems that meet your specific needs.

Starting with the Contrib distribution is recommended, as it includes all the components we've covered. Remember that different distributions may support different components, so always verify compatibility with your chosen distribution.
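One way to check what a given distribution ships is its `components` subcommand, which prints the available receivers, processors, exporters, connectors, and extensions (shown here for the Contrib binary; the binary name depends on the distribution you installed):

```shell
# List every component compiled into this distribution, as YAML
otelcol-contrib components
```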

As your observability needs evolve, you can incrementally enhance your collector configuration, adding new components and capabilities to address emerging requirements. This flexibility is one of the OpenTelemetry Collector's greatest strengths, allowing your observability infrastructure to grow alongside your applications.

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC-BY-NC-SA).
