# Essential OpenTelemetry Collector Components for Observability Pipelines

Modern applications generate enormous volumes of operational data. Capturing,
processing, and routing this telemetry efficiently presents significant
challenges for engineering teams. The OpenTelemetry Collector addresses these
challenges by providing a unified, vendor-neutral way to handle traces, metrics,
and logs.

At its core, the OpenTelemetry Collector functions as a middleware layer between
your applications and observability backends. Rather than instrumenting your
code to send data to multiple destinations, the Collector acts as a central hub,
receiving data from various sources and distributing it to multiple backends
while handling tasks like batching, filtering, and transformation.

<iframe width="100%" height="315" src="https://www.youtube.com/embed/uRA7qee4Frg" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>

## Understanding the OpenTelemetry Collector Architecture

The OpenTelemetry Collector's pipeline architecture consists of five fundamental
building blocks:

1. **Receivers**: Entry points that ingest telemetry data through various
   protocols and formats
2. **Processors**: Components that modify, enhance, filter, or aggregate
   telemetry data
3. **Exporters**: Components that send data to observability backends and
   storage systems
4. **Connectors**: Elements that bridge different pipelines and convert between
   signal types
5. **Extensions**: Add-ons that enhance core functionality with capabilities
   like health monitoring and authentication

These components connect through pipelines defined in the Collector's
configuration, allowing for customizable data flows tailored to specific
observability requirements.
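As a sketch, a minimal configuration wiring all five building blocks together might look like this (the `health_check` extension, `count` connector, and `debug` exporter are illustrative choices, not requirements):

```yaml
extensions:
  health_check: {}              # extension: exposes a health-check endpoint

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  batch: {}

exporters:
  debug: {}                     # writes telemetry to the collector's own log

connectors:
  count: {}                     # counts spans and re-emits the counts as metrics

service:
  extensions: [health_check]
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug, count] # the connector acts as a trace exporter here...
    metrics:
      receivers: [count]        # ...and as a metrics receiver here
      processors: [batch]
      exporters: [debug]
```

Note how the `count` connector appears as an exporter in one pipeline and a receiver in another; this dual role is exactly how connectors bridge pipelines and convert between signal types.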

Let's explore the most valuable components you should consider for your
observability pipeline. You can find a comprehensive set of components in the [OpenTelemetry Contrib repository](https://github.com/open-telemetry/opentelemetry-collector-contrib).

![The OpenTelemetry Collector Contrib repository on GitHub](https://imagedelivery.net/xZXo0QFi-1_4Zimer-T0XQ/2c6f6aed-7142-461e-4577-645ec18f6700/lg1x =1200x600)

## Data Ingestion Components

### The OTLP Receiver: Your Universal Data Ingress

[The OpenTelemetry Protocol (OTLP) receiver](https://github.com/open-telemetry/opentelemetry-collector/tree/main/receiver/otlpreceiver)
serves as the standard entry point for telemetry data in the OpenTelemetry
ecosystem. Supporting both gRPC and HTTP protocols, it provides a consistent,
efficient way to ingest traces, metrics, and logs from applications instrumented
with any OpenTelemetry SDK.

**Key Features:**

- Protocol flexibility with both gRPC and HTTP support
- Comprehensive signal type support (traces, metrics, logs)
- Efficient binary encoding for reduced network overhead
- Standardized data format that preserves semantic meaning

**Configuration Example:**

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
```

### Filelog Receiver

While many observability tools focus primarily on traces and metrics, logs
remain essential for troubleshooting and understanding application behavior. The
[Filelog receiver](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/filelogreceiver)
bridges this gap by collecting logs directly from files on disk.

**Key Features:**

- Real-time log file tailing
- Support for various log formats including JSON and multiline logs
- Container log parsing (Docker, CRI-O, ContainerD)
- Kubernetes metadata extraction from container logs
- Flexible include/exclude path patterns

**Configuration Example:**

```yaml
[label otelcol.yaml]
receivers:
  filelog:
    include:
      - /var/log/applications/*.log
    exclude:
      - /var/log/applications/debug*.log
    operators:
      - type: regex_parser
        regex: '^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?P<severity>[A-Z]+) (?P<message>.*)$'
```
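For the container log formats listed above, recent Collector versions also ship a dedicated `container` operator that detects and parses Docker, containerd, and CRI-O output in one step; a minimal sketch:

```yaml
receivers:
  filelog:
    include:
      - /var/log/pods/*/*/*.log
    operators:
      - type: container   # auto-detects Docker JSON, containerd, and CRI-O formats
```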

### Prometheus Receiver

For organizations already using [Prometheus](https://betterstack.com/community/guides/monitoring/prometheus/), the
[Prometheus receiver](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/prometheusreceiver)
provides a seamless integration path. Despite its name, this component actively
scrapes metrics from targets using Prometheus configurations, rather than
passively receiving data.

**Key Features:**

- Compatible with existing Prometheus scrape configurations
- Support for service discovery mechanisms (including Kubernetes)
- Metric relabeling and filtering
- OpenMetrics format support with exemplars

**Configuration Example:**

```yaml
[label otelcol.yaml]
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: 'api-service'
          scrape_interval: 10s
          static_configs:
            - targets: ['api-service:8080']
        - job_name: 'kubernetes-pods'
          kubernetes_sd_configs:
            - role: pod
          relabel_configs:
            - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
              action: keep
              regex: true
```

### Kubeletstats Receiver

For Kubernetes environments, the
[Kubeletstats receiver](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/kubeletstatsreceiver)
provides specialized metrics collection directly from the kubelet API on each
node, offering detailed resource utilization data for nodes, pods, and
containers.

**Key Features:**

- Comprehensive Kubernetes resource metrics
- Direct metrics collection without requiring Prometheus
- Hierarchical data from node to pod to container
- Volume metrics for storage analysis

**Configuration Example:**

```yaml
receivers:
  kubeletstats:
    collection_interval: 20s
    auth_type: serviceAccount
    endpoint: ${K8S_NODE_NAME}:10250
    insecure_skip_verify: true
```
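If you only need part of the node-to-pod-to-container hierarchy, the collected set can be narrowed with the `metric_groups` option; a sketch using the supported group names:

```yaml
receivers:
  kubeletstats:
    collection_interval: 20s
    auth_type: serviceAccount
    endpoint: ${K8S_NODE_NAME}:10250
    metric_groups:
      - node
      - pod
      - volume    # omitting "container" skips per-container metrics
```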

## Data Processing Components

### Batch Processor

[The Batch processor](https://github.com/open-telemetry/opentelemetry-collector/tree/main/processor/batchprocessor)
significantly improves collector performance by grouping individual telemetry
items into batches before forwarding them to exporters, reducing network
overhead and backend load.

**Key Features:**

- Configurable batch size and timeout parameters
- Reduced network traffic and API calls
- Improved throughput for high-volume telemetry
- Lower backend database load

**Configuration Example:**

```yaml
processors:
  batch:
    send_batch_size: 8192
    timeout: 5s
    send_batch_max_size: 0
```

### K8sattributes Processor: Enriching Telemetry with Kubernetes Context

In Kubernetes environments, understanding which pods, deployments, and
namespaces generate telemetry is crucial.
[The K8sattributes processor](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/k8sattributesprocessor)
automatically enriches telemetry data with Kubernetes metadata.

**Key Features:**

- Pod, namespace, and deployment identification
- Node and cluster information
- Label and annotation extraction
- Owner references (ReplicaSets, DaemonSets, etc.)

**Configuration Example:**

```yaml
processors:
  k8sattributes:
    auth_type: serviceAccount
    passthrough: false
    extract:
      metadata:
        - k8s.pod.name
        - k8s.pod.uid
        - k8s.deployment.name
        - k8s.namespace.name
        - k8s.node.name
        - k8s.pod.start_time
```
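The label and annotation extraction mentioned above uses a separate `labels` (or `annotations`) block under `extract`. For example, to promote the standard `app.kubernetes.io/name` pod label to a telemetry attribute (the `app` target name is an illustrative choice):

```yaml
processors:
  k8sattributes:
    extract:
      labels:
        - tag_name: app                 # attribute name to set on telemetry
          key: app.kubernetes.io/name   # pod label to read the value from
          from: pod
```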

### Attributes Processor: Customizing Your Telemetry Data

[The Attributes processor](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/attributesprocessor)
provides fine-grained control over the attributes attached to your telemetry
data, allowing you to insert, update, delete, or hash specific attributes based
on your requirements.

**Key Features:**

- Attribute manipulation (insert, update, delete, hash)
- Support for all telemetry types (traces, metrics, logs)
- Regular expression matching for attribute selection
- Conditional attribute processing

**Configuration Example:**

```yaml
processors:
  attributes:
    actions:
      - key: db.statement
        action: hash
      - key: environment
        value: production
        action: insert
      - key: http.url
        action: delete
```
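The conditional processing mentioned above is expressed through optional `include`/`exclude` blocks. As an illustrative sketch (the `checkout-service` name is hypothetical), this variant applies its actions only to telemetry from a single service:

```yaml
processors:
  attributes/checkout-only:
    include:
      match_type: strict
      services: ["checkout-service"]   # only matching services are processed
    actions:
      - key: tier
        value: critical
        action: upsert
```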

### Filter Processor: Reducing Data Volume

[The Filter processor](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/filterprocessor)
allows you to selectively include or exclude telemetry data based on specific
criteria, helping reduce data volume while preserving the most valuable signals.

**Key Features:**

- Independent filtering for traces, metrics, and logs
- Regular expression and strict matching
- OTTL-based filtering expressions
- Metric name and value-based filtering

**Configuration Example:**

```yaml
processors:
  filter:
    metrics:
      include:
        match_type: regexp
        metric_names: ["^(system|application).*"]
    spans:
      exclude:
        match_type: strict
        attributes:
          - key: http.route
            value: /health
```
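The OTTL-based filtering mentioned above uses a different configuration shape, expressing drop conditions directly as OTTL expressions; a sketch that discards health-check spans:

```yaml
processors:
  filter/drop-healthchecks:
    error_mode: ignore   # skip data that fails to evaluate instead of erroring
    traces:
      span:
        - attributes["http.route"] == "/health"
```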

### Tail Sampling Processor

When tracing at high volume, keeping every trace becomes impractical.
[The Tail Sampling processor](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/tailsamplingprocessor)
applies sampling decisions after receiving complete traces, allowing you to keep
traces that match specific criteria.

**Key Features:**

- Sampling based on trace attributes and properties
- Multiple policy types (rate limiting, numeric attribute, probabilistic)
- Error and latency-based sampling
- Complex policy combinations

**Configuration Example:**

```yaml
processors:
  tail_sampling:
    decision_wait: 10s
    num_traces: 100
    expected_new_traces_per_sec: 10
    policies:
      - name: error-policy
        type: status_code
        status_code:
          status_codes: [ERROR]
      - name: latency-policy
        type: latency
        latency:
          threshold_ms: 500
```
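The complex policy combinations mentioned above can be expressed with the `and` policy type, which keeps a trace only when every sub-policy matches; a sketch combining latency with a service filter (the `checkout` service name is hypothetical):

```yaml
processors:
  tail_sampling:
    policies:
      - name: slow-checkout
        type: and
        and:
          and_sub_policy:
            - name: slow
              type: latency
              latency:
                threshold_ms: 500
            - name: checkout-only
              type: string_attribute
              string_attribute:
                key: service.name
                values: [checkout]
```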

### Transform Processor: Advanced Data Manipulation

[The Transform processor](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/transformprocessor)
leverages the [OpenTelemetry Transformation Language (OTTL)](https://betterstack.com/community/guides/observability/ottl/) to perform
complex data transformations beyond what the Attributes processor can achieve.

**Key Features:**

- Conditional logic with OTTL expressions
- Advanced string and numeric operations
- Field and attribute manipulations
- Access to nested data structures

**Configuration Example:**

```yaml
processors:
  transform:
    metric_statements:
      - context: datapoint
        statements:
          - set(value_double, value_double * 1000) where metric.name == "duration_seconds"
    trace_statements:
      - context: span
        statements:
          - set(attributes["http.route"], Concat(["/api/v1/", attributes["operation"]], ""))
          - replace_pattern(attributes["user.email"], "(.*)@example.com", "$$1@redacted.com")
```

## Data Export Components

### OTLP Exporters: Standardized Backend Communication

The OTLP exporters (both gRPC and HTTP variants) send processed telemetry data
to any backend that supports the OpenTelemetry Protocol, providing a
vendor-neutral way to transmit data.

**Key Features:**

- Standardized data format for vendor-neutral transmission
- Choice between gRPC and HTTP transport
- TLS/mTLS support for secure communication
- Compression options for reduced bandwidth usage

**Configuration Example:**

```yaml
exporters:
  otlp:
    endpoint: otel-collector:4317
    tls:
      insecure: false
      cert_file: /certs/client.crt
      key_file: /certs/client.key
  otlphttp:
    endpoint: https://backend.example.com:4318
    headers:
      Authorization: "Bearer ${OTLP_TOKEN}"
```
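Beyond transport and TLS, the OTLP exporters share the Collector's common retry and queuing options, which help ride out transient backend outages; a sketch of a more resilient setup (values are illustrative):

```yaml
exporters:
  otlp:
    endpoint: otel-collector:4317
    compression: gzip          # reduce bandwidth at the cost of some CPU
    retry_on_failure:
      enabled: true
      initial_interval: 5s
      max_interval: 30s
      max_elapsed_time: 300s   # give up on a batch after 5 minutes
    sending_queue:
      enabled: true
      queue_size: 1000         # batches buffered while the backend is unreachable
```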

## Building Complete Pipelines

The true power of the OpenTelemetry Collector comes from combining these
components into complete pipelines. Here's an example configuration that brings
together many of the components we've discussed:

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
  prometheus:
    config:
      scrape_configs:
        - job_name: 'kubernetes-pods'
          kubernetes_sd_configs:
            - role: pod
  kubeletstats:
    collection_interval: 20s
    auth_type: serviceAccount
  filelog:
    include:
      - /var/log/pods/*/*/*.log

processors:
  batch:
    send_batch_size: 10000
    timeout: 5s
  k8sattributes:
    auth_type: serviceAccount
    extract:
      metadata:
        - k8s.pod.name
        - k8s.namespace.name
        - k8s.deployment.name
  filter:
    metrics:
      include:
        match_type: regexp
        metric_names: ["^(system|application).*"]

exporters:
  otlp:
    endpoint: backend.example.com:4317
    tls:
      insecure: false

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [k8sattributes, batch]
      exporters: [otlp]
    metrics:
      receivers: [otlp, prometheus, kubeletstats]
      processors: [k8sattributes, filter, batch]
      exporters: [otlp]
    logs:
      receivers: [otlp, filelog]
      processors: [k8sattributes, batch]
      exporters: [otlp]
```

## Validating and Visualizing Your Configuration

![OTelBin pipeline visualization by Dash0](https://imagedelivery.net/xZXo0QFi-1_4Zimer-T0XQ/9e1e410e-40ff-45fc-e0c0-8c3e24925e00/public =6143x3226)

When working with complex OpenTelemetry Collector configurations, visualization
tools can be invaluable for understanding data flows and identifying potential
issues. Tools like [OTelBin](https://www.otelbin.io/) allow you to validate your
configuration against various collector distributions and visualize your
pipelines graphically.

## Final thoughts

The OpenTelemetry Collector provides a powerful framework for building
customized observability pipelines. By understanding the key components
described in this article, you can create efficient, scalable telemetry
processing systems that meet your specific needs.

Starting with the Contrib distribution is recommended, as it includes all the
components we've covered. Remember that different distributions may support
different components, so always verify compatibility with your chosen
distribution.

As your observability needs evolve, you can incrementally enhance your collector
configuration, adding new components and capabilities to address emerging
requirements. This flexibility is one of the OpenTelemetry Collector's greatest
strengths, allowing your observability infrastructure to grow alongside your
applications.
