
An Introduction to the OpenTelemetry Protocol (OTLP)

Ayooluwa Isaiah
Updated on September 30, 2024

The increasing complexity, distributed nature, and microservices architecture of modern software systems have made effective observability essential for maintaining application performance and reliability.

OpenTelemetry addresses this challenge by offering a comprehensive toolkit and standards for collecting telemetry data—logs, metrics, and traces—across an application's entire infrastructure.

At the heart of this initiative is the OpenTelemetry Protocol (OTLP), a standardized format for transmitting telemetry data between different components within the OpenTelemetry ecosystem.

OTLP enables you to capture and send data from instrumented applications, route it through OpenTelemetry Collectors, and forward it to various observability backends for analysis.

In this article, we will explore the key features of OTLP, how it works, and how to implement it in your applications to gain actionable insights from your telemetry data.

Prerequisites

Before proceeding with this article, ensure that you're familiar with the basic OpenTelemetry concepts.

What is the OpenTelemetry Protocol?

OTLP is a telemetry data format used to encode, transmit, and deliver telemetry data such as traces, metrics, and logs between the components of the OpenTelemetry ecosystem. This includes instrumented applications, infrastructure, the OTel Collector, and various observability backends.

It is a crucial aspect of the OpenTelemetry project that's designed to ensure that telemetry data, regardless of its source or vendor, can be processed in a consistent manner.

Key features of OTLP

  • Standardization: It provides a unified format for traces, metrics, and logs across various programming languages and platforms, promoting interoperability and vendor neutrality.

  • Flexibility: OTLP leverages gRPC with Protocol Buffers for efficient, real-time communication but it also supports http/protobuf or http/json for environments where gRPC may not be ideal.

  • Efficiency: OTLP is designed for efficient and reliable data transmission, supporting high throughput and low latency scenarios.

  • Extensibility: The protocol buffer-based encoding allows for future additions and extensions without breaking backward compatibility.

  • Semantic conventions: It defines a set of semantic conventions for common attributes and data types, ensuring consistency and meaning across different sources of telemetry data.

Exploring the OTLP Specification

The OpenTelemetry Protocol (OTLP) uses a request-based communication model, where telemetry data is transmitted from a client (sender) to a server (receiver) through individual requests.

Here's how the process typically works:

  • Data collection: An application instrumented with OpenTelemetry collects telemetry data (traces, metrics, and logs) and packages it into an OTLP-compliant request.

  • Data transmission: The request is sent to a server, often an OpenTelemetry Collector or directly to an observability backend.

  • Acknowledgment: The server processes the data and responds to the client with an acknowledgment of successful receipt. If there’s an issue, an error message is returned instead.

OTLP supports two primary transport mechanisms for this request/response interaction:

  • gRPC: A high-performance, bidirectional streaming protocol.
  • HTTP/1.1: A more traditional transport option for environments where gRPC may not be ideal.

Both transport mechanisms use Protocol Buffers (protobuf) to define the structure of the telemetry data payload. Additionally, servers are required to support Gzip compression for payloads, although uncompressed payloads are also accepted.
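
To make this request/response flow concrete, here is a minimal Go sketch that hand-crafts an OTLP/JSON payload containing a single span and POSTs it to a Collector's HTTP endpoint. The endpoint, IDs, and timestamps are illustrative placeholders; in practice, the SDK exporters described below build and send these requests for you.

package main

import (
	"bytes"
	"fmt"
	"net/http"
)

func main() {
	// A single span encoded in OTLP/JSON (camelCase field names, hex-encoded IDs).
	payload := []byte(`{
	  "resourceSpans": [{
	    "resource": {
	      "attributes": [{"key": "service.name", "value": {"stringValue": "demo-service"}}]
	    },
	    "scopeSpans": [{
	      "scope": {"name": "manual-example"},
	      "spans": [{
	        "traceId": "5b8efff798038103d269b633813fc60c",
	        "spanId": "eee19b7ec3c1b174",
	        "name": "GET /checkout",
	        "kind": 2,
	        "startTimeUnixNano": "1727690000000000000",
	        "endTimeUnixNano": "1727690000500000000"
	      }]
	    }]
	  }]
	}`)

	// OTLP/HTTP uses fixed paths per signal: /v1/traces, /v1/metrics, and /v1/logs.
	resp, err := http.Post("http://localhost:4318/v1/traces", "application/json", bytes.NewReader(payload))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// A 200 response is the server's acknowledgment of successful receipt.
	fmt.Println("status:", resp.Status)
}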

Choosing between gRPC and HTTP

When it comes to exporting telemetry data in OpenTelemetry, most SDKs provide options for OTLP transmission over either grpc or http/protobuf, with some exporters (like JavaScript) also supporting http/json.

If you're trying to decide between gRPC and HTTP, consider these points:

  • Ensure that the protocol is supported in the medium of its intended usage. For example, using grpc for exporting is not supported in the browser so you must use http/json or http/protobuf.

  • The default protocol used by SDKs differs per language, and it is sometimes easier to stay with the defaults due to support levels. For instance, the Go SDK defaults to gRPC, while the Node.js SDK defaults to http/protobuf.

  • gRPC often introduces larger dependencies into your code base.

  • gRPC relies on HTTP/2 for transport, which may have varying support across your network infrastructure (think firewalls, proxies, and load balancers).

  • gRPC is generally more efficient and supports streaming, which comes in handy when dealing with larger payloads and higher-throughput scenarios.

Note that the default port for OTLP over gRPC is 4317, while OTLP over HTTP uses port 4318.
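
As a quick illustration, here's a hedged Go sketch showing how a trace exporter can be created over either transport with the otlptracegrpc and otlptracehttp packages; the endpoints simply restate the defaults mentioned above:

package main

import (
	"context"

	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp"
)

func main() {
	ctx := context.Background()

	// OTLP over gRPC: falls back to localhost:4317 when no endpoint is configured.
	grpcExporter, err := otlptracegrpc.New(ctx,
		otlptracegrpc.WithEndpoint("localhost:4317"),
		otlptracegrpc.WithInsecure(), // plain-text connection for local development
	)
	if err != nil {
		panic(err)
	}

	// OTLP over HTTP (http/protobuf): falls back to localhost:4318.
	httpExporter, err := otlptracehttp.New(ctx,
		otlptracehttp.WithEndpoint("localhost:4318"),
		otlptracehttp.WithInsecure(),
		otlptracehttp.WithCompression(otlptracehttp.GzipCompression), // gzip-compress payloads
	)
	if err != nil {
		panic(err)
	}

	// Pass whichever exporter suits your environment to the SDK's tracer provider.
	_, _ = grpcExporter, httpExporter
}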

The OTLP data model

At its core, OTLP establishes a well-defined and structured data model for representing telemetry data, facilitating its consistent and efficient handling throughout the observability pipeline. This model encompasses the three major types of telemetry: traces, metrics, and logs.

Traces

In OTLP, traces are defined wholly by their execution context, namely spans, each of which represents a specific operation within the overall transaction. The entire protocol buffer encoding of a trace can be found here, but below is a simplified overview:

trace.proto
// https://github.com/open-telemetry/opentelemetry-proto/blob/main/opentelemetry/proto/trace/v1/trace.proto
message TracesData {
  repeated ResourceSpans resource_spans = 1;
}

message ResourceSpans {
  opentelemetry.proto.resource.v1.Resource resource = 1;
  repeated ScopeSpans scope_spans = 2;
}

message ScopeSpans {
  opentelemetry.proto.common.v1.InstrumentationScope scope = 1;
  repeated Span spans = 2;
}

message Span {
  bytes trace_id = 1;
  bytes span_id = 2;
  string trace_state = 3;
  bytes parent_span_id = 4;
  fixed32 flags = 16;
  string name = 5;
  enum SpanKind {}
  SpanKind kind = 6;
  fixed64 start_time_unix_nano = 7;
  fixed64 end_time_unix_nano = 8;
  message Event {}
  repeated Event events = 11;
  repeated opentelemetry.proto.common.v1.KeyValue attributes = 9;
  uint32 dropped_attributes_count = 10;
  uint32 dropped_events_count = 12;
  message Link {}
  repeated Link links = 13;
  uint32 dropped_links_count = 14;
  Status status = 15;
}


message Status {}

enum SpanFlags {}

This protocol buffer structure organizes telemetry data hierarchically like this:

 
TracesData -> ResourceSpans -> ScopeSpans -> Span

Here's an explanation of some of the major components of this structure:

  • TracesData: A collection of ResourceSpans, representing telemetry data associated with specific resources.

  • ResourceSpans: Contains information about the Resource itself and multiple ScopeSpans, which group spans based on their instrumentation scope.

  • ScopeSpans: Groups multiple spans that share the same InstrumentationScope (the library or component responsible for generating the span).

  • Span: This is the core building block of a trace, representing a single operation or activity.

    • It includes identifiers like trace_id and span_id for linking spans within a trace.
    • Captures timing information with start_time_unix_nano and end_time_unix_nano.
    • Contains attributes (key/value pairs) providing additional context.
    • Can include events (Event objects) representing specific occurrences during the span's lifetime.
    • Can have links (Link objects) to other spans, potentially in different traces.
    • Carries a Status object indicating the success or failure of the span.
  • Status: It represents the outcome of a span's execution through a code (UNSET, OK, or ERROR) and an optional message providing details.

  • SpanFlags: These are bit flags providing additional information about a span's context and behavior.
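
To see how these Span fields map onto instrumentation code, here is a brief, hedged sketch using the OpenTelemetry Go tracing API; the tracer name, attribute, and error case are made up for illustration:

package main

import (
	"context"
	"errors"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/codes"
)

// chargeCard shows where the main Span fields come from when instrumenting code.
func chargeCard(ctx context.Context, amount int) error {
	// The SDK assigns trace_id, span_id, and the start/end timestamps.
	tracer := otel.Tracer("payment-service") // becomes the InstrumentationScope
	ctx, span := tracer.Start(ctx, "charge-card")
	defer span.End()

	// Key/value attributes provide additional context.
	span.SetAttributes(attribute.Int("payment.amount", amount))

	// Events record specific occurrences during the span's lifetime.
	span.AddEvent("card.validated")

	if amount <= 0 {
		err := errors.New("invalid amount")
		// Status captures the outcome (UNSET, OK, or ERROR).
		span.RecordError(err)
		span.SetStatus(codes.Error, err.Error())
		return err
	}

	span.SetStatus(codes.Ok, "")
	return nil
}

func main() {
	_ = chargeCard(context.Background(), 100)
}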

Metrics

Metrics data are also organized hierarchically by associating measurements with resources and instrumentation scopes. Each metric has a specific data type, and data points within them carry the actual values along with timestamps and additional context.

Here's a simplified view of the full metrics definition:

metrics.proto
message MetricsData {
  repeated ResourceMetrics resource_metrics = 1;
}

message ResourceMetrics {
  reserved 1000;
  opentelemetry.proto.resource.v1.Resource resource = 1;
  repeated ScopeMetrics scope_metrics = 2;
  string schema_url = 3;
}

message ScopeMetrics {
  opentelemetry.proto.common.v1.InstrumentationScope scope = 1;
  repeated Metric metrics = 2;
  string schema_url = 3;
}

message Metric {
  reserved 4, 6, 8;
  string name = 1;
  string description = 2;
  string unit = 3;
  oneof data {
    Gauge gauge = 5;
    Sum sum = 7;
    Histogram histogram = 9;
    ExponentialHistogram exponential_histogram = 10;
    Summary summary = 11;
  }
  repeated opentelemetry.proto.common.v1.KeyValue metadata = 12;
}

message Gauge {}

message Sum {}

message Histogram {}

message ExponentialHistogram {}

message Summary {}

enum AggregationTemporality {}

enum DataPointFlags {}

message NumberDataPoint {}

message HistogramDataPoint {}

message ExponentialHistogramDataPoint {}

message SummaryDataPoint {}

message Exemplar {}

The most important fields are described as follows:

  • MetricsData: A collection of ResourceMetrics, which ties metrics to specific resources.

  • ResourceMetrics: Contains resource details and multiple ScopeMetrics for grouping metrics based on instrumentation scope (library/framework used).

  • ScopeMetrics: Groups metrics sharing the same instrumentation scope and contains multiple Metric objects.

  • Metric: This is the fundamental unit, representing a specific measurement.

    • It has a name, description, and unit for identification and context.
    • It carries one of five data types (explained below).
    • It can include optional metadata (key-value pairs) for additional information.

Types of metrics

  • Gauge: This represents a value at a specific point in time (e.g., current CPU usage, number of items in a queue, etc). It is a number that can either go up or down.

  • Sum: This is analogous to the Counter metric type in Prometheus. It represents a running total that increases over time (such as number of requests or errors).

  • Histogram: These are used to represent a distribution of measurements by sampling observations and counting them in configurable buckets.

  • Exponential Histogram: A special type of histogram that uses exponentially sized buckets, allowing it to efficiently represent distributions whose values span a very wide range.

  • Summary: Summaries are included in OpenTelemetry for legacy support; the OpenTelemetry APIs and SDKs do not produce them.
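
As a rough illustration of how these metric types surface in instrumentation code, here is a hedged Go sketch that creates a Sum-style counter, a histogram, and an observable gauge; the meter and instrument names are hypothetical, and a configured MeterProvider exporting over OTLP is assumed:

package main

import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/metric"
)

func main() {
	ctx := context.Background()
	meter := otel.Meter("checkout-service") // the InstrumentationScope for these metrics

	// Sum: a monotonically increasing total (analogous to a Prometheus counter).
	requests, _ := meter.Int64Counter("http.server.requests")
	requests.Add(ctx, 1)

	// Histogram: samples observations into buckets, e.g. request duration.
	latency, _ := meter.Float64Histogram("http.server.duration", metric.WithUnit("ms"))
	latency.Record(ctx, 42.7)

	// Gauge: a point-in-time value, sampled via a callback at collection time.
	_, _ = meter.Int64ObservableGauge("queue.length",
		metric.WithInt64Callback(func(_ context.Context, o metric.Int64Observer) error {
			o.Observe(17)
			return nil
		}))
}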

Logs

OTLP's data model for logs offers a standardized way to represent log data from various sources including application logs, machine-generated events, system logs, and more.

This model allows for unambiguous mapping from existing log formats, ensuring compatibility and ease of integration. It also enables reverse mapping back to specific log formats, provided those formats support equivalent features.

You can check out the protocol buffer representation or read the full design document, but here's a concise summary of the model:

logs.proto
message LogsData {
  repeated ResourceLogs resource_logs = 1;
}

message ResourceLogs {
  reserved 1000;
  opentelemetry.proto.resource.v1.Resource resource = 1;
  repeated ScopeLogs scope_logs = 2;
  string schema_url = 3;
}

message ScopeLogs {
  opentelemetry.proto.common.v1.InstrumentationScope scope = 1;
  repeated LogRecord log_records = 2;
  string schema_url = 3;
}

enum SeverityNumber {}

enum LogRecordFlags {}

message LogRecord {
  reserved 4;
  fixed64 time_unix_nano = 1;
  fixed64 observed_time_unix_nano = 11;
  SeverityNumber severity_number = 2;
  string severity_text = 3;
  opentelemetry.proto.common.v1.AnyValue body = 5;
  repeated opentelemetry.proto.common.v1.KeyValue attributes = 6;
  uint32 dropped_attributes_count = 7;
  fixed32 flags = 8;
  bytes trace_id = 9;
  bytes span_id = 10;
}

Besides LogsData, ResourceLogs, and ScopeLogs, which organize log data hierarchically just like the corresponding items in the trace and metrics data models, the main thing to pay attention to is the LogRecord object, the fundamental unit representing a single log entry.

It consists of the following attributes:

  • Timestamps: Such as time_unix_nano (when the log was created at the source) and observed_time_unix_nano (when the log was ingested by the Collector).
  • Severity: The severity_number is a numeric representation of the log severity, ranging from TRACE (least severe) to FATAL (most severe), while severity_text is the human-readable description of the log severity.
  • Trace Context Fields: The trace_id and span_id fields allow for optionally linking the log to a specific trace and span for correlation.
  • Body: The body is the actual log message content. It uses the AnyValue type to accommodate various data types.
  • Attributes: This holds key-value pairs providing additional context about the event.
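
For a sense of how these fields are populated in practice, here is a hedged sketch using the OpenTelemetry Go Logs Bridge API (go.opentelemetry.io/otel/log); the scope and attribute names are illustrative, and a LoggerProvider backed by an OTLP exporter is assumed to be registered:

package main

import (
	"context"
	"time"

	"go.opentelemetry.io/otel/log"
	"go.opentelemetry.io/otel/log/global"
)

func main() {
	logger := global.GetLoggerProvider().Logger("auth-service")

	var rec log.Record
	rec.SetTimestamp(time.Now())                   // time_unix_nano
	rec.SetSeverity(log.SeverityInfo)              // severity_number
	rec.SetSeverityText("INFO")                    // severity_text
	rec.SetBody(log.StringValue("user signed in")) // body (AnyValue)
	rec.AddAttributes(log.String("user.id", "42")) // attributes

	logger.Emit(context.Background(), rec)
}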

Implementing OTLP instrumentation

Instrumenting your applications to generate OTLP data, be it logs, metrics, or traces, is straightforward. You only need to choose the appropriate SDK for the language you're working in, then configure the OpenTelemetry Collector to receive, process, and export the data.

1. Instrumenting your application

Begin by selecting the appropriate OpenTelemetry SDK for your programming language, initializing it, and instrumenting your code. A list of supported languages and their respective SDKs can be found on the OpenTelemetry website.

Next, ensure that your application is configured to export telemetry data to an OTLP endpoint. While some observability backends offer direct OTLP endpoints via gRPC or HTTP, utilizing the OpenTelemetry Collector as an intermediary is recommended for its flexibility and advanced processing capabilities.

You might also need to adjust environment variables to point to the correct OTLP endpoint URL. By default, OTel SDKs send to http://localhost:4317 (gRPC) or http://localhost:4318 (HTTP), but these can be customized through environment variables such as OTEL_EXPORTER_OTLP_ENDPOINT to match your setup.
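
For example, here is a hedged Go sketch that wires a tracer provider to an OTLP gRPC exporter; the service name is a placeholder, the semconv package version is an assumption, and in a real deployment OTEL_EXPORTER_OTLP_ENDPOINT can point the exporter at your Collector without code changes:

package main

import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
	"go.opentelemetry.io/otel/sdk/resource"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
	semconv "go.opentelemetry.io/otel/semconv/v1.26.0"
)

func main() {
	ctx := context.Background()

	// The exporter defaults to localhost:4317 unless OTEL_EXPORTER_OTLP_ENDPOINT
	// (or an explicit option) says otherwise.
	exporter, err := otlptracegrpc.New(ctx)
	if err != nil {
		panic(err)
	}

	// Describe the emitting service so every span carries a Resource.
	res, err := resource.Merge(resource.Default(),
		resource.NewWithAttributes(semconv.SchemaURL,
			semconv.ServiceNameKey.String("checkout-service")))
	if err != nil {
		panic(err)
	}

	tp := sdktrace.NewTracerProvider(
		sdktrace.WithBatcher(exporter), // batches spans before export
		sdktrace.WithResource(res),
	)
	defer func() { _ = tp.Shutdown(ctx) }()

	otel.SetTracerProvider(tp)

	// From here, otel.Tracer("...") produces spans that are exported over OTLP.
}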

2. Configuring the OTLP collector

Once your application is producing OTLP-formatted telemetry data, configure the OpenTelemetry Collector to receive, process, and export it to your chosen observability backend. For example, here's a configuration snippet demonstrating how to receive OTLP trace data over HTTP and export it to Jaeger in its native format:

 
receivers:
  otlp:
    protocols:
      http:
        endpoint: localhost:4318

processors:
  batch:

exporters:
  otlp/jaeger:
    endpoint: jaeger:4317

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/jaeger]

The Collector's powerful processing capabilities allow you to transform OTLP data before exporting. You can filter, enrich, or even anonymize data to comply with privacy regulations or optimize storage.

3. Convert existing instrumentation to OTLP

If your application already uses other instrumentation libraries or formats (e.g., Prometheus for metrics), you can still leverage OTLP and the OpenTelemetry ecosystem. The OpenTelemetry Collector supports a wide range of receivers that can ingest data in various formats and convert it to OTLP.

For example, to convert Prometheus metrics to OTLP you can use the following Collector configuration:

 
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: 'example'
          scrape_interval: 10s
          static_configs:
            - targets: ['localhost:9090']

processors:
  batch:

exporters:
  otlp:
    endpoint: <OTLP-endpoint>

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [batch]
      exporters: [otlp]

This configuration receives Prometheus metrics, processes them in batches, and exports them as OTLP data. Such setups are invaluable for legacy applications where you can't modify the original instrumentation.

4. Sending OTLP data to a backend

After configuring your application and Collector, the final step is to ensure OTLP data reaches your observability backend. Depending on the backend, you'll configure the Collector to export telemetry data in the required format.

For example, Better Stack supports ingesting OTLP data directly, so you only need to configure the Collector's exporter to match the provided endpoint. This way, the Collector can forward OTLP data efficiently, helping you monitor and visualize application performance in real time.

Final thoughts

By following these steps, you can effectively implement OTLP instrumentation, harnessing its benefits for improved observability and gaining deeper insights into your system's behavior and performance.

Thanks for reading!
