
Redacting Sensitive Data with the OpenTelemetry Collector

Ayooluwa Isaiah
Updated on September 20, 2024

As your business scales, the surge in telemetry data often brings with it a trove of sensitive information such as credit card numbers, personal identifiers, or application secrets. This presents a growing challenge: how do you process this data while ensuring compliance with privacy regulations like GDPR, HIPAA, or PCI DSS?

This is where the OpenTelemetry Collector can play a vital role. Integrated into your observability pipelines, the Collector can scrub sensitive data from your logs, traces, and metrics before they leave your environment.

This article will discuss the specific capabilities of the Collector for sensitive data redaction and teach you how to leverage them to craft rules that preserve the confidentiality of your application data without compromising your observability goals.

Let's get started!

Prerequisites

How to redact sensitive data with OpenTelemetry

The OpenTelemetry Collector offers three processors that enable real-time detection and redaction of sensitive data as it processes your telemetry, ensuring you can securely transmit the redacted data to external platforms.

The processors in question are:

  1. Attributes: For accessing and modifying individual attributes within a signal.
  2. Redaction: Specifically made to filter out sensitive attributes or mask their values.
  3. Transform: Uses the OpenTelemetry Transformation Language (OTTL) to perform large-scale transformations on telemetry data.

Let's take a look at each processor in turn.

Attributes processor

The attributes processor is designed to modify the attributes of spans, logs, or metrics. It supports various actions, such as inserting, updating, or removing attributes, to tailor telemetry data before exporting.

The update, delete, and hash actions are especially useful for redacting sensitive data. Here's an example of how to configure it:

 
processors:
  attributes/update:
    actions:
      - key: payment.card_number
        action: delete
      - key: user.email
        action: delete
      - key: app_secret
        value: "[REDACTED]"
        action: update
      - key: client.ip_address
        action: hash

This setup bolsters data security by deleting sensitive attributes like credit card numbers and emails while redacting the app_secret field and hashing client IP addresses (using SHA256) to allow for anonymized tracking.
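A processor only takes effect once it is listed in a pipeline under the service section. The sketch below shows one way to wire attributes/update into a traces pipeline. The otlp receiver and debug exporter are assumptions chosen for illustration and do not come from the example above; substitute the components your deployment actually uses.

```yaml
receivers:
  otlp: # assumed receiver for illustration
    protocols:
      grpc:

processors:
  attributes/update:
    actions:
      - key: payment.card_number
        action: delete

exporters:
  debug: # assumed exporter; prints telemetry to the Collector's console
    verbosity: detailed

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [attributes/update]
      exporters: [debug]
```

Processors run in the order in which they are listed, and a processor must be added to every pipeline (traces, metrics, logs) whose data you want scrubbed.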

If you only need to delete, redact, or hash specific sensitive fields in your signals, then attributes may well be all you need. But for implementing an allowlist of fields or detecting and redacting standard patterns (like email addresses), you'll need the redaction processor.

Let's take a look at it next.

Redaction processor

The redaction processor is specifically designed to prevent sensitive fields from leaking into your telemetry data. It offers powerful tools to:

  • Remove any attributes not included in a predefined list of permitted attributes.
  • Strip confidential information from telemetry to prevent accidental data exposure.
  • Mask or obfuscate sensitive attribute values that match standard patterns.

Here are a couple of ways you can use the redaction processor:

1. Implementing an allowlist of attributes

 
processors:
  redaction/allowlist:
    allow_all_keys: false
    allowed_keys:
      - http.method
      - http.url
      - http.status_code

This configuration ensures that only the attributes explicitly included in the allowlist will be retained. By setting allow_all_keys: false, the processor will block all attributes by default, except those specified in allowed_keys. This helps limit telemetry data to pre-approved attributes, minimizing the risk of leaking sensitive or unnecessary information.
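When rolling out an allowlist, it helps to see what the processor is removing. As a hedged sketch, the configuration below adds the ignored_keys and summary settings described in the redaction processor's documentation. The session.id key is a hypothetical attribute used only for illustration.

```yaml
processors:
  redaction/allowlist:
    allow_all_keys: false
    allowed_keys:
      - http.method
      - http.url
      - http.status_code
    # Keys listed here pass through without being redacted or masked.
    # session.id is a made-up example attribute.
    ignored_keys:
      - session.id
    # Controls the diagnostic attributes the processor adds when it
    # redacts or masks data. Accepted values: debug, info, silent.
    summary: debug
```

Because a list of redacted attribute names can itself reveal information, consider switching summary to silent once the configuration is stable.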

2. Masking sensitive values in allowed attributes

For situations where you cannot predict the specific attributes that may contain sensitive data, you can use the blocked_values property to define regular expressions that match and mask those values:

 
processors:
  redaction/mask:
    allow_all_keys: true
    blocked_values:
      - "4[0-9]{12}(?:[0-9]{3})?" # Visa
      - "5[1-5][0-9]{14}" # Mastercard
      - "3[47][0-9]{13}" # Amex

This approach lets you retain useful context from attributes while masking standard patterns like credit card numbers. When a match is found, the sensitive value is replaced with a placeholder such as ****, ensuring that confidential data is protected while keeping the attribute intact for analysis.
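The allowlist and value masking are not mutually exclusive. As a minimal sketch, the configuration below keeps only three attributes and still masks card-number patterns inside their values, for example a card number embedded in an http.url query string. The redaction/combined name is an arbitrary label.

```yaml
processors:
  redaction/combined:
    # Drop every attribute that is not explicitly allowed...
    allow_all_keys: false
    allowed_keys:
      - http.method
      - http.url
      - http.status_code
    # ...and mask matching values inside the attributes that remain.
    blocked_values:
      - "4[0-9]{12}(?:[0-9]{3})?" # Visa
      - "5[1-5][0-9]{14}" # Mastercard
      - "3[47][0-9]{13}" # Amex
```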

Transform processor

The transform processor uses the OpenTelemetry Transformation Language (OTTL) to enable a wide range of transformations, from simple attribute changes to complex conditional logic and metric aggregations.

With the transform processor, you can:

  • Modify attribute values
  • Add or remove attributes
  • Filter or drop entire spans, metrics, or logs
  • Convert metric types
  • Aggregate metrics
  • Apply conditional logic
  • And more!

In this section, we'll focus on its capabilities for redacting sensitive data from telemetry. First, let's review how to configure the transform processor, as it requires more setup than the other processors.

Here's an example configuration for the transform processor:

 
transform:
  error_mode: <ignore|silent|propagate>
  <trace|metric|log>_statements:
    - context: string
      conditions:
        - string
        - string
      statements:
        - string
        - string
        - string

The error_mode property determines how the processor will react to errors when processing a statement, while the transformation logic for each telemetry type is placed in trace_statements, metric_statements, and log_statements respectively.

For each telemetry type, you must specify a context for applying transformations. Valid contexts include:

  • For trace statements: resource, scope, span, and spanevent.
  • For metric statements: resource, scope, metric, and datapoint.
  • For log statements: resource, scope, and log.

The optional conditions property allows you to apply logic only if specific criteria are met, while the statements section defines the actual transformation operations.
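To make the template concrete, here is a minimal sketch with each placeholder filled in. The transform/example name and the token query parameter are hypothetical and chosen only for illustration.

```yaml
processors:
  transform/example:
    # Log the error and continue with the next statement.
    error_mode: ignore
    log_statements:
      - context: log
        # The statements below run only for log records that satisfy this condition.
        conditions:
          - attributes["http.url"] != nil
        statements:
          - replace_pattern(attributes["http.url"], "token=[^&]+", "token=[REDACTED]")
```

Setting error_mode to ignore is a common choice for redaction rules, since a statement that fails on one record should not stop the rest of the telemetry from being processed.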

Now, let's look at an example to demonstrate how to redact sensitive data from a trace span using the transform processor.

Redacting sensitive data from trace spans

A typical scenario requiring sensitive data redaction is when traces for client requests to external services include an http.url attribute that exposes confidential information such as client_id and client_secret.

Here's an example of the pipeline data representation containing this sensitive information:

 
{
  "span": {
    "attributes": {
      "http.method": "GET",
      "http.url": "https://example.com/posts/1?client_id=test-id&client_secret=cd76c726-cdbd-4951-8d60-f67770236661",
      "net.peer.name": "example.com",
      "user_agent.original": "go-resty/2.14.0 (https://github.com/go-resty/resty)",
      "http.status_code": 200
    }
  }
}

You can use the transform processor to redact sensitive fields like client_id and client_secret by applying OTTL statements, as shown in the following configuration:

 
processors:
  transform/redact:
    trace_statements:
      - context: span
        statements:
          - replace_pattern(attributes["http.url"], "client_id=[^&]+", "client_id=[REDACTED]")
          - replace_pattern(attributes["http.url"], "client_secret=[^&]+", "client_secret=[REDACTED]")

In this setup, the replace_pattern() function identifies and replaces sensitive data in the http.url attribute by matching regex patterns and replacing them with [REDACTED]. After these transformations, the http.url attribute will appear as follows:

 
{
  "span": {
    "attributes": {
      "http.method": "GET",
      "http.url": "https://example.com/posts/1?client_id=[REDACTED]&client_secret=[REDACTED]",
      "net.peer.name": "example.com",
      "user_agent.original": "go-resty/2.14.0 (https://github.com/go-resty/resty)",
      "http.status_code": 200
    }
  }
}

Some other useful OTTL functions for redaction include:

Deleting sensitive attributes

 
delete_key(attributes, "http.request.header.authorization")
 
delete_matching_keys(attributes, "http.request.header.*")

Implementing an allowlist

Similar to the redaction processor's allowed_keys property:

 
keep_keys(attributes, ["http.method", "http.route", "http.url"])
 
keep_matching_keys(attributes, "http.*")

Hashing attribute values

Unlike the attributes processor, you can specify a preferred hashing algorithm:

 
set(attributes["user.email"], SHA256(attributes["user.email"]))
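The functions in this section are bare OTTL statements; in a Collector configuration they belong in the statements list of a transform processor. The following is a sketch rather than a definitive setup: the transform/scrub name is an arbitrary label, and the where clause is an addition that skips spans lacking a user.email attribute so that the hash function is never called on a missing value.

```yaml
processors:
  transform/scrub:
    error_mode: ignore
    trace_statements:
      - context: span
        statements:
          # Remove the Authorization header attribute entirely.
          - delete_key(attributes, "http.request.header.authorization")
          # Replace the email with its SHA-256 digest, only when it is present.
          - set(attributes["user.email"], SHA256(attributes["user.email"])) where attributes["user.email"] != nil
```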

With OTTL's powerful transformation capabilities, the transform processor should be your go-to solution for handling complex redaction and transformation requirements as it far surpasses the capabilities of both the attributes and redaction processors.
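For instance, the client_secret rule shown earlier only inspects http.url. If the same secret could appear under other attribute names, OTTL's replace_all_patterns function applies a pattern to every attribute in the map. A minimal sketch, reusing the same pattern (the transform/redact_all name is an arbitrary label):

```yaml
processors:
  transform/redact_all:
    error_mode: ignore
    trace_statements:
      - context: span
        statements:
          # "value" applies the regex to attribute values;
          # "key" would target attribute names instead.
          - replace_all_patterns(attributes, "value", "client_secret=[^&]+", "client_secret=[REDACTED]")
```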

For more details, be sure to check out our comprehensive OTTL guide or explore the language documentation.

Final thoughts

While the OpenTelemetry Collector provides powerful tools for redacting sensitive information, the best practice is to filter out such data as early as possible in the observability pipeline—ideally at the instrumentation layer. Doing so minimizes the risk of accidental exposure and improves resource efficiency within the Collector.

That said, in situations where capturing sensitive data is unavoidable, the OpenTelemetry Collector provides an added layer of protection. By utilizing its built-in processors and the flexibility of OTTL transformations, you can ensure that your telemetry data remains secure and compliant.

Ultimately, observability and data privacy can work hand in hand. With thoughtful planning and the right tools, like the OpenTelemetry Collector, you can achieve both and gain valuable insights while upholding the highest standards of data protection.

Thanks for reading!

Licensed under CC-BY-NC-SA

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
