# Better Stack Kafka monitoring

Monitor Apache Kafka with [Better Stack collector](https://betterstack.com/docs/logs/collector/). Broker discovery and partition health work out of the box, and adding the Prometheus JMX exporter exposes full broker internals.

## What you get out of the box

[Install Better Stack collector](https://betterstack.com/docs/logs/collector/#getting-started) on the hosts running Kafka. The collector automatically discovers your brokers and starts collecting the following cluster metadata, with no Kafka configuration needed:

- `kafka_brokers`: broker count
- `kafka_topic_partitions`: partitions per topic
- `kafka_topic_partition_in_sync_replica`: in-sync replicas (ISR) per partition
- `kafka_topic_partition_leader`: leader status per partition
- `kafka_topic_partition_under_replicated_partition`: under-replication status

![Kafka dashboard](https://imagedelivery.net/xZXo0QFi-1_4Zimer-T0XQ/214d7120-547c-4eed-6ed4-f8bd82a4db00/lg2x =4000x2056)

These metrics power the **Overview** and **Partitions & replication** sections of the [Kafka dashboard](https://betterstack.com/dashboards/kafka ";_blank").

The collector connects to brokers from the host network. Running Kafka in Docker? Publish the broker port and advertise a listener address that clients on the host can reach:

```yaml
[label docker-compose.yml]
services:
  kafka:
    image: apache/kafka:4.2.1
    ports:
      - "9092:9092"
    environment:
      KAFKA_LISTENERS: PLAINTEXT_HOST://0.0.0.0:9092,PLAINTEXT://0.0.0.0:19092,CONTROLLER://0.0.0.0:9093
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT_HOST://localhost:9092,PLAINTEXT://kafka:19092
```

Containers on the compose network keep connecting to `kafka:19092`.

Seeing your Kafka as **Configuration required** or **Unreachable**? Go to [Sources](https://telemetry.betterstack.com/team/0/sources ";_blank") → your collector → **Configure** → **Collect metrics**, open the Kafka target, and point it at a broker address reachable from the host, e.g. `localhost:9092`.

[warning]
#### Use the broker address, not a metrics endpoint

The Kafka target is a connection to the Kafka protocol port, typically `9092`. Don't point it at the JMX exporter port. The JMX exporter is connected separately as a Prometheus scrape target below.
[/warning]
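Before editing the Kafka target, it can help to confirm that the broker port accepts TCP connections from the host. The following is a minimal sketch, assuming bash (for its `/dev/tcp` redirection) and the coreutils `timeout` command are available. The helper name `port_open` is made up for illustration.

```bash
# Sketch: succeed if a TCP connection to host:port opens within 3 seconds.
# Relies on bash's /dev/tcp redirection and the coreutils `timeout` command.
port_open() {
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

if port_open localhost 9092; then
  echo "broker port reachable"
else
  echo "broker port not reachable from this host"
fi
```

An open port only shows that something is listening. Clients still follow the advertised listener after the first connection, so the advertised address has to be reachable from the host as well.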

## Get full Kafka metrics with JMX exporter

Broker-level performance metrics like throughput, request rates, controller state, and storage live in Kafka's JMX (Java Management Extensions) and aren't exposed outside the Java process by default. Deploy the Prometheus JMX exporter as a Java agent on each broker to light up the rest of the [Kafka dashboard](https://betterstack.com/dashboards/kafka ";_blank"):

- `kafka_server_brokertopicmetrics_*`: bytes and messages in/out, per topic
- `kafka_controller_kafkacontroller_*`: active controller, broker count, offline partitions
- `kafka_server_replicamanager_*`: leader count, partition count, ISR changes
- `kafka_network_requestmetrics_*`: request rates per request type
- `kafka_log_log_size`: log size per topic and partition

### Download the Java agent

Download the JMX exporter agent JAR to each Kafka broker:

```bash
[label Download JMX exporter agent]
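# Create the target directory first; -f makes curl fail on HTTP errors
# instead of saving an error page as the JAR
mkdir -p /opt/jmx-exporter
curl -fsSL -o /opt/jmx-exporter/jmx_prometheus_javaagent.jar \
  https://repo1.maven.org/maven2/io/prometheus/jmx/jmx_prometheus_javaagent/1.0.1/jmx_prometheus_javaagent-1.0.1.jar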
```

### Create the configuration file

Save the following configuration next to the agent JAR:

```yaml
[label /opt/jmx-exporter/jmx-kafka-config.yaml]
lowercaseOutputName: true
lowercaseOutputLabelNames: true
rules:
# Per-topic broker throughput (BytesInPerSec, MessagesInPerSec, ...)
- pattern: kafka.server<type=(.+), name=(.+), topic=(.+)><>Count
  name: kafka_server_$1_$2_total
  type: COUNTER
  labels:
    topic: "$3"
# Per-client, per-partition gauges
- pattern: kafka.server<type=(.+), name=(.+), clientId=(.+), topic=(.+), partition=(.*)><>Value
  name: kafka_server_$1_$2
  type: GAUGE
  labels:
    clientId: "$3"
    topic: "$4"
    partition: "$5"
- pattern: kafka.server<type=(.+), name=(.+), clientId=(.+), brokerHost=(.+), brokerPort=(.+)><>Value
  name: kafka_server_$1_$2
  type: GAUGE
  labels:
    clientId: "$3"
    broker: "$4:$5"
# Log size per topic-partition
- pattern: kafka.log<type=(.+), name=(.+), topic=(.+), partition=(.*)><>Value
  name: kafka_log_$1_$2
  type: GAUGE
  labels:
    topic: "$3"
    partition: "$4"
# Network request counters
- pattern: kafka.network<type=(.+), name=(.+), request=(.+)><>Count
  name: kafka_network_$1_$2_total
  type: COUNTER
  labels:
    request: "$3"
# Catch-alls with a single extra mbean property, kept as a label
- pattern: kafka.server<type=(.+), name=(.+), (.+)=(.+)><>Value
  name: kafka_server_$1_$2
  type: GAUGE
  labels:
    $3: "$4"
- pattern: kafka.server<type=(.+), name=(.+), (.+)=(.+)><>Count
  name: kafka_server_$1_$2_total
  type: COUNTER
  labels:
    $3: "$4"
# Generic gauges and counters
- pattern: kafka.server<type=(.+), name=(.+)><>Value
  name: kafka_server_$1_$2
  type: GAUGE
- pattern: kafka.server<type=(.+), name=(.+)><>Count
  name: kafka_server_$1_$2_total
  type: COUNTER
- pattern: kafka.controller<type=(.+), name=(.+)><>Value
  name: kafka_controller_$1_$2
  type: GAUGE
- pattern: kafka.network<type=(.+), name=(.+)><>Value
  name: kafka_network_$1_$2
  type: GAUGE
- pattern: kafka.log<type=(.+), name=(.+)><>Value
  name: kafka_log_$1_$2
  type: GAUGE
```

[info]
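Rules are evaluated top to bottom, and the first match wins. Keep the specific patterns above the generic catch-alls. Otherwise the greedy `name=(.+)` group in a generic rule swallows the remaining MBean properties and per-topic metric names get mangled.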
[/info]
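To make the rewrite concrete, here is a rough sketch of what the first rule does to a per-topic MBean. It imitates the rule with `sed` and `awk` rather than calling the exporter, so treat it as an approximation. The helper name `mbean_to_metric` and the topic `orders` are made up for illustration.

```bash
# Sketch: approximate the first rule in jmx-kafka-config.yaml.
# The exporter does this in Java; this only mimics the rewrite.
mbean_to_metric() {
  printf '%s\n' "$1" \
    | sed -nE 's/^kafka\.server<type=(.+), name=(.+), topic=(.+)><>Count$/\1_\2 \3/p' \
    | awk '{ printf "kafka_server_%s_total{topic=\"%s\"}\n", tolower($1), $2 }'
}

mbean_to_metric 'kafka.server<type=BrokerTopicMetrics, name=BytesInPerSec, topic=orders><>Count'
# prints: kafka_server_brokertopicmetrics_bytesinpersec_total{topic="orders"}
```

The lowercasing stands in for `lowercaseOutputName: true`. MBeans that don't match the pattern produce no output here, which corresponds to the exporter moving on to the next rule.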

### Attach the agent to Kafka

Add the agent to Kafka's JVM options. The exporter serves Prometheus metrics on port `7071`:

[code-tabs]
```bash
[label Linux service]
# Add to the Kafka service environment, then restart Kafka
export KAFKA_OPTS="-javaagent:/opt/jmx-exporter/jmx_prometheus_javaagent.jar=7071:/opt/jmx-exporter/jmx-kafka-config.yaml"
```
```yaml
[label Docker Compose]
services:
  kafka:
    image: apache/kafka:4.2.1
    ports:
      - "7071:7071" # JMX exporter Prometheus endpoint
    environment:
      KAFKA_OPTS: >-
        -javaagent:/opt/jmx-exporter/jmx_prometheus_javaagent.jar=7071:/opt/jmx-exporter/jmx-kafka-config.yaml
    volumes:
      - ./jmx-exporter:/opt/jmx-exporter:ro
```
```yaml
[label Kubernetes]
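# In the Kafka pod template: fetch the agent in an init container.
# Assumes a writable volume named jmx-exporter is defined under volumes and
# that jmx-kafka-config.yaml is also made available under /jmx-exporter.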
initContainers:
  - name: jmx-exporter
    image: curlimages/curl:8.14.1
    command: ["curl", "-sSL", "-o", "/jmx-exporter/jmx_prometheus_javaagent.jar",
              "https://repo1.maven.org/maven2/io/prometheus/jmx/jmx_prometheus_javaagent/1.0.1/jmx_prometheus_javaagent-1.0.1.jar"]
    volumeMounts:
      - name: jmx-exporter
        mountPath: /jmx-exporter
containers:
  - name: kafka
    env:
      - name: KAFKA_OPTS
        value: "-javaagent:/jmx-exporter/jmx_prometheus_javaagent.jar=7071:/jmx-exporter/jmx-kafka-config.yaml"
    ports:
      - containerPort: 7071
    volumeMounts:
      - name: jmx-exporter
        mountPath: /jmx-exporter
```
[/code-tabs]
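The `-javaagent` value packs three things into one string, in the form `-javaagent:<jar>=<port>:<config>`. A small helper can assemble it and catch a missing file before Kafka is restarted. This is a sketch only, and the function name `javaagent_opt` is hypothetical.

```bash
# Sketch: build the -javaagent option and fail early if a file is missing.
javaagent_opt() {
  local jar="$1" port="$2" config="$3" f
  for f in "$jar" "$config"; do
    [ -r "$f" ] || { echo "not readable: $f" >&2; return 1; }
  done
  printf -- '-javaagent:%s=%s:%s\n' "$jar" "$port" "$config"
}

# Usage with the paths from this guide:
#   export KAFKA_OPTS="$(javaagent_opt \
#     /opt/jmx-exporter/jmx_prometheus_javaagent.jar 7071 \
#     /opt/jmx-exporter/jmx-kafka-config.yaml)"
```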

Restart Kafka and verify the endpoint:

```bash
[label Verify the metrics endpoint]
curl -s http://localhost:7071/metrics | grep kafka_server
```
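For a slightly more thorough check, the sketch below reports which of the metric prefixes listed earlier in this guide are absent from a scrape. The helper name `missing_prefixes` and the two sample lines are made up for illustration.

```bash
# Metric name prefixes taken from the list earlier in this guide.
EXPECTED_PREFIXES="kafka_server_brokertopicmetrics_ kafka_controller_kafkacontroller_ kafka_server_replicamanager_ kafka_network_requestmetrics_ kafka_log_log_size"

# Reads Prometheus text format on stdin and prints each expected prefix
# that no metric line starts with.
missing_prefixes() {
  local metrics prefix
  metrics="$(cat)"
  for prefix in $EXPECTED_PREFIXES; do
    printf '%s\n' "$metrics" | grep -q "^${prefix}" || printf '%s\n' "$prefix"
  done
}

# Demo with two made-up sample lines; against a live broker, run:
#   curl -s http://localhost:7071/metrics | missing_prefixes
printf '%s\n' \
  'kafka_server_brokertopicmetrics_bytesinpersec_total{topic="orders"} 1024.0' \
  'kafka_controller_kafkacontroller_activecontrollercount 1.0' \
  | missing_prefixes
```

A reported prefix is not necessarily a misconfiguration. Some metrics may only appear once the broker hosts partitions or has served requests.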

#### Beware of Kafka CLI tools inheriting KAFKA_OPTS

Every Kafka command-line tool started in the same environment, including Docker healthchecks like `kafka-broker-api-versions.sh`, picks up `KAFKA_OPTS`. The tool then tries to start a second agent on port `7071`, which the broker already holds, and crashes.

To resolve this, clear the variable for CLI invocations:

```bash
[label Healthcheck without the agent]
KAFKA_OPTS= /opt/kafka/bin/kafka-broker-api-versions.sh --bootstrap-server localhost:9092
```
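If several scripts call Kafka tools, a small wrapper keeps the override in one place. This is a sketch, and the function name `kafka_cli` is hypothetical. It empties `KAFKA_OPTS` for the wrapped process only and leaves the surrounding environment untouched.

```bash
# Run any command with KAFKA_OPTS emptied for that one process only.
kafka_cli() {
  env KAFKA_OPTS= "$@"
}

# Demo: the wrapped process sees an empty KAFKA_OPTS.
export KAFKA_OPTS="-javaagent:/opt/jmx-exporter/jmx_prometheus_javaagent.jar=7071:/opt/jmx-exporter/jmx-kafka-config.yaml"
kafka_cli sh -c 'echo "KAFKA_OPTS seen by the tool: [$KAFKA_OPTS]"'
# prints: KAFKA_OPTS seen by the tool: []
```

With the wrapper defined, the healthcheck becomes `kafka_cli /opt/kafka/bin/kafka-broker-api-versions.sh --bootstrap-server localhost:9092`.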

### Scrape the metrics with the collector

[code-tabs]
```yaml
[label Kubernetes]
# Add to the Kafka pod template; the collector discovers
# annotated pods and scrapes them automatically.
annotations:
  prometheus.io/scrape: "true"
  prometheus.io/port: "7071"
  prometheus.io/path: "/metrics"
```
```text
[label Docker & other deployments]
Add the endpoint as a scrape target in Better Stack:

1. Go to Sources -> your collector -> Configure -> Collect metrics.
2. Click Collect metrics and select Custom Prometheus exporter or service.
3. Set Service name to match your Kafka service, e.g. kafka.
4. Set Endpoint to http://localhost:7071/metrics.
```
[/code-tabs]

[info]
The collector scrapes the endpoint **from the host network** every 30 seconds. When Kafka runs in a container, publish port `7071` so the endpoint is reachable on the host.
[/info]

If you use the same service name as the automatically discovered Kafka service, Better Stack groups both metric sources under one service.

## Verify the configuration

Within a few minutes, metrics like `kafka_server_brokertopicmetrics_bytesinpersec_total` and `kafka_controller_kafkacontroller_activecontrollercount` appear in your collector source, and the scrape target shows as **Active** in **Configure** → **Collect metrics**.

[success]
#### Kafka metrics are now flowing into Better Stack

Check out the [Kafka dashboard](https://betterstack.com/dashboards/kafka ";_blank"): broker throughput, controller state, partition health, and storage in one place. Charts plotting rates need two scrapes before drawing the first point. Give them a minute.
[/success]
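Rate charts need two scrapes because a rate is derived from the difference between consecutive counter samples. The sketch below shows the arithmetic for two samples taken one 30-second scrape interval apart. The sample values and the helper name `counter_rate` are made up, and the dashboard's actual queries may differ, for example in how they handle counter resets.

```bash
# Per-second rate between two counter samples taken dt seconds apart.
# Does not handle counter resets (e.g. after a broker restart).
counter_rate() {
  awk -v prev="$1" -v curr="$2" -v dt="$3" \
    'BEGIN { printf "%.2f\n", (curr - prev) / dt }'
}

counter_rate 120000 150000 30   # prints 1000.00
```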

## Need help?

Please let us know at hello@betterstack.com.
We're happy to help! 🙏
