Kafka

Kafka Server

Setup

The Kafka server can expose Prometheus metrics through either an external exporter or the JMX Exporter.
  • Kafka exporter: set it up by following the Exporter instructions, for example with Docker:
docker run -ti --rm -p 9308:9308 danielqsj/kafka-exporter --kafka.server=kafka:9092 [--kafka.server=another-server ...]
  • JMX Exporter: set it up and configure it by following the JMX Exporter documentation. When launching the Kafka server, attach the agent with a command such as:
java -javaagent:./jmx_prometheus_javaagent-0.16.1.jar=8080:config.yaml -jar yourJar.jar
Once the Kafka server is exposing Prometheus metrics, verify that metrics such as the following are visible in Prometheus:
# HELP kafka_topic_partitions Number of partitions for this Topic
# TYPE kafka_topic_partitions gauge
kafka_topic_partitions{topic="__consumer_offsets"} 50

# HELP kafka_topic_partition_current_offset Current Offset of a Broker at Topic/Partition
# TYPE kafka_topic_partition_current_offset gauge
kafka_topic_partition_current_offset{partition="0",topic="__consumer_offsets"} 0

# HELP kafka_topic_partition_oldest_offset Oldest Offset of a Broker at Topic/Partition
# TYPE kafka_topic_partition_oldest_offset gauge
kafka_topic_partition_oldest_offset{partition="0",topic="__consumer_offsets"} 0
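One way to automate this check is to scan the scrape output for the expected metric names. The sketch below is a minimal illustration in Python and is not part of either exporter: `metric_names`, `missing_metrics`, `SAMPLE`, and `EXPECTED` are hypothetical names, and the sample text is a shortened copy of the output above that deliberately leaves out one metric.

```python
import re

# Shortened sample scrape. In practice this text would come from the
# exporter's /metrics endpoint (assumption: fetched separately).
SAMPLE = """\
# HELP kafka_topic_partitions Number of partitions for this Topic
# TYPE kafka_topic_partitions gauge
kafka_topic_partitions{topic="__consumer_offsets"} 50
# HELP kafka_topic_partition_current_offset Current Offset of a Broker at Topic/Partition
# TYPE kafka_topic_partition_current_offset gauge
kafka_topic_partition_current_offset{partition="0",topic="__consumer_offsets"} 0
"""

EXPECTED = [
    "kafka_topic_partitions",
    "kafka_topic_partition_current_offset",
    "kafka_topic_partition_oldest_offset",
]


def metric_names(exposition_text):
    """Return the set of metric names that have at least one sample line."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = re.match(r"[a-zA-Z_:][a-zA-Z0-9_:]*", line)
        if match:
            names.add(match.group(0))
    return names


def missing_metrics(exposition_text, expected):
    """Return the expected metric names that are absent from the scrape."""
    present = metric_names(exposition_text)
    return [name for name in expected if name not in present]


print(missing_metrics(SAMPLE, EXPECTED))
```

An empty list means every expected metric was found. With the shortened sample, the one omitted metric is reported as missing.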

Metrics

Metric: Requests
  • kafka_server_brokertopicmetrics_totalproducerequests_total
  • kafka_server_brokertopicmetrics_messagesin_total
  • kafka_server_brokertopicmetrics_totalfetchrequests_total
  • kafka_topic_partition_current_offset
Key Performance Indicator (KPI): Request Rate
  • rate(kafka_server_brokertopicmetrics_totalproducerequests_total[5m])
  • rate(kafka_server_brokertopicmetrics_messagesin_total[5m])

Metric: Errors
  • kafka_server_brokertopicmetrics_failedfetchrequests_total
  • kafka_server_brokertopicmetrics_failedproducerequests_total
Key Performance Indicator (KPI): Error Ratio
  • rate(kafka_server_brokertopicmetrics_failedfetchrequests_total[5m]) / rate(kafka_server_brokertopicmetrics_totalfetchrequests_total[5m])

Metric: Latency
  • kafka_network_requestmetrics_totaltimems
Key Performance Indicator (KPI): Latency P99
  • kafka_network_requestmetrics_totaltimems{request="Produce", quantile="0.99"} / 1000
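To make the arithmetic behind the Request Rate and Error Ratio KPIs concrete, here is a simplified sketch in Python. It is only an approximation of what PromQL does: the real `rate()` also handles counter resets and extrapolates to the window edges. The function names and the counter readings are hypothetical.

```python
def per_second_rate(first_value, last_value, window_seconds):
    """Simplified counter rate: increase over the window divided by its length.

    PromQL's rate() additionally handles counter resets and extrapolation,
    which this sketch ignores.
    """
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    return (last_value - first_value) / window_seconds


def error_ratio(failed_rate, total_rate):
    """Failed requests as a fraction of all requests; 0.0 when there is no traffic."""
    if total_rate == 0:
        return 0.0
    return failed_rate / total_rate


# Hypothetical counter readings taken 5 minutes (300 s) apart.
total = per_second_rate(12_000, 15_000, 300)  # 3000 requests over 300 s
failed = per_second_rate(40, 70, 300)         # 30 failures over 300 s
print(total, failed, error_ratio(failed, total))
```

Returning 0.0 for an idle broker is a choice made for this sketch; in PromQL the same division yields NaN, which alerting rules usually need to account for.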

Alerts

KPI: Request Rate
  • RequestRateAnomaly

KPI: Error Ratio
  • ErrorRatioBreach
  • ErrorBuildup (based on a 99.9% SLO)

KPI: Latency P99
  • LatencyP99ErrorBuildup

Failure Alerts

KafkaTopicsUnderReplicatedPartitions
  • Description: Kafka partitions are not replicated as expected
  • Expression: kafka_topic_partition_under_replicated_partition > 0

KafkaOfflinePartitions
  • Description: One or more Kafka partitions are offline
  • Expression: kafka_controller_kafkacontroller_offlinepartitionscount > 0

KafkaActiveController
  • Description: The Kafka controller is not active or is offline
  • Expression: kafka_controller_kafkacontroller_activecontrollercount != 1

KafkaUnderMinIsrPartitions
  • Description: Kafka partitions have fewer in-sync replicas than expected
  • Expression: kafka_cluster_partition_underminisr > 0
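The failure conditions above are plain threshold comparisons, which the following Python sketch mirrors for a single snapshot of values. It is an illustration only, not how the alerts are actually evaluated: `FAILURE_RULES`, `firing_alerts`, and the snapshot values are hypothetical, and a real rule runs per time series inside Prometheus.

```python
import operator

# Each rule mirrors one failure alert above: (metric name, comparison, threshold).
FAILURE_RULES = {
    "KafkaTopicsUnderReplicatedPartitions": (
        "kafka_topic_partition_under_replicated_partition", operator.gt, 0),
    "KafkaOfflinePartitions": (
        "kafka_controller_kafkacontroller_offlinepartitionscount", operator.gt, 0),
    "KafkaActiveController": (
        "kafka_controller_kafkacontroller_activecontrollercount", operator.ne, 1),
    "KafkaUnderMinIsrPartitions": (
        "kafka_cluster_partition_underminisr", operator.gt, 0),
}


def firing_alerts(samples):
    """Return the names of alerts whose condition holds for the given samples.

    `samples` maps a metric name to its current value. Metrics that are
    absent are skipped, since a missing series matches nothing in PromQL.
    """
    firing = []
    for alert, (metric, compare, threshold) in FAILURE_RULES.items():
        if metric in samples and compare(samples[metric], threshold):
            firing.append(alert)
    return firing


# Hypothetical snapshot: two offline partitions, everything else healthy.
snapshot = {
    "kafka_topic_partition_under_replicated_partition": 0,
    "kafka_controller_kafkacontroller_offlinepartitionscount": 2,
    "kafka_controller_kafkacontroller_activecontrollercount": 1,
    "kafka_cluster_partition_underminisr": 0,
}
print(firing_alerts(snapshot))
```

Because absent metrics are skipped, a broker that stops reporting entirely would trigger nothing here; catching that case needs a separate check for missing data.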

Dashboards

The Kafka server dashboard shows the following information:
  • Messages Produced
  • Messages Consumed
  • Lag by Consumer
  • Partitions for Topics

Kafka Client

Setup

Set up and configure the JMX Exporter by following the JMX Exporter documentation. When launching the Kafka client, attach the agent with a command such as:
java -javaagent:./jmx_prometheus_javaagent-0.16.1.jar=8080:config.yaml -jar yourJar.jar
To confirm that the Kafka client is instrumented, check that the following Prometheus metrics are available:
  • kafka_producer_topic_record_send_total
  • kafka_producer_record_send_total
  • kafka_consumer_records_consumed_total_records_total
  • kafka_consumer_fetch_manager_bytes_consumed_total

Metrics - Producer

Metric: Requests
  • kafka_producer_record_send_total
  • kafka_producer_topic_record_send_total
Key Performance Indicator (KPI): Request Rate
  • rate(kafka_producer_record_send_total[5m])
  • rate(kafka_producer_topic_record_send_total[5m])

Metric: Errors
  • kafka_producer_record_error_total
Key Performance Indicator (KPI): Error Ratio
  • rate(kafka_producer_record_error_total[5m]) / rate(kafka_producer_record_send_total[5m])

Metric: Latency
  • kafka_producer_request_latency_avg
Key Performance Indicator (KPI): Latency Average
  • kafka_producer_request_latency_avg / 1000

Metrics - Consumer

Metric: Requests
  • kafka_consumer_records_consumed_total_records_total
  • kafka_consumer_fetch_total_requests_total
  • kafka_consumer_fetch_manager_fetch_total
Key Performance Indicator (KPI): Request Rate
  • rate(kafka_consumer_records_consumed_total_records_total[5m])
  • rate(kafka_consumer_fetch_total_requests_total[5m])
  • rate(kafka_consumer_fetch_manager_fetch_total[5m])

Metric: Latency
  • kafka_consumer_fetch_latency_avg_seconds
  • kafka_consumer_fetch_manager_fetch_latency_avg
Key Performance Indicator (KPI): Latency Average
  • kafka_consumer_fetch_latency_avg_seconds
  • kafka_consumer_fetch_manager_fetch_latency_avg / 1000
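The two consumer latency metrics use different units, which is why only one of them is divided by 1000. The Python sketch below shows one way to normalize both to seconds. It assumes, as the `/ 1000` expression above implies, that `kafka_consumer_fetch_manager_fetch_latency_avg` is reported in milliseconds. `SECONDS_DIVISOR` and `latency_seconds` are hypothetical names.

```python
# Divisor that converts each latency metric to seconds.
# The *_seconds metric is already in seconds; the fetch-manager average is
# assumed to be in milliseconds, as its "/ 1000" expression implies.
SECONDS_DIVISOR = {
    "kafka_consumer_fetch_latency_avg_seconds": 1,
    "kafka_consumer_fetch_manager_fetch_latency_avg": 1000,
}


def latency_seconds(metric, value):
    """Return the latency in seconds for a known consumer latency metric."""
    if metric not in SECONDS_DIVISOR:
        raise KeyError(f"unknown latency metric: {metric}")
    return value / SECONDS_DIVISOR[metric]


# Hypothetical readings: 250 ms from one metric, 0.25 s from the other.
print(latency_seconds("kafka_consumer_fetch_manager_fetch_latency_avg", 250))
print(latency_seconds("kafka_consumer_fetch_latency_avg_seconds", 0.25))
```

Keeping the unit mapping in one table makes it harder to compare a millisecond value against a threshold expressed in seconds.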

Alerts

KPI: Request Rate
  • RequestRateAnomaly

KPI: Error Ratio
  • ErrorRatioAnomaly
  • ErrorRatioBreach

KPI: Latency Average
  • LatencyAverageBreach
  • LatencyAverageAnomaly

Dashboards

The Kafka client dashboard covers both the producer and the consumer side of the client.
It shows the following information:
  • Topics connected to producer/consumer
  • Producer records
  • Producer requests
  • Producer latency
  • Consumer records
  • Consumer lag