Kafka
Kafka Server
Setup
Kafka server can be enabled for Prometheus metrics either using an external exporter or JMX exporter.
Kafka exporter can be set up using instructions mentioned here - Exporter
docker run -ti --rm -p 9308:9308 danielqsj/kafka-exporter --kafka.server=kafka:9092 [--kafka.server=another-server ...]
JMX Exporter can be set up and configured using JMX Exporter , while launching the Kafka server you can use the below command to launch
java -javaagent:./jmx_prometheus_javaagent-0.16.1.jar=8080:config.yaml -jar yourJar.jar
An alternative way to configure in case Kafka is running using script
KAFKA_OPTS="$KAFKA_OPTS -javaagent:./jmx_prometheus_javaagent-0.16.1.jar=8080:./kafka-2_0_0.yml" kafka-server-start /usr/local/etc/kafka/server.properties
After the Kafka server is successfully enabled for the Prometheus metric, you can verify whether you are able to see some of the following metrics in Prometheus.
kafka_topic_partitions{topic="__consumer_offsets"}
kafka_topic_partition_current_offset gauge kafka_topic_partition_current_offset{partition="0",topic="__consumer_offsets"}
RED Metrics KPI
Request Rate
Asserts will automatically track the following list of Key performance indicators for your Request, Error, and Duration, aka RED metrics.
Kafka JMX RED Metrics KPI
Producer Requests
rate(kafka_server_brokertopicmetrics_totalproducerequests_total[5m])
Producer Records
rate(kafka_server_brokertopicmetrics_messagesin_total[5m])
Consumer Requests
rate(kafka_server_brokertopicmetrics_totalfetchrequestspersec_count{topic!=""}[5m])
Kafka Exporter RED Metrics KPI
Produced Messages
avg_over_time((delta(kafka_topic_partition_current_offset{topic!=""}[1m]) > 0 or delta(kafka_topic_partition_current_offset{topic!=""}[1m]) * 0) / 60 [5m])
Consumed Messages
avg_over_time((delta(kafka_consumergroup_current_offset{topic!=""}[1m]) > 0 or delta(kafka_consumergroup_current_offset{topic!=""}[1m]) * 0) / 60 [5m])
Error Ratio
Producer Errors
rate(kafka_server_brokertopicmetrics_failedproducerequests_total{topic!=""}[5m])/rate(kafka_server_brokertopicmetrics_totalproducerequests_total[5m])
Consumer Errors
rate(kafka_server_brokertopicmetrics_total_failedfetchrequestspersec_count{topic!=""}[5m])/ rate(kafka_server_brokertopicmetrics_totalfetchrequestspersec_count{topic!=""}[5m])
Latency
P99 - Consumer Request
kafka_network_requestmetrics_totaltimems{request="Fetch", quantile="0.99"} / 1000
P99 - Consumer Group
kafka_network_requestmetrics_totaltimems{request=~".*Group", quantile="0.99"}) / 1000
P99 - Producer Request
kafka_network_requestmetrics_totaltimems{request="Produce",quantile="0.99"} / 1000
P99 - Broker Request
kafka_controller_controllerchannelmanager_requestrateandqueuetimems{quantile="0.99"} /1000
RED Metrics Alerts
Asserts automatically tracks the short-term and long-term trends for request and latency for Anomaly detection. Similarly, thresholds can be set for Latency averages and P99 to record breaches. Error Ratios are tracked against availability goals (default, 99.9%) and breaches (default, 10%)
KPI | Alerts |
Request Rate | RequestRateAnomaly |
Error Ratio | ErrorRatioBreach ErrorBuildup - availability goal 99.9 % |
Latency P99 | LatencyP99ErrorBuildup |
Failure Alerts
KafkaTopicsUnderReplicatedPartitions
kafka_topic_partition_under_replicated_partition > 0
KafkaOfflinePartitions
kafka_controller_kafkacontroller_offlinepartitionscount > 0
KafkaActiveController
kafka_controller_kafkacontroller_activecontrollercount != 1
KafkaUnderMinIsrPartitions
kafka_cluster_partition_underminisr > 0
Dashboards
The below dashboard shows information about Kafka server metrics
Messages Produced
Messages Consumes
Lag by Consumer
Partitions for Topics
Kafka Client
Setup
JMX Exporter can be set up and configured using JMX Exporter , while launching the Kafka client you can use the below command to launch
java -javaagent:./jmx_prometheus_javaagent-0.16.1.jar=8080:config.yaml -jar yourJar.jar
You can check whether following prometheus metrics are available to confirm Kafka client is instrumented
kafka_producer_topic_record_send_total
kafka_producer_record_send_total
kafka_consumer_records_consumed_total_records_total
kafka_consumer_fetch_manager_bytes_consumed_total
RED Metrics - Producer
Requests
Producer Record
rate(kafka_producer_record_send_total[5m])
Producer Requests
rate(kafka_producer_request_total[5m])
Error Ratio
Producer Record
rate(kafka_producer_record_error_total[5m])
/
rate(kafka_producer_record_send_total[5m])
Latency
Average
max without(asserts_request_context)(kafka_producer_request_latency_avg/1000)
RED Metrics - Consumer
Requests
Consumer Record
rate(kafka_consumer_records_consumed_total_records_total[5m])
Consumer Requests
rate(kafka_consumer_fetch_total_requests_total[5m])
Consumer Fetch Requests
rate(kafka_consumer_fetch_manager_fetch_total[5m])
Consumer Fetch Record
rate(kafka_consumer_fetch_manager_records_consumed_total[5m])
Latency
Average
max without(asserts_request_context) (kafka_producer_request_latency_avg/1000)
Alerts
KPI | Alerts |
Request Rate | RequestRateAnomaly |
Error Ratio | ErrorRatioAnomaly ErrorRatioBreach |
Latency Average | LatencyAverageBreach LatencyAverageAnomaly |
Dashboards
The following dashboard captures information about both producer and consumer of Kafka client.
It showcases the following information
Topics connected to producer/consumer
Producer records
Producer requests
Producer latency
Consumer records
Consumer Lag
Last updated