
Redis

Setup

Prometheus metrics for Redis can be enabled with the Redis exporter. Once the exporter is set up, check that the following metrics are reported to verify the setup:
  • redis_up
  • redis_uptime_in_seconds
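The same check can be scripted. The sketch below is a minimal example, assuming Prometheus is reachable at a hypothetical address (http://localhost:9090) and that each exporter target carries an instance label. It calls the standard Prometheus HTTP API endpoint /api/v1/query and lists every instance whose redis_up value is not 1. The function names are illustrative only.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical Prometheus address; replace with your own server URL.
PROMETHEUS_URL = "http://localhost:9090"


def query(expr: str, base_url: str = PROMETHEUS_URL) -> dict:
    """Run an instant query against the Prometheus HTTP API (/api/v1/query)."""
    url = f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": expr})
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def instances_not_up(payload: dict) -> list:
    """Return the instance label of every series whose value is not 1."""
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error')}")
    down = []
    for sample in payload["data"]["result"]:
        # Instant-vector samples look like {"metric": {...}, "value": [ts, "1"]}.
        _, value = sample["value"]
        if float(value) != 1.0:
            down.append(sample["metric"].get("instance", "<unknown>"))
    return down


# Usage (needs a running Prometheus): instances_not_up(query("redis_up"))
```

An empty list means every scraped Redis instance reports redis_up = 1. Note that an instance Prometheus has never scraped will not appear in the result at all, so also confirm the expected number of series is present.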

Metrics

Request, Errors, and Latency

Metric | KPI
Request Counter: redis_commands_total, redis_connections_received_total | Request Rate: rate(redis_commands_total[5m]), rate(redis_connections_received_total[5m])
Error Counter: redis_rejected_connections_total | Error Ratio: rate(redis_rejected_connections_total[5m]) / rate(redis_connections_received_total[5m])
Latency Counter: redis_commands_duration_seconds_total | Latency Average: rate(redis_commands_duration_seconds_total[5m]) / rate(redis_commands_total[5m])
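The request, error and latency KPIs are built from rates of counters. As a rough illustration, the hypothetical helpers below compute them from raw (timestamp, value) counter samples. simple_rate is a simplified stand-in for PromQL rate(): it handles counter resets (for example after a Redis restart) but, unlike Prometheus, does not extrapolate to the edges of the window, so results can differ slightly from what Prometheus reports.

```python
import math
from typing import List, Tuple

Sample = Tuple[float, float]  # (unix timestamp in seconds, counter value)


def simple_rate(samples: List[Sample]) -> float:
    """Per-second increase of a counter across the samples.

    Simplified stand-in for PromQL rate(): counter resets are handled,
    but there is no extrapolation to the window boundaries.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop in value means the counter was reset (e.g. Redis restarted),
        # so the new value is the increase accumulated since the reset.
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed


def _ratio(numerator: float, denominator: float) -> float:
    # With no traffic in the window the ratio is undefined; return NaN.
    return numerator / denominator if denominator else math.nan


def error_ratio(rejected: List[Sample], received: List[Sample]) -> float:
    """rate(redis_rejected_connections_total[5m]) / rate(redis_connections_received_total[5m])"""
    return _ratio(simple_rate(rejected), simple_rate(received))


def latency_average(duration: List[Sample], commands: List[Sample]) -> float:
    """rate(redis_commands_duration_seconds_total[5m]) / rate(redis_commands_total[5m])"""
    return _ratio(simple_rate(duration), simple_rate(commands))
```

For example, 3000 commands taking 0.3 seconds in total over a 5 minute window give a request rate of 10 commands/s and an average latency of 0.0001 s (100 microseconds) per command.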

Resource

Metric | KPI
CPU Usage: redis_cpu_user_seconds_total, redis_cpu_sys_seconds_total | rate(redis_cpu_user_seconds_total[5m]) + rate(redis_cpu_sys_seconds_total[5m])
Memory Usage: redis_memory_used_rss_bytes | redis_memory_used_rss_bytes / redis_memory_max_bytes
Network Bytes Received: redis_net_input_bytes_total | Data Transfer Rate (in): rate(redis_net_input_bytes_total[5m])
Network Bytes Transmitted: redis_net_output_bytes_total | Data Transfer Rate (out): rate(redis_net_output_bytes_total[5m])
Connected Clients: redis_connected_clients | redis_connected_clients / redis_config_maxclients
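The resource KPIs are simple sums and ratios, but two of them need care when reading the result. A minimal sketch with hypothetical function names: the CPU KPI is CPU seconds consumed per wall-clock second, so 0.7 means roughly 70% of one core. The memory ratio is meaningless when maxmemory is not configured, because Redis then reports a limit of 0; this assumes the exporter passes that value through unchanged as redis_memory_max_bytes, in which case PromQL returns +Inf for the division.

```python
import math


def cpu_utilization(user_rate: float, sys_rate: float) -> float:
    """rate(redis_cpu_user_seconds_total[5m]) + rate(redis_cpu_sys_seconds_total[5m]).

    CPU seconds used per second, i.e. cores in use: 0.7 is ~70% of one core.
    """
    return user_rate + sys_rate


def memory_utilization(used_rss_bytes: float, max_bytes: float) -> float:
    """redis_memory_used_rss_bytes / redis_memory_max_bytes."""
    if max_bytes == 0:
        # maxmemory 0 means "no limit", so there is no meaningful ratio;
        # mirror PromQL, which returns +Inf for a positive value divided by 0.
        return math.inf
    return used_rss_bytes / max_bytes


def client_utilization(connected_clients: float, maxclients: float) -> float:
    """redis_connected_clients / redis_config_maxclients."""
    return connected_clients / maxclients
```

With a memory limit in place, a value of 0.5 means the Redis process RSS is at half of maxmemory. Without one, alert on absolute RSS instead.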

Alerts

KPI | Alert
Request Rate | RequestRateAnomaly
Error Ratio | ErrorRatioBreach and ErrorBuildup, based on an availability SLO of 99.9%
Latency Average | LatencyAverageBreach and LatencyAverageAnomaly
CPU Usage | Saturation, with severity warning when CPU utilization exceeds 70% and critical when it exceeds 90%
Memory Usage | Saturation, with severity warning when memory utilization exceeds 65% and critical when it exceeds 75%
Network Bytes | ResourceRateAnomaly
Client Connections | Saturation, with severity warning when connection utilization exceeds 80% and critical when it exceeds 90%; ResourceMayExhaust when connections are forecast to exceed the limit of 256 connections within the next 4 hours
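The saturation and ResourceMayExhaust alerts come down to threshold checks and a linear forecast. The sketch below assumes saturation is measured with the utilization ratios from the Resource table (CPU as a fraction of one core) and models the forecast on the idea behind PromQL predict_linear(), a least-squares line through recent samples. The function names are hypothetical, and the actual alert rules may be implemented differently.

```python
from typing import List, Optional, Tuple

# (warning, critical) utilization thresholds from the alert table above.
THRESHOLDS = {
    "cpu": (0.70, 0.90),
    "memory": (0.65, 0.75),
    "clients": (0.80, 0.90),
}


def saturation_severity(resource: str, utilization: float) -> Optional[str]:
    """Return "critical", "warning" or None for a utilization ratio (0.0 to 1.0)."""
    warning, critical = THRESHOLDS[resource]
    if utilization > critical:
        return "critical"
    if utilization > warning:
        return "warning"
    return None


def predict_linear(samples: List[Tuple[float, float]], horizon_seconds: float) -> float:
    """Fit a least-squares line to (timestamp, value) samples and extrapolate.

    Similar to PromQL predict_linear(), but "now" is the last sample's timestamp.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must span more than one timestamp")
    cov_tv = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    slope = cov_tv / var_t
    last_t = samples[-1][0]
    return mean_v + slope * (last_t + horizon_seconds - mean_t)


def may_exhaust(connected_clients: List[Tuple[float, float]],
                limit: float = 256, horizon_seconds: float = 4 * 3600) -> bool:
    """ResourceMayExhaust: are connections forecast to exceed the limit?"""
    return predict_linear(connected_clients, horizon_seconds) > limit
```

For example, connections growing from 100 to 160 over two hours extrapolate to about 280 four hours later, which exceeds 256. In PromQL the same idea is typically written along the lines of predict_linear(redis_connected_clients[1h], 4 * 3600) > 256, where the 1h lookback window is an assumption to tune.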

Failure Alerts

Alert | Description | Expression
RedisDown | Redis instance is down | redis_up != 1
RedisUptimeReset | Redis instance restarted | delta(redis_uptime_in_seconds[5m]) < 0
RedisMasterLinkDown | Redis master link is down | (avg_over_time(redis_master_link_up[10m]) and on (instance) redis_instance_info{role="slave"}) == 0
RedisReplicationBroken | Redis instance lost a replica | delta(redis_connected_slaves[1m]) < 0
RedisClusterFlapping | Changes have been detected in Redis replica connections | changes(redis_connected_slaves[5m]) > 2
RedisRejectedConnections | Some connections to Redis have been rejected | rate(redis_rejected_connections_total[1m]) * 60 > 0
RedisMissingMaster | Redis master is missing | count by (job, service, redis_mode, namespace) (redis_instance_info) unless count by (job, service, redis_mode, namespace) (redis_instance_info{role="master"})
RedisTooManyMasters | Standalone and HA setups should have only one master | count by (job, service, namespace) (redis_instance_info{role="master", redis_mode="standalone"}) > 1
RedisTooFewMastersInCluster | In cluster mode, the number of instances in the "master" role should match the cluster size | avg by (job, service, namespace) (redis_cluster_size) - count by (job, service, namespace) (redis_instance_info{role="master", redis_mode="cluster"}) > 0

Note: count by (...) returns no series at all for a group that has no master, so a comparison such as == 0 can never fire. The unless form of RedisMissingMaster instead reports every group that has Redis instances but no master.
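The failure alerts can be loaded into Prometheus as alerting rules. A minimal sketch, assuming a hypothetical group name (redis-failure-alerts) and showing only some of the rows: it builds a rule file in the standard groups / rules / alert / expr / annotations structure and serializes it as JSON. YAML parsers accept this JSON output, so it can be saved as a .yml file and listed under rule_files in prometheus.yml. No "for" durations or severity labels are set, because the table above does not specify them.

```python
import json

# Subset of the failure alerts: (alert name, summary, PromQL expression).
# The remaining rows of the table follow the same pattern.
FAILURE_ALERTS = [
    ("RedisDown", "Redis instance is down", "redis_up != 1"),
    ("RedisUptimeReset", "Redis instance restarted",
     "delta(redis_uptime_in_seconds[5m]) < 0"),
    ("RedisReplicationBroken", "Redis instance lost a replica",
     "delta(redis_connected_slaves[1m]) < 0"),
    ("RedisRejectedConnections", "Some connections to Redis have been rejected",
     "rate(redis_rejected_connections_total[1m]) * 60 > 0"),
]


def build_rule_file(group_name: str = "redis-failure-alerts") -> str:
    """Return a Prometheus alerting-rule file serialized as JSON (valid YAML)."""
    rules = [
        {"alert": name, "expr": expr, "annotations": {"summary": summary}}
        for name, summary, expr in FAILURE_ALERTS
    ]
    return json.dumps({"groups": [{"name": group_name, "rules": rules}]}, indent=2)
```

Validate the generated file with promtool check rules before pointing Prometheus at it.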

KPI Dashboard

The Redis KPI Dashboard shows all of the KPIs listed above.