Redis

Setup

Prometheus metrics for Redis can be enabled using the Redis exporter. Once the exporter is set up, check the following metrics to verify the setup:

  • redis_up

  • redis_uptime_in_seconds

Metrics

Request, Errors, and Latency

| Metric | KPI |
| --- | --- |
| Request Counter | redis_commands_total, redis_connections_received_total |
| Request Rate | rate(redis_commands_total[5m]), rate(redis_connections_received_total[5m]) |
| Error Counter | redis_rejected_connections_total |
| Error Ratio | rate(redis_rejected_connections_total[5m]) / rate(redis_connections_received_total[5m]) |
| Latency Counter | redis_commands_duration_seconds_total |
| Latency Average | rate(redis_commands_duration_seconds_total[5m]) / rate(redis_commands_total[5m]) |
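The Error Ratio and Latency Average KPIs each divide one per-second rate by another. As a rough illustration of that arithmetic, the following minimal Python sketch derives both values from two scrapes of the underlying counters. The `Sample` type, the helper names, and all numbers are hypothetical, and `simple_rate` is a simplification: unlike PromQL's `rate()`, it ignores counter resets and does no extrapolation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sample:
    """One scraped counter value (hypothetical helper type)."""
    timestamp: float  # seconds since some epoch
    value: float      # cumulative counter value


def simple_rate(first: Sample, last: Sample) -> float:
    """Per-second increase between two samples.

    Simplified stand-in for PromQL rate(): no counter-reset
    handling and no extrapolation to the window edges.
    """
    elapsed = last.timestamp - first.timestamp
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    return (last.value - first.value) / elapsed


def safe_ratio(numerator: float, denominator: float) -> Optional[float]:
    """Return numerator / denominator, or None when the denominator is 0."""
    if denominator == 0:
        return None
    return numerator / denominator


# Made-up counter values from two scrapes taken 300 s (5 minutes) apart.
received = simple_rate(Sample(0, 100), Sample(300, 400))    # connections/s
rejected = simple_rate(Sample(0, 0), Sample(300, 3))        # rejections/s
commands = simple_rate(Sample(0, 1000), Sample(300, 4000))  # commands/s
duration = simple_rate(Sample(0, 2.0), Sample(300, 5.0))    # seconds spent per second

error_ratio = safe_ratio(rejected, received)      # fraction of connections rejected
latency_average = safe_ratio(duration, commands)  # seconds per command
```

With these made-up inputs, roughly 1% of connections are rejected and each command takes about a millisecond on average.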

Resource

| Metric | KPI |
| --- | --- |
| CPU Usage | redis_cpu_user_seconds_total, redis_cpu_sys_seconds_total, rate(redis_cpu_user_seconds_total[5m]) + rate(redis_cpu_sys_seconds_total[5m]) |
| Memory Usage | redis_memory_used_rss_bytes, redis_memory_used_rss_bytes / redis_memory_max_bytes |
| Network Bytes Received | redis_net_input_bytes_total |
| Network Bytes Transmitted | redis_net_output_bytes_total |
| Data Transfer Rate | rate(redis_net_input_bytes_total[5m]), rate(redis_net_output_bytes_total[5m]) |
| Current Redis Connected Clients | redis_connected_clients, redis_connected_clients / redis_config_maxclients |

Alerts

| KPI | Alert |
| --- | --- |
| Request Rate | RequestRateAnomaly |
| Error Ratio | ErrorRatioBreach and ErrorBuildup, based on an availability SLO of 99.9% |
| Latency Average | LatencyAverageBreach and LatencyAverageAnomaly |
| CPU Usage | Saturation, with severity warning when CPU utilization exceeds 70% and critical when it exceeds 90% |
| Memory Usage | Saturation, with severity warning when memory utilization exceeds 65% and critical when it exceeds 75% |
| Network Bytes | ResourceRateAnomaly |
| Client Connections | Saturation, with severity warning when utilization exceeds 80% and critical when it exceeds 90%. ResourceMayExhaust if connections are projected to exceed the limit of 256 connections within the next 4 hours |
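The Saturation and ResourceMayExhaust alerts above come down to a threshold check and a trend projection. The following Python sketch shows one way that logic could look. It assumes the projection is a least-squares linear extrapolation (in the spirit of PromQL's `predict_linear()`); the source does not say how ResourceMayExhaust is actually computed. The function names and sample values are hypothetical.

```python
from typing import List, Optional, Tuple


def saturation_severity(utilization: float, warning: float,
                        critical: float) -> Optional[str]:
    """Map a utilization fraction (0.0 to 1.0) to a severity, or None."""
    if utilization > critical:
        return "critical"
    if utilization > warning:
        return "warning"
    return None


def predict_value(samples: List[Tuple[float, float]],
                  seconds_ahead: float) -> float:
    """Least-squares linear extrapolation from (timestamp, value) pairs.

    Predicts the value `seconds_ahead` after the last sample.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    t_last = samples[-1][0]
    xs = [t - t_last for t, _ in samples]  # time relative to the last sample
    ys = [v for _, v in samples]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    spread = sum((x - mean_x) ** 2 for x in xs)
    if spread == 0:
        raise ValueError("samples must span more than one timestamp")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / spread
    intercept = mean_y - slope * mean_x  # fitted value at the last sample
    return intercept + slope * seconds_ahead


# Thresholds from the alert table; the utilization values are made up.
cpu = saturation_severity(0.75, warning=0.70, critical=0.90)      # "warning"
memory = saturation_severity(0.80, warning=0.65, critical=0.75)   # "critical"
clients = saturation_severity(0.50, warning=0.80, critical=0.90)  # None

# Hypothetical hourly readings of redis_connected_clients.
history = [(0.0, 100.0), (3600.0, 130.0), (7200.0, 160.0)]
CONNECTION_LIMIT = 256
FOUR_HOURS = 4 * 3600
may_exhaust = predict_value(history, FOUR_HOURS) > CONNECTION_LIMIT
```

In this example the connection count grows by 30 per hour from 160, so the projection crosses 256 within the 4-hour horizon and `may_exhaust` is true.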

Failure Alerts

RedisDown

Redis instance is down

redis_up != 1

RedisUptimeReset

Redis instance restarted

delta(redis_uptime_in_seconds[5m]) < 0

RedisMasterLinkDown

Redis master link down

(
  avg_over_time(redis_master_link_up[10m])
  and on (instance)
  redis_instance_info{role="slave"}
) == 0

RedisReplicationBroken

Redis instance lost a replica

delta(redis_connected_slaves[1m]) < 0

RedisClusterFlapping

Changes have been detected in Redis replica connections

changes(redis_connected_slaves[5m]) > 2
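The RedisReplicationBroken and RedisClusterFlapping alerts rely on `delta()` and `changes()` over a window of `redis_connected_slaves` samples. As a simplified illustration of what those two checks measure, here is a small Python sketch. The helper names and sample windows are hypothetical, and `simple_delta` omits the extrapolation that PromQL's `delta()` performs.

```python
from typing import Sequence


def simple_delta(values: Sequence[float]) -> float:
    """Last value minus first value in the window (no extrapolation)."""
    if len(values) < 2:
        raise ValueError("need at least two samples")
    return values[-1] - values[0]


def count_changes(values: Sequence[float]) -> int:
    """Number of times consecutive samples differ, like PromQL changes()."""
    return sum(1 for prev, cur in zip(values, values[1:]) if cur != prev)


# Made-up redis_connected_slaves samples.
lost_replica_window = [2, 2, 1]
flapping_window = [2, 1, 2, 1, 2, 2]

replication_broken = simple_delta(lost_replica_window) < 0  # one replica dropped
cluster_flapping = count_changes(flapping_window) > 2       # four changes in the window
```

A single dropped replica trips the first check, while the second only fires when the replica count keeps moving back and forth within the window.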

RedisRejectedConnections

Some connections to Redis have been rejected

rate(redis_rejected_connections_total[1m]) * 60 > 0

RedisMissingMaster

Redis master is missing

count by (job, service, redis_mode, namespace)
  (redis_instance_info{role="master"}) == 0

RedisTooManyMasters

A standalone or HA setup should have only one master

count by (job, service, namespace)
  (redis_instance_info{role="master", redis_mode="standalone"}) > 1

RedisTooFewMastersInCluster

Redis cluster mode should have every instance in the role of "master"

avg by (job, service, namespace) (redis_cluster_size)
-
count by (job, service, namespace)
  (redis_instance_info{role="master", redis_mode="cluster"})
> 0
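The master-count alerts group instances by job, service, and namespace, then compare the number of masters against an expected value. The Python sketch below mirrors that grouping on an in-memory list of label sets. The function names and the instance data are hypothetical, and the dictionaries stand in for the labels of `redis_instance_info` series.

```python
from collections import Counter
from typing import Dict, List, Tuple

Instance = Dict[str, str]
GroupKey = Tuple[str, str, str]  # (job, service, namespace)


def masters_per_group(instances: List[Instance], redis_mode: str) -> Counter:
    """Count role="master" instances per (job, service, namespace)."""
    counts: Counter = Counter()
    for inst in instances:
        if inst["role"] == "master" and inst["redis_mode"] == redis_mode:
            counts[(inst["job"], inst["service"], inst["namespace"])] += 1
    return counts


def too_many_masters(instances: List[Instance]) -> List[GroupKey]:
    """Standalone groups that report more than one master."""
    counts = masters_per_group(instances, "standalone")
    return [key for key, count in counts.items() if count > 1]


def too_few_masters_in_cluster(instances: List[Instance],
                               cluster_size: Dict[GroupKey, float]) -> List[GroupKey]:
    """Cluster groups whose master count is below the cluster size."""
    masters = masters_per_group(instances, "cluster")
    # As with PromQL vector matching, only groups present on both sides are compared.
    return [key for key, size in cluster_size.items()
            if key in masters and size - masters[key] > 0]


def make(service: str, role: str, mode: str) -> Instance:
    """Build a made-up label set for one instance."""
    return {"job": "redis", "service": service, "namespace": "prod",
            "role": role, "redis_mode": mode}


instances = [
    make("cache", "master", "standalone"),
    make("cache", "master", "standalone"),  # second master: misconfigured
    make("sessions", "master", "cluster"),
    make("sessions", "slave", "cluster"),
    make("sessions", "slave", "cluster"),
]
cluster_size = {("redis", "sessions", "prod"): 3.0}

many = too_many_masters(instances)
few = too_few_masters_in_cluster(instances, cluster_size)
```

Here the `cache` group is flagged for having two standalone masters, and the `sessions` cluster is flagged because only one of its three instances is a master.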

KPI Dashboard

The Redis KPI dashboard shows all of the KPIs listed above.
