Comment on page
Failure
Failure alerts indicate some kind of deviation in the system's configuration from it's desired state. For e.g., when replicas are configured in a redis database there must be at least one master instance. When there are none, redis is not operating as configured. These kind of problems are reported as failure alerts. Let's write an alert rule to report this failure.
# Redis Master Missing
# Note this covers both cluster mode and HA mode, thus we are counting by redis_mode
- alert: RedisMissingMaster
expr: |-
count by (job, service, redis_mode, namespace, asserts_env, asserts_site) (
redis_instance_info{role="master"}
) == 0
for: 1m
labels:
asserts_severity: critical
asserts_entity_type: Service
asserts_alert_category: failure
The
asserts_env, asserts_site, namespace, service
and job
label have been explained earlier. Similarly the asserts_entity_type
has also been explained. To refresh, this will determine to which entity type is this alert associated in the visualization. We also have some new meta labels in this rule. Let's try to understand them.asserts_severity
This is used to indicate the severity of the problem as either
warning
or critical
. This determines the color coding in the visualization. warning
is shown in yellow and critical
in red color.asserts_alert_category
Asserts categorizes all alerts into the SAAFE model, i.e. Saturation, Amend (i.e. Configuration changes to the system), Anomaly, Failure and Error. The meta label
asserts_alert_category
is used to categorize this alert as a failure
Last modified 5mo ago