Kubernetes

Setup

Refer to the kube-prometheus stack and its Helm chart for instructions on installing Prometheus in your Kubernetes (k8s) environment.

Metrics and KPIs

For each resource below, the raw metrics collected are listed first, followed by the KPI derived from them.

Memory

Metric: container_memory_working_set_bytes

KPI: Memory Utilization
When a memory limit is set: container_memory_working_set_bytes / kube_pod_container_resource_limits{resource="memory"}
When no memory limit is set: container_memory_working_set_bytes / kube_node_status_allocatable{resource="memory"}
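The Memory Utilization KPI uses the container's limit as the denominator when one exists and falls back to the node's allocatable memory otherwise. The Python sketch below shows that fallback. It assumes the metric values for a single container have already been fetched. The function and parameter names are hypothetical, and a container without a limit is modelled as `limit_bytes=None`.

```python
from typing import Optional

GIB = 1024 ** 3


def memory_utilization(
    working_set_bytes: float,
    limit_bytes: Optional[float] = None,
    node_allocatable_bytes: Optional[float] = None,
) -> float:
    """Fraction of available memory used by one container (hypothetical helper).

    Mirrors the two KPI formulas: divide by the container's memory limit when
    one is set, otherwise by the allocatable memory of the node it runs on.
    """
    if limit_bytes:  # None (or 0) is treated as "no limit set"
        return working_set_bytes / limit_bytes
    if node_allocatable_bytes:
        return working_set_bytes / node_allocatable_bytes
    raise ValueError("need either a memory limit or the node's allocatable memory")


# A container using 512 MiB with a 1 GiB limit is at 50% utilization.
print(memory_utilization(0.5 * GIB, limit_bytes=1 * GIB))  # 0.5
# With no limit, the same usage on a node with 4 GiB allocatable is 12.5%.
print(memory_utilization(0.5 * GIB, node_allocatable_bytes=4 * GIB))  # 0.125
```

Because the working set is divided by the limit, the result can exceed 1.0 only transiently. A container whose working set stays above its limit is OOM-killed.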

CPU

Metrics: container_cpu_usage_seconds_total, container_cpu_cfs_throttled_periods_total, container_cpu_cfs_periods_total

KPI: CPU Utilization
When a CPU limit is set: rate(container_cpu_usage_seconds_total[5m]) / kube_pod_container_resource_limits{resource="cpu"}
When no CPU limit is set: rate(container_cpu_usage_seconds_total[5m]) / kube_node_status_allocatable{resource="cpu"}
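container_cpu_usage_seconds_total counts CPU-seconds consumed. Its per-second rate is therefore the average number of cores in use, and that number can be compared directly with a limit expressed in cores. The minimal sketch below assumes the limit and allocatable values are in cores, and it uses hypothetical helper names. The two-reading rate is a simplification: it does not reproduce PromQL rate()'s counter-reset handling or its extrapolation to the window edges.

```python
from typing import Optional


def cpu_cores_used(start_cpu_seconds: float, end_cpu_seconds: float,
                   window_seconds: float) -> float:
    """Average number of cores used over a window (hypothetical helper).

    Takes two readings of container_cpu_usage_seconds_total, a counter of
    CPU-seconds consumed. Simplified: unlike PromQL rate(), it ignores
    counter resets and does not extrapolate to the window edges.
    """
    return (end_cpu_seconds - start_cpu_seconds) / window_seconds


def cpu_utilization(cores_used: float,
                    limit_cores: Optional[float] = None,
                    node_allocatable_cores: Optional[float] = None) -> float:
    """Cores used divided by the CPU limit, or by node allocatable CPU if no limit."""
    denominator = limit_cores or node_allocatable_cores
    if not denominator:
        raise ValueError("need either a CPU limit or the node's allocatable CPU")
    return cores_used / denominator


# 150 CPU-seconds consumed during a 5m (300 s) window is 0.5 cores on average.
used = cpu_cores_used(1000.0, 1150.0, 300.0)
print(used)                                               # 0.5
print(cpu_utilization(used, limit_cores=2.0))             # 0.25
print(cpu_utilization(used, node_allocatable_cores=4.0))  # 0.125
```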

KPI: CPU Throttle
rate(container_cpu_cfs_throttled_periods_total[5m]) / rate(container_cpu_cfs_periods_total[5m])
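Both rate() calls in the CPU Throttle formula use the same 5m window, so the per-second normalization cancels out. What remains is the share of CFS enforcement periods in the window during which the container was throttled. The sketch below uses hypothetical counter readings and a hypothetical function name. With the kernel's default 100 ms CFS period, a container that is runnable for an entire 300 s window accumulates 3000 periods.

```python
def throttle_ratio(throttled_start: float, throttled_end: float,
                   periods_start: float, periods_end: float) -> float:
    """Fraction of CFS enforcement periods in a window that were throttled.

    Arguments are readings of container_cpu_cfs_throttled_periods_total and
    container_cpu_cfs_periods_total at the start and end of the same window.
    Dividing both increases by the window length (as rate() does) would cancel
    out, so the ratio of the raw increases is used directly.
    """
    throttled = throttled_end - throttled_start
    periods = periods_end - periods_start
    if periods <= 0:
        return 0.0  # no enforcement periods elapsed in the window
    return throttled / periods


# 3000 periods of 100 ms in a 300 s window, 750 of them throttled.
print(throttle_ratio(10_000, 10_750, 50_000, 53_000))  # 0.25
```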

Network Bytes

Metrics: container_network_transmit_bytes_total, container_network_receive_bytes_total

KPI: Data Transfer Rate
Transmit: rate(container_network_transmit_bytes_total[5m])
Receive: rate(container_network_receive_bytes_total[5m])
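The network byte metrics are counters that restart from zero, for example when a pod is recreated. Computing a transfer rate from them therefore requires handling counter resets. The Python sketch below applies the reset rule rate() uses: a drop between consecutive samples means the counter restarted, so the new value counts as the increase. The helper name and the sample data are hypothetical. Prometheus's extrapolation to the window edges is omitted.

```python
from typing import List, Tuple


def counter_rate(samples: List[Tuple[float, float]]) -> float:
    """Per-second increase of a byte counter from (timestamp, value) samples.

    Simplified take on PromQL rate(): a decrease between consecutive samples is
    treated as a counter reset (the counter restarted from 0). Extrapolation to
    the window edges, which Prometheus performs, is omitted.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        increase += (cur - prev) if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed


# Receive counter scraped every 60 s over 5m; it resets between t=60 and t=120.
rx = [(0, 10_000_000), (60, 16_000_000), (120, 6_000_000),
      (180, 12_000_000), (240, 18_000_000), (300, 24_000_000)]
print(counter_rate(rx), "bytes/s")  # 100000.0 bytes/s
```

Naively subtracting the first sample from the last would give (24,000,000 - 10,000,000) / 300 ≈ 46,667 bytes/s and understate the traffic by more than half.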

Volume

Metrics: kubelet_volume_stats_available_bytes, kubelet_volume_stats_capacity_bytes

KPI: Volume Available
kubelet_volume_stats_available_bytes / kubelet_volume_stats_capacity_bytes
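The Volume Available KPI is the fraction of a volume's capacity that is still free, so low values indicate a volume that is nearly full. The Python sketch below computes this ratio and checks it against a threshold. The 10% threshold and all function names are hypothetical. This page does not state the condition behind the KubeVolumeLow alert.

```python
GIB = 1024 ** 3

# HYPOTHETICAL threshold for illustration only; the condition behind the
# KubeVolumeLow alert is not specified in this document.
LOW_AVAILABLE_RATIO = 0.10


def volume_available_ratio(available_bytes: float, capacity_bytes: float) -> float:
    """kubelet_volume_stats_available_bytes / kubelet_volume_stats_capacity_bytes."""
    if capacity_bytes <= 0:
        raise ValueError("capacity must be positive")
    return available_bytes / capacity_bytes


def is_volume_low(available_bytes: float, capacity_bytes: float,
                  threshold: float = LOW_AVAILABLE_RATIO) -> bool:
    """True when less than `threshold` of the volume's capacity is still free."""
    return volume_available_ratio(available_bytes, capacity_bytes) < threshold


print(volume_available_ratio(5 * GIB, 100 * GIB))  # 0.05
print(is_volume_low(5 * GIB, 100 * GIB))           # True
print(is_volume_low(25 * GIB, 100 * GIB))          # False
```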

PersistentVolumeClaim (PVC)

Metric: kube_pod_spec_volumes_persistentvolumeclaims_info

KPI: PVC Usage
kube_pod_spec_volumes_persistentvolumeclaims_info
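kube_pod_spec_volumes_persistentvolumeclaims_info is an info metric: each series has the value 1 and maps a pod to the claims it mounts through labels such as `namespace`, `pod`, `volume` and `persistentvolumeclaim`. The kubelet volume metrics carry `namespace` and `persistentvolumeclaim` labels, so the two can be joined to attribute volume usage to pods. The Python sketch below performs that join over data that has already been fetched. The data structures and the function name are hypothetical, and this illustrates one possible use of the metric rather than how Asserts computes PVC Usage.

```python
from typing import Dict, List, Tuple

# Hypothetical, already-fetched data (values in GiB for readability).
# Label sets of kube_pod_spec_volumes_persistentvolumeclaims_info series.
pvc_info: List[Dict[str, str]] = [
    {"namespace": "shop", "pod": "db-0", "volume": "data", "persistentvolumeclaim": "data-db-0"},
    {"namespace": "shop", "pod": "db-1", "volume": "data", "persistentvolumeclaim": "data-db-1"},
]
# (namespace, persistentvolumeclaim) -> (available, capacity) from kubelet_volume_stats_*.
volume_stats: Dict[Tuple[str, str], Tuple[float, float]] = {
    ("shop", "data-db-0"): (20.0, 100.0),
    ("shop", "data-db-1"): (75.0, 100.0),
}


def pvc_usage_by_pod(info: List[Dict[str, str]],
                     stats: Dict[Tuple[str, str], Tuple[float, float]]
                     ) -> Dict[Tuple[str, str, str], float]:
    """Used fraction of each claim, keyed by (namespace, pod, persistentvolumeclaim)."""
    usage: Dict[Tuple[str, str, str], float] = {}
    for labels in info:
        claim = (labels["namespace"], labels["persistentvolumeclaim"])
        if claim not in stats:
            continue  # no kubelet stats for this claim, e.g. volume not mounted
        available, capacity = stats[claim]
        usage[(labels["namespace"], labels["pod"], labels["persistentvolumeclaim"])] = (
            1 - available / capacity
        )
    return usage


for key, used in pvc_usage_by_pod(pvc_info, volume_stats).items():
    print(key, round(used, 2))
```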

Alerts

The alerts raised for each KPI are:

Memory Utilization: Saturation
CPU: Saturation, CPUThrottlingSustained
Network Bytes: ResourceRateAnomaly
Volume: KubeVolumeLow
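An alert such as CPUThrottlingSustained implies a condition that must hold over time, but this page does not define its threshold or duration. The Python sketch below shows how a sustained condition is commonly evaluated. It mimics the `for` clause of a Prometheus alerting rule: the alert fires only after the condition has been true at every evaluation for the full duration, and any evaluation where it is false resets the clock. The threshold, duration, and function name are hypothetical and are not taken from Asserts.

```python
from typing import List, Tuple

# HYPOTHETICAL values: the document does not state the threshold or duration
# that CPUThrottlingSustained uses.
THRESHOLD = 0.25       # fraction of CFS periods throttled
FOR_SECONDS = 15 * 60  # how long the condition must hold continuously


def is_firing(evaluations: List[Tuple[float, float]],
              threshold: float = THRESHOLD,
              for_seconds: float = FOR_SECONDS) -> bool:
    """evaluations: (timestamp, throttle_ratio) pairs in time order, one per evaluation.

    Returns True when the ratio has been above the threshold at every evaluation
    since some start time at least for_seconds before the latest evaluation.
    """
    pending_since = None
    for ts, value in evaluations:
        if value > threshold:
            if pending_since is None:
                pending_since = ts  # condition became true: alert is "pending"
        else:
            pending_since = None  # condition broke: reset the pending state
    if pending_since is None:
        return False
    return evaluations[-1][0] - pending_since >= for_seconds


steady = [(t, 0.4) for t in range(0, 1201, 60)]
dipped = [(t, 0.1 if t == 600 else 0.4) for t in range(0, 1201, 60)]
print(is_firing(steady))  # True: above threshold for 1200 s
print(is_firing(dipped))  # False: only 540 s since the dip at t=600
```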

Failure Alerts

KubePodCrashLooping
KubePodNotReady
KubeDeploymentGenerationMismatch
KubeDeploymentReplicasMismatch
KubeStatefulSetReplicasMismatch
KubeStatefulSetGenerationMismatch
KubeStatefulSetUpdateNotRolledOut
KubeDaemonSetRolloutStuck
KubeDaemonSetNotScheduled
KubeDaemonSetMisScheduled
KubeJobFailed
KubeHpaReplicasMismatch
KubeHpaMaxedOut
KubeContainerWaiting
KubeCronJobRunning
KubeJobCompletion
KubeCPUOvercommit
KubeMemoryOvercommit
KubeCPUQuotaOvercommit
KubeMemoryQuotaOvercommit
KubeQuotaExceeded
KubeAPILatencyHigh
KubeAPIErrorsHigh
KubeClientCertificateExpiration
KubeClientCertificateExpiration
KubeAPIDown
KubeAPIGone
KubeNodeNotReady
KubeNodeUnreachable
KubeletTooManyPods
KubeNodeReadinessFlapping
KubeletPlegDurationHigh
KubeletPodStartUpLatencyHigh
KubeletDown
KubeletGone

Dashboard

Every workload running on Kubernetes is discovered as a Service in Asserts and gets a Service KPI dashboard. Each instance of a Service also gets a ServiceInstance dashboard. These dashboards show the key resource KPIs: memory, CPU, disk, and network bytes transmitted. In addition to the Service KPI dashboard, the following dashboards are available:

Cluster
Namespace
Pod
Node
Proxy
API Server