Kubernetes

Setup

Refer to kube-prometheus-stack and its Helm chart for instructions on installing Prometheus in your Kubernetes environment.

Metrics and KPIs

Metric

KPI

Memory

container_memory_working_set_bytes

Memory Utilization

when limit is set

container_memory_working_set_bytes / kube_pod_container_resource_limits{resource="memory"}

when limit is not set

container_memory_working_set_bytes / kube_node_status_allocatable{resource="memory"}
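The two Memory Utilization expressions above differ only in the denominator. A minimal sketch of that arithmetic follows, assuming the metric values have already been fetched as plain numbers. The function name `memory_utilization` and the sample values are hypothetical and are not part of Asserts or Prometheus.

```python
from typing import Optional


def memory_utilization(
    working_set_bytes: float,
    limit_bytes: Optional[float],
    node_allocatable_bytes: float,
) -> float:
    """Return working-set memory as a fraction of the applicable ceiling.

    Uses the container limit when one is set and falls back to the node's
    allocatable memory otherwise, mirroring the two expressions above.
    """
    # Assumption: a missing or zero limit means "limit is not set".
    denominator = limit_bytes if limit_bytes else node_allocatable_bytes
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return working_set_bytes / denominator


MIB = 1024 * 1024
GIB = 1024 * MIB

# Hypothetical container using 384 MiB with a 512 MiB limit.
with_limit = memory_utilization(384 * MIB, 512 * MIB, 16 * GIB)

# Same usage with no limit, on a node with 16 GiB allocatable.
without_limit = memory_utilization(384 * MIB, None, 16 * GIB)

print(f"{with_limit:.4f}")     # 0.7500
print(f"{without_limit:.4f}")  # 0.0234
```

The same working set reads as 75% against the limit but only about 2% against the node, so the two variants are not directly comparable.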

CPU

container_cpu_usage_seconds_total

container_cpu_cfs_throttled_periods_total

container_cpu_cfs_periods_total

CPU Utilization

when limit is set

rate(container_cpu_usage_seconds_total[5m]) / kube_pod_container_resource_limits{resource="cpu"}

when limit is not set

rate(container_cpu_usage_seconds_total[5m]) / kube_node_status_allocatable{resource="cpu"}
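Because container_cpu_usage_seconds_total is a counter of CPU-seconds, its per-second rate is the average number of cores in use, which is then divided by a ceiling expressed in cores. The sketch below works through that arithmetic from two counter samples. It is a simplification: it ignores counter resets and the extrapolation that the Prometheus rate() function performs, and the name `cpu_utilization` and the sample values are hypothetical.

```python
def cpu_utilization(
    usage_seconds_start: float,
    usage_seconds_end: float,
    window_seconds: float,
    cpu_cores_ceiling: float,
) -> float:
    """Average cores used over the window, as a fraction of the ceiling."""
    if window_seconds <= 0 or cpu_cores_ceiling <= 0:
        raise ValueError("window and ceiling must be positive")
    # CPU-seconds consumed per wall-clock second equals average cores used.
    cores_used = (usage_seconds_end - usage_seconds_start) / window_seconds
    return cores_used / cpu_cores_ceiling


# Hypothetical samples: 150 CPU-seconds consumed over a 5-minute window.
WINDOW = 300.0

# Against a 2-core container limit.
with_limit = cpu_utilization(1000.0, 1150.0, WINDOW, 2.0)

# Against a node with 8 allocatable cores (no limit set).
without_limit = cpu_utilization(1000.0, 1150.0, WINDOW, 8.0)

print(with_limit)     # 0.25
print(without_limit)  # 0.0625
```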

CPU Throttle

rate(container_cpu_cfs_throttled_periods_total[5m]) / rate(container_cpu_cfs_periods_total[5m])
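Both rates in the CPU Throttle expression use the same 5-minute window, so the window length cancels and the result is the fraction of CFS periods in which the container was throttled. A minimal sketch of that calculation from raw counter samples follows, assuming no counter resets within the window. The function name `throttle_ratio` and the sample values are hypothetical.

```python
def throttle_ratio(
    throttled_start: float,
    throttled_end: float,
    periods_start: float,
    periods_end: float,
) -> float:
    """Fraction of elapsed CFS periods in which the container was throttled."""
    elapsed_periods = periods_end - periods_start
    if elapsed_periods <= 0:
        # No periods elapsed in the window, so nothing could be throttled.
        return 0.0
    return (throttled_end - throttled_start) / elapsed_periods


# Hypothetical samples: 3000 periods elapsed, 600 of them throttled.
ratio = throttle_ratio(
    throttled_start=400.0,
    throttled_end=1000.0,
    periods_start=10000.0,
    periods_end=13000.0,
)

print(f"{ratio:.2f}")  # 0.20
```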

Network Bytes

container_network_transmit_bytes_total

container_network_receive_bytes_total

Data transfer rate

rate(container_network_transmit_bytes_total[5m])

rate(container_network_receive_bytes_total[5m])
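The network byte counters restart from zero when the underlying counter is reset, and rate() accounts for such resets. The sketch below illustrates the idea in simplified form: a drop in the counter is treated as a reset and the post-reset value is counted from zero. It does not reproduce the window-boundary extrapolation that Prometheus applies, and the name `bytes_per_second` and the sample values are hypothetical.

```python
from typing import Sequence, Tuple


def bytes_per_second(samples: Sequence[Tuple[float, float]]) -> float:
    """Average byte rate over (timestamp_seconds, counter_value) samples.

    Samples must be ordered by timestamp.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        raise ValueError("samples must span a positive time range")

    increase = 0.0
    previous = samples[0][1]
    for _, value in samples[1:]:
        if value < previous:
            # Counter went backwards: assume a reset and count from zero.
            increase += value
        else:
            increase += value - previous
        previous = value
    return increase / elapsed


# Hypothetical transmit counter with a reset between t=100 and t=200.
samples = [
    (0.0, 1_000_000.0),
    (100.0, 1_600_000.0),
    (200.0, 200_000.0),
    (300.0, 900_000.0),
]

print(bytes_per_second(samples))  # 5000.0
```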

Volume

kubelet_volume_stats_available_bytes

kubelet_volume_stats_capacity_bytes

Volume Available

kubelet_volume_stats_available_bytes / kubelet_volume_stats_capacity_bytes
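Volume Available is the fraction of capacity that is still free, so lower values mean a fuller volume. A short worked example follows, assuming the two kubelet metrics have been read as plain numbers. The function name `volume_available_fraction` and the sample values are hypothetical.

```python
def volume_available_fraction(available_bytes: float, capacity_bytes: float) -> float:
    """Fraction of the volume's capacity that is still available."""
    if capacity_bytes <= 0:
        raise ValueError("capacity must be positive")
    return available_bytes / capacity_bytes


GIB = 1024 ** 3

# Hypothetical 50 GiB volume with 12 GiB free.
available = volume_available_fraction(12 * GIB, 50 * GIB)
used = 1.0 - available

print(f"available: {available:.2f}")  # available: 0.24
print(f"used: {used:.2f}")            # used: 0.76
```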

PersistentVolumeClaim (PVC)

kube_pod_spec_volumes_persistentvolumeclaims_info

PVC Usage

kube_pod_spec_volumes_persistentvolumeclaims_info

Alerts

KPI                   Alert
Memory Utilization    Saturation
CPU                   Saturation, CPUThrottlingSustained
Network Bytes         ResourceRateAnomaly
Volume                KubeVolumeLow

Failure Alerts

  • KubePodCrashLooping

  • KubePodNotReady

  • KubeDeploymentGenerationMismatch

  • KubeDeploymentReplicasMismatch

  • KubeStatefulSetReplicasMismatch

  • KubeStatefulSetGenerationMismatch

  • KubeStatefulSetUpdateNotRolledOut

  • KubeDaemonSetRolloutStuck

  • KubeDaemonSetNotScheduled

  • KubeDaemonSetMisScheduled

  • KubeJobFailed

  • KubeHpaReplicasMismatch

  • KubeHpaMaxedOut

  • KubeContainerWaiting

  • KubeCronJobRunning

  • KubeJobCompletion

  • KubeCPUOvercommit

  • KubeMemoryOvercommit

  • KubeCPUQuotaOvercommit

  • KubeMemoryQuotaOvercommit

  • KubeQuotaExceeded

  • KubeAPILatencyHigh

  • KubeAPIErrorsHigh

  • KubeClientCertificateExpiration

  • KubeAPIDown

  • KubeAPIGone

  • KubeNodeNotReady

  • KubeNodeUnreachable

  • KubeletTooManyPods

  • KubeNodeReadinessFlapping

  • KubeletPlegDurationHigh

  • KubeletPodStartUpLatencyHigh

  • KubeletDown

  • KubeletGone

Dashboard

All workloads running on Kubernetes are discovered as a Service in Asserts, and each has a Service KPI Dashboard. Each instance of a Service also has a ServiceInstance dashboard. These dashboards show the key resource KPIs for memory, CPU, disk, and network bytes transmitted. In addition to the Service KPI Dashboard, the following dashboards are available:

Cluster

Namespace

Pod

Node

Proxy

API Server
