Kubernetes
Setup
Refer to the kube-prometheus-stack project and its Helm chart for instructions on installing Prometheus in your Kubernetes environment.
Metrics and KPIs
Metric | KPI |
Memory: container_memory_working_set_bytes | Memory Utilization; when a limit is set: container_memory_working_set_bytes / kube_pod_container_resource_limits{resource="memory"}; when no limit is set: container_memory_working_set_bytes / kube_node_status_allocatable{resource="memory"} |
CPU: container_cpu_usage_seconds_total, container_cpu_cfs_throttled_periods_total, container_cpu_cfs_periods_total | CPU Utilization; when a limit is set: rate(container_cpu_usage_seconds_total[5m]) / kube_pod_container_resource_limits{resource="cpu"}; when no limit is set: rate(container_cpu_usage_seconds_total[5m]) / kube_node_status_allocatable{resource="cpu"}; CPU Throttle: rate(container_cpu_cfs_throttled_periods_total[5m]) / rate(container_cpu_cfs_periods_total[5m]) |
Network Bytes: container_network_transmit_bytes_total, container_network_receive_bytes_total | Data transfer rate: rate(container_network_transmit_bytes_total[5m]), rate(container_network_receive_bytes_total[5m]) |
Volume: kubelet_volume_stats_available_bytes, kubelet_volume_stats_capacity_bytes | Volume Available: kubelet_volume_stats_available_bytes / kubelet_volume_stats_capacity_bytes |
PersistentVolumeClaim (PVC): kube_pod_spec_volumes_persistentvolumeclaims_info | PVC Usage: kube_pod_spec_volumes_persistentvolumeclaims_info |
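The KPI expressions in the table above are ratios of raw metrics. The following Python sketch works through the same arithmetic on hand-written sample values, which may help when sanity-checking a dashboard number. It is illustrative only: the helper names (`simple_rate`, `utilization`, `throttle_ratio`) are hypothetical, and `simple_rate` is a simplified stand-in for PromQL `rate()` that ignores counter resets and window extrapolation.

```python
from typing import Optional, Sequence, Tuple

# (unix timestamp in seconds, counter value); hypothetical sample shape
Sample = Tuple[float, float]


def simple_rate(samples: Sequence[Sample]) -> float:
    """Per-second increase between the first and last sample.

    Simplified stand-in for PromQL rate(): no counter-reset handling
    and no extrapolation to the window boundaries.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("timestamps must increase")
    return (v1 - v0) / (t1 - t0)


def utilization(usage: float, limit: Optional[float], allocatable: float) -> float:
    """Usage over the container limit, or over node allocatable when no limit is set."""
    # A missing or non-positive limit is treated as "no limit set".
    denominator = limit if (limit is not None and limit > 0) else allocatable
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return usage / denominator


def throttle_ratio(throttled: Sequence[Sample], periods: Sequence[Sample]) -> float:
    """Fraction of CFS periods in which the container was throttled."""
    period_rate = simple_rate(periods)
    if period_rate == 0:
        return 0.0  # no scheduling periods elapsed in the window
    return simple_rate(throttled) / period_rate


MIB = 1024 ** 2
GIB = 1024 ** 3

# Memory: 512 MiB working set against a 1 GiB limit, then with no limit on a 16 GiB node.
mem_with_limit = utilization(512 * MIB, 1 * GIB, 16 * GIB)
mem_without_limit = utilization(512 * MIB, None, 16 * GIB)

# CPU: 150 CPU-seconds consumed over 300 s is 0.5 cores, against a 2-core limit.
cpu_rate = simple_rate([(0.0, 100.0), (300.0, 250.0)])
cpu_with_limit = utilization(cpu_rate, 2.0, 8.0)

# Throttling: 60 throttled periods out of 300 elapsed periods.
throttle = throttle_ratio([(0.0, 10.0), (300.0, 70.0)], [(0.0, 1000.0), (300.0, 1300.0)])
```

In a real cluster the same ratios are evaluated by Prometheus itself; the sketch only mirrors the arithmetic, not label matching between the numerator and denominator series.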
Alerts
KPI | Alert |
Memory Utilization | Saturation |
CPU | Saturation, CPUThrottlingSustained |
Network Bytes | ResourceRateAnomaly |
Volume | KubeVolumeLow |
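The table above maps each KPI to an alert but does not state the thresholds involved. As a rough sketch of how a threshold-style check on these ratios could work, the Python below classifies a utilization ratio and flags a volume with little space left. The function names and every threshold value (0.8, 0.95, 0.1) are hypothetical placeholders chosen for illustration, not values taken from Asserts or from any shipped alert rule.

```python
from typing import Optional


def saturation_severity(
    ratio: float,
    warning: float = 0.8,    # hypothetical warning threshold
    critical: float = 0.95,  # hypothetical critical threshold
) -> Optional[str]:
    """Return 'critical', 'warning', or None for a utilization ratio."""
    if ratio >= critical:
        return "critical"
    if ratio >= warning:
        return "warning"
    return None


def volume_is_low(
    available_bytes: float,
    capacity_bytes: float,
    min_free: float = 0.1,  # hypothetical minimum free fraction
) -> bool:
    """True when the available fraction of a volume falls below min_free."""
    if capacity_bytes <= 0:
        raise ValueError("capacity must be positive")
    return (available_bytes / capacity_bytes) < min_free


# A container at 85% of its memory limit would be a warning under these placeholders.
memory_state = saturation_severity(0.85)
# A volume with 5 of 100 units free is below the 10% placeholder.
volume_state = volume_is_low(5.0, 100.0)
```

Note the direction of the volume check: the Volume Available KPI is the free fraction, so a low value is the unhealthy condition, whereas the utilization KPIs are unhealthy when high.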
Failure Alerts
KubePodCrashLooping
KubePodNotReady
KubeDeploymentGenerationMismatch
KubeDeploymentReplicasMismatch
KubeStatefulSetReplicasMismatch
KubeStatefulSetGenerationMismatch
KubeStatefulSetUpdateNotRolledOut
KubeDaemonSetRolloutStuck
KubeDaemonSetNotScheduled
KubeDaemonSetMisScheduled
KubeJobFailed
KubeHpaReplicasMismatch
KubeHpaMaxedOut
KubeContainerWaiting
KubeCronJobRunning
KubeJobCompletion
KubeCPUOvercommit
KubeMemoryOvercommit
KubeCPUQuotaOvercommit
KubeMemoryQuotaOvercommit
KubeQuotaExceeded
KubeAPILatencyHigh
KubeAPIErrorsHigh
KubeClientCertificateExpiration
KubeAPIDown
KubeAPIGone
KubeNodeNotReady
KubeNodeUnreachable
KubeletTooManyPods
KubeNodeReadinessFlapping
KubeletPlegDurationHigh
KubeletPodStartUpLatencyHigh
KubeletDown
KubeletGone
Dashboard
All workloads running on Kubernetes are discovered as a Service in Asserts, and each has a Service KPI Dashboard. Each instance in the Service also has a ServiceInstance dashboard. These dashboards show the key resource KPIs for memory, CPU, disk, and network bytes transmitted. In addition to the Service KPI Dashboard, the following dashboards are available:
Cluster
Namespace
Pod
Node
Proxy
API Server