Node
Setup
The Node Exporter must be installed on each node that is to be monitored.
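One way to confirm that the Node Exporter is installed and being scraped is to query a metric it exposes about itself. This is a minimal sketch, assuming a Prometheus-compatible backend; the grouping by `instance` and `job` mirrors the CPU KPI below and is an illustrative choice.

```promql
# One series per scraped Node Exporter; an empty result suggests
# the exporter is not installed or not being scraped.
count by (instance, job) (node_exporter_build_info)
```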
Metrics and Key Performance Indicators (KPIs)
Metric | KPI |
CPU: node_cpu_seconds_total | CPU Utilization: 1 - avg by(instance, job)(rate(node_cpu_seconds_total{mode="idle"}[5m])) |
Memory: node_memory_MemTotal_bytes, node_memory_Buffers_bytes, node_memory_Cached_bytes, node_memory_MemFree_bytes, node_memory_Slab_bytes | Memory Utilization: 1 - (buffer + cached + free + slab) / total |
Memory: node_vmstat_pgmajfault | Page Fault Rate: rate(node_vmstat_pgmajfault[1m]) |
Network Bytes: node_network_receive_bytes_total, node_network_transmit_bytes_total | Network Byte Rate: rate(node_network_receive_bytes_total[5m]), rate(node_network_transmit_bytes_total[5m]) |
Disk: node_filesystem_avail_bytes, node_filesystem_size_bytes | Disk Utilization: 1 - available bytes / size bytes |
Read/Write byte rate: node_disk_read_bytes_total, node_disk_written_bytes_total | Disk IO Rate: rate(node_disk_read_bytes_total[5m]), rate(node_disk_written_bytes_total[5m]) |
Read Time and Count: node_disk_read_time_seconds_total, node_disk_reads_completed_total; Write Time and Count: node_disk_write_time_seconds_total, node_disk_writes_completed_total | Disk Average Latency: rate(...time_seconds_total[5m]) / rate(..._completed_total[5m]) |
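The shorthand formulas in the table can be written out as full PromQL. The following is a sketch, assuming the Node Exporter metric names listed above and no additional label filtering; it is not necessarily the exact form the dashboard uses. Each expression is a separate query.

```promql
# Memory Utilization: fraction of total memory that is not
# buffers, page cache, free, or slab.
1 - (
    node_memory_Buffers_bytes
  + node_memory_Cached_bytes
  + node_memory_MemFree_bytes
  + node_memory_Slab_bytes
) / node_memory_MemTotal_bytes

# Disk Utilization: fraction of each filesystem that is in use.
1 - node_filesystem_avail_bytes / node_filesystem_size_bytes

# Disk Average Latency: seconds spent per completed operation,
# shown separately for reads and writes.
rate(node_disk_read_time_seconds_total[5m])
  / rate(node_disk_reads_completed_total[5m])

rate(node_disk_write_time_seconds_total[5m])
  / rate(node_disk_writes_completed_total[5m])
```

The latency expressions return NaN for a device that completed no operations in the window, because both rates are zero.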
Alerts
KPI | Alert |
Memory Utilization | Saturation with resource_type=memory:utilization |
High Memory Page Faults | Saturation with resource_type=memory:page_fault |
CPU Utilization | Saturation |
Network Bytes Rate | ResourceRateAnomaly |
Disk Utilization | Saturation |
Disk Read/Write Rate | ResourceRateAnomaly |
Disk Read/Write Latency Average | Saturation when the average latency exceeds 100 ms |
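As an illustration of the 100 ms latency condition, the expressions below return only the devices whose average latency is above that threshold. This is a sketch of the condition in plain PromQL, assuming the 5-minute window used for the latency KPI; it is not the definition of the Saturation alert itself.

```promql
# Average read latency above 100 ms (0.1 s)
rate(node_disk_read_time_seconds_total[5m])
  / rate(node_disk_reads_completed_total[5m]) > 0.1

# Average write latency above 100 ms (0.1 s)
rate(node_disk_write_time_seconds_total[5m])
  / rate(node_disk_writes_completed_total[5m]) > 0.1
```

Idle devices yield NaN, which fails the comparison, so they are filtered out rather than reported as breaching.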
Dashboard
The following KPIs are shown in the Node Dashboard