
Monitoring Stack

Overview

The Monitoring stack provides observability for the infrastructure through metrics collection, log aggregation, and interactive dashboards. It comprises Prometheus for metrics, Grafana for visualization, Loki for log aggregation, and Uptime Kuma for service availability monitoring.

Components

Prometheus

  • Image: quay.io/prometheus/prometheus:v3.9.1
  • Purpose: Metrics collection and storage
  • Container Name: prometheus
  • Access: https://prometheus.{{ main_domain }}
  • Data Retention: 1 year
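
As a rough illustration of how the settings above might fit together, here is a minimal Compose-style sketch. The image, container name, hostname, and 1-year retention come from this page. The command flags, the in-container paths, and the Traefik router name are assumptions and may differ from the actual stack definition.

```yaml
# Sketch only: a Jinja-templated Compose fragment, not the stack's real file.
services:
  prometheus:
    image: quay.io/prometheus/prometheus:v3.9.1
    container_name: prometheus
    command:
      # Overriding the command means the config path must be restated.
      - --config.file=/etc/prometheus/prometheus.yml
      - --storage.tsdb.path=/prometheus
      - --storage.tsdb.retention.time=1y   # 1-year retention
    volumes:
      # Host path from the Deployment Notes; the container path is assumed.
      - /mnt/storage/prometheus/:/prometheus
    labels:
      - traefik.enable=true
      # Router name "prometheus" is a hypothetical choice.
      - "traefik.http.routers.prometheus.rule=Host(`prometheus.{{ main_domain }}`)"
```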

Grafana

  • Image: grafana/grafana-oss:12.3.1
  • Purpose: Metrics visualization and dashboards
  • Container Name: grafana
  • Access: https://grafana.{{ main_domain }}
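
Grafana can be pointed at Prometheus and Loki through a provisioning file. The sketch below assumes that the data sources are provisioned this way and that the containers are reachable as `prometheus` and `loki` on their default ports; neither detail is confirmed by this page.

```yaml
# Sketch of a Grafana data source provisioning file
# (for example under provisioning/datasources/).
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090   # container name from this page, default port assumed
    isDefault: true
  - name: Loki
    type: loki
    access: proxy
    url: http://loki:3100         # hostname and port are assumptions
```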

Loki

  • Image: Configured via Grafana Alloy
  • Purpose: Log aggregation and storage
  • Configuration: /{{ docker_mounts_directory }}/monitoring/loki/loki-config.yaml

Uptime Kuma

  • Purpose: Service availability and health monitoring
  • Access: https://uptime.{{ main_domain }}
  • Integration: Provides push-based health check endpoints for services
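
A push monitor in Uptime Kuma is fed by a plain HTTP request to its push URL. The Ansible task below is one possible way to send such a heartbeat. The variable `uptime_push_token` is hypothetical and stands for the token Uptime Kuma generates for a push monitor; how the stack actually sends heartbeats is not described here.

```yaml
# Sketch only: sends a single "up" heartbeat to a push monitor.
- name: Send heartbeat to an Uptime Kuma push monitor
  ansible.builtin.uri:
    # uptime_push_token is a hypothetical variable holding the monitor's push token.
    url: "https://uptime.{{ main_domain }}/api/push/{{ uptime_push_token }}?status=up&msg=OK"
    method: GET
    status_code: 200
```

In practice a service would send this request on a schedule shorter than the monitor's heartbeat interval, so that a missed request marks the service as down.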

Key Features

  • Metrics Collection: Scrapes Prometheus endpoints from all monitored services
  • Custom Recording Rules: Node-level recording rules in node_rules.yml
  • Log Aggregation: Centralized log collection via Loki and Promtail
  • Service Monitoring: HTTP, JSON-query, and push-based monitors
  • Historical Data: 1-year retention of metrics for long-term trend analysis
  • Traefik Metrics: Prometheus scrapes Traefik's metrics endpoint for API gateway insights
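
To make the recording-rule feature concrete, here is a sketch of what node_rules.yml could contain. It assumes the nodes run node_exporter (which provides the `node_*` metrics used below); the rule names and expressions are illustrative and are not taken from the real file.

```yaml
# Sketch of node-level recording rules; names and expressions are assumptions.
groups:
  - name: node_rules
    rules:
      # Fraction of CPU time spent non-idle, averaged per instance over 5 minutes.
      - record: instance:node_cpu_utilisation:rate5m
        expr: 1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))
      # Fraction of memory in use per instance.
      - record: instance:node_memory_utilisation:ratio
        expr: 1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes
```

Precomputing these series keeps dashboard queries cheap, which matters when panels span a year of retained data.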

Used By

Multiple stacks report metrics and logs to the Monitoring stack:

  • Nextcloud: Exports health status and background job completion
  • Minecraft: Exports server performance metrics
  • Streaming (Jellyfin): Performance and transcode metrics
  • Matrix: Server performance and federation metrics
  • Any service with Prometheus exporters: Automatically discovered via labels

Network Configuration

  • monitoring network: Internal network for monitoring components
  • web network: Grafana and Prometheus dashboards exposed via Traefik
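
The two-network layout can be sketched in Compose terms as follows. The network names come from the list above. Treating `web` as an external network created by another stack is an assumption.

```yaml
# Sketch only: Grafana joined to both networks.
services:
  grafana:
    image: grafana/grafana-oss:12.3.1
    container_name: grafana
    networks:
      - monitoring   # talks to Prometheus and Loki
      - web          # reachable by Traefik
networks:
  monitoring: {}
  web:
    external: true   # assumed to be created outside this stack
```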

Deployment Notes

  • Deploy after the Backbone stack so that Traefik metrics integration works
  • Prometheus configuration is templated from prometheus.yml and node_rules.yml
  • Loki requires specific ownership (UID 10001)
  • Uptime Kuma uses root permissions (known limitation)
  • Services register monitors dynamically via Ansible
  • Prometheus data is stored in /mnt/storage/prometheus/ for persistence
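
The Loki ownership requirement could be met with an Ansible task along these lines. The directory is inferred from the Loki configuration path given earlier on this page. Setting the group to 10001 as well is an assumption.

```yaml
# Sketch only: ensures the Loki directory is writable by the container user.
- name: Ensure the Loki directory is owned by UID 10001
  ansible.builtin.file:
    path: "/{{ docker_mounts_directory }}/monitoring/loki"
    state: directory
    owner: "10001"
    group: "10001"   # group is an assumption; only the UID is stated
    recurse: true
```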

Services can export metrics via Prometheus endpoints and are discovered through Traefik labels.
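
The exact discovery mechanism is not specified here. One common way to get label-based discovery in Prometheus is Docker service discovery with a relabel filter, sketched below alongside a static Traefik job and the rules file. The label `prometheus.scrape`, the Traefik target `traefik:8080`, the scrape interval, and the Docker socket mount are all assumptions.

```yaml
# Sketch of a prometheus.yml; not the stack's actual template.
global:
  scrape_interval: 30s            # assumed value
rule_files:
  - node_rules.yml
scrape_configs:
  - job_name: traefik
    static_configs:
      - targets: ["traefik:8080"] # assumes Traefik serves metrics on this port
  - job_name: docker-labelled
    docker_sd_configs:
      - host: unix:///var/run/docker.sock
    relabel_configs:
      # Keep only containers carrying the hypothetical label prometheus.scrape=true.
      - source_labels: [__meta_docker_container_label_prometheus_scrape]
        regex: "true"
        action: keep
      # Docker reports names with a leading slash; strip it for readability.
      - source_labels: [__meta_docker_container_name]
        regex: "/(.*)"
        target_label: container
```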