Overview

In the previous post, I pulled together the overall IDP structure using Helm chart-based project templates and ArgoCD ApplicationSet. This post covers how I added Prometheus and Grafana for metrics, plus Loki for centralized log collection and analysis. Together, they form the monitoring stack for the homelab Kubernetes cluster.

The Need for Monitoring

When operating a homelab Kubernetes cluster, I need to keep an eye on node and pod status, resource usage such as CPU and memory, whether applications are behaving normally, and the logs that help explain failures. For that, I used the following tools.

What is Prometheus?

Prometheus is an open-source monitoring system that started at SoundCloud in 2012 and joined the CNCF (Cloud Native Computing Foundation) in 2016. It collects and stores metrics in a time-series database and allows data querying and analysis through a powerful query language called PromQL. It is the most widely used monitoring tool in Kubernetes environments.
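As a small taste of PromQL, the built-in `up` metric (which Prometheus records for every scrape target) can be used to count how many targets are currently healthy. This is a generic example, not specific to the setup below:

```promql
# Counts scrape targets that responded successfully to their last scrape
count(up == 1)
```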

What is Grafana?

Grafana is an open-source data visualization platform developed by Torkel Ödegaard in 2014. It can integrate with various data sources like Prometheus, Loki, and Elasticsearch to build dashboards, and provides an intuitive UI and rich visualization options to effectively present monitoring data.

What is Loki?

Loki is a log aggregation system developed by Grafana Labs in 2018. Inspired by Prometheus, it uses label-based indexing to collect and store logs, and enables resource-efficient log management by indexing only metadata rather than full log content.

Installing Kube-Prometheus-Stack

Installing and wiring Prometheus and Grafana separately felt like more setup than I wanted, so I used Kube-Prometheus-Stack instead. As in the earlier posts, I kept the deployment in the same GitOps flow.

1. Creating Directory and File Structure

mkdir -p k8s-resource/apps/kube-prometheus-stack/templates
cd k8s-resource/apps/kube-prometheus-stack

2. Creating Chart.yaml

The Chart.yaml file looked like this:

apiVersion: v2
name: kube-prometheus-stack
description: kube-prometheus-stack chart for Kubernetes
type: application
version: 1.0.0
appVersion: "v0.79.2"
dependencies:
    - name: kube-prometheus-stack
      version: "68.1.0"
      repository: "https://prometheus-community.github.io/helm-charts"

This configuration uses version 68.1.0 of the kube-prometheus-stack chart provided by the Prometheus Community. The chart bundles the main monitoring components I needed: Prometheus, Grafana, Alertmanager, Node Exporter, and Kube State Metrics.

3. Creating values.yaml

The values.yaml file I used looked like this:

kube-prometheus-stack:
    alertmanager:
        enabled: false

    grafana:
        enabled: true
        adminPassword: prom-operator
        persistence:
            enabled: true
            size: 5Gi
        resources:
            requests:
                cpu: 200m
                memory: 512Mi
            limits:
                cpu: 500m
                memory: 1Gi
        ingress:
            enabled: false
        grafana.ini:
            auth:
                disable_login_form: true
            auth.anonymous:
                enabled: true
                org_role: Admin
        additionalDataSources:
            - name: Loki
              type: loki
              url: http://loki-stack.loki-stack.svc.cluster.local:3100

    prometheus:
        enabled: true
        ingress:
            enabled: false
        prometheusSpec:
            retention: 5d
            resources:
                requests:
                    cpu: 500m
                    memory: 2Gi
                limits:
                    cpu: 1
                    memory: 2Gi
            storageSpec:
                volumeClaimTemplate:
                    spec:
                        resources:
                            requests:
                                storage: 20Gi

    prometheusOperator:
        enabled: true
        resources:
            requests:
                cpu: 100m
                memory: 128Mi
            limits:
                cpu: 200m
                memory: 256Mi

    kubeStateMetrics:
        enabled: true
        resources:
            requests:
                cpu: 100m
                memory: 128Mi
            limits:
                cpu: 200m
                memory: 256Mi

    nodeExporter:
        enabled: true
        resources:
            requests:
                cpu: 100m
                memory: 128Mi
            limits:
                cpu: 200m
                memory: 256Mi

    thanosRuler:
        enabled: false

The main points of this configuration are:

  • Alertmanager: Disabled to conserve resources since an alerting system is not essential in a homelab environment.
  • Grafana: Configured to allow anonymous access so dashboards can be viewed without login, and pre-adds the Loki data source to enable log querying.
  • Prometheus: Data retention period is set to 5 days to limit disk usage, and 20Gi storage is allocated.
  • Resource Limits: Appropriate CPU and memory limits are set for each component to efficiently use cluster resources.
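As a sanity check on what this values.yaml asks of the cluster, here is a quick back-of-the-envelope tally of the CPU and memory requests declared above (a sketch only; it counts node-exporter once, although it actually runs as a DaemonSet on every node):

```python
# CPU requests in millicores, memory requests in MiB, copied from values.yaml above
requests = {
    "grafana":             (200, 512),
    "prometheus":          (500, 2048),  # 2Gi = 2048Mi
    "prometheus-operator": (100, 128),
    "kube-state-metrics":  (100, 128),
    "node-exporter":       (100, 128),   # per node; counted once here
}

total_cpu_m = sum(cpu for cpu, _ in requests.values())
total_mem_mi = sum(mem for _, mem in requests.values())
print(f"Total requests: {total_cpu_m}m CPU, {total_mem_mi}Mi memory")
```

On this configuration that works out to 1000m of CPU and 2944Mi of memory in requests alone, before Loki is added, which is worth knowing on small homelab nodes.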

4. Configuring Ingress

For access through Traefik, I used the following templates/ingressroute.yaml:

apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
    name: prometheus-grafana-route
    namespace: kube-prometheus-stack
spec:
    entryPoints:
        - intweb
        - intwebsec
    routes:
        - kind: Rule
          match: Host(`prometheus.injunweb.com`)
          services:
              - name: kube-prometheus-stack-prometheus
                port: 9090
        - kind: Rule
          match: Host(`grafana.injunweb.com`)
          services:
              - name: kube-prometheus-stack-grafana
                port: 80

This IngressRoute uses the intweb and intwebsec entry points, so it is only reachable from the internal network. prometheus.injunweb.com routes to the Prometheus server, and grafana.injunweb.com routes to Grafana.

5. Committing Changes and Deploying

Once those files were ready, I committed them to the Git repository:

git add .
git commit -m "Add kube-prometheus-stack configuration"
git push

After I pushed the changes, ArgoCD deployed Kube-Prometheus-Stack automatically. I checked the installation status with the following command:

kubectl get pods -n kube-prometheus-stack

Once the installation finished, the output looked like this:

NAME                                                       READY   STATUS    RESTARTS   AGE
kube-prometheus-stack-grafana-7dc95d688d-vwm6j             3/3     Running   0          2m
kube-prometheus-stack-kube-state-metrics-c6d6bc845-zrdbp   1/1     Running   0          2m
kube-prometheus-stack-operator-5dc88c8847-9xp6g            1/1     Running   0          2m
kube-prometheus-stack-prometheus-node-exporter-4jlnz       1/1     Running   0          2m
kube-prometheus-stack-prometheus-node-exporter-7m8nj       1/1     Running   0          2m
kube-prometheus-stack-prometheus-node-exporter-c445j       1/1     Running   0          2m
prometheus-kube-prometheus-stack-prometheus-0              2/2     Running   0          2m

Installing Loki-Stack

After the metrics side was working, I added Loki-Stack for log collection and analysis. Loki fit nicely here because its label-based model is similar to Prometheus's.

1. Creating Directory and File Structure

mkdir -p k8s-resource/apps/loki-stack/templates
cd k8s-resource/apps/loki-stack

2. Creating Chart.yaml

The Chart.yaml file for Loki looked like this:

apiVersion: v2
name: loki-stack
description: loki-stack chart for Kubernetes
type: application
version: 1.0.0
appVersion: "v2.9.3"
dependencies:
    - name: loki-stack
      version: "2.10.2"
      repository: "https://grafana.github.io/helm-charts"

3. Creating values.yaml

The values.yaml I used for Loki looked like this:

loki-stack:
    loki:
        enabled: true
        persistence:
            enabled: true
            size: 20Gi
        config:
            limits_config:
                enforce_metric_name: false
                reject_old_samples: true
                reject_old_samples_max_age: 168h
            schema_config:
                configs:
                    - from: 2025-01-16
                      store: boltdb-shipper
                      object_store: filesystem
                      schema: v11
                      index:
                          prefix: index_
                          period: 24h
        resources:
            requests:
                cpu: 200m
                memory: 256Mi
            limits:
                cpu: 1000m
                memory: 1Gi

    promtail:
        enabled: true
    grafana:
        enabled: false
    prometheus:
        enabled: false
    filebeat:
        enabled: false
    fluent-bit:
        enabled: false
    logstash:
        enabled: false
    serviceMonitor:
        enabled: true

The main parts of this configuration were:

  • Loki: Allocates 20Gi of storage for log data and rejects incoming log entries older than 7 days (168 hours). Note that reject_old_samples limits ingestion of stale logs; it does not prune logs already stored.
  • Promtail: Enables the DaemonSet agent that collects container logs from each node and sends them to Loki.
  • Grafana, Prometheus: Disabled since they were already installed with Kube-Prometheus-Stack.
  • ServiceMonitor: Enables ServiceMonitor so Prometheus can collect Loki metrics.
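With the ServiceMonitor in place, Loki's own metrics appear in Prometheus. For example, ingestion volume can be eyeballed with a query along these lines (the metric name comes from Loki's standard instrumentation; adjust it if your version differs):

```promql
# Log lines received by Loki per second, averaged over 5 minutes
sum(rate(loki_distributor_lines_received_total[5m]))
```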

4. Committing Changes and Deploying

Once the configuration was ready, I committed and pushed it:

git add .
git commit -m "Add Loki-Stack configuration"
git push

After the installation finished, I verified it with the following command:

kubectl get pods -n loki-stack
NAME                            READY   STATUS    RESTARTS   AGE
loki-stack-0                    1/1     Running   0          2m
loki-stack-promtail-xxxxx       1/1     Running   0          2m
loki-stack-promtail-yyyyy       1/1     Running   0          2m

Accessing the Monitoring System

On my local machine, I updated the hosts file so I could reach Grafana and Prometheus directly:

192.168.0.200 prometheus.injunweb.com grafana.injunweb.com

With that entry in place, Grafana and Prometheus were reachable at the following URLs:

  • Grafana: http://grafana.injunweb.com
  • Prometheus: http://prometheus.injunweb.com

Using Grafana Dashboards

Kube-Prometheus-Stack ships with several preconfigured dashboards, which made Grafana useful right away. I mostly browsed them from the left-side “Dashboards” menu.

In particular, the dashboards under “Kubernetes / Compute Resources” in the “General” folder were very useful for understanding cluster CPU, memory, and network usage at the namespace, pod, and container level. The “Node Exporter” dashboards showed detailed hardware-level metrics for each node such as disk I/O, network traffic, and system load, which helped with infrastructure monitoring.
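The “Kubernetes / Compute Resources” dashboards are built on cAdvisor metrics that can also be queried directly. A per-namespace CPU breakdown looks roughly like this (a sketch of the kind of query those panels use, not their exact definition):

```promql
# CPU cores consumed per namespace, averaged over 5 minutes
sum(rate(container_cpu_usage_seconds_total{container!=""}[5m])) by (namespace)
```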

Exploring Logs with Loki

You can centrally explore all container logs in the cluster using the Loki data source in Grafana. Loki uses a query language called LogQL to filter and search logs.

Basic Log Queries

For logs, I mostly used Grafana’s “Explore” view with the Loki data source. These were the query patterns I found most useful:

Viewing logs for a specific namespace:

{namespace="kube-system"}

Viewing logs for a specific pod:

{namespace="argocd", pod=~"argocd-server.*"}

Filtering only error logs:

{namespace="traefik"} |= "error"

Filtering for a keyword and parsing JSON-formatted log lines (the time range itself is chosen with Grafana's time picker, not in the query):

{namespace="default"} |= "timeout" | json
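Beyond simple filters, LogQL also supports Prometheus-style aggregations over log streams. For example, an error rate per pod can be charted with something like the following (assuming the default labels that Promtail attaches):

```logql
# Error log lines per second, broken down by pod
sum by (pod) (rate({namespace="argocd"} |= "error" [5m]))
```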

Through Loki, you can centrally manage logs distributed across multiple pods and nodes, and quickly identify the cause of issues using LogQL queries.

Conclusion

This post covered how I added Kube-Prometheus-Stack and Loki-Stack to the homelab Kubernetes cluster so I could see both metrics and logs in one place.

This concludes the homelab Kubernetes series. At this point, I had completed a full homelab Kubernetes environment, covering everything from the basic cluster setup to an ArgoCD GitOps environment, Longhorn distributed storage, the Traefik ingress controller, Vault secret management, a CI/CD pipeline, and a monitoring system built with Prometheus, Grafana, and Loki. With this infrastructure as a foundation, I could test and develop various projects in a production-like Kubernetes environment without the cost burden of cloud services.