Troubleshooting
Common issues encountered during bootstrap and day-to-day operations.
Minikube
Certificate error on start
Cause: Stale certificates from a previous install.
Fix:
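One common way to clear stale certificates is to delete the existing cluster and start a fresh one. The sketch below assumes that losing the local cluster state is acceptable; the function name `reset_minikube` is illustrative and not part of this repository.

```shell
# Hypothetical helper: recreate the cluster so minikube issues fresh certificates.
# WARNING: `minikube delete` destroys the local cluster and everything in it.
reset_minikube() {
  minikube delete || return 1
  # Forward any start flags unchanged (for example the driver or CNI options).
  minikube start "$@"
}
```

Call it with the same flags used for the original bootstrap, for example `reset_minikube --network-plugin=cni`.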
Flux
Kustomization stuck on old SHA
Cause: Flux hasn't pulled the latest commit yet (default interval is 1h).
Fix: Force an immediate pull:
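A minimal sketch of forcing the pull, using the same `flux reconcile kustomization ... --with-source` form that appears elsewhere on this page. The Kustomization name is a parameter because the stuck one varies; `flux-system` as the default namespace is an assumption based on a standard Flux bootstrap, and `force_pull` is an illustrative name.

```shell
# Hypothetical helper: refresh the Git source, then reconcile one Kustomization.
# Usage: force_pull <kustomization-name> [namespace]
force_pull() {
  name="$1"
  namespace="${2:-flux-system}"   # assumed default namespace
  if [ -z "$name" ]; then
    echo "usage: force_pull <kustomization-name> [namespace]" >&2
    return 2
  fi
  flux reconcile kustomization "$name" -n "$namespace" --with-source
}
```

For example, `force_pull identity` followed by `flux get ks -A` shows whether the applied revision has moved to the new commit.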
kustomize build failed: no such file or directory
Cause: clusters/local/kustomization.yaml is missing — Kustomize doesn't know which files to apply.
Fix: Ensure clusters/local/kustomization.yaml exists and lists all Flux Kustomization files:
resources:
- flux-system
- infrastructure.yaml
- infrastructure-configs.yaml
- observability.yaml
- network-policies.yaml
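To confirm that every entry in the list resolves to something on disk, a quick check along these lines can help. It is a sketch that assumes the simple one-entry-per-line `- name` layout shown above and a file that contains no other lists; it is not a YAML parser, and `check_kustomization_resources` is an illustrative name.

```shell
# Hypothetical helper: report list entries that do not exist on disk.
# Usage: check_kustomization_resources <dir-containing-kustomization.yaml>
check_kustomization_resources() {
  dir="$1"
  file="$dir/kustomization.yaml"
  if [ ! -f "$file" ]; then
    echo "missing: $file"
    return 1
  fi
  missing=0
  # Take each "- entry" line and test it relative to the kustomization directory.
  for entry in $(sed -n 's/^[[:space:]]*-[[:space:]]*//p' "$file"); do
    if [ ! -e "$dir/$entry" ]; then
      echo "missing: $dir/$entry"
      missing=1
    fi
  done
  return "$missing"
}
```

Running `check_kustomization_resources clusters/local` from the repo root prints one `missing:` line per unresolved entry and returns non-zero if any were found.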
no HelmRelease objects found
Cause: Flux Kustomizations haven't reconciled yet, or HelmRepositories haven't been created.
Fix:
flux get ks -A # check Kustomization status first
flux get sources helm -A # check HelmRepositories exist
cert-manager
ClusterIssuer dry-run failed: no matches for kind "ClusterIssuer"
Cause: The cert-manager CRDs are not installed. From chart v1.15 onward, CRD installation is enabled with crds.enabled: true.
Fix: Ensure infrastructure/controllers/cert-manager.yaml contains:
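Going by the cause above, the value in question is `crds.enabled: true`. Once Flux has applied it, one way to confirm that the CRDs actually landed is to ask the API server for the ClusterIssuer CRD directly. This is a sketch; the function name is illustrative.

```shell
# Hypothetical helper: succeed only if the ClusterIssuer CRD is registered.
cert_manager_crds_installed() {
  kubectl get crd clusterissuers.cert-manager.io >/dev/null 2>&1
}
```

For example: `cert_manager_crds_installed && echo "CRDs present" || echo "CRDs missing"`.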
startupapicheck Job in Error state
Cause: The post-install validation Job fails on local clusters because of timing. cert-manager itself is healthy: the main pods (cert-manager, webhook, cainjector) are all Running.
Fix: Disable the check for local clusters:
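The cert-manager chart exposes a `startupapicheck.enabled` value, and setting it to `false` in the HelmRelease values skips the Job. Before disabling it, it may be worth confirming that only the Job is failing. The sketch below assumes cert-manager runs in the `cert-manager` namespace, as the `flux logs` example on this page suggests; the function name is illustrative.

```shell
# Hypothetical helper: show the validation Job next to the main cert-manager pods.
# Usage: inspect_startupapicheck [namespace]
inspect_startupapicheck() {
  namespace="${1:-cert-manager}"   # assumed namespace
  kubectl get jobs -n "$namespace"
  kubectl get pods -n "$namespace"
}
```

If the pods are Running and only the Job shows failures, the symptom matches the cause described above.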
Cilium
Pods stuck in Pending after minikube start
Cause: No CNI is installed yet. Minikube was started with --network-plugin=cni, but Cilium has not been installed.
Fix:
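A minimal sketch of the install step, assuming the `cilium` CLI is available and the current kubectl context points at the minikube cluster. If this repository installs Cilium another way (for example through Helm), use that path instead; `install_cilium` is an illustrative name.

```shell
# Hypothetical helper: install Cilium with the cilium CLI and wait for readiness.
install_cilium() {
  cilium install "$@" || return 1
  # Blocks until Cilium reports ready or the CLI's wait timeout is reached.
  cilium status --wait
}
```

Once the status check passes, the Pending pods should be scheduled without further action.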
Connectivity test failures
If tests fail, check Cilium pod logs:
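One way to pull those logs, sketched under the assumption that Cilium runs in `kube-system` with the `k8s-app=cilium` label (the defaults for a standard install). The function name is illustrative.

```shell
# Hypothetical helper: tail recent logs from every Cilium agent pod.
# Usage: cilium_logs [lines]
cilium_logs() {
  lines="${1:-50}"
  # --prefix labels each line with the pod it came from.
  kubectl logs -n kube-system -l k8s-app=cilium --tail="$lines" --prefix
}
```

For example, `cilium_logs 200` gives more context around a failing test.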
ingress-nginx / Networking
Services unreachable on macOS Docker driver
Cause: On macOS, the minikube Docker network (192.168.49.x) is not routable from the host. NodePort addresses are unreachable.
Fix:
1. Ensure minikube tunnel is running: sudo minikube tunnel
2. Ensure /etc/hosts points to 127.0.0.1, not $(minikube ip):
3. Access services without a port (no :30080 suffix): http://grafana.local
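The /etc/hosts requirement in step 2 can be checked with a small script. This is a sketch: the function name is illustrative, and the hostnames to check are whichever `.local` names the cluster exposes (this page mentions `grafana.local` and `keycloak.local`).

```shell
# Hypothetical helper: verify that a hostname maps to 127.0.0.1 in a hosts file.
# Usage: hosts_points_to_loopback <hostname> [hosts-file]
hosts_points_to_loopback() {
  host="$1"
  file="${2:-/etc/hosts}"
  # Strip comments, then look for a 127.0.0.1 line listing the hostname as a whole field.
  sed 's/#.*//' "$file" | awk -v h="$host" '
    $1 == "127.0.0.1" { for (i = 2; i <= NF; i++) if ($i == h) found = 1 }
    END { exit found ? 0 : 1 }
  '
}
```

For example: `hosts_points_to_loopback grafana.local || echo "grafana.local does not point to 127.0.0.1"`.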
minikube tunnel drops / stops working
Cause: The tunnel process exited (terminal closed, session timeout, etc.).
Fix: Restart in a dedicated terminal:
Verify the external IP is 127.0.0.1:
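With `sudo minikube tunnel` running again in its own terminal, the external IP can be read straight from the Service status. The sketch below takes the Service and namespace as parameters because their names are not stated on this page; `ingress-nginx-controller` in `ingress-nginx` is an assumption based on the chart's usual defaults.

```shell
# Hypothetical helper: print the external IP assigned to a LoadBalancer Service.
# Usage: external_ip <service> <namespace>
external_ip() {
  kubectl get svc "$1" -n "$2" \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
}
```

For example, `external_ip ingress-nginx-controller ingress-nginx` should print `127.0.0.1` while the tunnel is up; empty output suggests no address has been assigned.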
Identity / Keycloak
ImagePullBackOff for bitnami/keycloak or bitnami/postgresql
Failed to pull image "docker.io/bitnami/keycloak:26.3.2-debian-12-r0":
manifest unknown: manifest unknown
Cause: Bitnami removed its versioned images from the public Docker Hub catalog (they now sit behind a paid subscription). The Bitnami HelmRelease is no longer in use, but an old one may still exist in the cluster.
Fix: Delete the HelmRelease and switch to the official quay.io deployment:
kubectl delete helmrelease keycloak -n identity
kubectl delete helmrepository bitnami -n flux-system
flux reconcile kustomization identity --with-source
The current identity/kustomization.yaml uses a plain Deployment with quay.io/keycloak/keycloak:26.3.
Grafana SSO: "Failed to get token from provider"
Cause: Split-DNS issue. Grafana's pod cannot resolve keycloak.local because the /etc/hosts entry only exists on the macOS host, not inside Kubernetes pods.
Fix: Ensure token_url and api_url in observability/kube-prometheus-stack.yaml use the in-cluster DNS name:
token_url: http://keycloak.identity.svc.cluster.local/realms/homekube/protocol/openid-connect/token
api_url: http://keycloak.identity.svc.cluster.local/realms/homekube/protocol/openid-connect/userinfo
auth_url, which the browser follows, should still use keycloak.local because the host resolves that name through /etc/hosts.
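To see the split-DNS behaviour from inside the cluster, a throwaway pod can run the lookup. This is a sketch: the pod name `dns-check` and the `busybox:1.36` image are illustrative choices, the `monitoring` namespace is taken from the Grafana commands on this page, and network policies may block the pod in some setups.

```shell
# Hypothetical helper: resolve a name from a temporary pod inside the cluster.
# Usage: check_incluster_dns [name] [namespace]
check_incluster_dns() {
  name="${1:-keycloak.identity.svc.cluster.local}"
  namespace="${2:-monitoring}"
  # --rm deletes the pod once nslookup exits.
  kubectl run dns-check --rm -i --restart=Never \
    --image=busybox:1.36 -n "$namespace" -- nslookup "$name"
}
```

Running `check_incluster_dns keycloak.local` is expected to fail to resolve, while the default in-cluster name should succeed.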
Grafana redirects to keycloak.local:30080 (old port)
Cause: Grafana is still running with an old config that has :30080 in the OIDC URLs.
Fix:
# Force reconcile to pick up latest config
flux reconcile helmrelease kube-prometheus-stack -n monitoring --with-source
# Restart to apply new config
kubectl rollout restart deployment/kube-prometheus-stack-grafana -n monitoring
SOPS / Secrets
Flux decryption fails: sops-gpg secret not found
Cause: The sops-gpg Kubernetes secret was deleted (e.g. after minikube delete).
Fix: Re-add the GPG private key:
gpg --export-secret-keys --armor CF7169E94481219626AF34290D18AE7E58FB2D45 | \
  kubectl create secret generic sops-gpg \
    -n flux-system \
    --from-file=sops.asc=/dev/stdin
Then force reconciliation:
flux reconcile kustomization identity --with-source
flux reconcile kustomization observability --with-source
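If decryption still fails, it may help to confirm that the Secret exists and carries the `sops.asc` key used in the create command. A minimal sketch; the function name is illustrative.

```shell
# Hypothetical helper: succeed only if sops-gpg exists and has a sops.asc entry.
sops_gpg_secret_ok() {
  # The data map is piped to grep, so the encoded key is never shown on the terminal.
  kubectl get secret sops-gpg -n flux-system -o jsonpath='{.data}' 2>/dev/null \
    | grep -q 'sops\.asc'
}
```

For example: `sops_gpg_secret_ok || echo "sops-gpg is missing or has no sops.asc key"`.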
sops --encrypt fails: "no matching creation rules found"
Cause: The file path doesn't match .sops.yaml's path_regex: \.sops\.yaml$.
Fix: The file must be named *.sops.yaml and located inside the repo directory. Encrypt from the repo root:
# Wrong — file in /tmp doesn't match path regex
sops --encrypt /tmp/my-secret.yaml
# Correct — write to repo path first, then encrypt in-place
cp /tmp/my-secret.yaml identity/my-secret.sops.yaml
sops --encrypt --in-place identity/my-secret.sops.yaml
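The naming rule can be tested before calling sops at all. The sketch below checks a path against the same pattern as the `path_regex` quoted above, using `grep -E`; sops evaluates the rule with its own regex engine, so treat this as an approximation that holds for a pattern this simple. The function name is illustrative.

```shell
# Hypothetical helper: test a path against the creation rule's path_regex.
# Usage: matches_sops_rule <path>
matches_sops_rule() {
  printf '%s\n' "$1" | grep -Eq '\.sops\.yaml$'
}
```

For example, `matches_sops_rule identity/my-secret.sops.yaml` succeeds, while `matches_sops_rule /tmp/my-secret.yaml` fails.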
General
Check all pod status
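`kubectl get pods -A` lists every pod in the cluster. To narrow the output to pods that need attention, a field selector on the pod phase can help. This is a sketch, and the function name is illustrative.

```shell
# Hypothetical helper: list pods in every namespace that are not Running or Succeeded.
unhealthy_pods() {
  kubectl get pods -A \
    --field-selector=status.phase!=Running,status.phase!=Succeeded
}
```

Pods in CrashLoopBackOff still report the Running phase, so the unfiltered `kubectl get pods -A` remains the complete view.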
Check Flux controller logs
flux logs --all-namespaces --tail=50
flux logs --kind=HelmRelease --name=cert-manager -n cert-manager