
Troubleshooting

Common issues encountered during bootstrap and day-to-day operations.


Minikube

Certificate error on start

error: certificate apiserver-kubelet-client not signed by CA

Cause: Stale certificates from a previous install.

Fix:

minikube delete
minikube start --driver=docker --network-plugin=cni --addons=metrics-server


Flux

Kustomization stuck on old SHA

flux-system  main@sha1:011ccf77  False  ...

Cause: Flux hasn't pulled the latest commit yet (default interval is 1h).

Fix: Force an immediate pull:

flux reconcile source git flux-system
flux reconcile ks flux-system --with-source
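
If the hourly pull is too slow for local iteration, you can also shorten the interval on the bootstrap GitRepository instead of reconciling by hand. A sketch (the file path and branch are assumptions; keep your existing url and ref):

```yaml
# clusters/local/flux-system/gotk-sync.yaml (typical bootstrap path; adjust to your repo)
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: flux-system
  namespace: flux-system
spec:
  interval: 1m   # was 1h; Flux now checks for new commits every minute
```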


kustomize build failed: no such file or directory

accumulation err='accumulating resources from 'infrastructure.yaml': no such file

Cause: clusters/local/kustomization.yaml is missing — Kustomize doesn't know which files to apply.

Fix: Ensure clusters/local/kustomization.yaml exists and lists all Flux Kustomization files:

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - flux-system
  - infrastructure.yaml
  - infrastructure-configs.yaml
  - observability.yaml
  - network-policies.yaml


no HelmRelease objects found

Cause: Flux Kustomizations haven't reconciled yet, or HelmRepositories haven't been created.

Fix:

flux get ks -A          # check Kustomization status first
flux get sources helm -A  # check HelmRepositories exist


cert-manager

ClusterIssuer dry-run failed: no matches for kind "ClusterIssuer"

Cause: cert-manager CRDs are not installed. Since chart v1.15 the flag is crds.enabled: true (replacing the older installCRDs).

Fix: Ensure infrastructure/controllers/cert-manager.yaml contains:

values:
  crds:
    enabled: true
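
For context, a minimal sketch of the surrounding HelmRelease (the names, namespaces, version pin, and jetstack HelmRepository reference are assumptions about this repo's layout):

```yaml
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: cert-manager
  namespace: cert-manager
spec:
  interval: 30m
  chart:
    spec:
      chart: cert-manager
      version: "1.x"            # pin as appropriate; crds.enabled needs >= 1.15
      sourceRef:
        kind: HelmRepository
        name: jetstack
        namespace: flux-system
  values:
    crds:
      enabled: true             # install and manage the CRDs with the chart
```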


startupapicheck Job in Error state

Cause: Post-install validation Job fails on local clusters due to timing. cert-manager itself is healthy — the main pods (cert-manager, webhook, cainjector) are all Running.

Fix: Disable the check for local clusters:

values:
  startupapicheck:
    enabled: false


Cilium

Pods stuck in Pending after minikube start

Cause: No CNI installed yet. Minikube started with --network-plugin=cni but Cilium hasn't been installed.

Fix:

cilium install
cilium status --wait

Connectivity test failures

cilium connectivity test

If tests fail, check Cilium pod logs:

kubectl logs -n kube-system -l k8s-app=cilium --tail=50


ingress-nginx / Networking

Services unreachable on macOS Docker driver

curl: (7) Failed to connect to grafana.local port 30080: Network is unreachable

Cause: On macOS, the minikube Docker network (192.168.49.x) is not routable from the host. NodePort addresses are unreachable.

Fix:

1. Ensure minikube tunnel is running: sudo minikube tunnel
2. Ensure /etc/hosts points to 127.0.0.1, not $(minikube ip):

   127.0.0.1  grafana.local keycloak.local

3. Access services on port 80 (no :30080 suffix): http://grafana.local


minikube tunnel drops / stops working

Cause: The tunnel process exited (terminal closed, session timeout, etc.).

Fix: Restart in a dedicated terminal:

sudo minikube tunnel

Verify the external IP is 127.0.0.1:

kubectl get svc ingress-nginx-controller -n ingress-nginx


Identity / Keycloak

ImagePullBackOff for bitnami/keycloak or bitnami/postgresql

Failed to pull image "docker.io/bitnami/keycloak:26.3.2-debian-12-r0":
manifest unknown: manifest unknown

Cause: Bitnami removed all images from Docker Hub (now paywalled). The Bitnami HelmRelease is no longer in use, but an old one may still exist in the cluster.

Fix: Delete the HelmRelease and switch to the official quay.io deployment:

kubectl delete helmrelease keycloak -n identity
kubectl delete helmrepository bitnami -n flux-system
flux reconcile kustomization identity --with-source

The current identity/kustomization.yaml uses a plain Deployment with quay.io/keycloak/keycloak:26.3.
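
A minimal sketch of such a Deployment (the env var names follow Keycloak 26's bootstrap-admin convention; the Secret name and dev-mode flag are assumptions, not the repo's actual manifest):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: keycloak
  namespace: identity
spec:
  replicas: 1
  selector:
    matchLabels: { app: keycloak }
  template:
    metadata:
      labels: { app: keycloak }
    spec:
      containers:
        - name: keycloak
          image: quay.io/keycloak/keycloak:26.3
          args: ["start-dev"]   # dev mode; use "start" plus TLS/hostname config in production
          env:
            - name: KC_BOOTSTRAP_ADMIN_USERNAME
              value: admin
            - name: KC_BOOTSTRAP_ADMIN_PASSWORD
              valueFrom:
                secretKeyRef: { name: keycloak-admin, key: password }
          ports:
            - containerPort: 8080
```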


Grafana SSO: "Failed to get token from provider"

Cause: Split-DNS issue. Grafana's pod cannot resolve keycloak.local because the /etc/hosts entry only exists on the macOS host, not inside Kubernetes pods.

Fix: Ensure token_url and api_url in observability/kube-prometheus-stack.yaml use the in-cluster DNS name:

token_url: http://keycloak.identity.svc.cluster.local/realms/homekube/protocol/openid-connect/token
api_url:   http://keycloak.identity.svc.cluster.local/realms/homekube/protocol/openid-connect/userinfo

Only auth_url (the browser-side redirect) should use keycloak.local.
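
In kube-prometheus-stack values this sits under grafana."grafana.ini". A sketch of the relevant block (realm name and hostnames taken from this doc; other generic_oauth keys such as client_id are omitted):

```yaml
grafana:
  grafana.ini:
    auth.generic_oauth:
      # browser-side: must resolve from the user's machine (/etc/hosts entry)
      auth_url: http://keycloak.local/realms/homekube/protocol/openid-connect/auth
      # server-side: must resolve from inside the Grafana pod (cluster DNS)
      token_url: http://keycloak.identity.svc.cluster.local/realms/homekube/protocol/openid-connect/token
      api_url: http://keycloak.identity.svc.cluster.local/realms/homekube/protocol/openid-connect/userinfo
```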


Grafana redirects to keycloak.local:30080 (old port)

Cause: Grafana is still running with an old config that has :30080 in the OIDC URLs.

Fix:

# Force reconcile to pick up latest config
flux reconcile helmrelease kube-prometheus-stack -n monitoring --with-source
# Restart to apply new config
kubectl rollout restart deployment/kube-prometheus-stack-grafana -n monitoring


SOPS / Secrets

Flux decryption fails: sops-gpg secret not found

decryption failed: secret "sops-gpg" not found

Cause: The sops-gpg Kubernetes secret was deleted (e.g. after minikube delete).

Fix: Re-add the GPG private key:

gpg --export-secret-keys --armor CF7169E94481219626AF34290D18AE7E58FB2D45 | \
  kubectl create secret generic sops-gpg \
    -n flux-system \
    --from-file=sops.asc=/dev/stdin

Then force reconciliation:

flux reconcile kustomization identity --with-source
flux reconcile kustomization observability --with-source


sops --encrypt fails: "no matching creation rules found"

Cause: The file path doesn't match .sops.yaml's path_regex: \.sops\.yaml$.
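
For reference, a creation rule of that shape looks like this (the fingerprint is the one used elsewhere in this doc; the rest of the file is an assumed minimal example):

```yaml
# .sops.yaml at the repo root
creation_rules:
  - path_regex: \.sops\.yaml$
    pgp: CF7169E94481219626AF34290D18AE7E58FB2D45
```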

Fix: The file must be named *.sops.yaml and located inside the repo directory. Encrypt from the repo root:

# Wrong — file in /tmp doesn't match path regex
sops --encrypt /tmp/my-secret.yaml

# Correct — write to repo path first, then encrypt in-place
cp /tmp/my-secret.yaml identity/my-secret.sops.yaml
sops --encrypt --in-place identity/my-secret.sops.yaml
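
Since the rule is just a filename-suffix regex, you can sanity-check a path before encrypting. A quick sketch using plain shell glob matching (equivalent to the \.sops\.yaml$ regex for these cases):

```shell
# Check which candidate paths would match the .sops.yaml creation rule
for f in /tmp/my-secret.yaml identity/my-secret.sops.yaml; do
  case "$f" in
    *.sops.yaml) echo "$f: matches" ;;
    *)           echo "$f: no matching creation rule" ;;
  esac
done
# prints:
#   /tmp/my-secret.yaml: no matching creation rule
#   identity/my-secret.sops.yaml: matches
```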


General

Check all pod status

kubectl get pods -A

Check Flux controller logs

flux logs --all-namespaces --tail=50
flux logs --kind=HelmRelease --name=cert-manager -n cert-manager

Force full reconciliation of everything

flux reconcile source git flux-system
for ks in flux-system infrastructure-controllers infrastructure-configs observability network-policies; do
  flux reconcile ks $ks --with-source
done