Day-2 Operations¶
Common tasks for managing the running cluster.
Check Status¶
# Flux Kustomizations (are all deployments healthy?)
flux get ks -A
# HelmReleases (are all charts deployed?)
flux get hr -A
# Git + Helm sources
flux get sources git -A
flux get sources helm -A
# All pods across all namespaces
kubectl get pods -A
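On a busy cluster the full pod list is noisy. A minimal sketch of a filter, assuming a POSIX shell; the function name `unhealthy_pods` is hypothetical, while the field selector is standard kubectl:

```shell
# List pods whose phase is neither Running nor Succeeded, across all namespaces.
# Illustrative helper; the name is not part of the repo.
unhealthy_pods() {
  kubectl get pods -A \
    --field-selector=status.phase!=Running,status.phase!=Succeeded
}
```

A pod in CrashLoopBackOff usually still reports the Running phase, so this complements the full listing rather than replacing it.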
Force Reconciliation¶
# Pull latest Git changes immediately (don't wait for the 1h interval)
flux reconcile source git flux-system
# Re-apply a Kustomization
flux reconcile ks infrastructure-controllers --with-source
flux reconcile ks observability --with-source
# Re-apply a specific HelmRelease
flux reconcile hr cert-manager -n cert-manager
flux reconcile hr kube-prometheus-stack -n monitoring
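When several layers need a refresh, the commands above can be chained. The following is a sketch under the assumption that the Kustomization names are passed in dependency order; `reconcile_in_order` is a hypothetical helper, not something shipped with Flux:

```shell
# Reconcile the Git source once, then each named Kustomization in order.
# Stops at the first failure so a broken layer is not masked by later ones.
reconcile_in_order() {
  flux reconcile source git flux-system || return 1
  for ks in "$@"; do
    flux reconcile ks "$ks" || return 1
  done
}

# Usage: reconcile_in_order infrastructure-controllers observability
```

Fetching the source once up front avoids repeating --with-source for every Kustomization.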
Upgrade a Helm Chart¶
- Edit the version: field in the relevant HelmRelease YAML
- Commit and push
- Flux reconciles automatically (or force with flux reconcile hr ...)
# Example: upgrade ingress-nginx
# Edit infrastructure/controllers/ingress-nginx.yaml
# version: "4.x" → version: "4.11.0"
git add . && git commit -m "chore: upgrade ingress-nginx to 4.11.0" && git push
Suspend / Resume¶
Suspend to temporarily stop Flux from reconciling (useful for debugging):
# Suspend a Kustomization
flux suspend ks infrastructure-controllers
# Suspend a HelmRelease
flux suspend hr cert-manager -n cert-manager
# Resume
flux resume ks infrastructure-controllers
flux resume hr cert-manager -n cert-manager
View Logs¶
# All Flux controller logs
flux logs --all-namespaces --tail=50
# Specific HelmRelease logs
flux logs --kind=HelmRelease --name=cert-manager -n cert-manager
# Specific pod logs
kubectl logs -n cert-manager deployment/cert-manager --tail=50
kubectl logs -n monitoring deployment/kube-prometheus-stack-grafana --tail=50
Restart the Cluster¶
# Stop (preserves state)
minikube stop
# Start again
minikube start
# Flux will automatically reconcile on startup
flux get ks -A --watch
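For scripted restarts, watching interactively is not always practical. One possible alternative is to block until every Kustomization reports Ready; the helper name `wait_for_flux` and the 10m default timeout are assumptions made for this sketch:

```shell
# Block until all Flux Kustomizations in all namespaces report Ready,
# or fail once the timeout (first argument, default 10m) expires.
wait_for_flux() {
  kubectl wait kustomizations.kustomize.toolkit.fluxcd.io \
    --all --all-namespaces \
    --for=condition=Ready --timeout="${1:-10m}"
}

# Usage: minikube start && wait_for_flux 15m
```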
Full Reset¶
# Delete everything — cluster, certs, all local state
minikube delete
# Start fresh (see Bootstrap guide)
minikube start --driver=docker --network-plugin=cni --addons=metrics-server
cilium install && cilium status --wait
flux bootstrap github ...
After minikube delete: restore sops-gpg
The sops-gpg secret is deleted with the cluster. Restore it before Flux reconciles:
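A sketch of the restore, following the convention used in the Flux SOPS guide. It assumes the private key is still in your local GPG keyring, that the secret lives in the flux-system namespace under the key sops.asc, and that the namespace already exists; the function name and the fingerprint argument are placeholders, so check docs/identity/sops-secrets.md for the values this repo actually uses:

```shell
# Recreate the sops-gpg secret from a key in the local GPG keyring.
# Namespace and key name (sops.asc) are assumptions based on Flux conventions.
restore_sops_gpg() {
  fingerprint="$1"
  if [ -z "$fingerprint" ]; then
    echo "usage: restore_sops_gpg <gpg-key-fingerprint>" >&2
    return 2
  fi
  gpg --export-secret-keys --armor "$fingerprint" |
    kubectl create secret generic sops-gpg \
      --namespace=flux-system \
      --from-file=sops.asc=/dev/stdin
}
```

Piping the export straight into kubectl keeps the private key off disk.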
minikube tunnel¶
The tunnel must be running for ingress to work on macOS. Run in a dedicated terminal:
minikube tunnel
If services become unreachable, check if the tunnel is still running:
# pgrep avoids matching the search command itself, unlike ps aux | grep
pgrep -fl "minikube tunnel"
# Verify external IP is 127.0.0.1
kubectl get svc ingress-nginx-controller -n ingress-nginx
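Both checks can be folded into one helper. This is a sketch only: `check_tunnel` is a hypothetical name, and it assumes the expected external IP is 127.0.0.1 as noted above:

```shell
# Report whether the tunnel process exists and the ingress service
# has the expected external IP (127.0.0.1).
check_tunnel() {
  if ! pgrep -f "minikube tunnel" >/dev/null; then
    echo "minikube tunnel is not running" >&2
    return 1
  fi
  ip=$(kubectl get svc ingress-nginx-controller -n ingress-nginx \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
  if [ "$ip" != "127.0.0.1" ]; then
    echo "unexpected external IP: ${ip:-<none>}" >&2
    return 1
  fi
  echo "tunnel OK"
}
```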
Secrets Management (SOPS)¶
# Edit an existing secret (decrypts → opens $EDITOR → re-encrypts on save)
sops identity/keycloak-secret.sops.yaml
sops identity/realm-config.sops.yaml
sops observability/grafana-admin-secret.sops.yaml
sops observability/grafana-oidc-secret.sops.yaml
# Verify a file decrypts correctly
sops --decrypt identity/keycloak-secret.sops.yaml
# After editing, commit and push — Flux auto-applies within 1h
git add <file> && git commit -m "chore: rotate secret" && git push
# Force immediate apply
flux reconcile kustomization identity --with-source
flux reconcile kustomization observability --with-source
Full guide: docs/identity/sops-secrets.md
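Before pushing, it can help to confirm that every encrypted file still decrypts with your key. A minimal sketch, assuming all secrets follow the *.sops.yaml naming seen above; the helper name `verify_sops_files` is hypothetical:

```shell
# Try to decrypt every *.sops.yaml under a directory (default: current dir).
# Prints the files that fail and returns non-zero if any do.
verify_sops_files() {
  dir="${1:-.}"
  failed=$(find "$dir" -type f -name '*.sops.yaml' | while IFS= read -r f; do
    sops --decrypt "$f" >/dev/null 2>&1 </dev/null || echo "$f"
  done)
  if [ -n "$failed" ]; then
    echo "failed to decrypt:" >&2
    echo "$failed" >&2
    return 1
  fi
}

# Usage: verify_sops_files identity && git push
```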
Keycloak Operations¶
# View logs
kubectl logs -n identity deployment/keycloak --tail=50 -f
# Restart (reimports realm from realm-config.sops.yaml)
kubectl rollout restart deployment/keycloak -n identity
kubectl rollout status deployment/keycloak -n identity
# Check Keycloak is reachable through ingress (TLS; -k skips verification if the CA is not trusted locally)
curl -sk -o /dev/null -w "%{http_code}" https://keycloak.local/realms/master
# Expected: 200
# Check in-cluster DNS works (used by Grafana for token exchange)
kubectl exec -n monitoring deploy/kube-prometheus-stack-grafana -c grafana -- \
curl -s -o /dev/null -w "%{http_code}" \
http://keycloak.identity.svc.cluster.local/realms/homekube/.well-known/openid-configuration
# Expected: 200
To update realm config (add clients, users, roles):
1. sops identity/realm-config.sops.yaml — edit the JSON
2. Commit and push
3. kubectl rollout restart deployment/keycloak -n identity
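The restart only reimports the realm once Flux has applied the updated secret, so the last step can be sketched as follows. It assumes the identity Kustomization (the one reconciled in the SOPS section) manages realm-config.sops.yaml; the helper name `apply_realm_change` is hypothetical:

```shell
# Apply the pushed realm change now, then restart Keycloak and wait for it.
apply_realm_change() {
  flux reconcile kustomization identity --with-source || return 1
  kubectl rollout restart deployment/keycloak -n identity || return 1
  kubectl rollout status deployment/keycloak -n identity
}
```

Without the reconcile, the restart may run before the 1h interval has applied the new secret.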
Cilium Operations¶
# Check health
cilium status
# Open Hubble flow UI
cilium hubble ui
# Run connectivity tests
cilium connectivity test
# Upgrade Cilium
cilium upgrade
Pre-commit Hooks¶
Hooks are installed once per local clone and run automatically on every commit.
# Install prek (if not already installed)
brew install prek
# Wire hooks into git (run once after cloning)
prek install # pre-commit stage
prek install --hook-type commit-msg # commit-msg stage (commitlint)
# Run all hooks manually against every file
prek run --all-files
# Run a single hook by ID
prek run gitleaks
prek run yamlfmt
If yamlfmt reformats files during a commit, the hook exits with a non-zero code and the commit is aborted. Run git add on the reformatted files and commit again.
See ADR-013 for the full hook list and rationale.
Notifications (GitHub Commit Status)¶
Flux posts ✅/❌ status checks to GitHub after each reconciliation cycle.
# Check Provider and Alert are healthy
kubectl get provider,alert -n flux-system
# View notification-controller logs (sent events, API errors)
kubectl logs -n flux-system deployment/notification-controller --tail=50
# Force a reconciliation to trigger a new status post
flux reconcile source git flux-system
Rotate the GitHub token¶
# 1. Edit the encrypted secret (opens $EDITOR, re-encrypts on save)
sops notifications/github-token.sops.yaml
# 2. Commit and push
git add notifications/github-token.sops.yaml
git commit -m "chore: rotate github notifications token"
git push
# 3. Force Flux to pick up the new secret immediately
flux reconcile kustomization notifications --with-source
Troubleshoot missing status checks¶
# Confirm the Provider is not suspended and shows no errors
kubectl describe provider github-status -n flux-system
# Check the token has repo:status (Commit statuses) scope
# GitHub → Settings → Developer settings → Personal access tokens → your token
# Verify the secret has the correct key name
kubectl get secret github-token -n flux-system -o jsonpath='{.data}' | \
python3 -c "import sys,json; print(list(json.load(sys.stdin).keys()))"
# Expected: ['token']
See docs/notifications/github-status.md and ADR-014.
Image Automation¶
The image-reflector-controller and image-automation-controller are installed but idle until ImageRepository, ImagePolicy, and ImageUpdateAutomation resources are added to image-automation/.
# Verify both image controllers are running
kubectl get deploy -n flux-system | grep image
# Check the image-automation Kustomization is healthy
flux get ks image-automation
# Once resources are added — check image scan results
kubectl get imagerepository -n flux-system
kubectl get imagepolicy -n flux-system
# Force an immediate image scan
flux reconcile image repository <name> -n flux-system
# Check automation status (last run and last pushed commit)
kubectl get imageupdateautomation -n flux-system
When image automation is active, tag updates are pushed to the flux/image-updates branch on GitHub for review. Merge to main to deploy.
See ADR-015 for the full setup guide including ImageRepository, ImagePolicy, and ImageUpdateAutomation resource templates.
Useful kubectl Shortcuts¶
# Get all resources in a namespace
kubectl get all -n cert-manager
kubectl get all -n ingress-nginx
kubectl get all -n monitoring
# Describe a failing pod
kubectl describe pod <pod-name> -n <namespace>
# Execute into a pod
kubectl exec -it <pod-name> -n <namespace> -- /bin/sh
# Port-forward a service directly (bypass ingress)
kubectl port-forward svc/kube-prometheus-stack-grafana 3000:80 -n monitoring
# Open: http://localhost:3000