Day-2 Operations

Common tasks for managing the running cluster.

Check Status

# Flux Kustomizations (are all deployments healthy?)
flux get ks -A

# HelmReleases (are all charts deployed?)
flux get hr -A

# Git + Helm sources
flux get sources git -A
flux get sources helm -A

# All pods across all namespaces
kubectl get pods -A

Force Reconciliation

# Pull latest Git changes immediately (don't wait for the 1h interval)
flux reconcile source git flux-system

# Re-apply a Kustomization
flux reconcile ks infrastructure-controllers --with-source
flux reconcile ks observability --with-source

# Re-apply a specific HelmRelease
flux reconcile hr cert-manager -n cert-manager
flux reconcile hr kube-prometheus-stack -n monitoring

Upgrade a Helm Chart

  1. Edit the version: field in the relevant HelmRelease YAML
  2. Commit and push
  3. Flux reconciles automatically (or force with flux reconcile hr ...)

# Example: upgrade ingress-nginx
# Edit infrastructure/controllers/ingress-nginx.yaml
#   version: "4.x"  →  version: "4.11.0"
git add . && git commit -m "chore: upgrade ingress-nginx to 4.11.0" && git push

Suspend / Resume

Suspend to temporarily stop Flux from reconciling (useful for debugging):

# Suspend a Kustomization
flux suspend ks infrastructure-controllers

# Suspend a HelmRelease
flux suspend hr cert-manager -n cert-manager

# Resume
flux resume ks infrastructure-controllers
flux resume hr cert-manager -n cert-manager
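
A suspended Kustomization is easy to forget. The wrapper below is a minimal sketch, not part of this repository: the function name with_ks_suspended is hypothetical. It suspends a Kustomization, runs whatever command you pass, and resumes afterwards even if that command fails.

```shell
# Suspend a Kustomization, run a command, then resume regardless of the result.
# Usage: with_ks_suspended <kustomization> <command> [args...]
# Example: with_ks_suspended infrastructure-controllers kubectl get pods -A
with_ks_suspended() {
  _ks="$1"
  shift
  flux suspend ks "$_ks" || return 1
  "$@"
  _status=$?
  # Always try to resume; warn instead of hiding the wrapped command's status.
  flux resume ks "$_ks" ||
    echo "warning: could not resume $_ks, run: flux resume ks $_ks" >&2
  return "$_status"
}
```

The function returns the exit code of the wrapped command, so it can be used inside larger scripts.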

View Logs

# All Flux controller logs
flux logs --all-namespaces --tail=50

# Specific HelmRelease logs
flux logs --kind=HelmRelease --name=cert-manager -n cert-manager

# Specific pod logs
kubectl logs -n cert-manager deployment/cert-manager --tail=50
kubectl logs -n monitoring deployment/kube-prometheus-stack-grafana --tail=50

Restart the Cluster

# Stop (preserves state)
minikube stop

# Start again
minikube start

# Flux will automatically reconcile on startup
flux get ks -A --watch
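
Watching the table works interactively, but a script needs something that blocks and returns an exit code. The sketch below uses kubectl wait for that. The function name wait_for_flux and the 5m default are assumptions. Note that a Ready condition left over from before the restart may satisfy the wait immediately, so treat it as a rough gate rather than proof of a fresh reconciliation.

```shell
# Block until every Kustomization and HelmRelease reports Ready, or time out.
# Usage: wait_for_flux [timeout]   (defaults to 5m, an arbitrary choice)
wait_for_flux() {
  _timeout="${1:-5m}"
  kubectl wait kustomizations.kustomize.toolkit.fluxcd.io --all --all-namespaces \
    --for=condition=Ready --timeout="$_timeout" &&
  kubectl wait helmreleases.helm.toolkit.fluxcd.io --all --all-namespaces \
    --for=condition=Ready --timeout="$_timeout"
}
```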

Full Reset

# Delete everything — cluster, certs, all local state
minikube delete

# Start fresh (see Bootstrap guide)
minikube start --driver=docker --network-plugin=cni --addons=metrics-server
cilium install && cilium status --wait
flux bootstrap github ...

After minikube delete: restore sops-gpg

The sops-gpg secret is deleted with the cluster. Restore it before Flux reconciles, otherwise Flux cannot decrypt the SOPS-encrypted manifests:

gpg --export-secret-keys --armor CF7169E94481219626AF34290D18AE7E58FB2D45 | \
  kubectl create secret generic sops-gpg -n flux-system --from-file=sops.asc=/dev/stdin
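
Running kubectl create a second time fails because the secret already exists. The sketch below wraps the same command in an existence check so it can be re-run safely. The function name restore_sops_gpg is hypothetical, and the fingerprint is passed as an argument rather than hard-coded.

```shell
# Recreate the sops-gpg secret only if it is missing.
# Usage: restore_sops_gpg <gpg-key-fingerprint>
restore_sops_gpg() {
  _fpr="$1"
  if kubectl get secret sops-gpg -n flux-system >/dev/null 2>&1; then
    echo "sops-gpg already exists in flux-system, nothing to do"
    return 0
  fi
  # Same export-and-create pipeline as above, with the fingerprint parameterised.
  gpg --export-secret-keys --armor "$_fpr" |
    kubectl create secret generic sops-gpg -n flux-system \
      --from-file=sops.asc=/dev/stdin
}
```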

minikube tunnel

The tunnel must be running for ingress to work on macOS. Run it in a dedicated terminal:

sudo minikube tunnel

If services become unreachable, check if the tunnel is still running:

# The [m] stops grep from matching its own process
ps aux | grep "[m]inikube tunnel"

# Verify external IP is 127.0.0.1
kubectl get svc ingress-nginx-controller -n ingress-nginx
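
Both checks can be combined into one command that says which part is broken. This is a sketch under assumptions: check_tunnel is a hypothetical name, and it expects the external IP to appear under status.loadBalancer.ingress[0].ip of the service.

```shell
# Report whether the tunnel process is running and the ingress has the expected IP.
check_tunnel() {
  if ! pgrep -f "minikube tunnel" >/dev/null; then
    echo "minikube tunnel is not running; start it with: sudo minikube tunnel"
    return 1
  fi
  _ip=$(kubectl get svc ingress-nginx-controller -n ingress-nginx \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
  if [ "$_ip" != "127.0.0.1" ]; then
    echo "unexpected external IP: ${_ip:-<none>}"
    return 1
  fi
  echo "tunnel running, external IP is 127.0.0.1"
}
```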

Secrets Management (SOPS)

# Edit an existing secret (decrypts → opens $EDITOR → re-encrypts on save)
sops identity/keycloak-secret.sops.yaml
sops identity/realm-config.sops.yaml
sops observability/grafana-admin-secret.sops.yaml
sops observability/grafana-oidc-secret.sops.yaml

# Verify a file decrypts correctly
sops --decrypt identity/keycloak-secret.sops.yaml

# After editing, commit and push — Flux auto-applies within 1h
git add <file> && git commit -m "chore: rotate secret" && git push

# Force immediate apply
flux reconcile kustomization identity --with-source
flux reconcile kustomization observability --with-source

Full guide: docs/identity/sops-secrets.md

Keycloak Operations

# View logs
kubectl logs -n identity deployment/keycloak --tail=50 -f

# Restart (reimports realm from realm-config.sops.yaml)
kubectl rollout restart deployment/keycloak -n identity
kubectl rollout status deployment/keycloak -n identity

# Check Keycloak is reachable over HTTPS (-k skips certificate verification, needed while the CA is not trusted locally)
curl -sk -o /dev/null -w "%{http_code}" https://keycloak.local/realms/master
# Expected: 200

# Check in-cluster DNS works (used by Grafana for token exchange)
kubectl exec -n monitoring deploy/kube-prometheus-stack-grafana -c grafana -- \
  curl -s -o /dev/null -w "%{http_code}" \
  http://keycloak.identity.svc.cluster.local/realms/homekube/.well-known/openid-configuration
# Expected: 200
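
Right after a restart, Keycloak may need some time before it answers with 200, so a single curl can give a false alarm. The polling helper below is a sketch: wait_for_http_200 is a hypothetical name, and the defaults of 30 attempts and 5 seconds are arbitrary.

```shell
# Poll a URL until it returns HTTP 200 or the attempts run out.
# Usage: wait_for_http_200 <url> [attempts] [delay-seconds]
# Example: wait_for_http_200 https://keycloak.local/realms/master
wait_for_http_200() {
  _url="$1"
  _attempts="${2:-30}"
  _delay="${3:-5}"
  _i=1
  while [ "$_i" -le "$_attempts" ]; do
    # -k mirrors the manual check above; curl prints 000 when it cannot connect.
    _code=$(curl -sk -o /dev/null -w "%{http_code}" "$_url" || true)
    if [ "$_code" = "200" ]; then
      echo "ready after $_i attempt(s)"
      return 0
    fi
    sleep "$_delay"
    _i=$((_i + 1))
  done
  echo "gave up after $_attempts attempts (last status: $_code)"
  return 1
}
```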

To update realm config (add clients, users, roles):

  1. sops identity/realm-config.sops.yaml (edit the JSON)
  2. Commit and push
  3. kubectl rollout restart deployment/keycloak -n identity
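
Once the file has been edited with sops, the remaining steps can be chained so that a failure stops the sequence. The sketch below is hypothetical (apply_realm_config is not an existing script). It also assumes the realm config reaches the cluster through the identity Kustomization, so it forces a reconcile before restarting. Otherwise Keycloak could reimport the old config.

```shell
# Commit, push, reconcile, and restart Keycloak after editing the realm config.
apply_realm_config() {
  _file="identity/realm-config.sops.yaml"
  # Refuse to continue if the file no longer decrypts.
  sops --decrypt "$_file" >/dev/null || {
    echo "cannot decrypt $_file" >&2
    return 1
  }
  git add "$_file" &&
    git commit -m "chore: update keycloak realm config" &&
    git push &&
    flux reconcile kustomization identity --with-source &&
    kubectl rollout restart deployment/keycloak -n identity &&
    kubectl rollout status deployment/keycloak -n identity
}
```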

Cilium Operations

# Check health
cilium status

# Open Hubble flow UI
cilium hubble ui

# Run connectivity tests
cilium connectivity test

# Upgrade Cilium
cilium upgrade

Pre-commit Hooks

Hooks are installed once per local clone and run automatically on every commit.

# Install prek (if not already installed)
brew install prek

# Wire hooks into git (run once after cloning)
prek install                          # pre-commit stage
prek install --hook-type commit-msg   # commit-msg stage (commitlint)

# Run all hooks manually against every file
prek run --all-files

# Run a single hook by ID
prek run gitleaks
prek run yamlfmt

If yamlfmt reformats files during a commit, the hook exits with a non-zero code and the commit is aborted. Run git add on the reformatted files and commit again.

See ADR-013 for the full hook list and rationale.

Notifications (GitHub Commit Status)

Flux posts ✅/❌ status checks to GitHub after each reconciliation cycle.

# Check Provider and Alert are healthy
kubectl get provider,alert -n flux-system

# View notification-controller logs (sent events, API errors)
kubectl logs -n flux-system deployment/notification-controller --tail=50

# Force a reconciliation to trigger a new status post
flux reconcile source git flux-system

Rotate the GitHub token

# 1. Edit the encrypted secret (opens $EDITOR, re-encrypts on save)
sops notifications/github-token.sops.yaml

# 2. Commit and push
git add notifications/github-token.sops.yaml
git commit -m "chore: rotate github notifications token"
git push

# 3. Force Flux to pick up the new secret immediately
flux reconcile kustomization notifications --with-source

Troubleshoot missing status checks

# Confirm the Provider is not suspended and shows no errors
kubectl describe provider github-status -n flux-system

# Check the token has repo:status (Commit statuses) scope
# GitHub → Settings → Developer settings → Personal access tokens → your token

# Verify the secret has the correct key name
kubectl get secret github-token -n flux-system -o jsonpath='{.data}' | \
  python3 -c "import sys,json; print(list(json.load(sys.stdin).keys()))"
# Expected: ['token']

See docs/notifications/github-status.md and ADR-014.

Image Automation

The image-reflector-controller and image-automation-controller are installed but idle until ImageRepository, ImagePolicy, and ImageUpdateAutomation resources are added to image-automation/.

# Verify both image controllers are running
kubectl get deploy -n flux-system | grep image

# Check the image-automation Kustomization is healthy
flux get ks image-automation

# Once resources are added — check image scan results
kubectl get imagerepository -n flux-system
kubectl get imagepolicy -n flux-system

# Force an immediate image scan
flux reconcile image repository <name> -n flux-system

# Check ImageUpdateAutomation status
kubectl get imageupdateautomation -n flux-system

When image automation is active, tag updates are pushed to the flux/image-updates branch on GitHub for review. Merge to main to deploy.
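
The branch can also be reviewed locally before merging. This is a sketch, assuming the remote is named origin and the local main is up to date. The function name review_image_updates is hypothetical.

```shell
# Show the commits and the combined diff that the image-updates branch adds on top of main.
# Usage: review_image_updates [branch]   (defaults to flux/image-updates)
review_image_updates() {
  _branch="${1:-flux/image-updates}"
  git fetch origin "$_branch" || return 1
  # Commits on the automation branch that are not yet on main.
  git log --oneline "main..origin/$_branch"
  # Changes since the branch diverged from main.
  git diff "main...origin/$_branch"
}
```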

See ADR-015 for the full setup guide including ImageRepository, ImagePolicy, and ImageUpdateAutomation resource templates.

Useful kubectl Shortcuts

# Get all resources in a namespace
kubectl get all -n cert-manager
kubectl get all -n ingress-nginx
kubectl get all -n monitoring

# Describe a failing pod
kubectl describe pod <pod-name> -n <namespace>

# Execute into a pod
kubectl exec -it <pod-name> -n <namespace> -- /bin/sh

# Port-forward a service directly (bypass ingress)
kubectl port-forward svc/kube-prometheus-stack-grafana 3000:80 -n monitoring
# Open: http://localhost:3000