Complete Setup Guide¶
Step-by-step instructions to deploy your entire K3s GitOps infrastructure from scratch.
Phase 1: Preparation (Friday Evening - 1 Hour)¶
Prerequisites¶
Ensure you have:

- A GitHub account with a personal access token
- Git installed locally
- kubectl, kustomize, k3d, and flux CLIs installed
- SSH access to all Proxmox nodes (r2d2, butthole-ice-cream, windows, schwifty)
Install Required Tools¶
# macOS
brew install flux kustomize k3d kubectl
# Linux (Ubuntu/Debian)
curl -s https://fluxcd.io/install.sh | sudo bash
curl -s https://raw.githubusercontent.com/kubernetes-sigs/kustomize/master/hack/install_kustomize.sh | bash
curl -s https://raw.githubusercontent.com/k3d-io/k3d/main/install.sh | bash
kubectl version --client # Should already be installed
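A quick sanity check that every required CLI actually landed on your PATH (a minimal sketch; extend the list if you install more tools):

# Verify all required CLIs are available
for tool in git kubectl kustomize k3d flux; do
  command -v "$tool" >/dev/null || echo "MISSING: $tool"
done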
Create GitHub Repository¶
# 1. On GitHub: Create new repo "homelab-gitops" (empty)
# 2. Clone locally
git clone https://github.com/yourusername/homelab-gitops
cd homelab-gitops
# 3. Create folder structure
mkdir -p clusters/{local,production,staging}
mkdir -p infrastructure/{base,production,local}
mkdir -p apps/{base,production,local}
mkdir -p docs scripts
# 4. Create initial files
touch clusters/local/.gitkeep
touch clusters/production/.gitkeep
touch .gitignore
# 5. Add docs
# Copy README.md, ARCHITECTURE.md, SERVICE-CATALOG.md to repo root/docs
# 6. First commit
git add .
git commit -m "chore: initial structure"
git push -u origin main
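Before anything sensitive can land in the repo, it is worth filling in the .gitignore created above; a minimal sketch (the patterns are suggestions, not a canonical list):

# Keep kubeconfigs and unencrypted secrets out of git
cat > .gitignore <<'EOF'
*.kubeconfig
kubeconfig*
k3s-prod.yaml
*.key
.env
EOF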
Phase 2: Local Development (Friday Evening - 2 Hours)¶
Create Local k3d Cluster¶
# Create a local cluster (a scaled-down stand-in for the production topology)
k3d cluster create local \
--servers 1 \
--agents 1 \
--port "8080:80@loadbalancer" \
--port "8443:443@loadbalancer" \
--volume "/tmp/k3d-storage:/var/lib/rancher/k3s/storage" \
--wait
# Verify
kubectl cluster-info
kubectl get nodes
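Before bootstrapping Flux it helps to block until every node reports Ready; a small check using standard kubectl:

# Wait until both k3d nodes are Ready (up to 2 minutes)
kubectl config use-context k3d-local
kubectl wait --for=condition=Ready nodes --all --timeout=120s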
Bootstrap Flux on Local Cluster¶
# Set GitHub credentials
export GITHUB_TOKEN=your_personal_access_token
export GITHUB_USER=yourusername
# Bootstrap Flux (creates deploy key automatically)
kubectl config use-context k3d-local
flux bootstrap github \
--owner=$GITHUB_USER \
--repo=homelab-gitops \
--branch=main \
--path=clusters/local \
--personal
Verify Flux is Running¶
# Check Flux controllers
kubectl -n flux-system get pods
# Watch reconciliation
flux get kustomizations --all-namespaces --watch
# Should see: flux-system (Reconcile succeeded)
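flux check validates both the CLI prerequisites and the in-cluster controllers, making it a quicker smoke test than watching pods by hand:

# Verify the Flux installation end to end
flux check
# Expect "all checks passed" if the bootstrap succeeded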
Phase 3: Proxmox VM Provisioning (Saturday Morning - 2 Hours)¶
Create VMs on r2d2 (192.168.1.10)¶
SSH into r2d2:
ssh root@192.168.1.10
# Create VMs (use Proxmox UI or script below)
# VMs for K3s:
# - leia (100): 4c, 6GB RAM, 50GB disk
# - luke-1 (101): 4c, 6GB RAM, 100GB disk
# - luke-2 (102): 4c, 6GB RAM, 100GB disk
# Example using Proxmox CLI:
qm create 100 --name k3s-leia --cores 4 --memory 6144 --scsihw virtio-scsi-pci
qm set 100 --scsi0 local-lvm:50
qm set 100 --net0 virtio,bridge=vmbr0
qm start 100
Create VMs on butthole-ice-cream (192.168.1.20)¶
ssh root@192.168.1.20
# VMs:
# - obi-wan (110): 2c, 3GB RAM, 30GB disk
# - yoda-1 (111): 2c, 3GB RAM, 50GB disk
qm create 110 --name k3s-obi-wan --cores 2 --memory 3072 --scsihw virtio-scsi-pci
qm set 110 --scsi0 local-lvm:30
qm set 110 --net0 virtio,bridge=vmbr0
qm start 110
Create VMs on windows (192.168.1.30)¶
ssh root@192.168.1.30
# VMs:
# - lando (120): 2c, 4GB RAM, 40GB disk
qm create 120 --name k3s-lando --cores 2 --memory 4096 --scsihw virtio-scsi-pci
qm set 120 --scsi0 local-lvm:40
qm set 120 --net0 virtio,bridge=vmbr0
qm start 120
Create VMs on schwifty (10.0.2.30)¶
ssh root@10.0.2.30
# VMs:
# - rick (100): 6c, 12GB RAM, 100GB disk
# - morty-1 (101): 6c, 12GB RAM, 300GB disk
# - morty-2 (102): 4c, 8GB RAM, 400GB disk
qm create 100 --name k3s-rick --cores 6 --memory 12288 --scsihw virtio-scsi-pci
qm set 100 --scsi0 local-lvm:100
qm set 100 --net0 virtio,bridge=vmbr0
qm start 100
# (repeat for morty-1 and morty-2)
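The repeat step can be scripted; a sketch assuming the VM IDs, cores, memory, and disk sizes from the comments above:

# name:vmid:cores:memoryMB:diskGB
for spec in k3s-morty-1:101:6:12288:300 k3s-morty-2:102:4:8192:400; do
  IFS=: read -r name id cores mem disk <<< "$spec"
  qm create "$id" --name "$name" --cores "$cores" --memory "$mem" --scsihw virtio-scsi-pci
  qm set "$id" --scsi0 "local-lvm:$disk"
  qm set "$id" --net0 virtio,bridge=vmbr0
  qm start "$id"
done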
Phase 4: OS Setup on VMs (Saturday Afternoon - 2 Hours)¶
Boot and Configure Each VM¶
For each VM (access via console or SSH once booted):
# Log in to the VM (default: root/proxmox or ubuntu/ubuntu)
# Configure hostname
hostnamectl set-hostname k3s-leia
echo "192.168.1.100 k3s-leia" >> /etc/hosts
# Update OS
apt update && apt upgrade -y
# Install curl (required by the K3s install script) plus convenience tools
apt install -y curl wget git vim htop
# Set up sudo for non-root user (optional)
# Create user, add to sudoers
# Enable IP forwarding (required for K3s networking)
echo "net.ipv4.ip_forward = 1" >> /etc/sysctl.conf
sysctl -p
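Since these steps repeat on every VM, they can be driven from your workstation in one loop; a sketch assuming root SSH access (extend the hostname:IP list to cover the remaining VMs):

for entry in k3s-leia:192.168.1.100 k3s-obi-wan:192.168.1.110 k3s-lando:192.168.1.120; do
  IFS=: read -r name ip <<< "$entry"
  ssh "root@$ip" "hostnamectl set-hostname $name \
    && apt update && apt upgrade -y \
    && apt install -y curl wget git vim htop \
    && echo 'net.ipv4.ip_forward = 1' >> /etc/sysctl.conf && sysctl -p"
done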
Verify Network Access¶
# From your local machine, verify you can reach all VMs
ping 192.168.1.100 # leia
ping 192.168.1.110 # obi-wan
ping 192.168.1.120 # lando
ping 10.0.2.100 # rick
Phase 5: K3s Installation (Saturday Evening - 3 Hours)¶
Install Primary Master (leia)¶
# SSH into leia VM
ssh root@192.168.1.100
# Install K3s server (first master). --cluster-init enables the embedded
# etcd that the secondary masters join; the install script sets up and
# starts the k3s systemd service itself, so no manual "k3s server &" is needed.
curl -sfL https://get.k3s.io | sh -s - server --cluster-init
# Wait ~30 seconds for startup
sleep 30
# Get node token (needed to join other servers)
TOKEN=$(sudo cat /var/lib/rancher/k3s/server/node-token)
echo "Token: $TOKEN" # Save this!
# Verify K3s is running
sudo k3s kubectl get nodes
Install Secondary Masters (obi-wan, lando)¶
# SSH into obi-wan VM
ssh root@192.168.1.110
# Join as secondary master. The env var must sit on the sh side of the
# pipe (a prefix on curl never reaches sh), and "server" mode must be
# explicit - with K3S_URL alone the installer joins as an agent.
curl -sfL https://get.k3s.io | \
  K3S_TOKEN=<TOKEN_FROM_LEIA> \
  sh -s - server --server https://192.168.1.100:6443
# Repeat on lando (192.168.1.120)
# Verify (run on leia)
sudo k3s kubectl get nodes
# Should show: leia, obi-wan, lando (3 masters)
Install Worker Nodes (luke-1, luke-2, yoda-1, morty-1, morty-2)¶
# SSH into luke-1 VM
ssh root@192.168.1.101
# Join as agent (worker); with K3S_URL set, the installer runs in agent
# mode. As above, the env vars belong on the sh side of the pipe.
curl -sfL https://get.k3s.io | \
  K3S_URL=https://192.168.1.100:6443 \
  K3S_TOKEN=<TOKEN_FROM_LEIA> \
  sh -
# Repeat for luke-2, yoda-1, morty-1, morty-2 (or script it as below)
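Joining five workers by hand is tedious; a loop sketch assuming root SSH access (only luke-1's IP appears earlier in this guide, so substitute your real worker IPs):

TOKEN=<TOKEN_FROM_LEIA>
for ip in 192.168.1.101 192.168.1.102; do   # extend with yoda-1 / morty IPs
  ssh "root@$ip" \
    "curl -sfL https://get.k3s.io | K3S_URL=https://192.168.1.100:6443 K3S_TOKEN=$TOKEN sh -"
done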
Verify Cluster Health¶
# From leia, check all nodes joined
sudo k3s kubectl get nodes -o wide
# Expected: all eight nodes Ready (with -o wide you also get IPs and versions)
# NAME      STATUS   ROLES
# leia      Ready    control-plane,etcd,master
# obi-wan   Ready    control-plane,etcd,master
# lando     Ready    control-plane,etcd,master
# luke-1    Ready    <none>
# luke-2    Ready    <none>
# yoda-1    Ready    <none>
# morty-1   Ready    <none>
# morty-2   Ready    <none>
Copy kubeconfig Locally¶
# From leia, copy config
sudo cat /etc/rancher/k3s/k3s.yaml > /tmp/k3s-prod.yaml
# Download to local machine
scp root@192.168.1.100:/tmp/k3s-prod.yaml ~/.kube/config-prod
# Update the server IP in the file (change 127.0.0.1 to 192.168.1.100)
sed -i 's/127.0.0.1/192.168.1.100/g' ~/.kube/config-prod   # on macOS: sed -i ''
# k3s names its context "default"; rename it before merging
KUBECONFIG=~/.kube/config-prod kubectl config rename-context default k3s-prod
# Merge into your main kubeconfig (a plain "cat >>" append would produce
# invalid YAML with duplicate top-level keys)
KUBECONFIG=~/.kube/config:~/.kube/config-prod \
  kubectl config view --flatten > ~/.kube/config.merged
mv ~/.kube/config.merged ~/.kube/config
# Test access from local machine
kubectl --context=k3s-prod get nodes
Phase 6: Infrastructure Deployment (Sunday Morning - 2 Hours)¶
Add Infrastructure to Git¶
Create the base infrastructure manifests in your repo:
cd homelab-gitops
# Create infrastructure manifests
mkdir -p infrastructure/base/{traefik,sablier,longhorn,cloudflare-tunnel,monitoring,storage}
# Create manifests in each directory
# See previous documentation for complete YAML files
git add infrastructure/
git commit -m "feat: add infrastructure manifests"
git push
Create Clusters Config¶
# clusters/production/kustomization.yaml
cat > clusters/production/kustomization.yaml <<'EOF'
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
# "bases:" is deprecated; current kustomize uses "resources:".
# No top-level "namespace:" here - it would force every resource
# (including cluster infrastructure) into a single namespace.
resources:
  - ../../infrastructure/base
  - ../../infrastructure/production
  - ../../apps/base
  - ../../apps/production
EOF
git add clusters/production/
git commit -m "feat: add production cluster config"
git push
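Flux applies everything under clusters/production through the flux-system Kustomization it creates at bootstrap. If you later want infrastructure reconciled before apps, Flux's own Kustomization API supports explicit ordering; a sketch (names, paths, and intervals are illustrative):

# clusters/production/infrastructure.yaml (illustrative)
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: infrastructure
  namespace: flux-system
spec:
  interval: 10m
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
  path: ./infrastructure/base
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: apps
  namespace: flux-system
spec:
  interval: 10m
  prune: true
  dependsOn:
    - name: infrastructure    # apps wait for infrastructure to be Ready
  sourceRef:
    kind: GitRepository
    name: flux-system
  path: ./apps/base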
Bootstrap Flux on Production¶
# From local machine
kubectl config use-context k3s-prod
flux bootstrap github \
--owner=$GITHUB_USER \
--repo=homelab-gitops \
--branch=main \
--path=clusters/production \
--personal
Monitor Flux Deployment¶
# Watch Flux reconcile infrastructure
flux get kustomizations --all-namespaces --watch
# Check Flux logs
flux logs --follow
# Verify resources deployed
kubectl get pods -A
kubectl get svc -A
kubectl get ingress -A
Wait for Longhorn¶
# Longhorn takes ~3 minutes to deploy
kubectl wait --for=condition=ready pod \
-l app.kubernetes.io/instance=longhorn \
-n longhorn-system --timeout=300s
# Verify storageclass
kubectl get storageclass
# Should show: local-path, longhorn
Phase 7: Verify Infrastructure (Sunday Afternoon - 1 Hour)¶
Check All Nodes¶
kubectl get nodes -o wide
kubectl top nodes
kubectl describe nodes
Check Core Services¶
# Traefik
kubectl -n traefik get pods,svc
# Sablier
kubectl -n default get pods | grep sablier
# Longhorn
kubectl -n longhorn-system get pods
# Monitoring (Prometheus/Grafana)
kubectl -n monitoring get pods,svc
Test Persistence¶
# Create test PVC
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-pvc
spec:
  storageClassName: longhorn
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: test-pod
spec:
  containers:
    - name: test
      image: alpine
      command: ['sh', '-c', 'echo "Test data" > /data/test.txt && sleep 3600']
      volumeMounts:
        - name: storage
          mountPath: /data
  volumes:
    - name: storage
      persistentVolumeClaim:
        claimName: test-pvc
EOF
# Check PVC status
kubectl get pvc
kubectl describe pvc test-pvc
# Verify pod wrote data
kubectl exec test-pod -- cat /data/test.txt
# Output: "Test data"
# Cleanup
kubectl delete pod/test-pod pvc/test-pvc
Phase 8: Deploy First Service - Moodle (Optional - Sunday Evening)¶
Add Moodle to Git¶
# Create Moodle manifests
mkdir -p apps/base/moodle/{00-database,01-storage,02-moodle-app,03-monitoring}
# Copy YAML files from SERVICE-CATALOG.md
# Create the database secret manifest. Caution: the password lands in the
# repo base64-encoded, not encrypted - encrypt it (e.g. with SOPS) or keep
# it out of git.
kubectl create secret generic moodle-db-secret \
--from-literal=password=$(openssl rand -base64 32) \
-n moodle \
--dry-run=client \
-o yaml > apps/base/moodle/secret.yaml
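Flux builds directories with kustomize, so apps/base/moodle needs a kustomization.yaml tying the pieces together; a sketch matching the folder layout above (each subdirectory then lists its own manifests the same way):

cat > apps/base/moodle/kustomization.yaml <<'EOF'
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: moodle
resources:
  - secret.yaml
  - 00-database
  - 01-storage
  - 02-moodle-app
  - 03-monitoring
EOF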
git add apps/base/moodle/
git commit -m "feat: add Moodle deployment"
git push
Verify Moodle Deployment¶
# Wait for pods
kubectl wait --for=condition=ready pod \
-l app=moodle \
-n moodle --timeout=300s
# Check pods
kubectl -n moodle get pods
# Check PVCs
kubectl -n moodle get pvc
# View logs
kubectl -n moodle logs -f deployment/moodle
Test Moodle¶
# Port-forward Moodle service
kubectl port-forward -n moodle svc/moodle 8080:80 &
# Open browser
# http://localhost:8080
Phase 9: Set Up Cloudflare Tunnel (Monday)¶
Create Cloudflare Tunnel¶
# On Cloudflare dashboard:
# 1. Create Tunnel → Named: "homelab-prod"
# 2. Download credentials JSON
# 3. Copy token
# Create secret in cluster
kubectl create secret generic cloudflare-tunnel-secret \
--from-file=tunnel-credentials.json=<path-to-json> \
-n cloudflare \
--dry-run=client \
-o yaml > infrastructure/base/cloudflare-tunnel/secret.yaml
# Update infrastructure/base/cloudflare-tunnel/config.yaml with:
# - Tunnel ID
# - Hostname mappings (moodle.yourdomain.com → http://traefik:80)
git add infrastructure/base/cloudflare-tunnel/
git commit -m "feat: set up Cloudflare Tunnel"
git push
# Flux auto-deploys
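For reference, the resulting config follows cloudflared's standard ingress format; a sketch with placeholder values (tunnel ID, domain, and the Traefik service address are yours to fill in):

# config.yaml (illustrative)
tunnel: <TUNNEL_ID>
credentials-file: /etc/cloudflared/tunnel-credentials.json
ingress:
  - hostname: moodle.yourdomain.com
    service: http://traefik:80
  - service: http_status:404   # mandatory catch-all rule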
Verify External Access¶
# Wait for cloudflared pod
kubectl -n cloudflare get pods
# Test from external machine
curl https://moodle.yourdomain.com
# Should work!
Post-Deployment Checklist¶
- [ ] All K3s nodes reporting Ready
- [ ] Longhorn replicas working (test PVC)
- [ ] Traefik responding to requests
- [ ] Cloudflare Tunnel connected
- [ ] Services accessible via domain names
- [ ] Prometheus collecting metrics
- [ ] Grafana dashboards loading
- [ ] Backup CronJobs running
- [ ] DNS resolving correctly
Backup & Restore¶
Backup etcd (Control Plane Database)¶
# Backup etcd from leia
ssh root@192.168.1.100
sudo k3s etcd-snapshot save --name etcd-backup-20250101
# Verify
sudo k3s etcd-snapshot list
# Copy to external storage
scp /var/lib/rancher/k3s/server/db/snapshots/* backup@backuphost:/backups/
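K3s can also take snapshots on a schedule; a sketch using the documented server config keys (the cron expression and retention count are examples):

# On each server node; k3s reads flags from this file at startup
sudo tee -a /etc/rancher/k3s/config.yaml <<'EOF'
etcd-snapshot-schedule-cron: "0 3 * * *"
etcd-snapshot-retention: 7
EOF
sudo systemctl restart k3s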
Restore etcd¶
# If the cluster breaks, stop K3s on all server nodes, then restore from
# the snapshot on leia (--cluster-reset is required alongside the restore path)
sudo systemctl stop k3s
sudo k3s server \
  --cluster-reset \
  --cluster-reset-restore-path=/path/to/snapshot.db
# Once the reset completes, start K3s again
sudo systemctl start k3s
Troubleshooting¶
Nodes not joining cluster¶
# On worker node, check logs
sudo journalctl -u k3s-agent -f
# Common issues:
# - Wrong token
# - Firewall blocking 6443
# - Network issue
# Check connectivity to the API server
ping 192.168.1.100            # leia IP
nc -zv 192.168.1.100 6443     # API server port
Pods stuck in Pending¶
# Check node resources
kubectl top nodes
kubectl describe nodes
# Check PVC
kubectl describe pvc
# Common causes:
# - Not enough disk space
# - PVC not bound
# - Resource requests too high
Flux not syncing¶
# Check Flux status
flux get all
# Check logs
flux logs --follow
# Common issues:
# - GitHub token expired
# - Deploy key removed from repo
# - Kustomize build errors
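A stuck sync can often be unstuck with an on-demand reconcile, and build errors reproduce locally before you push a fix:

# Force a fresh git fetch and re-apply
flux reconcile kustomization flux-system --with-source
# Reproduce kustomize build errors locally
kustomize build clusters/production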
Next Steps¶
- Add more services: Follow SERVICE-CATALOG.md
- Set up monitoring: Configure Grafana dashboards
- Enable backups: Configure external backup storage
- Add users: Create RBAC policies
- Implement GitOps: Use PRs for all changes
- Monitor costs: Track resource usage
Timeline Summary¶
| Phase | Tasks | Duration |
|---|---|---|
| 1 | Prep, tools, Git setup | 1h |
| 2 | k3d local cluster, Flux bootstrap | 2h |
| 3 | Proxmox VM provisioning | 2h |
| 4 | OS setup on VMs | 2h |
| 5 | K3s installation (all nodes) | 3h |
| 6 | Infrastructure deployment | 2h |
| 7 | Verification & testing | 1h |
| 8 | Moodle deployment | 1h |
| 9 | Cloudflare Tunnel setup | 1h |
| Total | Complete production setup | ~15 hours |
Estimated timeline: Friday evening through Monday (one long weekend, roughly 15 hours of hands-on work) to have a fully operational, production-grade K3s cluster with persistent storage, GitOps automation, and your first service (Moodle) running.
Good luck! 🚀