Architecture Overview¶
Complete system architecture for your K3s multi-cluster setup.
Cluster Topology¶
┌─────────────────────────────────────────────────────────────────┐
│ HOMELAB GITOPS SETUP │
└─────────────────────────────────────────────────────────────────┘
YOUR MACHINE (Laptop/Workstation)
└── k3d Cluster (Local Dev)
├── 1 Server node (k3d-server)
├── 1-2 Agent nodes
└── Used for: Testing, development, CI/CD
↓ (git push)
GITHUB REPOSITORY (Source of Truth)
├── clusters/local
├── clusters/production
└── clusters/staging (unused; staging work runs on the local k3d cluster)
↓ (auto-sync via Flux)
PRODUCTION CLUSTERS
┌──────────────────────────────────────────────────────────────────┐
│ CLUSTER 1: "Rebellion" (192.168.1.x) │
│ Proxmox: r2d2 + butthole-ice-cream + windows │
├──────────────────────────────────────────────────────────────────┤
│ │
│ Master Nodes (High Availability Control Plane): │
│ ├─ leia (r2d2, 4c/6GB, 192.168.1.100) │
│ ├─ obi-wan (butthole-ice-cream, 2c/3GB, 192.168.1.110) │
│ └─ lando (windows, 2c/4GB, 192.168.1.120) │
│ │
│ Worker Nodes: │
│ ├─ luke-1 (r2d2, 4c/6GB, 192.168.1.101) │
│ ├─ luke-2 (r2d2, 4c/6GB, 192.168.1.102) │
│ └─ yoda-1 (butthole-ice-cream, 2c/3GB, 192.168.1.111) │
│ │
│ Cluster Resources: │
│ ├─ Total vCPU allocated to K3s VMs: 18 (hosts: 22 physical cores) │
│ ├─ Total RAM allocated to K3s VMs: 28 GB (hosts: 60 GB) │
│ ├─ Storage: ~400GB replicated via Longhorn │
│ └─ Network: Bridged to vmbr0, gateway 192.168.1.1 │
│ │
└──────────────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────────────┐
│ CLUSTER 2: "Empire" (10.0.2.x) │
│ Proxmox: schwifty (isolated network 10.0.2.0/24) │
├──────────────────────────────────────────────────────────────────┤
│ │
│ Master/Worker Nodes: │
│ ├─ rick (schwifty, 6c/12GB, 10.0.2.100) │
│ ├─ morty-1 (schwifty, 6c/12GB, 10.0.2.101) │
│ └─ morty-2 (schwifty, 4c/8GB, 10.0.2.102) │
│ │
│ Cluster Resources: │
│ ├─ Total vCPU allocated to K3s VMs: 16 (host: 12c/24t) │
│ ├─ Total RAM allocated to K3s VMs: 32 GB (host: 47 GB) │
│ ├─ Storage: ~800GB replicated via Longhorn │
│ └─ Network: Isolated (10.0.2.x), Tailscale mesh tunnel │
│ │
└──────────────────────────────────────────────────────────────────┘
↑ Tailscale Mesh ↑
Clusters communicate securely over VPN
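For the local dev cluster at the top of the diagram, k3d can create the 1-server/2-agent layout from a declarative config file. A minimal sketch, assuming k3d v5 (config schema k3d.io/v1alpha5); the cluster name and port mapping are placeholders:

```yaml
# k3d-local.yaml -- illustrative local dev cluster (1 server, 2 agents)
apiVersion: k3d.io/v1alpha5
kind: Simple
metadata:
  name: local              # hypothetical name -> nodes k3d-local-server-0, etc.
servers: 1                 # the single k3d server node
agents: 2                  # 1-2 agent nodes per the diagram; 2 shown here
ports:
  - port: 8080:80          # expose Traefik's HTTP entrypoint on localhost:8080
    nodeFilters:
      - loadbalancer
```

Created with `k3d cluster create --config k3d-local.yaml`; the same manifests Flux applies to production can be smoke-tested here before a `git push`.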
Traffic Flow Architecture¶
INTERNET TRAFFIC
↓
Cloudflare Global Network
↓
Cloudflare Tunnel (outbound connection from cluster → Cloudflare; no inbound ports opened)
↓
Traefik Ingress Controller Pod (runs in K3s)
├─ Receives HTTPS traffic for all services
├─ Terminates TLS (or passes through)
└─ Routes to backend services via HTTP/TCP
↓
Sablier Middleware (Optional - Scale to Zero)
├─ Intercepts HTTP requests
├─ If service scaled to 0: Buffers request ~10s while pod starts
└─ If service running: Passes through immediately
↓
Service DNS (Kubernetes DNS: CoreDNS)
├─ myapp.default → Service (ClusterIP)
├─ Service → Pod endpoints
└─ Pods communicate directly (no additional routing)
↓
Application Pods
├─ Pod 1: moodle (on leia, 192.168.1.100)
├─ Pod 2: moodle (on luke-1, 192.168.1.101)
└─ Pod 3: moodle (on luke-2, 192.168.1.102)
↓
PostgreSQL StatefulSet
├─ Single pod or replicated
└─ PersistentVolume (Longhorn, 2-3 replicas)
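As a concrete sketch of the Traefik hop in this flow: an IngressRoute routing the public hostname to a moodle Service, with the Sablier middleware attached as the optional scale-to-zero gate. The CRD API group (traefik.io/v1alpha1 on recent Traefik; traefik.containo.us/v1alpha1 on older bundles), the service name/port, and the middleware name are assumptions, and the Sablier plugin's own options are omitted:

```yaml
# Illustrative IngressRoute: hostname-based routing into the moodle Service.
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: moodle
  namespace: default
spec:
  entryPoints:
    - websecure                    # HTTPS entrypoint; TLS terminated by Traefik
  routes:
    - match: Host(`moodle.yourdomain.com`)
      kind: Rule
      middlewares:
        - name: sablier-moodle     # optional Sablier middleware (scale to zero)
      services:
        - name: moodle             # ClusterIP Service in front of the moodle pods
          port: 80                 # assumed Service port
```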
Storage Architecture¶
Persistent Volume Lifecycle¶
CREATION (First deployment)
↓
PVC created → Longhorn finds available nodes → Allocates space
↓
Longhorn creates replicas (default: 2)
├─ Replica 1: Node A (e.g., leia)
├─ Replica 2: Node B (e.g., luke-1)
└─ Replica 3: Optional (Node C, e.g., luke-2) for critical data
↓
POD MOUNTS & WRITES DATA
↓
All writes go through Longhorn → Replicated to replica nodes
↓
Data lives on Proxmox local storage (NVMe or HDD)
↓
NODE FAILURE SCENARIO
↓
If leia goes down:
├─ Longhorn detects node down
├─ Promotes replica from luke-1 to primary
├─ Pod reschedules to healthy node (luke-2 or obi-wan)
└─ Data is available (zero data loss)
↓
BACKUPS
↓
CronJob runs daily (2 AM)
├─ PostgreSQL: pg_dump → gzip → Backup PVC
├─ Files: tar + rsync → External storage
└─ Snapshots: Longhorn auto-snapshots every 6 hours
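The daily 2 AM database backup above maps to a plain Kubernetes CronJob. A hedged sketch, assuming a `postgres` Service, credentials in a `postgres-credentials` Secret, a `moodle` database, and a `backup-pvc` claim; all names are placeholders:

```yaml
# Illustrative nightly pg_dump into a dedicated backup PVC (runs at 02:00).
apiVersion: batch/v1
kind: CronJob
metadata:
  name: postgres-backup
  namespace: default
spec:
  schedule: "0 2 * * *"              # daily at 2 AM, as described above
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: pg-dump
              image: postgres:16     # assumed image/tag
              command: ["/bin/sh", "-c"]
              args:
                - |
                  pg_dump -h postgres -U "$PGUSER" moodle \
                    | gzip > /backup/moodle-`date +%F`.sql.gz
              env:
                - name: PGUSER
                  valueFrom:
                    secretKeyRef: { name: postgres-credentials, key: username }
                - name: PGPASSWORD
                  valueFrom:
                    secretKeyRef: { name: postgres-credentials, key: password }
              volumeMounts:
                - name: backup
                  mountPath: /backup
          volumes:
            - name: backup
              persistentVolumeClaim:
                claimName: backup-pvc
```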
Storage Classes¶
┌─ local-path (Bootstrap)
│ ├─ Where: /var/lib/rancher/k3s/storage per node
│ ├─ Replicas: None (single node)
│ ├─ Use for: Traefik cache, temporary data, metrics
│ ├─ Data loss: If node dies, data is lost
│ └─ Deploy timeline: Day 1 (ready immediately)
│
└─ longhorn (Production)
├─ Where: Distributed across nodes + underlying disks
├─ Replicas: 2-3 copies across nodes
├─ Use for: Databases, file storage, critical configs
├─ Data loss: Survives 1-2 node failures
└─ Deploy timeline: Day 2-3 (after cluster stable)
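A sketch of what the longhorn class above might look like as a StorageClass, assuming the stock Longhorn CSI provisioner; the replica count of 2 mirrors the default described in this section:

```yaml
# Illustrative Longhorn StorageClass with 2 replicas per volume.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn
provisioner: driver.longhorn.io        # Longhorn CSI provisioner
parameters:
  numberOfReplicas: "2"                # 2 copies across nodes; use 3 for critical data
  staleReplicaTimeout: "2880"          # minutes before a failed replica is cleaned up
allowVolumeExpansion: true
reclaimPolicy: Retain                  # keep data if the PVC is deleted (judgment call)
volumeBindingMode: Immediate
```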
Network Segmentation¶
Kubernetes Networks (All automatic)¶
Pod Network (overlay):
├─ All pods can reach each other directly
├─ IP range: Allocated by K3s (typically 10.42.x.x)
└─ Isolated from Proxmox network
Service Network (ClusterIP):
├─ Internal DNS: servicename.namespace.svc.cluster.local
├─ Automatically load-balances to pod IPs
└─ Only reachable from inside cluster (no external access)
Ingress Network (External):
├─ External traffic enters via Traefik Ingress
├─ Routes based on hostname (moodle.yourdomain.com)
└─ Gets NodePort internally (e.g., 30123)
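For the Service-network hop, a minimal ClusterIP Service; the selector and ports are assumptions:

```yaml
# Illustrative ClusterIP Service: reachable in-cluster as
# moodle.default.svc.cluster.local, never from outside the cluster.
apiVersion: v1
kind: Service
metadata:
  name: moodle
  namespace: default
spec:
  type: ClusterIP
  selector:
    app: moodle            # assumed pod label
  ports:
    - port: 80             # Service port used by the Ingress
      targetPort: 8080     # assumed container port
```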
Proxmox Network Integration¶
Proxmox vmbr0 (Bridge)
├─ Cluster 1 (192.168.1.0/24): r2d2, butthole-ice-cream, windows
│ ├─ Gateway: 192.168.1.1
│ ├─ K3s VMs: 192.168.1.100-199
│ └─ Pod overlay: 10.42.x.x (inside VMs)
│
└─ Cluster 2 (10.0.2.0/24): schwifty only (isolated)
├─ Gateway: 10.0.2.1
├─ K3s VMs: 10.0.2.100-199
└─ Pod overlay: 10.43.x.x (inside VMs)
Tailscale VPN (Mesh):
├─ Secure tunnel between clusters
├─ Both clusters can reach each other's services
└─ Used for backup/disaster recovery
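One consequence of the addressing above: 10.43.0.0/16 is the K3s default service CIDR, so giving Empire a 10.43.x.x pod overlay implies both ranges are set explicitly on that cluster. A hedged sketch of the relevant server config; the replacement service range is an assumption:

```yaml
# /etc/rancher/k3s/config.yaml on Empire's server node(s) -- illustrative.
# 10.43.0.0/16 is the K3s default *service* CIDR, so using it as Empire's pod
# overlay (per the diagram) means both ranges have to be set explicitly.
cluster-cidr: 10.43.0.0/16       # pod overlay shown in the diagram above
service-cidr: 10.45.0.0/16       # assumed replacement service range
```

Rebellion keeps the K3s defaults (pods 10.42.0.0/16, services 10.43.0.0/16), so only Empire needs these overrides.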
High Availability Strategy¶
Master Node Quorum¶
3 Master Nodes (leia, obi-wan, lando)
├─ Embedded etcd (consensus database)
├─ Requires 2/3 online for cluster availability
└─ Failure tolerance:
├─ Node 1 (leia) down: Cluster OK (2/3 online)
├─ Node 2 (obi-wan) down: Cluster OK (2/3 online)
├─ Node 3 (lando) down: Cluster OK (2/3 online)
└─ Any 2 down: Control plane unavailable (wait for quorum to recover)
Data Persistence:
├─ etcd data lives on each master's local disk (not on Longhorn, which itself depends on the running cluster)
├─ Replicated across the 3 masters via Raft consensus
├─ Automatic etcd snapshots every 6 hours
└─ Full backup daily
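The snapshot cadence above corresponds to K3s's built-in etcd snapshot settings. A sketch for the masters' config file, assuming the 6-hour schedule; the retention count is an assumption:

```yaml
# /etc/rancher/k3s/config.yaml on the three master nodes -- illustrative.
etcd-snapshot-schedule-cron: "0 */6 * * *"   # snapshot every 6 hours
etcd-snapshot-retention: 20                  # keep the 20 most recent snapshots (assumed)
```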
Pod Replication¶
Example: Moodle Service
Deployment: moodle (replicas: 3)
├─ Pod 1: Runs on leia
├─ Pod 2: Runs on luke-1
└─ Pod 3: Runs on luke-2
If leia fails:
├─ Pod 1 reschedules to obi-wan (or other healthy node)
├─ Service automatically discovers new Pod 1 location
├─ Traffic redirects within 30 seconds
└─ Pod 2 + 3 keep serving (zero downtime)
Storage attached to all pods:
├─ /moodledata PVC (Longhorn, ReadWriteMany)
├─ All pods read/write same files
└─ Longhorn ensures 2+ replicas exist
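The Moodle example above, written out as a Deployment; the image, container port, and PVC name are assumptions, and ReadWriteMany relies on Longhorn's NFS-backed RWX volumes:

```yaml
# Illustrative 3-replica Deployment sharing one RWX volume for /moodledata.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: moodle
  namespace: default
spec:
  replicas: 3
  selector:
    matchLabels:
      app: moodle
  template:
    metadata:
      labels:
        app: moodle
    spec:
      containers:
        - name: moodle
          image: bitnami/moodle:latest     # assumed image
          ports:
            - containerPort: 8080          # assumed container port
          volumeMounts:
            - name: moodledata
              mountPath: /moodledata
      volumes:
        - name: moodledata
          persistentVolumeClaim:
            claimName: moodledata          # assumed RWX PVC on the longhorn class
```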
Automatic Failover¶
Pod crashes:
├─ kubelet detects pod death
├─ Deployment controller sees fewer replicas than desired
├─ New pod scheduled on healthy node
└─ Online within 30-60 seconds
Node crashes:
├─ Control plane marks the node NotReady after missed heartbeats (~40 seconds)
├─ Pods on the dead node are evicted and rescheduled once their not-ready toleration
│  expires (Kubernetes default: 5 minutes; see the toleration sketch below)
├─ Longhorn promotes replicas from other nodes
└─ Workloads back online within roughly 1-2 minutes with tuned tolerations
Longhorn replica fails:
├─ Longhorn detects replica lag
├─ New replica allocated on healthy node
├─ Data synchronized in background
└─ Always maintains 2+ replicas
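On the node-crash path, Kubernetes only evicts pods from an unreachable node after their not-ready/unreachable tolerations expire (5 minutes by default). A hedged pod-template fragment that tightens this to roughly one minute, in line with the failover times above:

```yaml
# Fragment of a pod template spec -- shortens the default 5-minute eviction
# window so pods on a dead node reschedule within about a minute.
tolerations:
  - key: node.kubernetes.io/not-ready
    operator: Exists
    effect: NoExecute
    tolerationSeconds: 60       # evict ~60s after the node goes NotReady
  - key: node.kubernetes.io/unreachable
    operator: Exists
    effect: NoExecute
    tolerationSeconds: 60
```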
GitOps Deployment Flow¶
1. Developer makes changes locally
git checkout -b feature/my-change
# Edit YAML files
git push
2. GitHub Pull Request
├─ CI runs: lint YAML, check Kustomize builds
├─ Human review
└─ Merge to main
3. Flux CD polls GitHub on a short interval (Flux default: 1 minute)
├─ Detects main branch updated
├─ Pulls latest YAML from repo
├─ Builds Kustomize files
└─ Reconciles cluster state
4. Cluster state updates
├─ New pods start
├─ Services reconfigure
├─ Ingress rules update
└─ Metrics recorded
5. Monitoring alerts
├─ Prometheus scrapes metrics
├─ Grafana shows dashboard
└─ AlertManager notifies on errors
6. Complete
├─ Git is source of truth
├─ Cluster matches Git exactly
└─ All changes auditable
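Steps 3-4 above are driven by two Flux objects: a GitRepository that polls the repo and a Kustomization that applies a cluster path from it. A sketch with a placeholder URL and illustrative intervals:

```yaml
# Illustrative Flux source + reconciliation for the production cluster.
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: homelab
  namespace: flux-system
spec:
  interval: 1m                      # how often Flux polls GitHub
  url: https://github.com/your-user/homelab-gitops   # placeholder URL
  ref:
    branch: main
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: cluster-production
  namespace: flux-system
spec:
  interval: 10m                     # how often the built manifests are re-applied
  path: ./clusters/production       # matches the repo layout above
  prune: true                       # delete resources removed from Git
  sourceRef:
    kind: GitRepository
    name: homelab
```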
Resource Allocation¶
Cluster 1 (Rebellion) - Detailed Breakdown¶
PROXMOX HOST: r2d2 (Physical: 12c/16t, 30GB RAM)
├─ Allocated to VMs: 12c, 18GB RAM
│ ├─ leia (4c/6GB)
│ ├─ luke-1 (4c/6GB)
│ ├─ luke-2 (4c/6GB)
│ └─ Remaining on Proxmox: 0c, 12GB RAM (Proxmox overhead)
│
├─ Inside leia VM (192.168.1.100):
│ ├─ K3s control plane: 1c, 2GB RAM
│ ├─ Traefik pod: 0.1c, 128MB RAM
│ ├─ Available for workloads: 2.9c, 3.8GB RAM
│ └─ Local storage: 50GB
│
├─ Inside luke-1 VM (192.168.1.101):
│ ├─ kubelet + system: 0.1c, 512MB RAM
│ ├─ Moodle pod: 1c, 1GB RAM
│ ├─ Available for more workloads: 2.9c, 4.4GB RAM
│ └─ Local storage: 150GB
│
└─ Inside luke-2 VM (192.168.1.102):
├─ kubelet + system: 0.1c, 512MB RAM
├─ Moodle pod: 1c, 1GB RAM
├─ Available for more workloads: 2.9c, 4.4GB RAM
└─ Local storage: 150GB
PROXMOX HOST: butthole-ice-cream (Physical: 6c/12t, 15GB RAM)
├─ Allocated to VMs: 4c, 6GB RAM
│ ├─ obi-wan (2c/3GB)
│ ├─ yoda-1 (2c/3GB)
│ └─ Remaining on Proxmox: 2c, 9GB RAM
│
├─ Inside obi-wan VM (192.168.1.110):
│ ├─ K3s control plane: 1c, 2GB RAM
│ ├─ Available for workloads: 1c, 1GB RAM
│ └─ Local storage: 30GB SSD
│
└─ Inside yoda-1 VM (192.168.1.111):
├─ kubelet + system: 0.1c, 256MB RAM
├─ Available for workloads: 1.9c, 2.7GB RAM
└─ Local storage: 50GB SSD + 200GB HDD
PROXMOX HOST: windows (Physical: 4c/8t, 15GB RAM)
├─ Allocated to VMs: 2c, 4GB RAM
│ ├─ lando (2c/4GB)
│ └─ Remaining on Proxmox: 2c, 11GB RAM
│
└─ Inside lando VM (192.168.1.120):
├─ K3s control plane: 1c, 2GB RAM
├─ Available for workloads: 1c, 2GB RAM
└─ Local storage: 40GB
Cluster 2 (Empire) - Detailed Breakdown¶
PROXMOX HOST: schwifty (Physical: 12c/24t, 47GB RAM)
├─ Allocated to VMs: 16c, 32GB RAM
│ ├─ rick (6c/12GB)
│ ├─ morty-1 (6c/12GB)
│ ├─ morty-2 (4c/8GB)
│ └─ Remaining on Proxmox: 8c, 15GB RAM (Docker Compose, monitoring)
│
├─ Inside rick VM (10.0.2.100):
│ ├─ K3s control plane: 1c, 2GB RAM
│ ├─ Available for workloads: 5c, 10GB RAM
│ └─ Local storage: 100GB
│
├─ Inside morty-1 VM (10.0.2.101):
│ ├─ kubelet + system: 0.1c, 512MB RAM
│ ├─ Available for workloads: 5.9c, 11.5GB RAM
│ └─ Local storage: 300GB
│
└─ Inside morty-2 VM (10.0.2.102):
├─ kubelet + system: 0.1c, 256MB RAM
├─ Available for workloads: 3.9c, 7.7GB RAM
└─ Local storage: 400GB (storage-optimized)
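The per-VM "available for workloads" figures above only hold if pods declare resource requests and limits. A hedged container-spec fragment sized like the ~1c/1GB Moodle pods in the Rebellion breakdown:

```yaml
# Fragment of a container spec -- requests reserve capacity on the node,
# limits cap it, keeping the per-VM budgets above predictable.
resources:
  requests:
    cpu: 500m           # scheduler reserves half a core per Moodle pod
    memory: 512Mi
  limits:
    cpu: "1"            # matches the 1c budget per Moodle pod
    memory: 1Gi         # matches the 1GB budget per Moodle pod
```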
Summary: System Design Principles¶
✅ High Availability: 3 masters, pod replication, automatic failover
✅ Data Durability: Multi-replica persistent storage (Longhorn)
✅ Scalability: Pods auto-scale on CPU/memory via HorizontalPodAutoscaler (sketched below)
✅ GitOps: All infrastructure as code, versioned in Git
✅ Observability: Prometheus metrics, Grafana dashboards, centralized logs
✅ Security: Network policies, RBAC, encrypted secrets (via SealedSecrets)
✅ Maintainability: Clear documentation, automation via Flux
✅ Cost Efficiency: Using existing hardware (4 Proxmox hosts across both clusters)
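The auto-scaling principle above is typically implemented with a HorizontalPodAutoscaler; a sketch targeting the hypothetical moodle Deployment, with illustrative bounds:

```yaml
# Illustrative HPA: scale moodle between 2 and 4 replicas on CPU pressure.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: moodle
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: moodle
  minReplicas: 2
  maxReplicas: 4
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```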
Next Steps¶
- SETUP.md: Deploy this architecture
- SERVICE-CATALOG.md: Add your services
- QUICK-REFERENCE.md: Common operations and troubleshooting