Architecture Overview¶
Complete system architecture for your K3s multi-cluster setup.
Cluster Topology¶
┌─────────────────────────────────────────────────────────────────┐
│ HOMELAB GITOPS SETUP │
└─────────────────────────────────────────────────────────────────┘
YOUR MACHINE (Laptop/Workstation)
└── k3d Cluster (Local Dev)
├── 1 Server node (k3d-server)
├── 1-2 Agent nodes
└── Used for: Testing, development, CI/CD
↓ (git push)
GITHUB REPOSITORY (Source of Truth)
├── clusters/local
├── clusters/production
└── clusters/staging (unused; staging work runs on the local k3d cluster)
↓ (auto-sync via Flux)
PRODUCTION CLUSTERS
┌──────────────────────────────────────────────────────────────────┐
│ CLUSTER 1: "Rebellion" (192.168.1.x) │
│ Proxmox: r2d2 + butthole-ice-cream + windows │
├──────────────────────────────────────────────────────────────────┤
│ │
│ Master Nodes (High Availability Control Plane): │
│ ├─ leia (r2d2, 4c/6GB, 192.168.1.100) │
│ ├─ obi-wan (butthole-ice-cream, 2c/3GB, 192.168.1.110) │
│ └─ lando (windows, 2c/4GB, 192.168.1.120) │
│ │
│ Worker Nodes: │
│ ├─ luke-1 (r2d2, 4c/6GB, 192.168.1.101) │
│ ├─ luke-2 (r2d2, 4c/6GB, 192.168.1.102) │
│ └─ yoda-1 (butthole-ice-cream, 2c/3GB, 192.168.1.111) │
│ │
│ Cluster Resources: │
│ ├─ Total vCPU allocated to K3s VMs: 18 (hosts: 22 physical cores) │
│ ├─ Total RAM allocated to K3s VMs: 28 GB (hosts: 60 GB) │
│ ├─ Storage: ~400GB replicated via Longhorn │
│ └─ Network: Bridged to vmbr0, gateway 192.168.1.1 │
│ │
└──────────────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────────────┐
│ CLUSTER 2: "Empire" (10.0.2.x) │
│ Proxmox: schwifty (isolated network 10.0.2.0/24) │
├──────────────────────────────────────────────────────────────────┤
│ │
│ Master/Worker Nodes: │
│ ├─ rick (schwifty, 6c/12GB, 10.0.2.100) │
│ ├─ morty-1 (schwifty, 6c/12GB, 10.0.2.101) │
│ └─ morty-2 (schwifty, 4c/8GB, 10.0.2.102) │
│ │
│ Cluster Resources: │
│ ├─ Total vCPU allocated to K3s VMs: 16 (host: 12c/24t) │
│ ├─ Total RAM allocated to K3s VMs: 32 GB (host: 47 GB) │
│ ├─ Storage: ~800GB replicated via Longhorn │
│ └─ Network: Isolated (10.0.2.x), Tailscale mesh tunnel │
│ │
└──────────────────────────────────────────────────────────────────┘
↑ Tailscale Mesh ↑
Clusters communicate securely over VPN
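For the local dev cluster at the top of the diagram, k3d can create the 1-server/2-agent layout from a declarative config file. A minimal sketch, assuming k3d v5 (config schema k3d.io/v1alpha5); the cluster name and port mapping are placeholders:

```yaml
# k3d-local.yaml -- illustrative local dev cluster (1 server, 2 agents)
apiVersion: k3d.io/v1alpha5
kind: Simple
metadata:
  name: local              # hypothetical name -> nodes k3d-local-server-0, etc.
servers: 1                 # the single k3d server node
agents: 2                  # 1-2 agent nodes per the diagram; 2 shown here
ports:
  - port: 8080:80          # expose Traefik's HTTP entrypoint on localhost:8080
    nodeFilters:
      - loadbalancer
```

Created with `k3d cluster create --config k3d-local.yaml`; the same manifests Flux applies to production can be smoke-tested here before a `git push`.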
Traffic Flow Architecture¶
INTERNET TRAFFIC
↓
Cloudflare Global Network
↓
Cloudflare Tunnel (outbound connection from cluster → Cloudflare; no inbound ports opened)
↓
Traefik Ingress Controller Pod (runs in K3s)
├─ Receives HTTPS traffic for all services
├─ Terminates TLS (or passes through)
└─ Routes to backend services via HTTP/TCP
↓
Sablier Middleware (Optional - Scale to Zero)
├─ Intercepts HTTP requests
├─ If service scaled to 0: Buffers request ~10s while pod starts
└─ If service running: Passes through immediately
↓
Service DNS (Kubernetes DNS: CoreDNS)
├─ myapp.default → Service (ClusterIP)
├─ Service → Pod endpoints
└─ Pods communicate directly (no additional routing)
↓
Application Pods
├─ Pod 1: moodle (on leia, 192.168.1.100)
├─ Pod 2: moodle (on luke-1, 192.168.1.101)
└─ Pod 3: moodle (on luke-2, 192.168.1.102)
↓
PostgreSQL StatefulSet
├─ Single pod or replicated
└─ PersistentVolume (Longhorn, 2-3 replicas)
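As a concrete sketch of the Traefik hop in this flow: an IngressRoute routing the public hostname to a moodle Service, with the Sablier middleware attached as the optional scale-to-zero gate. The CRD API group (traefik.io/v1alpha1 on recent Traefik; traefik.containo.us/v1alpha1 on older bundles), the service name/port, and the middleware name are assumptions, and the Sablier plugin's own options are omitted:

```yaml
# Illustrative IngressRoute: hostname-based routing into the moodle Service.
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: moodle
  namespace: default
spec:
  entryPoints:
    - websecure                    # HTTPS entrypoint; TLS terminated by Traefik
  routes:
    - match: Host(`moodle.yourdomain.com`)
      kind: Rule
      middlewares:
        - name: sablier-moodle     # optional Sablier middleware (scale to zero)
      services:
        - name: moodle             # ClusterIP Service in front of the moodle pods
          port: 80                 # assumed Service port
```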
Storage Architecture¶
Persistent Volume Lifecycle¶
CREATION (First deployment)
↓
PVC created → Longhorn finds available nodes → Allocates space
↓
Longhorn creates replicas (default: 2)
├─ Replica 1: Node A (e.g., leia)
├─ Replica 2: Node B (e.g., luke-1)
└─ Replica 3: Optional (Node C, e.g., luke-2) for critical data
↓
POD MOUNTS & WRITES DATA
↓
All writes go through Longhorn → Replicated to replica nodes
↓
Data lives on Proxmox local storage (NVMe or HDD)
↓
NODE FAILURE SCENARIO
↓
If leia goes down:
├─ Longhorn detects node down
├─ Promotes replica from luke-1 to primary
├─ Pod reschedules to healthy node (luke-2 or obi-wan)
└─ Data is available (zero data loss)
↓
BACKUPS
↓
CronJob runs daily (2 AM)
├─ PostgreSQL: pg_dump → gzip → Backup PVC
├─ Files: tar + rsync → External storage
└─ Snapshots: Longhorn auto-snapshots every 6 hours
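The daily 2 AM database backup above maps to a plain Kubernetes CronJob. A hedged sketch, assuming a `postgres` Service, credentials in a `postgres-credentials` Secret, a `moodle` database, and a `backup-pvc` claim; all names are placeholders:

```yaml
# Illustrative nightly pg_dump into a dedicated backup PVC (runs at 02:00).
apiVersion: batch/v1
kind: CronJob
metadata:
  name: postgres-backup
  namespace: default
spec:
  schedule: "0 2 * * *"              # daily at 2 AM, as described above
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: pg-dump
              image: postgres:16     # assumed image/tag
              command: ["/bin/sh", "-c"]
              args:
                - |
                  pg_dump -h postgres -U "$PGUSER" moodle \
                    | gzip > /backup/moodle-`date +%F`.sql.gz
              env:
                - name: PGUSER
                  valueFrom:
                    secretKeyRef: { name: postgres-credentials, key: username }
                - name: PGPASSWORD
                  valueFrom:
                    secretKeyRef: { name: postgres-credentials, key: password }
              volumeMounts:
                - name: backup
                  mountPath: /backup
          volumes:
            - name: backup
              persistentVolumeClaim:
                claimName: backup-pvc
```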
Storage Classes¶
┌─ local-path (Bootstrap)
│ ├─ Where: /var/lib/rancher/k3s/storage per node
│ ├─ Replicas: None (single node)
│ ├─ Use for: Traefik cache, temporary data, metrics
│ ├─ Data loss: If node dies, data is lost
│ └─ Deploy timeline: Day 1 (ready immediately)
│
└─ longhorn (Production)
├─ Where: Distributed across nodes + underlying disks
├─ Replicas: 2-3 copies across nodes
├─ Use for: Databases, file storage, critical configs
├─ Data loss: Survives 1-2 node failures
└─ Deploy timeline: Day 2-3 (after cluster stable)
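A sketch of what the longhorn class above might look like as a StorageClass, assuming the stock Longhorn CSI provisioner; the replica count of 2 mirrors the default described in this section:

```yaml
# Illustrative Longhorn StorageClass with 2 replicas per volume.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn
provisioner: driver.longhorn.io        # Longhorn CSI provisioner
parameters:
  numberOfReplicas: "2"                # 2 copies across nodes; use 3 for critical data
  staleReplicaTimeout: "2880"          # minutes before a failed replica is cleaned up
allowVolumeExpansion: true
reclaimPolicy: Retain                  # keep data if the PVC is deleted (judgment call)
volumeBindingMode: Immediate
```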
Network Segmentation¶
Kubernetes Networks (All automatic)¶
Pod Network (overlay):
├─ All pods can reach each other directly
├─ IP range: Allocated by K3s (typically 10.42.x.x)
└─ Isolated from Proxmox network
Service Network (ClusterIP):
├─ Internal DNS: servicename.namespace.svc.cluster.local
├─ Automatically load-balances to pod IPs
└─ Only reachable from inside cluster (no external access)
Ingress Network (External):
├─ External traffic enters via Traefik Ingress
├─ Routes based on hostname (moodle.yourdomain.com)
└─ Gets NodePort internally (e.g., 30123)
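For the Service-network hop, a minimal ClusterIP Service; the selector and ports are assumptions:

```yaml
# Illustrative ClusterIP Service: reachable in-cluster as
# moodle.default.svc.cluster.local, never from outside the cluster.
apiVersion: v1
kind: Service
metadata:
  name: moodle
  namespace: default
spec:
  type: ClusterIP
  selector:
    app: moodle            # assumed pod label
  ports:
    - port: 80             # Service port used by the Ingress
      targetPort: 8080     # assumed container port
```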
Proxmox Network Integration¶
Proxmox vmbr0 (Bridge)
├─ Cluster 1 (192.168.1.0/24): r2d2, butthole-ice-cream, windows
│ ├─ Gateway: 192.168.1.1
│ ├─ K3s VMs: 192.168.1.100-199
│ └─ Pod overlay: 10.42.x.x (inside VMs)
│
└─ Cluster 2 (10.0.2.0/24): schwifty only (isolated)
├─ Gateway: 10.0.2.1
├─ K3s VMs: 10.0.2.100-199
└─ Pod overlay: 10.43.x.x (inside VMs)
Tailscale VPN (Mesh):
├─ Secure tunnel between clusters
├─ Both clusters can reach each other's services
└─ Used for backup/disaster recovery
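One consequence of the addressing above: 10.43.0.0/16 is the K3s default service CIDR, so giving Empire a 10.43.x.x pod overlay implies both ranges are set explicitly on that cluster. A hedged sketch of the relevant server config; the replacement service range is an assumption:

```yaml
# /etc/rancher/k3s/config.yaml on Empire's server node(s) -- illustrative.
# 10.43.0.0/16 is the K3s default *service* CIDR, so using it as Empire's pod
# overlay (per the diagram) means both ranges have to be set explicitly.
cluster-cidr: 10.43.0.0/16       # pod overlay shown in the diagram above
service-cidr: 10.45.0.0/16       # assumed replacement service range
```

Rebellion keeps the K3s defaults (pods 10.42.0.0/16, services 10.43.0.0/16), so only Empire needs these overrides.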
High Availability Strategy¶
Master Node Quorum¶
3 Master Nodes (leia, obi-wan, lando)
├─ Embedded etcd (consensus database)
├─ Requires 2/3 online for cluster availability
└─ Failure tolerance:
├─ Node 1 (leia) down: Cluster OK (2/3 online)
├─ Node 2 (obi-wan) down: Cluster OK (2/3 online)
├─ Node 3 (lando) down: Cluster OK (2/3 online)
└─ Any 2 down: Control plane unavailable (wait for quorum to recover)
Data Persistence:
├─ etcd data lives on each master's local disk (not on Longhorn, which itself depends on the running cluster)
├─ Replicated across the 3 masters via Raft consensus
├─ Automatic etcd snapshots every 6 hours
└─ Full backup daily
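The snapshot cadence above corresponds to K3s's built-in etcd snapshot settings. A sketch for the masters' config file, assuming the 6-hour schedule; the retention count is an assumption:

```yaml
# /etc/rancher/k3s/config.yaml on the three master nodes -- illustrative.
etcd-snapshot-schedule-cron: "0 */6 * * *"   # snapshot every 6 hours
etcd-snapshot-retention: 20                  # keep the 20 most recent snapshots (assumed)
```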
Pod Replication¶
Example: Moodle Service
Deployment: moodle (replicas: 3)
├─ Pod 1: Runs on leia
├─ Pod 2: Runs on luke-1
└─ Pod 3: Runs on luke-2
If leia fails:
├─ Pod 1 reschedules to obi-wan (or other healthy node)
├─ Service automatically discovers new Pod 1 location
├─ Traffic redirects within 30 seconds
└─ Pod 2 + 3 keep serving (zero downtime)
Storage attached to all pods:
├─ /moodledata PVC (Longhorn, ReadWriteMany)
├─ All pods read/write same files
└─ Longhorn ensures 2+ replicas exist
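The Moodle example above, written out as a Deployment; the image, container port, and PVC name are assumptions, and ReadWriteMany relies on Longhorn's NFS-backed RWX volumes:

```yaml
# Illustrative 3-replica Deployment sharing one RWX volume for /moodledata.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: moodle
  namespace: default
spec:
  replicas: 3
  selector:
    matchLabels:
      app: moodle
  template:
    metadata:
      labels:
        app: moodle
    spec:
      containers:
        - name: moodle
          image: bitnami/moodle:latest     # assumed image
          ports:
            - containerPort: 8080          # assumed container port
          volumeMounts:
            - name: moodledata
              mountPath: /moodledata
      volumes:
        - name: moodledata
          persistentVolumeClaim:
            claimName: moodledata          # assumed RWX PVC on the longhorn class
```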
Automatic Failover¶
Pod crashes:
├─ kubelet detects pod death
├─ Deployment controller sees fewer replicas than desired
├─ New pod scheduled on healthy node
└─ Online within 30-60 seconds
Node crashes:
├─ Control plane marks the node NotReady after missed heartbeats (~40 seconds)
├─ Pods on the dead node are evicted and rescheduled once their not-ready toleration
│  expires (Kubernetes default: 5 minutes; see the toleration sketch below)
├─ Longhorn promotes replicas from other nodes
└─ Workloads back online within roughly 1-2 minutes with tuned tolerations
Longhorn replica fails:
├─ Longhorn detects replica lag
├─ New replica allocated on healthy node
├─ Data synchronized in background
└─ Always maintains 2+ replicas
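On the node-crash path, Kubernetes only evicts pods from an unreachable node after their not-ready/unreachable tolerations expire (5 minutes by default). A hedged pod-template fragment that tightens this to roughly one minute, in line with the failover times above:

```yaml
# Fragment of a pod template spec -- shortens the default 5-minute eviction
# window so pods on a dead node reschedule within about a minute.
tolerations:
  - key: node.kubernetes.io/not-ready
    operator: Exists
    effect: NoExecute
    tolerationSeconds: 60       # evict ~60s after the node goes NotReady
  - key: node.kubernetes.io/unreachable
    operator: Exists
    effect: NoExecute
    tolerationSeconds: 60
```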
GitOps Deployment Flow¶
1. Developer makes changes locally
git checkout -b feature/my-change
# Edit YAML files
git push
2. GitHub Pull Request
├─ CI runs: lint YAML, check Kustomize builds
├─ Human review
└─ Merge to main
3. Flux CD polls GitHub on a short interval (Flux default: 1 minute)
├─ Detects main branch updated
├─ Pulls latest YAML from repo
├─ Builds Kustomize files
└─ Reconciles cluster state
4. Cluster state updates
├─ New pods start
├─ Services reconfigure
├─ Ingress rules update
└─ Metrics recorded
5. Monitoring alerts
├─ Prometheus scrapes metrics
├─ Grafana shows dashboard
└─ AlertManager notifies on errors
6. Complete
├─ Git is source of truth
├─ Cluster matches Git exactly
└─ All changes auditable
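Steps 3-4 above are driven by two Flux objects: a GitRepository that polls the repo and a Kustomization that applies a cluster path from it. A sketch with a placeholder URL and illustrative intervals:

```yaml
# Illustrative Flux source + reconciliation for the production cluster.
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: homelab
  namespace: flux-system
spec:
  interval: 1m                      # how often Flux polls GitHub
  url: https://github.com/your-user/homelab-gitops   # placeholder URL
  ref:
    branch: main
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: cluster-production
  namespace: flux-system
spec:
  interval: 10m                     # how often the built manifests are re-applied
  path: ./clusters/production       # matches the repo layout above
  prune: true                       # delete resources removed from Git
  sourceRef:
    kind: GitRepository
    name: homelab
```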
Resource Allocation¶
Cluster 1 (Rebellion) - Detailed Breakdown¶
PROXMOX HOST: r2d2 (Physical: 12c/16t, 30GB RAM)
├─ Allocated to VMs: 12c, 18GB RAM
│ ├─ leia (4c/6GB)
│ ├─ luke-1 (4c/6GB)
│ ├─ luke-2 (4c/6GB)
│ └─ Remaining on Proxmox: 0c, 12GB RAM (Proxmox overhead)
│
├─ Inside leia VM (192.168.1.100):
│ ├─ K3s control plane: 1c, 2GB RAM
│ ├─ Traefik pod: 0.1c, 128MB RAM
│ ├─ Available for workloads: 2.9c, 3.8GB RAM
│ └─ Local storage: 50GB
│
├─ Inside luke-1 VM (192.168.1.101):
│ ├─ kubelet + system: 0.1c, 512MB RAM
│ ├─ Moodle pod: 1c, 1GB RAM
│ ├─ Available for more workloads: 2.9c, 4.4GB RAM
│ └─ Local storage: 150GB
│
└─ Inside luke-2 VM (192.168.1.102):
├─ kubelet + system: 0.1c, 512MB RAM
├─ Moodle pod: 1c, 1GB RAM
├─ Available for more workloads: 2.9c, 4.4GB RAM
└─ Local storage: 150GB
PROXMOX HOST: butthole-ice-cream (Physical: 6c/12t, 15GB RAM)
├─ Allocated to VMs: 4c, 6GB RAM
│ ├─ obi-wan (2c/3GB)
│ ├─ yoda-1 (2c/3GB)
│ └─ Remaining on Proxmox: 2c, 9GB RAM
│
├─ Inside obi-wan VM (192.168.1.110):
│ ├─ K3s control plane: 1c, 2GB RAM
│ ├─ Available for workloads: 1c, 1GB RAM
│ └─ Local storage: 30GB SSD
│
└─ Inside yoda-1 VM (192.168.1.111):
├─ kubelet + system: 0.1c, 256MB RAM
├─ Available for workloads: 1.9c, 2.7GB RAM
└─ Local storage: 50GB SSD + 200GB HDD
PROXMOX HOST: windows (Physical: 4c/8t, 15GB RAM)
├─ Allocated to VMs: 2c, 4GB RAM
│ ├─ lando (2c/4GB)
│ └─ Remaining on Proxmox: 2c, 11GB RAM
│
└─ Inside lando VM (192.168.1.120):
├─ K3s control plane: 1c, 2GB RAM
├─ Available for workloads: 1c, 2GB RAM
└─ Local storage: 40GB
Cluster 2 (Empire) - Detailed Breakdown¶
PROXMOX HOST: schwifty (Physical: 12c/24t, 47GB RAM)
├─ Allocated to VMs: 16c, 32GB RAM
│ ├─ rick (6c/12GB)
│ ├─ morty-1 (6c/12GB)
│ ├─ morty-2 (4c/8GB)
│ └─ Remaining on Proxmox: 8c, 15GB RAM (Docker Compose, monitoring)
│
├─ Inside rick VM (10.0.2.100):
│ ├─ K3s control plane: 1c, 2GB RAM
│ ├─ Available for workloads: 5c, 10GB RAM
│ └─ Local storage: 100GB
│
├─ Inside morty-1 VM (10.0.2.101):
│ ├─ kubelet + system: 0.1c, 512MB RAM
│ ├─ Available for workloads: 5.9c, 11.5GB RAM
│ └─ Local storage: 300GB
│
└─ Inside morty-2 VM (10.0.2.102):
├─ kubelet + system: 0.1c, 256MB RAM
├─ Available for workloads: 3.9c, 7.7GB RAM
└─ Local storage: 400GB (storage-optimized)
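The per-VM "available for workloads" figures above only hold if pods declare resource requests and limits. A hedged container-spec fragment sized like the ~1c/1GB Moodle pods in the Rebellion breakdown:

```yaml
# Fragment of a container spec -- requests reserve capacity on the node,
# limits cap it, keeping the per-VM budgets above predictable.
resources:
  requests:
    cpu: 500m           # scheduler reserves half a core per Moodle pod
    memory: 512Mi
  limits:
    cpu: "1"            # matches the 1c budget per Moodle pod
    memory: 1Gi         # matches the 1GB budget per Moodle pod
```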
Summary: System Design Principles¶
✅ High Availability: 3 masters, pod replication, automatic failover
✅ Data Durability: Multi-replica persistent storage (Longhorn)
✅ Scalability: Pods auto-scale on CPU/memory via HorizontalPodAutoscaler (sketched below)
✅ GitOps: All infrastructure as code, versioned in Git
✅ Observability: Prometheus metrics, Grafana dashboards, centralized logs
✅ Security: Network policies, RBAC, encrypted secrets (via SealedSecrets)
✅ Maintainability: Clear documentation, automation via Flux
✅ Cost Efficiency: Using existing hardware (4 Proxmox hosts across both clusters)
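The auto-scaling principle above is typically implemented with a HorizontalPodAutoscaler; a sketch targeting the hypothetical moodle Deployment, with illustrative bounds:

```yaml
# Illustrative HPA: scale moodle between 2 and 4 replicas on CPU pressure.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: moodle
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: moodle
  minReplicas: 2
  maxReplicas: 4
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```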
Next Steps¶
- SETUP.md: Deploy this architecture
- SERVICE-CATALOG.md: Add your services
- QUICK-REFERENCE.md: Common operations and troubleshooting