Service Catalog¶
Template and examples for documenting and deploying services on your K3s cluster.
How to Add a Service¶
Step 1: Use the Service Template¶
Create a new directory following the pattern:
mkdir -p apps/base/myservice
Copy the template files (see below) and customize for your service.
Step 2: Document Your Service¶
Fill in the SERVICE.md file (see template below) with:
- What it does
- Dependencies (databases, storage, etc.)
- Networking requirements
- Backup strategy
- Monitoring/alerting
Step 3: Deploy Locally First¶
Test on your k3d dev cluster:
kustomize build clusters/local/apps | kubectl apply -f -
# Test that it works
Step 4: Push to Production¶
git add apps/
git commit -m "feat: add myservice"
git push
# Flux auto-deploys within 10 seconds
Service Template¶
Directory Structure¶
apps/base/myservice/
├── SERVICE.md # Documentation (START HERE)
├── kustomization.yaml # Kustomize entry point
├── namespace.yaml # Namespace definition
├── deployment.yaml # Pod definition
├── service.yaml # Internal service
├── ingress.yaml # External access (if needed)
├── configmap.yaml # Non-secret config
├── secret.yaml # Secret values (use SealedSecrets!)
├── pvc.yaml # Persistent storage (if needed)
├── hpa.yaml # Auto-scaling (if needed)
└── monitoring.yaml # Prometheus scrape config (if applicable)
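The deployment, ingress, and kustomization files are shown in full later on this page. For the two smallest pieces, a minimal namespace.yaml and service.yaml might look like the sketch below (the myservice name and port 8080 mirror the template defaults and are placeholders, not requirements):
```yaml
# namespace.yaml - the shared namespace for user services in this setup
apiVersion: v1
kind: Namespace
metadata:
  name: apps
---
# service.yaml - internal ClusterIP service exposing the pod's port
apiVersion: v1
kind: Service
metadata:
  name: myservice
  namespace: apps
spec:
  selector:
    app: myservice        # must match the Deployment's pod labels
  ports:
    - name: http
      port: 8080          # service port
      targetPort: 8080    # container port
  type: ClusterIP
```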
SERVICE.md Template¶
Create apps/base/myservice/SERVICE.md:
# My Service
## Overview
**Name**: myservice
**Purpose**: What this service does
**Owner**: Your name or team
**Last Updated**: YYYY-MM-DD
**Status**: ✅ Active / ⚠️ Testing / ❌ Deprecated
## Quick Facts
| Property | Value |
|----------|-------|
| **Image** | registry/myapp:1.0 |
| **Replicas** | 2 (minimum for HA) |
| **CPU Request** | 500m |
| **Memory Request** | 512Mi |
| **Storage** | 10Gi Longhorn |
| **Port** | 8080 |
| **URL** | https://myservice.yourdomain.com |
## Architecture
## Dependencies
### Required Services
- [ ] PostgreSQL (database)
- [ ] Redis (caching)
- [ ] Minio (object storage)
### Required Secrets
- [ ] `myservice-db-secret` (database password)
- [ ] `myservice-api-key` (external API key)
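Plain Secret manifests should not land in Git, so secret.yaml is usually the sealed form. A skeleton for the `myservice-db-secret` listed above, assuming the Sealed Secrets controller is installed (the encryptedData value is a placeholder for real kubeseal output):
```yaml
# secret.yaml - SealedSecret skeleton (generate the real one with kubeseal)
apiVersion: bitnami.com/v1alpha1
kind: SealedSecret
metadata:
  name: myservice-db-secret
  namespace: apps
spec:
  encryptedData:
    password: AgB...   # placeholder - paste the kubeseal output here; safe to commit
  template:
    metadata:
      name: myservice-db-secret
      namespace: apps
```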
### Storage Requirements
- [ ] `/data` - PVC `myservice-data` (Longhorn, 10Gi, 2 replicas)
- [ ] `/tmp` - EmptyDir (ephemeral)
## Deployment
### Local Testing (k3d)
```bash
# Deploy to local cluster
kustomize build apps/local/myservice | kubectl apply -f -
# Wait for pod
kubectl wait --for=condition=ready pod -l app=myservice -n apps --timeout=120s
# Check logs
kubectl logs -f deployment/myservice -n apps
# Test connectivity
kubectl port-forward svc/myservice 8080:8080 -n apps
curl http://localhost:8080
```
Production Deployment¶
# Git push triggers Flux auto-deployment
git add apps/base/myservice/
git commit -m "feat: add myservice"
git push
# Verify on production
kubectl -n apps get pods,svc
Networking¶
Internal¶
- Service Name: myservice
- DNS: myservice.apps.svc.cluster.local
- Port: 8080
- Protocol: HTTP/TCP
External (if applicable)¶
- Hostname: myservice.yourdomain.com
- Ingress: Traefik (via Cloudflare Tunnel)
- Port: 443 (HTTPS)
- Auth: Basic auth (if needed)
Storage¶
Persistent Volumes¶
- Name: myservice-data
- Type: Longhorn (replicated)
- Size: 10Gi
- Replicas: 2 (auto-replicate if node fails)
- Mount Path: /data
- Access Mode: ReadWriteOnce
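The volume described above maps directly onto pvc.yaml; a minimal sketch, assuming Longhorn's default `longhorn` StorageClass:
```yaml
# pvc.yaml - Longhorn-backed volume mounted at /data
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: myservice-data
  namespace: apps
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn   # assumption: the default Longhorn StorageClass name
  resources:
    requests:
      storage: 10Gi
  # note: the replica count (2) is configured on the Longhorn StorageClass/volume, not here
```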
Data Retention¶
- Backup frequency: Daily at 2 AM
- Retention: 30 days
- Backup location: Longhorn snapshots + external NAS
Scaling & Performance¶
Horizontal Pod Autoscaling (HPA)¶
- Min replicas: 2
- Max replicas: 5
- CPU threshold: 70% (scale up when avg > 70%)
- Memory threshold: 80% (scale up when avg > 80%)
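These settings correspond to an autoscaling/v2 HorizontalPodAutoscaler; a sketch of what hpa.yaml might contain, assuming metrics-server is running:
```yaml
# hpa.yaml - scale between 2 and 5 replicas on CPU/memory utilization
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myservice
  namespace: apps
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myservice
  minReplicas: 2
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
```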
Performance Targets¶
- Response time: < 200ms (p99)
- Throughput: 100 req/sec per pod
- Availability: 99.9% SLA
Monitoring & Alerting¶
Prometheus Metrics¶
- Scrape interval: 30s
- Port: 9090
- Endpoint: /metrics
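If the cluster runs the Prometheus Operator (e.g. kube-prometheus-stack), monitoring.yaml can be a ServiceMonitor. A sketch under that assumption; the `metrics` port name, 9090 target, and `release: prometheus` label are placeholders you would align with your own Service and Prometheus instance:
```yaml
# monitoring.yaml - ServiceMonitor picked up by the Prometheus Operator
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: myservice
  namespace: apps
  labels:
    release: prometheus      # assumption: label your Prometheus instance selects on
spec:
  selector:
    matchLabels:
      app: myservice
  endpoints:
    - port: metrics          # assumes a Service port named "metrics" on 9090
      path: /metrics
      interval: 30s
```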
Key Metrics¶
- `http_requests_total` - Total requests
- `http_request_duration_seconds` - Response time
- `myservice_db_connections` - Active DB connections
Alerts¶
- [ ] Pod restarts > 5 in 1 hour
- [ ] CPU > 90% for 5 minutes
- [ ] Memory > 95%
- [ ] PVC > 80% full
- [ ] Response time > 1s (p99)
Backup & Disaster Recovery¶
Data Backup¶
# Database backup (if applicable)
# Automated: CronJob daily at 2 AM
# Location: /backup/myservice_db_YYYYMMDD_HHMMSS.sql.gz
# Restore procedure
kubectl exec -i deployment/myservice-db -- psql -U admin -d mydb < backup.sql
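The "CronJob daily at 2 AM" mentioned above could be something like the following sketch; the image tag, credentials Secret, and the `myservice-backup` PVC are assumptions to adapt to your setup:
```yaml
# backup-cronjob.yaml - nightly pg_dump to the backup volume
apiVersion: batch/v1
kind: CronJob
metadata:
  name: myservice-db-backup
  namespace: apps
spec:
  schedule: "0 2 * * *"                    # daily at 02:00
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: pg-dump
              image: postgres:16-alpine    # assumption: match your database version
              command: ["/bin/sh", "-c"]
              args:
                - pg_dump -h myservice-db -U admin mydb | gzip > /backup/myservice_db_$(date +%Y%m%d_%H%M%S).sql.gz
              env:
                - name: PGPASSWORD
                  valueFrom:
                    secretKeyRef:
                      name: myservice-db-secret
                      key: password
              volumeMounts:
                - name: backup
                  mountPath: /backup
          volumes:
            - name: backup
              persistentVolumeClaim:
                claimName: myservice-backup   # assumption: a dedicated backup PVC
```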
Recovery Time Objective (RTO)¶
- Pod failure: < 1 minute (auto-restart)
- Node failure: < 5 minutes (pod reschedule)
- Data loss (RPO): < 1 day (daily backups)
Troubleshooting¶
Pod stuck in Pending¶
kubectl describe pod <pod-name> -n apps
# Check: PVC status, resource requests, node availability
High memory usage¶
kubectl top pods -n apps
kubectl get pvc -n apps
# Scale up memory requests in deployment
Database connection errors¶
kubectl logs -f deployment/myservice -n apps | grep -i database
# Verify: database pod running, credentials correct, network access
Slow response times¶
kubectl top pod <pod-name> -n apps
# Check: CPU/memory pressure, database queries, disk I/O
Maintenance¶
Regular Tasks¶
- [ ] Review logs weekly
- [ ] Check disk usage monthly
- [ ] Update image tags quarterly
- [ ] Test backup restoration annually
Upgrade Procedure¶
# 1. Test on k3d first
kustomize build apps/local/myservice | kubectl apply -f -
# 2. Update image tag in deployment.yaml
# 3. Commit to Git
git push
# 4. Flux auto-deploys with rolling update (zero downtime)
# 5. Monitor metrics during rollout
kubectl rollout status deployment/myservice -n apps
Rollback Procedure¶
# Option 1: Revert Git commit
git revert <bad-commit-hash>
git push
# Flux auto-reverts deployment
# Option 2: Manual rollback
kubectl rollout undo deployment/myservice -n apps
Cost Analysis¶
| Component | CPU | Memory | Storage | Cost/Month |
|---|---|---|---|---|
| Pod 1 | 500m | 512Mi | 10Gi | $X |
| Pod 2 | 500m | 512Mi | 10Gi | $X |
| Database | 1c | 1Gi | 20Gi | $Y |
| Total | 2c | 2Gi | 40Gi | $Z |
Note: Costs are for homelab (no cloud charges). Electricity: ~$X/month
Related Services¶
- Depends on: PostgreSQL, Redis
- Depended on by: Web frontend, Mobile app
- Related: Monitoring stack (Prometheus, Grafana)
Change Log¶
| Date | Change | Author |
|---|---|---|
| 2025-11-28 | Initial deployment | You |
Contact & Support¶
- Owner: Your name
- Slack channel: #myservice
- Documentation: See docs/ folder
- On-call: PagerDuty integration (if applicable)
kustomization.yaml Template¶
Create apps/base/myservice/kustomization.yaml:
```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

namespace: apps

resources:
  - namespace.yaml
  - deployment.yaml
  - service.yaml
  - ingress.yaml
  - configmap.yaml
  - secret.yaml
  - pvc.yaml
  - hpa.yaml

# Image management (auto-update version in Git)
images:
  - name: myapp
    newName: registry/myapp
    newTag: "1.0"

# Common labels applied to all resources
commonLabels:
  app: myservice
  tier: web
  version: "1.0"

# ConfigMap from files
configMapGenerator:
  - name: myservice-config
    files:
      - config.yaml
      - app.properties

# Secrets (use SealedSecrets for production!)
secretGenerator:
  - name: myservice-secret
    literals:
      - db-password=changeme
      - api-key=changeme

# Replica count
replicas:
  - name: myservice
    count: 2
```
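The local-testing commands earlier build apps/local/myservice; if you use that overlay pattern, its kustomization.yaml is typically just a thin layer on top of this base. A minimal sketch (the single-replica patch is only an example of a local override):
```yaml
# apps/local/myservice/kustomization.yaml - local (k3d) overlay on the base
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: apps
resources:
  - ../../base/myservice
# Example local override: run a single replica on the dev cluster
patches:
  - target:
      kind: Deployment
      name: myservice
    patch: |-
      - op: replace
        path: /spec/replicas
        value: 1
```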
Example 1: Moodle (Already Documented)¶
See docs/MOODLE.md for a complete Moodle setup with:
- PostgreSQL database
- Persistent file storage
- Multi-replica deployment
- Backup strategy
- Monitoring
Example 2: Simple Web App¶
apps/base/nginx/SERVICE.md¶
# Nginx Web Server
## Overview
**Purpose**: Static website hosting
**Status**: ✅ Active
## Quick Facts
| Property | Value |
|----------|-------|
| **Image** | nginx:1.25-alpine |
| **Replicas** | 2 |
| **CPU** | 100m request, 500m limit |
| **Memory** | 64Mi request, 256Mi limit |
| **Storage** | 5Gi for website files |
| **URL** | https://mywebsite.com |
## Deployment
Simple: Just deploy from base manifest, no special requirements.
## Monitoring
- Monitor: HTTP response codes
- Alert: 5xx errors > 1%
apps/base/nginx/deployment.yaml¶
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  namespace: apps
spec:
  replicas: 2
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:1.25-alpine
          ports:
            - containerPort: 80
          resources:
            requests:
              cpu: 100m
              memory: 64Mi
            limits:
              cpu: 500m
              memory: 256Mi
          volumeMounts:
            - name: content
              mountPath: /usr/share/nginx/html
          livenessProbe:
            httpGet:
              path: /
              port: 80
            initialDelaySeconds: 10
            periodSeconds: 10
      volumes:
        - name: content
          persistentVolumeClaim:
            claimName: nginx-content
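The deployment above mounts a nginx-content claim that isn't shown; the companion service.yaml and pvc.yaml might look roughly like this sketch, assuming Longhorn backs the 5Gi website volume from the quick facts:
```yaml
# apps/base/nginx/service.yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx
  namespace: apps
spec:
  selector:
    app: nginx
  ports:
    - name: http
      port: 80
      targetPort: 80
---
# apps/base/nginx/pvc.yaml - the claim mounted at /usr/share/nginx/html
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nginx-content
  namespace: apps
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn   # assumption: Longhorn, per the 5Gi quick-facts entry
  resources:
    requests:
      storage: 5Gi
  # note: with ReadWriteOnce, both nginx replicas must land on the same node;
  # use an RWX-capable volume if you need them spread across nodes
```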
Example 3: Redis Cache¶
apps/base/redis/SERVICE.md¶
# Redis Cache
## Overview
**Purpose**: In-memory data caching
**Status**: ✅ Active
## Quick Facts
| Property | Value |
|----------|-------|
| **Image** | redis:7-alpine |
| **Replicas** | 1 (stateful) |
| **CPU** | 200m |
| **Memory** | 512Mi request, 1Gi limit |
| **Storage** | 5Gi for persistence |
| **Port** | 6379 |
## Network
- **Service**: redis
- **DNS**: redis.apps.svc.cluster.local:6379
## Storage
- **RDB snapshots**: Persistent via Longhorn
- **Backup**: Nightly RDB export to backup storage
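No manifests are shown for this example; a minimal deployment.yaml sketch matching the facts above (single replica, RDB persistence on a 5Gi volume) could look like this, with the `redis-data` claim name following the naming conventions in the next section:
```yaml
# apps/base/redis/deployment.yaml - single-replica Redis with RDB persistence
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis
  namespace: apps
spec:
  replicas: 1
  strategy:
    type: Recreate             # avoid two pods sharing the RWO volume during updates
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
        - name: redis
          image: redis:7-alpine
          args: ["--save", "3600", "1", "--dir", "/data"]   # RDB snapshot hourly if >=1 change
          ports:
            - containerPort: 6379
          resources:
            requests:
              cpu: 200m
              memory: 512Mi
            limits:
              memory: 1Gi
          volumeMounts:
            - name: data
              mountPath: /data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: redis-data   # assumption: a 5Gi Longhorn PVC
```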
Service Naming Conventions¶
Use consistent naming for predictability:
Deployment: {service-name}
Service: {service-name}
ConfigMap: {service-name}-config
Secret: {service-name}-secret
PVC: {service-name}-{data-type} (e.g., redis-data, moodle-files)
Namespace: apps (default for all user services)
Ingress & External Access¶
For External Services (accessible from internet via Cloudflare)¶
Add to ingress.yaml:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myservice
  namespace: apps
  annotations:
    traefik.ingress.kubernetes.io/router.middlewares: apps-sablier@kubernetescrd
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
spec:
  ingressClassName: traefik
  tls:
    - hosts:
        - myservice.yourdomain.com
      secretName: myservice-tls
  rules:
    - host: myservice.yourdomain.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: myservice
                port:
                  number: 8080
For Internal-Only Services (not accessible from internet)¶
Skip the ingress.yaml. Services are still accessible via:
kubectl port-forward svc/myservice 8080:8080 -n apps
Service Dependencies Map¶
Web Frontend (nginx)
↓
Backend API (myservice)
↓
PostgreSQL Database
↓
Longhorn Storage
Optional:
Redis ← Cache layer
Minio ← Object storage
Elasticsearch ← Logging
Quick Commands for Service Management¶
# Deploy a service
kustomize build apps/base/myservice | kubectl apply -f -
# Restart all pods for a service
kubectl rollout restart deployment/myservice -n apps
# Scale service
kubectl scale deployment/myservice --replicas=5 -n apps
# View logs
kubectl logs -f deployment/myservice -n apps
# Port-forward for local access
kubectl port-forward svc/myservice 8080:8080 -n apps
# Check resource usage
kubectl top pods -n apps
kubectl describe deployment/myservice -n apps
# Check storage
kubectl get pvc -n apps
kubectl describe pvc myservice-data -n apps
# Execute command in pod
kubectl exec -it pod/myservice-xyz-abc -- /bin/sh
# Delete service (including PVCs!)
kustomize build apps/base/myservice | kubectl delete -f -
Service Lifecycle¶
CREATE
↓
DEPLOY (test on k3d first)
↓
PROMOTE (push to Git → Flux deploys to prod)
↓
MONITOR (watch metrics, logs)
↓
MAINTAIN (update image, patch vulnerabilities)
↓
UPGRADE (test locally, deploy via Git)
↓
DEPRECATE (mark as deprecated, set timeout)
↓
RETIRE (remove manifests, delete from production)
Next Steps¶
- Choose a service: Pick something you want to deploy
- Use the template: Copy SERVICE.md + manifest files
- Customize: Edit for your service's needs
- Test locally: Deploy to k3d, verify it works
- Push to Git: Commit and push
- Monitor: Watch Flux deploy and metrics update
Example 4: Skooner (Kubernetes Dashboard)¶
Overview¶
Name: skooner
Purpose: Web-based Kubernetes dashboard for cluster management and monitoring
Namespace: monitoring
Status: ✅ Active
URL: https://k3stat.serlo.lu
Quick Facts¶
| Property | Value |
|---|---|
| Chart | christianhuth/skooner |
| Repository | https://christianhuth.github.io/helm-charts |
| Replicas | 1 |
| Service Type | ClusterIP |
| Port | 80 |
| Ingress | Traefik |
| Metrics | Enabled |
Architecture¶
Internet
↓
Traefik Ingress (k3stat.serlo.lu)
↓
Skooner Service (ClusterIP)
↓
Skooner Pod
↓
Kubernetes API Server
Deployment¶
Skooner is deployed via Flux HelmRelease in infrastructure/base/skooner/:
# Check deployment status
kubectl get pods -n monitoring -l app=skooner
kubectl get svc -n monitoring -l app=skooner
kubectl get ingress -n monitoring -l app=skooner
# Check Flux HelmRelease
flux get helmrelease skooner -n monitoring
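For reference, the HelmRelease and its HelmRepository source in infrastructure/base/skooner/ might look roughly like this sketch; the file split, interval values, and chart version pin are assumptions, the apiVersions depend on your Flux release (older clusters use v1beta2 / v2beta2), and the values block from the Configuration section below goes under spec.values:
```yaml
# infrastructure/base/skooner/source.yaml (sketch)
apiVersion: source.toolkit.fluxcd.io/v1
kind: HelmRepository
metadata:
  name: skooner
  namespace: flux-system
spec:
  interval: 1h
  url: https://christianhuth.github.io/helm-charts
---
# infrastructure/base/skooner/release.yaml (sketch)
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: skooner
  namespace: monitoring
spec:
  interval: 10m
  chart:
    spec:
      chart: skooner
      version: "*"             # assumption: pin a chart version here if desired
      sourceRef:
        kind: HelmRepository
        name: skooner
        namespace: flux-system
  values: {}                   # see the Configuration section below
```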
Access & Authentication¶
Get Login Token¶
# Create a token for the Skooner service account (if it exists)
kubectl create token skooner-sa -n monitoring --duration=24h
# If the service account doesn't exist, use default
kubectl create token default -n monitoring --duration=24h
# Or use kube-system's default service account (only if it has sufficient RBAC)
kubectl create token default -n kube-system --duration=24h
Login Steps¶
- Navigate to https://k3stat.serlo.lu
- Paste the token from the command above
- Click "Sign In"
Networking¶
- Internal Service: skooner.monitoring.svc.cluster.local:80
- External URL: https://k3stat.serlo.lu
- Ingress Class: traefik
- Protocol: HTTP/HTTPS
Troubleshooting¶
Skooner Not Accessible¶
# Check pod status
kubectl get pods -n monitoring -l app=skooner
kubectl describe pod -l app=skooner -n monitoring
# Check service
kubectl get svc -n monitoring -l app=skooner
kubectl describe svc skooner -n monitoring
# Check ingress
kubectl get ingress -n monitoring -l app=skooner
kubectl describe ingress -n monitoring -l app=skooner
# Check logs
kubectl logs -f deployment/skooner -n monitoring
Flux Not Deploying¶
# Check HelmRelease status
flux get helmrelease skooner -n monitoring
# Check HelmRepository source
flux get source helm skooner -n flux-system
# Force reconciliation
flux reconcile helmrelease skooner -n monitoring
flux reconcile source helm skooner -n flux-system
# Check Flux logs
flux logs --follow --kind=HelmRelease --name=skooner -n monitoring
Token Issues¶
# Verify service account exists
kubectl get sa -n monitoring
# Create service account if missing
kubectl create sa skooner-sa -n monitoring
# Grant permissions (adjust as needed)
kubectl create clusterrolebinding skooner-admin \
--clusterrole=cluster-admin \
--serviceaccount=monitoring:skooner-sa
Maintenance¶
Update Skooner¶
# Check current chart version
flux get helmrelease skooner -n monitoring
# Update version in release.yaml (if pinned)
# Edit: infrastructure/base/skooner/release.yaml
# Commit and push - Flux will auto-update
Restart Skooner¶
kubectl rollout restart deployment/skooner -n monitoring
Check Resource Usage¶
kubectl top pods -n monitoring -l app=skooner
kubectl describe pod -l app=skooner -n monitoring
Configuration¶
Configuration is managed via HelmRelease values in infrastructure/base/skooner/release.yaml:
values:
  service:
    type: ClusterIP
    port: 80
  ingress:
    enabled: true
    ingressClassName: traefik
    hosts:
      - host: k3stat.serlo.lu
        paths:
          - path: /
            pathType: ImplementationSpecific
  metrics:
    enabled: true
Related Services¶
- Depends on: Traefik (ingress), Kubernetes API Server
- Related: Prometheus (metrics), Grafana (dashboards)
Change Log¶
| Date | Change | Author |
|---|---|---|
| 2025-01-XX | Initial deployment via Flux | - |
Resources¶
- Official Examples: https://github.com/fluxcd/flux2-kustomize-helm-example
- Kubernetes Docs: https://kubernetes.io/docs/
- Service Mesh: Optional (Linkerd, Istio) for advanced use cases
- Observability: See monitoring/ infrastructure docs
- Skooner Docs: https://skooner.io
- Skooner GitHub: https://github.com/skooner-k8s/skooner
Last Updated: January 2025