Service Catalog¶
Template and examples for documenting and deploying services on your K3s cluster.
How to Add a Service¶
Step 1: Use the Service Template¶
Create a new directory following the pattern:
mkdir -p apps/base/myservice
Copy the template files (see below) and customize for your service.
Step 2: Document Your Service¶
Fill in the SERVICE.md file (see template below) with:
- What it does
- Dependencies (databases, storage, etc.)
- Networking requirements
- Backup strategy
- Monitoring/alerting
Step 3: Deploy Locally First¶
Test on your k3d dev cluster:
kustomize build clusters/local/apps | kubectl apply -f -
# Test that it works
Step 4: Push to Production¶
git add apps/
git commit -m "feat: add myservice"
git push
# Flux auto-deploys within 10 seconds
Service Template¶
Directory Structure¶
apps/base/myservice/
├── SERVICE.md # Documentation (START HERE)
├── kustomization.yaml # Kustomize entry point
├── namespace.yaml # Namespace definition
├── deployment.yaml # Pod definition
├── service.yaml # Internal service
├── ingress.yaml # External access (if needed)
├── configmap.yaml # Non-secret config
├── secret.yaml # Secret values (use SealedSecrets!)
├── pvc.yaml # Persistent storage (if needed)
├── hpa.yaml # Auto-scaling (if needed)
└── monitoring.yaml # Prometheus scrape config (if applicable)
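The deployment, ingress, and kustomization files are shown in full later on this page. For the two smallest pieces, a minimal namespace.yaml and service.yaml might look like the sketch below (the myservice name and port 8080 mirror the template defaults and are placeholders, not requirements):
```yaml
# namespace.yaml - the shared namespace for user services in this setup
apiVersion: v1
kind: Namespace
metadata:
  name: apps
---
# service.yaml - internal ClusterIP service exposing the pod's port
apiVersion: v1
kind: Service
metadata:
  name: myservice
  namespace: apps
spec:
  selector:
    app: myservice        # must match the Deployment's pod labels
  ports:
    - name: http
      port: 8080          # service port
      targetPort: 8080    # container port
  type: ClusterIP
```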
SERVICE.md Template¶
Create apps/base/myservice/SERVICE.md:
# My Service
## Overview
**Name**: myservice
**Purpose**: What this service does
**Owner**: Your name or team
**Last Updated**: YYYY-MM-DD
**Status**: ✅ Active / ⚠️ Testing / ❌ Deprecated
## Quick Facts
| Property | Value |
|----------|-------|
| **Image** | registry/myapp:1.0 |
| **Replicas** | 2 (minimum for HA) |
| **CPU Request** | 500m |
| **Memory Request** | 512Mi |
| **Storage** | 10Gi Longhorn |
| **Port** | 8080 |
| **URL** | https://myservice.yourdomain.com |
## Architecture
## Dependencies
### Required Services
- [ ] PostgreSQL (database)
- [ ] Redis (caching)
- [ ] Minio (object storage)
### Required Secrets
- [ ] `myservice-db-secret` (database password)
- [ ] `myservice-api-key` (external API key)
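Plain Secret manifests should not land in Git, so secret.yaml is usually the sealed form. A skeleton for the `myservice-db-secret` listed above, assuming the Sealed Secrets controller is installed (the encryptedData value is a placeholder for real kubeseal output):
```yaml
# secret.yaml - SealedSecret skeleton (generate the real one with kubeseal)
apiVersion: bitnami.com/v1alpha1
kind: SealedSecret
metadata:
  name: myservice-db-secret
  namespace: apps
spec:
  encryptedData:
    password: AgB...   # placeholder - paste the kubeseal output here; safe to commit
  template:
    metadata:
      name: myservice-db-secret
      namespace: apps
```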
### Storage Requirements
- [ ] `/data` - PVC `myservice-data` (Longhorn, 10Gi, 2 replicas)
- [ ] `/tmp` - EmptyDir (ephemeral)
## Deployment
### Local Testing (k3d)
```bash
# Deploy to local cluster
kustomize build apps/local/myservice | kubectl apply -f -
# Wait for pod
kubectl wait --for=condition=ready pod -l app=myservice -n apps --timeout=120s
# Check logs
kubectl logs -f deployment/myservice -n apps
# Test connectivity
kubectl port-forward svc/myservice 8080:8080 -n apps
curl http://localhost:8080
```
Production Deployment¶
# Git push triggers Flux auto-deployment
git add apps/base/myservice/
git commit -m "feat: add myservice"
git push
# Verify on production
kubectl -n apps get pods,svc
Networking¶
Internal¶
- Service Name: myservice
- DNS: myservice.apps.svc.cluster.local
- Port: 8080
- Protocol: HTTP/TCP
External (if applicable)¶
- Hostname: myservice.yourdomain.com
- Ingress: Traefik (via Cloudflare Tunnel)
- Port: 443 (HTTPS)
- Auth: Basic auth (if needed)
Storage¶
Persistent Volumes¶
- Name: myservice-data
- Type: Longhorn (replicated)
- Size: 10Gi
- Replicas: 2 (auto-replicate if node fails)
- Mount Path: /data
- Access Mode: ReadWriteOnce
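The volume described above maps directly onto pvc.yaml; a minimal sketch, assuming Longhorn's default `longhorn` StorageClass:
```yaml
# pvc.yaml - Longhorn-backed volume mounted at /data
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: myservice-data
  namespace: apps
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn   # assumption: the default Longhorn StorageClass name
  resources:
    requests:
      storage: 10Gi
  # note: the replica count (2) is configured on the Longhorn StorageClass/volume, not here
```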
Data Retention¶
- Backup frequency: Daily at 2 AM
- Retention: 30 days
- Backup location: Longhorn snapshots + external NAS
Scaling & Performance¶
Horizontal Pod Autoscaling (HPA)¶
- Min replicas: 2
- Max replicas: 5
- CPU threshold: 70% (scale up when avg > 70%)
- Memory threshold: 80% (scale up when avg > 80%)
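These settings correspond to an autoscaling/v2 HorizontalPodAutoscaler; a sketch of what hpa.yaml might contain, assuming metrics-server is running:
```yaml
# hpa.yaml - scale between 2 and 5 replicas on CPU/memory utilization
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myservice
  namespace: apps
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myservice
  minReplicas: 2
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
```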
Performance Targets¶
- Response time: < 200ms (p99)
- Throughput: 100 req/sec per pod
- Availability: 99.9% SLA
Monitoring & Alerting¶
Prometheus Metrics¶
- Scrape interval: 30s
- Port: 9090
- Endpoint: /metrics
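If the cluster runs the Prometheus Operator (e.g. kube-prometheus-stack), monitoring.yaml can be a ServiceMonitor. A sketch under that assumption; the `metrics` port name, 9090 target, and `release: prometheus` label are placeholders you would align with your own Service and Prometheus instance:
```yaml
# monitoring.yaml - ServiceMonitor picked up by the Prometheus Operator
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: myservice
  namespace: apps
  labels:
    release: prometheus      # assumption: label your Prometheus instance selects on
spec:
  selector:
    matchLabels:
      app: myservice
  endpoints:
    - port: metrics          # assumes a Service port named "metrics" on 9090
      path: /metrics
      interval: 30s
```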
Key Metrics¶
- `http_requests_total` - Total requests
- `http_request_duration_seconds` - Response time
- `myservice_db_connections` - Active DB connections
Alerts¶
- [ ] Pod restarts > 5 in 1 hour
- [ ] CPU > 90% for 5 minutes
- [ ] Memory > 95%
- [ ] PVC > 80% full
- [ ] Response time > 1s (p99)
Backup & Disaster Recovery¶
Data Backup¶
# Database backup (if applicable)
# Automated: CronJob daily at 2 AM
# Location: /backup/myservice_db_YYYYMMDD_HHMMSS.sql.gz
# Restore procedure
kubectl exec -i deployment/myservice-db -- psql -U admin -d mydb < backup.sql
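The "CronJob daily at 2 AM" mentioned above could be something like the following sketch; the image tag, credentials Secret, and the `myservice-backup` PVC are assumptions to adapt to your setup:
```yaml
# backup-cronjob.yaml - nightly pg_dump to the backup volume
apiVersion: batch/v1
kind: CronJob
metadata:
  name: myservice-db-backup
  namespace: apps
spec:
  schedule: "0 2 * * *"                    # daily at 02:00
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: pg-dump
              image: postgres:16-alpine    # assumption: match your database version
              command: ["/bin/sh", "-c"]
              args:
                - pg_dump -h myservice-db -U admin mydb | gzip > /backup/myservice_db_$(date +%Y%m%d_%H%M%S).sql.gz
              env:
                - name: PGPASSWORD
                  valueFrom:
                    secretKeyRef:
                      name: myservice-db-secret
                      key: password
              volumeMounts:
                - name: backup
                  mountPath: /backup
          volumes:
            - name: backup
              persistentVolumeClaim:
                claimName: myservice-backup   # assumption: a dedicated backup PVC
```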
Recovery Time Objective (RTO)¶
- Pod failure: < 1 minute (auto-restart)
- Node failure: < 5 minutes (pod reschedule)
- Data loss (RPO): < 1 day (daily backups)
Troubleshooting¶
Pod stuck in Pending¶
kubectl describe pod <pod-name> -n apps
# Check: PVC status, resource requests, node availability
High memory usage¶
kubectl top pods -n apps
kubectl get pvc -n apps
# Scale up memory requests in deployment
Database connection errors¶
kubectl logs -f deployment/myservice -n apps | grep -i database
# Verify: database pod running, credentials correct, network access
Slow response times¶
kubectl top pod <pod-name> -n apps
# Check: CPU/memory pressure, database queries, disk I/O
Maintenance¶
Regular Tasks¶
- [ ] Review logs weekly
- [ ] Check disk usage monthly
- [ ] Update image tags quarterly
- [ ] Test backup restoration annually
Upgrade Procedure¶
# 1. Test on k3d first
kustomize build apps/local/myservice | kubectl apply -f -
# 2. Update image tag in deployment.yaml
# 3. Commit to Git
git push
# 4. Flux auto-deploys with rolling update (zero downtime)
# 5. Monitor metrics during rollout
kubectl rollout status deployment/myservice -n apps
Rollback Procedure¶
# Option 1: Revert Git commit
git revert <bad-commit-hash>
git push
# Flux auto-reverts deployment
# Option 2: Manual rollback
kubectl rollout undo deployment/myservice -n apps
Cost Analysis¶
| Component | CPU | Memory | Storage | Cost/Month |
|---|---|---|---|---|
| Pod 1 | 500m | 512Mi | 10Gi | $X |
| Pod 2 | 500m | 512Mi | 10Gi | $X |
| Database | 1c | 1Gi | 20Gi | $Y |
| Total | 2c | 2Gi | 40Gi | $Z |
Note: Costs are for homelab (no cloud charges). Electricity: ~$X/month
Related Services¶
- Depends on: PostgreSQL, Redis
- Depended on by: Web frontend, Mobile app
- Related: Monitoring stack (Prometheus, Grafana)
Change Log¶
| Date | Change | Author |
|---|---|---|
| 2025-11-28 | Initial deployment | You |
Contact & Support¶
- Owner: Your name
- Slack channel: #myservice
- Documentation: See docs/ folder
- On-call: PagerDuty integration (if applicable)
kustomization.yaml Template¶
Create apps/base/myservice/kustomization.yaml:
```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

namespace: apps

resources:
  - namespace.yaml
  - deployment.yaml
  - service.yaml
  - ingress.yaml
  - configmap.yaml
  - secret.yaml
  - pvc.yaml
  - hpa.yaml

# Image management (auto-update version in Git)
images:
  - name: myapp
    newName: registry/myapp
    newTag: "1.0"

# Common labels applied to all resources
commonLabels:
  app: myservice
  tier: web
  version: "1.0"

# ConfigMap from files
configMapGenerator:
  - name: myservice-config
    files:
      - config.yaml
      - app.properties

# Secrets (use SealedSecrets for production!)
secretGenerator:
  - name: myservice-secret
    literals:
      - db-password=changeme
      - api-key=changeme

# Replica count
replicas:
  - name: myservice
    count: 2
```
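The local-testing commands earlier build apps/local/myservice; if you use that overlay pattern, its kustomization.yaml is typically just a thin layer on top of this base. A minimal sketch (the single-replica patch is only an example of a local override):
```yaml
# apps/local/myservice/kustomization.yaml - local (k3d) overlay on the base
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: apps
resources:
  - ../../base/myservice
# Example local override: run a single replica on the dev cluster
patches:
  - target:
      kind: Deployment
      name: myservice
    patch: |-
      - op: replace
        path: /spec/replicas
        value: 1
```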
Example 1: Moodle (Already Documented)¶
See docs/MOODLE.md for a complete Moodle setup with:
- PostgreSQL database
- Persistent file storage
- Multi-replica deployment
- Backup strategy
- Monitoring
Example 2: Simple Web App¶
apps/base/nginx/SERVICE.md¶
# Nginx Web Server
## Overview
**Purpose**: Static website hosting
**Status**: ✅ Active
## Quick Facts
| Property | Value |
|----------|-------|
| **Image** | nginx:1.25-alpine |
| **Replicas** | 2 |
| **CPU** | 100m request, 500m limit |
| **Memory** | 64Mi request, 256Mi limit |
| **Storage** | 5Gi for website files |
| **URL** | https://mywebsite.com |
## Deployment
Simple: Just deploy from base manifest, no special requirements.
## Monitoring
- Monitor: HTTP response codes
- Alert: 5xx errors > 1%
apps/base/nginx/deployment.yaml¶
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  namespace: apps
spec:
  replicas: 2
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:1.25-alpine
          ports:
            - containerPort: 80
          resources:
            requests:
              cpu: 100m
              memory: 64Mi
            limits:
              cpu: 500m
              memory: 256Mi
          volumeMounts:
            - name: content
              mountPath: /usr/share/nginx/html
          livenessProbe:
            httpGet:
              path: /
              port: 80
            initialDelaySeconds: 10
            periodSeconds: 10
      volumes:
        - name: content
          persistentVolumeClaim:
            claimName: nginx-content
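The deployment above mounts a nginx-content claim that isn't shown; the companion service.yaml and pvc.yaml might look roughly like this sketch, assuming Longhorn backs the 5Gi website volume from the quick facts:
```yaml
# apps/base/nginx/service.yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx
  namespace: apps
spec:
  selector:
    app: nginx
  ports:
    - name: http
      port: 80
      targetPort: 80
---
# apps/base/nginx/pvc.yaml - the claim mounted at /usr/share/nginx/html
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nginx-content
  namespace: apps
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn   # assumption: Longhorn, per the 5Gi quick-facts entry
  resources:
    requests:
      storage: 5Gi
  # note: with ReadWriteOnce, both nginx replicas must land on the same node;
  # use an RWX-capable volume if you need them spread across nodes
```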
Example 3: Redis Cache¶
apps/base/redis/SERVICE.md¶
# Redis Cache
## Overview
**Purpose**: In-memory data caching
**Status**: ✅ Active
## Quick Facts
| Property | Value |
|----------|-------|
| **Image** | redis:7-alpine |
| **Replicas** | 1 (stateful) |
| **CPU** | 200m |
| **Memory** | 512Mi request, 1Gi limit |
| **Storage** | 5Gi for persistence |
| **Port** | 6379 |
## Network
- **Service**: redis
- **DNS**: redis.apps.svc.cluster.local:6379
## Storage
- **RDB snapshots**: Persistent via Longhorn
- **Backup**: Nightly RDB export to backup storage
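No manifests are shown for this example; a minimal deployment.yaml sketch matching the facts above (single replica, RDB persistence on a 5Gi volume) could look like this, with the `redis-data` claim name following the naming conventions in the next section:
```yaml
# apps/base/redis/deployment.yaml - single-replica Redis with RDB persistence
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis
  namespace: apps
spec:
  replicas: 1
  strategy:
    type: Recreate             # avoid two pods sharing the RWO volume during updates
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
        - name: redis
          image: redis:7-alpine
          args: ["--save", "3600", "1", "--dir", "/data"]   # RDB snapshot hourly if >=1 change
          ports:
            - containerPort: 6379
          resources:
            requests:
              cpu: 200m
              memory: 512Mi
            limits:
              memory: 1Gi
          volumeMounts:
            - name: data
              mountPath: /data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: redis-data   # assumption: a 5Gi Longhorn PVC
```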
Service Naming Conventions¶
Use consistent naming for predictability:
Deployment: {service-name}
Service: {service-name}
ConfigMap: {service-name}-config
Secret: {service-name}-secret
PVC: {service-name}-{data-type} (e.g., redis-data, moodle-files)
Namespace: apps (default for all user services)
Ingress & External Access¶
For External Services (accessible from internet via Cloudflare)¶
Add to ingress.yaml:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myservice
  namespace: apps
  annotations:
    traefik.ingress.kubernetes.io/router.middlewares: apps-sablier@kubernetescrd
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
spec:
  ingressClassName: traefik
  tls:
    - hosts:
        - myservice.yourdomain.com
      secretName: myservice-tls
  rules:
    - host: myservice.yourdomain.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: myservice
                port:
                  number: 8080
For Internal-Only Services (not accessible from internet)¶
Skip the ingress.yaml. Services are still accessible via:
kubectl port-forward svc/myservice 8080:8080 -n apps
Service Dependencies Map¶
Web Frontend (nginx)
↓
Backend API (myservice)
↓
PostgreSQL Database
↓
Longhorn Storage
Optional:
Redis ← Cache layer
Minio ← Object storage
Elasticsearch ← Logging
Quick Commands for Service Management¶
# Deploy a service
kustomize build apps/base/myservice | kubectl apply -f -
# Restart all pods for a service
kubectl rollout restart deployment/myservice -n apps
# Scale service
kubectl scale deployment/myservice --replicas=5 -n apps
# View logs
kubectl logs -f deployment/myservice -n apps
# Port-forward for local access
kubectl port-forward svc/myservice 8080:8080 -n apps
# Check resource usage
kubectl top pods -n apps
kubectl describe deployment/myservice -n apps
# Check storage
kubectl get pvc -n apps
kubectl describe pvc myservice-data -n apps
# Execute command in pod
kubectl exec -it pod/myservice-xyz-abc -- /bin/sh
# Delete service (including PVCs!)
kustomize build apps/base/myservice | kubectl delete -f -
Service Lifecycle¶
CREATE
↓
DEPLOY (test on k3d first)
↓
PROMOTE (push to Git → Flux deploys to prod)
↓
MONITOR (watch metrics, logs)
↓
MAINTAIN (update image, patch vulnerabilities)
↓
UPGRADE (test locally, deploy via Git)
↓
DEPRECATE (mark as deprecated, set timeout)
↓
RETIRE (remove manifests, delete from production)
Next Steps¶
- Choose a service: Pick something you want to deploy
- Use the template: Copy SERVICE.md + manifest files
- Customize: Edit for your service's needs
- Test locally: Deploy to k3d, verify it works
- Push to Git: Commit and push
- Monitor: Watch Flux deploy and metrics update
Example 4: Skooner (Kubernetes Dashboard)¶
Overview¶
Name: skooner
Purpose: Web-based Kubernetes dashboard for cluster management and monitoring
Namespace: monitoring
Status: ✅ Active
URL: https://k3stat.serlo.lu
Quick Facts¶
| Property | Value |
|---|---|
| Chart | christianhuth/skooner |
| Repository | https://christianhuth.github.io/helm-charts |
| Replicas | 1 |
| Service Type | ClusterIP |
| Port | 80 |
| Ingress | Traefik |
| Metrics | Enabled |
Architecture¶
Internet
↓
Traefik Ingress (k3stat.serlo.lu)
↓
Skooner Service (ClusterIP)
↓
Skooner Pod
↓
Kubernetes API Server
Deployment¶
Skooner is deployed via Flux HelmRelease in infrastructure/base/skooner/:
# Check deployment status
kubectl get pods -n monitoring -l app=skooner
kubectl get svc -n monitoring -l app=skooner
kubectl get ingress -n monitoring -l app=skooner
# Check Flux HelmRelease
flux get helmrelease skooner -n monitoring
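For reference, the HelmRelease and its HelmRepository source in infrastructure/base/skooner/ might look roughly like this sketch; the file split, interval values, and chart version pin are assumptions, the apiVersions depend on your Flux release (older clusters use v1beta2 / v2beta2), and the values block from the Configuration section below goes under spec.values:
```yaml
# infrastructure/base/skooner/source.yaml (sketch)
apiVersion: source.toolkit.fluxcd.io/v1
kind: HelmRepository
metadata:
  name: skooner
  namespace: flux-system
spec:
  interval: 1h
  url: https://christianhuth.github.io/helm-charts
---
# infrastructure/base/skooner/release.yaml (sketch)
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: skooner
  namespace: monitoring
spec:
  interval: 10m
  chart:
    spec:
      chart: skooner
      version: "*"             # assumption: pin a chart version here if desired
      sourceRef:
        kind: HelmRepository
        name: skooner
        namespace: flux-system
  values: {}                   # see the Configuration section below
```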
Access & Authentication¶
Get Login Token¶
# Create a token for the Skooner service account (if it exists)
kubectl create token skooner-sa -n monitoring --duration=24h
# If the service account doesn't exist, use default
kubectl create token default -n monitoring --duration=24h
# Or use kube-system's default service account (only if it has sufficient RBAC)
kubectl create token default -n kube-system --duration=24h
Login Steps¶
- Navigate to https://k3stat.serlo.lu
- Paste the token from the command above
- Click "Sign In"
Networking¶
- Internal Service: skooner.monitoring.svc.cluster.local:80
- External URL: https://k3stat.serlo.lu
- Ingress Class: traefik
- Protocol: HTTP/HTTPS
Troubleshooting¶
Skooner Not Accessible¶
# Check pod status
kubectl get pods -n monitoring -l app=skooner
kubectl describe pod -l app=skooner -n monitoring
# Check service
kubectl get svc -n monitoring -l app=skooner
kubectl describe svc skooner -n monitoring
# Check ingress
kubectl get ingress -n monitoring -l app=skooner
kubectl describe ingress -n monitoring -l app=skooner
# Check logs
kubectl logs -f deployment/skooner -n monitoring
Flux Not Deploying¶
# Check HelmRelease status
flux get helmrelease skooner -n monitoring
# Check HelmRepository source
flux get source helm skooner -n flux-system
# Force reconciliation
flux reconcile helmrelease skooner -n monitoring
flux reconcile source helm skooner -n flux-system
# Check Flux logs
flux logs --follow --kind=HelmRelease --name=skooner -n monitoring
Token Issues¶
# Verify service account exists
kubectl get sa -n monitoring
# Create service account if missing
kubectl create sa skooner-sa -n monitoring
# Grant permissions (adjust as needed)
kubectl create clusterrolebinding skooner-admin \
--clusterrole=cluster-admin \
--serviceaccount=monitoring:skooner-sa
Maintenance¶
Update Skooner¶
# Check current chart version
flux get helmrelease skooner -n monitoring
# Update version in release.yaml (if pinned)
# Edit: infrastructure/base/skooner/release.yaml
# Commit and push - Flux will auto-update
Restart Skooner¶
kubectl rollout restart deployment/skooner -n monitoring
Check Resource Usage¶
kubectl top pods -n monitoring -l app=skooner
kubectl describe pod -l app=skooner -n monitoring
Configuration¶
Configuration is managed via HelmRelease values in infrastructure/base/skooner/release.yaml:
values:
  service:
    type: ClusterIP
    port: 80
  ingress:
    enabled: true
    ingressClassName: traefik
    hosts:
      - host: k3stat.serlo.lu
        paths:
          - path: /
            pathType: ImplementationSpecific
  metrics:
    enabled: true
Related Services¶
- Depends on: Traefik (ingress), Kubernetes API Server
- Related: Prometheus (metrics), Grafana (dashboards)
Change Log¶
| Date | Change | Author |
|---|---|---|
| 2025-01-XX | Initial deployment via Flux | - |
Resources¶
- Official Examples: https://github.com/fluxcd/flux2-kustomize-helm-example
- Kubernetes Docs: https://kubernetes.io/docs/
- Service Mesh: Optional (Linkerd, Istio) for advanced use cases
- Observability: See monitoring/ infrastructure docs
- Skooner Docs: https://skooner.io
- Skooner GitHub: https://github.com/skooner-k8s/skooner
Last Updated: January 2025