
Service Catalog

A template and examples for documenting and deploying services on your K3s cluster.


How to Add a Service

Step 1: Use the Service Template

Create a new directory following the pattern:

mkdir -p apps/base/myservice

Copy the template files (see below) and customize for your service.

Step 2: Document Your Service

Fill in the SERVICE.md file (see template below) with:

- What it does
- Dependencies (databases, storage, etc.)
- Networking requirements
- Backup strategy
- Monitoring/alerting

Step 3: Deploy Locally First

Test on your k3d dev cluster:

kustomize build clusters/local/apps | kubectl apply -f -
# Test that it works

Step 4: Push to Production

git add apps/
git commit -m "feat: add myservice"
git push
# Flux auto-deploys within 10 seconds
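
How quickly Flux picks up the push depends on the Flux Kustomization that watches this repository. A minimal sketch of such an object is shown below; the path, interval, and source name are assumptions and should be matched to your actual Flux setup.

```yaml
# Sketch of a Flux Kustomization watching the apps path (names and paths assumed).
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: apps
  namespace: flux-system
spec:
  interval: 1m                      # how often Flux polls the Git source;
                                    # a webhook Receiver can make this near-instant
  path: ./clusters/production/apps  # assumed path to the production overlay
  prune: true                       # remove resources that were deleted from Git
  sourceRef:
    kind: GitRepository
    name: flux-system
```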

Service Template

Directory Structure

apps/base/myservice/
├── SERVICE.md                    # Documentation (START HERE)
├── kustomization.yaml            # Kustomize entry point
├── namespace.yaml                # Namespace definition
├── deployment.yaml               # Pod definition
├── service.yaml                  # Internal service
├── ingress.yaml                  # External access (if needed)
├── configmap.yaml                # Non-secret config
├── secret.yaml                   # Secret values (use SealedSecrets!)
├── pvc.yaml                      # Persistent storage (if needed)
├── hpa.yaml                      # Auto-scaling (if needed)
└── monitoring.yaml               # Prometheus scrape config (if applicable)
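
Most of these manifests share a small, predictable skeleton. As an illustration, a minimal service.yaml for this template might look like the sketch below; the port number and labels are assumptions and must match your deployment.yaml.

```yaml
# Minimal sketch of apps/base/myservice/service.yaml (values assumed).
apiVersion: v1
kind: Service
metadata:
  name: myservice
  namespace: apps
spec:
  selector:
    app: myservice      # must match the pod labels in deployment.yaml
  ports:
    - name: http
      port: 8080        # port other cluster workloads connect to
      targetPort: 8080  # containerPort in the pod
```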

SERVICE.md Template

Create apps/base/myservice/SERVICE.md:

# My Service

## Overview
**Name**: myservice  
**Purpose**: What this service does  
**Owner**: Your name or team  
**Last Updated**: YYYY-MM-DD  
**Status**: ✅ Active / ⚠️ Testing / ❌ Deprecated

## Quick Facts

| Property | Value |
|----------|-------|
| **Image** | registry/myapp:1.0 |
| **Replicas** | 2 (minimum for HA) |
| **CPU Request** | 500m |
| **Memory Request** | 512Mi |
| **Storage** | 10Gi Longhorn |
| **Port** | 8080 |
| **URL** | https://myservice.yourdomain.com |

## Architecture
Internet (optional)
  ↓
Cloudflare Tunnel (optional)
  ↓
Traefik Ingress (optional, if external)
  ↓
myservice-svc (Kubernetes Service, internal)
  ↓
myservice Pod (1, 2, 3...)
  ↓
myservice-db (PostgreSQL, if needed)

## Dependencies

### Required Services
- [ ] PostgreSQL (database)
- [ ] Redis (caching)
- [ ] Minio (object storage)

### Required Secrets
- [ ] `myservice-db-secret` (database password)
- [ ] `myservice-api-key` (external API key)

### Storage Requirements
- [ ] `/data` - PVC `myservice-data` (Longhorn, 10Gi, 2 replicas)
- [ ] `/tmp` - EmptyDir (ephemeral)

## Deployment

### Local Testing (k3d)
```bash
# Deploy to local cluster
kustomize build apps/local/myservice | kubectl apply -f -

# Wait for pod
kubectl wait --for=condition=ready pod -l app=myservice -n apps --timeout=120s

# Check logs
kubectl logs -f deployment/myservice -n apps

# Test connectivity
kubectl port-forward svc/myservice 8080:8080 -n apps
curl http://localhost:8080
```

Production Deployment

# Git push triggers Flux auto-deployment
git add apps/base/myservice/
git commit -m "feat: add myservice"
git push

# Verify on production
kubectl -n apps get pods,svc

Networking

Internal

  • Service Name: myservice
  • DNS: myservice.apps.svc.cluster.local
  • Port: 8080
  • Protocol: HTTP/TCP

External (if applicable)

  • Hostname: myservice.yourdomain.com
  • Ingress: Traefik (via Cloudflare Tunnel)
  • Port: 443 (HTTPS)
  • Auth: Basic auth (if needed)

Storage

Persistent Volumes

  • Name: myservice-data
  • Type: Longhorn (replicated)
  • Size: 10Gi
  • Replicas: 2 (auto-replicate if node fails)
  • Mount Path: /data
  • Access Mode: ReadWriteOnce

Data Retention

  • Backup frequency: Daily at 2 AM
  • Retention: 30 days
  • Backup location: Longhorn snapshots + external NAS

Scaling & Performance

Horizontal Pod Autoscaling (HPA)

  • Min replicas: 2
  • Max replicas: 5
  • CPU threshold: 70% (scale up when avg > 70%)
  • Memory threshold: 80% (scale up when avg > 80%)
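
These settings map onto an autoscaling/v2 HorizontalPodAutoscaler roughly like the sketch below; the deployment name and namespace are assumptions taken from the rest of this template.

```yaml
# Sketch of apps/base/myservice/hpa.yaml mirroring the thresholds above.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myservice
  namespace: apps
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myservice
  minReplicas: 2
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale up when average CPU > 70%
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80   # scale up when average memory > 80%
```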

Performance Targets

  • Response time: < 200ms (p99)
  • Throughput: 100 req/sec per pod
  • Availability: 99.9% SLA

Monitoring & Alerting

Prometheus Metrics

  • Scrape interval: 30s
  • Port: 9090
  • Endpoint: /metrics
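
If the cluster runs the Prometheus Operator (e.g. kube-prometheus-stack), monitoring.yaml is typically a ServiceMonitor along these lines. This is a sketch under that assumption; the label selector and port name must match service.yaml.

```yaml
# Sketch of apps/base/myservice/monitoring.yaml (assumes the Prometheus Operator CRDs).
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: myservice
  namespace: apps
spec:
  selector:
    matchLabels:
      app: myservice    # selects the Service by label
  endpoints:
    - port: metrics     # a named Service port exposing 9090 (add it to service.yaml)
      path: /metrics
      interval: 30s
```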

Key Metrics

  • http_requests_total - Total requests
  • http_request_duration_seconds - Response time
  • myservice_db_connections - Active DB connections

Alerts

  • [ ] Pod restarts > 5 in 1 hour
  • [ ] CPU > 90% for 5 minutes
  • [ ] Memory > 95%
  • [ ] PVC > 80% full
  • [ ] Response time > 1s (p99)
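
With the Prometheus Operator, these alerts live in a PrometheusRule. A sketch of the first alert (pod restarts > 5 in 1 hour) is shown below; it assumes kube-state-metrics is installed, since that is where the restart counter comes from.

```yaml
# Sketch of one alert rule; extend the list for the remaining alerts above.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: myservice-alerts
  namespace: apps
spec:
  groups:
    - name: myservice
      rules:
        - alert: MyServiceFrequentRestarts
          # fires when any myservice pod restarted more than 5 times in the last hour
          expr: increase(kube_pod_container_status_restarts_total{namespace="apps", pod=~"myservice.*"}[1h]) > 5
          labels:
            severity: warning
          annotations:
            summary: "myservice pod is restarting frequently"
```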

Backup & Disaster Recovery

Data Backup

# Database backup (if applicable)
# Automated: CronJob daily at 2 AM
# Location: /backup/myservice_db_YYYYMMDD_HHMMSS.sql.gz

# Restore procedure (gunzip the dump first if it is compressed)
kubectl exec -i deployment/myservice-db -- psql -U admin -d mydb < backup.sql
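
The daily 2 AM job can be expressed as a Kubernetes CronJob roughly like the sketch below; the image, credentials secret key, and backup PVC name are assumptions and must be adapted to the actual database setup.

```yaml
# Sketch of a daily pg_dump CronJob (names and secret key assumed).
apiVersion: batch/v1
kind: CronJob
metadata:
  name: myservice-db-backup
  namespace: apps
spec:
  schedule: "0 2 * * *"              # daily at 02:00
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: pg-dump
              image: postgres:16-alpine
              command: ["/bin/sh", "-c"]
              args:
                - pg_dump -h myservice-db -U admin mydb | gzip > /backup/myservice_db_`date +%Y%m%d_%H%M%S`.sql.gz
              env:
                - name: PGPASSWORD
                  valueFrom:
                    secretKeyRef:
                      name: myservice-db-secret   # from "Required Secrets" above
                      key: password               # assumed key name
              volumeMounts:
                - name: backup
                  mountPath: /backup
          volumes:
            - name: backup
              persistentVolumeClaim:
                claimName: myservice-backup       # hypothetical backup PVC
```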

Recovery Time Objective (RTO)

  • Pod failure: < 1 minute (auto-restart)
  • Node failure: < 5 minutes (pod reschedule)
  • Data loss: < 1 day (daily backups)

Troubleshooting

Pod stuck in Pending

kubectl describe pod <pod-name> -n apps
# Check: PVC status, resource requests, node availability

High memory usage

kubectl top pods -n apps
kubectl get pvc -n apps
# Scale up memory requests in deployment

Database connection errors

kubectl logs -f deployment/myservice -n apps | grep -i database
# Verify: database pod running, credentials correct, network access

Slow response times

kubectl top pod <pod-name> -n apps
# Check: CPU/memory pressure, database queries, disk I/O

Maintenance

Regular Tasks

  • [ ] Review logs weekly
  • [ ] Check disk usage monthly
  • [ ] Update image tags quarterly
  • [ ] Test backup restoration annually

Upgrade Procedure

# 1. Update the image tag in deployment.yaml
# 2. Test on k3d first
kustomize build apps/local/myservice | kubectl apply -f -

# 3. Commit to Git and push
git push

# 4. Flux auto-deploys with a rolling update (zero downtime)
# 5. Monitor metrics during the rollout
kubectl rollout status deployment/myservice -n apps

Rollback Procedure

# Option 1: Revert Git commit
git revert <bad-commit-hash>
git push
# Flux auto-reverts deployment

# Option 2: Manual rollback
kubectl rollout undo deployment/myservice -n apps

Cost Analysis

| Component | CPU  | Memory | Storage | Cost/Month |
|-----------|------|--------|---------|------------|
| Pod 1     | 500m | 512Mi  | 10Gi    | $X         |
| Pod 2     | 500m | 512Mi  | 10Gi    | $X         |
| Database  | 1c   | 1Gi    | 20Gi    | $Y         |
| **Total** | 2c   | 2.5Gi  | 50Gi    | $Z         |

Note: Costs are for homelab (no cloud charges). Electricity: ~$X/month

Related Services

  • Depends on: PostgreSQL, Redis
  • Depended on by: Web frontend, Mobile app
  • Related: Monitoring stack (Prometheus, Grafana)

Change Log

| Date       | Change             | Author |
|------------|--------------------|--------|
| 2025-11-28 | Initial deployment | You    |

Contact & Support

  • Owner: Your name
  • Slack channel: #myservice
  • Documentation: See docs/ folder
  • On-call: PagerDuty integration (if applicable)
kustomization.yaml Template

Create apps/base/myservice/kustomization.yaml:

```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

namespace: apps

resources:
  - namespace.yaml
  - deployment.yaml
  - service.yaml
  - ingress.yaml
  - configmap.yaml
  - secret.yaml
  - pvc.yaml
  - hpa.yaml

# Image management (update the tag in Git to roll out a new version)
images:
  - name: myapp
    newName: registry/myapp
    newTag: "1.0"

# Common labels applied to all resources
commonLabels:
  app: myservice
  tier: web
  version: "1.0"

# ConfigMap generated from files
configMapGenerator:
  - name: myservice-config
    files:
      - config.yaml
      - app.properties

# Secrets (use SealedSecrets for production!)
secretGenerator:
  - name: myservice-secret
    literals:
      - db-password=changeme
      - api-key=changeme

# Replica count override
replicas:
  - name: myservice
    count: 2
```

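For production, replace the plaintext secretGenerator above with a SealedSecret. A sealed manifest is produced with kubeseal and looks roughly like the sketch below; the encryptedData values here are placeholders, not real ciphertext.

```yaml
# Sketch of apps/base/myservice/secret.yaml as a SealedSecret.
# Typically generated with something like:
#   kubectl create secret generic myservice-secret -n apps \
#     --from-literal=db-password=... --dry-run=client -o yaml | kubeseal --format yaml
apiVersion: bitnami.com/v1alpha1
kind: SealedSecret
metadata:
  name: myservice-secret
  namespace: apps
spec:
  encryptedData:
    db-password: AgBy...   # placeholder ciphertext
    api-key: AgCx...       # placeholder ciphertext
  template:
    metadata:
      name: myservice-secret
      namespace: apps
```
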
Example 1: Moodle (Already Documented)

See docs/MOODLE.md for a complete Moodle setup with:

- PostgreSQL database
- Persistent file storage
- Multi-replica deployment
- Backup strategy
- Monitoring


Example 2: Simple Web App

apps/base/nginx/SERVICE.md

# Nginx Web Server

## Overview
**Purpose**: Static website hosting  
**Status**: ✅ Active  

## Quick Facts
| Property | Value |
|----------|-------|
| **Image** | nginx:1.25-alpine |
| **Replicas** | 2 |
| **CPU** | 100m request, 500m limit |
| **Memory** | 64Mi request, 256Mi limit |
| **Storage** | 5Gi for website files |
| **URL** | https://mywebsite.com |

## Deployment
Simple: Just deploy from base manifest, no special requirements.

## Monitoring
- Monitor: HTTP response codes
- Alert: 5xx errors > 1%

apps/base/nginx/deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  namespace: apps
spec:
  replicas: 2
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.25-alpine
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: 100m
            memory: 64Mi
          limits:
            cpu: 500m
            memory: 256Mi
        volumeMounts:
        - name: content
          mountPath: /usr/share/nginx/html
        livenessProbe:
          httpGet:
            path: /
            port: 80
          initialDelaySeconds: 10
          periodSeconds: 10
      volumes:
      - name: content
        persistentVolumeClaim:
          claimName: nginx-content
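
The deployment above mounts a PVC named nginx-content that is not shown; a minimal sketch (assuming the Longhorn storage class) could be:

```yaml
# Sketch of apps/base/nginx/pvc.yaml (storage class name assumed).
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nginx-content
  namespace: apps
spec:
  accessModes:
    - ReadWriteMany      # both replicas mount the same content; Longhorn supports RWX
  storageClassName: longhorn
  resources:
    requests:
      storage: 5Gi       # matches the 5Gi in the Quick Facts table
```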

Example 3: Redis Cache

apps/base/redis/SERVICE.md

# Redis Cache

## Overview
**Purpose**: In-memory data caching  
**Status**: ✅ Active  

## Quick Facts
| Property | Value |
|----------|-------|
| **Image** | redis:7-alpine |
| **Replicas** | 1 (stateful) |
| **CPU** | 200m |
| **Memory** | 512Mi request, 1Gi limit |
| **Storage** | 5Gi for persistence |
| **Port** | 6379 |

## Network
- **Service**: redis
- **DNS**: redis.apps.svc.cluster.local:6379

## Storage
- **RDB snapshots**: Persistent via Longhorn
- **Backup**: Nightly RDB export to backup storage
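
A deployment.yaml sketch matching the values above (single replica, RDB persistence on a PVC named redis-data per the naming conventions below) might look like this; treat it as a starting point rather than the exact manifest in the repository:

```yaml
# Sketch of apps/base/redis/deployment.yaml (derived from the Quick Facts above).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis
  namespace: apps
spec:
  replicas: 1
  strategy:
    type: Recreate              # single stateful replica; avoids two pods sharing the volume
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
        - name: redis
          image: redis:7-alpine
          args: ["--save", "60", "1000"]   # RDB snapshot every 60s if >=1000 keys changed
          ports:
            - containerPort: 6379
          resources:
            requests:
              cpu: 200m
              memory: 512Mi
            limits:
              memory: 1Gi
          volumeMounts:
            - name: data
              mountPath: /data             # default RDB dump location in the official image
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: redis-data
```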

Service Naming Conventions

Use consistent naming for predictability:

Deployment: {service-name}
Service: {service-name}
ConfigMap: {service-name}-config
Secret: {service-name}-secret
PVC: {service-name}-{data-type}  (e.g., redis-data, moodle-files)
Namespace: apps (default for all user services)

Ingress & External Access

For External Services (accessible from internet via Cloudflare)

Add to ingress.yaml:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myservice
  namespace: apps
  annotations:
    traefik.ingress.kubernetes.io/router.middlewares: apps-sablier@kubernetescrd
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
spec:
  ingressClassName: traefik
  tls:
  - hosts:
    - myservice.yourdomain.com
    secretName: myservice-tls

  rules:
  - host: myservice.yourdomain.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: myservice
            port:
              number: 8080

For Internal-Only Services (not accessible from internet)

Skip the ingress.yaml. Services are still accessible via:

kubectl port-forward svc/myservice 8080:8080 -n apps


Service Dependencies Map

Web Frontend (nginx)
    ↓
Backend API (myservice)
    ↓
PostgreSQL Database
    ↓
Longhorn Storage

Optional:
Redis ← Cache layer
Minio ← Object storage
Elasticsearch ← Logging

Quick Commands for Service Management

# Deploy a service
kustomize build apps/base/myservice | kubectl apply -f -

# Restart all pods for a service
kubectl rollout restart deployment/myservice -n apps

# Scale service
kubectl scale deployment/myservice --replicas=5 -n apps

# View logs
kubectl logs -f deployment/myservice -n apps

# Port-forward for local access
kubectl port-forward svc/myservice 8080:8080 -n apps

# Check resource usage
kubectl top pods -n apps
kubectl describe deployment/myservice -n apps

# Check storage
kubectl get pvc -n apps
kubectl describe pvc myservice-data -n apps

# Execute command in pod
kubectl exec -it pod/myservice-xyz-abc -- /bin/sh

# Delete service (including PVCs!)
kustomize build apps/base/myservice | kubectl delete -f -

Service Lifecycle

CREATE
  ↓
DEPLOY (test on k3d first)
  ↓
PROMOTE (push to Git → Flux deploys to prod)
  ↓
MONITOR (watch metrics, logs)
  ↓
MAINTAIN (update image, patch vulnerabilities)
  ↓
UPGRADE (test locally, deploy via Git)
  ↓
DEPRECATE (mark as deprecated, set a sunset date)
  ↓
RETIRE (remove manifests, delete from production)

Next Steps

  1. Choose a service: Pick something you want to deploy
  2. Use the template: Copy SERVICE.md + manifest files
  3. Customize: Edit for your service's needs
  4. Test locally: Deploy to k3d, verify it works
  5. Push to Git: Commit and push
  6. Monitor: Watch Flux deploy and metrics update

Example 4: Skooner (Kubernetes Dashboard)

Overview

Name: skooner
Purpose: Web-based Kubernetes dashboard for cluster management and monitoring
Namespace: monitoring
Status: ✅ Active
URL: https://k3stat.serlo.lu

Quick Facts

| Property         | Value                                       |
|------------------|---------------------------------------------|
| **Chart**        | christianhuth/skooner                       |
| **Repository**   | https://christianhuth.github.io/helm-charts |
| **Replicas**     | 1                                           |
| **Service Type** | ClusterIP                                   |
| **Port**         | 80                                          |
| **Ingress**      | Traefik                                     |
| **Metrics**      | Enabled                                     |

Architecture

Internet
  ↓
Traefik Ingress (k3stat.serlo.lu)
  ↓
Skooner Service (ClusterIP)
  ↓
Skooner Pod
  ↓
Kubernetes API Server

Deployment

Skooner is deployed via Flux HelmRelease in infrastructure/base/skooner/:

# Check deployment status
kubectl get pods -n monitoring -l app=skooner
kubectl get svc -n monitoring -l app=skooner
kubectl get ingress -n monitoring -l app=skooner

# Check Flux HelmRelease
flux get helmrelease skooner -n monitoring
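
The HelmRelease and its HelmRepository source are not reproduced in full here; a sketch of what they roughly look like is below. The chart version pin and intervals are assumptions, and the values block from the Configuration section further down plugs into spec.values. The API versions shown are for recent Flux releases (older clusters use source.toolkit.fluxcd.io/v1beta2 and helm.toolkit.fluxcd.io/v2beta2).

```yaml
# Sketch of infrastructure/base/skooner/ source + release (versions and intervals assumed).
apiVersion: source.toolkit.fluxcd.io/v1
kind: HelmRepository
metadata:
  name: skooner
  namespace: flux-system
spec:
  interval: 1h
  url: https://christianhuth.github.io/helm-charts
---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: skooner
  namespace: monitoring
spec:
  interval: 10m
  chart:
    spec:
      chart: skooner
      # version: "x.y.z"        # pin a chart version here if desired
      sourceRef:
        kind: HelmRepository
        name: skooner
        namespace: flux-system
  values: {}                    # see the Configuration section below
```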

Access & Authentication

Get Login Token

# Create token for Skooner service account (if exists)
kubectl create token skooner-sa -n monitoring --duration=24h

# If service account doesn't exist, use default
kubectl create token default -n monitoring --duration=24h

# Or use cluster-admin token
kubectl create token default -n kube-system --duration=24h

Login Steps

  1. Navigate to https://k3stat.serlo.lu
  2. Paste the token from the command above
  3. Click "Sign In"

Networking

  • Internal Service: skooner.monitoring.svc.cluster.local:80
  • External URL: https://k3stat.serlo.lu
  • Ingress Class: traefik
  • Protocol: HTTP/HTTPS

Troubleshooting

Skooner Not Accessible

# Check pod status
kubectl get pods -n monitoring -l app=skooner
kubectl describe pod -l app=skooner -n monitoring

# Check service
kubectl get svc -n monitoring -l app=skooner
kubectl describe svc skooner -n monitoring

# Check ingress
kubectl get ingress -n monitoring -l app=skooner
kubectl describe ingress -n monitoring -l app=skooner

# Check logs
kubectl logs -f deployment/skooner -n monitoring

Flux Not Deploying

# Check HelmRelease status
flux get helmrelease skooner -n monitoring

# Check HelmRepository source
flux get source helm skooner -n flux-system

# Force reconciliation
flux reconcile helmrelease skooner -n monitoring
flux reconcile source helm skooner -n flux-system

# Check Flux logs
flux logs --follow --kind=HelmRelease --name=skooner -n monitoring

Token Issues

# Verify service account exists
kubectl get sa -n monitoring

# Create service account if missing
kubectl create sa skooner-sa -n monitoring

# Grant permissions (adjust as needed)
kubectl create clusterrolebinding skooner-admin \
  --clusterrole=cluster-admin \
  --serviceaccount=monitoring:skooner-sa

Maintenance

Update Skooner

# Check current chart version
flux get helmrelease skooner -n monitoring

# Update version in release.yaml (if pinned)
# Edit: infrastructure/base/skooner/release.yaml
# Commit and push - Flux will auto-update

Restart Skooner

kubectl rollout restart deployment/skooner -n monitoring

Check Resource Usage

kubectl top pods -n monitoring -l app=skooner
kubectl describe pod -l app=skooner -n monitoring

Configuration

Configuration is managed via HelmRelease values in infrastructure/base/skooner/release.yaml:

values:
  service:
    type: ClusterIP
    port: 80
  ingress:
    enabled: true
    ingressClassName: traefik
    hosts:
      - host: k3stat.serlo.lu
        paths:
          - path: /
            pathType: ImplementationSpecific
  metrics:
    enabled: true

Related Services

  • Depends on: Traefik (ingress), Kubernetes API Server
  • Related: Prometheus (metrics), Grafana (dashboards)

Change Log

| Date       | Change                      | Author |
|------------|-----------------------------|--------|
| 2025-01-XX | Initial deployment via Flux | -      |

Resources

  • Official Examples: https://github.com/fluxcd/flux2-kustomize-helm-example
  • Kubernetes Docs: https://kubernetes.io/docs/
  • Service Mesh: Optional (Linkerd, Istio) for advanced use cases
  • Observability: See monitoring/ infrastructure docs
  • Skooner Docs: https://skooner.io
  • Skooner GitHub: https://github.com/skooner-k8s/skooner

Last Updated: January 2025