Kubernetes Cluster Management for Qelos
This guide covers managing a Kubernetes cluster for Qelos deployment, including setup, monitoring, scaling, and maintenance.
Prerequisites
- Kubernetes cluster (1.24+)
- kubectl CLI installed
- Helm 3.x installed
- Basic understanding of Kubernetes concepts
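The prerequisites above can be checked from a shell before going further. The sketch below is one possible way to do that and is not part of Qelos: `missing_tools` and `version_at_least` are hypothetical helper names, and the version comparison only looks at the major and minor numbers.

```bash
# Print each tool from the argument list that is not on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Succeed if version $1 (e.g. "v1.28.2") is at least "major.minor" $2 (e.g. "1.24").
version_at_least() {
  have=${1#v}
  have_major=${have%%.*}
  have_rest=${have#*.}
  have_minor=${have_rest%%[!0-9]*}
  want_major=${2%%.*}
  want_rest=${2#*.}
  want_minor=${want_rest%%[!0-9]*}
  [ "$have_major" -gt "$want_major" ] ||
    { [ "$have_major" -eq "$want_major" ] && [ "$have_minor" -ge "$want_minor" ]; }
}

# Example: report anything missing from the prerequisite list.
missing=$(missing_tools kubectl helm)
[ -z "$missing" ] || echo "Missing tools: $missing"
```

The cluster version passed to `version_at_least` would typically come from the server version reported by `kubectl version`.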
Cluster Setup Options
Option 1: Managed Kubernetes Services
Google Kubernetes Engine (GKE)
bash
# Create a GKE cluster
gcloud container clusters create qelos-cluster \
--zone us-central1-a \
--num-nodes 3 \
--machine-type n1-standard-2 \
--enable-autoscaling \
--min-nodes 2 \
--max-nodes 5
# Get credentials
gcloud container clusters get-credentials qelos-cluster --zone us-central1-a
Amazon EKS
bash
# Create an EKS cluster (using eksctl)
eksctl create cluster \
--name qelos-cluster \
--region us-west-2 \
--nodegroup-name standard-workers \
--node-type t3.medium \
--nodes 3 \
--nodes-min 2 \
--nodes-max 5 \
--managed
# Update kubeconfig
aws eks update-kubeconfig --name qelos-cluster --region us-west-2
Azure Kubernetes Service (AKS)
bash
# Create resource group
az group create --name qelos-rg --location eastus
# Create AKS cluster
az aks create \
--resource-group qelos-rg \
--name qelos-cluster \
--node-count 3 \
--node-vm-size Standard_DS2_v2 \
--enable-cluster-autoscaler \
--min-count 2 \
--max-count 5 \
--generate-ssh-keys
# Get credentials
az aks get-credentials --resource-group qelos-rg --name qelos-cluster
DigitalOcean Kubernetes (DOKS)
bash
# Create a DOKS cluster
doctl kubernetes cluster create qelos-cluster \
--region nyc1 \
--version 1.28.2-do.0 \
--node-pool "name=worker-pool;size=s-2vcpu-4gb;count=3;auto-scale=true;min-nodes=2;max-nodes=5"
# Get credentials
doctl kubernetes cluster kubeconfig save qelos-cluster
Option 2: Self-Managed Kubernetes
Using kubeadm (On-Premise/VMs)
bash
# On master node
sudo kubeadm init --pod-network-cidr=10.244.0.0/16
# Set up kubectl for your user
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
# Install a pod network (Calico example)
kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml
# On worker nodes, join the cluster (use the command from kubeadm init output)
sudo kubeadm join <master-ip>:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>
Using k3s (Lightweight)
bash
# On master node
curl -sfL https://get.k3s.io | sh -
# Get the node token (workers need it to join; the kubeconfig itself is at /etc/rancher/k3s/k3s.yaml)
sudo cat /var/lib/rancher/k3s/server/node-token
# On worker nodes
curl -sfL https://get.k3s.io | K3S_URL=https://<master-ip>:6443 K3S_TOKEN=<token> sh -
Cluster Configuration for Qelos
1. Create Namespace
bash
# Create a dedicated namespace for Qelos
kubectl create namespace qelos
# Set as default namespace (optional)
kubectl config set-context --current --namespace=qelos
2. Configure Storage Classes
Qelos requires persistent storage for MongoDB and Redis.
bash
# Check available storage classes
kubectl get storageclass
# Example: Create a storage class (adjust based on your provider)
cat <<EOF | kubectl apply -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: qelos-storage
provisioner: kubernetes.io/gce-pd # Change based on provider
parameters:
  type: pd-standard
  replication-type: regional-pd
allowVolumeExpansion: true
EOF
3. Set Up Ingress Controller
Install an ingress controller to expose services:
NGINX Ingress Controller
bash
# Install using Helm
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update
helm install nginx-ingress ingress-nginx/ingress-nginx \
--namespace ingress-nginx \
--create-namespace \
--set controller.service.type=LoadBalancer
Traefik (Alternative)
bash
helm repo add traefik https://helm.traefik.io/traefik
helm repo update
helm install traefik traefik/traefik \
--namespace traefik \
--create-namespace
4. Configure TLS/SSL Certificates
Using cert-manager
bash
# Install cert-manager
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.13.0/cert-manager.yaml
# Create a ClusterIssuer for Let's Encrypt
cat <<EOF | kubectl apply -f -
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: your-email@example.com
    privateKeySecretRef:
      name: letsencrypt-prod
    solvers:
    - http01:
        ingress:
          class: nginx
EOF
5. Set Up Monitoring
Prometheus and Grafana
bash
# Add Prometheus Helm repo
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
# Install Prometheus stack
helm install prometheus prometheus-community/kube-prometheus-stack \
--namespace monitoring \
--create-namespace
# Access Grafana
kubectl port-forward -n monitoring svc/prometheus-grafana 3000:80
# Default credentials: admin / prom-operator
6. Configure Resource Quotas
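The quota in this step caps total CPU requests in the namespace at 10 cores, so replica counts multiplied by per-pod requests have to stay under that figure. The helper below is a rough sketch for checking this by hand; `cpu_millicores` and `fits_cpu_quota` are hypothetical names, and only whole-core values (`2`) and millicore values (`500m`) are handled.

```bash
# Convert a Kubernetes CPU quantity ("2" or "500m") to millicores.
cpu_millicores() {
  case "$1" in
    *m) printf '%s\n' "${1%m}" ;;
    *)  printf '%s\n' "$(( $1 * 1000 ))" ;;
  esac
}

# Usage: fits_cpu_quota QUOTA REPLICAS PER_POD_REQUEST
# Succeeds if REPLICAS x PER_POD_REQUEST fits within QUOTA.
fits_cpu_quota() {
  quota=$(cpu_millicores "$1")
  request=$(cpu_millicores "$3")
  [ "$(( $2 * request ))" -le "$quota" ]
}

# Example: 5 replicas requesting 500m each against the 10-core quota.
if fits_cpu_quota 10 5 500m; then
  echo "fits within quota"
else
  echo "exceeds quota"
fi
```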
bash
# Create resource quota for Qelos namespace
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ResourceQuota
metadata:
  name: qelos-quota
  namespace: qelos
spec:
  hard:
    requests.cpu: "10"
    requests.memory: 20Gi
    limits.cpu: "20"
    limits.memory: 40Gi
    persistentvolumeclaims: "10"
EOF
Deploying Qelos with Helm
1. Prepare Helm Values
Create your values.yaml file (see GitHub Fork Setup for details):
bash
cp helm/qelos/values.yaml.github.tpl helm/qelos/values-production.yaml
# Edit values-production.yaml with your configuration
2. Install Qelos
bash
# Install Qelos in the qelos namespace
helm install qelos ./helm/qelos \
-f ./helm/qelos/values-production.yaml \
--namespace qelos \
--create-namespace
# Or upgrade if already installed
helm upgrade --install qelos ./helm/qelos \
-f ./helm/qelos/values-production.yaml \
--namespace qelos
3. Verify Deployment
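Rather than polling `kubectl get pods` by hand, the checks in this step can be preceded by a blocking wait. This is a minimal sketch, assuming every Deployment in the namespace is expected to become Available; `wait_for_deployments` is a hypothetical helper and the 300-second default is an arbitrary choice.

```bash
# Usage: wait_for_deployments [NAMESPACE] [TIMEOUT]
# Block until every Deployment in the namespace reports Available,
# or fail once the timeout expires.
wait_for_deployments() {
  ns=${1:-qelos}
  timeout=${2:-300s}
  kubectl wait deployment --all \
    --for=condition=Available \
    --namespace="$ns" \
    --timeout="$timeout"
}
```

Calling `wait_for_deployments` right after `helm upgrade --install` gives a non-zero exit status when something fails to come up, which is convenient in CI scripts.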
bash
# Check all pods are running
kubectl get pods -n qelos
# Check services
kubectl get svc -n qelos
# Check deployments
kubectl get deployments -n qelos
# View logs for a specific service
kubectl logs -f deployment/gateway-deployment -n qelos
Cluster Management Tasks
Scaling Services
Manual Scaling
bash
# Scale a specific service
kubectl scale deployment gateway-deployment --replicas=5 -n qelos
# Scale using Helm
helm upgrade qelos ./helm/qelos \
-f ./helm/qelos/values-production.yaml \
--set gateway.replicas=5 \
--namespace qelos
Horizontal Pod Autoscaling (HPA)
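The `--cpu-percent=70` target used in this section drives a proportional rule: the autoscaler aims for roughly `ceil(currentReplicas * currentUtilization / targetUtilization)` replicas, clamped to the configured minimum and maximum. The function below is an illustrative sketch of that arithmetic only; `hpa_desired_replicas` is a hypothetical name, and the real controller also applies a tolerance band and stabilization windows that are ignored here.

```bash
# Usage: hpa_desired_replicas CURRENT_REPLICAS CURRENT_CPU_PCT TARGET_CPU_PCT MIN MAX
hpa_desired_replicas() {
  # Integer ceiling of current * utilization / target.
  desired=$(( ($1 * $2 + $3 - 1) / $3 ))
  if [ "$desired" -lt "$4" ]; then desired=$4; fi
  if [ "$desired" -gt "$5" ]; then desired=$5; fi
  printf '%s\n' "$desired"
}

# 4 pods averaging 90% CPU against a 70% target, bounds 2..10: prints 6
hpa_desired_replicas 4 90 70 2 10
```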
bash
# Create HPA for gateway service
kubectl autoscale deployment gateway-deployment \
--cpu-percent=70 \
--min=2 \
--max=10 \
-n qelos
# View HPA status
kubectl get hpa -n qelos
# Describe HPA
kubectl describe hpa gateway-deployment -n qelos
Updating Services
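The rolling-update and rollback commands in this section can be chained so that a failed rollout is undone automatically. This is a sketch under assumptions, not a Qelos-provided script: `update_or_rollback` is a hypothetical helper, the container name is assumed to match the one in the Deployment, and the 120-second timeout is arbitrary.

```bash
# Usage: update_or_rollback DEPLOYMENT CONTAINER IMAGE [NAMESPACE]
update_or_rollback() {
  ns=${4:-qelos}
  kubectl set image "deployment/$1" "$2=$3" -n "$ns" || return 1
  # rollout status exits non-zero if the rollout does not finish in time.
  if kubectl rollout status "deployment/$1" -n "$ns" --timeout=120s; then
    echo "rollout of $1 succeeded"
  else
    echo "rollout of $1 failed, rolling back" >&2
    kubectl rollout undo "deployment/$1" -n "$ns"
    return 1
  fi
}
```

The function returns a non-zero status after a rollback, so a calling script can still tell that the new image never went live.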
Rolling Updates
bash
# Update image for a service
kubectl set image deployment/gateway-deployment \
gateway=ghcr.io/your-username/qelos/gateway:v2.0.0 \
-n qelos
# Check rollout status
kubectl rollout status deployment/gateway-deployment -n qelos
# View rollout history
kubectl rollout history deployment/gateway-deployment -n qelos
Rollback
bash
# Rollback to previous version
kubectl rollout undo deployment/gateway-deployment -n qelos
# Rollback to specific revision
kubectl rollout undo deployment/gateway-deployment --to-revision=2 -n qelos
Managing Secrets
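The `your-secret` and `your-password` values in this section are placeholders. One common way to produce a strong random value is to read bytes from `/dev/urandom` and base64-encode them. The sketch below assumes a Unix-like system with `head`, `base64`, and `tr` available; `random_secret` is a hypothetical helper name.

```bash
# Usage: random_secret [NUM_BYTES]
# Print a random secret built from NUM_BYTES bytes of /dev/urandom (default 32),
# base64-encoded so it is safe to pass on a command line.
random_secret() {
  head -c "${1:-32}" /dev/urandom | base64 | tr -d '\n'
  echo
}

# Example (not run here): feed a generated value into the secret.
# kubectl create secret generic qelos-secrets \
#   --from-literal=jwt-secret="$(random_secret)" -n qelos
```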
bash
# Create a secret
kubectl create secret generic qelos-secrets \
--from-literal=jwt-secret=your-secret \
--from-literal=db-password=your-password \
-n qelos
# Update a secret in place (avoids a window in which the secret does not exist)
kubectl create secret generic qelos-secrets \
--from-literal=jwt-secret=new-secret \
--from-literal=db-password=new-password \
-n qelos \
--dry-run=client -o yaml | kubectl apply -f -
# View secrets (base64 encoded)
kubectl get secret qelos-secrets -o yaml -n qelos
# Decode a secret
kubectl get secret qelos-secrets -o jsonpath='{.data.jwt-secret}' -n qelos | base64 --decode
Managing ConfigMaps
bash
# Create a ConfigMap
kubectl create configmap qelos-config \
--from-literal=environment=production \
--from-literal=log-level=info \
-n qelos
# Update a ConfigMap
kubectl edit configmap qelos-config -n qelos
# View ConfigMap
kubectl get configmap qelos-config -o yaml -n qelos
Backup and Restore
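As an alternative to writing a dump inside the pod and copying it out, `mongodump` can stream a compressed archive to stdout, which `kubectl exec` passes straight to a local file. A minimal sketch: `backup_mongodb` is a hypothetical helper, and `deploy/mongodb-deployment` is an assumed Deployment name inferred from the pod names used in this section. Adjust it, and add credentials if your MongoDB requires authentication.

```bash
# Usage: backup_mongodb [NAMESPACE] [OUTPUT_FILE]
backup_mongodb() {
  ns=${1:-qelos}
  out=${2:-backup-$(date +%Y%m%d).archive.gz}
  # No -t flag here: allocating a TTY would corrupt the binary stream.
  kubectl exec -n "$ns" deploy/mongodb-deployment -- \
    mongodump --archive --gzip > "$out" || return 1
  printf '%s\n' "$out"
}
```

The matching restore would pipe the file back in with `kubectl exec -i ... -- mongorestore --archive --gzip < FILE`.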
Backup MongoDB Data
bash
# Port forward to MongoDB
kubectl port-forward svc/mongodb-service 27017:27017 -n qelos
# In another terminal, backup using mongodump
mongodump --host localhost --port 27017 --out ./backup-$(date +%Y%m%d)
# Or exec into MongoDB pod
kubectl exec -it mongodb-deployment-xxx -n qelos -- mongodump --out /tmp/backup
kubectl cp qelos/mongodb-deployment-xxx:/tmp/backup ./backup-$(date +%Y%m%d)
Restore MongoDB Data
bash
# Restore using mongorestore
mongorestore --host localhost --port 27017 ./backup-20240101
# Or exec into MongoDB pod
kubectl cp ./backup-20240101 qelos/mongodb-deployment-xxx:/tmp/backup
kubectl exec -it mongodb-deployment-xxx -n qelos -- mongorestore /tmp/backup
Backup Persistent Volumes
bash
# Create a snapshot (cloud provider specific)
# GKE example:
gcloud compute disks snapshot DISK_NAME --snapshot-names=qelos-backup-$(date +%Y%m%d)
# AWS example:
aws ec2 create-snapshot --volume-id vol-xxxxx --description "Qelos backup $(date +%Y%m%d)"
Monitoring and Logging
View Logs
bash
# View logs for a pod
kubectl logs pod-name -n qelos
# Follow logs
kubectl logs -f pod-name -n qelos
# View logs via a deployment (kubectl picks one of its pods, not all of them)
kubectl logs -f deployment/gateway-deployment -n qelos
# View logs for a specific container in a pod
kubectl logs pod-name -c container-name -n qelos
# View previous container logs (if crashed)
kubectl logs pod-name --previous -n qelos
Resource Usage
bash
# View resource usage for nodes
kubectl top nodes
# View resource usage for pods
kubectl top pods -n qelos
# View detailed resource usage
kubectl describe node node-name
Events
bash
# View cluster events
kubectl get events -n qelos --sort-by='.lastTimestamp'
# Watch events in real-time
kubectl get events -n qelos --watch
Troubleshooting
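A quick first pass before the per-area checks in this section is to list only the pods that are not healthy. The sketch below uses a field selector on `status.phase`; `unhealthy_pods` is a hypothetical helper name. A pod stuck in `CrashLoopBackOff` can still report phase `Running`, so this filter is a starting point rather than a complete health check.

```bash
# Usage: unhealthy_pods [NAMESPACE]
# List pods whose phase is neither Running nor Succeeded.
unhealthy_pods() {
  kubectl get pods -n "${1:-qelos}" \
    --field-selector='status.phase!=Running,status.phase!=Succeeded'
}
```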
Pod Issues
bash
# Describe a pod to see events and status
kubectl describe pod pod-name -n qelos
# Check pod status
kubectl get pods -n qelos -o wide
# Get pod YAML
kubectl get pod pod-name -n qelos -o yaml
# Execute commands in a pod
kubectl exec -it pod-name -n qelos -- /bin/sh
# Check pod resource usage
kubectl top pod pod-name -n qelos
Service Issues
bash
# Check service endpoints
kubectl get endpoints -n qelos
# Describe a service
kubectl describe svc service-name -n qelos
# Test service connectivity from within cluster
kubectl run test-pod --image=busybox -it --rm -n qelos -- wget -O- http://service-name:port
Network Issues
bash
# Check network policies
kubectl get networkpolicies -n qelos
# Test DNS resolution
kubectl run test-pod --image=busybox -it --rm -n qelos -- nslookup service-name
# Check ingress
kubectl get ingress -n qelos
kubectl describe ingress ingress-name -n qelos
Storage Issues
bash
# Check persistent volumes
kubectl get pv
# Check persistent volume claims
kubectl get pvc -n qelos
# Describe a PVC
kubectl describe pvc pvc-name -n qelos
Cluster Maintenance
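Node maintenance usually follows a drain, work, uncordon sequence, and it is easy to forget the last step when the work in the middle fails. The wrapper below is a sketch of one way to guard against that; `with_node_drained` is a hypothetical helper, and the maintenance command passed to it is whatever applies in your environment.

```bash
# Usage: with_node_drained NODE COMMAND [ARGS...]
with_node_drained() {
  node=$1
  shift
  kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data || return 1
  if "$@"; then status=0; else status=$?; fi
  # Always make the node schedulable again, even if the command failed.
  kubectl uncordon "$node"
  return "$status"
}
```

The wrapper passes the maintenance command's exit status through, so a caller can still react to a failed step after the node is back in rotation.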
Node Maintenance
bash
# Drain a node (before maintenance)
kubectl drain node-name --ignore-daemonsets --delete-emptydir-data
# Mark node as unschedulable without evicting pods (drain already does this)
kubectl cordon node-name
# Mark node as schedulable
kubectl uncordon node-name
Cluster Upgrades
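Kubernetes control planes are upgraded one minor version at a time, so moving from, say, 1.24 to 1.28 means stepping through each release in between. The helper below only enumerates those steps; `upgrade_path` is a hypothetical name, and it assumes both versions share the same major number.

```bash
# Usage: upgrade_path CURRENT TARGET   (both given as "major.minor")
upgrade_path() {
  major=${1%%.*}
  minor=${1#*.}
  target_minor=${2#*.}
  path=""
  while [ "$minor" -lt "$target_minor" ]; do
    minor=$(( minor + 1 ))
    path="$path $major.$minor"
  done
  # Strip the leading space before printing.
  printf '%s\n' "${path# }"
}

# Prints: 1.25 1.26 1.27 1.28
upgrade_path 1.24 1.28
```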
bash
# Check current version
kubectl version
# Upgrade cluster (managed services)
# GKE:
gcloud container clusters upgrade qelos-cluster --master --cluster-version 1.28
# EKS (without --approve, eksctl only prints the planned changes):
eksctl upgrade cluster --name qelos-cluster --version 1.28 --approve
# AKS:
az aks upgrade --resource-group qelos-rg --name qelos-cluster --kubernetes-version 1.28.0
Cleanup
bash
# Delete failed pods
kubectl delete pod --field-selector=status.phase=Failed -n qelos
# Delete completed jobs
kubectl delete job --field-selector=status.successful=1 -n qelos
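# Clean up unused images on a node (the kubelet also garbage-collects images on its own).
# Assumes crictl is installed on the host; the debug pod mounts the node's filesystem at /host.
kubectl debug node/node-name -it --image=alpine -- chroot /host crictl rmi --prune
Security Best Practices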
Network Policies
bash
# Example: Restrict traffic to MongoDB
cat <<EOF | kubectl apply -f -
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: mongodb-policy
  namespace: qelos
spec:
  podSelector:
    matchLabels:
      app: mongodb
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          access-mongodb: "true"
    ports:
    - protocol: TCP
      port: 27017
EOF
RBAC Configuration
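Once the role and binding from this section are applied, `kubectl auth can-i` with impersonation offers a way to confirm what the service account can and cannot do. A small sketch: `sa_can` is a hypothetical wrapper, and running it requires that your own user is allowed to impersonate service accounts.

```bash
# Usage: sa_can VERB RESOURCE [SERVICE_ACCOUNT] [NAMESPACE]
sa_can() {
  sa=${3:-qelos-sa}
  ns=${4:-qelos}
  kubectl auth can-i "$1" "$2" \
    --as="system:serviceaccount:$ns:$sa" -n "$ns"
}

# With only the role defined in this section, one would expect:
#   sa_can list pods     -> yes
#   sa_can delete pods   -> no
```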
bash
# Create a service account
kubectl create serviceaccount qelos-sa -n qelos
# Create a role
cat <<EOF | kubectl apply -f -
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: qelos-role
  namespace: qelos
rules:
- apiGroups: [""]
  resources: ["pods", "services"]
  verbs: ["get", "list", "watch"]
EOF
# Bind role to service account
kubectl create rolebinding qelos-binding \
--role=qelos-role \
--serviceaccount=qelos:qelos-sa \
-n qelos
Pod Security Standards
bash
# Label namespace with pod security standard
kubectl label namespace qelos \
pod-security.kubernetes.io/enforce=restricted \
pod-security.kubernetes.io/audit=restricted \
pod-security.kubernetes.io/warn=restricted
Cost Optimization
Resource Right-Sizing
bash
# Analyze resource usage
kubectl top pods -n qelos --containers
# Use Vertical Pod Autoscaler (VPA) for recommendations
kubectl describe vpa vpa-name -n qelos
Node Auto-Scaling
bash
# Enable cluster autoscaler (cloud provider specific)
# GKE:
gcloud container clusters update qelos-cluster \
--enable-autoscaling \
--min-nodes 2 \
--max-nodes 10 \
--zone us-central1-a
Spot/Preemptible Instances
bash
# Create a node pool with preemptible VMs (GKE example; use --spot instead for Spot VMs)
gcloud container node-pools create spot-pool \
--cluster qelos-cluster \
--preemptible \
--num-nodes 2 \
--zone us-central1-a
Next Steps
- Production Guide - Production deployment best practices
- Troubleshooting - Common issues and solutions
- Configuration - Advanced configuration options