Kubernetes: Difference between revisions

From David's Wiki
 
(88 intermediate revisions by the same user not shown)
Line 1: Line 1:
Kubernetes, also known as K8s, is a container orchestration service by Google.
Kubernetes, also known as K8s, is an open-source container orchestration system originally developed by Google.<br>
It supposedly has a harder learning curve than docker-swarm but is heavily inspired by Google's internal [https://research.google/pubs/pub43438/#:~:text=Google's%20Borg%20system%20is%20a,tens%20of%20thousands%20of%20machines. borg system].
This means it runs containers across a cluster of machines for you and handles networking and container failures.<br>
This document contains notes on both administering a self-hosted Kubernetes cluster and deploying applications to one.


==Getting Started==
==Getting Started==
===Background===
===Background===
Kubernetes runs applications across nodes which are physical or virtual machines.<br>
Kubernetes runs applications across '''nodes''' which are (physical or virtual) Linux machines.<br>
Each node contains a kubelet process, a container runtime, and possibly one or more pods.<br>
Each node contains a kubelet process, a container runtime (typically containerd), and any running pods.<br>
Pods contain resources needed to host your application including volumes and one or more containers.
'''Pods''' contain resources needed to host your application including volumes and containers.<br>
Typically you will want one container per pod since deployments scale by creating multiple pods.<br>
A '''deployment''' is a rule which spawns and manages pods.<br>
A '''service''' is a networking rule which allows connecting to pods.


Typically you will want one container per pod since deployments scale by creating multiple pods.
In addition to standard Kubernetes objects, '''operators''' watch for and allow you to instantiate custom resources (CR).
 
==Administration==
Notes on administering Kubernetes clusters.


===Installation===
===Installation===
Line 15: Line 22:


====kubeadm====
====kubeadm====
kubeadm install
Deploy a Kubernetes cluster using kubeadm
{{hidden | Install Commands |
{{hidden | Install Commands |
<pre>
[https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/ Install Kubeadm]
<syntaxhighlight lang="bash">
KUBE_VERSION=1.23.1-00
# Setup docker repos and install containerd.io
# Setup docker repos and install containerd.io
sudo apt update
sudo apt update
Line 46: Line 55:
echo "deb [signed-by=/usr/share/keyrings/kubernetes-archive-keyring.gpg] https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list
echo "deb [signed-by=/usr/share/keyrings/kubernetes-archive-keyring.gpg] https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list
sudo apt-get update
sudo apt-get update
sudo apt-get install -y kubelet kubeadm kubectl
sudo apt-get install -y kubelet=$KUBE_VERSION kubeadm=$KUBE_VERSION kubectl=$KUBE_VERSION
sudo apt-mark hold kubelet kubeadm kubectl
sudo apt-mark hold kubelet kubeadm kubectl
</syntaxhighlight>


;Install Containerd
<syntaxhighlight lang="bash">
sudo apt-get remove docker docker-engine docker.io containerd runc
sudo apt-get update
sudo apt-get install ca-certificates curl gnupg lsb-release
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu \
  $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install containerd.io
</syntaxhighlight>


 
;Setup containerd
;[https://kubernetes.io/docs/setup/production-environment/container-runtimes/ Container runtimes]
<syntaxhighlight lang="bash">
# Configure containerd
# Configure containerd
cat <<EOF | sudo tee /etc/modules-load.d/containerd.conf
cat <<EOF | sudo tee /etc/modules-load.d/containerd.conf
Line 74: Line 97:
sudo systemctl restart containerd
sudo systemctl restart containerd


# Systemd cgroup
# See https://kubernetes.io/docs/setup/production-environment/container-runtimes/#containerd-systemd
sudo vim /etc/containerd/config.toml
sudo sed -i '/\[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options\]/a \ \ \ \ \ \ \ \ \ \ \ \ SystemdCgroup = true' /etc/containerd/config.toml
 
# Under this line, add the line below.
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
#    SystemdCgroup = true
sudo systemctl restart containerd
sudo systemctl restart containerd
</pre>
</syntaxhighlight>
}}
}}


{{hidden | Control Plane Init |
{{hidden | Control Plane Init |
<pre>
<syntaxhighlight lang="bash">
# Disable swap
# Disable swap
sudo swapoff -a
sudo swapoff -a && sudo sed -i '/swap/s/^/#/' /etc/fstab
# Comment out any swap in /etc/fstab
sudo kubeadm init \
sudo kubeadm init \
   --cri-socket=/run/containerd/containerd.sock \
   --cri-socket=/run/containerd/containerd.sock \
   --pod-network-cidr=10.0.0.0/16
   --pod-network-cidr=10.0.0.0/16
    
    
# (Optional) Remove taint on control-node to allow job scheduling
# Note: newer Kubernetes releases name this taint node-role.kubernetes.io/control-plane
kubectl taint nodes --all node-role.kubernetes.io/master-
</syntaxhighlight>
}}
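The <code>sed</code> expression used above prefixes every line of <code>/etc/fstab</code> that mentions <code>swap</code> with <code>#</code>. A minimal sketch for previewing the result before editing the real file; the function name <code>preview_swap_disabled</code> is made up for illustration:
<syntaxhighlight lang="bash">
# Print FILE with every line containing "swap" commented out.
# Same sed expression as above, but without -i, so FILE itself is left unchanged.
preview_swap_disabled() {
  sed '/swap/s/^/#/' "$1"
}

# Example: compare against the live file before editing it in place.
# preview_swap_disabled /etc/fstab | diff /etc/fstab -
</syntaxhighlight>
Note that the pattern matches any line containing the word swap, so a line that is already commented out will simply gain a second <code>#</code>.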
{{hidden | Setup Networking With Calico |
After creating your control plane, you need to deploy a network plugin.<br>
Popular choices are Calico and Flannel.<br>
See [https://projectcalico.docs.tigera.io/getting-started/kubernetes/quickstart Quickstart]
<syntaxhighlight lang="bash">
# Setup calico networking
# Setup calico networking
kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml
kubectl create -f https://projectcalico.docs.tigera.io/manifests/tigera-operator.yaml
kubectl create -f -<<EOF
apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  name: default
spec:
  # Configures Calico networking.
  calicoNetwork:
    # Note: The ipPools section cannot be modified post-install.
    ipPools:
    - blockSize: 26
      cidr: 10.0.0.0/16
      encapsulation: VXLANCrossSubnet
      natOutgoing: Enabled
      nodeSelector: all()
    nodeAddressAutodetectionV4:
      canReach: "192.168.1.1"
---
apiVersion: operator.tigera.io/v1
kind: APIServer
metadata:
  name: default
spec: {}
EOF
</syntaxhighlight>
 
;Notes
* [https://stackoverflow.com/questions/57504063/calico-kubernetes-pods-cant-ping-each-other-use-cluster-ip https://stackoverflow.com/questions/57504063/calico-kubernetes-pods-cant-ping-each-other-use-cluster-ip]
}}
{{hidden | Load Balancer (MetalLB) |
See https://metallb.universe.tf/installation/.<br>
<syntaxhighlight lang="bash">
helm repo add metallb https://metallb.github.io/metallb
helm upgrade --install --create-namespace -n metallb metallb metallb/metallb
 
cat <<EOF >ipaddresspool.yaml
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: default
  namespace: metallb
spec:
  addresses:
  - 192.168.1.2-192.168.1.11
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: example
  namespace: metallb
EOF
 
kubectl apply -f ipaddresspool.yaml
</syntaxhighlight>
}}
{{hidden | Ingress Controller (ingress-nginx) |
The ingress controller forwards HTTP requests to the appropriate service based on your ingress rules.<br>
See https://kubernetes.github.io/ingress-nginx/.
}}
{{hidden | cert-manager |
See https://cert-manager.io/docs/installation/helm/


# (Optional) Remove taint on control-node to allow job scheduling
You may also want to set up DNS challenges to support wildcard certificates.<br>
kubectl taint nodes --all node-role.kubernetes.io/master-
See https://cert-manager.io/docs/configuration/acme/dns01/cloudflare/ if you are using Cloudflare.
</pre>
}}
}}
{{hidden | Add worker nodes |
{{hidden | Add worker nodes |
Run the following on worker nodes.
Run the following on worker nodes.
<pre>
<syntaxhighlight lang="bash">
# Disable swap
# Disable swap
sudo swapoff -a
sudo swapoff -a && sudo sed -i '/swap/s/^/#/' /etc/fstab
# Comment out any swap in /etc/fstab
# Add the line to join the cluster here
# Add the line to join the cluster here
# kubeadm join <ip>:6443 --token <...> --discovery-token-ca-cert-hash <...>
# kubeadm join <ip>:6443 --token <...> --discovery-token-ca-cert-hash <...>
</pre>
</syntaxhighlight>
}}
}}


;Notes
===Certificates===
* [https://stackoverflow.com/questions/57504063/calico-kubernetes-pods-cant-ping-each-other-use-cluster-ip https://stackoverflow.com/questions/57504063/calico-kubernetes-pods-cant-ping-each-other-use-cluster-ip]
[https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-certs/ Certificate Management with kubeadm]<br>
Kubernetes requires several TLS certificates which are automatically generated by kubeadm.
These expire after one year but are automatically renewed whenever you upgrade your cluster with <code>kubeadm upgrade apply</code>.
 
To renew the certificates manually, run <code>kubeadm certs renew all</code> and restart your control plane services.
Note that if you let the certificates expire, you will need to set up kubectl again.
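
The manual renewal flow can be sketched as below. The <code>run</code> wrapper, the <code>DRY_RUN</code> variable, and the function name are illustrative helpers and not part of kubeadm; the kubeconfig path <code>/etc/kubernetes/admin.conf</code> is assumed to be the kubeadm default.
<syntaxhighlight lang="bash">
# Run a command, or only print it when DRY_RUN=1 (illustrative helper).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Check expiry, renew all certificates, then refresh the local kubeconfig.
renew_kubeadm_certs() {
  run sudo kubeadm certs check-expiration
  run sudo kubeadm certs renew all
  # Restart the control plane services here so they pick up the new certificates.
  run mkdir -p "$HOME/.kube"
  run sudo cp /etc/kubernetes/admin.conf "$HOME/.kube/config"
  run sudo chown "$(id -u):$(id -g)" "$HOME/.kube/config"
}
</syntaxhighlight>
Calling <code>DRY_RUN=1 renew_kubeadm_certs</code> prints the commands without executing them, which is a cheap way to review the steps first.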


===Pods per node===
===Pods per node===
[http://blog.schoolofdevops.com/how-to-increase-the-number-of-pods-limit-per-kubernetes-node/ increase pods per node]<br>
[http://blog.schoolofdevops.com/how-to-increase-the-number-of-pods-limit-per-kubernetes-node/ How to increase pods per node]<br>
By default, Kubernetes allows 110 pods per node.<br>
By default, Kubernetes allows 110 pods per node.<br>
You may increase this up to a limit of 255 with the default networking subnet.<br>
You may increase this up to a limit of 255 with the default networking subnet.<br>
For reference, GCP GKE uses 110 pods per node and AWS EKS uses 250 pods per node.
For reference, GCP GKE uses 110 pods per node and AWS EKS uses 250 pods per node.
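
The ceiling is easiest to see as address arithmetic. A rough sketch, assuming the "default networking subnet" refers to a /24 pod range per node; the function name is made up for illustration:
<syntaxhighlight lang="bash">
# Number of IPv4 addresses in a subnet with the given prefix length.
cidr_addresses() {
  echo $(( 1 << (32 - $1) ))
}

cidr_addresses 24   # per-node /24, prints 256
cidr_addresses 16   # the 10.0.0.0/16 pod network used earlier, prints 65536
</syntaxhighlight>
Under that assumption, a /24 holds 256 addresses, which is roughly where the per-node ceiling comes from. The kubelet's <code>maxPods</code> setting is the value you would raise to move from 110 towards it.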
===Changing Master Address===
See https://ystatit.medium.com/how-to-change-kubernetes-kube-apiserver-ip-address-402d6ddb8aa2


==kubectl==
==kubectl==
Line 124: Line 219:


===nodes===
===nodes===
<pre>
<syntaxhighlight lang="bash">
kubectl get nodes
kubectl get nodes
</pre>
 
# Drain evicts all pods from a node. DaemonSet pods cannot be evicted, so skip them.
kubectl drain "$NODE_NAME" --ignore-daemonsets
# Uncordon to reenable scheduling
kubectl uncordon $NODE_NAME
</syntaxhighlight>


===pods===
===pods===
<pre>
<syntaxhighlight lang="bash">
# List all pods
kubectl get pods
kubectl get pods
kubectl describe pods
kubectl describe pods
# List pods and node name
kubectl get pods -o=custom-columns='NAME:metadata.name,Node:spec.nodeName'


# Access a port on a pod
# Access a port on a pod
kubectl port-forward <pod> <localport:podport>
kubectl port-forward <pod> <localport:podport>
</pre>
</syntaxhighlight>


===deployment===
===deployment===
<pre>
<syntaxhighlight lang="bash">
kubectl get deployments
kubectl get deployments
kubectl logs $POD_NAME
kubectl logs $POD_NAME
Line 145: Line 249:
# For one-off deployments of an image.
# For one-off deployments of an image.
kubectl create deployment <name> --image=<image> [--replicas=1]
kubectl create deployment <name> --image=<image> [--replicas=1]
</pre>
</syntaxhighlight>


===proxy===
===proxy===
<pre>
<syntaxhighlight lang="bash">
kubectl proxy
kubectl proxy
</pre>
</syntaxhighlight>


===service===
===service===
Services handle routing to your pods.
Services handle routing to your pods.
<pre>
<syntaxhighlight lang="bash">
kubectl get services
kubectl get services


kubectl expose deployment/<name> --type=<type> --port <port>
kubectl expose deployment/<name> --type=<type> --port <port>
kubectl describe services/<name>
kubectl describe services/<name>
 
</syntaxhighlight>
</pre>


===run===
===run===
[https://gc-taylor.com/blog/2016/10/31/fire-up-an-interactive-bash-pod-within-a-kubernetes-cluster https://gc-taylor.com/blog/2016/10/31/fire-up-an-interactive-bash-pod-within-a-kubernetes-cluster]<br>
[https://gc-taylor.com/blog/2016/10/31/fire-up-an-interactive-bash-pod-within-a-kubernetes-cluster https://gc-taylor.com/blog/2016/10/31/fire-up-an-interactive-bash-pod-within-a-kubernetes-cluster]<br>
<pre>
<syntaxhighlight lang="bash">
# Throw up a ubuntu container
# Throw up a ubuntu container
kubectl run my-shell --rm -i --tty --image ubuntu -- bash
kubectl run my-shell --rm -i --tty --image ubuntu -- bash
</pre>
kubectl run busybox-shell --rm -i --tty --image odise/busybox-curl -- sh
</syntaxhighlight>
 
==Deployments==
In most cases, you will use deployments to provision pods.<br>
Deployments internally use replicasets to create multiple identical pods.<br>
This is great for things such as webservers or standalone services which are not stateful.
In most cases, you can stick a service in front which will round-robin requests to different pods in your deployment.
 
{{hidden | Example Deployment |
<syntaxhighlight lang="yaml">
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nextcloud-app
  labels:
    app: nextcloud
spec:
  replicas: 1
  selector:
    matchLabels:
      pod-label: nextcloud-app-pod
  template:
    metadata:
      labels:
        pod-label: nextcloud-app-pod
    spec:
      containers:
        - name: nextcloud
          image: public.ecr.aws/docker/library/nextcloud:stable
          ports:
            - containerPort: 80
          env:
            - name: MYSQL_HOST
              value: nextcloud-db-service
            - name: MYSQL_DATABASE
              value: nextcloud
            - name: MYSQL_USER
              valueFrom:
                secretKeyRef:
                  name: nextcloud-db-credentials
                  key: username
            - name: MYSQL_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: nextcloud-db-credentials
                  key: password
          volumeMounts:
            - name: nextcloud-app-storage
              mountPath: /var/www/html
      volumes:
        - name: nextcloud-app-storage
          persistentVolumeClaim:
            claimName: nextcloud-app-pvc
</syntaxhighlight>
}}
 
==StatefulSets==
[https://kubernetes.io/docs/tutorials/stateful-application/basic-stateful-set/ StatefulSets basics]<br>
Stateful sets are useful when you need a fixed number of pods with stable identities such as databases.<br>
Pods created by stateful sets have a unique number suffix which allows you to query a specific pod.<br>
Typically, you will want to use a headless service (i.e. without ClusterIP) to give a local DNS record to each pod.
 
In most cases, you will want to look for a helm chart instead of creating your own stateful sets.
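
As an illustration of the stable identities, each pod of a stateful set is named after the set plus an ordinal starting at 0, and behind a headless service it is resolvable at <code>POD.SERVICE.NAMESPACE.svc.cluster.local</code>. A small sketch that prints these names; the function name and the <code>db</code> / <code>db-headless</code> names are hypothetical, and the default <code>cluster.local</code> cluster domain is assumed:
<syntaxhighlight lang="bash">
# Print the in-cluster DNS name of every pod in a stateful set.
# Args: statefulset name, headless service name, namespace, replica count.
statefulset_pod_dns() {
  local sts="$1" svc="$2" ns="$3" replicas="$4" i
  for (( i = 0; i < replicas; i++ )); do
    echo "${sts}-${i}.${svc}.${ns}.svc.cluster.local"
  done
}

# Prints db-0..., db-1... and db-2... for a three-replica set.
statefulset_pod_dns db db-headless default 3
</syntaxhighlight>
This is what lets a client address one specific replica, for example a database primary, instead of being load-balanced across all of them.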


==Services==
==Services==
Line 173: Line 339:


Services handle networking.   
Services handle networking.   
For self-hosted/bare metal deployments, there are two types of services.
For self-hosted/bare metal clusters, the following types of services are available.
* ClusterIP - This creates an IP address on the internal cluster which nodes and pods on the cluster can access. (Default)
* ClusterIP - This creates an IP address on the internal cluster which nodes and pods on the cluster can access. (Default)
* NodePort - This exposes the port on every node. It implicitly creates a ClusterIP and every node will route to that. This allows access from outside the cluster.
* NodePort - This exposes the port on every node. It implicitly creates a ClusterIP and every node will route to that. This allows access from outside the cluster.
* ExternalName - uses a CNAME record. Primarily for accessing other services from within the cluster.
* ExternalName - uses a CNAME record. Primarily for accessing other services from within the cluster.
* LoadBalancer - Creates a clusterip+nodeport and tells the loadbalancer to create an IP and route it to the nodeport.
** On bare-metal clusters you will need to install a loadbalancer such as metallb.


On managed deployments (e.g. AWS EKS, GKE) you also have
By default, ClusterIP is provided by <code>kube-proxy</code> and performs round-robin load-balancing to pods.<br>
* LoadBalancer - fires up the provider's load balancer
For exposing non-HTTP(S) production services, you typically will use a LoadBalancer service.<br>
For HTTPS services, you will typically use an ingress.


By default, ClusterIP is provided by <code>kube-proxy</code> and performs round-robin load-balancing to pods.
{{ hidden | Example ClusterIP Service |
<syntaxhighlight lang="yaml">
apiVersion: v1
kind: Service
metadata:
  name: pwiki-app-service
spec:
  type: ClusterIP
  selector:
    pod-label: pwiki-app-pod
  ports:
    - protocol: TCP
      port: 80
      targetPort: 3000
</syntaxhighlight>
}}
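For comparison, a LoadBalancer service for a non-HTTP workload differs from the ClusterIP example mainly in its <code>type</code>. The sketch below only prints such a manifest so it can be reviewed first. The function name, the <code>mqtt</code> names, and port 1883 are placeholders, and a load balancer such as MetalLB is assumed to be installed already:
<syntaxhighlight lang="bash">
# Print a LoadBalancer Service manifest to stdout.
# Args: service name, pod-label value to select, TCP port.
loadbalancer_service() {
  cat <<EOF
apiVersion: v1
kind: Service
metadata:
  name: $1
spec:
  type: LoadBalancer
  selector:
    pod-label: $2
  ports:
    - protocol: TCP
      port: $3
      targetPort: $3
EOF
}

loadbalancer_service mqtt-service mqtt-pod 1883
</syntaxhighlight>
Once the output looks right, it can be piped to <code>kubectl apply -f -</code>.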


==Ingress==
==Ingress==
[https://kubernetes.io/docs/concepts/services-networking/ingress/ Ingress | Kubernetes]<br>
[https://kubernetes.io/docs/concepts/services-networking/ingress/ Ingress | Kubernetes]<br>
Ingress is equivalent to having a load-balancer / reverse-proxy pod with a NodePort service.
An ingress is an HTTP endpoint. This configures an ingress controller which is a load-balancer or reverse-proxy pod that integrates with Kubernetes.

A common ingress controller is [https://github.com/kubernetes/ingress-nginx ingress-nginx] which is maintained by the Kubernetes team. Alternatives include [https://docs.nginx.com/nginx-ingress-controller/installation/installing-nic/installation-with-helm/ nginx-ingress], [https://doc.traefik.io/traefik/providers/kubernetes-ingress/ traefik], [https://haproxy-ingress.github.io/ haproxy-ingress], and [https://kubernetes.io/docs/concepts/services-networking/ingress-controllers/ others].
 
===Installing ingress-nginx===
See [https://kubernetes.github.io/ingress-nginx/deploy/ ingress-nginx] to deploy an ingress controller.<br>
Note that <code>ingress-nginx</code> is managed by the Kubernetes team and <code>nginx-ingress</code> is a different ingress controller by the Nginx team.
 
Personally, I have:
{{hidden | values.yaml |
<syntaxhighlight lang="yaml">
controller:
  watchIngressWithoutClass: true
  autoscaling:
    enabled: true
    minReplicas: 1
    maxReplicas: 5
    targetCPUUtilizationPercentage: 50
    targetMemoryUtilizationPercentage: 50
    behavior: {}
 
  service:
    enabled: true
    appProtocol: true
 
    annotations: {}
    labels: {}
    externalIPs: []
 
    enableHttp: true
    enableHttps: true
 
    ports:
      http: 80
      https: 443
 
    targetPorts:
      http: http
      https: https
 
    type: LoadBalancer
    loadBalancerIP: 192.168.1.3
    externalTrafficPolicy: Local
 
  config:
    proxy-body-size: 1g
</syntaxhighlight>
}}
{{hidden | upgrade.sh |
<syntaxhighlight lang="bash">
#!/bin/bash
DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" >/dev/null 2>&1 && pwd)"
cd "${DIR}" || exit
 
helm upgrade --install ingress-nginx ingress-nginx \
  --repo https://kubernetes.github.io/ingress-nginx \
  --namespace ingress-nginx --create-namespace \
  -f values.yaml
</syntaxhighlight>
}}
To set settings per-ingress, add the annotation to your ingress definition:
{{hidden | example ingress |
<syntaxhighlight lang="yaml">
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: nextcloud
  annotations:
    cert-manager.io/issuer: letsencrypt-prod
    nginx.ingress.kubernetes.io/proxy-body-size: 10g
spec:
  tls:
    - secretName: cloud-davidl-me-tls
      hosts:
        - cloud.davidl.me
  rules:
    - host: cloud.davidl.me
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: nextcloud-app-service
                port:
                  number: 80
</syntaxhighlight>
}}
 
If your backend uses HTTPS, you will need to add the annotation: <code>nginx.ingress.kubernetes.io/backend-protocol: HTTPS</code>
For self-signed SSL certificates, you will also need the annotation:
<syntaxhighlight lang="yaml">
    nginx.ingress.kubernetes.io/configuration-snippet: |
      proxy_ssl_name $host;
      proxy_ssl_server_name on;
</syntaxhighlight>
 
===Authentication===
[https://kubernetes.github.io/ingress-nginx/examples/auth/oauth-external-auth/ ingress-nginx external oauth]<br>
If you would like to authenticate using an OAuth2 provider (e.g. Google, GitHub), I suggest using [https://github.com/oauth2-proxy/oauth2-proxy oauth2-proxy].
# First set up a deployment of oauth2-proxy, possibly without an upstream.
# Then you can simply add the following annotations to your ingresses to protect them:
#:<syntaxhighlight lang="yaml">
nginx.ingress.kubernetes.io/auth-url: "http://oauth2proxy.default.svc.cluster.local/oauth2/[email protected]"
nginx.ingress.kubernetes.io/auth-signin: "https://oauth2proxy.davidl.me/oauth2/start?rd=$scheme://$host$request_uri"
</syntaxhighlight>
 
;Additional things to look into
* Pomerium
* Keycloak
** https://www.talkingquickly.co.uk/webapp-authentication-keycloak-OAuth2-proxy-nginx-ingress-kubernetes
* Authelia - only supports username/password as the first factor
* Authentik - tried this but found it too complicated and buggy for me.


===Installing an Ingress Controller===
If you use Cloudflare, you can also use Cloudflare access, though make sure you prevent other sources from accessing the service directly.
See [https://kubernetes.github.io/ingress-nginx/deploy/ ingress-nginx] to deploy an ingress controller.


==Autoscaling==
==Autoscaling==
Line 206: Line 501:
kind: Service
kind: Service
metadata:
metadata:
   name: t440s-wireguard-service
   name: t440s-wireguard
spec:
spec:
   type: ClusterIP
   type: ClusterIP
Line 217: Line 512:
kind: Endpoints
kind: Endpoints
metadata:
metadata:
   name: t440s-wireguard-service
   name: t440s-wireguard
subsets:
subsets:
   - addresses:
   - addresses:
Line 225: Line 520:
</syntaxhighlight>
</syntaxhighlight>
}}
}}
==NetworkPolicy==
Network policies are used to limit ingress or egress to pods.<br>
{{hidden | Example network policy |
<syntaxhighlight lang="yaml">
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: access-rstudio
spec:
  podSelector:
    matchLabels:
      pod-label: rstudio-pod
  ingress:
    - from:
        - podSelector:
            matchLabels:
              rstudio-access: "true"
</syntaxhighlight>
}}
==Security Context==
[https://kubernetes.io/docs/tasks/configure-pod-container/security-context/ security context]
If you want to restrict pods to run as a particular UID/GID while still binding to any port, you can add the following:
<syntaxhighlight lang=yaml>
    spec:
      securityContext:
        runAsUser: 1000
        runAsGroup: 1000
        sysctls:
        - name: net.ipv4.ip_unprivileged_port_start
          value: "0"
</syntaxhighlight>
==Devices==
===Generic devices===
See [https://gitlab.com/arm-research/smarter/smarter-device-manager https://gitlab.com/arm-research/smarter/smarter-device-manager]<br>
and [https://github.com/kubernetes/kubernetes/issues/7890#issuecomment-766088805 https://github.com/kubernetes/kubernetes/issues/7890#issuecomment-766088805]
===Intel GPU===
See [https://github.com/intel/intel-device-plugins-for-kubernetes/tree/main/cmd/gpu_plugin https://github.com/intel/intel-device-plugins-for-kubernetes/tree/main/cmd/gpu_plugin]
After adding the gpu plugin, add the following to your deployment.
<syntaxhighlight lang="yaml">
apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      containers:
        - name: app  # placeholder: use your container's name
          resources:
            limits:
              gpu.intel.com/i915: 1
</syntaxhighlight>
==Restarting your cluster==
===Scale to 0===
[https://stackoverflow.com/questions/64133011/scale-down-kubernetes-deployments-to-0-and-scale-back-to-original-number-of-repl reference]<br>
If you wish to restart all nodes of your cluster, you can scale your deployments and stateful sets down to 0 and then scale them back up after.
<syntaxhighlight lang="bash">
# Annotate existing deployments and statefulsets with replica count.
kubectl get deploy -o jsonpath='{range .items[*]}{"kubectl annotate --overwrite deploy "}{@.metadata.name}{" previous-size="}{@.spec.replicas}{" \n"}{end}' | sh
kubectl get sts -o jsonpath='{range .items[*]}{"kubectl annotate --overwrite sts "}{@.metadata.name}{" previous-size="}{@.spec.replicas}{" \n"}{end}' | sh
# Scale to 0.
# shellcheck disable=SC2046
kubectl scale --replicas=0 $(kubectl get deploy -o name)
# shellcheck disable=SC2046
kubectl scale --replicas=0 $(kubectl get sts -o name)
# Scale back up.
kubectl get deploy -o jsonpath='{range .items[*]}{"kubectl scale deploy "}{@.metadata.name}{" --replicas="}{.metadata.annotations.previous-size}{"\n"}{end}' | sh
kubectl get sts -o jsonpath='{range .items[*]}{"kubectl scale sts "}{@.metadata.name}{" --replicas="}{.metadata.annotations.previous-size}{"\n"}{end}' | sh
</syntaxhighlight>
==Helm==
Helm is a method for deploying applications using premade Kubernetes manifest templates known as helm charts.<br>
Helm charts abstract away manifests, allowing you to focus on only the important configuration values.<br>
Charts can also be composed into other charts for applications which require multiple microservices.
[https://artifacthub.io/ https://artifacthub.io/] allows you to search for helm charts others have made.<br>
[https://github.com/bitnami/charts bitnami/charts] contains helm charts for many popular applications.
===Usage===
To install an application, generally you do the following:
# Create a yaml file, e.g. <code>values.yaml</code> with the options you want.
# If necessary, create any PVs, PVCs, and Ingress which might be required.
# Install the application using helm.
#:<pre>helm upgrade --install $NAME $CHARTNAME -f values.yaml [--version $VERSION]</pre>
===Troubleshooting===
Sometimes, Kubernetes will deprecate APIs, preventing it from managing existing helm releases.<br>
The [https://github.com/helm/helm-mapkubeapis mapkubeapis] helm plugin can help resolve some of these issues.


==Variants==
==Variants==
===minikube===
===minikube===
[https://minikube.sigs.k8s.io/docs/ minikube] is a tool to quickly set up a local Kubernetes cluster on your PC.
[https://minikube.sigs.k8s.io/docs/ minikube] is a tool to quickly set up a local Kubernetes dev environment on your PC.


===kind===
===kind===
Line 234: Line 622:
===k3s===
===k3s===
[https://k3s.io/ k3s] is a lighter-weight Kubernetes by Rancher Labs.
[https://k3s.io/ k3s] is a lighter-weight Kubernetes by Rancher Labs.
It includes Flannel CNI and Traefik Ingress Controller.
==KubeVirt==
{{main | KubeVirt}}
KubeVirt allows you to run virtual machines on your Kubernetes cluster.


==Resources==
==Resources==
* [https://kubernetes.io/docs/tutorials/kubernetes-basics/ Kubernetes Basics]
* [https://kubernetes.io/docs/tutorials/kubernetes-basics/ Kubernetes Basics]
* [https://www.udemy.com/course/certified-kubernetes-administrator-with-practice-tests/ Certified Kubernetes Administrator (CKA) with Practice Tests (~$15)]
* [https://yolops.net/k8s-dualstack-cilium.html https://yolops.net/k8s-dualstack-cilium.html]

Latest revision as of 05:36, 14 July 2024

Kubernetes, also known as K8s, is an open-source container orchestration system originally developed by Google.
This means it runs containers across a cluster of machines for you and handles networking and container failures.
This document contains notes on both administering a self-hosted Kubernetes cluster and deploying applications to one.

Getting Started

Background

Kubernetes runs applications across nodes which are (physical or virtual) Linux machines.
Each node contains a kubelet process, a container runtime (typically containerd), and any running pods.
Pods contain resources needed to host your application including volumes and containers.
Typically you will want one container per pod since deployments scale by creating multiple pods.
A deployment is a rule which spawns and manages pods.
A service is a networking rule which allows connecting to pods.

In addition to standard Kubernetes objects, operators watch for and allow you to instantiate custom resources (CR).

Administration

Notes on administering Kubernetes clusters.

Installation

For local development, you can install minikube.
Otherwise, install kubeadm.

kubeadm

Deploy a Kubernetes cluster using kubeadm

Install Commands

Install Kubeadm

KUBE_VERSION=1.23.1-00
# Setup docker repos and install containerd.io
sudo apt update
sudo apt-get install \
    apt-transport-https \
    ca-certificates \
    curl \
    gnupg \
    lsb-release
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
echo \
  "deb [arch=amd64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu \
  $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt update && sudo apt install containerd.io

cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
br_netfilter
EOF

cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF
sudo sysctl --system

sudo apt-get install -y apt-transport-https ca-certificates curl
sudo curl -fsSLo /usr/share/keyrings/kubernetes-archive-keyring.gpg https://packages.cloud.google.com/apt/doc/apt-key.gpg
echo "deb [signed-by=/usr/share/keyrings/kubernetes-archive-keyring.gpg] https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list
sudo apt-get update
sudo apt-get install -y kubelet=$KUBE_VERSION kubeadm=$KUBE_VERSION kubectl=$KUBE_VERSION
sudo apt-mark hold kubelet kubeadm kubectl
Install Containerd
sudo apt-get remove docker docker-engine docker.io containerd runc
sudo apt-get update
sudo apt-get install ca-certificates curl gnupg lsb-release
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu \
  $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install containerd.io
Setup containerd
Container runtimes
# Configure containerd
cat <<EOF | sudo tee /etc/modules-load.d/containerd.conf
overlay
br_netfilter
EOF

sudo modprobe overlay
sudo modprobe br_netfilter

# Setup required sysctl params, these persist across reboots.
cat <<EOF | sudo tee /etc/sysctl.d/99-kubernetes-cri.conf
net.bridge.bridge-nf-call-iptables  = 1
net.ipv4.ip_forward                 = 1
net.bridge.bridge-nf-call-ip6tables = 1
EOF

# Apply sysctl params without reboot
sudo sysctl --system

sudo mkdir -p /etc/containerd
containerd config default | sudo tee /etc/containerd/config.toml
sudo systemctl restart containerd

# See https://kubernetes.io/docs/setup/production-environment/container-runtimes/#containerd-systemd
sudo sed -i '/\[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options\]/a \ \ \ \ \ \ \ \ \ \ \ \ SystemdCgroup = true' /etc/containerd/config.toml
sudo systemctl restart containerd
Control Plane Init
# Disable swap
sudo swapoff -a && sudo sed -i '/swap/s/^/#/' /etc/fstab
sudo kubeadm init \
  --cri-socket=/run/containerd/containerd.sock \
  --pod-network-cidr=10.0.0.0/16
# (Optional) Remove taint on control-node to allow job scheduling
# Note: newer Kubernetes releases name this taint node-role.kubernetes.io/control-plane
kubectl taint nodes --all node-role.kubernetes.io/master-
Setup Networking With Calico

After creating your control plane, you need to deploy a network plugin.
Popular choices are Calico and Flannel.
See Quickstart

# Setup calico networking
kubectl create -f https://projectcalico.docs.tigera.io/manifests/tigera-operator.yaml
kubectl create -f -<<EOF
apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  name: default
spec:
  # Configures Calico networking.
  calicoNetwork:
    # Note: The ipPools section cannot be modified post-install.
    ipPools:
    - blockSize: 26
      cidr: 10.0.0.0/16
      encapsulation: VXLANCrossSubnet
      natOutgoing: Enabled
      nodeSelector: all()
    nodeAddressAutodetectionV4:
      canReach: "192.168.1.1"
---
apiVersion: operator.tigera.io/v1
kind: APIServer 
metadata: 
  name: default 
spec: {}
EOF
Notes
Load Balancer (MetalLB)

See https://metallb.universe.tf/installation/.

helm repo add metallb https://metallb.github.io/metallb
helm upgrade --install --create-namespace -n metallb metallb metallb/metallb

cat <<EOF >ipaddresspool.yaml
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: default
  namespace: metallb
spec:
  addresses:
  - 192.168.1.2-192.168.1.11
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: example
  namespace: metallb
EOF

kubectl apply -f ipaddresspool.yaml
Ingress Controller (ingress-nginx)

The ingress controller forwards HTTP requests to the appropriate service based on your ingress rules.
See https://kubernetes.github.io/ingress-nginx/.

cert-manager

See https://cert-manager.io/docs/installation/helm/

You may also want to set up DNS challenges to support wildcard certificates.
See https://cert-manager.io/docs/configuration/acme/dns01/cloudflare/ if you are using Cloudflare.

Add worker nodes

Run the following on worker nodes.

# Disable swap
sudo swapoff -a && sudo sed -i '/swap/s/^/#/' /etc/fstab
# Add the line to join the cluster here
# kubeadm join <ip>:6443 --token <...> --discovery-token-ca-cert-hash <...>
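
If you no longer have the join line from kubeadm init, the control plane can print a fresh one. A small sketch, assuming a kubeadm-managed control plane (the wrapper name print_join_command is only illustrative):

```shell
# Print a fresh `kubeadm join ...` line with a newly created token.
# Run this on the control plane node, then paste the output on the worker.
print_join_command() {
  sudo kubeadm token create --print-join-command
}

# Usage:
#   print_join_command
```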

Certificates

Certificate Management with kubeadm
Kubernetes requires several TLS certificates, which are automatically generated by kubeadm. These expire after one year but are automatically renewed whenever you upgrade your cluster with kubeadm upgrade apply.

To renew the certificates manually, run kubeadm certs renew all and restart your control plane services. Note that if you let the certificates expire, you will need to set up kubectl again.
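
The manual renewal can be sketched roughly as follows. This assumes the default kubeadm paths (/etc/kubernetes/admin.conf and ~/.kube/config); the function name renew_cluster_certs is hypothetical, and restarting the control plane services is still a separate manual step.

```shell
# Show expiry dates, renew everything, then refresh the local kubeconfig.
# Assumption: a kubeadm-managed control plane using the default file paths.
renew_cluster_certs() {
  sudo kubeadm certs check-expiration
  sudo kubeadm certs renew all
  # admin.conf is part of the renewal, so copy it again for kubectl.
  mkdir -p "$HOME/.kube"
  sudo cp /etc/kubernetes/admin.conf "$HOME/.kube/config"
  sudo chown "$(id -u):$(id -g)" "$HOME/.kube/config"
}

# Usage (on the control plane node):
#   renew_cluster_certs
```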

Pods per node

How to increase pods per node
By default, Kubernetes allows 110 pods per node.
You may increase this up to a limit of 255 with the default networking subnet.
For reference, GCP GKE uses 110 pods per node and AWS EKS uses 250 pods per node.
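
To see where a ceiling of this kind can come from, it helps to count the addresses in a subnet. The sketch below assumes the limit is tied to a per-node /24 pod subnet, which is my reading of "default networking subnet" rather than something stated above; cidr_size is a hypothetical helper.

```shell
# Number of addresses in an IPv4 block with the given prefix length.
cidr_size() {
  echo $(( 1 << (32 - $1) ))
}

cidr_size 24   # a /24 holds 256 addresses
cidr_size 26   # a /26 (the Calico blockSize used earlier) holds 64 addresses
```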

Changing Master Address

See https://ystatit.medium.com/how-to-change-kubernetes-kube-apiserver-ip-address-402d6ddb8aa2

kubectl

In general you will want to create a .yaml manifest and use apply, create, or delete to manage your resources.

nodes

kubectl get nodes

# Drain evicts all pods from a node.
kubectl drain $NODE_NAME
# Uncordon to reenable scheduling
kubectl uncordon $NODE_NAME
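
In practice a plain drain often refuses to proceed when the node runs DaemonSet pods or pods using emptyDir volumes. A hedged sketch of a maintenance helper using the two kubectl flags that usually get around this (the function names are made up; note that emptyDir data on the node is lost):

```shell
# Evict pods from a node before maintenance.
# --ignore-daemonsets: skip DaemonSet-managed pods, which cannot be evicted.
# --delete-emptydir-data: allow evicting pods with emptyDir volumes (data is lost).
drain_node() {
  kubectl drain "$1" --ignore-daemonsets --delete-emptydir-data
}

# Re-enable scheduling once maintenance is done.
undrain_node() {
  kubectl uncordon "$1"
}

# Usage:
#   drain_node "$NODE_NAME"
#   undrain_node "$NODE_NAME"
```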

pods

# List all pods
kubectl get pods
kubectl describe pods

# List pods and node name
kubectl get pods -o=custom-columns='NAME:metadata.name,Node:spec.nodeName'

# Access a port on a pod
kubectl port-forward <pod> <localport:podport>

deployment

kubectl get deployments
kubectl logs $POD_NAME
kubectl exec -it $POD_NAME -- bash

# For one-off deployments of an image.
kubectl create deployment <name> --image=<image> [--replicas=1]

proxy

kubectl proxy

service

Services handle routing to your pods.

kubectl get services

kubectl expose deployment/<name> --type=<type> --port <port>
kubectl describe services/<name>

run

https://gc-taylor.com/blog/2016/10/31/fire-up-an-interactive-bash-pod-within-a-kubernetes-cluster

# Start a temporary Ubuntu container with an interactive shell
kubectl run my-shell --rm -i --tty --image ubuntu -- bash
kubectl run busybox-shell --rm -i --tty --image odise/busybox-curl -- sh

Deployments

In most cases, you will use deployments to provision pods.
Deployments internally use replicasets to create multiple identical pods.
This is great for things such as webservers or standalone services which are not stateful. In most cases, you can stick a service in front which will round-robin requests to different pods in your deployment.
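
Since a deployment scales by adding identical pods, changing the replica count is a one-liner. A minimal sketch, assuming the deployment already exists (scale_deployment is an illustrative wrapper, not a kubectl feature):

```shell
# Set the replica count of a deployment and wait for the rollout to settle.
# $1: deployment name, $2: desired number of replicas
scale_deployment() {
  kubectl scale "deployment/$1" --replicas="$2"
  kubectl rollout status "deployment/$1"
}

# Usage:
#   scale_deployment nextcloud-app 3
```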

Example Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nextcloud-app
  labels:
    app: nextcloud
spec:
  replicas: 1
  selector:
    matchLabels:
      pod-label: nextcloud-app-pod
  template:
    metadata:
      labels:
        pod-label: nextcloud-app-pod
    spec:
      containers:
        - name: nextcloud
          image: public.ecr.aws/docker/library/nextcloud:stable
          ports:
            - containerPort: 80
          env:
            - name: MYSQL_HOST
              value: nextcloud-db-service
            - name: MYSQL_DATABASE
              value: nextcloud
            - name: MYSQL_USER
              valueFrom:
                secretKeyRef:
                  name: nextcloud-db-credentials
                  key: username
            - name: MYSQL_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: nextcloud-db-credentials
                  key: password
          volumeMounts:
            - name: nextcloud-app-storage
              mountPath: /var/www/html
      volumes:
        - name: nextcloud-app-storage
          persistentVolumeClaim:
            claimName: nextcloud-app-pvc

StatefulSets

StatefulSets basics
Stateful sets are useful when you need a fixed number of pods with stable identities such as databases.
Pods created by stateful sets have a unique number suffix which allows you to query a specific pod.
Typically, you will want to use a headless service (i.e. without ClusterIP) to give local dns records to each pod.
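
With a headless service, each pod gets a DNS name built from its ordinal. The sketch below just assembles that name; it assumes the default cluster domain cluster.local, and the StatefulSet and service names in the example call are hypothetical.

```shell
# Build the in-cluster DNS name of one StatefulSet pod.
# $1: StatefulSet name, $2: pod ordinal, $3: headless service, $4: namespace
# Assumption: the cluster uses the default domain cluster.local.
sts_pod_dns() {
  echo "$1-$2.$3.$4.svc.cluster.local"
}

sts_pod_dns nextcloud-db 0 nextcloud-db-headless default
# prints nextcloud-db-0.nextcloud-db-headless.default.svc.cluster.local
```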

In most cases, you will want to look for a helm chart instead of creating your own stateful sets.

Services

Documentation

Services handle networking.
For self-hosted/bare metal clusters, there are four types of services.

  • ClusterIP - This creates an IP address on the internal cluster which nodes and pods on the cluster can access. (Default)
  • NodePort - This exposes the port on every node. It implicitly creates a ClusterIP and every node will route to that. This allows access from outside the cluster.
  • ExternalName - uses a CNAME record. Primarily for accessing other services from within the cluster.
  • LoadBalancer - Creates a ClusterIP and NodePort and tells the load balancer to allocate an external IP that routes to the NodePort.
    • On bare-metal clusters you will need to install a load balancer such as MetalLB.

By default, ClusterIP is implemented by kube-proxy, which spreads connections across the matching pods (chosen at random in the default iptables mode, round-robin in IPVS mode).
For exposing non-http(s) production services, you typically will use a LoadBalancer service.
For https services, you will typically use an ingress.

Example ClusterIP Service
apiVersion: v1
kind: Service
metadata:
  name: pwiki-app-service
spec:
  type: ClusterIP
  selector:
    pod-label: pwiki-app-pod
  ports:
    - protocol: TCP
      port: 80
      targetPort: 3000

Ingress

Ingress | Kubernetes
An ingress is an http endpoint. This configures an ingress controller which is a load-balancer or reverse-proxy pod that integrates with Kubernetes.

A common ingress controller is ingress-nginx which is maintained by the Kubernetes team. Alternatives include nginx-ingress, traefik, haproxy-ingress, and others.

Installing ingress-nginx

See ingress-nginx to deploy an ingress controller.
Note that ingress-nginx is managed by the Kubernetes team and nginx-ingress is a different ingress controller by the Nginx team.

Personally, I have:

values.yaml
controller:
  watchIngressWithoutClass: true
  autoscaling:
    enabled: true
    minReplicas: 1
    maxReplicas: 5
    targetCPUUtilizationPercentage: 50
    targetMemoryUtilizationPercentage: 50
    behavior: {}

  service:
    enabled: true
    appProtocol: true

    annotations: {}
    labels: {}
    externalIPs: []

    enableHttp: true
    enableHttps: true

    ports:
      http: 80
      https: 443

    targetPorts:
      http: http
      https: https

    type: LoadBalancer
    loadBalancerIP: 192.168.1.3
    externalTrafficPolicy: Local

  config:
    proxy-body-size: 1g
upgrade.sh
#!/bin/bash
DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" >/dev/null 2>&1 && pwd)"
cd "${DIR}" || exit

helm upgrade --install ingress-nginx ingress-nginx \
  --repo https://kubernetes.github.io/ingress-nginx \
  --namespace ingress-nginx --create-namespace \
  -f values.yaml

To set settings per-ingress, add the annotation to your ingress definition:

example ingress
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: nextcloud
  annotations:
    cert-manager.io/issuer: letsencrypt-prod
    nginx.ingress.kubernetes.io/proxy-body-size: 10g
spec:
  tls:
    - secretName: cloud-davidl-me-tls
      hosts:
        - cloud.davidl.me
  rules:
    - host: cloud.davidl.me
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: nextcloud-app-service
                port:
                  number: 80

If your backend uses HTTPS, you will need to add the annotation:

    nginx.ingress.kubernetes.io/backend-protocol: HTTPS

For self-signed SSL certificates, you will also need the annotation:

    nginx.ingress.kubernetes.io/configuration-snippet: |
      proxy_ssl_name $host;
      proxy_ssl_server_name on;

Authentication

ingress-nginx external oauth
If you would like to authenticate using an oauth2 provider (e.g. Google, GitHub), I suggest using oauth2-proxy.

  1. First set up a deployment of oauth2-proxy, possibly without an upstream.
  2. Then you can simply add the following annotations to your ingresses to protect them:
    nginx.ingress.kubernetes.io/auth-url: "http://oauth2proxy.default.svc.cluster.local/oauth2/[email protected]"
    nginx.ingress.kubernetes.io/auth-signin: "https://oauth2proxy.davidl.me/oauth2/start?rd=$scheme://$host$request_uri"
    
Additional things to look into

If you use Cloudflare, you can also use Cloudflare access, though make sure you prevent other sources from accessing the service directly.

Autoscaling

Horizontal Autoscale Walkthrough
Horizontal Pod Autoscaler

You will need to install metrics-server.
For testing, you may need to allow insecure tls.
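
Once metrics-server is running, an autoscaler can be attached to an existing deployment from the command line. This is a rough sketch with example thresholds that are my own choice; autoscale_deployment is an illustrative wrapper, and newer kubectl releases may prefer a different flag than --cpu-percent. For the insecure TLS case, metrics-server accepts a --kubelet-insecure-tls argument.

```shell
# Create a HorizontalPodAutoscaler for a deployment and show its state.
# The thresholds (50% CPU, 1 to 5 replicas) are example values only.
autoscale_deployment() {
  kubectl autoscale "deployment/$1" --cpu-percent=50 --min=1 --max=5
  # The autoscaler is named after the deployment by default.
  kubectl get hpa "$1"
}

# Usage:
#   autoscale_deployment nextcloud-app
```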

Accessing External Services

access mysql on localhost
To access services running outside of your kubernetes cluster, including services running directly on a node, you need to add an endpoint and a service.

Example
apiVersion: v1
kind: Service
metadata:
  name: t440s-wireguard
spec:
  type: ClusterIP
  ports:
    - protocol: TCP
      port: 52395
      targetPort: 52395
---
apiVersion: v1
kind: Endpoints
metadata:
  name: t440s-wireguard
subsets:
  - addresses:
      - ip: 192.168.1.40
    ports:
      - port: 52395

NetworkPolicy

Network policies are used to limit ingress or egress to pods.

Example network policy
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: access-rstudio
spec:
  podSelector:
    matchLabels:
      pod-label: rstudio-pod
  ingress:
    - from:
        - podSelector:
            matchLabels:
              rstudio-access: "true"

Security Context

security context
If you want to restrict pods to run as a particular UID/GID while still binding to any port, you can add the following:

    spec:
      securityContext:
        runAsUser: 1000
        runAsGroup: 1000
        sysctls:
        - name: net.ipv4.ip_unprivileged_port_start
          value: "0"

Devices

Generic devices

See https://gitlab.com/arm-research/smarter/smarter-device-manager
and https://github.com/kubernetes/kubernetes/issues/7890#issuecomment-766088805

Intel GPU

See https://github.com/intel/intel-device-plugins-for-kubernetes/tree/main/cmd/gpu_plugin

After adding the gpu plugin, add the following to your deployment.

apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      containers:
        - name: my-app  # hypothetical container name, use your own
          resources:
            limits:
              gpu.intel.com/i915: 1

Restarting your cluster

Scale to 0

reference
If you wish to restart all nodes of your cluster, you can scale your deployments and stateful sets down to 0 and then scale them back up after.

# Annotate existing deployments and statefulsets with replica count.
kubectl get deploy -o jsonpath='{range .items[*]}{"kubectl annotate --overwrite deploy "}{@.metadata.name}{" previous-size="}{@.spec.replicas}{" \n"}{end}' | sh
kubectl get sts -o jsonpath='{range .items[*]}{"kubectl annotate --overwrite sts "}{@.metadata.name}{" previous-size="}{@.spec.replicas}{" \n"}{end}' | sh

# Scale to 0.
# shellcheck disable=SC2046
kubectl scale --replicas=0 $(kubectl get deploy -o name) 
# shellcheck disable=SC2046
kubectl scale --replicas=0 $(kubectl get sts -o name)

# Scale back up.
kubectl get deploy -o jsonpath='{range .items[*]}{"kubectl scale deploy "}{@.metadata.name}{" --replicas="}{.metadata.annotations.previous-size}{"\n"}{end}' | sh
kubectl get sts -o jsonpath='{range .items[*]}{"kubectl scale sts "}{@.metadata.name}{" --replicas="}{.metadata.annotations.previous-size}{"\n"}{end}' | sh

Helm

Helm is a method for deploying applications using premade kubernetes manifest templates known as helm charts.
Helm charts abstract away manifests, allowing you to focus on only the important configuration values.
Charts can also be composed into other charts for applications which require multiple microservices.

https://artifacthub.io/ allows you to search for helm charts others have made.
bitnami/charts contains helm charts for many popular applications.

Usage

To install an application, generally you do the following:

  1. Create a yaml file, e.g. values.yaml with the options you want.
  2. If necessary, create any PVs, PVCs, and Ingress which might be required.
  3. Install the application using helm.
    helm upgrade --install $NAME $CHARTNAME -f values.yaml [--version $VERSION]
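
For step 1, a convenient starting point is the chart's own default values. A small sketch, assuming the chart repository has already been added (dump_chart_values is a made-up helper name):

```shell
# Write a chart's default values to a file that you can then trim and edit.
# $1: chart reference (e.g. repo/chart), $2: output file
dump_chart_values() {
  helm show values "$1" > "$2"
}

# Usage:
#   dump_chart_values "$CHARTNAME" values.yaml
```

Keeping only the values you actually change makes later chart upgrades easier to review.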

Troubleshooting

Sometimes, Kubernetes will deprecate APIs, preventing it from managing existing helm releases.
The mapkubeapis helm plugin can help resolve some of these issues.

Variants

minikube

minikube is a tool to quickly set up a local Kubernetes dev environment on your PC.

kind

k3s

k3s is a lighter-weight Kubernetes by Rancher Labs. It includes Flannel CNI and Traefik Ingress Controller.

KubeVirt

KubeVirt allows you to run virtual machines on your Kubernetes cluster.

Resources