Kubernetes
Revision as of 00:15, 24 April 2023
Kubernetes, also known as K8s, is a container orchestration system originally developed by Google.
This means it runs containers across a cluster of machines for you and handles networking and container failures.
This document contains notes on both administering a self-hosted Kubernetes cluster and deploying applications to one.
Getting Started
Background
Kubernetes runs applications across nodes which are (physical or virtual) Linux machines.
Each node contains a kubelet process, a container runtime (typically containerd), and any running pods.
Pods contain resources needed to host your application including volumes and containers.
Typically you will want one container per pod since deployments scale by creating multiple pods.
A deployment is a rule which spawns and manages pods.
A service is a networking rule which allows connecting to pods.
In addition to standard Kubernetes objects, operators watch for and allow you to instantiate custom resources (CR).
Administration
Notes on administering Kubernetes clusters.
Installation
For local development, you can install minikube.
Otherwise, install kubeadm.
kubeadm
Deploy a Kubernetes cluster using kubeadm
KUBE_VERSION=1.23.1-00
# Setup docker repos and install containerd.io
sudo apt update
sudo apt-get install \
  apt-transport-https \
  ca-certificates \
  curl \
  gnupg \
  lsb-release
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
echo \
"deb [arch=amd64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu \
$(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt update && sudo apt install containerd.io
cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
br_netfilter
EOF
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF
sudo sysctl --system
sudo apt-get install -y apt-transport-https ca-certificates curl
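# Note: apt.kubernetes.io is the legacy package repository; it has since been deprecated in favor of pkgs.k8s.io.
sudo curl -fsSLo /usr/share/keyrings/kubernetes-archive-keyring.gpg https://packages.cloud.google.com/apt/doc/apt-key.gpg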
echo "deb [signed-by=/usr/share/keyrings/kubernetes-archive-keyring.gpg] https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list
sudo apt-get update
sudo apt-get install -y kubelet=$KUBE_VERSION kubeadm=$KUBE_VERSION kubectl=$KUBE_VERSION
sudo apt-mark hold kubelet kubeadm kubectl
- Install Containerd
sudo apt-get remove docker docker-engine docker.io containerd runc
sudo apt-get update
sudo apt-get install ca-certificates curl gnupg lsb-release
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu \
$(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install containerd.io
- Setup containerd
- Container runtimes
# Configure containerd
cat <<EOF | sudo tee /etc/modules-load.d/containerd.conf
overlay
br_netfilter
EOF
sudo modprobe overlay
sudo modprobe br_netfilter
# Setup required sysctl params, these persist across reboots.
cat <<EOF | sudo tee /etc/sysctl.d/99-kubernetes-cri.conf
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-ip6tables = 1
EOF
# Apply sysctl params without reboot
sudo sysctl --system
sudo mkdir -p /etc/containerd
containerd config default | sudo tee /etc/containerd/config.toml
sudo systemctl restart containerd
# See https://kubernetes.io/docs/setup/production-environment/container-runtimes/#containerd-systemd
sudo sed -i '/\[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options\]/a \ \ \ \ \ \ \ \ \ \ \ \ SystemdCgroup = true' /etc/containerd/config.toml
sudo systemctl restart containerd
# Disable swap
sudo swapoff -a && sudo sed -i '/swap/s/^/#/' /etc/fstab
sudo kubeadm init \
  --cri-socket=/run/containerd/containerd.sock \
  --pod-network-cidr=10.0.0.0/16
# (Optional) Remove the taint on the control-plane node to allow pods to be scheduled on it
kubectl taint nodes --all node-role.kubernetes.io/master-
After creating your control plane, you need to deploy a network plugin.
Popular choices are Calico and Flannel.
See Quickstart
# Setup calico networking
kubectl create -f https://projectcalico.docs.tigera.io/manifests/tigera-operator.yaml
kubectl create -f -<<EOF
apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  name: default
spec:
  # Configures Calico networking.
  calicoNetwork:
    # Note: The ipPools section cannot be modified post-install.
    ipPools:
      - blockSize: 26
        cidr: 10.0.0.0/16
        encapsulation: VXLANCrossSubnet
        natOutgoing: Enabled
        nodeSelector: all()
    nodeAddressAutodetectionV4:
      canReach: "192.168.1.1"
---
apiVersion: operator.tigera.io/v1
kind: APIServer
metadata:
  name: default
spec: {}
EOF
- Notes
See https://metallb.universe.tf/installation/.
helm repo add metallb https://metallb.github.io/metallb
helm upgrade --install --create-namespace -n metallb metallb metallb/metallb
cat <<EOF >ipaddresspool.yaml
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: default
  namespace: metallb
spec:
  addresses:
    - 192.168.1.2-192.168.1.11
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: example
  namespace: metallb
EOF
kubectl apply -f ipaddresspool.yaml
The ingress controller forwards HTTP requests to the appropriate backend service based on your ingress rules.
See https://kubernetes.github.io/ingress-nginx/.
See https://cert-manager.io/docs/installation/helm/
You may also want to set up DNS challenges to support wildcard certificates.
See https://cert-manager.io/docs/configuration/acme/dns01/cloudflare/ if you are using Cloudflare.
Run the following on worker nodes.
# Disable swap
sudo swapoff -a && sudo sed -i '/swap/s/^/#/' /etc/fstab
# Add the line to join the cluster here
# kubeadm join <ip>:6443 --token <...> --discovery-token-ca-cert-hash <...>
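The join line is printed at the end of kubeadm init. If it has been lost, or the bootstrap token has expired, a fresh one can be generated on a control-plane node. The sketch below wraps that in a small function; the name print_join_command is hypothetical, and the block only defines the function, so nothing runs until you call it.

```shell
# Hypothetical helper: prints a ready-to-paste "kubeadm join ..." line.
# Run it on a control-plane node; it creates a new bootstrap token.
print_join_command() {
  sudo kubeadm token create --print-join-command
}

# Usage (on the control plane):
#   print_join_command
# Then run the printed line with sudo on the worker node.
```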
Certificates
Certificate Management with kubeadm
Kubernetes requires several TLS certificates which are automatically generated by kubeadm.
These expire after one year but are automatically renewed whenever you upgrade your cluster with kubeadm upgrade apply.
To renew the certificates manually, run kubeadm certs renew all and restart your control plane services.
Note that if you let the certificates expire, you will need to set up kubectl again.
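As a rough sketch of a manual renewal, the function below checks expiry, renews everything, and refreshes the kubectl credentials. The function name renew_kubeadm_certs is hypothetical, and it assumes kubectl was originally configured by copying /etc/kubernetes/admin.conf, as kubeadm init suggests. Restarting the control plane services is left as a manual step.

```shell
# Hypothetical helper for a manual renewal on a kubeadm control-plane node.
# Defining the function runs nothing; call it explicitly when needed.
renew_kubeadm_certs() {
  # Show current expiry dates, then renew every kubeadm-managed certificate.
  sudo kubeadm certs check-expiration
  sudo kubeadm certs renew all

  # The control plane services still need a restart to pick up the new
  # certificates; that step is not automated here.

  # Refresh the kubectl credentials (assumes they came from admin.conf).
  mkdir -p "$HOME/.kube"
  sudo cp /etc/kubernetes/admin.conf "$HOME/.kube/config"
  sudo chown "$(id -u):$(id -g)" "$HOME/.kube/config"
}
```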
Pods per node
How to increase pods per node
By default, Kubernetes allows 110 pods per node.
You may increase this up to a limit of 255 with the default networking subnet.
For reference, GCP GKE uses 110 pods per node and AWS EKS uses 250 pods per node.
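To make the subnet limit concrete, here is a small sketch. It assumes each node is handed a /24 pod subnet (the default node CIDR mask size for IPv4), which holds 256 addresses and so bounds the pod count at roughly the limit quoted above. The second function prints a kubelet configuration fragment with the maxPods field; both function names are hypothetical, and on a kubeadm node the kubelet configuration typically lives in /var/lib/kubelet/config.yaml.

```shell
# Number of IPv4 addresses in a per-node pod subnet with the given prefix length.
node_pod_addresses() {
  echo $(( 1 << (32 - $1) ))
}

# Hypothetical helper: print a KubeletConfiguration fragment with a pod limit.
kubelet_max_pods_fragment() {
  printf 'apiVersion: kubelet.config.k8s.io/v1beta1\nkind: KubeletConfiguration\nmaxPods: %s\n' "$1"
}

node_pod_addresses 24          # prints 256
kubelet_max_pods_fragment 250  # prints the fragment with maxPods: 250
```

After changing maxPods in the kubelet configuration, the kubelet on that node needs to be restarted for the new limit to apply.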
kubectl
In general you will want to create a .yaml manifest and use apply, create, or delete to manage your resources.
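A minimal sketch of that workflow follows. The helper render_configmap and the names example-config and greeting are hypothetical; the kubectl calls are left as comments because they need a working cluster context.

```shell
# Hypothetical helper: print a minimal ConfigMap manifest for the given name.
render_configmap() {
  cat <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: $1
data:
  greeting: hello
EOF
}

# Write the manifest to a file, then manage it with kubectl:
#   render_configmap example-config > example-config.yaml
#   kubectl apply -f example-config.yaml    # create or update; safe to repeat
#   kubectl create -f example-config.yaml   # create only; fails if it already exists
#   kubectl delete -f example-config.yaml   # remove what the file describes
```

Keeping the manifest in a file means the same file can be reapplied after edits, which is why apply is usually the more convenient of the three.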
nodes
kubectl get nodes
# Drain cordons a node and evicts its pods (DaemonSet pods are skipped).
kubectl drain --ignore-daemonsets $NODE_NAME
# Uncordon to reenable scheduling
kubectl uncordon $NODE_NAME
pods
# List all pods
kubectl get pods
kubectl describe pods
# List pods and node name
kubectl get pods -o=custom-columns='NAME:metadata.name,Node:spec.nodeName'
# Access a port on a pod
kubectl port-forward <pod> <localport:podport>
deployment
kubectl get deployments
kubectl logs $POD_NAME
kubectl exec -it $POD_NAME -- bash
# For one-off deployments of an image.
kubectl create deployment <name> --image=<image> [--replicas=1]
proxy
kubectl proxy
service
Services handle routing to your pods.
kubectl get services
kubectl expose deployment/<name> --type=<type> --port <port>
kubectl describe services/<name>
run
https://gc-taylor.com/blog/2016/10/31/fire-up-an-interactive-bash-pod-within-a-kubernetes-cluster
# Start a temporary Ubuntu container with an interactive shell
kubectl run my-shell --rm -i --tty --image ubuntu -- bash
kubectl run busybox-shell --rm -i --tty --image odise/busybox-curl -- sh
Deployments
In most cases, you will use deployments to provision pods.
Deployments internally use replicasets to create multiple identical pods.
This is great for things such as webservers or standalone services which are not stateful.
In most cases, you can put a service in front which will load-balance requests across the pods in your deployment.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nextcloud-app
  labels:
    app: nextcloud
spec:
  replicas: 1
  selector:
    matchLabels:
      pod-label: nextcloud-app-pod
  template:
    metadata:
      labels:
        pod-label: nextcloud-app-pod
    spec:
      containers:
        - name: nextcloud
          image: public.ecr.aws/docker/library/nextcloud:stable
          ports:
            - containerPort: 80
          env:
            - name: MYSQL_HOST
              value: nextcloud-db-service
            - name: MYSQL_DATABASE
              value: nextcloud
            - name: MYSQL_USER
              valueFrom:
                secretKeyRef:
                  name: nextcloud-db-credentials
                  key: username
            - name: MYSQL_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: nextcloud-db-credentials
                  key: password
          volumeMounts:
            - name: nextcloud-app-storage
              mountPath: /var/www/html
      volumes:
        - name: nextcloud-app-storage
          persistentVolumeClaim:
            claimName: nextcloud-app-pvc
StatefulSets
StatefulSets basics
Stateful sets are useful when you need a fixed number of pods with stable identities.
Pods created by stateful sets have a unique number suffix which allows you to query a specific pod.
Typically, you will want to use a headless service (i.e. with clusterIP: None) to give each pod its own cluster-local DNS record.
In most cases, you will want to look for a helm chart instead of creating your own stateful sets.
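To illustrate the stable identities, the sketch below prints the DNS names that pods of a StatefulSet get behind a headless service. The helper name and the example names db, db-headless, and default are hypothetical, and the default cluster domain cluster.local is assumed.

```shell
# Hypothetical helper: list the DNS names of the pods in a StatefulSet.
# Arguments: statefulset name, headless service name, namespace, replica count.
# Assumes the default cluster domain, cluster.local.
statefulset_pod_hosts() {
  i=0
  while [ "$i" -lt "$4" ]; do
    echo "$1-$i.$2.$3.svc.cluster.local"
    i=$((i + 1))
  done
}

statefulset_pod_hosts db db-headless default 3
# db-0.db-headless.default.svc.cluster.local
# db-1.db-headless.default.svc.cluster.local
# db-2.db-headless.default.svc.cluster.local
```

Because the numeric suffix is stable across restarts, a client can always reach, say, the first replica at the db-0 name.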
Services
Services handle networking.
For self-hosted/bare metal clusters, the following service types are available.
- ClusterIP - This creates an IP address on the internal cluster which nodes and pods on the cluster can access. (Default)
- NodePort - This exposes the port on every node. It implicitly creates a ClusterIP and every node will route to that. This allows access from outside the cluster.
- ExternalName - Uses a CNAME record. Primarily for accessing other services from within the cluster.
- LoadBalancer - Creates a ClusterIP and NodePort, then tells the load balancer to create an IP and route it to the NodePort.
  - On bare-metal clusters you will need to install a load balancer such as MetalLB.
By default, ClusterIP is implemented by kube-proxy, which load-balances connections across pods (backends are picked at random in the default iptables mode).
For exposing non-http(s) production services, you typically will use a LoadBalancer service.
For https services, you will typically use an ingress.
apiVersion: v1
kind: Service
metadata:
  name: pwiki-app-service
spec:
  type: ClusterIP
  selector:
    pod-label: pwiki-app-pod
  ports:
    - protocol: TCP
      port: 80
      targetPort: 3000
Ingress
Ingress | Kubernetes
Ingress is equivalent to having a load-balancer / reverse-proxy pod with a NodePort service.
Installing ingress-nginx
See ingress-nginx to deploy an ingress controller.
Note that ingress-nginx is managed by the Kubernetes team and nginx-ingress is a different ingress controller by the Nginx team.
Personally, I have:
controller:
  watchIngressWithoutClass: true
  autoscaling:
    enabled: true
    minReplicas: 1
    maxReplicas: 5
    targetCPUUtilizationPercentage: 50
    targetMemoryUtilizationPercentage: 50
    behavior: {}
  service:
    enabled: true
    appProtocol: true
    annotations: {}
    labels: {}
    externalIPs: []
    enableHttp: true
    enableHttps: true
    ports:
      http: 80
      https: 443
    targetPorts:
      http: http
      https: https
    type: LoadBalancer
    loadBalancerIP: 192.168.1.3
  config:
    proxy-body-size: 1g
#!/bin/bash
DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" >/dev/null 2>&1 && pwd)"
cd "${DIR}" || exit
helm upgrade --install ingress-nginx ingress-nginx \
  --repo https://kubernetes.github.io/ingress-nginx \
  --namespace ingress-nginx --create-namespace \
  -f values.yaml
To set settings per-ingress, add the annotation to your ingress definition:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: nextcloud
  annotations:
    cert-manager.io/issuer: letsencrypt-prod
    nginx.ingress.kubernetes.io/proxy-body-size: 10g
spec:
  tls:
    - secretName: cloud-davidl-me-tls
      hosts:
        - cloud.davidl.me
  rules:
    - host: cloud.davidl.me
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: nextcloud-app-service
                port:
                  number: 80
If your backend uses HTTPS, you will need to add the annotation: nginx.ingress.kubernetes.io/backend-protocol: HTTPS
For self-signed SSL certificates, you will also need the annotation:
nginx.ingress.kubernetes.io/configuration-snippet: |
  proxy_ssl_name $host;
  proxy_ssl_server_name on;
Authentication
ingress-nginx external oauth
If you would like to authenticate using an OAuth2 provider (e.g. Google, GitHub), I suggest using oauth2-proxy.
- First set up a deployment of oauth2-proxy, possibly without an upstream.
- Then you can simply add the following annotations to your ingresses to protect them:
nginx.ingress.kubernetes.io/auth-url: "https://oauth2proxy.davidl.me/oauth2/[email protected]"
nginx.ingress.kubernetes.io/auth-signin: "https://oauth2proxy.davidl.me/oauth2/start?rd=$scheme://$host$request_uri"
Autoscaling
Horizontal Autoscale Walkthrough
Horizontal Pod Autoscaler
You will need to install metrics-server.
For testing, you may need to allow insecure tls.
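Once metrics-server is running, an autoscaler can be attached to a deployment. The sketch below prints an autoscaling/v2 HorizontalPodAutoscaler manifest; the helper name render_hpa is hypothetical, and the example reuses the nextcloud-app deployment name from the Deployments section purely for illustration. CPU utilization targets only work if the containers declare CPU resource requests. The insecure TLS setting mentioned above usually refers to the metrics-server flag --kubelet-insecure-tls.

```shell
# Hypothetical helper: print an autoscaling/v2 HorizontalPodAutoscaler manifest.
# Arguments: deployment name, min replicas, max replicas, target CPU percent.
render_hpa() {
  cat <<EOF
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: $1
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: $1
  minReplicas: $2
  maxReplicas: $3
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: $4
EOF
}

# Usage, with a working kubectl context and metrics-server installed:
#   render_hpa nextcloud-app 1 5 50 | kubectl apply -f -
#   kubectl get hpa nextcloud-app
```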
Accessing External Services
access mysql on localhost
To access services running outside of your Kubernetes cluster, including services running directly on a node, you need to add a service and a matching Endpoints object with the same name.
apiVersion: v1
kind: Service
metadata:
  name: t440s-wireguard
spec:
  type: ClusterIP
  ports:
    - protocol: TCP
      port: 52395
      targetPort: 52395
---
apiVersion: v1
kind: Endpoints
metadata:
  name: t440s-wireguard
subsets:
  - addresses:
      - ip: 192.168.1.40
    ports:
      - port: 52395
NetworkPolicy
Network policies are used to limit ingress or egress to pods.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: access-rstudio
spec:
  podSelector:
    matchLabels:
      pod-label: rstudio-pod
  ingress:
    - from:
        - podSelector:
            matchLabels:
              rstudio-access: "true"
Security Context
security context
If you want to restrict pods to run as a particular UID/GID while still binding to any port, you can add the following:
spec:
  securityContext:
    runAsUser: 1000
    runAsGroup: 1000
    sysctls:
      - name: net.ipv4.ip_unprivileged_port_start
        value: "0"
Devices
Generic devices
See https://gitlab.com/arm-research/smarter/smarter-device-manager
and https://github.com/kubernetes/kubernetes/issues/7890#issuecomment-766088805
Intel GPU
See https://github.com/intel/intel-device-plugins-for-kubernetes/tree/main/cmd/gpu_plugin
After adding the gpu plugin, add the following to your deployment.
apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      containers:
        - resources:
            limits:
              gpu.intel.com/i915: 1
Restarting your cluster
Scale to 0
reference
If you wish to restart all nodes of your cluster, you can scale your deployments and stateful sets down to 0 and then scale them back up after.
# Annotate existing deployments and statefulsets with replica count.
kubectl get deploy -o jsonpath='{range .items[*]}{"kubectl annotate --overwrite deploy "}{@.metadata.name}{" previous-size="}{@.spec.replicas}{" \n"}{end}' | sh
kubectl get sts -o jsonpath='{range .items[*]}{"kubectl annotate --overwrite sts "}{@.metadata.name}{" previous-size="}{@.spec.replicas}{" \n"}{end}' | sh
# Scale to 0.
# shellcheck disable=SC2046
kubectl scale --replicas=0 $(kubectl get deploy -o name)
# shellcheck disable=SC2046
kubectl scale --replicas=0 $(kubectl get sts -o name)
# Scale back up.
kubectl get deploy -o jsonpath='{range .items[*]}{"kubectl scale deploy "}{@.metadata.name}{" --replicas="}{.metadata.annotations.previous-size}{"\n"}{end}' | sh
kubectl get sts -o jsonpath='{range .items[*]}{"kubectl scale sts "}{@.metadata.name}{" --replicas="}{.metadata.annotations.previous-size}{"\n"}{end}' | sh
Helm
Helm is a method for deploying applications using premade Kubernetes manifest templates known as helm charts.
Helm charts abstract away manifests, allowing you to focus on only the important configuration values.
Charts can also be composed into other charts for applications which require multiple microservices.
https://artifacthub.io/ allows you to search for helm charts others have made.
bitnami/charts contains helm charts for many popular applications.
Usage
To install an application, generally you do the following:
- Create a yaml file, e.g. values.yaml, with the options you want.
- If necessary, create any PVs, PVCs, and Ingress which might be required.
- Install the application using helm.
helm upgrade --install $NAME $CHARTNAME -f values.yaml [--version $VERSION]
Troubleshooting
Sometimes, Kubernetes will deprecate or remove APIs, which prevents helm from managing existing releases that still reference them.
The mapkubeapis helm plugin can help resolve some of these issues.
Variants
minikube
minikube is a tool to quickly set up a local Kubernetes dev environment on your PC.
kind
k3s
k3s is a lighter-weight Kubernetes by Rancher Labs.
KubeVirt
KubeVirt allows you to run virtual machines with vGPU support on your Kubernetes cluster.