Kubernetes

Kubernetes (also known as k8s) is an open-source system for automating the deployment, scaling, and management of containerized applications.

A k8s cluster consists of its control-plane components and node components (each representing one or more host machines running a container runtime and kubelet.service). There are two ways to install Kubernetes: the full installation described here, or a local installation with k3s, kind, or minikube.

Installation

When manually creating a Kubernetes cluster, install etcd (AUR) and the appropriate package group: kubernetes-control-plane for a control-plane node, or kubernetes-node for a worker node.

When creating a Kubernetes cluster with the help of kubeadm, install kubeadm and kubelet on each node.

Both control-plane and regular worker nodes require a container runtime for their kubelet instances, which use it to host containers. Install either containerd or cri-o to meet this dependency.

To control a Kubernetes cluster, install kubectl on the control-plane hosts and on any external host that is supposed to be able to interact with the cluster.

Configuration

All nodes in a cluster (control-plane and worker) require a running instance of kubelet.service.

Tip: Read the following subsections closely before starting kubelet.service or using kubeadm.
Note: Disable swap on the host, as kubelet.service will otherwise fail to start.
Warning: Users with kubelet.service on a btrfs drive or subvolume running Kubernetes versions prior to 1.20.4 may be affected by a kubelet error, like:

Failed to start ContainerManager failed to get rootfs info: failed to get device for dir "/var/lib/kubelet": could not find device with major: 0, minor: 25 in cached partitions map

This affects both nested and un-nested setups. It is unrelated to CRI-O#Storage.

If you cannot upgrade to version 1.20.4+, a workaround for this bug is to create an explicit mountpoint (added to fstab) for e.g. either the entire /var/lib/ or just /var/lib/kubelet/ and /var/lib/containers/, like this:

 # btrfs subvolume create /var/lib/kubelet
 # btrfs subvolume create /var/lib/containers
 # echo "/dev/vda2 /var/lib/kubelet btrfs subvol=/var/lib/kubelet 0 0" >> /etc/fstab
 # echo "/dev/vda2 /var/lib/containers btrfs subvol=/var/lib/containers 0 0" >> /etc/fstab
 # mount -t btrfs -o subvol=/var/lib/kubelet /dev/vda2 /var/lib/kubelet/
 # mount -t btrfs -o subvol=/var/lib/containers /dev/vda2 /var/lib/containers/
Beware that kubeadm reset undoes this! For more information check out #95826, #65204, and #94335.

All provided systemd services accept CLI overrides in environment files:

  • kubelet.service: /etc/kubernetes/kubelet.env
  • kube-apiserver.service: /etc/kubernetes/kube-apiserver.env
  • kube-controller-manager.service: /etc/kubernetes/kube-controller-manager.env
  • kube-proxy.service: /etc/kubernetes/kube-proxy.env
  • kube-scheduler.service: /etc/kubernetes/kube-scheduler.env

Networking

The networking setup for the cluster has to be configured for the respective container runtime. This can be done using cni-plugins.

Pass the virtual network's CIDR to kubeadm init with e.g. --pod-network-cidr='10.85.0.0/16'.
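
If the container runtime does not already ship a default CNI configuration, cni-plugins can provide a basic bridge network. The following is only a minimal sketch: the file name, bridge name and subnet are examples, and the subnet should match the CIDR passed to kubeadm init above. A pod network add-on (see #Control-plane) typically replaces this.

/etc/cni/net.d/10-bridge.conflist
{
  "cniVersion": "0.4.0",
  "name": "pods",
  "plugins": [
    {
      "type": "bridge",
      "bridge": "cni0",
      "isGateway": true,
      "ipMasq": true,
      "ipam": {
        "type": "host-local",
        "ranges": [[{ "subnet": "10.85.0.0/16" }]],
        "routes": [{ "dst": "0.0.0.0/0" }]
      }
    },
    { "type": "portmap", "capabilities": { "portMappings": true } }
  ]
}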

Container runtime

The container runtime has to be configured and started before kubelet.service can make use of it.

CRI-O

When using CRI-O as container runtime, it is required to provide kubeadm init or kubeadm join with its CRI endpoint: --cri-socket='unix:///run/crio/crio.sock'

Note: CRI-O by default uses systemd as its cgroup_manager (see /etc/crio/crio.conf). This is not compatible with kubelet's default (cgroupfs) when using kubelet < v1.22.

Change kubelet's default by appending --cgroup-driver='systemd' to the KUBELET_ARGS environment variable in /etc/kubernetes/kubelet.env upon first start (i.e. before using kubeadm init).
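
A minimal sketch of what the environment file could then contain; any flags already present in KUBELET_ARGS must be kept and the new flag appended:

/etc/kubernetes/kubelet.env
KUBELET_ARGS=--cgroup-driver='systemd'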

Note that the KUBELET_EXTRA_ARGS variable, used by older versions, is no longer read by the default kubelet.service!

When upgrading kubeadm from 1.19.x to 1.20.x, it should be possible to set the cgroup driver through a kubeadm configuration file (https://kubernetes.io/docs/reference/setup-tools/kubeadm/kubeadm-init/#config-file), as explained in https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/#configure-cgroup-driver-used-by-kubelet-on-control-plane-node and demonstrated in https://github.com/cri-o/cri-o/pull/4440/files, instead of the above. (To be confirmed, untested.)

After the node has been configured, the CLI flag could (but does not have to) be replaced by a configuration entry for kubelet:

/var/lib/kubelet/config.yaml
cgroupDriver: 'systemd'

Running

Before creating a new Kubernetes cluster with kubeadm, start and enable kubelet.service.
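
For example:

# systemctl enable --now kubelet.service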

Note: kubelet.service will fail (but restart) until configuration for it is present.

Setup

When creating a new Kubernetes cluster with kubeadm, a control-plane has to be created before further worker nodes can join it.

Control-plane

Tip:
  • If the cluster is supposed to be turned into a high availability cluster (a stacked etcd topology) later on, kubeadm init needs to be provided with --control-plane-endpoint=<IP or domain> (it is not possible to do this retroactively!).
  • It is possible to use a config file for kubeadm init instead of a set of parameters.

Use kubeadm init to initialize a control-plane on a host machine:

# kubeadm init --node-name=<name_of_the_node> --pod-network-cidr=<CIDR> --cri-socket=<SOCKET>
Note: Refer to #Networking and #Container runtime for <CIDR> and <SOCKET> (respectively).
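
Alternatively (see the tip above), these settings can be expressed in a kubeadm configuration file. A minimal sketch, assuming a kubeadm release that accepts the v1beta3 API and CRI-O as the container runtime; the file name and values are examples:

kubeadm-config.yaml
apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
nodeRegistration:
  name: <name_of_the_node>
  criSocket: unix:///run/crio/crio.sock
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
networking:
  podSubnet: <CIDR>

It would then be used with:

# kubeadm init --config kubeadm-config.yaml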

If run successfully, kubeadm init will have generated configurations for the kubelet and various control-plane components below /etc/kubernetes and /var/lib/kubelet/. Finally, it will output commands ready to be copied and pasted to set up kubectl and make a worker node join the cluster (based on a token, valid for 24 hours).

To use kubectl with the freshly created control-plane node, set up the configuration (either as root or as a normal user):

$ mkdir -p $HOME/.kube
# cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
# chown $(id -u):$(id -g) $HOME/.kube/config

To install a pod network such as Calico, follow the upstream documentation.
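
As a sketch, the add-on is typically applied with kubectl (the manifest URL below is a placeholder; take the current one from the Calico documentation), after which the control-plane node should eventually report Ready:

$ kubectl apply -f <calico-manifest-url>
$ kubectl get nodes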

Worker node

With the token information generated in #Control-plane it is possible to make a node machine join an existing cluster:

 # kubeadm join <control-plane-host>:<control-plane-port> --token <token> --discovery-token-ca-cert-hash sha256:<hash> --node-name=<name_of_the_node> --cri-socket=<SOCKET>
Note: Refer to #Container runtime for <SOCKET>.
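
If the token from kubeadm init has expired (it is only valid for 24 hours), a fresh join command can be generated on the control-plane node:

# kubeadm token create --print-join-command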

Tips and tricks

Tear down a cluster

When it is necessary to start from scratch, use kubectl to tear down a cluster.

$ kubectl drain <node name> --delete-local-data --force --ignore-daemonsets

Here <node name> is the name of the node that should be drained and reset. Use kubectl get nodes to list all nodes.

Then reset the node:

# kubeadm reset

Operating from Behind a Proxy

kubeadm reads the https_proxy, http_proxy, and no_proxy environment variables. Kubernetes' internal networking should be included in no_proxy, for example:

export no_proxy="192.168.122.0/24,10.96.0.0/12,192.168.123.0/24"

where 10.96.0.0/12 is the default service network CIDR.
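
For completeness, the proxy variables themselves might look like this (the proxy address is a placeholder):

export http_proxy="http://proxy.example.com:3128"
export https_proxy="http://proxy.example.com:3128"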

Troubleshooting

Failed to get container stats

If kubelet.service emits

 Failed to get system container stats for "/system.slice/kubelet.service": failed to get cgroup stats for "/system.slice/kubelet.service": failed to get container info for "/system.slice/kubelet.service": unknown container "/system.slice/kubelet.service"

it is necessary to add configuration for the kubelet (see the relevant upstream ticket):

/var/lib/kubelet/config.yaml
systemCgroups: '/systemd/system.slice'
kubeletCgroups: '/systemd/system.slice'
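
After editing the configuration, restart kubelet.service for the change to take effect:

# systemctl restart kubelet.service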

Pods cannot communicate when using Flannel CNI and systemd-networkd

See upstream bug report.

systemd-networkd assigns a persistent MAC address to every link. This policy is defined in its shipped configuration file /usr/lib/systemd/network/99-default.link. However, Flannel relies on being able to pick its own MAC address. To override systemd-networkd's behaviour for flannel* interfaces, create the following configuration file:

/etc/systemd/network/50-flannel.link
[Match]
OriginalName=flannel*

[Link]
MACAddressPolicy=none

Then restart systemd-networkd.service.

If the cluster is already running, you might need to manually delete the flannel.1 interface and the kube-flannel-ds-* pod on each node, including the master. The pods will be recreated immediately and they themselves will recreate the flannel.1 interfaces.

Delete the interface flannel.1:

# ip link delete flannel.1

Delete the kube-flannel-ds-* pod. Use the following command to delete all kube-flannel-ds-* pods on all nodes:

$ kubectl -n kube-system delete pod -l="app=flannel"

See also