Kubernetes
Kubernetes (also known as k8s) is an open-source system for automating the deployment, scaling, and management of containerized applications.
A k8s cluster consists of its control-plane components and node components (each representing one or more host machines running a container runtime and kubelet.service). There are two options to install Kubernetes: "the real one", described here, and a local install with k3s, kind, or minikube.
Installation
When manually creating a Kubernetes cluster, install etcd (available on the AUR) and the package groups kubernetes-control-plane (for a control-plane node) and kubernetes-node (for a worker node).
When creating a Kubernetes cluster with the help of kubeadm, install kubeadm and kubelet on each node.
Both control-plane and regular worker nodes require a container runtime for their kubelet instances, which is used for hosting containers. Install either containerd or cri-o to meet this dependency.
To control a Kubernetes cluster, install kubectl on the control-plane hosts and on any external host that is supposed to be able to interact with the cluster.
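For example, a control-plane node that is to be managed with kubeadm and uses containerd as its container runtime could be prepared with the following package selection (a sketch only; adjust it to the node's role and chosen runtime):
# pacman -S kubeadm kubelet kubectl containerd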
Configuration
All nodes in a cluster (control-plane and worker) require a running instance of kubelet.service.
All of the configuration described below has to be in place before starting kubelet.service or using kubeadm; kubelet.service will otherwise fail to start.
kubelet.service instances on a btrfs drive or subvolume running Kubernetes versions prior to 1.20.4 may be affected by a kubelet error, like:
Failed to start ContainerManager failed to get rootfs info: failed to get device for dir "/var/lib/kubelet": could not find device with major: 0, minor: 25 in cached partitions map
This affects both nested and un-nested setups. It is unrelated to CRI-O#Storage.
If you cannot upgrade to version 1.20.4+, a workaround for this bug is to create an explicit mountpoint (added to fstab) for e.g. either the entire /var/lib/, or just /var/lib/kubelet/ and /var/lib/containers/, like this:
# btrfs subvolume create /var/lib/kubelet
# btrfs subvolume create /var/lib/containers
# echo "/dev/vda2 /var/lib/kubelet btrfs subvol=/var/lib/kubelet 0 0" >>/etc/fstab
# echo "/dev/vda2 /var/lib/containers btrfs subvol=/var/lib/containers 0 0" >>/etc/fstab
# mount -t btrfs -o subvol=/var/lib/kubelet /dev/vda2 /var/lib/kubelet/
# mount -t btrfs -o subvol=/var/lib/containers /dev/vda2 /var/lib/containers/
Beware that kubeadm reset undoes this! For more information check out #95826, #65204, and #94335.
All provided systemd services accept CLI overrides in environment files:
- kubelet.service: /etc/kubernetes/kubelet.env
- kube-apiserver.service: /etc/kubernetes/kube-apiserver.env
- kube-controller-manager.service: /etc/kubernetes/kube-controller-manager.env
- kube-proxy.service: /etc/kubernetes/kube-proxy.env
- kube-scheduler.service: /etc/kubernetes/kube-scheduler.env
Networking
The networking setup for the cluster has to be configured for the respective container runtime. This can be done using cni-plugins.
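For example, the reference CNI plugins can be installed with:
# pacman -S cni-plugins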
Pass the virtual network's CIDR to kubeadm init with e.g. --pod-network-cidr='10.85.0.0/16'.
Container runtime
The container runtime has to be configured and started before kubelet.service can make use of it.
CRI-O
When using CRI-O as the container runtime, it is required to provide kubeadm init or kubeadm join with its CRI endpoint: --cri-socket='unix:///run/crio/crio.sock'
CRI-O by default uses systemd as its cgroup_manager (see /etc/crio/crio.conf). This is not compatible with kubelet's default (cgroupfs) when using kubelet < v1.22.
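For reference, the corresponding entry in /etc/crio/crio.conf sits in the [crio.runtime] table and looks roughly like this (an excerpt, not the complete file):
/etc/crio/crio.conf
[crio.runtime]
cgroup_manager = "systemd"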
Change kubelet's default by appending --cgroup-driver='systemd' to the KUBELET_ARGS environment variable in /etc/kubernetes/kubelet.env upon first start (i.e. before using kubeadm init).
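A sketch of what /etc/kubernetes/kubelet.env could then contain; the package may ship additional default arguments in KUBELET_ARGS which should be kept:
/etc/kubernetes/kubelet.env
KUBELET_ARGS="--cgroup-driver=systemd"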
Note that the KUBELET_EXTRA_ARGS variable, used by older versions, is no longer read by the default kubelet.service!
When kubeadm is updated from 1.19.x to 1.20.x, it should be possible to use a kubeadm configuration file (https://kubernetes.io/docs/reference/setup-tools/kubeadm/kubeadm-init/#config-file) instead of the above, as explained in https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/#configure-cgroup-driver-used-by-kubelet-on-control-plane-node and https://github.com/cri-o/cri-o/pull/4440/files. (TBC, untested.)
After the node has been configured, the CLI flag could (but does not have to) be replaced by a configuration entry for kubelet:
/var/lib/kubelet/config.yaml
cgroupDriver: 'systemd'
Running
Before creating a new Kubernetes cluster with kubeadm, start and enable kubelet.service.
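For example, to start it immediately and also enable it at boot:
# systemctl enable --now kubelet.service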
kubelet.service will fail (but restart) until configuration for it is present.
Setup
When creating a new Kubernetes cluster with kubeadm, a control-plane has to be created before further worker nodes can join it.
Control-plane
- If the cluster is supposed to be turned into a high availability cluster (a stacked etcd topology) later on, kubeadm init needs to be provided with --control-plane-endpoint=<IP or domain> (it is not possible to do this retroactively!).
- It is possible to use a config file for kubeadm init instead of a set of parameters (see the sketch below).
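A minimal sketch of such a config file, based on the upstream kubeadm v1beta3 API; the values mirror the flags used elsewhere on this page and the file name is arbitrary:
kubeadm-init.yaml
apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
nodeRegistration:
  name: <name_of_the_node>
  criSocket: unix:///run/crio/crio.sock
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
controlPlaneEndpoint: <IP or domain>
networking:
  podSubnet: 10.85.0.0/16
It is then passed to kubeadm init via --config instead of the individual flags:
# kubeadm init --config kubeadm-init.yaml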
Use kubeadm init
to initialize a control-plane on a host machine:
# kubeadm init --node-name=<name_of_the_node> --pod-network-cidr=<CIDR> --cri-socket=<SOCKET>
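For instance, when using CRI-O and the pod network CIDR from #Networking:
# kubeadm init --node-name=<name_of_the_node> --pod-network-cidr='10.85.0.0/16' --cri-socket='unix:///run/crio/crio.sock'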
If run successfully, kubeadm init will have generated configurations for the kubelet and various control-plane components below /etc/kubernetes and /var/lib/kubelet/.
Finally, it will output commands ready to be copied and pasted to set up kubectl and make a worker node join the cluster (based on a token, valid for 24 hours).
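If the token has expired by the time another node is added, a fresh token together with the matching join command can be printed on the control-plane with:
# kubeadm token create --print-join-command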
To use kubectl with the freshly created control-plane node, set up the configuration (either as root or as a normal user):
$ mkdir -p $HOME/.kube
# cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
# chown $(id -u):$(id -g) $HOME/.kube/config
To install a pod network such as calico, follow the upstream documentation.
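As a rough sketch, assuming the calico.yaml manifest has already been downloaded as per the upstream documentation, it can be applied with:
$ kubectl apply -f calico.yaml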
Worker node
With the token information generated in #Control-plane it is possible to make a node machine join an existing cluster:
# kubeadm join <control-plane-host>:<control-plane-port> --token <token> --discovery-token-ca-cert-hash sha256:<hash> --node-name=<name_of_the_node> --cri-socket=<SOCKET>
For the correct <SOCKET> value, see #CRI-O (e.g. unix:///run/crio/crio.sock).
Tips and tricks
Tear down a cluster
When it is necessary to start from scratch, use kubectl to tear down a cluster.
kubectl drain <node name> --delete-local-data --force --ignore-daemonsets
Here <node name> is the name of the node that should be drained and reset. On newer versions of kubectl, the --delete-local-data flag has been replaced by --delete-emptydir-data.
Use kubectl get node -A to list all nodes.
Then reset the node:
# kubeadm reset
Operating from Behind a Proxy
kubeadm reads the https_proxy, http_proxy, and no_proxy environment variables. Kubernetes internal networking should be included in the last one, for example:
export no_proxy="192.168.122.0/24,10.96.0.0/12,192.168.123.0/24"
where the second entry (10.96.0.0/12) is the default service network CIDR.
Troubleshooting
Failed to get container stats
If kubelet.service emits
Failed to get system container stats for "/system.slice/kubelet.service": failed to get cgroup stats for "/system.slice/kubelet.service": failed to get container info for "/system.slice/kubelet.service": unknown container "/system.slice/kubelet.service"
it is necessary to add configuration for the kubelet (see relevant upstream ticket).
/var/lib/kubelet/config.yaml
systemCgroups: '/systemd/system.slice'
kubeletCgroups: '/systemd/system.slice'
Pods cannot communicate when using Flannel CNI and systemd-networkd
See upstream bug report.
systemd-networkd assigns a persistent MAC address to every link. This policy is defined in its shipped configuration file /usr/lib/systemd/network/99-default.link. However, Flannel relies on being able to pick its own MAC address. To override systemd-networkd's behaviour for flannel* interfaces, create the following configuration file:
/etc/systemd/network/50-flannel.link
[Match]
OriginalName=flannel*

[Link]
MACAddressPolicy=none
Then restart systemd-networkd.service.
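For example, using systemctl:
# systemctl restart systemd-networkd.service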
If the cluster is already running, you might need to manually delete the flannel.1 interface and the kube-flannel-ds-* pod on each node, including the master. The pods will be recreated immediately and they themselves will recreate the flannel.1 interfaces.
Delete the interface flannel.1:
# ip link delete flannel.1
Delete the kube-flannel-ds-* pod. Use the following command to delete all kube-flannel-ds-* pods on all nodes:
$ kubectl -n kube-system delete pod -l="app=flannel"
See also
- Kubernetes Documentation - The upstream documentation
- Kubernetes Cluster with Kubeadm - Upstream documentation on how to setup a Kubernetes cluster using kubeadm
- Kubernetes Glossary - The official glossary explaining all Kubernetes specific terminology
- Kubernetes Addons - A list of third-party addons
- Kubelet Config File - Documentation on the Kubelet configuration file
- Taints and Tolerations - Documentation on node affinities and taints