SLATE Cluster Upgrade Guide - K8s v1.24
The team has been busy preparing SLATE for Kubernetes (K8s) v1.24.x
and today we are happy to announce that this work is now live. As a cluster administrator, there are several upgrade tasks you must perform to continue using SLATE with this new version of K8s.
Table of Contents
This post will walk you, the cluster administrator, through the following tasks:
Kubernetes Tasks
- Upgrade your SLATE Kubernetes Cluster from K8s v1.x to v1.24.x using kubeadm (see below)
- Allow pods to run on single-node clusters (see below)
- (Recommended) Update the Calico CNI to >= v3.24.1 (see below)
- (Recommended) Update MetalLB to >= v0.13.5 (see below)
SLATE Tasks
- Upgrade the SLATE Federation Controller roles (see below)
- Update the SLATE Federation Controller itself (see below)
- Upgrade the SLATE Ingress Controller (see below)
Kubernetes Tasks
Upgrade to K8s v1.24.x
Check the status of the cluster
Start by SSH-ing to your control plane and switching to the root user. Configure kubectl/kubeadm and check the state of the Kubernetes nodes.
export KUBECONFIG=/etc/kubernetes/admin.conf
kubectl get nodes
The output should resemble:
NAME STATUS ROLES AGE VERSION
<worker> Ready <none> 2y68d v1.22.1
<controlplane> Ready control-plane 2y68d v1.22.1
Install and configure containerd
If you are using Docker on your cluster, you'll need to switch the Kubernetes container runtime from Docker to containerd, because Kubernetes removed its built-in Docker support (dockershim) in v1.24.0. This guide has instructions on updating from Docker to containerd.
Please note that this step in the guide needs to be done for each node in your Kubernetes cluster.
After updating to containerd, one more change must be made to the service to increase the number of open files. The default value is LimitNOFILE=infinity, but due to a regression, 'infinity' sets the limit at roughly 65k. The following commands raise it to 1048576, which is required to run some applications such as XCache.
systemctl edit containerd
In the editor add the following line:
LimitNOFILE=1048576
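If systemctl edit opens an empty override file, place the directive under a [Service] section header so systemd accepts it:
[Service]
LimitNOFILE=1048576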
Then restart the service
systemctl restart containerd
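To confirm the new limit took effect, you can inspect the limits of the running containerd process (assuming pidof is available on the node):
cat /proc/$(pidof containerd)/limits | grep 'Max open files'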
Determine the upgrade path
Best practice is to upgrade from one Kubernetes minor release to the next, repeating until you reach v1.24.x. For example, if you are starting at v1.21.x the upgrade path should resemble:
v1.21.x –> v1.22.15
v1.22.15 –> v1.23.12
v1.23.12 –> v1.24.x
Note: The patchlevel of the minor releases may have changed since this document was written. See this page to get the latest patchlevel to use for each minor release, e.g. v1.22.16 instead of v1.22.15.
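To see exactly which patch releases your kubernetes YUM repo currently offers, you can list the available kubeadm packages; the --showduplicates flag prints every version rather than only the newest:
yum list --showduplicates --enablerepo=kubernetes kubeadm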
Upgrade the control plane
Let’s assume that, like the example above, we are beginning with Kubernetes v1.21.x. Install the related packages for Kubernetes v1.22.15, making sure the kubernetes YUM repo is enabled.
yum update --enablerepo=kubernetes kubelet-1.22.15 kubeadm-1.22.15 kubectl-1.22.15
Check that your cluster can be upgraded from v1.21.x –> v1.22.15.
kubeadm upgrade plan
If there aren’t any issues proceed with the upgrade.
kubeadm upgrade apply v1.22.15
The output should resemble:
[upgrade/successful] SUCCESS! Your cluster was upgraded to "v1.22.15". Enjoy!
[upgrade/kubelet] Now that your control plane is upgraded, please proceed with upgrading your kubelets if you haven't already done so.
Restart kubelet and check its status.
systemctl daemon-reload && \
systemctl restart kubelet && \
systemctl status kubelet
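Before moving on to the workers, you can confirm that the control plane node now reports the new version:
kubectl get nodes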
Upgrade the worker nodes one at a time
Leaving a terminal connected to the control plane, SSH to your first worker node in a fresh terminal and switch to the root user.
Install the related packages for Kubernetes v1.22.15, making sure the kubernetes YUM repo is enabled.
yum update --enablerepo=kubernetes kubelet-1.22.15 kubeadm-1.22.15 kubectl-1.22.15
Back in the control plane terminal window apply the upgrade and prepare the worker node for maintenance.
kubeadm upgrade node && \
kubectl drain <workernode1> --ignore-daemonsets
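If the drain stalls because pods on the node use emptyDir volumes, kubectl drain accepts the --delete-emptydir-data flag; add it only if losing that scratch data is acceptable:
kubectl drain <workernode1> --ignore-daemonsets --delete-emptydir-data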
In the worker node terminal window, restart kubelet and check its status.
systemctl daemon-reload && \
systemctl restart kubelet && \
systemctl status kubelet
If everything looks good, finish up by uncordoning the node in the control plane terminal window.
kubectl uncordon <workernode1>
Log out of your worker node terminal window and rinse-repeat for your remaining worker nodes.
Verify the status of the cluster
Now that the kubelet has been upgraded on the control plane and worker nodes, once more SSH to your control plane and switch to the root user. Configure kubectl/kubeadm and check the state of the Kubernetes nodes.
export KUBECONFIG=/etc/kubernetes/admin.conf
kubectl get nodes
The output should resemble:
NAME STATUS ROLES AGE VERSION
<worker> Ready <none> 2y68d v1.22.15
<controlplane> Ready control-plane 2y68d v1.22.15
If everything was successful, the control plane and workers should all report as v1.22.15.
Next steps: v1.22.15 to v1.23.12
At this point in the example your cluster should be running v1.22.15. Repeat the steps described above to upgrade from v1.22.15 to v1.23.12:
- Adjusting the K8s versions described in the commands accordingly.
Next steps: v1.23.12 to v1.24.x
At this point in the example your cluster should be running v1.23.12. Repeat the steps described above to upgrade from v1.23.12 to v1.24.x:
- Adjusting the K8s versions described in the commands accordingly.
- Removing the --network-plugin option from /var/lib/kubelet/kubeadm-flags.env before restarting each of the kubelets (see the sketch below).
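A minimal sketch of that flag removal, assuming the flag appears as --network-plugin=cni in kubeadm-flags.env (check the file first, as the exact value may differ), run on each node:
sed -i 's/ --network-plugin=cni//' /var/lib/kubelet/kubeadm-flags.env && \
systemctl restart kubelet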
Additional information
See the Kubernetes documentation for complete instructions on updating a Kubernetes cluster from v1.x to v1.24.x using kubeadm.
Single-Node Clusters
By default, Kubernetes prevents pods from running on the control-plane/master node. Running a single-node cluster requires removing this restriction so that Kubernetes has the resources to run pods. If you are running a multi-node cluster, this step is not necessary.
kubectl taint nodes --all node-role.kubernetes.io/master:NoSchedule-
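Newer kubeadm releases also apply a node-role.kubernetes.io/control-plane taint; if pods still will not schedule after removing the master taint, remove that one as well:
kubectl taint nodes --all node-role.kubernetes.io/control-plane:NoSchedule-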
(Recommended) Update Calico CNI
Run this command to get the version of Calico CNI currently installed:
kubectl describe pod -n `kubectl get pods -A | grep calico | grep controller | awk '{print $1" "$2}'` | grep Image: | awk -F: '{print $3}'
If the version is < v3.24.1, update the Calico CNI to >= v3.24.1.
- If you followed our Manual Cluster Installation instructions when initially setting up your cluster, use the example below to update your Tigera operators and custom resources files.
- If you chose a different route for initially installing and configuring Calico, please refer directly to the Calico documentation for update procedures.
Example
Using the default custom-resources.yaml, install a newer version of Calico using the operator:
CALICO_VERSION=3.24.1 && \
kubectl apply -f https://raw.githubusercontent.com/projectcalico/calico/v${CALICO_VERSION}/manifests/tigera-operator.yaml && \
kubectl apply -f https://raw.githubusercontent.com/projectcalico/calico/v${CALICO_VERSION}/manifests/custom-resources.yaml
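You can watch the upgraded Calico pods come back up before moving on (operator-based installs place them in the calico-system namespace):
kubectl get pods -n calico-system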
For more information on updating Calico see Upgrade Calico on Kubernetes.
Once Calico is updated, you can verify it is working with the following commands:
kubectl run pingtest --image=busybox -it -- /bin/sh
ping google.com
This shows that the DNS is working and is a good indication that Calico is working.
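When you are done testing, exit the pod's shell and clean up the test pod:
kubectl delete pod pingtest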
(Recommended) Update MetalLB
Run this command to get the version of MetalLB currently installed:
kubectl describe pod -n `kubectl get pods -A | grep metal | grep controller | awk '{print $1" "$2}'` | grep Image: | awk -F: '{print $3}'
If the version is < v0.13.5, update MetalLB to >= v0.13.5.
- If you followed our Manual Cluster Installation instructions when initially setting up your cluster, use the example below to update your MetalLB installation.
- If you chose a different route for initially installing and configuring MetalLB, please refer directly to the MetalLB documentation for update procedures.
Example
Install a newer version of MetalLB using the new native manifest:
METALLB_VERSION=0.13.5 && \
kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v${METALLB_VERSION}/config/manifests/metallb-native.yaml
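As with Calico, you can check that the MetalLB controller and speaker pods come back up cleanly (the native manifest installs them into the metallb-system namespace):
kubectl get pods -n metallb-system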
If you are updating from a version of MetalLB that uses ConfigMaps, gather the current address pool information by executing the following:
kubectl describe configmap config -n metallb-system
Create a new custom resource (CR) with the gathered IP pool information. Replace the IP addresses in this example with the IP addresses from the previous command.
cat <<EOF > /tmp/metallb-ipaddrpool.yml
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: first-pool
  namespace: metallb-system
spec:
  addresses:
  - 192.168.9.1-192.168.9.5
EOF
kubectl create -f /tmp/metallb-ipaddrpool.yml
Then create a Layer 2 advertisement for the first-pool address pool:
cat <<EOF > /tmp/metallb-ipaddrpool-advert.yml
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: example
  namespace: metallb-system
spec:
  ipAddressPools:
  - first-pool
EOF
kubectl create -f /tmp/metallb-ipaddrpool-advert.yml
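You can confirm that both resources were created before removing the old configuration:
kubectl get ipaddresspools,l2advertisements -n metallb-system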
Finally, remove the deprecated ConfigMap:
kubectl delete configmap config -n metallb-system
For more information on updating MetalLB see Installation By Manifest.
SLATE Tasks
Update the SLATE Federation Controller Role
Update the role using the following command:
kubectl apply -f https://raw.githubusercontent.com/slateci/federation-controller/main/resources/installation/federation-role.yaml
Upon execution, kubectl should update the roles and output something similar to:
clusterrole.rbac.authorization.k8s.io/federation-cluster configured
clusterrole.rbac.authorization.k8s.io/federation-cluster-global unchanged
Update the SLATE Federation Controller
Updating the federation controller is a two-step process.
The old nrp-controller deployment needs to be deleted by running:
kubectl -n kube-system delete deployment nrp-controller
The new controller deployment needs to be installed by running:
kubectl apply -f https://raw.githubusercontent.com/slateci/federation-controller/main/resources/installation/upgrade-controller-debug.yaml
After running the second command, you should see a federation-controller pod in the kube-system namespace. Running the following command should display the logs:
kubectl logs -n kube-system <federation-controller-pod-name>
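If you need to look up the exact pod name first, a grep over the kube-system pods works (this assumes the pod name contains federation-controller):
kubectl get pods -n kube-system | grep federation-controller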
The logs should look something like the following:
I1011 21:00:41.448491 1 clusterns_controller.go:138] Waiting for informer caches to sync
I1011 21:00:41.448598 1 reflector.go:219] Starting reflector *v1alpha2.ClusterNS (30s) from pkg/mod/k8s.io/client-go@v0.23.5/tools/cache/reflector.go:167
I1011 21:00:41.448618 1 reflector.go:255] Listing and watching *v1alpha2.ClusterNS from pkg/mod/k8s.io/client-go@v0.23.5/tools/cache/reflector.go:167
I1011 21:00:41.549025 1 shared_informer.go:270] caches populated
I1011 21:00:41.549062 1 clusterns_controller.go:143] Starting workers
I1011 21:00:41.549091 1 clusterns_controller.go:149] Started workers
I1011 21:01:01.452612 1 reflector.go:382] pkg/mod/k8s.io/client-go@v0.23.5/tools/cache/reflector.go:167: forcing resync
I1011 21:01:11.452267 1 reflector.go:382] pkg/mod/k8s.io/client-go@v0.23.5/tools/cache/reflector.go:167: forcing resync
I1011 21:01:31.453082 1 reflector.go:382] pkg/mod/k8s.io/client-go@v0.23.5/tools/cache/reflector.go:167: forcing resync
I1011 21:03:31.455629 1 reflector.go:382] pkg/mod/k8s.io/client-go@v0.23.5/tools/cache/reflector.go:167: forcing resync
...
I1013 01:02:06.178662 1 reflector.go:536] pkg/mod/k8s.io/client-go@v0.23.5/tools/cache/reflector.go:167: Watch close - *v1alpha2.ClusterNS total 7 items received
I1013 01:02:14.611427 1 reflector.go:382] pkg/mod/k8s.io/client-go@v0.23.5/tools/cache/reflector.go:167: forcing resync
I1013 01:02:34.311275 1 reflector.go:382] pkg/mod/k8s.io/client-go@v0.23.5/tools/cache/reflector.go:167: forcing resync
I1013 01:02:42.263067 1 reflector.go:536] pkg/mod/k8s.io/client-go@v0.23.5/tools/cache/reflector.go:167: Watch close - *v1.Deployment total 12 items received
I1013 01:02:44.612985 1 reflector.go:382] pkg/mod/k8s.io/client-go@v0.23.5/tools/cache/reflector.go:167: forcing resync
A line like the following is normal and does not indicate that an error occurred:
W1011 21:00:31.445414 1 client_config.go:617] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
Update the SLATE Ingress Controller
Note: You will need helm to update the SLATE Ingress controller. If you do not have helm installed, you can install it following these instructions.
Updating the SLATE Ingress Controller involves the following steps:
Download the manifest for the nginx-controller by running the following command:
wget https://raw.githubusercontent.com/slateci/slate-client-server/master/resources/nginx-ingress.yaml
- Edit the manifest and make the following changes:
  - Replace all instances of {{SLATE_NAMESPACE}} with the namespace that SLATE is using on your cluster (e.g. slate-system). A scripted example of this substitution appears after this list.
  - If your cluster is using IPv4, replace {{IP_FAMILY_POLICY}} with SingleStack and {{IP_FAMILIES}} with IPv4
  - If your cluster is using IPv6, replace {{IP_FAMILY_POLICY}} with SingleStack and {{IP_FAMILIES}} with IPv6
  - If your cluster is using both IPv6 and IPv4 (dual-stack), replace {{IP_FAMILY_POLICY}} with PreferDualStack and {{IP_FAMILIES}} with the list: - IPv6 - IPv4
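As a convenience, the namespace substitution can be scripted; the command below is a sketch that assumes the slate-system namespace and GNU sed's in-place -i flag:
sed -i 's/{{SLATE_NAMESPACE}}/slate-system/g' nginx-ingress.yaml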
Install the new SLATE Ingress Controller
kubectl apply -f nginx-ingress.yaml
Getting output like the following is not a cause for concern:
E1026 18:56:04.160926 8 reflector.go:140] k8s.io/client-go@v0.25.2/tools/cache/reflector.go:169: Failed to watch *v1.EndpointSlice: unknown (get endpointslices.discovery.k8s.io)
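Finally, you can confirm the new ingress controller pod is running in your SLATE namespace (slate-system and an ingress-prefixed pod name are assumptions here):
kubectl get pods -n slate-system | grep ingress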