SLATE Cluster Upgrade Guide - K8s v1.24

The team has been busy preparing SLATE for Kubernetes (K8s) v1.24.x and today we are happy to announce that this work is now live. As a cluster administrator, you must perform several upgrade tasks to continue using SLATE with this new version of K8s.

Table of Contents

This post will walk you, the cluster administrator, through the following tasks:

Kubernetes Tasks

  1. Upgrade your SLATE Kubernetes Cluster from K8s v1.x to v1.24.x using kubeadm (see below)
  2. Allow pods to run on single-node clusters (see below)
  3. (Recommended) Update the Calico CNI to >= v3.24.1 (see below)
  4. (Recommended) Update MetalLB to >= v0.13.5 (see below)

SLATE Tasks

  1. Upgrade the SLATE Federation Controller roles (see below)
  2. Update the SLATE Federation Controller itself (see below)
  3. Upgrade the SLATE Ingress Controller (see below)

Kubernetes Tasks

Upgrade to K8s v1.24.x


Check the status of the cluster

Start by SSH-ing to your control plane and switching to the root user.

Configure kubectl/kubeadm and check the state of the Kubernetes nodes.

export KUBECONFIG=/etc/kubernetes/admin.conf
kubectl get nodes

The output should resemble:

NAME                    STATUS   ROLES           AGE     VERSION
<worker>                Ready    <none>          2y68d   v1.22.1
<controlplane>          Ready    control-plane   2y68d   v1.22.1


Install and configure containerd

If you are using Docker on your cluster, you'll need to switch the Kubernetes container runtime from Docker to containerd, because Kubernetes removed support for Docker (dockershim) in v1.24.0. This guide has instructions on updating from Docker to containerd. Please note that this step needs to be performed on each node in your Kubernetes cluster.
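
Once a node has been switched, you can confirm which runtime it reports; the CONTAINER-RUNTIME column should show containerd:// rather than docker://.

kubectl get nodes -o wide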

After updating to containerd, one more change must be made to the service to increase the number of open files. The default value is LimitNOFILE=infinity, but due to a regression, 'infinity' caps the limit at roughly 65k. The following commands raise it to 1048576, which is required to run some applications such as XCache.

systemctl edit containerd

In the editor add the following lines (the [Service] header is required so the override is applied to the service):

[Service]
LimitNOFILE=1048576

Then restart the service

systemctl restart containerd
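
To confirm that the override took effect after the restart, you can inspect the service's effective limit:

systemctl show containerd | grep LimitNOFILE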


Determine the upgrade path

Best practice is to upgrade one Kubernetes minor release at a time until you reach v1.24.x. For example, if you are starting at v1.21.x the upgrade path should resemble:

  • v1.21.x –> v1.22.15
  • v1.22.15 –> v1.23.13
  • v1.23.13 –> v1.24.x

Note: The patch levels of the minor releases may have changed since this document was written. See this page to get the latest patch level to use for each minor release, e.g. v1.22.16 instead of v1.22.15.
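
One way to see which patch releases are currently available is to query the YUM repository directly (this assumes your repo is named kubernetes, as in the upgrade commands below):

yum list --showduplicates --enablerepo=kubernetes kubeadm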


Upgrade the control plane

Let’s assume that like the example above, we are beginning with Kubernetes v1.21.x. Install the related packages for Kubernetes v1.22.15, making sure the kubernetes YUM repo is enabled.

yum update --enablerepo=kubernetes kubelet-1.22.15 kubeadm-1.22.15 kubectl-1.22.15
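
Before planning the upgrade, you can verify that the intended version of kubeadm is now installed:

kubeadm version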

Check that your cluster can be upgraded from v1.21.x –> v1.22.15.

kubeadm upgrade plan

If there aren’t any issues proceed with the upgrade.

kubeadm upgrade apply v1.22.15

The output should resemble:

[upgrade/successful] SUCCESS! Your cluster was upgraded to "v1.22.15". Enjoy!

[upgrade/kubelet] Now that your control plane is upgraded, please proceed with upgrading your kubelets if you haven't already done so.

Restart kubelet and check its status.

systemctl daemon-reload && \
systemctl restart kubelet && \
systemctl status kubelet 


Upgrade the worker nodes one at a time

Leaving a terminal connected to the control plane, SSH to your first worker node in a fresh terminal, and switch to the root user.

Install the related packages for Kubernetes v1.22.15, making sure the kubernetes YUM repo is enabled.

yum update --enablerepo=kubernetes kubelet-1.22.15 kubeadm-1.22.15 kubectl-1.22.15

In the worker node terminal window, apply the upgrade to the node's configuration.

kubeadm upgrade node

Back in the control plane terminal window, prepare the worker node for maintenance by draining it.

kubectl drain <workernode1> --ignore-daemonsets

In the worker node terminal window restart kubelet and check its status.

systemctl daemon-reload && \
systemctl restart kubelet && \
systemctl status kubelet 

If everything looks good, finish up by uncordoning the node in the control plane terminal window.

kubectl uncordon <workernode1>

Log out of your worker node terminal window and rinse-repeat for your remaining worker nodes.


Verify the status of the cluster

Now that the kubelet has been upgraded on the control plane and worker nodes, once more SSH to your control plane and switch to the root user.

Configure kubectl/kubeadm and check the state of the Kubernetes nodes.

export KUBECONFIG=/etc/kubernetes/admin.conf
kubectl get nodes

The output should resemble:

NAME                    STATUS   ROLES           AGE     VERSION
<worker>                Ready    <none>          2y68d   v1.22.15 
<controlplane>          Ready    control-plane   2y68d   v1.22.15 

If everything was successful the control plane and workers should all report as v1.22.15.


Next steps: v1.22.15 to v1.23.13

At this point in the example your cluster should be running v1.22.15. Repeat the steps described above to upgrade from v1.22.15 to v1.23.13:

  • Adjusting the K8s versions in the commands accordingly.


Next steps: v1.23.13 to v1.24.x

At this point in the example your cluster should be running v1.23.13. Repeat the steps described above to upgrade from v1.23.13 to v1.24.x:

  • Adjusting the K8s versions in the commands accordingly.
  • Removing the --network-plugin option from /var/lib/kubelet/kubeadm-flags.env on each node before restarting its kubelet (a sketch of this edit follows the list).
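
As one way to make that edit, the flag can be stripped with sed on each node. This is only a sketch: it assumes the option appears as --network-plugin=cni in kubeadm-flags.env, so inspect the file first and adjust the pattern to match what you see.

# back up the kubelet flags, then remove the --network-plugin option (assumes the value is cni)
cp /var/lib/kubelet/kubeadm-flags.env /var/lib/kubelet/kubeadm-flags.env.bak && \
sed -i 's/ *--network-plugin=cni//' /var/lib/kubelet/kubeadm-flags.env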


Additional information

See the Kubernetes documentation for complete instructions on updating a Kubernetes cluster from v1.x to v1.24.x using kubeadm.


Single-Node Clusters

By default, Kubernetes prevents pods from being scheduled on the control-plane/master node. Running a single-node cluster requires removing this taint so that Kubernetes has the resources to run pods. If you are running a multi-node cluster, this step is not necessary.

kubectl taint nodes --all node-role.kubernetes.io/master:NoSchedule-
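
Depending on the kubeadm version that originally initialized your cluster, the control plane node may instead (or additionally) carry the node-role.kubernetes.io/control-plane taint; removing it follows the same pattern. This is an assumption about your cluster's configuration, so check the taints shown by kubectl describe node first.

kubectl taint nodes --all node-role.kubernetes.io/control-plane:NoSchedule-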


Update the Calico CNI

Run this command to get the version of Calico CNI currently installed:

 kubectl describe pod -n `kubectl get pods -A | grep calico | grep controller | awk '{print $1" "$2}'` | grep Image: | awk -F: '{print $3}'


If the version is < v3.24.1, update the Calico CNI to >= v3.24.1.

  • If you followed our Manual Cluster Installation instructions when initially setting up your cluster, use the example below to update your Tigera operators and custom resources files.
  • If you chose a different route for initially installing and configuring Calico, please refer directly to the Calico documentation for update procedures.


Example

Install a newer version of Calico using the operator:

CALICO_VERSION=3.24.1 && \
kubectl apply -f https://raw.githubusercontent.com/projectcalico/calico/v${CALICO_VERSION}/manifests/tigera-operator.yaml && \
kubectl apply -f https://raw.githubusercontent.com/projectcalico/calico/v${CALICO_VERSION}/manifests/custom-resources.yaml

For more information on updating Calico see Upgrade Calico on Kubernetes.


Once Calico is updated, you can verify it is working with the following commands:

kubectl run pingtest --image=busybox -it -- /bin/sh
ping google.com

A successful ping shows that DNS resolution and pod networking are working, which is a good indication that Calico is functioning correctly.
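
When you are done, exit the pod's shell and remove the test pod:

kubectl delete pod pingtest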


Update MetalLB

Run this command to get the version of MetalLB currently installed:

kubectl describe pod -n `kubectl get pods -A | grep metal | grep controller | awk '{print $1" "$2}'` | grep Image: | awk -F: '{print $3}'


If the version is < v0.13.5, update MetalLB to >= v0.13.5.

  • If you followed our Manual Cluster Installation instructions when initially setting up your cluster, use the example below to update your MetalLB installation.
  • If you chose a different route for initially installing and configuring MetalLB, please refer directly to the MetalLB documentation for update procedures.


Example

Install a newer version of MetalLB using the new native manifest:

METALLB_VERSION=0.13.5 && \
kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v${METALLB_VERSION}/config/manifests/metallb-native.yaml
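
Before creating the resources below, give the MetalLB pods a moment to become ready, since the new custom resources are validated by MetalLB's admission webhook. A minimal way to wait, assuming the deployment keeps its default name of controller:

kubectl -n metallb-system wait --for=condition=Available deployment/controller --timeout=120s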

If you are updating from a version of MetalLB that uses ConfigMaps, gather the current address pool information by executing the following:

kubectl describe configmap config -n metallb-system
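
For reference, the Data section of a legacy layer-2 configuration typically looks something like the following; the pool name and address range are illustrative, so use whatever your output actually shows.

address-pools:
- name: first-pool
  protocol: layer2
  addresses:
  - 192.168.9.1-192.168.9.5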

Create a new custom resource (CR) with the gathered IP pool information. Replace the IP addresses in this example with the IP addresses from the previous command.

cat <<EOF > /tmp/metallb-ipaddrpool.yml
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: first-pool
  namespace: metallb-system
spec:
  addresses:
  - 192.168.9.1-192.168.9.5
EOF
kubectl create -f /tmp/metallb-ipaddrpool.yml

Then create a Layer 2 advertisement for the first-pool address pool:

cat <<EOF > /tmp/metallb-ipaddrpool-advert.yml
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: example
  namespace: metallb-system
spec:
  ipAddressPools:
  - first-pool
EOF
kubectl create -f /tmp/metallb-ipaddrpool-advert.yml

Finally, remove the deprecated ConfigMap:

kubectl delete configmap config -n metallb-system
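
After removing the ConfigMap, a quick sanity check is to confirm that the MetalLB pods are healthy and that existing LoadBalancer services still have external IPs assigned:

kubectl get pods -n metallb-system
kubectl get svc -A | grep LoadBalancer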

For more information on updating MetalLB see Installation By Manifest.


SLATE Tasks

Update the SLATE Federation Controller Role

Update the role using the following command:

kubectl apply -f https://raw.githubusercontent.com/slateci/federation-controller/main/resources/installation/federation-role.yaml

Upon execution, kubectl should update the roles and output something similar to:

clusterrole.rbac.authorization.k8s.io/federation-cluster configured
clusterrole.rbac.authorization.k8s.io/federation-cluster-global unchanged


Update the SLATE Federation Controller

Updating the federation controller is a two-step process.

  1. The old nrp-controller deployment needs to be deleted by running:

    kubectl -n kube-system delete deployment nrp-controller 
    
  2. The new controller deployment needs to be installed by running:

    kubectl apply -f https://raw.githubusercontent.com/slateci/federation-controller/main/resources/installation/upgrade-controller-debug.yaml
    

After running the second command, you should see a federation-controller pod in the kube-system namespace.
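
You can confirm it is there by listing the pods in kube-system (the grep pattern assumes the pod name contains federation-controller):

kubectl get pods -n kube-system | grep federation-controller

Running the following command should then display its logs: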

kubectl logs -n kube-system <federation-controller-pod-name>

The logs should look something like the following:

I1011 21:00:41.448491       1 clusterns_controller.go:138] Waiting for informer caches to sync
I1011 21:00:41.448598       1 reflector.go:219] Starting reflector *v1alpha2.ClusterNS (30s) from pkg/mod/k8s.io/client-go@v0.23.5/tools/cache/reflector.go:167
I1011 21:00:41.448618       1 reflector.go:255] Listing and watching *v1alpha2.ClusterNS from pkg/mod/k8s.io/client-go@v0.23.5/tools/cache/reflector.go:167
I1011 21:00:41.549025       1 shared_informer.go:270] caches populated
I1011 21:00:41.549062       1 clusterns_controller.go:143] Starting workers
I1011 21:00:41.549091       1 clusterns_controller.go:149] Started workers
I1011 21:01:01.452612       1 reflector.go:382] pkg/mod/k8s.io/client-go@v0.23.5/tools/cache/reflector.go:167: forcing resync
I1011 21:01:11.452267       1 reflector.go:382] pkg/mod/k8s.io/client-go@v0.23.5/tools/cache/reflector.go:167: forcing resync
I1011 21:01:31.453082       1 reflector.go:382] pkg/mod/k8s.io/client-go@v0.23.5/tools/cache/reflector.go:167: forcing resync
I1011 21:03:31.455629       1 reflector.go:382] pkg/mod/k8s.io/client-go@v0.23.5/tools/cache/reflector.go:167: forcing resync
...
I1013 01:02:06.178662       1 reflector.go:536] pkg/mod/k8s.io/client-go@v0.23.5/tools/cache/reflector.go:167: Watch close - *v1alpha2.ClusterNS total 7 items received
I1013 01:02:14.611427       1 reflector.go:382] pkg/mod/k8s.io/client-go@v0.23.5/tools/cache/reflector.go:167: forcing resync
I1013 01:02:34.311275       1 reflector.go:382] pkg/mod/k8s.io/client-go@v0.23.5/tools/cache/reflector.go:167: forcing resync
I1013 01:02:42.263067       1 reflector.go:536] pkg/mod/k8s.io/client-go@v0.23.5/tools/cache/reflector.go:167: Watch close - *v1.Deployment total 12 items received
I1013 01:02:44.612985       1 reflector.go:382] pkg/mod/k8s.io/client-go@v0.23.5/tools/cache/reflector.go:167: forcing resync

A line like the following is normal and does not indicate that an error occurred:

W1011 21:00:31.445414       1 client_config.go:617] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.


Update the SLATE Ingress Controller

Note: You will need helm to update the SLATE Ingress controller. If you do not have helm installed, you can install it following these instructions.

Updating the SLATE Ingress Controller involves the following steps:

  1. Download the manifest for the nginx-controller by running the following command:

    wget https://raw.githubusercontent.com/slateci/slate-client-server/master/resources/nginx-ingress.yaml
    
    
  2. Edit the manifest and make the following changes:
    1. Replace all instances of {{SLATE_NAMESPACE}} with the namespace that SLATE is using on your cluster (e.g. slate-system)
    2. If your cluster is using IPv4, replace {{IP_FAMILY_POLICY}} with SingleStack and {{IP_FAMILIES}} with IPv4
    3. If your cluster is using IPv6, replace {{IP_FAMILY_POLICY}} with SingleStack and {{IP_FAMILIES}} with IPv6
    4. If your cluster is using dual-stack networking (both IPv6 and IPv4), replace {{IP_FAMILY_POLICY}} with PreferDualStack and {{IP_FAMILIES}} with

        - IPv6
        - IPv4
      
      
  3. Install the new SLATE Ingress Controller

    kubectl apply -f nginx-ingress.yaml
    

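Once the manifest is applied, you can confirm the controller pods rolled out and inspect their logs. The namespace and name pattern below are assumptions based on the slate-system example above, so adjust them to match your cluster.

kubectl get pods -n slate-system | grep ingress
kubectl logs -n slate-system <ingress-controller-pod-name>
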
Output like the following in the controller logs is not a cause for concern:

E1026 18:56:04.160926       8 reflector.go:140] k8s.io/client-go@v0.25.2/tools/cache/reflector.go:169: Failed to watch *v1.EndpointSlice: unknown (get endpointslices.discovery.k8s.io)

The SLATE Team