Equipment/Blanton/Kubernetes
Intro
This page provides a bit of a braindump on the Kubernetes setup.
There is currently one master node, on Blanton. I have thought about adding one on Landin, but it would need careful thought: an even number of masters is generally discouraged because it adds no extra fault tolerance and can lead to split-brain scenarios. Ideally we'd have a third machine to keep the number of masters odd.
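To make the "odd number" point concrete: etcd, which sits behind the masters, needs a majority of its members to agree before it accepts writes. A rough sketch of the arithmetic in shell (the function names are made up for illustration):

```shell
# Majority quorum for a cluster of N members: floor(N/2) + 1.
quorum() {
    echo $(( $1 / 2 + 1 ))
}

# How many members can fail while a majority is still reachable.
tolerated_failures() {
    echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 1 2 3 4; do
    printf '%s master(s): quorum %s, tolerates %s failure(s)\n' \
        "$n" "$(quorum "$n")" "$(tolerated_failures "$n")"
done
```

Two masters tolerate no more failures than one does, which is why going from one to three is the step that actually buys resilience.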
Node | Role | Location | Notes |
---|---|---|---|
kube-master | Master | Blanton | 4 cores, 4GB of RAM |
kube-node-blanton | Worker | Blanton | 8 cores, 8GB of RAM |
kube-node-landin | Worker | Landin | 8 cores, 8GB of RAM |
kube-node | Worker | Blanton | 8 cores, 8GB of RAM. Older worker, mostly providing redundant capacity when one of the others is down for maintenance |
As of writing, all of our containerised services can run on one of the nodes.
General Notes
I did try doing something with docker-compose, but the networking got unwieldy fast, and I realised I was about to create something not unlike Kubernetes, but badly, in a bunch of scripts! A big part of what took so long to get this working was the dual-stack IPv4 and IPv6 support needed to fit into the rest of the hackspace environment.
A few quick notes:
- Networking is provided by Calico
- LoadBalancer requests are serviced by metallb
- If you want both IPv4 and IPv6 you will need to create two LoadBalancer instances pointing to the same service
- nginx-ingress is configured to support HTTP/HTTPS services
- cert-manager is configured to issue LetsEncrypt certificates automatically, assuming DNS entries are already in place (as would be needed for a regular VM wanting a cert)
- Mark your ingress with the annotation cert-manager.io/cluster-issuer: "letsencrypt-prod"
- There's a single-node glusterfs "cluster" providing storage
- While it's all currently on Blanton, if there was another box (or ideally two) available, it would be possible to make this much more resilient
- It's running a bleeding edge version of cert-manager and ingress-nginx because I updated to 1.22 before things were ready :-)
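As a sketch of the ingress and cert-manager bullets above, something like the following writes out a minimal Ingress manifest. Everything except the annotation is an assumption: the names myapp and myapp-tls, the host myapp.example.org, the port, and the ingress class being called nginx are placeholders rather than anything taken from the cluster.

```shell
# Write a hypothetical Ingress manifest; every name below is a placeholder.
cat > myapp-ingress.yaml <<'EOF'
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp
  annotations:
    # Asks cert-manager to issue a certificate for the hosts listed under "tls"
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
spec:
  ingressClassName: nginx   # assumption: the ingress class is called "nginx"
  tls:
    - hosts:
        - myapp.example.org
      secretName: myapp-tls   # cert-manager stores the issued certificate here
  rules:
    - host: myapp.example.org
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: myapp
                port:
                  number: 80
EOF

# Then apply it with:
#   kubectl apply -f myapp-ingress.yaml
```

The DNS entry for the host needs to exist before you apply this, otherwise the certificate request will not succeed.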
MetalLB is configured to allocate IP addresses in the ranges 10.0.21.128/25 and 2a00:1d40:1843:182:f000::/68 - it runs in layer 2 mode, advertising these on the LAN using ARP for IPv4 and NDP for IPv6.
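For the dual-stack bullet in the list above, one way to read "two LoadBalancer instances" is two Service objects of type LoadBalancer, one per address family, both selecting the same pods. This is a sketch under that assumption; the names, the app: myapp label and the ports are all hypothetical.

```shell
# Write two hypothetical LoadBalancer Services, one per IP family.
cat > myapp-lb.yaml <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: myapp-v4
spec:
  type: LoadBalancer
  ipFamilyPolicy: SingleStack
  ipFamilies:
    - IPv4          # MetalLB should hand out an address from the IPv4 range
  selector:
    app: myapp      # assumption: the pods carry this label
  ports:
    - port: 80
      targetPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: myapp-v6
spec:
  type: LoadBalancer
  ipFamilyPolicy: SingleStack
  ipFamilies:
    - IPv6          # ...and this one an address from the IPv6 range
  selector:
    app: myapp
  ports:
    - port: 80
      targetPort: 8080
EOF

# Then apply it and check which addresses were allocated:
#   kubectl apply -f myapp-lb.yaml
#   kubectl get services myapp-v4 myapp-v6
```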
Accessing the cluster
Currently (useful) access is limited to those in the Admins LDAP group. After a bit more testing, though, I'd like to create a members namespace for members to play in.
First, go get kubectl (preferably version 1.22 which, at the time of writing this, is the version the cluster runs) https://kubernetes.io/docs/tasks/tools/
Now go get the binary for your OS from https://github.com/londonhackspace/kube-auth-handler/releases/ and put it somewhere on your PATH as ldap-kube-auth (or, on Windows, ldap-kube-auth.exe - you might just want to put it next to kubectl and run it from that directory, at least to get going).
Other filenames and paths would work, but you'll need to modify the config file below in that case.
Now stick the following in $HOME/.kube/config (if you already have a config in there, you'll need to merge them, or put it somewhere else and point the KUBECONFIG environment variable at it):
apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUM1ekNDQWMrZ0F3SUJBZ0lCQURBTkJna3Foa2lHOXcwQkFRc0ZBREFWTVJNd0VRWURWUVFERXdwcmRXSmwKY201bGRHVnpNQjRYRFRJeE1EVXdOVEUyTXpFeE5sb1hEVE14TURVd016RTJNekV4Tmxvd0ZURVRNQkVHQTFVRQpBeE1LYTNWaVpYSnVaWFJsY3pDQ0FTSXdEUVlKS29aSWh2Y05BUUVCQlFBRGdnRVBBRENDQVFvQ2dnRUJBTnZQClVia1htT21JcnRiSHhZZHdBd3ZIaUoxVmhORUFkMXpiNlVKdXI5MFJSNHJ1VTlIRjZIaWpVd3Z3YzQ4MGh0K1UKY3F0dlZ6TG55L29OWTBNM01DQkxTRU90VndiMlQ0RXB3SUtVQ3drdENOcXV0MTA4eGhvRjlFOUp5NnhPWG9FMQp6c0xMeWQwcm1WNzFmU1NsUGg1RG9SRWoxSnNUZFdKQitJL21MNDhDd2k2aENEaXdnU25ZeFBEWitCaEFvTkVNCkxiMm1rdDVFdXVXUmVhME40bGl0cThpWktYZy9VcEh2aUJvaGxWLzMvemJWWnZnY3RyTlBrcWJvZDdUM2pYa1oKZnNwR1VjU1g1aVRKMUJpUHowZ0dlNEJ0SndvT2tVdGNjRUk0M0o1aXhQMi9hWVYwcURCd1B3MWpmVUFDaGZ5RAptbkdUVUVCSUNzYWVnY3UweFVVQ0F3RUFBYU5DTUVBd0RnWURWUjBQQVFIL0JBUURBZ0trTUE4R0ExVWRFd0VCCi93UUZNQU1CQWY4d0hRWURWUjBPQkJZRUZGNlZkVFFoOWVGdzUzMWEydEh4S3ZmWkxqMDZNQTBHQ1NxR1NJYjMKRFFFQkN3VUFBNElCQVFDWVpBcDRqdWl5MjdoK041aXl5RVczNFYzMExRc2pxOHpWdlF5OHFORVJtUS9IRDVnQQpQQkFlWFpPdXZHaklHcVB2TS9BeWZYempPL25MK09aS1Vvay91eWFVUkxHYU1rSzBqVzVMb3ZOU28zVFMwTEU4CnJBQlRjeUdLYW40Tk9zN2FoSWM4SGdxa2lNOUR2cVhiRGlwbC9pbG95VnJCRDh3bzZ3RVBPSG1tUGZJdUJZYnUKUzNZY0JXRW90bVpSNWlxTWxWdDRIUXR0MUYzMmZyS21lcmFPNFdUM2hIMkRWNTQ3enlSYmF4cktOZ1VJL0FJQgoycG53dktYNlUyNFM0Z0lUbW8zTUZzNDNoTzk4NDU2YXYrVExrTkNrYVhCVktMaUtMU2NZMXJnQlFzN0JvSHp4CnR1aU5nQ241dnVLRTBWU0diK2FhTTgzdVl6Q001TjVuVm9xbAotLS0tLUVORCBDRVJUSUZJQ0FURS0tLS0tCg==
    server: https://kube-master.lan.london.hackspace.org.uk:6443
  name: lhs
contexts:
- context:
    cluster: lhs
    namespace: default
    user: lhsuser
  name: lhs
current-context: lhs
kind: Config
preferences: {}
users:
- name: lhsuser
  user:
    exec:
      command: ldap-kube-auth
      env:
      - name: AUTH_CLUSTER
        value: LHS
      - name: AUTH_URL
        value: https://kube-auth.lan.london.hackspace.org.uk/getToken
      interactiveMode: Always
      apiVersion: "client.authentication.k8s.io/v1"
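If you already have a kubeconfig and want to merge rather than replace it, kubectl can do the merging for you. A minimal sketch, assuming you saved the config above to a separate file first (the function name and the example paths are made up):

```shell
# Merge an existing kubeconfig with the LHS one into a single file.
# Arguments: <existing config> <LHS config> <output file>
# The output file must not be one of the inputs, or it gets truncated
# before kubectl has read it.
merge_kubeconfig() {
    # kubectl reads every file listed in KUBECONFIG (colon-separated on
    # Linux/macOS); "config view --flatten" prints the merged result.
    KUBECONFIG="$1:$2" kubectl config view --flatten > "$3"
}

# Example (paths are placeholders):
#   merge_kubeconfig "$HOME/.kube/config" ./lhs-config /tmp/merged-config
#   mv /tmp/merged-config "$HOME/.kube/config"
#   kubectl config use-context lhs
```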
Now you should be able to run commands:
$ kubectl get pods
kubectl LDAP login helper
Logging into LHS
Press enter for defaults
Username (michael): mich181189
Password:
NAME                         READY   STATUS    RESTARTS   AGE
dockerreg-8568799fdb-vwtgg   1/1     Running   0          173m
The slightly gory details
Kubernetes does not natively support LDAP authentication, so we use webhook tokens (https://kubernetes.io/docs/reference/access-authn-authz/authentication/#webhook-token-authentication). To make this easier to use, there is a credential plugin (https://kubernetes.io/docs/reference/access-authn-authz/authentication/#client-go-credential-plugins).
The sources for the server and client implementations of this are on Github: https://github.com/londonhackspace/kube-auth-handler
Tokens are valid for 6 hours, and get cached in $HOME/.kube/LHS_cache (assuming the above config is used) - delete this file if you think the cache is causing you problems. The caching is very necessary, otherwise you get a prompt (or sometimes multiple prompts!) per command. Server-side, while JWTs might be popular, this service just generates random strings and stashes them in Redis for later lookup.
Kubernetes is configured to check these tokens against a web service running in the cluster. This is done with the following chunk of kubeadm-config:
apiVersion: v1
data:
  ClusterConfiguration: |
    apiServer:
      extraArgs:
        authentication-token-webhook-config-file: /etc/kubernetes/auth/ldap-auth.yaml
        authentication-token-webhook-version: v1
and a patch passed to kubeadm to get it to mount /etc/kubernetes/auth into the apiserver container. If this patch is not used, it will cause the API server to refuse to start! (see the upgrade instructions below for how to specify the patch directory to kubeadm upgrade)
The server deployment is in the (private) github repo kubernetes-config - this is just a fairly standard kubernetes service deployment.
Instructions Braindump For Admins
Adding a node
Kubernetes mostly requires a basic OS install, but there are a few steps you need to make sure you do correctly.
A key point here is that until recently, Kubernetes didn’t support nodes with swap enabled. These instructions therefore do not have swap. (I’m still not entirely convinced it’s a good idea!)
- Install latest Debian on a VM (without swap) with SSH and standard system utilities
- Stick your SSH key into /root/.ssh/authorized_keys and /home/<you>/.ssh/authorized_keys
- Add to the lhs-hosts section of Ansible (all nodes starting with kube-* get some basic kubernetes requirements installed) and deploy to it
- On the master, run
kubeadm token create
to get a token
- On the master, run the following to get the cert hash:
openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | \
openssl dgst -sha256 -hex | sed 's/^.* //'
- On the new node, run
kubeadm join --token <token> kube-master.lan.london.hackspace.org.uk:6443 --discovery-token-ca-cert-hash sha256:<hash>
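As a possible shortcut for the token and hash steps above, kubeadm can print the whole join command itself. A minimal sketch (the wrapper function name is made up; run it as root on the master):

```shell
# Create a fresh bootstrap token and print a ready-made "kubeadm join ..."
# line, including the --discovery-token-ca-cert-hash value.
print_join_command() {
    kubeadm token create --print-join-command
}

# Usage, on the master:
#   print_join_command
# then paste the printed line into a root shell on the new node.
```

Doing the steps by hand is still useful if you want to reuse an existing token or just check the hash.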
Draining a node for maintenance
It's a good idea to shift work off a node when you're about to do anything to it (upgrade, reboot, etc.).
- run
kubectl drain <node> --ignore-daemonsets
When you're done with the maintenance:
- run
kubectl uncordon <node>
You might want to delete some pods that are running on remaining nodes so they get restarted more evenly spread across the nodes. Alternatively, you might want to just wait for usual updates and stuff to restart pods
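Rather than deleting pods by hand, one option is to ask Kubernetes to restart a whole deployment, which replaces its pods using the deployment's normal rolling-update strategy and lets the scheduler place them afresh. A sketch, with a made-up function name and the namespace defaulting to default:

```shell
# Restart a deployment's pods so the scheduler can place the replacements
# across all schedulable nodes, including the freshly uncordoned one.
# Arguments: <deployment name> [namespace, defaults to "default"]
rebalance_deployment() {
    kubectl rollout restart "deployment/$1" --namespace "${2:-default}"
}

# See where pods currently run (the NODE column) before and after:
#   kubectl get pods --all-namespaces -o wide
#
# Example (assuming a deployment called "dockerreg" exists):
#   rebalance_deployment dockerreg
```

The scheduler does not guarantee an even spread, so check the NODE column afterwards.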
Removing a node
- Drain the node
kubectl drain <node> --ignore-daemonsets
- Delete the node record from Kubernetes
kubectl delete node <node>
- Probably delete the VM or something - it's done now
Upgrading Kubernetes
Full instructions here: https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/ READ THEM!
It’s perhaps worth ignoring the 1.x.0 releases, since experience suggests things like MetalLB and Calico might not yet support them in a stable version, which is a recipe for pain.
You probably also want to upgrade cluster services to make sure they're compatible with the newer version:
- Calico is a little tricky: because we have IPv4 and IPv6 you need to merge some of the changes into the manifest (see https://projectcalico.docs.tigera.io/getting-started/kubernetes/self-managed-onprem/onpremises#install-calico-with-kubernetes-api-datastore-50-nodes-or-less) Specifically, FELIX_IPV6SUPPORT needs to be true!
- MetalLB is easier because it (sensibly) doesn't mix config in with the deployment manifest so you can deploy straight from https://metallb.universe.tf/installation/
- cert-manager is also fairly easy, from https://cert-manager.io/docs/installation/upgrading/ (use the static manifest instructions)
- ingress-nginx is an easy one as well: https://kubernetes.github.io/ingress-nginx/deploy/upgrade/ (though they sometimes seem to change permissions so you might need to reference the cloud install manifest - cloud install not bare-metal because we're running metallb)
You might want to scale the acnode-dash-status deployment to zero so it doesn't throw up false errors if the network gets flaky while you're doing this
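Scaling that deployment down and back up could look like the sketch below. The namespace (defaulting to default) and the normal replica count of 1 are assumptions, so check them with kubectl get deployment first; the function names are made up.

```shell
# Assumption: acnode-dash-status lives in the namespace given as $1
# (defaulting to "default") and normally runs a single replica.
pause_dash_status() {
    kubectl scale deployment acnode-dash-status --replicas=0 \
        --namespace "${1:-default}"
}

resume_dash_status() {
    kubectl scale deployment acnode-dash-status --replicas=1 \
        --namespace "${1:-default}"
}

# Usage:
#   pause_dash_status      # before starting the upgrade
#   resume_dash_status     # once every node is upgraded and uncordoned
```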
- On the master node, run
apt-cache madison kubeadm
to find a version to update to
- On the master node, run:
sudo apt-get install -y --allow-change-held-packages kubeadm=<your-chosen-version>
sudo kubeadm upgrade plan
sudo kubeadm upgrade apply --patches=/etc/kubernetes/patches v<your-chosen-version>
sudo apt-get install -y --allow-change-held-packages kubelet=<your-chosen-version> kubectl=<your-chosen-version>
sudo apt-mark hold kubelet kubeadm kubectl
to make sure the packages are again pinned
On each node:
- Drain the node
- Run:
sudo apt-get install -y --allow-change-held-packages kubeadm=<your-chosen-version>
sudo kubeadm upgrade node
sudo apt-get install -y --allow-change-held-packages kubelet=<your-chosen-version> kubectl=<your-chosen-version>
sudo apt-mark hold kubelet kubeadm kubectl
to make sure the packages are again pinned
- Run
kubectl get nodes
wherever you normally run kubectl to make sure the node is running the expected version
- Uncordon node
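To avoid eyeballing the version column after each node, a small helper along these lines lists only the nodes that are not yet on the version you expect. It is a sketch: the function name is made up and v1.22.4 in the usage line is just a placeholder version.

```shell
# Print the nodes whose kubelet version differs from the expected one.
# Argument: expected version, including the leading "v".
nodes_not_at_version() {
    kubectl get nodes \
        -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.nodeInfo.kubeletVersion}{"\n"}{end}' |
        awk -v want="$1" '$2 != want { print $1 " is at " $2 }'
}

# Usage (no output means every node reports the expected version):
#   nodes_not_at_version v1.22.4
```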
Fixing Screwups
Re-adding a node you removed by mistake
If you accidentally run
kubectl delete node <node>
when you didn't mean to, don't panic - the workload should be shifted automatically to a remaining node. Here's how to re-add the one you just removed:
- Run
kubeadm reset
on the affected node
- Run
kubeadm join
as if it was a new node
ETCD Maintenance
It seems the ETCD database can grow very, very large, which can cause startup to take many minutes, making many things unhappy.
Running this on the master can help:
ETCDCTL_API=3 etcdctl --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key get revisiontestkey -w json
{"header":{"cluster_id":12152285089840826538,"member_id":17687274478532125122,"revision":45368346,"raft_term":10}}
Take the revision from above, subtract one, then run:
ETCDCTL_API=3 etcdctl --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key compact 45368345
ETCDCTL_API=3 etcdctl --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key defrag
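The commands above can be strung together so you don't have to copy the revision by hand. This is a sketch rather than something that has been run against the cluster: the function names are made up, and it assumes the JSON output looks like the sample above.

```shell
# etcdctl with the TLS flags used on the master (paths as above).
etcdctl_tls() {
    ETCDCTL_API=3 etcdctl \
        --cacert=/etc/kubernetes/pki/etcd/ca.crt \
        --cert=/etc/kubernetes/pki/etcd/server.crt \
        --key=/etc/kubernetes/pki/etcd/server.key "$@"
}

# Pull the "revision" number out of etcdctl's JSON output (read from stdin).
revision_from_json() {
    sed -n 's/.*"revision":\([0-9][0-9]*\).*/\1/p'
}

# Compact everything up to (current revision - 1), then defragment.
compact_and_defrag() {
    rev="$(etcdctl_tls get revisiontestkey -w json | revision_from_json)"
    [ -n "$rev" ] || { echo "could not read revision" >&2; return 1; }
    etcdctl_tls compact "$(( rev - 1 ))"
    etcdctl_tls defrag
}

# Usage, as root on the master:
#   compact_and_defrag
```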