Prepull

Related to Project Nodes, there is also a “prepull” service.

It addresses the problem of projects staying in a “Pending” state for too long. This happens because the project image is very large and takes a while to pull onto a new node.

The basic idea is to initially configure new project nodes, via taints, so that they cannot run projects. Prepull pulls the large project image first, before any project pod can be scheduled on the new node. Once the pull has succeeded, it does a quick check and changes the taint of the node it runs on, such that project pods can be scheduled on that node. Because of the taint configuration, this change in turn removes the prepull pod itself from the node. Projects will now start quickly, because the large project image is already present.
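The taint handover described above can be modelled in a few lines of plain Python. This is only a sketch under stated assumptions, not the logic of prepull.py: the taint key example.com/prepull, the values "pending" and "done", the NoExecute effect and both function names are made up for illustration, and the matching rule is a simplification of how Kubernetes compares tolerations with taints.

```python
from typing import Dict, List

Taint = Dict[str, str]

# Hypothetical key and values, chosen only for this illustration.
# The real taints are defined by the chart and may look different.
KEY = "example.com/prepull"


def tolerates(tolerations: List[Taint], taints: List[Taint]) -> bool:
    """Simplified rule: each taint needs a toleration with equal key, value and effect."""
    return all(taint in tolerations for taint in taints)


def mark_ready(taints: List[Taint]) -> List[Taint]:
    """Return a copy of the taints with the prepull taint switched to "done"."""
    return [dict(t, value="done") if t["key"] == KEY else t for t in taints]


# The prepull pod only tolerates the "pending" taint, project pods only "done".
prepull_tolerations = [{"key": KEY, "value": "pending", "effect": "NoExecute"}]
project_tolerations = [{"key": KEY, "value": "done", "effect": "NoExecute"}]

new_node = [{"key": KEY, "value": "pending", "effect": "NoExecute"}]
ready_node = mark_ready(new_node)

print("prepull on new node:  ", tolerates(prepull_tolerations, new_node))
print("project on new node:  ", tolerates(project_tolerations, new_node))
print("prepull on ready node:", tolerates(prepull_tolerations, ready_node))
print("project on ready node:", tolerates(project_tolerations, ready_node))
```

In this model a single taint change does both jobs: project pods become schedulable, and the prepull pod no longer tolerates the node it runs on.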

When the project image is updated (new tag in manage.project.tag), the labels and taints of the project nodes are reset by a post-update deployment hook (which in turn runs manage/templates/prepull-update-script.yaml …).

The prepull service then pulls the new project image and, once done, allows projects to be scheduled on the node again.

Projects that were already running before the update are not affected. You can see which image they run by checking their project_tag label. You can also delete old projects via kubectl delete pod -l run=project,project_tag=<old-tag>. Once these pods are gone, kubelet can remove the old Docker images, which helps to avoid disk pressure issues.
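To get an overview before deleting anything, the project_tag labels can be tallied from the JSON that kubectl get pods -l run=project -o json prints. The following is a minimal sketch: the helper names count_project_tags and delete_command, the pod names and the tag values are hypothetical, and the sample data stands in for real kubectl output. The delete helper only builds the argument list for the command quoted above; it does not run it.

```python
import json
from collections import Counter
from typing import List


def count_project_tags(pod_list_json: str) -> Counter:
    """Count pods per project_tag label in `kubectl get pods -o json` output."""
    items = json.loads(pod_list_json).get("items", [])
    return Counter(
        item.get("metadata", {}).get("labels", {}).get("project_tag", "<none>")
        for item in items
    )


def delete_command(old_tag: str) -> List[str]:
    """Build (but do not run) the kubectl call that deletes projects of one tag."""
    return ["kubectl", "delete", "pod", "-l", f"run=project,project_tag={old_tag}"]


# Made-up sample data; pod names and tags are placeholders.
sample = json.dumps({
    "items": [
        {"metadata": {"name": "project-a", "labels": {"run": "project", "project_tag": "tag-old"}}},
        {"metadata": {"name": "project-b", "labels": {"run": "project", "project_tag": "tag-old"}}},
        {"metadata": {"name": "project-c", "labels": {"run": "project", "project_tag": "tag-new"}}},
    ]
})

tags = count_project_tags(sample)
for tag, count in sorted(tags.items()):
    print(f"{tag}: {count} pod(s)")

print(" ".join(delete_command("tag-old")))
```

Keeping the command as a list of arguments makes it easy to review first and pass to subprocess.run later, if that is wanted.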

Note

The prepull service needs cluster-wide permissions, because it must be able to modify the labels and taints of the nodes. Feel free to read through cocalc/charts/manage/templates/prepull-update-script.yaml and cocalc/charts/manage/prepull.py if you want to know what it does. It is pretty simple, but since it has cluster-wide permissions, you might want to audit it.