Prepull
Related to Project Nodes, there is also a “prepull” service.
It addresses the problem of projects staying in a “Pending” state for too long. This happens because the images of the project pods are very large and take some time to pull onto a new node.
The basic idea is to initially configure new project nodes, via taints, so that they cannot run projects. Prepull pulls the large project image first, before any project pod can be scheduled on the new node. Once the pull succeeds, it does a quick check and changes the taint of the node it runs on, so that project pods can be scheduled on that node. Because of the taint configuration, this change in turn removes the prepull pod itself from that node. Projects will now start quickly, because the large project image is already present.
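The taint handover described above can be modelled in a few lines of plain Python. This is only a sketch under stated assumptions: the taint key, its values ("pending" and "ready") and the NoExecute effect are hypothetical placeholders, not the names used by the chart or by prepull.py, and the matching logic is a simplified version of Kubernetes' "Equal" toleration rule.

```python
# Hypothetical taint key, for illustration only; the real key and values
# are defined in the chart and in prepull.py.
TAINT_KEY = "example.com/project-image"


def tolerates(taints, tolerations):
    """Return True if every taint is matched by at least one toleration.

    Simplified model: only exact key/value/effect matches are considered.
    """
    return all(
        any(
            taint["key"] == tol["key"]
            and taint["value"] == tol["value"]
            and taint["effect"] == tol["effect"]
            for tol in tolerations
        )
        for taint in taints
    )


def mark_image_pulled(taints):
    """Return a new taint list with the prepull taint switched to 'ready'."""
    return [
        {**taint, "value": "ready"} if taint["key"] == TAINT_KEY else taint
        for taint in taints
    ]


# A fresh node: only the prepull pod tolerates the initial taint.
new_node = [{"key": TAINT_KEY, "value": "pending", "effect": "NoExecute"}]
prepull_tolerations = [{"key": TAINT_KEY, "value": "pending", "effect": "NoExecute"}]
project_tolerations = [{"key": TAINT_KEY, "value": "ready", "effect": "NoExecute"}]

before = (tolerates(new_node, prepull_tolerations), tolerates(new_node, project_tolerations))

# After the image is pulled, the taint changes: projects fit, prepull no longer does.
ready_node = mark_image_pulled(new_node)
after = (tolerates(ready_node, prepull_tolerations), tolerates(ready_node, project_tolerations))

print(before, after)
```

In this model the prepull pod stops tolerating the node as soon as the taint changes. With a NoExecute taint, Kubernetes evicts pods that do not tolerate it, which is one way the "removes itself" behaviour could come about.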
When there is an update to the project image (new tag in
manage.project.tag), the labels and taints of project nodes are
reset by a post-update Deployment Hook (which in turn runs
manage/templates/prepull-update-script.yaml …).
The prepull service will then pull the new project image and, once done, allow projects to be scheduled again.
Projects that were already running before the update are not affected.
You can see which image they run by checking their
project_tag label. You can also delete old projects via
kubectl delete pod -l run=project,project_tag=<old-tag> to get
rid of these pods, which then allows kubelet to remove the old Docker
images and helps avoid disk pressure issues.
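As a small illustration of the label-based commands above, the following sketch builds the kubectl invocations as argument lists. The labels run=project and project_tag come from this page; the helper names and the example tag value are hypothetical, and nothing is executed against a cluster.

```python
import shlex


def list_project_tags_cmd():
    """Command that lists project pods with their project_tag label as a column."""
    return ["kubectl", "get", "pods", "-l", "run=project", "-L", "project_tag"]


def delete_old_projects_cmd(old_tag):
    """Command that deletes all project pods still running the given old tag."""
    if not old_tag:
        # An empty tag would produce a selector that does not target old pods.
        raise ValueError("old_tag must be a non-empty string")
    selector = f"run=project,project_tag={old_tag}"
    return ["kubectl", "delete", "pod", "-l", selector]


# "20230101" is a made-up tag used purely as an example.
print(shlex.join(list_project_tags_cmd()))
print(shlex.join(delete_old_projects_cmd("20230101")))
# prints:
# kubectl get pods -l run=project -L project_tag
# kubectl delete pod -l run=project,project_tag=20230101
```

Printing the command first, rather than running it directly, makes it easy to review the selector before deleting any pods.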
Note
The prepull service needs cluster-wide permissions,
because it must be able to modify the labels and taints of the nodes.
Feel free to read through
cocalc/charts/manage/templates/prepull-update-script.yaml and
cocalc/charts/manage/prepull.py in case you want to know what it
does – it’s pretty simple, but since it has cluster-wide permissions,
you might want to audit it.