Google GCP/GKE

Note

As of 2023-10-17, there is currently no out-of-the-box support for a GKE cluster. The following notes are based on the experience of setting everything up. They certainly assume you have experience with Google Cloud Platform and Kubernetes. Some details could be out of date, but the general idea should still be valid.

There is also a guide for setting up CoCalc OnPrem on Amazon AWS/EKS.

Prerequisites

  • You should have a basic understanding of cloud computing, in particular the Google Cloud Platform (GCP) and Kubernetes (K8S). But don’t worry, for setting up CoCalc you do not need to be an expert!

  • This guide is specific to a GKE cluster. If you want to use another cloud provider for your Kubernetes cluster, you have to adapt the instructions.

  • 2022-05: Starting with K8S 1.22 you need the google-cloud-sdk-gke-gcloud-auth-plugin plugin for authentication (more info in the GKE documentation). Don’t forget to set export USE_GKE_GCLOUD_AUTH_PLUGIN=True in your ~/.bashrc – see the sketch at the end of this list.

  • If you’re not an owner or admin of the GCP project, you need a couple of “Admin” roles for your user. Exactly which ones is hard to tell, and this probably changes over time. On top of a basic “Editor” role you certainly need: “Compute Admin”, “Compute Network Admin”, and “Compute Storage Admin”. Please ask the owner of the GCP project to assign those roles to your “Editor” user.
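For reference, here is a minimal sketch of these two setup steps with the gcloud CLI, where YOUR_PROJECT and YOUR_USER are placeholders for your own project ID and account:

# install the GKE auth plugin (required for kubectl with K8S >= 1.22)
gcloud components install gke-gcloud-auth-plugin

# make kubectl use the plugin (add this line to your ~/.bashrc)
export USE_GKE_GCLOUD_AUTH_PLUGIN=True

# grant the extra roles to the "Editor" user (run by the project owner)
for ROLE in roles/compute.admin roles/compute.networkAdmin roles/compute.storageAdmin; do
  gcloud projects add-iam-policy-binding YOUR_PROJECT \
    --member="user:YOUR_USER" --role="$ROLE"
done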

Setup

Note

All settings are mainly recommendations – feel free to look into them in more detail, adjust them to your needs, etc. If something is actually required, it is explicitly mentioned. Often, you can change the settings later on as well.

The specific parameters are meant to get you started with a small cluster. You can scale up later by switching to larger node types and adjusting CoCalc’s configuration parameters for the HELM charts.

Let’s start. In the GCP Console under Kubernetes, create a new cluster with the following settings:


  • Cluster name: cocalc-1 (you can come up with whatever name you want)

  • Location type: zonal – Zonal (free tier) or Regional (costs money)

  • Release channel: regular – the regular version track (not static)

  • Version: 1.26.xx – as of Oct 2023, this version should work fine

  • Location: europe-west3 (whatever suits you best)

  • Node location (zone): europe-west3-b – with that, all nodes will be in the same place

  • Automation: maintenance window on Saturday + Sunday, starting at 00:00 for 6 hours

  • Networking: default (or you know what you’re doing …)

  • HTTP Load Balancing: enabled

  • Dataplane V2: enabled – mainly used to limit networking access from within a project to the other parts of the cluster

  • DNS: kube-dns

  • Shielded GKE nodes: enabled

  • Persistent Disk CSI Driver: enabled

  • Image streaming: disabled – see notes below …

  • Logging: System only – less logging saves money

  • Cloud Monitoring: System only – less data saves money
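If you prefer the CLI over the console, roughly the same cluster can be created with gcloud. This is a hedged sketch, assuming the name and zone from the list above (the maintenance window and a few console-only options are omitted – double-check the flags against the current gcloud documentation):

gcloud container clusters create cocalc-1 \
  --zone europe-west3-b \
  --release-channel regular \
  --enable-dataplane-v2 \
  --enable-shielded-nodes \
  --addons HttpLoadBalancing,GcePersistentDiskCsiDriver \
  --logging SYSTEM \
  --monitoring SYSTEM

# fetch credentials, so kubectl talks to the new cluster
gcloud container clusters get-credentials cocalc-1 --zone europe-west3-b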

Networking

  • Default, public cluster. Of course, if you know what you’re doing, you can also set up a private cluster and use a VPN or something like that. This is beyond the scope of this guide.

  • HTTP Load Balancing: enabled

  • Dataplane V2: enabled (this will take the network configuration files into account)

  • DNS: the default kube-dns is fine – unless you want to access internal services, then maybe you want to run Cloud DNS. (2023-10: IIRC something has changed related to that setting)

Security

  • Shielded GKE nodes: yes

Features

  • Enable Compute Engine persistent disk CSI Driver

  • Disable image streaming: I tried running with it enabled, but – maybe because those images are so large, or for other reasons – it didn’t really work. Instead, make sure to configure the Prepull service. Also, the image streaming service occupies some amount of memory; I think that memory is better used for projects and disk caching, but YMMV.

  • Logging: yes, but only “System”. In particular, projects and hubs generate a lot of log lines, which end up becoming expensive.

  • Cloud monitoring: yes, but only “System” (both cost money, so, just being conservative here)

Node Pools

Now we add two node pools. That’s where CoCalc will actually run. Two node pools are not strictly necessary, but they make it easier to scale up and down. In particular, projects will run in one pool only, while all services are in the other pool.

The pool for services has a fixed size (e.g. 2 nodes), while the variable-size pool is for the projects. Please read Architecture and Scaling for more details about this. There is also room to deviate from exactly these settings – they are listed to give you an idea of what is necessary.

Pool: Service

  • Name: “services-1” (if you need to change parameters that can’t be edited later, create a new pool and increment that number)

  • Size: 2 (1 is not enough, unless you allow some services to run on project nodes as well; 2 also makes the cluster more robust)

  • Surge update: max=1

  • Image type: container optimized

  • Machine type: at least e2-standard-2, must be x86-64 architecture

  • Disk: min. 50 GB standard

  • Security: secure boot (leave the others as they are)

  • Metadata: Kubernetes label cocalc-role=services (that’s key=value)
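A hedged gcloud sketch of this pool, assuming the cluster name and zone from above (worth double-checking the flags against the current gcloud documentation):

gcloud container node-pools create services-1 \
  --cluster cocalc-1 --zone europe-west3-b \
  --num-nodes 2 \
  --machine-type e2-standard-2 \
  --image-type COS_CONTAINERD \
  --disk-type pd-standard --disk-size 50 \
  --max-surge-upgrade 1 \
  --shielded-secure-boot \
  --node-labels cocalc-role=services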

Pool: Projects

  • Name: “projects-1”

  • Size: 1 (scale it up later)

  • Surge update: max=2 (temporarily more nodes)

  • Image type: container optimized

  • Machine type: e2-highmem-4, must be x86-64 architecture. This of course depends on what you really want to do. A “standard” project uses maybe around 0.5 GB RAM and only a little bit of CPU (1/10 on average). Hence, you usually need more memory than CPU. If you know the workload will be CPU intensive, consider c2-standard-* machines!

  • Disk: 100 GB balanced disk. The project images are huge; a faster disk speeds up downloading the image on a new node, and running programs in general. The optional Prepull service loads the latest project image first, before the node is set to be available for projects.

  • Security: secure boot (leave the others as they are)

  • Spot VM: if you understand and can tolerate that Spot VMs get randomly rebooted, and hence interrupt running projects, enable this – it saves you a lot of money!

  • Kubernetes label: cocalc-role=projects

Kubernetes Taints

To make the Prepull service work, set these taints:

Key                    Value   Effect
cocalc-projects-init   false   NoExecute
cocalc-projects        init    NoSchedule
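Putting the pool and the taints together, a hedged gcloud sketch (append --spot if you opted into Spot VMs; double-check the flags against the current gcloud documentation):

gcloud container node-pools create projects-1 \
  --cluster cocalc-1 --zone europe-west3-b \
  --num-nodes 1 \
  --machine-type e2-highmem-4 \
  --image-type COS_CONTAINERD \
  --disk-type pd-balanced --disk-size 100 \
  --max-surge-upgrade 2 \
  --shielded-secure-boot \
  --node-labels cocalc-role=projects \
  --node-taints "cocalc-projects-init=false:NoExecute,cocalc-projects=init:NoSchedule"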

Warning

Those are Kubernetes labels and taints – not to be confused with the GCP labels (just called “Labels”)!

Database: Cloud SQL

CoCalc requires a PostgreSQL database. We use a Cloud SQL instance for that. If you know what you’re doing, you can also run the DB in the cluster yourself – there is nothing particularly special about using Cloud SQL.

  • Name: cocalc-db (choose whatever you want)

  • Database version: PostgreSQL 14

  • Region: same as the cluster

  • High availability: yes – probably a good idea; you can change this later

  • Machine: start small – shared core, 1 vCPU, ~0.6 GB RAM (or ~1.5 GB). Of course, check monitoring and adjust as needed!

  • Storage: SSD, 10 GB, with automatic storage increases

  • Network: private IP; disable the public IP (it just costs money and is less secure)

  • Backup: yes – opt in if you like, start small

  • Maintenance window: Sunday, 4–5 am (YMMV)

  • Flags: max_connections: 100
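A hedged gcloud sketch of such an instance – the tier db-g1-small is an assumption for the small shared-core machine, and --no-assign-ip requires the Service Networking setup mentioned in the notes below:

gcloud sql instances create cocalc-db \
  --database-version=POSTGRES_14 \
  --region=europe-west3 \
  --tier=db-g1-small \
  --availability-type=REGIONAL \
  --storage-type=SSD --storage-size=10 \
  --storage-auto-increase \
  --network=default --no-assign-ip \
  --database-flags=max_connections=100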

Note

  • Storage: Keep in mind that the database stores all changes to documents. Therefore, its size increases with user activity. That said, you probably won’t see the database grow beyond a GB anytime soon.

  • Network: You have to enable the Service Networking API (which requires the “Network Admin” role), and select automatic allocation of an IP range.

  • Database: to access the DB, run the ../database/db-shell.sh script – see Database.

  • Backup: daily at 4 am; region: same as the database; 7 days of backups; point-in-time recovery: 1 day.

  • Maintenance window: It should be fine to pick something at night during the weekend, e.g. Sunday, 4–5 am.

  • Flags: With the default (due to low memory, I guess) there weren’t enough connection slots. Errors were like: remaining connection slots are reserved for non-replication superuser connections

You can probably change almost all of the above later on as well.

Post setup

  1. Create user “cocalc” (or whatever you want) with a password. Save the password somewhere; we’ll later add it as a secret to the kubernetes cluster.

  2. Create database “cocalc” (or whatever you want)
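A hedged sketch of these two steps with gcloud, plus storing the password in the cluster (the secret name cocalc-db-password is purely illustrative – use whatever your HELM values reference):

# 1. database user – keep the password for the kubernetes secret
gcloud sql users create cocalc --instance=cocalc-db --password='CHANGE_ME'

# 2. the database itself
gcloud sql databases create cocalc --instance=cocalc-db

# later: store the password as a kubernetes secret (illustrative name)
kubectl create secret generic cocalc-db-password --from-literal=password='CHANGE_ME'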

Storage

We continue setting up the cluster. So far, we have the “control plane” in GKE and some nodes. Now, we need to set up the storage.

  • Above under Features, we enabled “Enable Compute Engine Persistent Disk CSI Driver” (more info).

  • The config files here will use this to set up suitable PVCs and storage classes.

  • The names of these PVCs must match the references in the CoCalc deployment.

Run the following command to set up the storage classes:

kubectl apply -f pd-classes.yaml
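For orientation, a storage class in pd-classes.yaml looks roughly like this – a hedged sketch, not the verbatim file (check the actual class names there):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: pd-balanced                  # illustrative name – see pd-classes.yaml for the real ones
provisioner: pd.csi.storage.gke.io   # the PD CSI driver enabled above
parameters:
  type: pd-balanced                  # GCE disk type: pd-standard, pd-balanced or pd-ssd
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Delete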

NFS Server

The goal is to set up an NFS storage provisioner, which uses the PVC “nfs-data” to store the data of projects, shared files, and global data/software.

helm repo add nfs https://kubernetes-sigs.github.io/nfs-ganesha-server-and-external-provisioner/
helm repo update
helm search repo nfs

You should see nfs/nfs-server-provisioner in the output.

Look into the gke/ subdirectory for more details. In particular, check what nfs.yaml specifies. It will create a disk storing the data of all projects and shared files with the specified storageClass – these classes were defined in the previous step. You may have to tune the config file to your needs!

helm upgrade --install nfs nfs/nfs-server-provisioner -f gke/nfs.yaml

NOTE: as of writing this, there was a problem with publishing newer docker images. Hence, according to this ticket, I had to add --version=1.5.0 to get an older variant of that chart. This problem has since been resolved.

Now:

kubectl get storageclasses

should list nfs.
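To sanity-check the provisioner, you can create a throwaway ReadWriteMany PVC against that class (test-nfs-pvc is just an illustrative name – delete the PVC after testing):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-nfs-pvc
spec:
  accessModes:
    - ReadWriteMany        # this is what the NFS layer adds over plain PDs
  storageClassName: nfs
  resources:
    requests:
      storage: 1Gi

Apply it with kubectl apply -f, and kubectl get pvc should show it as “Bound”.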

Note: completely independent of the above, you can use other storage solutions as well. For that, you have to create the PVCs yourself, and they must expose a ReadWriteMany filesystem. In the CoCalc deployment, you then configure the names of these PVCs under global: {storage: {...}} and disable creating them automatically via storage: {create: false}. See ../cocalc/values.yaml for more information – a sketch follows below.
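A hedged sketch of how this could look in your values file – the create: false key comes from the note above, while the PVC name keys are illustrative and must be checked against ../cocalc/values.yaml:

global:
  storage:
    create: false              # do not create the PVCs automatically
    # names of your pre-existing ReadWriteMany PVCs – key names are
    # illustrative, check ../cocalc/values.yaml for the real structure
    projects: my-projects-pvc
    software: my-software-pvc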

Disk Backup

Once you have deployed the NFS server, you’ll notice a new disk. Disks are listed under “Compute Engine” → “Disks”.

The simplest way to get some backup is to set up a Snapshot Schedule. With that, GCP will make consistent snapshots of the disk, which you can restore from – or you can create a new disk from an older snapshot.

For that, go to “Compute Engine” → “Disks” → “pvc-(the uuid you see in kubectl get pv)” → “Edit” → “Create snapshot schedule”. Daily for two weeks sounds good.

BTW, that’s also the place where you can increase the disk size.
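The same can be scripted – a hedged sketch, where daily-2w is an illustrative policy name and DISK_NAME stands for the pvc-… disk:

# a daily snapshot schedule, keeping snapshots for two weeks
gcloud compute resource-policies create snapshot-schedule daily-2w \
  --region=europe-west3 \
  --daily-schedule --start-time=04:00 \
  --max-retention-days=14

# attach it to the NFS data disk
gcloud compute disks add-resource-policies DISK_NAME \
  --zone=europe-west3-b \
  --resource-policies=daily-2w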

Next steps

The next steps are to set up the NGINX ingress + load balancer. So, continue in the /ingress-nginx and /letsencrypt subdirectories.

You also have to set up the credentials for pulling from the private Docker registry.

Once all this is done, you can configure and deploy the HELM Chart for CoCalc.

Testing

  • First steps:

    • After the initial deployment, set the IP you see for the LoadBalancer (kubectl get svc → look for a LoadBalancer with an external IP; see the one-liner after this list) at your DNS provider.

    • Then try to open https://[cocalc-your-domain.tld]/ in your browser.

    • You should be able to sign in directly as Admin, with the credentials set in your my-values.yaml config file. Of course, you should change your password.

  • Functionality:

    • A good test is to create a new project, and then open a terminal and run htop. You should see a script starting the project hub, a little bit of CPU activity, and not much more – maybe the sshd server for connecting via the SSH gateway.

    • Next, create some Jupyter Notebooks (Python 3, R, …), a LaTeX file (e.g. latex.tex), and maybe some other files. Each of these should work as expected.

    • Finally, explore your “Admin” panel, and see if the “Server Settings” are as expected. At the bottom you can test the email setup, by sending a password reset email.

    • As Admin, you can also create a file like data.cocalc-crm, which will allow you to look at various database tables, tie user activity to projects, etc.
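A quick way to find the external LoadBalancer IP mentioned in the first step above (plain kubectl, nothing assumed beyond the deployed ingress):

kubectl get svc --all-namespaces | grep LoadBalancer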

Cost Control

The above cluster and the associated services and resources incur costs. You can check up on them by going to: “Billing” (the billing account of your project) → Cost Management: “Reports”

  • You can see a daily graph of your usage; use the drop-down at the top right above the chart to switch to “daily cumulative” to see a trend for the current billing period (for me, it’s a month).

  • On the right-hand side, you can get more details by selecting “SKU” in the “Group by” selector (a “stock keeping unit” is the smallest part GCP is selling to you).

In the table below, click on “Cost ↓” to see the items sorted in decreasing order, or on “Subtotal ↓” for the cost after applying discounts & co.

  • What you should see is that the cluster itself costs something, but you get a credit for one cluster in a single zone (not region). See the notes here:

    The GKE free tier provides $74.40 in monthly credits per billing account that are applied to zonal and Autopilot clusters. If you only use a single Zonal or Autopilot cluster, this credit will at least cover the complete cost of that cluster each month.

  • The LoadBalancer + external IP address also costs a rather fixed amount per month.

  • Logging costs scale with the amount of data, hence we disabled everything except “System”.

  • If you use GCP’s “SQL” for running the PostgreSQL database, don’t use an external IP, since that would also cost you a fee to rent it.

  • The bulk of your cost is CPU + memory, though. See the notes about “Spot VM” above for running the CoCalc projects on such nodes.

  • Disk storage is rather cheap.

  • Egress network traffic is the last item to think about, e.g. if your users watch a lot of videos by streaming them from the platform, you might end up being charged significantly.

Monitoring / Uptime Check

The “uptime check” in GCP periodically pings your page.

Price: It has a free quota, hence we dial it down a bit to stay below it. Make sure to read about its pricing. E.g. over 31 days, 3 ping locations every 5 minutes give: 31 * 24 * (60 / 5) * 3 = 26784 checks – well below the 1M free quota, as of writing this.

To get started very simply, you can set up something like this:

  1. Open /monitoring/uptime/create in the GCP console to create a new uptime check

  2. Target:

    • HTTPS

    • URL (to check from the “outside” if everything is ok)

    • Hostname: the “DNS” entry

    • Path: keep it blank for “/” (i.e. “hub-next”). Other interesting targets are /stats (hub-websocket) or /static/app.html (static).

    • Check frequency: 5 minutes (that’s the 60/5 in the calculation above)

    • Expand the target options:

    • Regions: Just pick 3, not all of them.

    • GET Method on Port 443 & Validate SSL certificate!

  3. Validation:

    • Timeout 10s (or maybe better 30s, i.e. something is going on, but not a real issue yet?)

    • Content matching: here you need to get creative. Maybe check for a small string in the content, e.g. the custom name of your instance, or <html> for static.

    • No logging (it just adds to the logging quota, I guess)

    • Response code 2xx

  4. Alerts:

    • Name: “[your instance name] is down”

    • Duration: 5 Minutes (?)

    • Notification: here you have to select how you want to be notified – there is a whole setup behind this. At a minimum, it should send you an email.

At the very end is a “Test” button. Check that it actually says the page is up before arming it :-) The first time around it might take a bit longer to respond; subsequent tests should be quicker, once Next.js has warmed up.

Then click “create”, of course. All of the above can be changed later as well…

Note

Since the above just checks paths at certain domains, you can set up the same checks at another service as well.

GPU Nodes

These are just quick notes on how to add a pool of GPU nodes to the cluster, managed by GKE. All node-specific settings are essentially optional. The settings below are just what I selected to add one small node with a T4 GPU, which up to 4 projects are able to share using time-sharing. The Kubernetes-related settings are essentially the same as above, and need to be the same. Also note that GKE will add additional taints to these nodes!

  1. Create a node pool gpu-1, of size “1”

  2. Select as node type “GPU”, then “NVIDIA T4”, and the VM is n1-highmem-4.

  3. Select “Time-sharing” and “4”

  4. GPU Driver: “user managed”. I tried Google managed, but it didn’t work.

  5. Balanced disk, 128GB

  6. Local SSD Disk: “0” (instead of 2!)

  7. Security: click on “Enable secure boot”

  8. Metadata: same as above

  • Kubernetes label: cocalc-role=projects

  • Taints: (see note below, gpu-operator somehow ignores tainted nodes, even with taint toleration – bug!?)

    • NoExecute:cocalc-projects-init=false

    • NoSchedule:cocalc-projects=init

  9. Due to the “manually managed GPU driver”, continue at “Manage the GPU Stack with the NVIDIA GPU Operator on Google Kubernetes Engine (GKE)”.

WARNING: I don’t know why, but adding taint tolerations should work, yet it does not. So the workaround is to simply not taint these GPU nodes – they’ll have that "nvidia.com/gpu" taint, though.

Check what the documentation says, and add these values for taint tolerations in a file gpu-values.yaml:

tolerations: &tol
  - key: "nvidia.com/gpu"
    operator: "Exists"
    effect: "NoSchedule"
  - key: "cocalc-projects"
    operator: "Exists"
  - key: "cocalc-projects-init"
    operator: "Exists"

daemonsets:
  tolerations: *tol

Then install/update the helm app gpu-operator in the namespace gpu-operator:

helm -n gpu-operator upgrade --install gpu-operator nvidia/gpu-operator \
  --wait \
  --set hostPaths.driverInstallDir=/home/kubernetes/bin/nvidia \
  --set toolkit.installDir=/home/kubernetes/bin/nvidia \
  --set cdi.enabled=true \
  --set cdi.default=true \
  --set driver.enabled=false \
  -f gpu-values.yaml

For sharing a GPU using time slicing, follow the GPU sharing documentation; one cluster-wide configuration is enough, roughly as sketched below.
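For orientation, such a cluster-wide time-slicing configuration looks roughly like this – a hedged sketch following the NVIDIA GPU operator documentation, with time-slicing-config as an illustrative name:

apiVersion: v1
kind: ConfigMap
metadata:
  name: time-slicing-config
  namespace: gpu-operator
data:
  any: |-
    version: v1
    sharing:
      timeSlicing:
        resources:
          - name: nvidia.com/gpu
            replicas: 4        # up to 4 projects share one GPU

Then point the operator’s cluster policy at it:

kubectl patch clusterpolicies.nvidia.com/cluster-policy \
  -n gpu-operator --type merge \
  -p '{"spec": {"devicePlugin": {"config": {"name": "time-slicing-config", "default": "any"}}}}'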

Once this is set up, create a license for accessing a GPU and deploy a CUDA software environment (as of writing this, I made one based on CUDA 11.x).