Google GCP/GKE¶
Note
As of 2023-10-17, there is no out-of-the-box support for a GKE cluster. The following notes are based on the experience of setting everything up. They assume you have experience with Google Cloud Platform and Kubernetes. Some details could be out of date, but the general idea should still be valid.
There is also a guide for setting up CoCalc OnPrem on Amazon AWS/EKS.
Prerequisites¶
You should have a basic understanding of cloud computing, in particular the Google Cloud Platform (GCP) and Kubernetes (K8S). But don’t worry, for setting up CoCalc you do not need to be an expert!
This guide is specific to a GKE cluster. If you want to use another cloud provider for your Kubernetes cluster, you have to adapt the instructions.
2022-05: Starting with K8S 1.22, you need the google-cloud-sdk-gke-gcloud-auth-plugin plugin for authentication. Also, don’t forget to set export USE_GKE_GCLOUD_AUTH_PLUGIN=True in your ~/.bashrc.
If you’re not an owner or admin of the GCP project, you need a couple of “Admin” roles for your user. Exactly which ones is hard to tell, and probably changes over time. On top of the basic “Editor” role, you certainly need “Compute Admin”, “Compute Network Admin”, and “Compute Storage Admin”. Please ask the owner of the GCP project to assign those roles to your “Editor” user.
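As a minimal sketch for the authentication plugin step above: the helper below appends a line to a profile file only if it is not already present, and then reports whether the plugin binary is on the PATH. The helper name ensure_line is made up for this example, and the binary name gke-gcloud-auth-plugin is assumed to be what the package installs.

```shell
# ensure_line LINE FILE: append LINE to FILE unless an identical line exists.
ensure_line() {
  line="$1"
  file="$2"
  touch "$file"
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example usage (uncomment to apply it to your own profile):
# ensure_line 'export USE_GKE_GCLOUD_AUTH_PLUGIN=True' "$HOME/.bashrc"

# Report whether the plugin binary can be found (assumed binary name).
if command -v gke-gcloud-auth-plugin >/dev/null 2>&1; then
  echo "gke-gcloud-auth-plugin found"
else
  echo "gke-gcloud-auth-plugin not found on PATH"
fi
```

Running the helper twice leaves a single copy of the line in the file, so it is safe to keep in a setup script.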
Setup¶
Note
All settings are mainly recommendations – feel free to look into them in more detail, adjust them to your needs, etc. If something is actually required, it is explicitly mentioned. Often, you can change the settings later on as well.
The specific parameters are meant to get started with a small cluster. You can scale up later by changing the node types to be larger and by adjusting CoCalc’s configuration parameters for the HELM charts.
Let’s start. In the GCP Console under Kubernetes, you can create a new cluster:
Name: e.g. cocalc-1 (you can come up with whatever name you want)
Release channel: regular version track (not static). As of writing this on 2023-02-26, the version is 1.24.9-gke.3200.
Location: e.g. region europe-west3, specifying europe-west3-b as node location. With that, all nodes will be in the same place.
Automation: maintenance window on Saturday + Sunday, starting at 00:00 for 6 hours.
Aspect | Value | Description |
---|---|---|
Cluster | | (you can come up with whatever name you want) |
Location type | | Zonal (free tier) or Regional (costs money) |
Release channel | | regular version track (not static) |
Version | | As of Oct 2023, this version should work fine |
Location | | (whatever suits you best) |
Region | | With that, all nodes will be in the same place. |
Automation | maintenance window on Saturday + Sunday, starting at 00:00 for 6 hours. | |
Networking | | (or you know what you are doing …) |
HTTP Load Balancing | | |
Dataplane V2 | | This is mainly used to limit the networking access from within a project to the other parts of the cluster. |
DNS | | |
Shielded GKE nodes | | |
PD CSI Driver | | Persistent disk CSI Driver |
Image streaming | | See notes below… |
Logging | | Less logging saves money |
Cloud Monitoring | | Less data saves money |
Networking¶
Default, public cluster. Of course, if you know what you’re doing, you can also set up a private cluster and use a VPN or something like that. This is beyond the scope of this guide.
HTTP Load Balancing: enabled
Dataplane V2: enabled (this will take the network configuration files into account)
DNS: the default kube-dns is fine – unless you want to access internal services, then maybe you want to run Cloud DNS. (2023-10: IIRC something has changed related to that setting)
Security¶
Shielded GKE nodes: yes
Features¶
Enable Compute Engine persistent disk CSI Driver
Disable image streaming: I tried it enabled, but it didn’t really work, maybe because those images are so large or for other reasons. Rather, make sure to configure the Prepull service. Also, this image streaming service will occupy some amount of memory. I think it’s better to use that memory for projects and disk caching, but YMMV.
Logging: yes, but only “System”. In particular, projects and hubs generate a lot of log lines, which end up becoming expensive.
Cloud monitoring: yes, but only “System” (both cost money, so, just being conservative here)
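The console settings above roughly correspond to a single gcloud call. The following is only a sketch: the cluster name and zone are the examples from this guide, the flag names are what I understand the current gcloud CLI to accept (check gcloud container clusters create --help), and the run helper with its DRY_RUN variable is made up for this example. The maintenance window and DNS settings are left to the console, and image streaming is simply not requested.

```shell
# Dry-run helper: prints the command unless DRY_RUN=0 is set explicitly.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Create a small zonal cluster; name and zone are the examples from this guide.
create_cluster() {
  run gcloud container clusters create "${CLUSTER:-cocalc-1}" \
    --zone "${ZONE:-europe-west3-b}" \
    --release-channel regular \
    --enable-dataplane-v2 \
    --enable-shielded-nodes \
    --addons HttpLoadBalancing,GcePersistentDiskCsiDriver \
    --logging SYSTEM \
    --monitoring SYSTEM
}

create_cluster   # prints the command; rerun with DRY_RUN=0 to execute it
```

Printing the command first makes it easy to compare each flag with what the console shows before anything is created.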
Node Pools¶
Now we add two node pools. That’s where CoCalc will actually run. Two node pools are not strictly necessary, but they make it easier to scale up and down. In particular, projects will run in one pool only, while all services are in the other pool.
The pool for services is of fixed size (e.g. 2 nodes), while the other pool of variable size is for the projects. Please read Architecture and Scaling for more details about this. There is also room to deviate from exactly these settings – they are listed to give you an idea of what is necessary.
Pool: Service¶
Parameter | Value |
---|---|
Name | “services-1” (if you change parameters that you can’t edit, create a new pool and increment that number) |
Size | 2 (1 is not enough unless you allow some services to run on project nodes as well; 2 also makes it more robust) |
Surge update | max=1 |
Image type | container optimized |
Type | at least |
Disk | min. 50GB standard |
Security | secure boot (leave the others as they are) |
Metadata | Kubernetes Label: |
Pool: Projects¶
Parameter | Value |
---|---|
Name | “projects-1” |
Size | 1 (scale it up later) |
Surge update | max=2 (temporarily more nodes) |
Image type | container optimized |
Machine type | |
Disk | 100GB balanced disk. The project images are huge, and having a faster disk speeds up downloading the image on a new node, and running programs in general. The optional Prepull service loads the latest project image first, before the node is set to be available for projects. |
Security | secure boot (leave the others as they are) |
Spot VM | if you understand and can tolerate that spot VMs get randomly rebooted, and hence interrupt a running project, enable this – saves you a lot of money! |
Kubernetes Label | |
Kubernetes Taints | To make the prepull service work, set these taints: |
Warning
Those are Kubernetes Labels and Taints – not to be confused with the GCP labels (just called “Labels”)!
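For reference, here is a hedged sketch of creating the projects pool from the command line. The label and taint values are taken from the GPU Nodes section of this guide, which states that its metadata is the same as for the projects pool. The machine type is not given here, so it is a required argument; the run helper, the function name, and the environment variables are made up for this example. Please verify the flags with gcloud container node-pools create --help.

```shell
# Dry-run helper: prints the command unless DRY_RUN=0 is set explicitly.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# create_projects_pool MACHINE_TYPE
# The machine type is your choice; this guide does not prescribe one here.
create_projects_pool() {
  machine_type="$1"
  run gcloud container node-pools create "${POOL:-projects-1}" \
    --cluster "${CLUSTER:-cocalc-1}" \
    --zone "${ZONE:-europe-west3-b}" \
    --machine-type "$machine_type" \
    --num-nodes 1 \
    --max-surge-upgrade 2 \
    --image-type COS_CONTAINERD \
    --disk-type pd-balanced \
    --disk-size 100 \
    --shielded-secure-boot \
    --spot \
    --node-labels cocalc-role=projects \
    --node-taints cocalc-projects-init=false:NoExecute,cocalc-projects=init:NoSchedule
}

# Example usage (the machine type is a placeholder, pick your own):
# create_projects_pool e2-standard-4
```

Drop the --spot flag if interrupted projects are not acceptable for your users.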
Database: Cloud SQL¶
CoCalc requires a PostgreSQL database. We use a Cloud SQL instance for that. If you know what you’re doing, you can run the DB in the cluster yourself – there is nothing particularly special about using Cloud SQL.
Attribute | Value | Description |
---|---|---|
Name | | (choose whatever you want) |
Database version | PostgreSQL 14 | |
Region | same as the cluster | |
High availability | yes | probably a good idea, you can change this later |
Machine | start small | shared core, 1 vCPU, ~0.6 GB RAM (or ~1.5 GB). Of course, check monitoring and adjust as needed! |
Storage | SSD, 10 GB | automatic storage increases |
Network | private IP, disable public IP | A public IP just costs money and is less secure |
Backup | yes | opt-in if you like, start small |
Maintenance window | Sunday, 4-5 am | |
Flags | | |
Note
Storage: Keep in mind that the database stores all changes to documents. Therefore, the size increases with user activity. That said, you probably won’t see the database grow beyond a GB anytime soon.
Network: I had to enable the Service Networking API (which requires the “Network Admin” role) and selected to automatically allocate an IP range.
Database: to access the DB, run the ../database/db-shell.sh script – see Database.
Backup: at night, at 4am; region: same as the database; 7 days of backups; point-in-time recovery: 1 day.
Maintenance window: It should be fine to pick a time at night during the weekend, e.g. Sunday, 4-5 am.
Flags: With the default (due to low memory, I guess) there weren’t enough connection slots. Errors were like:
remaining connection slots are reserved for non-replication superuser connections
I guess you can change almost everything of the above later on as well.
Post setup¶
Create user “cocalc” (or whatever you want) with a password. Save the password somewhere; we’ll later add it as a secret to the kubernetes cluster.
Create database “cocalc” (or whatever you want)
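The two post-setup steps can also be done with gcloud. This is a sketch under assumptions: the instance name is whatever you chose above and must be passed in, the user and database are both called cocalc as suggested, and the run, new_password and setup_db helpers are made up for this example.

```shell
# Dry-run helper: prints the command unless DRY_RUN=0 is set explicitly.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Generate a random password: 24 random bytes, base64-encoded (32 characters).
new_password() {
  head -c 24 /dev/urandom | base64
}

# setup_db INSTANCE PASSWORD: create the user and the database "cocalc".
setup_db() {
  instance="$1"
  password="$2"
  run gcloud sql users create cocalc --instance "$instance" --password "$password"
  run gcloud sql databases create cocalc --instance "$instance"
}

# Example usage (the instance name is a placeholder):
# pw="$(new_password)"
# setup_db my-sql-instance "$pw"
```

Note that in dry-run mode the password is printed as part of the command, so only do that in a private terminal, and store the password for the Kubernetes secret mentioned above.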
Storage¶
We continue setting up the cluster. So far, we have the “control plane” in GKE and some nodes. Now, we need to set up the storage.
Above in Features, we enabled the “Compute Engine Persistent Disk CSI Driver”.
The config files here will use this to set up suitable PVCs and Storage Classes.
The names of these PVCs must match the references in the CoCalc deployment.
Run the following command to set up the storage classes:
kubectl apply -f pd-classes.yaml
NFS Server¶
The goal is to set up an NFS storage provisioner, which uses the PVC “nfs-data” to store the data of projects, shared files and global data/software.
helm repo add nfs https://kubernetes-sigs.github.io/nfs-ganesha-server-and-external-provisioner/
helm repo update
helm search repo nfs
You should see nfs/nfs-server-provisioner in the output.
Look into the gke/ subdirectory for more details. In particular, check what nfs.yaml specifies. It will create a disk storing the data of all projects and shared files with the specified storageClass – they were defined in the previous step. You may have to tune the config file to your needs!
helm upgrade --install nfs nfs/nfs-server-provisioner -f gke/nfs.yaml
NOTE: as of writing this, there was a problem with publishing newer docker images. Hence, according to this ticket, I had to add --version=1.5.0 for an older variant of that chart. This problem has been resolved.
Now:
kubectl get storageclasses
should list nfs.
Ref:
https://kubernetes-sigs.github.io/nfs-ganesha-server-and-external-provisioner/
https://artifacthub.io/packages/helm/kvaps/nfs-server-provisioner
Note: completely independent of the above, you can use other storage solutions as well. For that, you have to create PVCs yourself, which must expose a ReadWriteMany filesystem. In the CoCalc deployment, you have to configure the names of these PVCs under global: {storage: {...}} and disable creating them automatically: storage: {create: false}. See ../cocalc/values.yaml for more information.
Disk Backup¶
Once you deployed the NFS server, you’ll notice a new disk. They’re listed in “Compute Engine” → “Disks”.
The simplest way to get some backup is to set up a Snapshot Schedule. With that, GCP will make consistent snapshots of the disk, which you can restore from – or create a new disk from an older snapshot.
For that, go to “Compute Engine” → “Disks” → “pvc-(the uuid you see in kubectl get pv)” → “Edit” → “Create snapshot schedule”. Daily for two weeks sounds good.
BTW, that’s also the place where you can increase the disk size.
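The same snapshot schedule can be sketched with gcloud: first create a resource policy, then attach it to the disk. The policy name nfs-daily, the 04:00 UTC start time, the region/zone defaults and the helper names are assumptions for this example; the disk name is the pvc-… name you see in the console. Please check the flags with gcloud compute resource-policies create snapshot-schedule --help.

```shell
# Dry-run helper: prints the command unless DRY_RUN=0 is set explicitly.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# schedule_snapshots DISK_NAME: daily snapshots, kept for 14 days.
schedule_snapshots() {
  disk="$1"
  policy="${POLICY:-nfs-daily}"
  run gcloud compute resource-policies create snapshot-schedule "$policy" \
    --region "${REGION:-europe-west3}" \
    --daily-schedule \
    --start-time 04:00 \
    --max-retention-days 14
  run gcloud compute disks add-resource-policies "$disk" \
    --zone "${ZONE:-europe-west3-b}" \
    --resource-policies "$policy"
}

# Example usage (replace with the real disk name):
# schedule_snapshots pvc-00000000-0000-0000-0000-000000000000
```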
Next steps¶
The next steps are to set up NGINX ingress + LoadBalancer.
So, continue in the /ingress-nginx and /letsencrypt subdirectories.
You also have to set up the credentials for pulling from the private docker registry.
Once all this is done, you can configure and deploy the HELM Chart for CoCalc.
Testing¶
First steps:
After the initial deployment, set the IP you see in the LoadBalancer (kubectl get svc → look for LoadBalancer with an external IP) at your DNS provider.
Then try to open https://[cocalc-your-domain.tld]/ in your browser.
You should be able to sign in directly as Admin, with the credentials set in your my-values.yaml config file. Of course, you should change your password.
Functionality:
A good test is to create a new project, and then open a terminal and run htop. You should see a script starting the project hub, a little bit of CPU activity, and not much more – maybe the sshd server for connecting via the SSH gateway.
Next, create some Jupyter Notebooks (Python3, R, …), create a LaTeX latex.tex file, and maybe some other files. Each one of these should work as expected.
Finally, explore your “Admin” panel, and see if the “Server Settings” are as expected. At the bottom you can test the email setup, by sending a password reset email.
As Admin, you can also create a file like data.cocalc-crm, which will allow you to look at various database tables, tie user activity to projects, etc.
Cost Control¶
The above cluster + associated services and resources incur costs. You can check up on that by going to: “Billing” (your billing account of your project) → Cost Management: “Reports”
You can see a daily graph of your usage. Use the drop-down at the top right above the chart to switch to “daily cumulative” to see a trend for the current billing period (for me, it’s a month).
On the right hand side, you can get more details by selecting “SKU” in the “Group by” selector. (A “stock keeping unit” is the smallest part GCP is selling to you.)
In the table below, click on “Cost ↓” to sort the items in decreasing order, or on “Subtotal ↓” to sort them after applying discounts & co.
What you should see is that the cluster itself costs something, but you get a credit for one in a single zone (not region). See notes here:
The GKE free tier provides $74.40 in monthly credits per billing account that are applied to zonal and Autopilot clusters. If you only use a single Zonal or Autopilot cluster, this credit will at least cover the complete cost of that cluster each month.
The LoadBalancer + external IP address also costs a rather fixed amount per month.
Logging costs are proportional to the amount of data, hence we disabled everything except “System”.
If you use GCP’s “SQL” for running the PostgreSQL database, don’t use an external IP, since that would also cost you a fee to rent it.
The bulk of your cost is CPUs + Memory, though. See notes about “Spot VM” above for running the CoCalc projects on these.
Disk storage is rather cheap.
Egress Network traffic is the last item to think about. e.g. if your users watch a lot of videos by streaming them from the platform, you might end up getting charged significantly.
Monitoring / Uptime Check¶
The “uptime check” in GCP periodically pings your page.
Price: It has a free quota, hence we dial it down a bit to stay below it. Make sure to read about its pricing. E.g. 31 days, 3 ping locations every 5 minutes are: 31 * 24 * (60 / 5) * 3 = 26784. Well below the 1M free quota, as of writing this.
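To try other check frequencies or numbers of regions, the calculation above can be wrapped in a small shell function. The function name uptime_checks is made up for this example, and it uses integer arithmetic, so the interval should divide 60 evenly.

```shell
# uptime_checks DAYS INTERVAL_MINUTES REGIONS
# Number of check executions per billing period:
#   days * 24 hours * (60 / interval) checks per hour * regions
uptime_checks() {
  days="$1"
  interval="$2"
  regions="$3"
  echo $(( days * 24 * (60 / interval) * regions ))
}

uptime_checks 31 5 3   # prints 26784, the number from the text above
```

For comparison, checking every minute from 3 regions gives 31 * 24 * 60 * 3 = 133920 executions, which is still below the 1M quota mentioned above.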
To get started, you can set up something simple like this:
Open /monitoring/uptime/create in the GCP console to create a new uptime check
Target:
HTTPS URL (to check from the “outside” if everything is ok)
Hostname: the “DNS” entry
Path: keep it blank for “/” (i.e. “hub-next”). Other interesting targets are /stats (hub-websocket) or /static/app.html (static).
Check frequency: 5 minutes (that’s the 60/5 in the calculation above)
Expand the target options:
Regions: Just pick 3, not all of them.
GET Method on Port 443 & Validate SSL certificate!
Validation:
Timeout 10s (or maybe better 30s, i.e. something is going on, but not a real issue yet?)
Content matching: here you need to get creative. Maybe check for a small string in the content, e.g. the custom name of our instance, or <html> for static.
No logging (just adds to the logging quota, I guess)
Response code 2xx
Alerts:
Name: “[your instance name] is down”
Duration: 5 Minutes (?)
Notification: here, you have to select how to notify, there is a whole setup behind this. At a minimum, it should send you an email.
At the very end is a “Test” button. Check that it actually says that the page is up, before arming it :-) The first time around it might take a bit longer to respond, subsequent tests should be quicker – next.js warmed up.
Then click “create”, of course. All of the above can be changed later as well…
Note
Since the above just checks paths at certain domains, you can set up the same at another service as well.
GPU Nodes¶
These are just quick notes on how to add a pool of GPU nodes to the cluster, managed by GKE. All node-specific settings are essentially optional. The settings below are just what I selected to add one small node with a T4 GPU, which up to 4 projects can share using time-sharing. The Kubernetes-related settings are essentially the same as above, and need to be the same. Also note that GKE will add additional taints to these nodes!
Create a node pool gpu-1, of size “1”
Select as node type “GPU”, then “NVIDIA T4”, and the VM is n1-highmem-4.
Select “Time-sharing” and “4”
GPU Driver: “user managed”. I tried Google managed, but it didn’t work.
Balanced disk, 128GB
Local SSD Disk: “0” (instead of 2!)
Security: click on “Enable secure boot”
Metadata: same as above
Kubernetes label: cocalc-role=projects
Taints: (see note below, gpu-operator somehow ignores tainted nodes, even with taint toleration – bug!?)
NoExecute:cocalc-projects-init=false
NoSchedule:cocalc-projects=init
Due to “manually managed GPU driver”, continue at Manage the GPU Stack with the NVIDIA GPU Operator on Google Kubernetes Engine (GKE).
WARNING: Adding taint tolerations should work, but for reasons I don’t understand, it does not.
So, the workaround is to simply not taint these GPU nodes – they’ll have that "nvidia.com/gpu" taint, though.
Check what the documentation says, and add these values for taint tolerations from a file gpu-values.yaml:
tolerations: &tol
  - key: "nvidia.com/gpu"
    operator: "Exists"
    effect: "NoSchedule"
  - key: "cocalc-projects"
    operator: "Exists"
  - key: "cocalc-projects-init"
    operator: "Exists"
daemonsets:
  tolerations: *tol
Then install/update the helm app gpu-operator in the namespace gpu-operator:
helm -n gpu-operator upgrade --install gpu-operator nvidia/gpu-operator \
  --wait \
  --set hostPaths.driverInstallDir=/home/kubernetes/bin/nvidia \
  --set toolkit.installDir=/home/kubernetes/bin/nvidia \
  --set cdi.enabled=true \
  --set cdi.default=true \
  --set driver.enabled=false \
  -f gpu-values.yaml
For sharing a GPU using time slicing, follow the GPU Sharing instructions for one cluster-wide configuration.
Once this is set up, create a license for accessing a GPU and deploy a CUDA software environment (as of writing this, I made one based on CUDA 11.x).