Scaling

This page describes how to scale CoCalc in a Kubernetes cluster. There are basically two aspects to think about:

  • Active Users: how many users do you have to handle at the same time? The total number is not as important as the number of active users. If you have 1000 users, but only 10 are active at the same time, you don’t need to scale up.

  • Active Projects: as with users, the total number of projects is not as important as the number of active projects. The active projects take up most of the resources in your cluster. Since each user usually works in one or more projects, this aspect scales directly with the number of active users.

Hub services

The hub services are exposed to the outside world. The ones that handle incoming traffic from users are hub-websocket and hub-proxy. Learn more about this in the Architecture.

The more users are connected at the same time, the more memory these hubs need. Each hub can only handle a maximum of about 50 active connections at the same time, so if you expect 200 active users, you should run with a replication of at least 4. A higher replication will also speed up communication and reduce the latency of all operations.

Note

To quickly assess the current situation, you can open the /stats endpoint (and look at the active connections counter per hub) or consult the /info/status page for a longer-term view.

In your yaml config file, there is a section for hubs. Here is how to increase the memory request and replication in a sensible way. The resources part follows the Kubernetes resource specification. The multiple_replicas part sets the number of pods to run for each service. There is no need to scale up api or next, since the bulk of the work is done by websocket and proxy.

hub:
  resources:
    requests:
      cpu: 1
      memory: 1Gi
    limits:
      cpu: 2
      memory: 2Gi

multiple_replicas:
  websocket: 6
  proxy: 6
  next: 2
  api: 1

Note: If for some reason your use case is to “hammer” the API:

  • v1: is handled by api

  • v2: is handled by next

Increase the number of these two accordingly.
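
For example, if most of your traffic hits the v2 API, you could raise the replica counts of these two services (the numbers below are just a sketch; pick them based on your actual load):

multiple_replicas:
  websocket: 6
  proxy: 6
  next: 4
  api: 2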

Proxy/Networking

Since all the communication also has to go through the NGINX ingress server (or whatever you use to connect CoCalc to the outside world), you should add some replicas there as well. There is a note about this in the /ingress-nginx directory: for example, the values.yaml there runs 2 replicas and distributes them across different nodes. YMMV.
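
As a rough sketch, assuming you deploy the upstream ingress-nginx Helm chart (the exact keys used in the /ingress-nginx directory may differ), the relevant values look like this:

controller:
  replicaCount: 2
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            topologyKey: kubernetes.io/hostname
            labelSelector:
              matchLabels:
                app.kubernetes.io/name: ingress-nginx
                app.kubernetes.io/component: controller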

Database

The database plays a crucial role as well, but overall it’s probably not the part you have to worry about too much. Make sure it has enough memory available (out of the box, Postgres is configured for a small/old machine!). As a rule of thumb, 1 CPU core (average load will stay well below that) and 10Gi of memory should be enough for 200 users, 1000+ projects, and many actively edited files.
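
If the database runs in-cluster, give its pod matching resources. This is only a sketch: the database key below is hypothetical and depends on how Postgres is deployed in your setup, and you should also raise Postgres’s own memory settings (e.g. shared_buffers) so it actually uses that memory.

database:
  resources:
    requests:
      cpu: 1
      memory: 10Gi
    limits:
      cpu: 2
      memory: 10Gi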

Projects

There is a line in the general configuration with the resource specs for a default project.

default_quotas: '{"internet":true,"idle_timeout":1800,"mem":2000,"cpu":1,"cpu_oc":20,"mem_oc":10}'

Before we discuss this in more detail: this line basically defines how “heavy” a single project is. The general idea is to first estimate how many projects will run at the same time, use node taints (with matching tolerations) to dedicate a set of nodes in the cluster to running only these projects, and then do a back-of-the-envelope calculation to see whether they fit.
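
Here is a sketch of that node dedication idea. The taint key cocalc-projects is made up for illustration; whatever you choose has to match the tolerations (and node selectors) that your project pods are configured with.

# Node object (excerpt): only pods that tolerate this taint get scheduled here.
spec:
  taints:
    - key: cocalc-projects
      value: "true"
      effect: NoSchedule

# Matching toleration on the project pods (excerpt):
tolerations:
  - key: cocalc-projects
    operator: Equal
    value: "true"
    effect: NoSchedule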

About the default quotas above:

  • mem:2000 defines the max memory limit to be 2000 MB (2 GB). On average, let’s estimate a project will use 500 MB of RAM. So, you can run 100 of them on a VM with 64 GB of RAM, no problem.

  • cpu:1 defines the max CPU limit to be 1 core. Due to the way CoCalc is used for interactive work, most of the time the computer waits on the user. Average usage per project is usually much less than 1 core!

  • cpu_oc:20 defines an overcommit ratio of 1:20. So, each project will request 1/20 of a cpu core.

  • mem_oc:10 defines an overcommit ratio of 1:10. So, each project will request 1/10 of these 2000 MB of RAM, i.e. 200 MB (see the sketch after this list).
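
Putting these numbers together, the quotas above translate into per-project pod resources roughly like this (a sketch; the actual spec is generated by CoCalc and may differ in detail):

resources:
  requests:
    cpu: 50m       # 1 core divided by the cpu_oc ratio of 20
    memory: 200M   # 2000 MB divided by the mem_oc ratio of 10
  limits:
    cpu: 1
    memory: 2000M

So 100 projects together request about 5 CPU cores and 20 GB of RAM, while their estimated average usage (500 MB each) is around 50 GB, which still fits on the 64 GB node from the example above.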

If there are scheduling errors, you might have to tweak the parameters, though. Possible problems:

  1. Not enough CPU: the total CPU requested by all projects trying to run on a node exceeds that node’s allocatable CPU.

  2. Not enough Memory: same as above, but for memory. This is more likely to happen, because memory is an incompressible resource: unlike CPU time, used memory can’t simply be throttled.

  3. The max number of pods you can run on a single node. If you run kubectl describe node [.....] there is a “Capacity”/“Allocatable” section. One entry is “pods”. This number depends on the cluster setup and networking (each pod needs its own IP address). If your max number of pods is not 110 (the kubelet default, and what I have here) but much smaller, you won’t be able to schedule 100 projects on such a node – even if it has 24 CPU cores and hundreds of GB of RAM. You can check this as shown below.
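
To see these numbers, you can also look at the node object itself, for example (excerpt; values are illustrative):

# kubectl get node <node-name> -o yaml
status:
  allocatable:
    cpu: "23"
    pods: "110"
  capacity:
    cpu: "24"
    pods: "110"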