.. _gke:

Google GCP/GKE
====================

.. note::

   As of 2023-10-17, there is currently no out-of-the-box support for a `GKE Cluster`_.
   The following are notes based on the experience of setting everything up.
   This certainly assumes you have experience with :term:`Google Compute Platform` and :term:`Kubernetes`.
   Some details could be out of date, but the general idea should still be valid.

   There is also a guide for setting up CoCalc OnPrem on :doc:`eks`.

Prerequisites
-------------

- You should have a basic understanding of cloud computing, in particular the
  :term:`Google Compute Platform` (GCP) and :term:`Kubernetes` (K8S).
  But don't worry, for setting up CoCalc you do not need to be an expert!

- This guide is specific to a `GKE Cluster`_.
  If you want to use another cloud provider for your Kubernetes cluster, you have to adapt the instructions.

- 2022-05: Starting with K8S 1.22 you need the ``google-cloud-sdk-gke-gcloud-auth-plugin`` plugin for authentication.
  `More info `_, and don't forget to set ``export USE_GKE_GCLOUD_AUTH_PLUGIN=True`` in your ``~/.bashrc``.

- If you're not an owner or admin of the GCP project, you need a couple of "Admin" roles for your user.
  What exactly you need is hard to tell, and probably changes over time.
  What you certainly need on top of a basic "Editor" role is: "Compute Admin", "Compute Network Admin", and "Compute Storage Admin".
  Please talk with the owner of the GCP project to assign those roles to your "Editor" user.

Setup
-----

.. include:: ../_shared/settings-recommendations.rst

Let's start. In the `GCP Console under Kubernetes `__, create a new cluster with the following settings:

.. list-table::
   :header-rows: 1
   :widths: 20 20 60

   * - Aspect
     - Value
     - Description
   * - Cluster
     - ``cocalc-1``
     - (you can come up with whatever name you want)
   * - Location type
     - ``zonal``
     - Zonal (free tier) or Regional (costs money)
   * - Release channel
     - ``regular``
     - regular version track (not static)
   * - Version
     - ``1.26.xx``
     - As of Oct 2023, this version should work fine
   * - Location
     - ``europe-west3``
     - (whatever region suits you best)
   * - Node location
     - ``europe-west3-b``
     - a single zone – with that, all nodes will be in the same place
   * - Automation
     -
     - maintenance window on Saturday + Sunday, starting at 00:00 for 6 hours
   * - Networking
     - ``Default``
     - or you know what you're doing ...
   * - HTTP Load Balancing
     - ``enabled``
     -
   * - Dataplane V2
     - ``enabled``
     - This is mainly used to limit the networking access from within a project to the other parts of the cluster.
   * - DNS
     - ``kube-dns``
     -
   * - Shielded GKE nodes
     - ``enabled``
     -
   * - PD CSI Driver
     - ``enabled``
     - Persistent disk CSI Driver
   * - Image streaming
     - ``disabled``
     - See notes below...
   * - Logging
     - ``System``
     - Less logging saves money
   * - Cloud Monitoring
     - ``System``
     - Less data saves money

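If you prefer the command line over the console, cluster creation can be sketched roughly like this.
This is only an approximation of the table above – flag names and defaults change over time, the maintenance window dates are placeholders, and you should double-check everything against the current ``gcloud`` documentation before running it:

.. code:: bash

   # install the kubectl auth plugin mentioned in the prerequisites
   gcloud components install gke-gcloud-auth-plugin

   # rough equivalent of the cluster settings in the table above (verify flags first!)
   gcloud container clusters create cocalc-1 \
     --zone=europe-west3-b \
     --release-channel=regular \
     --enable-dataplane-v2 \
     --enable-shielded-nodes \
     --addons=HttpLoadBalancing,GcePersistentDiskCsiDriver \
     --logging=SYSTEM \
     --monitoring=SYSTEM \
     --maintenance-window-start="2023-10-21T00:00:00Z" \
     --maintenance-window-end="2023-10-21T06:00:00Z" \
     --maintenance-window-recurrence="FREQ=WEEKLY;BYDAY=SA,SU"

   # fetch credentials so that kubectl talks to the new cluster
   gcloud container clusters get-credentials cocalc-1 --zone=europe-west3-b

The following subsections explain the reasoning behind some of these settings.
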
Networking
~~~~~~~~~~

- Default, public cluster.
  Of course, if you know what you're doing, you can also set up a private cluster and use a VPN or something like that.
  This is beyond the scope of this guide.
- HTTP Load Balancing: enabled
- Dataplane V2: enabled (this will take the network policy configuration into account)
- DNS: the default ``kube-dns`` is fine – unless you want to access internal services, then maybe you want to run ``Cloud DNS``.
  (2023-10: IIRC something has changed related to that setting)

Security
~~~~~~~~

- Shielded GKE nodes: yes

Features
~~~~~~~~

- Enable Compute Engine **persistent disk CSI Driver**
- Disable image streaming: I tried running with it enabled, but maybe because those images are so large, or for other reasons, it didn't really work.
  Rather, make sure to configure the :ref:`prepull` service.
  Also, this image streaming service will occupy some amount of memory.
  I think that memory is better spent on projects and disk caching, but :term:`YMMV`.
- Logging: yes, but only "System".
  In particular, projects and hubs generate a lot of log lines, which end up becoming expensive.
- Cloud monitoring: yes, but only "System" (both cost money, so, just being conservative here)

.. _gke-node-pools:

Node Pools
----------

Now we add :ref:`two node pools `.
That's where CoCalc will actually be active.
Two node pools are not strictly necessary, but they make it easier to scale up and down.
In particular, projects will run in one pool only, while all services are in the other pool.
The pool for services is of fixed size (e.g. 2 nodes), while the other pool of variable size is for the projects.
Please read :doc:`../architecture` and :doc:`../ops/scaling` for more details about this.
There is also room to deviate from exactly these settings – they are listed to give you an idea of what is necessary.
Pool: **Service**
~~~~~~~~~~~~~~~~~~

.. list-table::
   :header-rows: 1
   :widths: 25 75

   * - Parameter
     - Value
   * - Name
     - "services-1" (if you change parameters, which you can't edit, create a new pool and increment that number)
   * - Size
     - 2 (1 is not enough, unless you allow some services to run on project nodes as well; 2 nodes also make the setup more robust)
   * - Surge update
     - max=1
   * - Image type
     - container optimized
   * - Machine type
     - at least ``e2-standard-2``, must be x86/64 architecture
   * - Disk
     - min. 50 GB standard
   * - Security
     - secure boot (leave the others as they are)
   * - Metadata
     - Kubernetes label: ``cocalc-role=services`` (that's ``key=value``)

Pool: **Projects**
~~~~~~~~~~~~~~~~~~

.. list-table::
   :header-rows: 1
   :widths: 25 75

   * - Parameter
     - Value
   * - Name
     - "projects-1"
   * - Size
     - 1 (scale it up later)
   * - Surge update
     - max=2 (temporarily more nodes)
   * - Image type
     - container optimized
   * - Machine type
     - ``e2-highmem-4``, must be x86/64 architecture.
       This of course depends on what you really want to do.
       A "standard" project uses maybe around 0.5 GB RAM and only a little bit of CPU (1/10 on average).
       Hence, you usually need more memory than CPU.
       If you know it will be CPU intensive, consider ``c2-standard-*`` machines!
   * - Disk
     - 100 GB balanced disk.
       The project images are huge, and having a faster disk speeds up downloading the image on a new node, and running programs in general.
       The optional :ref:`prepull` service loads the latest project image first, before the node is set to be available for projects.
   * - Security
     - secure boot (leave the others as they are)
   * - Spot VM
     - if you understand and can tolerate that spot VMs get randomly rebooted, and hence interrupt a running project, enable this – it saves you a lot of money!
   * - Kubernetes Label
     - ``cocalc-role=projects``
   * - Kubernetes Taints
     - To make the prepull service work, set these taints:

       .. list-table::
          :header-rows: 1

          * - Key
            - Value
            - Effect
          * - cocalc-projects-init
            - false
            - NoExecute
          * - cocalc-projects
            - init
            - NoSchedule

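For reference, here is a rough ``gcloud`` sketch of the two node pools described above.
The values mirror the tables, but treat it as an illustration rather than a definitive command – verify the flags (in particular the taint syntax and ``--spot``, if you want spot VMs) against the current documentation:

.. code:: bash

   # pool for CoCalc's services (fixed size)
   gcloud container node-pools create services-1 \
     --cluster=cocalc-1 --zone=europe-west3-b \
     --machine-type=e2-standard-2 --num-nodes=2 \
     --disk-type=pd-standard --disk-size=50 \
     --image-type=COS_CONTAINERD \
     --shielded-secure-boot \
     --max-surge-upgrade=1 \
     --node-labels=cocalc-role=services

   # pool for the projects (scale it up later); optionally add --spot
   gcloud container node-pools create projects-1 \
     --cluster=cocalc-1 --zone=europe-west3-b \
     --machine-type=e2-highmem-4 --num-nodes=1 \
     --disk-type=pd-balanced --disk-size=100 \
     --image-type=COS_CONTAINERD \
     --shielded-secure-boot \
     --max-surge-upgrade=2 \
     --node-labels=cocalc-role=projects \
     --node-taints=cocalc-projects-init=false:NoExecute,cocalc-projects=init:NoSchedule
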
.. warning::

   Those are *Kubernetes* labels and taints – not to be confused with the GCP labels (just called "Labels")!

.. _gke-database:

Database: Cloud SQL
-------------------

CoCalc requires a PostgreSQL database.
We use a Cloud SQL instance for that.
If you know what you're doing, you can run the DB in the cluster yourself – there is nothing particularly special about using Cloud SQL.

.. list-table::
   :header-rows: 1
   :widths: 20 25 55

   * - Attribute
     - Value
     - Description
   * - Name
     - ``cocalc-db``
     - (choose whatever you want)
   * - Database version
     - PostgreSQL 14
     -
   * - Region
     - same as the cluster
     -
   * - High availability
     - yes
     - probably a good idea, you can change this later
   * - Machine
     - start small
     - shared core, 1 vCPU, ~0.6 GB RAM (or ~1.5 GB).
       Of course, check monitoring and adjust as needed!
   * - Storage
     - SSD, 10 GB
     - enable automatic storage increases
   * - Network
     - private IP
     -
   * -
     - disable public IP
     - just costs money and is less secure
   * - Backup
     - yes
     - opt-in if you like, start small
   * - Maintenance window
     - Sunday, 4-5 am
     - :term:`YMMV`
   * - Flags
     - ``max_connections``: 100
     -

.. note::

   - Storage: Keep in mind that the database stores all changes to documents.
     Therefore, the size increases with user activity.
     That said, you probably won't see the database grow beyond a GB anytime soon.
   - Network: You have to enable the Service Networking API (which requires the "Network Admin" role)
     and select the option to automatically allocate an IP range.
   - Database: to access the DB, run the ``../database/db-shell.sh`` script – see :ref:`troubleshooting-database`.
   - Backup:

     - daily, starting at 4 am
     - region: same as the database
     - 7 days of backups
     - Point-in-time recovery: 1 day

   - Maintenance window: it should be fine to pick something at night during the weekend, e.g. Sunday, 4-5 am.
   - Flags: with the default ``max_connections`` (due to low memory, I guess) there weren't enough connection slots.
     Errors were like: ``remaining connection slots are reserved for non-replication superuser connections``

   I guess you can change almost everything of the above later on as well.

Post setup
~~~~~~~~~~

#. Create a user "cocalc" (or whatever you want) with a password.
   Save the password somewhere; we'll later add it as a secret to the Kubernetes cluster.
#. Create a database "cocalc" (or whatever you want).

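If you prefer the CLI, the instance plus the user and database from the post-setup step can be created roughly like this.
The instance name, tier, network and password are placeholders matching the table above – double-check the flags against the current ``gcloud sql`` documentation, especially the private-IP/network setup:

.. code:: bash

   # rough sketch of the Cloud SQL instance described above (verify before running!)
   gcloud sql instances create cocalc-db \
     --database-version=POSTGRES_14 \
     --region=europe-west3 \
     --tier=db-g1-small \
     --availability-type=REGIONAL \
     --storage-type=SSD --storage-size=10GB --storage-auto-increase \
     --network=default --no-assign-ip \
     --database-flags=max_connections=100 \
     --backup-start-time=04:00 --retained-backups-count=7 \
     --enable-point-in-time-recovery \
     --maintenance-window-day=SUN --maintenance-window-hour=4

   # post setup: create the "cocalc" user and database
   gcloud sql users create cocalc --instance=cocalc-db --password='CHANGE_ME'
   gcloud sql databases create cocalc --instance=cocalc-db
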
Storage
-------

We continue setting up the cluster.
So far, we have the "control plane" in GKE and some nodes.
Now, we need to set up the storage.

- Above in features, we enabled "Enable Compute Engine Persistent Disk CSI Driver" `more info `__
- The config files here will use this to set up suitable `PVC `__\ s and `Storage Classes `__.
- The names of these PVCs must match the references in the CoCalc deployment.

Run the following command to set up the storage classes:

.. code:: bash

   kubectl apply -f pd-classes.yaml

.. _nfs-server:

NFS Server
~~~~~~~~~~

The goal is to set up an NFS storage provisioner, which uses the PVC "nfs-data" to store the data of projects, shared files and global data/software.

.. code:: bash

   helm repo add nfs https://kubernetes-sigs.github.io/nfs-ganesha-server-and-external-provisioner/
   helm repo update
   helm search repo nfs

You should see ``nfs/nfs-server-provisioner`` in the output.

Look into the ``gke/`` subdirectory for more details.
In particular, check what ``nfs.yaml`` specifies.
It will create a disk storing the data of all projects and shared files with the specified ``storageClass`` – they were defined in the previous step.
You may have to tune the config file to your needs!

.. code:: bash

   helm upgrade --install nfs nfs/nfs-server-provisioner -f gke/nfs.yaml

.. note::

   At the time of writing, there was a problem with publishing newer docker images.
   Hence, `according to this ticket `__, I had to add ``--version=1.5.0`` to get an older variant of that chart.
   This problem has since been resolved.

Now:

.. code:: bash

   kubectl get storageclasses

should list ``nfs``.

References:

- https://kubernetes-sigs.github.io/nfs-ganesha-server-and-external-provisioner/
- https://artifacthub.io/packages/helm/kvaps/nfs-server-provisioner

.. include:: ../_shared/custom-pvc.rst

.. _gke-disk-backup:

Disk Backup
~~~~~~~~~~~

Once you have deployed the NFS server, you'll notice a new disk.
Disks are listed in "Compute Engine" → "Disks".
The simplest way to get some backup is to set up a `Snapshot Schedule`_.
With that, GCP will make consistent snapshots of the disk, which you can restore from – or create a new disk from an older snapshot.

For that, go to "Compute Engine" → "Disks" → "pvc-(the uuid you see in ``kubectl get pv``)" → "Edit" → "Create snapshot schedule".
Daily for two weeks sounds good.
BTW, that's also the place where you can increase the disk size.

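Alternatively, such a snapshot schedule can be created on the command line.
This is a sketch under the assumption of a daily schedule with two weeks of retention – the policy name is arbitrary, and you have to fill in the actual disk name from ``kubectl get pv`` / ``gcloud compute disks list``:

.. code:: bash

   # create a resource policy: daily snapshots, kept for 14 days
   gcloud compute resource-policies create snapshot-schedule daily-nfs-backup \
     --region=europe-west3 \
     --daily-schedule \
     --start-time=04:00 \
     --max-retention-days=14

   # attach it to the disk backing the NFS PV (look up the exact pvc-... name first)
   gcloud compute disks list
   gcloud compute disks add-resource-policies pvc-<uuid> \
     --resource-policies=daily-nfs-backup \
     --zone=europe-west3-b
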
Next steps
----------

The next steps are to set up the NGINX ingress + LoadBalancer.
So, continue in the ``/ingress-nginx`` and ``/letsencrypt`` subdirectories.
You also have to set up the :ref:`credentials for pulling from the private docker registry `.
Once all this is done, you can configure and deploy the :doc:`HELM Chart for CoCalc <../deployment>`.

Testing
-------

- First steps:

  - After the initial deployment, set the IP you see in the LoadBalancer (``kubectl get svc`` → look for ``LoadBalancer`` with an external IP) at your DNS provider.
  - Then try to open ``https://[cocalc-your-domain.tld]/`` in your browser.
  - You should be able to sign in directly as Admin, with the credentials set in your :ref:`my-values.yaml ` config file.
    Of course, you should change your password.

- Functionality:

  - A good test is to create a new project, and then open a terminal and run ``htop``.
    You should see a script starting the project hub, a little bit of CPU activity, and not much more – maybe the sshd server for connecting via the SSH gateway.
  - Next, create some Jupyter Notebooks (Python3, R, …), create a LaTeX ``latex.tex`` file, and maybe some other files.
    Each one of these should work as expected.
  - Finally, explore your "Admin" panel, and see if the "Server Settings" are as expected.
    At the bottom you can test the email setup by sending a password reset email.
  - As Admin, you can also create a file like ``data.cocalc-crm``, which will allow you to look at various database tables, tie user activity to projects, etc.

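To double-check the DNS / LoadBalancer setup from the first step above, something like this works from a terminal – the domain is a placeholder for your actual one:

.. code:: bash

   # find the external IP of the ingress LoadBalancer service
   kubectl get svc --all-namespaces | grep LoadBalancer

   # once the DNS record points at that IP, check resolution and the HTTP response
   dig +short cocalc-your-domain.tld
   curl -I https://cocalc-your-domain.tld/
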
Cost Control
------------

The above cluster + associated services and resources incur costs.
You can check up on that by going to: "Billing" (the billing account of your project) → Cost Management: "Reports".

- You can see a daily graph of your usage; use the drop-down at the top right above the chart to switch to "daily cumulative" to see a trend for the current billing period (for me, it's a month).
- On the right hand side, you can get more details by selecting "SKU" in the "Group by" selector
  (a "stock keeping unit" is the smallest part GCP is selling to you).
  In the table below, click on "Cost ↓" to sort in decreasing order, or on "Subtotal ↓" for the cost after applying discounts & co.
- What you should see is that the cluster itself costs something, but you get a credit for one cluster in a single zone (not region).
  `See notes here `__:
  "The GKE free tier provides $74.40 in monthly credits per billing account that are applied to zonal and Autopilot clusters.
  If you only use a single Zonal or Autopilot cluster, this credit will at least cover the complete cost of that cluster each month."
- The :term:`LoadBalancer` + external IP address also costs a rather fixed amount per month.
- Logging costs proportionally to the amount of data, hence we disabled everything except "System".
- If you use GCP's "SQL" for running the PostgreSQL database, don't use an external IP, since that would also cost you a fee to rent it.
- The bulk of your cost is CPUs + memory, though.
  See the notes about "Spot VM" above for running the CoCalc projects on these.
- Disk storage is rather cheap.
- Egress network traffic is the last item to think about.
  E.g. if your users watch a lot of videos by streaming them from the platform, you might end up getting charged significantly.

Monitoring / Uptime Check
-------------------------

The "uptime check" in GCP periodically pings your page.

Price: It has a free quota, hence we dial it down a bit to stay below it.
Make sure to `read about its pricing `__.
E.g. 31 days, 3 ping locations every 5 minutes are: ``31 * 24 * (60 / 5) * 3 = 26784``.
Well below the 1M free quota, as of writing this.

To get started simply, you can set up something like this:

1. Open ``/monitoring/uptime/create`` in the GCP console to create a new uptime check
2. Target:

   - HTTPS
   - URL (to check from the "outside" if everything is ok)
   - Hostname: the "DNS" entry
   - Path: keep it blank for "/" (i.e. "hub-next").
     Other interesting targets are ``/stats`` (hub-websocket) or ``/static/app.html`` (static).
   - Check frequency: 5 minutes (that's the ``60/5`` in the calculation above)
   - Expand the target options:

     - Regions: Just pick 3, not all of them.
     - GET Method on Port 443 & Validate SSL certificate!

3. Validation:

   - Timeout 10s (or maybe better 30s, i.e. something is going on, but not a real issue yet?)
   - Content matching: here you need to get creative.
     Maybe check for a small string in the content, e.g. the custom name of your instance, or a known string for the ``static`` target.
   - No logging (just adds to the logging quota, I guess)
   - Response code 2xx

4. Alerts:

   - Name: "[your instance name] is down"
   - Duration: 5 Minutes (?)
   - Notification: here, you have to select how to notify; there is a whole setup behind this.
     At a minimum, it should send you an email.

At the very end is a "Test" button.
Check that it actually says that the page is up, before arming it :-)
The first time around it might take a bit longer to respond; subsequent tests should be quicker – next.js warmed up.
Then click "create", of course.
All of the above can be changed later as well…

.. note::

   Since the above just checks paths at certain domains, you can set up the same at another service as well.

.. _GKE Cluster: https://cloud.google.com/kubernetes-engine
.. _Snapshot Schedule: https://cloud.google.com/compute/docs/disks/scheduled-snapshots

GPU Nodes
------------------------------

These are just quick notes on how to add a pool of GPU nodes to the cluster, managed by GKE.
All node specific settings are essentially optional.
The settings below are just what I selected to add one small node with a T4 GPU, which up to 4 projects are able to share using time-sharing.
The Kubernetes related settings are essentially the same as above, and need to be the same.
Also note that GKE will add additional taints to these nodes!

#. Create a node pool ``gpu-1``, of size "1"
#. Select as node type "GPU", then "NVIDIA T4", and as the VM ``n1-highmem-4``.
#. Select "Time-sharing" and "4"
#. GPU Driver: "user managed". I tried Google managed, but it didn't work.
#. Balanced disk, 128GB
#. Local SSD Disk: "0" (instead of 2!)
#. Security: click on "Enable secure boot"
#. Metadata: same as above

   * Kubernetes label: ``cocalc-role=projects``
   * Taints: (see warning below, ``gpu-operator`` somehow ignores tainted nodes, even with taint toleration – bug!?)

     * ``NoExecute:cocalc-projects-init=false``
     * ``NoSchedule:cocalc-projects=init``

#. Due to the "manually managed GPU driver", continue at `Manage the GPU Stack with the NVIDIA GPU Operator on Google Kubernetes Engine (GKE) `_.

.. warning::

   I don't know why, but adding taint tolerations *should* work, but it does not.
   So, the workaround is to simply not taint these GPU nodes – they'll have that ``"nvidia.com/gpu"`` taint, though.

Check what the documentation says, and add these values for taint tolerations from a file ``gpu-values.yaml``::

   tolerations: &tol
     - key: "nvidia.com/gpu"
       operator: "Exists"
       effect: "NoSchedule"
     - key: "cocalc-projects"
       operator: "Exists"
     - key: "cocalc-projects-init"
       operator: "Exists"

   daemonsets:
     tolerations: *tol

Then install/update the helm app ``gpu-operator`` in the namespace ``gpu-operator`` (adding NVIDIA's Helm repository first, if it is not already there)::

   helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
   helm repo update

   helm -n gpu-operator upgrade --install gpu-operator nvidia/gpu-operator \
        --wait \
        --set hostPaths.driverInstallDir=/home/kubernetes/bin/nvidia \
        --set toolkit.installDir=/home/kubernetes/bin/nvidia \
        --set cdi.enabled=true \
        --set cdi.default=true \
        --set driver.enabled=false \
        -f gpu-values.yaml

For sharing a GPU using time slicing, follow the `GPU Sharing one cluster-wide configuration `_ instructions – a sketch is shown below.

Once this is set up, create a :ref:`license for accessing a GPU ` and deploy a :ref:`CUDA software environment ` (as of writing this, I made one based on CUDA 11.x).
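The cluster-wide time-slicing setup from that documentation boils down to a ConfigMap plus a patch of the operator's ``ClusterPolicy``.
The following is only a sketch based on NVIDIA's gpu-operator documentation – the ConfigMap name is arbitrary, ``replicas: 4`` mirrors the "Time-sharing: 4" node pool setting above, and you should verify the exact resource names and fields against the linked docs:

.. code:: bash

   # sketch: cluster-wide time-slicing config for the NVIDIA gpu-operator (verify against the docs!)
   kubectl apply -n gpu-operator -f - <<'EOF'
   apiVersion: v1
   kind: ConfigMap
   metadata:
     name: time-slicing-config
   data:
     any: |-
       version: v1
       sharing:
         timeSlicing:
           resources:
             - name: nvidia.com/gpu
               replicas: 4
   EOF

   # point the device plugin of the gpu-operator at that config
   kubectl patch clusterpolicies.nvidia.com/cluster-policy \
     -n gpu-operator --type merge \
     -p '{"spec": {"devicePlugin": {"config": {"name": "time-slicing-config", "default": "any"}}}}'

   # afterwards, the GPU node should advertise 4 allocatable nvidia.com/gpu "slices"
   kubectl describe node <gpu-node-name> | grep nvidia.com/gpu
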