.. _gke:

Google GCP/GKE
====================

.. note::

   As of 2023-10-17, there is currently no out-of-the-box support for a `GKE Cluster`_.
   The following are notes based on the experience of setting everything up.
   This certainly assumes you have experience with :term:`Google Compute Platform` and :term:`Kubernetes`.
   Some details could be out of date, but the general idea should still be valid.

   There is also a guide for setting up CoCalc OnPrem on :doc:`eks`.

Prerequisites
-------------

- You should have a basic understanding of cloud computing, in particular the
  :term:`Google Compute Platform` (GCP) and :term:`Kubernetes` (K8S).
  But don't worry, for setting up CoCalc you do not need to be an expert!

- This guide is specific to a `GKE Cluster`_.
  If you want to use another cloud provider for your Kubernetes cluster, you have to adapt the instructions.

- 2022-05: Starting with K8S 1.22 you need the ``google-cloud-sdk-gke-gcloud-auth-plugin`` plugin for authentication.
  `More info `_, and don't forget to set ``export USE_GKE_GCLOUD_AUTH_PLUGIN=True`` in your ``~/.bashrc``.

- If you're not an owner or admin of the GCP project, you need a couple of "Admin" roles for your user.
  What exactly you need is hard to tell, and probably changes over time.
  What you certainly need on top of a basic "Editor" role is: "Compute Admin", "Compute Network Admin", and "Compute Storage Admin".
  Please talk with the owner of the GCP project to assign those roles to your "Editor" user.

Setup
-----

.. include:: ../_shared/settings-recommendations.rst

Let's start. In the `GCP Console under Kubernetes `__, create a new cluster with the following settings:

.. list-table::
   :header-rows: 1
   :widths: 20 20 60

   * - Aspect
     - Value
     - Description
   * - Cluster
     - ``cocalc-1``
     - (you can come up with whatever name you want)
   * - Location type
     - ``zonal``
     - Zonal (free tier) or Regional (costs money)
   * - Release channel
     - ``regular``
     - regular version track (not static)
   * - Version
     - ``1.26.xx``
     - As of Oct 2023, this version should work fine
   * - Location
     - ``europe-west3``
     - (whatever region suits you best)
   * - Node location
     - ``europe-west3-b``
     - a single zone – with that, all nodes will be in the same place
   * - Automation
     -
     - maintenance window on Saturday + Sunday, starting at 00:00 for 6 hours
   * - Networking
     - ``Default``
     - or you know what you're doing ...
   * - HTTP Load Balancing
     - ``enabled``
     -
   * - Dataplane V2
     - ``enabled``
     - This is mainly used to limit the networking access from within a project to the other parts of the cluster.
   * - DNS
     - ``kube-dns``
     -
   * - Shielded GKE nodes
     - ``enabled``
     -
   * - PD CSI Driver
     - ``enabled``
     - Persistent disk CSI Driver
   * - Image streaming
     - ``disabled``
     - See notes below...
   * - Logging
     - ``System``
     - Less logging saves money
   * - Cloud Monitoring
     - ``System``
     - Less data saves money

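If you prefer the command line over the console, cluster creation can be sketched roughly like this.
This is only an approximation of the table above – flag names and defaults change over time, the maintenance window dates are placeholders, and you should double-check everything against the current ``gcloud`` documentation before running it:

.. code:: bash

   # install the kubectl auth plugin mentioned in the prerequisites
   gcloud components install gke-gcloud-auth-plugin

   # rough equivalent of the cluster settings in the table above (verify flags first!)
   gcloud container clusters create cocalc-1 \
     --zone=europe-west3-b \
     --release-channel=regular \
     --enable-dataplane-v2 \
     --enable-shielded-nodes \
     --addons=HttpLoadBalancing,GcePersistentDiskCsiDriver \
     --logging=SYSTEM \
     --monitoring=SYSTEM \
     --maintenance-window-start="2023-10-21T00:00:00Z" \
     --maintenance-window-end="2023-10-21T06:00:00Z" \
     --maintenance-window-recurrence="FREQ=WEEKLY;BYDAY=SA,SU"

   # fetch credentials so that kubectl talks to the new cluster
   gcloud container clusters get-credentials cocalc-1 --zone=europe-west3-b

The following subsections explain the reasoning behind some of these settings.
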
Networking
~~~~~~~~~~

- Default, public cluster.
  Of course, if you know what you're doing, you can also set up a private cluster and use a VPN or something like that.
  This is beyond the scope of this guide.
- HTTP Load Balancing: enabled
- Dataplane V2: enabled (this will take the network policy configuration into account)
- DNS: the default ``kube-dns`` is fine – unless you want to access internal services, then maybe you want to run ``Cloud DNS``.
  (2023-10: IIRC something has changed related to that setting)

Security
~~~~~~~~

- Shielded GKE nodes: yes

Features
~~~~~~~~

- Enable Compute Engine **persistent disk CSI Driver**
- Disable image streaming: I tried running with it enabled, but maybe because those images are so large, or for other reasons, it didn't really work.
  Rather, make sure to configure the :ref:`prepull` service.
  Also, this image streaming service will occupy some amount of memory.
  I think that memory is better spent on projects and disk caching, but :term:`YMMV`.
- Logging: yes, but only "System".
  In particular, projects and hubs generate a lot of log lines, which end up becoming expensive.
- Cloud monitoring: yes, but only "System" (both cost money, so, just being conservative here)

.. _gke-node-pools:

Node Pools
----------

Now we add :ref:`two node pools `.
That's where CoCalc will actually be active.
Two node pools are not strictly necessary, but they make it easier to scale up and down.
In particular, projects will run in one pool only, while all services are in the other pool.
The pool for services is of fixed size (e.g. 2 nodes), while the other pool of variable size is for the projects.
Please read :doc:`../architecture` and :doc:`../ops/scaling` for more details about this.
There is also room to deviate from exactly these settings – they are listed to give you an idea of what is necessary.
Pool: **Service**
~~~~~~~~~~~~~~~~~~

.. list-table::
   :header-rows: 1
   :widths: 25 75

   * - Parameter
     - Value
   * - Name
     - "services-1" (if you change parameters, which you can't edit, create a new pool and increment that number)
   * - Size
     - 2 (1 is not enough, unless you allow some services to run on project nodes as well; 2 nodes also make the setup more robust)
   * - Surge update
     - max=1
   * - Image type
     - container optimized
   * - Machine type
     - at least ``e2-standard-2``, must be x86/64 architecture
   * - Disk
     - min. 50 GB standard
   * - Security
     - secure boot (leave the others as they are)
   * - Metadata
     - Kubernetes label: ``cocalc-role=services`` (that's ``key=value``)

Pool: **Projects**
~~~~~~~~~~~~~~~~~~

.. list-table::
   :header-rows: 1
   :widths: 25 75

   * - Parameter
     - Value
   * - Name
     - "projects-1"
   * - Size
     - 1 (scale it up later)
   * - Surge update
     - max=2 (temporarily more nodes)
   * - Image type
     - container optimized
   * - Machine type
     - ``e2-highmem-4``, must be x86/64 architecture.
       This of course depends on what you really want to do.
       A "standard" project uses maybe around 0.5 GB RAM and only a little bit of CPU (1/10 on average).
       Hence, you usually need more memory than CPU.
       If you know it will be CPU intensive, consider ``c2-standard-*`` machines!
   * - Disk
     - 100 GB balanced disk.
       The project images are huge, and having a faster disk speeds up downloading the image on a new node, and running programs in general.
       The optional :ref:`prepull` service loads the latest project image first, before the node is set to be available for projects.
   * - Security
     - secure boot (leave the others as they are)
   * - Spot VM
     - if you understand and can tolerate that spot VMs get randomly rebooted, and hence interrupt a running project, enable this – it saves you a lot of money!
   * - Kubernetes Label
     - ``cocalc-role=projects``
   * - Kubernetes Taints
     - To make the prepull service work, set these taints:

       .. list-table::
          :header-rows: 1

          * - Key
            - Value
            - Effect
          * - cocalc-projects-init
            - false
            - NoExecute
          * - cocalc-projects
            - init
            - NoSchedule

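For reference, here is a rough ``gcloud`` sketch of the two node pools described above.
The values mirror the tables, but treat it as an illustration rather than a definitive command – verify the flags (in particular the taint syntax and ``--spot``, if you want spot VMs) against the current documentation:

.. code:: bash

   # pool for CoCalc's services (fixed size)
   gcloud container node-pools create services-1 \
     --cluster=cocalc-1 --zone=europe-west3-b \
     --machine-type=e2-standard-2 --num-nodes=2 \
     --disk-type=pd-standard --disk-size=50 \
     --image-type=COS_CONTAINERD \
     --shielded-secure-boot \
     --max-surge-upgrade=1 \
     --node-labels=cocalc-role=services

   # pool for the projects (scale it up later); optionally add --spot
   gcloud container node-pools create projects-1 \
     --cluster=cocalc-1 --zone=europe-west3-b \
     --machine-type=e2-highmem-4 --num-nodes=1 \
     --disk-type=pd-balanced --disk-size=100 \
     --image-type=COS_CONTAINERD \
     --shielded-secure-boot \
     --max-surge-upgrade=2 \
     --node-labels=cocalc-role=projects \
     --node-taints=cocalc-projects-init=false:NoExecute,cocalc-projects=init:NoSchedule
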
.. warning::

   Those are *Kubernetes* labels and taints – not to be confused with the GCP labels (just called "Labels")!

.. _gke-database:

Database: Cloud SQL
-------------------

CoCalc requires a PostgreSQL database.
We use a Cloud SQL instance for that.
If you know what you're doing, you can run the DB in the cluster yourself – there is nothing particularly special about using Cloud SQL.

.. list-table::
   :header-rows: 1
   :widths: 20 25 55

   * - Attribute
     - Value
     - Description
   * - Name
     - ``cocalc-db``
     - (choose whatever you want)
   * - Database version
     - PostgreSQL 14
     -
   * - Region
     - same as the cluster
     -
   * - High availability
     - yes
     - probably a good idea, you can change this later
   * - Machine
     - start small
     - shared core, 1 vCPU, ~0.6 GB RAM (or ~1.5 GB).
       Of course, check monitoring and adjust as needed!
   * - Storage
     - SSD, 10 GB
     - enable automatic storage increases
   * - Network
     - private IP
     -
   * -
     - disable public IP
     - just costs money and is less secure
   * - Backup
     - yes
     - opt-in if you like, start small
   * - Maintenance window
     - Sunday, 4-5 am
     - :term:`YMMV`
   * - Flags
     - ``max_connections``: 100
     -

.. note::

   - Storage: Keep in mind that the database stores all changes to documents.
     Therefore, the size increases with user activity.
     That said, you probably won't see the database grow beyond a GB anytime soon.
   - Network: You have to enable the Service Networking API (which requires the "Network Admin" role)
     and select the option to automatically allocate an IP range.
   - Database: to access the DB, run the ``../database/db-shell.sh`` script – see :ref:`troubleshooting-database`.
   - Backup:

     - daily, starting at 4 am
     - region: same as the database
     - 7 days of backups
     - Point-in-time recovery: 1 day

   - Maintenance window: it should be fine to pick something at night during the weekend, e.g. Sunday, 4-5 am.
   - Flags: with the default ``max_connections`` (due to low memory, I guess) there weren't enough connection slots.
     Errors were like: ``remaining connection slots are reserved for non-replication superuser connections``

   I guess you can change almost everything of the above later on as well.

Post setup
~~~~~~~~~~

#. Create a user "cocalc" (or whatever you want) with a password.
   Save the password somewhere; we'll later add it as a secret to the Kubernetes cluster.
#. Create a database "cocalc" (or whatever you want).

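If you prefer the CLI, the instance plus the user and database from the post-setup step can be created roughly like this.
The instance name, tier, network and password are placeholders matching the table above – double-check the flags against the current ``gcloud sql`` documentation, especially the private-IP/network setup:

.. code:: bash

   # rough sketch of the Cloud SQL instance described above (verify before running!)
   gcloud sql instances create cocalc-db \
     --database-version=POSTGRES_14 \
     --region=europe-west3 \
     --tier=db-g1-small \
     --availability-type=REGIONAL \
     --storage-type=SSD --storage-size=10GB --storage-auto-increase \
     --network=default --no-assign-ip \
     --database-flags=max_connections=100 \
     --backup-start-time=04:00 --retained-backups-count=7 \
     --enable-point-in-time-recovery \
     --maintenance-window-day=SUN --maintenance-window-hour=4

   # post setup: create the "cocalc" user and database
   gcloud sql users create cocalc --instance=cocalc-db --password='CHANGE_ME'
   gcloud sql databases create cocalc --instance=cocalc-db
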
Storage
-------

We continue setting up the cluster.
So far, we have the "control plane" in GKE and some nodes.
Now, we need to set up the storage.

- Above in features, we enabled "Enable Compute Engine Persistent Disk CSI Driver" `more info `__
- The config files here will use this to set up suitable `PVC `__\ s and `Storage Classes `__.
- The names of these PVCs must match the references in the CoCalc deployment.

Run the following command to set up the storage classes:

.. code:: bash

   kubectl apply -f pd-classes.yaml

.. _nfs-server:

NFS Server
~~~~~~~~~~

The goal is to set up an NFS storage provisioner, which uses the PVC "nfs-data" to store the data of projects, shared files and global data/software.

.. code:: bash

   helm repo add nfs https://kubernetes-sigs.github.io/nfs-ganesha-server-and-external-provisioner/
   helm repo update
   helm search repo nfs

You should see ``nfs/nfs-server-provisioner`` in the output.

Look into the ``gke/`` subdirectory for more details.
In particular, check what ``nfs.yaml`` specifies.
It will create a disk storing the data of all projects and shared files with the specified ``storageClass`` – they were defined in the previous step.
You may have to tune the config file to your needs!

.. code:: bash

   helm upgrade --install nfs nfs/nfs-server-provisioner -f gke/nfs.yaml

.. note::

   At the time of writing, there was a problem with publishing newer docker images.
   Hence, `according to this ticket `__, I had to add ``--version=1.5.0`` to get an older variant of that chart.
   This problem has since been resolved.

Now:

.. code:: bash

   kubectl get storageclasses

should list ``nfs``.

References:

- https://kubernetes-sigs.github.io/nfs-ganesha-server-and-external-provisioner/
- https://artifacthub.io/packages/helm/kvaps/nfs-server-provisioner

.. include:: ../_shared/custom-pvc.rst

.. _gke-disk-backup:

Disk Backup
~~~~~~~~~~~

Once you have deployed the NFS server, you'll notice a new disk.
Disks are listed in "Compute Engine" → "Disks".
The simplest way to get some backup is to set up a `Snapshot Schedule`_.
With that, GCP will make consistent snapshots of the disk, which you can restore from – or create a new disk from an older snapshot.

For that, go to "Compute Engine" → "Disks" → "pvc-(the uuid you see in ``kubectl get pv``)" → "Edit" → "Create snapshot schedule".
Daily for two weeks sounds good.
BTW, that's also the place where you can increase the disk size.

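Alternatively, such a snapshot schedule can be created on the command line.
This is a sketch under the assumption of a daily schedule with two weeks of retention – the policy name is arbitrary, and you have to fill in the actual disk name from ``kubectl get pv`` / ``gcloud compute disks list``:

.. code:: bash

   # create a resource policy: daily snapshots, kept for 14 days
   gcloud compute resource-policies create snapshot-schedule daily-nfs-backup \
     --region=europe-west3 \
     --daily-schedule \
     --start-time=04:00 \
     --max-retention-days=14

   # attach it to the disk backing the NFS PV (look up the exact pvc-... name first)
   gcloud compute disks list
   gcloud compute disks add-resource-policies pvc-<uuid> \
     --resource-policies=daily-nfs-backup \
     --zone=europe-west3-b
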
Next steps
----------

The next steps are to set up the NGINX ingress + LoadBalancer.
So, continue in the ``/ingress-nginx`` and ``/letsencrypt`` subdirectories.
You also have to set up the :ref:`credentials for pulling from the private docker registry `.
Once all this is done, you can configure and deploy the :doc:`HELM Chart for CoCalc <../deployment>`.

Testing
-------

- First steps:

  - After the initial deployment, set the IP you see in the LoadBalancer (``kubectl get svc`` → look for ``LoadBalancer`` with an external IP) at your DNS provider.
  - Then try to open ``https://[cocalc-your-domain.tld]/`` in your browser.
  - You should be able to sign in directly as Admin, with the credentials set in your :ref:`my-values.yaml ` config file.
    Of course, you should change your password.

- Functionality:

  - A good test is to create a new project, and then open a terminal and run ``htop``.
    You should see a script starting the project hub, a little bit of CPU activity, and not much more – maybe the sshd server for connecting via the SSH gateway.
  - Next, create some Jupyter Notebooks (Python3, R, …), create a LaTeX ``latex.tex`` file, and maybe some other files.
    Each one of these should work as expected.
  - Finally, explore your "Admin" panel, and see if the "Server Settings" are as expected.
    At the bottom you can test the email setup by sending a password reset email.
  - As Admin, you can also create a file like ``data.cocalc-crm``, which will allow you to look at various database tables, tie user activity to projects, etc.

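To double-check the DNS / LoadBalancer setup from the first step above, something like this works from a terminal – the domain is a placeholder for your actual one:

.. code:: bash

   # find the external IP of the ingress LoadBalancer service
   kubectl get svc --all-namespaces | grep LoadBalancer

   # once the DNS record points at that IP, check resolution and the HTTP response
   dig +short cocalc-your-domain.tld
   curl -I https://cocalc-your-domain.tld/
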
Cost Control
------------

The above cluster + associated services and resources incur costs.
You can check up on that by going to: "Billing" (the billing account of your project) → Cost Management: "Reports".

- You can see a daily graph of your usage; use the drop-down at the top right above the chart to switch to "daily cumulative" to see a trend for the current billing period (for me, it's a month).
- On the right hand side, you can get more details by selecting "SKU" in the "Group by" selector
  (a "stock keeping unit" is the smallest part GCP is selling to you).
  In the table below, click on "Cost ↓" to sort in decreasing order, or on "Subtotal ↓" for the cost after applying discounts & co.
- What you should see is that the cluster itself costs something, but you get a credit for one cluster in a single zone (not region).
  `See notes here `__:
  "The GKE free tier provides $74.40 in monthly credits per billing account that are applied to zonal and Autopilot clusters.
  If you only use a single Zonal or Autopilot cluster, this credit will at least cover the complete cost of that cluster each month."
- The :term:`LoadBalancer` + external IP address also costs a rather fixed amount per month.
- Logging costs proportionally to the amount of data, hence we disabled everything except "System".
- If you use GCP's "SQL" for running the PostgreSQL database, don't use an external IP, since that would also cost you a fee to rent it.
- The bulk of your cost is CPUs + memory, though.
  See the notes about "Spot VM" above for running the CoCalc projects on these.
- Disk storage is rather cheap.
- Egress network traffic is the last item to think about.
  E.g. if your users watch a lot of videos by streaming them from the platform, you might end up getting charged significantly.

Monitoring / Uptime Check
-------------------------

The "uptime check" in GCP periodically pings your page.

Price: It has a free quota, hence we dial it down a bit to stay below it.
Make sure to `read about its pricing `__.
E.g. 31 days, 3 ping locations every 5 minutes are: ``31 * 24 * (60 / 5) * 3 = 26784``.
Well below the 1M free quota, as of writing this.

To get started simply, you can set up something like this:

1. Open ``/monitoring/uptime/create`` in the GCP console to create a new uptime check
2. Target:

   - HTTPS
   - URL (to check from the "outside" if everything is ok)
   - Hostname: the "DNS" entry
   - Path: keep it blank for "/" (i.e. "hub-next").
     Other interesting targets are ``/stats`` (hub-websocket) or ``/static/app.html`` (static).
   - Check frequency: 5 minutes (that's the ``60/5`` in the calculation above)
   - Expand the target options:

     - Regions: Just pick 3, not all of them.
     - GET Method on Port 443 & Validate SSL certificate!

3. Validation:

   - Timeout 10s (or maybe better 30s, i.e. something is going on, but not a real issue yet?)
   - Content matching: here you need to get creative.
     Maybe check for a small string in the content, e.g. the custom name of your instance, or a known string for the ``static`` target.
   - No logging (just adds to the logging quota, I guess)
   - Response code 2xx

4. Alerts:

   - Name: "[your instance name] is down"
   - Duration: 5 Minutes (?)
   - Notification: here, you have to select how to notify; there is a whole setup behind this.
     At a minimum, it should send you an email.

At the very end is a "Test" button.
Check that it actually says that the page is up, before arming it :-)
The first time around it might take a bit longer to respond; subsequent tests should be quicker – next.js warmed up.
Then click "create", of course.
All of the above can be changed later as well…

.. note::

   Since the above just checks paths at certain domains, you can set up the same at another service as well.

.. _GKE Cluster: https://cloud.google.com/kubernetes-engine
.. _Snapshot Schedule: https://cloud.google.com/compute/docs/disks/scheduled-snapshots

GPU Nodes
------------------------------

These are just quick notes on how to add a pool of GPU nodes to the cluster, managed by GKE.
All node specific settings are essentially optional.
The settings below are just what I selected to add one small node with a T4 GPU, which up to 4 projects are able to share using time-sharing.
The Kubernetes related settings are essentially the same as above, and need to be the same.
Also note that GKE will add additional taints to these nodes!

#. Create a node pool ``gpu-1``, of size "1"
#. Select as node type "GPU", then "NVIDIA T4", and as the VM ``n1-highmem-4``.
#. Select "Time-sharing" and "4"
#. GPU Driver: "user managed". I tried Google managed, but it didn't work.
#. Balanced disk, 128GB
#. Local SSD Disk: "0" (instead of 2!)
#. Security: click on "Enable secure boot"
#. Metadata: same as above

   * Kubernetes label: ``cocalc-role=projects``
   * Taints: (see warning below, ``gpu-operator`` somehow ignores tainted nodes, even with taint toleration – bug!?)

     * ``NoExecute:cocalc-projects-init=false``
     * ``NoSchedule:cocalc-projects=init``

#. Due to the "manually managed GPU driver", continue at `Manage the GPU Stack with the NVIDIA GPU Operator on Google Kubernetes Engine (GKE) `_.

.. warning::

   I don't know why, but adding taint tolerations *should* work, but it does not.
   So, the workaround is to simply not taint these GPU nodes – they'll have that ``"nvidia.com/gpu"`` taint, though.

Check what the documentation says, and add these values for taint tolerations from a file ``gpu-values.yaml``::

   tolerations: &tol
     - key: "nvidia.com/gpu"
       operator: "Exists"
       effect: "NoSchedule"
     - key: "cocalc-projects"
       operator: "Exists"
     - key: "cocalc-projects-init"
       operator: "Exists"

   daemonsets:
     tolerations: *tol

Then install/update the helm app ``gpu-operator`` in the namespace ``gpu-operator`` (adding NVIDIA's Helm repository first, if it is not already there)::

   helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
   helm repo update

   helm -n gpu-operator upgrade --install gpu-operator nvidia/gpu-operator \
        --wait \
        --set hostPaths.driverInstallDir=/home/kubernetes/bin/nvidia \
        --set toolkit.installDir=/home/kubernetes/bin/nvidia \
        --set cdi.enabled=true \
        --set cdi.default=true \
        --set driver.enabled=false \
        -f gpu-values.yaml

For sharing a GPU using time slicing, follow the `GPU Sharing one cluster-wide configuration `_ instructions – a sketch is shown below.

Once this is set up, create a :ref:`license for accessing a GPU ` and deploy a :ref:`CUDA software environment ` (as of writing this, I made one based on CUDA 11.x).
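The cluster-wide time-slicing setup from that documentation boils down to a ConfigMap plus a patch of the operator's ``ClusterPolicy``.
The following is only a sketch based on NVIDIA's gpu-operator documentation – the ConfigMap name is arbitrary, ``replicas: 4`` mirrors the "Time-sharing: 4" node pool setting above, and you should verify the exact resource names and fields against the linked docs:

.. code:: bash

   # sketch: cluster-wide time-slicing config for the NVIDIA gpu-operator (verify against the docs!)
   kubectl apply -n gpu-operator -f - <<'EOF'
   apiVersion: v1
   kind: ConfigMap
   metadata:
     name: time-slicing-config
   data:
     any: |-
       version: v1
       sharing:
         timeSlicing:
           resources:
             - name: nvidia.com/gpu
               replicas: 4
   EOF

   # point the device plugin of the gpu-operator at that config
   kubectl patch clusterpolicies.nvidia.com/cluster-policy \
     -n gpu-operator --type merge \
     -p '{"spec": {"devicePlugin": {"config": {"name": "time-slicing-config", "default": "any"}}}}'

   # afterwards, the GPU node should advertise 4 allocatable nvidia.com/gpu "slices"
   kubectl describe node <gpu-node-name> | grep nvidia.com/gpu
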