Datastore¶
If enabled (via global.datastore.enabled), a configuration panel
“Cloud storage & remote filesystems” appears in the Project Settings.
It allows users to mount remote filesystems into that particular
project. Supported backends are SSHFS, AWS S3, and Google Cloud Storage.
Under the hood, “Datastore” is a sidecar for the project, which mounts
these filesystems according to their configuration in /data/[name]
(where name is the name of the datastore). This mountpoint is
propagated from the host into the project container. If the directory
~/data is not already taken, the project automatically creates a
symlink from ~/data to that global directory. Collaborators on the
project can therefore see and use these filesystems, but they do not
learn the secret, cannot read the raw configuration files, and cannot
interact with the process performing the FUSE mount. The “secret” is
hidden in the user interface and is never sent to the web client.
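The ~/data symlink convention can be sketched as follows. This is a minimal illustration of the behavior described above, not CoCalc’s actual implementation; the function name and the exact “is it taken” check are assumptions:

```python
import os
from pathlib import Path

def link_home_data(home=None, target="/data"):
    """Sketch: create ~/data -> /data unless ~/data is already taken."""
    home_dir = Path(home or os.path.expanduser("~"))
    link = home_dir / "data"
    # is_symlink() also catches a dangling symlink occupying the path,
    # which exists() alone would miss
    if link.exists() or link.is_symlink():
        return False  # ~/data is taken; leave the user's directory alone
    link.symlink_to(target)  # mounts are then reachable as ~/data/<name>
    return True
```

If ~/data already exists (a regular directory, file, or even a dangling symlink), nothing is changed, which matches the “if the directory ~/data is not taken” condition above.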
The “read-only” mode enables the ro mount option for the FUSE mount.

To make the filesystem perform well, the mount does a bit of caching, but only with a small timeout. This means that if you allow a few seconds for reads and writes to sync, a bit of collaboration via the same mounted filesystem is possible. It is not really recommended, but it works. Note also that CoCalc’s projects poll the filesystem for changes in discovered directories, so remote changes to these files will eventually show up and update in an opened editor. These files are also cached on CoCalc’s side.
Requests to support other remote filesystems are welcome; if there is a robust tool and a way to configure it easily, we will certainly consider adding it.
Pro-tip: if a project is set to “Always Running”, you can use the SSHFS configuration in combination with the SSH Gateway to mount a directory from another project. This is a bit of a hack, but it works.
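As an illustration of this pro-tip, an SSHFS datastore pointed at another project through the SSH Gateway might be configured roughly like this. The field names below are illustrative placeholders, not the exact schema; consult the configuration panel in Project Settings for the real field set:

```yaml
name: other-project         # mounted at /data/other-project
type: sshfs
user: <source project id>   # SSH username at the gateway (placeholder)
host: <ssh gateway address> # your deployment's SSH Gateway (placeholder)
path: <remote directory>    # directory inside the source project (placeholder)
# secret: the private SSH key; the matching public key must be
# authorized in the source project
```

The source project must be “Always Running” so that the SSH connection has a target to reach at mount time.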
S3 fuse backend¶
For type=s3 mounts, the datastore container ships with two FUSE
clients and picks one as the default:

geesefs (default): https://github.com/yandex-cloud/geesefs. Faster on
large files; honors the optional region field of a mount config (and
needs it for non-AWS endpoints it cannot auto-detect).

s3fs (legacy fallback): https://github.com/s3fs-fuse/s3fs-fuse.
Ignores region. Kept for compatibility with existing deployments.
The deployment-wide default is selected via the DATASTORE_S3_BACKEND
environment variable on the datastore container. The chart exposes this
as manage.datastore.s3Backend in my-values.yaml:
manage:
  datastore:
    # Allowed: "geesefs" or "s3fs". Default (empty) -> "geesefs".
    s3Backend: "geesefs"
Leaving the field empty omits the env var entirely; the datastore
container then uses its built-in default (geesefs). An invalid value
crashes the datastore sidecar at startup rather than silently falling
through.
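The selection semantics described above can be sketched like this. This is an illustration of the documented behavior, not the datastore container’s actual code; the function name is made up:

```python
import os
import sys

VALID_S3_BACKENDS = {"geesefs", "s3fs"}

def select_s3_backend(env=os.environ):
    """Sketch of the documented DATASTORE_S3_BACKEND semantics."""
    value = env.get("DATASTORE_S3_BACKEND", "")
    if not value:
        return "geesefs"  # env var omitted or empty -> built-in default
    if value not in VALID_S3_BACKENDS:
        # invalid values fail fast at startup instead of silently
        # falling back to a default
        sys.exit(f"invalid DATASTORE_S3_BACKEND: {value!r}")
    return value
```

The important property is the last branch: a typo such as "geese" stops the sidecar immediately rather than quietly mounting with an unintended client.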
Individual mounts may override the default by setting backend to
geesefs or s3fs in the mount’s JSON config (useful for migration
or A/B testing).
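A per-mount override then looks roughly like this. Only the backend key (with values geesefs or s3fs) is taken from the text above; the other fields are illustrative placeholders, not the exact schema:

```json
{
  "type": "s3",
  "bucket": "my-bucket",
  "backend": "s3fs"
}
```

With this in place, the single mount uses s3fs while all other mounts keep the deployment-wide default, which is a convenient way to migrate one mount at a time.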
Note
Read-only buckets and IAM permissions. S3 datastore mounts default
to read/write. If the mount’s IAM credentials only grant read access
(s3:GetObject / s3:ListBucket but not s3:PutObject), set
"readonly": true in the mount config so the FUSE layer rejects
writes up front. Otherwise users may observe confusing behavior with
geesefs: a mkdir returns success and the directory appears in
ls, but the underlying PutObject is silently denied by S3, so
the directory exists only in the geesefs in-memory cache. Repeated
write attempts can also accumulate retry state and eventually crash
the geesefs daemon, leaving the mountpoint in a stale “Transport
endpoint is not connected” state until the project is restarted.
The s3fs backend surfaces the IAM denial directly as EACCES
and is a safer fallback for read-only credentials.
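Combining both recommendations from this note, a mount backed by read-only credentials would be configured roughly as follows. The "readonly" and "backend" keys are taken from the text above; the remaining fields are illustrative placeholders:

```json
{
  "type": "s3",
  "bucket": "my-bucket",
  "readonly": true,
  "backend": "s3fs"
}
```

Here the FUSE layer rejects writes before they reach S3, and any residual permission error surfaces as a plain EACCES instead of a stale mount.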