Software Environment

CoCalc exists to let users run code and collaborate on it. This chapter covers the first part: the software environment in which that code runs.

Every user works within a layered ecosystem that abstracts away the hardware but depends on programming languages, scripts, external packages, libraries, and data files.

Users' needs vary: some want very specific packages, while others prefer a robust, stable environment. You may receive requests to install proprietary software for specialized tasks, to update certain packages, or to add packages that are not available by default.

The good news is that CoCalc provides the flexibility to tailor the software environment of a project to meet these diverse requirements. We’ll explore three key approaches for customization:

  1. Within a Project: users can install their software packages directly within their project environments.

  2. Global Software in /ext: install software globally, shared across projects.

  3. Custom Software Environments: build, host, and deploy customized software environments as Docker images.

By understanding these options, you can create a more accommodating and efficient workspace for your users.

Within a Project

Users are able to install their own software packages in their projects. A project is essentially a full Linux “user” environment, without elevated privileges. This means all the usual ways to install software as a user are available, e.g. for Python:

pip install --user --upgrade [mypackage]

for R:

install.packages("[mypackage]", lib="~/R")

or for GNU Autotools based packages:

./configure --prefix=$HOME/.local
make
make install

or CMake:

mkdir build
cd build
cmake ..
cmake --build .
cmake --install . --prefix "$HOME/.local"


Global Software

See Projects Software about how to get read/write access to the global /ext mountpoint. This is quite powerful, because it allows you to install software packages globally – available to all projects.

Note

Useful detail: if a file /ext/.bashrc exists, every project sources it from its local ~/.bashrc file. This makes it possible to extend the PATH, configure aliases, etc. in one place. Users who want to opt out for a project only have to comment out or delete the corresponding line at the bottom of their local ~/.bashrc file.
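
As a rough illustration of this mechanism, the sketch below simulates it in a scratch directory instead of the real /ext mountpoint. The file contents (the /ext/bin directory and the ll alias) and the exact form of the guard are assumptions made for illustration; the line that CoCalc actually places in ~/.bashrc may look different.

```shell
# Sketch only: EXT_DIR stands in for the real /ext mountpoint.
EXT_DIR="$(mktemp -d)"

# Hypothetical global settings an admin might place in /ext/.bashrc
cat > "$EXT_DIR/.bashrc" <<'EOF'
export PATH="/ext/bin:$PATH"
alias ll='ls -la'
EOF

# The kind of guard a project's ~/.bashrc could use (assumed form):
# source the global file only if it exists.
if [ -f "$EXT_DIR/.bashrc" ]; then
  . "$EXT_DIR/.bashrc"
fi

echo "PATH now starts with: ${PATH%%:*}"
```

Commenting out the guard in a project's own ~/.bashrc is what opting out amounts to.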

Custom Jupyter Kernels

It is possible to globally deploy customized Jupyter kernels. Each subdirectory of /ext/jupyter/kernels/ can hold one of your own kernels; /ext is where the globally shared filesystem is mounted read-only in all projects (see Projects Software).

This works because $JUPYTER_PATH is configured by default and points to that jupyter directory. Because that path takes precedence, a kernel placed there overrides a globally installed kernel with the same directory name, e.g. python3.

To check if a kernel is available:

  1. Open a terminal in a project and run jupyter kernelspec list.

  2. Try to start it via jupyter console --kernel=[kernelname].

Note

For a Python kernel, we suggest adding these parameters to the argv array in the kernel.json file:

  • "--HistoryManager.enabled=False": there is no need to record the history in a local database. In particular, if you’re on an NFS file system, the underlying SQLite database could cause problems in the form of “database is locked” errors, preventing the kernel from starting.

  • "--matplotlib=inline": to automatically load matplotlib
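
To make the suggestion concrete, here is a minimal sketch of a kernel.json carrying both parameters. It writes into a scratch directory; on a real deployment the target would be a subdirectory of /ext/jupyter/kernels/. The kernel name my-python, the display name, and the use of the python3 found on the PATH are assumptions made for illustration.

```shell
# Sketch only: writes to a scratch directory instead of /ext/jupyter/kernels/.
KERNELS_DIR="$(mktemp -d)/jupyter/kernels"
KERNEL_NAME="my-python"   # hypothetical kernel name
mkdir -p "$KERNELS_DIR/$KERNEL_NAME"

# Standard kernel spec layout; the last two argv entries are the
# parameters suggested in the note above.
cat > "$KERNELS_DIR/$KERNEL_NAME/kernel.json" <<'EOF'
{
  "argv": [
    "python3", "-m", "ipykernel_launcher",
    "-f", "{connection_file}",
    "--HistoryManager.enabled=False",
    "--matplotlib=inline"
  ],
  "display_name": "My Python (global)",
  "language": "python"
}
EOF

cat "$KERNELS_DIR/$KERNEL_NAME/kernel.json"
```

Once such a directory exists under /ext/jupyter/kernels/, the kernel should show up in jupyter kernelspec list inside a project.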


Custom Software Environment

You can provide the entire project image yourself and host it on your own Docker registry. This is the most flexible way to customize the software environment and gives you complete control over the user computing environment.

Benefits:

  • Complete control over the environment including operating system, packages, and configurations

  • Use existing environment definitions (Dockerfiles and build scripts) you already possess

  • Offer multiple environments for users to select from, potentially categorized by specific tasks

  • Incorporate proprietary packages, code, or configurations

  • Version control your software environment at your own pace, allowing users to choose:
    • “stable” releases, named after their release date. This lets your users stay on a specific environment and avoid disruptions caused by software updates.

    • “testing” releases, which will evolve into the new default after some iterations and updates. This allows users to test the new environment and provide feedback before it becomes “stable”.

  • Organizational compliance by including required security tools, corporate certificates, and compliance software

  • Performance optimization by pre-installing and configuring software for your specific use cases

Default “Full” Environment

CoCalc OnPrem includes a comprehensive “full” software environment (software-YYYYMMDD-HHMM images) that serves as both a reference implementation and a foundation for customization. This environment includes:

Programming Languages & Frameworks:
  • Python 3 with 200+ scientific packages (NumPy, SciPy, Pandas, Matplotlib, PyTorch, TensorFlow, JAX)

  • R with 80+ packages and RStudio Server

  • Julia with essential packages (IJulia, Pluto, Plots)

  • SageMath for mathematical computing (optional)

  • Java (OpenJDK), Go, C/C++ (GCC, Clang)

  • Octave for MATLAB compatibility

Development Tools:
  • VS Code Server (optional)

  • Jupyter Lab and Jupyter Notebook

  • Git and version control tools

  • LaTeX with full TeXLive distribution

  • Build tools (Make, CMake, Autotools)

Scientific & Data Analysis:
  • QGIS for geospatial analysis

  • Pandoc and Quarto for document processing

  • Statistical and visualization libraries

  • Machine learning frameworks

Desktop Applications (optional):
  • X11 server with Xpra for remote desktop

  • GIMP, Inkscape, LibreOffice

  • Scientific applications (Spyder, Scilab)

  • Development tools (Texmaker, TeXstudio)

Building Custom Images

Note

These files are only accessible if you have access to the private repository.

The recommended approach is to start with the default “full” environment and customize it for your needs. The build system uses a modular approach with specialized installation scripts.

Project Structure: The ./project/full/ directory contains the reference implementation with these key components:

  • Dockerfile - Build configuration with customizable arguments

  • common.sh - Base system packages and prerequisites

  • python.sh - Python ecosystem with virtual environment setup

  • full.sh - Core development tools and libraries

  • r.sh - R statistical computing environment

  • julia.sh - Julia scientific computing setup

  • sage.sh - SageMath installation (optional)

  • vscode.sh - VS Code Server (optional)

  • x11.sh - X11 desktop environment (optional)

  • user-2001.sh - Project user setup

  • py3.txt - Python package list for customization

Build Arguments: Control optional components during build:

ARG INSTALL_VSCODE=true    # Enable VS Code Server
ARG INSTALL_SAGE=10.5      # SageMath version or 'none'
ARG INSTALL_X11=true       # Enable X11 desktop support

Customization Examples:

  1. Add Corporate Packages:

    # In common.sh, add corporate packages
    aptitude install -q -y corporate-security-agent corporate-monitoring
    
    # In python.sh, add to py3.txt
    echo "corporate-python-lib==1.0.0" >> py3.txt
    
  2. Configure Corporate Infrastructure:

    # In common.sh, add corporate CA certificates
    curl -o /usr/local/share/ca-certificates/corporate-ca.crt https://ca.corp.com/cert
    update-ca-certificates
    
    # Configure corporate proxy
    echo 'export https_proxy=https://proxy.corp.com:8080' >> /etc/environment
    
  3. Optimize for Specific Use Cases:

    # Build without desktop applications for server use
    docker build --build-arg INSTALL_X11=false -t custom-server .
    
    # Build with specific SageMath version
    docker build --build-arg INSTALL_SAGE=10.4 -t custom-math .
    
  4. Add Custom Software:

    # In full.sh, add domain-specific tools
    wget https://example.com/custom-tool.deb -O /tmp/tool.deb
    dpkg -i /tmp/tool.deb
    
    # Configure environment variables
    echo 'export CUSTOM_TOOL_HOME=/opt/custom-tool' >> /etc/cocalc_init.sh
    

Requirements for Custom Images

Essential Requirements:

  • User Configuration: The image must define a user named user with UID/GID 2001:

    # user "user" must be 2001:2001. Do not change the UID, assumed in several places!
    RUN umask 022 \
      && mkdir /home/user \
      && chown 2001:2001 -R /home/user \
      && /usr/sbin/groupadd --gid=2001 --non-unique user \
      && /usr/sbin/useradd --home-dir=/home/user --gid=2001 --uid=2001 --shell=/bin/bash user
    
  • System Utilities: Install essential utilities for CoCalc operation:

    RUN apt-get update && apt-get install -y \
      file mount psutils curl wget git vim \
      python3 python3-pip build-essential
    
  • PATH Configuration: Keep /cocalc/bin in the $PATH for CoCalc functionality:

    # In /etc/cocalc_init.sh
    export PATH="/cocalc/bin:$PATH"
    
  • Initialization Script: Create /etc/cocalc_init.sh for project startup customization:

    # Example /etc/cocalc_init.sh
    export CUSTOM_VAR="value"
    source /opt/venvs/cocalc/bin/activate  # Activate Python environment
    export PS1='\w\$ '  # Set prompt
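
If the initialization script could end up being sourced more than once, the plain export shown earlier would add /cocalc/bin to the PATH repeatedly. The following is a minimal sketch of an idempotent alternative; the helper name path_prepend is hypothetical and not part of CoCalc.

```shell
# Sketch only: prepend a directory to PATH unless it is already listed.
# path_prepend is a hypothetical helper, not a CoCalc command.
path_prepend() {
  case ":$PATH:" in
    *":$1:"*) ;;              # already present, leave PATH untouched
    *) PATH="$1:$PATH" ;;     # not present, put it in front
  esac
}

path_prepend /cocalc/bin
path_prepend /cocalc/bin      # second call is a no-op
export PATH

echo "$PATH"
```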
    

Python Environment Setup:

For Python-based environments, follow this pattern:

# Create virtual environment
mkdir -p /opt/venvs
python3 -m venv /opt/venvs/cocalc

# Install packages
/opt/venvs/cocalc/bin/pip install jupyter ipykernel [other packages]

# Configure kernels
/opt/venvs/cocalc/bin/python -m ipykernel install --name python3 --display-name "Python 3"

# Activate by default
echo "source /opt/venvs/cocalc/bin/activate" >> /etc/cocalc_init.sh

Jupyter Kernel Integration:

Ensure Jupyter kernels are properly configured:

# Install kernels in system location
mkdir -p /usr/local/share/jupyter/kernels

# For Python kernel
/opt/venvs/cocalc/bin/jupyter kernelspec install --prefix=/usr/local python3_kernel_spec

# For R kernel (example)
echo 'IRkernel::installspec(user = FALSE)' | R --no-save

Build Process

Standard Build:

# Build with all components
docker build -t custom-cocalc:latest ./project/full/

Customized Build:

# Build optimized for specific use case
docker build \
  --build-arg INSTALL_VSCODE=true \
  --build-arg INSTALL_SAGE=none \
  --build-arg INSTALL_X11=false \
  -t custom-cocalc:optimized ./project/full/
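
For repeated builds it can help to assemble this command from variables, so that each component is switched in one place. The sketch below only prints the resulting command (a dry run). The variable names mirror the build arguments listed earlier; IMAGE_TAG, BUILD_CMD and the defaults are assumptions chosen to match the customized build shown here.

```shell
# Sketch only: assemble the docker build command and print it.
INSTALL_VSCODE="${INSTALL_VSCODE:-true}"
INSTALL_SAGE="${INSTALL_SAGE:-none}"
INSTALL_X11="${INSTALL_X11:-false}"
IMAGE_TAG="${IMAGE_TAG:-custom-cocalc:optimized}"   # hypothetical variable

# Store the command in the positional parameters so every argument
# stays a separate word.
set -- docker build \
  --build-arg "INSTALL_VSCODE=$INSTALL_VSCODE" \
  --build-arg "INSTALL_SAGE=$INSTALL_SAGE" \
  --build-arg "INSTALL_X11=$INSTALL_X11" \
  -t "$IMAGE_TAG" ./project/full/

BUILD_CMD="$*"
echo "$BUILD_CMD"

# To run the build for real, replace the echo with:  "$@"
```

Overriding a single variable on the command line, for example INSTALL_SAGE=10.5, changes that one argument and leaves the rest untouched.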

Integration with CoCalc OnPrem

Registry Configuration:

  1. Build and Push Image:

    docker build -t your-registry.com/cocalc/custom-env:20250115-1200 .
    docker push your-registry.com/cocalc/custom-env:20250115-1200
    
  2. Configure values.yaml:

    global:
      software:
        environments:
          custom-env:
            title: "Custom Environment"
            descr: "Organization-specific environment with custom tools"
            tag: "custom-env-20250115-1200"
            group: "Custom"
            registry: "your-registry.com/cocalc"
    
  3. Deploy Changes:

    helm upgrade cocalc ./cocalc -f your-values.yaml
    

Multiple Environment Strategy:

global:
  software:
    environments:
      data-science:
        title: "Data Science"
        descr: "Optimized for data analysis and machine learning"
        tag: "data-science-20250115-1200"
        group: "Specialized"
      mathematics:
        title: "Mathematics"
        descr: "Mathematical computing with SageMath"
        tag: "mathematics-20250115-1200"
        group: "Specialized"
      development:
        title: "Development"
        descr: "Software development environment"
        tag: "development-20250115-1200"
        group: "Development"
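
The tags above follow a name-YYYYMMDD-HHMM pattern. A small sketch of how such tags could be generated consistently for all environments in one release is shown below. Using UTC for the timestamp and sharing one stamp across environments are assumptions, not requirements stated by CoCalc.

```shell
# Sketch only: derive one dated tag per environment name.
STAMP="${STAMP:-$(date -u +%Y%m%d-%H%M)}"   # e.g. 20250115-1200 (UTC assumed)

TAGS=""
for env in data-science mathematics development; do
  tag="$env-$STAMP"
  TAGS="$TAGS $tag"
  echo "$env -> tag: \"$tag\""
done
```

Setting STAMP explicitly, for example STAMP=20250115-1200, reproduces the tags used in the configuration above.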

Legacy Directory References

For backward compatibility and additional examples:

  • Essential Environment: ./project/essential/ - Minimal but complete setup

  • Import Examples: ./project/import/ - Adapting existing images:
    • texlive/ - LaTeX-focused environment

    • jupyter-datascience/ - Jupyter ecosystem integration

    • anaconda-gpu/ - GPU-accelerated computing

    • cuda-*/ - CUDA development environments

Testing Custom Images

Local Testing:

# Test basic functionality
docker run -it --rm custom-image:latest python3 -c "import numpy; print('OK')"
docker run -it --rm custom-image:latest R --version
docker run -it --rm custom-image:latest julia --version

Integration Testing:

# Test with CoCalc project server
docker run -it --rm \
  -v /path/to/project/data:/home/user/data \
  custom-image:latest \
  bash -c "source /etc/cocalc_init.sh && python3 -c 'import sys; print(sys.path)'"

Kernel Testing:

# Verify Jupyter kernels
docker run -it --rm custom-image:latest jupyter kernelspec list
docker run -it --rm custom-image:latest jupyter console --kernel=python3

Troubleshooting

Common Issues:

  • Permission Problems: Ensure UID/GID 2001 is used consistently

  • Path Issues: Verify /cocalc/bin remains in PATH

  • Kernel Problems: Check Jupyter kernel specifications and permissions

  • Package Conflicts: Review package installation order and dependencies

Debugging Commands:

# Check user configuration
docker run -it --rm custom-image:latest id user

# Verify environment setup
docker run -it --rm custom-image:latest bash -c "source /etc/cocalc_init.sh && env"

# Test file operations
docker run -it --rm custom-image:latest bash -c "touch /tmp/test && ls -la /tmp/test"

Debug Mode Build:

# Build with detailed output
docker build --no-cache --progress=plain -t debug-image .

Note

Once your project image is in your own registry, configure the Software Environment of your CoCalc deployment to make the image available to your users.

Note

If something goes wrong, e.g. creating new files does not work, use the “mini terminal” in the file explorer to create a terminal file: touch term.term. Then open that term.term file and investigate the environment to understand what is going on. For example, creating new files requires that /cocalc/bin is in the $PATH and that cc-new-file works.
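
The checks mentioned in this chapter can be collected into a small script and run from such a terminal inside the project. This is a sketch under assumptions: the check helper and the selection of checks are illustrative, not an official CoCalc diagnostic. It only reports results and does not change anything.

```shell
# Sketch only: run a few environment checks and count the failures.
FAILED=0

# check DESCRIPTION COMMAND [ARGS...]
# Runs the command silently and reports ok or FAIL.
check() {
  desc="$1"
  shift
  if "$@" >/dev/null 2>&1; then
    echo "ok    $desc"
  else
    echo "FAIL  $desc"
    FAILED=$((FAILED + 1))
  fi
}

check "user 'user' has UID 2001" sh -c '[ "$(id -u user)" = "2001" ]'
check "/cocalc/bin is in PATH" sh -c 'case ":$PATH:" in *":/cocalc/bin:"*) exit 0 ;; *) exit 1 ;; esac'
check "cc-new-file is found" command -v cc-new-file
check "/etc/cocalc_init.sh exists" test -f /etc/cocalc_init.sh
check "home directory is writable" test -w "$HOME"

echo "$FAILED check(s) failed"
```

A failing line points to the requirement to revisit in the image, for example the PATH configuration when cc-new-file is not found.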