
Melissa Launcher with Singularity on a SLURM cluster

This guide provides a focused setup for using Melissa with Singularity on an HPC cluster. In this scenario, the melissa-launcher runs outside the container, while the Melissa server and clients, together with their heavy dependencies, are installed inside it. The launcher spawns srun commands that execute the Melissa server and clients inside the Singularity container.


1. Install the Launcher as a standalone binary

The melissa-launcher is designed as a lightweight, front-end Python orchestration tool. Unlike the melissa-server, which handles heavy data processing and requires complex dependencies such as MPI, ZeroMQ, or deep learning frameworks (PyTorch), the launcher acts primarily as a job scheduler and monitor. Because it only requires a standard Python environment with minimal base libraries to interface with a batch scheduler, it can be installed directly on a cluster.

# Use python >= 3.11
python3 -m pip install --user --no-cache-dir \
    -Ccmake.define.LAUNCHER_ONLY=ON \
    "melissa[launcher] @ git+https://gitlab.inria.fr/melissa/melissa.git"
Install it either into ~/.local (as above, with --user) or into a virtual environment. Make sure the resulting bin/ directory is on your PATH.

This may install some redundant dependencies, but the installation should be quick.
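After installing, the launcher's entry point must be reachable from your shell. A minimal sketch, assuming the default pip --user prefix:

```shell
# Expose the pip --user install location (default: ~/.local/bin)
# so the shell can find the melissa-launcher entry point.
export PATH="$HOME/.local/bin:$PATH"

# Sanity check: resolve the launcher binary (prints its path if found).
command -v melissa-launcher || echo "melissa-launcher not on PATH yet"
```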

2. Configuration Overview

Since the launcher is outside the container, your configuration file must tell it how to "wrap" the server and client commands inside singularity exec.

Add a singularity_config section to the launcher_config. This ensures that every task (server and clients) launched by Melissa is wrapped correctly.

{
    "launcher_config": {
        "scheduler": "slurm-global",
        "singularity_config": {
            "container_path": "/path/to/container-with-mpi-melissa-installed.sif",
            "exec_options": [
                "--nv",
                "-B", "/path/to/bind:/path/to/bind"
            ]
        }
    }
}
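With this configuration, each task command is prefixed by the container invocation. A sketch of the effective command line the launcher builds — the srun options and the server entry point (python3 server.py) are illustrative, not what the launcher literally generates:

```shell
# Build the wrapper prefix from the singularity_config values above.
container=/path/to/container-with-mpi-melissa-installed.sif
wrapper="singularity exec --nv -B /path/to/bind:/path/to/bind $container"

# Hypothetical server command; the launcher supplies the real one.
server_cmd="python3 server.py"

# The launcher effectively submits something of this shape:
echo "srun --ntasks=4 $wrapper $server_cmd"
```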

Important

The singularity wrapper is currently supported only for the slurm-global scheduler.

3. How it Works: The Execution Chain

When you run melissa-launcher, it automates the following steps for you:

  1. Allocation: The launcher uses Slurm to request nodes and CPUs.
  2. Wrapping: It takes the server and client commands and prefixes them with singularity exec.
  3. Binding: In some cases, the cluster's filesystems must be bind-mounted (-B) to make them visible inside the container.
  4. GPU Support: The --nv flag passes the host's NVIDIA drivers into the container for GPU-accelerated tasks.
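The wrapping step above can be sketched as a small shell helper (the bind path and client command are made-up examples, not the launcher's actual internals):

```shell
# Prefix an arbitrary task command with the singularity invocation,
# mimicking what the launcher does for every server/client task.
wrap() {
    local container="$1"; shift
    echo "singularity exec --nv -B /data:/data $container $*"
}

wrap /path/to/melissa.sif python3 client.py
# prints: singularity exec --nv -B /data:/data /path/to/melissa.sif python3 client.py
```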

Important

To let the containerized Melissa communicate with the host's Slurm scheduler, configure your container to use the Hybrid MPI model.

  • Recommendation: Use --mpi=pmi2 in your Slurm global settings if the default pmix causes version mismatch errors.
  • Environment: Singularity inherits your host environment by default, which helps the containerized MPI find the Slurm process IDs.
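As a concrete illustration of the recommendation above — --mpi=pmi2 is a standard srun option, while the client command shown is hypothetical:

```shell
# Force the PMI-2 interface instead of the default pmix when the
# container's MPI does not match the host's PMIx version.
mpi_flag="--mpi=pmi2"
echo "srun $mpi_flag singularity exec /path/to/melissa.sif python3 client.py"
```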

For a full submission example, see examples/heat-pde/heat-pde-dl/study_singularity_global.sh.