Running Melissa in a virtual cluster

In this tutorial, we demonstrate how to deploy Melissa in a Slurm virtual cluster.

Note

Information about OAR and other schedulers can be found in the original virtual cluster documentation.

Setting up the virtual cluster

The requirements for running an LXD-based virtual cluster are:

  • LXD 3.0 or newer (older releases may work but were never tested),
  • distrobuilder.

On Debian systems, both are available through snap:

sudo snap install lxd
sudo snap install distrobuilder --classic
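
To check that the requirements above are met, both tools should be able to report their version:

lxd --version
distrobuilder --version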

First, run the following command to start the interactive configuration process:

sudo lxd init

Note

When asked, apply the default configuration.
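
Alternatively, the default configuration can be applied non-interactively:

sudo lxd init --auto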

Then, add your user to the lxd group:

sudo adduser <your username> lxd
groups

The groups command should print the list of groups the user belongs to; this list should end with lxd.

Note

This may require rebooting your machine to take effect.
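
Alternatively, the new group membership can be made effective in the current session, without a reboot, by opening a shell under the lxd group:

newgrp lxd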

Next, clone the melissa-ci repository and move into it:

git clone https://gitlab.inria.fr/melissa/melissa-ci.git
cd melissa-ci

Then, create and move to a temporary folder:

mktemp -d  # this will create a temporary dir in /tmp/
cd /tmp/tmp.<some-name>

From there, the user can execute distrobuilder:
sudo distrobuilder build-lxd /path/to/melissa-ci/lxd/rockylinux.yaml \
 -o image.release=8 \
 -o image.architecture=x86_64 \
 -o image.variant=virtual-cluster \
 -o source.variant=boot
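
If the build succeeds, distrobuilder writes its output into the current directory; the two artifacts needed for the import step below should be present:

ls  # should list lxd.tar.xz and rootfs.squashfs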

Note

Because CentOS 8 is no longer maintained, CentOS was replaced with Rocky Linux (the same distribution used on Jean-Zay).

Finally, the image can be imported like this:

lxc image import \
 --alias 'virtual-cluster/rockylinux/8/amd64' \
 -- lxd.tar.xz rootfs.squashfs
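
The imported image should now be visible, with its alias, in the local image store:

lxc image list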

Launching the virtual cluster

Now that the LXD image has been built and imported, go to the melissa-ci/virtual-cluster directory. The following command shows the available parameters of the launch-virtual-cluster.py script:

python3 launch-virtual-cluster.py --help

Warning

  • For a given scheduler, this script first stops and deletes all existing virtual clusters of that kind.
  • In every virtual cluster, one container is dedicated to the batch scheduler. Hence, at least two containers are required.

To launch a virtual cluster with 12 containers (one server and eleven compute nodes) using Slurm as the batch scheduler, run the following command:

python3 launch-virtual-cluster.py slurm 12 'virtual-cluster/rockylinux/8/amd64'
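
Once the script completes, the containers can be listed to verify that they are up and running. Assuming the containers are named after the scheduler, as with slurm-0 below, twelve containers from slurm-0 to slurm-11 should appear:

lxc list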

The next command connects to the master node, which corresponds to the virtual cluster front-end:

lxc exec slurm-0 -- sudo --login --user john

The virtual cluster comes with its own OpenMPI package accessible with the module manager:

module load mpi
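
To confirm the environment, the loaded modules can be listed and the location of the MPI launcher checked:

module list
which mpirun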

Then, an MPI application can simply be started this way:

srun -n 3 echo TEST
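
The state of the virtual cluster can also be inspected with the usual Slurm commands:

sinfo  # list partitions and node states
squeue  # list pending and running jobs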

In addition, the user is free to use any package manager (e.g. spack, pip, etc.) to add dependencies to the cluster environment.
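
For example, a minimal sketch of adding a Python dependency with pip inside a virtual environment (the environment path and the numpy package are only illustrative):

python3 -m venv ~/melissa-env  # hypothetical environment location
source ~/melissa-env/bin/activate
pip install numpy  # illustrative dependency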

Note

Advanced configuration of the virtual cluster environment can require root access, which can be obtained by connecting to the virtual cluster with this command:

lxc exec slurm-0 -- bash
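
From such a root shell, system packages can be installed with the distribution package manager, for instance with dnf on Rocky Linux (the package below is only an illustration):

dnf install -y htop  # illustrative package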

Running Melissa in the virtual cluster

Melissa can then be installed according to the Quick Install instructions. Afterwards, the user can run a first study by following one of the corresponding tutorials.

Warning

With the slurm scheduler, the virtual cluster only supports groups of unit size. To circumvent this limitation, the user can select the slurm-openmpi scheduler instead.
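
For example, assuming slurm-openmpi is passed as the scheduler argument in the same way slurm was above, the launch command would become:

python3 launch-virtual-cluster.py slurm-openmpi 12 'virtual-cluster/rockylinux/8/amd64'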