Continuous Integration and Tests

Continuous Integration

Melissa relies on GitLab's standard continuous integration (CI) tooling and on the script .gitlab-ci.yml located at the repository root. The general structure of the CI can be illustrated as follows:

[Figure: general structure of the CI pipeline]

The pipeline is accelerated by caching a base Docker image which holds all the dependencies needed to check, test, and install every feature of Melissa. The base Docker image should not be rebuilt unless developers add or update project dependencies. For this reason, the docker build stage of the CI can only be launched manually, by clicking the play button on the build stage of the pipeline on the Melissa GitLab. This builds the base Docker image and then pushes the container to the Melissa container registry. All subsequent pipelines use the newly built base container.

The various stages of the CI pipeline can run either on the INRIA GitLab shared runners (ci.inria.fr, with the small/medium/large tags) or on our in-house DATAMOVE machine maiko (docker tag). Using maiko is preferable because it caches the base Docker image, which speeds up the pipeline, whereas the INRIA shared runners are explicitly configured not to allow image caching.

Some of the CI stages rely on a virtual cluster running locally on maiko. This runner can be selected with tags: lxd-runner. Details on how this cluster can be rebuilt or updated are located in the respective repository.

Melissa Unit Tests

The best way to test Melissa is to run Melissa: as a distributed application, most bugs only surface in "real world" conditions (i.e. at scale and on a supercomputer). Nevertheless, absent-minded programming errors can be spotted through unit testing.

The launcher was designed as a modular piece of code, independent of the batch scheduler at hand. As a result, its higher-level parts (e.g. the I/O master and the state machine) are thoroughly testable.

Similarly, the server was entirely revisited and redesigned so that its central objects (e.g. BaseServer, Simulation, FaultTolerance) can be instantiated and tested separately.

Note

Parts of the server tests are still under development and should be added progressively.
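
To illustrate the pattern these unit tests follow, the sketch below is a minimal, self-contained pytest example. It uses a toy state machine defined inline; the class, events, and transitions are illustrative assumptions, not Melissa's actual API (the real tests live in tests/launcher/ and tests/server/):

import pytest


class ToyStateMachine:
    """Stand-in for a launcher component, for illustration only."""

    def __init__(self) -> None:
        self.state = "idle"

    def handle(self, event: str) -> None:
        # A single valid transition: idle -> waiting on job submission.
        if self.state == "idle" and event == "job_submitted":
            self.state = "waiting"
        else:
            raise ValueError(f"invalid event {event!r} in state {self.state!r}")


def test_submission_moves_to_waiting():
    sm = ToyStateMachine()
    sm.handle("job_submitted")
    assert sm.state == "waiting"


def test_unknown_event_is_rejected():
    sm = ToyStateMachine()
    with pytest.raises(ValueError):
        sm.handle("not_an_event")

The real tests follow the same structure: instantiate one component in isolation, feed it a small input, and assert on the resulting state.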

For now, the tests are contained in the tests folder, which has the following structure:

tests/
├── ci_configs
│   ├── openmpi_faulttol.json
│   ├── study.sh
│   ├── vc_fail_dl_server.py
│   ├── vc_fail_sa_server.py
│   ├── vc_slurm_openmpi.json
│   ├── vc_slurm_semiglobal.json
│   └── vc_slurm.json
├── launcher
│   ├── test_io.py
│   ├── test_message.py
│   └── test_state_machine.py
├── scheduler
│   ├── test_dummy.py
│   ├── test_openmpi.py
│   ├── test_scheduler.py
│   └── test_slurm.py
├── server
│   ├── simple_sa_server.py
│   ├── test_dataset.py
│   ├── test_reservoir.py
│   ├── test_sensitivity_analysis_server.py
│   └── test_server.py
└── utility
    ├── test_functools.py
    ├── test_networking.py
    └── test_timer.py

As discussed in the Contributing tab, any contribution to the code should be compatible with the tests in place.

The latest interactive coverage report is available here. If developers wish to generate their own report locally, they need to run:

coverage run --source=melissa/ -m pytest tests/
coverage html -d coverage-report

which runs the unit tests and then creates a coverage-report folder containing a detailed summary of the code coverage.

Melissa Integration Tests

The CI includes a set of integration tests designed to run the full Melissa launcher + clients + server in a variety of configurations, including:

  • openmpi + deep learning torch server + Python API based client
  • openmpi + parallel deep learning tensorflow server + C API based client
  • openmpi + sensitivity analysis server + C API based client + fault tolerance
  • slurm + deep learning torch server + C API based parallel client + fault tolerance
  • slurm-semiglobal + parallel deep learning torch server + C API based parallel client + fault tolerance
  • slurm-openmpi + sensitivity analysis server + C API based client

Fault tolerance is tested with server files that deliberately kill themselves after receiving a certain number of samples. These files are stored in tests/ci_configs/ and are for testing purposes only; they are not meant to be used by users.
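
The sketch below illustrates this fault-injection pattern. It is a hypothetical, self-contained example rather than the code of the actual vc_fail_*_server.py files, and all names in it are assumptions made for illustration:

import sys


class SelfKillingServer:
    """Illustrative stub that aborts on purpose after a fixed number of samples."""

    def __init__(self, crash_after: int = 100) -> None:
        self.crash_after = crash_after
        self.received = 0

    def handle_sample(self, sample) -> None:
        self.received += 1
        if self.received >= self.crash_after:
            # Simulate an unexpected server death; the fault-tolerance
            # machinery is then expected to detect and handle the failure.
            sys.exit(1)

The fault-tolerance integration tests then check that the study recovers from such an induced failure.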

The openmpi tests run in Docker containers, while the slurm tests run on a locally hosted virtual cluster. Details on how this cluster can be rebuilt or updated are located in the respective repository.