Building a new use-case

This tutorial assumes that the user is familiar with the Melissa terminology introduced in the Melissa overview.

Instrumenting the data-generator (i.e. the client)

From Melissa's point of view, a simulation (i.e. data-generator or client) manages the state of one or more fields or quantities (e.g. energy, temperature, or pressure). Each field or quantity can have its own mesh, but these meshes must remain fixed for the duration of the study. For each kind of value to be analyzed by Melissa, the simulation must call

#include <melissa_api.h>
melissa_init("value-kind", grid_size, mpi_communicator);
The MPI communicator must be application-specific (see the MPI_Comm_get_attr documentation regarding the attribute MPI_APPNUM). For every time-step and for every kind of value, the simulation must call
#include <melissa_api.h>
const double* values = ...;
melissa_send("value-kind", values);
Keep in mind that one Melissa time-step does not have to equal one simulation time-step. After all data has been sent, the simulation must call
#include <melissa_api.h>
melissa_finalize();
This call is mandatory and must occur before MPI_Finalize() is called.
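The full call sequence can be sketched as follows. The `MelissaStub` class below is a stand-in defined purely to illustrate the required call order; in a real client the calls come from the Melissa API (`melissa_api.h` / `melissa_api.py`) and the communicator is the application-specific MPI communicator described above.

```python
# Stand-in for the Melissa client API, used here only to show the call order:
# one init per field, one send per field per Melissa time-step, one finalize.
class MelissaStub:
    def __init__(self):
        self.calls = []

    def init(self, field, grid_size, comm):
        self.calls.append(("init", field))

    def send(self, field, values):
        self.calls.append(("send", field))

    def finalize(self):
        self.calls.append(("finalize",))


melissa = MelissaStub()
fields = ["energy", "temperature"]  # each analyzed field is declared once
grid_size = 8

for field in fields:
    melissa.init(field, grid_size, comm=None)  # comm: app-specific MPI communicator

for step in range(3):  # each iteration is one Melissa time-step
    for field in fields:
        values = [0.0] * grid_size  # the field values for this time-step
        melissa.send(field, values)

# Mandatory; in a real MPI client this must precede MPI_Finalize().
melissa.finalize()
```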

Note

The #include <melissa_api.h> instruction is specific to C/C++ and must be adapted to the solver language. The corresponding Python and Fortran90 libraries are melissa_api.py and melissa_api.f90.

Warning

The list of quantities to be analyzed must be consistent with the fields given in the use-case configuration file <config-file>.json.

Hints for Fortran Users

Although good practice encourages using the mpi module instead of mpif.h (see here), Melissa only supports the latter. Hence, the line:

include "melissa_api.f90"
must be accompanied by:
include "mpif.h"

Because the Melissa server is developed in C, and C and Fortran90 do not handle strings the same way, passing field_name to the melissa_init subroutine can cause Melissa to fail. Compatibility between the two languages is ensured by null-terminating the Fortran field_name. For a field named temperature, this can be done with the following declaration:

character(len=12) :: field_name = "temperature"//char(0)
where field_name is the first argument passed to the melissa_init subroutine.
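The reason for the trailing char(0) can be seen from the C side: C APIs read a string up to the first NUL byte. This small Python/ctypes check (an illustration only, unrelated to the Melissa API itself) shows the byte layout C expects, which the Fortran concatenation with char(0) reproduces by hand:

```python
import ctypes

# create_string_buffer appends the terminating NUL automatically,
# mirroring what "temperature"//char(0) does manually in Fortran.
buf = ctypes.create_string_buffer(b"temperature")
print(len(buf.raw))  # 12 bytes: 11 characters + terminating NUL
```

The 12-byte total also explains the `character(len=12)` declaration above.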

Building your server

For each use-case, a <use-case>_server.py file should be created. This file implements a new server class by inheriting from the SensitivityAnalysisServer or TorchServer/TensorFlowServer classes (see figure below). Depending on the user's application, class methods must (if abstract) or may (if optional) be implemented. These are summarized in the following table:

| Element     | Name                        | Purpose                                                                                                                  |
|-------------|-----------------------------|--------------------------------------------------------------------------------------------------------------------------|
| TorchServer | server_online()             | Initiates data collection and directs the custom methods acting on collected data (abstract).                            |
| TorchServer | configure_data_collection() | Instantiates the data collector and buffer (abstract).                                                                   |
| TorchServer | train()                     | Use-case based training loop (abstract).                                                                                 |
| BaseServer  | process_simulation_data()   | Method used to custom process data (abstract).                                                                           |
| BaseServer  | setup_environment()         | Any necessary setup methods go here, e.g. distributed data parallelism with PyTorch requires dist.init_process_group (abstract). |
| TorchServer | server_finalize()           | All finalization methods go here (optional).                                                                             |

[figure: Melissa server class hierarchy]

Note

Advanced users can override any of the BaseServer methods, but these should typically remain untouched.

Note

The TorchServer (or TensorFlowServer) specializes the DeepMelissaServer to properly implement all the distributed initialization/training/synchronization/finalization methods for PyTorch (or TensorFlow). Therefore, it is best for users to inherit from TorchServer (or TensorFlowServer) to avoid re-implementing the distributed wrappers. Users who wish to use a library other than PyTorch (or TensorFlow) will need to replicate the methodology shown in TorchServer (or TensorFlowServer).
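The abstract/optional pattern from the table can be sketched with a stand-in base class (the real base classes come from Melissa; only the method names below are taken from the table, everything else is illustrative):

```python
from abc import ABC, abstractmethod


class TorchServerLike(ABC):
    """Stand-in mimicking TorchServer's contract; not the real Melissa class."""

    @abstractmethod
    def server_online(self): ...

    @abstractmethod
    def configure_data_collection(self): ...

    @abstractmethod
    def train(self): ...

    @abstractmethod
    def process_simulation_data(self, data): ...

    @abstractmethod
    def setup_environment(self): ...

    def server_finalize(self):
        """Optional hook: overriding is allowed but not required."""


class HeatPdeServer(TorchServerLike):
    """Hypothetical use-case server implementing the abstract methods."""

    def server_online(self):
        # Initiate data collection, then run the training loop.
        self.configure_data_collection()
        self.train()

    def configure_data_collection(self):
        self.buffer = []  # placeholder for the data collector and buffer

    def train(self):
        self.trained = True  # placeholder for a use-case training loop

    def process_simulation_data(self, data):
        self.buffer.append(data)  # custom per-message processing

    def setup_environment(self):
        pass  # e.g. dist.init_process_group with PyTorch DDP


server = HeatPdeServer()
server.server_online()
```

Leaving any abstract method unimplemented makes instantiation fail, which is how the "must implement" contract is enforced.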

Setting the Melissa environment

In simple cases, setting up the environment can simply be done by sourcing the following script:

. "/path/to/melissa/melissa_set_env.sh"

Given Melissa's wide application scope, managing dependencies can quickly become a burden in more complicated cases. In particular, when the data-generator comes with specific dependencies that differ from and are incompatible with those of the server, the use-case may require setting up separate environments via the preprocessing_commands bash command lists of client_config and server_config (see Advanced installation).

Note

With compiled data-generators, users tend to forget that the Melissa environment must be set first, otherwise the API libraries cannot be found.

Configuring the study

Finally, to complete the use-case construction, the <config-file>.json must be configured. This JSON file is a dictionary comprised of multiple sub-dictionaries. The minimal expected elements are:

  • The server file, class name and output directory:
    {
    "server_filename": "<use-case>_server.py",  # str
    "server_class": "<use-case>Server",         # str
    "output_dir": "<result-folder>"             # str                      
    }
    
  • The study general options:

    "study_options": {                 
        "field_names": ["<list-of-fields>"],    # List[str]
        "num_clients": <sampling-size>,         # int
        "num_samples": <nb-time-steps>,         # int
        "group_size": <nb-clients-per-group>,   # int
        "nb_parameters": <nb-parameters>        # int
    }
    
  • The study specific options:

    "sa_config": {                 
        "mean": <boolean>,          # lower case boolean
        "variance": <boolean>,      # lower case boolean
        "skewness": <boolean>,      # lower case boolean
        "kurtosis": <boolean>,      # lower case boolean
        "sobol_indices": <boolean>  # lower case boolean
    }
    
    or
    "dl_config": {                 
        "batch_size": <batch-size>,                 # int
        "per_server_watermark": <water-mark>,       # int
        "buffer_size": <buffer-size>,               # int
        "buffer": <buffer>                          # str
    }
    

Note

The user can add any custom variable to any of these dictionaries and access it from the server (see parameter_range in both heat-pde examples).

Note

Unless the full path is specified, the output directory will by default be created in the project directory.

Note

When Sobol indices are computed, the group_size option in the configuration file is ignored and automatically set to group_size=nb_parameters+2 by the server (the pick-freeze estimation of Sobol indices requires two base sample matrices plus one recombined matrix per parameter).

  • The launcher options:
    "launcher_config": {
        "scheduler": "<scheduler>",                                   # str
        "scheduler_arg": ["<scheduler-options-for-all-jobs>"],        # List[str]
        "scheduler_arg_client": ["<scheduler-options-for-clients>"],  # List[str]
        "scheduler_arg_server": ["<scheduler-options-for-server>"],   # List[str]
        "fault_tolerance": <boolean>,                                 # lower case boolean
        "client_executable": "<path/to/executable>"                   # str
    }
    
    

Note

The launcher supports a wide variety of options detailed in Melissa-Launcher.

  • Client configuration

    "client_config": {
        "preprocessing_commands": ["<bash commands>"],      # List[str]
        "executable_command": "<full execution command>"    # str
    }
    
  • Server configuration

    "server_config": {
        "preprocessing_commands": ["<bash commands>"],      # List[str]
    }
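Putting the pieces together, a minimal configuration might look like the dictionary below. This is a hypothetical sketch: the file names, class name, field names, and sizes are placeholders, not values from a real study. Building the dictionary in Python and serializing it with json.dumps also shows why the booleans must be lowercase: that is simply how JSON spells them.

```python
import json

# Hypothetical minimal <config-file>.json content; every concrete value
# here (names, paths, sizes) is a placeholder to be replaced per use-case.
config = {
    "server_filename": "heat_server.py",
    "server_class": "HeatServer",
    "output_dir": "results",
    "study_options": {
        "field_names": ["temperature"],
        "num_clients": 100,
        "num_samples": 50,
        "group_size": 1,
        "nb_parameters": 5,
    },
    "sa_config": {
        "mean": True,
        "variance": True,
        "skewness": False,
        "kurtosis": False,
        "sobol_indices": False,
    },
    "launcher_config": {
        "scheduler": "oar",
        "fault_tolerance": False,
    },
    "client_config": {
        "preprocessing_commands": [". /path/to/melissa/melissa_set_env.sh"],
        "executable_command": "./heat_solver",
    },
    "server_config": {
        "preprocessing_commands": [". /path/to/melissa/melissa_set_env.sh"],
    },
}

# Python's True/False serialize to JSON's lowercase true/false.
text = json.dumps(config, indent=4)
```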
    

Launching and debugging a use-case

Once the use-case has been configured properly it can be launched with the following command:

melissa-launcher --config_name /path/to/project/dir/<config-file>

Note

When debugging, we recommend setting the following options in the configuration file:

{
    "study_options": {
        "verbosity": 3,
        ...
    },
    "launcher_config": {
        "fault_tolerance": false,
        "verbosity": 3,
        ...
    }
}
This will provide the maximal amount of information in the standard output/error files, and the study will stop immediately in case of failure (i.e. no instance will be automatically restarted).

Since Melissa relies on three distinct components (launcher, server, clients), a failure can be due to errors coming from any of these elements. The first output to check is that of the launcher:

  • An error occurring at the launcher level should be obvious to spot, since it is thrown directly in the standard output (i.e. melissa_launcher.log).
  • An error occurring at the server level should be noticeable in the launcher output via a reference to an unexpected server/launcher disconnection, a time-out due to the prolonged absence of a life signal, or the detection of a server job failure. In this case, the user should check the server output (i.e. melissa_server_rank.log).
  • An error occurring at the client level should be indicated by client job failures (generally, if one client fails, all clients fail). The user should check any client error output (e.g. openmpi.<uid>.err, job.<uid>.melissa-client.err, oar.<uid>.err).

Note

The <uid> of this file must be inferred from the job submission UID (i.e. unique identifier) assigned by the launcher at submission. For instance, with the OAR scheduler, if the launcher output contains:

2022-10-27T16:38:40 melissa.launcher.io            DEBUG    job launched uid=1 id=15408
and a failure is detected afterwards for this specific job (id=15408), the corresponding standard error file of this job will be oar.1.err.
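The uid-to-scheduler-id mapping can be recovered mechanically from the launcher log. The snippet below parses a line with the format shown above (the line content is the doc's own example; the regular expression is an assumption about no more than that one line format):

```python
import re

# Recover the submission UID and scheduler job id from a launcher log line,
# so the matching error file (here oar.<uid>.err) can be located.
line = ("2022-10-27T16:38:40 melissa.launcher.io            DEBUG    "
        "job launched uid=1 id=15408")
match = re.search(r"job launched uid=(\d+) id=(\d+)", line)
uid, job_id = match.group(1), match.group(2)
error_file = f"oar.{uid}.err"
print(job_id, error_file)  # 15408 oar.1.err
```

In practice one would iterate over melissa_launcher.log and build a dictionary mapping each scheduler id to its uid.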