Building a new use-case¶
This tutorial assumes that the user is familiar with the terminology of Melissa introduced in Melissa overview.
Instrumenting the data-generator (i.e. the client)¶
From the point of view of Melissa, a simulation (i.e. data-generator or client) manages the state of one or more fields or quantities (e.g., energy, temperature, or pressure). Each field or quantity can have its own mesh, but these meshes must be fixed. For each kind of value to be analyzed by Melissa, the simulation must first call melissa_init().
The MPI communicator must be application specific (see the MPI_Comm_get_attr documentation regarding the attribute MPI_APPNUM). For every time-step and for every kind of value, the simulation must then call melissa_send().
Keep in mind that one time-step for Melissa does not have to equal one time-step in the simulation. After all data has been sent, the simulation must call melissa_finalize(). This call is obligatory and must take place before MPI_Finalize() is called.
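The instrumentation calls described above can be sketched as follows in C. This is a sketch, not a definitive implementation: the exact signatures of melissa_init, melissa_send, and melissa_finalize (argument order, vector-size and communicator parameters) should be checked against the melissa_api.h header shipped with your Melissa version, and the field name and sizes below are illustrative.

```c
#include <mpi.h>
#include <melissa_api.h>

#define VECT_SIZE 1000  /* number of values owned by this rank (illustrative) */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    /* In a real study the communicator must be application specific
     * (derived from MPI_COMM_WORLD using the MPI_APPNUM attribute);
     * MPI_COMM_WORLD is a simplification for this sketch. */
    MPI_Comm comm = MPI_COMM_WORLD;

    double temperature[VECT_SIZE];

    /* Declare the field once, before the time loop. */
    melissa_init("temperature", VECT_SIZE, comm);

    for (int t = 0; t < 100; t++) {
        /* ... solver update of temperature ... */

        /* Send the field for this Melissa time-step (which does not
         * have to coincide with a solver time-step). */
        melissa_send("temperature", temperature);
    }

    /* Mandatory, and before MPI_Finalize(). */
    melissa_finalize();

    MPI_Finalize();
    return 0;
}
```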
Note
The #include <melissa_api.h> instruction is specific to C/C++ and must be adapted to the solver language. The corresponding Python and Fortran90 libraries are melissa_api.py and melissa_api.f90.
Warning
The list of quantities to be analyzed must be consistent with the fields given in the use-case configuration file <config-file>.json.
Hints for Fortran Users¶
Although good practice encourages the use of the mpi module instead of mpif.h (see here), Melissa only supports the latter. Hence, MPI must be included with the following command:
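In a Fortran90 source, this include statement looks as follows (mpif.h is provided by the MPI installation):

```fortran
include "mpif.h"
```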
Since the Melissa server is developed in C, and C and Fortran90 do not handle strings the same way, passing field_name to the melissa_init subroutine can be responsible for the failure of Melissa. Compatibility between both languages is ensured by null-terminating the Fortran field_name. For a field named temperature, this can be done through the following command:
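A minimal sketch of this null-termination, assuming field_name is a character variable large enough to hold the extra terminator character:

```fortran
field_name = "temperature"//char(0)
```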
Here, field_name is the first argument passed to the melissa_init subroutine.
Building your server¶
For each use-case, a <use-case>_server.py file should be created. This file implements a new server class by inheriting from the SensitivityAnalysisServer or TorchServer/TensorFlowServer classes (see figure below). Depending on the user's application, class methods must (if abstract) or may (if optional) be implemented. These are summarized in the following table:
Element | Name | Purpose |
---|---|---|
TorchServer | server_online() | Initiating data collection and directing the custom methods for acting on collected data (abstract). |
TorchServer | configure_data_collection() | Instantiates the data collector and buffer (abstract). |
TorchServer | train() | Use-case based training loop (abstract). |
BaseServer | process_simulation_data() | Method used to custom process data (abstract). |
BaseServer | setup_environment() | Any necessary setup methods go here, e.g. distributed data parallelism with PyTorch requires dist.init_process_group (abstract). |
TorchServer | server_finalize() | All finalization methods go here (optional). |
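The table above can be turned into a minimal server skeleton. This is a hedged sketch, not a definitive implementation: the import path of TorchServer and the exact method signatures are assumptions that should be checked against the heat-pde examples shipped with Melissa.

```python
# Hypothetical import path -- check your Melissa installation.
from melissa.server.deep_learning import TorchServer


class MyUseCaseServer(TorchServer):
    """Server class for a custom deep-learning use-case."""

    def setup_environment(self):
        # Any necessary setup, e.g. dist.init_process_group for
        # PyTorch distributed data parallelism (abstract).
        ...

    def server_online(self):
        # Initiate data collection and direct the custom methods
        # acting on collected data (abstract).
        ...

    def configure_data_collection(self):
        # Instantiate the data collector and buffer (abstract).
        ...

    def process_simulation_data(self, msg, config):
        # Custom processing of each received simulation message
        # (abstract; the signature here is an assumption).
        ...

    def train(self):
        # Use-case based training loop (abstract).
        ...

    def server_finalize(self):
        # Optional finalization.
        ...
```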
Note
Advanced users can override any of the BaseServer methods, but typically these should remain untouched.
Note
The TorchServer (or TensorFlowServer) specializes the DeepMelissaServer to properly implement all the distributed initialization/training/synchronization/finalization methods for PyTorch (or TensorFlow). Therefore, it is best for users to inherit from TorchServer (or TensorFlowServer) to avoid re-implementing the distributed wrappers. If users wish to use a library other than PyTorch (or TensorFlow), they will need to copy the methodology shown in the TorchServer (or TensorFlowServer).
Setting the Melissa environment¶
In simple cases, setting up the environment can simply be done with the following command:
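A sketch of what this typically looks like; the script name and install prefix below are assumptions and must be adapted to your Melissa installation:

```sh
# Source the environment script generated at install time
# (hypothetical path -- adapt to your install prefix).
source /path/to/melissa/install/bin/melissa_set_env.sh
```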
Considering the wide application scope of Melissa, managing dependencies can quickly become a burden in more complicated cases. Indeed, when the data-generator comes with specific dependencies different from and incompatible with those of the server, the use-case can require setting up different dependencies through the client_config and server_config preprocessing_commands lists of bash commands (see Advanced installation).
Note
With compiled data generators, users tend to forget that the Melissa environment must be set first; otherwise, the API libraries cannot be found.
Configuring the study¶
To finish the use-case construction, the <config-file>.json file must be configured. This JSON file consists of a dictionary comprised of multiple sub-dictionaries. The minimal expected elements are:
- The server file, class name and output directory
- The study general options
- The study specific options (one of two variants, depending on the study type)
Note
The user can add any custom variable to any of these dictionaries and access it from the server (see parameter_range in both heat-pde examples).
Note
Unless the full path is specified, the output directory will by default be created in the project directory.
Note
When Sobol indices are computed, the group_size option in the configuration file is ignored and automatically set to group_size=nb_parameters+2 by the server.
- The launcher options:
"launcher_config": {
    "scheduler": "<scheduler>",                                # str
    "scheduler_arg": "<scheduler-options-for-all-jobs>",       # List[str]
    "scheduler_arg_client": "<scheduler-options-for-clients>", # List[str]
    "scheduler_arg_server": "<scheduler-options-for-server>",  # List[str]
    "fault_tolerance": <boolean>,                              # lower case boolean
    "client_executable": "<path/to/executable>"                # str
}
Note
The launcher supports a wide variety of options detailed in Melissa-Launcher.
- Client configuration
- Server configuration
Launching and debugging a use-case¶
Once the use-case has been configured properly it can be launched with the following command:
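The launch command typically looks as follows. This is a sketch: the exact executable name and option spelling are assumptions and should be checked with your installation (e.g. via the launcher's --help output).

```sh
# Hypothetical invocation -- adapt the option name to your Melissa version.
melissa-launcher --config_name <path/to/config-file>
```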
Note
When debugging, we recommend setting the following options in the configuration file:
{
"study_options": {
"verbosity": 3,
...
},
"launcher_config": {
"fault_tolerance": false,
"verbosity": 3,
...
}
}
Since Melissa relies on three distinct components (launcher, server, clients), a failure can be due to errors coming from any of these elements. The first output to check is that of the launcher:
- An error occurring at the launcher level should be obvious to spot since it is thrown directly in the standard output (i.e. melissa_launcher.log).
- An error occurring at the server level should be noticeable in the launcher output via a reference to an unexpected server/launcher disconnection, a time-out due to the prolonged absence of a life signal, or the detection of a server job failure. In this case, the user should check the server output (i.e. melissa_server_rank.log).
- An error occurring at the client level should be indicated by client job failures (generally, if one client fails, all clients fail). The user should check any client error output (e.g. openmpi.<uid>.err, job.<uid>.melissa-client.err, oar.<uid>.err).
Note
The <uid> of this file must be inferred from the job submission UID (i.e. unique identifier) attributed by the launcher at submission. For instance, with the OAR scheduler, if the launcher output reports the first submitted job with id=15408, the corresponding standard error file of this job will be oar.1.err.