A first Deep-Learning study

This tutorial assumes that the user has gone through Quick Install and Running your first SA study.

We cover two examples in this tutorial:

  1. Heat-PDE use-case
  2. Lorenz attractor use-case

Heat-PDE use-case

The heat equation is a partial differential equation (PDE) often taught in introductory courses on differential equations. This section demonstrates a Melissa Deep-Learning study involving a parallel MPI simulation using the example of a heat equation solver.

Use case presentation

In this example, a finite-difference parallel solver is used to solve the heat equation on a cartesian grid of size Nx × Ny with time discretization Δt. The solver input variables are:

  • the initial temperature T0 across the domain,
  • T1, T2, T3 and T4, the wall temperatures.
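To picture what such a solver computes, here is a minimal explicit finite-difference update for the 2D heat equation. This is an illustrative single-process sketch, not the example's actual MPI solver; the grid size, diffusivity, and wall temperature values are placeholder assumptions.

```python
import numpy as np

nx, ny = 100, 100
dx = dy = 1.0 / nx
kappa = 1.0                          # placeholder thermal diffusivity
dt = 0.2 * dx * dx / kappa          # stability-limited explicit time step

T = np.full((nx, ny), 300.0)        # initial temperature across the domain
# placeholder wall temperatures on the four boundaries
T[0, :], T[-1, :], T[:, 0], T[:, -1] = 250.0, 350.0, 275.0, 325.0

# one explicit Euler step of dT/dt = kappa * (d2T/dx2 + d2T/dy2)
lap = (
    (T[2:, 1:-1] - 2 * T[1:-1, 1:-1] + T[:-2, 1:-1]) / dx**2
    + (T[1:-1, 2:] - 2 * T[1:-1, 1:-1] + T[1:-1, :-2]) / dy**2
)
T[1:-1, 1:-1] += dt * kappa * lap
print(T.shape)   # (100, 100)
```

In the real example, this stencil update is distributed over MPI ranks; the parameters sampled by Melissa control the initial and wall temperatures.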

The purpose of this example is to train a Deep-Surrogate of the solver by solving the heat equation for multiple sets of inputs. By default, the considered network is a multi-layer perceptron with the following architecture:

  • an input layer of 6 neurons (the time t, the initial temperature T0, and the wall temperatures T1 to T4),
  • two hidden layers of 256 neurons each,
  • an output layer of Nx × Ny neurons predicting the temperature field over the whole mesh.
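The architecture above can be sketched as a plain forward pass. This is only an illustration of the layer sizes: the ReLU activation, random weights, and 100 × 100 mesh are assumptions, not Melissa's actual initialization or training code.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def mlp_forward(inputs, weights, biases):
    """Forward pass through a simple fully connected network."""
    h = inputs
    for W, b in zip(weights[:-1], biases[:-1]):
        h = relu(h @ W + b)
    # linear output layer: one neuron per mesh point
    return h @ weights[-1] + biases[-1]

rng = np.random.default_rng(0)
nx, ny = 100, 100                      # assumed mesh size
sizes = [6, 256, 256, nx * ny]         # 6 inputs -> two hidden layers -> full field
weights = [rng.normal(size=(m, n)) * 0.01 for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

x = rng.normal(size=6)                 # e.g. (t, T0, T1, T2, T3, T4)
field = mlp_forward(x, weights, biases)
print(field.shape)                     # (10000,)
```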

Making Melissa visible

Assuming you are in the project's root directory melissa/, update the current shell:

source melissa_set_env.sh

If Melissa was installed via a package manager, there is no need to manually set up the environment. Simply loading the API package will automatically configure the paths as needed.

Running the example

Next, move to the example folder and build the example code:

cd examples/heat-pde/
cmake -S executables/ -B executables/build
make -C executables/build
cd heat-pde-dl/

If the build is successful, three new executables should appear in the executables/build sub-directory:

-rwxr-xr-x  1 root root 37264 Feb 20 10:06 heatc
-rwxr-xr-x  1 root root 37104 Feb 20 10:06 heatf
-rwxr-xr-x  1 root root 37192 Feb 20 10:06 heat_no_melissac

The configuration file config_<scheduler>.json is used to configure the Melissa execution (e.g. parameter sweep, computed statistics, launcher options). It must be edited at least to update the path to the executable:

    "client_config": {
        "executable_command": "/path/to/melissa/examples/heat-pde/executables/build/heatc",
        "command_default_args": ["100", "100", "100"]
    }

The example can be started with one of several batch schedulers supported by Melissa: OpenMPI, Slurm, or OAR. It may be necessary to pass additional arguments directly to the batch scheduler for the example to run successfully. For example, starting with version 3, OpenMPI refuses to oversubscribe by default and requires the --oversubscribe option to run more processes than there are available CPUs. If you end up running Melissa with mpirun on your local machine, it may require this option.

Note

In the configuration files, you will find an option command_default_args: ["100", "100", "100"] specifying default command-line arguments passed to the executable_command. The sampled parameters are appended after these defaults as further command-line arguments.
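The resulting client command line can be pictured as follows. This is a sketch of the ordering described above (defaults first, sampled parameters after); the executable path and the sampled values are placeholders, and the number of sampled parameters depends on the study configuration.

```python
# hypothetical illustration of how the final client command is assembled
executable_command = "/path/to/build/heatc"
command_default_args = ["100", "100", "100"]   # fixed solver arguments
sampled_parameters = ["290.5", "310.2"]        # placeholder values drawn per client

command = " ".join([executable_command, *command_default_args, *sampled_parameters])
print(command)
# /path/to/build/heatc 100 100 100 290.5 310.2
```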

For the tutorial, we use the OpenMPI scheduler and the default config_mpi.json file:

melissa-launcher --config_name /path/to/heat-pde-dl/config_mpi

How to set up a configuration for the study is explained in Configuration Structure.

Note

The problem itself may not be computationally challenging, but due to the number of simulation processes and depending on the resources available to the user, the system may end up oversubscribed. If so, specifying --oversubscribe in the scheduler arguments can help:

"scheduler_arg_client": ["-n", "1","--timeout", "60", "--oversubscribe"],
"scheduler_arg_server": ["-n", "1","--timeout", "3600", "--oversubscribe"]
This will submit every mpirun command with this option.

All results, log files, and a copy of the configuration file are stored in a dedicated directory named STUDY_OUT. If not explicitly specified in the configuration file, the output directory defaults to the format melissa-YYYYMMDDTHHMMSS, where YYYYMMDD represents the current date, and THHMMSS represents the local time in ISO 8601 basic format.
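The default directory name can be reproduced with the standard datetime formatting shown below; the snippet is only a sketch of the naming convention described above.

```python
from datetime import datetime

# e.g. "melissa-20240220T100634" for Feb 20 2024, 10:06:34 local time
study_dir = "melissa-" + datetime.now().strftime("%Y%m%dT%H%M%S")
print(study_dir)
```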

After a successful study, the Melissa server will generate a file checkpoints/model.ckpt containing the trained parameters of the neural network.

To see the training results, load the TensorBoard logs with:

tensorboard --logdir STUDY_OUT/tensorboard

STUDY_OUT/tensorboard will contain folders named gpu_<rank> regardless of whether a GPU is used. These server-rank folders may share the same training statistics, but each maintains its own buffer-processing statistics.

Lorenz attractor use-case

The Lorenz attractor is a set of chaotic solutions of the Lorenz system (cf. Wiki page). In recent years, it has become a popular Deep-Learning problem for the study of chaotic dynamical systems (see Dubois et al. or Chattopadhyay et al. for examples). This section demonstrates a Melissa Deep-Learning study involving a non-parallel simulation using the example of a Lorenz system solver.

Use case presentation

Note

This use-case is described in detail in this notebook.

In this example, a SciPy-based solver is used to integrate the Lorenz system. The solver input variables are:

  • the system parameter values σ, ρ and β,
  • the initial 3D coordinates of the trajectory (x0, y0, z0).
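The dynamics being sampled can be sketched with a dependency-light explicit Euler integration of the Lorenz equations. The example's own lorenz.py relies on SciPy's integrators instead; the parameter defaults below match the command_default_args used later in this tutorial, while the initial coordinates are placeholders.

```python
import numpy as np

def lorenz_rhs(state, sigma=10.0, rho=28.0, beta=2.667):
    """Right-hand side of the Lorenz system."""
    x, y, z = state
    return np.array([sigma * (y - x), x * (rho - z), x * y - beta * z])

def integrate(x0, dt=0.01, tf=20.0):
    """Explicit-Euler sketch; the actual lorenz.py uses SciPy."""
    n_steps = int(tf / dt)
    traj = np.empty((n_steps + 1, 3))
    traj[0] = x0
    for i in range(n_steps):
        traj[i + 1] = traj[i] + dt * lorenz_rhs(traj[i])
    return traj

traj = integrate(np.array([1.0, 1.0, 1.0]))
print(traj.shape)   # (2001, 3)
```

Each Melissa client runs one such integration from its own sampled initial coordinates, producing one trajectory of training data.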

The purpose of this example is to train a Deep-Surrogate of the solver, i.e. a network capable of generating the trajectory resulting from any set of initial coordinates for specific parameter values (σ, ρ and β), by solving the Lorenz system for multiple initial coordinates. By default, the considered network is a multi-layer perceptron with the following architecture:

  • an input layer of 3 neurons (the current coordinates x, y and z),
  • two hidden layers of 512 neurons each,
  • an output layer of size 3 predicting the time derivative of each coordinate, computed as a finite difference over the time discretization Δt.
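The input/target relationship implied by this architecture can be sketched as follows. This illustrates the finite-difference targets only; it is not Melissa's actual data pipeline, and the toy trajectory is a placeholder.

```python
import numpy as np

dt = 0.01
# toy trajectory standing in for one simulation's output: shape (n_steps, 3)
trajectory = np.cumsum(np.ones((5, 3)) * dt, axis=0)

# network input: the coordinates at step i
inputs = trajectory[:-1]
# network target: the finite-difference time derivative of each coordinate
targets = (trajectory[1:] - trajectory[:-1]) / dt

print(inputs.shape, targets.shape)   # (4, 3) (4, 3)
```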

Note

The use-case is not parallel and its computational load cannot be changed, but it can easily be tested at scale even on a local machine. The --oversubscribe option is recommended if many clients will be submitted.

Running the example

For this use-case, the data generator is a Python script whose main dependency is SciPy.

First, move to the example folder:

cd /path/to/melissa/examples/lorenz

The configuration file config_<scheduler>.json is used to configure the Melissa execution (e.g. parameter sweep, computed statistics, launcher options). It must be edited at least to update the path to the executable:

    "client_config": {
        "executable_command": "python3 /path/to/melissa/examples/lorenz/lorenz.py",
        "command_default_args": [
            "--sigma=10",
            "--rho=28",
            "--beta=2.667",
            "--tf=20.0",
            "--dt=0.01"
        ]
    }
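The flags above suggest the client parses its arguments roughly as follows. This is a hypothetical reconstruction of the interface for illustration; see the actual lorenz.py for the real implementation.

```python
import argparse

# hypothetical sketch of the flag interface shown in the config above
parser = argparse.ArgumentParser(description="Lorenz system data generator")
parser.add_argument("--sigma", type=float, default=10.0)
parser.add_argument("--rho", type=float, default=28.0)
parser.add_argument("--beta", type=float, default=2.667)
parser.add_argument("--tf", type=float, default=20.0, help="final time")
parser.add_argument("--dt", type=float, default=0.01, help="time step")

args = parser.parse_args(
    ["--sigma=10", "--rho=28", "--beta=2.667", "--tf=20.0", "--dt=0.01"]
)
print(args.sigma, args.dt)   # 10.0 0.01
```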

For the tutorial, we use the OpenMPI scheduler and the default config_mpi.json file:

melissa-launcher --config_name /path/to/melissa/examples/lorenz/config_mpi

The surrogate can finally be evaluated with the aid of the script plot-results.py. For example, the command below will generate several graphs representative of the training quality and of the model accuracy:

python3 plot-results.py /path/to/<result-dir>

Note

If the --coefficients option is used, the script will try to compute two additional evaluation quantities (the Lyapunov exponent and the correlation coefficient) and their corresponding graphs. However, their computation relies on the nolitsa package which must be installed beforehand. Guidelines to do so are available here.

Note

The Lorenz example exploits the convert_log_to_df feature. See Deeper post-processing.