Design of Experiments¶
"The design of experiments (DOE) also known as experiment design or experimental design, is the design of any task that aims to describe and explain the variation of information under conditions that are hypothesized to reflect the variation."
From Wiki.
In other words, the Design of Experiments (DOE) determines which experiments should be conducted to explore the parameter space of a given input/output problem.
For instance, in surrogate modeling, the goal is typically to understand the behavior of an expensive black-box function in order to make cost-effective predictions that are statistically coherent for a given input. However, as the number of dimensions in the design space increases, exploring this parameter space can quickly become overwhelming. The challenge arises because the number of samples required to cover a parameter space uniformly grows exponentially with the number of dimensions, a phenomenon known as the curse of dimensionality. As a result, more advanced sampling methods are often preferred over basic uniform sampling.
DoE in Melissa¶
In Melissa, sampling is handled by a parameter generator, which is initialized through the `set_parameter_sampler` method of the user-defined server. This method first creates a `parameter_sampler` instance that defines a `generator` method. The generator then iteratively yields parameter sets when the client scripts are created.
Parameter Sampler Hierarchy¶
Melissa’s parameter sampler follows a structured class hierarchy:
At the core of this system is the `BaseExperiment` class, which provides essential functionality such as defining the number of parameters, setting seeds for reproducibility, specifying parameter bounds, and implementing a `generator` method that calls `draw`. Users must implement the `sample` method and may optionally override `draw` to customize their parameter sampling.
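As an illustration of this contract, a minimal custom sampler could look like the sketch below. This is not code from Melissa: the import path, as well as the `seed`, `l_bounds`, `u_bounds`, and `nb_parms` attribute names, are assumptions to be checked against `melissa.server.parameters`.

```python
import numpy as np

# Assumed import path; verify against melissa.server.parameters.
from melissa.server.parameters import BaseExperiment


class MyCustomSampler(BaseExperiment):
    """Minimal sketch: only `sample` is implemented, `draw` keeps its default."""

    def sample(self, n: int = 1) -> np.ndarray:
        # Lazily create a seeded random generator for reproducibility
        # (`seed` is assumed to be set by BaseExperiment).
        if not hasattr(self, "_rng"):
            self._rng = np.random.default_rng(getattr(self, "seed", None))
        # Draw `n` parameter sets uniformly within the per-parameter bounds
        # (`l_bounds`, `u_bounds`, `nb_parms` are assumed attribute names).
        return self._rng.uniform(
            low=self.l_bounds, high=self.u_bounds, size=(n, self.nb_parms)
        )
```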
The `StaticExperiment` class extends this functionality by allowing parameters to be stored and checkpointed, which is particularly useful for active-sampling scenarios where previously drawn parameters serve as the foundation for producing the next set of parameters.
`MixIn` classes might seem complex at first, but they play a crucial role in making the implementation modular and reusable. By using `MixIn` classes, Melissa provides greater flexibility to create custom sampling strategies without modifying the core functionality.
How `set_parameter_sampler` Works¶
The `set_parameter_sampler` method follows these key steps:
- Initializes a `parameter_sampler` instance.
- Calls `parameter_generator = parameter_sampler.generator()`.
- Iterates over the client scripts, retrieving each set of parameters with `list(next(parameter_generator))`.
- The generator operates as follows: `generator() → draw() → sample(1)` (see the sketch below).
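Put together, the loop can be pictured roughly as in the following sketch. It only illustrates the flow described above and is not Melissa's actual source; `client_scripts` and `configure` are placeholders.

```python
# Illustrative sketch of the flow described above.
parameter_sampler = MyCustomSampler()  # 1. build the sampler instance (arguments omitted)
parameter_generator = parameter_sampler.generator()  # 2. obtain the generator

for client_script in client_scripts:  # 3. one parameter set per client script
    # Each next() call triggers generator() -> draw() -> sample(1)
    # and yields the parameters for this client.
    parameters = list(next(parameter_generator))
    client_script.configure(parameters)  # placeholder for client script creation
```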
Important

The separation of `draw()` and `sample()` is intentional: while `sample()` generates the parameter values, `draw()` handles preprocessing, allowing flexibility in how parameters are formatted and passed to the client scripts. In `examples/lorenz/lorenz_server.py`, the class `LorenzParameterGenerator` is designed to produce inputs compatible with the `lorenz.py` solver script, ensuring they align with its `argparse` requirements.
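For instance, extending the `MyCustomSampler` sketch above, a hypothetical `draw` override that formats sampled values as `argparse`-style options could look like this. The option flags and return format are illustrative assumptions, not the actual Lorenz implementation.

```python
class CliParameterSampler(MyCustomSampler):
    """Sketch: format each sampled parameter set as command-line options."""

    def draw(self):
        # `sample(1)` yields one parameter set; this sketch assumes three
        # parameters and hypothetical flag names expected by the solver script.
        values = self.sample(1)[0]
        return [f"--rho={values[0]}", f"--sigma={values[1]}", f"--beta={values[2]}"]
```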
Melissa provides predefined samplers for convenience, accessible via the `melissa.server.parameters.ParameterSamplerType` Enum. There are therefore two ways in which users can register a sampler in their server class's `__init__`:
- By passing the Enum value to `sampler_t`, as in the sketch below.
- By passing the type of the sampler to `sampler_t` (see again the lorenz example).
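As a hedged illustration, the two options could look like this inside the user-defined server's `__init__`. The Enum member name `RANDOM_UNIFORM` is an assumption; check `melissa.server.parameters.ParameterSamplerType` for the exact spelling.

```python
from melissa.server.parameters import ParameterSamplerType

# Option 1: pass the Enum value (member name assumed here).
self.set_parameter_sampler(sampler_t=ParameterSamplerType.RANDOM_UNIFORM)

# Option 2 (alternative): pass the sampler type itself, e.g. a custom class
# such as LorenzParameterGenerator from examples/lorenz/lorenz_server.py.
# self.set_parameter_sampler(sampler_t=LorenzParameterGenerator)
```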
Predefined Samplers¶
Random Uniform¶
This sampler uses `numpy.random.uniform` when overriding the `sample` method.
In Melissa, users typically focus on ensemble runs, where the number of computable solutions is often large enough for uniform sampling to remain effective, regardless of dimensionality. Additionally, in sensitivity analysis, uniform sampling is particularly valuable as it provides uncorrelated samples, making it well-suited for methods like pick-freeze.
The user can instantiate the `RandomUniform` sampler by setting:
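A possible call, assuming the Enum member is spelled `RANDOM_UNIFORM`:

```python
# Member name is an assumption; check ParameterSamplerType for the exact spelling.
self.set_parameter_sampler(sampler_t=ParameterSamplerType.RANDOM_UNIFORM)
```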
Scipy-based sampling¶
For deep surrogates, uniform sampling may result in slower learning due to inefficient coverage of the design space. Additionally, if training is unsatisfactory, extending the study further may be necessary. In such cases, incremental parameter space exploration can be improved using the sequence sampling methods available in the `scipy.stats.qmc` submodule.
Halton Sequence¶
The Halton sequence is a deterministic sampling method. In Melissa, the `HaltonGenerator` is based on the `scipy.stats.qmc` Halton sampler.
The user can instantiate the `HaltonGenerator` sampler by setting:
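A possible call, assuming the Enum member is spelled `HALTON`:

```python
# Member name is an assumption; check ParameterSamplerType for the exact spelling.
self.set_parameter_sampler(sampler_t=ParameterSamplerType.HALTON)
```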
Latin Hypercube Sampling (LHS)¶
Latin Hypercube Sampling is a non-deterministic method. In Melissa, the `LHSGenerator` is based on the `scipy.stats.qmc` Latin hypercube sampler.
The user can instantiate the `LHSGenerator` sampler by setting:
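A possible call, assuming the Enum member is spelled `LHS`:

```python
# Member name is an assumption; check ParameterSamplerType for the exact spelling.
self.set_parameter_sampler(sampler_t=ParameterSamplerType.LHS)
```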
Note

Non-deterministic generators take a `seed` integer as an argument in order to enforce the reproducibility of the generated inputs.
Warning

As opposed to the Halton sequence, drawing 10 samples twice from an LHS sampler won't yield the same DOE as drawing 20 samples at once.
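This behaviour can be verified directly with `scipy.stats.qmc`, independently of Melissa; the following sketch compares resumed and one-shot draws for both engines.

```python
import numpy as np
from scipy.stats import qmc

seed = 123

# Halton: 10 + 10 points resume the sequence, matching the first 20 points
# of a fresh engine built with the same seed.
h1, h2 = qmc.Halton(d=2, seed=seed), qmc.Halton(d=2, seed=seed)
ten_plus_ten = np.vstack([h1.random(10), h1.random(10)])
print(np.allclose(ten_plus_ten, h2.random(20)))  # expected: True

# LHS: the stratification is rebuilt for each call, so 10 + 10 points
# differ from 20 points drawn at once.
l1, l2 = qmc.LatinHypercube(d=2, seed=seed), qmc.LatinHypercube(d=2, seed=seed)
ten_plus_ten = np.vstack([l1.random(10), l1.random(10)])
print(np.allclose(ten_plus_ten, l2.random(20)))  # expected: False
```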
DoE Quality Metrics¶
The figure below compares the DOEs obtained with uniform, LHS, and Halton sampling of 50 points across a two-dimensional parameter space:
It clearly shows how uniform sampling may result in both clustered and under-explored regions across the parameter space, while Halton and LHS sampling provide a more homogeneous coverage.
In addition, as discussed earlier, LHS and Halton sampling are sequence samplers, which means that their DOE can be extended a posteriori by resampling from the same generator. This feature is illustrated in the figure below, where 20 points are added to the previous sets of parameters.
Finally, although the quality of the DOE may seem evident from the figures, intuition can be misleading. In order to evaluate the quality of a DOE, `scipy.stats.qmc` provides a `discrepancy` function:
The discrepancy is a uniformity criterion used to assess the space filling of a number of samples in a hypercube. A discrepancy quantifies the distance between the continuous uniform distribution on a hypercube and the discrete uniform distribution on distinct sample points.
The lower the value, the better the coverage of the parameter space.
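As a short usage sketch (the printed value depends on the seed and sample), the discrepancy of a design living in the unit hypercube can be computed as follows:

```python
from scipy.stats import qmc

# Build a 2D LHS design of 50 points; qmc samplers return points in [0, 1)^d.
sampler = qmc.LatinHypercube(d=2, seed=0)
doe = sampler.random(50)

# qmc.discrepancy expects samples in the unit hypercube; rescale first
# (e.g. with qmc.scale) if the design lives in physical parameter bounds.
print(qmc.discrepancy(doe))  # lower is better
```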
For the DOEs represented in this section, the following discrepancies were obtained:
| Sampling | Sample size | Discrepancy |
|----------|-------------|-------------|
| Uniform  | 50          | 0.01167     |
| Uniform  | 50+20      | 0.01045     |
| LHS      | 50          | 0.00054     |
| LHS      | 50+20      | 0.00041     |
| Halton   | 50          | 0.00183     |
| Halton   | 50+20      | 0.00097     |