Online Generative Active Sampling (OGAS) Breeder Classes¶
The OGAS breeder (OGASBreeder) provides an advanced generative approach designed for online surrogate training, bypassing the storage limitations and latencies typically associated with offline, pool-based, or importance-sampling active learning methods.
Following the OGAS methodology from the "Learning Where to Simulate" paper, the breeder progressively learns to sample configuration parameters (such as initial conditions or physical coefficients) that yield challenging dynamics for the surrogate model. This directly addresses the shortcomings of uniform sampling, which tends to over-represent trivial dynamics and limit generalization.
How to use it¶
To configure the OGASBreeder, pass it keyword arguments (for example, an unpacked dictionary) that are parsed into an ExperimentConfig. The configuration is composed of three main parts:
- `breeder` (BreederConfig): Controls the core breeding strategy.
  - `start_breed_ratio` to `end_breed_ratio` and `breakpoint` dictate the fraction of bred samples over generations.
  - `loss_sampling_strategy` ("proportional" or "uniform_quantile") determines how the DDPM is conditioned on high target losses when generating parameters.
  - `learn_sampling_ratio`: when `True`, enables history bias mitigation via a RatioMLP.
- `model` (DDPMConfig): Controls the diffusion model architecture and training (`model_dim`, `timesteps`, `lr`, etc.).
- `data` (DataConfig): Governs the dynamically updated FIFO reservoir buffer that feeds the DDPM (`buffer_size`, `batch_size`).
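The three-part layout above can be sketched with stand-in dataclasses. This is only an illustration: the field names come from the description above, but every default value, the `parse_config` helper, and the linear shape of the `breed_ratio` schedule are assumptions and are not taken from the library itself.

```python
from dataclasses import dataclass, field


# Stand-in dataclasses mirroring the fields named in this page.
# All default values are illustrative assumptions, not library defaults.
@dataclass
class BreederConfig:
    start_breed_ratio: float = 0.1
    end_breed_ratio: float = 0.9
    breakpoint: int = 10
    loss_sampling_strategy: str = "proportional"
    learn_sampling_ratio: bool = True


@dataclass
class DDPMConfig:
    model_dim: int = 64
    timesteps: int = 100
    lr: float = 1e-3


@dataclass
class DataConfig:
    buffer_size: int = 10_000
    batch_size: int = 128


@dataclass
class ExperimentConfig:
    breeder: BreederConfig = field(default_factory=BreederConfig)
    model: DDPMConfig = field(default_factory=DDPMConfig)
    data: DataConfig = field(default_factory=DataConfig)


def parse_config(raw: dict) -> ExperimentConfig:
    """Hypothetical helper: build the nested config from a plain dictionary."""
    return ExperimentConfig(
        breeder=BreederConfig(**raw.get("breeder", {})),
        model=DDPMConfig(**raw.get("model", {})),
        data=DataConfig(**raw.get("data", {})),
    )


def breed_ratio(generation: int, cfg: BreederConfig) -> float:
    """Assumed schedule: linear ramp from start to end ratio, reached at `breakpoint`."""
    if cfg.breakpoint <= 0:
        return cfg.end_breed_ratio
    t = min(max(generation, 0) / cfg.breakpoint, 1.0)
    return cfg.start_breed_ratio + t * (cfg.end_breed_ratio - cfg.start_breed_ratio)


raw = {
    "breeder": {"loss_sampling_strategy": "uniform_quantile", "breakpoint": 20},
    "model": {"timesteps": 200},
    "data": {"buffer_size": 5000},
}
config = parse_config(raw)
```

With the real breeder, the equivalent settings would be passed as keyword arguments to `OGASBreeder`; check the actual `ExperimentConfig` definition for the exact field names, defaults, and schedule.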
Implementation Details¶
- Continuous DDPM Training: A fast Denoising Diffusion Probabilistic Model (DDPM) is trained continuously and asynchronously alongside the main surrogate model, on an internal background thread that uses the `DataConfig` components. The model captures the evolving conditional distribution of parameters given the training signal (current loss/difficulty).
- Targeted Generation: During generation breaks (`next_parameters`), the main thread coordinates with the DDPM by determining target losses and calling `model.generate()`.
- Loss Conditioning: Sampling is heavily weighted toward high-loss regimes using strategies such as `proportional` (scaling randomly based on the highest observed loss difference) or `uniform_quantile` (sampling uniformly from losses greater than a given percentage threshold).
- History Bias Mitigation: A classifier-based density-ratio estimator (`RatioMLP`) determines sample weights (`w`) inside the DataLoader batch to correct historical bias toward repeatedly sampled parameter ranges, without needing expensive uniform re-evaluations.
- OOB Resampling: Generated candidate parameters that fall out of bounds (OOB) are resampled or clipped automatically by the generative scheme in a multi-attempt loop.
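The loss-conditioning and out-of-bounds steps above can be sketched in NumPy. This is a rough illustration under stated assumptions, not the breeder's implementation: `sample_target_losses` and `generate_in_bounds` are hypothetical helpers, the `proportional` branch is one possible reading of "scaling randomly based on the highest observed loss difference", and the `generate` callable stands in for the DDPM's `model.generate()`, whose real signature is not documented here.

```python
import numpy as np


def sample_target_losses(losses, n, strategy="proportional", quantile=0.8, rng=None):
    """Pick n target losses to condition the generator on (assumed semantics)."""
    rng = np.random.default_rng() if rng is None else rng
    losses = np.asarray(losses, dtype=float)
    if strategy == "proportional":
        # Assumption: a random fraction of the observed loss range above the minimum.
        lo, hi = losses.min(), losses.max()
        return lo + rng.uniform(0.0, 1.0, size=n) * (hi - lo)
    if strategy == "uniform_quantile":
        # Draw uniformly among observed losses at or above the quantile threshold.
        threshold = np.quantile(losses, quantile)
        return rng.choice(losses[losses >= threshold], size=n, replace=True)
    raise ValueError(f"unknown strategy: {strategy}")


def generate_in_bounds(generate, n, low, high, max_attempts=5):
    """Regenerate out-of-bounds rows up to max_attempts (>= 1), then clip the rest."""
    low, high = np.asarray(low, dtype=float), np.asarray(high, dtype=float)
    accepted = np.empty((0, low.size))
    rejected = np.empty((0, low.size))
    for _ in range(max_attempts):
        candidates = np.asarray(generate(n - len(accepted)), dtype=float)
        inside = np.all((candidates >= low) & (candidates <= high), axis=1)
        accepted = np.vstack([accepted, candidates[inside]])
        if len(accepted) >= n:
            return accepted[:n]
        rejected = candidates[~inside]
    # Fallback: clip the rows still out of bounds after the last attempt.
    return np.vstack([accepted, np.clip(rejected, low, high)])


rng = np.random.default_rng(0)
observed = np.array([0.1, 0.2, 0.5, 1.0, 2.0])
targets = sample_target_losses(observed, 8, "uniform_quantile", rng=rng)

# Stand-in generator: the real one would be conditioned on `targets`.
fake_generate = lambda k: rng.normal(0.5, 0.5, size=(k, 2))
params = generate_in_bounds(fake_generate, 8, low=[0.0, 0.0], high=[1.0, 1.0])
```

Resampling first and clipping only as a last resort avoids piling candidates up on the domain boundary, which plain clipping would do.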
melissa.server.deep_learning.active_sampling.breeder.ogas_breeder.OGASBreeder¶
Bases: BreederPlottingMixin, ExperimentBreeder
Online Generative Active Sampling breeder for simulation parameters.
Uses a DDPM (Denoising Diffusion Probabilistic Model) to propose new parameters conditioned on target loss values. The model is trained continuously on incoming simulation data.
The breeder's behavior is entirely driven by the ExperimentConfig object,
including data processing, model architecture, and training strategy.
Attributes:
| Name | Type | Description |
|---|---|---|
| `config` | | ExperimentConfig with all settings. |
| `device` | | PyTorch device for computation. |
| `model` | | The DDPM model instance. |
| `history` | | HistoryManager for tracking simulation data. |
| `R` | | Current breeding ratio (proportion of bred vs random samples). |

Initialize the OGASBreeder.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `**kwargs` | | Configuration parameters that will be parsed into an ExperimentConfig dataclass. | `{}` |

train_sampler()¶
Public method to trigger training.
next_parameters(max_breeding_count=-1, **kwargs)¶
Generates the next set of parameters for a new generation of simulations.
concretize_resampled_parameters(last_submitted_sim_id, **kwargs)¶
Concretizes the current metadata and parameters with newly bred child metadata for the future (unsubmitted) simulations.
checkpoint_state()¶
Saves the complete state of the breeder and its components.
restart_from_checkpoint()¶
Restores the state from the last checkpoint.