Online Generative Active Sampling (OGAS) Breeder Classes

The OGAS breeder (OGASBreeder) provides an advanced generative approach designed for online surrogate training, bypassing the storage limitations and latencies typically associated with offline, pool-based, or importance-sampling active learning methods.

Guided by the Online Generative Active Sampling (OGAS) methodology from the Learning Where to Simulate paper, it progressively learns to sample configuration parameters (such as initial conditions or physical coefficients) that yield challenging dynamics for the surrogate model. This directly addresses the shortcomings of uniform sampling, which tends to over-represent trivial dynamics and limit generalization.

How to use it

To configure the OGASBreeder, you typically instantiate it with a dictionary of keyword parameters that is parsed into an ExperimentConfig, which is composed of three main parts:

  • breeder (BreederConfig): Controls the core breeding strategy.

    • start_breed_ratio, end_breed_ratio, and breakpoint dictate how the fraction of bred samples evolves over generations.
    • loss_sampling_strategy ("proportional" or "uniform_quantile") determines how target losses are drawn to condition the DDPM toward high-loss parameter regions.
    • learn_sampling_ratio: when True, enables history-bias mitigation via a RatioMLP.
  • model (DDPMConfig): Controls the diffusion model architecture and training (model_dim, timesteps, lr, etc.).

  • data (DataConfig): Governs the dynamically updated FIFO reservoir buffer that feeds the DDPM (buffer_size, batch_size).
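To make the three-part layout concrete, here is an illustrative configuration sketch. The section names and field names come from the documentation above; the nesting shape and every default value shown are assumptions for illustration, not the library's defaults.

```python
# Illustrative OGASBreeder configuration. Field names follow the documented
# config sections; nesting and values are assumptions, not library defaults.
ogas_config = {
    "breeder": {                       # BreederConfig: core breeding strategy
        "start_breed_ratio": 0.0,      # fraction of bred samples at the start
        "end_breed_ratio": 0.8,        # fraction of bred samples at the end
        "breakpoint": 50,              # generation index where the ramp ends (assumed semantics)
        "loss_sampling_strategy": "uniform_quantile",  # or "proportional"
        "learn_sampling_ratio": True,  # enable RatioMLP history-bias correction
    },
    "model": {                         # DDPMConfig: diffusion model and training
        "model_dim": 128,
        "timesteps": 50,
        "lr": 1e-3,
    },
    "data": {                          # DataConfig: FIFO reservoir buffer feeding the DDPM
        "buffer_size": 4096,
        "batch_size": 64,
    },
}
```

A dictionary shaped like this would then be passed as keyword arguments when instantiating the breeder.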

Implementation Details

  • Continuous DDPM Training: A fast Denoising Diffusion Probabilistic Model (DDPM) is trained continuously on an internal background thread, asynchronously with the main surrogate model, using the data pipeline defined by DataConfig. The model captures the evolving conditional distribution of parameters given the training signal (current loss/difficulty).
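The buffer-plus-background-thread pattern can be sketched as follows. All class and method names here are illustrative stand-ins, not Melissa's API; the actual DDPM optimization step is reduced to a placeholder.

```python
import collections
import threading
import time

class BackgroundTrainer:
    """Sketch of continuous asynchronous training: simulation results
    stream into a bounded FIFO buffer while a background thread draws
    batches and updates the generative model. Names are illustrative."""

    def __init__(self, buffer_size=1024, batch_size=32):
        self.buffer = collections.deque(maxlen=buffer_size)  # FIFO reservoir
        self.batch_size = batch_size
        self.lock = threading.Lock()
        self.steps = 0
        self._stop = threading.Event()
        self.thread = threading.Thread(target=self._train_loop, daemon=True)

    def push(self, params, loss):
        # main thread appends new (parameters, loss) pairs; when the
        # buffer is full, the oldest entries are evicted automatically
        with self.lock:
            self.buffer.append((params, loss))

    def _train_loop(self):
        while not self._stop.is_set():
            with self.lock:
                batch = list(self.buffer)[-self.batch_size:]
            if batch:
                self.steps += 1  # placeholder for one DDPM optimizer step
            time.sleep(0.001)    # yield to the main thread

    def start(self):
        self.thread.start()

    def stop(self):
        self._stop.set()
        self.thread.join()
```

The key property this illustrates is that training never blocks the producer: the surrogate keeps pushing fresh data while the generative model updates in the background.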

  • Targeted Generation: During generation breaks (next_parameters), the main thread coordinates with the DDPM by determining target losses and calling model.generate().

  • Loss Conditioning: Sampling is heavily weighted toward high-loss regimes, using strategies such as proportional (scaling random draws by the largest observed loss difference) or uniform_quantile (sampling uniformly from losses above a given quantile threshold).
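One plausible reading of the two strategies is sketched below. The exact formulas used by the library are not given in this page, so these are assumptions inferred from the descriptions above.

```python
import random

def sample_target_losses(observed, n, strategy="proportional", q=0.9):
    """Sketch of the two documented loss-conditioning strategies.
    Formulas are assumptions inferred from the prose, not library code."""
    lo, hi = min(observed), max(observed)
    if strategy == "proportional":
        # scale random draws by the largest observed loss difference,
        # so target losses span the full observed range
        return [lo + random.random() * (hi - lo) for _ in range(n)]
    if strategy == "uniform_quantile":
        # sample uniformly from losses above the q-th quantile
        threshold = sorted(observed)[int(q * (len(observed) - 1))]
        return [random.uniform(threshold, hi) for _ in range(n)]
    raise ValueError(f"unknown strategy: {strategy}")
```

Either way, the resulting target losses are what the DDPM is conditioned on when generating new parameters, steering it toward regions the surrogate currently finds difficult.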

  • History Bias Mitigation: A classifier-based density-ratio estimator (RatioMLP) estimates per-sample weights (w) applied inside the DataLoader batch, correcting the historical bias toward repeatedly sampled parameter ranges without requiring expensive uniform re-evaluations.
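The classifier-based density-ratio trick behind this can be shown in a few lines. Suppose a classifier (the role RatioMLP plays) is trained to separate history samples from uniform reference samples and outputs c(x) = P(x came from the history); then w(x) = (1 − c(x)) / c(x) estimates p_uniform(x) / p_history(x). The sketch below takes those classifier outputs as given rather than training an MLP.

```python
def density_ratio_weights(p_history, eps=1e-6):
    """Classifier-based density-ratio weights (sketch).

    p_history: classifier probabilities c(x) that each sample was drawn
    from the sampling history rather than a uniform reference.
    Returns w(x) = (1 - c(x)) / c(x): over-sampled regions (c near 1)
    are down-weighted, under-sampled regions are up-weighted."""
    return [(1.0 - c) / max(c, eps) for c in p_history]
```

For example, a sample the classifier confidently places in the history (c = 0.9) gets weight ≈ 0.11, while an ambiguous one (c = 0.5) keeps weight 1.0, so repeatedly visited parameter ranges no longer dominate the training batches.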

  • OOB Resampling: Generated candidate parameters that fall out of bounds are automatically resampled by the generative scheme in a multi-attempt loop, with clipping as a final fallback.
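A minimal sketch of that multi-attempt loop, for scalar parameters in a single interval. The function `generate` stands in for the DDPM's conditional sampler, and the attempt count and clipping fallback are assumptions about the scheme's shape rather than its exact implementation.

```python
import random

def generate_in_bounds(generate, bounds, n, max_attempts=5):
    """Sketch of multi-attempt out-of-bounds handling: regenerate
    candidates that leave the valid box, then clip any that still
    remain after the final attempt. `generate(k)` stands in for the
    DDPM's conditional sampler."""
    lo, hi = bounds
    accepted = []
    need = n
    for _ in range(max_attempts):
        for x in generate(need):           # draw a fresh batch of candidates
            if lo <= x <= hi:
                accepted.append(x)         # keep only in-bounds candidates
        need = n - len(accepted)
        if need == 0:
            return accepted
    # fallback: clip whatever is still missing into the valid range
    accepted.extend(min(max(x, lo), hi) for x in generate(need))
    return accepted
```

Resampling first (rather than clipping immediately) avoids piling probability mass onto the boundary, while the clipping fallback guarantees the loop terminates with exactly n valid parameters.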

melissa.server.deep_learning.active_sampling.breeder.ogas_breeder.OGASBreeder

Bases: BreederPlottingMixin, ExperimentBreeder

Online Generative Active Sampling breeder for simulation parameters.

Uses a DDPM (Denoising Diffusion Probabilistic Model) to propose new parameters conditioned on target loss values. The model is trained continuously on incoming simulation data.

The breeder's behavior is entirely driven by the ExperimentConfig object, including data processing, model architecture, and training strategy.

Attributes:

| Name | Description |
|------|-------------|
| `config` | ExperimentConfig with all settings. |
| `device` | PyTorch device for computation. |
| `model` | The DDPM model instance. |
| `history` | HistoryManager for tracking simulation data. |
| `R` | Current breeding ratio (proportion of bred vs. random samples). |

Initialize the OGASBreeder.

Parameters:

| Name | Description | Default |
|------|-------------|---------|
| `**kwargs` | Configuration parameters that will be parsed into an ExperimentConfig dataclass. | `{}` |

train_sampler()

Public method to trigger training.

next_parameters(max_breeding_count=-1, **kwargs)

Generates the next set of parameters for a new generation of simulations.

concretize_resampled_parameters(last_submitted_sim_id, **kwargs)

Concretizes the current metadata and parameters, attaching newly bred child metadata to the future (unsubmitted) simulations.

checkpoint_state()

Saves the complete state of the breeder and its components.

restart_from_checkpoint()

Restores the state from the last checkpoint.