Creating a dataset

melissa.server.deep_learning.dataset.make_dataset(framework_t, buffer, tb_logger, config_dict, transform)

Factory function to create datasets based on the specified deep learning framework.

This function initializes and returns a dataset object for either PyTorch or TensorFlow based on the provided framework type.

Parameters
  • framework_t (FrameworkType): The type of framework (DEFAULT, TORCH or TENSORFLOW).
  • buffer (BaseQueue): The data buffer to be used by the dataset.
  • tb_logger (Any): A logger for TensorBoard metrics.
  • config_dict (Dict[str, Any]): Configuration dictionary for the dataset.
  • transform (Callable): A transformation function to process data before yielding it.
Returns
  • MelissaIterableDataset: A dataset object compatible with the specified framework.
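The dispatch described above can be pictured with a small self-contained sketch. It does not import Melissa: `SketchDataset`, `SketchTorchDataset`, `SketchTfDataset`, and `make_dataset_sketch` are hypothetical stand-ins, and the mapping of `DEFAULT` to the base class is an assumption, not something the reference states.

```python
from enum import Enum, auto
from queue import Queue
from typing import Any, Callable, Dict, Optional


class FrameworkType(Enum):
    # Member names follow the parameter description; the values are assumptions.
    DEFAULT = auto()
    TORCH = auto()
    TENSORFLOW = auto()


class SketchDataset:
    """Hypothetical stand-in for MelissaIterableDataset."""

    def __init__(
        self,
        buffer: Queue,
        config_dict: Optional[Dict[str, Any]] = None,
        transform: Optional[Callable] = None,
        tb_logger: Any = None,
    ) -> None:
        self.buffer = buffer
        self.config_dict = config_dict or {}
        self.transform = transform
        self.tb_logger = tb_logger


class SketchTorchDataset(SketchDataset):
    """Hypothetical stand-in for TorchMelissaIterableDataset."""


class SketchTfDataset(SketchDataset):
    """Hypothetical stand-in for TfMelissaIterableDataset."""


def make_dataset_sketch(framework_t, buffer, tb_logger, config_dict, transform):
    """Pick a dataset class from the framework type and instantiate it."""
    # Assumption: DEFAULT falls back to the framework-agnostic base class.
    registry = {
        FrameworkType.DEFAULT: SketchDataset,
        FrameworkType.TORCH: SketchTorchDataset,
        FrameworkType.TENSORFLOW: SketchTfDataset,
    }
    if framework_t not in registry:
        raise ValueError(f"Unsupported framework type: {framework_t!r}")
    return registry[framework_t](
        buffer=buffer,
        config_dict=config_dict,
        transform=transform,
        tb_logger=tb_logger,
    )


dataset = make_dataset_sketch(
    FrameworkType.TORCH, Queue(), None, {"batch_size": 8}, None
)
print(type(dataset).__name__)
```

A lookup table keeps the factory's argument order identical to the documented signature while leaving each framework-specific class free to add its own behaviour.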

Iterable Dataset Classes

melissa.server.deep_learning.dataset.MelissaIterableDataset

A dataset class designed to handle streaming simulation data through a buffer, with optional data transformations and logging capabilities.

Parameters
  • buffer (BaseQueue): The buffer used for storing and retrieving streaming data.
  • config_dict (dict, optional): Configuration dictionary for initializing dataset-specific parameters. Defaults to an empty dictionary.
  • transform (Callable, optional): A callable transformation function to apply to the data samples. Defaults to None.
  • tb_logger (TensorboardLogger, optional): A logger for tracking dataset operations via TensorBoard. Defaults to None.
Attributes
  • buffer (BaseQueue): Holds the data samples in a queue for processing.
  • __tb_logger (Optional[TensorboardLogger]): Logs dataset-related events or metrics for TensorBoard visualization.
  • _is_receiving (bool): Indicates whether the dataset is currently receiving data from the buffer.
  • sample_number (int): Tracks the number of samples processed.
  • config_dict (Dict[str, Any]): Stores configuration settings for the dataset.
  • __transform (Callable, optional): Holds the transformation function, if provided.
  • __transform_lock (threading.Lock): Ensures thread-safe application of the transformation function.
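The role of a transform lock can be illustrated with a minimal sketch. `LockedTransform` and `CountingTransform` are hypothetical helpers written for this example, not Melissa classes; the sketch only assumes that the lock is held while the user-supplied callable runs, so that a transform with internal state is never entered by two threads at once.

```python
import threading
from typing import Any, Callable, Optional


class LockedTransform:
    """Hypothetical wrapper: serialises calls to an optional transform."""

    def __init__(self, transform: Optional[Callable[[Any], Any]] = None) -> None:
        self._transform = transform
        self._lock = threading.Lock()

    def __call__(self, sample: Any) -> Any:
        if self._transform is None:
            # No transform configured: pass the sample through unchanged.
            return sample
        with self._lock:
            return self._transform(sample)


class CountingTransform:
    """A stateful transform whose counter update is not atomic by itself."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, sample: int) -> int:
        current = self.calls
        self.calls = current + 1  # read-modify-write, unsafe without a lock
        return sample * 2


counting = CountingTransform()
locked = LockedTransform(counting)


def worker() -> None:
    for i in range(1000):
        locked(i)


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counting.calls)  # 4000: every call ran while holding the lock
```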

has_data property

Returns whether the server is still receiving and the buffer is not empty.

get_sample_number()

Returns the total number of samples that have been pulled from the buffer and processed. Useful for logging.

signal_reception_over()

Called after reception is done to flush the remaining elements from the buffer.

__iter__()

Iterator with no fixed length: it keeps trying to pull samples from the buffer for as long as the buffer is not empty or the server is still receiving data.
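The interplay between `signal_reception_over()`, `get_sample_number()`, and `__iter__()` can be sketched as follows. `StreamingDatasetSketch` is a hypothetical class built on the standard-library `queue.Queue`, not Melissa's `BaseQueue`; the loop condition follows the `__iter__()` description, and the short `get` timeout is an assumption made so the loop can re-check its condition instead of blocking forever.

```python
import queue
import threading
from typing import Any, Callable, Iterator, Optional


class StreamingDatasetSketch:
    """Hypothetical stand-in showing the drain-until-done iteration pattern."""

    def __init__(
        self,
        buffer: "queue.Queue[Any]",
        transform: Optional[Callable[[Any], Any]] = None,
    ) -> None:
        self.buffer = buffer
        self._transform = transform
        self._is_receiving = True
        self.sample_number = 0

    def signal_reception_over(self) -> None:
        # After this call the iterator only drains what is left in the buffer.
        self._is_receiving = False

    def get_sample_number(self) -> int:
        return self.sample_number

    def __iter__(self) -> Iterator[Any]:
        # Keep going while data may still arrive or is still queued.
        while self._is_receiving or not self.buffer.empty():
            try:
                sample = self.buffer.get(timeout=0.05)
            except queue.Empty:
                continue  # nothing yet; re-check the loop condition
            self.sample_number += 1
            yield self._transform(sample) if self._transform else sample


buffer: "queue.Queue[int]" = queue.Queue()
dataset = StreamingDatasetSketch(buffer, transform=lambda x: x * 10)


def producer() -> None:
    # Simulates the server side: push samples, then announce the end.
    for i in range(5):
        buffer.put(i)
    dataset.signal_reception_over()


thread = threading.Thread(target=producer)
thread.start()
collected = list(dataset)  # terminates once reception is over and buffer is empty
thread.join()

print(collected, dataset.get_sample_number())
```

Because the producer signals the end of reception only after its last `put`, the consumer cannot stop before every queued sample has been yielded.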

melissa.server.deep_learning.dataset.torch_dataset.TorchMelissaIterableDataset

Bases: MelissaIterableDataset, IterableDataset

A dataset class designed to integrate Melissa's iterable dataset functionality with PyTorch's IterableDataset.

This class enables seamless usage of Melissa's streaming simulation data within PyTorch-based deep learning workflows.

melissa.server.deep_learning.dataset.tf_dataset.TfMelissaIterableDataset

Bases: MelissaIterableDataset

A TensorFlow-compatible extension of the MelissaIterableDataset.

This class adapts the MelissaIterableDataset to work seamlessly with TensorFlow pipelines. It serves as a bridge between the Melissa distributed data system and TensorFlow, ensuring compatibility and ease of use.