Creating a dataloader¶
melissa.server.deep_learning.dataset.make_dataloader(framework_t, iter_dataset, batch_size, collate_fn=None, num_workers=0, **extra_torch_dl_args)¶
Factory function to create dataloader based on the specified deep learning framework.
Parameters¶
- framework_t (FrameworkType): The type of framework (DEFAULT, TORCH or TENSORFLOW).
- iter_dataset (MelissaIterableDataset): An iterable dataset that streams data via its __iter__ method.
- batch_size (int): Number of samples per batch.
- collate_fn (Callable, optional): A function to combine multiple samples into a batch. Defaults to None, which creates batches as lists of samples.
- num_workers (int, optional): Number of worker threads for parallel data loading. Defaults to 0 (no threading).
- extra_torch_dl_args (Dict[str, Any], optional): Extra kwargs for torch.utils.data.DataLoader.
Returns¶
Union[GeneralDataLoader, torch.utils.data.DataLoader, tensorflow.data.Dataset]: An iterable for training over batches.
Raises¶
RuntimeError if the specified framework is not found.
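The dispatch and the collate_fn default described above can be illustrated with a small, self-contained sketch. It does not import Melissa and is not its implementation: the FrameworkType stand-in, default_batches, BUILDERS and make_dataloader_sketch are hypothetical names, "framework not found" is modeled here as "no builder registered" (an assumption), and dropping the incomplete final batch is likewise an assumption.

```python
from enum import Enum
from itertools import islice
from typing import Any, Callable, Dict, Iterable, Iterator, Optional


class FrameworkType(Enum):
    """Hypothetical stand-in; the real enum lives in Melissa."""
    DEFAULT = 0
    TORCH = 1
    TENSORFLOW = 2


def default_batches(iter_dataset: Iterable[Any], batch_size: int,
                    collate_fn: Optional[Callable] = None) -> Iterator[Any]:
    """Yield full batches; with collate_fn=None each batch is a plain list."""
    it = iter(iter_dataset)
    while True:
        batch = list(islice(it, batch_size))
        if len(batch) < batch_size:
            return  # stream exhausted; the incomplete tail is dropped (assumption)
        yield collate_fn(batch) if collate_fn is not None else batch


# Only the DEFAULT builder is registered in this sketch (assumption).
BUILDERS: Dict[FrameworkType, Callable[..., Iterator[Any]]] = {
    FrameworkType.DEFAULT: default_batches,
}


def make_dataloader_sketch(framework_t: FrameworkType, iter_dataset: Iterable[Any],
                           batch_size: int,
                           collate_fn: Optional[Callable] = None) -> Iterator[Any]:
    """Pick a builder by framework type; raise RuntimeError if none is found."""
    try:
        builder = BUILDERS[framework_t]
    except KeyError:
        raise RuntimeError(f"Framework {framework_t.name} not found") from None
    return builder(iter_dataset, batch_size, collate_fn)


# Default collation: each batch is a list of samples.
plain = list(make_dataloader_sketch(FrameworkType.DEFAULT, range(5), 2))
# plain == [[0, 1], [2, 3]]

# A custom collate_fn receives the list of samples and returns the batch.
summed = list(make_dataloader_sketch(FrameworkType.DEFAULT, range(4), 2, collate_fn=sum))
# summed == [1, 5]
```

In real use the call would go to melissa.server.deep_learning.dataset.make_dataloader with the signature shown at the top of this section.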
Iterable Dataset Classes¶
melissa.server.deep_learning.dataset.GeneralDataLoader¶
A general-purpose data loader designed to handle streaming datasets with optional multi-threaded loading and batch collation.
This class supports datasets like MelissaIterableDataset that provide
infinite or streaming data. It enables efficient batching and parallel
data loading while ensuring compatibility with custom collation functions.
Parameters¶
- dataset (MelissaIterableDataset): An iterable dataset that streams data via its __iter__ method.
- batch_size (int): Number of samples per batch.
- collate_fn (Callable, optional): A function to combine multiple samples into a batch. Defaults to None, which creates batches as lists of samples.
- num_workers (int, optional): Number of worker threads for parallel data loading. Defaults to 0 (no threading).
- drop_last (bool, optional): Whether to drop the last incomplete batch. Defaults to True.
Attributes¶
- dataset (MelissaIterableDataset): The dataset being wrapped for batching and loading.
- batch_size (int): Size of each batch produced by the data loader.
- collate_fn (Optional[Callable]): The function used to collate samples into batches.
- num_workers (int): Number of worker threads for parallel data loading.
- drop_last (bool): Indicates if incomplete batches are dropped.
- _queue (queue.Queue): An internal buffer to hold preloaded samples during multi-threaded loading.
- _stop_event (threading.Event): A flag to signal worker threads to stop loading data.
- _threads (List[threading.Thread]): List of worker threads for parallel data loading.
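How a queue, a stop event and worker threads can cooperate to batch a stream is easier to see in code. The following is a simplified sketch under stated assumptions, not Melissa's GeneralDataLoader: SketchLoader, the _DONE sentinel, the shared-iterator lock and the buffer size of 4 * batch_size are all hypothetical choices made for illustration.

```python
import queue
import threading
from typing import Any, Callable, Iterable, Iterator, List, Optional

_DONE = object()  # sentinel each worker enqueues once the stream is exhausted


class SketchLoader:
    """Simplified, hypothetical loader; not Melissa's GeneralDataLoader."""

    def __init__(self, dataset: Iterable[Any], batch_size: int,
                 collate_fn: Optional[Callable[[List[Any]], Any]] = None,
                 num_workers: int = 0, drop_last: bool = True) -> None:
        self.dataset = dataset
        self.batch_size = batch_size
        self.collate_fn = collate_fn
        self.num_workers = num_workers
        self.drop_last = drop_last
        self._queue: queue.Queue = queue.Queue(maxsize=4 * batch_size)
        self._stop_event = threading.Event()
        self._threads: List[threading.Thread] = []

    def _worker(self, it: Iterator[Any], lock) -> None:
        while not self._stop_event.is_set():
            with lock:  # workers share one iterator, so guard next()
                sample = next(it, _DONE)
            while not self._stop_event.is_set():
                try:
                    self._queue.put(sample, timeout=0.1)
                    break
                except queue.Full:
                    continue  # buffer full; re-check the stop flag
            if sample is _DONE:
                return

    def _samples(self) -> Iterator[Any]:
        if self.num_workers == 0:
            yield from self.dataset  # no threading: read the stream directly
            return
        self._stop_event.clear()
        self._queue = queue.Queue(maxsize=4 * self.batch_size)
        it, lock = iter(self.dataset), threading.Lock()
        self._threads = [
            threading.Thread(target=self._worker, args=(it, lock), daemon=True)
            for _ in range(self.num_workers)
        ]
        for thread in self._threads:
            thread.start()
        finished = 0
        try:
            while finished < self.num_workers:
                sample = self._queue.get()
                if sample is _DONE:
                    finished += 1  # one sentinel per worker
                else:
                    yield sample
        finally:
            self._stop_event.set()  # release workers if the consumer stops early
            for thread in self._threads:
                thread.join()

    def _collate(self, batch: List[Any]) -> Any:
        return self.collate_fn(batch) if self.collate_fn is not None else batch

    def __iter__(self) -> Iterator[Any]:
        batch: List[Any] = []
        for sample in self._samples():
            batch.append(sample)
            if len(batch) == self.batch_size:
                yield self._collate(batch)
                batch = []
        if batch and not self.drop_last:
            yield self._collate(batch)  # keep the incomplete tail


full_only = list(SketchLoader(range(7), 3))                   # drop_last=True
with_tail = list(SketchLoader(range(7), 3, drop_last=False))  # tail batch kept
threaded = list(SketchLoader(range(10), 5, num_workers=2))    # order may vary
```

With more than one worker the sample order is not guaranteed in this sketch, because samples are enqueued outside the lock; only the set of delivered samples is.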
melissa.server.deep_learning.dataset.torch_dataset.as_torch_dataloader¶
Creates a torch DataLoader using the iterable dataset.
Parameters¶
- iter_dataset (TorchMelissaIterableDataset): An iterable dataset that streams data via its __iter__ method.
- batch_size (int): Number of samples per batch.
- collate_fn (Callable, optional): A function to combine multiple samples into a batch. Defaults to None, which creates batches as lists of samples.
- num_workers (int, optional): Number of worker threads for parallel data loading. Defaults to 0 (no threading).
- extra_torch_dl_args (Dict[str, Any], optional): Extra kwargs for torch.utils.data.DataLoader.
Returns¶
torch.utils.data.DataLoader: A torch dataloader instance for training over batches.
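As a rough illustration of wrapping a streaming dataset in a torch DataLoader and forwarding extra keyword arguments, consider the sketch below. It requires PyTorch and does not use Melissa: CountingStream is a hypothetical stand-in for a TorchMelissaIterableDataset, and as_torch_dataloader_sketch only mirrors the documented parameters; how Melissa builds the loader internally is not shown on this page.

```python
from typing import Any, Callable, Iterator, Optional

import torch
from torch.utils.data import DataLoader, IterableDataset


class CountingStream(IterableDataset):
    """Hypothetical stand-in for a TorchMelissaIterableDataset."""

    def __init__(self, n: int) -> None:
        super().__init__()
        self.n = n

    def __iter__(self) -> Iterator[torch.Tensor]:
        # Stream n samples, each a float tensor of shape (2,).
        for i in range(self.n):
            yield torch.tensor([float(i), float(i) + 0.5])


def as_torch_dataloader_sketch(iter_dataset: IterableDataset, batch_size: int,
                               collate_fn: Optional[Callable] = None,
                               num_workers: int = 0,
                               **extra_torch_dl_args: Any) -> DataLoader:
    """Forward the documented parameters, plus any extras, to DataLoader."""
    return DataLoader(iter_dataset, batch_size=batch_size, collate_fn=collate_fn,
                      num_workers=num_workers, **extra_torch_dl_args)


# drop_last travels through **extra_torch_dl_args to torch's DataLoader.
loader = as_torch_dataloader_sketch(CountingStream(6), batch_size=4, drop_last=True)
shapes = [tuple(batch.shape) for batch in loader]
```

When collate_fn=None is handed straight to torch.utils.data.DataLoader, as in this sketch, torch applies its own default collation, which stacks tensor samples into a single batch tensor. In torch itself, num_workers counts worker subprocesses.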
melissa.server.deep_learning.dataset.tf_dataset.as_tensorflow_dataset¶
Converts the iterable dataset into a TensorFlow tf.data.Dataset.
This method utilizes TensorFlow's from_generator functionality to
wrap the current iterable dataset into a tf.data.Dataset, allowing
integration with TensorFlow's data processing pipelines.
Parameters¶
- iter_dataset (TfMelissaIterableDataset): An iterable dataset instance that defines an __iter__ method.
- batch_size (int): Batch size for the iterable.
- collate_fn (Callable, optional): A function to combine multiple samples into a batch.
Returns¶
tf.data.Dataset: A TensorFlow dataset with elements structured as (features, labels). Both features and labels are of type tf.float32 with dynamic shapes (None).