Server Class

melissa.server.base_server.BaseServer

Bases: ABC

BaseServer class that handles the following tasks:

  • Manages connections with the launcher and clients.
  • Generates client scripts for simulations.
  • Encodes and decodes messages between the server and clients.
  • Provides basic checkpointing functionality to save and restore states.
Parameters
  • config_dict (Dict[str, Any]): A dictionary containing configuration settings for initializing the server.
  • checkpoint_file (str, optional): The filename for the checkpoint file (default is "checkpoint.pkl"). This file is used for saving and restoring the server's state.
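
The ".pkl" default suggests pickle-based checkpointing. The following is a minimal sketch of saving and restoring a state dictionary under that assumption; save_checkpoint, load_checkpoint, and the state keys are hypothetical names chosen for illustration, not methods of BaseServer.

```python
import os
import pickle
import tempfile
from typing import Any, Dict, Optional


def save_checkpoint(state: Dict[str, Any], checkpoint_file: str = "checkpoint.pkl") -> None:
    """Write the state to a temporary file, then rename it over the target.

    The rename keeps a previous checkpoint intact if the process dies mid-write.
    """
    directory = os.path.dirname(os.path.abspath(checkpoint_file))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "wb") as handle:
        pickle.dump(state, handle)
    os.replace(tmp_path, checkpoint_file)


def load_checkpoint(checkpoint_file: str = "checkpoint.pkl") -> Optional[Dict[str, Any]]:
    """Return the saved state, or None when no checkpoint exists yet."""
    if not os.path.exists(checkpoint_file):
        return None
    with open(checkpoint_file, "rb") as handle:
        return pickle.load(handle)
```
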
Attributes
  • comm (MPI.Intracomm): The MPI communicator for inter-process communication.
  • rank (int): The rank of the current process in the MPI communicator.
  • comm_size (int): The total number of server processes in the MPI communicator.
  • client_comm_size (int): The total number of client processes.
  • server_processes (int): Synonym for comm_size.
  • connection_port (int): The server port to establish request-response connection with the clients.
  • data_puller_port (int): The server port to establish data pulling with the clients.

  • _offline_mode (bool): Internal flag indicating offline mode where no sending operation takes place. Useful when running multiple clients to produce datasets. Call self.make_study_offline to enable.

  • _learning (int): Internal flag indicating the learning state (initially 0).
  • __t0 (float): The timestamp marking the initialization time of the object.
  • _job_limit (int): Maximum number of jobs the launcher can manage concurrently.
  • __is_direct_scheduler (bool): Flag indicating whether the study is using a direct scheduler.

  • _restart (int): Flag indicating if the system is in a restart state; initialized from the MELISSA_RESTART environment variable.

  • _consistency_lock (threading.RLock): Reentrant lock to ensure thread-safe operations on shared resources.
  • _is_receiving (bool): Flag indicating whether data reception is ongoing.
  • _is_online (bool): Flag indicating if the system is in an online operational mode.
  • _sobol_op (bool): Flag indicating whether Sobol operations are being performed.
  • _total_bytes_recv (int): Tracks the total number of bytes received over the network.
  • _active_sim_ids (set): Set of active simulation ids currently being managed.

  • _groups (Dict[int, Group]): Dictionary mapping group ids to Group objects.

  • _parameter_sampler (Optional[BaseExperiment]): Sampler for generating parameter values for simulations.
  • __parameter_generator (Any): Internal generator object for producing parameters.

  • verbose_level (int): Determines the verbosity level for logging and debugging output.

  • config_dict (Dict[str, Any]): Configuration dictionary provided during initialization.
  • checkpoint_file (str): File name used for storing checkpoint data.

  • crashes_before_redraw (int): Number of simulation crashes allowed before redrawing parameters.

  • max_delay (Union[int, float]): Maximum allowed delay for simulations, in seconds.
  • rm_script (bool): Indicates whether client scripts should be removed after execution.
  • group_size (int): Number of simulations grouped together for batch processing.
  • zmq_hwm (int): High-water mark for ZeroMQ communication.

  • fields (List[str]): List of field names used in the study.

  • nb_parameters (int): Number of parameters in the parameter sweep study.
  • nb_time_steps (int): Number of time steps in each simulation.
  • nb_clients (int): Total number of clients participating in the parameter sweep study.

  • nb_groups (int): Total number of groups, derived from the number of clients and group size.

  • nb_submitted_groups (int): Tracks the number of groups submitted so far.
  • finished_groups (set): Tracks the set of finished groups.
  • mtt_simulation_completion (float): Running mean of simulation durations, updated iteratively as simulations complete.

  • no_fault_tolerance (bool): Indicates whether fault tolerance is disabled, based on the MELISSA_FAULT_TOLERANCE environment variable.

  • __ft (FaultTolerance): Fault tolerance object managing simulation crashes and retries.
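
An iteratively tracked mean, as described for mtt_simulation_completion, can be maintained without storing past durations. The sketch below shows the standard incremental-mean update; the RunningMean class is a hypothetical illustration and is not taken from the Melissa source.

```python
class RunningMean:
    """Incrementally updated arithmetic mean; no samples are stored."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0

    def update(self, value: float) -> float:
        # new_mean = old_mean + (value - old_mean) / n
        self.count += 1
        self.mean += (value - self.mean) / self.count
        return self.mean


mtt = RunningMean()
for duration in (2.0, 4.0, 6.0):
    mtt.update(duration)
print(mtt.mean)  # 4.0
```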

time_steps_known property

Whether the number of time steps is known before the study starts.

is_direct_scheduler property

Whether the study uses a direct scheduler.

learning property

Whether deep learning is activated. This value is required when establishing a connection with clients.

consistency_lock property

The reentrant lock that guards shared resources (see _consistency_lock). Useful for active sampling.
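
A reentrant lock lets one thread acquire the same lock again from a nested call, which a plain lock would not allow. The sketch below illustrates that behaviour with the standard library only; the functions and the module-level set are hypothetical stand-ins, not BaseServer members.

```python
import threading
from typing import Iterable, Set

consistency_lock = threading.RLock()
active_sim_ids: Set[int] = set()


def add_simulation(sim_id: int) -> None:
    with consistency_lock:
        active_sim_ids.add(sim_id)


def add_group(sim_ids: Iterable[int]) -> None:
    # Outer acquisition: add_simulation re-acquires the same lock on this
    # thread, which would deadlock with a non-reentrant threading.Lock.
    with consistency_lock:
        for sim_id in sim_ids:
            add_simulation(sim_id)


threads = [
    threading.Thread(target=add_group, args=(range(i * 10, i * 10 + 10),))
    for i in range(4)
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```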

__loop_pings()

Maintains communication with the launcher to ensure it does not assume the server has become unresponsive.
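
A keep-alive loop of this kind is commonly run on a background thread. The following sketch assumes a caller-supplied send_ping callable and a fixed interval; both, along with the loop_pings function shown here, are hypothetical and do not reflect how the private method is implemented.

```python
import threading
import time
from typing import Callable, List


def loop_pings(send_ping: Callable[[], None], stop: threading.Event, interval: float) -> None:
    """Call send_ping every `interval` seconds until `stop` is set."""
    # Event.wait returns False on timeout and True once the event is set,
    # so the loop exits promptly on shutdown instead of sleeping a full interval.
    while not stop.wait(interval):
        send_ping()


sent: List[float] = []
stop = threading.Event()
pinger = threading.Thread(
    target=loop_pings,
    args=(lambda: sent.append(time.monotonic()), stop, 0.05),
    daemon=True,
)
pinger.start()
time.sleep(0.3)  # the server's main work would happen here
stop.set()
pinger.join()
```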

__initialize_ports(connection_port=2003, data_puller_port=5000)

Assigns port numbers for connection and data pulling as class attributes. If the specified ports are already in use, likely due to multiple servers running on the same node, the function attempts to find available ports by incrementing the base port values and rechecking their availability.

Note: When multiple independent melissa-server jobs are running simultaneously on the same node, there is a chance that a port may incorrectly appear as available, leading to potential conflicts.

Parameters
  • connection_port (int, optional): The port number used for establishing the main connection (default is 2003).
  • data_puller_port (int, optional): The port number used for pulling data (default is 5000).
Raises
  • FatalError: If no available ports are found after the given number of attempts.
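
The increment-and-recheck strategy described above can be sketched with the standard socket module. This is an illustration only: port_is_free, find_free_port, max_attempts, and NoPortAvailableError (standing in for FatalError) are hypothetical names, and the actual availability check may differ.

```python
import socket


class NoPortAvailableError(RuntimeError):
    """Hypothetical stand-in for the FatalError mentioned above."""


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Try to bind the port; a failed bind means it is in use or invalid."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except (OSError, OverflowError):
            return False
    return True


def find_free_port(base_port: int, max_attempts: int = 10) -> int:
    """Return the first free port in base_port, base_port + 1, ..."""
    for offset in range(max_attempts):
        candidate = base_port + offset
        if port_is_free(candidate):
            return candidate
    raise NoPortAvailableError(
        f"no free port in [{base_port}, {base_port + max_attempts})"
    )
```

The check and the later use of the port are separate steps, so another process can claim the port in between. That window is one way a port can wrongly appear available when several servers share a node.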

__connect_to_launcher()

Establishes a connection with the launcher and sends metadata about the study.

__setup_sockets()

Sets up ZeroMQ (ZMQ) sockets over a given TCP connection port for communication.

__setup_poller()

Sets up the polling mechanism by registering three sockets:

  • Data Socket: Handles data communication.
  • Timer Socket: Manages timing events.
  • Launcher Socket: Facilitates communication with the launcher.

__start_debugger()

Launches the Visual Studio Code (VSCode) debugger for debugging purposes.

configure_logger()

Configures server loggers for each MPI rank.

initialize_connections()

Initializes socket connections for communication.

set_parameter_sampler(sampler_t, **kwargs)

Sets the defined parameter sampler type. This dictates how parameters are sampled for experiments. This sampler type can either be pre-defined or customized by inheriting a pre-defined sampling class.

Parameters
  • sampler_t (Union[ParameterSamplerType, Type[ParameterSamplerClass]]):
    • ParameterSamplerType: Enum specifying pre-defined samplers.
    • Type[ParameterSamplerClass]: A class type to instantiate.
  • kwargs (Dict[str, Any]): Dictionary of keyword arguments forwarded to the sampler. Use it to pass custom parameters as well as the sampler's own parameters, such as l_bounds, u_bounds, apply_pick_freeze, second_order, seed=0, etc.
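
Accepting either an enum member or a class can be handled by resolving the enum to a class and then instantiating it with the keyword arguments. The sketch below shows that dispatch pattern; SamplerType, UniformSampler, ShiftedSampler, and make_sampler are all hypothetical names and are not Melissa's classes.

```python
import random
from enum import Enum
from typing import Any, Dict, List, Type, Union


class UniformSampler:
    """Hypothetical sampler drawing each parameter uniformly within its bounds."""

    def __init__(self, l_bounds: List[float], u_bounds: List[float], seed: int = 0) -> None:
        self.l_bounds = l_bounds
        self.u_bounds = u_bounds
        self.rng = random.Random(seed)

    def draw(self) -> List[float]:
        return [self.rng.uniform(lo, hi) for lo, hi in zip(self.l_bounds, self.u_bounds)]


class ShiftedSampler(UniformSampler):
    """Customized sampler: inherits a pre-defined one and adds its own keyword."""

    def __init__(self, shift: float, **kwargs: Any) -> None:
        super().__init__(**kwargs)
        self.shift = shift

    def draw(self) -> List[float]:
        return [value + self.shift for value in super().draw()]


class SamplerType(Enum):
    """Hypothetical enum of pre-defined samplers."""
    UNIFORM = 0


_PREDEFINED: Dict[SamplerType, Type[UniformSampler]] = {
    SamplerType.UNIFORM: UniformSampler,
}


def make_sampler(sampler_t: Union[SamplerType, Type[UniformSampler]], **kwargs: Any) -> UniformSampler:
    # Enum members map to a pre-defined class; anything else is used as the class itself.
    sampler_cls = _PREDEFINED[sampler_t] if isinstance(sampler_t, SamplerType) else sampler_t
    return sampler_cls(**kwargs)
```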

__generate_client_script(sim_id, parameters, script_path)

Generates a single client script for a given simulation id and parameters.

Parameters
  • sim_id (int): The simulation id associated with the client script.
  • parameters (list): The list of parameters.
  • script_path (str): The absolute path of the client script to create.
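
As an illustration of what generating a client script can involve, the sketch below writes an executable shell script that passes the parameters as command-line arguments. The script contents, the "./simulation" executable, and the function name are assumptions made for this example; the real script layout is not described here.

```python
import os
import shlex
import stat
import tempfile
from typing import List


def generate_client_script(sim_id: int, parameters: List[float], script_path: str,
                           executable: str = "./simulation") -> None:
    """Write a shell script that runs `executable` with the given parameters."""
    args = " ".join(shlex.quote(str(p)) for p in parameters)
    content = (
        "#!/bin/sh\n"
        f"# Client script for simulation {sim_id}\n"
        f"exec {shlex.quote(executable)} {args}\n"
    )
    with open(script_path, "w") as handle:
        handle.write(content)
    # Add the owner-execute bit so the script can be launched directly.
    mode = os.stat(script_path).st_mode
    os.chmod(script_path, mode | stat.S_IXUSR)


script = os.path.join(tempfile.mkdtemp(), "client.0.sh")
generate_client_script(0, [0.5, 1.25], script)
```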

poll_sockets(timeout=10)

Performs polling over the registered socket descriptors to monitor various events, including timer, launcher messages, new client connections, and data readiness.

Parameters
  • timeout (int, optional): The maximum time (in seconds) to wait for a socket event before returning. Default is 10 seconds.
Returns
  • Optional[Union[ServerStatus, SimulationData, PartialSimulationData]]:
    • ServerStatus if the event is related to server status.
    • SimulationData if new simulation data is received.
    • PartialSimulationData if partial data from a simulation is received.
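
The register-then-poll pattern with a timeout can be sketched without ZeroMQ by using the standard-library selectors module as a stand-in for a ZMQ poller. The socket pairs, channel names, and poll_once function are hypothetical; the real method returns the types listed above rather than raw bytes.

```python
import selectors
import socket
from typing import Dict, Optional, Tuple

selector = selectors.DefaultSelector()

# Three socket pairs stand in for the data, timer, and launcher channels.
channels: Dict[str, Tuple[socket.socket, socket.socket]] = {}
for name in ("data", "timer", "launcher"):
    reader, writer = socket.socketpair()
    selector.register(reader, selectors.EVENT_READ, data=name)
    channels[name] = (reader, writer)


def poll_once(timeout: float = 10) -> Optional[Tuple[str, bytes]]:
    """Return (channel name, payload) for the first ready socket, or None on timeout."""
    for key, _mask in selector.select(timeout):
        return key.data, key.fileobj.recv(4096)
    return None


channels["launcher"][1].sendall(b"ping")
print(poll_once(timeout=1))    # ('launcher', b'ping')
print(poll_once(timeout=0.1))  # None
```

Returning None on timeout mirrors the Optional return type: the caller regains control periodically even when no socket has an event.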

__forceful_group_termination(group_id)

Forcefully terminates all clients in a group.

__handle_timerfd()

Handles timer messages.

__handle_failed_group(group_id)

Handles a failed group by using the fault-tolerance object to decide whether the group is resubmitted.

close_connection(exit_=0)

Signals to the launcher that the study has ended with a specified exit status.

Parameters
  • exit_ (int, optional): The exit status code to be sent to the launcher. Defaults to 0, indicating successful completion.

get_memory_info_in_gb()

Returns a Tuple[float, float] containing memory consumed and the total main memory in GB.

setup_environment()

Optional. A hook for environment setup or initialization; any necessary setup calls go here. For example, a Melissa DL study needs dist.init_process_group to be called.

__deserialize_message(msg)

Deserializes a byte stream into a PartialSimulationData object.

Parameters
  • msg (bytes): Serialized message containing simulation data.
Returns
  • PartialSimulationData: The deserialized data object.
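
Turning a byte stream into a typed object usually means unpacking a fixed header and then the payload. The sketch below uses an invented wire layout (three little-endian int32 values, a UTF-8 field name, then float64 values) purely for illustration; the actual message format and the fields of PartialSimulationData are not documented here, and SketchPartialData is a hypothetical stand-in.

```python
import struct
from dataclasses import dataclass
from typing import List


@dataclass
class SketchPartialData:
    """Hypothetical stand-in for PartialSimulationData."""
    sim_id: int
    time_step: int
    field: str
    values: List[float]


# Assumed header: sim_id, time_step, length of the field name in bytes.
_HEADER = struct.Struct("<iii")


def serialize(data: SketchPartialData) -> bytes:
    name = data.field.encode("utf-8")
    payload = struct.pack(f"<{len(data.values)}d", *data.values)
    return _HEADER.pack(data.sim_id, data.time_step, len(name)) + name + payload


def deserialize(msg: bytes) -> SketchPartialData:
    sim_id, time_step, name_len = _HEADER.unpack_from(msg, 0)
    offset = _HEADER.size
    field = msg[offset:offset + name_len].decode("utf-8")
    offset += name_len
    # The remaining bytes are float64 values, 8 bytes each.
    count = (len(msg) - offset) // 8
    values = list(struct.unpack_from(f"<{count}d", msg, offset))
    return SketchPartialData(sim_id, time_step, field, values)
```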

__process_simulation_completion(simulation, force=False)

Finalizes simulation completion and adjusts metadata associated with it.

Parameters
  • simulation (Simulation): Instance of the simulation to finalize.
  • force (bool, optional): If True, finalizes the simulation regardless of its current state. Default is False.

__determine_and_process_simulation_data(simulation_data)

Determines the status of the simulation data and handles actions accordingly.

Parameters
  • simulation_data (PartialSimulationData): The incoming simulation data to process.
Returns
  • Optional[Union[SimulationData, PartialSimulationData]]: The value returned by the _check_simulation_data method.

start() abstractmethod

Defines the high-level organization of server events. Each Melissa flavor provides its own implementation.
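
Because the class derives from ABC and start() is abstract, a subclass must override start() before it can be instantiated. The sketch below shows that mechanism with hypothetical stand-in classes; it does not reproduce BaseServer's constructor or any real flavor.

```python
from abc import ABC, abstractmethod
from typing import List


class SketchBaseServer(ABC):
    """Hypothetical stand-in for BaseServer showing only the overridable hooks."""

    def __init__(self) -> None:
        self.events: List[str] = []

    def setup_environment(self) -> None:
        """Optional hook; the default does nothing."""

    @abstractmethod
    def start(self) -> None:
        """Each flavor organizes its own server events here."""


class SketchFlavorServer(SketchBaseServer):
    def setup_environment(self) -> None:
        self.events.append("environment ready")

    def start(self) -> None:
        self.setup_environment()
        self.events.append("started")


server = SketchFlavorServer()
server.start()
print(server.events)  # ['environment ready', 'started']
```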

melissa.server.base_server.ServerStatus

Bases: Enum

Server status enum.