TensorBoard logging for Deep Learning
To use the default TensorBoard logger, either the tensorflow or the torch module must be installed.
Warning
If you are using a different framework, it is preferred that you have at least tensorflow-cpu installed locally.
Logging within a server class
Users are encouraged to use the built-in TensorBoard logging feature, which is designed to make deep-learning studies easier to monitor and post-process.
As exemplified in examples/heat-pde/heatpde_server.py, the TensorBoard logger is available anywhere in the custom server class through the self.tb_logger.log_* methods. The following methods are available:
self.tb_logger.log_scalar("Loss/train", batch_loss, batch_idx)
self.tb_logger.log_scalars("Metrics", metrics_dict, batch_idx)
self.tb_logger.log_figure("Plots/metric", metric_plot_fig, batch_idx)
self.tb_logger.log_histogram("Histograms/dist", dist, batch_idx)
Note
Users who want more flexibility can access the underlying SummaryWriter object through the self.tb_logger.writer attribute.
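The logging calls listed above can be sketched in a small training loop. This is only an illustration of the tag, value, step calling convention: RecordingLogger is a hypothetical stand-in that stores values in memory instead of writing TensorBoard event files, and the way it names the entries of log_scalars is an assumption, not Melissa's documented behavior. In a real server class, self.tb_logger would be used in its place.

```python
from collections import defaultdict


class RecordingLogger:
    """Hypothetical stand-in for self.tb_logger; it only stores what is logged."""

    def __init__(self):
        # Maps each tag to a list of (step, value) pairs.
        self.records = defaultdict(list)

    def log_scalar(self, tag, value, step):
        self.records[tag].append((step, float(value)))

    def log_scalars(self, main_tag, values, step):
        # Assumption: each dictionary entry is stored under "<main_tag>/<key>".
        for key, value in values.items():
            self.log_scalar(f"{main_tag}/{key}", value, step)


def train(tb_logger, batch_losses):
    """Toy loop issuing one log_scalar and one log_scalars call per batch."""
    running_sum = 0.0
    for batch_idx, batch_loss in enumerate(batch_losses):
        running_sum += batch_loss
        tb_logger.log_scalar("Loss/train", batch_loss, batch_idx)
        metrics_dict = {
            "running_mean": running_sum / (batch_idx + 1),
            "last": batch_loss,
        }
        tb_logger.log_scalars("Metrics", metrics_dict, batch_idx)


tb_logger = RecordingLogger()
train(tb_logger, [1.0, 0.5, 0.25])
print(tb_logger.records["Loss/train"])
```

Using the batch index as the step keeps all curves aligned on the same horizontal axis in the dashboard.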
TensorBoard allows you to monitor these values in real time. To start it, open a new terminal and run tensorboard --logdir <log_dir>, where <log_dir> is a placeholder for the directory that holds your study's TensorBoard event files.
By default, this launches a server at http://localhost:6006, where you can track training progress on the TensorBoard dashboard.
Melissa also uses the TensorBoard logger to record several other metrics, including:
Metric | Description | Scope |
---|---|---|
samples_per_second | Average number of samples trained per second | Local to MPI rank |
buffer_size | Size of the buffer at a given time | Local to MPI rank |
put_time | Time spent to put each sample into the buffer | Local to MPI rank |
get_time | Time spent to get each sample from the buffer | Local to MPI rank |
Additionally, the get_buffer_statistics method is implemented in examples/heat-pde/heat-pde-dl/heatpde_dl_server.py to record:
Metric | Description |
---|---|
buffer_std/{param} | Standard deviation of {param} in the buffer |
buffer_mean/{param} | Mean of {param} in the buffer |
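As a rough illustration of how such per-parameter statistics could be computed, the sketch below builds the buffer_mean/{param} and buffer_std/{param} tags from a list of parameter tuples. It is not the get_buffer_statistics implementation from the example: the function name buffer_statistics, the parameter names, the list-of-tuples buffer layout, and the choice of the population standard deviation are all assumptions made for this example.

```python
import statistics


def buffer_statistics(buffer_params, param_names):
    """Return a dict mapping "buffer_mean/<param>" and "buffer_std/<param>" to floats.

    buffer_params: list of parameter tuples, one tuple per sample in the buffer.
    param_names: names of the parameters, in the same order as in each tuple.
    """
    stats = {}
    if not buffer_params:
        # Nothing to summarize for an empty buffer.
        return stats
    for i, name in enumerate(param_names):
        column = [sample[i] for sample in buffer_params]
        stats[f"buffer_mean/{name}"] = statistics.fmean(column)
        # Assumption: population standard deviation; the real method may differ.
        stats[f"buffer_std/{name}"] = statistics.pstdev(column)
    return stats


# Hypothetical buffer content with two parameters per sample.
buffer_params = [(100.0, 1.0), (200.0, 3.0), (300.0, 5.0)]
stats = buffer_statistics(buffer_params, ["T_init", "T_left"])
for tag, value in stats.items():
    print(tag, value)
```

Each resulting tag and value could then be passed to self.tb_logger.log_scalar together with the current batch index.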
Deeper post-processing
Users can automatically generate a pandas dataframe from the TensorBoard logs by setting the configuration flag convert_log_to_df, which is not set by default. The dataframe contains all information logged by the self.tb_logger.log_scalar* functions.
The following is an example dl_config for users who wish to generate a dataframe from their TensorBoard logs:
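A minimal sketch of such a fragment is given here in Python, assuming the study configuration is a JSON file in which dl_config is a top-level section. Only the names dl_config and convert_log_to_df come from the text above; the surrounding layout is an assumption, and any other keys your configuration needs are omitted.

```python
import json

# Assumed layout: a top-level "dl_config" section holding the flag.
config_fragment = {
    "dl_config": {
        "convert_log_to_df": True,
    }
}

# Print the fragment as JSON so it can be merged into a configuration file.
print(json.dumps(config_fragment, indent=4))
```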
Warning
This feature requires pandas and tensorflow, which can both be installed with pip install pandas tensorflow. Both are included by default in the deep-learning requirements.