TensorVue Callback
TensorFlow is an open-source machine learning library developed by Google that lets you design and build custom machine learning algorithms. Training these algorithms can take a long time, and accuracy and loss statistics are reported after each epoch. To make these statistics easy to follow as training progresses, the TensorVue Keras callback uploads information about the training of any TensorFlow Keras model to Simvue.
Further docs
For a detailed example of monitoring the training of a TensorFlow ML algorithm with the TensorVue callback, see the example here.
What is tracked
By default, the TensorVue callback creates two kinds of run:
- A Simulation run, which represents the training of the entire model and contains statistics collected after each training epoch.
- A series of Epoch runs, each containing statistics for one epoch collected after each training batch. You can disable these if desired.
If you also evaluate the model separately using model.evaluate, an Evaluation run is created as well. The TensorVue callback tracks the following:
- Uploads the Python script that creates the model as a Code artifact
- Uploads the model config as an Input artifact
- Uploads parameters about the model as metadata
- Uploads the training accuracy and loss after each batch to an Epoch run
- Uploads the training and validation accuracy and loss after each epoch to the Simulation run
- Uploads model checkpoints after each epoch to the corresponding Epoch run as Output artifacts (if enabled by the user)
- Uploads the final model to the Simulation run as an Output artifact
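The toy class below mimics the split between the Simulation run and the Epoch runs in plain Python. It is a hypothetical sketch, not the TensorVue implementation. `ToyRunSplitter` and its dictionaries stand in for Simvue runs. A short loop plays the role of Keras calling the `on_epoch_begin`, `on_train_batch_end` and `on_epoch_end` hooks during `model.fit`. The metric values are made up.

```python
from collections import defaultdict


class ToyRunSplitter:
    """Hypothetical stand-in for the run layout described above (not Simvue code)."""

    def __init__(self, create_epoch_runs: bool = True):
        self.create_epoch_runs = create_epoch_runs
        # One "Simulation run": metric name -> list of per-epoch values
        self.simulation_metrics = defaultdict(list)
        # One "Epoch run" per epoch: epoch index -> metric name -> per-batch values
        self.epoch_runs = {}
        self._current_epoch = None

    def on_epoch_begin(self, epoch, logs=None):
        self._current_epoch = epoch
        if self.create_epoch_runs:
            self.epoch_runs[epoch] = defaultdict(list)

    def on_train_batch_end(self, batch, logs=None):
        # Per-batch training accuracy/loss go to the current Epoch run, if enabled
        if self.create_epoch_runs:
            for name, value in (logs or {}).items():
                self.epoch_runs[self._current_epoch][name].append(value)

    def on_epoch_end(self, epoch, logs=None):
        # Per-epoch training and validation metrics go to the Simulation run
        for name, value in (logs or {}).items():
            self.simulation_metrics[name].append(value)


# Simulate Keras driving the hooks for 2 epochs of 3 batches each (made-up values)
splitter = ToyRunSplitter()
for epoch in range(2):
    splitter.on_epoch_begin(epoch)
    for batch in range(3):
        splitter.on_train_batch_end(batch, {"accuracy": 0.5 + 0.1 * batch, "loss": 1.0 - 0.1 * batch})
    splitter.on_epoch_end(epoch, {"accuracy": 0.7, "loss": 0.8, "val_accuracy": 0.65, "val_loss": 0.9})

print(len(splitter.epoch_runs))                      # 2
print(len(splitter.simulation_metrics["val_loss"]))  # 2
```

Setting `create_epoch_runs=False` in this toy keeps only the per-epoch record. This mirrors the documented option of disabling Epoch runs.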
Usage
To use the TensorVue class, you must have the simvue_integrations package (from the simvue-io/integrations repository) installed. Create a virtual environment if you haven't already, then install the package with its TensorFlow extra using pip:
pip install git+https://github.com/simvue-io/integrations.git@main#egg=simvue-integrations[tensorflow]
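After installing, you can check that the package is importable from the active environment before using it in a training script. This small sketch uses only the standard library's `importlib.util.find_spec`. The helper name `integration_installed` is hypothetical. The module name `simvue_integrations` comes from the import path used in the example script.

```python
import importlib.util


def integration_installed(module: str = "simvue_integrations") -> bool:
    """Hypothetical helper: True if the top-level module can be found for import."""
    # find_spec returns None when a top-level module is not importable
    return importlib.util.find_spec(module) is not None


if integration_installed():
    print("simvue_integrations is importable")
else:
    print("simvue_integrations not found; check the active virtual environment")
```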
Before training your TensorFlow model, create an instance of the TensorVue class. The class accepts the following arguments:
- run_name: Name of the Simvue run to create
- run_folder: Name of the folder to store the run in; if not specified, a folder with the same name as the run is created
- run_description: Description of the run, optional
- run_tags: List of tags associated with the run, optional
- run_metadata: Metadata associated with the run, optional
- run_mode: Whether Simvue should run in Online or Offline mode, by default Online
- alert_definitions: Definitions of any alerts to add to the run, as a dictionary of key/value pairs, optional
- manifest_alerts: If using the Optimisation framework, which of the alerts defined above to add to the manifest run, by default None
- simulation_alerts: Which of the alerts defined above to add to the simulation run, by default None
- epoch_alerts: Which of the alerts defined above to add to the epoch runs, by default None
- evaluation_alerts: Which of the alerts defined above to add to the evaluation runs, by default None
- start_alerts_from_epoch: If epoch alerts are enabled, the epoch number from which to begin setting alerts, by default 0
- script_filepath: Path of the file to upload as Code to the simulation run, by default the file where the callback was instantiated
- model_checkpoint_filepath: If using the ModelCheckpoint callback, the path where the checkpoint files are saved after each epoch, optional
- model_final_filepath: The location where the final model should be stored after training is complete, by default /tmp/simvue/final_model.keras
- evaluation_parameter: The parameter whose value is checked after each epoch, one of 'accuracy', 'loss', 'val_accuracy' or 'val_loss', optional
- evaluation_target: The target value of the parameter; training stops once it is satisfied, optional
- evaluation_condition: How to compare the latest value of the parameter with the target value, one of '<', '>', '<=', '>=' or '==', optional
- create_epoch_runs: bool, whether to create a separate run for the training data of each epoch, by default True
- optimisation_framework: Whether to use the Simvue ML Optimisation framework, by default False
- simulation_run: If using the ML Opt framework and this callback is called within the simulation function, the 'data' run created by the framework for this trial, by default None
- evaluation_run: If using the ML Opt framework and this callback is called within the evaluation function, the 'eval' run created by the framework for this trial, by default None
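The three `evaluation_*` arguments work together. After each epoch, the latest value of `evaluation_parameter` is compared with `evaluation_target` using `evaluation_condition`, and training stops once the comparison holds. The function below is a hypothetical sketch of that check using Python's `operator` module. `target_reached` is not part of the TensorVue API, and the real callback may implement the comparison differently. In a Keras callback, stopping is typically done by setting `self.model.stop_training = True`.

```python
import operator

# Map each documented evaluation_condition string to a comparison function
_CONDITIONS = {
    "<": operator.lt,
    ">": operator.gt,
    "<=": operator.le,
    ">=": operator.ge,
    "==": operator.eq,
}

# The documented choices for evaluation_parameter
_PARAMETERS = {"accuracy", "loss", "val_accuracy", "val_loss"}


def target_reached(logs: dict, parameter: str, target: float, condition: str) -> bool:
    """Hypothetical helper: True if `logs[parameter] <condition> target` holds."""
    if parameter not in _PARAMETERS:
        raise ValueError(f"Unsupported evaluation_parameter: {parameter!r}")
    if condition not in _CONDITIONS:
        raise ValueError(f"Unsupported evaluation_condition: {condition!r}")
    value = logs.get(parameter)
    if value is None:
        # Metric not reported this epoch (e.g. no validation data), so keep training
        return False
    return _CONDITIONS[condition](value, target)


# Example epoch logs (made-up values): stop once validation accuracy is at least 0.9
epoch_logs = {"accuracy": 0.93, "loss": 0.21, "val_accuracy": 0.91, "val_loss": 0.27}
print(target_reached(epoch_logs, "val_accuracy", 0.9, ">="))  # True
print(target_reached(epoch_logs, "val_loss", 0.2, "<"))       # False
```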
Your Python script may look something like this:
from tensorflow import keras
import simvue_integrations.plugins.tensorflow as sv_tf

# Define your model: flatten each 28x28 image, then output one logit per clothing class
model = keras.Sequential()
model.add(keras.layers.Flatten(input_shape=(28, 28)))
model.add(keras.layers.Dense(128, activation="relu"))
model.add(keras.layers.Dense(10))
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=0.01),
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=['accuracy'],
)

# Load your training data (load_data returns two (images, labels) tuples)
(img_train, label_train), (img_test, label_test) = keras.datasets.fashion_mnist.load_data()

# Initialise your callback - minimum required is the Simvue run name, but can include any other details described above
tensorvue = sv_tf.TensorVue("recognising_clothes")

# Fit the model, using the tensorvue callback
model.fit(
    img_train,
    label_train,
    epochs=10,
    validation_split=0.2,
    callbacks=[tensorvue],
)

# Evaluate the model, again using the tensorvue callback
results = model.evaluate(
    img_test,
    label_test,
    callbacks=[tensorvue],
)
Adding Functionality
If you wish to store more data than the default TensorVue callback provides, you can create your own callback class which inherits from TensorVue. For detailed information on creating your own custom callbacks, see this guide.
For example, suppose you want the callback to upload the final accuracy and loss measurements as metadata to the Simvue run. To do this, inherit from TensorVue and override the on_train_end() method to add the new functionality:
import simvue_integrations.plugins.tensorflow as sv_tf


class MyTensorVue(sv_tf.TensorVue):
    # This method will be called whenever a training session ends
    def on_train_end(self, logs=None):
        logs = logs or {}
        # Accuracy and Loss measurements are stored in `logs`:
        final_measurements = {
            "final_accuracy": logs.get("accuracy"),
            "final_loss": logs.get("loss"),
        }
        # You can then access the Simulation run to upload these values to through `self.simulation_run`
        # Any of the methods available in the standard `simvue.Run` class are available here
        self.simulation_run.update_metadata(final_measurements)
        # Don't forget to then call the base TensorVue method!
        super().on_train_end(logs)
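Calling `super().on_train_end(logs)` after your own code matters because the base class may still need to do its own end-of-training work on its runs. The stub below lets you check the override pattern without TensorFlow or a Simvue server. `FakeRun` and `FakeTensorVue` are hypothetical stand-ins for `simvue.Run` and `TensorVue` that only record what was called. The subclass logic can therefore be exercised in isolation. The stub also confirms that the metadata is uploaded before the base method runs.

```python
class FakeRun:
    """Hypothetical stand-in for simvue.Run: records metadata updates only."""

    def __init__(self):
        self.metadata = {}

    def update_metadata(self, metadata: dict):
        self.metadata.update(metadata)


class FakeTensorVue:
    """Hypothetical stand-in for TensorVue exposing only what the example uses."""

    def __init__(self):
        self.simulation_run = FakeRun()
        self.metadata_at_end = None

    def on_train_end(self, logs=None):
        # Snapshot the metadata that exists when the base method is reached
        self.metadata_at_end = dict(self.simulation_run.metadata)


class MyTensorVue(FakeTensorVue):
    def on_train_end(self, logs=None):
        logs = logs or {}
        self.simulation_run.update_metadata(
            {"final_accuracy": logs.get("accuracy"), "final_loss": logs.get("loss")}
        )
        super().on_train_end(logs)


callback = MyTensorVue()
# Keras typically passes the last epoch's logs here; these values are made up
callback.on_train_end({"accuracy": 0.88, "loss": 0.34})
print(callback.simulation_run.metadata)  # {'final_accuracy': 0.88, 'final_loss': 0.34}
print(callback.metadata_at_end)          # {'final_accuracy': 0.88, 'final_loss': 0.34}
```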