Skip to content

Concepts

Before using Simvue it is important to understand some of the core concepts.

Run

Simvue provides the functionality to monitor the execution of an application in real time, such as a simulation, processing task or AI/ML training algorithm. To do this, a user must create a run - this is where information about the application is stored during execution. It is typically instantiated before the execution of the application itself begins, and is closed once the execution is completed.

Each run has associated information such as a name, description, metadata (key-value pairs) and tags (list of short labels). Metadata values are integers, floating point numbers or strings, and would typically be input or output parameters associated with a run. Runs can be filtered using both metadata and tags.

Folder

Simvue has an arbitrary hierarchical structure of folders allowing users to group runs in different ways. Each folder can have a variety of information associated with it, such as metadata and tags.

For example, a user could make use of folders to separate different projects they are working on. In this scenario the hierarchical nature of folders means that different activities within each project can be associated with different folders within the main project folder.

graph LR
  A[root] --> B[Project A];
  A --> C[Project B];
  B --> D[Simulations];
  B --> E[Surrogate models];

Metrics

Metrics are floating point measurements which are collected repeatedly over time during the execution of a run. They can be visualised in the Simvue web UI in real time or used to notify users in the event of problems (see alerts below).

A plot of the loss and accuracy metrics which are being tracked during a certain run. This plot is a line graph of the value of these metrics along the y axis (between 0 and 1), with step along the x axis (the iteration of the program at which the metric was evaluated). This type of plot can be seen in the UI for any given run while it is executing, and the plot is updated live with the latest values to allow the user to keep track of the progress of their simulation.

Artifacts

An artifact is any ASCII or binary object used as input to a run or generated by a run. Most simulations are likely to have both input and output artifacts, but artifacts can also be saved at any point during the execution of a run.

Artifacts can be files or Python objects such as NumPy arrays, PyTorch tensors or Matplotlib plots.

There are three catagories of artifacts:

  • input: input data or configuration,
  • code: source code,
  • output: output or intermediate files or objects, i.e. anything generated by the application.

Simvue keeps track of the connections between simulations via artifacts, i.e. it maintains a digital thread. A simple example is the situation where one simulation uses as input the output generated by another simulation. A more complex example is shown below, where the outputs from multiple simulations are merged and converted into a single file which is then used to create a surrogate model using machine learning.

artifacts workflow

For any artifact it is possible to see its whole lineage, including all artifacts and runs which were used to generate it.

Events

An event is a timestamped text record. Examples include exceptions, errors and other useful log messages.

Alerts

The metrics being collected during the lifetime of a run can have important implications on the rest of the simulation. For example: if a given metric is not converging after a certain period of time, is it even worth continuning running the simulation? Alerts automatically calculate averages of a specified metric over a given window and frequency, and checks if this value falls outside of the specified criteria. Users can choose to be notified by email in the event of an alert being triggered. This allows them to take action based on an alert, for example by stopping a run to change some input parameters before running again. This allows the user to save time and computational cost, by stopping runs early instead of allowing them to complete and return an unusable result.