Metadata¶

The get_runs method of the Client class has an optional argument output_format which enables the format of the data retrieved to be specified. There are two options:

'dict': a dictionary (the default),
'dataframe': a Pandas dataframe.

The dataframe format makes it easy to create plots using the plot method in Pandas or using Matplotlib directly. pandas.DataFrame.columns can be used to get a list of the columns, for example:

df = client.get_runs(
    ['/fusion/neutronics/adaptive/run4'],
    metadata=True,
    output_format='dataframe'
)
print(df.columns)

Below we go through some example plots.

Basic scatter plot from metadata¶

Here is a simple example of a scatter plot using metadata from multiple runs. We plot final accuracy vs trial.number for all runs in the specified folder (/optuna/tests/binary-model in this case).

df = client.get_runs(
    ['/optuna/tests/binary-model'],
    metadata=True,
    output_format='dataframe'
)

plot = df.plot(
    kind='scatter',
    x='metadata.trial.number',
    y='metadata.final accuracy'
)

This results in the following plot: A scatter plot which uses metadata from all runs in a given folder. Shows a plot of the trial number on the x axis, versus final accuracy of the simulation on the y axis.

A scatter plot which uses metadata from all runs in a given folder. Shows a plot of the trial number on the x axis, versus final accuracy of the simulation on the y axis.

Scatter plot with coloured markers¶

We can easily extend a scatter plot by using the value of another metadata attribute to colour the markers. For example:

df = client.get_runs(
    ["/fusion/neutronics/adaptive/run4"],
    metadata=True,
    output_format="dataframe"
)

plot = df.plot(
    kind="scatter",
    x="metadata.blanket_breeder_li6_enrichment",
    y="metadata.breeder_percent_in_breeder_plus_multiplier_ratio",
    c="metadata.tbr",
)

gives:

A scatter plot which uses metadata from all runs in a given folder. Shows a plot of the breeding percentage in a breeder blanket (plus multiplier ratio) on the y axis, versus enrichment of a Lithium breeder blanket on the x axis. Each point is a differnt colour, with a colorbar corresponding to the Tritium Breding Ratio (TBR) on the right of the plot.

Bar chart¶

In this example we create a bar chart showing how many runs are associated with each possible value of a specified metadata attribute, in this case optimizer:

df = client.get_runs(
    ["/optuna/tests/binary-model"],
    metadata=True,
    output_format="dataframe"
)

plot = df.groupby("metadata.optimizer")["name"].nunique().plot(kind="bar", rot=0)

This gives: A bar chart showing the total number of runs in a folder which use a given optimizer (either AdamW, RMSprop or SGD), based on the value of the metadata attribute 'optimizer' for each run.

A bar chart showing the total number of runs in a folder which use a given optimizer (either AdamW, RMSprop or SGD), based on the value of the metadata attribute 'optimizer' for each run.

Box plot¶

Box and whisker plots can be easily created. In this example we show a metadata attribute final accuracy grouped by another attribute, n_layers:

df = client.get_runs(
    ['/optuna/tests/binary-model'],
    metadata=True,
    output_format='dataframe'
)

plot = df.boxplot(column=['metadata.final accuracy'], by=['metadata.n_layers'])

This results in the following plot:

A box and whisker plot showing the distribution of final accuracies for all runs in a folder. The runs are grouped by their number of layers (based on the n_layers metadata attribute) along the x axis, with values of either 1, 2, or 3, while the y axis shows the final accuracy of each run. The plots show the minimum and maximum values of each distribution as 'whiskers', with the first quartile, mean and third quartile values shown as 'boxes'. Outliers are marked as points outside of the whiskers.

Parallel coordinates plot¶

While parallel coordinates plots can be made directly from a dataframe (see documentation for how to directly plot parallel coordinates from a Pandas dataframe) this has some limitations, such as common y-axis limits across all variables. An alternative is to use Plotly where it's possible to have much more control - see documentation for creating a parallel coordinates plot using Plotly. Handling categorical values requires some additional work (view a solution for handling categorical values here) as is illustrated in the example:

import plotly.graph_objects as go
import pandas as pd
from simvue import Client

client = Client()
df = client.get_runs(
    ['/optuna/tests/binary-model'],
    metadata=True,
    output_format='dataframe'
)

group_vars = df['metadata.optimizer'].unique()
dfg = pd.DataFrame({'metadata.optimizer': df['metadata.optimizer'].unique()})
dfg['dummy'] = dfg.index
df = pd.merge(df, dfg, on='metadata.optimizer', how='left')

fig = go.Figure(
    data=go.Parcoords(
        line=dict(
            color=df["metadata.final accuracy"],
            colorscale="Electric",
            showscale=True,
            cmin=0,
            cmax=1,
        ),
        dimensions=list(
            [
                dict(label="lr", values=df["metadata.lr"]),
                dict(
                    label="optimizer",
                    range=[0, df["dummy"].max()],
                    tickvals=df["dummy"],
                    ticktext=dfg["metadata.optimizer"],
                    values=df["dummy"],
                ),
                dict(
                    range=[0, 1],
                    label="final accuracy",
                    values=df["metadata.final accuracy"],
                ),
            ]
        ),
    )
)

fig.write_image("output.png")

Which gives:

Further Documentation