Experiment Tracking#

Source file
  • twiga/tracking/tracker.py - TwigaTracker

Overview#

TwigaTracker is a context manager that wraps the MLflow run lifecycle and exposes helpers tailored to Twiga’s data structures: Pydantic configs, metrics DataFrames, and checkpoint directories. It is installed as part of the [mlops] extra.

pip install "twiga[mlops]"

Quick start#

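from twiga.tracking import TwigaTracker

# Assumes `forecaster` is a configured TwigaForecaster and that
# train_df, val_df, and test_df are already-loaded DataFrames.
with TwigaTracker(experiment="load-forecast", run_name="lgbm-v1") as tracker:
    forecaster.fit(train_df, val_df)
    _, metrics_df = forecaster.evaluate(test_df)  # second value is the metrics DataFrame
    tracker.log_metrics(metrics_df)               # metrics keyed as {model}/{metric}
    tracker.log_forecaster(forecaster)            # pipeline params and checkpoint artefacts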

The block above:

  1. Creates the MLflow experiment "load-forecast", or reuses it if it already exists.

  2. Starts a new run named "lgbm-v1".

  3. Logs evaluation metrics keyed as {model}/{metric}.

  4. Logs data pipeline params and checkpoint artefacts.

  5. Ends the run with status FINISHED on exit (or FAILED on exception).
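The exit behaviour in step 5 follows the usual context-manager pattern. The sketch below is a stand-alone illustration of that pattern only: RunLifecycleSketch is a hypothetical class, not TwigaTracker's implementation, and it records a status string instead of calling MLflow.

```python
from __future__ import annotations

from types import TracebackType


class RunLifecycleSketch:
    """Hypothetical stand-in for a tracker: records how the run ended."""

    def __init__(self, run_name: str) -> None:
        self.run_name = run_name
        self.status = "NOT_STARTED"

    def __enter__(self) -> RunLifecycleSketch:
        # A real tracker would start the MLflow run here.
        self.status = "RUNNING"
        return self

    def __exit__(
        self,
        exc_type: type[BaseException] | None,
        exc: BaseException | None,
        tb: TracebackType | None,
    ) -> bool:
        # FAILED if the with-block raised, FINISHED otherwise.
        self.status = "FAILED" if exc_type is not None else "FINISHED"
        # Returning False lets the original exception propagate.
        return False


# Clean exit.
with RunLifecycleSketch("lgbm-v1") as ok_run:
    pass

# Exit through an exception.
failed_run = RunLifecycleSketch("lgbm-v2")
try:
    with failed_run:
        raise ValueError("training diverged")
except ValueError:
    pass

print(ok_run.status, failed_run.status)
```

Because __exit__ returns False, the exception still reaches the caller, so a failed training script stops as usual while the run status reflects the failure.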


API reference#

TwigaTracker#

class twiga.tracking.TwigaTracker(experiment='twiga', run_name=None, tracking_uri=None, tags=None, system_metrics=False)#

Bases: object

Context manager for tracking Twiga experiments in MLflow.

Provides high-level utilities for logging metadata, models, and evaluation results while maintaining strict MLOps standards for lineage and reproducibility.

__init__(experiment='twiga', run_name=None, tracking_uri=None, tags=None, system_metrics=False)#

Initializes the tracker and sets the tracking URI.

Parameters:
  • experiment (str) – Name of the MLflow experiment.

  • run_name (str | None) – Optional name for this specific run.

  • tracking_uri (str | None) – Optional URI for the MLflow tracking server.

  • tags (dict[str, str] | None) – Initial tags to attach to the run.

  • system_metrics (bool) – Enable MLflow system-metrics monitoring (CPU/RAM/GPU). Disabled by default to avoid noisy log output for short runs.

log_dataset(df, name, source='unknown', context='training')#

Logs dataset lineage for reproducibility.

Parameters:
  • df (DataFrame) – The pandas DataFrame to track.

  • name (str) – Human-readable name of the dataset.

  • source (str) – Origin of data (e.g., S3 URI, SQL Query).

  • context (str) – Context of the data usage (e.g., ‘training’, ‘test’).

Return type:

None

log_evaluation(metrics_df, results_df)#

Logs summary metrics and interactive evaluation tables.

Parameters:
  • metrics_df (DataFrame) – DataFrame containing computed metrics per split/sample.

  • results_df (DataFrame) – DataFrame containing actual values and predictions.

Return type:

None

log_forecaster_metadata(forecaster)#

Harvests all available configurations and hyperparameters.

Consolidates Pydantic configs and top-level primitive attributes into MLflow parameters.

Return type:

None

log_model(forecaster, sample_input)#

Logs the entire forecaster as a PyFunc model with environment dependencies.

Parameters:
  • forecaster (TwigaForecaster) – The fitted TwigaForecaster.

  • sample_input (DataFrame) – Sample data used to infer the model schema/signature.

Return type:

None

property run_id: str#

The active MLflow run ID, or 'unknown' if no run is active.


Logging helpers#

Parameters from Pydantic configs#

log_config uses model_dump() and flattens nested dicts with dot-separated keys so that all Pydantic config fields appear as searchable MLflow parameters:

tracker.log_config(forecaster.data_pipeline, prefix="data")
# → logs  data.target_feature, data.forecast_horizon, data.lookback_window_size, …
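The flattening step can be sketched in a few lines of plain Python. This is an illustration of dot-separated key flattening under stated assumptions, not the library's code: flatten_params is a hypothetical helper, and the dictionary stands in for the output of model_dump() with made-up field values (the scaler entry is invented for the example).

```python
from __future__ import annotations

from typing import Any


def flatten_params(nested: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts into a single level with dot-separated keys."""
    flat: dict[str, Any] = {}
    for key, value in nested.items():
        full_key = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse, carrying the accumulated key path as the new prefix.
            flat.update(flatten_params(value, full_key))
        else:
            flat[full_key] = value
    return flat


# Stand-in for `config.model_dump()`; field values are illustrative only.
config_dump = {
    "target_feature": "load",
    "forecast_horizon": 24,
    "scaler": {"kind": "standard", "with_mean": True},
}

flat_params = flatten_params(config_dump, prefix="data")
for name, value in flat_params.items():
    print(name, "=", value)
```

Flat keys matter because MLflow parameters are plain key-value pairs, so nested config sections have to be encoded in the key itself to stay searchable.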

Metrics from a DataFrame#

log_metrics reads the standard Twiga evaluation DataFrame (columns include mae, rmse, corr, Model). Each model gets its own metric namespace:

tracker.log_metrics(metrics_df)
# → logs  lgbm/mae, lgbm/rmse, catboost/mae, catboost/rmse, …
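To make the key layout concrete, here is a minimal sketch of how rows could be turned into {model}/{metric} keys. It assumes one row per model and uses a list of dicts in place of the DataFrame (the shape returned by pandas' metrics_df.to_dict("records")). metrics_to_keys is a hypothetical helper and the numbers are invented for illustration.

```python
from __future__ import annotations

from numbers import Real


def metrics_to_keys(records: list[dict], model_column: str = "Model") -> dict[str, float]:
    """Map each numeric column of each row to a '{model}/{metric}' key."""
    logged: dict[str, float] = {}
    for row in records:
        model = str(row[model_column])
        for column, value in row.items():
            if column == model_column:
                continue
            # Skip non-numeric columns; bool is excluded even though it is an int.
            if isinstance(value, Real) and not isinstance(value, bool):
                logged[f"{model}/{column}"] = float(value)
    return logged


# Illustrative values only, not real evaluation results.
records = [
    {"Model": "lgbm", "mae": 1.2, "rmse": 1.9, "corr": 0.95},
    {"Model": "catboost", "mae": 1.1, "rmse": 1.8, "corr": 0.96},
]

metric_keys = metrics_to_keys(records)
print(sorted(metric_keys))
```

If the DataFrame held several rows per model, later rows would overwrite earlier ones in this sketch, so any aggregation would need to happen before this step.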

Full forecaster log#

log_forecaster is a convenience wrapper that logs:

What                       MLflow path
data_pipeline fields       params
project_name, model_type   params
Checkpoint directory       artefacts/checkpoints/

tracker.log_forecaster(forecaster)

ExperimentEngine tracking#

ExperimentEngine logs to MLflow automatically whenever a tracking URI is configured, either through the environment variable or the --tracking-uri flag. No code changes are needed; set the URI before running any experiment script.

Starting the MLflow UI#

After running one or more experiments, start the UI in a separate terminal:

uv run mlflow ui \
    --backend-store-uri sqlite:///mlruns.db \
    --port 5000

Then open http://localhost:5000 in your browser. The experiment named after spec.name (e.g. MLPGAF Regularization Ablation) appears in the left sidebar under Experiments. Click it and go to the Runs tab to see all runs.

To keep the UI running in the background:

uv run mlflow ui --backend-store-uri sqlite:///mlruns.db --port 5000 &

Run hierarchy#

Each engine.run() call writes a three-level hierarchy:

MLflow Experiment: <spec.name>
  └─ Parent Run: <YYYYMMDD_HHMMSS_<git-hash>>
       ├─ hpo/<dataset>/<model>         ← backbone HPO (one per model/dataset)
       └─ <dataset>/<group>/<condition> ← one per ablation condition
            └─ fold_1, fold_2, …        ← per-fold NN training runs
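The naming scheme in the diagram can be reproduced with simple string formatting. The helpers below are hypothetical and only mirror the patterns shown above; the seven-character git hash and the example dataset, group, and condition names are assumptions made for illustration.

```python
from __future__ import annotations

from datetime import datetime


def parent_run_name(started_at: datetime, git_hash: str) -> str:
    """Build 'YYYYMMDD_HHMMSS_<git-hash>' from a start time and a commit hash."""
    return f"{started_at:%Y%m%d_%H%M%S}_{git_hash}"


def hpo_run_name(dataset: str, model: str) -> str:
    """Name of a backbone HPO child run."""
    return f"hpo/{dataset}/{model}"


def condition_run_name(dataset: str, group: str, condition: str) -> str:
    """Name of an ablation-condition child run."""
    return f"{dataset}/{group}/{condition}"


def fold_run_name(index: int) -> str:
    """Name of a per-fold training run (1-based, as in the diagram)."""
    return f"fold_{index}"


# Example values are made up; only the patterns come from the diagram.
parent = parent_run_name(datetime(2025, 1, 31, 9, 5, 0), "a1b2c3d")
children = [
    hpo_run_name("load", "lgbm"),
    condition_run_name("load", "regularization", "dropout_0.1"),
]
folds = [fold_run_name(i) for i in range(1, 4)]
print(parent, children, folds)
```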

The parent run logs the CV split settings and a results/ artifact containing the cross-condition summary CSV. Each condition child run logs the mean ± std of its metrics across folds, along with the effective model config params.
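The per-condition summary is a plain aggregation over fold results. The following sketch shows one way to compute it with the standard library; summarise_folds, the _mean and _std key suffixes, and the choice of the sample standard deviation are all assumptions, and the fold values are invented.

```python
from __future__ import annotations

from statistics import mean, stdev


def summarise_folds(fold_metrics: list[dict[str, float]]) -> dict[str, float]:
    """Aggregate per-fold metrics into '<metric>_mean' and '<metric>_std' entries."""
    summary: dict[str, float] = {}
    for name in fold_metrics[0]:
        values = [fold[name] for fold in fold_metrics]
        summary[f"{name}_mean"] = mean(values)
        # Sample standard deviation needs at least two folds.
        summary[f"{name}_std"] = stdev(values) if len(values) > 1 else 0.0
    return summary


# Illustrative fold results, not real measurements.
folds = [
    {"mae": 1.0, "rmse": 2.0},
    {"mae": 2.0, "rmse": 4.0},
    {"mae": 3.0, "rmse": 6.0},
]

summary = summarise_folds(folds)
print(summary)
```

Whether the engine uses the sample or the population standard deviation is not stated in this page, so check the implementation before comparing numbers across tools.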

Disabling tracking#

Leave the tracking-URI environment variable unset and do not pass --tracking-uri. The engine then detects that no URI is configured and runs without making any MLflow calls.
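The detection logic amounts to checking two sources and falling back to "no tracking". The sketch below is an assumption-laden illustration: resolve_tracking_uri is a hypothetical helper, MLFLOW_TRACKING_URI is MLflow's standard environment variable (this page does not confirm that the engine reads it), and giving the flag precedence over the environment is also an assumption.

```python
from __future__ import annotations

import os
from collections.abc import Mapping


def resolve_tracking_uri(
    cli_value: str | None,
    env: Mapping[str, str] | None = None,
) -> str | None:
    """Return the URI to track against, or None when tracking is disabled."""
    env = os.environ if env is None else env
    # Assumed precedence: explicit flag first, then the environment variable.
    # Empty strings are treated the same as "not set".
    return cli_value or env.get("MLFLOW_TRACKING_URI") or None


# Neither source set: tracking is off.
disabled = resolve_tracking_uri(None, env={})

# Environment variable only.
from_env = resolve_tracking_uri(None, env={"MLFLOW_TRACKING_URI": "sqlite:///mlruns.db"})

# Flag wins over the environment in this sketch.
from_flag = resolve_tracking_uri(
    "http://mlflow.internal:5000",
    env={"MLFLOW_TRACKING_URI": "sqlite:///mlruns.db"},
)

print(disabled, from_env, from_flag)
```

A caller can then guard every logging call with a single `if uri is not None` check, which matches the "runs without any MLflow calls" behaviour described above.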


Using a remote tracking server#

Pass tracking_uri to point at a remote MLflow instance:

with TwigaTracker(
    experiment="solar-forecast",
    tracking_uri="http://mlflow.internal:5000",
) as tracker:
    ...

Without tracking_uri, MLflow defaults to a local mlruns/ directory in the working directory.


Prefect integration#

Inside a Prefect flow, the log_to_mlflow task wraps TwigaTracker so you do not need to call it directly:

from twiga.pipeline import training_flow

training_flow(
    forecaster=forecaster,
    data_path="data/load.parquet",
    experiment="load-forecast",
    run_name="lgbm-v1",
)

See Pipeline for the complete flow reference.