Hyperparameter Optimization#

Source Files
  • twiga/core/config/base.py

  • twiga/core/config/data.py

  • twiga/forecaster/base.py

Twiga integrates with Optuna for hyperparameter optimization. Every model config includes a search_space field that defines the tunable parameter ranges, and the TwigaForecaster.tune() method orchestrates the optimization process.

Architecture#

graph TD
    A[TwigaForecaster.tune] --> B[For each model]
    B --> C[create_optuna_study]
    C --> D[TPESampler + HyperbandPruner]
    D --> E[study.optimize]
    E --> F[_objective_fn]
    F --> G[model.update trial]
    G --> H[BaseSearchSpace.get_optuna_params]
    H --> I[suggest_int / suggest_float / suggest_categorical]
    F --> P[_update_pipeline_for_trial]
    P --> Q[DataPipelineConfig.search_space.get_optuna_params]
    Q --> R[Rebuild DataPipeline with sampled scalers]
    F --> J[_fit + _evaluate]
    J --> K[Return MAE cost]
    E --> L[study.best_trial.params]
    L --> M[Update model config]
    L --> N[Apply best pipeline params to data_config]
    

Search Spaces#

BaseSearchSpace#

The BaseSearchSpace class (twiga/core/config/base.py) is a Pydantic model that defines and validates hyperparameter ranges:

from twiga.core.config import BaseSearchSpace

space = BaseSearchSpace(
    learning_rate=(1e-3, 1e-1),           # float range → suggest_float
    max_depth=(1, 10),                     # int range → suggest_int
    n_estimators=(50, 500),                # int range → suggest_int
    boosting_type=["gbdt", "dart"],        # list → suggest_categorical
)

Type inference rules:

Input Format       Optuna Method        Log Scale
-----------------  -------------------  ---------------------------
(int, int)         suggest_int          If ratio >= 10 and both > 0
(float, float)     suggest_float        If ratio >= 10 and both > 0
[val1, val2, ...]  suggest_categorical  N/A

Log-scale detection: Applied automatically when high / low >= 10 and both values are positive. This is controlled by the _should_use_log() static method.
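The inference and log-scale rules above can be sketched in plain Python. This is a minimal illustration, not Twiga's implementation: infer_suggestion and its return shape are hypothetical, and treating a mixed (int, float) tuple as a float range is an assumption.

```python
def should_use_log(low, high):
    """Log scale when both bounds are positive and high / low >= 10."""
    return low > 0 and high > 0 and high / low >= 10


def infer_suggestion(spec):
    """Map a search-space entry to (optuna_method_name, use_log).

    Hypothetical helper for illustration; not part of Twiga's API.
    """
    if isinstance(spec, list):
        # Lists are categorical choices; log scale does not apply.
        return "suggest_categorical", False
    low, high = spec
    if isinstance(low, int) and isinstance(high, int):
        return "suggest_int", should_use_log(low, high)
    # Assumption: any tuple containing a float is treated as a float range.
    return "suggest_float", should_use_log(low, high)


print(infer_suggestion((1e-3, 1e-1)))       # float range spanning two decades
print(infer_suggestion((1, 5)))             # narrow int range
print(infer_suggestion(["gbdt", "dart"]))   # categorical list
```

Checking low > 0 before dividing keeps a range such as (0, 10) from raising ZeroDivisionError; it simply falls back to a linear scale.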

Validation rules:

  • Tuples must have exactly 2 numeric values with low < high

  • Lists must have at least 1 element with no duplicates
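As a rough sketch of what these validation rules imply, the checks could be written as below. The function name validate_spec, its error messages, and the decision to reject bool range bounds are assumptions made for illustration; the real validator lives in twiga/core/config/base.py and may differ.

```python
from numbers import Real


def validate_spec(name, spec):
    """Raise ValueError if a search-space entry breaks the rules above.

    Hypothetical helper for illustration; not Twiga's validator.
    """
    if isinstance(spec, tuple):
        if len(spec) != 2:
            raise ValueError(f"{name}: range must have exactly 2 values")
        low, high = spec
        for bound in (low, high):
            # bool is a subclass of int, so exclude it explicitly (assumption).
            if isinstance(bound, bool) or not isinstance(bound, Real):
                raise ValueError(f"{name}: range bounds must be numeric")
        if not low < high:
            raise ValueError(f"{name}: expected low < high, got {spec}")
    elif isinstance(spec, list):
        if len(spec) < 1:
            raise ValueError(f"{name}: list must have at least 1 element")
        # Pairwise comparison also works for unhashable choices.
        if any(a == b for i, a in enumerate(spec) for b in spec[i + 1:]):
            raise ValueError(f"{name}: list must not contain duplicates")
    else:
        raise ValueError(f"{name}: expected a (low, high) tuple or a list")


validate_spec("max_depth", (1, 10))            # passes silently
validate_spec("linear_tree", [True, False])    # passes silently
```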

Per-Model Search Spaces#

Each model config defines default search spaces:

CatBoost#

BaseSearchSpace(
    learning_rate=(1e-3, 1e-1),      # log scale
    depth=(1, 12),
    iterations=(20, 1000),            # log scale
    min_data_in_leaf=(1, 100),        # log scale
)

XGBoost#

BaseSearchSpace(
    learning_rate=(1e-3, 1e-1),      # log scale
    subsample=(0.05, 1.0),
    gamma=(0, 10),
    colsample_bytree=(0.05, 1.0),
    min_child_weight=(1, 20),         # log scale
    n_estimators=(10, 500),           # log scale
    max_depth=(1, 10),
)

LightGBM#

BaseSearchSpace(
    learning_rate=(1e-3, 1e-1),      # log scale
    num_leaves=(2, 1024),             # log scale
    subsample=(0.05, 1.0),
    colsample_bytree=(0.05, 1.0),
    min_data_in_leaf=(1, 100),        # log scale
    n_estimators=(10, 200),           # log scale
    max_depth=(1, 10),
    linear_tree=[True, False],
    iterations=(20, 1000),            # log scale
)

MLPF / MLPGAM / Neural Models#

BaseSearchSpace(
    embedding_size=[8, 16, 32, 64],
    hidden_size=[16, 32, 64, 128, 256, 512],
    num_layers=(1, 5),
    dropout=(0.1, 0.9),
    alpha=(0.01, 0.9),
    combination_type=["attn-comb", "weighted-comb", "addition-comb"],
    activation_function=["ReLU", "GELU", "SiLU"],
)

Pipeline Search Space#

DataPipelineConfig also accepts a search_space field, letting Optuna co-optimise data preprocessing (scalers) alongside model hyperparameters in a single study. The sampled scaler values are prefixed with "pipeline_" in the Optuna trial to avoid clashing with model parameter names.

from twiga.core.config import BaseSearchSpace, DataPipelineConfig

data_config = DataPipelineConfig(
    target_feature="load_mw",
    period="1h",
    lookback_window_size=168,
    forecast_horizon=48,
    search_space=BaseSearchSpace(
        input_scaler=["standard", "robust", "minmax"],
        target_scaler=["standard", "robust"],
    ),
)

How it works:

  1. At the start of each Optuna trial, _update_pipeline_for_trial samples the pipeline search space with prefix="pipeline", producing keys like "pipeline_input_scaler".

  2. A new DataPipeline is constructed from the sampled config for that trial.

  3. The original data_config is restored after every trial via a try/finally guard, so trial mutations never leak between trials.

  4. After the study completes, the best pipeline parameters are stripped of the "pipeline_" prefix and applied permanently to data_config. Only model params (no pipeline_ prefix) are returned from tune().
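The prefix handling and the try/finally guard described in these steps might look roughly like the sketch below. It uses a plain dict as a stand-in for DataPipelineConfig, and split_best_params and run_trial are hypothetical names, not Twiga functions.

```python
import copy

PIPELINE_PREFIX = "pipeline_"


def split_best_params(best_params):
    """Separate a flat best-params dict into model and pipeline parts."""
    model_params, pipeline_params = {}, {}
    for key, value in best_params.items():
        if key.startswith(PIPELINE_PREFIX):
            # Strip the prefix so the key matches the config field name.
            pipeline_params[key[len(PIPELINE_PREFIX):]] = value
        else:
            model_params[key] = value
    return model_params, pipeline_params


def run_trial(data_config, sampled, train_fn):
    """Apply sampled values for one trial, then always restore the original."""
    original = copy.deepcopy(data_config)
    try:
        data_config.update(sampled)
        return train_fn(data_config)
    finally:
        # Runs even if train_fn raises, so nothing leaks into the next trial.
        data_config.clear()
        data_config.update(original)


best = {
    "learning_rate": 0.05,
    "max_depth": 6,
    "pipeline_input_scaler": "robust",
    "pipeline_target_scaler": "standard",
}
model_params, pipeline_params = split_best_params(best)
print(model_params)      # model hyperparameters only
print(pipeline_params)   # scaler choices with the prefix removed
```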

Validation — unknown field names in search_space raise a ValidationError at construction time, catching typos before tuning starts:

# Raises ValidationError: "unknown fields: {'typo_scaler'}"
DataPipelineConfig(
    ...,
    search_space=BaseSearchSpace(typo_scaler=["standard"]),
)

Supported pipeline search space fields

Only input_scaler and target_scaler are currently tunable via the pipeline search space. Both accept lists of ScalerType string identifiers.

The Tuning Process#

TwigaForecaster.tune()#

forecaster.tune(
    train_df=train_df,
    val_df=val_df,
    num_trials=20,              # number of Optuna trials
    reduction_factor=3,        # Hyperband reduction factor
    patience=5,                # early stopping patience
    load_if_exists=True,       # resume existing study
    direction="minimize",      # optimize direction
)

Parameter         Type           Default     Description
----------------  -------------  ----------  ---------------------------------
train_df          pd.DataFrame   Required    Training data
val_df            pd.DataFrame   Required    Validation data
num_trials        int            10          Number of Optuna trials
reduction_factor  int            3           Hyperband pruner reduction factor
patience          int            10          Patient pruner patience
load_if_exists    bool           True        Load existing study from disk
initial_params    dict | None    None        Initial parameters to try first
direction         str            "minimize"  "minimize" or "maximize"
sampler           object | None  None        Custom Optuna sampler
base_pruner       object | None  None        Custom Optuna pruner

Study Configuration#

create_optuna_study() in BaseForecaster configures:

  • Sampler: TPESampler (Tree-structured Parzen Estimator) with:

    • seed=self.seed for reproducibility

    • multivariate=True for correlated parameters

    • n_startup_trials=patience * 2 random trials before TPE

    • constant_liar=True for parallel optimization

    • group=True for grouped parameters

  • Pruner: HyperbandPruner with:

    • min_resource=patience

    • max_resource="auto"

    • reduction_factor=reduction_factor

  • Storage: JournalFile-based storage at {logs_path}/{project_name}_{model_type}.log

Objective Function#

For each trial, _objective_fn:

  1. Calls model.update(trial), which uses BaseSearchSpace.get_optuna_params() to suggest model values

  2. Calls _update_pipeline_for_trial(trial); if data_config.search_space is set, this samples scaler choices and rebuilds the DataPipeline for the trial

  3. Calls _fit(train_df, val_df, trial) to train the model

  4. Calls _evaluate(val_df) to compute validation metrics

  5. Sets the user attributes rmse and std_dev for dashboard visualization

  6. Returns mean(MAE) as the cost to minimize
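The scoring part of the objective can be illustrated with a dependency-free sketch. RecordingTrial is a stand-in that mimics only the set_user_attr method of an Optuna trial, score_trial is a hypothetical name, and reading std_dev as the spread of the absolute errors is an assumption; Twiga's own definition may differ.

```python
from math import sqrt
from statistics import mean, pstdev


class RecordingTrial:
    """Stand-in exposing only set_user_attr, as optuna.trial.Trial does."""

    def __init__(self):
        self.user_attrs = {}

    def set_user_attr(self, key, value):
        self.user_attrs[key] = value


def score_trial(trial, y_true, y_pred):
    """Record auxiliary metrics on the trial and return MAE as the cost."""
    errors = [p - t for p, t in zip(y_pred, y_true)]
    abs_errors = [abs(e) for e in errors]
    trial.set_user_attr("rmse", sqrt(mean([e * e for e in errors])))
    # Assumption: std_dev is the population std of the absolute errors.
    trial.set_user_attr("std_dev", pstdev(abs_errors))
    return mean(abs_errors)


trial = RecordingTrial()
cost = score_trial(trial, y_true=[100, 110, 120], y_pred=[98, 113, 120])
print(cost, trial.user_attrs)
```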

After Tuning#

Once the study finishes:

  • The best parameters are saved to {results_path}/best_params.npy

  • They are applied to the model config via model_copy(update=best_params)

  • The model is re-instantiated with the updated config

Conformal-Aware Tuning#

By default, tune() minimises a point-forecast metric (MAE). When tuning a probabilistic model, it often makes more sense to optimise directly for interval quality. Pass calib_df together with conformal_params to activate conformal-aware tuning: each trial extends the normal fit → evaluate loop with a calibrate(calib_df) step, so the trial score is an interval metric.

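from twiga.core.config import ConformalConfig
from twiga.forecaster.core import TwigaForecaster
from twiga.models.nn.mlpgam_model import MLPGAMConfig

# data_config, train_config, and the train/val/calib DataFrames are assumed to be defined already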

conf_config = ConformalConfig(method="crc", score_type="residual", alpha=0.1)

forecaster = TwigaForecaster(
    data_params=data_config,
    model_params=[MLPGAMConfig()],
    cv_params=train_config,
    conformal_params=conf_config,
)

best_params = forecaster.tune(
    train_df=train_df,
    val_df=val_df,
    calib_df=calib_df,               # held-out calibration window
    conformal_params=conf_config,    # if None, uses forecaster.conformal_params
    objective_metric="winkler",      # optimise Winkler score; or "picp", "pinaw", …
    num_trials=50,
)

conformal_params overrides self.conformal_params for the duration of the study only — the original value is restored when tune returns. objective_metric can be any column produced by evaluate_interval_forecast (e.g. "winkler", "picp", "pinaw", "ace").
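To make the interval metrics concrete, here is a small sketch of the standard Winkler score and PICP (prediction interval coverage probability) definitions. These are textbook formulas written for illustration; the exact computation inside evaluate_interval_forecast is not shown in this page and may differ in detail.

```python
def winkler_score(y_true, lower, upper, alpha):
    """Mean Winkler score for (1 - alpha) prediction intervals; lower is better."""
    total = 0.0
    for y, lo, hi in zip(y_true, lower, upper):
        score = hi - lo  # interval width is always paid
        if y < lo:
            score += (2.0 / alpha) * (lo - y)  # penalty for missing below
        elif y > hi:
            score += (2.0 / alpha) * (y - hi)  # penalty for missing above
        total += score
    return total / len(y_true)


def picp(y_true, lower, upper):
    """Fraction of observations that fall inside their interval."""
    covered = sum(lo <= y <= hi for y, lo, hi in zip(y_true, lower, upper))
    return covered / len(y_true)


y_true = [10.0, 12.0, 15.0]
lower = [9.0, 13.0, 11.0]
upper = [11.0, 14.0, 14.0]
print(winkler_score(y_true, lower, upper, alpha=0.1))
print(picp(y_true, lower, upper))
```

Because the Winkler score charges for both width and misses, minimising it discourages intervals that are trivially wide as well as ones that are too narrow.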

NN Parameter Budget#

Neural-network search spaces can produce architectures ranging from tiny to enormous. The max_model_params argument lets you prune oversized trials before they waste training time: if the instantiated model’s trainable parameter count exceeds the budget the trial is pruned immediately without fitting.

best_params = forecaster.tune(
    train_df=train_df,
    val_df=val_df,
    num_trials=50,
    max_model_params=2_000_000,   # prune any trial > 2 M parameters; default 5 M
)

Set max_model_params=None to disable the check entirely. The parameter has no effect for ML-domain models (CatBoost, LightGBM, XGBoost).
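The idea behind the budget check can be sketched without any deep-learning dependency by counting the weights and biases of a plain fully connected stack. Everything here is illustrative: count_mlp_params, check_budget, and TrialPrunedSketch are hypothetical names, and Twiga's models are not simple dense stacks, so real counts will differ.

```python
class TrialPrunedSketch(Exception):
    """Stand-in for optuna.TrialPruned so the example has no dependencies."""


def count_mlp_params(layer_sizes):
    """Weights plus biases of a fully connected stack, e.g. [in, hidden, out]."""
    return sum((n_in + 1) * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


def check_budget(layer_sizes, max_model_params):
    """Return the parameter count, or prune before any training happens."""
    n_params = count_mlp_params(layer_sizes)
    if max_model_params is not None and n_params > max_model_params:
        raise TrialPrunedSketch(f"{n_params} parameters exceed budget {max_model_params}")
    return n_params


print(check_budget([168, 512, 512, 48], max_model_params=2_000_000))
try:
    check_budget([168, 4096, 4096, 48], max_model_params=2_000_000)
except TrialPrunedSketch as exc:
    print("pruned:", exc)
```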

Full Example#

from twiga.core.config import BaseSearchSpace, DataPipelineConfig, ExperimentConfig
from twiga.forecaster.core import TwigaForecaster
from twiga.models.ml.xgboost_model import XGBOOSTConfig
from twiga.models.nn.mlpf_model import MLPFConfig

# Data pipeline config with pipeline search space — scalers are tuned alongside model params
data_config = DataPipelineConfig(
    target_feature="load_mw",
    period="1h",
    lookback_window_size=168,
    forecast_horizon=48,
    search_space=BaseSearchSpace(
        input_scaler=["standard", "robust", "minmax"],
        target_scaler=["standard", "robust"],
    ),
)

train_config = ExperimentConfig(
    split_freq="days",
    train_size=14,
    test_size=7,
)

# Model search space (tuned in the same study as the pipeline)
xgb_config = XGBOOSTConfig(
    search_space=BaseSearchSpace(
        learning_rate=(0.01, 0.3),
        max_depth=(3, 8),
        n_estimators=(100, 1000),
    )
)

mlpf_config = MLPFConfig.from_data_config(data_config)

forecaster = TwigaForecaster(
    data_params=data_config,
    model_params=[xgb_config, mlpf_config],
    cv_params=train_config,
)

# tune() co-optimises model hyperparameters + pipeline scalers in a single study
forecaster.tune(
    train_df=train_df,
    val_df=val_df,
    num_trials=30,
    patience=5,
)

# After tune(), data_config is updated with the best scaler combination
# Fit with tuned parameters
forecaster.fit(train_df=train_df, val_df=val_df)

# Evaluate
predictions_df, metrics_df = forecaster.evaluate_point_forecast(test_df=test_df)

Tip

Use load_if_exists=True (default) to resume tuning from a previous run. The study state is persisted to disk automatically.

API Reference#

class twiga.core.config.BaseSearchSpace(**data)

Bases: BaseModel

Pydantic model for validating hyperparameter optimisation search spaces.

Each field must be either:

  • A tuple[float, float] or tuple[int, int] representing a continuous range (low, high). Ranges spanning at least one order of magnitude (high / low >= 10, with both bounds positive) are sampled on a log scale automatically.

  • A list of at least one categorical value.

The class uses extra="allow" so that concrete search spaces can be defined inline without subclassing:

space = BaseSearchSpace(
    latent_size=[64, 128, 256],
    dropout=(0.0, 0.5),
)

Parameters:

**kwargs – Any keyword argument whose value is a valid range tuple or categorical list.

Examples

>>> space = BaseSearchSpace(lr=(1e-4, 1e-2), activation=["relu", "tanh"])
>>> params = space.get_optuna_params(trial, prefix="mlp")

get_optuna_params(trial, prefix='')

Generate Optuna parameter suggestions for all fields.

Parameters:
  • trial (Trial) – Active Optuna trial.

  • prefix (str) – Prefix prepended to each parameter name in the trial (e.g. the model name) to avoid collisions when multiple search spaces are sampled in the same trial. Defaults to "".

Return type:

dict[str, Any]

Returns:

Mapping of field names (without prefix) to their sampled values.

model_config: ClassVar[ConfigDict] = {'extra': 'allow'}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

validate_against(config)

Raise ValueError if any search space field name is not present on config.

Catches typos in search space definitions early - before an Optuna trial is run - so that mis-spelled field names produce a clear error instead of silently sampling a parameter that never gets applied.

Parameters:

config (BaseModel) – The model config instance (or class) whose fields define the valid parameter names.

Raises:

ValueError – If one or more field names in this search space do not exist on config.

Return type:

None

Examples

>>> space = BaseSearchSpace(hiddn_dim=[64, 128])  # typo!
>>> space.validate_against(my_model_config)
Traceback (most recent call last):
    ...
ValueError: Search space contains unknown fields: {'hiddn_dim'}. ...

validate_search_space()

Validate all fields have valid types and structure.

Return type:

BaseSearchSpace