Hyperparameter Optimization#
Source Files
twiga/core/config/base.py
twiga/core/config/data.py
twiga/forecaster/base.py
Twiga integrates with Optuna for hyperparameter optimization. Every model config includes a search_space field that defines the tunable parameter ranges, and the TwigaForecaster.tune() method orchestrates the optimization process.
Architecture#
graph TD
A[TwigaForecaster.tune] --> B[For each model]
B --> C[create_optuna_study]
C --> D[TPESampler + HyperbandPruner]
D --> E[study.optimize]
E --> F[_objective_fn]
F --> G[model.update trial]
G --> H[BaseSearchSpace.get_optuna_params]
H --> I[suggest_int / suggest_float / suggest_categorical]
F --> P[_update_pipeline_for_trial]
P --> Q[DataPipelineConfig.search_space.get_optuna_params]
Q --> R[Rebuild DataPipeline with sampled scalers]
F --> J[_fit + _evaluate]
J --> K[Return MAE cost]
E --> L[study.best_trial.params]
L --> M[Update model config]
L --> N[Apply best pipeline params to data_config]
Search Spaces#
BaseSearchSpace#
The BaseSearchSpace class (twiga/core/config/base.py) is a Pydantic model that defines and validates hyperparameter ranges:
from twiga.core.config import BaseSearchSpace
space = BaseSearchSpace(
learning_rate=(1e-3, 1e-1), # float range → suggest_float
max_depth=(1, 10), # int range → suggest_int
n_estimators=(50, 500), # int range → suggest_int
boosting_type=["gbdt", "dart"], # list → suggest_categorical
)
Type inference rules:
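| Input Format | Optuna Method | Log Scale |
|---|---|---|
| Tuple of two floats (low, high) | suggest_float | If ratio >= 10 and both > 0 |
| Tuple of two ints (low, high) | suggest_int | If ratio >= 10 and both > 0 |
| List of values | suggest_categorical | N/A |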
Log-scale detection: Applied automatically when high / low >= 10 and both values are positive. This is controlled by the _should_use_log() static method.
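The inference rules can be illustrated without Twiga or Optuna installed. The following is a minimal sketch, not the library's implementation: `should_use_log`, `RecordingTrial`, and `sample` are hypothetical names, and the trial stand-in only mimics the `suggest_int` / `suggest_float` / `suggest_categorical` call shapes of an Optuna trial.

```python
from __future__ import annotations

from typing import Any


def should_use_log(low: float, high: float) -> bool:
    """Stated rule: both bounds positive and high / low >= 10."""
    return low > 0 and high > 0 and high / low >= 10


class RecordingTrial:
    """Hypothetical stand-in for an Optuna trial; records which method was used."""

    def __init__(self) -> None:
        self.calls: dict[str, tuple] = {}

    def suggest_int(self, name, low, high, log=False):
        self.calls[name] = ("suggest_int", log)
        return low

    def suggest_float(self, name, low, high, log=False):
        self.calls[name] = ("suggest_float", log)
        return low

    def suggest_categorical(self, name, choices):
        self.calls[name] = ("suggest_categorical", None)
        return choices[0]


def sample(space: dict[str, Any], trial: RecordingTrial) -> dict[str, Any]:
    """Dispatch each field to a suggest method based on its shape and types."""
    params: dict[str, Any] = {}
    for name, spec in space.items():
        if isinstance(spec, list):
            params[name] = trial.suggest_categorical(name, spec)
            continue
        low, high = spec
        log = should_use_log(low, high)
        if isinstance(low, int) and isinstance(high, int):
            params[name] = trial.suggest_int(name, low, high, log=log)
        else:
            params[name] = trial.suggest_float(name, low, high, log=log)
    return params


space = {
    "learning_rate": (1e-3, 1e-1),      # float range, ratio 100
    "num_layers": (1, 5),               # int range, ratio 5
    "gamma": (0, 10),                   # lower bound is not positive
    "boosting_type": ["gbdt", "dart"],  # categorical
}
trial = RecordingTrial()
params = sample(space, trial)
```

The recorded calls show a log-scaled float for `learning_rate`, linear integers for `num_layers` and `gamma`, and a categorical choice for `boosting_type`.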
Validation rules:
- Tuples must have exactly 2 numeric values with low < high
- Lists must have at least 1 element with no duplicates
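As a rough illustration of these checks, here is a sketch in plain Python. It assumes nothing about how the Pydantic validator is actually written; `validate_field` and `is_valid` are hypothetical helpers and the error messages are invented for the example.

```python
from numbers import Real


def validate_field(name: str, value: object) -> None:
    """Raise ValueError if a search-space entry breaks the stated rules."""
    if isinstance(value, tuple):
        if len(value) != 2 or not all(isinstance(v, Real) for v in value):
            raise ValueError(f"{name}: a range needs exactly 2 numeric values")
        low, high = value
        if not low < high:
            raise ValueError(f"{name}: expected low < high, got {value}")
    elif isinstance(value, list):
        if not value:
            raise ValueError(f"{name}: a categorical list cannot be empty")
        # Pairwise comparison also works for unhashable choices.
        if any(v in value[:i] for i, v in enumerate(value)):
            raise ValueError(f"{name}: duplicate categorical values")
    else:
        raise ValueError(f"{name}: expected a (low, high) tuple or a list")


def is_valid(name: str, value: object) -> bool:
    """Convenience wrapper that turns the exception into a boolean."""
    try:
        validate_field(name, value)
    except ValueError:
        return False
    return True
```

For instance, `is_valid("max_depth", (10, 1))` is `False` because the bounds are reversed, while `is_valid("max_depth", (1, 10))` is `True`.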
Per-Model Search Spaces#
Each model config defines default search spaces:
CatBoost#
BaseSearchSpace(
learning_rate=(1e-3, 1e-1), # log scale
depth=(1, 12),
iterations=(20, 1000), # log scale
min_data_in_leaf=(1, 100), # log scale
)
XGBoost#
BaseSearchSpace(
learning_rate=(1e-3, 1e-1), # log scale
subsample=(0.05, 1.0),
gamma=(0, 10),
colsample_bytree=(0.05, 1.0),
min_child_weight=(1, 20), # log scale
n_estimators=(10, 500), # log scale
max_depth=(1, 10),
)
LightGBM#
BaseSearchSpace(
learning_rate=(1e-3, 1e-1), # log scale
num_leaves=(2, 1024), # log scale
subsample=(0.05, 1.0),
colsample_bytree=(0.05, 1.0),
min_data_in_leaf=(1, 100), # log scale
n_estimators=(10, 200), # log scale
max_depth=(1, 10),
linear_tree=[True, False],
iterations=(20, 1000), # log scale
)
MLPF / MLPGAM / Neural Models#
BaseSearchSpace(
embedding_size=[8, 16, 32, 64],
hidden_size=[16, 32, 64, 128, 256, 512],
num_layers=(1, 5),
dropout=(0.1, 0.9),
alpha=(0.01, 0.9),
combination_type=["attn-comb", "weighted-comb", "addition-comb"],
activation_function=["ReLU", "GELU", "SiLU"],
)
Pipeline Search Space#
DataPipelineConfig also accepts a search_space field, letting Optuna co-optimise data preprocessing (scalers) alongside model hyperparameters in a single study. The sampled scaler values are prefixed with "pipeline_" in the Optuna trial to avoid clashing with model parameter names.
from twiga.core.config import BaseSearchSpace, DataPipelineConfig
data_config = DataPipelineConfig(
target_feature="load_mw",
period="1h",
lookback_window_size=168,
forecast_horizon=48,
search_space=BaseSearchSpace(
input_scaler=["standard", "robust", "minmax"],
target_scaler=["standard", "robust"],
),
)
How it works:
- At the start of each Optuna trial, _update_pipeline_for_trial samples the pipeline search space with prefix="pipeline", producing keys like "pipeline_input_scaler".
- A new DataPipeline is constructed from the sampled config for that trial.
- After the study completes, the best pipeline parameters are stripped of the "pipeline_" prefix and applied permanently to data_config. Only model params (no pipeline_ prefix) are returned from tune().
- The original data_config is restored after every trial via a try/finally guard, so trial mutations never leak between trials.
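The prefix handling and the restore guard can be sketched with plain dictionaries. This is an illustration under assumptions rather than Twiga's code: `split_best_params` and `run_trial` are hypothetical helpers, and the config is modelled as a dict instead of a DataPipelineConfig.

```python
PREFIX = "pipeline_"


def split_best_params(best: dict) -> tuple:
    """Separate model params from pipeline params and strip the prefix."""
    model = {k: v for k, v in best.items() if not k.startswith(PREFIX)}
    pipeline = {k[len(PREFIX):]: v for k, v in best.items() if k.startswith(PREFIX)}
    return model, pipeline


def run_trial(state: dict, sampled: dict, objective) -> float:
    """Apply sampled pipeline values for one trial, then always restore."""
    original = dict(state["data_config"])
    try:
        state["data_config"] = {**original, **sampled}
        return objective(state["data_config"])
    finally:
        # Runs on success and on failure, so nothing leaks into the next trial.
        state["data_config"] = original


state = {"data_config": {"input_scaler": "standard", "target_scaler": "standard"}}
cost = run_trial(state, {"input_scaler": "robust"}, lambda cfg: 0.42)

best = {"max_depth": 6, "pipeline_input_scaler": "robust", "pipeline_target_scaler": "standard"}
model_params, pipeline_params = split_best_params(best)
```

After `run_trial` returns, `state["data_config"]` holds its original values again; only the final `pipeline_params` would be applied permanently.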
Validation — unknown field names in search_space raise a ValidationError at construction time, catching typos before tuning starts:
# Raises ValidationError: "unknown fields: {'typo_scaler'}"
DataPipelineConfig(
...,
search_space=BaseSearchSpace(typo_scaler=["standard"]),
)
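The idea behind this check is a set difference between the search-space keys and the config's field names. The sketch below uses a standard-library dataclass as a stand-in for the Pydantic config; `DemoPipelineConfig`, `unknown_fields`, and `check_search_space` are hypothetical names, and the error text is illustrative only.

```python
from __future__ import annotations

from dataclasses import dataclass, fields


@dataclass
class DemoPipelineConfig:
    """Hypothetical stand-in exposing only the two tunable scaler fields."""

    input_scaler: str = "standard"
    target_scaler: str = "standard"


def unknown_fields(space: dict, config: object) -> set[str]:
    """Names present in the search space but missing from the config."""
    return set(space) - {f.name for f in fields(config)}


def check_search_space(space: dict, config: object) -> None:
    """Fail fast on typos before any trial is run."""
    unknown = unknown_fields(space, config)
    if unknown:
        raise ValueError(f"unknown fields: {unknown}")


config = DemoPipelineConfig()
check_search_space({"input_scaler": ["standard", "robust"]}, config)  # passes
typos = unknown_fields({"typo_scaler": ["standard"]}, config)
```

Failing at construction time means a misspelled field never reaches Optuna, where it would otherwise be sampled and then silently ignored.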
Supported pipeline search space fields
Only input_scaler and target_scaler are currently tunable via the pipeline search space. Both accept lists of ScalerType string identifiers.
The Tuning Process#
TwigaForecaster.tune()#
forecaster.tune(
train_df=train_df,
val_df=val_df,
num_trials=20, # number of Optuna trials
reduction_factor=3, # Hyperband reduction factor
patience=5, # early stopping patience
load_if_exists=True, # resume existing study
direction="minimize", # optimize direction
)
| Parameter | Type | Default | Description |
|---|---|---|---|
| train_df | | Required | Training data |
| val_df | | Required | Validation data |
| num_trials | | | Number of Optuna trials |
| reduction_factor | | | Hyperband pruner reduction factor |
| patience | | | Patient pruner patience |
| load_if_exists | | True | Load existing study from disk |
| | | | Initial parameters to try first |
| direction | | | Optimization direction |
| | | | Custom Optuna sampler |
| | | | Custom Optuna pruner |
Study Configuration#
create_optuna_study() in BaseForecaster configures:
- Sampler: TPESampler (Tree-structured Parzen Estimator) with:
  - seed=self.seed for reproducibility
  - multivariate=True for correlated parameters
  - n_startup_trials=patience * 2 random trials before TPE
  - constant_liar=True for parallel optimization
  - group=True for grouped parameters
- Pruner: HyperbandPruner with:
  - min_resource=patience
  - max_resource="auto"
  - reduction_factor=reduction_factor
- Storage: JournalFile-based storage at {logs_path}/{project_name}_{model_type}.log
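To make the derived values concrete, the sketch below assembles the settings described above as plain dictionaries. `study_settings` is a hypothetical helper, not part of Twiga. The keyword names are those listed above, which are also accepted by `optuna.samplers.TPESampler` and `optuna.pruners.HyperbandPruner`, so the dictionaries could be unpacked into those constructors.

```python
from pathlib import Path


def study_settings(*, seed, patience, reduction_factor, logs_path, project_name, model_type):
    """Collect sampler kwargs, pruner kwargs, and the journal file path."""
    sampler_kwargs = {
        "seed": seed,                      # reproducibility
        "multivariate": True,              # model correlated parameters jointly
        "n_startup_trials": patience * 2,  # random trials before TPE takes over
        "constant_liar": True,             # helps parallel workers diversify
        "group": True,                     # group parameters that co-occur
    }
    pruner_kwargs = {
        "min_resource": patience,
        "max_resource": "auto",
        "reduction_factor": reduction_factor,
    }
    journal_path = Path(logs_path) / f"{project_name}_{model_type}.log"
    return sampler_kwargs, pruner_kwargs, journal_path


sampler_kwargs, pruner_kwargs, journal_path = study_settings(
    seed=42,
    patience=5,
    reduction_factor=3,
    logs_path="logs",
    project_name="demo",
    model_type="xgboost",
)
# With Optuna installed, these could be used as, for example:
#   optuna.samplers.TPESampler(**sampler_kwargs)
#   optuna.pruners.HyperbandPruner(**pruner_kwargs)
```

With `patience=5`, the first 10 trials are sampled randomly before TPE starts, and the journal file is named `demo_xgboost.log`.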
Objective Function#
For each trial, _objective_fn:
1. Calls model.update(trial), which uses BaseSearchSpace.get_optuna_params() to suggest model values
2. Calls _update_pipeline_for_trial(trial): if data_config.search_space is set, this samples scaler choices and rebuilds DataPipeline for this trial
3. Calls _fit(train_df, val_df, trial) to train the model
4. Calls _evaluate(val_df) to compute validation metrics
5. Returns mean(MAE) as the cost to minimize
6. Sets the user attributes rmse and std_dev for dashboard visualization
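The overall shape of such an objective can be sketched with stand-ins for the model and the trial. Everything here is hypothetical: `StubTrial` mimics only `set_user_attr`, the `suggest` and `fit_predict` callables replace `model.update` and `_fit` / `_evaluate`, and `std_dev` is assumed to be the standard deviation of the errors, which the source does not specify.

```python
from math import sqrt
from statistics import mean, pstdev


class StubTrial:
    """Hypothetical stand-in for an Optuna trial; stores user attributes."""

    def __init__(self) -> None:
        self.user_attrs = {}

    def set_user_attr(self, key, value) -> None:
        self.user_attrs[key] = value


def objective(trial, suggest, fit_predict, y_val):
    """Sample params, fit, evaluate, and return MAE as the cost."""
    params = suggest(trial)       # stands in for model.update(trial)
    y_pred = fit_predict(params)  # stands in for _fit followed by _evaluate
    errors = [p - t for t, p in zip(y_val, y_pred)]
    mae = mean(abs(e) for e in errors)
    # Extra metrics are attached to the trial rather than returned.
    trial.set_user_attr("rmse", sqrt(mean(e * e for e in errors)))
    trial.set_user_attr("std_dev", pstdev(errors))  # assumption: spread of errors
    return mae


trial = StubTrial()
cost = objective(
    trial,
    suggest=lambda t: {"max_depth": 3},
    fit_predict=lambda params: [1.5, 2.0, 2.0],
    y_val=[1.0, 2.0, 3.0],
)
```

Returning a single scalar keeps the study single-objective, while the attributes stay available for inspection in a dashboard.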
After Tuning#
Once the study finishes:
1. The best parameters are saved to {results_path}/best_params.npy
2. They are applied to the model config via model_copy(update=best_params)
3. The model is re-instantiated with the updated config
Conformal-Aware Tuning#
By default tune minimises a point-forecast metric (MAE). When tuning a probabilistic model it often makes more sense to optimise directly for interval quality. Pass calib_df together with conformal_params to activate conformal-aware tuning: each trial extends the normal fit → evaluate loop with a calibrate(calib_df) step so that the trial score is an interval metric.
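from twiga.core.config import ConformalConfig
from twiga.forecaster.core import TwigaForecaster
from twiga.models.nn.mlpgam_model import MLPGAMConfig
# data_config and train_config are defined as in the Full Example below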
conf_config = ConformalConfig(method="crc", score_type="residual", alpha=0.1)
forecaster = TwigaForecaster(
data_params=data_config,
model_params=[MLPGAMConfig()],
cv_params=train_config,
conformal_params=conf_config,
)
best_params = forecaster.tune(
train_df=train_df,
val_df=val_df,
calib_df=calib_df, # held-out calibration window
conformal_params=conf_config, # if None, uses forecaster.conformal_params
objective_metric="winkler", # optimise Winkler score; or "picp", "pinaw", …
num_trials=50,
)
conformal_params overrides self.conformal_params for the duration of the study only — the original value is restored when tune returns. objective_metric can be any column produced by evaluate_interval_forecast (e.g. "winkler", "picp", "pinaw", "ace").
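For readers unfamiliar with these interval metrics, the sketch below computes three of them from their common textbook definitions. It is not Twiga's evaluate_interval_forecast, which may normalise or aggregate differently; the function names and sample data are invented for the example.

```python
def picp(y, lower, upper):
    """Prediction interval coverage probability: share of targets inside the interval."""
    hits = [lo <= v <= hi for v, lo, hi in zip(y, lower, upper)]
    return sum(hits) / len(hits)


def pinaw(y, lower, upper):
    """Mean interval width, normalised by the range of the targets."""
    mean_width = sum(hi - lo for lo, hi in zip(lower, upper)) / len(y)
    return mean_width / (max(y) - min(y))


def winkler(y, lower, upper, alpha):
    """Interval width plus a 2/alpha penalty for each miss, averaged over samples."""
    total = 0.0
    for v, lo, hi in zip(y, lower, upper):
        score = hi - lo
        if v < lo:
            score += 2 / alpha * (lo - v)
        elif v > hi:
            score += 2 / alpha * (v - hi)
        total += score
    return total / len(y)


y = [1.0, 2.0, 3.0, 4.0]
lower = [0.0, 2.5, 2.0, 3.0]
upper = [2.0, 3.5, 4.0, 3.5]

coverage = picp(y, lower, upper)
width = pinaw(y, lower, upper)
score = winkler(y, lower, upper, alpha=0.1)
```

Coverage and width pull in opposite directions, which is why a combined score such as Winkler is a natural single objective: it rewards narrow intervals but penalises misses in proportion to 1/alpha.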
NN Parameter Budget#
Neural-network search spaces can produce architectures ranging from tiny to enormous. The max_model_params argument lets you prune oversized trials before they waste training time: if the instantiated model’s trainable parameter count exceeds the budget the trial is pruned immediately without fitting.
best_params = forecaster.tune(
train_df=train_df,
val_df=val_df,
num_trials=50,
max_model_params=2_000_000, # prune any trial > 2 M parameters; default 5 M
)
Set max_model_params=None to disable the check entirely. The parameter has no effect for ML-domain models (CatBoost, LightGBM, XGBoost).
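A back-of-the-envelope count shows how quickly a search space can exceed a budget. The sketch assumes a plain fully connected network in which `num_layers` is the number of hidden layers of width `hidden_size`; the real MLPF / MLPGAM architectures have additional components, so `mlp_param_count` and `exceeds_budget` are illustrative helpers only.

```python
def mlp_param_count(input_size, hidden_size, num_layers, output_size):
    """Weights plus biases of a dense network with num_layers hidden layers."""
    sizes = [input_size] + [hidden_size] * num_layers + [output_size]
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


def exceeds_budget(param_count, max_model_params):
    """None disables the check, mirroring max_model_params=None."""
    return max_model_params is not None and param_count > max_model_params


small = mlp_param_count(input_size=168, hidden_size=64, num_layers=2, output_size=48)
large = mlp_param_count(input_size=168, hidden_size=512, num_layers=5, output_size=48)

prune_small = exceeds_budget(small, 1_000_000)
prune_large = exceeds_budget(large, 1_000_000)
```

Under this assumption the small configuration has 18,096 parameters and the large one 1,161,776, so a hypothetical 1 M budget would prune only the latter. In an Optuna objective, the usual way to signal such a prune is to raise `optuna.TrialPruned` before fitting.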
Full Example#
from twiga.core.config import BaseSearchSpace, DataPipelineConfig, ExperimentConfig
from twiga.forecaster.core import TwigaForecaster
from twiga.models.ml.xgboost_model import XGBOOSTConfig
from twiga.models.nn.mlpf_model import MLPFConfig
# Data pipeline config with pipeline search space — scalers are tuned alongside model params
data_config = DataPipelineConfig(
target_feature="load_mw",
period="1h",
lookback_window_size=168,
forecast_horizon=48,
search_space=BaseSearchSpace(
input_scaler=["standard", "robust", "minmax"],
target_scaler=["standard", "robust"],
),
)
train_config = ExperimentConfig(
split_freq="days",
train_size=14,
test_size=7,
)
# Model search space (tuned in the same study as the pipeline)
xgb_config = XGBOOSTConfig(
search_space=BaseSearchSpace(
learning_rate=(0.01, 0.3),
max_depth=(3, 8),
n_estimators=(100, 1000),
)
)
mlpf_config = MLPFConfig.from_data_config(data_config)
forecaster = TwigaForecaster(
data_params=data_config,
model_params=[xgb_config, mlpf_config],
cv_params=train_config,
)
# tune() co-optimises model hyperparameters + pipeline scalers in a single study
forecaster.tune(
train_df=train_df,
val_df=val_df,
num_trials=30,
patience=5,
)
# After tune(), data_config is updated with the best scaler combination
# Fit with tuned parameters
forecaster.fit(train_df=train_df, val_df=val_df)
# Evaluate
predictions_df, metrics_df = forecaster.evaluate_point_forecast(test_df=test_df)
Tip
Use load_if_exists=True (default) to resume tuning from a previous run. The study state is persisted to disk automatically.
API Reference#
- class twiga.core.config.BaseSearchSpace(**data)
Bases: BaseModel
Pydantic model for validating hyperparameter optimisation search spaces.
Each field must be either:
- A tuple[float, float] or tuple[int, int] representing a continuous range (low, high). Float ranges spanning more than one order of magnitude (high / low >= 10) are sampled on a log scale automatically.
- A list of at least one categorical value.
The class uses extra="allow" so that concrete search spaces can be defined inline without subclassing:
space = BaseSearchSpace(
    latent_size=[64, 128, 256],
    dropout=(0.0, 0.5),
)
- Parameters:
**kwargs – Any keyword argument whose value is a valid range tuple or categorical list.
Examples
>>> space = BaseSearchSpace(lr=(1e-4, 1e-2), activation=["relu", "tanh"])
>>> params = space.get_optuna_params(trial, prefix="mlp")
- get_optuna_params(trial, prefix='')
Generate Optuna parameter suggestions for all fields.
- Parameters:
trial (Trial) – Active Optuna trial.
prefix (str) – Prefix prepended to each parameter name in the trial (e.g. the model name) to avoid collisions when multiple search spaces are sampled in the same trial. Defaults to "".
- Return type:
dict[str, Any]
- Returns:
Mapping of field names (without prefix) to their sampled values.
- model_config: ClassVar[ConfigDict] = {'extra': 'allow'}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- validate_against(config)
Raise ValueError if any search space field name is not present on config.
Catches typos in search space definitions early - before an Optuna trial is run - so that mis-spelled field names produce a clear error instead of silently sampling a parameter that never gets applied.
- Parameters:
config (BaseModel) – The model config instance (or class) whose fields define the valid parameter names.
- Raises:
ValueError – If one or more field names in this search space do not exist on config.
- Return type:
Examples
>>> space = BaseSearchSpace(hiddn_dim=[64, 128])  # typo!
>>> space.validate_against(my_model_config)
Traceback (most recent call last):
...
ValueError: Search space contains unknown fields: {'hiddn_dim'}. ...
- validate_search_space()
Validate all fields have valid types and structure.
- Return type: