API Reference#
Complete reference for all public classes, functions, and exceptions exported by twiga.
Note
Symbols marked stable follow semantic versioning. Symbols marked experimental may change in minor versions.
Entry Point#
- class twiga.forecaster.core.TwigaForecaster(data_params, model_params, cv_params=None, conformal_params=None, training_params=None)
Bases: BaseForecaster
Machine Learning Forecaster for time series predictions.
This forecaster initializes a data pipeline and dynamically loads machine learning models based on provided configurations. The configurations can be specified as Pydantic models or dictionaries. Once the models are loaded, they can be trained, evaluated, and backtested.
Example
>>> from twiga.core.config import BaseModelConfig, DataPipelineConfig, ExperimentConfig
>>> from twiga.forecaster.core import TwigaForecaster
>>> data_params = DataPipelineConfig(date_column="date", ...)
>>> model_config = BaseModelConfig(name="linear", ...)
>>> cv_params = ExperimentConfig(...)
>>> forecaster = TwigaForecaster(data_params, model_config, cv_params)
>>> forecaster.fit(train_df)
>>> predictions, metrics = forecaster.evaluate_point_forecast(test_df)
- __init__(data_params, model_params, cv_params=None, conformal_params=None, training_params=None)
Initialize TwigaForecaster.
- Parameters:
data_params (DataPipelineConfig) – Configuration for the data pipeline.
model_params (BaseModelConfig | list[BaseModelConfig] | dict | list[dict]) – Configuration for the model(s). Can be a single Pydantic config, a dictionary, or a list of either. Neural network configs with unset dimensions (num_target_feature, forecast_horizon, or lookback_window_size equal to 0) are auto-populated from data_params. Base architecture configs with a distribution field set are automatically resolved to the corresponding probabilistic variant (e.g. MLPFConfig(distribution='normal') becomes an MLPFNormalConfig).
cv_params (ExperimentConfig | None) – Cross-validation and experiment configuration. Defaults to ExperimentConfig() when not provided.
conformal_params (ConformalConfig | None) – Optional conformal prediction configuration.
training_params (NeuralTrainingConfig | None) – Training infrastructure overrides applied to all NN model configs (e.g. NeuralTrainingConfig(early_stop_patience=None, max_epochs=50)). Only non-None fields are applied.
- calibrate(calibrate_df=None, covariate_df=None, ensemble_strategy=None, ensemble_weights=None)
Calibrate conformal prediction models using calibration data.
- Parameters:
calibrate_df (DataFrame | None) – Calibration dataset. If None, the stored training data is used.
covariate_df (DataFrame | None) – Optional covariate dataset.
ensemble_strategy (str | None) – Strategy for combining model predictions.
ensemble_weights (dict[str, float] | None) – Weights for the weighted ensemble strategy.
- Raises:
ValueError – If conformal_params is not set.
- Return type:
- explain(X, model_idx=0, n_background=100)
Compute SHAP feature attributions for a fitted ML model.
Builds a ShapExplainer for the model at position model_idx in self.models, runs SHAP over X, and returns a ShapResult with values reshaped to (B, L, F): one attribution per sample, per lookback step, per feature.
Only ML models (domain="ml") are supported. Neural-network models require gradient-based attribution and are not currently handled.
- Parameters:
X (ndarray) – Feature array of shape (B, L, F) as produced by the data pipeline (e.g. from DataPipeline.transform()).
model_idx (int) – Index into self.models of the model to explain. Defaults to 0 (the first / only model).
n_background (int) – Number of background samples for LinearExplainer and KernelExplainer. Ignored for tree models.
- Return type:
ShapResult
- Returns:
ShapResult with:
values – SHAP array (B, L, F)
feature_names – original F feature names
timestep_labels – L lookback labels ('t-L+1' … 't0')
expected_value – SHAP base value (mean prediction)
- Raises:
IndexError – If model_idx is out of range.
RuntimeError – If no models are fitted or the domain is not "ml".
ImportError – If shap is not installed.
Example
>>> result = forecaster.explain(X_test)
>>> result.plot_importance(top_n=20)
>>> importance = result.mean_importance()
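The reshaping described for explain() can be sketched with NumPy. This assumes the data pipeline flattens each (L, F) window in C order (all F features of the oldest step first) before it reaches the explainer; reshape_attributions, timestep_labels, and mean_abs_importance are hypothetical helpers, and ShapResult.mean_importance() may aggregate differently.

```python
import numpy as np


def reshape_attributions(flat_values, lookback, n_features):
    """Reshape flat SHAP values of shape (B, L*F) into (B, L, F).

    Assumption: each window was flattened in C order, i.e. step by step,
    with all F features of one lookback step stored contiguously.
    """
    batch = flat_values.shape[0]
    return flat_values.reshape(batch, lookback, n_features)


def timestep_labels(lookback):
    """Oldest step first: 't-L+1', ..., 't-1', 't0'."""
    return [f"t-{lookback - 1 - i}" if i < lookback - 1 else "t0" for i in range(lookback)]


def mean_abs_importance(values):
    """Average |SHAP| over samples and lookback steps: one score per feature."""
    return np.abs(values).mean(axis=(0, 1))


rng = np.random.default_rng(0)
flat = rng.normal(size=(4, 3 * 2))  # B=4 samples, L=3 steps, F=2 features
values = reshape_attributions(flat, lookback=3, n_features=2)
print(values.shape)                        # (4, 3, 2)
print(timestep_labels(3))                  # ['t-2', 't-1', 't0']
print(mean_abs_importance(values).shape)   # (2,)
```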
- classmethod quick(target, period, horizon, model='catboost', distribution=None, lookback=None, calendar=None, scaler='standard', seed=42)
Minimal factory for getting started quickly.
Builds DataPipelineConfig and a model config from a handful of plain-Python arguments, then returns a ready-to-use TwigaForecaster.
- Parameters:
target (str | list[str]) – Target column name(s) to forecast.
period (str) – Sampling frequency (pandas offset alias, e.g. "1h", "30min").
horizon (int) – Number of future steps to predict.
model (str) – Model name registered in the model registry (e.g. "catboost", "lightgbm", "mlpf"). Defaults to "catboost".
distribution (str | None) – Probabilistic distribution variant for base NN architectures (e.g. "normal", "laplace"). Pass this together with a base architecture name such as "mlpf" or "nhits" to select the matching probabilistic variant automatically. Defaults to None (point forecast).
lookback (int | None) – Lookback window size. Defaults to max(2 * horizon, 24).
calendar (list[str] | None) – Calendar feature names (e.g. ["hour", "day_of_week"]). Defaults to None.
scaler (str) – Target scaler identifier (e.g. "standard", "minmax"). Defaults to "standard".
seed (int) – Global random seed. Defaults to 42.
- Return type:
- Returns:
Configured TwigaForecaster ready for fit().
Example:
forecaster = TwigaForecaster.quick(
    target="load_kw",
    period="1h",
    horizon=24,
    model="lightgbm",
    calendar=["hour", "day_of_week"],
)
forecaster.fit(train_df)
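The documented lookback default for quick() reduces to a one-line rule. default_lookback is a hypothetical helper written only to make the arithmetic concrete; it is not part of twiga.

```python
def default_lookback(horizon, lookback=None):
    """Mirror of the documented quick() default: max(2 * horizon, 24)."""
    if lookback is not None:
        return lookback  # an explicit value always wins
    return max(2 * horizon, 24)


print(default_lookback(24))       # 48  (hourly data, one-day horizon)
print(default_lookback(6))        # 24  (the floor of 24 steps applies)
print(default_lookback(24, 168))  # 168 (explicit lookback)
```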
Configuration#
- class twiga.core.config.DataPipelineConfig(**data)
Bases: BaseModel
Configuration for a time-series data pipeline.
Captures everything the pipeline needs to know about the raw dataset: which column to forecast, which features are available, how long the lookback and forecast windows are, what scalers to apply, and which lag/rolling-window features to engineer.
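Feature category guide: classify each feature by when its values are available (past_features, calendar_features, known_future_features, or forecast_period_features; see the parameters below).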
- Parameters:
target_feature (list[str] | str) – Target variable name(s) to forecast.
period (str) – Sampling frequency using pandas offset aliases (e.g. "1H", "30min").
lookback_window_size (int | str) – Number of past timesteps fed to the model as input, or a pandas-compatible duration string that is converted to timesteps using period (e.g. "7D" with period="1H" gives 168 steps).
forecast_horizon (int | str) – Number of future timesteps to predict, or a duration string converted in the same way (e.g. "1D" with period="30min" gives 48 steps).
latitude (float | None, optional) – Latitude for day/night feature calculation. Defaults to None.
longitude (float | None, optional) – Longitude for day/night feature calculation. Defaults to None.
past_features (list[str] | None, optional) – Features available only in the lookback window (unknown in the forecast horizon). Defaults to None.
calendar_features (list[CalendarFeature] | None, optional) – Temporal features derived from the timestamp column. Accepts raw component names (e.g. "hour", "wday") or Fourier-encoded column names auto-selected by the dataset loader (e.g. "hour_cosin", "yweek_cos"). See CalendarFeature for all valid values. Defaults to None.
known_future_features (list[str] | None, optional) – Features known over the full lookback + forecast horizon (e.g. a weather forecast that was also recorded historically). Defaults to None.
forecast_period_features (list[str] | None, optional) – Features known only during the forecast horizon (e.g. scheduled load). Defaults to None.
input_scaler (ScalerType, optional) – Scaler applied to input features. Defaults to "passthrough".
target_scaler (ScalerType, optional) – Scaler applied to the target variable. Defaults to "standard".
lags (list[int] | None, optional) – Lag intervals in periods for feature engineering. Defaults to None.
windows (list[int] | int | None, optional) – Window sizes for rolling statistics. Defaults to None.
window_funcs (list[str] | str | None, optional) – Aggregation functions applied to rolling windows (e.g. "mean", "std"). Defaults to None.
date_column (str, optional) – Name of the datetime column. Defaults to "timestamp".
window_stride (int, optional) – Step between consecutive sliding windows. 1 means fully overlapping windows (maximum data augmentation). Set it to forecast_horizon for non-overlapping forecast windows, which is recommended for baseline evaluation. Defaults to 1.
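How duration strings become step counts, and how window_stride changes the number of training windows, can be sketched as follows. The sketch assumes each training sample is one contiguous block of lookback_window_size + forecast_horizon rows; duration_to_steps and count_windows are hypothetical helpers, not twiga functions.

```python
import pandas as pd


def duration_to_steps(value, period):
    """Convert an int or a pandas duration string (e.g. "7D") to a step count."""
    if isinstance(value, int):
        return value
    steps = pd.Timedelta(value) / pd.Timedelta(period)  # float ratio of two durations
    if not float(steps).is_integer():
        raise ValueError(f"{value!r} is not a whole number of {period!r} periods")
    return int(steps)


def count_windows(n_rows, lookback, horizon, stride=1):
    """Number of (lookback + horizon) windows that fit in n_rows timesteps."""
    span = lookback + horizon
    if n_rows < span:
        return 0
    return (n_rows - span) // stride + 1


lookback = duration_to_steps("7D", "1h")  # 168
horizon = duration_to_steps("1D", "1h")   # 24
n_rows = 24 * 30                          # 30 days of hourly data
print(count_windows(n_rows, lookback, horizon, stride=1))        # 529
print(count_windows(n_rows, lookback, horizon, stride=horizon))  # 23
```

With stride=forecast_horizon the forecast targets of consecutive windows no longer overlap, which is why far fewer windows are produced.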
- calendar_features: list[Literal['index_num', 'year', 'year_iso', 'yearstart', 'yearend', 'leapyear', 'half', 'quarter', 'quarterstart', 'quarterend', 'month', 'monthstart', 'monthend', 'yweek', 'mweek', 'wday', 'mday', 'qday', 'yday', 'weekend', 'hour', 'minute', 'second', 'msecond', 'nsecond', 'day_night', 'half_sin', 'half_cos', 'half_cosin', 'quarter_sin', 'quarter_cos', 'quarter_cosin', 'month_sin', 'month_cos', 'month_cosin', 'yweek_sin', 'yweek_cos', 'yweek_cosin', 'mweek_sin', 'mweek_cos', 'mweek_cosin', 'wday_sin', 'wday_cos', 'wday_cosin', 'mday_sin', 'mday_cos', 'mday_cosin', 'qday_sin', 'qday_cos', 'qday_cosin', 'yday_sin', 'yday_cos', 'yday_cosin', 'hour_sin', 'hour_cos', 'hour_cosin', 'minute_sin', 'minute_cos', 'minute_cosin', 'second_sin', 'second_cos', 'second_cosin']] | None
- date_column: str
- forecast_horizon: int
- input_scaler: Literal['standard', 'minmax', 'robust', 'maxabs', 'normalizer', 'quantile_uniform', 'quantile_normal', 'power_yeo_johnson', 'power_box_cox', 'passthrough']
- lookback_window_size: int
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- n_jobs: int
- period: str
- recommended_lookback_search_space(max_multiplier=7, *, scalers=True)
Return a search space for lookback window size relative to this config’s forecast horizon.
The range is [forecast_horizon, max_multiplier * forecast_horizon], which guarantees that the lookback is always at least as long as the forecast horizon. Both forecast_horizon and period are read from the config instance, so string horizons (e.g. "1D") are resolved correctly before the range is computed.
- Parameters:
- Return type:
- Returns:
A BaseSearchSpace ready to assign to search_space.
Example:
cfg = DataPipelineConfig(
    target_feature="load",
    period="30min",
    forecast_horizon="1D",      # resolved to 48 steps
    lookback_window_size=48,    # initial value, overridden by HPO
)
cfg = cfg.model_copy(update={"search_space": cfg.recommended_lookback_search_space()})
- search_space: BaseSearchSpace | None
- target_scaler: Literal['standard', 'minmax', 'robust', 'maxabs', 'normalizer', 'quantile_uniform', 'quantile_normal', 'power_yeo_johnson', 'power_box_cox', 'passthrough']
- window_stride: int
- class twiga.core.config.ExperimentConfig(**data)
Bases: BaseModel
Configuration for the forecaster cross-validation runner.
Controls how the time-series is split for evaluation (split frequency, window type, train/test sizes), and holds project-level metadata such as the project name and output file name.
The date_column is intentionally absent here: it is always read from DataPipelineConfig, which is the single source of truth for dataset structure.
- Parameters:
domain (Literal["ml"], optional) – Modelling domain identifier. Fixed to "ml"; excluded from parameter tuning. Defaults to "ml".
split_freq (str, optional) – Unit for train_size, test_size, and gap. One of "minutes", "hours", "days", "weeks", "months", "years". Defaults to "months".
test_size (int, optional) – Number of split_freq units in each test fold. Defaults to 1.
train_size (int, optional) – Number of split_freq units in each training fold (rolling window only). Defaults to 1.
gap (int, optional) – Number of split_freq units between the end of the training fold and the start of the test fold. Defaults to 0.
stride (int | None, optional) – Step size between consecutive splits in split_freq units. None uses test_size as the stride. Defaults to None.
window (Literal["expanding", "rolling"], optional) – Cross-validation window strategy. Defaults to "expanding".
num_splits (int | None, optional) – Maximum number of CV splits. None uses all available splits. Defaults to None.
project_name (str, optional) – Experiment / project name used for logging and output paths. Defaults to "experiment".
file_name (str | None, optional) – Output file name. None auto-generates it from the project name. Defaults to None.
seed (int, optional) – Random seed for reproducibility. Defaults to 42.
root_dir (str, optional) – Root directory for output artefacts. Defaults to "../".
metrics (tuple[str] | list[str] | None, optional) – Evaluation metrics to compute and log. None uses the runner's defaults. Defaults to None.
- calib_size: int
- calib_source: Literal['train_tail', 'gap', 'test_prefix', 'season_matched']
- domain: Literal['ml']
- gap: int
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- project_name: str
- root_dir: str
- seed: int
- split_freq: Literal['days', 'minutes', 'hours', 'weeks', 'months', 'years']
- test_size: int
- train_size: int
- val_size: int
- window: Literal['expanding', 'rolling']
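The interplay of train_size, test_size, gap, stride, and window can be illustrated with a small split generator that works in whole split_freq units. It is a sketch under stated assumptions: the first training fold is train_size units long in both modes, and num_splits keeps the most recent splits; make_splits is a hypothetical helper, and the actual runner may resolve both points differently.

```python
def make_splits(n_units, train_size=1, test_size=1, gap=0, stride=None,
                window="expanding", num_splits=None):
    """Return ((train_start, train_end), (test_start, test_end)) pairs.

    Intervals are half-open and counted in split_freq units. Assumptions: the
    first training fold is train_size units long in both modes, and
    num_splits keeps the most recent splits.
    """
    if window not in ("expanding", "rolling"):
        raise ValueError(f"unknown window strategy: {window!r}")
    stride = test_size if stride is None else stride  # documented default
    splits = []
    train_end = train_size
    while train_end + gap + test_size <= n_units:
        train_start = 0 if window == "expanding" else train_end - train_size
        test_start = train_end + gap
        splits.append(((train_start, train_end), (test_start, test_start + test_size)))
        train_end += stride
    if num_splits is not None:
        splits = splits[max(len(splits) - num_splits, 0):]
    return splits


# Six months of data, three-month training folds, one-month test folds
for fold in make_splits(6, train_size=3, test_size=1, window="rolling"):
    print(fold)
# ((0, 3), (3, 4))
# ((1, 4), (4, 5))
# ((2, 5), (5, 6))
```

With window="expanding" the training start stays at 0, so each fold sees all earlier data.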
- class twiga.core.config.BaseModelConfig(**data)
Bases: BaseModel
Shared base configuration for all forecasting models.
Provides the name, domain, and search_space fields that every concrete config is expected to expose, along with a uniform get_optuna_params() that merges fixed config values with any search-space suggestions.
Subclass this to define model-specific configurations:
from typing import Literal

from pydantic import Field

from twiga.core.config import BaseModelConfig, BaseSearchSpace

class MyModelConfig(BaseModelConfig):
    name: Literal["my_model"] = Field(default="my_model", exclude=True)
    hidden_size: int = 128
    dropout: float = 0.3
    search_space: BaseSearchSpace = BaseSearchSpace(
        hidden_size=[64, 128, 256],
        dropout=(0.0, 0.5),
    )
- Parameters:
name (Literal["base_model"], optional) – Model type identifier. Excluded from parameter tuning. Defaults to "base_model".
domain (Literal["nn"], optional) – Modelling domain identifier. Excluded from parameter tuning. Defaults to "nn".
search_space (BaseSearchSpace | None, optional) – Hyperparameter search space. When set, its fields are merged into the output of get_optuna_params() for HPO. Defaults to None.
- domain: Literal['nn']
- get_optuna_params(trial)
Return fixed config values merged with Optuna search-space suggestions.
Fixed parameters come from pydantic.BaseModel.model_dump() (with name and search_space excluded). If a search_space is set, its fields are sampled for trial and override any overlapping fixed values, allowing a single config object to serve both fixed and tuned usage patterns.
- model_config: ClassVar[ConfigDict] = {'extra': 'allow'}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- name: Literal['base_model']
- search_space: BaseSearchSpace | None
- to_estimator_params()
Return a parameter dict safe to pass directly to the underlying estimator.
Excludes Twiga-internal fields (name, domain, search_space) and maps the unified seed field to the library-specific keyword defined by _LIBRARY_SEED_KEY (e.g. "random_state" for sklearn estimators).
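Both get_optuna_params() and to_estimator_params() boil down to dictionary manipulation on the dumped config. The sketch below uses plain dicts in place of model_dump() output and a seed_key argument in place of _LIBRARY_SEED_KEY; the helper names and the example fields are illustrative only.

```python
INTERNAL_FIELDS = {"name", "domain", "search_space"}


def merge_with_suggestions(fixed, sampled):
    """Fixed config values (name/search_space dropped), overridden by sampled ones."""
    params = {k: v for k, v in fixed.items() if k not in ("name", "search_space")}
    params.update(sampled)  # sampled search-space values win on overlap
    return params


def to_estimator_params(config, seed_key="random_state"):
    """Drop Twiga-internal fields and rename the unified seed field."""
    params = {k: v for k, v in config.items() if k not in INTERNAL_FIELDS}
    if "seed" in params:
        params[seed_key] = params.pop("seed")
    return params


config = {"name": "my_model", "domain": "ml", "search_space": None,
          "n_estimators": 200, "learning_rate": 0.1, "seed": 42}
print(merge_with_suggestions(config, {"learning_rate": 0.03}))
# {'domain': 'ml', 'n_estimators': 200, 'learning_rate': 0.03, 'seed': 42}
print(to_estimator_params(config))
# {'n_estimators': 200, 'learning_rate': 0.1, 'random_state': 42}
```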
- class twiga.core.config.ConformalConfig(**data)
Bases: BaseModel
Configuration for conformal prediction methods.
Supports three conformal predictors (residual-based, quantile-based, and residual-fitting), each with compatible nonconformity score types.
- Parameters:
method (Literal["residual", "quantile", "residual-fitting"], optional) – Conformal prediction method:
"residual": nonconformity scores based on absolute residuals |y - ŷ|.
"quantile": quantile regression for prediction intervals.
"residual-fitting": fits a secondary model to predict residuals for adaptive interval widths.
Defaults to "residual".
score_type (str, optional) – Nonconformity score type: "scaled" / "unscaled" for the quantile method; "res" / "sign-res" for residual-based methods. Defaults to "res".
alpha (float, optional) – Significance level controlling the confidence level (1 - alpha) of the prediction intervals. Must be in (0, 1). For example, alpha=0.1 gives 90 % coverage. Defaults to 0.1.
- Raises:
ValueError – If method="quantile" is combined with a residual score type, or if a residual method is combined with a quantile score type.
Examples
>>> ConformalConfig(method="residual", score_type="res", alpha=0.1)
ConformalConfig(method='residual', score_type='res', alpha=0.1)
>>> ConformalConfig(method="quantile", score_type="scaled", alpha=0.05)
ConformalConfig(method='quantile', score_type='scaled', alpha=0.05)
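The "residual" method with score_type="res" corresponds to the textbook split-conformal recipe: score each calibration point by |y - ŷ|, take a finite-sample quantile of those scores, and widen every new prediction by it. The sketch below assumes the usual rank ceil((n + 1)(1 - alpha)); conformal_quantile and residual_interval are hypothetical helpers, and twiga's own calibration may differ in details.

```python
import math


def conformal_quantile(scores, alpha):
    """Split-conformal quantile: the ceil((n + 1)(1 - alpha))-th smallest score."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points for this alpha: unbounded interval
    return sorted(scores)[k - 1]


def residual_interval(y_cal, yhat_cal, yhat_new, alpha=0.1):
    """Symmetric intervals yhat ± q built from absolute calibration residuals |y - yhat|."""
    scores = [abs(y - p) for y, p in zip(y_cal, yhat_cal)]
    q = conformal_quantile(scores, alpha)
    return [(p - q, p + q) for p in yhat_new]


y_cal = [10, 12, 11, 13, 9, 10, 14, 12, 11, 10]
yhat_cal = [11, 11, 11, 12, 10, 10, 12, 12, 12, 11]
print(residual_interval(y_cal, yhat_cal, [20.0], alpha=0.2))  # [(19.0, 21.0)]
print(residual_interval(y_cal, yhat_cal, [20.0], alpha=0.1))  # [(18.0, 22.0)]
```

Lowering alpha raises the rank k, so the interval widens; with too few calibration points the rank exceeds n and no finite interval can guarantee the requested coverage.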
- alpha: Annotated[float, FieldInfo(annotation=NoneType, required=False, default=0.1, description='Significance level for prediction intervals. Controls coverage as (1 - alpha). Example: alpha=0.1 → 90% prediction intervals.', metadata=[Gt(gt=0.0), Lt(lt=1.0)])]
- calib_method: Annotated[Literal['uniform', 'temporal'], FieldInfo(annotation=NoneType, required=False, default='uniform', description="Quantile estimation method for calibration scores. 'uniform': standard empirical quantile (equal weight per sample). 'temporal': exponentially weighted quantile (Tibshirani et al. 2019) — recent calibration samples receive higher weight, reducing the influence of seasonally misaligned older samples. Only used for 'residual-fitting'.")]
- classmethod from_coverage(coverage, **kwargs)
Construct from a coverage level rather than a significance level.
- Parameters:
coverage (float) – Desired coverage probability, e.g. 0.9 for 90 % intervals. Must be in (0, 1). Converted to alpha = 1 - coverage.
**kwargs – Any additional ConformalConfig fields (e.g. method, score_type).
- Return type:
- Returns:
ConformalConfig with alpha = 1 - coverage.
Example
>>> cfg = ConformalConfig.from_coverage(0.9, method="residual")
>>> cfg.alpha
0.1
- lambda_: Annotated[float, FieldInfo(annotation=NoneType, required=False, default=1.0, description="Exponential decay rate for temporal calibration weighting. Ignored when calib_method='uniform'. lambda_=0 recovers uniform weights; larger values concentrate weight on the most recent calibration samples.", metadata=[Ge(ge=0.0)])]
- method: Annotated[Literal['residual', 'quantile', 'residual-fitting'], FieldInfo(annotation=NoneType, required=False, default='residual', description="Conformal prediction method. 'residual': absolute residual scores. 'quantile': quantile regression intervals. 'residual-fitting': secondary model predicts residuals for adaptive widths.")]
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- score_type: Annotated[Literal['scaled', 'unscaled', 'res', 'sign-res'], FieldInfo(annotation=NoneType, required=False, default='res', description="Nonconformity score type. 'scaled'/'unscaled': for quantile method. 'res'/'sign-res': for residual-based methods.")]
- validate_method_score_compatibility()
Validate that method and score_type are compatible.
- Return type:
- classmethod warn_extreme_alpha(v)
Warn if alpha is likely to produce degenerate intervals.
- Return type:
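The calib_method="temporal" option and lambda_ describe an exponentially weighted quantile of the calibration scores. One possible reading is sketched below. The decay form exp(-lambda_ * age / n), with age counted in calibration samples, is an assumption, and the sketch is a simplified weighted empirical quantile rather than twiga's implementation; temporal_weights and weighted_quantile are hypothetical helpers.

```python
import math


def temporal_weights(n, lambda_):
    """Exponential decay by sample age; the newest sample (last) gets weight 1.

    Assumption: age is measured as a fraction of the calibration set length.
    lambda_ = 0 gives uniform weights.
    """
    return [math.exp(-lambda_ * (n - 1 - i) / n) for i in range(n)]


def weighted_quantile(scores, weights, level):
    """Smallest score whose cumulative normalised weight reaches `level`."""
    total = sum(weights)
    cumulative = 0.0
    for score, weight in sorted(zip(scores, weights)):
        cumulative += weight / total
        if cumulative >= level:
            return score
    return max(scores)  # guards against rounding leaving cumulative just below 1


# Calibration scores in chronological order: large errors long ago, small ones recently
scores = [5, 5, 5, 5, 1, 1, 1, 1]
print(weighted_quantile(scores, temporal_weights(8, 0.0), 0.6))  # 5 (uniform weights)
print(weighted_quantile(scores, temporal_weights(8, 4.0), 0.6))  # 1 (recent samples dominate)
```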
- class twiga.core.config.NeuralModelConfig(**data)
Bases: BaseModelConfig
Configuration for neural network-based forecasting models.
Extends BaseModelConfig with training infrastructure fields and a shared three-dict HPO system for optimizer, scheduler, and batch-size search. See the module docstring for a full explanation of the search space design.
The optimizer and scheduler are selected via optimizer_type and lr_scheduler_type. Both are captured by save_hyperparameters() in BaseNeuralModel at training time, so they must be declared as fields here.
Optional fine-grained overrides can be supplied via optimizer_params and scheduler_params. When provided, they are merged into the corresponding entry of BaseNeuralModel.OPTIMIZERS / BaseNeuralModel.SCHEDULERS, allowing partial overrides (e.g. only lr) without replacing the full dict.
- Parameters:
name (Literal["neural_model"], optional) – Model type identifier. Defaults to "neural_model".
domain (Literal["nn"], optional) – Modelling domain identifier. Defaults to "nn".
rich_progress_bar (bool, optional) – Enable rich progress bars. Defaults to True.
drop_last (bool, optional) – Drop the last incomplete batch. Defaults to True.
num_workers (int, optional) – DataLoader worker count. Defaults to 8.
batch_size (int, optional) – Training batch size. Defaults to 64.
pin_memory (bool, optional) – Pin memory for faster GPU transfer. Defaults to True.
max_epochs (int, optional) – Maximum training epochs. Defaults to 10.
early_stop_patience (int | None, optional) – Early-stopping patience in epochs. None disables early stopping. Defaults to 10.
resume_training (bool, optional) – Resume from the last checkpoint. Defaults to True.
seed (int, optional) – Positive integer random seed. Defaults to 42.
metric (Literal["mae", "mse", "smape"], optional) – Validation metric. Defaults to "mae".
optimizer_type (Literal[...], optional) – Native torch.optim optimizer. Defaults to "adamw".
lr_scheduler_type (Literal[...], optional) – Native torch.optim.lr_scheduler class. Defaults to "multi_step".
optimizer_params (dict | None, optional) – Partial override for the selected optimizer's default params. Defaults to None.
scheduler_params (dict | None, optional) – Partial override for the selected scheduler's default params. Defaults to None.
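The partial-override behaviour of optimizer_params can be pictured as a dictionary merge. The default values in OPTIMIZERS below are placeholders, not the entries of BaseNeuralModel.OPTIMIZERS, and resolve_optimizer_params is a hypothetical helper; scheduler_params would follow the same pattern.

```python
# Placeholder defaults: the real BaseNeuralModel.OPTIMIZERS entries may differ.
OPTIMIZERS = {
    "adam": {"lr": 1e-3, "weight_decay": 0.0},
    "adamw": {"lr": 1e-3, "weight_decay": 1e-2},
}


def resolve_optimizer_params(optimizer_type, optimizer_params=None):
    """Merge a partial override into the selected optimizer's default params."""
    params = dict(OPTIMIZERS[optimizer_type])  # copy, so the shared defaults stay untouched
    params.update(optimizer_params or {})      # override only the keys that were supplied
    return params


print(resolve_optimizer_params("adamw", {"lr": 3e-4}))
# {'lr': 0.0003, 'weight_decay': 0.01}
```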
- BASE_TRAINING_SEARCH_SPACE: ClassVar[BaseSearchSpace] = BaseSearchSpace(optimizer_type=['adam', 'adamw'], lr_scheduler_type=['warmup_cosine', 'multi_step', 'reduce_on_plateau'], batch_size=[8, 16, 32, 64])
- OPTIMIZER_PARAM_SEARCH: ClassVar[dict[str, BaseSearchSpace]] = {'adam': BaseSearchSpace(lr=(0.0001, 0.01), weight_decay=(1e-07, 0.0001)), 'adamw': BaseSearchSpace(lr=(0.0001, 0.01), weight_decay=(1e-06, 0.001)), 'muon': BaseSearchSpace(lr=(0.001, 0.1), momentum=(0.9, 0.99), ns_steps=[4, 6, 8])}
- SCHEDULER_PARAM_SEARCH: ClassVar[dict[str, BaseSearchSpace]] = {'multi_step': BaseSearchSpace(prob_decay_1=(0.3, 0.6), prob_decay_2=(0.7, 0.95), gamma=[0.1, 0.2, 0.5]), 'reduce_on_plateau': BaseSearchSpace(factor=[0.1, 0.2, 0.5], prob_patience=(0.05, 0.2)), 'warmup_cosine': BaseSearchSpace(warmup_epochs=[3, 5, 10], eta_min=(1e-07, 1e-05))}
- batch_size: int
- domain: Literal['nn']
- drop_last: bool
- classmethod from_data_config(data_config, **kwargs)
Create a config instance with dimensions derived from a DataPipelineConfig.
- Parameters:
data_config (DataPipelineConfig) – Pipeline config providing feature counts and sequence dimensions.
**kwargs – Additional fields forwarded to the constructor, allowing any field to be overridden at instantiation time.
- Returns:
NeuralModelConfig – Populated config instance.
- Raises:
TypeError – If data_config.target_feature is not str or list[str].
AttributeError – If data_config is missing forecast_horizon.
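The dimension auto-population mentioned for TwigaForecaster's model_params, and the derivation performed by from_data_config, can be sketched with dataclasses. PipelineDims and NeuralDims are hypothetical stand-ins covering only the three dimensions named in this reference; treating 0 as "unset" follows the TwigaForecaster description, and the TypeError mirrors the documented one.

```python
from dataclasses import dataclass, replace


@dataclass
class PipelineDims:  # hypothetical stand-in for DataPipelineConfig
    target_feature: object
    forecast_horizon: int
    lookback_window_size: int


@dataclass
class NeuralDims:  # hypothetical stand-in for an NN model config
    num_target_feature: int = 0
    forecast_horizon: int = 0
    lookback_window_size: int = 0


def count_targets(target_feature):
    if isinstance(target_feature, str):
        return 1
    if isinstance(target_feature, list) and all(isinstance(t, str) for t in target_feature):
        return len(target_feature)
    raise TypeError("target_feature must be str or list[str]")


def fill_unset_dims(model_cfg, data_cfg):
    """Populate only the dimension fields that are still 0 (unset)."""
    derived = {
        "num_target_feature": count_targets(data_cfg.target_feature),
        "forecast_horizon": data_cfg.forecast_horizon,
        "lookback_window_size": data_cfg.lookback_window_size,
    }
    updates = {k: v for k, v in derived.items() if getattr(model_cfg, k) == 0}
    return replace(model_cfg, **updates)


data = PipelineDims(target_feature=["load", "pv"], forecast_horizon=48, lookback_window_size=96)
print(fill_unset_dims(NeuralDims(), data))
# NeuralDims(num_target_feature=2, forecast_horizon=48, lookback_window_size=96)
print(fill_unset_dims(NeuralDims(lookback_window_size=192), data))
# NeuralDims(num_target_feature=2, forecast_horizon=48, lookback_window_size=192)
```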
- get_optuna_params(trial)
Standard HPO sampling for all neural models.
Combines child-specific architecture parameters with the standardized conditional optimizer and scheduler search space.
- Return type:
- lr_scheduler_type: Literal['step', 'multi_step', 'multiplicative', 'exponential', 'constant', 'linear_decay', 'polynomial', 'cosine_annealing', 'cosine_annealing_lr', 'cyclic', 'reduce_on_plateau', 'one_cycle', 'warmup_multi_step', 'warmup_cosine']
- max_epochs: int
- metric: Literal['mae', 'mse', 'smape']
- model_config: ClassVar[ConfigDict] = {'extra': 'allow'}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- name: Literal['neural_model']
- num_workers: int
- optimizer_type: Literal['adam', 'adamw', 'nadam', 'radam', 'adamax', 'adafactor', 'adagrad', 'adadelta', 'rmsprop', 'rprop', 'asgd', 'sgd', 'muon']
- pin_memory: bool
- resume_training: bool
- rich_progress_bar: bool
- classmethod sample_training_params(trial)
Sample optimizer, scheduler, and batch-size using BaseSearchSpace logic.
- Return type:
- seed: int
- class twiga.core.config.BaseSearchSpace(**data)
Bases: BaseModel
Pydantic model for validating hyperparameter optimisation search spaces.
Each field must be either:
A tuple[float, float] or tuple[int, int] representing a continuous range (low, high). Float ranges spanning at least one order of magnitude (high / low >= 10) are sampled on a log scale automatically.
Or a list of at least one categorical value.
The class uses extra="allow" so that concrete search spaces can be defined inline without subclassing:
space = BaseSearchSpace(
    latent_size=[64, 128, 256],
    dropout=(0.0, 0.5),
)
- Parameters:
**kwargs – Any keyword argument whose value is a valid range tuple or categorical list.
Examples
>>> space = BaseSearchSpace(lr=(1e-4, 1e-2), activation=["relu", "tanh"])
>>> params = space.get_optuna_params(trial, prefix="mlp")
- get_optuna_params(trial, prefix='')
Generate Optuna parameter suggestions for all fields.
- Parameters:
trial (Trial) – Active Optuna trial.
prefix (str) – Prefix prepended to each parameter name in the trial (e.g. the model name) to avoid collisions when multiple search spaces are sampled in the same trial. Defaults to "".
- Return type:
- Returns:
dict[str, Any] – Mapping of field names (without prefix) to their sampled values.
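The field rules of BaseSearchSpace map naturally onto Optuna's trial.suggest_categorical, trial.suggest_int, and trial.suggest_float(..., log=...). The sketch assumes prefixed names are joined as f"{prefix}_{field}", which may not match twiga's naming, and suggest_from_space is a hypothetical helper. A recording stub replaces a real optuna Trial so the example runs without Optuna installed.

```python
def suggest_from_space(trial, space, prefix=""):
    """Sample each field of a search-space dict with Optuna-style trial methods."""
    params = {}
    for field, spec in space.items():
        name = f"{prefix}_{field}" if prefix else field  # assumed naming scheme
        if isinstance(spec, list):
            params[field] = trial.suggest_categorical(name, spec)
        elif isinstance(spec, tuple) and len(spec) == 2:
            low, high = spec
            if isinstance(low, int) and isinstance(high, int):
                params[field] = trial.suggest_int(name, low, high)
            else:
                use_log = low > 0 and high / low >= 10  # documented log-scale rule
                params[field] = trial.suggest_float(name, float(low), float(high), log=use_log)
        else:
            raise ValueError(f"invalid search-space entry for {field!r}: {spec!r}")
    return params  # keys are field names without the prefix


class RecordingTrial:
    """Minimal stand-in exposing the optuna.trial.Trial method names used above."""

    def __init__(self):
        self.calls = []

    def suggest_categorical(self, name, choices):
        self.calls.append((name, "categorical"))
        return choices[0]

    def suggest_int(self, name, low, high):
        self.calls.append((name, "int"))
        return low

    def suggest_float(self, name, low, high, log=False):
        self.calls.append((name, "float-log" if log else "float"))
        return low


space = {"lr": (1e-4, 1e-2), "dropout": (0.0, 0.5), "latent_size": [64, 128, 256], "num_layers": (1, 4)}
trial = RecordingTrial()
print(suggest_from_space(trial, space, prefix="mlp"))
print(trial.calls)
```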
- model_config: ClassVar[ConfigDict] = {'extra': 'allow'}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- validate_against(config)
Raise ValueError if any search space field name is not present on config.
Catches typos in search space definitions early - before an Optuna trial is run - so that mis-spelled field names produce a clear error instead of silently sampling a parameter that never gets applied.
- Parameters:
config (BaseModel) – The model config instance (or class) whose fields define the valid parameter names.
- Raises:
ValueError – If one or more field names in this search space do not exist on config.
- Return type:
Examples
>>> space = BaseSearchSpace(hiddn_dim=[64, 128])  # typo!
>>> space.validate_against(my_model_config)
Traceback (most recent call last):
    ...
ValueError: Search space contains unknown fields: {'hiddn_dim'}. ...
- validate_search_space()
Validate all fields have valid types and structure.
- Return type:
Registry#
- twiga.forecaster.registry.get_model(name, domain=None)
Lazily load the model and config classes from models/ml/ or models/nn/.
- Parameters:
- Return type:
- Returns:
tuple[Type, Type] – A tuple of (model_class, config_class).
- Raises:
ValueError – If the model is not found in the specified or default domains.
Evaluation#
- twiga.core.metrics.point.evaluate_point_forecast(result, metric_names=None, axis=1)
Evaluate point forecasts by computing daily pointwise metrics.
- Parameters:
result (ForecastResult) – ForecastResult with ground_truth set and kind=ForecastKind.POINT.
metric_names (list[str] | None) – Metric names to compute. When None, all supported point metrics are computed.
axis (int | None) – Axis along which to compute aggregate metrics. If None, metrics that require an axis use their default behavior.
- Return type:
- Returns:
DataFrame of per-day, per-target metrics indexed by daily timestamp.
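The per-day aggregation can be illustrated for a single metric (MAE) on flat arrays. The sketch assumes a (n_steps, n_targets) layout with one timestamp per row and uses pandas resampling to group by calendar day; daily_mae is a hypothetical helper, whereas evaluate_point_forecast() operates on a ForecastResult and computes more metrics.

```python
import numpy as np
import pandas as pd


def daily_mae(timestamps, y_true, y_pred, targets):
    """Per-day, per-target MAE from flat (n_steps, n_targets) arrays."""
    errors = pd.DataFrame(
        np.abs(y_true - y_pred),
        index=pd.DatetimeIndex(timestamps),
        columns=targets,
    )
    return errors.resample("D").mean()  # one row per calendar day


ts = pd.date_range("2024-01-01", periods=48, freq="60min")  # two days, hourly
y_true = np.ones((48, 1))
y_pred = np.vstack([np.full((24, 1), 1.5), np.full((24, 1), 0.0)])
print(daily_mae(ts, y_true, y_pred, ["load"]))  # day 1: 0.5, day 2: 1.0
```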
- twiga.core.metrics.interval.evaluate_interval_forecast(result, alpha=0.01, true_nmpi=None, spread='std', nmpi_scale='range', axis=1, metric_names=None)
Evaluate interval forecasts by computing daily point and interval metrics.
- Parameters:
result (ForecastResult) – ForecastResult with ground_truth, lower, and upper set and kind=ForecastKind.INTERVAL.
alpha (float) – Significance level used for Winkler score and coverage computations. Must be in (0, 1). Defaults to 0.01.
true_nmpi (float | None) – Override for κ, the absolute spread of the target used as the CWE reference numerator. When None, κ is derived from spread.
spread (Literal['iqr', 'mad', 'std']) – Spread measure for the CWE reference κ: "iqr", "mad", or "std". Defaults to "std". See get_interval_metrics().
nmpi_scale (Literal['range', 'max', 'mean', 'median']) – Denominator R for NMPI and κ/R: "range" (default), "max", "mean", or "median".
axis (int | None) – Axis along which to compute aggregate metrics.
metric_names (list[str] | None) – List of interval metric names to compute.
- Return type:
- Returns:
DataFrame of per-day, per-target point and interval metrics indexed by daily timestamp.
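Coverage and the Winkler score used by evaluate_interval_forecast() have standard textbook definitions, sketched below with NumPy. These helpers are illustrative; twiga's implementation additionally aggregates per day and computes metrics (such as NMPI and CWE) that are not reproduced here.

```python
import numpy as np


def coverage(y, lower, upper):
    """Fraction of observations that fall inside [lower, upper]."""
    return float(np.mean((y >= lower) & (y <= upper)))


def winkler_score(y, lower, upper, alpha):
    """Mean Winkler score: interval width plus 2/alpha times any miss distance."""
    width = upper - lower
    below = np.where(y < lower, (2.0 / alpha) * (lower - y), 0.0)
    above = np.where(y > upper, (2.0 / alpha) * (y - upper), 0.0)
    return float(np.mean(width + below + above))


y = np.array([1.0, 2.0, 3.0, 4.0])
lower = np.array([0.5, 2.5, 2.0, 3.0])
upper = np.array([1.5, 3.5, 4.0, 5.0])
print(coverage(y, lower, upper))                  # 0.75 (the second point misses)
print(winkler_score(y, lower, upper, alpha=0.1))  # 4.0
```

The miss of 0.5 below the second interval costs 20 × 0.5 = 10 on top of its width, which is why a small alpha punishes misses so heavily.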
- twiga.core.metrics.quantile.tail_pinball_score(true, quantile_preds, quantile_levels, tail_taus=(0.05, 0.1, 0.9, 0.95), quantile_axis=None, axis=None)
Mean pinball loss restricted to tail quantile levels only.
Standard pinball loss averages uniformly over all quantile levels, so interior levels (near the median) dominate the aggregate and mask tail skill differences. This function selects only the subset of levels closest to tail_taus and computes the pinball loss on those alone, giving a metric that is directly sensitive to the quality of tail-concentrated evaluation grids (e.g. the Kumaraswamy proposal) relative to uniform ones.
For N=9 quantiles, the four default targets (0.05, 0.10, 0.90, 0.95) are approximated by the nearest available levels. If the available levels do not reach below 0.10 or above 0.90, the nearest boundary level is used and a warning is logged.
- Parameters:
true (ndarray) – Ground truth values, any shape.
quantile_preds (ndarray) – Predicted quantiles with one axis holding the quantile dimension.
quantile_levels (ndarray | list[float] | tuple[float, ...]) – 1-D array of quantile levels in (0, 1).
tail_taus (tuple[float, ...]) – Target tail levels. Each entry maps to the nearest available level in quantile_levels.
quantile_axis (int | None) – Axis of quantile_preds holding quantiles. Inferred if None.
axis (int | None) – Aggregation axis over sample dimensions. Returns a scalar when None.
- Return type:
- Returns:
Scalar tail pinball loss if axis=None, otherwise an array.
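The nearest-level selection can be sketched with NumPy. This version assumes the quantile dimension is axis 0 of quantile_preds and counts a level only once when several tail_taus map to it; both are assumptions about details this reference does not specify, and tail_pinball is a hypothetical helper.

```python
import numpy as np


def pinball(y, q, tau):
    """Pinball (quantile) loss for a single level tau."""
    diff = y - q
    return np.maximum(tau * diff, (tau - 1) * diff)


def tail_pinball(y, quantile_preds, levels, tail_taus=(0.05, 0.1, 0.9, 0.95)):
    """Mean pinball loss over the levels nearest to tail_taus (quantiles on axis 0)."""
    levels = np.asarray(levels, dtype=float)
    # Nearest available level per tail target; duplicates are counted once
    idx = np.unique([int(np.argmin(np.abs(levels - t))) for t in tail_taus])
    losses = [pinball(y, quantile_preds[i], levels[i]).mean() for i in idx]
    return float(np.mean(losses)), levels[idx]


levels = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]  # N=9 uniform grid
score, used = tail_pinball(np.array([0.0, 1.0]), np.zeros((9, 2)), levels)
print(used)   # [0.1 0.9]: 0.05 and 0.10 both map to 0.1, 0.90 and 0.95 to 0.9
print(score)  # approximately 0.25
```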
Forecast Results (experimental)#
- class twiga.forecaster.result.ForecastResult(timestamps, loc, targets, model_name, kind, ground_truth=None, scale=None, quantiles=None, quantile_levels=None, conf_level=None, samples=None, lower=None, upper=None, inference_time=0.0)
Bases: object
Container for one model’s forecast output.
- Variables:
timestamps – shape (n_batch, n_horizon, n_targets)
loc – point predictions (mean/median), shape (n_batch, n_horizon, n_targets)
targets – ordered list of target variable names
model_name – human-readable model identifier
kind – determines which optional arrays are expected and how to convert
ground_truth – optional, same shape as loc
scale – parametric std-dev / scale, same shape as loc
quantiles – shape (n_batch, n_q, n_horizon, n_targets)
quantile_levels – corresponding probability levels (e.g. [0.1, 0.5, 0.9])
samples – shape (n_batch, n_samples, n_horizon, n_targets)
lower – lower bound, same shape as loc
upper – upper bound, same shape as loc
inference_time – inference duration in seconds
conf_level
metric_name
- evaluate(ground_truth=None, **kwargs)
Evaluate forecast against ground truth using kind-appropriate metrics.
Forwards to twiga.core.metrics.evaluate_forecast().
- Parameters:
- Return type:
- Returns:
DataFrame of per-day, per-target metrics.
- Raises:
ValueError – if no ground truth is available.
- inference_time: float = 0.0
- kind: ForecastKind
- loc: ndarray
- model_name: str
- timestamps: ndarray
- to_dataframe(fmt='long')
Convert forecast to tidy DataFrame.
Always includes: timestamp, target, model, forecast. Optional: actual (when ground_truth is present).
Additional columns depend on forecast kind:
POINT: no extra columns
PARAMETRIC: scale
INTERVAL: lower, upper
QUANTILE (fmt="wide"): q_0.10, q_0.50, …
QUANTILE (fmt="long"): q_level, quantile_forecast
SAMPLES: q_0.10, q_0.50, q_0.90 (empirical quantiles)
- Parameters:
fmt (str) – "long" (default) or "wide"; only affects QUANTILE.
- Return type:
- Returns:
pandas DataFrame in long or wide format
- Raises:
ValueError – if fmt is invalid
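For the POINT kind, the long format amounts to flattening the (n_batch, n_horizon, n_targets) arrays into one row per value. The sketch assumes C-order flattening, so target names repeat fastest; point_to_long is a hypothetical helper and covers none of the kind-specific extra columns.

```python
import numpy as np
import pandas as pd


def point_to_long(timestamps, loc, targets, model_name, ground_truth=None):
    """Flatten (n_batch, n_horizon, n_targets) arrays into one row per value."""
    n_targets = loc.shape[-1]
    frame = pd.DataFrame({
        "timestamp": np.asarray(timestamps).reshape(-1),
        "target": np.tile(targets, loc.size // n_targets),  # last axis varies fastest
        "model": model_name,
        "forecast": loc.reshape(-1),
    })
    if ground_truth is not None:
        frame["actual"] = ground_truth.reshape(-1)
    return frame


ts = pd.date_range("2024-01-01", periods=2, freq="60min").values
timestamps = np.broadcast_to(ts.reshape(1, 2, 1), (1, 2, 2))  # same shape as loc
loc = np.array([[[1.0, 10.0], [2.0, 20.0]]])  # 1 batch, 2 steps, 2 targets
print(point_to_long(timestamps, loc, ["load", "pv"], "catboost"))
```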
- class twiga.forecaster.result.ForecastCollection(results=<factory>)
Bases: object
Collection of ForecastResult objects from multiple models.
- add(result)
Add or replace result using its model_name as key.
- Return type:
- evaluate(**kwargs)
Evaluate all models and return a combined metrics DataFrame.
Calls ForecastResult.evaluate() on each result and concatenates the output, adding a "Model" column derived from each result’s model_name. Ground truth must be attached to each result (i.e. forecast() must have been called with test data that contains the target column).
- Parameters:
**kwargs – Forwarded to each ForecastResult.evaluate() call (e.g. metric_names, freq).
- Return type:
DataFrame
- Returns:
Combined metrics DataFrame with a "Model" column.
- Raises:
ValueError – If the collection is empty or any result lacks ground truth.
- results: dict[str, ForecastResult]
- to_dataframe(fmt='long')
Concatenate all model forecasts into one DataFrame.
- Parameters:
fmt (str) – passed to each ForecastResult.to_dataframe() call.
- Return type:
DataFrame
- Returns:
Combined DataFrame in the requested format (long by default)
- Raises:
ValueError – if collection is empty
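The keyed add/replace behaviour and the "Model" tagging done by evaluate() can be illustrated with small stand-ins. ResultStub and CollectionStub are hypothetical classes written for this sketch; lists of metric dicts replace the DataFrames that twiga returns.

```python
from dataclasses import dataclass, field


@dataclass
class ResultStub:
    """Hypothetical stand-in for ForecastResult: only model_name and metric rows."""
    model_name: str
    metric_rows: list  # stands in for the DataFrame that evaluate() returns


@dataclass
class CollectionStub:
    """Hypothetical stand-in for ForecastCollection."""
    results: dict = field(default_factory=dict)

    def add(self, result):
        # Keyed by model_name: a second result with the same name replaces the first.
        self.results[result.model_name] = result

    def evaluate(self):
        if not self.results:
            raise ValueError("collection is empty")
        combined = []
        for name, result in self.results.items():
            # Tag every metric row with the model that produced it.
            combined.extend({**row, "Model": name} for row in result.metric_rows)
        return combined


collection = CollectionStub()
collection.add(ResultStub("lgbm", [{"mae": 1.0}]))
collection.add(ResultStub("mlp", [{"mae": 2.0}]))
collection.add(ResultStub("lgbm", [{"mae": 0.5}]))  # replaces the first "lgbm" result
metrics = collection.evaluate()
```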
- class twiga.forecaster.result.ForecastKind(*values)
Bases:
StrEnum
Supported forecast output types.
Values are strings and can be used directly as dict keys.
- INTERVAL = 'interval'
- PARAMETRIC = 'parametric'
- POINT = 'point'
- QUANTILE = 'quantile'
- SAMPLES = 'samples'
Ensemble (experimental)
- twiga.forecaster.ensemble.compute_ensemble_predictions(predictions, model_names, ensemble_strategy, ensemble_weights=None)
Generate ensemble predictions by combining predictions from multiple models.
- Parameters:
predictions (list[ndarray]) – List of model predictions, where each prediction is a 3D NumPy array with shape (num_samples, horizon, num_targets).
model_names (list[str]) – List of model names corresponding to the predictions.
ensemble_strategy (EnsembleStrategy) – Strategy for combining predictions: EnsembleStrategy.MEAN, EnsembleStrategy.MEDIAN, or EnsembleStrategy.WEIGHTED.
ensemble_weights (dict[str, float] | None) – Dictionary mapping model names to their weights for the weighted ensemble strategy. Required if ensemble_strategy is EnsembleStrategy.WEIGHTED. Defaults to None.
- Return type:
ndarray
- Returns:
A 3D NumPy array of ensemble predictions with shape (num_samples, horizon, num_targets).
- Raises:
ValueError – If predictions is empty, prediction shapes are inconsistent, weights are required but not provided, the number of weights does not match the number of models, or the ensemble strategy is unknown.
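The three strategies reduce to element-wise statistics across models at every (sample, step, target) position. The sketch below reproduces that idea with nested lists instead of NumPy arrays so it stays dependency-free; Strategy and ensemble_sketch are hypothetical names, and dividing by the sum of the weights (so they need not sum to 1) is an assumption about how twiga normalises weights.

```python
import statistics
from enum import Enum


class Strategy(str, Enum):
    """Hypothetical stand-in for EnsembleStrategy."""
    MEAN = "mean"
    MEDIAN = "median"
    WEIGHTED = "weighted"


def shape_of(pred):
    # (num_samples, horizon, num_targets) of a nested list
    return (len(pred), len(pred[0]), len(pred[0][0]))


def ensemble_sketch(predictions, model_names, strategy, weights=None):
    if not predictions:
        raise ValueError("predictions is empty")
    if len({shape_of(p) for p in predictions}) != 1:
        raise ValueError("prediction shapes are inconsistent")
    if strategy == Strategy.MEAN:
        combine = statistics.fmean
    elif strategy == Strategy.MEDIAN:
        combine = statistics.median
    elif strategy == Strategy.WEIGHTED:
        if weights is None:
            raise ValueError("weights are required for the weighted strategy")
        if len(weights) != len(model_names):
            raise ValueError("number of weights does not match number of models")
        w = [weights[name] for name in model_names]

        def combine(values):
            # Weighted mean; dividing by sum(w) is an assumption, see above.
            return sum(wi * v for wi, v in zip(w, values)) / sum(w)
    else:
        raise ValueError(f"unknown ensemble strategy: {strategy!r}")
    n, h, t = shape_of(predictions[0])
    # Combine across models at every (sample, step, target) position.
    return [[[combine([p[i][j][k] for p in predictions]) for k in range(t)]
             for j in range(h)]
            for i in range(n)]


a = [[[1.0], [2.0]]]   # shape (1, 2, 1)
b = [[[3.0], [6.0]]]
c = [[[10.0], [0.0]]]
mean_pred = ensemble_sketch([a, b], ["a", "b"], Strategy.MEAN)
weighted_pred = ensemble_sketch([a, b], ["a", "b"], Strategy.WEIGHTED, {"a": 3.0, "b": 1.0})
median_pred = ensemble_sketch([a, b, c], ["a", "b", "c"], Strategy.MEDIAN)
```

The median is robust to one outlying model (c above), while the weighted mean lets a stronger model dominate.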
MLOps core (twiga.mlops)
Streamlit-free building blocks for workspace orchestration, dataset transforms,
storage, capture-data access, and the training model registry. The Streamlit
dashboard built on top of these ships as a demo under examples/mlops/, not in
the installed wheel.
- twiga.mlops.workspace.create_workspace(*, name, raw_df, dataset_filename, data_setup, catalog, workspaces_dir=PosixPath('mlops_demo/workspaces'))
Create a new workspace and persist its dataset. Returns the slug.
Raises WorkspaceNameTakenError if name collides. Does not load the workspace: callers that want the splits hydrated should follow with load_workspace_data().
Order is: write dataset.parquet first, then the catalog INSERT. A crash between the two leaves an orphan folder that the next attempt can safely overwrite, but never a catalog row without a dataset.
- Return type:
str
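The write-then-register ordering can be demonstrated with the standard library. The sketch below is hypothetical: NameTakenError and create_workspace_sketch are illustration-only names, the table has just slug and name columns, the payload bytes are a placeholder rather than real parquet, and the collision is detected purely by a UNIQUE constraint (twiga may check the name earlier).

```python
import sqlite3
import tempfile
from pathlib import Path


class NameTakenError(Exception):
    """Hypothetical stand-in for WorkspaceNameTakenError."""


def create_workspace_sketch(conn, workspaces_dir, name, slug, dataset_bytes):
    # Step 1: persist the dataset. A crash after this point leaves only an
    # orphan folder, which a retry simply overwrites.
    folder = Path(workspaces_dir) / slug
    folder.mkdir(parents=True, exist_ok=True)
    (folder / "dataset.parquet").write_bytes(dataset_bytes)
    # Step 2: register the workspace. UNIQUE(name) rejects collisions.
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO workspaces (slug, name) VALUES (?, ?)", (slug, name))
    except sqlite3.IntegrityError as exc:
        raise NameTakenError(name) from exc
    return slug


with tempfile.TemporaryDirectory() as tmp:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE workspaces (slug TEXT PRIMARY KEY, name TEXT UNIQUE)")
    first = create_workspace_sketch(conn, tmp, "Demo", "demo-1", b"placeholder")
    try:
        create_workspace_sketch(conn, tmp, "Demo", "demo-2", b"placeholder")
        collided = False
    except NameTakenError:
        collided = True
    # The failed attempt leaves a dataset folder but no catalog row.
    orphan_left = (Path(tmp) / "demo-2" / "dataset.parquet").exists()
    rows = conn.execute("SELECT slug FROM workspaces").fetchall()
    conn.close()
```

The reverse order would risk a catalog row pointing at a dataset that was never written, which is the failure mode the documented ordering rules out.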
- twiga.mlops.workspace.load_workspace_data(slug, catalog)
Load workspace slug: set MLflow tracking, rebuild the splits, and return everything.
Raises WorkspaceNotFoundError if the slug is unknown, or WorkspaceArtifactMissingError if the dataset is gone. No session state is touched.
- Return type:
LoadedWorkspace
- class twiga.mlops.workspace.LoadedWorkspace(slug, train_df, test_df, data_config, train_config, setup, dataset_filename, pipeline_state)
Bases:
object
Everything a caller needs to hydrate after loading a workspace.
- data_config: DataPipelineConfig
- dataset_filename: str
- pipeline_state: dict
- setup: dict
- slug: str
- test_df: DataFrame
- train_config: ExperimentConfig
- train_df: DataFrame
- twiga.mlops.workspace.list_workspaces(catalog)
- Return type:
list[WorkspaceSummary]
- twiga.mlops.data.parse_dataset(file_or_path)
Read a parquet or CSV file and return the raw DataFrame.
Dispatches on the filename extension. No column filtering, no datetime coercion, no splitting: just bytes in, frame out.
- Return type:
DataFrame
- twiga.mlops.data.split_raw_frame(raw_df, *, timestamp_col, target_col, exog_cols, train_cutoff, test_start)
Filter to the selected columns, normalise timestamps, and split on the cutoffs.
Returns (train_df, test_df). Both frames have a timestamp column (renamed from timestamp_col if needed), are TZ-naive, and are de-duplicated.
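The filter, normalise, de-duplicate, split sequence can be sketched with plain dicts. This is a hypothetical stand-in rather than twiga's code, and several behaviours are assumptions: aware timestamps are converted to UTC before the zone is dropped, a later duplicate timestamp overwrites an earlier one, train keeps rows strictly before train_cutoff, and test keeps rows at or after test_start.

```python
from datetime import datetime, timezone


def split_rows(rows, *, timestamp_col, target_col, exog_cols, train_cutoff, test_start):
    keep = [target_col, *exog_cols]
    by_ts = {}
    for row in rows:
        ts = row[timestamp_col]
        if ts.tzinfo is not None:
            # Assumption: convert aware timestamps to UTC, then drop the zone.
            ts = ts.astimezone(timezone.utc).replace(tzinfo=None)
        # Rename the timestamp column and drop unselected columns;
        # a later duplicate timestamp overwrites an earlier one (assumption).
        by_ts[ts] = {"timestamp": ts, **{c: row[c] for c in keep}}
    ordered = [by_ts[ts] for ts in sorted(by_ts)]
    train = [r for r in ordered if r["timestamp"] < train_cutoff]
    test = [r for r in ordered if r["timestamp"] >= test_start]
    return train, test


utc = timezone.utc
raw = [
    {"ts": datetime(2024, 1, 1, tzinfo=utc), "load": 1.0, "temp": 5, "notes": "a"},
    {"ts": datetime(2024, 1, 1, tzinfo=utc), "load": 1.5, "temp": 5, "notes": "b"},
    {"ts": datetime(2024, 1, 2, tzinfo=utc), "load": 2.0, "temp": 6, "notes": "c"},
    {"ts": datetime(2024, 1, 3, tzinfo=utc), "load": 3.0, "temp": 7, "notes": "d"},
]
train, test = split_rows(raw, timestamp_col="ts", target_col="load", exog_cols=["temp"],
                         train_cutoff=datetime(2024, 1, 2), test_start=datetime(2024, 1, 2))
```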
- twiga.mlops.data.build_configs_from_setup(setup, *, checkpoints_path)
Build the DataPipelineConfig + ExperimentConfig pair from a setup dict. checkpoints_path is workspace-scoped and supplied by the caller; this module never resolves filesystem paths on its own.
- Return type:
- class twiga.mlops.catalog.Catalog(db_path)
Bases:
object
SQLite-backed workspace catalog.
- close()
- Return type:
- delete(slug)
- Return type:
- get(slug)
- Return type:
WorkspaceRow|None
- insert(row)
- Return type:
- list_all()
- Return type:
list[WorkspaceRow]
- name_exists(name)
- Return type:
- rename(slug, new_name)
- Return type:
- touch_last_opened(slug)
- Return type:
- update_data_setup(slug, setup, *, dataset_filename=None, dataset_hash=None)
- Return type:
- update_pipeline_state(slug, state)
- Return type:
- class twiga.mlops.catalog.WorkspaceRow(slug, name, storage_root, tracking_uri, dataset_filename, dataset_hash, data_setup, model_config, seasonal_config, pipeline_state, created_at, updated_at, last_opened_at)
Bases:
object
- created_at: datetime
- data_setup: dict
- last_opened_at: datetime
- model_config: dict
- name: str
- pipeline_state: dict
- seasonal_config: dict
- slug: str
- storage_root: str
- tracking_uri: str
- updated_at: datetime
- exception twiga.mlops.catalog.WorkspaceNameTakenError
Bases:
Exception
Raised when an INSERT collides with the unique name constraint.
- exception twiga.mlops.catalog.WorkspaceNotFoundError
Bases:
Exception
Raised when a slug lookup misses.
- class twiga.mlops.storage.LocalFsStorage(root)
Bases:
object
Filesystem-backed workspace storage.
Layout under root:
    dataset.parquet
    mlruns.db
    monitoring_config.json
    checkpoints/
    reports/
    capture/
        features/          # JSONL per day: feature rows seen at /predict
        predictions/       # JSONL per day: per-model predicted values
        actuals/           # JSONL per day: ground truth submitted via /actuals
    monitoring/
        reports/           # Evidently HTML/JSON per scheduled or manual run
        runs_index.jsonl   # one row per run with paths + headline metrics
- actuals_dir()
- Return type:
- capture_dir()
- Return type:
- checkpoints_dir()
- Return type:
- dataset_path()
- Return type:
- ensure_initialized()
- Return type:
- features_dir()
- Return type:
- mlruns_db_path()
- Return type:
- monitoring_config_path()
- Return type:
- monitoring_dir()
- Return type:
- monitoring_reports_dir()
- Return type:
- monitoring_runs_index_path()
- Return type:
- predictions_dir()
- Return type:
- read_dataset()
- Return type:
- reports_dir()
- Return type:
- property root: Path
- write_dataset(raw_df)
- Return type:
- twiga.mlops.storage.local_tracking_uri_for(storage)
Return the SQLite tracking URI for a local workspace.
- Return type:
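MLflow accepts SQLite backends as SQLAlchemy-style URIs of the form sqlite:///<path>. The hypothetical helper below assumes the database lives at <root>/mlruns.db, as in the LocalFsStorage layout, and produces an absolute POSIX path; twiga's own local_tracking_uri_for may build the URI differently.

```python
import os.path


def sqlite_tracking_uri(storage_root):
    """Hypothetical helper: <root>/mlruns.db as an absolute SQLite URI."""
    db_path = os.path.join(os.path.abspath(storage_root), "mlruns.db")
    # "sqlite:///" followed by an absolute POSIX path gives four slashes in total.
    return "sqlite:///" + db_path


uri = sqlite_tracking_uri("/srv/workspaces/demo")
```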
- twiga.mlops.mlflow_query.list_runs(experiment=None, max_results=200)
Return a tidy DataFrame of runs for display on the Experiments page.
- Parameters:
- Return type:
DataFrame
- Returns:
A DataFrame with the columns in DISPLAY_COLUMNS. Missing columns are filled with empty strings or NaN so the table never crashes on a fresh tracking store.
- twiga.mlops.mlflow_query.latest_run()
Return a dict summarising the most recent run, or None if no runs exist.
- twiga.mlops.monitoring.read_predictions(storage, *, window_start, version_id=None)
- Return type:
- twiga.mlops.monitoring.read_actuals(storage, *, window_start)
- Return type:
- twiga.mlops.monitoring.predictions_vs_actuals(storage, *, start, end, version_id=None)
Long-form chart frame with one row per (timestamp, series, value).
series is either "actual" or a model name. Predictions are deduped on (timestamp, model, target), keeping the latest received_at; actuals are deduped on timestamp. The result is sorted by timestamp ascending, ready to feed into a lets_plot geom_line with color=series.
- Return type:
DataFrame
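The dedupe rules can be made concrete with plain dicts for a single target. chart_rows and its record keys are hypothetical; which actual survives when a timestamp is uploaded twice is not documented, so the sketch assumes the last record in log order wins.

```python
def chart_rows(predictions, actuals):
    # Keep only the latest prediction per (timestamp, model, target).
    latest = {}
    for p in predictions:
        key = (p["timestamp"], p["model"], p["target"])
        if key not in latest or p["received_at"] > latest[key]["received_at"]:
            latest[key] = p
    # One actual per timestamp; assumption: the last record in log order wins.
    actual_by_ts = {a["timestamp"]: a["value"] for a in actuals}
    rows = [{"timestamp": ts, "series": model, "value": p["value"]}
            for (ts, model, _target), p in latest.items()]
    rows += [{"timestamp": ts, "series": "actual", "value": v}
             for ts, v in actual_by_ts.items()]
    return sorted(rows, key=lambda r: r["timestamp"])  # stable, ascending time


preds = [
    {"timestamp": 1, "model": "mlp", "target": "load", "value": 5.0, "received_at": 10},
    {"timestamp": 1, "model": "mlp", "target": "load", "value": 7.0, "received_at": 20},
    {"timestamp": 2, "model": "mlp", "target": "load", "value": 8.0, "received_at": 10},
]
acts = [{"timestamp": 1, "value": 6.0}, {"timestamp": 1, "value": 6.5}]
rows = chart_rows(preds, acts)
```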
- twiga.mlops.monitoring.daily_capture_counts(storage)
One row per (date, kind, version_id) with the number of captured records.
Features and predictions carry their producing version, so the table on the Monitor page can show how each deployed version’s stream evolves day by day. Actuals are version-agnostic and reported with version_id="—".
- Return type:
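The grouping is a count over (date, kind, version_id) keys. In the hypothetical sketch below, captured_at, kind, and version_id are assumed record fields, the kind labels mirror the capture/ folder names, and actuals are folded into the shared "—" version as described above.

```python
from collections import Counter
from datetime import datetime


def daily_counts(records):
    """records: dicts with hypothetical keys captured_at, kind, version_id."""
    counts = Counter()
    for r in records:
        # Actuals are version-agnostic, so they share a placeholder version.
        version = "—" if r["kind"] == "actuals" else r["version_id"]
        counts[(r["captured_at"].date(), r["kind"], version)] += 1
    return [(*key, n) for key, n in sorted(counts.items())]


records = [
    {"captured_at": datetime(2024, 5, 1, 9), "kind": "predictions", "version_id": "v2"},
    {"captured_at": datetime(2024, 5, 1, 10), "kind": "predictions", "version_id": "v2"},
    {"captured_at": datetime(2024, 5, 1, 11), "kind": "actuals", "version_id": None},
    {"captured_at": datetime(2024, 5, 2, 9), "kind": "features", "version_id": "v3"},
]
table = daily_counts(records)
```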
- twiga.mlops.monitoring.build_retraining_frame(storage, *, target_col, date_col='timestamp', version_id=None)
Assemble a training-ready frame from captured features, predictions, and actuals.
Schema (one row per unique feature timestamp):
  - date_col: the timestamp.
  - Every non-target feature column the caller sent on /predict.
  - predicted_<target_col>: mean of the per-model predictions for that timestamp (latest write wins on duplicates).
  - actual_<target_col>: the ground-truth value. Initialised from whatever the features row carried (if anything), then overridden by the actuals log when an upload exists for that timestamp.
Both target columns can be NaN independently:
  - No actuals uploaded yet, but the model forecasted this timestamp: predicted_<target> is filled, actual_<target> is NaN.
  - The caller sent the target in the lookback row, but the model never forecasted it (e.g. this timestamp was always in the lookback, never in any horizon): actual_<target> is filled, predicted_<target> is NaN.
Rows are emitted per feature timestamp because retraining needs inputs; predictions or actuals at timestamps with no features behind them don’t contribute and are not included.
- Return type:
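The join rules above can be traced on a tiny example. retraining_rows is a hypothetical stand-in that works on lists of dicts; the record keys (model, value, received_at) and the rule that the last captured feature row wins for a repeated timestamp are assumptions made for the sketch.

```python
from statistics import fmean

NAN = float("nan")


def retraining_rows(features, predictions, actuals, *, target_col, date_col="timestamp"):
    # One feature row per timestamp (assumption: the last capture wins).
    feats = {f[date_col]: f for f in features}
    # Latest write wins per (timestamp, model); then average across models.
    latest = {}
    for p in sorted(predictions, key=lambda p: p["received_at"]):
        latest[(p[date_col], p["model"])] = p["value"]
    per_ts = {}
    for (ts, _model), value in latest.items():
        per_ts.setdefault(ts, []).append(value)
    uploaded = {a[date_col]: a["value"] for a in actuals}
    rows = []
    for ts in sorted(feats):  # timestamps without features are never emitted
        row = {k: v for k, v in feats[ts].items() if k != target_col}
        actual = feats[ts].get(target_col, NAN)  # lookback value, if the caller sent one
        actual = uploaded.get(ts, actual)        # an uploaded actual overrides it
        row[f"predicted_{target_col}"] = fmean(per_ts[ts]) if ts in per_ts else NAN
        row[f"actual_{target_col}"] = actual
        rows.append(row)
    return rows


features = [{"timestamp": 1, "temp": 5, "load": 1.0},
            {"timestamp": 2, "temp": 6},
            {"timestamp": 3, "temp": 7}]
predictions = [{"timestamp": 2, "model": "a", "value": 10.0, "received_at": 1},
               {"timestamp": 2, "model": "b", "value": 20.0, "received_at": 1},
               {"timestamp": 2, "model": "a", "value": 12.0, "received_at": 2},
               {"timestamp": 9, "model": "a", "value": 99.0, "received_at": 1}]
actuals = [{"timestamp": 3, "value": 7.5}]
rows = retraining_rows(features, predictions, actuals, target_col="load")
```

Timestamp 9 has a prediction but no feature row, so it is dropped, matching the rule that retraining needs inputs.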
- twiga.mlops.monitoring.run_batch(*, storage, reference_df, target_col, cadence, feature_cols=None, drift_threshold=0.5, now=None, version_id=None)
Run the full four-report batch for one trailing window and persist the result.
Side effects: writes four Evidently HTML+JSON pairs into monitoring_reports_dir / <run_id>/ and appends a row to runs_index.jsonl summarising the run.
When version_id is supplied, the capture window is restricted to rows tagged with that version. This keeps prediction-drift and performance metrics anchored to the model that actually produced them, rather than mixing predictions across deployed champions. Actuals stay version-agnostic; the version filter on the prediction side carries over into the join.
Returns the run summary (suitable for surfacing in the UI).
- twiga.mlops.training.get_registry()
Return the model registry, building it on first use.
Lazy, so that importing this module (or twiga.mlops) does not eagerly scan every model config class for callers that never train.
- twiga.mlops.training.build_model_config(entry, overrides)
Instantiate a model config, merging registry defaults with user overrides.
- Return type:
Exceptions
- exception twiga.core.exceptions.TwigaError
Bases:
Exception
Base class for all twiga library exceptions.
- exception twiga.core.exceptions.ConfigurationError
Bases:
TwigaError, ValueError
Raised when a configuration is invalid or incompatible.
- exception twiga.core.exceptions.MissingExtraError
Bases:
TwigaError, ImportError
Raised when an optional dependency is not installed.
- exception twiga.core.exceptions.NotFittedError
Bases:
TwigaError, RuntimeError
Raised when a model or pipeline is used before fitting.
- exception twiga.core.exceptions.PipelineError
Bases:
TwigaError, RuntimeError
Raised for errors in the data pipeline.
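- twiga.core.exceptions.require_extra(package, extra)
Raise a helpful ImportError if an optional dependency is missing.
- Parameters:
package (str) – Name of the module to import (e.g. "shap").
extra (str) – Name of the twiga extra that provides it (e.g. "explain").
- Raises: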
MissingExtraError – If package cannot be imported.
- Return type:
Example
>>> require_extra("shap", "explain")
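A helper of this kind usually tries the import and re-raises with an install hint. The sketch below is a hypothetical stand-in: MissingExtra and require_extra_sketch are illustration-only names, and the pip install 'twiga[<extra>]' wording is an assumption about the message, not twiga's exact text.

```python
import importlib


class MissingExtra(ImportError):
    """Hypothetical stand-in for MissingExtraError."""


def require_extra_sketch(package, extra):
    try:
        importlib.import_module(package)
    except ImportError as exc:
        # Point the user at the extra that provides the package.
        raise MissingExtra(
            f"{package!r} is not installed; install it with: pip install 'twiga[{extra}]'"
        ) from exc
```

Subclassing ImportError means existing except ImportError handlers keep working, which mirrors the documented base classes of MissingExtraError.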
Experiment Engine
Run structured ablation experiments across multiple datasets, conditions, and
CV folds. All MLflow tracking is automatic when MLFLOW_TRACKING_URI is set.
- class twiga.experiment.ExperimentEngine(spec)
Bases:
object
Runs an ExperimentSpec end to end.
Usage:
    engine = ExperimentEngine(SPEC)
    engine.cli_main(base_cfg=PipelineConfig(...))
Or programmatically:
    summary = engine.run(base_cfg, groups=["gating"], dataset_keys=["MLVS-PT"])
- cli_main(base_cfg, argv=None)
Parse CLI args, then call run().
Recognised flags: --group, --dataset, --skip-hpo, --tracking-uri, --epochs, --num-trials, --folds.
- run(base_cfg, groups=None, dataset_keys=None, skip_hpo=False, tracking_uri=None)
Run all conditions × datasets and return a cross-condition summary.
- Parameters:
base_cfg (PipelineConfig) – Root PipelineConfig. Dataset-specific keys are applied on top via dataclasses.replace.
groups (list[str] | None) – Condition groups to run. None runs all groups.
dataset_keys (list[str] | None) – Dataset keys from spec.datasets. None runs all datasets.
skip_hpo (bool) – Skip Phase 1 backbone HPO (reuse saved params).
tracking_uri (str | None) – MLflow tracking URI. When None (the default), falls back to the MLFLOW_TRACKING_URI / TWIGA_MLFLOW_TRACKING_URI env vars; tracking is disabled only when none of these is set.
- Return type:
DataFrame
- Returns:
Summary DataFrame with mean ± std per condition.
- class twiga.experiment.ExperimentSpec(name, output_prefix, condition_cls, backbone_cls, conditions, datasets, controlled_fields=<factory>, fixed_overrides=<factory>, cv_train_size=12, cv_test_size=4, cv_val_size=2, cv_calib_size=0, cv_stride=1, cv_folds=10, hemisphere='NH', reference_conditions=<factory>, plot_figures=True, save_condition_plots=True, sample_plot_steps=336)
Bases:
object
Full declaration of a twiga ablation / benchmark experiment.
Pass an instance to ExperimentEngine to run the experiment.
- Variables:
name – Human-readable experiment title (used in logs and plot titles).
output_prefix – Prefix for all CSV output files (e.g. "mlgaf_ablation" → mlgaf_ablation_summary.csv).
condition_cls – Model config class instantiated per condition (e.g. MLPGAFConfig).
backbone_cls – Model config class used for Phase 1 backbone HPO. Must be specified explicitly; typically the plain backbone without a probabilistic head (e.g. MLPGAMConfig for a CRC experiment).
conditions – List of Condition objects defining the experimental grid.
datasets – Registry of datasets. Keys are short names used with --dataset; values are dicts of PipelineConfig field overrides (dataset_name, train_start, window_stride, …).
hemisphere – Meteorological hemisphere used when annotating fold seasons in the summary. "NH" (default) uses Northern-Hemisphere conventions (Dec–Feb = Winter). Use "SH" for Southern-Hemisphere sites, where the seasons are reversed.
- CV protocol (all fields default to the standard 10-fold expanding window):
cv_train_size: Initial training window in split_freq units.
cv_test_size: Test window per fold in split_freq units.
cv_val_size: Validation window carved from the training tail.
cv_calib_size: Calibration window for conformal experiments (0 = disabled).
cv_stride: Advance between folds.
cv_folds: Maximum number of folds.
- Output:
- fixed_overrides: Applied to every model config before backbone params (e.g. {"use_revin": False, "value_embed_type": "ConvEmb"}).
- controlled_fields: Stripped from backbone HPO params so ablation overrides always win.
- reference_conditions: Maps group name → reference condition name for Δ-vs-reference columns in the summary.
- plot_figures: Whether to call save_ablation_plots after the run.
- backbone_cls: type
- condition_cls: type
- controlled_fields: frozenset
- cv_calib_size: int = 0
- cv_folds: int = 10
- cv_stride: int = 1
- cv_test_size: int = 4
- cv_train_size: int = 12
- cv_val_size: int = 2
- fixed_overrides: dict
- hemisphere: Literal['NH', 'SH'] = 'NH'
- name: str
- output_prefix: str
- plot_figures: bool = True
- sample_plot_steps: int = 336
- save_condition_plots: bool = True
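The CV fields describe an expanding-window schedule. The sketch below is one interpretation under stated assumptions, not twiga's splitter: ranges are half-open and counted in split_freq units, fold k ends its training window at cv_train_size + k * cv_stride, the validation window is taken from the end of that training window, folds stop once the test window would run past the data, and cv_calib_size is ignored. expanding_folds is a hypothetical name.

```python
def expanding_folds(n_periods, train_size=12, test_size=4, val_size=2, stride=1, folds=10):
    """Hypothetical expanding-window schedule using the ExperimentSpec defaults."""
    out = []
    for k in range(folds):
        train_end = train_size + k * stride  # the training window grows each fold
        test_end = train_end + test_size
        if test_end > n_periods:
            break  # not enough data left for a full test window
        out.append({
            "train": (0, train_end - val_size),
            "val": (train_end - val_size, train_end),  # carved from the training tail
            "test": (train_end, test_end),
        })
    return out


folds = expanding_folds(18)  # e.g. 18 months of data
```

With 18 periods only three folds fit, so cv_folds acts as an upper bound rather than a guarantee.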
- class twiga.experiment.Condition(name, group, description='', overrides=<factory>, model_cls=None, hpo_variant='', metric_types=<factory>, conformal_config=None, stage1_epochs_frac=None, calib_source='train_tail')
Bases:
object
One experimental condition: what varies between backtesting runs.
- Variables:
name – Short identifier used in filenames and summaries.
group – Experiment group this condition belongs to (e.g. "gating").
description – Human-readable note, shown in logs.
overrides – Key–value pairs applied to the model config after backbone HPO params and fixed overrides. These always win.
model_cls – Overrides the spec’s condition_cls for this condition. Use for multi-model experiments (e.g. MLPF vs MLPGAM vs MLPGAF).
metric_types – Which evaluation methods to call per fold. Each entry maps to one call: "point" → evaluate_point_forecast; "interval" → evaluate_interval_forecast; "quantile" → evaluate_quantile_forecast. Defaults to ["point"].
conformal_config – When set, the forecaster is given these conformal params, and cv_calib_size from the spec is used for calibration within each backtesting fold.
- calib_source: str = 'train_tail'
- conformal_config: ConformalConfig | None = None
- description: str = ''
- group: str
- hpo_variant: str = ''
- name: str
- overrides: dict
- twiga.experiment.run_backbone_hpo(backbone_cls, cfg, data, target_series, calendar_variables, exogenous_features, lags, latitude, longitude, dataset_key, hpo_cache_dir, hpo_variant='')
Run Optuna HPO for backbone_cls on a fixed 14-month / 2-month split.
Saves the best params to <hpo_cache_dir>/<dataset_key>/<model_name>_best_params.json and returns the param dict. The file is shared across runs: params are never recomputed unless the file is deleted.
- Parameters:
backbone_cls (type) – Model config class with a from_data_config factory.
cfg (PipelineConfig) – Pipeline config providing train_start, epochs, num_trials.
data (DataFrame) – Full dataset DataFrame (must have a timestamp column).
target_series (str) – Target variable name.
calendar_variables (list) – Calendar feature names.
exogenous_features (list) – Exogenous feature names.
lags (list) – Lag indices.
latitude (float) – Site latitude (used by some feature builders).
longitude (float) – Site longitude.
dataset_key (str) – Short dataset identifier used in the cache path.
hpo_cache_dir (Path) – Root directory for cached HPO params.
hpo_variant (str) – Optional suffix appended to the model name when computing the cache key (e.g. "3group" → mlpgaf_3group_best_params.json). Allows a single config class to have per-variant HPO files.
- Return type:
dict
- Returns:
Best hyperparameter dict (same format as load_backbone_params()).
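The "never recomputed unless the file is deleted" behaviour is a compute-once JSON cache keyed by the documented path. In the hypothetical sketch below, cached_best_params and fake_search stand in for the Optuna study; only the path layout and variant naming are taken from the description above.

```python
import json
import tempfile
from pathlib import Path


def cached_best_params(cache_dir, dataset_key, model_name, search, variant=""):
    """Run search() once; afterwards always return the saved JSON (sketch)."""
    stem = f"{model_name}_{variant}" if variant else model_name
    path = Path(cache_dir) / dataset_key / f"{stem}_best_params.json"
    if path.exists():
        return json.loads(path.read_text())  # cache hit: never recomputed
    params = search()
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(params, indent=2))
    return params


calls = []


def fake_search():
    # Stands in for an expensive HPO study.
    calls.append(1)
    return {"hidden_size": 128}


with tempfile.TemporaryDirectory() as tmp:
    first = cached_best_params(tmp, "MLVS-PT", "mlpgaf", fake_search, variant="3group")
    second = cached_best_params(tmp, "MLVS-PT", "mlpgaf", fake_search, variant="3group")
    saved = sorted(p.name for p in Path(tmp, "MLVS-PT").iterdir())
```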
- twiga.experiment.load_backbone_params(dataset_key, hpo_cache_dir, model_name_str, controlled_fields, fallback_paths=None)
Load saved backbone HPO params, strip controlled fields and model prefix.
Searches fallback_paths first (in order), then the canonical engine path. Returns an empty dict, with a warning, when no file is found.
- Parameters:
dataset_key (str) – Short dataset identifier (e.g. "MLVS-PT").
hpo_cache_dir (Path) – Root directory for cached HPO params (typically <experiment_root>/backbone_hpo).
model_name_str (str) – Model name string (e.g. "mlpgaf").
controlled_fields (frozenset) – Keys to strip from the loaded params so ablation condition overrides always take precedence.
fallback_paths (list[Path] | None) – Additional JSON files to try before the canonical path.
- Return type:
dict
- Returns:
Dict of hyperparameter names → values, ready to setattr onto a model config object.
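The search order and the stripping step can be sketched as follows. load_params_sketch is hypothetical and takes the canonical path directly instead of building it from hpo_cache_dir; in particular, the "<model_name>_" key prefix is an assumption, since the exact prefix format removed by load_backbone_params is not documented here.

```python
import json
import logging
import tempfile
from pathlib import Path

log = logging.getLogger("backbone_params_sketch")


def load_params_sketch(canonical_path, model_name, controlled_fields, fallback_paths=None):
    # Fallbacks are tried first, in order, then the canonical path.
    for path in [*(fallback_paths or []), canonical_path]:
        if Path(path).exists():
            raw = json.loads(Path(path).read_text())
            break
    else:
        log.warning("no saved backbone params found; returning an empty dict")
        return {}
    prefix = f"{model_name}_"  # assumption: saved keys may carry a "<model>_" prefix
    params = {}
    for key, value in raw.items():
        key = key.removeprefix(prefix)
        if key not in controlled_fields:  # ablation overrides must always win
            params[key] = value
    return params


with tempfile.TemporaryDirectory() as tmp:
    canonical = Path(tmp) / "mlpgaf_best_params.json"
    canonical.write_text(json.dumps(
        {"mlpgaf_hidden_size": 256, "mlpgaf_use_revin": True, "dropout": 0.1}))
    params = load_params_sketch(canonical, "mlpgaf", frozenset({"use_revin"}),
                                fallback_paths=[Path(tmp) / "missing.json"])
    missing = load_params_sketch(Path(tmp) / "nope.json", "mlpgaf", frozenset())
```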
- twiga.experiment.aggregate(combined, prefix, root, suffix, reference_conditions)
Compute mean ± std per (group, condition, metric_type) and save CSVs.
- Parameters:
combined (DataFrame) – Long-form DataFrame with one row per fold/horizon, tagged with group, condition, dataset, and optionally metric_type columns.
prefix (str) – Filename prefix for output CSVs.
root (Path) – Directory to write <prefix>_full<suffix>.csv and <prefix>_summary<suffix>.csv.
suffix (str) – Optional tag appended to filenames (e.g. "_val").
reference_conditions (dict[str, str]) – Maps group → reference condition name for Δ-vs-reference columns.
- Return type:
DataFrame
- Returns:
Summary DataFrame with MultiIndex (group, condition[, metric_type]) and one column per metric, plus _std variants, n_runs, and Δ columns. Empty DataFrame if no recognised metric columns are present.
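The per-condition statistics and the Δ-vs-reference column follow a simple pattern: group, average, then subtract the reference condition's mean within the same group. summarise and the delta_<metric> column name are hypothetical; the sketch handles a single metric with plain dicts rather than a MultiIndex DataFrame and omits the metric_type level and CSV output.

```python
from collections import defaultdict
from statistics import fmean, stdev


def summarise(rows, metric, reference_conditions):
    by_condition = defaultdict(list)
    for r in rows:
        by_condition[(r["group"], r["condition"])].append(r[metric])
    summary = {}
    for key, values in by_condition.items():
        summary[key] = {
            metric: fmean(values),
            # Sample std (ddof=1), undefined for a single run.
            f"{metric}_std": stdev(values) if len(values) > 1 else float("nan"),
            "n_runs": len(values),
        }
    for (group, _condition), stats in summary.items():
        ref = reference_conditions.get(group)
        if (group, ref) in summary:
            stats[f"delta_{metric}"] = stats[metric] - summary[(group, ref)][metric]
    return summary


rows = [
    {"group": "gating", "condition": "base", "mae": 1.0},
    {"group": "gating", "condition": "base", "mae": 3.0},
    {"group": "gating", "condition": "gated", "mae": 1.0},
    {"group": "gating", "condition": "gated", "mae": 1.0},
]
summary = summarise(rows, "mae", {"gating": "base"})
```

A negative delta here means the condition beats its reference on an error metric.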
Experiment Tracking
MLflow bridge used by twiga.experiment.ExperimentEngine. All helpers are safe no-ops when MLflow is absent or no tracking URI is configured.
- twiga.experiment.detect_tracking_uri(explicit=None)
Return a tracking URI if MLflow tracking is configured, else None.
Priority:
  1. explicit argument (caller-supplied).
  2. MLFLOW_TRACKING_URI env var (standard MLflow convention).
  3. TWIGA_MLFLOW_TRACKING_URI env var (Twiga-specific).
Returns None, never a default localhost, so callers know tracking is genuinely absent rather than pointed at an unreachable server.
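The priority order amounts to "first non-empty candidate wins, otherwise None". The hypothetical detect_uri_sketch below takes an optional environ mapping so the lookup can be exercised without touching the real environment; treating an empty string as unset is an assumption.

```python
import os


def detect_uri_sketch(explicit=None, environ=None):
    env = os.environ if environ is None else environ
    candidates = (
        explicit,                                  # 1. caller-supplied
        env.get("MLFLOW_TRACKING_URI"),            # 2. standard MLflow variable
        env.get("TWIGA_MLFLOW_TRACKING_URI"),      # 3. Twiga-specific variable
    )
    for candidate in candidates:
        if candidate:  # assumption: empty strings count as unset
            return candidate
    return None  # never a default localhost
```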
- twiga.experiment.tracking.parent_run_context(tracking_uri, spec, run_id, dataset_keys, groups)
Open an MLflow parent run for the whole engine.run() call.
Yields the active run object, or None when MLflow is absent / unconfigured.
- twiga.experiment.tracking.hpo_run_context(dataset_key, model_name, n_trials)
Open an MLflow HPO child run under the active parent.
Yields the active run object, or None when no parent run is active.
- twiga.experiment.tracking.condition_run_context(dataset_key, group, condition_name, metric_types, model_type=None)
Open an MLflow condition child run under the active parent.
NN fold-grandchild runs are opened automatically inside BaseNeuralForecast._configure_logger() whenever this run is active.
Yields the active run object, or None when no parent run is active.
- twiga.experiment.tracking.log_hpo_result(best_params_path)
Log HPO best-params artifact to the active MLflow run.
- Return type:
- twiga.experiment.tracking.log_model_config_params(model_config)
Log effective model config (post-merge) as MLflow params on the active run.
- Return type:
- twiga.experiment.tracking.log_condition_results(metrics_df, metrics_csv_path=None)
Log aggregated fold metrics (mean ± std) and optional CSV artifact.
Called after forecaster.backtesting() returns, inside the condition child run context.
- Return type:
- twiga.experiment.tracking.log_experiment_summary(summary_path)
Log the cross-condition summary CSV as an artifact on the active parent run.
- Return type:
Logging
- twiga.core.utils.configure(level='INFO', *, colour=True, log_file=None, file_level='DEBUG', capture_warnings=True)
Activate Twiga logging. Call once from user code or experiment scripts.
Sets up a console handler (optionally colour-coded) and an optional file handler. Safe to call multiple times: existing handlers are cleared before new ones are attached.
- Parameters:
level (str | int) – Console log level. Accepts level names ("DEBUG", "INFO", …) or integer constants (logging.DEBUG, …). Defaults to "INFO".
colour (bool) – Enable ANSI colour in console output. Automatically disabled when stdout is not a TTY (e.g. CI or redirected output). Defaults to True.
log_file (str | Path | None) – Optional path for a plain-text log file. The parent directory is created automatically if it does not exist. Defaults to None.
file_level (str | int) – Log level for the file handler. Defaults to "DEBUG" so full detail is always captured on disk even when the console shows only "INFO".
capture_warnings (bool) – Route warnings.warn() calls through the logging system. Defaults to True.
- Return type:
logging.Logger
- Returns:
The configured root Twiga logging.Logger.
- Raises:
ValueError – If level or file_level is not a recognised log-level string.
Example:
configure(level="DEBUG", log_file="results/run.log")
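The "safe to call multiple times" guarantee comes from swapping out old handlers rather than appending new ones. The sketch below shows that pattern with the standard logging module; configure_sketch and the "twiga_sketch" logger name are hypothetical, and colour output and warning capture are left out.

```python
import logging
import sys
import tempfile
from pathlib import Path


def configure_sketch(level="INFO", *, log_file=None, file_level="DEBUG"):
    logger = logging.getLogger("twiga_sketch")  # hypothetical logger name
    # Build new handlers first: setLevel raises ValueError for an unknown level name.
    console = logging.StreamHandler(sys.stderr)
    console.setLevel(level)
    new_handlers = [console]
    if log_file is not None:
        Path(log_file).parent.mkdir(parents=True, exist_ok=True)
        file_handler = logging.FileHandler(log_file, delay=True)  # opened on first write
        file_handler.setLevel(file_level)
        new_handlers.append(file_handler)
    # Clear handlers from earlier calls so repeated calls never duplicate output.
    for handler in list(logger.handlers):
        logger.removeHandler(handler)
        handler.close()
    for handler in new_handlers:
        logger.addHandler(handler)
    logger.setLevel(logging.DEBUG)  # each handler applies its own threshold
    return logger


logger = configure_sketch("INFO")
logger = configure_sketch("DEBUG")  # replaces, does not append
n_console_only = len(logger.handlers)
with tempfile.TemporaryDirectory() as tmp:
    logger = configure_sketch("INFO", log_file=Path(tmp) / "results" / "run.log")
    n_with_file = len(logger.handlers)
    logger.debug("written to the file only")
    logger = configure_sketch("INFO")  # closes the file handler before cleanup
    log_text = (Path(tmp) / "results" / "run.log").read_text()
```

Setting the logger itself to DEBUG and filtering per handler is what lets the file capture DEBUG records while the console stays at INFO.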
- twiga.core.utils.get_logger(name)
Return a named child of the Twiga root logger.
Call once at module level in every Twiga submodule:
    log = get_logger(__name__)
- Parameters:
name (str) – Dotted module name, typically __name__. Automatically prefixed with "twiga." if not already present.
- Return type:
logging.Logger
- Returns:
A logging.Logger that inherits handlers from the Twiga root logger.
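The prefixing rule relies on the standard logging hierarchy: any logger named "twiga.<something>" propagates records to the "twiga" logger. get_logger_sketch is a hypothetical stand-in; leaving the bare name "twiga" unprefixed is an assumption.

```python
import logging


def get_logger_sketch(name):
    # Prefix with "twiga." unless the name is already in the twiga namespace.
    if name != "twiga" and not name.startswith("twiga."):
        name = f"twiga.{name}"
    return logging.getLogger(name)


twiga_root = logging.getLogger("twiga")
script_log = get_logger_sketch("my_experiment")
module_log = get_logger_sketch("twiga.models")
```

Because the child's parent is the twiga logger, handlers attached there (for example by configure()) also receive the child's records.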