TwigaForecaster & Forecaster Architecture#
Source Files
twiga/forecaster/core.py - TwigaForecaster (user-facing entry point)
twiga/forecaster/abstract.py - AbstractForecaster (fit / predict / evaluate / tune orchestration)
twiga/forecaster/base.py - BaseForecaster (checkpointing, feature preparation, Optuna integration)
twiga/forecaster/registry.py - Dynamic model loading
twiga/forecaster/ensemble.py - Ensemble prediction strategies
twiga/forecaster/utils.py - Shape validation, DataFrame construction helpers
twiga/core/config/base.py - DataPipelineConfig, ForecasterConfig, ConformalConfig, BaseModelConfig
Overview#
TwigaForecaster is the primary interface for building, training, and evaluating time series forecasting models in Twiga. It accepts configuration objects, dynamically loads model implementations from a registry, and exposes a unified API that covers:
Multi-model training - fit one or many models (ML and/or neural network) in a single call.
Point and interval predictions - generate raw forecasts or conformal prediction intervals.
Evaluation - compute metrics against held-out data for both point and interval forecasts.
Backtesting - walk-forward cross-validation over expanding or rolling windows.
Hyperparameter tuning - Optuna-powered search with Hyperband pruning and TPE sampling.
Ensemble strategies - combine predictions from multiple models via mean, median, or weighted aggregation.
For a hands-on introduction see the Quick Start Guide.
Class Hierarchy#
classDiagram
direction TB
class TimeBasedCV {
+split_freq: str
+train_size: int
+test_size: int
+gap: int
+stride: int
+window: str
+split(data, start_dt, end_dt)
}
class AbstractForecaster {
<<abstract>>
+get_model_from_registry(model_params)
+fit(train_df, val_df, train_ratio)
+predict(test_df, ...) dict, dict
+predict_interval(test_df, ...) dict, dict
+evaluate(test_df, ...) DataFrame, DataFrame
+evaluate_point_forecast(test_df, ...)
+evaluate_interval_forecast(test_df, ...)
+backtesting(data, ...)
+tune(train_df, val_df, ...)
#_fit()*
#_predict()*
#_evaluate()*
#_tune()*
#_backtester()*
#_create_folder()*
}
class BaseForecaster {
+models: list
+data_pipeline: DataPipeline
+conformals: dict
+conformal_params
+checkpoints_path: Path
+logs_path: Path
+results_path: Path
+figures_path: Path
+_create_folder()
+on_save_checkpoint()
+on_load_checkpoint()
+_fit(train_df, val_df, train_ratio, trial)
+_predict(test_df, covariate_df)
+_evaluate(test_df, covariate_df)
+_tune(train_df, val_df, ...)
+_backtester(data, ...)
+create_optuna_study(...)
+prepare_test_data(test_df)
+get_ground_truth(test_df)
}
class TwigaForecaster {
+data_pipeline: DataPipeline
+models: list
+conformal: dict
+domain: str
+__init__(data_params, model_params, train_params, conformal_params)
+calibrate(calibrate_df, ...)
}
TimeBasedCV <|-- BaseForecaster
AbstractForecaster <|-- BaseForecaster
BaseForecaster <|-- TwigaForecaster
AbstractForecaster defines the orchestration logic (loops over models, builds DataFrames), while BaseForecaster supplies the concrete single-model implementations (_fit, _predict, _evaluate, _tune, _backtester) together with checkpointing, feature preparation, and Optuna study creation. TwigaForecaster wires everything together through configuration objects.
Constructing a Forecaster#
TwigaForecaster.__init__#
from twiga.forecaster.core import TwigaForecaster
forecaster = TwigaForecaster(
data_params=data_config, # DataPipelineConfig
model_params=[xgb_config], # BaseModelConfig | list | dict | list[dict]
train_params=train_config, # ForecasterConfig
conformal_params=conf_config, # ConformalConfig | None
)
| Parameter | Type | Description |
|---|---|---|
| data_params | DataPipelineConfig | Defines target features, feature engineering, scaling, and temporal settings for the DataPipeline. |
| model_params | BaseModelConfig, list, dict, or list[dict] | One or more model configurations. Each configuration is validated against the model's registered config class. Accepts Pydantic models or plain dictionaries. |
| train_params | ForecasterConfig | Cross-validation split parameters (split_freq, train_size, test_size, gap, stride, window) and general training settings such as project_name, seed, and metrics. |
| conformal_params | ConformalConfig or None | Optional. Enables conformal prediction intervals when provided. Requires a subsequent calibrate() call before predict_interval can be used. |
During construction the forecaster:
Copies the date_column from data_params into train_params.
Initialises the DataPipeline from data_params.
Iterates over model_params, calling get_model_from_registry to dynamically load each model class and instantiate it.
Mixing ML and NN models
You can pass both ML configs (e.g. XGBOOSTConfig) and neural network configs (e.g. NHITSConfig) in a single model_params list. The registry resolves each model independently from twiga.models.ml.* or twiga.models.nn.*. See Models for the full catalogue.
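A minimal sketch of such a mixed setup, using the dictionary form resolved by the registry (only the name/domain keys are shown here; any model-specific options come from each model's registered config class):
forecaster = TwigaForecaster(
    data_params=data_config,                    # DataPipelineConfig
    model_params=[
        {"name": "xgboost", "domain": "ml"},    # resolved from twiga.models.ml.*
        {"name": "nhits", "domain": "nn"},      # resolved from twiga.models.nn.*
    ],
    train_params=train_config,                  # ForecasterConfig
)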
Core Workflow#
The typical lifecycle is configure -> fit -> predict / evaluate -> (optionally) backtest or tune.
sequenceDiagram
participant User
participant TF as TwigaForecaster
participant DP as DataPipeline
participant Reg as ModelRegistry
participant Model as Model(s)
participant Conf as Conformal
User->>TF: __init__(data_params, model_params, train_params, conformal_params)
TF->>DP: DataPipeline(**data_params)
TF->>Reg: get_model_from_registry(model_params)
Reg-->>TF: [model_1, model_2, ...]
User->>TF: fit(train_df, val_df)
TF->>DP: fit(train_df)
loop each model
TF->>TF: _create_folder()
TF->>DP: transform(train_split)
TF->>Model: model.fit(features, targets)
TF->>TF: on_save_checkpoint()
end
User->>TF: calibrate(calibrate_df)
TF->>TF: predict(calibrate_df)
loop each model
TF->>Conf: Conformal.calibrate(forecast, ground_truth)
end
User->>TF: evaluate_point_forecast(test_df)
TF->>TF: prepare_test_data(test_df)
TF->>TF: get_ground_truth(test_df)
loop each model
TF->>DP: transform_features(test_df)
TF->>Model: model.forecast(features)
TF->>TF: _rescale_predictions(forecast)
end
TF-->>User: (results_df, metrics_df)
Method Reference#
Training#
| Method | Signature | Description |
|---|---|---|
| fit | fit(train_df, val_df=None, train_ratio=1.0) | Fits the data pipeline (if not already fitted) and trains every registered model. For each model it creates the artifact directory, prepares features via the data pipeline, calls the model's fit, and saves a checkpoint. |
The fit method applies temporal filtering internally: it keeps only the last lookback_window_size + max_data_drop rows of training data to limit memory use.
Note
The data pipeline is fitted once on the first call. Subsequent calls to fit (e.g. during backtesting) skip the pipeline fit if it has already been initialised.
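A minimal training call might look like the sketch below (the 80/20 chronological split and the layout of df are illustrative assumptions):
split_point = int(len(df) * 0.8)
train_df, val_df = df.iloc[:split_point], df.iloc[split_point:]

forecaster.fit(train_df=train_df, val_df=val_df)
# checkpoints are written under {root_dir}/checkpoints/{project_name}/{model_type}/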
Prediction#
| Method | Signature | Returns |
|---|---|---|
| predict | predict(test_df, covariate_df=None, ensemble_strategy=None, ensemble_weights=None, prepare_test_data=True) | (predictions dict, inference-time dict) |
| predict_interval | predict_interval(test_df, covariate_df=None, ensemble_strategy=None, ensemble_weights=None, prepare_test_data=True) | ((lower, point, upper) dict, inference-time dict) |
| forecast | forecast(test_df, covariate_df=None, ensemble_strategy=None, ensemble_weights=None) | ForecastCollection |
predict and predict_interval return a two-element tuple:
Prediction dictionary - keys are model names (and optionally "Ensemble"), values are 3-D NumPy arrays of shape (num_samples, horizon, num_targets) for point forecasts, or (lower, point, upper) tuples for intervals.
Inference time dictionary - keys are model names, values are wall-clock seconds.
forecast is the higher-level alternative: it wraps each model’s output in a typed ForecastResult and returns a ForecastCollection. Ground truth is automatically extracted from test_df and attached to every ForecastResult. Use forecast when you want structured downstream access (.to_dataframe(), .evaluate(), etc.).
collection = forecaster.forecast(test_df)
result = collection["xgboost"] # ForecastResult
df = collection.to_dataframe() # tidy long-format DataFrame
metrics = collection.evaluate() # metrics DataFrame across all models
predict_interval requires that conformal_params was supplied at construction and that calibrate() has been called.
Warning
Calling predict_interval before calibrate() raises a ValueError. Always calibrate on a held-out calibration set that was not used during training.
Calibration#
| Method | Signature | Description |
|---|---|---|
| calibrate | calibrate(calibrate_df=None, covariate_df=None, ensemble_strategy=None, ensemble_weights=None) | Generates point predictions on the calibration set and fits a Conformal calibrator per model from the resulting forecasts and ground truth. |
See Conformal Prediction for the available method and score_type options in ConformalConfig.
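A minimal calibration flow, assuming a held-out calib_df that was not used during training (see the warning above):
forecaster.calibrate(calibrate_df=calib_df)

intervals, times = forecaster.predict_interval(test_df=test_df)
lower, point, upper = intervals["xgboost"]   # "xgboost" key is illustrative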
Evaluation#
| Method | Signature | Returns |
|---|---|---|
| evaluate | evaluate(test_df=None, covariate_df=None, ensemble_strategy=None, ensemble_weights=None, prediction_fn=None, evaluation_fn=None, is_interval=False) | (results_df, metrics_df) |
| evaluate_point_forecast | evaluate_point_forecast(test_df=None, covariate_df=None, ensemble_strategy=None, ensemble_weights=None) | (results_df, metrics_df) |
| evaluate_interval_forecast | evaluate_interval_forecast(test_df=None, covariate_df=None, ensemble_strategy=None, ensemble_weights=None) | (results_df, metrics_df) |
evaluate_point_forecast and evaluate_interval_forecast are convenience wrappers around evaluate that pre-bind the appropriate prediction and evaluation functions.
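For example (column names follow the Results/Metrics DataFrame structure described below; the "mae" column is an illustrative metric name):
results_df, metrics_df = forecaster.evaluate_point_forecast(test_df=test_df)

print(metrics_df)                              # one row per (Model, target) with metric columns
best = metrics_df.sort_values("mae").head(1)   # rank models by a chosen metric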
When using forecast(), evaluation is also available directly on the returned ForecastCollection:
| Method | Signature | Returns |
|---|---|---|
| ForecastCollection.evaluate | evaluate(**kwargs) | Combined metrics DataFrame with a "Model" column |
ForecastCollection.evaluate() calls ForecastResult.evaluate() on each result in the collection and concatenates the metrics into a single DataFrame. This is the preferred path when you have already called forecast(), since ground truth is attached automatically.
The returned DataFrames have the following structure:
Results DataFrame (results_df):
| Column | Description |
|---|---|
| index (timestamp) | Date column set as the index |
| Model | Model name (uppercased) |
| target | Target variable name |
| forecast | Point forecast value |
| Actual | Ground truth value |
| lower | Lower bound (interval evaluation only) |
| upper | Upper bound (interval evaluation only) |
Metrics DataFrame (metrics_df):
| Column | Description |
|---|---|
| Model | Model name (uppercased) |
| target | Target variable name |
| metric columns | One column per metric (e.g. mae, rmse) |
| inference-time | Wall-clock prediction time in seconds |
See Metrics for the full list of supported evaluation metrics.
Backtesting#
| Method | Signature | Returns |
|---|---|---|
| backtesting | backtesting(data, train_ratio=1.0, start_dt=None, end_dt=None, verbose=True, trial=None, ensemble_strategy=None, ensemble_weights=None) | (results_df, metrics_df) concatenated across folds |
Backtesting performs walk-forward cross-validation by iterating over temporal splits produced by the inherited TimeBasedCV.split() method. For each fold it:
Calls fit(train_df) to retrain on the expanding (or rolling) training window.
Calls evaluate_point_forecast(test_df) on the held-out test window.
Tags results with a Folds column.
The split behaviour is controlled by ForecasterConfig parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| split_freq | str | "months" | Time unit for the splits, e.g. "months" |
| train_size | int | 1 | Length of the training window in split_freq units |
| test_size | int | 1 | Length of the test window in split_freq units |
| gap | int | 0 | Gap between training end and test start |
| stride | int or None | None | Step size between folds; when unset, a default is derived from the other split settings |
| window | str | "expanding" | Window type: "expanding" or "rolling" |
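Putting it together, a walk-forward backtest over a full dataset might look like this sketch (date boundaries are illustrative):
results_df, metrics_df = forecaster.backtesting(
    data=df,                       # full dataset spanning all folds
    start_dt="2020-01-01",
    end_dt="2023-12-31",
)
print(results_df["Folds"].unique())   # each walk-forward window is tagged with its fold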
See Backtesting for a detailed explanation of the cross-validation scheme.
Hyperparameter Tuning#
| Method | Returns |
|---|---|
| tune | Models are updated in place with the best parameters found |
forecaster.tune(
train_df, val_df,
num_trials=10, reduction_factor=3, patience=10,
load_if_exists=True, initial_params=None,
direction="minimize", sampler=None, base_pruner=None,
objective_metric=None, calib_df=None, conformal_params=None,
)
tune performs Optuna-based hyperparameter optimisation for every registered model. For each model it:
Creates an Optuna study (or loads an existing one from a JournalFileBackend log).
Configures a HyperbandPruner wrapped in a PatientPruner and a TPESampler (seeded with self.seed).
Enqueues initial_params if provided.
Runs num_trials trials. Each trial calls model.update(trial) to sample hyperparameters, then _fit and _evaluate to compute a cost.
Replaces each model instance with a new one instantiated from the best parameters.
Interval-metric tuning (Strategy 1): Pass calib_df alongside conformal_params to extend each trial with a calibrate → evaluate-interval step. The trial score then reflects an interval quality metric (e.g. PICP, PINAW) instead of a point-forecast error. Use objective_metric to select which column of the metrics DataFrame to optimise; defaults to the first entry in self.metrics or "mae".
conformal_params overrides self.conformal_params for the duration of the tuning loop only — the original value is restored afterward.
After tuning, call fit again to train with the optimised configuration.
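A sketch of the interval-metric variant described above (the "winkler" objective and the trial count are illustrative; any column of the interval metrics DataFrame can be targeted):
forecaster.tune(
    train_df, val_df,
    num_trials=50,
    calib_df=calib_df,                 # each trial fits, calibrates, then evaluates intervals
    conformal_params=conf_config,      # overrides self.conformal_params for tuning only
    objective_metric="winkler",        # interval score column; lower is better
)
forecaster.fit(train_df, val_df)       # retrain with the optimised configuration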
Resuming studies
When load_if_exists=True (the default) the study is persisted to {logs_path}/{project_name}_{model_type}.log. Re-running tune continues from where the previous run left off, which is useful for incremental search.
See Hyperparameter Tuning for advanced recipes.
Model Registry#
Models are loaded dynamically at construction time through the registry in twiga/forecaster/registry.py. The registry uses the name field from each config to resolve a module at twiga.models.{domain}.{name}_model, then retrieves {NAME}Model and {NAME}Config from that module.
# Internally called during TwigaForecaster.__init__
model_cls, config_cls = get_model("xgboost", domain="ml")
# Imports twiga.models.ml.xgboost_model -> (XGBOOSTModel, XGBOOSTConfig)
Results are cached after the first load. If domain is not specified, the registry searches both ml and nn directories.
When configs are supplied as dictionaries, get_model_from_dict extracts the "name" key and follows the same resolution path:
forecaster = TwigaForecaster(
data_params=data_config,
model_params={"name": "lightgbm", "domain": "ml"},
train_params=train_config,
)
See Models for all available model implementations.
Ensemble Strategies#
When more than one model is registered and an ensemble_strategy is passed to predict, predict_interval, or the evaluation methods, predictions are combined into an additional "Ensemble" entry.
| Strategy | ensemble_strategy value | Formula |
|---|---|---|
| Mean | "mean" | Element-wise mean across models |
| Median | "median" | Element-wise median across models |
| Weighted | "weighted" | Element-wise weighted combination using the supplied ensemble_weights |
For the weighted strategy, pass a dict[str, float] mapping model names to weights:
predictions, times = forecaster.predict(
test_df=test_df,
ensemble_strategy="weighted",
ensemble_weights={"xgboost": 0.6, "lightgbm": 0.4},
)
For interval predictions, the ensemble is computed independently over the lower, point, and upper arrays.
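For instance, an ensembled interval forecast can be requested the same way as the weighted point forecast above; the combined result appears under the "Ensemble" key:
intervals, _ = forecaster.predict_interval(
    test_df=test_df,
    ensemble_strategy="weighted",
    ensemble_weights={"xgboost": 0.6, "lightgbm": 0.4},
)
lower, point, upper = intervals["Ensemble"]   # each bound combined independently across models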
Checkpointing & Directory Structure#
BaseForecaster._create_folder() creates a standardised directory layout under root_dir:
{root_dir}/
results/{project_name}/{model_type}/
logs/{project_name}/{model_type}/
figures/{project_name}/{model_type}/
checkpoints/{project_name}/{model_type}/[file_name]/
Checkpoint persistence differs by domain:
| Domain | Save | Load |
|---|---|---|
| ml | | |
| nn | Delegated to the model's own save logic | |
Inverse Scaling#
BaseForecaster._rescale_predictions automatically reverses the target scaling applied by the data pipeline. It supports three input formats:
| Format | Handling |
|---|---|
| NumPy array | Direct inverse transform via the pipeline's target scaler |
| tuple / list of arrays | Each element is inverse-transformed individually; 4-D arrays (e.g. quantile or sample dimensions) are reshaped appropriately |
| | |
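The general reshape pattern behind this kind of inverse scaling is sketched below with a scikit-learn scaler; this illustrates the technique only and is not Twiga's internal implementation:
import numpy as np
from sklearn.preprocessing import StandardScaler

def inverse_scale_3d(preds: np.ndarray, scaler: StandardScaler) -> np.ndarray:
    """Undo target scaling for predictions shaped (num_samples, horizon, num_targets)."""
    b, h, t = preds.shape
    flat = preds.reshape(-1, t)                 # scalers expect 2-D input: (rows, num_targets)
    return scaler.inverse_transform(flat).reshape(b, h, t)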
API Reference#
- class twiga.forecaster.core.TwigaForecaster(data_params, model_params, train_params, conformal_params=None)#
Bases: BaseForecaster
Machine Learning Forecaster for time series predictions.
This forecaster initializes a data pipeline and dynamically loads machine learning models based on provided configurations. The configurations can be specified as Pydantic models or dictionaries. Once the models are loaded, they can be trained, evaluated, and backtested.
Example
>>> from twiga.core.config import BaseModelConfig, DataPipelineConfig, ForecasterConfig
>>> data_params = DataPipelineConfig(date_column="date", ...)
>>> model_config = BaseModelConfig(name="linear", ...)
>>> train_params = ForecasterConfig(...)
>>> forecaster = TwigaForecaster(data_params, model_config, train_params)
>>> forecaster.fit(train_df)
>>> predictions, metrics = forecaster.evaluate_point_forecast(test_df)
- __init__(data_params, model_params, train_params, conformal_params=None)#
Initialize TwigaForecaster.
- Parameters:
data_params (DataPipelineConfig) – Configuration for the data pipeline.
model_params (BaseModelConfig | list[BaseModelConfig] | dict | list[dict]) – Configuration for the model(s). Can be a single Pydantic config, a dictionary, or a list of either. Neural network configs with unset dims (num_target_feature, forecast_horizon, lookback_window_size equal to 0) are auto-populated from data_params. Base arch configs with a distribution field set are automatically resolved to the corresponding probabilistic variant (e.g. MLPFConfig(distribution='normal') becomes an MLPFNormalConfig).
train_params (ForecasterConfig) – Training configuration parameters.
conformal_params (ConformalConfig | None) – Optional conformal prediction configuration.
- calibrate(calibrate_df=None, covariate_df=None, ensemble_strategy=None, ensemble_weights=None)#
Calibrate conformal prediction models using calibration data.
- Parameters:
calibrate_df (DataFrame | None) – Calibration dataset. If None, uses the stored training data.
covariate_df (DataFrame | None) – Optional covariate dataset.
ensemble_strategy (str | None) – Strategy for combining model predictions.
ensemble_weights (dict[str, float] | None) – Weights for weighted ensemble strategy.
- Raises:
ValueError – If conformal_params is not set.
- Return type:
- explain(X, model_idx=0, n_background=100)#
Compute SHAP feature attributions for a fitted ML model.
Builds a
ShapExplainerfor the model at position model_idx inself.models, runs SHAP over X, and returns aShapResultwith values reshaped to(B, L, F)- one attribution per sample, per lookback step, per feature.Only ML models (
domain="ml") are supported. Neural-network models require gradient-based attribution and are not currently handled.- Parameters:
X (
ndarray) – Feature array of shape(B, L, F)as produced by the data pipeline (e.g. fromDataPipeline.transform()).model_idx (
int) – Index intoself.modelsof the model to explain. Defaults to0(the first / only model).n_background (
int) – Number of background samples forLinearExplainerandKernelExplainer. Ignored for tree models.
- Return type:
ShapResult- Returns:
ShapResultwith –values- SHAP array(B, L, F)feature_names- original F feature namestimestep_labels- L lookback labels ('t-L+1'…'t0')expected_value- SHAP base value (mean prediction)
- Raises:
IndexError – If model_idx is out of range.
RuntimeError – If no models are fitted or the domain is not
"ml".ImportError – If
shapis not installed.
Example
>>> result = forecaster.explain(X_test)
>>> result.plot_importance(top_n=20)
>>> importance = result.mean_importance()
Base Classes#
- class twiga.forecaster.abstract.AbstractForecaster#
Bases: ABC
Abstract base class for time series forecasters with default implementations.
Provides implementations for fitting, evaluating, and tuning models. Subclasses must implement model-specific methods.
- backtesting(data, train_ratio=1.0, start_dt=None, end_dt=None, verbose=True, trial=None, ensemble_strategy=None, ensemble_weights=None)#
Perform backtesting on the forecaster models.
- Parameters:
data (DataFrame) – Complete dataset for backtesting.
train_ratio (float) – Ratio of data to use for training in backtesting.
start_dt (object | None) – Start date for backtesting, if any.
verbose (bool) – Whether to display detailed logs during backtesting.
trial (object | None) – Trial identifier or configuration, if any.
ensemble_strategy (str | None) – Strategy for combining model predictions, if any.
ensemble_weights (dict[str, float] | None) – Weights for weighted ensemble strategy, if any.
- Return type:
- Returns:
Tuple of concatenated predictions and backtesting metrics.
- Raises:
ValueError – If no models are available or no metrics are returned.
- evaluate(test_df=None, covariate_df=None, ensemble_strategy=None, ensemble_weights=None, prediction_fn=None, evaluation_fn=None, is_interval=False)#
Evaluate forecaster models on test data, supporting point or interval forecasts.
- Parameters:
test_df (DataFrame | None) – Test dataset, if any. Uses default test data if None.
covariate_df (DataFrame | None) – Dataset with additional covariates, if any.
ensemble_strategy (str | None) – Strategy for combining model predictions, if any.
ensemble_weights (dict[str, float] | None) – Weights for weighted ensemble strategy, if any.
prediction_fn (Callable | None) – Function to generate predictions (e.g., predict or predict_interval).
evaluation_fn (Callable | None) – Function to evaluate predictions (e.g., evaluate_point_forecast or evaluate_interval_forecast).
is_interval (bool) – Whether the evaluation is for interval forecasts (adds lower/upper bounds).
- Return type:
- Returns:
Tuple of –
DataFrame with columns: timestamp, Model, target, forecast, Actual, [lower, upper] (if is_interval=True).
DataFrame with columns: Model, target, MetricName, Value, inference-time.
- Raises:
ValueError – If ground truth, timestamp, or prediction shapes do not match.
- evaluate_interval_forecast(test_df=None, covariate_df=None, ensemble_strategy=None, ensemble_weights=None)#
Evaluate interval forecasts (wrapper for evaluate).
- Parameters:
test_df (DataFrame | None) – Test dataset, if any. Uses default test data if None.
covariate_df (DataFrame | None) – Dataset with additional covariates, if any.
ensemble_strategy (str | None) – Strategy for combining model predictions, if any.
ensemble_weights (dict[str, float] | None) – Weights for weighted ensemble strategy, if any.
- Return type:
- Returns:
Tuple of –
DataFrame with columns: timestamp, Model, target, forecast, Actual, lower, upper.
DataFrame with columns: Model, target, MetricName, Value, inference-time.
- evaluate_parametric_forecast(test_df=None, covariate_df=None, ensemble_strategy=None, ensemble_weights=None)#
Evaluate parametric probabilistic forecasts, computing NLL and point metrics.
Works with Gaussian ML models (
GAUSSCATBOOSTConfig) and neural parametric heads (MLPFConfig(distribution="normal"),"laplace","gamma", etc.).NLL is computed under a Normal distribution assumption unless the model supplies a
"log_likelihood"key (neural parametric heads do this automatically viaForecastResult).
- evaluate_point_forecast(test_df=None, covariate_df=None, ensemble_strategy=None, ensemble_weights=None)#
Evaluate point forecasts (wrapper for evaluate).
- Parameters:
test_df (DataFrame | None) – Test dataset, if any. Uses default test data if None.
covariate_df (DataFrame | None) – Dataset with additional covariates, if any.
ensemble_strategy (str | None) – Strategy for combining model predictions, if any.
ensemble_weights (dict[str, float] | None) – Weights for weighted ensemble strategy, if any.
- Return type:
- Returns:
Tuple of –
DataFrame with columns: timestamp, Model, target, forecast, Actual.
DataFrame with columns: Model, target, MetricName, Value, inference-time.
- evaluate_quantile_forecast(test_df=None, covariate_df=None, ensemble_strategy=None, ensemble_weights=None)#
Evaluate quantile forecasts, computing pinball loss, calibration error, and sharpness.
Requires QR models (e.g. QRXGBOOSTConfig, MLPGAMConfig(distribution="qr")).
- fit(train_df, val_df=None, train_ratio=1.0)#
Fit the forecaster models using training and optional validation data.
- forecast(test_df, covariate_df=None, ensemble_strategy=None, ensemble_weights=None)#
Generate predictions and return them as a typed ForecastCollection.
Unlike predict(), this method wraps each model's output in a ForecastResult with timestamps and target names populated, enabling structured downstream access (to_dataframe(), evaluate(), etc.).
The test data must contain the target column(s) so that timestamps can be aligned with the pipeline's sequence layout. For forward-looking prediction where actuals are unavailable, use predict() directly.
- Parameters:
- Return type:
- Returns:
ForecastCollection containing one ForecastResult per model (plus an optional "Ensemble" entry when ensemble_strategy is set).
- abstractmethod get_ground_truth(test_df, **kwargs)#
Retrieve ground truth data.
- get_model_from_registry(model_params)#
Load and instantiate models based on the provided configurations.
- Parameters:
model_params (
list[BaseModelConfig] |list[dict]) – List of model configurations as Pydantic models or dictionaries.- Raises:
TypeError – If model_params contains invalid types or mismatched configurations.
ValueError – If a dictionary configuration lacks a model name.
- Return type:
- predict(test_df, covariate_df=None, ensemble_strategy=None, ensemble_weights=None, prepare_test_data=True)#
Generate point predictions using the ensemble of forecasting models.
- Parameters:
test_df (DataFrame) – Test dataset.
covariate_df (DataFrame | None) – Dataset with additional covariates, if any.
ensemble_strategy (str | None) – Strategy for combining model predictions, if any.
ensemble_weights (dict[str, float] | None) – Weights for weighted ensemble strategy, if any.
prepare_test_data (bool) – Whether to preprocess the test data.
- Return type:
- Returns:
Tuple of –
Dictionary mapping model names to 3D NumPy array predictions.
Dictionary mapping model names to inference times.
- Raises:
ValueError – If predictions are not 3D NumPy arrays.
- predict_interval(test_df, covariate_df=None, ensemble_strategy=None, ensemble_weights=None, prepare_test_data=True)#
Generate conformal interval predictions for the given test data.
- Parameters:
test_df (DataFrame) – Test dataset.
covariate_df (DataFrame | None) – Dataset with additional covariates, if any.
ensemble_strategy (str | None) – Strategy for combining model predictions, if any.
ensemble_weights (dict[str, float] | None) – Weights for weighted ensemble strategy, if any.
prepare_test_data (bool) – Whether to preprocess the test data.
- Return type:
tuple[dict[str,tuple[ndarray,ndarray,ndarray]],dict[str,float]]- Returns:
Tuple of –
Dictionary mapping model names to tuples of (lower, forecast, upper) arrays.
Dictionary mapping model names to inference times.
- Raises:
ValueError – If conformal parameters or models are not set, or predictions are not 3D NumPy arrays.
- abstractmethod prepare_test_data(test_df)#
Prepare test data for evaluation.
- tune(train_df, val_df, num_trials=10, reduction_factor=3, patience=10, load_if_exists=True, initial_params=None, direction='minimize', sampler=None, base_pruner=None, objective_metric=None, calib_df=None, conformal_params=None)#
Perform hyperparameter tuning and update models with optimal parameters.
- Parameters:
train_df (DataFrame) – DataFrame containing training data.
val_df (DataFrame) – DataFrame containing validation data.
num_trials (int) – Number of trials for hyperparameter tuning.
reduction_factor (int) – Reduction factor for the tuning pruner.
patience (int) – Patience for the tuning process.
load_if_exists (bool) – If True, load an existing tuning study if available.
initial_params (dict | None) – Initial parameters for tuning.
direction (str) – Direction of optimization, either 'minimize' or 'maximize'.
sampler (object | None) – Sampler object for hyperparameter sampling.
base_pruner (object | None) – Pruner object for early stopping.
objective_metric (str | None) – Column name in the evaluation metrics DataFrame to optimise. None defaults to the first metric in self.metrics (or 'mae' as a fallback). Example: 'rmse', 'smape'.
calib_df (DataFrame | None) – Hold-out calibration dataset. When provided alongside conformal parameters, each trial fits → calibrates → evaluates interval metrics so that objective_metric can target an interval score (e.g. 'winkler', 'picp').
conformal_params (ConformalConfig | None) – Conformal prediction configuration to use during tuning. If None, uses the forecaster's existing conformal_params attribute. Ignored when calib_df is None.
- Return type:
- class twiga.forecaster.base.BaseForecaster(split_freq='months', test_size=1, train_size=1, gap=0, domain='ml', stride=None, window='expanding', date_column='timestamp', num_splits=None, project_name='experiment', file_name=None, seed=42, root_dir='../', metrics=None, checkpoints_path=None)#
Bases: TimeBasedCV, AbstractForecaster, ABC
Base forecaster class that provides model training, checkpointing, prediction, and evaluation capabilities.
Example
>>> class MyForecaster(BaseForecaster):
...     def predict(self, test_df: pd.DataFrame, covariate_df: pd.DataFrame | None) -> dict:
...         # Implement prediction logic here
...         return {"loc": np.zeros((10, 1))}
>>> forecaster = MyForecaster(split_freq="months", test_size=1, train_size=1, project_name="my-project")
>>> forecaster.fit(train_df)
>>> results_df, metrics_df = forecaster.evaluate(test_df)
- __init__(split_freq='months', test_size=1, train_size=1, gap=0, domain='ml', stride=None, window='expanding', date_column='timestamp', num_splits=None, project_name='experiment', file_name=None, seed=42, root_dir='../', metrics=None, checkpoints_path=None)#
Initialize the BaseForecaster.
- Parameters:
split_freq (str) – Frequency of the data, e.g., "months".
test_size (int) – Number of periods to forecast.
train_size (int) – Training window size.
gap (int) – Gap between training and forecast periods.
window (str) – Window type (e.g., "expanding").
date_column (str) – Name of the timestamp column.
domain (str) – Domain of the data (e.g., "ml").
num_splits (int | None) – Number of splits for cross-validation.
project_name (str) – Experiment name.
file_name (str | None) – Optional file name for checkpoints.
seed (int) – Random seed.
root_dir (str) – Root directory for results.
trial (Any | None) – Optional trial object (for hyperparameter optimization).
metrics (tuple[str] | list[str] | None) – List of metrics to evaluate.
checkpoints_path (str | None) – Explicit checkpoint directory. When set, overrides the path derived from root_dir/project_name/model_type and is available immediately - before fit() is called.
- create_optuna_study(num_trials=10, reduction_factor=3, patience=2, load_if_exists=True, base_pruner=None, sampler=None, direction='minimize')#
Create or load an Optuna study for hyperparameter optimization.
This method configures an Optuna study with a Hyperband pruner wrapped by a PatientPruner (if no custom pruner is provided) and a TPESampler with the instance’s seed (if no custom sampler is provided). The study is stored in a SQLite database within the results directory.
- Parameters:
num_trials (int) – Number of trials for the study (unused in study setup).
reduction_factor (int) – Reduction factor for HyperbandPruner.
patience (int) – Patience for PatientPruner.
load_if_exists (bool) – Whether to load an existing study if it exists.
base_pruner (Any | None) – Custom pruner to use instead of the default.
sampler (Any | None) – Custom sampler to use instead of the default.
direction (str) – Direction of optimization, either "minimize" or "maximize".
- Returns:
optuna.Study – Configured Optuna study.
- create_results_df(time_stamp, ground_truth, predictions, target_feature, date_column)#
Create a results DataFrame with the timestamp index, ground truth, and forecasted values.
- Parameters:
- Return type:
- Returns:
pd.DataFrame – DataFrame with ground truth and forecasted values indexed by timestamp.
Example
>>> results_df = forecaster.create_results_df(ts, gt, pred, ["temp"], "Date")
- get_ground_truth(test_df=None)#
Load the latest checkpoint and return inverse-scaled ground truth sequences.
Side effect: calls load_checkpoint_and_datapipe(), which calls on_load_checkpoint() and may overwrite self.model and self.data_pipeline from disk. Use this method during evaluation (where the checkpoint state is desired). For pure data extraction without checkpoint side effects - e.g. inside forecast() - use self.data_pipeline.get_ground_truth_sequences() directly instead.
- load_checkpoint_and_datapipe()#
Load the model checkpoint and restore the data pipeline from disk.
- Return type:
- prepare_test_data(test_df=None)#
Prepares and returns a formatted test DataFrame for prediction.
This method checks the consistency of the training DataFrame and test DataFrame, concatenates them if both are provided, and sorts the resulting DataFrame by the date column defined in the data pipeline.
- Parameters:
test_df (pandas.DataFrame, optional) – A DataFrame containing test data. If not provided, the training DataFrame is used as test data.
- Raises:
ValueError – If the training DataFrame length does not match the expected lookback window size plus maximum data drop.
ValueError – If neither training data nor test data is provided.
- Returns:
pandas.DataFrame – A sorted DataFrame ready for prediction.
Forecast Result Types#
- class twiga.forecaster.result.ForecastResult(timestamps, loc, targets, model_name, kind, ground_truth=None, scale=None, quantiles=None, quantile_levels=None, conf_level=None, samples=None, lower=None, upper=None, inference_time=0.0)#
Bases: object
Container for one model's forecast output.
- Variables:
timestamps – shape (n_batch, n_horizon, n_targets)
loc – point predictions (mean/median), shape (n_batch, n_horizon, n_targets)
targets – ordered list of target variable names
model_name – human-readable model identifier
kind – determines which optional arrays are expected and how to convert
ground_truth – optional, same shape as loc
scale – parametric std-dev / scale, same shape as loc
quantiles – shape (n_batch, n_q, n_horizon, n_targets)
quantile_levels – corresponding probability levels (e.g. [0.1, 0.5, 0.9])
samples – shape (n_batch, n_samples, n_horizon, n_targets)
lower – lower bound, same shape as loc
upper – upper bound, same shape as loc
inference_time – inference duration in seconds
conf_level
metric_name
- evaluate(ground_truth=None, **kwargs)#
Evaluate forecast against ground truth using kind-appropriate metrics.
Forwards to twiga.core.metrics.evaluate_forecast().
- Parameters:
- Return type:
- Returns:
DataFrame of per-day, per-target metrics.
- Raises:
ValueError – if no ground truth is available.
- kind: ForecastKind#
- to_dataframe(fmt='long')#
Convert forecast to tidy DataFrame.
Always includes: timestamp, target, model, forecast. Optional: actual (when ground_truth is present).
Additional columns depend on forecast kind:
POINT: no extra columns
PARAMETRIC: scale
INTERVAL: lower, upper
QUANTILE (fmt="wide"): q_0.10, q_0.50, …
QUANTILE (fmt="long"): q_level, quantile_forecast
SAMPLES: q_0.10, q_0.50, q_0.90 (empirical quantiles)
- Parameters:
fmt (
str) – “long” (default) or “wide” - only affects QUANTILE- Return type:
- Returns:
pandas DataFrame in long or wide format
- Raises:
ValueError – if fmt is invalid
- class twiga.forecaster.result.ForecastCollection(results=<factory>)#
Bases: object
Collection of ForecastResult objects from multiple models.
- evaluate(**kwargs)#
Evaluate all models and return a combined metrics DataFrame.
Calls
ForecastResult.evaluate()on each result and concatenates the output, adding a"Model"column derived from each result’smodel_name. Ground truth must be attached to each result (i.e.forecast()must have been called with test data that contains the target column).- Parameters:
**kwargs – Forwarded to each
ForecastResult.evaluate()call (e.g.metric_names,freq).- Return type:
- Returns:
Combined metrics DataFrame with a
"Model"column.- Raises:
ValueError – If the collection is empty or any result lacks ground truth.
- results: dict[str, ForecastResult]#
- to_dataframe(fmt='long')#
Concatenate all model forecasts into one DataFrame.
- Parameters:
fmt (
str) – passed to each ForecastResult.to_dataframe()- Return type:
- Returns:
Combined long-format DataFrame
- Raises:
ValueError – if collection is empty
Registry#
- twiga.forecaster.registry.get_model(name, domain=None)#
Lazily load the model and config classes from models/ml/ or models/nn/.
- Parameters:
- Return type:
- Returns:
tuple[Type, Type] – A tuple of (model_class, config_class).
- Raises:
ValueError – If the model is not found in the specified or default domains.
Ensemble#
- twiga.forecaster.ensemble.compute_ensemble_predictions(predictions, model_names, ensemble_strategy, ensemble_weights=None)#
Generate ensemble predictions by combining predictions from multiple models.
- Parameters:
predictions (list[ndarray]) – List of model predictions, where each prediction is a 3D NumPy array with shape (num_samples, horizon, num_targets).
model_names (list[str]) – List of model names corresponding to the predictions.
ensemble_strategy (EnsembleStrategy) – Strategy for combining predictions, one of EnsembleStrategy.MEAN, EnsembleStrategy.MEDIAN, or EnsembleStrategy.WEIGHTED.
ensemble_weights (dict[str, float] | None) – Dictionary mapping model names to their weights for the weighted ensemble strategy. Required if ensemble_strategy is EnsembleStrategy.WEIGHTED. Defaults to None.
- Return type:
- Returns:
A 3D NumPy array of ensemble predictions with shape (num_samples, horizon, num_targets).
- Raises:
ValueError – If predictions is empty, prediction shapes are inconsistent, weights are required but not provided, the number of weights does not match the number of models, or the ensemble strategy is unknown.
See Also#
Quick Start Guide - end-to-end walkthrough
Configuration System - DataPipelineConfig, ForecasterConfig, ConformalConfig
Data Pipeline - feature engineering and scaling
Metrics - point and interval evaluation functions
Backtesting - cross-validation strategy
Models - available model implementations
Conformal Prediction - prediction intervals
Hyperparameter Tuning - Optuna integration