Key Concepts#

Time Series Terminology#

| Term | Description | Config Field |
| --- | --- | --- |
| Lookback window | Number of past time steps fed to the model as input | `DataPipelineConfig.lookback_window_size` |
| Forecast horizon | Number of future time steps to predict | `DataPipelineConfig.forecast_horizon` |
| Period | Sampling frequency (pandas offset alias, e.g. `"1h"`, `"30min"`) | `DataPipelineConfig.period` |
| Target feature | The variable(s) to forecast | `DataPipelineConfig.target_feature` |
| Historical features | Features whose future values are unknown (lookback only) | `DataPipelineConfig.historical_features` |
| Calendar features | Cyclical temporal features derived from the timestamp (e.g. hour, day of week) | `DataPipelineConfig.calendar_features` |
| Exogenous features | External features known over the full lookback + forecast horizon | `DataPipelineConfig.exogenous_features` |
| Future covariates | External features known only over the forecast horizon | `DataPipelineConfig.future_covariates` |
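To make the window terms concrete, here is a minimal sketch (an illustration, not Twiga's internal windowing code) of how a lookback window of length L and a forecast horizon of length H slice a series into supervised input/target pairs:

```python
import numpy as np

def make_windows(series, lookback, horizon):
    """Slide over a 1-D series, yielding (lookback,) inputs and (horizon,) targets."""
    X, y = [], []
    for t in range(lookback, len(series) - horizon + 1):
        X.append(series[t - lookback:t])   # past values: t-L ... t-1
        y.append(series[t:t + horizon])    # future values: t ... t+H-1
    return np.array(X), np.array(y)

series = np.arange(100.0)   # toy hourly series
X, y = make_windows(series, lookback=24, horizon=6)
print(X.shape, y.shape)     # (71, 24) (71, 6)
```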

Feature availability across the time axis#

The four feature types differ in which portion of the time axis they cover.

```mermaid
%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#e0f8f4', 'primaryTextColor': '#263238', 'primaryBorderColor': '#0f718e', 'lineColor': '#0f718e', 'clusterBkg': '#f0fdf9', 'clusterBorder': '#00bfa5', 'titleColor': '#263238'}}}%%
graph LR
    subgraph PAST["Lookback Window  ( t-L … t )"]
        TP["target_feature\nas lagged input"]
        HF["historical_features\npast only"]
        CP["calendar_features\nderived from timestamp"]
        EP["exogenous_features\nfull window known"]
    end
    subgraph FUTURE["Forecast Horizon  ( t+1 … t+H )"]
        TF["target_feature\npredicted output"]
        CF["calendar_features\nderived from timestamp"]
        EF["exogenous_features\nfull horizon known"]
        FC["future_covariates\nhorizon only"]
    end
    PAST -->|"t → t+1"| FUTURE

    classDef target fill:#0f718e,stroke:#0f718e,color:#fff,rx:6
    classDef hist   fill:#263238,stroke:#263238,color:#fff,rx:6
    classDef cal    fill:#00bfa5,stroke:#00897b,color:#fff,rx:6
    classDef exog   fill:#e0f8f4,stroke:#0f718e,color:#263238,rx:6
    classDef fcov   fill:#f0fdf9,stroke:#00bfa5,color:#263238,rx:6

    class TP,TF target
    class HF hist
    class CP,CF cal
    class EP,EF exog
    class FC fcov
```

| Feature type | Lookback | Forecast horizon | Typical examples |
| --- | --- | --- | --- |
| `target_feature` | Used as lagged input | Predicted output | Electricity load, solar generation |
| `historical_features` | Available | Not available | Sensor readings without NWP forecast |
| `calendar_features` | Derived from timestamp | Derived from timestamp | Hour of day, day of week, month |
| `exogenous_features` | Available | Available (full horizon) | NWP weather forecast, scheduled output |
| `future_covariates` | Not used | Available (horizon only) | Day-ahead price signal, planned events |

Why the distinction matters

`historical_features` can only contribute lag/rolling statistics, because their future values are unknown. `exogenous_features` and `future_covariates` are passed directly into the forecast window, so the model conditions on their future values. `calendar_features` are always derivable from the timestamp and are computed automatically.
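Since historical features enter the model only through their past values, they are typically expanded into lag and rolling-window columns. A minimal pandas sketch of what such engineered columns look like (the column names here are illustrative, not Twiga's actual naming scheme):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "timestamp": pd.date_range("2024-01-01", periods=48, freq="1h"),
    "irradiance": np.random.default_rng(0).random(48),
})

# Lag features: the value 1 and 24 steps in the past
for lag in [1, 24]:
    df[f"irradiance_lag{lag}"] = df["irradiance"].shift(lag)

# Rolling statistics over a 24-step window of strictly past values
df["irradiance_roll24_mean"] = df["irradiance"].shift(1).rolling(24).mean()
df["irradiance_roll24_std"] = df["irradiance"].shift(1).rolling(24).std()
```

The `shift(1)` before `rolling` keeps the window strictly in the past, avoiding leakage of the current value into its own features.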

Data Format Requirements#

Twiga expects a pandas DataFrame with:

  1. A datetime column named `"timestamp"` by default (configurable via `date_column`)

  2. One or more target columns — the variable(s) to forecast

  3. Optional feature columns — any combination of the four feature types

```python
import pandas as pd

df = pd.DataFrame({
    "timestamp":   pd.date_range("2024-01-01", periods=1000, freq="1h"),
    "load_mw":     [...],   # target_feature
    "temperature": [...],   # exogenous_features  — NWP forecast known for full horizon
    "wind_speed":  [...],   # future_covariates   — known only over forecast horizon
    "irradiance":  [...],   # historical_features — no future forecast available
    # calendar features (hour, dayofweek, etc.) are derived automatically from timestamp
})
```

The config tells the pipeline which columns play which role:

```python
from twiga.core.config import DataPipelineConfig

data_config = DataPipelineConfig(
    target_feature="load_mw",
    period="1h",
    lookback_window_size=168,
    forecast_horizon=48,
    historical_features=["irradiance"],
    calendar_features=["hour", "dayofweek"],
    exogenous_features=["temperature"],
    future_covariates=["wind_speed"],
)
```

Note

The DataFrame must be sorted by timestamp with a regular frequency. Handle missing values before passing data to the pipeline.
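One way to enforce these requirements before handing the frame to the pipeline — a generic pandas recipe, not a Twiga API:

```python
import pandas as pd

def prepare(df: pd.DataFrame, period: str = "1h") -> pd.DataFrame:
    """Sort by timestamp, reindex to a regular grid, and fill short gaps."""
    df = df.sort_values("timestamp")
    df = df.set_index("timestamp").asfreq(period)   # inserts NaN rows for missing steps
    df = df.interpolate(limit=3)                    # fill short gaps only
    assert not df.isna().any().any(), "unfilled gaps remain — handle them explicitly"
    return df.reset_index()
```

Longer gaps deliberately fail the assertion; how to fill them (interpolation, forward-fill, dropping whole days) is a domain decision, not something to paper over silently.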

Configuration-Driven Design#

Twiga follows a configuration-as-code pattern. Every component is configured via a Pydantic model that validates its inputs at construction time. The three core configs are:

DataPipelineConfig#

Controls data preprocessing — what features to engineer, how to scale, and how to create sequences.

```python
from twiga.core.config import DataPipelineConfig

data_config = DataPipelineConfig(
    target_feature="load_mw",
    period="1h",
    lookback_window_size=168,           # 7 days of hourly data
    forecast_horizon=48,                # predict 2 days ahead
    historical_features=["irradiance"], # past-only, no future forecast
    calendar_features=["hour", "dayofweek"],
    exogenous_features=["temperature"], # known over full horizon
    future_covariates=["wind_speed"],   # known only for forecast window
    lags=[1, 24, 48, 168],
    windows=[24, 48],
    window_funcs=["mean", "std"],
)
```

See Configuration System for the full field reference.

ForecasterConfig#

Controls training orchestration — backtesting splits, project naming, and output directories.

```python
from twiga.core.config import ForecasterConfig

train_config = ForecasterConfig(
    split_freq="months",
    train_size=6,
    test_size=1,
    gap=0,
    window="expanding",
    project_name="MyProject",
    seed=42,
)
```
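The split settings above describe an expanding-window backtest: each fold trains on at least 6 months and tests on the following month. A sketch of the fold boundaries those settings imply — illustrative only, not Twiga's actual splitter:

```python
import pandas as pd

def expanding_monthly_splits(start, end, train_size=6, test_size=1, gap=0):
    """Yield (train_end, test_start, test_end) month labels for an expanding window."""
    months = pd.period_range(start, end, freq="M")
    folds = []
    i = train_size
    while i + gap + test_size <= len(months):
        train_end = months[i - 1]                 # training always starts at `start`
        test_start = months[i + gap]
        test_end = months[i + gap + test_size - 1]
        folds.append((str(train_end), str(test_start), str(test_end)))
        i += test_size                            # next fold absorbs the tested month
    return folds

folds = expanding_monthly_splits("2024-01", "2024-12")
# first fold trains on Jan–Jun 2024 and tests on Jul 2024
```

With `window="sliding"` the training start would advance with each fold instead of staying fixed.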

Model Configs#

Each model has its own config class inheriting from BaseModelConfig (ML/baseline) or NeuralModelConfig (NN):

```python
from twiga.models.ml.xgboost_model import XGBOOSTConfig

xgb_config = XGBOOSTConfig(
    device="cpu",
    random_state=42,
)
```

Model Domains#

Twiga organizes models into three domains:

| Domain | Base Class | Training Framework | Models |
| --- | --- | --- | --- |
| `"baseline"` | `BaseRegressor` | scikit-learn API | Naive, SeasonalNaive, WindowAverage, Drift, ContextParrot |
| `"ml"` | `BaseRegressor` | scikit-learn API | CatBoost, XGBoost, LightGBM, RandomForest, LinearReg, NGBoost variants, QR variants |
| `"nn"` | `BaseNeuralForecast` | PyTorch Lightning | MLPF, MLPGAM, MLPGAF, N-HiTS, GANF and their probabilistic variants |

The domain is set automatically from the model config’s `domain` field and controls how `TwigaForecaster` handles training, checkpointing, and prediction.

Baseline models require no real training (`fit()` only stores metadata), which makes them fast reference points for computing skill scores. See Baseline Models and Model Catalog for the full list.
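A skill score measures how much a model improves on a baseline. A common form is shown below — an illustrative formula; check Twiga's metrics for the exact definition it uses:

```python
def skill_score(model_error: float, baseline_error: float) -> float:
    """1.0 = perfect, 0.0 = no better than the baseline, negative = worse."""
    return 1.0 - model_error / baseline_error

skill_score(12.0, 20.0)  # 0.4: the model cuts the baseline's error by 40%
```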

Forecasting Types#

Twiga supports three types of forecasting:

Point Forecasting#

Produces a single predicted value for each future time step. All models support point forecasting by default.

```python
predictions = forecaster.predict(test_df)
```

Probabilistic Forecasting#

Produces a distribution of predicted values, via quantile regression, a parametric distribution, or a distribution-free conformal step.

ML probabilistic models:

  • `QRCATBOOSTModel`, `QRXGBOOSTModel`, `QRLIGHTGBMModel` — quantile regression

  • `GAUSSCATBOOSTModel` — Gaussian (mean + sigma) output
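With a Gaussian (mean + sigma) output, any quantile can be recovered analytically from the two predicted parameters. An illustration of the parametric idea using the standard library, not a Twiga call:

```python
from statistics import NormalDist

mean, sigma = 100.0, 10.0                       # per-step model output
dist = NormalDist(mu=mean, sigma=sigma)
p10, p90 = dist.inv_cdf(0.1), dist.inv_cdf(0.9)
# an 80% central interval is roughly (87.2, 112.8)
```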

NN probabilistic models use a composable backbone/head design. Every architecture (MLPF, MLPGAM, MLPGAF, NHITS) can be paired with any distribution head by selecting the appropriate config:

| Distribution | Use case | Example config |
| --- | --- | --- |
| Normal | Symmetric, unbounded targets | `MLPFNormalConfig`, `NHITSNormalConfig` |
| Laplace | Heavy-tailed, outlier-robust | `MLPFLaplaceConfig` |
| LogNormal | Strictly positive, right-skewed | `MLPGAMLogNormalConfig` |
| Gamma | Strictly positive, flexible skew | `MLPGAFGammaConfig` |
| Beta | Bounded [0, 1] targets | `NHITSBetaConfig` |
| StudentT | Very heavy tails | `MLPGAMStudentTConfig` |
| QR | Fixed-grid quantile regression | `MLPFQRConfig`, `NHITSQRConfig` |
| FPQR | Adaptive quantile proposals | `MLPGAMFPQRConfig` |
| CRC | Conformal residual coverage | `MLPGAMCRCConfig` |

See Distribution Families for the backbone/head architecture and Quantile Regression for the QR-specific approach.

Interval Forecasting#

Produces prediction intervals (lower and upper bounds) via conformal prediction. This is a post-hoc calibration step applied to any trained model.

```python
# Calibrate conformal prediction on held-out data
forecaster.calibrate(calibration_df)

# Generate prediction intervals
intervals = forecaster.predict_interval(test_df)
```
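The idea behind this calibration step, split conformal prediction, is simple to sketch: take a quantile of the absolute residuals on held-out data and pad each point forecast by that amount. A generic sketch — Twiga's implementation may use a scaled or per-horizon variant:

```python
import numpy as np

def conformal_interval(cal_true, cal_pred, test_pred, alpha=0.1):
    """Symmetric split-conformal interval with ~(1 - alpha) coverage."""
    residuals = np.abs(np.asarray(cal_true) - np.asarray(cal_pred))
    n = len(residuals)
    # finite-sample-corrected quantile of the calibration residuals
    q = np.quantile(residuals, min(1.0, np.ceil((n + 1) * (1 - alpha)) / n))
    test_pred = np.asarray(test_pred)
    return test_pred - q, test_pred + q
```

Because the guarantee rests only on exchangeability of residuals, the same recipe wraps any trained point forecaster.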

Next: Quick Start Guide