Key Concepts#
Time Series Terminology#
| Term | Description | Config Field |
|---|---|---|
| Lookback window | Number of past time steps fed to the model as input | lookback_window_size |
| Forecast horizon | Number of future time steps to predict | forecast_horizon |
| Period | Sampling frequency (pandas offset alias, e.g. "1h") | period |
| Target feature | The variable(s) to forecast | target_feature |
| Historical features | Features whose future values are unknown (lookback only) | historical_features |
| Calendar features | Cyclical temporal features derived from the timestamp (e.g. hour, day of week) | calendar_features |
| Exogenous features | External features known over the full lookback + forecast horizon | exogenous_features |
| Future covariates | External features known only over the forecast horizon | future_covariates |
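To make the lookback window and forecast horizon concrete, the sketch below slices a toy series into (input, target) pairs. It is only an illustration in plain Python: `make_windows` is a hypothetical helper, not part of Twiga, and the library's own sequence builder may differ in detail.

```python
from typing import List, Sequence, Tuple


def make_windows(
    series: Sequence[float], lookback: int, horizon: int
) -> List[Tuple[List[float], List[float]]]:
    """Slice a series into (lookback input, forecast target) pairs."""
    samples = []
    # The last valid start index still leaves room for `horizon` future steps.
    for start in range(len(series) - lookback - horizon + 1):
        split = start + lookback
        past = list(series[start:split])              # model input
        future = list(series[split:split + horizon])  # values to predict
        samples.append((past, future))
    return samples


series = list(range(10))  # toy data: 0, 1, ..., 9
windows = make_windows(series, lookback=4, horizon=2)
print(len(windows))  # 5
print(windows[0])    # ([0, 1, 2, 3], [4, 5])
```

With the values used later on this page (a lookback of 168 and a horizon of 48 on hourly data), each sample would pair 7 days of history with the following 2 days.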
Feature availability across the time axis#
The four feature types differ in which portion of the time axis they cover.
%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#e0f8f4', 'primaryTextColor': '#263238', 'primaryBorderColor': '#0f718e', 'lineColor': '#0f718e', 'clusterBkg': '#f0fdf9', 'clusterBorder': '#00bfa5', 'titleColor': '#263238'}}}%%
graph LR
subgraph PAST["Lookback Window ( t-L … t )"]
TP["target_feature\nas lagged input"]
HF["historical_features\npast only"]
CP["calendar_features\nderived from timestamp"]
EP["exogenous_features\nfull window known"]
end
subgraph FUTURE["Forecast Horizon ( t+1 … t+H )"]
TF["target_feature\npredicted output"]
CF["calendar_features\nderived from timestamp"]
EF["exogenous_features\nfull horizon known"]
FC["future_covariates\nhorizon only"]
end
PAST -->|"t → t+1"| FUTURE
classDef target fill:#0f718e,stroke:#0f718e,color:#fff,rx:6
classDef hist fill:#263238,stroke:#263238,color:#fff,rx:6
classDef cal fill:#00bfa5,stroke:#00897b,color:#fff,rx:6
classDef exog fill:#e0f8f4,stroke:#0f718e,color:#263238,rx:6
classDef fcov fill:#f0fdf9,stroke:#00bfa5,color:#263238,rx:6
class TP,TF target
class HF hist
class CP,CF cal
class EP,EF exog
class FC fcov
| Feature type | Lookback | Forecast horizon | Typical examples |
|---|---|---|---|
| target_feature | Used as lagged input | Predicted output | Electricity load, solar generation |
| historical_features | Available | Not available | Sensor readings without NWP forecast |
| calendar_features | Derived from timestamp | Derived from timestamp | Hour of day, day of week, month |
| exogenous_features | Available | Available (full horizon) | NWP weather forecast, scheduled output |
| future_covariates | Not used | Available (horizon only) | Day-ahead price signal, planned events |
Why the distinction matters
historical_features can only contribute lag/rolling statistics — their future values are unknown. exogenous_features and future_covariates are passed directly into the forecast window so the model conditions on their future values. calendar_features are always derivable from the timestamp and computed automatically.
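Because calendar features depend only on the timestamp, they can be computed for any future step. The sketch below shows one common way to make them cyclical: a sine/cosine pair per field. This encoding is an assumption for illustration; the page does not state which encoding Twiga applies, and `encode_calendar` is a hypothetical helper.

```python
import math
from datetime import datetime
from typing import Dict


def encode_calendar(ts: datetime) -> Dict[str, float]:
    """Map hour and day of week onto the unit circle (sin/cos pairs)."""
    hour_angle = 2 * math.pi * ts.hour / 24
    dow_angle = 2 * math.pi * ts.weekday() / 7  # Monday == 0
    return {
        "hour_sin": math.sin(hour_angle),
        "hour_cos": math.cos(hour_angle),
        "dayofweek_sin": math.sin(dow_angle),
        "dayofweek_cos": math.cos(dow_angle),
    }


late = encode_calendar(datetime(2024, 1, 1, 23))
early = encode_calendar(datetime(2024, 1, 2, 0))
# 23:00 and 00:00 end up close together on the circle,
# even though the raw values 23 and 0 are far apart.
gap = math.hypot(
    late["hour_sin"] - early["hour_sin"],
    late["hour_cos"] - early["hour_cos"],
)
print(round(gap, 3))  # 0.261
```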
Data Format Requirements#
Twiga expects a pandas DataFrame with:
A datetime column named "timestamp" by default (configurable via date_column)
One or more target columns: the variable(s) to forecast
Optional feature columns: any combination of the four feature types
import numpy as np
import pandas as pd

n = 1000
rng = np.random.default_rng(42)  # synthetic placeholder values, for illustration only
df = pd.DataFrame({
    "timestamp": pd.date_range("2024-01-01", periods=n, freq="1h"),
    "load_mw": rng.uniform(50, 150, n),     # target_feature
    "temperature": rng.normal(15, 5, n),    # exogenous_features: NWP forecast known for full horizon
    "wind_speed": rng.uniform(0, 20, n),    # future_covariates: known only over forecast horizon
    "irradiance": rng.uniform(0, 800, n),   # historical_features: no future forecast available
    # calendar features (hour, dayofweek, etc.) are derived automatically from timestamp
})
The config tells the pipeline which columns play which role:
from twiga.core.config import DataPipelineConfig

data_config = DataPipelineConfig(
    target_feature="load_mw",
    period="1h",
    lookback_window_size=168,
    forecast_horizon=48,
    historical_features=["irradiance"],
    calendar_features=["hour", "dayofweek"],
    exogenous_features=["temperature"],
    future_covariates=["wind_speed"],
)
Note
The DataFrame must be sorted by timestamp with a regular frequency. Handle missing values before passing data to the pipeline.
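The requirements in the note can be checked with plain pandas before the data reaches the pipeline. The following is a minimal sketch; `check_frame` is a hypothetical helper written for this page, not a Twiga function, and the pipeline may perform its own checks.

```python
import pandas as pd


def check_frame(df: pd.DataFrame, date_column: str = "timestamp", period: str = "1h") -> list:
    """Return a list of problems found; an empty list means the frame looks usable."""
    problems = []
    ts = df[date_column]
    if not ts.is_monotonic_increasing:
        problems.append("timestamps are not sorted in ascending order")
    if ts.duplicated().any():
        problems.append("duplicate timestamps")
    # On a regular grid every consecutive step equals the sampling period.
    steps = ts.sort_values().diff().dropna()
    if not (steps == pd.Timedelta(period)).all():
        problems.append(f"sampling is not a regular {period} grid")
    if df.isna().any().any():
        problems.append("missing values present")
    return problems


good = pd.DataFrame({
    "timestamp": pd.date_range("2024-01-01", periods=5, freq="1h"),
    "load_mw": [1.0, 2.0, 3.0, 4.0, 5.0],
})
print(check_frame(good))                # []
print(check_frame(good.drop(index=2)))  # reports the irregular grid
```

Gaps found this way can be filled by reindexing onto a complete date range and interpolating; which imputation is appropriate depends on the data.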
Configuration-Driven Design#
Twiga follows a configuration-as-code pattern. Every component is configured via a Pydantic dataclass that validates inputs at construction time. The three core configs are:
DataPipelineConfig#
Controls data preprocessing — what features to engineer, how to scale, and how to create sequences.
from twiga.core.config import DataPipelineConfig

data_config = DataPipelineConfig(
    target_feature="load_mw",
    period="1h",
    lookback_window_size=168,  # 7 days of hourly data
    forecast_horizon=48,  # predict 2 days ahead
    historical_features=["irradiance"],  # past-only, no future forecast
    calendar_features=["hour", "dayofweek"],
    exogenous_features=["temperature"],  # known over full horizon
    future_covariates=["wind_speed"],  # known only for forecast window
    lags=[1, 24, 48, 168],
    windows=[24, 48],
    window_funcs=["mean", "std"],
)
See Configuration System for the full field reference.
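For readers unfamiliar with lag and rolling-window features, the sketch below shows what settings such as lags, windows and window_funcs usually translate to in pandas. It is an assumption about the general technique, not a description of Twiga's implementation; the helper name, the output column names, and the choice to shift by one step before rolling are all illustrative.

```python
import pandas as pd


def add_lag_and_window_features(df, column, lags, windows, window_funcs):
    """Append lagged copies and rolling statistics of `column` to a copy of `df`."""
    out = df.copy()
    for lag in lags:
        out[f"{column}_lag_{lag}"] = out[column].shift(lag)
    # Shift by one step first so each statistic only sees strictly past values.
    past = out[column].shift(1)
    for window in windows:
        rolled = past.rolling(window)
        for func in window_funcs:
            # Look up the rolling method by name, e.g. rolled.mean() or rolled.std().
            out[f"{column}_roll_{func}_{window}"] = getattr(rolled, func)()
    return out


df = pd.DataFrame({"load_mw": [10.0, 20.0, 30.0, 40.0, 50.0]})
features = add_lag_and_window_features(
    df, "load_mw", lags=[1, 2], windows=[2], window_funcs=["mean", "std"]
)
print(features.iloc[-1].to_dict())
```

The first rows of each new column are NaN because no history exists yet, which is one reason a long lag such as 168 needs at least a week of hourly data before the first usable sample.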
ForecasterConfig#
Controls training orchestration — backtesting splits, project naming, and output directories.
from twiga.core.config import ForecasterConfig

train_config = ForecasterConfig(
    split_freq="months",
    train_size=6,
    test_size=1,
    gap=0,
    window="expanding",
    project_name="MyProject",
    seed=42,
)
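One plausible reading of these split settings is sketched below: periods (here months) are numbered from 0, the first fold trains on train_size periods, and each later fold moves the test block forward by test_size. The exact semantics of ForecasterConfig are not spelled out on this page, so treat `backtest_splits` and the "rolling" alternative as hypothetical illustrations.

```python
from typing import List, Tuple


def backtest_splits(
    n_periods: int, train_size: int, test_size: int, gap: int = 0, window: str = "expanding"
) -> List[Tuple[range, range]]:
    """Return (train, test) index ranges over periods 0 .. n_periods - 1."""
    if window not in ("expanding", "rolling"):
        raise ValueError(f"unknown window type: {window!r}")
    splits = []
    train_end = train_size  # exclusive end of the first training block
    while train_end + gap + test_size <= n_periods:
        # Expanding keeps period 0 in every fold; rolling keeps a fixed-length block.
        train_start = 0 if window == "expanding" else train_end - train_size
        test_start = train_end + gap
        splits.append(
            (range(train_start, train_end), range(test_start, test_start + test_size))
        )
        train_end += test_size  # advance the forecast origin by one test block
    return splits


for train, test in backtest_splits(n_periods=9, train_size=6, test_size=1):
    print(list(train), list(test))
```

With nine months of data this yields three folds, training on months 0-5, 0-6 and 0-7 and testing on months 6, 7 and 8 respectively.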
Model Configs#
Each model has its own config class inheriting from BaseModelConfig (ML/baseline) or NeuralModelConfig (NN):
from twiga.models.ml.xgboost_model import XGBOOSTConfig

xgb_config = XGBOOSTConfig(
    device="cpu",
    random_state=42,
)
Model Domains#
Twiga organizes models into three domains:
| Domain | Base Class | Training Framework | Models |
|---|---|---|---|
| Baseline | | scikit-learn API | Naive, SeasonalNaive, WindowAverage, Drift, ContextParrot |
| ML | | scikit-learn API | CatBoost, XGBoost, LightGBM, RandomForest, LinearReg, NGBoost variants, QR variants |
| NN | | PyTorch Lightning | MLPF, MLPGAM, MLPGAF, N-HiTS, GANF and their probabilistic variants |
The domain is set automatically from the model config’s domain field and controls how TwigaForecaster handles training, checkpointing, and prediction.
Baseline models require no training — fit() only stores metadata — making them fast reference points for computing skill scores. See Baseline Models and Model Catalog for the full list.
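As an illustration of how a baseline serves as a reference point, the sketch below computes a seasonal-naive forecast and an MAE-based skill score. The formula 1 - MAE_model / MAE_baseline is a common definition and is assumed here; the page does not state which metric Twiga uses, and all function names are hypothetical.

```python
from typing import List, Sequence


def seasonal_naive(history: Sequence[float], horizon: int, season: int) -> List[float]:
    """Repeat the last observed season to fill the horizon."""
    last_season = list(history[-season:])
    return [last_season[i % season] for i in range(horizon)]


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def skill_score(actual, model_pred, baseline_pred) -> float:
    """1 - MAE_model / MAE_baseline: positive means the model beats the baseline."""
    return 1.0 - mae(actual, model_pred) / mae(actual, baseline_pred)


history = [10.0, 20.0, 30.0, 12.0, 22.0, 32.0]
actual = [14.0, 24.0, 34.0]
baseline = seasonal_naive(history, horizon=3, season=3)  # [12.0, 22.0, 32.0]
model = [13.0, 25.0, 34.0]
skill = skill_score(actual, model, baseline)
print(round(skill, 3))  # 0.667
```

A score of 0 means the model is no better than the baseline, and a negative score means it is worse.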
Forecasting Types#
Twiga supports three types of forecasting:
Point Forecasting#
Produces a single predicted value for each future time step. All models support point forecasting by default.
predictions = forecaster.predict(test_df)
Probabilistic Forecasting#
Produces a distribution of predicted values via quantile regression, parametric distributions, or a distribution-free conformal step.
ML probabilistic models:
QRCATBOOSTModel, QRXGBOOSTModel, QRLIGHTGBMModel: quantile regression
GAUSSCATBOOSTModel: Gaussian (mean + sigma) output
NN probabilistic models use a composable backbone/head design. Every architecture (MLPF, MLPGAM, MLPGAF, NHITS) can be paired with any distribution head by selecting the appropriate config:
| Distribution | Use case | Example config |
|---|---|---|
| Normal | Symmetric, unbounded targets | |
| Laplace | Heavy-tailed, outlier-robust | |
| LogNormal | Strictly positive, right-skewed | |
| Gamma | Strictly positive, flexible skew | |
| Beta | Bounded [0, 1] targets | |
| StudentT | Very heavy tails | |
| QR | Fixed-grid quantile regression | |
| FPQR | Adaptive quantile proposals | |
| CRC | Conformal residual coverage | |
See Distribution Families for the backbone/head architecture and Quantile Regression for the QR-specific approach.
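Quantile regression is typically trained by minimizing the pinball (quantile) loss. The sketch below shows that loss in plain Python as background; it is the standard textbook definition rather than a statement about Twiga's internal loss functions, and the function names are illustrative.

```python
from typing import Sequence


def pinball_loss(actual: float, predicted: float, quantile: float) -> float:
    """Quantile (pinball) loss for a single observation."""
    error = actual - predicted
    # Under-prediction is weighted by q, over-prediction by (1 - q).
    return max(quantile * error, (quantile - 1.0) * error)


def mean_pinball_loss(
    actuals: Sequence[float], predictions: Sequence[float], quantile: float
) -> float:
    """Average pinball loss over a set of observations."""
    total = sum(pinball_loss(a, p, quantile) for a, p in zip(actuals, predictions))
    return total / len(actuals)


under = pinball_loss(10.0, 8.0, quantile=0.9)   # prediction too low: about 1.8
over = pinball_loss(10.0, 12.0, quantile=0.9)   # prediction too high: about 0.2
print(under, over)
```

For the 0.9 quantile, predicting too low costs roughly nine times as much as predicting too high by the same amount, which pushes the fitted value toward the upper part of the distribution.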
Interval Forecasting#
Produces prediction intervals (lower and upper bounds) via conformal prediction. This is a post-hoc calibration step applied to any trained model.
# Calibrate conformal prediction on held-out data
forecaster.calibrate(calibration_df)
# Generate prediction intervals
intervals = forecaster.predict_interval(test_df)
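To show the idea behind post-hoc calibration, the sketch below implements basic split conformal prediction with absolute residuals: the interval half-width is a finite-sample quantile of the calibration errors. This is a generic illustration under that assumption; it is not the code behind calibrate or predict_interval, and the helper names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def conformal_radius(
    actuals: Sequence[float], predictions: Sequence[float], alpha: float = 0.1
) -> float:
    """Half-width from absolute calibration residuals (split conformal)."""
    scores = sorted(abs(a - p) for a, p in zip(actuals, predictions))
    n = len(scores)
    # Finite-sample rank: the ceil((n + 1) * (1 - alpha))-th smallest score.
    # Rounding first guards against floating-point noise just above an integer.
    rank = math.ceil(round((n + 1) * (1 - alpha), 9))
    if rank > n:
        return math.inf  # too few calibration points for this coverage level
    return scores[rank - 1]


def to_intervals(predictions: Sequence[float], radius: float) -> List[Tuple[float, float]]:
    """Symmetric intervals around each point prediction."""
    return [(p - radius, p + radius) for p in predictions]


cal_actual = [10.0, 12.0, 9.0, 11.0, 13.0, 10.0, 12.0, 11.0, 9.0]
cal_pred = [11.0] * 9
radius = conformal_radius(cal_actual, cal_pred, alpha=0.2)
print(radius)                               # 2.0
print(to_intervals([10.5, 12.0], radius))   # [(8.5, 12.5), (10.0, 14.0)]
```

Because the radius comes only from held-out residuals, the same step can wrap any point forecaster, which is why the calibration data must not overlap with the training data.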