Distribution Families#
Source Files
twiga/distributions/nn/parametric.py
twiga/distributions/nn/quantile.py
twiga/distributions/nn/fpquantile.py
twiga/distributions/nn/residual_conformal.py
twiga/models/nn/prob/core.py
twiga/models/ml/gausscatboost_model.py - GAUSSCATBOOSTConfig / GAUSSCATBOOSTModel
twiga/models/ml/ngboostnormal_model.py - NGBOOSTNORMALConfig / NGBOOSTNORMALModel
twiga/models/ml/ngboostlognormal_model.py - NGBOOSTLOGNORMALConfig / NGBOOSTLOGNORMALModel
twiga/models/ml/ngboostexponential_model.py - NGBOOSTEXPONENTIALConfig / NGBOOSTEXPONENTIALModel
twiga/models/ml/prob/base_ngboost.py - BaseNGBoostRegressor
twiga/distributions/ml/utils.py
Twiga implements several distribution families for probabilistic neural forecasting. All NN distributions slot into the composable backbone/head architecture described below.
Backbone/Head Architecture#
Probabilistic neural models in Twiga decouple the backbone (which learns to encode the time series) from the distribution head (which maps the latent vector to a forecast). This separation lives in twiga/models/nn/prob/core.py.
graph LR
X["Input (B, T, F)"] --> BB["Backbone\n(NHITSNetwork / MLPFNetwork / …)"]
BB -->|"encode(x) → z (B, encode_size)"| HEAD["Distribution Head\n(Normal / QR / …)"]
HEAD --> OUT["Distribution Parameters\n(loc, scale) / quantiles / …"]
Backbone contract - any backbone must implement:
- encode(x) → Tensor - maps raw input (B, T, F) to latent (B, encode_size)
- encode_size: int property - width of the latent vector
- penalty_dict(epoch) → dict - returns regularisation terms (Lasso, gate sparsity, …)
Head contract - any distribution head must implement:
- forward(z) → tuple - maps latent z to distribution parameters
- step(z, y, metric_fn, epoch) → (loss, metric) - computes training loss in one call
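The two contracts are small enough to sketch with plain-numpy stand-ins. ToyBackbone and ToyHead below are hypothetical illustrations, not twiga classes; the point is the duck-typed surface that ProbabilisticModel relies on.

```python
import numpy as np

class ToyBackbone:
    """Shape-only illustration of the backbone contract (hypothetical, not twiga code)."""
    def __init__(self, encode_size: int = 8):
        self._encode_size = encode_size
        rng = np.random.default_rng(0)
        # flatten (T=3, F=4) and project to the latent width
        self.proj = rng.standard_normal((3 * 4, encode_size))

    @property
    def encode_size(self) -> int:
        return self._encode_size

    def encode(self, x: np.ndarray) -> np.ndarray:
        # (B, T, F) -> (B, encode_size)
        return x.reshape(x.shape[0], -1) @ self.proj

    def penalty_dict(self, epoch: int) -> dict:
        return {"lasso": 0.0}  # regularisation terms would go here

class ToyHead:
    """Shape-only illustration of the head contract (hypothetical)."""
    def __init__(self, hidden_size: int, forecast_horizon: int = 2):
        self.w = np.zeros((hidden_size, forecast_horizon))

    def forward(self, z: np.ndarray) -> tuple:
        loc = z @ self.w            # (B, horizon)
        scale = np.ones_like(loc)   # positive-scale placeholder
        return loc, scale

    def step(self, z, y, metric_fn, epoch=None):
        loc, scale = self.forward(z)
        loss = float(np.mean((y - loc) ** 2))  # stand-in for the real NLL
        return loss, metric_fn(loc, y)

backbone = ToyBackbone()
head = ToyHead(hidden_size=backbone.encode_size)  # mirrors the hidden_size injection
z = backbone.encode(np.zeros((5, 3, 4)))
loss, metric = head.step(z, np.zeros((5, 2)), lambda p, t: 0.0)
```

Anything satisfying these two surfaces can be wired into the training loop unchanged.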
ProbabilisticModel (a LightningModule) wires them together and automatically injects hidden_size=backbone.encode_size into the head constructor so callers never specify it manually.
# Under the hood - TwigaForecaster does this for you when you pass a config like MLPFNormalConfig
from twiga.models.nn.prob.core import ProbabilisticModel
from twiga.models.nn.net.mlpf import MLPFNetwork
from twiga.distributions.nn.parametric import NormalDistribution
model = ProbabilisticModel(
backbone_cls=MLPFNetwork,
backbone_kwargs={...}, # num_target_feature, forecast_horizon, etc.
head_cls=NormalDistribution,
head_kwargs={"num_target_output": 1, "forecast_horizon": 48},
# hidden_size injected automatically from backbone.encode_size
)
Adding a new distribution requires implementing only the head interface - the backbone and training loop are unchanged.
NN Distributions#
Parametric Distributions#
All parametric heads live in twiga/distributions/nn/parametric.py. Each maps the latent vector z (B, encode_size) to two (or three for Student-T) distribution parameters and trains via negative log-likelihood.
| Class | Distribution | Output params | Constraint | Best for |
|---|---|---|---|---|
| NormalDistribution | Normal | (mu, sigma) | σ > 0 | Symmetric, unbounded targets |
| LaplaceDistribution | Laplace | (mu, scale) | b > 0 | Heavy-tailed, outlier-robust |
| LogNormalDistribution | LogNormal | (mu, sigma) of log(target) | σ > 0 | Strictly positive, right-skewed |
| GammaDistribution | Gamma | (concentration, rate) | α, β > 0 | Strictly positive, flexible skew |
| BetaDistribution | Beta | (alpha, beta) | α, β > 0 | Bounded [0, 1] targets |
| StudentTDistribution | Student-T | (mu, sigma, df) | σ > 0, ν > 2 | Very heavy tails |
All share the same interface:
from twiga.distributions.nn.parametric import NormalDistribution
dist = NormalDistribution(
num_target_output=1,
hidden_size=256, # injected automatically by ProbabilisticModel
forecast_horizon=48,
out_activation_function=None,
)
# Forward pass → distribution parameters
loc, scale = dist(z) # z shape: (B, encode_size)
# Get PyTorch distribution object
d = dist.get_distribution(loc, scale)
# Negative log-likelihood for training
nll = dist.get_log_likelihood(loc, scale, targets)
# training_step equivalent
loss, metric = dist.step(z, targets, metric_fn, epoch)
Student-T requires min_df
StudentTDistribution accepts an additional min_df: float parameter (default 2.1) that clips the predicted degrees-of-freedom from below to ensure finite variance.
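The finite-variance requirement behind min_df is easy to see numerically: the Student-T variance is ν/(ν−2)·σ², which diverges as ν approaches 2. A standalone sketch:

```python
import math

def student_t_variance(df: float, scale: float = 1.0) -> float:
    """Variance of a Student-T: scale^2 * df / (df - 2), finite only for df > 2."""
    if df <= 2:
        return math.inf
    return scale**2 * df / (df - 2)

min_df = 2.1   # the StudentTDistribution default
raw_df = 1.5   # a value the network could propose without clipping
clipped = max(raw_df, min_df)

print(student_t_variance(raw_df))   # inf: variance undefined below df = 2
print(student_t_variance(clipped))  # ~21.0: finite once clipped to 2.1
```

Clipping at 2.1 rather than exactly 2 keeps the variance finite but still very large, so heavy tails remain expressible.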
Quantile Distribution#
The QRDistribution class (twiga/distributions/nn/quantile.py) outputs quantile predictions from a neural network.
from twiga.distributions.nn.quantile import QRDistribution
dist = QRDistribution(
quantiles=[0.05, 0.25, 0.5, 0.75, 0.95],
num_outputs=1,
hidden_size=256,
horizon=48,
loss_fn="pinball", # or "huber-pinball"
kappa=0.5,
conf_level=0.05,
)
# Forward pass: shape (batch, num_quantiles, horizon, num_outputs)
quantile_values = dist(hidden_repr)
# Forecast: returns (loc, quantile_values)
loc, quantiles = dist.forecast(hidden_repr)
| Parameter | Type | Default | Description |
|---|---|---|---|
| quantiles | list[float] or None | None | Quantile levels to predict |
| num_outputs | int | 1 | Number of target variables |
| hidden_size | int | 256 | Input hidden dimension |
| horizon | int | 48 | Forecast horizon |
| loss_fn | str | "pinball" | Loss function: "pinball" or "huber-pinball" |
| kappa | float | 0.5 | Huber transition parameter |
| conf_level | float | 0.05 | Confidence level for bounds |
| eps | float | 1e-06 | Numerical stability |
| output_activation | nn.Module or None | None | Optional output activation |
The step() method computes quantile loss and metrics during training:
loss, metric = dist.step(z, y, metric_fn, epoch)
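The pinball loss that quantile training minimises is a two-line formula; a scalar sketch (the library version operates on tensors across all quantile levels at once):

```python
def pinball_loss(y: float, q_hat: float, tau: float) -> float:
    """Pinball (quantile) loss for a single prediction at level tau."""
    err = y - q_hat
    # Penalise underprediction with weight tau, overprediction with weight (1 - tau)
    return max(tau * err, (tau - 1) * err)

# At a high quantile, underprediction costs far more than overprediction:
print(pinball_loss(10.0, 8.0, 0.95))   # ~1.9  (0.95 * 2)
print(pinball_loss(10.0, 12.0, 0.95))  # ~0.1  (0.05 * 2)
```

Minimising this loss in expectation drives q_hat toward the true τ-quantile of the target distribution, which is why summing it over a grid of levels yields calibrated quantile forecasts.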
Median Extraction#
from twiga.distributions.nn.quantile import get_median_quantile
# Extract median from quantile predictions
median = get_median_quantile(quantile_hats, probs=[0.05, 0.25, 0.5, 0.75, 0.95])
# Shape: (batch, horizon, num_targets)
If \(\tau = 0.5\) is in the quantile list, it is extracted directly. Otherwise, linear interpolation between the two closest quantiles is used.
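The interpolation logic can be sketched for a single sample and step (a plain-Python stand-in for the tensor implementation):

```python
def median_from_quantiles(quantile_hats: list, probs: list) -> float:
    """Extract the median: direct lookup if 0.5 is a predicted level,
    otherwise linear interpolation between the two bracketing levels."""
    if 0.5 in probs:
        return quantile_hats[probs.index(0.5)]
    lo = max(p for p in probs if p < 0.5)
    hi = min(p for p in probs if p > 0.5)
    q_lo = quantile_hats[probs.index(lo)]
    q_hi = quantile_hats[probs.index(hi)]
    w = (0.5 - lo) / (hi - lo)  # interpolation weight
    return q_lo + w * (q_hi - q_lo)

print(median_from_quantiles([1.0, 2.0, 3.0], [0.25, 0.5, 0.75]))  # 2.0 (exact level)
print(median_from_quantiles([1.0, 3.0], [0.4, 0.6]))              # 2.0 (interpolated)
```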
Full Parameterised Quantile Regression (FPQR)#
FPQRDistribution (twiga/distributions/nn/fpquantile.py) extends quantile regression by treating the quantile levels themselves as learnable. Rather than predicting at a fixed grid of quantiles, a QuantileProposal network proposes the quantile levels adaptively from the latent representation, which are then encoded via CosinetauEmbedding before producing forecast values.
References
A. Faustine and A. Moshi, “Full Parameterized Quantile Function for Probabilistic Forecasting,” IEEE Access, 2024. DOI: 10.1109/ACCESS.2024.3402001
A. Faustine and L. Aversano, “Adaptive Quantile Learning for Long-Term Probabilistic Forecasting,” IEEE Access, 2022. DOI: 10.1109/ACCESS.2022.3143387
graph LR
Z["z (B, hidden_size)"] --> QP["QuantileProposal\n→ taus, tau_hats, entropies"]
Z --> CE["CosinetauEmbedding\n(cosine τ embedding)"]
QP --> CE
CE --> QO["quantile_output\n→ (B, n_quantiles, horizon, outputs)"]
QP --> LOSS["QuantileProposalLoss\n+ QuantileLoss − entropy"]
QuantileProposal generates the quantile grid from the backbone’s latent vector:
| Parameter | Type | Default | Description |
|---|---|---|---|
| n_quantiles | int | 10 | Number of quantile levels to propose |
| z_dim | int | 64 | Input latent dimension |
| dropout | float | 0.1 | Dropout for regularisation |
| conf_level | float | 0.05 | Clamps proposals to bounds derived from the significance level |
CosinetauEmbedding maps proposed tau values to a fixed-size embedding using cosine basis functions:
| Parameter | Type | Default | Description |
|---|---|---|---|
| num_cosines | int | 32 | Number of cosine basis functions |
| z_dim | int | 128 | Output embedding dimension |
| num_outputs | int | 48 | Scales input size (should match horizon) |
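The exact cosine basis is not spelled out above; a common choice in the implicit-quantile literature is cos(iπτ), which gives each level τ a distinct fixed-size feature vector. A plain-Python stand-in (an assumption about the basis, not the library's implementation):

```python
import math

def cosine_tau_features(tau: float, num_cosines: int = 32) -> list:
    """Cosine basis features for a quantile level tau in (0, 1).

    Assumes the cos(i * pi * tau) basis common in implicit-quantile networks;
    CosinetauEmbedding's exact basis may differ.
    """
    return [math.cos(i * math.pi * tau) for i in range(num_cosines)]

feats = cosine_tau_features(0.5, num_cosines=4)
print(feats)  # [1.0, ~0.0, -1.0, ~0.0]: distinct taus map to distinct vectors
```

The embedding layer then projects these features to z_dim so they can be combined (typically multiplicatively) with the backbone latent.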
FPQRDistribution combines proposal + cosine embedding + output projection:
| Parameter | Type | Default | Description |
|---|---|---|---|
| n_quantiles | int | 9 | Quantile levels (None → 9) |
| num_outputs | int | 1 | Number of target variables |
| hidden_size | int | 256 | Latent dimension from backbone |
| horizon | int | 48 | Forecast horizon |
| dropout | float | 0.1 | Dropout in proposal network |
| conf_level | float | 0.05 | Significance level for interval bounds |
| kappa | float | 0.25 | Huber transition parameter |
| num_cosines | int | 32 | Cosine basis functions (0 to use …) |
| output_activation | nn.Module or None | None | Optional output activation |
| loss_fn | str | "pinball" | Loss function |
from twiga.distributions.nn.fpquantile import FPQRDistribution
dist = FPQRDistribution(
n_quantiles=9,
num_outputs=1,
hidden_size=256,
horizon=48,
conf_level=0.05,
loss_fn="pinball",
)
# Training step: loss = quantile_loss + proposal_loss − entropy
loss, metric = dist.step(z, y, metric_fn, epoch)
# Inference: returns dict with loc, quantiles, quantile_levels, conf_level
result = dist.forecast(z)
# result["loc"] shape: (B, horizon, num_outputs)
# result["quantiles"] shape: (B, n_quantiles, horizon, num_outputs)
# result["quantile_levels"] shape: (B, n_quantiles, horizon) ← per-sample proposal
FPQR vs QR
QRDistribution predicts at a fixed, user-specified quantile grid. FPQRDistribution learns the quantile levels jointly with the forecast values, adapting the grid per sample. The proposal loss plus entropy regularisation encourages diverse, well-spread coverage. Use FPQR when you want the model to discover the most informative quantile levels automatically.
The training loss combines three terms:

\[\mathcal{L} = \mathcal{L}_{\text{quantile}} + \mathcal{L}_{\text{proposal}} - \mathbb{H}\]

where \(\mathbb{H}\) is the entropy of the proposed quantile distribution, acting as a diversity regulariser.
Conditional Residual Calibration (CRC)#
CRC heads (twiga/distributions/nn/residual_conformal.py) learn a point forecast \(\mu\) and a per-step scale \(\sigma\) end-to-end — no post-hoc conformal calibration step required. The scale MLP is trained to approximate the absolute residual \(|y - \mu|\), giving prediction intervals of the form \([\mu \pm z_\alpha \sigma]\) at inference time.
Two variants exist, differing only in how \(\mu\) is computed:
| Class | Backbone family | Mu path |
|---|---|---|
| CRCDistribution | MLPF, NHiTS, RNN | Linear projection from latent |
| AdditiveCRCDistribution | MLPGAM, MLPGAF | Identity (additive mean taken directly from the backbone) |
Loss function:
The joint training objective is:

\[\mathcal{L} = \mathcal{L}_\mu + \mathcal{L}_\sigma\]
The sigma calibration loss \(\mathcal{L}_\sigma\) is selected via sigma_loss_fn:
| sigma_loss_fn | Formula | Use when |
|---|---|---|
| "hybrid" | \(\alpha \cdot \text{MSE}(\sigma, \sqrt{r}) + (1-\alpha) \cdot \text{MAE}(\sigma, \sqrt{r})\) | General purpose; variance-stabilised residual |
|  | \(\alpha \cdot \text{MSE}(\sigma, r) + (1-\alpha) \cdot \text{MAE}(\sigma, r)\) | General purpose; same \(\alpha\) as mu loss |
| "gaussian" | \(\frac{1}{2}\left(\frac{r^2}{\sigma^2} + \log \sigma^2\right)\) | Heteroscedastic Gaussian NLL |
| "laplace" | \(\frac{r}{\sigma} + \log(2\sigma)\) | Robust to heavy-tailed residuals |
| "mse" | \((\sigma - r)^2\) | Distribution-free MSE on residual magnitude |
|  | \(\log(1 + (\sigma - r)^2)\) | Robust MSE with log-damping of large errors |
where \(r = |y - \mu|\).
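The loss options reduce to a few scalar formulas; a sketch for a single residual (a hypothetical helper mirroring the formulas above, not the library code):

```python
import math

def sigma_calibration_loss(sigma: float, r: float, kind: str = "gaussian",
                           alpha: float = 0.5) -> float:
    """Scalar sketch of sigma-loss options; r = |y - mu| is the absolute residual."""
    if kind == "hybrid":    # MSE/MAE blend against the variance-stabilised sqrt(r)
        t = math.sqrt(r)
        return alpha * (sigma - t) ** 2 + (1 - alpha) * abs(sigma - t)
    if kind == "gaussian":  # heteroscedastic Gaussian NLL
        return 0.5 * (r**2 / sigma**2 + math.log(sigma**2))
    if kind == "laplace":   # heteroscedastic Laplace NLL
        return r / sigma + math.log(2 * sigma)
    if kind == "mse":       # distribution-free MSE on residual magnitude
        return (sigma - r) ** 2
    raise ValueError(kind)

# The Gaussian NLL is minimised when sigma matches the residual magnitude:
losses = [sigma_calibration_loss(s, 2.0, "gaussian") for s in (1.0, 2.0, 4.0)]
print(losses)  # smallest at sigma = 2.0
```

All four options share the same fixed point: the predicted scale tracks the size of the point-forecast error, which is what makes \([\mu \pm z_\alpha \sigma]\) a usable interval.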
Two-stage training (MLPGAMCRC default, controlled by two_stage=True):
graph LR
S1["Stage 1\nBackbone + mu_layer\n(sigma_layer frozen)"] --> S2["Stage 2\nSigma MLP only\n(backbone frozen)"]
In stage 1 the backbone and mu path are updated via the combined \(\mathcal{L}_\mu + \mathcal{L}_\sigma\) loss. In stage 2 only sigma_layer parameters receive gradients — the backbone is frozen by Lightning’s toggle_optimizer. This prevents the scale head from distorting the point forecast.
from twiga.distributions.nn.residual_conformal import CRCDistribution, AdditiveCRCDistribution
# Standard backbone (MLPF, NHiTS)
crc = CRCDistribution(
num_target_output=1,
hidden_size=256, # injected from backbone.encode_size
forecast_horizon=48,
sigma_loss_fn="hybrid", # "hybrid" | "gaussian" | "laplace" | "mse"
alpha=0.1,
)
mu, sigma = crc(z) # z: (B, hidden_size)
loss = crc.get_log_likelihood(mu, sigma, y) # sigma calibration loss only (not point loss)
# Sigma-only step (stage 2 of two-stage training)
loss, metric = crc.step_sigma(z, y, metric_fn)
# Additive backbone (MLPGAM, MLPGAF) — mu_layer is Identity, no linear projection
add_crc = AdditiveCRCDistribution(
num_target_output=1,
hidden_dim=256, # internal sigma MLP width (independent of H*O)
forecast_horizon=48,
)
mu, sigma = add_crc(z) # z shape (B, H*O); mu = reshape(z), sigma from ResidualSigmaHead
CRC vs post-hoc conformal
Post-hoc conformal prediction (see Conformal Prediction) wraps any already-trained model and provides finite-sample coverage guarantees. CRC bakes the calibration objective directly into training — no separate calibration split needed — but the coverage guarantee is empirical rather than distribution-free. CRC typically produces tighter intervals when the signal-to-noise ratio is high.
ML Parametric Models#
Twiga provides two families of parametric ML models that output full distribution parameters rather than just point forecasts.
GaussCatBoost#
GAUSSCATBOOSTModel uses CatBoost’s built-in RMSEWithUncertainty loss, which jointly learns the mean μ and aleatoric scale σ in a single model. One CatBoostRegressor is trained per output step, and virtual ensemble snapshots provide epistemic uncertainty decomposition.
| Model | Name | Config Class | Output |
|---|---|---|---|
| Gaussian CatBoost | gausscatboost | GAUSSCATBOOSTConfig | (μ, σ) per horizon step |
GAUSSCATBOOSTConfig extends CATBOOSTConfig with:
| Field | Type | Default | Description |
|---|---|---|---|
| name | str | "gausscatboost" | Model identifier. |
| od_type | str |  | Overfitting detection strategy; active only when an eval set is provided to fit(). |
| early_stopping_rounds | int |  | Iterations without improvement before early stopping triggers. |
| virtual_ensembles_count | int |  | Virtual ensemble snapshots for epistemic uncertainty estimation via predict_with_uncertainty(). |
from twiga.models.ml.gausscatboost_model import GAUSSCATBOOSTConfig
config = GAUSSCATBOOSTConfig(
iterations=500,
learning_rate=0.05,
task_type="CPU",
virtual_ensembles_count=20,
)
Uncertainty decomposition
predict_with_uncertainty() returns three components per output step: mean (predicted μ), knowledge_uncertainty (epistemic — variance across virtual ensemble snapshots), and data_uncertainty (aleatoric — the model’s second output, exp of predicted log-σ).
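The standard total-variance split behind such a decomposition can be sketched in plain Python (a generic illustration of the law of total variance; CatBoost's exact virtual-ensemble estimator differs in detail):

```python
import statistics

def decompose_uncertainty(member_means, member_vars):
    """Split ensemble predictions into epistemic and aleatoric components.

    knowledge (epistemic) = variance of the per-member means
    data (aleatoric)      = average of the per-member predicted variances
    """
    mean = statistics.fmean(member_means)
    knowledge = statistics.pvariance(member_means)
    data = statistics.fmean(member_vars)
    return mean, knowledge, data

mean, knowledge, data = decompose_uncertainty(
    member_means=[10.0, 10.4, 9.6],  # virtual-ensemble snapshots disagree slightly
    member_vars=[1.0, 1.2, 0.8],     # each snapshot's aleatoric variance
)
print(mean, knowledge, data)  # 10.0, ~0.107, 1.0
```

High disagreement between snapshots (large knowledge term) flags inputs the model has not learned well; a large data term flags inherently noisy regions.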
NGBoost#
NGBoost (Natural Gradient Boosting) fits all distribution parameters simultaneously via the natural gradient of a proper scoring rule. Unlike GAUSSCATBOOSTModel, which is tied to CatBoost's RMSEWithUncertainty objective, NGBoost's scoring-rule framework supports several distribution families with the same training machinery. All models extend BaseNGBoostRegressor and train one NGBRegressor per flattened output column.
| Model | Name | Config Class | Distribution | Best for |
|---|---|---|---|---|
| NGBoost Normal | ngboostnormal | NGBOOSTNORMALConfig | Normal N(μ, σ²) | Symmetric, unbounded targets |
| NGBoost LogNormal | ngboostlognormal | NGBOOSTLOGNORMALConfig | LogNormal | Strictly positive, right-skewed targets |
| NGBoost Exponential | ngboostexponential | NGBOOSTEXPONENTIALConfig | Exponential (1/λ) | Non-negative, memoryless processes |
NGBoost vs GaussCatBoost
NGBoost jointly optimises μ and σ via the natural gradient, often yielding better-calibrated intervals. GAUSSCATBOOSTModel (RMSEWithUncertainty) is faster and scales better on very large datasets.
NGBOOSTNORMALModel#
Fits a Gaussian distribution N(μ, σ²) jointly optimising mean and standard deviation.
NGBOOSTNORMALConfig fields:
| Field | Type | Default | Description |
|---|---|---|---|
| name | str | "ngboostnormal" | Model identifier. |
| domain | str |  | Domain identifier. |
| n_estimators | int |  | Number of boosting rounds. |
| learning_rate | float |  | Shrinkage applied to each boosting step. |
| minibatch_frac | float |  | Fraction of training data used per round. |
| col_sample | float |  | Fraction of features sampled per tree. |
| random_state | int |  | Random seed. |
| score | str |  | Scoring rule for training objective. |
Output: (loc, scale) = (mean μ, standard deviation σ).
from twiga.models.ml.ngboostnormal_model import NGBOOSTNORMALConfig
config = NGBOOSTNORMALConfig(n_estimators=300, learning_rate=0.05)
NGBOOSTLOGNORMALModel#
Fits a LogNormal distribution, appropriate for strictly positive targets such as solar irradiance or non-negative electricity prices.
NGBOOSTLOGNORMALConfig shares all fields with NGBOOSTNORMALConfig, differing only in name="ngboostlognormal".
Output: (loc, scale) = (geometric mean exp(μ_log), log-space standard deviation σ_log).
Interpreting LogNormal output
loc is the geometric mean (the exponentiated log-space mean), not the arithmetic mean. To recover the arithmetic mean use exp(log(loc) + scale² / 2). For prediction intervals, exponentiate the log-space quantiles.
from twiga.models.ml.ngboostlognormal_model import NGBOOSTLOGNORMALConfig
config = NGBOOSTLOGNORMALConfig(n_estimators=300, score="CRPScore")
NGBOOSTEXPONENTIALModel#
Fits an Exponential distribution parameterised by its scale (1/λ). Suitable for non-negative targets with a memoryless property.
NGBOOSTEXPONENTIALConfig shares all fields with NGBOOSTNORMALConfig, differing only in name="ngboostexponential".
Output: Both loc and scale equal the predicted scale parameter 1/λ (the Exponential distribution has a single parameter).
Single-parameter distribution
The Exponential distribution is fully characterised by its scale (1/λ), so both loc and scale in the output tuple carry the same value. Mean and standard deviation are both equal to 1/λ.
from twiga.models.ml.ngboostexponential_model import NGBOOSTEXPONENTIALConfig
config = NGBOOSTEXPONENTIALConfig(n_estimators=400)
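Because the distribution has a single parameter, prediction intervals can be formed from its closed-form quantile function, Q(τ) = −scale · ln(1 − τ). A small sketch (a hypothetical helper, not part of twiga):

```python
import math

def exponential_quantile(scale: float, tau: float) -> float:
    """tau-quantile of an Exponential with mean 1/lambda = scale."""
    return -scale * math.log(1.0 - tau)

scale = 4.0  # predicted 1/lambda: mean and std dev are both 4.0
print(exponential_quantile(scale, 0.5))   # ~2.77, the median (mean * ln 2)
print(exponential_quantile(scale, 0.95))  # ~11.98, upper 95% bound
```

Note the median sits well below the mean: the Exponential is strongly right-skewed, so symmetric intervals around loc would be miscalibrated.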
LogScore vs CRPScore
LogScore maximises the predictive log-likelihood — faster and often better-calibrated for in-distribution data. CRPScore (Continuous Ranked Probability Score) is more robust to distributional shift and outliers but typically slower to converge.
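For the Gaussian case the CRPS has a closed form, which makes the trade-off concrete (the standard formula, sketched as a standalone helper):

```python
import math

def crps_gaussian(mu: float, sigma: float, y: float) -> float:
    """Closed-form CRPS of a Normal(mu, sigma) forecast at observation y."""
    z = (y - mu) / sigma
    pdf = math.exp(-z * z / 2) / math.sqrt(2 * math.pi)          # standard normal pdf
    cdf = 0.5 * (1 + math.erf(z / math.sqrt(2)))                  # standard normal cdf
    return sigma * (z * (2 * cdf - 1) + 2 * pdf - 1 / math.sqrt(math.pi))

# CRPS grows roughly linearly in |y - mu|, unlike the log score's quadratic growth,
# which is what makes it more forgiving of outliers:
print(crps_gaussian(0.0, 1.0, 0.0))  # ~0.234
print(crps_gaussian(0.0, 1.0, 3.0))  # ~2.44
```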
ML Distribution Utilities#
The twiga/distributions/ml/utils.py module provides utilities for post-processing quantile predictions from ML models.
interpolate_quantile()#
Linearly interpolates to extract a specific quantile from a set of predicted quantiles.
from twiga.distributions.ml.utils import interpolate_quantile
# predictions shape: (batch, num_quantiles, num_targets)
q30 = interpolate_quantile(
predictions=predictions,
sorted_quantiles=[0.05, 0.25, 0.5, 0.75, 0.95],
target_quantile=0.3,
)
# Output shape: (batch, num_targets)
get_median_prediction()#
Extracts the median prediction (quantile 0.5).
from twiga.distributions.ml.utils import get_median_prediction
median = get_median_prediction(predictions, quantiles=[0.05, 0.25, 0.5, 0.75, 0.95])
get_sigma_prediction()#
Estimates the standard deviation from the interquartile range:
from twiga.distributions.ml.utils import get_sigma_prediction
sigma = get_sigma_prediction(predictions, quantiles=[0.05, 0.25, 0.5, 0.75, 0.95])
Note
The constant 0.6745 is the 75th percentile of the standard normal distribution (Φ⁻¹(0.75) ≈ 0.6745), so for a Gaussian the interquartile range equals 2 × 0.6745 × σ. This conversion assumes approximate normality of the residuals.
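The conversion can be sanity-checked on synthetic Gaussian data (a standalone sketch, not the library function):

```python
import random

def sigma_from_iqr(q25: float, q75: float) -> float:
    """Normal-theory sigma estimate from the interquartile range: IQR / (2 * 0.6745)."""
    return (q75 - q25) / (2 * 0.6745)

# Draw a Gaussian sample with known sigma = 2 and recover it from the quartiles
random.seed(0)
xs = sorted(random.gauss(0, 2) for _ in range(100_000))
q25, q75 = xs[25_000], xs[75_000]
print(sigma_from_iqr(q25, q75))  # ~2.0
```

On heavy-tailed residuals the same estimator understates the true spread, which is the practical cost of the normality assumption.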
How Distributions Connect to Models#
graph TD
subgraph "Point Models"
A[CatBoost / XGBoost / LightGBM / MLPF / NHITS] --> B["predict() → loc array"]
end
subgraph "Parametric NN Models"
C["ProbabilisticModel\n(backbone + head)"] --> D["predict() → (loc, scale, …)"]
D --> E["NormalDistribution / LaplaceDistribution\nGammaDistribution / BetaDistribution / …"]
end
subgraph "Quantile Models"
G[QRCatBoost / QRXGBoost / QRLightGBM] --> H["predict() → quantile array"]
I["MLPFQR / NHITSQRModel / …"] --> H
H --> J[QRDistribution / FPQRDistribution]
end
subgraph "ML Parametric (Gaussian)"
K["GAUSSCATBOOSTModel"] --> L["predict() → (μ, σ)"]
end
B --> CP[Conformal Prediction]
D --> CP
H --> CP
L --> CP
CP --> PI[Prediction Intervals]
Point models produce a single prediction per time step and can be wrapped with split conformal prediction.
Parametric NN models (via ProbabilisticModel) produce full distribution parameters and train via NLL. They can be used stand-alone or combined with conformal residual fitting.
Quantile models produce multiple quantile predictions and can be used with conformal quantile regression or taken directly as prediction intervals.
ML Usage Example#
import pandas as pd
from sklearn.preprocessing import StandardScaler
from twiga.core.config import DataPipelineConfig, ForecasterConfig
from twiga.forecaster.core import TwigaForecaster
from twiga.models.ml.gausscatboost_model import GAUSSCATBOOSTConfig
from twiga.models.ml.ngboostnormal_model import NGBOOSTNORMALConfig
data_config = DataPipelineConfig(
target_feature="load_mw",
period="1h",
lookback_window_size=168,
forecast_horizon=24,
lags=[1, 24, 168],
input_scaler=StandardScaler(),
)
train_config = ForecasterConfig(
split_freq="months",
train_size=6,
test_size=1,
window="expanding",
project_name="GaussianForecast",
)
# Gaussian CatBoost — fast, single model via RMSEWithUncertainty
gauss_config = GAUSSCATBOOSTConfig(task_type="CPU", virtual_ensembles_count=10)
# NGBoost Normal — joint optimisation of μ and σ
ngboost_config = NGBOOSTNORMALConfig(n_estimators=300, learning_rate=0.05)
forecaster = TwigaForecaster(
data_params=data_config,
model_params=[gauss_config],
train_params=train_config,
)
forecaster.fit(train_df=train_df)
interval_dict, _ = forecaster.predict_interval(test_df=test_df)
for model_name, (lower, point, upper) in interval_dict.items():
print(f"{model_name}: lower={lower.shape}, point={point.shape}, upper={upper.shape}")
API Reference#
Parametric distribution base#
- class twiga.distributions.nn.parametric.BaseDistribution(num_target_output, hidden_size, forecast_horizon)#
-
Abstract base for probabilistic output heads.
Subclasses must implement
forward(), get_distribution(), and get_log_likelihood().
- All heads share the same input contract:
x: (B, hidden_size) - latent representation from encoder/decoder.
- All heads share the same output contract for forecasting tensors:
shape (B, forecast_horizon, num_target_output).
- forecast(z)#
Inference-mode forward (no gradient).
- abstractmethod get_distribution(*params)#
Construct a torch Distribution from predicted parameters.
- Return type:
- abstractmethod get_log_likelihood(*params, targets)#
Return mean negative log-likelihood over the batch.
- Return type:
- step(z, y, metric_fn, epoch=None)#
Compute NLL loss and metric for one training/validation step.
- Parameters:
z (
Tensor) – Latent tensor of shape(B, hidden_size)from the backbone.y (
Tensor) – Target tensor of shape(B, forecast_horizon, num_target_output).metric_fn (
Callable[...,Any]) – Callable returning a scalar metric given(pred, target).epoch (
int|None) – Current epoch (unused here; available for subclasses).
- Return type:
- Returns:
Tuple of
(nll_loss, metric).
Parametric heads — standard backbones#
- class twiga.distributions.nn.parametric.NormalDistribution(num_target_output=1, hidden_size=256, forecast_horizon=48, out_activation_function=None)#
Bases:
BaseDistribution
Normal (Gaussian) output head.
Best for: symmetric, unbounded targets - energy demand, temperature.
- Parameters predicted:
mu - mean (unconstrained)
sigma - std dev (exp of log_scale, strictly positive)
- Parameters:
- forward(x)#
Predict mu and sigma.
- get_distribution(mu, sigma)#
Construct a torch Distribution from predicted parameters.
- Return type:
- class twiga.distributions.nn.parametric.LaplaceDistribution(num_target_output=1, hidden_size=256, forecast_horizon=48, out_activation_function=None)#
Bases:
BaseDistribution
Laplace output head.
Best for: targets with heavier tails than Normal, robust to outliers - electricity prices, wind speed residuals.
- Parameters predicted:
mu - location (unconstrained)
scale - scale (exp of log_scale, strictly positive)
- forward(x)#
Predict mu and scale.
- get_distribution(mu, scale)#
Construct a torch Distribution from predicted parameters.
- Return type:
- class twiga.distributions.nn.parametric.LogNormalDistribution(num_target_output=1, hidden_size=256, forecast_horizon=48, out_activation_function=None)#
Bases:
BaseDistribution
Log-Normal output head.
Best for: strictly positive, right-skewed targets - renewable generation, gas prices, load with zero floor.
- Parameters predicted:
mu - mean of log(target) (unconstrained)
sigma - std of log(target) (exp of log_scale, strictly positive)
Note
Targets must be strictly positive. Apply a small epsilon shift in the data pipeline if zeros are possible (e.g., target = max(y, 1e-6)).
- forward(x)#
Predict mu and sigma of the underlying Normal (in log space).
- get_distribution(mu, sigma)#
Construct a torch Distribution from predicted parameters.
- Return type:
- class twiga.distributions.nn.parametric.GammaDistribution(num_target_output=1, hidden_size=256, forecast_horizon=48)#
Bases:
BaseDistribution
Gamma output head.
Best for: strictly positive targets with flexible skew - solar irradiance, wind power, load at aggregate level.
- Parameters predicted:
concentration - shape parameter α (softplus, strictly positive)
rate - rate parameter β (softplus, strictly positive)
Note
Mean of Gamma(α, β) = α/β. Targets must be strictly positive.
- forward(x)#
Predict concentration and rate.
- get_distribution(concentration, rate)#
Construct a torch Distribution from predicted parameters.
- Return type:
- class twiga.distributions.nn.parametric.BetaDistribution(num_target_output=1, hidden_size=256, forecast_horizon=48)#
Bases:
BaseDistribution
Beta output head.
Best for: bounded [0, 1] targets - capacity factors, state of charge, fill rates, normalised demand ratios.
- Parameters predicted:
alpha - first shape parameter (softplus, strictly positive)
beta - second shape parameter (softplus, strictly positive)
Note
Targets must lie strictly in (0, 1). Apply a clamp in the data pipeline if boundary values are possible:
target = target.clamp(1e-6, 1 - 1e-6)
- forward(x)#
Predict alpha and beta.
- get_distribution(alpha, beta)#
Construct a torch Distribution from predicted parameters.
- Return type:
- class twiga.distributions.nn.parametric.StudentTDistribution(num_target_output=1, hidden_size=256, forecast_horizon=48, out_activation_function=None, min_df=2.1)#
Bases:
BaseDistribution
Student-T output head.
Best for: targets with very heavy tails and potential outliers - financial returns, spot electricity prices.
- Parameters predicted:
mu - location (unconstrained)
sigma - scale (exp of log_scale, strictly positive)
df - degrees of freedom (softplus, clipped to ≥ min_df for finite variance)
Note
Degrees of freedom are predicted per-sample but averaged across the forecast horizon so the model learns a single df per series.
- forecast(z)#
Inference-mode forward (no gradient).
- forward(x)#
Predict mu, sigma, and degrees of freedom.
- get_distribution(mu, sigma, df)#
Construct a torch Distribution from predicted parameters.
- Return type:
Parametric heads — additive backbones (MLPGAM / MLPGAF)#
These variants receive the pre-summed additive mean directly from the backbone instead of a latent vector, preserving the GAM decomposition.
- class twiga.distributions.nn.parametric.AdditiveNormalDistribution(num_target_output=1, hidden_size=256, forecast_horizon=48, out_activation_function=None)#
Bases:
BaseDistribution
Normal output head preserving additive backbone structure.
For use with
MLPGAMNetwork and MLPGAFNetwork, whose encode() already returns the additive mean directly as a flat (B, H*O) vector. The mean is taken as-is (no projection) to honour the additive decomposition; only the scale is learned via a separate linear layer.
- Parameters predicted:
mu - additive mean (reshaped directly from latent, no projection)
sigma - std dev (exp of log_scale, strictly positive)
- Parameters:
- forward(x)#
Predict mu and sigma.
- get_distribution(mu, sigma)#
Construct a torch Distribution from predicted parameters.
- Return type:
- class twiga.distributions.nn.parametric.AdditiveLaplaceDistribution(num_target_output=1, hidden_size=256, forecast_horizon=48, out_activation_function=None)#
Bases:
BaseDistribution
Laplace output head preserving additive backbone structure.
Analogous to
AdditiveNormalDistribution but with Laplace tails. Best for: heavy-tailed, outlier-robust targets - electricity prices, wind residuals.
- Parameters predicted:
mu - additive location (reshaped directly from latent, no projection)
scale - scale (exp of log_scale, strictly positive)
- forward(x)#
Predict mu and scale.
- get_distribution(mu, scale)#
Construct a torch Distribution from predicted parameters.
- Return type:
- class twiga.distributions.nn.parametric.AdditiveLogNormalDistribution(num_target_output=1, hidden_size=256, forecast_horizon=48, out_activation_function=None)#
Bases:
BaseDistribution
Log-Normal output head preserving additive backbone structure.
Best for: strictly positive, right-skewed targets - renewable generation, gas prices.
- Parameters predicted:
mu - additive log-space mean (reshaped directly from latent, no projection)
sigma - std of log(target) (exp of log_scale, strictly positive)
Note
Targets must be strictly positive.
- forward(x)#
Predict mu and sigma of the underlying Normal (in log space).
- get_distribution(mu, sigma)#
Construct a torch Distribution from predicted parameters.
- Return type:
- class twiga.distributions.nn.parametric.AdditiveStudentTDistribution(num_target_output=1, hidden_size=256, forecast_horizon=48, out_activation_function=None, min_df=2.1)#
Bases:
BaseDistribution
Student-T output head preserving additive backbone structure.
Best for: very heavy tails - spot electricity prices, financial returns.
- Parameters predicted:
mu - additive location (reshaped directly from latent, no projection)
sigma - scale (exp of log_scale, strictly positive)
df - degrees of freedom (softplus, clipped to ≥ min_df for finite variance)
- forecast(z)#
Inference-mode forward (no gradient).
- forward(x)#
Predict mu, sigma, and degrees of freedom.
- Parameters:
x (
Tensor) – Latent tensor of shape (B, hidden_size) - the additive location from the backbone.- Return type:
- Returns:
Tuple of (mu, sigma, df) – - mu, sigma: shape (B, forecast_horizon, num_target_output) - df: shape (B, 1, num_target_output), broadcast-compatible
- get_distribution(mu, sigma, df)#
Construct a torch Distribution from predicted parameters.
- Return type:
Quantile regression distribution#
- class twiga.distributions.nn.quantile.QRDistribution(quantiles=None, num_outputs=1, hidden_size=256, horizon=48, eps=1e-06, kappa=0.5, output_activation=None, conf_level=0.05, loss_fn='pinball', crossing_penalty=10.0)#
Bases:
Module
QRDistribution is a neural-network head for forecasting quantiles using a quantile value network.
- Parameters:
quantiles (
list[float] |None) – List of quantiles to forecast. Default is None.num_outputs (
int) – Number of output features. Default is 1.latent_size (int) – Size of the hidden layers. Default is 256.
horizon (
int) – Number of time steps to forecast. Default is 48.crossing_penalty (
float) – Penalises quantile crossing to encourage monotonicity.
output_activation (nn.Module) – Activation function to use in the output layer. Default is nn.Identity().
- forecast(input_tensor)#
Forecast quantiles for the given input tensor.
- forward(x)#
Forward pass producing raw (unconstrained) quantile values.
Outputs K quantile predictions directly from the linear head without structural monotonicity enforcement. Non-crossing is encouraged during training via the
crossing_penalty term in step():
loss = pinball_loss + λ · non_crossing_loss(Q̂)
where
non_crossing_loss penalises any pair (k, k+1) where Q̂(τₖ) > Q̂(τₖ₊₁). The quantile_value_layer is initialised with small uniform weights to reduce crossing violations at the start of training.
- Parameters:
x (torch.Tensor) – Latent features of shape (B, hidden_size).
- Returns:
torch.Tensor –
- Estimated quantile values Q(τⱼ | x),
shape (B, K, H, D) where K = len(self.taus).
- step(z, y, metric_fn, epoch=None)#
Perform a single training/validation step.
- Parameters:
z (
Tensor) – Latent tensor of shape(B, hidden_size)from the backbone.y (
Tensor) – Target tensor of shape(B, forecast_horizon, num_outputs).metric_fn (
Callable[...,Any]) – Callable returning a scalar metric given(pred, target).epoch (
int|None) – Current epoch (unused; accepted for interface consistency).
- Return type:
- Returns:
Tuple of
(quantile_loss, metric).
- twiga.distributions.nn.quantile.get_median_quantile(quantile_hats, probs)#
Compute the median (0.5 quantile) from a quantile tensor of shape (B, N, T, C).
- Parameters:
- Return type:
- Returns:
torch.Tensor – Median values of shape (B, T, C).
- Raises:
ValueError – If input shapes are invalid or interpolation is not possible.
FPQR components#
- class twiga.distributions.nn.fpquantile.QuantileProposal(n_quantiles=10, z_dim=64, dropout=0.1, conf_level=0.05, n_outputs=1, crossing_penalty=10.0)#
Bases:
Module
A neural network module for proposing confidence quantiles.
This module generates quantile estimates from input features using a linear layer, dropout, and softmax normalization. It computes cumulative probabilities (taus), midpoints (tau_hats), and entropies, ensuring quantiles stay within specified bounds based on a significance level (conf_level).
- Variables:
n_quantiles – Number of quantiles to propose.
z_dim – Dimensionality of the input features.
conf_level – Significance level tensor for quantile bounds.
net – Linear layer transforming input features to quantile logits.
dropout – Dropout layer for regularization.
tau_0 – Initial tau value buffer (zeros).
- Parameters:
- __init__(n_quantiles=10, z_dim=64, dropout=0.1, conf_level=0.05, n_outputs=1, crossing_penalty=10.0)#
Initialize the QuantileProposal module.
- Parameters:
n_quantiles (int) – Number of quantiles to propose (default: 10).
z_dim (int) – Dimensionality of the input features (default: 64).
dropout (float) – Dropout rate for regularization (default: 0.1).
conf_level (float) – Significance level for quantile estimation (default: 0.05).
n_outputs (int) – Number of output dimensions.
crossing_penalty (float) – Penalty weight applied to crossing quantiles to enforce monotonicity.
- Raises:
ValueError – If n_quantiles <= 0, z_dim <= 0, dropout < 0, or conf_level not in (0, 1).
- forward(z)#
Perform a forward pass to compute quantiles and related metrics.
- Parameters:
z (Tensor) – Input tensor of shape (batch_size, z_dim).
- Return type:
- Returns:
Tuple containing:
- taus: Cumulative probabilities, shape (batch_size, n_quantiles + 1, 1).
- tau_hats: Midpoint quantiles, shape (batch_size, n_quantiles, 1).
- entropies: Entropy of the probability distributions, shape (batch_size, 1).
- Raises:
ValueError – If z does not have the expected shape (batch_size, z_dim) or if internal tensor shapes do not match expected dimensions.
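The proposal mechanism can be illustrated with a small numpy sketch: softmax over the logits, a cumulative sum to obtain monotone levels, and a rescaling that keeps the levels inside [conf_level/2, 1 - conf_level/2]. The exact bound handling in the module may differ; this is an assumption for illustration:

```python
import numpy as np

def propose_taus(logits, conf_level=0.05):
    """Turn proposal-layer logits (B, n_quantiles) into monotone quantile levels."""
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)                # softmax over quantile bins
    lo, hi = conf_level / 2, 1.0 - conf_level / 2    # keep levels inside the bounds
    taus = lo + (hi - lo) * np.cumsum(p, axis=1)     # (B, n_quantiles), increasing
    taus = np.concatenate([np.full((p.shape[0], 1), lo), taus], axis=1)
    tau_hats = 0.5 * (taus[:, 1:] + taus[:, :-1])    # bin midpoints
    entropies = -(p * np.log(p + 1e-12)).sum(axis=1)
    return taus, tau_hats, entropies
```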
- class twiga.distributions.nn.fpquantile.CosinetauEmbedding(num_cosines=32, z_dim=128, num_outputs=48)#
Bases:
Module

A PyTorch module for embedding time values using cosine transformations.
This module transforms time values into cosine-based features and embeds them into a higher-dimensional space using a linear layer and activation function.
- Parameters:
- class twiga.distributions.nn.fpquantile.FPQRDistribution(n_quantiles=9, num_outputs=1, hidden_size=256, horizon=48, dropout=0.1, conf_level=0.05, kappa=0.25, num_cosines=32, output_activation=None, loss_fn='pinball')#
Bases:
Module

A neural network for forecasting quantiles using a quantile value network.
This module predicts quantiles for a forecasting task, combining a quantile proposal layer with a linear output layer to produce quantile forecasts.
- Parameters:
n_quantiles (int | None) – Number of quantiles to forecast. If None, defaults to 9.
num_outputs (int) – Number of output features. Defaults to 1.
hidden_size (int) – Size of the hidden layers. Defaults to 256.
horizon (int) – Number of time steps to forecast. Defaults to 48.
dropout (float) – Dropout rate for the quantile proposal layer. Defaults to 0.1.
conf_level (float) – Confidence level for quantile proposal. Defaults to 0.05.
output_activation (Module | None) – Activation function for the output layer. Defaults to nn.Identity().
- forecast(input_tensor)#
Forecast quantiles for the given input tensor.
quantile_levels is the per-sample proposal grid averaged across the batch and horizon dimensions so that downstream consumers receive a fixed 1-D array of representative probability levels, matching the interface expected by ForecastResult.
- Parameters:
input_tensor (Tensor) – Input tensor of shape (batch_size, z_dim).
- Return type:
- Returns:
Dict with keys –
"loc": expected value (weighted sum of quantiles), shape(B, horizon, num_outputs)."quantiles": quantile forecasts, shape(B, n_quantiles, horizon, num_outputs)."quantile_levels": representative 1-D numpy array of lengthn_quantiles(mean oftau_hatsover batch and horizon)."conf_level": significance level scalar.
- forward(input_tensor)#
Forward pass of the quantile forecasting network.
- step(z, y, metric_fn, epoch=None)#
Perform a single training/validation step.
- Parameters:
z (Tensor) – Latent tensor of shape (B, hidden_size) from the backbone.
y (Tensor) – Target tensor of shape (B, forecast_horizon, num_outputs).
metric_fn (Callable[..., Any]) – Callable returning a scalar metric given (pred, target).
epoch (int | None) – Current epoch (unused; accepted for interface consistency).
- Return type:
- Returns:
Tuple of (loss, metric).
CRC distribution classes#
- class twiga.distributions.nn.residual_conformal.CRCDistribution(num_target_output=1, hidden_size=256, forecast_horizon=48, out_activation_function=None, sigma_loss_fn='hybrid', alpha=0.1, activation='ReLU')#
Bases:
BaseDistribution

CRC head for standard backbones (e.g. MLPFNetwork).
The mean is computed by projecting the latent vector z through a linear layer, then reshaping to (B, H, O). The scale is predicted by a sigma layer applied to the detached mean (flattened), ensuring that gradients do not flow back into the backbone during sigma-only training.
- Training follows a two-stage protocol:
Stage 1 — step(): optimise μ (backbone + mu_layer); σ is not updated.
Stage 2 — step_sigma(): freeze backbone, optimise σ-head only.
- Parameters:
num_target_output (int) – Number of output features per time step.
hidden_size (int) – Dimensionality of the backbone latent vector.
forecast_horizon (int) – Number of forecast time steps.
out_activation_function (Module | None) – Optional activation applied to the mean output.
sigma_loss_fn (str) – Calibration objective for σ. One of _SIGMA_LOSS_FNS. Default: "hybrid".
alpha (float) – Weight for MSE vs L1 in the mu and hybrid sigma losses.
activation (str) – Unused; kept for API compatibility.
- Raises:
ValueError – If
sigma_loss_fn is not one of the supported values.
- forecast(z)#
Inference-only prediction (no gradients).
For
"hybrid_sqrt"the scale is squared to convert from√|r|space back to residual magnitude space.
- forward(x)#
Predict mean and scale.
- get_distribution(mu, sigma)#
Construct a torch distribution for interval generation.
Returns
Laplace(μ, σ) for "laplace" and Normal(μ, σ) for all other modes, treating σ as an empirical scale estimate. For "hybrid_sqrt" callers must pass the forecast-adjusted (squared) σ.
- get_log_likelihood(mu, sigma, targets)#
Compute the sigma calibration loss (NLL for probabilistic modes).
- step(z, y, metric_fn, epoch=None)#
Stage-1 training step: optimise mean parameters.
- step_sigma(z, y, metric_fn)#
Stage-2 training step: optimise sigma with frozen backbone.
The mean is recomputed under
no_grad so sigma learns to predict residual magnitude without influencing point predictions.
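A minimal sketch of what the stage-2 sigma objective might look like, assuming the "hybrid" loss blends MSE and L1 against the absolute residual (the exact objective lives in _SIGMA_LOSS_FNS and may differ):

```python
import numpy as np

def hybrid_sigma_loss(sigma_pred, y_true, mu_detached, alpha=0.5):
    """Stage-2 objective sketch: sigma regresses the absolute residual |y - mu|."""
    target = np.abs(y_true - mu_detached)   # mu is fixed (detached) in stage 2
    mse = np.mean((sigma_pred - target) ** 2)
    l1 = np.mean(np.abs(sigma_pred - target))
    return alpha * mse + (1.0 - alpha) * l1
```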
- class twiga.distributions.nn.residual_conformal.AdditiveCRCDistribution(num_target_output=1, hidden_size=None, hidden_dim=256, forecast_horizon=48, out_activation_function=None, sigma_loss_fn='hybrid_sqrt', alpha=0.5, sigma_dropout=0.05, activation='SiLU')#
Bases:
CRCDistribution

CRC head for additive backbones (MLPGAMNetwork, MLPGAFNetwork).
Inherits all training and inference logic from
CRCDistribution. Two architectural differences from the parent:

mu_layer — identity (or optional activation): the backbone's encode() already returns the additive mean as (B, H*O), so no linear projection is needed.
sigma_layer — ResidualSigmaHead (two-layer MLP + LayerNorm + Dropout): a deeper network better suited to modeling conditional heteroscedasticity from the additive mean signal.
All other methods —
step, step_sigma, forecast, get_distribution, get_log_likelihood — are inherited unchanged.
- Parameters:
num_target_output (int) – Number of output features per time step.
hidden_dim (int) – Hidden dimension of the sigma MLP. Independent of H*O.
forecast_horizon (int) – Number of forecast time steps.
out_activation_function (Module | None) – Optional activation applied to the mean.
sigma_loss_fn (str) – Calibration objective (see CRCDistribution). Default: "hybrid_sqrt".
alpha (float) – MSE/L1 weight. Default: 0.5.
sigma_dropout (float) – Dropout rate in the sigma MLP.
activation (str) – Activation name for the sigma MLP. Default: "SiLU".
- Raises:
ValueError – If
sigma_loss_fn is not one of the supported values.
Custom loss functions#
- class twiga.distributions.nn.custom_loss.QuantileLoss(kappa=0.0, reduction='mean')#
Bases:
Module

Quantile loss module for quantile regression.
Computes the pinball loss between predicted quantiles and target values.
- Variables:
kappa – Smoothing parameter for softplus approximation.
reduction – Reduction method: ‘none’, ‘mean’, or ‘sum’.
- Parameters:
- __init__(kappa=0.0, reduction='mean')#
Initialize the QuantileLoss module.
- Parameters:
- Raises:
ValueError – If kappa < 0 or reduction is invalid.
- forward(inputs, quantiles, targets)#
Compute the quantile loss.
- Parameters:
- Return type:
- Returns:
The computed quantile loss tensor, reduced according to the specified method.
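The pinball (quantile) loss itself is short. A numpy sketch for a single quantile level τ (the module additionally supports smoothing via kappa and configurable reduction, omitted here):

```python
import numpy as np

def pinball_loss(pred, tau, target):
    """Pinball loss for one quantile level tau in (0, 1); 'mean' reduction."""
    diff = target - pred
    # under-prediction is weighted by tau, over-prediction by (1 - tau)
    return np.mean(np.maximum(tau * diff, (tau - 1.0) * diff))
```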
- class twiga.distributions.nn.custom_loss.QuantileHuberLoss(kappa=1.0, eps=1e-08, reduction='mean')#
Bases:
Module

Quantile Huber loss module for robust quantile regression.
Computes the Huber loss adjusted for quantiles, less sensitive to outliers than L2 loss.
- Variables:
kappa – Threshold for switching between L2 and L1 loss.
eps – Small value to prevent division by zero.
reduction – Reduction method: ‘none’, ‘mean’, or ‘sum’.
- Parameters:
- __init__(kappa=1.0, eps=1e-08, reduction='mean')#
Initialize the QuantileHuberLoss module.
- Parameters:
- Raises:
ValueError – If kappa <= 0 or eps <= 0.
- forward(inputs, quantiles, targets)#
Compute the quantile Huber loss.
- class twiga.distributions.nn.custom_loss.QuantileProposalLoss(reduction='none')#
Bases:
Module

Quantile proposal loss module to ensure non-crossing quantiles.
Computes a loss to enforce that predicted quantiles adhere to specified levels and do not cross.
- Variables:
reduction – Reduction method: ‘none’, ‘mean’, or ‘sum’.
- Parameters:
reduction (str) – Reduction method: ‘none’, ‘mean’, or ‘sum’ (default: ‘none’).
- __init__(reduction='none')#
Initialize the QuantileProposalLoss module.
- Parameters:
reduction (str) – Reduction method: ‘none’, ‘mean’, or ‘sum’ (default: ‘none’).
- forward(quantile, quantile_hats, taus)#
Compute the quantile proposal loss.
- Parameters:
- Return type:
- Returns:
The computed quantile proposal loss tensor, reduced according to the specified method.
ML parametric models#
- class twiga.models.ml.gausscatboost_model.GAUSSCATBOOSTConfig(**data)#
Bases:
CATBOOSTConfig

Configuration for the Gaussian CatBoost probabilistic model.
Extends
CATBOOSTConfig with:

Tighter hyperparameter search bounds for faster convergence.
od_type / od_wait for built-in overfitting detection.
virtual_ensembles_count to control the number of virtual ensemble snapshots used to estimate epistemic uncertainty.
- Variables:
name – Fixed to "gausscatboost".
od_type – Overfitting detection strategy ("Iter" or "IncToDec"). Active only when an eval set is provided to GAUSSCATBOOSTModel.fit().
od_wait – Number of iterations without improvement before early stopping triggers.
virtual_ensembles_count – Number of virtual ensemble snapshots used by predict_with_uncertainty(). Not passed to CatBoostRegressor (excluded from model_dump).
- model_config: ClassVar[ConfigDict] = {'extra': 'allow'}#
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- search_space: BaseSearchSpace#
- class twiga.models.ml.gausscatboost_model.GAUSSCATBOOSTModel(model_config=None)#
Bases:
BaseRegressor

Probabilistic CatBoost model predicting mean and uncertainty.
Uses
loss_function="RMSEWithUncertainty", which maximises the log-likelihood of a Normal distribution, jointly learning the mean μ and aleatoric scale σ in a single model.

For multi-horizon forecasting (H output steps), one CatBoostRegressor is trained per output, yielding H models. This avoids MultiOutputRegressor while preserving support for virtual_ensembles_predict.

Uncertainty decomposition via predict_with_uncertainty() returns three components per output step:

mean - predicted mean (same as predict() μ output).
knowledge_uncertainty - epistemic (model) uncertainty estimated from variance across virtual ensemble snapshots.
data_uncertainty - aleatoric uncertainty encoded in the model's second output (exp of predicted log-σ).
- Parameters:
model_config (GAUSSCATBOOSTConfig | None) – Model configuration. Defaults to GAUSSCATBOOSTConfig.
Example:
model = GAUSSCATBOOSTModel()
model.fit(X_train, y_train)
mu, sigma = model.predict(X_test)
unc = model.predict_with_uncertainty(X_test)  # (B, L, H, 3)
- fit(X, y, eval_set=None, verbose=False)#
Fit one RMSEWithUncertainty model per output step.
- Parameters:
X (ndarray) – Shape (B, L, F) - batch × sequence × features.
y (ndarray) – Shape (B, L, H) - batch × sequence × horizons.
eval_set (tuple[ndarray, ndarray] | None) – Optional (X_val, y_val) for early stopping. When provided, od_type / od_wait from the config activate CatBoost's overfitting detector.
verbose (bool) – Whether to print CatBoost training logs.
- Return type:
- Returns:
Self for method chaining.
- Raises:
ValueError – If
X or y are not 3-dimensional.
- predict(X)#
Predict mean (μ) and scale (σ) for all output steps.
- Parameters:
X (ndarray) – Shape (B, L, F).
- Return type:
- Returns:
Tuple (mu, sigma), each of shape (B, L, H). sigma = exp(log_sigma) is guaranteed positive.
- Raises:
ValueError – If the model has not been fitted or X is not 3-D.
- predict_with_uncertainty(X)#
Return mean, epistemic, and aleatoric uncertainty per output.
Uses
virtual_ensembles_predict(prediction_type="TotalUncertainty") which, for models trained with RMSEWithUncertainty, returns three values per sample:

Column 0: mean prediction (μ).
Column 1: knowledge (epistemic) uncertainty - variance across virtual ensemble snapshots.
Column 2: data (aleatoric) uncertainty - derived from the predicted log-σ output.
- Parameters:
X (ndarray) – Shape (B, L, F).
- Return type:
- Returns:
Array of shape (B, L, H, 3) - last axis is [mean, knowledge_uncertainty, data_uncertainty].
- Raises:
ValueError – If the model has not been fitted.
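Downstream consumers often want a single predictive scale. Assuming both uncertainty columns are reported as variances (worth verifying against your CatBoost version's conventions), they can be combined like this:

```python
import numpy as np

def total_predictive_std(unc):
    """Split a (B, L, H, 3) uncertainty array into mean and total predictive std."""
    mean = unc[..., 0]
    total_var = unc[..., 1] + unc[..., 2]   # epistemic + aleatoric, assumed variances
    return mean, np.sqrt(total_var)
```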
- set_fit_request(*, eval_set='$UNCHANGED$', verbose='$UNCHANGED$')#
Configure whether metadata should be requested to be passed to the
fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:
True: metadata is requested and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.
Parameters#
- eval_set : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for eval_set parameter in fit.
- verbose : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for verbose parameter in fit.
Returns#
- selfobject
The updated object.
- set_score_request(*, sample_weight='$UNCHANGED$')#
Configure whether metadata should be requested to be passed to the
score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:
True: metadata is requested and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.
Parameters#
- sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for sample_weight parameter in score.
Returns#
- selfobject
The updated object.
- update(trial)#
Rebuild model from an Optuna trial’s suggested hyperparameters.
- Parameters:
trial – Optuna trial object.
- Return type:
- class twiga.models.ml.ngboostnormal_model.NGBOOSTNORMALConfig(**data)#
Bases:
BaseModelConfig

Configuration for the NGBoost Normal probabilistic forecasting model.
Uses natural gradient boosting with a Gaussian predictive distribution N(μ, σ²). Unlike the two-stage GAUSS* models, NGBoost jointly optimises μ and σ via the natural gradient of the chosen scoring rule, which tends to produce better-calibrated uncertainty estimates.
- Variables:
name – Model identifier fixed to "ngboostnormal".
domain – Domain fixed to "ml". Excluded from tuning.
n_estimators – Number of boosting iterations.
learning_rate – Shrinkage applied to each tree.
minibatch_frac – Row-subsample fraction per iteration.
col_sample – Column-subsample fraction per iteration.
random_state – Seed for reproducibility.
score – Scoring rule - "LogScore" (MLE) or "CRPScore".
search_space – Hyperparameter search space for Optuna tuning.
Example
>>> from twiga.models.ml import NGBOOSTNORMALConfig
>>> cfg = NGBOOSTNORMALConfig(n_estimators=200, learning_rate=0.05)
- model_config: ClassVar[ConfigDict] = {'extra': 'allow'}#
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- search_space: BaseSearchSpace#
- class twiga.models.ml.ngboostnormal_model.NGBOOSTNORMALModel(model_config=None)#
Bases:
BaseNGBoostRegressor

NGBoost probabilistic model with a Normal (Gaussian) predictive distribution.
Wraps
ngboost.NGBRegressor with Dist=Normal, training one regressor per flattened output column to support multi-horizon and multi-target forecasting.

The model jointly learns the conditional mean μ and standard deviation σ via natural gradient boosting on the selected scoring rule (log-likelihood or CRPS). Unlike GAUSSCATBOOSTModel, which uses CatBoost's native RMSEWithUncertainty loss, NGBoost directly maximises the scoring rule via natural gradients.
- Variables:
model_config – Instance of NGBOOSTNORMALConfig.
- Parameters:
model_config (NGBOOSTNORMALConfig | None) – Model configuration. Defaults to NGBOOSTNORMALConfig.
Example
>>> import numpy as np
>>> from twiga.models.ml import NGBOOSTNORMALModel
>>> model = NGBOOSTNORMALModel()
>>> X = np.random.randn(100, 10, 5)
>>> y = np.random.randn(100, 48, 1)
>>> model.fit(X, y)
>>> loc, scale = model.predict(X)
>>> loc.shape, scale.shape
((100, 48, 1), (100, 48, 1))
- forecast(x)#
Return Normal distribution parameters.
- predict(X)#
Predict μ and σ for each forecast horizon.
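Given (μ, σ) from predict(), a symmetric Gaussian prediction interval follows directly. A small helper (not part of Twiga) using tabulated standard-normal quantiles:

```python
import numpy as np

def normal_interval(loc, scale, coverage=0.9):
    """Two-sided central interval for N(loc, scale) at a few common coverages."""
    z = {0.8: 1.2816, 0.9: 1.6449, 0.95: 1.9600}[coverage]  # standard normal quantiles
    return loc - z * scale, loc + z * scale
```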
- set_fit_request(*, eval_set='$UNCHANGED$', verbose='$UNCHANGED$')#
- set_score_request(*, sample_weight='$UNCHANGED$')#
Inherited sklearn metadata-routing configuration; behaviour is identical to the methods of the same name documented on GAUSSCATBOOSTModel.
- class twiga.models.ml.ngboostlognormal_model.NGBOOSTLOGNORMALConfig(**data)#
Bases:
BaseModelConfig

Configuration for the NGBoost LogNormal probabilistic forecasting model.
Uses natural gradient boosting with a log-normal predictive distribution. Suitable for strictly positive targets such as solar irradiance, wind speed, electricity price, or non-negative load.
Follows the
scipy.stats.lognorm parameter convention:

scale = exp(μ_log) - geometric mean (location-like quantity).
s = σ_log - standard deviation in log-space (shape parameter).
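Under this convention, common summary statistics follow from the standard log-normal formulas. A small helper (not part of Twiga):

```python
import numpy as np

def lognormal_moments(scale, s):
    """Summary statistics under the scipy.stats.lognorm convention."""
    median = scale                            # exp(mu_log)
    mean = scale * np.exp(0.5 * s ** 2)       # exp(mu_log + sigma_log**2 / 2)
    std = mean * np.sqrt(np.exp(s ** 2) - 1.0)
    return median, mean, std
```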
- Variables:
name – Model identifier fixed to "ngboostlognormal".
domain – Domain fixed to "ml". Excluded from tuning.
n_estimators – Number of boosting iterations.
learning_rate – Shrinkage applied to each tree.
minibatch_frac – Row-subsample fraction per iteration.
col_sample – Column-subsample fraction per iteration.
random_state – Seed for reproducibility.
score – Scoring rule - "LogScore" (MLE) or "CRPScore".
search_space – Hyperparameter search space for Optuna tuning.
Example
>>> from twiga.models.ml import NGBOOSTLOGNORMALConfig
>>> cfg = NGBOOSTLOGNORMALConfig(n_estimators=300, score="CRPScore")
- model_config: ClassVar[ConfigDict] = {'extra': 'allow'}#
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- search_space: BaseSearchSpace#
- class twiga.models.ml.ngboostlognormal_model.NGBOOSTLOGNORMALModel(model_config=None)#
Bases:
BaseNGBoostRegressor

NGBoost probabilistic model with a LogNormal predictive distribution.
Wraps
ngboost.NGBRegressor with Dist=LogNormal, training one regressor per flattened output column for multi-horizon / multi-target support.

Parameter convention (follows scipy.stats.lognorm):

loc (exposed as "loc") - scale = exp(μ_log), the geometric mean.
scale (exposed as "scale") - s = σ_log, the log-space std-dev.
Use this model when the target variable is strictly positive and its logarithm is approximately Gaussian (e.g. solar irradiance, wind power, non-negative energy prices).
- Variables:
model_config – Instance of NGBOOSTLOGNORMALConfig.
- Parameters:
model_config (NGBOOSTLOGNORMALConfig | None) – Model configuration. Defaults to NGBOOSTLOGNORMALConfig.
Example
>>> import numpy as np
>>> from twiga.models.ml import NGBOOSTLOGNORMALModel
>>> model = NGBOOSTLOGNORMALModel()
>>> X = np.random.randn(100, 10, 5)
>>> y = np.abs(np.random.randn(100, 48, 1)) + 0.1  # strictly positive
>>> model.fit(X, y)
>>> loc, scale = model.predict(X)
>>> loc.shape, scale.shape
((100, 48, 1), (100, 48, 1))
- forecast(x)#
Return LogNormal distribution parameters.
- predict(X)#
Predict geometric mean and log-space std-dev for each forecast horizon.
- Parameters:
X (ndarray) – Input features of shape (n_samples, seq_len, n_features).
- Return type:
- Returns:
Tuple (loc, scale) where:
loc = exp(μ_log) - geometric mean (strictly positive), shape (n_samples, horizon, n_targets).
scale = σ_log - standard deviation in log-space, shape (n_samples, horizon, n_targets).
- set_fit_request(*, eval_set='$UNCHANGED$', verbose='$UNCHANGED$')#
- set_score_request(*, sample_weight='$UNCHANGED$')#
Inherited sklearn metadata-routing configuration; behaviour is identical to the methods of the same name documented on GAUSSCATBOOSTModel.
- class twiga.models.ml.ngboostexponential_model.NGBOOSTEXPONENTIALConfig(**data)#
Bases:
BaseModelConfig

Configuration for the NGBoost Exponential probabilistic forecasting model.
Uses natural gradient boosting with an exponential predictive distribution. The exponential distribution has a single parameter
scale = 1 / λ, where λ is the rate. Both the mean and the standard deviation equal scale.

Suited for non-negative targets exhibiting exponential decay or inter-arrival times - e.g. rare demand events, sparse consumption spikes, or inter-event durations in energy systems.
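Because the distribution is fully determined by scale, quantiles come in closed form: Q(p) = -scale · ln(1 - p). A small helper (not part of Twiga):

```python
import numpy as np

def exponential_quantile(scale, p):
    """p-th quantile of Exp(rate = 1/scale): Q(p) = -scale * ln(1 - p)."""
    return -scale * np.log1p(-p)   # log1p keeps precision for small p
```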
- Variables:
name – Model identifier fixed to "ngboostexponential".
domain – Domain fixed to "ml". Excluded from tuning.
n_estimators – Number of boosting iterations.
learning_rate – Shrinkage applied to each tree.
minibatch_frac – Row-subsample fraction per iteration.
col_sample – Column-subsample fraction per iteration.
random_state – Seed for reproducibility.
score – Scoring rule - "LogScore" (MLE) or "CRPScore".
search_space – Hyperparameter search space for Optuna tuning.
Example
>>> from twiga.models.ml import NGBOOSTEXPONENTIALConfig
>>> cfg = NGBOOSTEXPONENTIALConfig(n_estimators=400, score="CRPScore")
- model_config: ClassVar[ConfigDict] = {'extra': 'allow'}#
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- search_space: BaseSearchSpace#
- class twiga.models.ml.ngboostexponential_model.NGBOOSTEXPONENTIALModel(model_config=None)#
Bases:
BaseNGBoostRegressor

NGBoost probabilistic model with an Exponential predictive distribution.
Wraps
ngboost.NGBRegressor with Dist=Exponential, training one regressor per flattened output column for multi-horizon / multi-target support.

The exponential distribution has one parameter, scale = 1 / λ, which equals both the mean and the standard deviation. Both "loc" and "scale" in the returned forecast dict are set to this parameter.

Use this model for non-negative targets with memoryless, decay-like behaviour - inter-arrival durations, sparse demand spikes, or short-term outage durations.
- Variables:
model_config – Instance of NGBOOSTEXPONENTIALConfig.
- Parameters:
model_config (NGBOOSTEXPONENTIALConfig | None) – Model configuration. Defaults to NGBOOSTEXPONENTIALConfig.
Example
>>> import numpy as np
>>> from twiga.models.ml import NGBOOSTEXPONENTIALModel
>>> model = NGBOOSTEXPONENTIALModel()
>>> X = np.random.randn(100, 10, 5)
>>> y = np.random.exponential(scale=2.0, size=(100, 48, 1))
>>> model.fit(X, y)
>>> loc, scale = model.predict(X)
>>> loc.shape, scale.shape
((100, 48, 1), (100, 48, 1))
- forecast(x)#
Return Exponential distribution parameters.
- Parameters:
x (ndarray) – Input features of shape (n_samples, seq_len, n_features).
- Return type:
- Returns:
Dictionary with:
"loc": predicted scale = 1/λ (the mean), always > 0, shape (n_samples, horizon, n_targets).
"scale": same as "loc" - for the exponential distribution, mean and std-dev are identical.
- predict(X)#
Predict the exponential scale parameter (1/λ, equal to the mean) for each forecast horizon.
- Parameters:
X (ndarray) – Input features of shape (n_samples, seq_len, n_features).
- Return type:
- Returns:
Tuple (loc, scale), where both arrays equal the predicted scale = 1 / λ (the mean and std-dev of the exponential distribution), each of shape (n_samples, horizon, n_targets).
- set_fit_request(*, eval_set='$UNCHANGED$', verbose='$UNCHANGED$')#
Configure whether metadata should be requested to be passed to the
fitmethod.Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with
enable_metadata_routing=True(seesklearn.set_config()). Please check the User Guide on how the routing mechanism works.The options for each parameter are:
True: metadata is requested, and passed tofitif provided. The request is ignored if metadata is not provided.False: metadata is not requested and the meta-estimator will not pass it tofit.None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (
sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.Added in version 1.3.
Parameters#
- eval_setstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for
eval_setparameter infit.- verbosestr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for
verboseparameter infit.
Returns#
- selfobject
The updated object.
- set_score_request(*, sample_weight='$UNCHANGED$')#
Configure whether metadata should be requested to be passed to the `score` method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with `enable_metadata_routing=True` (see `sklearn.set_config()`). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

- `True`: metadata is requested, and passed to `score` if provided. The request is ignored if metadata is not provided.
- `False`: metadata is not requested and the meta-estimator will not pass it to `score`.
- `None`: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- `str`: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (`sklearn.utils.metadata_routing.UNCHANGED`) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.
Parameters#
- `sample_weight` : str, True, False, or None, default=`sklearn.utils.metadata_routing.UNCHANGED`
  Metadata routing for `sample_weight` parameter in `score`.
Returns#
- `self` : object
  The updated object.
ML distribution utilities#
- twiga.distributions.ml.utils.interpolate_quantile(predictions, sorted_quantiles, target_quantile=0.5)#
Interpolate predictions at a target quantile using linear interpolation.
- Parameters:
  - `predictions` (`ndarray[tuple[Any, ...], dtype[double]]`) – Array of predictions with shape (B, Q, C), where B is batch size, Q is number of quantiles, and C is number of channels.
  - `sorted_quantiles` (`list[float] | ndarray[tuple[Any, ...], dtype[double]]`) – Sorted list or array of quantile levels (e.g., [0.1, 0.3, 0.7]).
  - `target_quantile` (`float`) – The target quantile to interpolate (default: 0.5 for the median).
- Return type:
  `ndarray[tuple[Any, ...], dtype[double]]`
- Returns:
  Interpolated predictions at `target_quantile` with shape (B, C).
- Raises:
ValueError – If sorted_quantiles is empty, predictions has incorrect shape, number of quantiles does not match predictions, quantiles are not sorted, or target_quantile is outside the range of sorted_quantiles.
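The documented behavior can be reproduced with a few lines of NumPy. This is a reference sketch of linear interpolation along the quantile axis, not Twiga's implementation; `interpolate_quantile_ref` and the toy inputs are made up for illustration.

```python
import numpy as np

# Hypothetical inputs: B=2 samples, Q=3 quantile levels, C=1 channel.
predictions = np.array([
    [[1.0], [2.0], [4.0]],
    [[0.0], [1.0], [3.0]],
])  # shape (B, Q, C)
sorted_quantiles = [0.1, 0.3, 0.7]

def interpolate_quantile_ref(preds, qs, target=0.5):
    """Linear interpolation along the quantile axis (axis=1)."""
    qs = np.asarray(qs, dtype=float)
    if target < qs[0] or target > qs[-1]:
        raise ValueError("target_quantile outside the range of sorted_quantiles")
    # np.interp over each (batch, channel) slice of quantile forecasts.
    return np.apply_along_axis(lambda col: np.interp(target, qs, col), 1, preds)

out = interpolate_quantile_ref(predictions, sorted_quantiles, 0.5)
print(out)  # shape (B, C): 0.5 sits halfway between levels 0.3 and 0.7
```

For the first sample, 0.5 lies halfway between the 0.3 and 0.7 levels, so the result is the midpoint of 2.0 and 4.0, i.e. 3.0.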
- twiga.distributions.ml.utils.get_median_prediction(predictions, quantiles)#
Compute the median prediction (quantile = 0.5) for given predictions.
- Parameters:
  - `predictions` (`ndarray[tuple[Any, ...], dtype[double]]`) – Array of predictions with shape (B, Q, C), where B is batch size, Q is number of quantiles, and C is number of channels.
  - `quantiles` (`list[float] | ndarray[tuple[Any, ...], dtype[double]]`) – List or array of quantile levels (e.g., [0.1, 0.3, 0.7]).
- Return type:
  `ndarray[tuple[Any, ...], dtype[double]]`
- Returns:
  Median predictions with shape (B, C).
- Raises:
ValueError – If quantiles is empty, predictions has incorrect shape, or number of quantiles does not match predictions.
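A toy NumPy check of the documented behavior (not Twiga's code): when 0.5 is among the quantile levels, interpolating at the median reduces to selecting that slice; otherwise it is linearly interpolated from the neighbouring levels.

```python
import numpy as np

# Hypothetical inputs: B=1 sample, Q=3 quantile levels, C=2 channels.
quantiles = [0.1, 0.5, 0.9]
predictions = np.array([[[10.0, 100.0],
                         [20.0, 200.0],
                         [30.0, 300.0]]])  # shape (1, 3, 2)

# 0.5 is one of the levels, so the median is simply that slice.
median = predictions[:, quantiles.index(0.5), :]
print(median)  # shape (B, C) = (1, 2)
```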
- twiga.distributions.ml.utils.get_sigma_prediction(predictions, quantiles)#
Compute sigma using predictions at quantiles 0.25 and 0.75.
Sigma is calculated as (upper_prediction - lower_prediction) / (2 * 0.6745), where upper_prediction and lower_prediction are at quantiles 0.75 and 0.25, respectively, and 0.6745 is the z-score for the 25th/75th percentiles.
- Parameters:
  - `predictions` (`ndarray[tuple[Any, ...], dtype[double]]`) – Array of predictions with shape (B, Q, C), where B is batch size, Q is number of quantiles, and C is number of channels.
  - `quantiles` (`list[float] | ndarray[tuple[Any, ...], dtype[double]]`) – List or array of quantile levels (e.g., [0.1, 0.3, 0.7]).
- Return type:
  `ndarray[tuple[Any, ...], dtype[double]]`
- Returns:
  Sigma predictions with shape (B, C).
- Raises:
ValueError – If quantiles is empty, predictions has incorrect shape, number of quantiles does not match predictions, or interpolation is not possible for quantiles 0.25 or 0.75.
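The formula can be sanity-checked on a toy case in NumPy (a sketch, not Twiga's implementation): for quantile forecasts of a standard normal, the 25th and 75th percentiles sit at ∓0.6745, so the estimator should recover sigma = 1.

```python
import numpy as np

# Hypothetical inputs: quantile forecasts of N(0, 1) with B=1, C=1.
z = 0.6745  # standard-normal z-score of the 75th percentile
quantiles = [0.25, 0.75]
predictions = np.array([[[-z], [z]]])  # shape (B, Q, C) = (1, 2, 1)

lower = predictions[:, quantiles.index(0.25), :]  # 25th percentile
upper = predictions[:, quantiles.index(0.75), :]  # 75th percentile
sigma = (upper - lower) / (2 * 0.6745)
print(sigma)  # recovers sigma = 1.0 for a standard normal
```

This interquartile-range estimator of sigma is robust to tail behaviour, since it only uses the central half of the forecast distribution.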