Distribution Families#

Source Files
  • twiga/distributions/nn/parametric.py

  • twiga/distributions/nn/quantile.py

  • twiga/distributions/nn/fpquantile.py

  • twiga/distributions/nn/residual_conformal.py

  • twiga/models/nn/prob/core.py

  • twiga/models/ml/gausscatboost_model.py - GAUSSCATBOOSTConfig / GAUSSCATBOOSTModel

  • twiga/models/ml/ngboostnormal_model.py - NGBOOSTNORMALConfig / NGBOOSTNORMALModel

  • twiga/models/ml/ngboostlognormal_model.py - NGBOOSTLOGNORMALConfig / NGBOOSTLOGNORMALModel

  • twiga/models/ml/ngboostexponential_model.py - NGBOOSTEXPONENTIALConfig / NGBOOSTEXPONENTIALModel

  • twiga/models/ml/prob/base_ngboost.py - BaseNGBoostRegressor

  • twiga/distributions/ml/utils.py

Twiga implements several distribution families for probabilistic neural forecasting. All NN distributions slot into the composable backbone/head architecture described below.

Backbone/Head Architecture#

Probabilistic neural models in Twiga decouple the backbone (which learns to encode the time series) from the distribution head (which maps the latent vector to a forecast). This separation lives in twiga/models/nn/prob/core.py.

        graph LR
    X["Input (B, T, F)"] --> BB["Backbone\n(NHITSNetwork / MLPFNetwork / …)"]
    BB -->|"encode(x) → z (B, encode_size)"| HEAD["Distribution Head\n(Normal / QR / …)"]
    HEAD --> OUT["Distribution Parameters\n(loc, scale) / quantiles / …"]
    

Backbone contract - any backbone must implement:

  • encode(x) Tensor - maps raw input (B, T, F) to latent (B, encode_size)

  • encode_size: int property - width of the latent vector

  • penalty_dict(epoch) dict - returns regularisation terms (Lasso, gate sparsity …)

Head contract - any distribution head must implement:

  • forward(z) tuple - maps latent z to distribution parameters

  • step(z, y, metric_fn, epoch) (loss, metric) - computes training loss in one call

ProbabilisticModel (a LightningModule) wires them together and automatically injects hidden_size=backbone.encode_size into the head constructor so callers never specify it manually.

# Under the hood  -  TwigaForecaster does this for you when you pass a config like MLPFNormalConfig
from twiga.models.nn.prob.core import ProbabilisticModel
from twiga.models.nn.net.mlpf import MLPFNetwork
from twiga.distributions.nn.parametric import NormalDistribution

model = ProbabilisticModel(
    backbone_cls=MLPFNetwork,
    backbone_kwargs={...},          # num_target_feature, forecast_horizon, etc.
    head_cls=NormalDistribution,
    head_kwargs={"num_target_output": 1, "forecast_horizon": 48},
    # hidden_size injected automatically from backbone.encode_size
)

Adding a new distribution requires implementing only the head interface - the backbone and training loop are unchanged.

NN Distributions#

Parametric Distributions#

All parametric heads live in twiga/distributions/nn/parametric.py. Each maps the latent vector z (B, encode_size) to two (or three for Student-T) distribution parameters and trains via negative log-likelihood.

Class

Distribution

Output params

Constraint

Best for

NormalDistribution

Normal

(μ, σ)

σ > 0

Symmetric, unbounded targets

LaplaceDistribution

Laplace

(μ, b)

b > 0

Heavy-tailed, outlier-robust

LogNormalDistribution

LogNormal

(μ, σ)

σ > 0

Strictly positive, right-skewed

GammaDistribution

Gamma

(α, β)

α, β > 0

Strictly positive, flexible skew

BetaDistribution

Beta

(α, β)

α, β > 0

Bounded [0, 1] targets

StudentTDistribution

Student-T

(μ, σ, ν)

σ > 0, ν > 2

Very heavy tails

All share the same interface:

from twiga.distributions.nn.parametric import NormalDistribution

dist = NormalDistribution(
    num_target_output=1,
    hidden_size=256,      # injected automatically by ProbabilisticModel
    forecast_horizon=48,
    out_activation_function=None,
)

# Forward pass → distribution parameters
loc, scale = dist(z)  # z shape: (B, encode_size)

# Get PyTorch distribution object
d = dist.get_distribution(loc, scale)

# Negative log-likelihood for training
nll = dist.get_log_likelihood(loc, scale, targets)

# training_step equivalent
loss, metric = dist.step(z, targets, metric_fn, epoch)

Student-T requires min_df

StudentTDistribution accepts an additional min_df: float parameter (default 2.1) that clips the predicted degrees-of-freedom from below to ensure finite variance.

Quantile Distribution#

The QRDistribution class (twiga/distributions/nn/quantile.py) outputs quantile predictions from a neural network.

from twiga.distributions.nn.quantile import QRDistribution

dist = QRDistribution(
    quantiles=[0.05, 0.25, 0.5, 0.75, 0.95],
    num_outputs=1,
    hidden_size=256,
    horizon=48,
    loss_fn="pinball",    # or "huber-pinball"
    kappa=0.5,
    conf_level=0.05,
)

# Forward pass: shape (batch, num_quantiles, horizon, num_outputs)
quantile_values = dist(hidden_repr)

# Forecast: returns (loc, quantile_values)
loc, quantiles = dist.forecast(hidden_repr)

Parameter

Type

Default

Description

quantiles

list[float] | None

None

Quantile levels to predict

num_outputs

int

1

Number of target variables

hidden_size

int

256

Input hidden dimension

horizon

int

48

Forecast horizon

loss_fn

str

"pinball"

Loss function: "pinball" or "huber-pinball"

kappa

float

0.5

Huber transition parameter

conf_level

float

0.05

Confidence level for bounds

eps

float

1e-6

Numerical stability

output_activation

nn.Module | None

None

Optional output activation

The step() method computes quantile loss and metrics during training:

loss, metric = dist.step(batch, metric_fn, epoch)

Median Extraction#

from twiga.distributions.nn.quantile import get_median_quantile

# Extract median from quantile predictions
median = get_median_quantile(quantile_hats, probs=[0.05, 0.25, 0.5, 0.75, 0.95])
# Shape: (batch, horizon, num_targets)

If \(\tau = 0.5\) is in the quantile list, it is extracted directly. Otherwise, linear interpolation between the two closest quantiles is used.

Full Parameterised Quantile Regression (FPQR)#

FPQRDistribution (twiga/distributions/nn/fpquantile.py) extends quantile regression by treating the quantile levels themselves as learnable. Rather than predicting at a fixed grid of quantiles, a QuantileProposal network proposes the quantile levels adaptively from the latent representation, which are then encoded via CosinetauEmbedding before producing forecast values.

References

  • A. Faustine and A. Moshi, “Full Parameterized Quantile Function for Probabilistic Forecasting,” IEEE Access, 2024. DOI: 10.1109/ACCESS.2024.3402001

  • A. Faustine and L. Aversano, “Adaptive Quantile Learning for Long-Term Probabilistic Forecasting,” IEEE Transactions, 2022. DOI: 10.1109/ACCESS.2022.3143387

        graph LR
    Z["z (B, hidden_size)"] --> QP["QuantileProposal\n→ taus, tau_hats, entropies"]
    Z --> CE["CosinetauEmbedding\n(cosine τ embedding)"]
    QP --> CE
    CE --> QO["quantile_output\n→ (B, n_quantiles, horizon, outputs)"]
    QP --> LOSS["QuantileProposalLoss\n+ QuantileLoss − entropy"]
    

QuantileProposal generates the quantile grid from the backbone’s latent vector:

Parameter

Type

Default

Description

n_quantiles

int

10

Number of quantile levels to propose

z_dim

int

64

Input latent dimension

dropout

float

0.1

Dropout for regularisation

conf_level

float

0.05

Clamps proposals to [α/2, 1 α/2]

CosinetauEmbedding maps proposed tau values to a fixed-size embedding using cosine basis functions:

Parameter

Type

Default

Description

num_cosines

int

32

Number of cosine basis functions

z_dim

int

128

Output embedding dimension

num_outputs

int

48

Scales input size (should match horizon)

FPQRDistribution combines proposal + cosine embedding + output projection:

Parameter

Type

Default

Description

n_quantiles

int | None

9

Quantile levels (None → 9)

num_outputs

int

1

Number of target variables

hidden_size

int

256

Latent dimension from backbone

horizon

int

48

Forecast horizon

dropout

float

0.1

Dropout in proposal network

conf_level

float

0.05

Significance level for interval bounds

kappa

float

0.25

Huber transition parameter

num_cosines

int

32

Cosine basis functions (0 to use DataEmbedding)

loss_fn

str

"pinball"

"pinball" or "huber-pinball"

from twiga.distributions.nn.fpquantile import FPQRDistribution

dist = FPQRDistribution(
    n_quantiles=9,
    num_outputs=1,
    hidden_size=256,
    horizon=48,
    conf_level=0.05,
    loss_fn="pinball",
)

# Training step: loss = quantile_loss + proposal_loss − entropy
loss, metric = dist.step(z, y, metric_fn, epoch)

# Inference: returns dict with loc, quantiles, quantile_levels, conf_level
result = dist.forecast(z)
# result["loc"]             shape: (B, horizon, num_outputs)
# result["quantiles"]       shape: (B, n_quantiles, horizon, num_outputs)
# result["quantile_levels"] shape: (B, n_quantiles, horizon)  ← per-sample proposal

FPQR vs QR

QRDistribution predicts at a fixed, user-specified quantile grid. FPQRDistribution learns the quantile levels jointly with the forecast values, adapting the grid per sample. The proposal loss plus entropy regularisation encourages diverse, well-spread coverage. Use FPQR when you want the model to discover the most informative quantile levels automatically.

The training loss combines three terms:

\[\mathcal{L} = \mathcal{L}_{\text{quantile}} + \mathcal{L}_{\text{proposal}} - \mathbb{H}(\text{probs})\]

where \(\mathbb{H}\) is the entropy of the proposed quantile distribution, acting as a diversity regulariser.

Conditional Residual Calibration (CRC)#

CRC heads (twiga/distributions/nn/residual_conformal.py) learn a point forecast \(\mu\) and a per-step scale \(\sigma\) end-to-end — no post-hoc conformal calibration step required. The scale MLP is trained to approximate the absolute residual \(|y - \mu|\), giving prediction intervals of the form \([\mu \pm z_\alpha \sigma]\) at inference time.

Two variants exist, differing only in how \(\mu\) is computed:

Class

Backbone family

Mu path

CRCDistribution

MLPF, NHiTS, RNN

Linear projection from latent z (B, hidden_size)(B, H×O)

AdditiveCRCDistribution

MLPGAM, MLPGAF

encode() already returns the additive mean directly — no projection, preserving the GAM decomposition

Loss function:

The joint training objective is:

\[\mathcal{L} = \underbrace{\alpha \cdot \text{MSE}(\mu, y) + (1-\alpha) \cdot \text{MAE}(\mu, y)}_{\mathcal{L}_\mu} + \underbrace{\mathcal{L}_\sigma(\sigma,\, |y-\mu|)}_{\text{sigma calibration}}\]

The sigma calibration loss \(\mathcal{L}_\sigma\) is selected via sigma_loss_fn:

sigma_loss_fn

Formula

Use when

"hybrid_sqrt" (default)

\(\alpha \cdot \text{MSE}(\sigma, \sqrt{r}) + (1-\alpha) \cdot \text{MAE}(\sigma, \sqrt{r})\)

General purpose; variance-stabilised residual

"hybrid"

\(\alpha \cdot \text{MSE}(\sigma, r) + (1-\alpha) \cdot \text{MAE}(\sigma, r)\)

General purpose; same \(\alpha\) as mu loss

"gaussian"

\(\frac{1}{2}\left(\frac{r^2}{\sigma^2} + \log \sigma^2\right)\)

Heteroscedastic Gaussian NLL

"laplace"

\(\frac{r}{\sigma} + \log(2\sigma)\)

Robust to heavy-tailed residuals

"mse"

\((\sigma - r)^2\)

Distribution-free MSE on residual magnitude

"log"

\(\log(1 + (\sigma - r)^2)\)

Robust MSE with log-damping of large errors

where \(r = |y - \mu|\).

Two-stage training (MLPGAMCRC default, controlled by two_stage=True):

        graph LR
    S1["Stage 1\nBackbone + mu_layer\n(sigma_layer frozen)"] --> S2["Stage 2\nSigma MLP only\n(backbone frozen)"]
    

In stage 1 the backbone and mu path are updated via the combined \(\mathcal{L}_\mu + \mathcal{L}_\sigma\) loss. In stage 2 only sigma_layer parameters receive gradients — the backbone is frozen by Lightning’s toggle_optimizer. This prevents the scale head from distorting the point forecast.

from twiga.distributions.nn.residual_conformal import CRCDistribution, AdditiveCRCDistribution

# Standard backbone (MLPF, NHiTS)
crc = CRCDistribution(
    num_target_output=1,
    hidden_size=256,         # injected from backbone.encode_size
    forecast_horizon=48,
    sigma_loss_fn="hybrid_sqrt",  # "hybrid_sqrt" | "hybrid" | "gaussian" | "laplace" | "mse" | "log"
    alpha=0.1,
)
mu, sigma = crc(z)                           # z: (B, hidden_size)
loss = crc.get_log_likelihood(mu, sigma, y)  # sigma calibration loss only (not point loss)

# Sigma-only step (stage 2 of two-stage training)
loss, metric = crc.step_sigma(z, y, metric_fn)

# Additive backbone (MLPGAM, MLPGAF) — mu_layer is Identity, no linear projection
add_crc = AdditiveCRCDistribution(
    num_target_output=1,
    hidden_dim=256,          # internal sigma MLP width (independent of H*O)
    forecast_horizon=48,
)
mu, sigma = add_crc(z)      # z shape (B, H*O); mu = reshape(z), sigma from ResidualSigmaHead

CRC vs post-hoc conformal

Post-hoc conformal prediction (see Conformal Prediction) wraps any already-trained model and provides finite-sample coverage guarantees. CRC bakes the calibration objective directly into training — no separate calibration split needed — but the coverage guarantee is empirical rather than distribution-free. CRC typically produces tighter intervals when the signal-to-noise ratio is high.

ML Parametric Models#

Twiga provides two families of parametric ML models that output full distribution parameters rather than just point forecasts.

GaussCatBoost#

GAUSSCATBOOSTModel uses CatBoost’s built-in RMSEWithUncertainty loss, which jointly learns the mean μ and aleatoric scale σ in a single model. One CatBoostRegressor is trained per output step, and virtual ensemble snapshots provide epistemic uncertainty decomposition.

Model

Name

Config Class

Output

Gaussian CatBoost

gausscatboost

GAUSSCATBOOSTConfig

(μ, σ) per horizon step

GAUSSCATBOOSTConfig extends CATBOOSTConfig with:

Field

Type

Default

Description

name

Literal["gausscatboost"]

"gausscatboost"

Model identifier.

od_type

Literal["Iter", "IncToDec"]

"Iter"

Overfitting detection strategy; active only when an eval set is provided to fit().

od_wait

int

20

Iterations without improvement before early stopping triggers.

virtual_ensembles_count

int

10

Virtual ensemble snapshots for epistemic uncertainty estimation via predict_with_uncertainty(). Excluded from parameter dumps.

from twiga.models.ml.gausscatboost_model import GAUSSCATBOOSTConfig

config = GAUSSCATBOOSTConfig(
    iterations=500,
    learning_rate=0.05,
    task_type="CPU",
    virtual_ensembles_count=20,
)

Uncertainty decomposition

predict_with_uncertainty() returns three components per output step: mean (predicted μ), knowledge_uncertainty (epistemic — variance across virtual ensemble snapshots), and data_uncertainty (aleatoric — the model’s second output, exp of predicted log-σ).

NGBoost#

NGBoost (Natural Gradient Boosting) fits all distribution parameters simultaneously via the natural gradient of a proper scoring rule. Unlike the two-stage GAUSSCATBOOSTModel, which fits mean and variance separately, NGBoost optimises both jointly. All models extend BaseNGBoostRegressor and train one NGBRegressor per flattened output column.

Model

Name

Config Class

Distribution

Best for

NGBoost Normal

ngboostnormal

NGBOOSTNORMALConfig

Normal N(μ, σ²)

Symmetric, unbounded targets

NGBoost LogNormal

ngboostlognormal

NGBOOSTLOGNORMALConfig

LogNormal

Strictly positive, right-skewed targets

NGBoost Exponential

ngboostexponential

NGBOOSTEXPONENTIALConfig

Exponential (1/λ)

Non-negative, memoryless processes

NGBoost vs two-stage Gaussian

NGBoost jointly optimises μ and σ via natural gradient, producing better-calibrated intervals. The two-stage GAUSSCATBOOSTModel (RMSEWithUncertainty) is faster and scales better on very large datasets.

NGBOOSTNORMALModel#

Fits a Gaussian distribution N(μ, σ²) jointly optimising mean and standard deviation.

NGBOOSTNORMALConfig fields:

Field

Type

Default

Description

name

Literal["ngboostnormal"]

"ngboostnormal"

Model identifier.

domain

Literal["ml"]

"ml"

Domain identifier.

n_estimators

int

500

Number of boosting rounds.

learning_rate

float

0.01

Shrinkage applied to each boosting step.

minibatch_frac

float

1.0

Fraction of training data used per round.

col_sample

float

1.0

Fraction of features sampled per tree.

random_state

int

42

Random seed.

score

Literal["LogScore", "CRPScore"]

"LogScore"

Scoring rule for training objective.

Output: (loc, scale) = (mean μ, standard deviation σ).

from twiga.models.ml.ngboostnormal_model import NGBOOSTNORMALConfig

config = NGBOOSTNORMALConfig(n_estimators=300, learning_rate=0.05)

NGBOOSTLOGNORMALModel#

Fits a LogNormal distribution, appropriate for strictly positive targets such as solar irradiance or non-negative electricity prices.

NGBOOSTLOGNORMALConfig shares all fields with NGBOOSTNORMALConfig with name="ngboostlognormal".

Output: (loc, scale) = (geometric mean exp(μ_log), log-space standard deviation σ_log).

Interpreting LogNormal output

loc is the geometric mean (the exponentiated log-space mean), not the arithmetic mean. To recover the arithmetic mean use exp(log(loc) + scale² / 2). For prediction intervals, exponentiate the log-space quantiles.

from twiga.models.ml.ngboostlognormal_model import NGBOOSTLOGNORMALConfig

config = NGBOOSTLOGNORMALConfig(n_estimators=300, score="CRPScore")

NGBOOSTEXPONENTIALModel#

Fits an Exponential distribution parameterised by its scale (1/λ). Suitable for non-negative targets with a memoryless property.

NGBOOSTEXPONENTIALConfig shares all fields with NGBOOSTNORMALConfig with name="ngboostexponential".

Output: Both loc and scale equal the predicted scale parameter 1/λ (the Exponential distribution has a single parameter).

Single-parameter distribution

The Exponential distribution is fully characterised by its scale (1/λ), so both loc and scale in the output tuple carry the same value. Mean and standard deviation are both equal to 1/λ.

from twiga.models.ml.ngboostexponential_model import NGBOOSTEXPONENTIALConfig

config = NGBOOSTEXPONENTIALConfig(n_estimators=400)

LogScore vs CRPScore

LogScore maximises the predictive log-likelihood — faster and often better-calibrated for in-distribution data. CRPScore (Continuous Ranked Probability Score) is more robust to distributional shift and outliers but typically slower to converge.

ML Distribution Utilities#

The twiga/distributions/ml/utils.py module provides utilities for post-processing quantile predictions from ML models.

interpolate_quantile()#

Linearly interpolates to extract a specific quantile from a set of predicted quantiles.

from twiga.distributions.ml.utils import interpolate_quantile

# predictions shape: (batch, num_quantiles, num_targets)
q30 = interpolate_quantile(
    predictions=predictions,
    sorted_quantiles=[0.05, 0.25, 0.5, 0.75, 0.95],
    target_quantile=0.3,
)
# Output shape: (batch, num_targets)

get_median_prediction()#

Extracts the median prediction (quantile 0.5).

from twiga.distributions.ml.utils import get_median_prediction

median = get_median_prediction(predictions, quantiles=[0.05, 0.25, 0.5, 0.75, 0.95])

get_sigma_prediction()#

Estimates the standard deviation from the interquartile range:

\[\sigma = \frac{Q_{0.75} - Q_{0.25}}{2 \times 0.6745}\]
from twiga.distributions.ml.utils import get_sigma_prediction

sigma = get_sigma_prediction(predictions, quantiles=[0.05, 0.25, 0.5, 0.75, 0.95])

Note

The constant 0.6745 is the 75th percentile of the standard normal distribution. This conversion assumes approximate normality of the residuals.

How Distributions Connect to Models#

        graph TD
    subgraph "Point Models"
        A[CatBoost / XGBoost / LightGBM / MLPF / NHITS] --> B["predict() → loc array"]
    end

    subgraph "Parametric NN Models"
        C["ProbabilisticModel\n(backbone + head)"] --> D["predict() → (loc, scale, …)"]
        D --> E["NormalDistribution / LaplaceDistribution\nGammaDistribution / BetaDistribution / …"]
    end

    subgraph "Quantile Models"
        G[QRCatBoost / QRXGBoost / QRLightGBM] --> H["predict() → quantile array"]
        I["MLPFQR / NHITSQRModel / …"] --> H
        H --> J[QRDistribution / FPQRDistribution]
    end

    subgraph "ML Parametric (Gaussian)"
        K["GAUSSCATBOOSTModel"] --> L["predict() → (μ, σ)"]
    end

    B --> CP[Conformal Prediction]
    D --> CP
    H --> CP
    L --> CP
    CP --> PI[Prediction Intervals]
    

Point models produce a single prediction per time step and can be wrapped with split conformal prediction.

Parametric NN models (via ProbabilisticModel) produce full distribution parameters and train via NLL. They can be used stand-alone or combined with conformal residual fitting.

Quantile models produce multiple quantile predictions and can be used with conformal quantile regression or taken directly as prediction intervals.

ML Usage Example#

import pandas as pd

from twiga.core.config import DataPipelineConfig, ExperimentConfig
from twiga.forecaster.core import TwigaForecaster
from twiga.models.ml.gausscatboost_model import GAUSSCATBOOSTConfig
from twiga.models.ml.ngboostnormal_model import NGBOOSTNORMALConfig

data_config = DataPipelineConfig(
    target_feature="load_mw",
    period="1h",
    lookback_window_size=168,
    forecast_horizon=24,
    lags=[1, 24, 168],
    input_scaler="standard",
)

train_config = ExperimentConfig(
    split_freq="months",
    train_size=6,
    test_size=1,
    window="expanding",
    project_name="GaussianForecast",
)

# Gaussian CatBoost — fast, single model via RMSEWithUncertainty
gauss_config = GAUSSCATBOOSTConfig(task_type="CPU", virtual_ensembles_count=10)

# NGBoost Normal — joint optimisation of μ and σ
ngboost_config = NGBOOSTNORMALConfig(n_estimators=300, learning_rate=0.05)

forecaster = TwigaForecaster(
    data_params=data_config,
    model_params=[gauss_config],
    cv_params=train_config,
)

forecaster.fit(train_df=train_df)

interval_dict, _ = forecaster.predict_interval(test_df=test_df)
for model_name, (lower, point, upper) in interval_dict.items():
    print(f"{model_name}: lower={lower.shape}, point={point.shape}, upper={upper.shape}")

API Reference#

Parametric distribution base#

class twiga.distributions.nn.parametric.BaseDistribution(num_target_output, hidden_size, forecast_horizon)#

Bases: ABC, Module

Abstract base for probabilistic output heads.

Subclasses must implement forward(), get_distribution(), and get_log_likelihood().

All heads share the same input contract:

x: (B, hidden_size) - latent representation from encoder/decoder.

All heads share the same output contract for forecasting tensors:

shape (B, forecast_horizon, num_target_output).

forecast(z)#

Inference-mode forward (no gradient).

Parameters:

z (Tensor) – Latent tensor of shape (B, hidden_size) from the backbone.

Return type:

dict[str, Tensor]

Returns:

Dict with "loc" (location/mean) and "scale" (std/dispersion). The first forward() output is always loc; the second is scale.

abstractmethod forward(x)#

Map latent x → distribution parameters.

Return type:

tuple[Tensor, ...]

abstractmethod get_distribution(*params)#

Construct a torch Distribution from predicted parameters.

Return type:

Distribution

abstractmethod get_log_likelihood(*params, targets)#

Return mean negative log-likelihood over the batch.

Return type:

Tensor

step(z, y, metric_fn, epoch=None)#

Compute NLL loss and metric for one training/validation step.

Parameters:
  • z (Tensor) – Latent tensor of shape (B, hidden_size) from the backbone.

  • y (Tensor) – Target tensor of shape (B, forecast_horizon, num_target_output).

  • metric_fn (Callable[..., Any]) – Callable returning a scalar metric given (pred, target).

  • epoch (int | None) – Current epoch (unused here; available for subclasses).

Return type:

tuple[Tensor, Tensor]

Returns:

Tuple of (nll_loss, metric).

Parametric heads — standard backbones#

class twiga.distributions.nn.parametric.NormalDistribution(num_target_output=1, hidden_size=256, forecast_horizon=48, out_activation_function=None)#

Bases: BaseDistribution

Normal (Gaussian) output head.

Best for: symmetric, unbounded targets - energy demand, temperature.

Parameters predicted:

mu - mean (unconstrained) sigma - std dev (exp of log_scale, strictly positive)

Parameters:
  • num_target_output (int) – Number of output features per time step.

  • hidden_size (int) – Dimensionality of the input latent vector.

  • forecast_horizon (int) – Number of forecast time steps.

  • out_activation_function (Module | None) – Optional activation applied to mu. Defaults to Identity.

forward(x)#

Predict mu and sigma.

Parameters:

x (Tensor) – Latent tensor of shape (B, hidden_size).

Return type:

tuple[Tensor, Tensor]

Returns:

Tuple of (mu, sigma), each of shape (B, forecast_horizon, num_target_output).

get_distribution(mu, sigma)#

Construct a torch Distribution from predicted parameters.

Return type:

Normal

get_log_likelihood(mu, sigma, targets)#

Return mean negative log-likelihood over the batch.

Return type:

Tensor

class twiga.distributions.nn.parametric.LaplaceDistribution(num_target_output=1, hidden_size=256, forecast_horizon=48, out_activation_function=None)#

Bases: BaseDistribution

Laplace output head.

Best for: targets with heavier tails than Normal, robust to outliers - electricity prices, wind speed residuals.

Parameters predicted:

mu - location (unconstrained) scale - scale (exp of log_scale, strictly positive)

forward(x)#

Predict mu and scale.

Parameters:

x (Tensor) – Latent tensor of shape (B, hidden_size).

Return type:

tuple[Tensor, Tensor]

Returns:

Tuple of (mu, scale), each of shape (B, forecast_horizon, num_target_output).

get_distribution(mu, scale)#

Construct a torch Distribution from predicted parameters.

Return type:

Laplace

get_log_likelihood(mu, scale, targets)#

Return mean negative log-likelihood over the batch.

Return type:

Tensor

class twiga.distributions.nn.parametric.LogNormalDistribution(num_target_output=1, hidden_size=256, forecast_horizon=48, out_activation_function=None)#

Bases: BaseDistribution

Log-Normal output head.

Best for: strictly positive, right-skewed targets - renewable generation, gas prices, load with zero floor.

Parameters predicted:

mu - mean of log(target) (unconstrained) sigma - std of log(target) (exp of log_scale, strictly positive)

Note

Targets must be strictly positive. Apply a small epsilon shift in the data pipeline if zeros are possible (e.g., target = max(y, 1e-6)).

forward(x)#

Predict mu and sigma of the underlying Normal (in log space).

Parameters:

x (Tensor) – Latent tensor of shape (B, hidden_size).

Return type:

tuple[Tensor, Tensor]

Returns:

Tuple of (mu, sigma), each of shape (B, forecast_horizon, num_target_output).

get_distribution(mu, sigma)#

Construct a torch Distribution from predicted parameters.

Return type:

LogNormal

get_log_likelihood(mu, sigma, targets)#

Return mean negative log-likelihood over the batch.

Return type:

Tensor

class twiga.distributions.nn.parametric.GammaDistribution(num_target_output=1, hidden_size=256, forecast_horizon=48)#

Bases: BaseDistribution

Gamma output head.

Best for: strictly positive targets with flexible skew - solar irradiance, wind power, load at aggregate level.

Parameters predicted:

concentration - shape parameter α (softplus, strictly positive) rate - rate parameter β (softplus, strictly positive)

Note

Mean of Gamma(α, β) = α/β. Targets must be strictly positive.

forward(x)#

Predict concentration and rate.

Parameters:

x (Tensor) – Latent tensor of shape (B, hidden_size).

Return type:

tuple[Tensor, Tensor]

Returns:

Tuple of (concentration, rate), each of shape (B, forecast_horizon, num_target_output).

get_distribution(concentration, rate)#

Construct a torch Distribution from predicted parameters.

Return type:

Gamma

get_log_likelihood(concentration, rate, targets)#

Return mean negative log-likelihood over the batch.

Return type:

Tensor

class twiga.distributions.nn.parametric.BetaDistribution(num_target_output=1, hidden_size=256, forecast_horizon=48)#

Bases: BaseDistribution

Beta output head.

Best for: bounded [0, 1] targets - capacity factors, state of charge, fill rates, normalised demand ratios.

Parameters predicted:

alpha - first shape parameter (softplus, strictly positive) beta - second shape parameter (softplus, strictly positive)

Note

Targets must lie strictly in (0, 1). Apply a clamp in the data pipeline if boundary values are possible: target = target.clamp(1e-6, 1 - 1e-6)

forward(x)#

Predict alpha and beta.

Parameters:

x (Tensor) – Latent tensor of shape (B, hidden_size).

Return type:

tuple[Tensor, Tensor]

Returns:

Tuple of (alpha, beta), each of shape (B, forecast_horizon, num_target_output).

get_distribution(alpha, beta)#

Construct a torch Distribution from predicted parameters.

Return type:

Beta

get_log_likelihood(alpha, beta, targets)#

Return mean negative log-likelihood over the batch.

Return type:

Tensor

class twiga.distributions.nn.parametric.StudentTDistribution(num_target_output=1, hidden_size=256, forecast_horizon=48, out_activation_function=None, min_df=2.1)#

Bases: BaseDistribution

Student-T output head.

Best for: targets with very heavy tails and potential outliers - financial returns, spot electricity prices.

Parameters predicted:

mu - location (unconstrained) sigma - scale (exp of log_scale, strictly positive) df - degrees of freedom (softplus, clipped to ≥ 2 for finite variance)

Note

Degrees of freedom are predicted per-sample but averaged across the forecast horizon so the model learns a single df per series.

forecast(z)#

Inference-mode forward (no gradient).

Parameters:

z (Tensor) – Latent tensor of shape (B, hidden_size) from the backbone.

Return type:

dict[str, Tensor]

Returns:

Dict with "loc" (location/mean) and "scale" (std/dispersion). The first forward() output is always loc; the second is scale.

forward(x)#

Predict mu, sigma, and degrees of freedom.

Parameters:

x (Tensor) – Latent tensor of shape (B, hidden_size).

Return type:

tuple[Tensor, Tensor, Tensor]

Returns:

Tuple of (mu, sigma, df) – - mu, sigma: shape (B, forecast_horizon, num_target_output) - df: shape (B, 1, num_target_output), broadcast-compatible

get_distribution(mu, sigma, df)#

Construct a torch Distribution from predicted parameters.

Return type:

StudentT

get_log_likelihood(mu, sigma, df, targets)#

Return mean negative log-likelihood over the batch.

Return type:

Tensor

Parametric heads — additive backbones (MLPGAM / MLPGAF)#

These variants receive the pre-summed additive mean directly from the backbone instead of a latent vector, preserving the GAM decomposition.

class twiga.distributions.nn.parametric.AdditiveNormalDistribution(num_target_output=1, hidden_size=256, forecast_horizon=48, out_activation_function=None)#

Bases: BaseDistribution

Normal output head preserving additive backbone structure.

For use with MLPGAMNetwork and MLPGAFNetwork, whose encode() already returns the additive mean directly as a flat (B, H*O) vector. The mean is taken as-is (no projection) to honour the additive decomposition; only the scale is learned via a separate linear layer.

Parameters predicted:

mu - additive mean (reshaped directly from latent, no projection) sigma - std dev (exp of log_scale, strictly positive)

Parameters:
  • num_target_output (int) – Number of output features per time step.

  • hidden_size (int) – Must equal forecast_horizon * num_target_output.

  • forecast_horizon (int) – Number of forecast time steps.

  • out_activation_function (Module | None) – Optional activation applied to mu. Defaults to Identity.

forward(x)#

Predict mu and sigma.

Parameters:

x (Tensor) – Latent tensor of shape (B, hidden_size) - the additive mean from the backbone.

Return type:

tuple[Tensor, Tensor]

Returns:

Tuple of (mu, sigma), each of shape (B, forecast_horizon, num_target_output).

get_distribution(mu, sigma)#

Construct a torch Distribution from predicted parameters.

Return type:

Normal

get_log_likelihood(mu, sigma, targets)#

Return mean negative log-likelihood over the batch.

Return type:

Tensor

class twiga.distributions.nn.parametric.AdditiveLaplaceDistribution(num_target_output=1, hidden_size=256, forecast_horizon=48, out_activation_function=None)#

Bases: BaseDistribution

Laplace output head preserving additive backbone structure.

Analogous to AdditiveNormalDistribution but with Laplace tails. Best for: heavy-tailed, outlier-robust targets - electricity prices, wind residuals.

Parameters predicted:

mu - additive location (reshaped directly from latent, no projection) scale - scale (exp of log_scale, strictly positive)

forward(x)#

Predict mu and scale.

Parameters:

x (Tensor) – Latent tensor of shape (B, hidden_size) - the additive location from the backbone.

Return type:

tuple[Tensor, Tensor]

Returns:

Tuple of (mu, scale), each of shape (B, forecast_horizon, num_target_output).

get_distribution(mu, scale)#

Construct a torch Distribution from predicted parameters.

Return type:

Laplace

get_log_likelihood(mu, scale, targets)#

Return mean negative log-likelihood over the batch.

Return type:

Tensor

class twiga.distributions.nn.parametric.AdditiveLogNormalDistribution(num_target_output=1, hidden_size=256, forecast_horizon=48, out_activation_function=None)#

Bases: BaseDistribution

Log-Normal output head preserving additive backbone structure.

Best for: strictly positive, right-skewed targets - renewable generation, gas prices.

Parameters predicted:

mu - additive log-space mean (reshaped directly from latent, no projection) sigma - std of log(target) (exp of log_scale, strictly positive)

Note

Targets must be strictly positive.

forward(x)#

Predict mu and sigma of the underlying Normal (in log space).

Parameters:

x (Tensor) – Latent tensor of shape (B, hidden_size) - the additive log-mean from the backbone.

Return type:

tuple[Tensor, Tensor]

Returns:

Tuple of (mu, sigma), each of shape (B, forecast_horizon, num_target_output).

get_distribution(mu, sigma)#

Construct a torch Distribution from predicted parameters.

Return type:

LogNormal

get_log_likelihood(mu, sigma, targets)#

Return mean negative log-likelihood over the batch.

Return type:

Tensor

class twiga.distributions.nn.parametric.AdditiveStudentTDistribution(num_target_output=1, hidden_size=256, forecast_horizon=48, out_activation_function=None, min_df=2.1)#

Bases: BaseDistribution

Student-T output head preserving additive backbone structure.

Best for: very heavy tails - spot electricity prices, financial returns.

Parameters predicted:

mu - additive location (reshaped directly from latent, no projection) sigma - scale (exp of log_scale, strictly positive) df - degrees of freedom (softplus, clipped to ≥ min_df for finite variance)

forecast(z)#

Inference-mode forward (no gradient).

Parameters:

z (Tensor) – Latent tensor of shape (B, hidden_size) from the backbone.

Return type:

dict[str, Tensor]

Returns:

Dict with "loc" (location/mean) and "scale" (std/dispersion). The first forward() output is always loc; the second is scale.

forward(x)#

Predict mu, sigma, and degrees of freedom.

Parameters:

x (Tensor) – Latent tensor of shape (B, hidden_size) - the additive location from the backbone.

Return type:

tuple[Tensor, Tensor, Tensor]

Returns:

Tuple of (mu, sigma, df) – - mu, sigma: shape (B, forecast_horizon, num_target_output) - df: shape (B, 1, num_target_output), broadcast-compatible

get_distribution(mu, sigma, df)#

Construct a torch Distribution from predicted parameters.

Return type:

StudentT

get_log_likelihood(mu, sigma, df, targets)#

Return mean negative log-likelihood over the batch.

Return type:

Tensor

Quantile regression distribution#

class twiga.distributions.nn.quantile.QRDistribution(quantiles=None, num_outputs=1, hidden_size=256, horizon=48, kappa=0.5, output_activation=None, conf_level=0.05, loss_fn='pinball', crossing_penalty=10.0, sort_output=False, monotone_output=False, crossing_mode='relu', softplus_beta=1.0)#

Bases: Module

Fixed-grid quantile regression head.

Predicts a fixed set of quantile levels $tau$ directly from a latent representation via a single linear projection. Boundary quantiles conf_level/2 and 1 - conf_level/2 are automatically prepended / appended so the output always spans the requested coverage band.

Non-crossing is encouraged by a penalty term in step(); it is not enforced structurally. For architectural monotonicity guarantees, use FPQRDistribution with qvn_type="monotone".

Parameters:
  • quantiles (list[float] | None) – Explicit quantile levels the model outputs. None defaults to [0.1, 0.2, …, 0.9]. The list is used exactly as provided — no boundary levels are auto-inserted. Include conf_level/2 and 1 - conf_level/2 explicitly if raw (non-conformal) interval construction is required.

  • num_outputs (int) – Number of target series D.

  • hidden_size (int) – Width of the backbone latent vector fed as input.

  • horizon (int) – Forecast horizon H.

  • kappa (float) – Huber transition parameter (only used for huber-pinball).

  • output_activation (Module | None) – Applied elementwise to the raw quantile outputs. None (default) uses Identity. Warning: non-identity activations (e.g. ReLU) constrain the output range and will produce wrong forecasts for targets that can be negative.

  • conf_level (float) – Significance level α; stored as metadata for downstream consumers (conformal calibration, interval labelling). Does not affect the quantile grid.

  • loss_fn (str) – "pinball", "huber-pinball", or "winkler". "winkler" applies the interval score to the outer quantile pair and pinball loss to all K levels.

  • crossing_penalty (float) – Weight λ for the non-crossing hinge penalty.

  • sort_output (bool) – If True, sort quantile predictions along the quantile axis (dim=1) at inference and training time. Guarantees non-crossing at inference without changing the parameterisation. Has no effect when monotone_output=True.

  • monotone_output (bool) – If True, use a cumulative-softplus construction in forward() to guarantee strict monotonicity by parameterisation (anchor + positive increments). Supersedes sort_output and makes crossing_penalty redundant.

forecast(input_tensor)#

Forecast quantiles for the given input tensor.

Parameters:

input_tensor (Tensor) – Input tensor of shape (batch_size, hidden_size).

Return type:

dict

Returns:

dict

Keys loc (B, H, D), quantiles (B, K, H, D),

conf_level (float), quantile_levels (K,).

forward(x)#

Forward pass producing raw (unconstrained) quantile values.

Outputs K quantile predictions directly from the linear head without structural monotonicity enforcement. Non-crossing is encouraged during training via the crossing_penalty term in step():

loss = pinball_loss + λ · non_crossing_loss(Q̂)

where non_crossing_loss penalises any pair (k, k+1) where Q̂(τₖ) > Q̂(τₖ₊₁). The quantile_value_layer is initialised with small uniform weights to reduce crossing violations at the start of training.

Parameters:

x (Tensor) – Latent features of shape (B, hidden_size).

Return type:

Tensor

Returns:

Estimated quantile values Q(τⱼ | x), shape (B, K, H, D) where K = len(self.taus).

step(z, y, metric_fn, _epoch=None)#

Perform a single training/validation step.

Parameters:
  • z (Tensor) – Latent tensor of shape (B, hidden_size) from the backbone.

  • y (Tensor) – Target tensor of shape (B, forecast_horizon, num_outputs).

  • metric_fn (Callable[..., Any]) – Callable returning a scalar metric given (pred, target).

  • _epoch (int | None) – Unused; accepted for interface consistency with other heads.

Return type:

tuple[Tensor, Tensor]

Returns:

Tuple of (loss, metric).

twiga.distributions.nn.quantile.get_median_quantile(quantile_hats, probs)#

Compute the median (0.5 quantile) from a quantile tensor of shape (B, N, T, C).

Parameters:
  • quantile_hats (Tensor) – Tensor of shape (B, N, T, C) containing quantile values.

  • probs (Tensor | list) – Quantile probabilities (e.g., [0.1, 0.3, 0.5, 0.7, 0.9]).

Return type:

Tensor

Returns:

torch.Tensor – Median values of shape (B, T, C).

Raises:

ValueError – If input shapes are invalid or interpolation is not possible.

FPQR components#

class twiga.distributions.nn.fpquantile.QuantileProposal(n_quantiles=10, z_dim=64, dropout=0.1, conf_level=0.05, n_outputs=1, crossing_penalty=10.0, stop_gradient=True)#

Bases: Module

A neural network module for proposing quantile probability levels.

Projects the context vector z through a linear layer and softmax to obtain a valid probability distribution over n_quantiles slots, then computes the cumulative sum to produce strictly increasing probability levels taus [0, 1].

What this module outputs. tau_hats are probability-level midpoints (values in (0, 1)), not forecast values. The value network maps these levels to actual quantile predictions.

Gradient isolation. tau_hats are detached before being returned. The value-network loss must not back-propagate through the proposal parameters; those are trained separately via QuantileProposalLoss.

Entropy term. entropies (Shannon entropy of the proposal distribution) should be subtracted from the total loss (loss -= entropies.mean()), encouraging the proposal to spread its quantile grid rather than concentrate on a few levels.

crossing_penalty. Stored as self.crossing_penalty and read by step() to scale non_crossing_loss(). Not applied inside forward().

Variables:
  • n_quantiles – Number of quantiles to propose.

  • n_outputs – Number of forecast horizons.

  • z_dim – Dimensionality of the input features.

  • net – Linear layer mapping z to quantile logits.

  • dropout – Dropout layer for regularization.

  • conf_level – Stored as metadata for downstream consumers (conformal calibration). Does not affect the tau_hats clamp range.

  • tau_0 – Zero lower-boundary buffer prepended to taus.

Parameters:
  • n_quantiles (int) – Number of quantiles to propose (default: 10).

  • z_dim (int) – Dimensionality of the input features (default: 64).

  • dropout (float) – Dropout rate for regularization (default: 0.1).

  • conf_level (float) – Significance level stored as metadata (default: 0.05).

  • n_outputs (int) – Number of forecast horizons (default: 1).

  • crossing_penalty (float) – Penalty weight read externally by FPQRDistribution.step (default: 10.0).

forward(z)#

Compute probability-level proposals from the context vector.

Parameters:

z (Tensor) – Context vector, shape (batch_size, z_dim).

Return type:

tuple[Tensor, Tensor, Tensor]

Returns:

Tuple of three tensors

  • taus (B, N+1, H) — strictly increasing cumulative probability boundaries including the fixed zero lower bound.

  • tau_hats (B, N, H) — midpoint probability levels between consecutive boundaries, clamped to [0.01, 0.99]. Detached when stop_gradient=True (default), keeping the pinball gradient isolated from proposal parameters. When stop_gradient=False, gradients flow from the pinball loss through the τ-embedding back to the proposal parameters, supplementing the Wasserstein-1 training signal.

  • entropies (B, H) — Shannon entropy of the proposal distribution. Subtract from the total loss to encourage diversity.

Raises:

ValueError – If z does not have shape (batch_size, z_dim).

class twiga.distributions.nn.fpquantile.CosinetauEmbedding(num_cosines=32, z_dim=128, num_outputs=48, tau_embed_type='TimeEmb')#

Bases: Module

A PyTorch module for embedding time values using cosine transformations.

Transforms quantile levels $hat{tau}$ into cosine features and projects them to the backbone hidden dimension via a configurable embedding strategy.

Parameters:
  • num_cosines (int) – Number of cosine functions. Defaults to 32.

  • z_dim (int) – Output embedding dimension. Defaults to 128.

  • num_outputs (int) – Number of horizon steps (scaling factor for input size). Defaults to 48.

  • tau_embed_type (str) –

    Projection strategy applied after the cosine features are computed. One of:

    • "TimeEmb" (default)Linear(H·K d) + ReLU applied per quantile level independently. No inter-quantile mixing.

    • "LinearEmb" — pure linear projection without bias or activation. Slightly fewer parameters than TimeEmb; useful when a non-linear gate is provided elsewhere.

    • "ConvEmb" — circular Conv1d(kernel=3) over the quantile axis. Mixes cosine features from neighbouring quantile levels $hat{tau}_{n-1}, hat{tau}_n, hat{tau}_{n+1}$, allowing the embedding to encode relative spacing between adjacent levels.

forward(taus)#

Transforms time values into cosine-based embeddings.

Parameters:

taus (Tensor) – Input tensor of shape (batch_size, num_outputs, num_inputs).

Return type:

Tensor

Returns:

Tensor of shape (batch_size, num_outputs, z_dim) containing embedded time values.

class twiga.distributions.nn.fpquantile.FPQRDistribution(n_quantiles=9, num_outputs=1, hidden_size=256, horizon=48, proposal_type='softmax', tau_embed_type='TimeEmb', rope_type='linear', rope_base=None, dropout=0.1, conf_level=0.05, kappa=0.25, num_cosines=32, output_activation=None, activation_function='ReLU', loss_fn='pinball', crossing_penalty=10.0, sort_output=False, qvn_type='standard', crossing_mode='relu', softplus_beta=1.0, lambda_cal=0.0, cal_temperature=1.0, proposal_stop_gradient=True, proposal_loss_fn='wasserstein', sqda_weight=0.1)#

Bases: Module

A neural network for forecasting quantiles using a quantile value network.

This module predicts quantiles for a forecasting task, combining a quantile proposal layer with a linear output layer to produce quantile forecasts.

Parameters:
  • n_quantiles (int | None) – Number of quantiles to forecast. If None, defaults to 9.

  • num_outputs (int) – Number of output features. Defaults to 1.

  • hidden_dim (int) – Size of the hidden layers. Defaults to 256.

  • horizon (int) – Number of time steps to forecast. Defaults to 48.

  • dropout (float) – Dropout rate for the quantile proposal layer. Defaults to 0.1.

  • conf_level (float) – Confidence level for quantile proposal. Defaults to 0.05.

  • output_activation (Module | None) – Activation function for the output layer. Defaults to nn.Identity().

  • proposal_type (Literal['softmax', 'horizon', 'kumaraswamy']) – Quantile proposal strategy - 'softmax' (default, entropy-regularised uniform push), 'horizon' (per-step conditioning), or 'kumaraswamy' (tail-adaptive).

  • tau_embed_type (str) – Projection applied to the cosine features (only when num_cosines > 0). 'TimeEmb' (default, linear + ReLU), 'LinearEmb' (pure linear), or 'ConvEmb' (conv over quantile axis).

  • rope_type (str) – RoPE variant used when num_cosines=0. One of 'linear' (default, integer slot-index positions), 'tau_pos' (mean τ̄_k scaled to [0, N_q] as float positions), 'factored' (split z_dim: quantile-ordering RoPE ⊕ horizon RoPE), or 'direct' (no projection; backbone z rotated by τ position).

  • rope_base (float | None) – RoPE base frequency passed to TauRoPEEmbedding. None (default) auto-computes max(2, p_max × 4/π) per axis. Only used when num_cosines=0.

  • qvn_type (Literal['standard', 'monotone', 'monotone_cal']) –

    Quantile value network architecture.

    • 'standard' (default) — direct prediction; an optional crossing penalty enforces monotonicity during training.

    • 'monotone' — cumulative-softplus increments (MonotonicQVN); non-crossing is guaranteed by construction and no penalty is applied.

  • crossing_mode (Literal['relu', 'softplus']) –

    Non-crossing penalty shape used when qvn_type='standard'.

    • 'relu' (default) — hinge penalty; zero gradient when quantiles are already ordered.

    • 'softplus' — smooth hinge; gradients flow even for near-crossing pairs, removing the dead-gradient zone of ReLU.

  • softplus_beta (float) – Sharpness parameter shared by both modes. When crossing_mode='softplus' it controls the smoothness of the crossing penalty (larger → closer to ReLU). When qvn_type='monotone' it controls the sharpness of the increment gate in MonotonicQVN. Default 1.0.

forecast(input_tensor)#

Forecast quantiles for the given input tensor.

quantile_levels is the per-sample proposal grid averaged across the batch and horizon dimensions so that downstream consumers receive a fixed 1-D array of representative probability levels, matching the interface expected by ForecastResult.

Parameters:

input_tensor (Tensor) – Input tensor of shape (batch_size, z_dim).

Return type:

dict[str, Any]

Returns:

Dict with keys

  • "loc": median estimate — quantile slot whose tau_hat is nearest 0.5, shape (B, horizon, num_outputs).

  • "quantiles": quantile forecasts, shape (B, n_quantiles, horizon, num_outputs).

  • "quantile_levels": representative 1-D numpy array of length n_quantiles (mean of tau_hats over batch and horizon).

  • "conf_level": significance level scalar.

forward(input_tensor)#

Forward pass of the quantile forecasting network.

Parameters:

input_tensor (Tensor) – Input tensor of shape (batch_size, z_dim).

Return type:

tuple[Tensor, ...]

Returns:

torch.Tensor

Forecasted quantiles of shape

(batch_size, n_quantiles, horizon, num_outputs).

step(z, y, metric_fn, epoch=None)#

Perform a single training/validation step.

Parameters:
  • z (Tensor) – Latent tensor of shape (B, hidden_size) from the backbone.

  • y (Tensor) – Target tensor of shape (B, forecast_horizon, num_outputs).

  • metric_fn (Callable[..., Any]) – Callable returning a scalar metric given (pred, target).

  • epoch (int | None) – Current epoch (unused; accepted for interface consistency).

Return type:

tuple[Tensor, Tensor]

Returns:

Tuple of (loss, metric).

CRC distribution classes#

class twiga.distributions.nn.residual_conformal.CRCDistribution(num_target_output=1, hidden_size=256, forecast_horizon=48, out_activation_function=None, sigma_loss_fn='hybrid', alpha=0.1, activation='ReLU')#

Bases: BaseDistribution

CRC head for standard backbones (e.g. MLPFNetwork).

The mean is computed by projecting the latent vector z through a linear layer, then reshaping to (B, H, O). The scale is predicted by a sigma layer applied to the detached mean (flattened), ensuring that gradients do not flow back into the backbone during sigma-only training.

Training follows a two-stage protocol:

Stage 1 — step(): optimise μ (backbone + mu_layer), σ is not updated. Stage 2 — step_sigma(): freeze backbone, optimise σ-head only.

Parameters:
  • num_target_output (int) – Number of output features per time step.

  • hidden_dim – Dimensionality of the backbone latent vector.

  • forecast_horizon (int) – Number of forecast time steps.

  • out_activation_function (Module | None) – Optional activation applied to the mean output.

  • sigma_loss_fn (str) – Calibration objective for σ. One of SIGMA_LOSS_FNS. Default: "hybrid".

  • alpha (float) – Weight for MSE vs L1 in the mu and hybrid sigma losses.

  • activation (str) – Unused; kept for API compatibility.

Raises:

ValueError – If sigma_loss_fn is not one of the supported values.

forecast(z)#

Inference-only prediction (no gradients).

For "hybrid_sqrt" the scale is squared to convert from √|r| space back to residual magnitude space.

Return type:

dict[str, Tensor]

Returns:

Dict with "loc" and "scale" in prediction space.

forward(x)#

Predict mean and scale.

Parameters:

x (Tensor) – Latent tensor of shape (B, hidden_dim).

Return type:

tuple[Tensor, Tensor]

Returns:

Tuple (mu, sigma) each of shape (B, forecast_horizon, num_target_output).

get_distribution(mu, sigma)#

Construct a torch distribution for interval generation.

Returns Laplace(μ, σ) for "laplace" and Normal(μ, σ) for all other modes, treating σ as an empirical scale estimate. For "hybrid_sqrt" callers must pass the forecast-adjusted (squared) σ.

Parameters:
  • mu (Tensor) – Mean tensor of shape (B, H, O).

  • sigma (Tensor) – Scale tensor in prediction space, same shape.

Return type:

Normal | Laplace

Returns:

Normal or Laplace distribution.

get_log_likelihood(mu, sigma, targets)#

Compute the sigma calibration loss (NLL for probabilistic modes).

Parameters:
  • mu (Tensor) – Predicted mean.

  • sigma (Tensor) – Predicted scale in learned space (as returned by forward).

  • targets (Tensor) – Ground-truth targets.

Return type:

Tensor

Returns:

Scalar loss tensor.

step(z, y, metric_fn, epoch=None)#

Stage-1 training step: optimise mean parameters.

Parameters:
  • z (Tensor) – Latent tensor from backbone.

  • y (Tensor) – Target values.

  • metric_fn (Callable[..., Tensor]) – Callable (pred, target) scalar.

  • epoch (int | None) – Current epoch (unused; kept for API uniformity).

Return type:

tuple[Tensor, Tensor]

Returns:

Tuple (loss, metric).

step_sigma(z, y, metric_fn)#

Stage-2 training step: optimise sigma with frozen backbone.

The mean is recomputed under no_grad so sigma learns to predict residual magnitude without influencing point predictions.

Parameters:
  • z (Tensor) – Latent tensor from backbone.

  • y (Tensor) – Target values.

  • metric_fn (Callable[..., Tensor]) – Callable (pred, target) scalar.

Return type:

tuple[Tensor, Tensor]

Returns:

Tuple (sigma_loss, metric).

class twiga.distributions.nn.residual_conformal.AdditiveCRCDistribution(num_target_output=1, hidden_size=None, hidden_dim=256, forecast_horizon=48, out_activation_function=None, sigma_loss_fn='hybrid_sqrt', alpha=0.5, sigma_dropout=0.05, activation='SiLU')#

Bases: CRCDistribution

CRC head for additive backbones (MLPGAMNetwork, MLPGAFNetwork).

Inherits all training and inference logic from CRCDistribution. Two architectural differences from the parent:

  1. mu_layer — identity (or optional activation): the backbone’s encode() already returns the additive mean as (B, H*O), so no linear projection is needed.

  2. sigma_layerResidualSigmaHead (two-layer MLP + LayerNorm + Dropout): a deeper network better suited to modeling conditional heteroscedasticity from the additive mean signal.

All other methods — step, step_sigma, forecast, get_distribution, get_log_likelihood — are inherited unchanged.

Parameters:
  • num_target_output (int) – Number of output features per time step.

  • hidden_dim (int) – Hidden dimension of the sigma MLP. Independent of H*O.

  • forecast_horizon (int) – Number of forecast time steps.

  • out_activation_function (Module | None) – Optional activation applied to the mean.

  • sigma_loss_fn (str) – Calibration objective (see CRCDistribution). Default: "hybrid_sqrt".

  • alpha (float) – MSE/L1 weight. Default: 0.5.

  • sigma_dropout (float) – Dropout rate in the sigma MLP.

  • activation (str) – Activation name for the sigma MLP. Default: "SiLU".

Raises:

ValueError – If sigma_loss_fn is not one of the supported values.

forward(x)#

Predict mean and scale.

Parameters:

x (Tensor) – Flattened additive mean from backbone, shape (B, H*O).

Return type:

tuple[Tensor, Tensor]

Returns:

Tuple (mu, sigma) each of shape (B, forecast_horizon, num_target_output).

Custom loss functions#

class twiga.distributions.nn.custom_loss.QuantileLoss(kappa=0.0, reduction='mean')#

Bases: Module

Quantile loss module for quantile regression.

Computes the pinball loss between predicted quantiles and target values.

Variables:
  • kappa – Smoothing parameter for softplus approximation.

  • reduction – Reduction method: ‘none’, ‘mean’, or ‘sum’.

Parameters:
  • kappa (float) – Smoothing parameter for softplus approximation (default: 0.0).

  • reduction (str) – Reduction method: ‘none’, ‘mean’, or ‘sum’ (default: ‘mean’).

__init__(kappa=0.0, reduction='mean')#

Initialize the QuantileLoss module.

Parameters:
  • kappa (float) – Smoothing parameter for softplus approximation (default: 0.0).

  • reduction (str) – Reduction method: ‘none’, ‘mean’, or ‘sum’ (default: ‘mean’).

Raises:

ValueError – If kappa < 0 or reduction is invalid.

forward(inputs, quantiles, targets)#

Compute the quantile loss.

Parameters:
  • inputs (Tensor) – Predicted quantiles of shape (B, N) where B is batch size and N is number of quantiles.

  • quantiles (Tensor) – Quantile levels of shape (N,).

  • targets (Tensor) – Ground truth targets of shape (B, 1) or (B, N).

Return type:

Tensor

Returns:

The computed quantile loss tensor, reduced according to the specified method.

class twiga.distributions.nn.custom_loss.QuantileHuberLoss(kappa=1.0, eps=1e-08, reduction='mean')#

Bases: Module

Quantile Huber loss module for robust quantile regression.

Computes the Huber loss adjusted for quantiles, less sensitive to outliers than L2 loss.

Variables:
  • kappa – Threshold for switching between L2 and L1 loss.

  • eps – Small value to prevent division by zero.

  • reduction – Reduction method: ‘none’, ‘mean’, or ‘sum’.

Parameters:
  • kappa (float) – Threshold for switching between L2 and L1 loss (default: 1.0).

  • eps (float) – Small value to prevent division by zero (default: 1e-8).

  • reduction (str) – Reduction method: ‘none’, ‘mean’, or ‘sum’ (default: ‘mean’).

__init__(kappa=1.0, eps=1e-08, reduction='mean')#

Initialize the QuantileHuberLoss module.

Parameters:
  • kappa (float) – Threshold for switching between L2 and L1 loss (default: 1.0).

  • eps (float) – Small value to prevent division by zero (default: 1e-8).

  • reduction (str) – Reduction method: ‘none’, ‘mean’, or ‘sum’ (default: ‘mean’).

Raises:

ValueError – If kappa <= 0 or eps <= 0.

forward(inputs, quantiles, targets)#

Compute the quantile Huber loss.

Parameters:
  • inputs (Tensor) – Predicted values of shape (N, Q, T, C).

  • quantiles (Tensor) – Quantile levels of shape (Q,).

  • targets (Tensor) – True target values of shape (N, Q, T, C).

Return type:

Tensor

Returns:

The computed quantile Huber loss tensor, reduced according to the specified method.

class twiga.distributions.nn.custom_loss.QuantileProposalLoss(reduction='none')#

Bases: Module

Quantile proposal loss module to ensure non-crossing quantiles.

Computes a loss to enforce that predicted quantiles adhere to specified levels and do not cross.

Variables:

reduction – Reduction method: ‘none’, ‘mean’, or ‘sum’.

Parameters:

reduction (str) – Reduction method: ‘none’, ‘mean’, or ‘sum’ (default: ‘none’).

__init__(reduction='none')#

Initialize the QuantileProposalLoss module.

Parameters:

reduction (str) – Reduction method: ‘none’, ‘mean’, or ‘sum’ (default: ‘none’).

forward(quantile, quantile_hats, taus)#

Compute the quantile proposal loss.

Parameters:
  • quantile (Tensor) – True quantile values of shape (N, Q, T, C).

  • quantile_hats (Tensor) – Predicted quantile values of shape (N, Q, T, C).

  • taus (Tensor) – Quantile levels of shape (N, Q, T).

Return type:

Tensor

Returns:

The computed quantile proposal loss tensor, reduced according to the specified method.

class twiga.distributions.nn.custom_loss.SQDAProposalLoss(qdf_exponent=-1.0, divergence='squared', tail_weight=2.0, tail_alpha=0.2, eps=1e-06, reduction='mean')#

Bases: Module

Spectral Quantile Density-Adaptive (SQDA) proposal loss.

Trains the quantile proposal network by minimising the distance between the proposal’s interval widths \(\Delta\tau_n\) and the theoretically optimal density-adaptive target widths \(\Delta\tau^*_n \propto q(\hat\tau_n)^{-1}\), where \(q(\tau) = Q'(\tau)\) is the quantile density function (QDF).

Theoretical motivation

For a grid of \(N\) evaluation points, the \(L^p\) approximation error of the step-function quantile estimator satisfies:

\[E_n^{L^p} = q(\hat\tau_n)^p \cdot \frac{(\Delta\tau_n)^{p+1}}{p+1}\]

Minimising \(\sum_n E_n^{L^p}\) subject to \(\sum_n \Delta\tau_n = 1\) yields (via Lagrange multipliers) the same criterion for all \(p\):

\[\Delta\tau^*_n \propto q(\hat\tau_n)^{-1}\]

For heavy-tailed distributions \(q(\tau)\) is large at the tails (\(\tau \to 0, 1\)) and small near the median. The optimal widths are therefore smaller at the tails (denser grid) and wider in the interior — exactly the tail-adaptive property that the existing Wasserstein-1 signal cannot enforce directly.

QDF estimation

The QDF is approximated from the current (detached) quantile predictions \(\hat{Q}_n\) via central finite differences:

\[\hat{q}_n \approx \frac{\hat{Q}_{n+1} - \hat{Q}_{n-1}}{\hat\tau_{n+1} - \hat\tau_{n-1}}\]

with one-sided differences at the boundaries.

Gradient flow

Target widths \(\Delta\tau^*\) are computed from detached quantities only — no gradient flows to the value network. The loss gradient flows exclusively through \(\Delta\tau_{\text{prop}} = \tau_{n+1} - \tau_n\) (boundaries with gradient) to the proposal parameters, preserving the same gradient isolation as the Wasserstein-1 loss.

Parameters:
  • qdf_exponent (float) – Exponent applied to the QDF when computing target widths. Default -1.0 gives the theoretically optimal \(q^{-1}\) target.

  • divergence (Literal['squared', 'kl', 'symkl']) – Divergence used to measure the gap between proposal and target widths. One of 'squared' (MSE, default), 'kl' (\(\text{KL}(\text{target}\,\|\,\text{proposal})\)), or 'symkl' (symmetrised KL).

  • tail_weight (float) – Up-weight applied to tail quantile slots for the 'squared' divergence only (default 2.0). Slots with \(\hat\tau < \text{tail\_alpha}\) or \(\hat\tau > 1 - \text{tail\_alpha}\) receive this weight; all others receive weight 1. Not applied for 'kl' or 'symkl' because a non-uniform slot weight breaks KL non-negativity.

  • tail_alpha (float) – Boundary between tail and interior regions (default 0.2).

  • eps (float) – Small constant for numerical stability (default 1e-6).

  • reduction (str) – 'none', 'mean', or 'sum' (default 'mean').

Shape conventions (same as QuantileProposalLoss):
  • quantile: (B, N-1, H, D)not used (kept for interface compat).

  • quantile_hats: (B, N, H, D) — detached \(\hat{Q}\) at midpoints.

  • taus: (B, N+1, H) — tau boundaries with gradient from proposal.

forward(quantile, quantile_hats, taus)#

Compute the SQDA proposal loss.

Parameters:
  • quantile (Tensor) – Unused; shape (B, N-1, H, D). Kept for interface compatibility with QuantileProposalLoss.

  • quantile_hats (Tensor) – Detached quantile predictions at midpoints, shape (B, N, H, D).

  • taus (Tensor) – Tau boundaries with gradient from the proposal network, shape (B, N+1, H).

Return type:

Tensor

Returns:

Scalar loss (or per-element tensor when reduction='none').

ML parametric models#

class twiga.models.ml.gausscatboost_model.GAUSSCATBOOSTConfig(**data)#

Bases: CATBOOSTConfig

Configuration for the Gaussian CatBoost probabilistic model.

Extends CATBOOSTConfig with:

  • Tighter hyperparameter search bounds for faster convergence.

  • od_type / od_wait for built-in overfitting detection.

  • virtual_ensembles_count to control the number of virtual ensemble snapshots used to estimate epistemic uncertainty.

Variables:
  • name – Fixed to "gausscatboost".

  • od_type – Overfitting detection strategy ("Iter" or "IncToDec"). Active only when an eval set is provided to GAUSSCATBOOSTModel.fit().

  • od_wait – Number of iterations without improvement before early stopping triggers.

  • virtual_ensembles_count – Number of virtual ensemble snapshots used by predict_with_uncertainty(). Not passed to CatBoostRegressor (excluded from model_dump).

domain: Literal['ml']#
model_config: ClassVar[ConfigDict] = {'extra': 'allow'}#

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

name: Literal['gausscatboost']#
od_type: Literal['Iter', 'IncToDec']#
od_wait: int#
search_space: BaseSearchSpace#
virtual_ensembles_count: int#
class twiga.models.ml.gausscatboost_model.GAUSSCATBOOSTModel(model_config=None)#

Bases: BaseRegressor

Probabilistic CatBoost model predicting mean and uncertainty.

Uses loss_function="RMSEWithUncertainty" which maximises the log-likelihood of a Normal distribution, jointly learning the mean μ and aleatoric scale σ in a single model.

For multi-horizon forecasting (H output steps), one CatBoostRegressor is trained per output, yielding H models. This avoids MultiOutputRegressor while preserving support for virtual_ensembles_predict.

Uncertainty decomposition via predict_with_uncertainty() returns three components per output step:

  • mean - predicted mean (same as predict() μ output).

  • knowledge_uncertainty - epistemic (model) uncertainty estimated from variance across virtual ensemble snapshots.

  • data_uncertainty - aleatoric uncertainty encoded in the model’s second output (exp of predicted log-σ).

Parameters:

model_config (GAUSSCATBOOSTConfig | None) – Model configuration. Defaults to GAUSSCATBOOSTConfig.

Example:

model = GAUSSCATBOOSTModel()
model.fit(X_train, y_train)
mu, sigma = model.predict(X_test)
unc = model.predict_with_uncertainty(X_test)  # (B, L, H, 3)
fit(X, y, eval_set=None, verbose=False, trial=None)#

Fit one RMSEWithUncertainty model per output step.

Parameters:
  • X (ndarray) – Shape (B, L, F) - batch × sequence × features.

  • y (ndarray) – Shape (B, L, H) - batch × sequence × horizons.

  • eval_set (tuple[ndarray, ndarray] | None) – Optional (X_val, y_val) for early stopping. When provided, od_type / od_wait from the config activate CatBoost’s overfitting detector.

  • verbose (bool) – Whether to print CatBoost training logs.

  • trial (Any | None) – Optuna trial for mid-training pruning. When provided alongside eval_set, a CatBoostPruningCallback is injected into the first per-output model so Hyperband can prune unpromising trials early.

Return type:

GAUSSCATBOOSTModel

Returns:

Self for method chaining.

Raises:

ValueError – If X or y are not 3-dimensional.

models: list[catboost.CatBoostRegressor]#
num_targets: int | None#
predict(X)#

Predict mean (μ) and scale (σ) for all output steps.

Parameters:

X (ndarray) – Shape (B, L, F).

Return type:

tuple[ndarray, ndarray]

Returns:

Tuple (mu, sigma) each of shape (B, L, H). sigma = exp(log_sigma) is guaranteed positive.

Raises:

ValueError – If the model has not been fitted or X is not 3-D.

predict_with_uncertainty(X)#

Return mean, epistemic, and aleatoric uncertainty per output.

Uses virtual_ensembles_predict(prediction_type="TotalUncertainty") which - for models trained with RMSEWithUncertainty - returns three values per sample:

  • Column 0: mean prediction (μ).

  • Column 1: knowledge (epistemic) uncertainty - variance across virtual ensemble snapshots.

  • Column 2: data (aleatoric) uncertainty - derived from the predicted log-σ output.

Parameters:

X (ndarray) – Shape (B, L, F).

Return type:

ndarray

Returns:

Array of shape (B, L, H, 3) - last axis is [mean, knowledge_uncertainty, data_uncertainty].

Raises:

ValueError – If the model has not been fitted.

set_fit_request(*, eval_set='$UNCHANGED$', trial='$UNCHANGED$', verbose='$UNCHANGED$')#

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters#

eval_setstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for eval_set parameter in fit.

trialstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for trial parameter in fit.

verbosestr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for verbose parameter in fit.

Returns#

selfobject

The updated object.

set_predict_request()#

No-op.

Calling this method has no effect.

Returns#

selfobject

The updated object.

set_score_request(*, sample_weight='$UNCHANGED$')#

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters#

sample_weightstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in score.

Returns#

selfobject

The updated object.

update(trial)#

Rebuild model from an Optuna trial’s suggested hyperparameters.

Parameters:

trial – Optuna trial object.

Return type:

None

virtual_ensembles_count: int#
class twiga.models.ml.ngboostnormal_model.NGBOOSTNORMALConfig(**data)#

Bases: BaseModelConfig

Configuration for the NGBoost Normal probabilistic forecasting model.

Uses natural gradient boosting with a Gaussian predictive distribution N(μ, σ²). Unlike the two-stage GAUSS* models, NGBoost jointly optimises μ and σ via the natural gradient of the chosen scoring rule, which tends to produce better-calibrated uncertainty estimates.

Variables:
  • name – Model identifier fixed to "ngboostnormal".

  • domain – Domain fixed to "ml". Excluded from tuning.

  • n_estimators – Number of boosting iterations.

  • learning_rate – Shrinkage applied to each tree.

  • minibatch_frac – Row-subsample fraction per iteration.

  • col_sample – Column-subsample fraction per iteration.

  • seed – Seed for reproducibility.

  • score – Scoring rule - "LogScore" (MLE) or "CRPScore".

  • search_space – Hyperparameter search space for Optuna tuning.

Example

>>> from twiga.models.ml import NGBOOSTNORMALConfig
>>> cfg = NGBOOSTNORMALConfig(n_estimators=200, learning_rate=0.05)
col_sample: float#
domain: Literal['ml']#
learning_rate: float#
minibatch_frac: float#
model_config: ClassVar[ConfigDict] = {'extra': 'allow'}#

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

n_estimators: int#
name: Literal['ngboostnormal']#
score: Literal['LogScore', 'CRPScore']#
search_space: BaseSearchSpace#
seed: int#
class twiga.models.ml.ngboostnormal_model.NGBOOSTNORMALModel(model_config=None)#

Bases: BaseNGBoostRegressor

NGBoost probabilistic model with a Normal (Gaussian) predictive distribution.

Wraps ngboost.NGBRegressor with Dist=Normal, training one regressor per flattened output column to support multi-horizon and multi-target forecasting.

The model jointly learns the conditional mean μ and standard deviation σ via natural gradient boosting on the selected scoring rule (log-likelihood or CRPS). Unlike GAUSSCATBOOSTModel, which uses CatBoost’s native RMSEWithUncertainty loss, NGBoost directly maximises the scoring rule via natural gradients.

Variables:

model_config – Instance of NGBOOSTNORMALConfig.

Parameters:

model_config (NGBOOSTNORMALConfig | None) – Model configuration. Defaults to NGBOOSTNORMALConfig.

Example

>>> import numpy as np
>>> from twiga.models.ml import NGBOOSTNORMALModel
>>> model = NGBOOSTNORMALModel()
>>> X = np.random.randn(100, 10, 5)
>>> y = np.random.randn(100, 48, 1)
>>> model.fit(X, y)
>>> loc, scale = model.predict(X)
>>> loc.shape, scale.shape
((100, 48, 1), (100, 48, 1))
forecast(x)#

Return Normal distribution parameters.

Parameters:

x (ndarray) – Input features of shape (n_samples, seq_len, n_features).

Return type:

dict

Returns:

Dictionary with

  • "loc": conditional mean μ, shape (n_samples, horizon, n_targets).

  • "scale": conditional std-dev σ, shape (n_samples, horizon, n_targets).

predict(X)#

Predict μ and σ for each forecast horizon.

Parameters:

X (ndarray) – Input features of shape (n_samples, seq_len, n_features).

Return type:

tuple[ndarray, ndarray]

Returns:

Tuple (loc, scale) - conditional mean μ and standard deviation σ, each of shape (n_samples, horizon, n_targets).

set_fit_request(*, eval_set='$UNCHANGED$', trial='$UNCHANGED$', verbose='$UNCHANGED$')#

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters#

eval_setstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for eval_set parameter in fit.

trialstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for trial parameter in fit.

verbosestr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for verbose parameter in fit.

Returns#

selfobject

The updated object.

set_predict_request()#

No-op.

Calling this method has no effect.

Returns#

selfobject

The updated object.

set_score_request(*, sample_weight='$UNCHANGED$')#

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters#

sample_weightstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in score.

Returns#

selfobject

The updated object.

class twiga.models.ml.ngboostlognormal_model.NGBOOSTLOGNORMALConfig(**data)#

Bases: BaseModelConfig

Configuration for the NGBoost LogNormal probabilistic forecasting model.

Uses natural gradient boosting with a log-normal predictive distribution. Suitable for strictly positive targets such as solar irradiance, wind speed, electricity price, or non-negative load.

Follows the scipy.stats.lognorm parameter convention:

  • scale = exp(μ_log) - geometric mean (location-like quantity).

  • s = σ_log - standard deviation in log-space (shape parameter).

Variables:
  • name – Model identifier fixed to "ngboostlognormal".

  • domain – Domain fixed to "ml". Excluded from tuning.

  • n_estimators – Number of boosting iterations.

  • learning_rate – Shrinkage applied to each tree.

  • minibatch_frac – Row-subsample fraction per iteration.

  • col_sample – Column-subsample fraction per iteration.

  • seed – Seed for reproducibility.

  • score – Scoring rule - "LogScore" (MLE) or "CRPScore".

  • search_space – Hyperparameter search space for Optuna tuning.

Example

>>> from twiga.models.ml import NGBOOSTLOGNORMALConfig
>>> cfg = NGBOOSTLOGNORMALConfig(n_estimators=300, score="CRPScore")
col_sample: float#
domain: Literal['ml']#
learning_rate: float#
minibatch_frac: float#
model_config: ClassVar[ConfigDict] = {'extra': 'allow'}#

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

n_estimators: int#
name: Literal['ngboostlognormal']#
score: Literal['LogScore', 'CRPScore']#
search_space: BaseSearchSpace#
seed: int#
class twiga.models.ml.ngboostlognormal_model.NGBOOSTLOGNORMALModel(model_config=None)#

Bases: BaseNGBoostRegressor

NGBoost probabilistic model with a LogNormal predictive distribution.

Wraps ngboost.NGBRegressor with Dist=LogNormal, training one regressor per flattened output column for multi-horizon / multi-target support.

Parameter convention (follows scipy.stats.lognorm):

  • loc (exposed as "loc") - scale = exp(μ_log), the geometric mean.

  • scale (exposed as "scale") - s = σ_log, the log-space std-dev.

Use this model when the target variable is strictly positive and its logarithm is approximately Gaussian (e.g. solar irradiance, wind power, non-negative energy prices).

Variables:

model_config – Instance of NGBOOSTLOGNORMALConfig.

Parameters:

model_config (NGBOOSTLOGNORMALConfig | None) – Model configuration. Defaults to NGBOOSTLOGNORMALConfig.

Example

>>> import numpy as np
>>> from twiga.models.ml import NGBOOSTLOGNORMALModel
>>> model = NGBOOSTLOGNORMALModel()
>>> X = np.random.randn(100, 10, 5)
>>> y = np.abs(np.random.randn(100, 48, 1)) + 0.1  # strictly positive
>>> model.fit(X, y)
>>> loc, scale = model.predict(X)
>>> loc.shape, scale.shape
((100, 48, 1), (100, 48, 1))
forecast(x)#

Return LogNormal distribution parameters.

Parameters:

x (ndarray) – Input features of shape (n_samples, seq_len, n_features).

Return type:

dict

Returns:

Dictionary with

  • "loc": geometric mean exp(μ_log), always > 0, shape (n_samples, horizon, n_targets).

  • "scale": log-space std-dev σ_log, shape (n_samples, horizon, n_targets).

predict(X)#

Predict geometric mean and log-space std-dev for each forecast horizon.

Parameters:

X (ndarray) – Input features of shape (n_samples, seq_len, n_features).

Return type:

tuple[ndarray, ndarray]

Returns:

Tuple (loc, scale) where –

  • loc = exp(μ_log) - geometric mean (strictly positive), shape (n_samples, horizon, n_targets).

  • scale = σ_log - standard deviation in log-space, shape (n_samples, horizon, n_targets).

set_fit_request(*, eval_set='$UNCHANGED$', trial='$UNCHANGED$', verbose='$UNCHANGED$')#

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters#

eval_setstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for eval_set parameter in fit.

trialstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for trial parameter in fit.

verbosestr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for verbose parameter in fit.

Returns#

selfobject

The updated object.

set_predict_request()#

No-op.

Calling this method has no effect.

Returns#

selfobject

The updated object.

set_score_request(*, sample_weight='$UNCHANGED$')#

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters#

sample_weightstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in score.

Returns#

selfobject

The updated object.

class twiga.models.ml.ngboostexponential_model.NGBOOSTEXPONENTIALConfig(**data)#

Bases: BaseModelConfig

Configuration for the NGBoost Exponential probabilistic forecasting model.

Uses natural gradient boosting with an exponential predictive distribution. The exponential distribution has a single parameter scale = 1 / λ, where λ is the rate. Both the mean and the standard deviation equal scale.

Suited for non-negative targets exhibiting exponential decay or inter-arrival times - e.g. rare demand events, sparse consumption spikes, or inter-event durations in energy systems.

Variables:
  • name – Model identifier fixed to "ngboostexponential".

  • domain – Domain fixed to "ml". Excluded from tuning.

  • n_estimators – Number of boosting iterations.

  • learning_rate – Shrinkage applied to each tree.

  • minibatch_frac – Row-subsample fraction per iteration.

  • col_sample – Column-subsample fraction per iteration.

  • seed – Seed for reproducibility.

  • score – Scoring rule - "LogScore" (MLE) or "CRPScore".

  • search_space – Hyperparameter search space for Optuna tuning.

Example

>>> from twiga.models.ml import NGBOOSTEXPONENTIALConfig
>>> cfg = NGBOOSTEXPONENTIALConfig(n_estimators=400, score="CRPScore")
col_sample: float#
domain: Literal['ml']#
learning_rate: float#
minibatch_frac: float#
model_config: ClassVar[ConfigDict] = {'extra': 'allow'}#

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

n_estimators: int#
name: Literal['ngboostexponential']#
score: Literal['LogScore', 'CRPScore']#
search_space: BaseSearchSpace#
seed: int#
class twiga.models.ml.ngboostexponential_model.NGBOOSTEXPONENTIALModel(model_config=None)#

Bases: BaseNGBoostRegressor

NGBoost probabilistic model with an Exponential predictive distribution.

Wraps ngboost.NGBRegressor with Dist=Exponential, training one regressor per flattened output column for multi-horizon / multi-target support.

The exponential distribution has one parameter, scale = 1 / λ, which equals both the mean and the standard deviation. Both "loc" and "scale" in the returned forecast dict are set to this parameter.

Use this model for non-negative targets with memoryless, decay-like behaviour - inter-arrival durations, sparse demand spikes, or short-term outage durations.

Variables:

model_config – Instance of NGBOOSTEXPONENTIALConfig.

Parameters:

model_config (NGBOOSTEXPONENTIALConfig | None) – Model configuration. Defaults to NGBOOSTEXPONENTIALConfig.

Example

>>> import numpy as np
>>> from twiga.models.ml import NGBOOSTEXPONENTIALModel
>>> model = NGBOOSTEXPONENTIALModel()
>>> X = np.random.randn(100, 10, 5)
>>> y = np.random.exponential(scale=2.0, size=(100, 48, 1))
>>> model.fit(X, y)
>>> loc, scale = model.predict(X)
>>> loc.shape, scale.shape
((100, 48, 1), (100, 48, 1))
forecast(x)#

Return Exponential distribution parameters.

Parameters:

x (ndarray) – Input features of shape (n_samples, seq_len, n_features).

Return type:

dict

Returns:

Dictionary with

  • "loc": predicted scale = 1/λ (the mean), always > 0, shape (n_samples, horizon, n_targets).

  • "scale": same as "loc" - for the exponential distribution, mean and std-dev are identical.

predict(X)#

Predict the exponential rate parameter for each forecast horizon.

Parameters:

X (ndarray) – Input features of shape (n_samples, seq_len, n_features).

Return type:

tuple[ndarray, ndarray]

Returns:

Tuple (loc, scale) where both arrays equal the predicted scale = 1 / λ (the mean and std-dev of the exponential distribution), each of shape (n_samples, horizon, n_targets).

set_fit_request(*, eval_set='$UNCHANGED$', trial='$UNCHANGED$', verbose='$UNCHANGED$')#

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters#

eval_setstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for eval_set parameter in fit.

trialstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for trial parameter in fit.

verbosestr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for verbose parameter in fit.

Returns#

selfobject

The updated object.

set_predict_request()#

No-op.

Calling this method has no effect.

Returns#

selfobject

The updated object.

set_score_request(*, sample_weight='$UNCHANGED$')#

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters#

sample_weightstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in score.

Returns#

selfobject

The updated object.

ML distribution utilities#

twiga.distributions.ml.utils.interpolate_quantile(predictions, sorted_quantiles, target_quantile=0.5)#

Interpolate predictions at a target quantile using linear interpolation.

Parameters:
  • predictions (ndarray[tuple[Any, ...], dtype[double]]) – Array of predictions with shape (B, Q, C), where B is batch size, Q is number of quantiles, and C is number of channels.

  • sorted_quantiles (list[float] | ndarray[tuple[Any, ...], dtype[double]]) – Sorted list or array of quantile levels (e.g., [0.1, 0.3, 0.7]).

  • target_quantile (float) – The target quantile to interpolate (default: 0.5 for median).

Return type:

ndarray[tuple[Any, ...], dtype[double]]

Returns:

Interpolated predictions at target_quantile with shape (B, C).

Raises:

ValueError – If sorted_quantiles is empty, predictions has incorrect shape, number of quantiles does not match predictions, quantiles are not sorted, or target_quantile is outside the range of sorted_quantiles.

twiga.distributions.ml.utils.get_median_prediction(predictions, quantiles)#

Compute the median prediction (quantile = 0.5) for given predictions.

Parameters:
Return type:

ndarray[tuple[Any, ...], dtype[double]]

Returns:

Median predictions with shape (B, C).

Raises:

ValueError – If quantiles is empty, predictions has incorrect shape, or number of quantiles does not match predictions.

twiga.distributions.ml.utils.get_sigma_prediction(predictions, quantiles)#

Compute sigma using predictions at quantiles 0.25 and 0.75.

Sigma is calculated as (upper_prediction - lower_prediction) / (2 * 0.6745), where upper_prediction and lower_prediction are at quantiles 0.75 and 0.25, respectively, and 0.6745 is the z-score for the 25th/75th percentiles.

Parameters:
Return type:

ndarray[tuple[Any, ...], dtype[double]]

Returns:

Sigma predictions with shape (B, C).

Raises:

ValueError – If quantiles is empty, predictions has incorrect shape, number of quantiles does not match predictions, or interpolation is not possible for quantiles 0.25 or 0.75.