Distribution Families#

Source Files
  • twiga/distributions/nn/parametric.py

  • twiga/distributions/nn/quantile.py

  • twiga/distributions/nn/fpquantile.py

  • twiga/distributions/nn/residual_conformal.py

  • twiga/models/nn/prob/core.py

  • twiga/models/ml/gausscatboost_model.py - GAUSSCATBOOSTConfig / GAUSSCATBOOSTModel

  • twiga/models/ml/ngboostnormal_model.py - NGBOOSTNORMALConfig / NGBOOSTNORMALModel

  • twiga/models/ml/ngboostlognormal_model.py - NGBOOSTLOGNORMALConfig / NGBOOSTLOGNORMALModel

  • twiga/models/ml/ngboostexponential_model.py - NGBOOSTEXPONENTIALConfig / NGBOOSTEXPONENTIALModel

  • twiga/models/ml/prob/base_ngboost.py - BaseNGBoostRegressor

  • twiga/distributions/ml/utils.py

Twiga implements several distribution families for probabilistic neural forecasting. All NN distributions slot into the composable backbone/head architecture described below.

Backbone/Head Architecture#

Probabilistic neural models in Twiga decouple the backbone (which learns to encode the time series) from the distribution head (which maps the latent vector to a forecast). This separation lives in twiga/models/nn/prob/core.py.

        graph LR
    X["Input (B, T, F)"] --> BB["Backbone\n(NHITSNetwork / MLPFNetwork / …)"]
    BB -->|"encode(x) → z (B, encode_size)"| HEAD["Distribution Head\n(Normal / QR / …)"]
    HEAD --> OUT["Distribution Parameters\n(loc, scale) / quantiles / …"]
    

Backbone contract - any backbone must implement:

  • encode(x) → Tensor - maps raw input (B, T, F) to latent (B, encode_size)

  • encode_size: int property - width of the latent vector

  • penalty_dict(epoch) → dict - returns regularisation terms (Lasso, gate sparsity …)

Head contract - any distribution head must implement:

  • forward(z) → tuple - maps latent z to distribution parameters

  • step(z, y, metric_fn, epoch) → (loss, metric) - computes training loss in one call

ProbabilisticModel (a LightningModule) wires them together and automatically injects hidden_size=backbone.encode_size into the head constructor so callers never specify it manually.

# Under the hood  -  TwigaForecaster does this for you when you pass a config like MLPFNormalConfig
from twiga.models.nn.prob.core import ProbabilisticModel
from twiga.models.nn.net.mlpf import MLPFNetwork
from twiga.distributions.nn.parametric import NormalDistribution

model = ProbabilisticModel(
    backbone_cls=MLPFNetwork,
    backbone_kwargs={...},          # num_target_feature, forecast_horizon, etc.
    head_cls=NormalDistribution,
    head_kwargs={"num_target_output": 1, "forecast_horizon": 48},
    # hidden_size injected automatically from backbone.encode_size
)

Adding a new distribution requires implementing only the head interface - the backbone and training loop are unchanged.
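For illustration, a minimal custom head satisfying this contract might look like the sketch below. The class name PointHead and its internals are hypothetical, not part of Twiga; a real head would return distribution parameters and an NLL-style loss.

import torch
from torch import nn

class PointHead(nn.Module):
    """Hypothetical head satisfying Twiga's head contract (illustration only)."""

    def __init__(self, num_target_output: int, hidden_size: int, forecast_horizon: int):
        super().__init__()
        self.horizon = forecast_horizon
        self.outputs = num_target_output
        self.proj = nn.Linear(hidden_size, forecast_horizon * num_target_output)

    def forward(self, z: torch.Tensor) -> tuple[torch.Tensor, ...]:
        # (B, hidden_size) -> (B, forecast_horizon, num_target_output)
        mu = self.proj(z).view(-1, self.horizon, self.outputs)
        return (mu,)

    def step(self, z, y, metric_fn, epoch=None):
        # One call computes both the training loss and the logging metric
        (mu,) = self.forward(z)
        loss = nn.functional.mse_loss(mu, y)  # stand-in for an NLL / pinball loss
        return loss, metric_fn(mu, y)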

NN Distributions#

Parametric Distributions#

All parametric heads live in twiga/distributions/nn/parametric.py. Each maps the latent vector z (B, encode_size) to two (or three for Student-T) distribution parameters and trains via negative log-likelihood.

| Class | Distribution | Output params | Constraint | Best for |
|---|---|---|---|---|
| NormalDistribution | Normal | (μ, σ) | σ > 0 | Symmetric, unbounded targets |
| LaplaceDistribution | Laplace | (μ, b) | b > 0 | Heavy-tailed, outlier-robust |
| LogNormalDistribution | LogNormal | (μ, σ) | σ > 0 | Strictly positive, right-skewed |
| GammaDistribution | Gamma | (α, β) | α, β > 0 | Strictly positive, flexible skew |
| BetaDistribution | Beta | (α, β) | α, β > 0 | Bounded [0, 1] targets |
| StudentTDistribution | Student-T | (μ, σ, ν) | σ > 0, ν > 2 | Very heavy tails |

All share the same interface:

from twiga.distributions.nn.parametric import NormalDistribution

dist = NormalDistribution(
    num_target_output=1,
    hidden_size=256,      # injected automatically by ProbabilisticModel
    forecast_horizon=48,
    out_activation_function=None,
)

# Forward pass → distribution parameters
loc, scale = dist(z)  # z shape: (B, encode_size)

# Get PyTorch distribution object
d = dist.get_distribution(loc, scale)

# Negative log-likelihood for training
nll = dist.get_log_likelihood(loc, scale, targets)

# training_step equivalent
loss, metric = dist.step(z, targets, metric_fn, epoch)

Student-T requires min_df

StudentTDistribution accepts an additional min_df: float parameter (default 2.1) that clips the predicted degrees-of-freedom from below to ensure finite variance.
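Usage mirrors the other parametric heads, with a third returned parameter:

from twiga.distributions.nn.parametric import StudentTDistribution

dist = StudentTDistribution(
    num_target_output=1,
    hidden_size=256,
    forecast_horizon=48,
    min_df=3.0,               # raise the floor for heavier regularisation
)

mu, sigma, df = dist(z)       # three parameters instead of two
nll = dist.get_log_likelihood(mu, sigma, df, targets)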

Quantile Distribution#

The QRDistribution class (twiga/distributions/nn/quantile.py) outputs quantile predictions from a neural network.

from twiga.distributions.nn.quantile import QRDistribution

dist = QRDistribution(
    quantiles=[0.05, 0.25, 0.5, 0.75, 0.95],
    num_outputs=1,
    hidden_size=256,
    horizon=48,
    loss_fn="pinball",    # or "huber-pinball"
    kappa=0.5,
    conf_level=0.05,
)

# Forward pass: shape (batch, num_quantiles, horizon, num_outputs)
quantile_values = dist(hidden_repr)

# Forecast: returns (loc, quantile_values)
loc, quantiles = dist.forecast(hidden_repr)

| Parameter | Type | Default | Description |
|---|---|---|---|
| quantiles | list[float] \| None | None | Quantile levels to predict |
| num_outputs | int | 1 | Number of target variables |
| hidden_size | int | 256 | Input hidden dimension |
| horizon | int | 48 | Forecast horizon |
| loss_fn | str | "pinball" | Loss function: "pinball" or "huber-pinball" |
| kappa | float | 0.5 | Huber transition parameter |
| conf_level | float | 0.05 | Confidence level for bounds |
| eps | float | 1e-6 | Numerical stability |
| output_activation | nn.Module \| None | None | Optional output activation |

The step() method computes quantile loss and metrics during training:

loss, metric = dist.step(z, y, metric_fn, epoch)

Median Extraction#

from twiga.distributions.nn.quantile import get_median_quantile

# Extract median from quantile predictions
median = get_median_quantile(quantile_hats, probs=[0.05, 0.25, 0.5, 0.75, 0.95])
# Shape: (batch, horizon, num_targets)

If \(\tau = 0.5\) is in the quantile list, it is extracted directly. Otherwise, linear interpolation between the two closest quantiles is used.
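A standalone sketch of that rule (not Twiga's implementation; assumes the sorted levels bracket 0.5):

import torch

def median_by_interpolation(quantile_hats: torch.Tensor, probs: list[float]) -> torch.Tensor:
    """quantile_hats: (B, N, T, C); probs: sorted quantile levels of length N."""
    if 0.5 in probs:
        return quantile_hats[:, probs.index(0.5)]         # exact median available
    hi = next(i for i, p in enumerate(probs) if p > 0.5)  # first level above 0.5
    lo = hi - 1
    w = (0.5 - probs[lo]) / (probs[hi] - probs[lo])       # linear interpolation weight
    return (1 - w) * quantile_hats[:, lo] + w * quantile_hats[:, hi]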

Full Parameterised Quantile Regression (FPQR)#

FPQRDistribution (twiga/distributions/nn/fpquantile.py) extends quantile regression by treating the quantile levels themselves as learnable. Rather than predicting at a fixed grid of quantiles, a QuantileProposal network proposes the quantile levels adaptively from the latent representation, which are then encoded via CosinetauEmbedding before producing forecast values.

References

  • A. Faustine and A. Moshi, “Full Parameterized Quantile Function for Probabilistic Forecasting,” IEEE Access, 2024. DOI: 10.1109/ACCESS.2024.3402001

  • A. Faustine and L. Aversano, “Adaptive Quantile Learning for Long-Term Probabilistic Forecasting,” IEEE Access, 2022. DOI: 10.1109/ACCESS.2022.3143387

        graph LR
    Z["z (B, hidden_size)"] --> QP["QuantileProposal\n→ taus, tau_hats, entropies"]
    Z --> CE["CosinetauEmbedding\n(cosine τ embedding)"]
    QP --> CE
    CE --> QO["quantile_output\n→ (B, n_quantiles, horizon, outputs)"]
    QP --> LOSS["QuantileProposalLoss\n+ QuantileLoss − entropy"]
    

QuantileProposal generates the quantile grid from the backbone’s latent vector:

| Parameter | Type | Default | Description |
|---|---|---|---|
| n_quantiles | int | 10 | Number of quantile levels to propose |
| z_dim | int | 64 | Input latent dimension |
| dropout | float | 0.1 | Dropout for regularisation |
| conf_level | float | 0.05 | Clamps proposals to [α/2, 1 − α/2] |

CosinetauEmbedding maps proposed tau values to a fixed-size embedding using cosine basis functions:

| Parameter | Type | Default | Description |
|---|---|---|---|
| num_cosines | int | 32 | Number of cosine basis functions |
| z_dim | int | 128 | Output embedding dimension |
| num_outputs | int | 48 | Scales input size (should match horizon) |
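The construction follows the IQN-style cosine embedding: each proposed τ is expanded over cos(iπτ) basis terms and linearly projected. A simplified standalone sketch (Twiga's actual module additionally scales the input size by num_outputs):

import math

import torch
from torch import nn

class CosineEmbeddingSketch(nn.Module):
    """Simplified cosine tau embedding (illustration, not Twiga's exact module)."""

    def __init__(self, num_cosines: int = 32, z_dim: int = 128):
        super().__init__()
        self.register_buffer("i_pi", math.pi * torch.arange(1, num_cosines + 1).float())
        self.proj = nn.Sequential(nn.Linear(num_cosines, z_dim), nn.LeakyReLU())

    def forward(self, taus: torch.Tensor) -> torch.Tensor:
        # taus: (B, N) quantile levels -> (B, N, z_dim) embeddings
        cos_feats = torch.cos(taus.unsqueeze(-1) * self.i_pi)  # (B, N, num_cosines)
        return self.proj(cos_feats)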

FPQRDistribution combines proposal + cosine embedding + output projection:

| Parameter | Type | Default | Description |
|---|---|---|---|
| n_quantiles | int \| None | 9 | Quantile levels (None → 9) |
| num_outputs | int | 1 | Number of target variables |
| hidden_size | int | 256 | Latent dimension from backbone |
| horizon | int | 48 | Forecast horizon |
| dropout | float | 0.1 | Dropout in proposal network |
| conf_level | float | 0.05 | Significance level for interval bounds |
| kappa | float | 0.25 | Huber transition parameter |
| num_cosines | int | 32 | Cosine basis functions (0 to use DataEmbedding) |
| loss_fn | str | "pinball" | "pinball" or "huber-pinball" |

from twiga.distributions.nn.fpquantile import FPQRDistribution

dist = FPQRDistribution(
    n_quantiles=9,
    num_outputs=1,
    hidden_size=256,
    horizon=48,
    conf_level=0.05,
    loss_fn="pinball",
)

# Training step: loss = quantile_loss + proposal_loss − entropy
loss, metric = dist.step(z, y, metric_fn, epoch)

# Inference: returns dict with loc, quantiles, quantile_levels, conf_level
result = dist.forecast(z)
# result["loc"]             shape: (B, horizon, num_outputs)
# result["quantiles"]       shape: (B, n_quantiles, horizon, num_outputs)
# result["quantile_levels"] shape: (B, n_quantiles, horizon)  ← per-sample proposal

FPQR vs QR

QRDistribution predicts at a fixed, user-specified quantile grid. FPQRDistribution learns the quantile levels jointly with the forecast values, adapting the grid per sample. The proposal loss plus entropy regularisation encourages diverse, well-spread coverage. Use FPQR when you want the model to discover the most informative quantile levels automatically.

The training loss combines three terms:

\[\mathcal{L} = \mathcal{L}_{\text{quantile}} + \mathcal{L}_{\text{proposal}} - \mathbb{H}(\text{probs})\]

where \(\mathbb{H}\) is the entropy of the proposed quantile distribution, acting as a diversity regulariser.

Conditional Residual Calibration (CRC)#

CRC heads (twiga/distributions/nn/residual_conformal.py) learn a point forecast \(\mu\) and a per-step scale \(\sigma\) end-to-end — no post-hoc conformal calibration step required. The scale MLP is trained to approximate the absolute residual \(|y - \mu|\), giving prediction intervals of the form \([\mu \pm z_\alpha \sigma]\) at inference time.

Two variants exist, differing only in how \(\mu\) is computed:

| Class | Backbone family | Mu path |
|---|---|---|
| CRCDistribution | MLPF, NHiTS, RNN | Linear projection from latent z: (B, hidden_size) → (B, H×O) |
| AdditiveCRCDistribution | MLPGAM, MLPGAF | encode() already returns the additive mean directly (no projection), preserving the GAM decomposition |

Loss function:

The joint training objective is:

\[\mathcal{L} = \underbrace{\alpha \cdot \text{MSE}(\mu, y) + (1-\alpha) \cdot \text{MAE}(\mu, y)}_{\mathcal{L}_\mu} + \underbrace{\mathcal{L}_\sigma(\sigma,\, |y-\mu|)}_{\text{sigma calibration}}\]

The sigma calibration loss \(\mathcal{L}_\sigma\) is selected via sigma_loss_fn:

| sigma_loss_fn | Formula | Use when |
|---|---|---|
| "hybrid_sqrt" (default) | \(\alpha \cdot \text{MSE}(\sigma, \sqrt{r}) + (1-\alpha) \cdot \text{MAE}(\sigma, \sqrt{r})\) | General purpose; variance-stabilised residual |
| "hybrid" | \(\alpha \cdot \text{MSE}(\sigma, r) + (1-\alpha) \cdot \text{MAE}(\sigma, r)\) | General purpose; same \(\alpha\) as mu loss |
| "gaussian" | \(\frac{1}{2}\left(\frac{r^2}{\sigma^2} + \log \sigma^2\right)\) | Heteroscedastic Gaussian NLL |
| "laplace" | \(\frac{r}{\sigma} + \log(2\sigma)\) | Robust to heavy-tailed residuals |
| "mse" | \((\sigma - r)^2\) | Distribution-free MSE on residual magnitude |
| "log" | \(\log(1 + (\sigma - r)^2)\) | Robust MSE with log-damping of large errors |

where \(r = |y - \mu|\).
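In plain PyTorch the table corresponds roughly to the sketch below (not Twiga's exact code; the eps terms are an added numerical-stability assumption):

import torch
import torch.nn.functional as F

def sigma_loss(sigma, r, kind="hybrid", alpha=0.5, eps=1e-8):
    """sigma: predicted scale; r = |y - mu|: residual magnitude."""
    if kind == "hybrid_sqrt":
        t = torch.sqrt(r + eps)                       # variance-stabilised target
        return alpha * F.mse_loss(sigma, t) + (1 - alpha) * F.l1_loss(sigma, t)
    if kind == "hybrid":
        return alpha * F.mse_loss(sigma, r) + (1 - alpha) * F.l1_loss(sigma, r)
    if kind == "gaussian":                            # heteroscedastic Gaussian NLL
        return (0.5 * (r**2 / (sigma**2 + eps) + torch.log(sigma**2 + eps))).mean()
    if kind == "laplace":
        return (r / (sigma + eps) + torch.log(2 * sigma + eps)).mean()
    if kind == "mse":
        return ((sigma - r) ** 2).mean()
    if kind == "log":                                 # log-damped robust MSE
        return torch.log1p((sigma - r) ** 2).mean()
    raise ValueError(f"unknown sigma_loss_fn: {kind}")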

Two-stage training (MLPGAMCRC default, controlled by two_stage=True):

        graph LR
    S1["Stage 1\nBackbone + mu_layer\n(sigma_layer frozen)"] --> S2["Stage 2\nSigma MLP only\n(backbone frozen)"]
    

In stage 1 the backbone and mu path are updated via the combined \(\mathcal{L}_\mu + \mathcal{L}_\sigma\) loss. In stage 2 only sigma_layer parameters receive gradients — the backbone is frozen by Lightning’s toggle_optimizer. This prevents the scale head from distorting the point forecast.
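The sketch below shows how such a scheme can be wired with Lightning's manual optimization. It is illustrative only: Twiga's ProbabilisticModel handles this internally, and the epoch-based stage switch here is an assumption.

import lightning.pytorch as pl
import torch
import torch.nn.functional as F

class TwoStageSketch(pl.LightningModule):
    """Illustrative two-stage trainer for a CRC-style head (not Twiga's code)."""

    def __init__(self, backbone, head, stage1_epochs: int = 50):
        super().__init__()
        self.automatic_optimization = False      # drive both optimizers manually
        self.backbone, self.head = backbone, head
        self.stage1_epochs = stage1_epochs

    def configure_optimizers(self):
        opt_mu = torch.optim.Adam(
            list(self.backbone.parameters()) + list(self.head.mu_layer.parameters())
        )
        opt_sigma = torch.optim.Adam(self.head.sigma_layer.parameters())
        return opt_mu, opt_sigma

    def training_step(self, batch, batch_idx):
        x, y = batch
        z = self.backbone.encode(x)
        opt_mu, opt_sigma = self.optimizers()
        if self.current_epoch < self.stage1_epochs:
            self.toggle_optimizer(opt_mu)        # stage 1: mean path
            loss, _ = self.head.step(z, y, F.l1_loss)
            opt_mu.zero_grad()
            self.manual_backward(loss)
            opt_mu.step()
            self.untoggle_optimizer(opt_mu)
        else:
            self.toggle_optimizer(opt_sigma)     # stage 2: sigma head only
            loss, _ = self.head.step_sigma(z, y, F.l1_loss)
            opt_sigma.zero_grad()
            self.manual_backward(loss)
            opt_sigma.step()
            self.untoggle_optimizer(opt_sigma)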

from twiga.distributions.nn.residual_conformal import CRCDistribution, AdditiveCRCDistribution

# Standard backbone (MLPF, NHiTS)
crc = CRCDistribution(
    num_target_output=1,
    hidden_size=256,         # injected from backbone.encode_size
    forecast_horizon=48,
    sigma_loss_fn="hybrid",  # "hybrid" | "gaussian" | "laplace" | "mse"
    alpha=0.1,
)
mu, sigma = crc(z)                           # z: (B, hidden_size)
loss = crc.get_log_likelihood(mu, sigma, y)  # sigma calibration loss only (not point loss)

# Sigma-only step (stage 2 of two-stage training)
loss, metric = crc.step_sigma(z, y, metric_fn)

# Additive backbone (MLPGAM, MLPGAF) — mu_layer is Identity, no linear projection
add_crc = AdditiveCRCDistribution(
    num_target_output=1,
    hidden_dim=256,          # internal sigma MLP width (independent of H*O)
    forecast_horizon=48,
)
mu, sigma = add_crc(z)      # z shape (B, H*O); mu = reshape(z), sigma from ResidualSigmaHead

CRC vs post-hoc conformal

Post-hoc conformal prediction (see Conformal Prediction) wraps any already-trained model and provides finite-sample coverage guarantees. CRC bakes the calibration objective directly into training — no separate calibration split needed — but the coverage guarantee is empirical rather than distribution-free. CRC typically produces tighter intervals when the signal-to-noise ratio is high.

ML Parametric Models#

Twiga provides two families of parametric ML models that output full distribution parameters rather than just point forecasts.

GaussCatBoost#

GAUSSCATBOOSTModel uses CatBoost’s built-in RMSEWithUncertainty loss, which jointly learns the mean μ and aleatoric scale σ in a single model. One CatBoostRegressor is trained per output step, and virtual ensemble snapshots provide epistemic uncertainty decomposition.

| Model | Name | Config Class | Output |
|---|---|---|---|
| Gaussian CatBoost | gausscatboost | GAUSSCATBOOSTConfig | (μ, σ) per horizon step |

GAUSSCATBOOSTConfig extends CATBOOSTConfig with:

| Field | Type | Default | Description |
|---|---|---|---|
| name | Literal["gausscatboost"] | "gausscatboost" | Model identifier. |
| od_type | Literal["Iter", "IncToDec"] | "Iter" | Overfitting detection strategy; active only when an eval set is provided to fit(). |
| od_wait | int | 20 | Iterations without improvement before early stopping triggers. |
| virtual_ensembles_count | int | 10 | Virtual ensemble snapshots for epistemic uncertainty estimation via predict_with_uncertainty(). Excluded from parameter dumps. |

from twiga.models.ml.gausscatboost_model import GAUSSCATBOOSTConfig

config = GAUSSCATBOOSTConfig(
    iterations=500,
    learning_rate=0.05,
    task_type="CPU",
    virtual_ensembles_count=20,
)

Uncertainty decomposition

predict_with_uncertainty() returns three components per output step: mean (predicted μ), knowledge_uncertainty (epistemic — variance across virtual ensemble snapshots), and data_uncertainty (aleatoric — the model’s second output, exp of predicted log-σ).
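A usage sketch splitting the returned array (the variance-summing line is a common heuristic, assuming both components are reported as variances):

from twiga.models.ml.gausscatboost_model import GAUSSCATBOOSTModel

model = GAUSSCATBOOSTModel()                   # default GAUSSCATBOOSTConfig
model.fit(X_train, y_train)                    # X: (B, L, F), y: (B, L, H)

unc = model.predict_with_uncertainty(X_test)   # shape (B, L, H, 3)
mean = unc[..., 0]                             # predicted mu
knowledge = unc[..., 1]                        # epistemic uncertainty
data = unc[..., 2]                             # aleatoric uncertainty
total_variance = knowledge + data              # heuristic: sum the two components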

NGBoost#

NGBoost (Natural Gradient Boosting) fits all distribution parameters simultaneously via the natural gradient of a proper scoring rule, in contrast to GAUSSCATBOOSTModel's fixed RMSEWithUncertainty objective. All models extend BaseNGBoostRegressor and train one NGBRegressor per flattened output column.

Model

Name

Config Class

Distribution

Best for

NGBoost Normal

ngboostnormal

NGBOOSTNORMALConfig

Normal N(μ, σ²)

Symmetric, unbounded targets

NGBoost LogNormal

ngboostlognormal

NGBOOSTLOGNORMALConfig

LogNormal

Strictly positive, right-skewed targets

NGBoost Exponential

ngboostexponential

NGBOOSTEXPONENTIALConfig

Exponential (1/λ)

Non-negative, memoryless processes

NGBoost vs Gaussian CatBoost

NGBoost jointly optimises μ and σ via the natural gradient, typically producing better-calibrated intervals. GAUSSCATBOOSTModel (RMSEWithUncertainty) is faster and scales better on very large datasets.

NGBOOSTNORMALModel#

Fits a Gaussian distribution N(μ, σ²) jointly optimising mean and standard deviation.

NGBOOSTNORMALConfig fields:

| Field | Type | Default | Description |
|---|---|---|---|
| name | Literal["ngboostnormal"] | "ngboostnormal" | Model identifier. |
| domain | Literal["ml"] | "ml" | Domain identifier. |
| n_estimators | int | 500 | Number of boosting rounds. |
| learning_rate | float | 0.01 | Shrinkage applied to each boosting step. |
| minibatch_frac | float | 1.0 | Fraction of training data used per round. |
| col_sample | float | 1.0 | Fraction of features sampled per tree. |
| random_state | int | 42 | Random seed. |
| score | Literal["LogScore", "CRPScore"] | "LogScore" | Scoring rule for training objective. |

Output: (loc, scale) = (mean μ, standard deviation σ).

from twiga.models.ml.ngboostnormal_model import NGBOOSTNORMALConfig

config = NGBOOSTNORMALConfig(n_estimators=300, learning_rate=0.05)
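Given the (loc, scale) output above, a central interval follows directly from Gaussian quantiles. A sketch, assuming the fitted model returns (mu, sigma) arrays from predict():

import numpy as np
from scipy.stats import norm

mu, sigma = model.predict(X_test)   # assumed (loc, scale) arrays
z = norm.ppf(0.95)                  # two-sided 90% central interval
lower, upper = mu - z * sigma, mu + z * sigma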

NGBOOSTLOGNORMALModel#

Fits a LogNormal distribution, appropriate for strictly positive targets such as solar irradiance or non-negative electricity prices.

NGBOOSTLOGNORMALConfig shares all fields with NGBOOSTNORMALConfig; only name is fixed to "ngboostlognormal".

Output: (loc, scale) = (geometric mean exp(μ_log), log-space standard deviation σ_log).

Interpreting LogNormal output

loc is the geometric mean (the exponentiated log-space mean), not the arithmetic mean. To recover the arithmetic mean use exp(log(loc) + scale² / 2). For prediction intervals, exponentiate the log-space quantiles.
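In code, with loc and scale as described above (a sketch, not a Twiga helper):

import numpy as np
from scipy.stats import norm

arith_mean = np.exp(np.log(loc) + scale**2 / 2)   # arithmetic mean from geometric mean
z = norm.ppf(0.95)                                # 90% central interval
lower = np.exp(np.log(loc) - z * scale)           # exponentiate log-space quantiles
upper = np.exp(np.log(loc) + z * scale)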

from twiga.models.ml.ngboostlognormal_model import NGBOOSTLOGNORMALConfig

config = NGBOOSTLOGNORMALConfig(n_estimators=300, score="CRPScore")

NGBOOSTEXPONENTIALModel#

Fits an Exponential distribution parameterised by its scale (1/λ). Suitable for non-negative targets with a memoryless property.

NGBOOSTEXPONENTIALConfig shares all fields with NGBOOSTNORMALConfig; only name is fixed to "ngboostexponential".

Output: Both loc and scale equal the predicted scale parameter 1/λ (the Exponential distribution has a single parameter).

Single-parameter distribution

The Exponential distribution is fully characterised by its scale (1/λ), so both loc and scale in the output tuple carry the same value. Mean and standard deviation are both equal to 1/λ.
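Exponential quantiles are available in closed form, Q(p) = -θ ln(1 - p) with θ = 1/λ, so intervals can be computed directly from the model output (a sketch; scale is the predicted array):

import numpy as np

theta = scale                        # both loc and scale equal 1/lambda
q05 = -theta * np.log(1 - 0.05)      # 5th percentile
q50 = theta * np.log(2)              # median
q95 = -theta * np.log(1 - 0.95)      # 95th percentile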

from twiga.models.ml.ngboostexponential_model import NGBOOSTEXPONENTIALConfig

config = NGBOOSTEXPONENTIALConfig(n_estimators=400)

LogScore vs CRPScore

LogScore maximises the predictive log-likelihood — faster and often better-calibrated for in-distribution data. CRPScore (Continuous Ranked Probability Score) is more robust to distributional shift and outliers but typically slower to converge.

ML Distribution Utilities#

The twiga/distributions/ml/utils.py module provides utilities for post-processing quantile predictions from ML models.

interpolate_quantile()#

Linearly interpolates to extract a specific quantile from a set of predicted quantiles.

from twiga.distributions.ml.utils import interpolate_quantile

# predictions shape: (batch, num_quantiles, num_targets)
q30 = interpolate_quantile(
    predictions=predictions,
    sorted_quantiles=[0.05, 0.25, 0.5, 0.75, 0.95],
    target_quantile=0.3,
)
# Output shape: (batch, num_targets)

get_median_prediction()#

Extracts the median prediction (quantile 0.5).

from twiga.distributions.ml.utils import get_median_prediction

median = get_median_prediction(predictions, quantiles=[0.05, 0.25, 0.5, 0.75, 0.95])

get_sigma_prediction()#

Estimates the standard deviation from the interquartile range:

\[\sigma = \frac{Q_{0.75} - Q_{0.25}}{2 \times 0.6745}\]
from twiga.distributions.ml.utils import get_sigma_prediction

sigma = get_sigma_prediction(predictions, quantiles=[0.05, 0.25, 0.5, 0.75, 0.95])

Note

The constant 0.6745 is the 75th percentile of the standard normal distribution. This conversion assumes approximate normality of the residuals.

How Distributions Connect to Models#

        graph TD
    subgraph "Point Models"
        A[CatBoost / XGBoost / LightGBM / MLPF / NHITS] --> B["predict() → loc array"]
    end

    subgraph "Parametric NN Models"
        C["ProbabilisticModel\n(backbone + head)"] --> D["predict() → (loc, scale, …)"]
        D --> E["NormalDistribution / LaplaceDistribution\nGammaDistribution / BetaDistribution / …"]
    end

    subgraph "Quantile Models"
        G[QRCatBoost / QRXGBoost / QRLightGBM] --> H["predict() → quantile array"]
        I["MLPFQR / NHITSQRModel / …"] --> H
        H --> J[QRDistribution / FPQRDistribution]
    end

    subgraph "ML Parametric (Gaussian)"
        K["GAUSSCATBOOSTModel"] --> L["predict() → (μ, σ)"]
    end

    B --> CP[Conformal Prediction]
    D --> CP
    H --> CP
    L --> CP
    CP --> PI[Prediction Intervals]
    

Point models produce a single prediction per time step and can be wrapped with split conformal prediction.

Parametric NN models (via ProbabilisticModel) produce full distribution parameters and train via NLL. They can be used stand-alone or combined with conformal residual fitting.

Quantile models produce multiple quantile predictions and can be used with conformal quantile regression or taken directly as prediction intervals.

ML Usage Example#

import pandas as pd
from sklearn.preprocessing import StandardScaler

from twiga.core.config import DataPipelineConfig, ForecasterConfig
from twiga.forecaster.core import TwigaForecaster
from twiga.models.ml.gausscatboost_model import GAUSSCATBOOSTConfig
from twiga.models.ml.ngboostnormal_model import NGBOOSTNORMALConfig

data_config = DataPipelineConfig(
    target_feature="load_mw",
    period="1h",
    lookback_window_size=168,
    forecast_horizon=24,
    lags=[1, 24, 168],
    input_scaler=StandardScaler(),
)

train_config = ForecasterConfig(
    split_freq="months",
    train_size=6,
    test_size=1,
    window="expanding",
    project_name="GaussianForecast",
)

# Gaussian CatBoost — fast, single model via RMSEWithUncertainty
gauss_config = GAUSSCATBOOSTConfig(task_type="CPU", virtual_ensembles_count=10)

# NGBoost Normal — joint optimisation of μ and σ
ngboost_config = NGBOOSTNORMALConfig(n_estimators=300, learning_rate=0.05)

forecaster = TwigaForecaster(
    data_params=data_config,
    model_params=[gauss_config],
    train_params=train_config,
)

forecaster.fit(train_df=train_df)

interval_dict, _ = forecaster.predict_interval(test_df=test_df)
for model_name, (lower, point, upper) in interval_dict.items():
    print(f"{model_name}: lower={lower.shape}, point={point.shape}, upper={upper.shape}")

API Reference#

Parametric distribution base#

class twiga.distributions.nn.parametric.BaseDistribution(num_target_output, hidden_size, forecast_horizon)#

Bases: ABC, Module

Abstract base for probabilistic output heads.

Subclasses must implement forward(), get_distribution(), and get_log_likelihood().

All heads share the same input contract:

x: (B, hidden_size) - latent representation from encoder/decoder.

All heads share the same output contract for forecasting tensors:

shape (B, forecast_horizon, num_target_output).

forecast(z)#

Inference-mode forward (no gradient).

Parameters:

z (Tensor) – Latent tensor of shape (B, hidden_size) from the backbone.

Return type:

dict[str, Tensor]

Returns:

Dict with "loc" (location/mean) and "scale" (std/dispersion). The first forward() output is always loc; the second is scale.

abstractmethod forward(x)#

Map latent x → distribution parameters.

Return type:

tuple[Tensor, ...]

abstractmethod get_distribution(*params)#

Construct a torch Distribution from predicted parameters.

Return type:

Distribution

abstractmethod get_log_likelihood(*params, targets)#

Return mean negative log-likelihood over the batch.

Return type:

Tensor

step(z, y, metric_fn, epoch=None)#

Compute NLL loss and metric for one training/validation step.

Parameters:
  • z (Tensor) – Latent tensor of shape (B, hidden_size) from the backbone.

  • y (Tensor) – Target tensor of shape (B, forecast_horizon, num_target_output).

  • metric_fn (Callable[..., Any]) – Callable returning a scalar metric given (pred, target).

  • epoch (int | None) – Current epoch (unused here; available for subclasses).

Return type:

tuple[Tensor, Tensor]

Returns:

Tuple of (nll_loss, metric).

Parametric heads — standard backbones#

class twiga.distributions.nn.parametric.NormalDistribution(num_target_output=1, hidden_size=256, forecast_horizon=48, out_activation_function=None)#

Bases: BaseDistribution

Normal (Gaussian) output head.

Best for: symmetric, unbounded targets - energy demand, temperature.

Parameters predicted:

  • mu - mean (unconstrained)

  • sigma - std dev (exp of log_scale, strictly positive)

Parameters:
  • num_target_output (int) – Number of output features per time step.

  • hidden_size (int) – Dimensionality of the input latent vector.

  • forecast_horizon (int) – Number of forecast time steps.

  • out_activation_function (Module | None) – Optional activation applied to mu. Defaults to Identity.

forward(x)#

Predict mu and sigma.

Parameters:

x (Tensor) – Latent tensor of shape (B, hidden_size).

Return type:

tuple[Tensor, Tensor]

Returns:

Tuple of (mu, sigma), each of shape (B, forecast_horizon, num_target_output).

get_distribution(mu, sigma)#

Construct a torch Distribution from predicted parameters.

Return type:

Normal

get_log_likelihood(mu, sigma, targets)#

Return mean negative log-likelihood over the batch.

Return type:

Tensor

class twiga.distributions.nn.parametric.LaplaceDistribution(num_target_output=1, hidden_size=256, forecast_horizon=48, out_activation_function=None)#

Bases: BaseDistribution

Laplace output head.

Best for: targets with heavier tails than Normal, robust to outliers - electricity prices, wind speed residuals.

Parameters predicted:

  • mu - location (unconstrained)

  • scale - scale (exp of log_scale, strictly positive)

forward(x)#

Predict mu and scale.

Parameters:

x (Tensor) – Latent tensor of shape (B, hidden_size).

Return type:

tuple[Tensor, Tensor]

Returns:

Tuple of (mu, scale), each of shape (B, forecast_horizon, num_target_output).

get_distribution(mu, scale)#

Construct a torch Distribution from predicted parameters.

Return type:

Laplace

get_log_likelihood(mu, scale, targets)#

Return mean negative log-likelihood over the batch.

Return type:

Tensor

class twiga.distributions.nn.parametric.LogNormalDistribution(num_target_output=1, hidden_size=256, forecast_horizon=48, out_activation_function=None)#

Bases: BaseDistribution

Log-Normal output head.

Best for: strictly positive, right-skewed targets - renewable generation, gas prices, load with zero floor.

Parameters predicted:

  • mu - mean of log(target) (unconstrained)

  • sigma - std of log(target) (exp of log_scale, strictly positive)

Note

Targets must be strictly positive. Apply a small epsilon shift in the data pipeline if zeros are possible (e.g., target = max(y, 1e-6)).

forward(x)#

Predict mu and sigma of the underlying Normal (in log space).

Parameters:

x (Tensor) – Latent tensor of shape (B, hidden_size).

Return type:

tuple[Tensor, Tensor]

Returns:

Tuple of (mu, sigma), each of shape (B, forecast_horizon, num_target_output).

get_distribution(mu, sigma)#

Construct a torch Distribution from predicted parameters.

Return type:

LogNormal

get_log_likelihood(mu, sigma, targets)#

Return mean negative log-likelihood over the batch.

Return type:

Tensor

class twiga.distributions.nn.parametric.GammaDistribution(num_target_output=1, hidden_size=256, forecast_horizon=48)#

Bases: BaseDistribution

Gamma output head.

Best for: strictly positive targets with flexible skew - solar irradiance, wind power, load at aggregate level.

Parameters predicted:

  • concentration - shape parameter α (softplus, strictly positive)

  • rate - rate parameter β (softplus, strictly positive)

Note

Mean of Gamma(α, β) = α/β. Targets must be strictly positive.

forward(x)#

Predict concentration and rate.

Parameters:

x (Tensor) – Latent tensor of shape (B, hidden_size).

Return type:

tuple[Tensor, Tensor]

Returns:

Tuple of (concentration, rate), each of shape (B, forecast_horizon, num_target_output).

get_distribution(concentration, rate)#

Construct a torch Distribution from predicted parameters.

Return type:

Gamma

get_log_likelihood(concentration, rate, targets)#

Return mean negative log-likelihood over the batch.

Return type:

Tensor

class twiga.distributions.nn.parametric.BetaDistribution(num_target_output=1, hidden_size=256, forecast_horizon=48)#

Bases: BaseDistribution

Beta output head.

Best for: bounded [0, 1] targets - capacity factors, state of charge, fill rates, normalised demand ratios.

Parameters predicted:

  • alpha - first shape parameter (softplus, strictly positive)

  • beta - second shape parameter (softplus, strictly positive)

Note

Targets must lie strictly in (0, 1). Apply a clamp in the data pipeline if boundary values are possible: target = target.clamp(1e-6, 1 - 1e-6)

forward(x)#

Predict alpha and beta.

Parameters:

x (Tensor) – Latent tensor of shape (B, hidden_size).

Return type:

tuple[Tensor, Tensor]

Returns:

Tuple of (alpha, beta), each of shape (B, forecast_horizon, num_target_output).

get_distribution(alpha, beta)#

Construct a torch Distribution from predicted parameters.

Return type:

Beta

get_log_likelihood(alpha, beta, targets)#

Return mean negative log-likelihood over the batch.

Return type:

Tensor

class twiga.distributions.nn.parametric.StudentTDistribution(num_target_output=1, hidden_size=256, forecast_horizon=48, out_activation_function=None, min_df=2.1)#

Bases: BaseDistribution

Student-T output head.

Best for: targets with very heavy tails and potential outliers - financial returns, spot electricity prices.

Parameters predicted:

  • mu - location (unconstrained)

  • sigma - scale (exp of log_scale, strictly positive)

  • df - degrees of freedom (softplus, clipped to ≥ min_df, default 2.1, for finite variance)

Note

Degrees of freedom are predicted per-sample but averaged across the forecast horizon so the model learns a single df per series.

forecast(z)#

Inference-mode forward (no gradient).

Parameters:

z (Tensor) – Latent tensor of shape (B, hidden_size) from the backbone.

Return type:

dict[str, Tensor]

Returns:

Dict with "loc" (location/mean) and "scale" (std/dispersion). The first forward() output is always loc; the second is scale.

forward(x)#

Predict mu, sigma, and degrees of freedom.

Parameters:

x (Tensor) – Latent tensor of shape (B, hidden_size).

Return type:

tuple[Tensor, Tensor, Tensor]

Returns:

Tuple of (mu, sigma, df):

  • mu, sigma: shape (B, forecast_horizon, num_target_output)

  • df: shape (B, 1, num_target_output), broadcast-compatible

get_distribution(mu, sigma, df)#

Construct a torch Distribution from predicted parameters.

Return type:

StudentT

get_log_likelihood(mu, sigma, df, targets)#

Return mean negative log-likelihood over the batch.

Return type:

Tensor

Parametric heads — additive backbones (MLPGAM / MLPGAF)#

These variants receive the pre-summed additive mean directly from the backbone instead of a latent vector, preserving the GAM decomposition.

class twiga.distributions.nn.parametric.AdditiveNormalDistribution(num_target_output=1, hidden_size=256, forecast_horizon=48, out_activation_function=None)#

Bases: BaseDistribution

Normal output head preserving additive backbone structure.

For use with MLPGAMNetwork and MLPGAFNetwork, whose encode() already returns the additive mean directly as a flat (B, H*O) vector. The mean is taken as-is (no projection) to honour the additive decomposition; only the scale is learned via a separate linear layer.

Parameters predicted:

  • mu - additive mean (reshaped directly from latent, no projection)

  • sigma - std dev (exp of log_scale, strictly positive)

Parameters:
  • num_target_output (int) – Number of output features per time step.

  • hidden_size (int) – Must equal forecast_horizon * num_target_output.

  • forecast_horizon (int) – Number of forecast time steps.

  • out_activation_function (Module | None) – Optional activation applied to mu. Defaults to Identity.

forward(x)#

Predict mu and sigma.

Parameters:

x (Tensor) – Latent tensor of shape (B, hidden_size) - the additive mean from the backbone.

Return type:

tuple[Tensor, Tensor]

Returns:

Tuple of (mu, sigma), each of shape (B, forecast_horizon, num_target_output).

get_distribution(mu, sigma)#

Construct a torch Distribution from predicted parameters.

Return type:

Normal

get_log_likelihood(mu, sigma, targets)#

Return mean negative log-likelihood over the batch.

Return type:

Tensor

class twiga.distributions.nn.parametric.AdditiveLaplaceDistribution(num_target_output=1, hidden_size=256, forecast_horizon=48, out_activation_function=None)#

Bases: BaseDistribution

Laplace output head preserving additive backbone structure.

Analogous to AdditiveNormalDistribution but with Laplace tails. Best for: heavy-tailed, outlier-robust targets - electricity prices, wind residuals.

Parameters predicted:

  • mu - additive location (reshaped directly from latent, no projection)

  • scale - scale (exp of log_scale, strictly positive)

forward(x)#

Predict mu and scale.

Parameters:

x (Tensor) – Latent tensor of shape (B, hidden_size) - the additive location from the backbone.

Return type:

tuple[Tensor, Tensor]

Returns:

Tuple of (mu, scale), each of shape (B, forecast_horizon, num_target_output).

get_distribution(mu, scale)#

Construct a torch Distribution from predicted parameters.

Return type:

Laplace

get_log_likelihood(mu, scale, targets)#

Return mean negative log-likelihood over the batch.

Return type:

Tensor

class twiga.distributions.nn.parametric.AdditiveLogNormalDistribution(num_target_output=1, hidden_size=256, forecast_horizon=48, out_activation_function=None)#

Bases: BaseDistribution

Log-Normal output head preserving additive backbone structure.

Best for: strictly positive, right-skewed targets - renewable generation, gas prices.

Parameters predicted:

  • mu - additive log-space mean (reshaped directly from latent, no projection)

  • sigma - std of log(target) (exp of log_scale, strictly positive)

Note

Targets must be strictly positive.

forward(x)#

Predict mu and sigma of the underlying Normal (in log space).

Parameters:

x (Tensor) – Latent tensor of shape (B, hidden_size) - the additive log-mean from the backbone.

Return type:

tuple[Tensor, Tensor]

Returns:

Tuple of (mu, sigma), each of shape (B, forecast_horizon, num_target_output).

get_distribution(mu, sigma)#

Construct a torch Distribution from predicted parameters.

Return type:

LogNormal

get_log_likelihood(mu, sigma, targets)#

Return mean negative log-likelihood over the batch.

Return type:

Tensor

class twiga.distributions.nn.parametric.AdditiveStudentTDistribution(num_target_output=1, hidden_size=256, forecast_horizon=48, out_activation_function=None, min_df=2.1)#

Bases: BaseDistribution

Student-T output head preserving additive backbone structure.

Best for: very heavy tails - spot electricity prices, financial returns.

Parameters predicted:

  • mu - additive location (reshaped directly from latent, no projection)

  • sigma - scale (exp of log_scale, strictly positive)

  • df - degrees of freedom (softplus, clipped to ≥ min_df for finite variance)

forecast(z)#

Inference-mode forward (no gradient).

Parameters:

z (Tensor) – Latent tensor of shape (B, hidden_size) from the backbone.

Return type:

dict[str, Tensor]

Returns:

Dict with "loc" (location/mean) and "scale" (std/dispersion). The first forward() output is always loc; the second is scale.

forward(x)#

Predict mu, sigma, and degrees of freedom.

Parameters:

x (Tensor) – Latent tensor of shape (B, hidden_size) - the additive location from the backbone.

Return type:

tuple[Tensor, Tensor, Tensor]

Returns:

Tuple of (mu, sigma, df):

  • mu, sigma: shape (B, forecast_horizon, num_target_output)

  • df: shape (B, 1, num_target_output), broadcast-compatible

get_distribution(mu, sigma, df)#

Construct a torch Distribution from predicted parameters.

Return type:

StudentT

get_log_likelihood(mu, sigma, df, targets)#

Return mean negative log-likelihood over the batch.

Return type:

Tensor

Quantile regression distribution#

class twiga.distributions.nn.quantile.QRDistribution(quantiles=None, num_outputs=1, hidden_size=256, horizon=48, eps=1e-06, kappa=0.5, output_activation=None, conf_level=0.05, loss_fn='pinball', crossing_penalty=10.0)#

Bases: Module

QRDistribution is a neural network module for forecasting quantiles using a quantile value network.

Parameters:
  • quantiles (list[float] | None) – List of quantiles to forecast. Default is None.

  • num_outputs (int) – Number of output features. Default is 1.

  • hidden_size (int) – Size of the input latent vector. Default is 256.

  • horizon (int) – Number of time steps to forecast. Default is 48.

  • crossing_penalty (float) – Penalty weight that discourages quantile crossing and enforces monotonicity. Default is 10.0.

  • output_activation (nn.Module | None) – Activation function to use in the output layer. Default is nn.Identity().

forecast(input_tensor)#

Forecast quantiles for the given input tensor.

Parameters:

input_tensor (Tensor) – Input tensor of shape (batch_size, z_dim).

Return type:

dict

Returns:

Dict containing the forecast location and the forecasted quantiles of shape (batch_size, n_quantiles, horizon, num_outputs).

forward(x)#

Forward pass producing raw (unconstrained) quantile values.

Outputs K quantile predictions directly from the linear head without structural monotonicity enforcement. Non-crossing is encouraged during training via the crossing_penalty term in step():

loss = pinball_loss + λ · non_crossing_loss(Q̂)

where non_crossing_loss penalises any pair (k, k+1) where Q̂(τₖ) > Q̂(τₖ₊₁). The quantile_value_layer is initialised with small uniform weights to reduce crossing violations at the start of training.

Parameters:

x (torch.Tensor) – Latent features of shape (B, hidden_size).

Returns:

torch.Tensor – Estimated quantile values Q(τⱼ | x) of shape (B, K, H, D), where K = len(self.taus).

step(z, y, metric_fn, epoch=None)#

Perform a single training/validation step.

Parameters:
  • z (Tensor) – Latent tensor of shape (B, hidden_size) from the backbone.

  • y (Tensor) – Target tensor of shape (B, forecast_horizon, num_outputs).

  • metric_fn (Callable[..., Any]) – Callable returning a scalar metric given (pred, target).

  • epoch (int | None) – Current epoch (unused; accepted for interface consistency).

Return type:

tuple[Tensor, Tensor]

Returns:

Tuple of (quantile_loss, metric).

twiga.distributions.nn.quantile.get_median_quantile(quantile_hats, probs)#

Compute the median (0.5 quantile) from a quantile tensor of shape (B, N, T, C).

Parameters:
  • quantile_hats (Tensor) – Tensor of shape (B, N, T, C) containing quantile values.

  • probs (Tensor | list) – Quantile probabilities (e.g., [0.1, 0.3, 0.5, 0.7, 0.9]).

Return type:

Tensor

Returns:

torch.Tensor – Median values of shape (B, T, C).

Raises:

ValueError – If input shapes are invalid or interpolation is not possible.

FPQR components#

class twiga.distributions.nn.fpquantile.QuantileProposal(n_quantiles=10, z_dim=64, dropout=0.1, conf_level=0.05, n_outputs=1, crossing_penalty=10.0)#

Bases: Module

A neural network module for proposing confidence quantiles.

This module generates quantile estimates from input features using a linear layer, dropout, and softmax normalization. It computes cumulative probabilities (taus), midpoints (tau_hats), and entropies, ensuring quantiles stay within specified bounds based on a significance level (conf_level).

Variables:
  • n_quantiles – Number of quantiles to propose.

  • z_dim – Dimensionality of the input features.

  • conf_level – Significance level tensor for quantile bounds.

  • net – Linear layer transforming input features to quantile logits.

  • dropout – Dropout layer for regularization.

  • tau_0 – Initial tau value buffer (zeros).

Parameters:
  • n_quantiles (int) – Number of quantiles to propose (default: 10).

  • z_dim (int) – Dimensionality of the input features (default: 64).

  • dropout (float) – Dropout rate for regularization (default: 0.1).

  • conf_level (float) – Significance level for quantile estimation (default: 0.05).

__init__(n_quantiles=10, z_dim=64, dropout=0.1, conf_level=0.05, n_outputs=1, crossing_penalty=10.0)#

Initialize the QuantileProposal module.

Parameters:
  • n_quantiles (int) – Number of quantiles to propose (default: 10).

  • z_dim (int) – Dimensionality of the input features (default: 64).

  • dropout (float) – Dropout rate for regularization (default: 0.1).

  • conf_level (float) – Significance level for quantile estimation (default: 0.05).

  • n_outputs (int) – Number of output dimensions.

  • crossing_penalty (float) – Penalty weight that discourages quantile crossing and enforces monotonicity. Default is 10.0.

Raises:

ValueError – If n_quantiles <= 0, z_dim <= 0, dropout < 0, or conf_level not in (0, 1).

forward(z)#

Perform a forward pass to compute quantiles and related metrics.

Parameters:

z (Tensor) – Input tensor of shape (batch_size, z_dim).

Return type:

tuple[Tensor, Tensor, Tensor]

Returns:

Tuple containing:

  • taus: cumulative probabilities, shape (batch_size, n_quantiles + 1, 1).

  • tau_hats: midpoint quantiles, shape (batch_size, n_quantiles, 1).

  • entropies: entropy of the probability distributions, shape (batch_size, 1).

Raises:

ValueError – If z does not have the expected shape (batch_size, z_dim) or if internal tensor shapes do not match expected dimensions.

class twiga.distributions.nn.fpquantile.CosinetauEmbedding(num_cosines=32, z_dim=128, num_outputs=48)#

Bases: Module

A PyTorch module for embedding time values using cosine transformations.

This module transforms time values into cosine-based features and embeds them into a higher-dimensional space using a linear layer and activation function.

Parameters:
  • num_cosines (int) – Number of cosine functions. Defaults to 32.

  • z_dim (int) – Output embedding dimension. Defaults to 128.

  • num_outputs (int) – Scaling factor for input size. Defaults to 48.

  • activation (nn.Module, optional) – Activation function. Defaults to nn.LeakyReLU().

forward(taus)#

Transforms time values into cosine-based embeddings.

Parameters:

taus (Tensor) – Input tensor of shape (batch_size, num_outputs, num_inputs).

Return type:

Tensor

Returns:

Tensor of shape (batch_size, num_outputs, z_dim) containing embedded time values.

class twiga.distributions.nn.fpquantile.FPQRDistribution(n_quantiles=9, num_outputs=1, hidden_size=256, horizon=48, dropout=0.1, conf_level=0.05, kappa=0.25, num_cosines=32, output_activation=None, loss_fn='pinball')#

Bases: Module

A neural network for forecasting quantiles using a quantile value network.

This module predicts quantiles for a forecasting task, combining a quantile proposal layer with a linear output layer to produce quantile forecasts.

Parameters:
  • n_quantiles (int | None) – Number of quantiles to forecast. If None, defaults to 9.

  • num_outputs (int) – Number of output features. Defaults to 1.

  • hidden_size (int) – Size of the hidden layers. Defaults to 256.

  • horizon (int) – Number of time steps to forecast. Defaults to 48.

  • dropout (float) – Dropout rate for the quantile proposal layer. Defaults to 0.1.

  • conf_level (float) – Confidence level for quantile proposal. Defaults to 0.05.

  • output_activation (Module | None) – Activation function for the output layer. Defaults to nn.Identity().

forecast(input_tensor)#

Forecast quantiles for the given input tensor.

quantile_levels is the per-sample proposal grid averaged across the batch and horizon dimensions so that downstream consumers receive a fixed 1-D array of representative probability levels, matching the interface expected by ForecastResult.

Parameters:

input_tensor (Tensor) – Input tensor of shape (batch_size, z_dim).

Return type:

dict[str, Any]

Returns:

Dict with keys

  • "loc": expected value (weighted sum of quantiles), shape (B, horizon, num_outputs).

  • "quantiles": quantile forecasts, shape (B, n_quantiles, horizon, num_outputs).

  • "quantile_levels": representative 1-D numpy array of length n_quantiles (mean of tau_hats over batch and horizon).

  • "conf_level": significance level scalar.

forward(input_tensor)#

Forward pass of the quantile forecasting network.

Parameters:

input_tensor (Tensor) – Input tensor of shape (batch_size, z_dim).

Return type:

tuple[Tensor, ...]

Returns:

Tuple of tensors; the forecasted quantiles have shape (batch_size, n_quantiles, horizon, num_outputs).

step(z, y, metric_fn, epoch=None)#

Perform a single training/validation step.

Parameters:
  • z (Tensor) – Latent tensor of shape (B, hidden_size) from the backbone.

  • y (Tensor) – Target tensor of shape (B, forecast_horizon, num_outputs).

  • metric_fn (Callable[..., Any]) – Callable returning a scalar metric given (pred, target).

  • epoch (int | None) – Current epoch (unused; accepted for interface consistency).

Return type:

tuple[Tensor, Tensor]

Returns:

Tuple of (loss, metric).

CRC distribution classes#

class twiga.distributions.nn.residual_conformal.CRCDistribution(num_target_output=1, hidden_size=256, forecast_horizon=48, out_activation_function=None, sigma_loss_fn='hybrid', alpha=0.1, activation='ReLU')#

Bases: BaseDistribution

CRC head for standard backbones (e.g. MLPFNetwork).

The mean is computed by projecting the latent vector z through a linear layer, then reshaping to (B, H, O). The scale is predicted by a sigma layer applied to the detached mean (flattened), ensuring that gradients do not flow back into the backbone during sigma-only training.

Training follows a two-stage protocol:

Stage 1 — step(): optimise μ (backbone + mu_layer); σ is not updated.

Stage 2 — step_sigma(): freeze backbone, optimise the σ-head only.

Parameters:
  • num_target_output (int) – Number of output features per time step.

  • hidden_size (int) – Dimensionality of the backbone latent vector.

  • forecast_horizon (int) – Number of forecast time steps.

  • out_activation_function (Module | None) – Optional activation applied to the mean output.

  • sigma_loss_fn (str) – Calibration objective for σ. One of _SIGMA_LOSS_FNS. Default: "hybrid".

  • alpha (float) – Weight for MSE vs L1 in the mu and hybrid sigma losses.

  • activation (str) – Unused; kept for API compatibility.

Raises:

ValueError – If sigma_loss_fn is not one of the supported values.

forecast(z)#

Inference-only prediction (no gradients).

For "hybrid_sqrt" the scale is squared to convert from √|r| space back to residual magnitude space.

Return type:

dict[str, Tensor]

Returns:

Dict with "loc" and "scale" in prediction space.

forward(x)#

Predict mean and scale.

Parameters:

x (Tensor) – Latent tensor of shape (B, hidden_size).

Return type:

tuple[Tensor, Tensor]

Returns:

Tuple (mu, sigma) each of shape (B, forecast_horizon, num_target_output).

get_distribution(mu, sigma)#

Construct a torch distribution for interval generation.

Returns Laplace(μ, σ) for "laplace" and Normal(μ, σ) for all other modes, treating σ as an empirical scale estimate. For "hybrid_sqrt" callers must pass the forecast-adjusted (squared) σ.

Parameters:
  • mu (Tensor) – Mean tensor of shape (B, H, O).

  • sigma (Tensor) – Scale tensor in prediction space, same shape.

Return type:

Normal | Laplace

Returns:

Normal or Laplace distribution.

get_log_likelihood(mu, sigma, targets)#

Compute the sigma calibration loss (NLL for probabilistic modes).

Parameters:
  • mu (Tensor) – Predicted mean.

  • sigma (Tensor) – Predicted scale in learned space (as returned by forward).

  • targets (Tensor) – Ground-truth targets.

Return type:

Tensor

Returns:

Scalar loss tensor.

step(z, y, metric_fn, epoch=None)#

Stage-1 training step: optimise mean parameters.

Parameters:
  • z (Tensor) – Latent tensor from backbone.

  • y (Tensor) – Target values.

  • metric_fn (Callable[..., Tensor]) – Callable (pred, target) scalar.

  • epoch (int | None) – Current epoch (unused; kept for API uniformity).

Return type:

tuple[Tensor, Tensor]

Returns:

Tuple (loss, metric).

step_sigma(z, y, metric_fn)#

Stage-2 training step: optimise sigma with frozen backbone.

The mean is recomputed under no_grad so sigma learns to predict residual magnitude without influencing point predictions.

Parameters:
  • z (Tensor) – Latent tensor from backbone.

  • y (Tensor) – Target values.

  • metric_fn (Callable[..., Tensor]) – Callable (pred, target) scalar.

Return type:

tuple[Tensor, Tensor]

Returns:

Tuple (sigma_loss, metric).

class twiga.distributions.nn.residual_conformal.AdditiveCRCDistribution(num_target_output=1, hidden_size=None, hidden_dim=256, forecast_horizon=48, out_activation_function=None, sigma_loss_fn='hybrid_sqrt', alpha=0.5, sigma_dropout=0.05, activation='SiLU')#

Bases: CRCDistribution

CRC head for additive backbones (MLPGAMNetwork, MLPGAFNetwork).

Inherits all training and inference logic from CRCDistribution. Two architectural differences from the parent:

  1. mu_layer — identity (or optional activation): the backbone’s encode() already returns the additive mean as (B, H*O), so no linear projection is needed.

  2. sigma_layer — ResidualSigmaHead (two-layer MLP + LayerNorm + Dropout): a deeper network better suited to modeling conditional heteroscedasticity from the additive mean signal.

All other methods — step, step_sigma, forecast, get_distribution, get_log_likelihood — are inherited unchanged.

Parameters:
  • num_target_output (int) – Number of output features per time step.

  • hidden_dim (int) – Hidden dimension of the sigma MLP. Independent of H*O.

  • forecast_horizon (int) – Number of forecast time steps.

  • out_activation_function (Module | None) – Optional activation applied to the mean.

  • sigma_loss_fn (str) – Calibration objective (see CRCDistribution). Default: "hybrid_sqrt".

  • alpha (float) – MSE/L1 weight. Default: 0.5.

  • sigma_dropout (float) – Dropout rate in the sigma MLP.

  • activation (str) – Activation name for the sigma MLP. Default: "SiLU".

Raises:

ValueError – If sigma_loss_fn is not one of the supported values.

forward(x)#

Predict mean and scale.

Parameters:

x (Tensor) – Flattened additive mean from backbone, shape (B, H*O).

Return type:

tuple[Tensor, Tensor]

Returns:

Tuple (mu, sigma) each of shape (B, forecast_horizon, num_target_output).

Custom loss functions#

class twiga.distributions.nn.custom_loss.QuantileLoss(kappa=0.0, reduction='mean')#

Bases: Module

Quantile loss module for quantile regression.

Computes the pinball loss between predicted quantiles and target values.

Variables:
  • kappa – Smoothing parameter for softplus approximation.

  • reduction – Reduction method: ‘none’, ‘mean’, or ‘sum’.

Parameters:
  • kappa (float) – Smoothing parameter for softplus approximation (default: 0.0).

  • reduction (str) – Reduction method: ‘none’, ‘mean’, or ‘sum’ (default: ‘mean’).

__init__(kappa=0.0, reduction='mean')#

Initialize the QuantileLoss module.

Parameters:
  • kappa (float) – Smoothing parameter for softplus approximation (default: 0.0).

  • reduction (str) – Reduction method: ‘none’, ‘mean’, or ‘sum’ (default: ‘mean’).

Raises:

ValueError – If kappa < 0 or reduction is invalid.

forward(inputs, quantiles, targets)#

Compute the quantile loss.

Parameters:
  • inputs (Tensor) – Predicted quantiles of shape (B, N) where B is batch size and N is number of quantiles.

  • quantiles (Tensor) – Quantile levels of shape (N,).

  • targets (Tensor) – Ground truth targets of shape (B, 1) or (B, N).

Return type:

Tensor

Returns:

The computed quantile loss tensor, reduced according to the specified method.

class twiga.distributions.nn.custom_loss.QuantileHuberLoss(kappa=1.0, eps=1e-08, reduction='mean')#

Bases: Module

Quantile Huber loss module for robust quantile regression.

Computes the Huber loss adjusted for quantiles, less sensitive to outliers than L2 loss.

Variables:
  • kappa – Threshold for switching between L2 and L1 loss.

  • eps – Small value to prevent division by zero.

  • reduction – Reduction method: ‘none’, ‘mean’, or ‘sum’.

Parameters:
  • kappa (float) – Threshold for switching between L2 and L1 loss (default: 1.0).

  • eps (float) – Small value to prevent division by zero (default: 1e-8).

  • reduction (str) – Reduction method: ‘none’, ‘mean’, or ‘sum’ (default: ‘mean’).

__init__(kappa=1.0, eps=1e-08, reduction='mean')#

Initialize the QuantileHuberLoss module.

Parameters:
  • kappa (float) – Threshold for switching between L2 and L1 loss (default: 1.0).

  • eps (float) – Small value to prevent division by zero (default: 1e-8).

  • reduction (str) – Reduction method: ‘none’, ‘mean’, or ‘sum’ (default: ‘mean’).

Raises:

ValueError – If kappa <= 0 or eps <= 0.

forward(inputs, quantiles, targets)#

Compute the quantile Huber loss.

Parameters:
  • inputs (Tensor) – Predicted values of shape (N, Q, T, C).

  • quantiles (Tensor) – Quantile levels of shape (Q,).

  • targets (Tensor) – True target values of shape (N, Q, T, C).

Return type:

Tensor

Returns:

The computed quantile Huber loss tensor, reduced according to the specified method.

class twiga.distributions.nn.custom_loss.QuantileProposalLoss(reduction='none')#

Bases: Module

Quantile proposal loss module to ensure non-crossing quantiles.

Computes a loss to enforce that predicted quantiles adhere to specified levels and do not cross.

Variables:

reduction – Reduction method: ‘none’, ‘mean’, or ‘sum’.

Parameters:

reduction (str) – Reduction method: ‘none’, ‘mean’, or ‘sum’ (default: ‘none’).

__init__(reduction='none')#

Initialize the QuantileProposalLoss module.

Parameters:

reduction (str) – Reduction method: ‘none’, ‘mean’, or ‘sum’ (default: ‘none’).

forward(quantile, quantile_hats, taus)#

Compute the quantile proposal loss.

Parameters:
  • quantile (Tensor) – True quantile values of shape (N, Q, T, C).

  • quantile_hats (Tensor) – Predicted quantile values of shape (N, Q, T, C).

  • taus (Tensor) – Quantile levels of shape (N, Q, T).

Return type:

Tensor

Returns:

The computed quantile proposal loss tensor, reduced according to the specified method.

ML parametric models#

class twiga.models.ml.gausscatboost_model.GAUSSCATBOOSTConfig(**data)#

Bases: CATBOOSTConfig

Configuration for the Gaussian CatBoost probabilistic model.

Extends CATBOOSTConfig with:

  • Tighter hyperparameter search bounds for faster convergence.

  • od_type / od_wait for built-in overfitting detection.

  • virtual_ensembles_count to control the number of virtual ensemble snapshots used to estimate epistemic uncertainty.

Variables:
  • name – Fixed to "gausscatboost".

  • od_type – Overfitting detection strategy ("Iter" or "IncToDec"). Active only when an eval set is provided to GAUSSCATBOOSTModel.fit().

  • od_wait – Number of iterations without improvement before early stopping triggers.

  • virtual_ensembles_count – Number of virtual ensemble snapshots used by predict_with_uncertainty(). Not passed to CatBoostRegressor (excluded from model_dump).

domain: Literal['ml']#
model_config: ClassVar[ConfigDict] = {'extra': 'allow'}#

Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

name: Literal['gausscatboost']#
od_type: Literal['Iter', 'IncToDec']#
od_wait: int#
search_space: BaseSearchSpace#
virtual_ensembles_count: int#
class twiga.models.ml.gausscatboost_model.GAUSSCATBOOSTModel(model_config=None)#

Bases: BaseRegressor

Probabilistic CatBoost model predicting mean and uncertainty.

Uses loss_function="RMSEWithUncertainty" which maximises the log-likelihood of a Normal distribution, jointly learning the mean μ and aleatoric scale σ in a single model.

For multi-horizon forecasting (H output steps), one CatBoostRegressor is trained per output, yielding H models. This avoids MultiOutputRegressor while preserving support for virtual_ensembles_predict.

Uncertainty decomposition via predict_with_uncertainty() returns three components per output step:

  • mean - predicted mean (same as predict() μ output).

  • knowledge_uncertainty - epistemic (model) uncertainty estimated from variance across virtual ensemble snapshots.

  • data_uncertainty - aleatoric uncertainty encoded in the model’s second output (exp of predicted log-σ).

Parameters:

model_config (GAUSSCATBOOSTConfig | None) – Model configuration. Defaults to GAUSSCATBOOSTConfig.

Example:

model = GAUSSCATBOOSTModel()
model.fit(X_train, y_train)
mu, sigma = model.predict(X_test)
unc = model.predict_with_uncertainty(X_test)  # (B, L, H, 3)

fit(X, y, eval_set=None, verbose=False)#

Fit one RMSEWithUncertainty model per output step.

Parameters:
  • X (ndarray) – Shape (B, L, F) - batch × sequence × features.

  • y (ndarray) – Shape (B, L, H) - batch × sequence × horizons.

  • eval_set (tuple[ndarray, ndarray] | None) – Optional (X_val, y_val) for early stopping. When provided, od_type / od_wait from the config activate CatBoost’s overfitting detector.

  • verbose (bool) – Whether to print CatBoost training logs.

Return type:

GAUSSCATBOOSTModel

Returns:

Self for method chaining.

Raises:

ValueError – If X or y are not 3-dimensional.
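
A fitting sketch with the documented shapes; random arrays stand in for real features and targets:

import numpy as np
from twiga.models.ml.gausscatboost_model import GAUSSCATBOOSTModel

X_train, y_train = np.random.randn(256, 24, 6), np.random.randn(256, 24, 48)  # (B, L, F) / (B, L, H)
X_val, y_val = np.random.randn(64, 24, 6), np.random.randn(64, 24, 48)

model = GAUSSCATBOOSTModel()
# Providing eval_set activates the od_type / od_wait overfitting detector.
model.fit(X_train, y_train, eval_set=(X_val, y_val))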

models: list[catboost.CatBoostRegressor]#
num_targets: int | None#
predict(X)#

Predict mean (μ) and scale (σ) for all output steps.

Parameters:

X (ndarray) – Shape (B, L, F).

Return type:

tuple[ndarray, ndarray]

Returns:

Tuple (mu, sigma) each of shape (B, L, H). sigma = exp(log_sigma) is guaranteed positive.

Raises:

ValueError – If the model has not been fitted or X is not 3-D.

predict_with_uncertainty(X)#

Return mean, epistemic, and aleatoric uncertainty per output.

Uses virtual_ensembles_predict(prediction_type="TotalUncertainty") which - for models trained with RMSEWithUncertainty - returns three values per sample:

  • Column 0: mean prediction (μ).

  • Column 1: knowledge (epistemic) uncertainty - variance across virtual ensemble snapshots.

  • Column 2: data (aleatoric) uncertainty - derived from the predicted log-σ output.

Parameters:

X (ndarray) – Shape (B, L, F).

Return type:

ndarray

Returns:

Array of shape (B, L, H, 3) - last axis is [mean, knowledge_uncertainty, data_uncertainty].

Raises:

ValueError – If the model has not been fitted.
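
Continuing the class example, a sketch that folds the components into a total ~95% interval. It assumes both uncertainty columns are variances, so total variance is their sum; verify this against your CatBoost version before relying on it:

import numpy as np

unc = model.predict_with_uncertainty(X_test)                         # (B, L, H, 3)
mean, knowledge, data = unc[..., 0], unc[..., 1], unc[..., 2]
sigma_total = np.sqrt(knowledge + data)                              # epistemic + aleatoric
lower, upper = mean - 1.96 * sigma_total, mean + 1.96 * sigma_total  # ~95% band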

set_fit_request(*, eval_set='$UNCHANGED$', verbose='$UNCHANGED$')#

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters#

eval_set : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for eval_set parameter in fit.

verbose : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for verbose parameter in fit.

Returns#

self : object

The updated object.

set_score_request(*, sample_weight='$UNCHANGED$')#

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters#

sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in score.

Returns#

self : object

The updated object.

update(trial)#

Rebuild model from an Optuna trial’s suggested hyperparameters.

Parameters:

trial – Optuna trial object.

Return type:

None

virtual_ensembles_count: int#
class twiga.models.ml.ngboostnormal_model.NGBOOSTNORMALConfig(**data)#

Bases: BaseModelConfig

Configuration for the NGBoost Normal probabilistic forecasting model.

Uses natural gradient boosting with a Gaussian predictive distribution N(μ, σ²). Unlike the two-stage GAUSS* models, NGBoost jointly optimises μ and σ via the natural gradient of the chosen scoring rule, which tends to produce better-calibrated uncertainty estimates.

Variables:
  • name – Model identifier fixed to "ngboostnormal".

  • domain – Domain fixed to "ml". Excluded from tuning.

  • n_estimators – Number of boosting iterations.

  • learning_rate – Shrinkage applied to each tree.

  • minibatch_frac – Row-subsample fraction per iteration.

  • col_sample – Column-subsample fraction per iteration.

  • random_state – Seed for reproducibility.

  • score – Scoring rule - "LogScore" (MLE) or "CRPScore".

  • search_space – Hyperparameter search space for Optuna tuning.

Example

>>> from twiga.models.ml import NGBOOSTNORMALConfig
>>> cfg = NGBOOSTNORMALConfig(n_estimators=200, learning_rate=0.05)
col_sample: float#
domain: Literal['ml']#
learning_rate: float#
minibatch_frac: float#
model_config: ClassVar[ConfigDict] = {'extra': 'allow'}#

Configuration for the model; should be a dictionary conforming to pydantic's ConfigDict.

n_estimators: int#
name: Literal['ngboostnormal']#
random_state: int#
score: Literal['LogScore', 'CRPScore']#
search_space: BaseSearchSpace#
class twiga.models.ml.ngboostnormal_model.NGBOOSTNORMALModel(model_config=None)#

Bases: BaseNGBoostRegressor

NGBoost probabilistic model with a Normal (Gaussian) predictive distribution.

Wraps ngboost.NGBRegressor with Dist=Normal, training one regressor per flattened output column to support multi-horizon and multi-target forecasting.

The model jointly learns the conditional mean μ and standard deviation σ via natural gradient boosting on the selected scoring rule (log-likelihood or CRPS). Unlike GAUSSCATBOOSTModel, which uses CatBoost’s native RMSEWithUncertainty loss, NGBoost directly maximises the scoring rule via natural gradients.

Variables:

model_config – Instance of NGBOOSTNORMALConfig.

Parameters:

model_config (NGBOOSTNORMALConfig | None) – Model configuration. Defaults to NGBOOSTNORMALConfig.

Example

>>> import numpy as np
>>> from twiga.models.ml import NGBOOSTNORMALModel
>>> model = NGBOOSTNORMALModel()
>>> X = np.random.randn(100, 10, 5)
>>> y = np.random.randn(100, 48, 1)
>>> model.fit(X, y)
>>> loc, scale = model.predict(X)
>>> loc.shape, scale.shape
((100, 48, 1), (100, 48, 1))
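
For intuition, a hand-rolled sketch of the per-column pattern the wrapper automates, using ngboost directly on pre-flattened arrays (hyperparameters are illustrative):

import numpy as np
from ngboost import NGBRegressor
from ngboost.distns import Normal

X = np.random.randn(100, 50)              # tabular view of the flattened (seq_len, n_features)
y = np.random.randn(100, 48)              # flattened (horizon * n_targets) columns

models = [NGBRegressor(Dist=Normal, n_estimators=50, verbose=False).fit(X, y[:, j])
          for j in range(y.shape[1])]
dists = [m.pred_dist(X) for m in models]  # each exposes .loc and .scale arrays
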
forecast(x)#

Return Normal distribution parameters.

Parameters:

x (ndarray) – Input features of shape (n_samples, seq_len, n_features).

Return type:

dict

Returns:

Dictionary with

  • "loc": conditional mean μ, shape (n_samples, horizon, n_targets).

  • "scale": conditional std-dev σ, shape (n_samples, horizon, n_targets).

predict(X)#

Predict μ and σ for each forecast horizon.

Parameters:

X (ndarray) – Input features of shape (n_samples, seq_len, n_features).

Return type:

tuple[ndarray, ndarray]

Returns:

Tuple (loc, scale) - conditional mean μ and standard deviation σ, each of shape (n_samples, horizon, n_targets).

set_fit_request(*, eval_set='$UNCHANGED$', verbose='$UNCHANGED$')#

Configure metadata routing for the fit method. Inherited unchanged from scikit-learn; the options and semantics are identical to GAUSSCATBOOSTModel.set_fit_request() documented above.

set_score_request(*, sample_weight='$UNCHANGED$')#

Configure metadata routing for the score method. Inherited unchanged from scikit-learn; the options and semantics are identical to GAUSSCATBOOSTModel.set_score_request() documented above.

class twiga.models.ml.ngboostlognormal_model.NGBOOSTLOGNORMALConfig(**data)#

Bases: BaseModelConfig

Configuration for the NGBoost LogNormal probabilistic forecasting model.

Uses natural gradient boosting with a log-normal predictive distribution. Suitable for strictly positive targets such as solar irradiance, wind speed, electricity price, or non-negative load.

Follows the scipy.stats.lognorm parameter convention:

  • scale = exp(μ_log) - geometric mean (location-like quantity).

  • s = σ_log - standard deviation in log-space (shape parameter).

Variables:
  • name – Model identifier fixed to "ngboostlognormal".

  • domain – Domain fixed to "ml". Excluded from tuning.

  • n_estimators – Number of boosting iterations.

  • learning_rate – Shrinkage applied to each tree.

  • minibatch_frac – Row-subsample fraction per iteration.

  • col_sample – Column-subsample fraction per iteration.

  • random_state – Seed for reproducibility.

  • score – Scoring rule - "LogScore" (MLE) or "CRPScore".

  • search_space – Hyperparameter search space for Optuna tuning.

Example

>>> from twiga.models.ml import NGBOOSTLOGNORMALConfig
>>> cfg = NGBOOSTLOGNORMALConfig(n_estimators=300, score="CRPScore")
col_sample: float#
domain: Literal['ml']#
learning_rate: float#
minibatch_frac: float#
model_config: ClassVar[ConfigDict] = {'extra': 'allow'}#

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

n_estimators: int#
name: Literal['ngboostlognormal']#
random_state: int#
score: Literal['LogScore', 'CRPScore']#
search_space: BaseSearchSpace#
class twiga.models.ml.ngboostlognormal_model.NGBOOSTLOGNORMALModel(model_config=None)#

Bases: BaseNGBoostRegressor

NGBoost probabilistic model with a LogNormal predictive distribution.

Wraps ngboost.NGBRegressor with Dist=LogNormal, training one regressor per flattened output column for multi-horizon / multi-target support.

Parameter convention (follows scipy.stats.lognorm):

  • "loc" - corresponds to scipy's scale = exp(μ_log), the geometric mean.

  • "scale" - corresponds to scipy's s = σ_log, the log-space std-dev.

Use this model when the target variable is strictly positive and its logarithm is approximately Gaussian (e.g. solar irradiance, wind power, non-negative energy prices).

Variables:

model_config – Instance of NGBOOSTLOGNORMALConfig.

Parameters:

model_config (NGBOOSTLOGNORMALConfig | None) – Model configuration. Defaults to NGBOOSTLOGNORMALConfig.

Example

>>> import numpy as np
>>> from twiga.models.ml import NGBOOSTLOGNORMALModel
>>> model = NGBOOSTLOGNORMALModel()
>>> X = np.random.randn(100, 10, 5)
>>> y = np.abs(np.random.randn(100, 48, 1)) + 0.1  # strictly positive
>>> model.fit(X, y)
>>> loc, scale = model.predict(X)
>>> loc.shape, scale.shape
((100, 48, 1), (100, 48, 1))
forecast(x)#

Return LogNormal distribution parameters.

Parameters:

x (ndarray) – Input features of shape (n_samples, seq_len, n_features).

Return type:

dict

Returns:

Dictionary with

  • "loc": geometric mean exp(μ_log), always > 0, shape (n_samples, horizon, n_targets).

  • "scale": log-space std-dev σ_log, shape (n_samples, horizon, n_targets).

predict(X)#

Predict geometric mean and log-space std-dev for each forecast horizon.

Parameters:

X (ndarray) – Input features of shape (n_samples, seq_len, n_features).

Return type:

tuple[ndarray, ndarray]

Returns:

Tuple (loc, scale) where:

  • loc = exp(μ_log) - geometric mean (strictly positive), shape (n_samples, horizon, n_targets).

  • scale = σ_log - standard deviation in log-space, shape (n_samples, horizon, n_targets).
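
Because this convention matches scipy.stats.lognorm, the returned arrays plug directly into it - a sketch continuing the class example:

from scipy.stats import lognorm

loc, scale = model.predict(X)       # loc = exp(mu_log), scale = sigma_log
dist = lognorm(s=scale, scale=loc)  # scipy convention: s = shape, scale = exp(mu_log)
median = dist.median()              # equals loc: the median of a lognormal is exp(mu_log)
p95 = dist.ppf(0.95)                # strictly positive upper quantile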

set_fit_request(*, eval_set='$UNCHANGED$', verbose='$UNCHANGED$')#

Configure metadata routing for the fit method. Inherited unchanged from scikit-learn; the options and semantics are identical to GAUSSCATBOOSTModel.set_fit_request() documented above.

set_score_request(*, sample_weight='$UNCHANGED$')#

Configure metadata routing for the score method. Inherited unchanged from scikit-learn; the options and semantics are identical to GAUSSCATBOOSTModel.set_score_request() documented above.

class twiga.models.ml.ngboostexponential_model.NGBOOSTEXPONENTIALConfig(**data)#

Bases: BaseModelConfig

Configuration for the NGBoost Exponential probabilistic forecasting model.

Uses natural gradient boosting with an exponential predictive distribution. The exponential distribution has a single parameter scale = 1 / λ, where λ is the rate. Both the mean and the standard deviation equal scale.

Suited for non-negative targets exhibiting exponential decay or inter-arrival times - e.g. rare demand events, sparse consumption spikes, or inter-event durations in energy systems.

Variables:
  • name – Model identifier fixed to "ngboostexponential".

  • domain – Domain fixed to "ml". Excluded from tuning.

  • n_estimators – Number of boosting iterations.

  • learning_rate – Shrinkage applied to each tree.

  • minibatch_frac – Row-subsample fraction per iteration.

  • col_sample – Column-subsample fraction per iteration.

  • random_state – Seed for reproducibility.

  • score – Scoring rule - "LogScore" (MLE) or "CRPScore".

  • search_space – Hyperparameter search space for Optuna tuning.

Example

>>> from twiga.models.ml import NGBOOSTEXPONENTIALConfig
>>> cfg = NGBOOSTEXPONENTIALConfig(n_estimators=400, score="CRPScore")
col_sample: float#
domain: Literal['ml']#
learning_rate: float#
minibatch_frac: float#
model_config: ClassVar[ConfigDict] = {'extra': 'allow'}#

Configuration for the model; should be a dictionary conforming to pydantic's ConfigDict.

n_estimators: int#
name: Literal['ngboostexponential']#
random_state: int#
score: Literal['LogScore', 'CRPScore']#
search_space: BaseSearchSpace#
class twiga.models.ml.ngboostexponential_model.NGBOOSTEXPONENTIALModel(model_config=None)#

Bases: BaseNGBoostRegressor

NGBoost probabilistic model with an Exponential predictive distribution.

Wraps ngboost.NGBRegressor with Dist=Exponential, training one regressor per flattened output column for multi-horizon / multi-target support.

The exponential distribution has one parameter, scale = 1 / λ, which equals both the mean and the standard deviation. Both "loc" and "scale" in the returned forecast dict are set to this parameter.

Use this model for non-negative targets with memoryless, decay-like behaviour - inter-arrival durations, sparse demand spikes, or short-term outage durations.

Variables:

model_config – Instance of NGBOOSTEXPONENTIALConfig.

Parameters:

model_config (NGBOOSTEXPONENTIALConfig | None) – Model configuration. Defaults to NGBOOSTEXPONENTIALConfig.

Example

>>> import numpy as np
>>> from twiga.models.ml import NGBOOSTEXPONENTIALModel
>>> model = NGBOOSTEXPONENTIALModel()
>>> X = np.random.randn(100, 10, 5)
>>> y = np.random.exponential(scale=2.0, size=(100, 48, 1))
>>> model.fit(X, y)
>>> loc, scale = model.predict(X)
>>> loc.shape, scale.shape
((100, 48, 1), (100, 48, 1))
forecast(x)#

Return Exponential distribution parameters.

Parameters:

x (ndarray) – Input features of shape (n_samples, seq_len, n_features).

Return type:

dict

Returns:

Dictionary with

  • "loc": predicted scale = 1/λ (the mean), always > 0, shape (n_samples, horizon, n_targets).

  • "scale": same as "loc" - for the exponential distribution, mean and std-dev are identical.

predict(X)#

Predict the exponential scale parameter (1 / λ) for each forecast horizon.

Parameters:

X (ndarray) – Input features of shape (n_samples, seq_len, n_features).

Return type:

tuple[ndarray, ndarray]

Returns:

Tuple (loc, scale) where both arrays equal the predicted scale = 1 / λ (the mean and std-dev of the exponential distribution), each of shape (n_samples, horizon, n_targets).
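
A sketch of deriving quantiles from the prediction via scipy.stats.expon, which uses the same scale convention (continuing the class example):

from scipy.stats import expon

loc, scale = model.predict(X)  # both arrays equal the predicted scale = 1 / lambda
dist = expon(scale=scale)
median = dist.median()         # = scale * ln(2), well below the mean for this skewed law
p95 = dist.ppf(0.95)           # = scale * ln(20), roughly 3 * scale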

set_fit_request(*, eval_set='$UNCHANGED$', verbose='$UNCHANGED$')#

Configure metadata routing for the fit method. Inherited unchanged from scikit-learn; the options and semantics are identical to GAUSSCATBOOSTModel.set_fit_request() documented above.

set_score_request(*, sample_weight='$UNCHANGED$')#

Configure metadata routing for the score method. Inherited unchanged from scikit-learn; the options and semantics are identical to GAUSSCATBOOSTModel.set_score_request() documented above.

ML distribution utilities#

twiga.distributions.ml.utils.interpolate_quantile(predictions, sorted_quantiles, target_quantile=0.5)#

Interpolate predictions at a target quantile using linear interpolation.

Parameters:
  • predictions (ndarray[tuple[Any, ...], dtype[double]]) – Array of predictions with shape (B, Q, C), where B is batch size, Q is number of quantiles, and C is number of channels.

  • sorted_quantiles (list[float] | ndarray[tuple[Any, ...], dtype[double]]) – Sorted list or array of quantile levels (e.g., [0.1, 0.3, 0.7]).

  • target_quantile (float) – The target quantile to interpolate (default: 0.5 for median).

Return type:

ndarray[tuple[Any, ...], dtype[double]]

Returns:

Interpolated predictions at target_quantile with shape (B, C).

Raises:

ValueError – If sorted_quantiles is empty, predictions has incorrect shape, number of quantiles does not match predictions, quantiles are not sorted, or target_quantile is outside the range of sorted_quantiles.
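
A minimal sketch of the documented semantics built on np.interp - not the library implementation, which additionally performs the validation listed under Raises:

import numpy as np

def interpolate_quantile_sketch(predictions, sorted_quantiles, target_quantile=0.5):
    B, Q, C = predictions.shape
    out = np.empty((B, C))
    for b in range(B):
        for c in range(C):
            # np.interp performs the piecewise-linear interpolation along the Q axis
            out[b, c] = np.interp(target_quantile, sorted_quantiles, predictions[b, :, c])
    return out

preds = np.array([[[1.0], [2.0], [4.0]]])                   # (B=1, Q=3, C=1)
print(interpolate_quantile_sketch(preds, [0.1, 0.3, 0.7]))  # 0.5 sits halfway 0.3 -> 0.7: [[3.]]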

twiga.distributions.ml.utils.get_median_prediction(predictions, quantiles)#

Compute the median prediction (quantile = 0.5) for given predictions.

Parameters:
  • predictions (ndarray[tuple[Any, ...], dtype[double]]) – Array of predictions with shape (B, Q, C).

  • quantiles (list[float] | ndarray[tuple[Any, ...], dtype[double]]) – Quantile levels corresponding to the Q axis of predictions.

Return type:

ndarray[tuple[Any, ...], dtype[double]]

Returns:

Median predictions with shape (B, C).

Raises:

ValueError – If quantiles is empty, predictions has incorrect shape, or number of quantiles does not match predictions.

twiga.distributions.ml.utils.get_sigma_prediction(predictions, quantiles)#

Compute sigma using predictions at quantiles 0.25 and 0.75.

Sigma is calculated as (upper_prediction - lower_prediction) / (2 * 0.6745), where upper_prediction and lower_prediction are at quantiles 0.75 and 0.25, respectively, and 0.6745 is the z-score for the 25th/75th percentiles.

Parameters:
  • predictions (ndarray[tuple[Any, ...], dtype[double]]) – Array of predictions with shape (B, Q, C).

  • quantiles (list[float] | ndarray[tuple[Any, ...], dtype[double]]) – Quantile levels corresponding to the Q axis of predictions.

Return type:

ndarray[tuple[Any, ...], dtype[double]]

Returns:

Sigma predictions with shape (B, C).

Raises:

ValueError – If quantiles is empty, predictions has incorrect shape, number of quantiles does not match predictions, or interpolation is not possible for quantiles 0.25 or 0.75.
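
A quick sanity check of the IQR-to-sigma conversion: for a standard normal, the formula should recover sigma = 1.

from scipy.stats import norm

q25, q75 = norm.ppf(0.25), norm.ppf(0.75)  # -0.6745 and +0.6745
sigma = (q75 - q25) / (2 * 0.6745)
print(round(float(sigma), 4))              # ~1.0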