Distribution Families#
Source Files
twiga/distributions/nn/parametric.pytwiga/distributions/nn/quantile.pytwiga/distributions/nn/fpquantile.pytwiga/distributions/nn/residual_conformal.pytwiga/models/nn/prob/core.pytwiga/models/ml/gausscatboost_model.py-GAUSSCATBOOSTConfig/GAUSSCATBOOSTModeltwiga/models/ml/ngboostnormal_model.py-NGBOOSTNORMALConfig/NGBOOSTNORMALModeltwiga/models/ml/ngboostlognormal_model.py-NGBOOSTLOGNORMALConfig/NGBOOSTLOGNORMALModeltwiga/models/ml/ngboostexponential_model.py-NGBOOSTEXPONENTIALConfig/NGBOOSTEXPONENTIALModeltwiga/models/ml/prob/base_ngboost.py-BaseNGBoostRegressortwiga/distributions/ml/utils.py
Twiga implements several distribution families for probabilistic neural forecasting. All NN distributions slot into the composable backbone/head architecture described below.
Backbone/Head Architecture#
Probabilistic neural models in Twiga decouple the backbone (which learns to encode the time series) from the distribution head (which maps the latent vector to a forecast). This separation lives in twiga/models/nn/prob/core.py.
graph LR
X["Input (B, T, F)"] --> BB["Backbone\n(NHITSNetwork / MLPFNetwork / …)"]
BB -->|"encode(x) → z (B, encode_size)"| HEAD["Distribution Head\n(Normal / QR / …)"]
HEAD --> OUT["Distribution Parameters\n(loc, scale) / quantiles / …"]
Backbone contract - any backbone must implement:
encode(x) → Tensor- maps raw input(B, T, F)to latent(B, encode_size)encode_size: intproperty - width of the latent vectorpenalty_dict(epoch) → dict- returns regularisation terms (Lasso, gate sparsity …)
Head contract - any distribution head must implement:
forward(z) → tuple- maps latentzto distribution parametersstep(z, y, metric_fn, epoch) → (loss, metric)- computes training loss in one call
ProbabilisticModel (a LightningModule) wires them together and automatically injects hidden_size=backbone.encode_size into the head constructor so callers never specify it manually.
# Under the hood - TwigaForecaster does this for you when you pass a config like MLPFNormalConfig
from twiga.models.nn.prob.core import ProbabilisticModel
from twiga.models.nn.net.mlpf import MLPFNetwork
from twiga.distributions.nn.parametric import NormalDistribution
model = ProbabilisticModel(
backbone_cls=MLPFNetwork,
backbone_kwargs={...}, # num_target_feature, forecast_horizon, etc.
head_cls=NormalDistribution,
head_kwargs={"num_target_output": 1, "forecast_horizon": 48},
# hidden_size injected automatically from backbone.encode_size
)
Adding a new distribution requires implementing only the head interface - the backbone and training loop are unchanged.
NN Distributions#
Parametric Distributions#
All parametric heads live in twiga/distributions/nn/parametric.py. Each maps the latent vector z (B, encode_size) to two (or three for Student-T) distribution parameters and trains via negative log-likelihood.
Class |
Distribution |
Output params |
Constraint |
Best for |
|---|---|---|---|---|
|
Normal |
|
σ > 0 |
Symmetric, unbounded targets |
|
Laplace |
|
b > 0 |
Heavy-tailed, outlier-robust |
|
LogNormal |
|
σ > 0 |
Strictly positive, right-skewed |
|
Gamma |
|
α, β > 0 |
Strictly positive, flexible skew |
|
Beta |
|
α, β > 0 |
Bounded [0, 1] targets |
|
Student-T |
|
σ > 0, ν > 2 |
Very heavy tails |
All share the same interface:
from twiga.distributions.nn.parametric import NormalDistribution
dist = NormalDistribution(
num_target_output=1,
hidden_size=256, # injected automatically by ProbabilisticModel
forecast_horizon=48,
out_activation_function=None,
)
# Forward pass → distribution parameters
loc, scale = dist(z) # z shape: (B, encode_size)
# Get PyTorch distribution object
d = dist.get_distribution(loc, scale)
# Negative log-likelihood for training
nll = dist.get_log_likelihood(loc, scale, targets)
# training_step equivalent
loss, metric = dist.step(z, targets, metric_fn, epoch)
Student-T requires min_df
StudentTDistribution accepts an additional min_df: float parameter (default 2.1) that clips the predicted degrees-of-freedom from below to ensure finite variance.
Quantile Distribution#
The QRDistribution class (twiga/distributions/nn/quantile.py) outputs quantile predictions from a neural network.
from twiga.distributions.nn.quantile import QRDistribution
dist = QRDistribution(
quantiles=[0.05, 0.25, 0.5, 0.75, 0.95],
num_outputs=1,
hidden_size=256,
horizon=48,
loss_fn="pinball", # or "huber-pinball"
kappa=0.5,
conf_level=0.05,
)
# Forward pass: shape (batch, num_quantiles, horizon, num_outputs)
quantile_values = dist(hidden_repr)
# Forecast: returns (loc, quantile_values)
loc, quantiles = dist.forecast(hidden_repr)
Parameter |
Type |
Default |
Description |
|---|---|---|---|
|
|
|
Quantile levels to predict |
|
|
|
Number of target variables |
|
|
|
Input hidden dimension |
|
|
|
Forecast horizon |
|
|
|
Loss function: |
|
|
|
Huber transition parameter |
|
|
|
Confidence level for bounds |
|
|
|
Numerical stability |
|
|
|
Optional output activation |
The step() method computes quantile loss and metrics during training:
loss, metric = dist.step(batch, metric_fn, epoch)
Median Extraction#
from twiga.distributions.nn.quantile import get_median_quantile
# Extract median from quantile predictions
median = get_median_quantile(quantile_hats, probs=[0.05, 0.25, 0.5, 0.75, 0.95])
# Shape: (batch, horizon, num_targets)
If \(\tau = 0.5\) is in the quantile list, it is extracted directly. Otherwise, linear interpolation between the two closest quantiles is used.
Full Parameterised Quantile Regression (FPQR)#
FPQRDistribution (twiga/distributions/nn/fpquantile.py) extends quantile regression by treating the quantile levels themselves as learnable. Rather than predicting at a fixed grid of quantiles, a QuantileProposal network proposes the quantile levels adaptively from the latent representation, which are then encoded via CosinetauEmbedding before producing forecast values.
References
A. Faustine and A. Moshi, “Full Parameterized Quantile Function for Probabilistic Forecasting,” IEEE Access, 2024. DOI: 10.1109/ACCESS.2024.3402001
A. Faustine and L. Aversano, “Adaptive Quantile Learning for Long-Term Probabilistic Forecasting,” IEEE Transactions, 2022. DOI: 10.1109/ACCESS.2022.3143387
graph LR
Z["z (B, hidden_size)"] --> QP["QuantileProposal\n→ taus, tau_hats, entropies"]
Z --> CE["CosinetauEmbedding\n(cosine τ embedding)"]
QP --> CE
CE --> QO["quantile_output\n→ (B, n_quantiles, horizon, outputs)"]
QP --> LOSS["QuantileProposalLoss\n+ QuantileLoss − entropy"]
QuantileProposal generates the quantile grid from the backbone’s latent vector:
Parameter |
Type |
Default |
Description |
|---|---|---|---|
|
|
|
Number of quantile levels to propose |
|
|
|
Input latent dimension |
|
|
|
Dropout for regularisation |
|
|
|
Clamps proposals to |
CosinetauEmbedding maps proposed tau values to a fixed-size embedding using cosine basis functions:
Parameter |
Type |
Default |
Description |
|---|---|---|---|
|
|
|
Number of cosine basis functions |
|
|
|
Output embedding dimension |
|
|
|
Scales input size (should match horizon) |
FPQRDistribution combines proposal + cosine embedding + output projection:
Parameter |
Type |
Default |
Description |
|---|---|---|---|
|
|
|
Quantile levels (None → 9) |
|
|
|
Number of target variables |
|
|
|
Latent dimension from backbone |
|
|
|
Forecast horizon |
|
|
|
Dropout in proposal network |
|
|
|
Significance level for interval bounds |
|
|
|
Huber transition parameter |
|
|
|
Cosine basis functions (0 to use |
|
|
|
|
from twiga.distributions.nn.fpquantile import FPQRDistribution
dist = FPQRDistribution(
n_quantiles=9,
num_outputs=1,
hidden_size=256,
horizon=48,
conf_level=0.05,
loss_fn="pinball",
)
# Training step: loss = quantile_loss + proposal_loss − entropy
loss, metric = dist.step(z, y, metric_fn, epoch)
# Inference: returns dict with loc, quantiles, quantile_levels, conf_level
result = dist.forecast(z)
# result["loc"] shape: (B, horizon, num_outputs)
# result["quantiles"] shape: (B, n_quantiles, horizon, num_outputs)
# result["quantile_levels"] shape: (B, n_quantiles, horizon) ← per-sample proposal
FPQR vs QR
QRDistribution predicts at a fixed, user-specified quantile grid. FPQRDistribution learns the quantile levels jointly with the forecast values, adapting the grid per sample. The proposal loss plus entropy regularisation encourages diverse, well-spread coverage. Use FPQR when you want the model to discover the most informative quantile levels automatically.
The training loss combines three terms:
where \(\mathbb{H}\) is the entropy of the proposed quantile distribution, acting as a diversity regulariser.
Conditional Residual Calibration (CRC)#
CRC heads (twiga/distributions/nn/residual_conformal.py) learn a point forecast \(\mu\) and a per-step scale \(\sigma\) end-to-end — no post-hoc conformal calibration step required. The scale MLP is trained to approximate the absolute residual \(|y - \mu|\), giving prediction intervals of the form \([\mu \pm z_\alpha \sigma]\) at inference time.
Two variants exist, differing only in how \(\mu\) is computed:
Class |
Backbone family |
Mu path |
|---|---|---|
|
MLPF, NHiTS, RNN |
Linear projection from latent |
|
MLPGAM, MLPGAF |
|
Loss function:
The joint training objective is:
The sigma calibration loss \(\mathcal{L}_\sigma\) is selected via sigma_loss_fn:
|
Formula |
Use when |
|---|---|---|
|
\(\alpha \cdot \text{MSE}(\sigma, \sqrt{r}) + (1-\alpha) \cdot \text{MAE}(\sigma, \sqrt{r})\) |
General purpose; variance-stabilised residual |
|
\(\alpha \cdot \text{MSE}(\sigma, r) + (1-\alpha) \cdot \text{MAE}(\sigma, r)\) |
General purpose; same \(\alpha\) as mu loss |
|
\(\frac{1}{2}\left(\frac{r^2}{\sigma^2} + \log \sigma^2\right)\) |
Heteroscedastic Gaussian NLL |
|
\(\frac{r}{\sigma} + \log(2\sigma)\) |
Robust to heavy-tailed residuals |
|
\((\sigma - r)^2\) |
Distribution-free MSE on residual magnitude |
|
\(\log(1 + (\sigma - r)^2)\) |
Robust MSE with log-damping of large errors |
where \(r = |y - \mu|\).
Two-stage training (MLPGAMCRC default, controlled by two_stage=True):
graph LR
S1["Stage 1\nBackbone + mu_layer\n(sigma_layer frozen)"] --> S2["Stage 2\nSigma MLP only\n(backbone frozen)"]
In stage 1 the backbone and mu path are updated via the combined \(\mathcal{L}_\mu + \mathcal{L}_\sigma\) loss. In stage 2 only sigma_layer parameters receive gradients — the backbone is frozen by Lightning’s toggle_optimizer. This prevents the scale head from distorting the point forecast.
from twiga.distributions.nn.residual_conformal import CRCDistribution, AdditiveCRCDistribution
# Standard backbone (MLPF, NHiTS)
crc = CRCDistribution(
num_target_output=1,
hidden_size=256, # injected from backbone.encode_size
forecast_horizon=48,
sigma_loss_fn="hybrid_sqrt", # "hybrid_sqrt" | "hybrid" | "gaussian" | "laplace" | "mse" | "log"
alpha=0.1,
)
mu, sigma = crc(z) # z: (B, hidden_size)
loss = crc.get_log_likelihood(mu, sigma, y) # sigma calibration loss only (not point loss)
# Sigma-only step (stage 2 of two-stage training)
loss, metric = crc.step_sigma(z, y, metric_fn)
# Additive backbone (MLPGAM, MLPGAF) — mu_layer is Identity, no linear projection
add_crc = AdditiveCRCDistribution(
num_target_output=1,
hidden_dim=256, # internal sigma MLP width (independent of H*O)
forecast_horizon=48,
)
mu, sigma = add_crc(z) # z shape (B, H*O); mu = reshape(z), sigma from ResidualSigmaHead
CRC vs post-hoc conformal
Post-hoc conformal prediction (see Conformal Prediction) wraps any already-trained model and provides finite-sample coverage guarantees. CRC bakes the calibration objective directly into training — no separate calibration split needed — but the coverage guarantee is empirical rather than distribution-free. CRC typically produces tighter intervals when the signal-to-noise ratio is high.
ML Parametric Models#
Twiga provides two families of parametric ML models that output full distribution parameters rather than just point forecasts.
GaussCatBoost#
GAUSSCATBOOSTModel uses CatBoost’s built-in RMSEWithUncertainty loss, which jointly learns the mean μ and aleatoric scale σ in a single model. One CatBoostRegressor is trained per output step, and virtual ensemble snapshots provide epistemic uncertainty decomposition.
Model |
Name |
Config Class |
Output |
|---|---|---|---|
Gaussian CatBoost |
|
|
(μ, σ) per horizon step |
GAUSSCATBOOSTConfig extends CATBOOSTConfig with:
Field |
Type |
Default |
Description |
|---|---|---|---|
|
|
|
Model identifier. |
|
|
|
Overfitting detection strategy; active only when an eval set is provided to |
|
|
|
Iterations without improvement before early stopping triggers. |
|
|
|
Virtual ensemble snapshots for epistemic uncertainty estimation via |
from twiga.models.ml.gausscatboost_model import GAUSSCATBOOSTConfig
config = GAUSSCATBOOSTConfig(
iterations=500,
learning_rate=0.05,
task_type="CPU",
virtual_ensembles_count=20,
)
Uncertainty decomposition
predict_with_uncertainty() returns three components per output step: mean (predicted μ), knowledge_uncertainty (epistemic — variance across virtual ensemble snapshots), and data_uncertainty (aleatoric — the model’s second output, exp of predicted log-σ).
NGBoost#
NGBoost (Natural Gradient Boosting) fits all distribution parameters simultaneously via the natural gradient of a proper scoring rule. Unlike the two-stage GAUSSCATBOOSTModel, which fits mean and variance separately, NGBoost optimises both jointly. All models extend BaseNGBoostRegressor and train one NGBRegressor per flattened output column.
Model |
Name |
Config Class |
Distribution |
Best for |
|---|---|---|---|---|
NGBoost Normal |
|
|
Normal N(μ, σ²) |
Symmetric, unbounded targets |
NGBoost LogNormal |
|
|
LogNormal |
Strictly positive, right-skewed targets |
NGBoost Exponential |
|
|
Exponential (1/λ) |
Non-negative, memoryless processes |
NGBoost vs two-stage Gaussian
NGBoost jointly optimises μ and σ via natural gradient, producing better-calibrated intervals. The two-stage GAUSSCATBOOSTModel (RMSEWithUncertainty) is faster and scales better on very large datasets.
NGBOOSTNORMALModel#
Fits a Gaussian distribution N(μ, σ²) jointly optimising mean and standard deviation.
NGBOOSTNORMALConfig fields:
Field |
Type |
Default |
Description |
|---|---|---|---|
|
|
|
Model identifier. |
|
|
|
Domain identifier. |
|
|
|
Number of boosting rounds. |
|
|
|
Shrinkage applied to each boosting step. |
|
|
|
Fraction of training data used per round. |
|
|
|
Fraction of features sampled per tree. |
|
|
|
Random seed. |
|
|
|
Scoring rule for training objective. |
Output: (loc, scale) = (mean μ, standard deviation σ).
from twiga.models.ml.ngboostnormal_model import NGBOOSTNORMALConfig
config = NGBOOSTNORMALConfig(n_estimators=300, learning_rate=0.05)
NGBOOSTLOGNORMALModel#
Fits a LogNormal distribution, appropriate for strictly positive targets such as solar irradiance or non-negative electricity prices.
NGBOOSTLOGNORMALConfig shares all fields with NGBOOSTNORMALConfig with name="ngboostlognormal".
Output: (loc, scale) = (geometric mean exp(μ_log), log-space standard deviation σ_log).
Interpreting LogNormal output
loc is the geometric mean (the exponentiated log-space mean), not the arithmetic mean. To recover the arithmetic mean use exp(log(loc) + scale² / 2). For prediction intervals, exponentiate the log-space quantiles.
from twiga.models.ml.ngboostlognormal_model import NGBOOSTLOGNORMALConfig
config = NGBOOSTLOGNORMALConfig(n_estimators=300, score="CRPScore")
NGBOOSTEXPONENTIALModel#
Fits an Exponential distribution parameterised by its scale (1/λ). Suitable for non-negative targets with a memoryless property.
NGBOOSTEXPONENTIALConfig shares all fields with NGBOOSTNORMALConfig with name="ngboostexponential".
Output: Both loc and scale equal the predicted scale parameter 1/λ (the Exponential distribution has a single parameter).
Single-parameter distribution
The Exponential distribution is fully characterised by its scale (1/λ), so both loc and scale in the output tuple carry the same value. Mean and standard deviation are both equal to 1/λ.
from twiga.models.ml.ngboostexponential_model import NGBOOSTEXPONENTIALConfig
config = NGBOOSTEXPONENTIALConfig(n_estimators=400)
LogScore vs CRPScore
LogScore maximises the predictive log-likelihood — faster and often better-calibrated for in-distribution data. CRPScore (Continuous Ranked Probability Score) is more robust to distributional shift and outliers but typically slower to converge.
ML Distribution Utilities#
The twiga/distributions/ml/utils.py module provides utilities for post-processing quantile predictions from ML models.
interpolate_quantile()#
Linearly interpolates to extract a specific quantile from a set of predicted quantiles.
from twiga.distributions.ml.utils import interpolate_quantile
# predictions shape: (batch, num_quantiles, num_targets)
q30 = interpolate_quantile(
predictions=predictions,
sorted_quantiles=[0.05, 0.25, 0.5, 0.75, 0.95],
target_quantile=0.3,
)
# Output shape: (batch, num_targets)
get_median_prediction()#
Extracts the median prediction (quantile 0.5).
from twiga.distributions.ml.utils import get_median_prediction
median = get_median_prediction(predictions, quantiles=[0.05, 0.25, 0.5, 0.75, 0.95])
get_sigma_prediction()#
Estimates the standard deviation from the interquartile range:
from twiga.distributions.ml.utils import get_sigma_prediction
sigma = get_sigma_prediction(predictions, quantiles=[0.05, 0.25, 0.5, 0.75, 0.95])
Note
The constant 0.6745 is the 75th percentile of the standard normal distribution. This conversion assumes approximate normality of the residuals.
How Distributions Connect to Models#
graph TD
subgraph "Point Models"
A[CatBoost / XGBoost / LightGBM / MLPF / NHITS] --> B["predict() → loc array"]
end
subgraph "Parametric NN Models"
C["ProbabilisticModel\n(backbone + head)"] --> D["predict() → (loc, scale, …)"]
D --> E["NormalDistribution / LaplaceDistribution\nGammaDistribution / BetaDistribution / …"]
end
subgraph "Quantile Models"
G[QRCatBoost / QRXGBoost / QRLightGBM] --> H["predict() → quantile array"]
I["MLPFQR / NHITSQRModel / …"] --> H
H --> J[QRDistribution / FPQRDistribution]
end
subgraph "ML Parametric (Gaussian)"
K["GAUSSCATBOOSTModel"] --> L["predict() → (μ, σ)"]
end
B --> CP[Conformal Prediction]
D --> CP
H --> CP
L --> CP
CP --> PI[Prediction Intervals]
Point models produce a single prediction per time step and can be wrapped with split conformal prediction.
Parametric NN models (via ProbabilisticModel) produce full distribution parameters and train via NLL. They can be used stand-alone or combined with conformal residual fitting.
Quantile models produce multiple quantile predictions and can be used with conformal quantile regression or taken directly as prediction intervals.
ML Usage Example#
import pandas as pd
from twiga.core.config import DataPipelineConfig, ExperimentConfig
from twiga.forecaster.core import TwigaForecaster
from twiga.models.ml.gausscatboost_model import GAUSSCATBOOSTConfig
from twiga.models.ml.ngboostnormal_model import NGBOOSTNORMALConfig
data_config = DataPipelineConfig(
target_feature="load_mw",
period="1h",
lookback_window_size=168,
forecast_horizon=24,
lags=[1, 24, 168],
input_scaler="standard",
)
train_config = ExperimentConfig(
split_freq="months",
train_size=6,
test_size=1,
window="expanding",
project_name="GaussianForecast",
)
# Gaussian CatBoost — fast, single model via RMSEWithUncertainty
gauss_config = GAUSSCATBOOSTConfig(task_type="CPU", virtual_ensembles_count=10)
# NGBoost Normal — joint optimisation of μ and σ
ngboost_config = NGBOOSTNORMALConfig(n_estimators=300, learning_rate=0.05)
forecaster = TwigaForecaster(
data_params=data_config,
model_params=[gauss_config],
cv_params=train_config,
)
forecaster.fit(train_df=train_df)
interval_dict, _ = forecaster.predict_interval(test_df=test_df)
for model_name, (lower, point, upper) in interval_dict.items():
print(f"{model_name}: lower={lower.shape}, point={point.shape}, upper={upper.shape}")
API Reference#
Parametric distribution base#
- class twiga.distributions.nn.parametric.BaseDistribution(num_target_output, hidden_size, forecast_horizon)#
-
Abstract base for probabilistic output heads.
Subclasses must implement
forward(),get_distribution(), andget_log_likelihood().- All heads share the same input contract:
x: (B, hidden_size) - latent representation from encoder/decoder.
- All heads share the same output contract for forecasting tensors:
shape (B, forecast_horizon, num_target_output).
- forecast(z)#
Inference-mode forward (no gradient).
- abstractmethod get_distribution(*params)#
Construct a torch Distribution from predicted parameters.
- Return type:
- abstractmethod get_log_likelihood(*params, targets)#
Return mean negative log-likelihood over the batch.
- Return type:
- step(z, y, metric_fn, epoch=None)#
Compute NLL loss and metric for one training/validation step.
- Parameters:
z (
Tensor) – Latent tensor of shape(B, hidden_size)from the backbone.y (
Tensor) – Target tensor of shape(B, forecast_horizon, num_target_output).metric_fn (
Callable[...,Any]) – Callable returning a scalar metric given(pred, target).epoch (
int|None) – Current epoch (unused here; available for subclasses).
- Return type:
- Returns:
Tuple of
(nll_loss, metric).
Parametric heads — standard backbones#
- class twiga.distributions.nn.parametric.NormalDistribution(num_target_output=1, hidden_size=256, forecast_horizon=48, out_activation_function=None)#
Bases:
BaseDistributionNormal (Gaussian) output head.
Best for: symmetric, unbounded targets - energy demand, temperature.
- Parameters predicted:
mu - mean (unconstrained) sigma - std dev (exp of log_scale, strictly positive)
- Parameters:
- forward(x)#
Predict mu and sigma.
- get_distribution(mu, sigma)#
Construct a torch Distribution from predicted parameters.
- Return type:
- class twiga.distributions.nn.parametric.LaplaceDistribution(num_target_output=1, hidden_size=256, forecast_horizon=48, out_activation_function=None)#
Bases:
BaseDistributionLaplace output head.
Best for: targets with heavier tails than Normal, robust to outliers - electricity prices, wind speed residuals.
- Parameters predicted:
mu - location (unconstrained) scale - scale (exp of log_scale, strictly positive)
- forward(x)#
Predict mu and scale.
- get_distribution(mu, scale)#
Construct a torch Distribution from predicted parameters.
- Return type:
- class twiga.distributions.nn.parametric.LogNormalDistribution(num_target_output=1, hidden_size=256, forecast_horizon=48, out_activation_function=None)#
Bases:
BaseDistributionLog-Normal output head.
Best for: strictly positive, right-skewed targets - renewable generation, gas prices, load with zero floor.
- Parameters predicted:
mu - mean of log(target) (unconstrained) sigma - std of log(target) (exp of log_scale, strictly positive)
Note
Targets must be strictly positive. Apply a small epsilon shift in the data pipeline if zeros are possible (e.g., target = max(y, 1e-6)).
- forward(x)#
Predict mu and sigma of the underlying Normal (in log space).
- get_distribution(mu, sigma)#
Construct a torch Distribution from predicted parameters.
- Return type:
- class twiga.distributions.nn.parametric.GammaDistribution(num_target_output=1, hidden_size=256, forecast_horizon=48)#
Bases:
BaseDistributionGamma output head.
Best for: strictly positive targets with flexible skew - solar irradiance, wind power, load at aggregate level.
- Parameters predicted:
concentration - shape parameter α (softplus, strictly positive) rate - rate parameter β (softplus, strictly positive)
Note
Mean of Gamma(α, β) = α/β. Targets must be strictly positive.
- forward(x)#
Predict concentration and rate.
- get_distribution(concentration, rate)#
Construct a torch Distribution from predicted parameters.
- Return type:
- class twiga.distributions.nn.parametric.BetaDistribution(num_target_output=1, hidden_size=256, forecast_horizon=48)#
Bases:
BaseDistributionBeta output head.
Best for: bounded [0, 1] targets - capacity factors, state of charge, fill rates, normalised demand ratios.
- Parameters predicted:
alpha - first shape parameter (softplus, strictly positive) beta - second shape parameter (softplus, strictly positive)
Note
Targets must lie strictly in (0, 1). Apply a clamp in the data pipeline if boundary values are possible:
target = target.clamp(1e-6, 1 - 1e-6)- forward(x)#
Predict alpha and beta.
- get_distribution(alpha, beta)#
Construct a torch Distribution from predicted parameters.
- Return type:
- class twiga.distributions.nn.parametric.StudentTDistribution(num_target_output=1, hidden_size=256, forecast_horizon=48, out_activation_function=None, min_df=2.1)#
Bases:
BaseDistributionStudent-T output head.
Best for: targets with very heavy tails and potential outliers - financial returns, spot electricity prices.
- Parameters predicted:
mu - location (unconstrained) sigma - scale (exp of log_scale, strictly positive) df - degrees of freedom (softplus, clipped to ≥ 2 for finite variance)
Note
Degrees of freedom are predicted per-sample but averaged across the forecast horizon so the model learns a single df per series.
- forecast(z)#
Inference-mode forward (no gradient).
- forward(x)#
Predict mu, sigma, and degrees of freedom.
- get_distribution(mu, sigma, df)#
Construct a torch Distribution from predicted parameters.
- Return type:
Parametric heads — additive backbones (MLPGAM / MLPGAF)#
These variants receive the pre-summed additive mean directly from the backbone instead of a latent vector, preserving the GAM decomposition.
- class twiga.distributions.nn.parametric.AdditiveNormalDistribution(num_target_output=1, hidden_size=256, forecast_horizon=48, out_activation_function=None)#
Bases:
BaseDistributionNormal output head preserving additive backbone structure.
For use with
MLPGAMNetworkandMLPGAFNetwork, whoseencode()already returns the additive mean directly as a flat(B, H*O)vector. The mean is taken as-is (no projection) to honour the additive decomposition; only the scale is learned via a separate linear layer.- Parameters predicted:
mu - additive mean (reshaped directly from latent, no projection) sigma - std dev (exp of log_scale, strictly positive)
- Parameters:
- forward(x)#
Predict mu and sigma.
- get_distribution(mu, sigma)#
Construct a torch Distribution from predicted parameters.
- Return type:
- class twiga.distributions.nn.parametric.AdditiveLaplaceDistribution(num_target_output=1, hidden_size=256, forecast_horizon=48, out_activation_function=None)#
Bases:
BaseDistributionLaplace output head preserving additive backbone structure.
Analogous to
AdditiveNormalDistributionbut with Laplace tails. Best for: heavy-tailed, outlier-robust targets - electricity prices, wind residuals.- Parameters predicted:
mu - additive location (reshaped directly from latent, no projection) scale - scale (exp of log_scale, strictly positive)
- forward(x)#
Predict mu and scale.
- get_distribution(mu, scale)#
Construct a torch Distribution from predicted parameters.
- Return type:
- class twiga.distributions.nn.parametric.AdditiveLogNormalDistribution(num_target_output=1, hidden_size=256, forecast_horizon=48, out_activation_function=None)#
Bases:
BaseDistributionLog-Normal output head preserving additive backbone structure.
Best for: strictly positive, right-skewed targets - renewable generation, gas prices.
- Parameters predicted:
mu - additive log-space mean (reshaped directly from latent, no projection) sigma - std of log(target) (exp of log_scale, strictly positive)
Note
Targets must be strictly positive.
- forward(x)#
Predict mu and sigma of the underlying Normal (in log space).
- get_distribution(mu, sigma)#
Construct a torch Distribution from predicted parameters.
- Return type:
- class twiga.distributions.nn.parametric.AdditiveStudentTDistribution(num_target_output=1, hidden_size=256, forecast_horizon=48, out_activation_function=None, min_df=2.1)#
Bases:
BaseDistributionStudent-T output head preserving additive backbone structure.
Best for: very heavy tails - spot electricity prices, financial returns.
- Parameters predicted:
mu - additive location (reshaped directly from latent, no projection) sigma - scale (exp of log_scale, strictly positive) df - degrees of freedom (softplus, clipped to ≥ min_df for finite variance)
- forecast(z)#
Inference-mode forward (no gradient).
- forward(x)#
Predict mu, sigma, and degrees of freedom.
- Parameters:
x (
Tensor) – Latent tensor of shape (B, hidden_size) - the additive location from the backbone.- Return type:
- Returns:
Tuple of (mu, sigma, df) – - mu, sigma: shape (B, forecast_horizon, num_target_output) - df: shape (B, 1, num_target_output), broadcast-compatible
- get_distribution(mu, sigma, df)#
Construct a torch Distribution from predicted parameters.
- Return type:
Quantile regression distribution#
- class twiga.distributions.nn.quantile.QRDistribution(quantiles=None, num_outputs=1, hidden_size=256, horizon=48, kappa=0.5, output_activation=None, conf_level=0.05, loss_fn='pinball', crossing_penalty=10.0, sort_output=False, monotone_output=False, crossing_mode='relu', softplus_beta=1.0)#
Bases:
ModuleFixed-grid quantile regression head.
Predicts a fixed set of quantile levels $tau$ directly from a latent representation via a single linear projection. Boundary quantiles
conf_level/2and1 - conf_level/2are automatically prepended / appended so the output always spans the requested coverage band.Non-crossing is encouraged by a penalty term in
step(); it is not enforced structurally. For architectural monotonicity guarantees, useFPQRDistributionwithqvn_type="monotone".- Parameters:
quantiles (
list[float] |None) – Explicit quantile levels the model outputs.Nonedefaults to[0.1, 0.2, …, 0.9]. The list is used exactly as provided — no boundary levels are auto-inserted. Includeconf_level/2and1 - conf_level/2explicitly if raw (non-conformal) interval construction is required.num_outputs (
int) – Number of target seriesD.hidden_size (
int) – Width of the backbone latent vector fed as input.horizon (
int) – Forecast horizonH.kappa (
float) – Huber transition parameter (only used forhuber-pinball).output_activation (
Module|None) – Applied elementwise to the raw quantile outputs.None(default) usesIdentity. Warning: non-identity activations (e.g.ReLU) constrain the output range and will produce wrong forecasts for targets that can be negative.conf_level (
float) – Significance levelα; stored as metadata for downstream consumers (conformal calibration, interval labelling). Does not affect the quantile grid.loss_fn (
str) –"pinball","huber-pinball", or"winkler"."winkler"applies the interval score to the outer quantile pair and pinball loss to all K levels.crossing_penalty (
float) – Weightλfor the non-crossing hinge penalty.sort_output (
bool) – IfTrue, sort quantile predictions along the quantile axis (dim=1) at inference and training time. Guarantees non-crossing at inference without changing the parameterisation. Has no effect whenmonotone_output=True.monotone_output (
bool) – IfTrue, use a cumulative-softplus construction inforward()to guarantee strict monotonicity by parameterisation (anchor + positive increments). Supersedessort_outputand makescrossing_penaltyredundant.
- forecast(input_tensor)#
Forecast quantiles for the given input tensor.
- forward(x)#
Forward pass producing raw (unconstrained) quantile values.
Outputs K quantile predictions directly from the linear head without structural monotonicity enforcement. Non-crossing is encouraged during training via the
crossing_penaltyterm instep():loss = pinball_loss + λ · non_crossing_loss(Q̂)
where
non_crossing_losspenalises any pair (k, k+1) where Q̂(τₖ) > Q̂(τₖ₊₁). Thequantile_value_layeris initialised with small uniform weights to reduce crossing violations at the start of training.
- step(z, y, metric_fn, _epoch=None)#
Perform a single training/validation step.
- Parameters:
z (
Tensor) – Latent tensor of shape(B, hidden_size)from the backbone.y (
Tensor) – Target tensor of shape(B, forecast_horizon, num_outputs).metric_fn (
Callable[...,Any]) – Callable returning a scalar metric given(pred, target)._epoch (
int|None) – Unused; accepted for interface consistency with other heads.
- Return type:
- Returns:
Tuple of
(loss, metric).
- twiga.distributions.nn.quantile.get_median_quantile(quantile_hats, probs)#
Compute the median (0.5 quantile) from a quantile tensor of shape (B, N, T, C).
- Parameters:
- Return type:
- Returns:
torch.Tensor – Median values of shape (B, T, C).
- Raises:
ValueError – If input shapes are invalid or interpolation is not possible.
FPQR components#
- class twiga.distributions.nn.fpquantile.QuantileProposal(n_quantiles=10, z_dim=64, dropout=0.1, conf_level=0.05, n_outputs=1, crossing_penalty=10.0, stop_gradient=True)#
Bases:
ModuleA neural network module for proposing quantile probability levels.
Projects the context vector
zthrough a linear layer and softmax to obtain a valid probability distribution overn_quantilesslots, then computes the cumulative sum to produce strictly increasing probability levelstaus ∈ [0, 1].What this module outputs.
tau_hatsare probability-level midpoints (values in(0, 1)), not forecast values. The value network maps these levels to actual quantile predictions.Gradient isolation.
tau_hatsare detached before being returned. The value-network loss must not back-propagate through the proposal parameters; those are trained separately viaQuantileProposalLoss.Entropy term.
entropies(Shannon entropy of the proposal distribution) should be subtracted from the total loss (loss -= entropies.mean()), encouraging the proposal to spread its quantile grid rather than concentrate on a few levels.crossing_penalty. Stored as
self.crossing_penaltyand read bystep()to scalenon_crossing_loss(). Not applied insideforward().- Variables:
n_quantiles – Number of quantiles to propose.
n_outputs – Number of forecast horizons.
z_dim – Dimensionality of the input features.
net – Linear layer mapping
zto quantile logits.dropout – Dropout layer for regularization.
conf_level – Stored as metadata for downstream consumers (conformal calibration). Does not affect the
tau_hatsclamp range.tau_0 – Zero lower-boundary buffer prepended to
taus.
- Parameters:
n_quantiles (
int) – Number of quantiles to propose (default: 10).z_dim (
int) – Dimensionality of the input features (default: 64).dropout (
float) – Dropout rate for regularization (default: 0.1).conf_level (
float) – Significance level stored as metadata (default: 0.05).n_outputs (
int) – Number of forecast horizons (default: 1).crossing_penalty (
float) – Penalty weight read externally byFPQRDistribution.step(default: 10.0).
- forward(z)#
Compute probability-level proposals from the context vector.
- Parameters:
z (
Tensor) – Context vector, shape(batch_size, z_dim).- Return type:
- Returns:
Tuple of three tensors –
taus
(B, N+1, H)— strictly increasing cumulative probability boundaries including the fixed zero lower bound.tau_hats
(B, N, H)— midpoint probability levels between consecutive boundaries, clamped to[0.01, 0.99]. Detached whenstop_gradient=True(default), keeping the pinball gradient isolated from proposal parameters. Whenstop_gradient=False, gradients flow from the pinball loss through the τ-embedding back to the proposal parameters, supplementing the Wasserstein-1 training signal.entropies
(B, H)— Shannon entropy of the proposal distribution. Subtract from the total loss to encourage diversity.
- Raises:
ValueError – If
zdoes not have shape(batch_size, z_dim).
- class twiga.distributions.nn.fpquantile.CosinetauEmbedding(num_cosines=32, z_dim=128, num_outputs=48, tau_embed_type='TimeEmb')#
Bases:
ModuleA PyTorch module for embedding time values using cosine transformations.
Transforms quantile levels $hat{tau}$ into cosine features and projects them to the backbone hidden dimension via a configurable embedding strategy.
- Parameters:
num_cosines (
int) – Number of cosine functions. Defaults to 32.z_dim (
int) – Output embedding dimension. Defaults to 128.num_outputs (
int) – Number of horizon steps (scaling factor for input size). Defaults to 48.tau_embed_type (
str) –Projection strategy applied after the cosine features are computed. One of:
"TimeEmb"(default) —Linear(H·K → d) + ReLUapplied per quantile level independently. No inter-quantile mixing."LinearEmb"— pure linear projection without bias or activation. Slightly fewer parameters thanTimeEmb; useful when a non-linear gate is provided elsewhere."ConvEmb"— circularConv1d(kernel=3)over the quantile axis. Mixes cosine features from neighbouring quantile levels $hat{tau}_{n-1}, hat{tau}_n, hat{tau}_{n+1}$, allowing the embedding to encode relative spacing between adjacent levels.
- class twiga.distributions.nn.fpquantile.FPQRDistribution(n_quantiles=9, num_outputs=1, hidden_size=256, horizon=48, proposal_type='softmax', tau_embed_type='TimeEmb', rope_type='linear', rope_base=None, dropout=0.1, conf_level=0.05, kappa=0.25, num_cosines=32, output_activation=None, activation_function='ReLU', loss_fn='pinball', crossing_penalty=10.0, sort_output=False, qvn_type='standard', crossing_mode='relu', softplus_beta=1.0, lambda_cal=0.0, cal_temperature=1.0, proposal_stop_gradient=True, proposal_loss_fn='wasserstein', sqda_weight=0.1)#
Bases:
ModuleA neural network for forecasting quantiles using a quantile value network.
This module predicts quantiles for a forecasting task, combining a quantile proposal layer with a linear output layer to produce quantile forecasts.
- Parameters:
n_quantiles (
int|None) – Number of quantiles to forecast. If None, defaults to 9.num_outputs (
int) – Number of output features. Defaults to 1.hidden_dim (int) – Size of the hidden layers. Defaults to 256.
horizon (
int) – Number of time steps to forecast. Defaults to 48.dropout (
float) – Dropout rate for the quantile proposal layer. Defaults to 0.1.conf_level (
float) – Confidence level for quantile proposal. Defaults to 0.05.output_activation (
Module|None) – Activation function for the output layer. Defaults to nn.Identity().proposal_type (
Literal['softmax','horizon','kumaraswamy']) – Quantile proposal strategy -'softmax'(default, entropy-regularised uniform push),'horizon'(per-step conditioning), or'kumaraswamy'(tail-adaptive).tau_embed_type (
str) – Projection applied to the cosine features (only whennum_cosines > 0).'TimeEmb'(default, linear + ReLU),'LinearEmb'(pure linear), or'ConvEmb'(conv over quantile axis).rope_type (
str) – RoPE variant used whennum_cosines=0. One of'linear'(default, integer slot-index positions),'tau_pos'(mean τ̄_k scaled to[0, N_q]as float positions),'factored'(split z_dim: quantile-ordering RoPE ⊕ horizon RoPE), or'direct'(no projection; backbone z rotated by τ position).rope_base (
float|None) – RoPE base frequency passed toTauRoPEEmbedding.None(default) auto-computesmax(2, p_max × 4/π)per axis. Only used whennum_cosines=0.qvn_type (
Literal['standard','monotone','monotone_cal']) –Quantile value network architecture.
'standard'(default) — direct prediction; an optional crossing penalty enforces monotonicity during training.'monotone'— cumulative-softplus increments (MonotonicQVN); non-crossing is guaranteed by construction and no penalty is applied.
crossing_mode (
Literal['relu','softplus']) –Non-crossing penalty shape used when
qvn_type='standard'.'relu'(default) — hinge penalty; zero gradient when quantiles are already ordered.'softplus'— smooth hinge; gradients flow even for near-crossing pairs, removing the dead-gradient zone of ReLU.
softplus_beta (
float) – Sharpness parameter shared by both modes. Whencrossing_mode='softplus'it controls the smoothness of the crossing penalty (larger → closer to ReLU). Whenqvn_type='monotone'it controls the sharpness of the increment gate inMonotonicQVN. Default1.0.
- forecast(input_tensor)#
Forecast quantiles for the given input tensor.
quantile_levelsis the per-sample proposal grid averaged across the batch and horizon dimensions so that downstream consumers receive a fixed 1-D array of representative probability levels, matching the interface expected byForecastResult.- Parameters:
input_tensor (
Tensor) – Input tensor of shape (batch_size, z_dim).- Return type:
- Returns:
Dict with keys –
"loc": median estimate — quantile slot whosetau_hatis nearest 0.5, shape(B, horizon, num_outputs)."quantiles": quantile forecasts, shape(B, n_quantiles, horizon, num_outputs)."quantile_levels": representative 1-D numpy array of lengthn_quantiles(mean oftau_hatsover batch and horizon)."conf_level": significance level scalar.
- forward(input_tensor)#
Forward pass of the quantile forecasting network.
- step(z, y, metric_fn, epoch=None)#
Perform a single training/validation step.
- Parameters:
z (
Tensor) – Latent tensor of shape(B, hidden_size)from the backbone.y (
Tensor) – Target tensor of shape(B, forecast_horizon, num_outputs).metric_fn (
Callable[...,Any]) – Callable returning a scalar metric given(pred, target).epoch (
int|None) – Current epoch (unused; accepted for interface consistency).
- Return type:
- Returns:
Tuple of
(loss, metric).
CRC distribution classes#
- class twiga.distributions.nn.residual_conformal.CRCDistribution(num_target_output=1, hidden_size=256, forecast_horizon=48, out_activation_function=None, sigma_loss_fn='hybrid', alpha=0.1, activation='ReLU')#
Bases:
BaseDistributionCRC head for standard backbones (e.g. MLPFNetwork).
The mean is computed by projecting the latent vector z through a linear layer, then reshaping to (B, H, O). The scale is predicted by a sigma layer applied to the detached mean (flattened), ensuring that gradients do not flow back into the backbone during sigma-only training.
- Training follows a two-stage protocol:
Stage 1 —
step(): optimise μ (backbone + mu_layer), σ is not updated. Stage 2 —step_sigma(): freeze backbone, optimise σ-head only.
- Parameters:
num_target_output (
int) – Number of output features per time step.hidden_dim – Dimensionality of the backbone latent vector.
forecast_horizon (
int) – Number of forecast time steps.out_activation_function (
Module|None) – Optional activation applied to the mean output.sigma_loss_fn (
str) – Calibration objective for σ. One ofSIGMA_LOSS_FNS. Default:"hybrid".alpha (
float) – Weight for MSE vs L1 in the mu and hybrid sigma losses.activation (
str) – Unused; kept for API compatibility.
- Raises:
ValueError – If
sigma_loss_fnis not one of the supported values.
- forecast(z)#
Inference-only prediction (no gradients).
For
"hybrid_sqrt"the scale is squared to convert from√|r|space back to residual magnitude space.
- forward(x)#
Predict mean and scale.
- get_distribution(mu, sigma)#
Construct a torch distribution for interval generation.
Returns
Laplace(μ, σ)for"laplace"andNormal(μ, σ)for all other modes, treating σ as an empirical scale estimate. For"hybrid_sqrt"callers must pass the forecast-adjusted (squared) σ.
- get_log_likelihood(mu, sigma, targets)#
Compute the sigma calibration loss (NLL for probabilistic modes).
- step(z, y, metric_fn, epoch=None)#
Stage-1 training step: optimise mean parameters.
- step_sigma(z, y, metric_fn)#
Stage-2 training step: optimise sigma with frozen backbone.
The mean is recomputed under
no_gradso sigma learns to predict residual magnitude without influencing point predictions.
- class twiga.distributions.nn.residual_conformal.AdditiveCRCDistribution(num_target_output=1, hidden_size=None, hidden_dim=256, forecast_horizon=48, out_activation_function=None, sigma_loss_fn='hybrid_sqrt', alpha=0.5, sigma_dropout=0.05, activation='SiLU')#
Bases:
CRCDistributionCRC head for additive backbones (MLPGAMNetwork, MLPGAFNetwork).
Inherits all training and inference logic from
CRCDistribution. Two architectural differences from the parent:mu_layer — identity (or optional activation): the backbone’s
encode()already returns the additive mean as(B, H*O), so no linear projection is needed.sigma_layer —
ResidualSigmaHead(two-layer MLP + LayerNorm + Dropout): a deeper network better suited to modeling conditional heteroscedasticity from the additive mean signal.
All other methods —
step,step_sigma,forecast,get_distribution,get_log_likelihood— are inherited unchanged.- Parameters:
num_target_output (
int) – Number of output features per time step.hidden_dim (
int) – Hidden dimension of the sigma MLP. Independent ofH*O.forecast_horizon (
int) – Number of forecast time steps.out_activation_function (
Module|None) – Optional activation applied to the mean.sigma_loss_fn (
str) – Calibration objective (seeCRCDistribution). Default:"hybrid_sqrt".alpha (
float) – MSE/L1 weight. Default:0.5.sigma_dropout (
float) – Dropout rate in the sigma MLP.activation (
str) – Activation name for the sigma MLP. Default:"SiLU".
- Raises:
ValueError – If
sigma_loss_fnis not one of the supported values.
Custom loss functions#
- class twiga.distributions.nn.custom_loss.QuantileLoss(kappa=0.0, reduction='mean')#
Bases:
ModuleQuantile loss module for quantile regression.
Computes the pinball loss between predicted quantiles and target values.
- Variables:
kappa – Smoothing parameter for softplus approximation.
reduction – Reduction method: ‘none’, ‘mean’, or ‘sum’.
- Parameters:
- __init__(kappa=0.0, reduction='mean')#
Initialize the QuantileLoss module.
- Parameters:
- Raises:
ValueError – If kappa < 0 or reduction is invalid.
- forward(inputs, quantiles, targets)#
Compute the quantile loss.
- Parameters:
- Return type:
- Returns:
The computed quantile loss tensor, reduced according to the specified method.
- class twiga.distributions.nn.custom_loss.QuantileHuberLoss(kappa=1.0, eps=1e-08, reduction='mean')#
Bases:
ModuleQuantile Huber loss module for robust quantile regression.
Computes the Huber loss adjusted for quantiles, less sensitive to outliers than L2 loss.
- Variables:
kappa – Threshold for switching between L2 and L1 loss.
eps – Small value to prevent division by zero.
reduction – Reduction method: ‘none’, ‘mean’, or ‘sum’.
- Parameters:
- __init__(kappa=1.0, eps=1e-08, reduction='mean')#
Initialize the QuantileHuberLoss module.
- Parameters:
- Raises:
ValueError – If kappa <= 0 or eps <= 0.
- forward(inputs, quantiles, targets)#
Compute the quantile Huber loss.
- class twiga.distributions.nn.custom_loss.QuantileProposalLoss(reduction='none')#
Bases:
ModuleQuantile proposal loss module to ensure non-crossing quantiles.
Computes a loss to enforce that predicted quantiles adhere to specified levels and do not cross.
- Variables:
reduction – Reduction method: ‘none’, ‘mean’, or ‘sum’.
- Parameters:
reduction (
str) – Reduction method: ‘none’, ‘mean’, or ‘sum’ (default: ‘none’).
- __init__(reduction='none')#
Initialize the QuantileProposalLoss module.
- Parameters:
reduction (
str) – Reduction method: ‘none’, ‘mean’, or ‘sum’ (default: ‘none’).
- forward(quantile, quantile_hats, taus)#
Compute the quantile proposal loss.
- Parameters:
- Return type:
- Returns:
The computed quantile proposal loss tensor, reduced according to the specified method.
- class twiga.distributions.nn.custom_loss.SQDAProposalLoss(qdf_exponent=-1.0, divergence='squared', tail_weight=2.0, tail_alpha=0.2, eps=1e-06, reduction='mean')#
Bases:
ModuleSpectral Quantile Density-Adaptive (SQDA) proposal loss.
Trains the quantile proposal network by minimising the distance between the proposal’s interval widths \(\Delta\tau_n\) and the theoretically optimal density-adaptive target widths \(\Delta\tau^*_n \propto q(\hat\tau_n)^{-1}\), where \(q(\tau) = Q'(\tau)\) is the quantile density function (QDF).
Theoretical motivation
For a grid of \(N\) evaluation points, the \(L^p\) approximation error of the step-function quantile estimator satisfies:
\[E_n^{L^p} = q(\hat\tau_n)^p \cdot \frac{(\Delta\tau_n)^{p+1}}{p+1}\]Minimising \(\sum_n E_n^{L^p}\) subject to \(\sum_n \Delta\tau_n = 1\) yields (via Lagrange multipliers) the same criterion for all \(p\):
\[\Delta\tau^*_n \propto q(\hat\tau_n)^{-1}\]For heavy-tailed distributions \(q(\tau)\) is large at the tails (\(\tau \to 0, 1\)) and small near the median. The optimal widths are therefore smaller at the tails (denser grid) and wider in the interior — exactly the tail-adaptive property that the existing Wasserstein-1 signal cannot enforce directly.
QDF estimation
The QDF is approximated from the current (detached) quantile predictions \(\hat{Q}_n\) via central finite differences:
\[\hat{q}_n \approx \frac{\hat{Q}_{n+1} - \hat{Q}_{n-1}}{\hat\tau_{n+1} - \hat\tau_{n-1}}\]with one-sided differences at the boundaries.
Gradient flow
Target widths \(\Delta\tau^*\) are computed from detached quantities only — no gradient flows to the value network. The loss gradient flows exclusively through \(\Delta\tau_{\text{prop}} = \tau_{n+1} - \tau_n\) (boundaries with gradient) to the proposal parameters, preserving the same gradient isolation as the Wasserstein-1 loss.
- Parameters:
qdf_exponent (
float) – Exponent applied to the QDF when computing target widths. Default-1.0gives the theoretically optimal \(q^{-1}\) target.divergence (
Literal['squared','kl','symkl']) – Divergence used to measure the gap between proposal and target widths. One of'squared'(MSE, default),'kl'(\(\text{KL}(\text{target}\,\|\,\text{proposal})\)), or'symkl'(symmetrised KL).tail_weight (
float) – Up-weight applied to tail quantile slots for the'squared'divergence only (default2.0). Slots with \(\hat\tau < \text{tail\_alpha}\) or \(\hat\tau > 1 - \text{tail\_alpha}\) receive this weight; all others receive weight 1. Not applied for'kl'or'symkl'because a non-uniform slot weight breaks KL non-negativity.tail_alpha (
float) – Boundary between tail and interior regions (default0.2).eps (
float) – Small constant for numerical stability (default1e-6).reduction (
str) –'none','mean', or'sum'(default'mean').
- Shape conventions (same as
QuantileProposalLoss): quantile:(B, N-1, H, D)— not used (kept for interface compat).quantile_hats:(B, N, H, D)— detached \(\hat{Q}\) at midpoints.taus:(B, N+1, H)— tau boundaries with gradient from proposal.
- forward(quantile, quantile_hats, taus)#
Compute the SQDA proposal loss.
- Parameters:
quantile (
Tensor) – Unused; shape(B, N-1, H, D). Kept for interface compatibility withQuantileProposalLoss.quantile_hats (
Tensor) – Detached quantile predictions at midpoints, shape(B, N, H, D).taus (
Tensor) – Tau boundaries with gradient from the proposal network, shape(B, N+1, H).
- Return type:
- Returns:
Scalar loss (or per-element tensor when
reduction='none').
ML parametric models#
- class twiga.models.ml.gausscatboost_model.GAUSSCATBOOSTConfig(**data)#
Bases:
CATBOOSTConfigConfiguration for the Gaussian CatBoost probabilistic model.
Extends
CATBOOSTConfigwith:Tighter hyperparameter search bounds for faster convergence.
od_type/od_waitfor built-in overfitting detection.virtual_ensembles_countto control the number of virtual ensemble snapshots used to estimate epistemic uncertainty.
- Variables:
name – Fixed to
"gausscatboost".od_type – Overfitting detection strategy (
"Iter"or"IncToDec"). Active only when an eval set is provided toGAUSSCATBOOSTModel.fit().od_wait – Number of iterations without improvement before early stopping triggers.
virtual_ensembles_count – Number of virtual ensemble snapshots used by
predict_with_uncertainty(). Not passed toCatBoostRegressor(excluded frommodel_dump).
- model_config: ClassVar[ConfigDict] = {'extra': 'allow'}#
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- search_space: BaseSearchSpace#
- class twiga.models.ml.gausscatboost_model.GAUSSCATBOOSTModel(model_config=None)#
Bases:
BaseRegressorProbabilistic CatBoost model predicting mean and uncertainty.
Uses
loss_function="RMSEWithUncertainty"which maximises the log-likelihood of a Normal distribution, jointly learning the mean μ and aleatoric scale σ in a single model.For multi-horizon forecasting (H output steps), one
CatBoostRegressoris trained per output, yielding H models. This avoidsMultiOutputRegressorwhile preserving support forvirtual_ensembles_predict.Uncertainty decomposition via
predict_with_uncertainty()returns three components per output step:mean- predicted mean (same aspredict()μ output).knowledge_uncertainty- epistemic (model) uncertainty estimated from variance across virtual ensemble snapshots.data_uncertainty- aleatoric uncertainty encoded in the model’s second output (exp of predicted log-σ).
- Parameters:
model_config (
GAUSSCATBOOSTConfig|None) – Model configuration. Defaults toGAUSSCATBOOSTConfig.
Example:
model = GAUSSCATBOOSTModel() model.fit(X_train, y_train) mu, sigma = model.predict(X_test) unc = model.predict_with_uncertainty(X_test) # (B, L, H, 3)
- fit(X, y, eval_set=None, verbose=False, trial=None)#
Fit one
RMSEWithUncertaintymodel per output step.- Parameters:
X (
ndarray) – Shape(B, L, F)- batch × sequence × features.y (
ndarray) – Shape(B, L, H)- batch × sequence × horizons.eval_set (
tuple[ndarray,ndarray] |None) – Optional(X_val, y_val)for early stopping. When provided,od_type/od_waitfrom the config activate CatBoost’s overfitting detector.verbose (
bool) – Whether to print CatBoost training logs.trial (
Any|None) – Optuna trial for mid-training pruning. When provided alongsideeval_set, aCatBoostPruningCallbackis injected into the first per-output model so Hyperband can prune unpromising trials early.
- Return type:
- Returns:
Self for method chaining.
- Raises:
ValueError – If
Xoryare not 3-dimensional.
- predict(X)#
Predict mean (μ) and scale (σ) for all output steps.
- Parameters:
X (
ndarray) – Shape(B, L, F).- Return type:
- Returns:
Tuple
(mu, sigma)each of shape(B, L, H).sigma = exp(log_sigma)is guaranteed positive.- Raises:
ValueError – If the model has not been fitted or X is not 3-D.
- predict_with_uncertainty(X)#
Return mean, epistemic, and aleatoric uncertainty per output.
Uses
virtual_ensembles_predict(prediction_type="TotalUncertainty")which - for models trained withRMSEWithUncertainty- returns three values per sample:Column 0: mean prediction (μ).
Column 1: knowledge (epistemic) uncertainty - variance across virtual ensemble snapshots.
Column 2: data (aleatoric) uncertainty - derived from the predicted log-σ output.
- Parameters:
X (
ndarray) – Shape(B, L, F).- Return type:
- Returns:
Array of shape
(B, L, H, 3)- last axis is[mean, knowledge_uncertainty, data_uncertainty].- Raises:
ValueError – If the model has not been fitted.
- set_fit_request(*, eval_set='$UNCHANGED$', trial='$UNCHANGED$', verbose='$UNCHANGED$')#
Configure whether metadata should be requested to be passed to the
fitmethod.Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with
enable_metadata_routing=True(seesklearn.set_config()). Please check the User Guide on how the routing mechanism works.The options for each parameter are:
True: metadata is requested, and passed tofitif provided. The request is ignored if metadata is not provided.False: metadata is not requested and the meta-estimator will not pass it tofit.None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (
sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.Added in version 1.3.
Parameters#
- eval_setstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for
eval_setparameter infit.- trialstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for
trialparameter infit.- verbosestr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for
verboseparameter infit.
Returns#
- selfobject
The updated object.
- set_predict_request()#
No-op.
Calling this method has no effect.
Returns#
- selfobject
The updated object.
- set_score_request(*, sample_weight='$UNCHANGED$')#
Configure whether metadata should be requested to be passed to the
scoremethod.Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with
enable_metadata_routing=True(seesklearn.set_config()). Please check the User Guide on how the routing mechanism works.The options for each parameter are:
True: metadata is requested, and passed toscoreif provided. The request is ignored if metadata is not provided.False: metadata is not requested and the meta-estimator will not pass it toscore.None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (
sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.Added in version 1.3.
Parameters#
- sample_weightstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for
sample_weightparameter inscore.
Returns#
- selfobject
The updated object.
- update(trial)#
Rebuild model from an Optuna trial’s suggested hyperparameters.
- Parameters:
trial – Optuna trial object.
- Return type:
- class twiga.models.ml.ngboostnormal_model.NGBOOSTNORMALConfig(**data)#
Bases:
BaseModelConfigConfiguration for the NGBoost Normal probabilistic forecasting model.
Uses natural gradient boosting with a Gaussian predictive distribution N(μ, σ²). Unlike the two-stage GAUSS* models, NGBoost jointly optimises μ and σ via the natural gradient of the chosen scoring rule, which tends to produce better-calibrated uncertainty estimates.
- Variables:
name – Model identifier fixed to
"ngboostnormal".domain – Domain fixed to
"ml". Excluded from tuning.n_estimators – Number of boosting iterations.
learning_rate – Shrinkage applied to each tree.
minibatch_frac – Row-subsample fraction per iteration.
col_sample – Column-subsample fraction per iteration.
seed – Seed for reproducibility.
score – Scoring rule -
"LogScore"(MLE) or"CRPScore".search_space – Hyperparameter search space for Optuna tuning.
Example
>>> from twiga.models.ml import NGBOOSTNORMALConfig >>> cfg = NGBOOSTNORMALConfig(n_estimators=200, learning_rate=0.05)
- model_config: ClassVar[ConfigDict] = {'extra': 'allow'}#
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- search_space: BaseSearchSpace#
- class twiga.models.ml.ngboostnormal_model.NGBOOSTNORMALModel(model_config=None)#
Bases:
BaseNGBoostRegressorNGBoost probabilistic model with a Normal (Gaussian) predictive distribution.
Wraps
ngboost.NGBRegressorwithDist=Normal, training one regressor per flattened output column to support multi-horizon and multi-target forecasting.The model jointly learns the conditional mean μ and standard deviation σ via natural gradient boosting on the selected scoring rule (log-likelihood or CRPS). Unlike
GAUSSCATBOOSTModel, which uses CatBoost’s nativeRMSEWithUncertaintyloss, NGBoost directly maximises the scoring rule via natural gradients.- Variables:
model_config – Instance of
NGBOOSTNORMALConfig.- Parameters:
model_config (
NGBOOSTNORMALConfig|None) – Model configuration. Defaults toNGBOOSTNORMALConfig.
Example
>>> import numpy as np >>> from twiga.models.ml import NGBOOSTNORMALModel >>> model = NGBOOSTNORMALModel() >>> X = np.random.randn(100, 10, 5) >>> y = np.random.randn(100, 48, 1) >>> model.fit(X, y) >>> loc, scale = model.predict(X) >>> loc.shape, scale.shape ((100, 48, 1), (100, 48, 1))
- forecast(x)#
Return Normal distribution parameters.
- predict(X)#
Predict μ and σ for each forecast horizon.
- set_fit_request(*, eval_set='$UNCHANGED$', trial='$UNCHANGED$', verbose='$UNCHANGED$')#
Configure whether metadata should be requested to be passed to the
fitmethod.Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with
enable_metadata_routing=True(seesklearn.set_config()). Please check the User Guide on how the routing mechanism works.The options for each parameter are:
True: metadata is requested, and passed tofitif provided. The request is ignored if metadata is not provided.False: metadata is not requested and the meta-estimator will not pass it tofit.None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (
sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.Added in version 1.3.
Parameters#
- eval_setstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for
eval_setparameter infit.- trialstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for
trialparameter infit.- verbosestr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for
verboseparameter infit.
Returns#
- selfobject
The updated object.
- set_predict_request()#
No-op.
Calling this method has no effect.
Returns#
- selfobject
The updated object.
- set_score_request(*, sample_weight='$UNCHANGED$')#
Configure whether metadata should be requested to be passed to the
scoremethod.Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with
enable_metadata_routing=True(seesklearn.set_config()). Please check the User Guide on how the routing mechanism works.The options for each parameter are:
True: metadata is requested, and passed toscoreif provided. The request is ignored if metadata is not provided.False: metadata is not requested and the meta-estimator will not pass it toscore.None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (
sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.Added in version 1.3.
Parameters#
- sample_weightstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for
sample_weightparameter inscore.
Returns#
- selfobject
The updated object.
- class twiga.models.ml.ngboostlognormal_model.NGBOOSTLOGNORMALConfig(**data)#
Bases:
BaseModelConfigConfiguration for the NGBoost LogNormal probabilistic forecasting model.
Uses natural gradient boosting with a log-normal predictive distribution. Suitable for strictly positive targets such as solar irradiance, wind speed, electricity price, or non-negative load.
Follows the
scipy.stats.lognormparameter convention:scale = exp(μ_log)- geometric mean (location-like quantity).s = σ_log- standard deviation in log-space (shape parameter).
- Variables:
name – Model identifier fixed to
"ngboostlognormal".domain – Domain fixed to
"ml". Excluded from tuning.n_estimators – Number of boosting iterations.
learning_rate – Shrinkage applied to each tree.
minibatch_frac – Row-subsample fraction per iteration.
col_sample – Column-subsample fraction per iteration.
seed – Seed for reproducibility.
score – Scoring rule -
"LogScore"(MLE) or"CRPScore".search_space – Hyperparameter search space for Optuna tuning.
Example
>>> from twiga.models.ml import NGBOOSTLOGNORMALConfig >>> cfg = NGBOOSTLOGNORMALConfig(n_estimators=300, score="CRPScore")
- model_config: ClassVar[ConfigDict] = {'extra': 'allow'}#
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- search_space: BaseSearchSpace#
- class twiga.models.ml.ngboostlognormal_model.NGBOOSTLOGNORMALModel(model_config=None)#
Bases:
BaseNGBoostRegressorNGBoost probabilistic model with a LogNormal predictive distribution.
Wraps
ngboost.NGBRegressorwithDist=LogNormal, training one regressor per flattened output column for multi-horizon / multi-target support.Parameter convention (follows
scipy.stats.lognorm):loc(exposed as"loc") -scale = exp(μ_log), the geometric mean.scale(exposed as"scale") -s = σ_log, the log-space std-dev.
Use this model when the target variable is strictly positive and its logarithm is approximately Gaussian (e.g. solar irradiance, wind power, non-negative energy prices).
- Variables:
model_config – Instance of
NGBOOSTLOGNORMALConfig.- Parameters:
model_config (
NGBOOSTLOGNORMALConfig|None) – Model configuration. Defaults toNGBOOSTLOGNORMALConfig.
Example
>>> import numpy as np >>> from twiga.models.ml import NGBOOSTLOGNORMALModel >>> model = NGBOOSTLOGNORMALModel() >>> X = np.random.randn(100, 10, 5) >>> y = np.abs(np.random.randn(100, 48, 1)) + 0.1 # strictly positive >>> model.fit(X, y) >>> loc, scale = model.predict(X) >>> loc.shape, scale.shape ((100, 48, 1), (100, 48, 1))
- forecast(x)#
Return LogNormal distribution parameters.
- predict(X)#
Predict geometric mean and log-space std-dev for each forecast horizon.
- Parameters:
X (
ndarray) – Input features of shape(n_samples, seq_len, n_features).- Return type:
- Returns:
Tuple
(loc, scale)where –loc = exp(μ_log)- geometric mean (strictly positive), shape(n_samples, horizon, n_targets).scale = σ_log- standard deviation in log-space, shape(n_samples, horizon, n_targets).
- set_fit_request(*, eval_set='$UNCHANGED$', trial='$UNCHANGED$', verbose='$UNCHANGED$')#
Configure whether metadata should be requested to be passed to the
fitmethod.Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with
enable_metadata_routing=True(seesklearn.set_config()). Please check the User Guide on how the routing mechanism works.The options for each parameter are:
True: metadata is requested, and passed tofitif provided. The request is ignored if metadata is not provided.False: metadata is not requested and the meta-estimator will not pass it tofit.None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (
sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.Added in version 1.3.
Parameters#
- eval_setstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for
eval_setparameter infit.- trialstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for
trialparameter infit.- verbosestr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for
verboseparameter infit.
Returns#
- selfobject
The updated object.
- set_predict_request()#
No-op.
Calling this method has no effect.
Returns#
- selfobject
The updated object.
- set_score_request(*, sample_weight='$UNCHANGED$')#
Configure whether metadata should be requested to be passed to the
scoremethod.Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with
enable_metadata_routing=True(seesklearn.set_config()). Please check the User Guide on how the routing mechanism works.The options for each parameter are:
True: metadata is requested, and passed toscoreif provided. The request is ignored if metadata is not provided.False: metadata is not requested and the meta-estimator will not pass it toscore.None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (
sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.Added in version 1.3.
Parameters#
- sample_weightstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for
sample_weightparameter inscore.
Returns#
- selfobject
The updated object.
- class twiga.models.ml.ngboostexponential_model.NGBOOSTEXPONENTIALConfig(**data)#
Bases:
BaseModelConfigConfiguration for the NGBoost Exponential probabilistic forecasting model.
Uses natural gradient boosting with an exponential predictive distribution. The exponential distribution has a single parameter
scale = 1 / λ, where λ is the rate. Both the mean and the standard deviation equalscale.Suited for non-negative targets exhibiting exponential decay or inter-arrival times - e.g. rare demand events, sparse consumption spikes, or inter-event durations in energy systems.
- Variables:
name – Model identifier fixed to
"ngboostexponential".domain – Domain fixed to
"ml". Excluded from tuning.n_estimators – Number of boosting iterations.
learning_rate – Shrinkage applied to each tree.
minibatch_frac – Row-subsample fraction per iteration.
col_sample – Column-subsample fraction per iteration.
seed – Seed for reproducibility.
score – Scoring rule -
"LogScore"(MLE) or"CRPScore".search_space – Hyperparameter search space for Optuna tuning.
Example
>>> from twiga.models.ml import NGBOOSTEXPONENTIALConfig >>> cfg = NGBOOSTEXPONENTIALConfig(n_estimators=400, score="CRPScore")
- model_config: ClassVar[ConfigDict] = {'extra': 'allow'}#
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- search_space: BaseSearchSpace#
- class twiga.models.ml.ngboostexponential_model.NGBOOSTEXPONENTIALModel(model_config=None)#
Bases:
BaseNGBoostRegressorNGBoost probabilistic model with an Exponential predictive distribution.
Wraps
ngboost.NGBRegressorwithDist=Exponential, training one regressor per flattened output column for multi-horizon / multi-target support.The exponential distribution has one parameter,
scale = 1 / λ, which equals both the mean and the standard deviation. Both"loc"and"scale"in the returned forecast dict are set to this parameter.Use this model for non-negative targets with memoryless, decay-like behaviour - inter-arrival durations, sparse demand spikes, or short-term outage durations.
- Variables:
model_config – Instance of
NGBOOSTEXPONENTIALConfig.- Parameters:
model_config (
NGBOOSTEXPONENTIALConfig|None) – Model configuration. Defaults toNGBOOSTEXPONENTIALConfig.
Example
>>> import numpy as np >>> from twiga.models.ml import NGBOOSTEXPONENTIALModel >>> model = NGBOOSTEXPONENTIALModel() >>> X = np.random.randn(100, 10, 5) >>> y = np.random.exponential(scale=2.0, size=(100, 48, 1)) >>> model.fit(X, y) >>> loc, scale = model.predict(X) >>> loc.shape, scale.shape ((100, 48, 1), (100, 48, 1))
- forecast(x)#
Return Exponential distribution parameters.
- Parameters:
x (
ndarray) – Input features of shape(n_samples, seq_len, n_features).- Return type:
- Returns:
Dictionary with –
"loc": predictedscale = 1/λ(the mean), always > 0, shape(n_samples, horizon, n_targets)."scale": same as"loc"- for the exponential distribution, mean and std-dev are identical.
- predict(X)#
Predict the exponential rate parameter for each forecast horizon.
- Parameters:
X (
ndarray) – Input features of shape(n_samples, seq_len, n_features).- Return type:
- Returns:
Tuple
(loc, scale)where both arrays equal the predictedscale = 1 / λ(the mean and std-dev of the exponential distribution), each of shape(n_samples, horizon, n_targets).
- set_fit_request(*, eval_set='$UNCHANGED$', trial='$UNCHANGED$', verbose='$UNCHANGED$')#
Configure whether metadata should be requested to be passed to the
fitmethod.Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with
enable_metadata_routing=True(seesklearn.set_config()). Please check the User Guide on how the routing mechanism works.The options for each parameter are:
True: metadata is requested, and passed tofitif provided. The request is ignored if metadata is not provided.False: metadata is not requested and the meta-estimator will not pass it tofit.None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (
sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.Added in version 1.3.
Parameters#
- eval_setstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for
eval_setparameter infit.- trialstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for
trialparameter infit.- verbosestr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for
verboseparameter infit.
Returns#
- selfobject
The updated object.
- set_predict_request()#
No-op.
Calling this method has no effect.
Returns#
- selfobject
The updated object.
- set_score_request(*, sample_weight='$UNCHANGED$')#
Configure whether metadata should be requested to be passed to the
scoremethod.Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with
enable_metadata_routing=True(seesklearn.set_config()). Please check the User Guide on how the routing mechanism works.The options for each parameter are:
True: metadata is requested, and passed toscoreif provided. The request is ignored if metadata is not provided.False: metadata is not requested and the meta-estimator will not pass it toscore.None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (
sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.Added in version 1.3.
Parameters#
- sample_weightstr, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for
sample_weightparameter inscore.
Returns#
- selfobject
The updated object.
ML distribution utilities#
- twiga.distributions.ml.utils.interpolate_quantile(predictions, sorted_quantiles, target_quantile=0.5)#
Interpolate predictions at a target quantile using linear interpolation.
- Parameters:
predictions (
ndarray[tuple[Any,...],dtype[double]]) – Array of predictions with shape (B, Q, C), where B is batch size, Q is number of quantiles, and C is number of channels.sorted_quantiles (
list[float] |ndarray[tuple[Any,...],dtype[double]]) – Sorted list or array of quantile levels (e.g., [0.1, 0.3, 0.7]).target_quantile (
float) – The target quantile to interpolate (default: 0.5 for median).
- Return type:
- Returns:
Interpolated predictions at target_quantile with shape (B, C).
- Raises:
ValueError – If sorted_quantiles is empty, predictions has incorrect shape, number of quantiles does not match predictions, quantiles are not sorted, or target_quantile is outside the range of sorted_quantiles.
- twiga.distributions.ml.utils.get_median_prediction(predictions, quantiles)#
Compute the median prediction (quantile = 0.5) for given predictions.
- Parameters:
predictions (
ndarray[tuple[Any,...],dtype[double]]) – Array of predictions with shape (B, Q, C), where B is batch size, Q is number of quantiles, and C is number of channels.quantiles (
list[float] |ndarray[tuple[Any,...],dtype[double]]) – List or array of quantile levels (e.g., [0.1, 0.3, 0.7]).
- Return type:
- Returns:
Median predictions with shape (B, C).
- Raises:
ValueError – If quantiles is empty, predictions has incorrect shape, or number of quantiles does not match predictions.
- twiga.distributions.ml.utils.get_sigma_prediction(predictions, quantiles)#
Compute sigma using predictions at quantiles 0.25 and 0.75.
Sigma is calculated as (upper_prediction - lower_prediction) / (2 * 0.6745), where upper_prediction and lower_prediction are at quantiles 0.75 and 0.25, respectively, and 0.6745 is the z-score for the 25th/75th percentiles.
- Parameters:
predictions (
ndarray[tuple[Any,...],dtype[double]]) – Array of predictions with shape (B, Q, C), where B is batch size, Q is number of quantiles, and C is number of channels.quantiles (
list[float] |ndarray[tuple[Any,...],dtype[double]]) – List or array of quantile levels (e.g., [0.1, 0.3, 0.7]).
- Return type:
- Returns:
Sigma predictions with shape (B, C).
- Raises:
ValueError – If quantiles is empty, predictions has incorrect shape, number of quantiles does not match predictions, or interpolation is not possible for quantiles 0.25 or 0.75.