Conformal Prediction & Uncertainty Quantification#
Source Files
twiga/distributions/conformal/core.pytwiga/distributions/conformal/base.pytwiga/distributions/conformal/cqr.pytwiga/distributions/conformal/residual_conformaltwiga/core/config/base.py
Conformal prediction provides distribution-free prediction intervals with finite-sample coverage guarantees. Unlike Bayesian methods, conformal prediction makes no assumptions about the data distribution - it only requires exchangeability of calibration data.
Because calibration needs only a set of predictions and ground truth from a held-out split, conformal prediction wraps any trained Twiga model without retraining - including plain point-forecast ML models like CatBoost or XGBoost. This makes it the recommended path for adding uncertainty estimates to an existing point forecaster.
How It Works#
graph LR
A[Train Model] --> B[Calibrate on Held-Out Data]
B --> C[Compute Non-Conformity Scores]
C --> D["Calculate Threshold q̂ at (1-α) quantile"]
D --> E[Generate Intervals on New Data]
E --> F["[prediction - q̂, prediction + q̂]"]
Train the model on training data
Calibrate using held-out data to compute non-conformity scores
Calculate threshold \(\hat{q}\) as the \((1-\alpha)\)-quantile of the scores
Generate intervals on new data using the calibrated threshold
The resulting intervals have guaranteed marginal coverage of at least \(1-\alpha\).
Class Hierarchy#
classDiagram
class BaseConformal {
<<abstract>>
+alpha: float
+calib_method: str
+lambda_: float
+q_hat: float | ndarray
+calibrate(*calib_args, axis)
+calculate_conformal_quantile(scores, axis, calib_timestamps)
+get_scores()*
+generate_intervals()*
}
class SplitConformal {
+score_type: "res" | "sign-res"
+get_scores(predicts, targets)
+generate_intervals(mu_pred)
}
class ConformalQuantileRegressor {
+score_type: "scaled" | "unscaled"
+get_scores(lower_q, upper_q, targets)
+generate_intervals(lower_q, upper_q)
}
class ConformalResidualFitting {
+score_type: "res" | "sign-res"
+q_hat_low: float | ndarray
+get_scores(predicts, sigma, targets)
+calibrate(predicts, sigma, targets, calib_timestamps)
+generate_intervals(loc, sigma)
}
class Conformal {
<<factory>>
+__new__(method, score_type, alpha, calib_method, lambda_)
}
BaseConformal <|-- SplitConformal
BaseConformal <|-- ConformalQuantileRegressor
BaseConformal <|-- ConformalResidualFitting
Conformal ..> SplitConformal : creates
Conformal ..> ConformalQuantileRegressor : creates
Conformal ..> ConformalResidualFitting : creates
Factory: Conformal#
The Conformal class in twiga/distributions/conformal/core.py is a factory that selects the appropriate method:
from twiga.distributions.conformal.core import Conformal
# Creates a SplitConformal instance
cp = Conformal(method="residual", score_type="res", alpha=0.1)
# Creates a ConformalQuantileRegressor instance
cqr = Conformal(method="quantile", score_type="scaled", alpha=0.1)
# Creates a ConformalResidualFitting instance with uniform calibration
crf = Conformal(method="residual-fitting", score_type="sign-res", alpha=0.1)
# Creates a ConformalResidualFitting instance with temporal weighting
crf_t = Conformal(
method="residual-fitting",
score_type="sign-res",
alpha=0.1,
calib_method="temporal",
lambda_=1.0,
)
calib_method and lambda_ are forwarded only when method="residual-fitting".
Configuration#
Conformal prediction is configured via ConformalConfig:
from twiga.core.config import ConformalConfig
# Standard uniform calibration:
config = ConformalConfig(
method="residual-fitting",
score_type="sign-res",
alpha=0.1,
)
# Temporal calibration weighting (addresses seasonal distribution shift):
config = ConformalConfig(
method="residual-fitting",
score_type="sign-res",
alpha=0.1,
calib_method="temporal",
lambda_=1.0,
)
# Via coverage probability:
config = ConformalConfig.from_coverage(
coverage=0.9,
method="residual",
score_type="res",
)
Parameter |
Type |
Constraints |
Description |
|---|---|---|---|
|
|
Required |
Conformal prediction method |
|
|
Required |
Non-conformity score type |
|
|
|
Significance level (1 minus alpha = target coverage) |
|
|
Default |
Quantile estimation method for calibration scores |
|
|
|
Exponential decay rate for temporal weighting |
ConformalConfig.from_coverage(coverage, **kwargs) is a convenience factory that converts a coverage probability to alpha = 1 - coverage and forwards any remaining keyword arguments to the constructor.
Valid Score Types per Method#
Method |
Class |
Valid Score Types |
Description |
|---|---|---|---|
|
|
|
Absolute or signed residuals |
|
|
|
CQR with or without scaling |
|
|
|
Scale-adapted residuals |
Methods#
Split Conformal (method="residual")#
The simplest method. Computes non-conformity scores as residuals between predictions and targets.
Score types:
"res": \(s_i = |y_i - \hat{y}_i|\) (absolute residuals)"sign-res": \(s_i = y_i - \hat{y}_i\) (signed residuals)
Intervals: \([\hat{y} - \hat{q}, \; \hat{y} + \hat{q}]\)
from twiga.distributions.conformal.base import SplitConformal
cp = SplitConformal(score_type="res", alpha=0.1)
cp.calibrate(predictions, targets)
lower, upper = cp.generate_intervals(new_predictions)
Tip
Split conformal works with any point prediction model and is the easiest to set up. Use this as a starting point.
Conformal Quantile Regression (method="quantile")#
Requires a quantile regression model that produces lower and upper quantile predictions.
Score types:
"unscaled": \(s_i = \max(q_{lo,i} - y_i, \; y_i - q_{hi,i})\)"scaled": \(s_i = \max\left(\frac{q_{lo,i} - y_i}{q_{hi,i} - q_{lo,i}}, \; \frac{y_i - q_{hi,i}}{q_{hi,i} - q_{lo,i}}\right)\)
Intervals (unscaled): \([q_{lo} - \hat{q}, \; q_{hi} + \hat{q}]\)
Intervals (scaled): \([q_{lo} - \hat{q} \cdot (q_{hi} - q_{lo}), \; q_{hi} + \hat{q} \cdot (q_{hi} - q_{lo})]\)
from twiga.distributions.conformal.cqr import ConformalQuantileRegressor
cqr = ConformalQuantileRegressor(score_type="scaled", alpha=0.1)
cqr.calibrate(lower_quantile, upper_quantile, targets)
lower, upper = cqr.generate_intervals(lower_quantile_new, upper_quantile_new)
Note
Scaled CQR adapts interval width based on the model’s quantile spread, producing narrower intervals where the model is more confident. Use with quantile regression models.
Conformal Residual Fitting (method="residual-fitting")#
Requires a model that produces both a point prediction (\(\mu\)) and a scale estimate (\(\sigma\)).
Score types:
"res": \(s_i = \frac{|\hat{y}_i - y_i|}{\sigma_i}\) (absolute)"sign-res": \(s_i = \frac{\hat{y}_i - y_i}{\sigma_i}\) (signed, can be negative)
Intervals (res): \([\mu - \sigma \cdot \hat{q}, \; \mu + \sigma \cdot \hat{q}]\)
Intervals (sign-res): \([\mu - \sigma \cdot \hat{q}^+, \; \mu - \sigma \cdot \hat{q}^-]\) where \(\hat{q}^+ = Q_{1-\alpha/2}\) and \(\hat{q}^- = Q_{\alpha/2}\) of the signed scores, giving asymmetric bounds with \(1-\alpha\) marginal coverage.
from twiga.distributions.conformal.crc import ConformalResidualFitting
# Uniform calibration (default):
crf = ConformalResidualFitting(score_type="sign-res", alpha=0.1)
crf.calibrate(predictions, sigma, targets)
lower, upper = crf.generate_intervals(new_predictions, new_sigma)
# Temporal calibration (pass calibration timestamps):
import pandas as pd
crf_t = ConformalResidualFitting(
score_type="sign-res",
alpha=0.1,
calib_method="temporal",
lambda_=1.0,
)
calib_timestamps = pd.to_datetime(calib_df["timestamp"])
crf_t.calibrate(predictions, sigma, targets, calib_timestamps=calib_timestamps)
lower, upper = crf_t.generate_intervals(new_predictions, new_sigma)
Note
Use residual-fitting with probabilistic models like GAUSSCATBOOSTModel or neural models with sigma outputs like MLPGAFModel.
Temporal Calibration Weighting#
Standard conformal prediction assumes exchangeability: calibration and test samples come from the same distribution. In rolling-origin forecasting this assumption can be violated when calibration data is drawn from a different season than the test window.
Temporal weighting (Tibshirani et al. 2019) addresses this by assigning higher weight to calibration samples that are more recent and therefore closer in distribution to the test period.
Weight formula:
where \(d_i \in [0, 1]\) is the normalised age of sample \(i\) (\(d_i = 0\) for the newest, \(d_i = 1\) for the oldest). Weights are normalised to sum to 1.
The weighted quantile selects the smallest score \(s_{(k)}\) such that:
Decay rate lambda_:
Value |
Effect |
|---|---|
|
Uniform weighting (same as |
|
Moderate recency bias (recommended starting point) |
|
Strong recency bias; upweights the most recent ~30 % of calibration data |
When to use temporal weighting
Use calib_method="temporal" when the calibration window is drawn from a different season than the test window, for example in expanding-window cross-validation where calibration is always at the training tail while the test window advances into a new season. Temporal weighting does not change the CV protocol and is compatible with any calib_size.
from twiga.core.config import ConformalConfig
from twiga.forecaster.core import TwigaForecaster
conformal_config = ConformalConfig(
method="residual-fitting",
score_type="sign-res",
alpha=0.1,
calib_method="temporal",
lambda_=1.0,
)
forecaster = TwigaForecaster(
data_params=data_config,
model_params=[model_cfg],
cv_params=exp_config,
conformal_params=conformal_config,
)
forecaster.fit(train_df=train_df)
forecaster.calibrate(calibrate_df=calib_df) # timestamps extracted automatically
lower, point, upper = forecaster.predict_interval(test_df=test_df)
TwigaForecaster.calibrate() extracts per-sample timestamps from the calibration data automatically and passes them to ConformalResidualFitting.calibrate() when calib_method="temporal".
Helper Functions#
The weighting utilities are available as standalone functions for custom workflows:
import pandas as pd
from twiga.distributions.conformal.base import (
calculate_temporal_weights,
calculate_weighted_conformal_value,
)
timestamps = pd.date_range("2021-09-01", periods=90, freq="D")
weights = calculate_temporal_weights(timestamps, lambda_=1.0)
# weights: ndarray of shape (90,) summing to 1, newest sample gets highest weight
threshold = calculate_weighted_conformal_value(scores, alpha=0.1, weights=weights)
# threshold: scalar (1-D scores) or ndarray of shape (H,) for horizon-wise scores
Integration with TwigaForecaster#
The TwigaForecaster manages conformal prediction end-to-end:
from twiga.core.config import (
ConformalConfig, DataPipelineConfig, ExperimentConfig,
)
from twiga.forecaster.core import TwigaForecaster
from twiga.models.ml.xgboost_model import XGBOOSTConfig
# 1. Configure with conformal prediction
conformal_config = ConformalConfig(
method="residual",
score_type="res",
alpha=0.1,
)
forecaster = TwigaForecaster(
data_params=data_config,
model_params=[XGBOOSTConfig()],
cv_params=train_config,
conformal_params=conformal_config,
)
# 2. Train
forecaster.fit(train_df=train_df, val_df=val_df)
# 3. Calibrate on held-out data
forecaster.calibrate(calibrate_df=calibration_df)
# 4. Generate prediction intervals
interval_dict, times = forecaster.predict_interval(test_df=test_df)
for model_name, (lower, point, upper) in interval_dict.items():
print(f"{model_name}: coverage target = {1 - conformal_config.alpha:.0%}")
# 5. Evaluate interval quality
predictions_df, metrics_df = forecaster.evaluate_interval_forecast(test_df=test_df)
# metrics_df includes: picp (coverage), winkle-score, ace, nmpi, cwe
Calibration Flow#
sequenceDiagram
participant User
participant TF as TwigaForecaster
participant Conf as Conformal Factory
participant Model
User->>TF: calibrate(calibrate_df)
TF->>Model: predict(calibrate_df)
Model-->>TF: predictions
TF->>TF: get_ground_truth()
loop For each model
TF->>Conf: Conformal(method, score_type, alpha)
Conf-->>TF: conformal_instance
TF->>Conf: calibrate(predictions, targets)
Conf->>Conf: get_scores() → calculate_conformal_quantile()
Conf-->>TF: calibrated (q_hat set)
TF->>TF: store in self.conformal[model_name]
end
API Reference#
- twiga.distributions.conformal.base.calculate_conformal_value(scores, alpha, axis=0)#
Calculate the 1-alpha quantile of scores for conformal prediction using NumPy.
This function computes the threshold value (quantile) used to construct prediction sets based on the given non-conformity scores and significance level alpha. If the scores are empty or the quantile value exceeds 1, it returns an array of np.inf.
- Parameters:
- Return type:
- Returns:
np.ndarray – The threshold value used to construct prediction sets, with shape (scores.shape[1:]).
- Raises:
ValueError – If alpha is not between 0 and 1.
- twiga.distributions.conformal.base.calculate_temporal_weights(timestamps, lambda_)#
Compute exponentially decaying temporal weights for calibration samples.
Weights are proportional to
exp(-lambda_ * d_i)whered_iis the normalised age of sampleirelative to the most recent calibration point (d_i = 0for the newest sample,d_i = 1for the oldest). The resulting weights are normalised to sum to 1.A
lambda_of 0 recovers uniform weights. Larger values concentrate weight on more recent samples, reducing the influence of calibration samples that may have been drawn from a different seasonal regime.- Parameters:
- Return type:
- Returns:
1-D array of normalised weights summing to 1, shape
(N,).- Raises:
ValueError – If
lambda_is negative ortimestampsis empty.
- twiga.distributions.conformal.base.calculate_weighted_conformal_value(scores, alpha, weights)#
Compute the weighted (1-alpha) quantile of non-conformity scores.
Implements the weighted conformal prediction quantile of Tibshirani et al. (2019). For each horizon column, scores are sorted ascending and the smallest score whose cumulative normalised weight reaches
1 - alphais returned as the threshold.- Parameters:
- Return type:
- Returns:
Threshold(s) with shape
()(scalar) for 1-D input or(H,)for 2-D input. Returnsnp.inffor any horizon where the cumulative weight cannot reach1 - alpha.- Raises:
ValueError – If
alphais outside(0, 1)or shapes are incompatible.
- class twiga.distributions.conformal.base.BaseConformal(alpha=0.1, calib_method='uniform', lambda_=1.0)#
Bases:
ABCAbstract base class for Conformal Prediction in Regression.
Provides core functionality for constructing prediction intervals with finite-sample coverage guarantees.
- Variables:
alpha (float) – Significance level for prediction intervals (0 < alpha < 1).
calib_method (str) – Quantile estimation method —
"uniform"(default) or"temporal"(exponentially weighted by sample recency).lambda (float) – Exponential decay rate for temporal weighting. Ignored when
calib_method="uniform".q_hat (float | np.ndarray) – Calibrated threshold(s) for intervals.
- Parameters:
- Raises:
ValueError – If alpha is not in (0, 1) or calib_method is invalid.
- __init__(alpha=0.1, calib_method='uniform', lambda_=1.0)#
Initializes base conformal predictor with validation.
- calculate_conformal_quantile(scores, axis=0, calib_timestamps=None, alpha=None)#
Computes the (1-alpha)-adjusted quantile of non-conformity scores.
Dispatches to the uniform empirical quantile (Lei et al. 2017) or the temporally weighted quantile (Tibshirani et al. 2019) depending on
self.calib_method.- Parameters:
scores (
ndarray) – Non-conformity scores array with shape(N,)or(N, H).axis (
int) – Quantile computation axis (uniform method only). Defaults to 0.calib_timestamps (
Series|ndarray|None) – Calibration sample timestamps required whencalib_method="temporal". Ignored for"uniform".alpha (
float|None) – Significance level override. WhenNone(default)self.alphais used. Passself.alpha / 2for split-quantile calibration (e.g.sign-resscore type) to compute each tail at half the nominal level, achieving valid 1-alpha joint coverage.
- Return type:
- Returns:
Quantile values for interval construction.
- Raises:
ValueError – For empty scores or missing timestamps in temporal mode.
- calibrate(*calib_args, axis=0)#
Calibrates conformal thresholds using provided arguments.
- Parameters:
*calib_args – Implementation-specific calibration data
axis (
int) – Axis for quantile computation. Defaults to 0.
- Raises:
ValueError – If calibration data validation fails
- Return type:
- abstractmethod generate_intervals(*pred_args)#
Generates prediction intervals from inputs.
Must be implemented by concrete subclasses.
- class twiga.distributions.conformal.base.SplitConformal(score_type='res', alpha=0.1, calib_method='uniform', lambda_=1.0)#
Bases:
BaseConformalImplements residual-based conformal prediction for regression tasks.
- Variables:
score_type (str) – Type of non-conformity score (‘res’ or ‘sign-res’).
- generate_intervals(mu_pred)#
Generates prediction intervals.
- class twiga.distributions.conformal.cqr.ConformalQuantileRegressor(score_type='scaled', alpha=0.1)#
Bases:
BaseConformalConformal quantile regression for uncertainty estimation.
Implements conformal prediction intervals for quantile regression models using either scaled or unscaled non-conformity scores.
- Variables:
- Parameters:
- Raises:
ValueError – For invalid score_type or alpha values.
- generate_intervals(lower_quantile, upper_quantile)#
Generates calibrated prediction intervals.
- get_scores(lower_quantile, upper_quantile, targets)#
Computes non-conformity scores for quantile predictions.
- class twiga.distributions.conformal.crc.ConformalResidualFitting(score_type='res', alpha=0.1, calib_method='uniform', lambda_=1.0)#
Bases:
BaseConformalConformal residual fitting for uncertainty estimation.
Implements conformal prediction intervals using residual-based scoring with optional scale adaptation.
For
score_type="res"(default), symmetric intervals are built from the (1-alpha)-quantile of absolute scaled residuals.For
score_type="sign-res", asymmetric intervals are built using a split-quantile approach: the (1-alpha/2)-quantile of signed scores gives the lower bound, and the (alpha/2)-quantile gives the upper bound, achieving valid marginal coverage at level 1-alpha.Temporal weighting (
calib_method="temporal") applies exponentially decaying weights to calibration samples by recency, addressing distribution shift between the calibration and test periods (Tibshirani et al. 2019).- Variables:
score_type (str) – Score calculation method (‘res’ or ‘sign-res’).
alpha (float) – Significance level for intervals.
calib_method (str) –
"uniform"or"temporal".lambda (float) – Exponential decay rate for temporal weighting.
q_hat (float | np.ndarray) – Upper-tail calibrated threshold (lower bound).
q_hat_low (float | np.ndarray) – Lower-tail calibrated threshold (upper bound, sign-res only).
- Parameters:
- Raises:
ValueError – For invalid score_type or alpha values.
- __init__(score_type='res', alpha=0.1, calib_method='uniform', lambda_=1.0)#
Initializes residual fitter with validation.
- calibrate(predicts, sigma, targets, axis=0, calib_timestamps=None)#
Calibrates conformal thresholds from calibration data.
For
sign-res, computes split quantiles at alpha/2 and 1-alpha/2 to construct asymmetric intervals with valid 1-alpha marginal coverage. Forres, delegates to the base (1-alpha)-quantile of absolute scores.- Parameters:
predicts (
ndarray) – Model predictions on calibration set.sigma (
ndarray) – Scale factors on calibration set.targets (
ndarray) – Ground truth values on calibration set.axis (
int) – Axis for quantile computation. Defaults to 0.calib_timestamps (
Series|ndarray|None) – Calibration sample timestamps required whencalib_method="temporal". Ignored for"uniform".
- Return type:
- generate_intervals(loc, sigma)#
Generates calibrated prediction intervals.
For
sign-res, produces asymmetric bounds using the two split quantiles set bycalibrate(). Forres, produces symmetric bounds.