Conformal Prediction & Uncertainty Quantification

Source Files
  • twiga/distributions/conformal/core.py

  • twiga/distributions/conformal/base.py

  • twiga/distributions/conformal/cqr.py

  • twiga/distributions/conformal/residual_conformal

  • twiga/core/config/base.py

Conformal prediction provides distribution-free prediction intervals with finite-sample coverage guarantees. Unlike Bayesian methods, conformal prediction makes no assumptions about the data distribution; it only requires the calibration data and the new (test) data to be exchangeable.

Because calibration needs only a set of predictions and ground truth from a held-out split, conformal prediction wraps any trained Twiga model without retraining, including plain point-forecast ML models such as CatBoost or XGBoost. This makes it the recommended way to add uncertainty estimates to an existing point forecaster.

How It Works

    graph LR
    A[Train Model] --> B[Calibrate on Held-Out Data]
    B --> C[Compute Non-Conformity Scores]
    C --> D["Calculate Threshold q̂ at (1-α) quantile"]
    D --> E[Generate Intervals on New Data]
    E --> F["[prediction - q̂, prediction + q̂]"]

  1. Train the model on training data

  2. Calibrate using held-out data to compute non-conformity scores

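  3. Calculate the threshold \(\hat{q}\) as the finite-sample-adjusted \((1-\alpha)\)-quantile of the scores, i.e. the \(\lceil (n+1)(1-\alpha) \rceil\)-th smallest of the \(n\) calibration scores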

  4. Generate intervals on new data using the calibrated threshold

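Provided the calibration and test data are exchangeable, the resulting intervals have marginal coverage of at least \(1-\alpha\). "Marginal" means the guarantee holds on average over new points, not for every individual input.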
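The four steps can be sketched in plain NumPy for the absolute-residual case. This is an illustrative re-implementation under stated assumptions, not twiga's code: the function names `conformal_quantile` and `split_conformal` are hypothetical, the "trained model" is a fixed line, and the threshold is taken as the \(\lceil (n+1)(1-\alpha) \rceil\)-th smallest calibration score.

```python
import numpy as np


def conformal_quantile(scores: np.ndarray, alpha: float) -> float:
    """Return the ceil((n + 1)(1 - alpha))-th smallest calibration score."""
    n = scores.shape[0]
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        # With this few scores the guarantee would need an infinite threshold.
        raise ValueError("too few calibration scores for this alpha")
    return float(np.sort(scores)[k - 1])


def split_conformal(cal_pred, cal_true, new_pred, alpha=0.1):
    """Steps 2-4: score the calibration split, compute q_hat, widen new predictions."""
    scores = np.abs(cal_true - cal_pred)  # absolute residuals
    q_hat = conformal_quantile(scores, alpha)
    return new_pred - q_hat, new_pred + q_hat, q_hat


# Synthetic data; the hypothetical "trained model" is simply y_hat = 2x.
rng = np.random.default_rng(0)
x_cal, x_test = rng.uniform(0, 10, 1000), rng.uniform(0, 10, 5000)
y_cal = 2 * x_cal + rng.normal(0, 1, x_cal.size)
y_test = 2 * x_test + rng.normal(0, 1, x_test.size)

lower, upper, q_hat = split_conformal(2 * x_cal, y_cal, 2 * x_test, alpha=0.1)
coverage = np.mean((y_test >= lower) & (y_test <= upper))
print(f"q_hat = {q_hat:.3f}, empirical coverage = {coverage:.3f}")
```

With 1,000 calibration points the printed coverage should land close to the 0.90 target; taking the \(\lceil (n+1)(1-\alpha) \rceil\)-th score rather than the plain empirical quantile is what makes the guarantee hold in finite samples.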

Class Hierarchy

    classDiagram
    class BaseConformal {
        <<abstract>>
        +alpha: float
        +q_hat: float | ndarray
        +calibrate(*calib_args, axis)
        +calculate_conformal_quantile(scores, axis)
        +get_scores()*
        +generate_intervals()*
    }

    class SplitConformal {
        +score_type: "res" | "sign-res"
        +get_scores(predicts, targets)
        +generate_intervals(mu_pred)
    }

    class ConformalQuantileRegressor {
        +score_type: "scaled" | "unscaled"
        +get_scores(lower_q, upper_q, targets)
        +generate_intervals(lower_q, upper_q)
    }

    class ConformalResidualFitting {
        +score_type: "res" | "sign-res"
        +get_scores(predicts, sigma, targets)
        +generate_intervals(loc, sigma)
    }

    class Conformal {
        <<factory>>
        +__new__(method, score_type, alpha)
    }

    BaseConformal <|-- SplitConformal
    BaseConformal <|-- ConformalQuantileRegressor
    BaseConformal <|-- ConformalResidualFitting
    Conformal ..> SplitConformal : creates
    Conformal ..> ConformalQuantileRegressor : creates
    Conformal ..> ConformalResidualFitting : creates
    

Factory: Conformal

The Conformal class in twiga/distributions/conformal/core.py is a factory that returns an instance of the concrete predictor class matching the requested method:

from twiga.distributions.conformal.core import Conformal

# Creates a SplitConformal instance
cp = Conformal(method="residual", score_type="res", alpha=0.1)

# Creates a ConformalQuantileRegressor instance
cqr = Conformal(method="quantile", score_type="scaled", alpha=0.1)

# Creates a ConformalResidualFitting instance
crf = Conformal(method="residual-fitting", score_type="res", alpha=0.1)
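The class diagram lists `__new__(method, score_type, alpha)` for the factory, which suggests the dispatch happens at construction time. A minimal sketch of that pattern follows; the stub classes and the `_REGISTRY` mapping are hypothetical stand-ins, not twiga's source.

```python
class _StubPredictor:
    """Hypothetical stand-in for a concrete conformal predictor."""

    def __init__(self, score_type: str, alpha: float):
        self.score_type = score_type
        self.alpha = alpha


class SplitConformalStub(_StubPredictor):
    """Stands in for SplitConformal."""


class CQRStub(_StubPredictor):
    """Stands in for ConformalQuantileRegressor."""


class ResidualFittingStub(_StubPredictor):
    """Stands in for ConformalResidualFitting."""


class ConformalFactory:
    """Sketch of a factory whose constructor returns a method-specific instance."""

    _REGISTRY = {
        "residual": SplitConformalStub,
        "quantile": CQRStub,
        "residual-fitting": ResidualFittingStub,
    }

    def __new__(cls, method: str, score_type: str = "res", alpha: float = 0.1):
        try:
            target = cls._REGISTRY[method]
        except KeyError:
            raise ValueError(f"unknown conformal method: {method!r}") from None
        # The returned object is not an instance of cls, so Python does not
        # call cls.__init__ on it; the caller gets the subclass instance as-is.
        return target(score_type=score_type, alpha=alpha)


cp = ConformalFactory(method="quantile", score_type="scaled", alpha=0.1)
print(type(cp).__name__)  # CQRStub
```

Returning an object of another class from `__new__` lets callers write a single constructor call and receive a ready-to-use concrete predictor, which is why the printed type is the stub subclass rather than the factory.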

Configuration

Conformal prediction is configured via ConformalConfig:

from twiga.core.config import ConformalConfig

config = ConformalConfig(
    method="residual",       # "residual", "quantile", "residual-fitting"
    score_type="res",        # method-dependent (see table below)
    alpha=0.1,               # significance level (0 < alpha < 1)
)

| Parameter | Type | Constraints | Description |
|---|---|---|---|
| method | Literal["residual", "quantile", "residual-fitting"] | Required | Conformal prediction method |
| score_type | Literal["scaled", "unscaled", "res", "sign-res"] | Required | Non-conformity score type (valid values depend on method) |
| alpha | float | 0 < alpha < 1 | Significance level (\(1-\alpha\) = target coverage) |

Valid Score Types per Method

| Method | Class | Valid Score Types | Description |
|---|---|---|---|
| "residual" | SplitConformal | "res", "sign-res" | Absolute or signed residuals |
| "quantile" | ConformalQuantileRegressor | "scaled", "unscaled" | CQR with or without scaling |
| "residual-fitting" | ConformalResidualFitting | "res", "sign-res" | Scale-adapted residuals |
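Because score_type is typed as a union of all four literals, the per-method restriction in the table is a cross-field rule. A hypothetical helper that enforces it could look like the sketch below; whether ConformalConfig or the predictor classes perform exactly this check is an assumption (the API reference only states that the predictor classes raise ValueError for invalid score_type or alpha values).

```python
# Valid score types per method, transcribed from the method/score-type table.
VALID_SCORE_TYPES = {
    "residual": {"res", "sign-res"},
    "quantile": {"scaled", "unscaled"},
    "residual-fitting": {"res", "sign-res"},
}


def check_conformal_params(method: str, score_type: str, alpha: float) -> None:
    """Hypothetical helper: raise ValueError for an invalid combination."""
    if method not in VALID_SCORE_TYPES:
        raise ValueError(f"unknown method {method!r}")
    if score_type not in VALID_SCORE_TYPES[method]:
        allowed = sorted(VALID_SCORE_TYPES[method])
        raise ValueError(f"score_type {score_type!r} is not valid for {method!r}; use one of {allowed}")
    if not 0 < alpha < 1:
        raise ValueError(f"alpha must be in (0, 1), got {alpha}")


for params in [("quantile", "scaled", 0.1), ("quantile", "res", 0.1)]:
    try:
        check_conformal_params(*params)
        print(params, "-> ok")
    except ValueError as err:
        print(params, "-> rejected:", err)
```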

Methods

Split Conformal (method="residual")

The simplest method: non-conformity scores are the residuals between predictions and targets.

Score types:

  • "res": \(s_i = |y_i - \hat{y}_i|\) (absolute residuals)

  • "sign-res": \(s_i = y_i - \hat{y}_i\) (signed residuals)

Intervals: \([\hat{y} - \hat{q}, \; \hat{y} + \hat{q}]\)

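from twiga.distributions.conformal.base import SplitConformal

# predictions, targets: point forecasts and ground truth on the held-out calibration split
# new_predictions: point forecasts for the data that needs intervals
cp = SplitConformal(score_type="res", alpha=0.1)
cp.calibrate(predictions, targets)
lower, upper = cp.generate_intervals(new_predictions)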

Tip

Split conformal works with any point prediction model and is the easiest to set up. Use this as a starting point.

Conformal Quantile Regression (method="quantile")

Requires a quantile regression model that produces lower and upper quantile predictions.

Score types:

  • "unscaled": \(s_i = \max(q_{lo,i} - y_i, \; y_i - q_{hi,i})\)

  • "scaled": \(s_i = \max\left(\frac{q_{lo,i} - y_i}{q_{hi,i} - q_{lo,i}}, \; \frac{y_i - q_{hi,i}}{q_{hi,i} - q_{lo,i}}\right)\)

Intervals (unscaled): \([q_{lo} - \hat{q}, \; q_{hi} + \hat{q}]\)

Intervals (scaled): \([q_{lo} - \hat{q} \cdot (q_{hi} - q_{lo}), \; q_{hi} + \hat{q} \cdot (q_{hi} - q_{lo})]\)
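The score and interval formulas above can be sketched directly in NumPy. This is not twiga's implementation: the helper names are hypothetical, the value of the stability constant `EPS` is an assumption (the API reference mentions an `_EPS` attribute but not its value), and the quantile model is a deliberately too-narrow synthetic band.

```python
import numpy as np

EPS = 1e-8  # assumed value; guards against a zero-width quantile band


def cqr_scores(q_lo, q_hi, y, scaled=True):
    """CQR scores: positive when y lies outside [q_lo, q_hi], negative inside."""
    scores = np.maximum(q_lo - y, y - q_hi)
    if scaled:
        scores = scores / (q_hi - q_lo + EPS)
    return scores


def cqr_intervals(q_lo, q_hi, q_hat, scaled=True):
    """Shift both band edges by q_hat (outward if positive, inward if negative)."""
    margin = q_hat * (q_hi - q_lo) if scaled else q_hat
    return q_lo - margin, q_hi + margin


# Heteroscedastic toy data: noise sd grows from 0.1 to 1.1 with x.
rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 5000)
sd = 0.1 + x
y = rng.normal(0, sd)
# Hypothetical quantile model that is too narrow: a +/- 1 sd band covers only ~68%.
q_lo, q_hi = -sd, sd

alpha, n_cal = 0.1, 1000
s = cqr_scores(q_lo[:n_cal], q_hi[:n_cal], y[:n_cal])
k = int(np.ceil((n_cal + 1) * (1 - alpha)))
q_hat = np.sort(s)[k - 1]
lo, hi = cqr_intervals(q_lo[n_cal:], q_hi[n_cal:], q_hat)
coverage = np.mean((y[n_cal:] >= lo) & (y[n_cal:] <= hi))
print(f"q_hat = {q_hat:.3f}, test coverage = {coverage:.3f}")
```

Here \(\hat{q}\) comes out positive because the raw band under-covers. With scaled scores the final width is \((1 + 2\hat{q})(q_{hi} - q_{lo})\), so it stays proportional to the model's own spread instead of adding one constant margin everywhere.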

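from twiga.distributions.conformal.cqr import ConformalQuantileRegressor

# lower_quantile, upper_quantile, targets: quantile forecasts and ground truth on the calibration split
# lower_quantile_new, upper_quantile_new: quantile forecasts for the new data
cqr = ConformalQuantileRegressor(score_type="scaled", alpha=0.1)
cqr.calibrate(lower_quantile, upper_quantile, targets)
lower, upper = cqr.generate_intervals(lower_quantile_new, upper_quantile_new)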

Note

Scaled CQR multiplies the conformal correction \(\hat{q}\) by the model's quantile spread \(q_{hi} - q_{lo}\), so less margin is added where the model is more confident (narrow spread) and more where it is less confident. Use with quantile regression models.

Conformal Residual Fitting (method="residual-fitting")

Requires a model that produces both a point prediction (\(\mu\)) and a scale estimate (\(\sigma\)).

Score types:

  • "res": \(s_i = \frac{|\mu_i - y_i|}{\sigma_i}\) (absolute)

  • "sign-res": \(s_i = \frac{\mu_i - y_i}{\sigma_i}\) (signed, can be negative)

Intervals: \([\mu - \sigma \cdot \hat{q}, \; \mu + \sigma \cdot \hat{q}]\)
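A NumPy sketch of residual fitting with the "res" score follows. It is not twiga's implementation; the helper names are hypothetical, and the synthetic model is assumed to have a correct mean but a scale estimate that is half the true noise level, to show how calibration repairs an over-confident sigma.

```python
import numpy as np


def crf_scores(mu, sigma, y):
    """The "res" score: absolute residuals normalized by the predicted scale."""
    return np.abs(mu - y) / sigma


def crf_intervals(mu, sigma, q_hat):
    """Intervals mu -/+ sigma * q_hat, so the width follows sigma point by point."""
    return mu - sigma * q_hat, mu + sigma * q_hat


# Synthetic heteroscedastic data: noise sd grows from 0.1 to 1.1 with x.
rng = np.random.default_rng(2)
x = rng.uniform(0, 1, 5000)
sd = 0.1 + x
y = np.sin(2 * np.pi * x) + rng.normal(0, sd)
# Hypothetical model: correct mean, but sigma is half the true sd (over-confident).
mu, sigma = np.sin(2 * np.pi * x), 0.5 * sd

alpha, n_cal = 0.1, 1000
scores = crf_scores(mu[:n_cal], sigma[:n_cal], y[:n_cal])
k = int(np.ceil((n_cal + 1) * (1 - alpha)))
q_hat = np.sort(scores)[k - 1]
lo, hi = crf_intervals(mu[n_cal:], sigma[n_cal:], q_hat)
coverage = np.mean((y[n_cal:] >= lo) & (y[n_cal:] <= hi))
print(f"q_hat = {q_hat:.2f}, test coverage = {coverage:.3f}")
```

Because every score is divided by sigma, a scale estimate that is uniformly too small is compensated by a larger \(\hat{q}\) (roughly 3.3 here, versus roughly 1.645 for a well-calibrated sigma), while the interval width still tracks sigma point by point.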

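from twiga.distributions.conformal.crc import ConformalResidualFitting

# predictions, sigma, targets: point forecasts, scale estimates and ground truth on the calibration split
# new_predictions, new_sigma: point forecasts and scale estimates for the new data
crf = ConformalResidualFitting(score_type="res", alpha=0.1)
crf.calibrate(predictions, sigma, targets)
lower, upper = crf.generate_intervals(new_predictions, new_sigma)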

Note

Use residual-fitting with probabilistic models like GAUSSCATBOOSTModel or neural models with sigma outputs like MLPGAFModel.

Integration with TwigaForecaster

The TwigaForecaster manages conformal prediction end-to-end:

from twiga.core.config import (
    ConformalConfig, DataPipelineConfig, ForecasterConfig,
)
from twiga.forecaster.core import TwigaForecaster
from twiga.models.ml.xgboost_model import XGBOOSTConfig

# 1. Configure with conformal prediction
conformal_config = ConformalConfig(
    method="residual",
    score_type="res",
    alpha=0.1,
)

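# Assumed to be built beforehand: data_config and train_config (presumably from the
# DataPipelineConfig and ForecasterConfig imported above) and non-overlapping
# train_df / val_df / calibration_df / test_df splits.
forecaster = TwigaForecaster(
    data_params=data_config,
    model_params=[XGBOOSTConfig()],
    train_params=train_config,
    conformal_params=conformal_config,
)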

# 2. Train
forecaster.fit(train_df=train_df, val_df=val_df)

# 3. Calibrate on held-out data
forecaster.calibrate(calibrate_df=calibration_df)

# 4. Generate prediction intervals
interval_dict, times = forecaster.predict_interval(test_df=test_df)

for model_name, (lower, point, upper) in interval_dict.items():
    print(f"{model_name}: coverage target = {1 - conformal_config.alpha:.0%}")

# 5. Evaluate interval quality
predictions_df, metrics_df = forecaster.evaluate_interval_forecast(test_df=test_df)
# metrics_df includes: picp (coverage), winkle-score, ace, nmpi, cwe
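For reference, the textbook definitions of two of the listed interval metrics, coverage (PICP) and the Winkler score, are sketched below. twiga's own `picp` and `winkle-score` columns may normalize or aggregate differently, so treat these hypothetical helpers as an assumption rather than the library's formulas.

```python
import numpy as np


def picp(y, lower, upper):
    """Prediction interval coverage probability: share of targets inside [lower, upper]."""
    return float(np.mean((y >= lower) & (y <= upper)))


def winkler_score(y, lower, upper, alpha):
    """Mean Winkler score: interval width plus a 2/alpha penalty per unit of miss."""
    width = upper - lower
    below = np.clip(lower - y, 0, None)  # distance below the lower bound (0 if inside)
    above = np.clip(y - upper, 0, None)  # distance above the upper bound (0 if inside)
    return float(np.mean(width + (2 / alpha) * (below + above)))


y = np.array([1.0, 2.0, 3.0, 4.0])
lower = np.array([0.5, 1.5, 3.2, 3.0])
upper = np.array([1.5, 2.5, 4.2, 5.0])
print(picp(y, lower, upper))  # 0.75
print(winkler_score(y, lower, upper, alpha=0.1))  # approximately 2.25
```

Lower Winkler scores are better: the score rewards narrow intervals but charges \(2/\alpha\) per unit by which a target falls outside, so shrinking intervals at the cost of coverage is penalized.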

Calibration Flow

    sequenceDiagram
    participant User
    participant TF as TwigaForecaster
    participant Conf as Conformal Factory
    participant Model

    User->>TF: calibrate(calibrate_df)
    TF->>Model: predict(calibrate_df)
    Model-->>TF: predictions
    TF->>TF: get_ground_truth()
    loop For each model
        TF->>Conf: Conformal(method, score_type, alpha)
        Conf-->>TF: conformal_instance
        TF->>Conf: calibrate(predictions, targets)
        Conf->>Conf: get_scores() → calculate_conformal_quantile()
        Conf-->>TF: calibrated (q_hat set)
        TF->>TF: store in self.conformal[model_name]
    end
    

API Reference

class twiga.distributions.conformal.core.Conformal(method, score_type='res', alpha=0.1)

Bases: object

Factory class to create method-specific conformal predictors.

class twiga.distributions.conformal.base.BaseConformal(alpha=0.1)

Bases: ABC

Abstract base class for Conformal Prediction in Regression.

Provides core functionality for constructing prediction intervals with finite-sample coverage guarantees.

Variables:
  • alpha (float) – Significance level for prediction intervals (0 < alpha < 1).

  • q_hat (float | np.ndarray) – Calibrated threshold(s) for intervals.

Parameters:

alpha (float) – Significance level (0 < alpha < 1). Defaults to 0.1.

Raises:

ValueError – If alpha is not in (0, 1).

__init__(alpha=0.1)

Initializes base conformal predictor with validation.

calculate_conformal_quantile(scores, axis=0)

Computes the finite-sample-adjusted (1-alpha) quantile of the non-conformity scores.

Implements conformal quantile adjustment from Lei et al. (2017).

Parameters:
  • scores (ndarray) – Non-conformity scores array

  • axis (int) – Quantile computation axis. Defaults to 0.

Return type:

float | ndarray

Returns:

Quantile values for interval construction

Raises:

ValueError – For empty scores array
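A sketch of the finite-sample adjustment with an `axis` argument follows. It assumes the adjustment is the \(\lceil (n+1)(1-\alpha) \rceil\)-th smallest score along `axis` (the usual split-conformal rule) and that a 2-D score array of shape (samples, horizons) yields one threshold per horizon, which is one plausible reading of the `float | ndarray` return type; twiga's exact handling, for example when \(n\) is too small for the requested alpha, may differ.

```python
import numpy as np


def calculate_conformal_quantile(scores, alpha, axis=0):
    """Finite-sample-adjusted (1 - alpha) quantile of the scores along `axis`."""
    scores = np.asarray(scores, dtype=float)
    if scores.size == 0:
        raise ValueError("scores must not be empty")
    n = scores.shape[axis]
    # The ceil((n + 1)(1 - alpha))-th smallest score along `axis`.
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        raise ValueError("too few calibration scores for this alpha")
    q = np.take(np.sort(scores, axis=axis), k - 1, axis=axis)
    return float(q) if np.ndim(q) == 0 else q


# Synthetic absolute residuals for 200 calibration windows x 24 forecast
# horizons, with errors that grow with the horizon.
rng = np.random.default_rng(3)
scores = np.abs(rng.normal(0.0, np.linspace(1.0, 3.0, 24), size=(200, 24)))
q_hat = calculate_conformal_quantile(scores, alpha=0.1, axis=0)
print(q_hat.shape)  # (24,)
```

Sorting and indexing directly avoids the off-by-one rank that some interpolation settings of `np.quantile` introduce at levels of the form \(k/n\).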

calibrate(*calib_args, axis=0)

Calibrates conformal thresholds using provided arguments.

Parameters:
  • *calib_args – Implementation-specific calibration data

  • axis (int) – Axis for quantile computation. Defaults to 0.

Raises:

ValueError – If calibration data validation fails

Return type:

None

abstractmethod generate_intervals(*pred_args)

Generates prediction intervals from inputs.

Must be implemented by concrete subclasses.

Return type:

tuple[ndarray, ndarray]

Returns:

Tuple of (lower bounds, upper bounds)

abstractmethod get_scores(*args)

Computes non-conformity scores from inputs.

Must be implemented by concrete subclasses.

Return type:

ndarray

Returns:

Array of non-conformity scores
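The abstract interface above follows a template-method pattern: `calibrate` only needs `get_scores` from a subclass. A minimal sketch of how the pieces could fit together follows, under the assumption that `calibrate` forwards `*calib_args` to `get_scores` and stores the result of `calculate_conformal_quantile` in `q_hat`, as the calibration flow diagram suggests. The class names ending in `Sketch` are hypothetical.

```python
from abc import ABC, abstractmethod

import numpy as np


class BaseConformalSketch(ABC):
    """Template-method sketch of the documented BaseConformal interface."""

    def __init__(self, alpha: float = 0.1):
        if not 0 < alpha < 1:
            raise ValueError("alpha must be in (0, 1)")
        self.alpha = alpha
        self.q_hat = None  # set by calibrate()

    def calculate_conformal_quantile(self, scores, axis=0):
        """The ceil((n + 1)(1 - alpha))-th smallest score along `axis`."""
        n = scores.shape[axis]
        k = int(np.ceil((n + 1) * (1 - self.alpha)))
        if k > n:
            raise ValueError("too few calibration scores for this alpha")
        return np.take(np.sort(scores, axis=axis), k - 1, axis=axis)

    def calibrate(self, *calib_args, axis=0):
        # The subclass defines what the calibration inputs mean; the base
        # class only turns the resulting scores into a threshold.
        scores = self.get_scores(*calib_args)
        self.q_hat = self.calculate_conformal_quantile(scores, axis=axis)

    @abstractmethod
    def get_scores(self, *args):
        """Return an array of non-conformity scores."""

    @abstractmethod
    def generate_intervals(self, *pred_args):
        """Return (lower, upper) bounds."""


class AbsResidualSketch(BaseConformalSketch):
    """Concrete subclass mirroring split conformal with score_type="res"."""

    def get_scores(self, predicts, targets):
        return np.abs(targets - predicts)

    def generate_intervals(self, mu_pred):
        return mu_pred - self.q_hat, mu_pred + self.q_hat


cp = AbsResidualSketch(alpha=0.25)
cp.calibrate(np.zeros(10), np.arange(1.0, 11.0))  # scores are 1, 2, ..., 10
print(cp.q_hat)  # 9.0 (the 9th smallest, since ceil(11 * 0.75) = 9)
lower, upper = cp.generate_intervals(np.array([0.0, 5.0]))
```

Keeping the quantile logic in the base class means each new method only has to define its score and how \(\hat{q}\) is mapped back to bounds.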

class twiga.distributions.conformal.base.SplitConformal(score_type='res', alpha=0.1)

Bases: BaseConformal

Implements residual-based conformal prediction for regression tasks.

Variables:

score_type (str) – Type of non-conformity score (‘res’ or ‘sign-res’).

generate_intervals(mu_pred)

Generates prediction intervals.

Parameters:

mu_pred (ndarray) – Model predictions.

Return type:

tuple[ndarray, ndarray]

Returns:

tuple[np.ndarray, np.ndarray] – Lower and upper bounds.

get_scores(predicts, targets)

Computes non-conformity scores based on residuals.

Parameters:
  • predicts (ndarray) – Model predictions.

  • targets (ndarray) – Ground truth targets.

Return type:

ndarray

Returns:

np.ndarray – Non-conformity scores.

class twiga.distributions.conformal.cqr.ConformalQuantileRegressor(score_type='scaled', alpha=0.1)

Bases: BaseConformal

Conformal quantile regression for uncertainty estimation.

Implements conformal prediction intervals for quantile regression models using either scaled or unscaled non-conformity scores.

Variables:
  • score_type (str) – Type of non-conformity score (‘scaled’ or ‘unscaled’)

  • alpha (float) – Significance level for prediction intervals

  • q_hat (float | np.ndarray) – Calibrated threshold(s)

  • _EPS (float) – Numerical stability constant

Parameters:
  • score_type (str) – Score calculation method. Defaults to ‘scaled’.

  • alpha (float) – Significance level (0 < alpha < 1). Defaults to 0.1.

Raises:

ValueError – For invalid score_type or alpha values.

generate_intervals(lower_quantile, upper_quantile)

Generates calibrated prediction intervals.

Parameters:
  • lower_quantile (ndarray) – Predicted lower quantiles

  • upper_quantile (ndarray) – Predicted upper quantiles

Return type:

tuple[ndarray, ndarray]

Returns:

Tuple of (lower intervals, upper intervals)

get_scores(lower_quantile, upper_quantile, targets)

Computes non-conformity scores for quantile predictions.

Parameters:
  • lower_quantile (ndarray) – Predicted lower quantiles

  • upper_quantile (ndarray) – Predicted upper quantiles

  • targets (ndarray) – Ground truth values

Return type:

ndarray

Returns:

Array of non-conformity scores

class twiga.distributions.conformal.crc.ConformalResidualFitting(score_type='res', alpha=0.1)

Bases: BaseConformal

Conformal residual fitting for uncertainty estimation.

Implements conformal prediction intervals using residual-based scoring with optional scale adaptation.

Variables:
  • score_type (str) – Score calculation method (‘res’ or ‘sign-res’)

  • alpha (float) – Significance level for intervals

  • q_hat (float | np.ndarray) – Calibrated threshold(s)

Parameters:
  • score_type (str) – Residual scoring method. Defaults to ‘res’.

  • alpha (float) – Significance level (0 < alpha < 1). Defaults to 0.1.

Raises:

ValueError – For invalid score_type or alpha values.

__init__(score_type='res', alpha=0.1)

Initializes residual fitter with validation.

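generate_intervals(loc, sigma)

Generates calibrated prediction intervals.

Parameters:
  • loc (ndarray) – Point predictions (interval centers, \(\mu\))

  • sigma (ndarray) – Scale estimates (\(\sigma\))

Return type: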

tuple[ndarray, ndarray]

Returns:

Tuple of (lower intervals, upper intervals)

get_scores(predicts, sigma, targets)

Computes scaled residual scores.

Parameters:
  • predicts (ndarray) – Model predictions

  • sigma (ndarray) – Scale factors

  • targets (ndarray) – Ground truth values

Return type:

ndarray

Returns:

Array of non-conformity scores