Quantile Regression#
What you’ll build
Prediction-interval forecasters using QR-LightGBM, QR-XGBoost, and FPQR-LightGBM, evaluated on PICP, NMPI, and Winkler score - giving you calibrated uncertainty estimates instead of point predictions.
Prerequisites
01 - Getting Started (DataPipelineConfig, ForecasterConfig, TwigaForecaster.fit)
05 - ML Point Forecasting (gradient-boosted trees)
07 - Neural Networks (helpful but optional)
Python: basic statistics, quantile concepts
Learning objectives
By the end of this notebook you will be able to:
Explain the pinball loss and why it trains models to predict specific quantiles
Configure QRLIGHTGBMConfig, QRXGBOOSTConfig, and FPQRLIGHTGBMConfig with custom quantile grids
Evaluate prediction intervals using PICP, NMPI, and Winkler score
Compare fixed-grid QR vs. FPQR and choose the right approach for your use case
Visualise quantile fans and interval coverage plots to diagnose calibration issues
1. Motivation: Why a Single Number Is Not Enough#
A point forecast gives one number per time step: the model’s best guess for net load at 14:30 tomorrow.
For many operational decisions that single number is not enough.
Energy dispatch example
A grid operator scheduling spinning reserves needs to know not just the expected net load but the range of plausible outcomes. If net load could realistically be 15 % higher than the median, reserves must cover that upside. Ignoring uncertainty leads to either:
under-reservation: system reliability at risk, or
over-reservation: unnecessarily expensive.
Prediction intervals address this by providing a lower and upper bound that should contain the true value with a specified probability - e.g. a 90 % prediction interval should cover the actual observation 90 % of the time over many forecasts.
Quantile Regression (QR) is one of the most practical approaches:
Train separate models (or a single multi-output model) for different quantile levels, e.g. q=0.05, 0.5, 0.95.
The q=0.05 output is the lower bound and q=0.95 the upper bound for a 90 % interval.
No distributional assumption is required - the model learns the conditional quantiles directly from data.
Conformal post-processing can be applied to provide coverage guarantees.
Key concept - quantile regression and pinball loss
A standard regression model minimises the mean squared error (MSE), which targets the conditional mean of the output. A quantile regression model instead minimises the pinball loss (also called quantile loss) for a chosen quantile level τ ∈ (0, 1):
L(y, q, τ) = τ · max(y − q, 0) + (1 − τ) · max(q − y, 0)

Concretely:
When τ = 0.5, the pinball loss is equivalent to mean absolute error - the model targets the median.
When τ = 0.05, over-predictions are penalised 19× more than under-predictions - the model learns the 5th percentile of the conditional distribution.
When τ = 0.95, under-predictions are penalised 19× more - the model learns the 95th percentile.
The pair (q₀.₀₅, q₀.₉₅) forms a 90 % prediction interval. No distributional assumption (Gaussian, etc.) is required - the model learns the quantile structure directly from data.
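The asymmetry is easy to verify numerically. Below is a minimal NumPy sketch of the pinball loss (an illustrative implementation, not Twiga's internal one) showing the 19× penalty ratio at τ = 0.05:

```python
import numpy as np

def pinball_loss(y, q, tau):
    """Pinball loss: tau * max(y - q, 0) + (1 - tau) * max(q - y, 0)."""
    y, q = np.asarray(y, dtype=float), np.asarray(q, dtype=float)
    return tau * np.maximum(y - q, 0.0) + (1.0 - tau) * np.maximum(q - y, 0.0)

# tau = 0.5: symmetric — equivalent to half the absolute error (median regression)
print(pinball_loss(10.0, 8.0, 0.5))   # 1.0

# tau = 0.05: over-predicting by 2 costs 19x more than under-predicting by 2
print(pinball_loss(8.0, 10.0, 0.05))  # 0.95 * 2 = 1.9  (over-prediction)
print(pinball_loss(10.0, 8.0, 0.05))  # 0.05 * 2 = 0.1  (under-prediction)
```

Minimising this loss over many samples pushes the prediction q toward the τ-th conditional quantile of y.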
Key concept - prediction interval evaluation metrics
Three complementary metrics assess whether a prediction interval is both reliable and sharp:
| Metric | Full name | Formula (sketch) | Ideal value |
|---|---|---|---|
| PICP | Prediction Interval Coverage Probability | fraction of actuals inside [lower, upper] | ≥ 1 − α (e.g. ≥ 0.90) |
| NMPI | Normalised Mean Prediction Interval | mean(upper − lower) / range(y) | as small as possible |
| Winkler score | Winkler (1972) | width + penalty for misses | lower is better |
PICP measures reliability: is the interval wide enough to contain the true value at the claimed frequency? A 90 % interval that only captures 70 % of actuals is miscalibrated.
NMPI measures sharpness: are the bounds tight? An interval covering ±∞ has perfect PICP but is useless in practice.
Winkler score synthesises both: it equals the interval width when the actual is inside, and adds a large penalty when the actual falls outside. Lower is better, and it rewards narrow-but-accurate intervals over wide-but-safe ones.
Raw QR bounds carry no formal coverage guarantee - empirical PICP may fall below 1 − α. For a guarantee, apply conformal calibration (NB10).
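All three metrics can be sketched in a few lines of NumPy. These are illustrative implementations to make the definitions concrete, not Twiga's own metric functions:

```python
import numpy as np

def picp(y, lower, upper):
    """Prediction Interval Coverage Probability: fraction of actuals inside the bounds."""
    return float(np.mean((y >= lower) & (y <= upper)))

def nmpi(y, lower, upper):
    """Normalised Mean Prediction Interval: mean width divided by the range of actuals."""
    return float(np.mean(upper - lower) / (np.max(y) - np.min(y)))

def winkler(y, lower, upper, alpha=0.1):
    """Winkler (1972) score: interval width, plus 2/alpha times the miss distance."""
    width = upper - lower
    below = (lower - y) * (y < lower)   # miss distance when actual is below the band
    above = (y - upper) * (y > upper)   # miss distance when actual is above the band
    return float(np.mean(width + (2.0 / alpha) * (below + above)))

y  = np.array([1.0, 2.0, 3.0, 10.0])
lo = np.array([0.0, 1.0, 2.0, 3.0])
hi = np.array([2.0, 3.0, 4.0, 5.0])
print(picp(y, lo, hi))     # 0.75 — the last actual falls above its interval
print(nmpi(y, lo, hi))     # 2 / 9 ≈ 0.222
print(winkler(y, lo, hi))  # 27.0 — the miss at y=10 adds a 2/alpha penalty
```

Note how the single miss dominates the Winkler score: a narrow interval only pays off if it actually covers the observations.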
Key concept - FPQR vs fixed-grid QR
Standard quantile regression uses a fixed grid of τ values specified at training time (e.g. 0.05, 0.10, …, 0.95). The model learns a separate output for each τ - an independent pinball loss is minimised for each level.
Flexible Proportional Quantile Regression (FPQR) takes a different approach: a small QuantileProposal network learns to propose its own quantile grid adaptively, per sample and per horizon step. This has two practical consequences:
Coverage shape adapts to the data - if the conditional distribution is skewed or multimodal, FPQR can allocate more quantile mass where uncertainty is highest.
Point forecast quality improves - instead of reading off the median from a fixed grid, FPQR computes a weighted expectation ∑ τ̂ᵢ · qᵢ, which is a smoother and often more accurate point estimate.
The trade-off: fixed QR lets you audit specific quantile levels (e.g. “what is the 5th percentile?”) directly. FPQR returns an averaged grid that is harder to interpret at a specific level but better captures complex uncertainty shapes.
When to prefer FPQR: signals with skewed residuals (solar irradiance, electricity prices), large datasets, or when interval shape matters more than auditability of specific quantile levels.
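The weighted-expectation point forecast mentioned above can be sketched as follows. This is one plausible reading of ∑ τ̂ᵢ · qᵢ, assuming the proposal network's scores are normalised to weights that sum to 1 (the softmax here is an assumption; Twiga's actual FPQR head may normalise differently):

```python
import numpy as np

# Hypothetical outputs for one sample / horizon step: proposed quantile values
quantile_values = np.array([80.0, 95.0, 100.0, 104.0, 120.0])

# Raw proposal scores -> normalised weights via softmax (an assumption here;
# the key property is only that the weights are non-negative and sum to 1)
raw_scores = np.array([0.2, 1.0, 2.0, 1.0, 0.2])
weights = np.exp(raw_scores) / np.exp(raw_scores).sum()

# Weighted expectation over the proposed quantiles — a smoother point
# estimate than reading off the median of a fixed grid
point_forecast = float(weights @ quantile_values)
print(round(point_forecast, 2))
```

Because the result is a convex combination, it always lies between the lowest and highest proposed quantile, but it shifts toward regions where the network concentrates proposal mass.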
2. Setup#
import warnings
from great_tables import GT
from lets_plot import LetsPlot
import pandas as pd
from sklearn.preprocessing import RobustScaler, StandardScaler
from twiga.core.plot.gt import twiga_gt, twiga_report
LetsPlot.setup_html()
from twiga.core.config import DataPipelineConfig, ForecasterConfig
from twiga.core.plot import (
plot_forecast,
plot_forecast_grid,
plot_metrics_bar,
plot_reliability_diagram,
)
from twiga.core.utils import configure, get_logger
warnings.filterwarnings("ignore")
configure()
log = get_logger("tutorials")
Load data#
data = pd.read_parquet("../data/MLVS-PT.parquet")
data = data[["timestamp", "NetLoad(kW)", "Ghi", "Temperature"]]
data["timestamp"] = pd.to_datetime(data["timestamp"])
data = data.drop_duplicates(subset="timestamp").reset_index(drop=True)
# Restrict to 2019-2020 to keep tutorial execution fast
data = data[(data["timestamp"] >= "2019-01-01") & (data["timestamp"] <= "2020-12-31")].reset_index(drop=True)
log.info("Shape: %s", data.shape)
twiga_gt(GT(data.head().round(2)))
Train / val / test splits#
from great_tables import GT, md
from twiga.core.plot.gt import twiga_gt
splits_df = pd.DataFrame(
{
"Split": ["Train", "Validation", "Test"],
"Period": ["before 2020-01-01", "2020-01-01 – 2020-06-30", "2020-07-01 onwards"],
"Purpose": ["Model learning", "Early-stopping / overfitting guard", "Final honest evaluation"],
}
)
twiga_gt(
GT(splits_df)
.tab_header(title=md("**Data Splits**"), subtitle="Chronological — no shuffling, no overlap")
.cols_label(**{c: md(f"**{c}**") for c in splits_df.columns})
.tab_source_note("Twiga Forecast"),
n_rows=len(splits_df),
)
train_df = data[data["timestamp"] < "2020-01-01"].reset_index(drop=True)
val_df = data[(data["timestamp"] >= "2020-01-01") & (data["timestamp"] < "2020-07-01")].reset_index(drop=True)
test_df = data[data["timestamp"] >= "2020-07-01"].reset_index(drop=True)
log.info(
f"train : {train_df.shape[0]:,} rows ({train_df['timestamp'].min().date()} → {train_df['timestamp'].max().date()})"
)
log.info(f"val : {val_df.shape[0]:,} rows ({val_df['timestamp'].min().date()} → {val_df['timestamp'].max().date()})")
log.info(
f"test : {test_df.shape[0]:,} rows ({test_df['timestamp'].min().date()} → {test_df['timestamp'].max().date()})"
)
3. ML Quantile Model: QR XGBoost#
QRXGBOOSTConfig trains a multi-quantile XGBoost model. Each quantile level is a separate output; the median (q=0.5) becomes the point forecast and the outer quantiles (q=0.05, q=0.95) form a 90 % prediction band.
The interval emerges directly from the model - no distributional assumption required. For coverage-guaranteed intervals via conformal post-processing, see NB10 - Conformal Prediction.
from IPython.display import clear_output
from twiga import TwigaForecaster
from twiga.models.ml import QRXGBOOSTConfig
qr_xg_config = QRXGBOOSTConfig(device="cpu")
forecaster_qr_xg = TwigaForecaster(
data_params=data_config,
model_params=[qr_xg_config],
train_params=train_config,
)
forecaster_qr_xg.fit(train_df=train_df, val_df=val_df)
clear_output()
log.info("QR XGBoost training complete.")
Reading QR XGBoost quantile metrics
pinball: the mean pinball loss averaged across all quantile levels and time steps. Lower is better; compare across models on the same dataset, but do not compare across different alpha settings.
sharpness: mean interval width (upper − lower). Smaller values indicate tighter, more informative intervals - desirable as long as PICP is maintained.
mae / rmse: point forecast quality using the median quantile as the prediction. These match what you would see from a standard point-forecast model.
pred_qr_xg, metric_qr_xg = forecaster_qr_xg.evaluate_quantile_forecast(test_df=test_df)
clear_output()
metric_qr_xg.columns
def get_metric_table(metric_df):
res = (
metric_df.groupby("Model")[["mae", "corr", "pinball", "crps", "sharpness", "calibration_error"]]
.mean()
.round(2)
.reset_index()
)
res = res.rename(
columns={
"mae": "MAE",
"corr": "Corr",
"pinball": "PINBALL",
"crps": "CRPS",
"sharpness": "SHARPNESS",
"calibration_error": "Cal-err",
}
)
metric_name = ["MAE", "Corr", "PINBALL", "CRPS", "SHARPNESS", "Cal-err"]
    minimize_cols = ["MAE", "PINBALL", "CRPS", "SHARPNESS", "Cal-err"]
maximize_cols = ["Corr"]
return twiga_report(res, metric_name, minimize_cols, maximize_cols)
get_metric_table(metric_qr_xg)
4. NN Quantile Model: MLPF QR#
MLPFConfig(distribution="qr") adds a Group Additive Model head on top of the MLP encoder, giving the model an inductive bias toward additive feature contributions - useful when solar irradiance and calendar effects have near-separable impacts on net load.
As with the ML model, the quantile outputs form the interval directly. No calibration step is needed to obtain bounds.
from twiga.models.nn import MLPFConfig
qr_mlpf_config = MLPFConfig(distribution="qr", max_epochs=10, rich_progress_bar=False)
forecaster_qr_mlpf = TwigaForecaster(
data_params=data_config,
model_params=[qr_mlpf_config],
train_params=train_config,
)
forecaster_qr_mlpf.fit(train_df=train_df, val_df=val_df)
clear_output()
log.info("MLPF QR training complete.")
pred_qr_mlpf, metric_qr_mlpf = forecaster_qr_mlpf.evaluate_quantile_forecast(test_df=test_df)
clear_output()
get_metric_table(metric_qr_mlpf)
4b. NN Flexible-Quantile Model: MLPGAM FPQR#
MLPFConfig(distribution="fpqr") pairs the GAM backbone with a Flexible Proportional Quantile Regression head. Unlike fixed-grid QR (where τ values are pre-specified), FPQR learns its own quantile grid adaptively during training via a QuantileProposal network.
Key differences from fixed-grid QR:
| Property | Fixed QR | Flexible QR (FPQR) |
|---|---|---|
| Quantile levels | Fixed (e.g. 0.05 → 0.95) | Learned per sample |
| Grid size | Set by the quantiles list | Set by n_quantiles |
| Point forecast | Median of fixed grid | Weighted expectation of proposed quantiles |
| Use case | Calibrated specific quantiles | Flexible uncertainty shape |
The quantile_levels returned are averaged over the batch for display purposes,
since each sample proposes a slightly different grid.
fpqr_mlpf_config = MLPFConfig(
distribution="fpqr",
n_quantiles=9,
conf_level=0.05,
max_epochs=10,
rich_progress_bar=False,
)
forecaster_fpqr_mlpf = TwigaForecaster(
data_params=data_config,
model_params=[fpqr_mlpf_config],
train_params=train_config,
)
forecaster_fpqr_mlpf.fit(train_df=train_df, val_df=val_df)
clear_output()
log.info("MLPF FPQR training complete.")
pred_fpqr_mlpf, metrics_fpqr_mlpf = forecaster_fpqr_mlpf.evaluate_quantile_forecast(test_df=test_df)
# clear_output()
get_metric_table(metrics_fpqr_mlpf)
# QR workflow summary
workflow_df = pd.DataFrame(
{
"Step": ["Choose a QR model", "Assemble forecaster", "Train", "Get interval predictions", "Compute metrics"],
"API call": [
        "QRXGBOOSTConfig(), MLPFConfig(distribution='qr'), or MLPFConfig(distribution='fpqr')",
        "TwigaForecaster(data_params, model_params, train_params)",
        ".fit(train_df, val_df)",
        ".evaluate_quantile_forecast(test_df)",
        "get_metric_table(metric_df)",
],
}
)
twiga_gt(
GT(workflow_df)
.tab_header(title=md("**Quantile Regression Workflow**"), subtitle="End-to-end steps in Twiga")
.cols_label(**{c: md(f"**{c}**") for c in workflow_df.columns})
.tab_source_note("Twiga Forecast"),
n_rows=len(workflow_df),
)
from great_tables import GT, md
import pandas as pd
from twiga.core.plot.gt import twiga_gt
# Fixed QR vs FPQR comparison
comparison_df = pd.DataFrame(
{
"Property": ["Quantile levels", "Point forecast", "Key param", "Typical advantage"],
"Fixed QR": [
"Fixed at init (conf_level bounds + quantiles list)",
"Median quantile of fixed grid",
"quantiles=[0.1, ..., 0.9]",
"Easier to audit specific levels; stable training",
],
"FPQR": [
"Learned per sample via QuantileProposal network",
"Weighted expectation ∑ τ̂ᵢ · qᵢ",
"n_quantiles=9",
"Better uncertainty shape on complex / skewed data",
],
}
)
twiga_gt(
GT(comparison_df)
.tab_header(
title=md("**Fixed-Grid QR vs Flexible QR (FPQR)**"),
subtitle="Choose based on auditability vs. adaptability needs",
)
.cols_label(**{c: md(f"**{c}**") for c in comparison_df.columns})
.tab_source_note("Twiga Forecast"),
n_rows=len(comparison_df),
)
# ruff: noqa: E501, E701, E702
from IPython.display import HTML
_TEAL = "#107591"
_TEAL_MID = "#069fac"
_TEAL_LIGHT = "#e8f5f8"
_TEAL_BEST = "#d0ecf1"
_TEXT_DARK = "#2d3748"
_TEXT_MUTED = "#718096"
_WHITE = "#ffffff"
steps = [
{
"num": "06",
"title": "Backtesting & Evaluation",
"desc": "Rolling-window backtesting · fold-level metrics",
"tags": ["backtesting", "evaluation"],
"active": False,
},
{
"num": "07",
"title": "Neural Networks",
"desc": "MLPF · N-HiTS · Lightning training loop",
"tags": ["neural network", "pytorch"],
"active": False,
},
{
"num": "08",
"title": "Quantile Regression",
"desc": "QR-LightGBM · QR-XGBoost · FPQR — calibrated prediction intervals",
"tags": ["probabilistic", "quantile", "pinball loss"],
"active": True,
},
{
"num": "09",
"title": "Parametric Distributions",
"desc": "Normal · Laplace · Gamma heads — NLL training",
"tags": ["parametric", "NLL", "distributions"],
"active": False,
},
{
"num": "10",
"title": "Conformal Prediction",
"desc": "Coverage-guaranteed intervals with CQR and CRC",
"tags": ["conformal", "CQR", "CRC"],
"active": False,
},
]
track_name = "Probabilistic Track"
footer = 'Next: <span style="color:#107591;font-weight:600;">Parametric Distributions</span> (09) for NLL-trained models, or <span style="color:#107591;font-weight:600;">Conformal Prediction</span> (10) for coverage guarantees.'
def _b(t, bg, fg):
return f'<span style="display:inline-block;background:{bg};color:{fg};font-size:10px;font-weight:600;padding:2px 7px;border-radius:10px;margin:2px 2px 0 0;">{t}</span>'
ch = ""
for i, s in enumerate(steps):
a = s["active"]
cb = _TEAL if a else _WHITE
cbo = _TEAL if a else "#d1ecf1"
nb = _TEAL_MID if a else _TEAL_LIGHT
nf = _WHITE if a else _TEAL
tf = _WHITE if a else _TEXT_DARK
df = "#cce8ef" if a else _TEXT_MUTED
bb = "#0d5f75" if a else _TEAL_BEST
bf = "#b8e4ed" if a else _TEAL
yh = (
f'<span style="float:right;background:{_TEAL_MID};color:{_WHITE};font-size:10px;font-weight:700;padding:2px 10px;border-radius:12px;">★ you are here</span>'
if a
else ""
)
bdg = "".join(_b(t, bb, bf) for t in s["tags"])
ch += f'<div style="background:{cb};border:2px solid {cbo};border-radius:12px;padding:16px 20px;display:flex;align-items:flex-start;gap:16px;box-shadow:{"0 4px 14px rgba(16,117,145,.25)" if a else "0 1px 4px rgba(0,0,0,.06)"};"><div style="min-width:44px;height:44px;background:{nb};color:{nf};border-radius:50%;display:flex;align-items:center;justify-content:center;font-size:15px;font-weight:800;flex-shrink:0;">{s["num"]}</div><div style="flex:1;"><div style="font-size:15px;font-weight:700;color:{tf};margin-bottom:4px;">{s["title"]}{yh}</div><div style="font-size:12.5px;color:{df};margin-bottom:8px;line-height:1.5;">{s["desc"]}</div><div>{bdg}</div></div></div>'
if i < len(steps) - 1:
ch += f'<div style="display:flex;justify-content:center;height:32px;"><svg width="24" height="32" viewBox="0 0 24 32" fill="none"><line x1="12" y1="0" x2="12" y2="24" stroke="{_TEAL_MID}" stroke-width="2" stroke-dasharray="4 3"/><polygon points="6,20 18,20 12,30" fill="{_TEAL_MID}"/></svg></div>'
HTML(
f'<div style="font-family:Inter,\'Segoe UI\',sans-serif;max-width:640px;margin:8px 0;"><div style="background:linear-gradient(135deg,{_TEAL} 0%,{_TEAL_MID} 100%);border-radius:12px 12px 0 0;padding:14px 20px;display:flex;align-items:center;gap:10px;"><svg width="22" height="22" viewBox="0 0 24 24" fill="none" stroke="{_WHITE}" stroke-width="2"><path d="M12 2L2 7l10 5 10-5-10-5z"/><path d="M2 17l10 5 10-5"/><path d="M2 12l10 5 10-5"/></svg><span style="color:{_WHITE};font-size:14px;font-weight:700;">Twiga Learning Path — {track_name}</span></div><div style="border:2px solid {_TEAL_LIGHT};border-top:none;border-radius:0 0 12px 12px;padding:20px 20px 16px;background:#f9fdfe;display:flex;flex-direction:column;">{ch}<div style="margin-top:16px;font-size:11.5px;color:{_TEXT_MUTED};text-align:center;border-top:1px solid {_TEAL_LIGHT};padding-top:12px;">{footer}</div></div></div>'
)
Fan Chart#
plot_quantile_fan renders graduated ribbon bands from widest to narrowest,
with the median line on top - the standard chart for communicating quantile
forecast uncertainty in energy and meteorological forecasting.
Interval Comparison#
plot_forecast_intervals overlays multiple models’ prediction intervals as
semi-transparent ribbons - useful for visually comparing sharpness and coverage
across methods on the same axis.