Conformal Prediction & Guaranteed Coverage#



What you’ll build

Coverage-guaranteed prediction intervals using Conformal Quantile Regression (CQR) and Conformal Residual Calibration (CRC), applied as post-training wrappers to any trained Twiga model.

Prerequisites

  • 08 - Quantile Regression (QR models and interval metrics)

  • 09 - Parametric Distributions (parametric probabilistic heads)

  • Python: basic probability, coverage concepts

Learning objectives

By the end of this notebook you will be able to:

  1. Explain the conformal prediction coverage guarantee and when it holds

  2. Apply CQR and CRC as post-training wrappers to any Twiga model via ConformalConfig

  3. Distinguish between CQR (calibrates a QR model's interval bounds) and CRC (fits a secondary residual model for adaptive interval widths)

  4. Evaluate coverage and sharpness trade-offs using PICP, NMPI, and Winkler score

  5. Select the right conformal method based on your base model type

1. Motivation: Why Conformal Prediction?#

The two probabilistic approaches covered in earlier notebooks each have a fundamental limitation:

  • Parametric (NB09): assumes data follows a known distribution family. Wrong family → miscalibrated coverage.

  • Quantile Regression (NB08): relies on asymptotic arguments - finite-sample coverage is not guaranteed.

Key concept - conformal prediction

Conformal prediction produces prediction intervals with a finite-sample, distribution-free coverage guarantee. Formally, for a significance level α ∈ (0, 1), it constructs intervals [l̂, û] such that:

P(y_{n+1} ∈ [l̂_{n+1}, û_{n+1}]) ≥ 1 − α

This is a marginal coverage guarantee - it holds over the randomness of the calibration and test draw, with no distributional assumption required. It works with any underlying model: tree ensembles, neural networks, or linear models.

The key ingredient is a calibration set - a held-out set of labeled examples (separate from training) used to compute nonconformity scores, which measure how surprised the model is by each observation. The coverage guarantee follows directly from the exchangeability of calibration and test data.
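This guarantee is easy to check numerically. The sketch below (plain NumPy, independent of Twiga) repeatedly draws exchangeable calibration and test scores and verifies that the finite-sample conformal quantile covers the test score at least 1 − α of the time:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, n_cal, n_trials = 0.1, 200, 2000

covered = 0
for _ in range(n_trials):
    # Calibration scores and a test score drawn exchangeably (same distribution)
    scores = rng.exponential(size=n_cal)
    test_score = rng.exponential()
    # Finite-sample conformal quantile of the calibration scores
    level = min(1.0, np.ceil((n_cal + 1) * (1 - alpha)) / n_cal)
    q_hat = np.quantile(scores, level, method="higher")
    covered += test_score <= q_hat

coverage = covered / n_trials
print(f"Empirical coverage: {coverage:.3f} (target >= {1 - alpha})")
```

The small inflation from 1 − α to ⌈(n + 1)(1 − α)⌉ / n is what turns an asymptotic statement into a finite-sample one: it accounts for the test point being exchangeable with the n calibration points.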

Key concept - CQR vs. CRC

Twiga provides three conformal methods, differing in how they compute and adjust nonconformity scores:

  • Split Conformal (method="residual"): the simplest approach. Nonconformity score = |y − ŷ|. Interval = ŷ ± q_{1−α}(scores). Produces constant-width intervals - cannot adapt to local uncertainty.

  • CQR - Conformal Quantile Regression (method="quantile"): requires a QR base model. Adjusts the QR interval using calibration residuals. Inherits the adaptive width of the underlying QR model while adding the coverage guarantee. Best when you already have a QR model (NB08) and want sharper, locally adaptive intervals.

  • CRC - Conformal Residual Calibration (method="residual-fitting"): fits a secondary model to predict residual magnitude. Most expressive - interval width adapts to predicted local uncertainty. Best for heteroscedastic signals (e.g. solar-driven net load with strong day/night cycles).

Key concept - coverage vs. sharpness trade-off

All three methods satisfy the marginal coverage guarantee, but they differ in sharpness - how narrow the intervals are on average. Wider intervals are trivially easy to achieve (just make them very wide), so a good method is one that achieves the nominal coverage with the smallest possible interval width. This is measured by metrics like NMPI (Normalised Mean Prediction Interval width) and the Winkler score, which penalises both excess width and coverage violations jointly. The reliability diagram reveals whether a method is over- or under-covering systematically, while the sharpness comparison reveals which method produces the tightest intervals at the same nominal level.
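These metrics are simple to state precisely. The helpers below use the standard definitions (Twiga's built-in implementations may differ in normalisation details):

```python
import numpy as np

def picp(y, lower, upper):
    """Fraction of observations falling inside the interval."""
    return np.mean((y >= lower) & (y <= upper))

def nmpi(y, lower, upper):
    """Mean interval width normalised by the observed target range."""
    return np.mean(upper - lower) / (y.max() - y.min())

def winkler(y, lower, upper, alpha):
    """Winkler (1972) interval score: width plus a 2/alpha penalty per miss."""
    width = upper - lower
    penalty = (2 / alpha) * (np.maximum(lower - y, 0) + np.maximum(y - upper, 0))
    return np.mean(width + penalty)

# Toy example: constant-width 90 % intervals that miss one of five points
y = np.array([1.0, 2.0, 3.0, 4.0, 10.0])
lo, hi = y - 1.5, y + 1.5
lo[-1], hi[-1] = 5.0, 8.0  # last interval misses y = 10
print(picp(y, lo, hi), nmpi(y, lo, hi), winkler(y, lo, hi, alpha=0.1))
# → 0.8, 0.333..., 11.0
```

Note how a single miss dominates the Winkler score: the 2/α penalty makes coverage violations expensive, so narrow-but-unreliable intervals cannot win.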

from great_tables import GT, md
import pandas as pd

from twiga.core.plot.gt import twiga_gt, twiga_report

df_methods = pd.DataFrame(
    {
        "Method": ["Split Conformal", "CQR", "CRC"],
        "Config method=": ['"residual"', '"quantile"', '"residual-fitting"'],
        "Key idea": [
            "Quantile of absolute residuals on the calibration set",
            "Adjust the QR interval using calibration residuals",
            "Fit a secondary model to predict residual magnitude adaptively",
        ],
        "Interval width": ["Constant", "Adaptive (from QR)", "Adaptive (from residual model)"],
    }
)

twiga_gt(
    GT(df_methods)
    .tab_header(
        title=md("**Conformal methods in Twiga**"),
        subtitle="All three methods satisfy the marginal coverage guarantee",
    )
    .cols_label(**{c: md(f"**{c}**") for c in df_methods.columns})
    .tab_source_note("Twiga Forecast"),
    n_rows=len(df_methods),
)

2. Setup#

import warnings

from IPython.display import clear_output
from lets_plot import LetsPlot
import numpy as np
import pandas as pd
from sklearn.preprocessing import RobustScaler, StandardScaler

LetsPlot.setup_html()

from twiga.core.config import DataPipelineConfig, ForecasterConfig
from twiga.core.plot import (
    plot_forecast,
    plot_forecast_grid,
    plot_metrics_bar,
    plot_reliability_diagram,
)
from twiga.core.utils import configure, get_logger

warnings.filterwarnings("ignore")

configure()
log = get_logger("tutorials")

Load data#

The dataset covers Madeira, Portugal (32.37°N, 16.27°W) at 30-minute resolution. We load only the columns needed: timestamp, net load (target), and two exogenous variables.

data = pd.read_parquet("../data/MLVS-PT.parquet")
data = data[["timestamp", "NetLoad(kW)", "Ghi", "Temperature"]]
data["timestamp"] = pd.to_datetime(data["timestamp"])
data = data.drop_duplicates(subset="timestamp").reset_index(drop=True)
# Restrict to 2019-2020 to keep tutorial execution fast
data = data[(data["timestamp"] >= "2019-01-01") & (data["timestamp"] <= "2020-12-31")].reset_index(drop=True)

log.info("Shape: %s", data.shape)
twiga_gt(GT(data.head()))

Train / val / test splits#

We use the same fixed temporal split as all other tutorials. The validation set doubles as the calibration set - it is held out of training and used to compute nonconformity scores.

df_splits = pd.DataFrame(
    {
        "Split": ["train", "val / calibration", "test"],
        "Period": ["before 2020-01-01", "2020-01-01 – 2020-06-30", "2020-07-01 onwards"],
        "Role": [
            "Model training",
            "Early stopping + conformal calibration (nonconformity scores)",
            "Final evaluation of coverage and sharpness",
        ],
    }
)

twiga_gt(
    GT(df_splits)
    .tab_header(
        title=md("**Data splits**"),
        subtitle="Validation set is reused as the calibration set for all three conformal methods",
    )
    .cols_label(**{c: md(f"**{c}**") for c in df_splits.columns})
    .tab_source_note("Twiga Forecast"),
    n_rows=len(df_splits),
)
train_df = data[data["timestamp"] < "2020-01-01"].reset_index(drop=True)
val_df = data[(data["timestamp"] >= "2020-01-01") & (data["timestamp"] < "2020-07-01")].reset_index(drop=True)
test_df = data[data["timestamp"] >= "2020-07-01"].reset_index(drop=True)

# The calibration set is the validation set
calibrate_df = val_df.copy()

log.info(
    f"train      : {train_df.shape[0]:,} rows  "
    f"({train_df['timestamp'].min().date()} – {train_df['timestamp'].max().date()})"
)
log.info(
    f"val        : {val_df.shape[0]:,} rows  "
    f"({val_df['timestamp'].min().date()} – {val_df['timestamp'].max().date()})"
)
log.info(
    f"test       : {test_df.shape[0]:,} rows  "
    f"({test_df['timestamp'].min().date()} – {test_df['timestamp'].max().date()})"
)
log.info(f"calibration: same as val — {calibrate_df.shape[0]:,} rows")

Shared data and training configs#

All three conformal methods share the same DataPipelineConfig and ForecasterConfig. We define them once and reuse them throughout the notebook.

data_config = DataPipelineConfig(
    target_feature="NetLoad(kW)",
    period="30min",
    latitude=32.371666,
    longitude=-16.274998,
    calendar_features=["hour", "day_night"],
    exogenous_features=["Ghi"],
    forecast_horizon=48,
    stride=48,
    lookback_window_size=96,
    input_scaler=StandardScaler(),
    target_scaler=RobustScaler(),
)

train_config = ForecasterConfig(
    project_name="conformal_tutorial",
)

data_config

3. Method 1: Split Conformal (method="residual")#

Split conformal (also called inductive conformal prediction) is the simplest approach:

  1. Train any point forecasting model on train_df.

  2. Compute nonconformity scores \(s_i = |y_i - \hat{y}_i|\) on the calibration set.

  3. At test time, construct the interval as \(\hat{y} \pm q_{1-\alpha}(s)\) where \(q_{1-\alpha}\) is the \((1-\alpha)(1 + 1/n)\)-quantile of the calibration scores.

Pros: works with any base model, fast calibration, finite-sample coverage guarantee.
Cons: produces constant-width intervals - cannot adapt to regions of higher or lower uncertainty.
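The three steps above can be sketched with any scikit-learn regressor standing in for the Twiga base model (synthetic data; inside Twiga, ConformalConfig automates this):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
alpha = 0.1

# Synthetic heteroscedastic data, split into train / calibration / test
X = rng.uniform(0, 10, size=(3000, 1))
y = 2.0 * X[:, 0] + rng.normal(scale=1.0 + 0.3 * X[:, 0])
X_tr, y_tr = X[:1000], y[:1000]
X_cal, y_cal = X[1000:2000], y[1000:2000]
X_te, y_te = X[2000:], y[2000:]

# 1. Train any point forecasting model
model = LinearRegression().fit(X_tr, y_tr)

# 2. Absolute-residual nonconformity scores on the calibration set
scores = np.abs(y_cal - model.predict(X_cal))
n = len(scores)
q_hat = np.quantile(scores, min(1.0, (1 - alpha) * (1 + 1 / n)), method="higher")

# 3. Constant-width interval around the test-time point forecast
pred = model.predict(X_te)
lower, upper = pred - q_hat, pred + q_hat

coverage = np.mean((y_te >= lower) & (y_te <= upper))
print(f"Empirical coverage: {coverage:.3f} (target >= {1 - alpha})")
```

Coverage lands near the nominal 90 % even though the noise is heteroscedastic - but the interval is equally wide everywhere, which is exactly the limitation CQR and CRC address.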

from twiga import TwigaForecaster
from twiga.core.config import ConformalConfig
from twiga.models.ml import LIGHTGBMConfig
# method="residual" corresponds to Split Conformal Prediction
conformal_config_split = ConformalConfig(method="residual", alpha=0.1)

forecaster_split = TwigaForecaster(
    data_params=data_config,
    model_params=[LIGHTGBMConfig()],
    train_params=train_config,
    conformal_params=conformal_config_split,
)
forecaster_split.fit(train_df=train_df, val_df=val_df)
clear_output()
log.info("Split Conformal — LightGBM training complete.")
# Calibration step: computes nonconformity scores on the calibration set
forecaster_split.calibrate(calibrate_df=calibrate_df, ensemble_strategy="mean")
log.info("Calibration complete.")

Interval metrics#

Three standard metrics assess prediction interval quality. Raw intervals from an uncalibrated base model carry no formal coverage guarantee - empirical PICP may differ from the nominal level. The conformal calibration applied in this notebook is what restores the finite-sample guarantee.

interval_metrics_df = pd.DataFrame(
    {
        "Metric": ["PICP", "NMPI", "Winkler score"],
        "Full name": [
            "Prediction Interval Coverage Probability",
            "Normalised Mean Prediction Interval",
            "Winkler (1972)",
        ],
        "Interpretation": [
            "Fraction of true values inside the interval. Should be ≥ 1 − alpha.",
            "Average interval width normalised by the target range. Smaller = sharper.",
            "Combines sharpness and coverage in one number. Lower is better.",
        ],
    }
)

twiga_gt(
    GT(interval_metrics_df)
    .tab_header(title=md("**Interval Evaluation Metrics**"), subtitle="Reliability, sharpness, and their synthesis")
    .cols_label(**{c: md(f"**{c}**") for c in interval_metrics_df.columns})
    .tab_source_note("Twiga Forecast"),
    n_rows=len(interval_metrics_df),
)

Reading the interval metric comparison

  • A model with PICP < 1 − alpha is under-covering - its intervals are too narrow and will miss more events than claimed. For a 90 % interval (alpha = 0.10), PICP below 0.90 indicates miscalibration.

  • Among models with adequate PICP, prefer the one with the smallest NMPI (sharpest bounds) and lowest Winkler score.

  • After conformal calibration, persistent under-coverage on the test set suggests the exchangeability assumption is violated - for example, a distribution shift between the calibration and test periods.

def get_metric_table(metric_df):
    # Average each metric across forecast windows, per model
    res = (
        metric_df.groupby("Model")[
            [
                "mae",
                "corr",
                "winkler_score",
                "picp",
                "nmpi",
                "ace",
                "cwe",
            ]
        ]
        .mean()
        .round(2)
        .reset_index()
    )
    res = res.rename(
        columns={
            "mae": "MAE",
            "corr": "Corr",
            "winkler_score": "Winkler",
            "picp": "PICP",
            "nmpi": "NMPI",
            "cwe": "CWE",
            "ace": "ACE",
        }
    )

    # Highlight the point-forecast metrics: lower MAE and higher Corr are better
    metric_name = ["MAE", "Corr"]
    minimize_cols = ["MAE"]
    maximize_cols = ["Corr"]

    return twiga_report(res, metric_name, minimize_cols, maximize_cols)
# evaluate_interval_forecast returns (intervals DataFrame, per-window interval metrics)
intervals_split, metric_split = forecaster_split.evaluate_interval_forecast(test_df=test_df)
get_metric_table(metric_split)
p = plot_forecast(
    intervals_split.iloc[: 7 * 48],
    actual_col="Actual",
    forecast_col="forecast",
    lower_col="lower",
    upper_col="upper",
    title="SCP - 90% Prediction Interval (first 7 days of test)",
    y_label="NetLoad (kW)",
    x_label="Time step (30-min)",
)
p

4. Method 2: CQR - Conformal Quantile Regression (method="quantile")#

CQR (Romano, Patterson & Candès, 2019) extends split conformal to models that already output quantile estimates:

  1. Train a quantile regression model to output lower (\(\hat{q}_{\alpha/2}\)) and upper (\(\hat{q}_{1-\alpha/2}\)) quantiles.

  2. Compute calibration nonconformity scores as \(s_i = \max(\hat{q}_{\alpha/2} - y_i,\; y_i - \hat{q}_{1-\alpha/2})\).

  3. Adjust the QR interval by adding the calibration quantile of these scores.

Result: CQR adapts the interval width to the local uncertainty estimated by the QR model, while still maintaining the marginal coverage guarantee. Intervals are typically sharper than split conformal.
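A framework-independent sketch of the CQR recipe, using scikit-learn's GradientBoostingRegressor with quantile loss as a stand-in for a QR base model (synthetic data; Twiga's wrapper performs the equivalent steps internally):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(2)
alpha = 0.1

X = rng.uniform(0, 10, size=(3000, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.2 + 0.1 * X[:, 0])
X_tr, y_tr = X[:1000], y[:1000]
X_cal, y_cal = X[1000:2000], y[1000:2000]
X_te, y_te = X[2000:], y[2000:]

# 1. Fit lower and upper quantile models (the QR base model)
qr_lo = GradientBoostingRegressor(loss="quantile", alpha=alpha / 2).fit(X_tr, y_tr)
qr_hi = GradientBoostingRegressor(loss="quantile", alpha=1 - alpha / 2).fit(X_tr, y_tr)

# 2. CQR nonconformity scores: how far y falls outside the raw QR interval
lo_cal, hi_cal = qr_lo.predict(X_cal), qr_hi.predict(X_cal)
scores = np.maximum(lo_cal - y_cal, y_cal - hi_cal)
n = len(scores)
q_hat = np.quantile(scores, min(1.0, (1 - alpha) * (1 + 1 / n)), method="higher")

# 3. Widen (or shrink, if q_hat < 0) the QR interval by the calibrated margin
lower = qr_lo.predict(X_te) - q_hat
upper = qr_hi.predict(X_te) + q_hat
coverage = np.mean((y_te >= lower) & (y_te <= upper))
print(f"CQR empirical coverage: {coverage:.3f}")
```

Because the correction q_hat is added to quantile bounds that already vary with the inputs, the final interval keeps the QR model's local adaptivity while meeting the coverage guarantee.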

from twiga.models.ml import QRLIGHTGBMConfig
# method="quantile" corresponds to Conformal Quantile Regression (CQR)
conformal_config_cqr = ConformalConfig(method="quantile", score_type="unscaled", alpha=0.1)

forecaster_cqr = TwigaForecaster(
    data_params=data_config,
    model_params=[QRLIGHTGBMConfig()],
    train_params=train_config,
    conformal_params=conformal_config_cqr,
)
forecaster_cqr.fit(train_df=train_df, val_df=val_df)
clear_output()
log.info("CQR — QR-LightGBM training complete.")
forecaster_cqr.calibrate(calibrate_df=calibrate_df, ensemble_strategy="mean")
log.info("CQR calibration complete.")
intervals_cqr, metric_cqr = forecaster_cqr.evaluate_interval_forecast(test_df=test_df)
get_metric_table(metric_cqr)
p = plot_forecast(
    intervals_cqr.iloc[: 7 * 48],
    actual_col="Actual",
    forecast_col="forecast",
    lower_col="lower",
    upper_col="upper",
    title="CQR - 90% Prediction Interval (first 7 days of test)",
    y_label="NetLoad (kW)",
    x_label="Time step (30-min)",
)
p

5. Method 3: CRC - Conformal Residual Calibration (method="residual-fitting")#

CRC takes a different approach: instead of a fixed correction term, it fits a secondary model that predicts the magnitude of the residuals. This gives the interval an adaptive width - wider where the model is uncertain, narrower where it is confident.

CRC requires a base model that outputs both a location (point forecast) and a scale (predicted uncertainty). The MLPGAMCRCConfig model is purpose-built for this: it is a Generalised Additive Model with a conformal residual calibration head.

Note: CRC is the most expressive of the three methods but also the most computationally demanding, since it involves training a neural network. We cap max_epochs at 25 here for a quick demo.
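The residual-fitting idea can be sketched with two ordinary regressors - one for the point forecast, one for the residual magnitude. This is an illustration only: MLPGAMCRCConfig uses a neural location-scale model, and its calibration details may differ.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)
alpha = 0.1

# Heteroscedastic data: noise grows with x, like solar-driven net load by daytime
X = rng.uniform(0, 10, size=(3000, 1))
y = 2.0 * X[:, 0] + rng.normal(scale=0.2 + 0.5 * X[:, 0])
X_tr, y_tr = X[:1000], y[:1000]
X_cal, y_cal = X[1000:2000], y[1000:2000]
X_te, y_te = X[2000:], y[2000:]

# Primary model gives the point forecast (the "location")
point = LinearRegression().fit(X_tr, y_tr)

# Secondary model predicts residual magnitude (the "scale")
resid = np.abs(y_tr - point.predict(X_tr))
scale = LinearRegression().fit(X_tr, resid)

# Calibrate scale-normalised scores so the coverage guarantee still holds
sigma_cal = np.maximum(scale.predict(X_cal), 1e-6)
scores = np.abs(y_cal - point.predict(X_cal)) / sigma_cal
n = len(scores)
q_hat = np.quantile(scores, min(1.0, (1 - alpha) * (1 + 1 / n)), method="higher")

# Adaptive-width interval: wide where predicted uncertainty is large
sigma_te = np.maximum(scale.predict(X_te), 1e-6)
pred = point.predict(X_te)
lower, upper = pred - q_hat * sigma_te, pred + q_hat * sigma_te
coverage = np.mean((y_te >= lower) & (y_te <= upper))
print(f"CRC-style empirical coverage: {coverage:.3f}")
```

Normalising the residuals by the predicted scale is what lets one calibrated constant q_hat produce intervals whose width tracks the local noise level.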

from twiga.models.nn import MLPGAMCRCConfig
# method="residual-fitting" corresponds to Conformal Residual Calibration (CRC)
conformal_config_crc = ConformalConfig(method="residual-fitting", alpha=0.1)

crc_config = MLPGAMCRCConfig.from_data_config(data_config)
crc_config.max_epochs = 25
crc_config.rich_progress_bar = False

log.info("CRC model config ready.")
forecaster_crc = TwigaForecaster(
    data_params=data_config,
    model_params=[crc_config],
    train_params=train_config,
    conformal_params=conformal_config_crc,
)
forecaster_crc.fit(train_df=train_df, val_df=val_df)
clear_output()
log.info("CRC — MLPGAMCRCConfig training complete.")
forecaster_crc.calibrate(calibrate_df=calibrate_df, ensemble_strategy="mean")
log.info("CRC calibration complete.")
intervals_crc, metric_crc = forecaster_crc.evaluate_interval_forecast(test_df=test_df)
get_metric_table(metric_crc)
p = plot_forecast(
    intervals_crc.iloc[: 7 * 48],
    actual_col="Actual",
    forecast_col="forecast",
    lower_col="lower",
    upper_col="upper",
    title="CRC - 90% Prediction Interval (first 7 days of test)",
    y_label="NetLoad (kW)",
    x_label="Time step (30-min)",
)
p

9. When to Use Which Method#

General guidance:

  • Start with Split Conformal: it requires zero additional modelling effort and provides the coverage guarantee immediately with any point forecasting model.

  • If you already have a quantile regression model (NB08), upgrade to CQR for free: it only adds a calibration step on top of your existing QR model and gives locally adaptive intervals.

  • Use CRC when you have reason to believe uncertainty varies significantly across time (e.g. solar-driven net load, day/night cycles) and are willing to pay the cost of training a neural network backbone.

All three methods satisfy the marginal coverage guarantee under exchangeability. None requires specifying a parametric distribution family.

from great_tables import GT, md
import pandas as pd

from twiga.core.plot.gt import twiga_gt

df_selector = pd.DataFrame(
    {
        "Method": ["Split Conformal", "CQR", "CRC"],
        "Config method=": ['"residual"', '"quantile"', '"residual-fitting"'],
        "Requires": [
            "Any point forecasting model",
            "A QR model (e.g. QRLIGHTGBMConfig)",
            "A location-scale model (e.g. MLPGAMCRCConfig)",
        ],
        "Best when": [
            "Fast, general-purpose guarantee; ideal starting point",
            "Want tighter, locally adaptive intervals on top of QR",
            "Heteroscedastic data; want residual model to drive width",
        ],
    }
)

twiga_gt(
    GT(df_selector)
    .tab_header(
        title=md("**Conformal method selector**"),
        subtitle="All three meet the marginal coverage guarantee under exchangeability",
    )
    .cols_label(**{c: md(f"**{c}**") for c in df_selector.columns})
    .tab_source_note("Twiga Forecast"),
    n_rows=len(df_selector),
)

Wrapping up#

What you did

  • Understood the conformal prediction finite-sample coverage guarantee and the exchangeability assumption

  • Calibrated a LightGBM model with Split Conformal using calibrate()

  • Applied CQR on top of a QR-LightGBM model for locally adaptive intervals

  • Applied CRC with an MLPGAMCRCConfig model for heteroscedastic adaptive intervals

  • Compared all three methods on coverage (reliability diagram) and sharpness (mean interval width)

Key takeaways

  1. Conformal prediction produces finite-sample coverage-guaranteed intervals without any distributional assumption - it works with any base model.

  2. The calibration set (held-out labeled data) is the core ingredient: nonconformity scores from this set drive the coverage guarantee.

  3. Split Conformal is the safest starting point - zero modelling overhead, immediate guarantee.

  4. CQR inherits adaptive width from the underlying QR model; CRC trains a secondary residual model for maximum adaptivity.

  5. The reliability diagram is the primary diagnostic: all conformal methods should meet or exceed the diagonal; persistent under-coverage signals a violation of the exchangeability assumption.


What’s next?#

NB11 - Hyperparameter Optimisation (11-hyperparameter-tuning.ipynb)

Learn how to use Optuna-powered HPO via forecaster.tune() to automatically search for optimal model hyperparameters across any model type, with resumable SQLite-backed studies.

# ruff: noqa: E501, E701, E702
from IPython.display import HTML

_TEAL = "#107591"
_TEAL_MID = "#069fac"
_TEAL_LIGHT = "#e8f5f8"
_TEAL_BEST = "#d0ecf1"
_TEXT_DARK = "#2d3748"
_TEXT_MUTED = "#718096"
_WHITE = "#ffffff"

steps = [
    {
        "num": "08",
        "title": "Quantile Regression",
        "desc": "QR-LightGBM · FPQR — prediction intervals",
        "tags": ["quantile", "pinball loss"],
        "active": False,
    },
    {
        "num": "09",
        "title": "Parametric Distributions",
        "desc": "Normal · Laplace · Gamma heads — NLL training",
        "tags": ["parametric", "NLL"],
        "active": False,
    },
    {
        "num": "10",
        "title": "Conformal Prediction",
        "desc": "CQR · CRC — finite-sample coverage guarantees",
        "tags": ["conformal", "CQR", "CRC", "coverage"],
        "active": True,
    },
    {
        "num": "11",
        "title": "Hyperparameter Tuning",
        "desc": "Optuna-backed HPO · typed search spaces · resumable SQLite",
        "tags": ["optuna", "HPO", "tuning"],
        "active": False,
    },
    {
        "num": "12",
        "title": "Ensemble Strategies",
        "desc": "Mean · median · weighted-mean ensembles",
        "tags": ["ensemble", "weighted", "aggregation"],
        "active": False,
    },
]
track_name = "Probabilistic Track"
footer = 'Next: push coverage further with <span style="color:#107591;font-weight:600;">Hyperparameter Tuning</span> (11) or combine models with <span style="color:#107591;font-weight:600;">Ensemble Strategies</span> (12).'


def _b(t, bg, fg):
    return f'<span style="display:inline-block;background:{bg};color:{fg};font-size:10px;font-weight:600;padding:2px 7px;border-radius:10px;margin:2px 2px 0 0;">{t}</span>'


ch = ""
for i, s in enumerate(steps):
    a = s["active"]
    cb = _TEAL if a else _WHITE
    cbo = _TEAL if a else "#d1ecf1"
    nb = _TEAL_MID if a else _TEAL_LIGHT
    nf = _WHITE if a else _TEAL
    tf = _WHITE if a else _TEXT_DARK
    df = "#cce8ef" if a else _TEXT_MUTED
    bb = "#0d5f75" if a else _TEAL_BEST
    bf = "#b8e4ed" if a else _TEAL
    yh = (
        f'<span style="float:right;background:{_TEAL_MID};color:{_WHITE};font-size:10px;font-weight:700;padding:2px 10px;border-radius:12px;">★ you are here</span>'
        if a
        else ""
    )
    bdg = "".join(_b(t, bb, bf) for t in s["tags"])
    ch += f'<div style="background:{cb};border:2px solid {cbo};border-radius:12px;padding:16px 20px;display:flex;align-items:flex-start;gap:16px;box-shadow:{"0 4px 14px rgba(16,117,145,.25)" if a else "0 1px 4px rgba(0,0,0,.06)"};"><div style="min-width:44px;height:44px;background:{nb};color:{nf};border-radius:50%;display:flex;align-items:center;justify-content:center;font-size:15px;font-weight:800;flex-shrink:0;">{s["num"]}</div><div style="flex:1;"><div style="font-size:15px;font-weight:700;color:{tf};margin-bottom:4px;">{s["title"]}{yh}</div><div style="font-size:12.5px;color:{df};margin-bottom:8px;line-height:1.5;">{s["desc"]}</div><div>{bdg}</div></div></div>'
    if i < len(steps) - 1:
        ch += f'<div style="display:flex;justify-content:center;height:32px;"><svg width="24" height="32" viewBox="0 0 24 32" fill="none"><line x1="12" y1="0" x2="12" y2="24" stroke="{_TEAL_MID}" stroke-width="2" stroke-dasharray="4 3"/><polygon points="6,20 18,20 12,30" fill="{_TEAL_MID}"/></svg></div>'

HTML(
    f'<div style="font-family:Inter,\'Segoe UI\',sans-serif;max-width:640px;margin:8px 0;"><div style="background:linear-gradient(135deg,{_TEAL} 0%,{_TEAL_MID} 100%);border-radius:12px 12px 0 0;padding:14px 20px;display:flex;align-items:center;gap:10px;"><svg width="22" height="22" viewBox="0 0 24 24" fill="none" stroke="{_WHITE}" stroke-width="2"><path d="M12 2L2 7l10 5 10-5-10-5z"/><path d="M2 17l10 5 10-5"/><path d="M2 12l10 5 10-5"/></svg><span style="color:{_WHITE};font-size:14px;font-weight:700;">Twiga Learning Path — {track_name}</span></div><div style="border:2px solid {_TEAL_LIGHT};border-top:none;border-radius:0 0 12px 12px;padding:20px 20px 16px;background:#f9fdfe;display:flex;flex-direction:column;">{ch}<div style="margin-top:16px;font-size:11.5px;color:{_TEXT_MUTED};text-align:center;border-top:1px solid {_TEAL_LIGHT};padding-top:12px;">{footer}</div></div></div>'
)