Parametric Distributions#



What you’ll build

Probabilistic neural network forecasters using parametric distribution heads (Normal, Laplace, Gamma) trained with negative log-likelihood loss, evaluated on CRPS and calibration diagrams.

Prerequisites

  • 07 - Neural Networks (MLPF, NHiTS, Lightning training loop)

  • 08 - Quantile Regression (interval metrics: PICP, NMPI, Winkler)

  • Python: basic probability distributions helpful

Learning objectives

By the end of this notebook you will be able to:

  1. Choose a distribution family (Normal, Laplace, Gamma, LogNormal, Beta) based on signal characteristics

  2. Configure and train probabilistic neural networks using negative log-likelihood loss

  3. Evaluate probabilistic forecasts with CRPS and Winkler score

  4. Interpret reliability diagrams to diagnose over- and under-coverage

  5. Compare parametric distribution heads and select the best-calibrated model

1. Choosing the right distribution#

Different physical signals have fundamentally different statistical shapes. Picking a distribution that matches that shape is the most important modelling decision in parametric forecasting.

Key concept - parametric distributions

Instead of predicting a single number, a parametric model outputs the parameters of a probability distribution - for example, a mean µ and standard deviation σ for the Normal family. The model is trained by maximising the log-likelihood of the observed targets under the predicted distribution (equivalently, minimising the negative log-likelihood, NLL). This is fundamentally different from pinball/quantile loss, which directly targets specific quantile levels. NLL training uses all the information in the distributional shape, making it more data-efficient when the chosen family is a good match - but poorly calibrated when it is not.

  • Normal: symmetric, unbounded - the natural default for net load or temperature deltas.

  • Laplace: heavier tails than Normal - robust to outlier spikes (electricity prices, residual demand).

  • Gamma / LogNormal: strictly positive, right-skewed - ideal for PV generation or aggregate wind.

  • Beta: bounded in [0, 1] - suited to capacity factors and state-of-charge signals.

The forecastability profile from NB02 told us NetLoad(kW) is approximately symmetric and can go negative - Normal is the natural starting point.
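Before wiring this into a neural network, the NLL objective itself is worth a quick standalone check. The sketch below uses plain scipy (not the Twiga training loop): for the Normal family, the parameters that minimise the mean NLL are exactly the sample mean and (biased) standard deviation, so any perturbed parameters score strictly worse.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
y = rng.normal(loc=5.0, scale=2.0, size=10_000)  # synthetic targets

def nll(mu, sigma):
    """Mean negative log-likelihood of y under Normal(mu, sigma)."""
    return -norm.logpdf(y, loc=mu, scale=sigma).mean()

# The NLL-minimising Normal parameters are the sample mean and the
# biased sample standard deviation (the Gaussian MLE).
mu_hat, sigma_hat = y.mean(), y.std()
best = nll(mu_hat, sigma_hat)
print(best < nll(mu_hat + 0.5, sigma_hat))   # True
print(best < nll(mu_hat, sigma_hat * 1.5))   # True
```

A neural head does the same thing, except µ and σ are functions of the inputs rather than two global constants.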

import warnings

from great_tables import GT, md
from IPython.display import clear_output
from lets_plot import LetsPlot
import numpy as np
import pandas as pd
from sklearn.preprocessing import RobustScaler, StandardScaler

from twiga.core.plot.gt import twiga_gt, twiga_report

LetsPlot.setup_html()

from twiga.core.plot import (
    plot_density,
    plot_forecast,
    plot_forecast_grid,
    plot_metrics_bar,
    plot_reliability_diagram,
)
from twiga.core.utils import configure, get_logger

warnings.filterwarnings("ignore")

configure()
log = get_logger("tutorials")

2. Setup#

Load data#

data = pd.read_parquet("../data/MLVS-PT.parquet")
data = data[["timestamp", "NetLoad(kW)", "Ghi", "Temperature"]]
data["timestamp"] = pd.to_datetime(data["timestamp"])
data = data.drop_duplicates(subset="timestamp").reset_index(drop=True)
# Restrict to 2019-2020 to keep tutorial execution fast
data = data[(data["timestamp"] >= "2019-01-01") & (data["timestamp"] <= "2020-12-31")].reset_index(drop=True)

log.info("Shape: %s", data.shape)
twiga_gt(GT(data.head()))

Train / val / test splits#

We use the same fixed temporal split as all other tutorials.

df_splits = pd.DataFrame(
    {
        "Split": ["train", "val", "test"],
        "Period": ["before 2020-01-01", "2020-01-01 – 2020-06-30", "2020-07-01 onwards"],
        "Role": ["Model training", "Early stopping", "Final evaluation"],
    }
)

twiga_gt(
    GT(df_splits)
    .tab_header(title=md("**Data splits**"), subtitle="Fixed temporal partition")
    .cols_label(**{c: md(f"**{c}**") for c in df_splits.columns})
    .tab_source_note("Twiga Forecast"),
    n_rows=len(df_splits),
)
train_df = data[data["timestamp"] < "2020-01-01"].reset_index(drop=True)
val_df = data[(data["timestamp"] >= "2020-01-01") & (data["timestamp"] < "2020-07-01")].reset_index(drop=True)
test_df = data[data["timestamp"] >= "2020-07-01"].reset_index(drop=True)

log.info(
    f"train : {train_df.shape[0]:,} rows  ({train_df['timestamp'].min().date()} – {train_df['timestamp'].max().date()})"
)
log.info(
    f"val   : {val_df.shape[0]:,} rows  ({val_df['timestamp'].min().date()} – {val_df['timestamp'].max().date()})"
)
log.info(
    f"test  : {test_df.shape[0]:,} rows  ({test_df['timestamp'].min().date()} – {test_df['timestamp'].max().date()})"
)

Data and training configs#

from twiga.core.config import ConformalConfig, DataPipelineConfig, ForecasterConfig

data_config = DataPipelineConfig(
    target_feature="NetLoad(kW)",
    period="30min",
    latitude=32.371666,
    longitude=-16.274998,
    calendar_features=["hour", "day_night"],
    exogenous_features=["Ghi"],
    forecast_horizon=48,
    stride=48,
    lookback_window_size=96,
    input_scaler=StandardScaler(),
    target_scaler=RobustScaler(),
)

train_config = ForecasterConfig(project_name="Experiment-parametric")
conformal_config = ConformalConfig(method="residual", alpha=0.1)

data_config
df_dist = pd.DataFrame(
    {
        "Signal characteristic": [
            "Symmetric, can go negative (net load, temp delta)",
            "Heavy-tailed, outlier-prone (price spikes)",
            "Strictly positive, right-skewed (PV, wind)",
            "Bounded [0, 1] (capacity factor, SoC)",
        ],
        "Distribution": ["Normal", "Laplace / StudentT", "LogNormal / Gamma", "Beta"],
        "Config shorthand": [
            'MLPFConfig(distribution="normal")',
            'MLPGAMConfig(distribution="laplace")',
            'MLPFConfig(distribution="lognormal")',
            'MLPGAMConfig(distribution="beta")',
        ],
    }
)

twiga_gt(
    GT(df_dist)
    .tab_header(
        title=md("**Distribution family selector**"),
        subtitle="Match the distribution to the physical shape of your signal",
    )
    .cols_label(**{c: md(f"**{c}**") for c in df_dist.columns})
    .tab_source_note("Twiga Forecast"),
    n_rows=len(df_dist),
)

3. The parametric head interface#

Every distribution in Twiga is an nn.Module that wraps a lightweight linear projection on top of the backbone’s latent vector. They all share the same three-method contract:

  • forward(z) → distribution parameters as tensors

  • get_distribution(*params) → a torch.distributions object

  • get_log_likelihood(*params, targets) → negative log-likelihood scalar (the training loss)

The DISTRIBUTIONS registry maps string names to classes, and build_distribution instantiates them by name.

import torch

from twiga.distributions.nn import DISTRIBUTIONS, build_distribution

log.info("Available distributions: %s", list(DISTRIBUTIONS.keys()))

# Peek at one head
head = build_distribution("normal", num_target_output=1, hidden_size=64, forecast_horizon=48)
z = torch.randn(4, 64)  # batch of 4 samples
mu, sigma = head(z)
log.info("mu shape   : %s", mu.shape)  # (4, 48, 1)
log.info("sigma shape: %s", sigma.shape)

dist = head.get_distribution(mu, sigma)
samples = dist.sample()
log.info("sample shape: %s", samples.shape)

Shape convention - all parametric heads output tensors of shape (B, forecast_horizon, num_target_output), where B is the batch size. This matches the target tensor shape used throughout the training loop, so no reshaping is needed before computing the NLL loss.

4. Normal distribution: NetLoad (MLPF backbone)#

The Normal head predicts a mean mu and a standard deviation sigma for every horizon step. The 90 % prediction interval is [mu − 1.645σ, mu + 1.645σ]. Because NetLoad(kW) is approximately symmetric and can go negative, Normal is the textbook choice.
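The 1.645 multiplier is just the standard-Normal quantile at 0.95, which you can verify with scipy:

```python
from scipy.stats import norm

# A two-sided 90% interval leaves 5% in each tail, so the half-width
# multiplier is the 95th-percentile z-score of the standard Normal.
z90 = norm.ppf(0.95)
print(round(z90, 3))  # 1.645
```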

from twiga import TwigaForecaster
from twiga.models.nn import MLPFConfig

normal_config = MLPFConfig(distribution="normal", max_epochs=5, rich_progress_bar=False)

forecaster_normal = TwigaForecaster(
    data_params=data_config,
    model_params=[normal_config],
    train_params=train_config,
    conformal_params=conformal_config,
)
forecaster_normal.fit(train_df=train_df, val_df=val_df)
clear_output()
pred_normal, metric_normal = forecaster_normal.evaluate_parametric_forecast(test_df=test_df)
clear_output()
def get_metric_table(metric_df):
    res = metric_df.groupby("Model")[["mae", "corr", "nll", "crps", "dss"]].mean().round(2).reset_index()
    res = res.rename(columns={"mae": "MAE", "corr": "Corr", "nll": "NLL", "crps": "CRPS", "dss": "DSS"})

    metric_name = ["MAE", "Corr", "CRPS", "NLL", "DSS"]
    minimize_cols = ["MAE", "CRPS", "NLL", "DSS"]
    maximize_cols = ["Corr"]

    return twiga_report(res, metric_name, minimize_cols, maximize_cols)
get_metric_table(metric_normal)
p = plot_forecast(
    pred_normal.iloc[: 7 * 48],
    actual_col="Actual",
    forecast_col="forecast",
    title="MLPFNormal — point forecast",
    y_label="Net Load (kW)",
    x_label="Step (30 min)",
)
p

6. Laplace: heavier tails#

Laplace has heavier tails than Normal - it assigns more probability to extreme events. For signals with frequent, sharp spikes (spot electricity prices, residual net load during demand response events), Laplace can produce better-calibrated intervals than Normal while using the same MLPF backbone - only the distribution head changes.
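How much heavier are the tails? A quick scipy comparison at matched variance makes it concrete (Var[Laplace(0, b)] = 2b², so b = 1/√2 matches the unit-variance Normal):

```python
import numpy as np
from scipy.stats import laplace, norm

# Variance-match the families: b = 1/sqrt(2) gives Laplace unit variance.
b = 1 / np.sqrt(2)
tail = {k: (2 * norm.sf(k), 2 * laplace.sf(k, scale=b)) for k in (2, 3, 4)}
for k, (p_n, p_l) in tail.items():
    print(f"P(|X| > {k}): Normal {p_n:.2e}  Laplace {p_l:.2e}")
```

At three standard deviations the Laplace tail probability is roughly five times the Normal one, which is exactly the behaviour that protects NLL training from over-penalising spikes.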

from twiga.models.nn import MLPFConfig

laplace_config = MLPFConfig(distribution="laplace", max_epochs=5, rich_progress_bar=False)

forecaster_laplace = TwigaForecaster(
    data_params=data_config,
    model_params=[laplace_config],
    train_params=train_config,
    conformal_params=conformal_config,
)
forecaster_laplace.fit(train_df=train_df, val_df=val_df)
clear_output()
pred_laplace, metric_laplace = forecaster_laplace.evaluate_parametric_forecast(test_df=test_df)
clear_output()

get_metric_table(metric_laplace)

8. Evaluating distributional quality#

Point metrics (MAE, RMSE) only assess the mean forecast. To evaluate the full predicted distribution we need interval-aware metrics. We use the Normal model’s outputs here since it was trained on the original (unclipped) data.

Key concept - CRPS

The Continuous Ranked Probability Score (CRPS) measures the entire predictive distribution against the observed outcome in a single number. Formally it is the integrated squared difference between the predicted CDF F and the empirical step-function at the observation y:

CRPS(F, y) = ∫ (F(z) − 𝟙[z ≥ y])² dz

It unifies point and interval evaluation: a degenerate (point) forecast recovers MAE, while a perfectly calibrated distribution achieves the minimum possible CRPS. Lower is better. Unlike interval metrics that target a fixed coverage level, CRPS evaluates the full distributional shape, making it the standard metric for comparing parametric models.
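For a Normal predictive distribution the integral has a closed form (Gneiting & Raftery, 2007), which makes a good sanity check against the definition above. This sketch is plain numpy/scipy, independent of the Twiga metric implementation:

```python
import numpy as np
from scipy.integrate import trapezoid
from scipy.stats import norm

def crps_normal(mu, sigma, y):
    """Closed-form CRPS for a Normal(mu, sigma) forecast at observation y."""
    w = (y - mu) / sigma
    return sigma * (w * (2 * norm.cdf(w) - 1) + 2 * norm.pdf(w) - 1 / np.sqrt(np.pi))

# Cross-check against the integral definition on a dense grid.
mu, sigma, y = 1.0, 2.0, 2.5
z = np.linspace(mu - 12 * sigma, mu + 12 * sigma, 200_001)
integrand = (norm.cdf(z, loc=mu, scale=sigma) - (z >= y)) ** 2
numeric = trapezoid(integrand, z)
print(round(crps_normal(mu, sigma, y), 3))  # 0.896
```

The two values agree to within the grid discretisation error, confirming the closed form.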

Key concept - aleatoric vs. epistemic uncertainty

Parametric heads capture aleatoric uncertainty - the irreducible randomness in the signal itself (e.g. weather-driven demand variability). The predicted σ grows where the data is intrinsically noisy regardless of how much more training data you add. Epistemic uncertainty - uncertainty due to limited data or model capacity - is not directly represented by a single parametric head; ensemble methods or Bayesian approaches are needed for that. When interpreting prediction intervals here, you are reading aleatoric uncertainty only.
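The standard way to add epistemic uncertainty on top of a parametric head is a deep ensemble. The following is a hypothetical numpy sketch (illustrative numbers, not Twiga output): each of M members predicts (µ_m, σ_m), and the mixture variance decomposes into an aleatoric term (average predicted noise) plus an epistemic term (disagreement between members):

```python
import numpy as np

# Illustrative member predictions for one horizon step.
mus = np.array([4.8, 5.1, 5.3, 4.9])      # member means
sigmas = np.array([1.0, 1.1, 0.9, 1.0])   # member scales

aleatoric = np.mean(sigmas**2)   # mean of per-member variances
epistemic = np.var(mus)          # variance of the member means
total = aleatoric + epistemic    # total predictive variance of the mixture
print(aleatoric, epistemic, total)
```

With a single head, the epistemic term is simply absent - which is why a lone parametric model can be confidently wrong in regions the training data never covered.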

Interval metrics from get_interval_metrics#

Reliability diagram#

A reliability diagram checks calibration: for every nominal coverage level 1 − α, the empirical fraction of test points inside the predicted interval should equal 1 − α. A perfectly calibrated model lies on the diagonal.

How to read the plot:

  • Points on the diagonal → the model is perfectly calibrated at that level

  • Points above the diagonal → over-coverage (intervals are wider than necessary - safe but inefficient)

  • Points below the diagonal → under-coverage (intervals are too narrow - the nominal guarantee is not met)

  • A systematic upward bow → the distribution family has too heavy tails (e.g. Laplace overestimates spread for a Normal signal)

  • A systematic downward bow → the distribution family is too light-tailed, underestimating extreme events

collection_n = forecaster_normal.forecast(test_df)
results_n = next(iter(collection_n.results.values()))
collection_l = forecaster_laplace.forecast(test_df)
results_l = next(iter(collection_l.results.values()))
clear_output()
mu_n = results_n.loc.flatten()
sigma_n = results_n.scale.flatten()
actual = results_n.ground_truth.flatten()

mu_l = results_l.loc.flatten()
b_l = results_l.scale.flatten()
from scipy.stats import laplace as scipy_laplace, norm as scipy_norm

alphas_rd = [0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.4, 0.5]
cov_normal = []
cov_laplace = []

for a in alphas_rd:
    z = scipy_norm.ppf(1 - a / 2)
    cov_normal.append(((actual >= mu_n - z * sigma_n) & (actual <= mu_n + z * sigma_n)).mean())
    b_ppf = scipy_laplace.ppf(1 - a / 2)  # standard-Laplace quantile, equals -ln(a)
    cov_laplace.append(((actual >= mu_l - b_ppf * b_l) & (actual <= mu_l + b_ppf * b_l)).mean())

nominal_levels = [1 - a for a in alphas_rd]
empirical_data = pd.DataFrame(
    {
        "nominal": nominal_levels * 2,
        "empirical": cov_normal + cov_laplace,
        "group": ["Normal (MLPFNormal)"] * len(nominal_levels) + ["Laplace (MLPFLaplace)"] * len(nominal_levels),
    }
)

p = plot_reliability_diagram(
    nominal=empirical_data["nominal"].tolist(),
    empirical=empirical_data["empirical"].tolist(),
    group_col="group",
    groups=["Normal (MLPFNormal)", "Laplace (MLPFLaplace)"],
    title="Reliability diagram",
)
p

Residual Distribution Diagnostics#

plot_kde and plot_cdf let you inspect the shape of the forecast residuals - a quick sanity check that errors are roughly symmetric and unimodal before committing to a Gaussian distributional assumption.

from twiga.core.plot import plot_cdf, plot_kde

residuals_df = pd.DataFrame(
    {
        "residual": pred_normal["Actual"].values - pred_normal["forecast"].values,
    }
)

p_kde = plot_kde(
    residuals_df,
    x_col="residual",
    title="Residual Density — MLPFNormal",
    x_label="Residual (kW)",
)
p_kde
p_cdf = plot_cdf(
    residuals_df,
    x_col="residual",
    title="Residual Cumulative Distribution — MLPFNormal",
    x_label="Residual (kW)",
)
p_cdf

Forecast Error Distribution by Step#

plot_distribution groups by an x-axis column and renders the per-group mean ± 1σ ribbon - useful for diagnosing whether errors are uniformly distributed across the forecast horizon or worsen at specific steps.

from twiga.core.plot import plot_distribution

dist_df = pred_normal.copy()
dist_df["step"] = np.arange(len(dist_df)) % data_config.forecast_horizon
dist_df["residual"] = dist_df["Actual"] - dist_df["forecast"]

p_dist = plot_distribution(
    dist_df,
    x_col="step",
    y_col="residual",
    title="Residual Distribution by Forecast Step — MLPFNormal",
    x_label="Step (30 min)",
    y_label="Residual (kW)",
)
p_dist

PIT Histogram#

The Probability Integral Transform (PIT) histogram is the gold-standard calibration diagnostic for parametric forecasts (Gneiting et al., 2007). A flat histogram indicates perfect calibration; a U-shape means the model is overconfident (under-dispersed - intervals too narrow); a hump shape means it is over-dispersed (intervals too wide).

from twiga.core.plot import plot_pit_histogram

# Use the Normal head's predicted parameters directly
mu_pit = mu_n
sigma_pit = sigma_n
y_pit = actual

pit_values = scipy_norm.cdf(y_pit, loc=mu_pit, scale=sigma_pit)

p_pit = plot_pit_histogram(
    pit_values,
    n_bins=20,
    title="PIT Histogram — MLPFNormal",
)
p_pit

Reliability Diagram#

Plots empirical coverage against nominal coverage levels. Points on the diagonal = perfectly calibrated. Points above = conservative (intervals too wide). Points below = anti-conservative (intervals too narrow).

from twiga.core.plot import plot_reliability_diagram

nominal_levels = np.linspace(0.10, 0.95, 18)

mu_rd = mu_n
sigma_rd = sigma_n
y_rd = actual

empirical_coverage = np.array(
    [
        (
            (y_rd >= mu_rd - scipy_norm.ppf(1 - (1 - lvl) / 2) * sigma_rd)
            & (y_rd <= mu_rd + scipy_norm.ppf(1 - (1 - lvl) / 2) * sigma_rd)
        ).mean()
        for lvl in nominal_levels
    ]
)

p_rel = plot_reliability_diagram(
    nominal=nominal_levels,
    empirical=empirical_coverage,
    title="Reliability Diagram — MLPFNormal",
)
p_rel

Wrapping up#

What you did

  • Chose distribution families based on signal characteristics (Normal, Laplace, Gamma)

  • Configured parametric distribution heads on MLPF and MLPGAM backbone architectures

  • Trained models end-to-end with NLL loss and extracted predictive mean and credible intervals

  • Compared Normal vs. Laplace tails with density plots and residual diagnostics

  • Evaluated distributional quality using CRPS, PIT histograms, and reliability diagrams

Key takeaways

  1. Parametric models output distribution parameters (µ, σ) - not point predictions - and are trained by minimising NLL rather than a point loss.

  2. The choice of distribution family matters: Normal for symmetric signals, Laplace for heavy-tailed ones, Gamma/LogNormal for strictly positive targets.

  3. CRPS is the gold-standard metric for comparing parametric models - it evaluates the full distributional shape in a single score.

  4. A well-calibrated model tracks the diagonal of the reliability diagram; systematic deviations reveal distributional mismatch.

  5. Parametric heads capture aleatoric (irreducible) uncertainty - epistemic uncertainty requires ensembles or Bayesian approaches.


What’s next?#

NB10 - Conformal Prediction (10-conformal-prediction.ipynb)

Learn how to wrap any fitted Twiga model with a coverage-guaranteed conformal calibration step. Conformal prediction requires no distributional assumption and works with any base model - including the parametric models you trained here.

# ruff: noqa: E501, E701, E702
from IPython.display import HTML

_TEAL = "#107591"
_TEAL_MID = "#069fac"
_TEAL_LIGHT = "#e8f5f8"
_TEAL_BEST = "#d0ecf1"
_TEXT_DARK = "#2d3748"
_TEXT_MUTED = "#718096"
_WHITE = "#ffffff"

steps = [
    {
        "num": "07",
        "title": "Neural Networks",
        "desc": "MLPF · N-HiTS · Lightning training · sequence embeddings",
        "tags": ["neural network", "pytorch"],
        "active": False,
    },
    {
        "num": "08",
        "title": "Quantile Regression",
        "desc": "QR-LightGBM · FPQR — calibrated prediction intervals",
        "tags": ["quantile", "pinball loss"],
        "active": False,
    },
    {
        "num": "09",
        "title": "Parametric Distributions",
        "desc": "Normal · Laplace · Gamma heads — NLL training · CRPS evaluation",
        "tags": ["parametric", "NLL", "CRPS"],
        "active": True,
    },
    {
        "num": "10",
        "title": "Conformal Prediction",
        "desc": "Coverage-guaranteed intervals with CQR and CRC wrappers",
        "tags": ["conformal", "CQR", "coverage"],
        "active": False,
    },
    {
        "num": "11",
        "title": "Hyperparameter Tuning",
        "desc": "Optuna-backed HPO · search spaces · resumable SQLite",
        "tags": ["optuna", "HPO", "tuning"],
        "active": False,
    },
]
track_name = "Probabilistic Track"
footer = 'Next: wrap any model with <span style="color:#107591;font-weight:600;">Conformal Prediction</span> (10) for finite-sample coverage guarantees.'


def _b(t, bg, fg):
    return f'<span style="display:inline-block;background:{bg};color:{fg};font-size:10px;font-weight:600;padding:2px 7px;border-radius:10px;margin:2px 2px 0 0;">{t}</span>'


ch = ""
for i, s in enumerate(steps):
    a = s["active"]
    cb = _TEAL if a else _WHITE
    cbo = _TEAL if a else "#d1ecf1"
    nb = _TEAL_MID if a else _TEAL_LIGHT
    nf = _WHITE if a else _TEAL
    tf = _WHITE if a else _TEXT_DARK
    df = "#cce8ef" if a else _TEXT_MUTED
    bb = "#0d5f75" if a else _TEAL_BEST
    bf = "#b8e4ed" if a else _TEAL
    yh = (
        f'<span style="float:right;background:{_TEAL_MID};color:{_WHITE};font-size:10px;font-weight:700;padding:2px 10px;border-radius:12px;">★ you are here</span>'
        if a
        else ""
    )
    bdg = "".join(_b(t, bb, bf) for t in s["tags"])
    ch += f'<div style="background:{cb};border:2px solid {cbo};border-radius:12px;padding:16px 20px;display:flex;align-items:flex-start;gap:16px;box-shadow:{"0 4px 14px rgba(16,117,145,.25)" if a else "0 1px 4px rgba(0,0,0,.06)"};"><div style="min-width:44px;height:44px;background:{nb};color:{nf};border-radius:50%;display:flex;align-items:center;justify-content:center;font-size:15px;font-weight:800;flex-shrink:0;">{s["num"]}</div><div style="flex:1;"><div style="font-size:15px;font-weight:700;color:{tf};margin-bottom:4px;">{s["title"]}{yh}</div><div style="font-size:12.5px;color:{df};margin-bottom:8px;line-height:1.5;">{s["desc"]}</div><div>{bdg}</div></div></div>'
    if i < len(steps) - 1:
        ch += f'<div style="display:flex;justify-content:center;height:32px;"><svg width="24" height="32" viewBox="0 0 24 32" fill="none"><line x1="12" y1="0" x2="12" y2="24" stroke="{_TEAL_MID}" stroke-width="2" stroke-dasharray="4 3"/><polygon points="6,20 18,20 12,30" fill="{_TEAL_MID}"/></svg></div>'

HTML(
    f'<div style="font-family:Inter,\'Segoe UI\',sans-serif;max-width:640px;margin:8px 0;"><div style="background:linear-gradient(135deg,{_TEAL} 0%,{_TEAL_MID} 100%);border-radius:12px 12px 0 0;padding:14px 20px;display:flex;align-items:center;gap:10px;"><svg width="22" height="22" viewBox="0 0 24 24" fill="none" stroke="{_WHITE}" stroke-width="2"><path d="M12 2L2 7l10 5 10-5-10-5z"/><path d="M2 17l10 5 10-5"/><path d="M2 12l10 5 10-5"/></svg><span style="color:{_WHITE};font-size:14px;font-weight:700;">Twiga Learning Path — {track_name}</span></div><div style="border:2px solid {_TEAL_LIGHT};border-top:none;border-radius:0 0 12px 12px;padding:20px 20px 16px;background:#f9fdfe;display:flex;flex-direction:column;">{ch}<div style="margin-top:16px;font-size:11.5px;color:{_TEXT_MUTED};text-align:center;border-top:1px solid {_TEAL_LIGHT};padding-top:12px;">{footer}</div></div></div>'
)