Parametric Distributions#



What you’ll build

Probabilistic neural network forecasters using parametric distribution heads (Normal, Laplace, Gamma) trained with negative log-likelihood loss, evaluated on CRPS and calibration diagrams.

Prerequisites

  • 07 - Neural Networks (MLPF, NHiTS, Lightning training loop)

  • 08 - Quantile Regression (interval metrics: PICP, NMPI, Winkler)

  • Python: basic probability distributions helpful

Learning objectives

By the end of this notebook you will be able to:

  1. Choose a distribution family (Normal, Laplace, Gamma, LogNormal, Beta) based on signal characteristics

  2. Configure and train probabilistic neural networks using negative log-likelihood loss

  3. Evaluate probabilistic forecasts with CRPS and Winkler score

  4. Interpret reliability diagrams to diagnose over- and under-coverage

  5. Compare parametric distribution heads and select the best-calibrated model

1. Choosing the right distribution#

Different physical signals have fundamentally different statistical shapes. Picking a distribution that matches that shape is the most important modelling decision in parametric forecasting.

Key concept - parametric distributions

Instead of predicting a single number, a parametric model outputs the parameters of a probability distribution - for example, a mean µ and standard deviation σ for the Normal family. The model is trained by maximising the log-likelihood of the observed targets under the predicted distribution (equivalently, minimising the negative log-likelihood, NLL). This is fundamentally different from pinball/quantile loss, which directly targets specific quantile levels. NLL training uses all the information in the distributional shape, making it more data-efficient when the chosen family is a good match - but poorly calibrated when it is not.

  • Normal: symmetric, unbounded - the natural default for net load or temperature deltas.

  • Laplace: heavier tails than Normal - robust to outlier spikes (electricity prices, residual demand).

  • Gamma / LogNormal: strictly positive, right-skewed - ideal for PV generation or aggregate wind.

  • Beta: bounded in [0, 1] - suited to capacity factors and state-of-charge signals.

The forecastability profile from NB02 told us NetLoad(kW) is approximately symmetric and can go negative - Normal is the natural starting point.
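Before wiring this into a neural network, the NLL objective itself is worth a quick standalone check. The sketch below uses plain scipy (not the Twiga training loop): for the Normal family, the parameters that minimise the mean NLL are exactly the sample mean and (biased) standard deviation, so any perturbed parameters score strictly worse.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
y = rng.normal(loc=5.0, scale=2.0, size=10_000)  # synthetic targets

def nll(mu, sigma):
    """Mean negative log-likelihood of y under Normal(mu, sigma)."""
    return -norm.logpdf(y, loc=mu, scale=sigma).mean()

# The NLL-minimising Normal parameters are the sample mean and the
# biased sample standard deviation (the Gaussian MLE).
mu_hat, sigma_hat = y.mean(), y.std()
best = nll(mu_hat, sigma_hat)
print(best < nll(mu_hat + 0.5, sigma_hat))   # True
print(best < nll(mu_hat, sigma_hat * 1.5))   # True
```

A neural head does the same thing, except µ and σ are functions of the inputs rather than two global constants.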

import warnings

from great_tables import GT, md
from IPython.display import clear_output
from lets_plot import LetsPlot
import numpy as np
import pandas as pd
from sklearn.preprocessing import RobustScaler, StandardScaler

from twiga.core.plot.gt import twiga_gt, twiga_report

LetsPlot.setup_html()

from twiga.core.plot import (
    plot_density,
    plot_forecast,
    plot_forecast_grid,
    plot_metrics_bar,
    plot_reliability_diagram,
)
from twiga.core.utils import configure, get_logger

warnings.filterwarnings("ignore")

configure()
log = get_logger("tutorials")

2. Setup#

Load data#

data = pd.read_parquet("../data/MLVS-PT.parquet")
data = data[["timestamp", "NetLoad(kW)", "Ghi", "Temperature"]]
data["timestamp"] = pd.to_datetime(data["timestamp"])
data = data.drop_duplicates(subset="timestamp").reset_index(drop=True)
# Restrict to 2019-2020 to keep tutorial execution fast
data = data[(data["timestamp"] >= "2019-01-01") & (data["timestamp"] <= "2020-12-31")].reset_index(drop=True)

log.info("Shape: %s", data.shape)
twiga_gt(GT(data.head()))

Train / val / test splits#

We use the same fixed temporal split as all other tutorials.

df_splits = pd.DataFrame(
    {
        "Split": ["train", "val", "test"],
        "Period": ["before 2020-01-01", "2020-01-01 – 2020-06-30", "2020-07-01 onwards"],
        "Role": ["Model training", "Early stopping", "Final evaluation"],
    }
)

twiga_gt(
    GT(df_splits)
    .tab_header(title=md("**Data splits**"), subtitle="Fixed temporal partition")
    .cols_label(**{c: md(f"**{c}**") for c in df_splits.columns})
    .tab_source_note("Twiga Forecast"),
    n_rows=len(df_splits),
)
train_df = data[data["timestamp"] < "2020-01-01"].reset_index(drop=True)
val_df = data[(data["timestamp"] >= "2020-01-01") & (data["timestamp"] < "2020-07-01")].reset_index(drop=True)
test_df = data[data["timestamp"] >= "2020-07-01"].reset_index(drop=True)

log.info(
    f"train : {train_df.shape[0]:,} rows  ({train_df['timestamp'].min().date()} – {train_df['timestamp'].max().date()})"
)
log.info(
    f"val   : {val_df.shape[0]:,} rows  ({val_df['timestamp'].min().date()} – {val_df['timestamp'].max().date()})"
)
log.info(
    f"test  : {test_df.shape[0]:,} rows  ({test_df['timestamp'].min().date()} – {test_df['timestamp'].max().date()})"
)

Data and training configs#

from twiga.core.config import ConformalConfig, DataPipelineConfig, ForecasterConfig

data_config = DataPipelineConfig(
    target_feature="NetLoad(kW)",
    period="30min",
    latitude=32.371666,
    longitude=-16.274998,
    calendar_features=["hour", "day_night"],
    exogenous_features=["Ghi"],
    forecast_horizon=48,
    stride=48,
    lookback_window_size=96,
    input_scaler=StandardScaler(),
    target_scaler=RobustScaler(),
)

train_config = ForecasterConfig(project_name="Experiment-parametric")
conformal_config = ConformalConfig(method="residual", alpha=0.1)

data_config
df_dist = pd.DataFrame(
    {
        "Signal characteristic": [
            "Symmetric, can go negative (net load, temp delta)",
            "Heavy-tailed, outlier-prone (price spikes)",
            "Strictly positive, right-skewed (PV, wind)",
            "Bounded [0, 1] (capacity factor, SoC)",
        ],
        "Distribution": ["Normal", "Laplace / StudentT", "LogNormal / Gamma", "Beta"],
        "Config shorthand": [
            'MLPFConfig(distribution="normal")',
            'MLPGAMConfig(distribution="laplace")',
            'MLPFConfig(distribution="lognormal")',
            'MLPGAMConfig(distribution="beta")',
        ],
    }
)

twiga_gt(
    GT(df_dist)
    .tab_header(
        title=md("**Distribution family selector**"),
        subtitle="Match the distribution to the physical shape of your signal",
    )
    .cols_label(**{c: md(f"**{c}**") for c in df_dist.columns})
    .tab_source_note("Twiga Forecast"),
    n_rows=len(df_dist),
)

3. The parametric head interface#

Every distribution in Twiga is an nn.Module that wraps a lightweight linear projection on top of the backbone’s latent vector. They all share the same three-method contract:

  • forward(z) → distribution parameters as tensors

  • get_distribution(*params) → a torch.distributions object

  • get_log_likelihood(*params, targets) → negative log-likelihood scalar (the training loss)

The DISTRIBUTIONS registry maps string names to classes, and build_distribution instantiates them by name.

import torch

from twiga.distributions.nn import DISTRIBUTIONS, build_distribution

log.info("Available distributions: %s", list(DISTRIBUTIONS.keys()))

# Peek at one head
head = build_distribution("normal", num_target_output=1, hidden_size=64, forecast_horizon=48)
z = torch.randn(4, 64)  # batch of 4 samples
mu, sigma = head(z)
log.info("mu shape   : %s", mu.shape)  # (4, 48, 1)
log.info("sigma shape: %s", sigma.shape)

dist = head.get_distribution(mu, sigma)
samples = dist.sample()
log.info("sample shape: %s", samples.shape)

Shape convention - all parametric heads output tensors of shape (B, forecast_horizon, num_target_output), where B is the batch size. This matches the target tensor shape used throughout the training loop, so no reshaping is needed before computing the NLL loss.

4. Normal distribution: NetLoad (MLPF backbone)#

The Normal head predicts a mean mu and a standard deviation sigma for every horizon step. The 90 % prediction interval is [mu − 1.645σ, mu + 1.645σ]. Because NetLoad(kW) is approximately symmetric and can go negative, Normal is the textbook choice.
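The 1.645 multiplier is just the standard-Normal quantile at 0.95, which you can verify with scipy:

```python
from scipy.stats import norm

# A two-sided 90% interval leaves 5% in each tail, so the half-width
# multiplier is the 95th-percentile z-score of the standard Normal.
z90 = norm.ppf(0.95)
print(round(z90, 3))  # 1.645
```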

from twiga import TwigaForecaster
from twiga.models.nn import MLPFConfig

normal_config = MLPFConfig(distribution="normal", max_epochs=5, rich_progress_bar=False)

forecaster_normal = TwigaForecaster(
    data_params=data_config,
    model_params=[normal_config],
    train_params=train_config,
    conformal_params=conformal_config,
)
forecaster_normal.fit(train_df=train_df, val_df=val_df)
clear_output()
pred_normal, metric_normal = forecaster_normal.evaluate_parametric_forecast(test_df=test_df)
clear_output()
def get_metric_table(metric_df):
    res = metric_df.groupby("Model")[["mae", "corr", "nll", "crps", "dss"]].mean().round(2).reset_index()
    res = res.rename(columns={"mae": "MAE", "corr": "Corr", "nll": "NLL", "crps": "CRPS", "dss": "DSS"})

    metric_name = ["MAE", "Corr", "CRPS", "NLL", "DSS"]
    minimize_cols = ["MAE", "CRPS", "NLL", "DSS"]
    maximize_cols = ["Corr"]

    return twiga_report(res, metric_name, minimize_cols, maximize_cols)
get_metric_table(metric_normal)
p = plot_forecast(
    pred_normal.iloc[: 7 * 48],
    actual_col="Actual",
    forecast_col="forecast",
    title="MLPFNormal — point forecast",
    y_label="Net Load (kW)",
    x_label="Step (30 min)",
)
p

6. Laplace: heavier tails#

Laplace has heavier tails than Normal - it assigns more probability to extreme events. For signals with frequent, sharp spikes (spot electricity prices, residual net load during demand response events), Laplace can produce better-calibrated intervals than Normal while using the same MLPF backbone - only the distribution head changes.
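How much heavier are the tails? A quick scipy comparison at matched variance makes it concrete (Var[Laplace(0, b)] = 2b², so b = 1/√2 matches the unit-variance Normal):

```python
import numpy as np
from scipy.stats import laplace, norm

# Variance-match the families: b = 1/sqrt(2) gives Laplace unit variance.
b = 1 / np.sqrt(2)
tail = {k: (2 * norm.sf(k), 2 * laplace.sf(k, scale=b)) for k in (2, 3, 4)}
for k, (p_n, p_l) in tail.items():
    print(f"P(|X| > {k}): Normal {p_n:.2e}  Laplace {p_l:.2e}")
```

At three standard deviations the Laplace tail probability is roughly five times the Normal one, which is exactly the behaviour that protects NLL training from over-penalising spikes.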

from twiga.models.nn import MLPFConfig

laplace_config = MLPFConfig(distribution="laplace", max_epochs=5, rich_progress_bar=False)

forecaster_laplace = TwigaForecaster(
    data_params=data_config,
    model_params=[laplace_config],
    train_params=train_config,
    conformal_params=conformal_config,
)
forecaster_laplace.fit(train_df=train_df, val_df=val_df)
clear_output()
pred_laplace, metric_laplace = forecaster_laplace.evaluate_parametric_forecast(test_df=test_df)
clear_output()

get_metric_table(metric_laplace)

8. Evaluating distributional quality#

Point metrics (MAE, RMSE) only assess the mean forecast. To evaluate the full predicted distribution we need interval-aware metrics. We use the Normal model’s outputs here since it was trained on the original (unclipped) data.

Key concept - CRPS

The Continuous Ranked Probability Score (CRPS) measures the entire predictive distribution against the observed outcome in a single number. Formally it is the integrated squared difference between the predicted CDF F and the empirical step-function at the observation y:

CRPS(F, y) = ∫ (F(z) − 𝟙[z ≥ y])² dz

It unifies point and interval evaluation: a degenerate (point) forecast recovers MAE, while a perfectly calibrated distribution achieves the minimum possible CRPS. Lower is better. Unlike interval metrics that target a fixed coverage level, CRPS evaluates the full distributional shape, making it the standard metric for comparing parametric models.
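For a Normal predictive distribution the integral has a closed form (Gneiting & Raftery, 2007), which makes a good sanity check against the definition above. This sketch is plain numpy/scipy, independent of the Twiga metric implementation:

```python
import numpy as np
from scipy.integrate import trapezoid
from scipy.stats import norm

def crps_normal(mu, sigma, y):
    """Closed-form CRPS for a Normal(mu, sigma) forecast at observation y."""
    w = (y - mu) / sigma
    return sigma * (w * (2 * norm.cdf(w) - 1) + 2 * norm.pdf(w) - 1 / np.sqrt(np.pi))

# Cross-check against the integral definition on a dense grid.
mu, sigma, y = 1.0, 2.0, 2.5
z = np.linspace(mu - 12 * sigma, mu + 12 * sigma, 200_001)
integrand = (norm.cdf(z, loc=mu, scale=sigma) - (z >= y)) ** 2
numeric = trapezoid(integrand, z)
print(round(crps_normal(mu, sigma, y), 3))  # 0.896
```

The two values agree to within the grid discretisation error, confirming the closed form.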

Key concept - aleatoric vs. epistemic uncertainty

Parametric heads capture aleatoric uncertainty - the irreducible randomness in the signal itself (e.g. weather-driven demand variability). The predicted σ grows where the data is intrinsically noisy regardless of how much more training data you add. Epistemic uncertainty - uncertainty due to limited data or model capacity - is not directly represented by a single parametric head; ensemble methods or Bayesian approaches are needed for that. When interpreting prediction intervals here, you are reading aleatoric uncertainty only.
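The standard way to add epistemic uncertainty on top of a parametric head is a deep ensemble. The following is a hypothetical numpy sketch (illustrative numbers, not Twiga output): each of M members predicts (µ_m, σ_m), and the mixture variance decomposes into an aleatoric term (average predicted noise) plus an epistemic term (disagreement between members):

```python
import numpy as np

# Illustrative member predictions for one horizon step.
mus = np.array([4.8, 5.1, 5.3, 4.9])      # member means
sigmas = np.array([1.0, 1.1, 0.9, 1.0])   # member scales

aleatoric = np.mean(sigmas**2)   # mean of per-member variances
epistemic = np.var(mus)          # variance of the member means
total = aleatoric + epistemic    # total predictive variance of the mixture
print(aleatoric, epistemic, total)
```

With a single head, the epistemic term is simply absent - which is why a lone parametric model can be confidently wrong in regions the training data never covered.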

Interval metrics from get_interval_metrics#

Reliability diagram#

A reliability diagram checks calibration: for every nominal coverage level 1 − α, the empirical fraction of test points inside the predicted interval should equal 1 − α. A perfectly calibrated model lies on the diagonal.

How to read the plot:

  • Points on the diagonal → the model is perfectly calibrated at that level

  • Points above the diagonal → over-coverage (intervals are wider than necessary - safe but inefficient)

  • Points below the diagonal → under-coverage (intervals are too narrow - the nominal guarantee is not met)

  • A systematic upward bow → the distribution family has too heavy tails (e.g. Laplace overestimates spread for a Normal signal)

  • A systematic downward bow → the distribution family is too light-tailed, underestimating extreme events

collection_n = forecaster_normal.forecast(test_df)
results_n = next(iter(collection_n.results.values()))
collection_l = forecaster_laplace.forecast(test_df)
results_l = next(iter(collection_l.results.values()))
clear_output()
mu_n = results_n.loc.flatten()
sigma_n = results_n.scale.flatten()
actual = results_n.ground_truth.flatten()

mu_l = results_l.loc.flatten()
b_l = results_l.scale.flatten()
from scipy.stats import laplace as scipy_laplace, norm as scipy_norm

alphas_rd = [0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.4, 0.5]
cov_normal = []
cov_laplace = []

for a in alphas_rd:
    z = scipy_norm.ppf(1 - a / 2)
    cov_normal.append(((actual >= mu_n - z * sigma_n) & (actual <= mu_n + z * sigma_n)).mean())
    b_ppf = scipy_laplace.ppf(1 - a / 2)  # standard-Laplace quantile, equals -ln(a)
    cov_laplace.append(((actual >= mu_l - b_ppf * b_l) & (actual <= mu_l + b_ppf * b_l)).mean())

nominal_levels = [1 - a for a in alphas_rd]
empirical_data = pd.DataFrame(
    {
        "nominal": nominal_levels * 2,
        "empirical": cov_normal + cov_laplace,
        "group": ["Normal (MLPFNormal)"] * len(nominal_levels) + ["Laplace (MLPFLaplace)"] * len(nominal_levels),
    }
)

p = plot_reliability_diagram(
    nominal=empirical_data["nominal"].tolist(),
    empirical=empirical_data["empirical"].tolist(),
    group_col="group",
    groups=["Normal (MLPFNormal)", "Laplace (MLPFLaplace)"],
    title="Reliability diagram",
)
p

Residual Distribution Diagnostics#

plot_kde and plot_cdf let you inspect the shape of the forecast residuals - a quick sanity check that errors are roughly symmetric and unimodal before committing to a Gaussian distributional assumption.

from twiga.core.plot import plot_cdf, plot_kde

residuals_df = pd.DataFrame(
    {
        "residual": pred_normal["Actual"].values - pred_normal["forecast"].values,
    }
)

p_kde = plot_kde(
    residuals_df,
    x_col="residual",
    title="Residual Density — MLPFNormal",
    x_label="Residual (kW)",
)
p_kde
p_cdf = plot_cdf(
    residuals_df,
    x_col="residual",
    title="Residual Cumulative Distribution — MLPFNormal",
    x_label="Residual (kW)",
)
p_cdf

Forecast Error Distribution by Step#

plot_distribution groups by an x-axis column and renders the per-group mean ± 1σ ribbon - useful for diagnosing whether errors are uniformly distributed across the forecast horizon or worsen at specific steps.

from twiga.core.plot import plot_distribution

dist_df = pred_normal.copy()
dist_df["step"] = np.arange(len(dist_df)) % data_config.forecast_horizon
dist_df["residual"] = dist_df["Actual"] - dist_df["forecast"]

p_dist = plot_distribution(
    dist_df,
    x_col="step",
    y_col="residual",
    title="Residual Distribution by Forecast Step — MLPFNormal",
    x_label="Step (30 min)",
    y_label="Residual (kW)",
)
p_dist

PIT Histogram#

The Probability Integral Transform (PIT) histogram is the gold-standard calibration diagnostic for parametric forecasts (Gneiting et al., 2007). A flat histogram indicates perfect calibration; a U-shape means the model is overconfident (under-dispersed - intervals too narrow); a hump shape means it is over-dispersed (intervals too wide).

from twiga.core.plot import plot_pit_histogram

# Use the Normal head's predicted parameters directly
mu_pit = mu_n
sigma_pit = sigma_n
y_pit = actual

pit_values = scipy_norm.cdf(y_pit, loc=mu_pit, scale=sigma_pit)

p_pit = plot_pit_histogram(
    pit_values,
    n_bins=20,
    title="PIT Histogram — MLPFNormal",
)
p_pit

Reliability Diagram#

Plots empirical coverage against nominal coverage levels. Points on the diagonal = perfectly calibrated. Points above = conservative (intervals too wide). Points below = anti-conservative (intervals too narrow).

from twiga.core.plot import plot_reliability_diagram

nominal_levels = np.linspace(0.10, 0.95, 18)

mu_rd = mu_n
sigma_rd = sigma_n
y_rd = actual

empirical_coverage = np.array(
    [
        (
            (y_rd >= mu_rd - scipy_norm.ppf(1 - (1 - lvl) / 2) * sigma_rd)
            & (y_rd <= mu_rd + scipy_norm.ppf(1 - (1 - lvl) / 2) * sigma_rd)
        ).mean()
        for lvl in nominal_levels
    ]
)

p_rel = plot_reliability_diagram(
    nominal=nominal_levels,
    empirical=empirical_coverage,
    title="Reliability Diagram — MLPFNormal",
)
p_rel

Wrapping up#

What you did

  • Chose distribution families based on signal characteristics (Normal, Laplace, Gamma)

  • Configured parametric distribution heads on MLPF and MLPGAM backbone architectures

  • Trained models end-to-end with NLL loss and extracted predictive mean and credible intervals

  • Compared Normal vs. Laplace tails with density plots and residual diagnostics

  • Evaluated distributional quality using CRPS, PIT histograms, and reliability diagrams

Key takeaways

  1. Parametric models output distribution parameters (µ, σ) - not point predictions - and are trained by minimising NLL rather than a point loss.

  2. The choice of distribution family matters: Normal for symmetric signals, Laplace for heavy-tailed ones, Gamma/LogNormal for strictly positive targets.

  3. CRPS is the gold-standard metric for comparing parametric models - it evaluates the full distributional shape in a single score.

  4. A well-calibrated model tracks the diagonal of the reliability diagram; systematic deviations reveal distributional mismatch.

  5. Parametric heads capture aleatoric (irreducible) uncertainty - epistemic uncertainty requires ensembles or Bayesian approaches.


What’s next?#

NB10 - Conformal Prediction (10-conformal-prediction.ipynb)

Learn how to wrap any fitted Twiga model with a coverage-guaranteed conformal calibration step. Conformal prediction requires no distributional assumption and works with any base model - including the parametric models you trained here.

# ruff: noqa: E501, E701, E702
from IPython.display import HTML

_TEAL = "#107591"
_TEAL_MID = "#069fac"
_TEAL_LIGHT = "#e8f5f8"
_TEAL_BEST = "#d0ecf1"
_TEXT_DARK = "#2d3748"
_TEXT_MUTED = "#718096"
_WHITE = "#ffffff"

steps = [
    {
        "num": "07",
        "title": "Neural Networks",
        "desc": "MLPF · N-HiTS · Lightning training · sequence embeddings",
        "tags": ["neural network", "pytorch"],
        "active": False,
    },
    {
        "num": "08",
        "title": "Quantile Regression",
        "desc": "QR-LightGBM · FPQR — calibrated prediction intervals",
        "tags": ["quantile", "pinball loss"],
        "active": False,
    },
    {
        "num": "09",
        "title": "Parametric Distributions",
        "desc": "Normal · Laplace · Gamma heads — NLL training · CRPS evaluation",
        "tags": ["parametric", "NLL", "CRPS"],
        "active": True,
    },
    {
        "num": "10",
        "title": "Conformal Prediction",
        "desc": "Coverage-guaranteed intervals with CQR and CRC wrappers",
        "tags": ["conformal", "CQR", "coverage"],
        "active": False,
    },
    {
        "num": "11",
        "title": "Hyperparameter Tuning",
        "desc": "Optuna-backed HPO · search spaces · resumable SQLite",
        "tags": ["optuna", "HPO", "tuning"],
        "active": False,
    },
]
track_name = "Probabilistic Track"
footer = 'Next: wrap any model with <span style="color:#107591;font-weight:600;">Conformal Prediction</span> (10) for finite-sample coverage guarantees.'


def _b(t, bg, fg):
    return f'<span style="display:inline-block;background:{bg};color:{fg};font-size:10px;font-weight:600;padding:2px 7px;border-radius:10px;margin:2px 2px 0 0;">{t}</span>'


ch = ""
for i, s in enumerate(steps):
    a = s["active"]
    cb = _TEAL if a else _WHITE
    cbo = _TEAL if a else "#d1ecf1"
    nb = _TEAL_MID if a else _TEAL_LIGHT
    nf = _WHITE if a else _TEAL
    tf = _WHITE if a else _TEXT_DARK
    df = "#cce8ef" if a else _TEXT_MUTED
    bb = "#0d5f75" if a else _TEAL_BEST
    bf = "#b8e4ed" if a else _TEAL
    yh = (
        f'<span style="float:right;background:{_TEAL_MID};color:{_WHITE};font-size:10px;font-weight:700;padding:2px 10px;border-radius:12px;">★ you are here</span>'
        if a
        else ""
    )
    bdg = "".join(_b(t, bb, bf) for t in s["tags"])
    ch += f'<div style="background:{cb};border:2px solid {cbo};border-radius:12px;padding:16px 20px;display:flex;align-items:flex-start;gap:16px;box-shadow:{"0 4px 14px rgba(16,117,145,.25)" if a else "0 1px 4px rgba(0,0,0,.06)"};"><div style="min-width:44px;height:44px;background:{nb};color:{nf};border-radius:50%;display:flex;align-items:center;justify-content:center;font-size:15px;font-weight:800;flex-shrink:0;">{s["num"]}</div><div style="flex:1;"><div style="font-size:15px;font-weight:700;color:{tf};margin-bottom:4px;">{s["title"]}{yh}</div><div style="font-size:12.5px;color:{df};margin-bottom:8px;line-height:1.5;">{s["desc"]}</div><div>{bdg}</div></div></div>'
    if i < len(steps) - 1:
        ch += f'<div style="display:flex;justify-content:center;height:32px;"><svg width="24" height="32" viewBox="0 0 24 32" fill="none"><line x1="12" y1="0" x2="12" y2="24" stroke="{_TEAL_MID}" stroke-width="2" stroke-dasharray="4 3"/><polygon points="6,20 18,20 12,30" fill="{_TEAL_MID}"/></svg></div>'

HTML(
    f'<div style="font-family:Inter,\'Segoe UI\',sans-serif;max-width:640px;margin:8px 0;"><div style="background:linear-gradient(135deg,{_TEAL} 0%,{_TEAL_MID} 100%);border-radius:12px 12px 0 0;padding:14px 20px;display:flex;align-items:center;gap:10px;"><svg width="22" height="22" viewBox="0 0 24 24" fill="none" stroke="{_WHITE}" stroke-width="2"><path d="M12 2L2 7l10 5 10-5-10-5z"/><path d="M2 17l10 5 10-5"/><path d="M2 12l10 5 10-5"/></svg><span style="color:{_WHITE};font-size:14px;font-weight:700;">Twiga Learning Path — {track_name}</span></div><div style="border:2px solid {_TEAL_LIGHT};border-top:none;border-radius:0 0 12px 12px;padding:20px 20px 16px;background:#f9fdfe;display:flex;flex-direction:column;">{ch}<div style="margin-top:16px;font-size:11.5px;color:{_TEXT_MUTED};text-align:center;border-top:1px solid {_TEAL_LIGHT};padding-top:12px;">{footer}</div></div></div>'
)