Parametric Distributions#
What you’ll build
Probabilistic neural network forecasters using parametric distribution heads (Normal, Laplace, Gamma) trained with negative log-likelihood loss, evaluated on CRPS and calibration diagrams.
Prerequisites
07 - Neural Networks (MLPF, NHiTS, Lightning training loop)
08 - Quantile Regression (interval metrics: PICP, NMPI, Winkler)
Python: basic probability distributions helpful
Learning objectives
By the end of this notebook you will be able to:
Choose a distribution family (Normal, Laplace, Gamma, LogNormal, Beta) based on signal characteristics
Configure and train probabilistic neural networks using negative log-likelihood loss
Evaluate probabilistic forecasts with CRPS and Winkler score
Interpret reliability diagrams to diagnose over- and under-coverage
Compare parametric distribution heads and select the best-calibrated model
1. Choosing the right distribution#
Different physical signals have fundamentally different statistical shapes. Picking a distribution that matches that shape is the most important modelling decision in parametric forecasting.
Key concept - parametric distributions
Instead of predicting a single number, a parametric model outputs the parameters of a probability distribution - for example, a mean µ and standard deviation σ for the Normal family. The model is trained by maximising the log-likelihood of the observed targets under the predicted distribution (equivalently, minimising the negative log-likelihood, NLL). This is fundamentally different from pinball/quantile loss, which directly targets specific quantile levels. NLL training uses all the information in the distributional shape, making it more data-efficient when the chosen family is a good match - but it can yield poorly calibrated forecasts when the family is misspecified.
Normal: symmetric, unbounded - the natural default for net load or temperature deltas.
Laplace: heavier tails than Normal - robust to outlier spikes (electricity prices, residual demand).
Gamma / LogNormal: strictly positive, right-skewed - ideal for PV generation or aggregate wind.
Beta: bounded in [0, 1] - suited to capacity factors and state-of-charge signals.
The forecastability profile from NB02 told us NetLoad(kW) is approximately symmetric and can go negative - Normal is the natural starting point.
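Before loading any data, the NLL objective itself is easy to see in isolation. The sketch below uses numpy/scipy with made-up parameter values (Twiga's heads compute the same quantity with torch.distributions during training):

```python
import numpy as np
from scipy.stats import norm

# Toy "predicted" parameters for 5 targets, and the observed values.
# These numbers are illustrative, not output from a trained model.
mu = np.array([0.2, -0.1, 0.5, 0.0, 0.3])
sigma = np.array([0.7, 0.8, 0.7, 0.6, 0.9])  # must be strictly positive
y = np.array([0.25, -0.05, 0.4, 0.1, 0.2])

# Negative log-likelihood: the quantity minimised during training.
nll = -norm.logpdf(y, loc=mu, scale=sigma).mean()
print(round(nll, 3))
```

Shrinking sigma where the model is accurate (and growing it where it is not) is exactly what lowers this loss - that is how the network learns to report its own uncertainty.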
import warnings
from great_tables import GT, md
from IPython.display import clear_output
from lets_plot import LetsPlot
import numpy as np
import pandas as pd
from sklearn.preprocessing import RobustScaler, StandardScaler
from twiga.core.plot.gt import twiga_gt, twiga_report
LetsPlot.setup_html()
from twiga.core.plot import (
plot_density,
plot_forecast,
plot_forecast_grid,
plot_metrics_bar,
plot_reliability_diagram,
)
from twiga.core.utils import configure, get_logger
warnings.filterwarnings("ignore")
configure()
log = get_logger("tutorials")
2. Setup#
Load data#
data = pd.read_parquet("../data/MLVS-PT.parquet")
data = data[["timestamp", "NetLoad(kW)", "Ghi", "Temperature"]]
data["timestamp"] = pd.to_datetime(data["timestamp"])
data = data.drop_duplicates(subset="timestamp").reset_index(drop=True)
# Restrict to 2019-2020 to keep tutorial execution fast
data = data[(data["timestamp"] >= "2019-01-01") & (data["timestamp"] <= "2020-12-31")].reset_index(drop=True)
log.info("Shape: %s", data.shape)
twiga_gt(GT(data.head()))
Train / val / test splits#
We use the same fixed temporal split as all other tutorials.
df_splits = pd.DataFrame(
{
"Split": ["train", "val", "test"],
"Period": ["before 2020-01-01", "2020-01-01 – 2020-06-30", "2020-07-01 onwards"],
"Role": ["Model training", "Early stopping", "Final evaluation"],
}
)
twiga_gt(
GT(df_splits)
.tab_header(title=md("**Data splits**"), subtitle="Fixed temporal partition")
.cols_label(**{c: md(f"**{c}**") for c in df_splits.columns})
.tab_source_note("Twiga Forecast"),
n_rows=len(df_splits),
)
train_df = data[data["timestamp"] < "2020-01-01"].reset_index(drop=True)
val_df = data[(data["timestamp"] >= "2020-01-01") & (data["timestamp"] < "2020-07-01")].reset_index(drop=True)
test_df = data[data["timestamp"] >= "2020-07-01"].reset_index(drop=True)
log.info(
f"train : {train_df.shape[0]:,} rows ({train_df['timestamp'].min().date()} → {train_df['timestamp'].max().date()})"
)
log.info(f"val : {val_df.shape[0]:,} rows ({val_df['timestamp'].min().date()} → {val_df['timestamp'].max().date()})")
log.info(
f"test : {test_df.shape[0]:,} rows ({test_df['timestamp'].min().date()} → {test_df['timestamp'].max().date()})"
)
Data and training configs#
from twiga.core.config import ConformalConfig, DataPipelineConfig, ForecasterConfig
data_config = DataPipelineConfig(
target_feature="NetLoad(kW)",
period="30min",
latitude=32.371666,
longitude=-16.274998,
calendar_features=["hour", "day_night"],
exogenous_features=["Ghi"],
forecast_horizon=48,
stride=48,
lookback_window_size=96,
input_scaler=StandardScaler(),
target_scaler=RobustScaler(),
)
train_config = ForecasterConfig(project_name="Experiment-parametric")
conformal_config = ConformalConfig(method="residual", alpha=0.1)
data_config
df_dist = pd.DataFrame(
{
"Signal characteristic": [
"Symmetric, can go negative (net load, temp delta)",
"Heavy-tailed, outlier-prone (price spikes)",
"Strictly positive, right-skewed (PV, wind)",
"Bounded [0, 1] (capacity factor, SoC)",
],
"Distribution": ["Normal", "Laplace / StudentT", "LogNormal / Gamma", "Beta"],
"Config shorthand": [
'MLPFConfig(distribution="normal")',
'MLPGAMConfig(distribution="laplace")',
'MLPFConfig(distribution="lognormal")',
'MLPGAMConfig(distribution="beta")',
],
}
)
twiga_gt(
GT(df_dist)
.tab_header(
title=md("**Distribution family selector**"),
subtitle="Match the distribution to the physical shape of your signal",
)
.cols_label(**{c: md(f"**{c}**") for c in df_dist.columns})
.tab_source_note("Twiga Forecast"),
n_rows=len(df_dist),
)
3. The parametric head interface#
Every distribution in Twiga is an nn.Module that wraps a lightweight linear projection on top of the backbone’s latent vector. They all share the same three-method contract:
forward(z) → distribution parameters as tensors
get_distribution(*params) → a torch.distributions object
get_log_likelihood(*params, targets) → negative log-likelihood scalar (the training loss)
The DISTRIBUTIONS registry maps string names to classes, and build_distribution instantiates them by name.
import torch
from twiga.distributions.nn import DISTRIBUTIONS, build_distribution
log.info("Available distributions: %s", list(DISTRIBUTIONS.keys()))
# Peek at one head
head = build_distribution("normal", num_target_output=1, hidden_size=64, forecast_horizon=48)
z = torch.randn(4, 64) # batch of 4 samples
mu, sigma = head(z)
log.info("mu shape : %s", mu.shape) # (4, 48, 1)
log.info("sigma shape: %s", sigma.shape)
dist = head.get_distribution(mu, sigma)
samples = dist.sample()
log.info("sample shape: %s", samples.shape)
Shape convention - all parametric heads output tensors of shape (B, forecast_horizon, num_target_output), where B is the batch size. This matches the target tensor shape used throughout the training loop, so no reshaping is needed before computing the NLL loss.
4. Normal distribution: NetLoad (MLPF backbone)#
The Normal head predicts a mean mu and a standard deviation sigma for every horizon step. The 90 % prediction interval is [mu − 1.645σ, mu + 1.645σ]. Because NetLoad(kW) is approximately symmetric and can go negative, Normal is the textbook choice.
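Where the 1.645 comes from: it is the Normal 95th-percentile z-score, so ± 1.645σ covers the central 90 %. A standalone scipy check (the mu/sigma pair below is a toy value, not model output):

```python
import numpy as np
from scipy.stats import norm

alpha = 0.10  # 90 % central prediction interval
z = norm.ppf(1 - alpha / 2)  # 95th-percentile z-score
print(round(z, 3))  # ≈ 1.645

# Interval bounds for one toy (mu, sigma) prediction in kW.
mu, sigma = 120.0, 15.0
lower, upper = mu - z * sigma, mu + z * sigma
print(round(lower, 1), round(upper, 1))
```

Any other coverage level works the same way: replace alpha and the z-score follows from norm.ppf.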
from twiga import TwigaForecaster
from twiga.models.nn import MLPFConfig
normal_config = MLPFConfig(distribution="normal", max_epochs=5, rich_progress_bar=False)
forecaster_normal = TwigaForecaster(
data_params=data_config,
model_params=[normal_config],
train_params=train_config,
conformal_params=conformal_config,
)
forecaster_normal.fit(train_df=train_df, val_df=val_df)
clear_output()
pred_normal, metric_normal = forecaster_normal.evaluate_parametric_forecast(test_df=test_df)
clear_output()
def get_metric_table(metric_df):
res = metric_df.groupby("Model")[["mae", "corr", "nll", "crps", "dss"]].mean().round(2).reset_index()
res = res.rename(columns={"mae": "MAE", "corr": "Corr", "nll": "NLL", "crps": "CRPS", "dss": "DSS"})
metric_name = ["MAE", "Corr", "CRPS", "NLL", "DSS"]
minimize_cols = ["MAE", "CRPS", "NLL", "DSS"]
maximize_cols = ["Corr"]
return twiga_report(res, metric_name, minimize_cols, maximize_cols)
get_metric_table(metric_normal)
p = plot_forecast(
pred_normal.iloc[: 7 * 48],
actual_col="Actual",
forecast_col="forecast",
title="MLPFNormal — point forecast",
y_label="Net Load (kW)",
x_label="Step (30 min)",
)
p
6. Laplace: heavier tails#
Laplace has heavier tails than Normal - it assigns more probability to extreme events. For signals with frequent, sharp spikes (spot electricity prices, residual net load during demand response events), Laplace can produce better-calibrated intervals than Normal while using the same MLPF backbone.
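To quantify "heavier tails": matching the two families on variance, the Laplace puts roughly five times more probability beyond three standard deviations. A quick scipy check, independent of the models trained below:

```python
import numpy as np
from scipy.stats import laplace, norm

# Match the families on variance: Laplace(scale=b) has variance 2*b**2,
# so b = 1/sqrt(2) gives unit variance, same as Normal(0, 1).
b = 1 / np.sqrt(2)

# Two-sided probability of a |3-sigma| event under each family.
p_normal = 2 * norm.sf(3)
p_laplace = 2 * laplace.sf(3, scale=b)
print(f"Normal : {p_normal:.4f}")
print(f"Laplace: {p_laplace:.4f}")
```

If your signal produces 3-sigma spikes more often than a Normal fit predicts, that extra tail mass is exactly what the Laplace head buys you.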
from twiga.models.nn import MLPFConfig
laplace_config = MLPFConfig(distribution="laplace", max_epochs=5, rich_progress_bar=False)
forecaster_laplace = TwigaForecaster(
data_params=data_config,
model_params=[laplace_config],
train_params=train_config,
conformal_params=conformal_config,
)
forecaster_laplace.fit(train_df=train_df, val_df=val_df)
clear_output()
pred_laplace, metric_laplace = forecaster_laplace.evaluate_parametric_forecast(test_df=test_df)
clear_output()
get_metric_table(metric_laplace)
8. Evaluating distributional quality#
Point metrics (MAE, RMSE) only assess the mean forecast. To evaluate the full predicted distribution we need distribution-aware metrics. We use the Normal model’s outputs here since it was trained on the original (unclipped) data.
Key concept - CRPS
The Continuous Ranked Probability Score (CRPS) measures the entire predictive distribution against the observed outcome in a single number. Formally it is the integrated squared difference between the predicted CDF F and the empirical step-function at the observation y:
CRPS(F, y) = ∫ (F(z) − 𝟙[z ≥ y])² dz
It unifies point and interval evaluation: a degenerate (point) forecast recovers MAE, while a perfectly calibrated distribution achieves the minimum possible CRPS. Lower is better. Unlike interval metrics that target a fixed coverage level, CRPS evaluates the full distributional shape, making it the standard metric for comparing parametric models.
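For a Normal predictive distribution CRPS has a closed form, and for any distribution it can be estimated from samples via the energy identity CRPS = E|X − y| − ½ E|X − X′|. The sketch below implements both and checks they agree; these helper functions are illustrative, not part of the Twiga API:

```python
import numpy as np
from scipy.stats import norm

def crps_normal(mu, sigma, y):
    """Closed-form CRPS for a Normal predictive distribution."""
    z = (y - mu) / sigma
    return sigma * (z * (2 * norm.cdf(z) - 1) + 2 * norm.pdf(z) - 1 / np.sqrt(np.pi))

def crps_samples(samples, y):
    """Sample-based estimator: E|X - y| - 0.5 * E|X - X'|."""
    term1 = np.abs(samples - y).mean()
    term2 = np.abs(samples[:, None] - samples[None, :]).mean()
    return term1 - 0.5 * term2

rng = np.random.default_rng(0)
mu, sigma, y = 0.0, 1.0, 0.5
exact = crps_normal(mu, sigma, y)
approx = crps_samples(rng.normal(mu, sigma, size=2000), y)
print(round(exact, 3), round(approx, 3))
```

The sample-based form is what makes CRPS usable for any head: draw from the predicted distribution and average, no closed form needed.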
Key concept - aleatoric vs. epistemic uncertainty
Parametric heads capture aleatoric uncertainty - the irreducible randomness in the signal itself (e.g. weather-driven demand variability). The predicted σ grows where the data is intrinsically noisy regardless of how much more training data you add. Epistemic uncertainty - uncertainty due to limited data or model capacity - is not directly represented by a single parametric head; ensemble methods or Bayesian approaches are needed for that. When interpreting prediction intervals here, you are reading aleatoric uncertainty only.
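For intuition, a deep ensemble separates the two components: average the members' predicted variances to get the aleatoric part, and take the variance of their predicted means as the epistemic part (law of total variance for the mixture). A toy numpy sketch with made-up member outputs, not trained models:

```python
import numpy as np

# Toy ensemble of M=4 members, each predicting (mu, sigma) for one target.
mus = np.array([10.2, 9.8, 10.5, 10.0])
sigmas = np.array([1.1, 1.0, 1.2, 0.9])

aleatoric = np.mean(sigmas**2)     # average within-member noise
epistemic = np.var(mus)            # disagreement between members
total_var = aleatoric + epistemic  # predictive variance of the mixture
print(round(aleatoric, 3), round(epistemic, 4), round(total_var, 3))
```

With more training data the members agree more and the epistemic term shrinks, while the aleatoric term stays - which is why a single parametric head, reporting only sigma, cannot tell the two apart.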
Interval metrics from get_interval_metrics#
Reliability diagram#
A reliability diagram checks calibration: for every nominal coverage level 1 − α, the empirical fraction of test points inside the predicted interval should equal 1 − α. A perfectly calibrated model lies on the diagonal.
How to read the plot:
Points on the diagonal → the model is perfectly calibrated at that level
Points above the diagonal → over-coverage (intervals are wider than necessary - safe but inefficient)
Points below the diagonal → under-coverage (intervals are too narrow - the nominal guarantee is not met)
A systematic upward bow → the distribution family has too heavy tails (e.g. Laplace overestimates spread for a Normal signal)
A systematic downward bow → the distribution family is too light-tailed, underestimating extreme events
collection_n = forecaster_normal.forecast(test_df)
results_n = next(iter(collection_n.results.values()))
collection_l = forecaster_laplace.forecast(test_df)
results_l = next(iter(collection_l.results.values()))
clear_output()
mu_n = results_n.loc.flatten()
sigma_n = results_n.scale.flatten()
actual = results_n.ground_truth.flatten()
mu_l = results_l.loc.flatten()
b_l = results_l.scale.flatten()
from scipy.stats import laplace as scipy_laplace, norm as scipy_norm
alphas_rd = [0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.4, 0.5]
cov_normal = []
cov_laplace = []
for a in alphas_rd:
z = scipy_norm.ppf(1 - a / 2)
cov_normal.append(((actual >= mu_n - z * sigma_n) & (actual <= mu_n + z * sigma_n)).mean())
b_ppf = -np.log(a)  # Laplace ppf(1 - a/2) = -ln(a) in units of the scale b
cov_laplace.append(((actual >= mu_l - b_ppf * b_l) & (actual <= mu_l + b_ppf * b_l)).mean())
nominal_levels = [1 - a for a in alphas_rd]
empirical_data = pd.DataFrame(
{
"nominal": nominal_levels * 2,
"empirical": cov_normal + cov_laplace,
"group": ["Normal (MLPFNormal)"] * len(nominal_levels) + ["Laplace (MLPGAMLaplace)"] * len(nominal_levels),
}
)
p = plot_reliability_diagram(
nominal=empirical_data["nominal"].tolist(),
empirical=empirical_data["empirical"].tolist(),
group_col="group",
groups=empirical_data["group"].unique().tolist(),
title="Reliability diagram",
)
p
Residual Distribution Diagnostics#
plot_kde and plot_cdf let you inspect the shape of the forecast residuals -
a quick sanity check that errors are roughly symmetric and unimodal before
committing to a Gaussian distributional assumption.
from twiga.core.plot import plot_cdf, plot_kde
residuals_df = pd.DataFrame(
{
"residual": pred_normal["Actual"].values - pred_normal["forecast"].values,
}
)
p_kde = plot_kde(
residuals_df,
x_col="residual",
title="Residual Density — MLPFNormal",
x_label="Residual (kW)",
)
p_kde
p_cdf = plot_cdf(
residuals_df,
x_col="residual",
title="Residual Cumulative Distribution — MLPFNormal",
x_label="Residual (kW)",
)
p_cdf
Forecast Error Distribution by Step#
plot_distribution groups by an x-axis column and renders the per-group
mean ± 1σ ribbon - useful for diagnosing whether errors are uniformly
distributed across the forecast horizon or worsen at specific steps.
from twiga.core.plot import plot_distribution
dist_df = pred_normal.copy()
dist_df["step"] = np.arange(len(dist_df)) % data_config.forecast_horizon
dist_df["residual"] = dist_df["Actual"] - dist_df["forecast"]
p_dist = plot_distribution(
dist_df,
x_col="step",
y_col="residual",
title="Residual Distribution by Forecast Step — MLPFNormal",
x_label="Step (30 min)",
y_label="Residual (kW)",
)
p_dist
PIT Histogram#
The Probability Integral Transform (PIT) histogram is the gold-standard calibration diagnostic for parametric forecasts (Gneiting et al., 2007). A flat histogram indicates perfect calibration; a U-shape means the model is overconfident (intervals too narrow, under-dispersed); a hump shape means it is over-dispersed (intervals too wide).
from twiga.core.plot import plot_pit_histogram
# Use the Normal head's predicted mu and sigma from the forecast results
mu_pit = mu_n
sigma_pit = sigma_n
y_pit = actual
pit_values = scipy_norm.cdf(y_pit, loc=mu_pit, scale=sigma_pit)
p_pit = plot_pit_histogram(
pit_values,
n_bins=20,
title="PIT Histogram — MLPFNormal",
)
p_pit
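Beyond eyeballing the histogram, PIT uniformity can be tested formally, for example with a Kolmogorov–Smirnov test against the Uniform(0, 1) CDF. The sketch below uses synthetic data rather than the model outputs above: a correctly specified forecaster gives a small KS statistic, while halving the predicted sigma produces the U-shaped, clearly non-uniform PIT:

```python
import numpy as np
from scipy.stats import kstest, norm

rng = np.random.default_rng(42)
y = rng.normal(0.0, 1.0, size=2000)  # "observations" from the true N(0, 1)

pit_good = norm.cdf(y, loc=0.0, scale=1.0)    # correctly specified forecast
pit_narrow = norm.cdf(y, loc=0.0, scale=0.5)  # overconfident: sigma too small

# KS distance from the Uniform(0, 1) CDF: small = well calibrated.
stat_good = kstest(pit_good, "uniform").statistic
stat_narrow = kstest(pit_narrow, "uniform").statistic
print(round(stat_good, 3), round(stat_narrow, 3))
```

The same two lines applied to pit_values above give a single-number summary of what the histogram shows visually.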
Reliability Diagram#
Plots empirical coverage against nominal coverage levels. Points on the diagonal = perfectly calibrated. Points above = conservative (intervals too wide). Points below = anti-conservative (intervals too narrow).
from twiga.core.plot import plot_reliability_diagram
nominal_levels = np.linspace(0.10, 0.95, 18)
mu_rd = mu_n
sigma_rd = sigma_n
y_rd = actual
empirical_coverage = np.array(
[
(
(y_rd >= mu_rd - scipy_norm.ppf(1 - (1 - lvl) / 2) * sigma_rd)
& (y_rd <= mu_rd + scipy_norm.ppf(1 - (1 - lvl) / 2) * sigma_rd)
).mean()
for lvl in nominal_levels
]
)
p_rel = plot_reliability_diagram(
nominal=nominal_levels,
empirical=empirical_coverage,
title="Reliability Diagram — MLPFNormal",
)
p_rel
Wrapping up#
What you did
Chose distribution families based on signal characteristics (Normal, Laplace, Gamma)
Configured parametric distribution heads on MLPF and MLPGAM backbone architectures
Trained models end-to-end with NLL loss and extracted predictive mean and credible intervals
Compared Normal vs. Laplace tails with density plots and residual diagnostics
Evaluated distributional quality using CRPS, PIT histograms, and reliability diagrams
Key takeaways
Parametric models output distribution parameters (µ, σ) - not point predictions - and are trained by minimising NLL rather than a point loss.
The choice of distribution family matters: Normal for symmetric signals, Laplace for heavy-tailed ones, Gamma/LogNormal for strictly positive targets.
CRPS is the gold-standard metric for comparing parametric models - it evaluates the full distributional shape in a single score.
A well-calibrated model tracks the diagonal of the reliability diagram; systematic deviations reveal distributional mismatch.
Parametric heads capture aleatoric (irreducible) uncertainty - epistemic uncertainty requires ensembles or Bayesian approaches.
What’s next?#
NB10 - Conformal Prediction (10-conformal-prediction.ipynb)
Learn how to wrap any fitted Twiga model with a coverage-guaranteed conformal calibration step. Conformal prediction requires no distributional assumption and works with any base model - including the parametric models you trained here.
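The mechanics are simple enough to sketch in numpy: take the (1 − α) quantile of calibration-set absolute residuals (with the finite-sample correction) and pad the point forecast by that amount. This is generic split conformal on synthetic residuals, not Twiga's ConformalConfig implementation:

```python
import numpy as np

rng = np.random.default_rng(7)
alpha = 0.1

# Toy calibration residuals: |actual - forecast| on held-out data.
cal_resid = np.abs(rng.normal(0.0, 5.0, size=500))

# Conformal quantile with the finite-sample (n + 1) correction.
n = len(cal_resid)
q_level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
q_hat = np.quantile(cal_resid, q_level)

# Wrap a new point forecast with a coverage-guaranteed interval.
point = 120.0
lower, upper = point - q_hat, point + q_hat
print(round(q_hat, 2), round(lower, 2), round(upper, 2))
```

Note there is no distributional assumption anywhere in this sketch - the guarantee comes from exchangeability of the calibration residuals, which is why it composes with any base model.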
# ruff: noqa: E501, E701, E702
from IPython.display import HTML
_TEAL = "#107591"
_TEAL_MID = "#069fac"
_TEAL_LIGHT = "#e8f5f8"
_TEAL_BEST = "#d0ecf1"
_TEXT_DARK = "#2d3748"
_TEXT_MUTED = "#718096"
_WHITE = "#ffffff"
steps = [
{
"num": "07",
"title": "Neural Networks",
"desc": "MLPF · N-HiTS · Lightning training · sequence embeddings",
"tags": ["neural network", "pytorch"],
"active": False,
},
{
"num": "08",
"title": "Quantile Regression",
"desc": "QR-LightGBM · FPQR — calibrated prediction intervals",
"tags": ["quantile", "pinball loss"],
"active": False,
},
{
"num": "09",
"title": "Parametric Distributions",
"desc": "Normal · Laplace · Gamma heads — NLL training · CRPS evaluation",
"tags": ["parametric", "NLL", "CRPS"],
"active": True,
},
{
"num": "10",
"title": "Conformal Prediction",
"desc": "Coverage-guaranteed intervals with CQR and CRC wrappers",
"tags": ["conformal", "CQR", "coverage"],
"active": False,
},
{
"num": "11",
"title": "Hyperparameter Tuning",
"desc": "Optuna-backed HPO · search spaces · resumable SQLite",
"tags": ["optuna", "HPO", "tuning"],
"active": False,
},
]
track_name = "Probabilistic Track"
footer = 'Next: wrap any model with <span style="color:#107591;font-weight:600;">Conformal Prediction</span> (10) for finite-sample coverage guarantees.'
def _b(t, bg, fg):
return f'<span style="display:inline-block;background:{bg};color:{fg};font-size:10px;font-weight:600;padding:2px 7px;border-radius:10px;margin:2px 2px 0 0;">{t}</span>'
ch = ""
for i, s in enumerate(steps):
a = s["active"]
cb = _TEAL if a else _WHITE
cbo = _TEAL if a else "#d1ecf1"
nb = _TEAL_MID if a else _TEAL_LIGHT
nf = _WHITE if a else _TEAL
tf = _WHITE if a else _TEXT_DARK
df = "#cce8ef" if a else _TEXT_MUTED
bb = "#0d5f75" if a else _TEAL_BEST
bf = "#b8e4ed" if a else _TEAL
yh = (
f'<span style="float:right;background:{_TEAL_MID};color:{_WHITE};font-size:10px;font-weight:700;padding:2px 10px;border-radius:12px;">★ you are here</span>'
if a
else ""
)
bdg = "".join(_b(t, bb, bf) for t in s["tags"])
ch += f'<div style="background:{cb};border:2px solid {cbo};border-radius:12px;padding:16px 20px;display:flex;align-items:flex-start;gap:16px;box-shadow:{"0 4px 14px rgba(16,117,145,.25)" if a else "0 1px 4px rgba(0,0,0,.06)"};"><div style="min-width:44px;height:44px;background:{nb};color:{nf};border-radius:50%;display:flex;align-items:center;justify-content:center;font-size:15px;font-weight:800;flex-shrink:0;">{s["num"]}</div><div style="flex:1;"><div style="font-size:15px;font-weight:700;color:{tf};margin-bottom:4px;">{s["title"]}{yh}</div><div style="font-size:12.5px;color:{df};margin-bottom:8px;line-height:1.5;">{s["desc"]}</div><div>{bdg}</div></div></div>'
if i < len(steps) - 1:
ch += f'<div style="display:flex;justify-content:center;height:32px;"><svg width="24" height="32" viewBox="0 0 24 32" fill="none"><line x1="12" y1="0" x2="12" y2="24" stroke="{_TEAL_MID}" stroke-width="2" stroke-dasharray="4 3"/><polygon points="6,20 18,20 12,30" fill="{_TEAL_MID}"/></svg></div>'
HTML(
f'<div style="font-family:Inter,\'Segoe UI\',sans-serif;max-width:640px;margin:8px 0;"><div style="background:linear-gradient(135deg,{_TEAL} 0%,{_TEAL_MID} 100%);border-radius:12px 12px 0 0;padding:14px 20px;display:flex;align-items:center;gap:10px;"><svg width="22" height="22" viewBox="0 0 24 24" fill="none" stroke="{_WHITE}" stroke-width="2"><path d="M12 2L2 7l10 5 10-5-10-5z"/><path d="M2 17l10 5 10-5"/><path d="M2 12l10 5 10-5"/></svg><span style="color:{_WHITE};font-size:14px;font-weight:700;">Twiga Learning Path — {track_name}</span></div><div style="border:2px solid {_TEAL_LIGHT};border-top:none;border-radius:0 0 12px 12px;padding:20px 20px 16px;background:#f9fdfe;display:flex;flex-direction:column;">{ch}<div style="margin-top:16px;font-size:11.5px;color:{_TEXT_MUTED};text-align:center;border-top:1px solid {_TEAL_LIGHT};padding-top:12px;">{footer}</div></div></div>'
)