Ensemble Strategies#

What you’ll build

A 6-model ensemble (LinearReg, LightGBM, XGBoost, CatBoost, MLPF, MLPGAM) evaluated with mean, median, and inverse-error weighted-mean strategies, compared on MAE and RMSE to identify when ensembling beats individual models.

Prerequisites

  • 05 - ML Point Forecasting (multi-model setup)

  • 07 - Neural Networks (MLPF, NHiTS)

  • 11 - Hyperparameter Tuning (optional, for tuned component models)

  • Python: basic statistics

Learning objectives

By the end of this notebook you will be able to:

  1. Explain bias-variance decomposition and why diverse models ensemble better than similar ones

  2. Configure a multi-model TwigaForecaster with ML and NN components together

  3. Apply mean, median, and weighted-mean ensemble strategies via ensemble_strategy

  4. Derive the inverse-error weighting formula and implement it manually

  5. Identify when an ensemble adds value and when it is not worth the inference cost

1. Motivation#

A single model can fail for many reasons: a poorly tuned learning rate, a bad random seed, a blind spot in its inductive bias. When several models make uncorrelated errors, averaging them cancels noise and reduces variance - a classic result from the bias-variance decomposition.

The key condition is diversity: if two models fail on exactly the same days, averaging does nothing. When a tree-based model and a neural network fail on different days, their average is better than either alone.

Rule of thumb: if the pairwise correlation of individual model errors is below ~0.9, an ensemble will likely improve over the best single model.

Key concept - ensemble learning

The bias-variance decomposition tells us that total prediction error = bias² + variance + irreducible noise. A single high-capacity model often has low bias but high variance (it overfits specific training patterns). Averaging multiple models keeps bias low while reducing variance - provided the models make different errors.
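
This trade-off can be quantified. Assuming M models whose errors each have variance σ² and an average pairwise correlation ρ, the variance of their equal-weight average is:

Var(ensemble) = ρ·σ² + (1 − ρ)·σ²/M

At ρ = 1 (identical errors) averaging gains nothing; at ρ = 0 the variance shrinks by a factor of M. The correlated part ρ·σ² is a floor that adding more models cannot remove - diversity, not quantity, lowers it.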

This is the “wisdom of crowds” effect: a crowd of diverse, independently reasoning individuals is collectively more accurate than any single expert, even if that expert is slightly better on average. In machine learning terms:

  • Similar models (same family, same features) share correlated errors → averaging them reduces variance only marginally.

  • Diverse models (different families, different inductive biases) have lower error correlation → averaging them achieves larger variance reduction.

Mixing tree-based models (which partition feature space with axis-aligned splits) with neural networks (which learn smooth, distributed representations) is one of the most reliable diversity strategies available.
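
As a quick sanity check (a self-contained NumPy sketch, separate from the Twiga pipeline), we can simulate six synthetic models at two error-correlation levels and compare a single model's error variance with that of the equal-weight average:

import numpy as np

rng = np.random.default_rng(42)
n_models, n_obs, sigma = 6, 50_000, 1.0

for rho in (0.2, 0.95):
    # Covariance matrix: unit variance, constant pairwise correlation rho
    cov = np.full((n_models, n_models), rho * sigma**2)
    np.fill_diagonal(cov, sigma**2)
    errors = rng.multivariate_normal(np.zeros(n_models), cov, size=n_obs)

    single = errors.var(axis=0).mean()    # average error variance of one model
    ensemble = errors.mean(axis=1).var()  # variance of the equal-weight average
    theory = rho * sigma**2 + (1 - rho) * sigma**2 / n_models
    print(f"rho={rho:.2f}  single={single:.3f}  ensemble={ensemble:.3f}  theory={theory:.3f}")

At ρ = 0.95 the ensemble barely improves on a single model - the numerical counterpart of the ~0.9 rule of thumb above.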

2. Setup#

import warnings

from great_tables import GT
from lets_plot import LetsPlot
import pandas as pd

LetsPlot.setup_html()

from twiga.core.plot import (
    plot_forecast_grid,
    plot_metrics_bar,
)
from twiga.core.utils import configure, get_logger

warnings.filterwarnings("ignore")

configure()
log = get_logger("tutorials")

Load data#

The dataset covers Madeira, Portugal (32.37°N, 16.27°W) at 30-minute resolution. We load only the columns we need.

data = pd.read_parquet("../data/MLVS-PT.parquet")
data = data[["timestamp", "NetLoad(kW)", "Ghi", "Temperature"]]
data["timestamp"] = pd.to_datetime(data["timestamp"])
data = data.drop_duplicates(subset="timestamp").reset_index(drop=True)
# Restrict to 2019-2020 to keep tutorial execution fast
data = data[(data["timestamp"] >= "2019-01-01") & (data["timestamp"] < "2021-01-01")].reset_index(drop=True)

log.info("Shape: %s", data.shape)
GT(data.head())

Train / val / test splits#

from great_tables import GT, md
import pandas as pd

from twiga.core.plot.gt import twiga_gt

splits_df = pd.DataFrame(
    {
        "Split": ["Train", "Validation", "Test"],
        "Period": ["before 2020-01-01", "2020-01-01 – 2020-06-30", "2020-07-01 onwards"],
        "Purpose": [
            "Model learning across CV folds",
            "Weighted ensemble weight calibration",
            "Final honest evaluation",
        ],
    }
)

twiga_gt(
    GT(splits_df)
    .tab_header(
        title=md("**Dataset Splits**"),
        subtitle="Chronological — no shuffling, no overlap",
    )
    .cols_label(**{c: md(f"**{c}**") for c in splits_df.columns})
    .tab_source_note("MLVS-PT dataset · Madeira, Portugal · 30-min resolution"),
    n_rows=len(splits_df),
)
train_df = data[data["timestamp"] < "2020-01-01"].reset_index(drop=True)
val_df = data[(data["timestamp"] >= "2020-01-01") & (data["timestamp"] < "2020-07-01")].reset_index(drop=True)
test_df = data[data["timestamp"] >= "2020-07-01"].reset_index(drop=True)

log.info(
    f"train : {train_df.shape[0]:,} rows  ({train_df['timestamp'].min().date()} – {train_df['timestamp'].max().date()})"
)
log.info(f"val   : {val_df.shape[0]:,} rows  ({val_df['timestamp'].min().date()} – {val_df['timestamp'].max().date()})")
log.info(
    f"test  : {test_df.shape[0]:,} rows  ({test_df['timestamp'].min().date()} – {test_df['timestamp'].max().date()})"
)

Shared data config#

All models share the same DataPipelineConfig - identical target, resolution, location, features, and horizon - ensuring a fair comparison.

from sklearn.preprocessing import RobustScaler, StandardScaler

from twiga.core.config import DataPipelineConfig, ForecasterConfig

data_config = DataPipelineConfig(
    target_feature="NetLoad(kW)",
    period="30min",
    latitude=32.371666,
    longitude=-16.274998,
    calendar_features=["hour", "day_night"],
    exogenous_features=["Ghi"],
    forecast_horizon=48,
    lookback_window_size=96,
    input_scaler=StandardScaler(),
    target_scaler=RobustScaler(),
)
train_config = ForecasterConfig(split_freq="months", train_size=3, test_size=1, num_splits=2)

data_config

3. Building a multi-model forecaster#

Passing multiple model configs to TwigaForecaster trains all of them in a single fit() call. Here we combine four ML models (a linear baseline plus three gradient-boosted tree models) with two NN models.

Why mix ML and NN? Tree models (LightGBM, XGBoost, CatBoost) are strong on tabular feature interactions; NN models (MLPF, MLPGAM) learn sequential patterns and temporal structure. Their error patterns tend to be complementary - ideal for ensembling.

from IPython.display import clear_output

from twiga import TwigaForecaster
from twiga.models.ml import CATBOOSTConfig, LIGHTGBMConfig, LINEAREGConfig, XGBOOSTConfig
from twiga.models.nn import MLPFConfig, MLPGAMConfig

mlpf_config = MLPFConfig.from_data_config(data_config)
mlpf_config.max_epochs = 2
mlpf_config.rich_progress_bar = False

mlpgam_config = MLPGAMConfig.from_data_config(data_config)
mlpgam_config.max_epochs = 2
mlpgam_config.rich_progress_bar = False

forecaster = TwigaForecaster(
    data_params=data_config,
    model_params=[
        LINEAREGConfig(),
        LIGHTGBMConfig(),
        XGBOOSTConfig(device="cpu"),
        CATBOOSTConfig(task_type="CPU"),
        mlpf_config,
        mlpgam_config,
    ],
    train_params=train_config,
)
forecaster.fit(train_df=train_df, val_df=val_df)
clear_output()
log.info("All models trained.")

4. Strategy 1: Mean#

The simplest strategy: average predictions from all models equally. Works well when models are roughly equally accurate and errors are independent.

pred_mean, metric_mean = forecaster.evaluate_point_forecast(test_df=test_df, ensemble_strategy="mean")
clear_output()

agg_mean = metric_mean.groupby("Model")[["mae", "rmse", "corr"]].mean().round(3)
log.info("Mean ensemble — metrics by model:")
log.info("\n%s", agg_mean.to_string())

5. Strategy 2: Median#

The median is robust to outlier models. If one model produces a wildly inaccurate forecast on a given day, the median ignores it - unlike the mean, which is pulled toward extreme values.

Use median when you suspect that one or more models may occasionally produce large errors. Use mean when all models are well-calibrated and you want to exploit all available information.
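
A toy example (plain NumPy, outside the forecaster) makes the contrast concrete. Suppose five models predict the same timestamp and one misfires:

import numpy as np

preds = np.array([102.0, 98.5, 101.2, 99.8, 180.0])  # last model misfires
actual = 100.0

print(f"mean   : {preds.mean():6.1f}  (abs error {abs(preds.mean() - actual):4.1f})")
print(f"median : {np.median(preds):6.1f}  (abs error {abs(np.median(preds) - actual):4.1f})")

The outlier pulls the mean 16.3 units away from the actual value; the median moves by just 1.2.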

pred_median, metric_median = forecaster.evaluate_point_forecast(test_df=test_df, ensemble_strategy="median")
clear_output()

agg_median = metric_median.groupby("Model")[["mae", "rmse", "corr"]].mean().round(3)
log.info("Median ensemble — metrics by model:")
log.info("\n%s", agg_median.to_string())

6. Strategy 3: Weighted#

In the weighted ensemble, each model’s contribution is scaled by its validation performance. Models that achieved lower validation error receive a higher weight - in effect, the ensemble automatically up-weights the best models.

Weights are computed internally from the validation data passed to fit() and are inversely proportional to each model’s validation MAE. You do not need to specify weights manually.

Key concept - weighted ensemble

In the weighted strategy, each model’s contribution is scaled inversely to its validation MAE: a model with half the MAE of another gets twice the weight. Concretely, if model i has validation MAE = eᵢ, its weight is:

wᵢ = (1 / eᵢ) / Σⱼ (1 / eⱼ)

When to prefer weighted over simple mean:

  • When models differ significantly in quality (one model is clearly better than others)

  • When you want the ensemble to adapt automatically without manual tuning

When to prefer mean:

  • When all models have similar validation errors - weighting adds noise without benefit

  • When the validation set is small and MAE estimates are noisy (weighting amplifies noise)

Important caveat: the weights are derived from the validation set passed to fit(). If the validation period is not representative of the test period, the weights may be miscalibrated. Always verify that the weighted ensemble outperforms the mean on a hold-out test set before committing to it in production.
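
To make the formula concrete, here is a minimal manual implementation. The validation MAEs below are illustrative placeholders, not values produced by this notebook; the library computes the equivalent weights internally when you pass ensemble_strategy="weighted".

# Illustrative validation MAEs per model (hypothetical numbers, for demonstration only)
val_mae = {
    "LINEAREG": 42.0,
    "LIGHTGBM": 25.0,
    "XGBOOST": 26.5,
    "CATBOOST": 24.8,
    "MLPF": 31.0,
    "MLPGAM": 33.5,
}

inv = {m: 1.0 / e for m, e in val_mae.items()}
total = sum(inv.values())
weights = {m: v / total for m, v in inv.items()}  # w_i = (1/e_i) / sum_j (1/e_j)

for m, w in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{m:<9} weight = {w:.3f}")

# Given a dict of per-model forecast arrays `preds`, the weighted ensemble would be:
#   ensemble = sum(weights[m] * preds[m] for m in weights)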

pred_weighted, metric_weighted = forecaster.evaluate_point_forecast(test_df=test_df, ensemble_strategy="weighted")
clear_output()

agg_weighted = metric_weighted.groupby("Model")[["mae", "rmse", "corr"]].mean().round(3)
log.info("Weighted ensemble — metrics by model:")
log.info("\n%s", agg_weighted.to_string())

7. Per-model breakdown#

The metrics DataFrame contains a row for every model - including the ENSEMBLE row. Let’s isolate individual model performance and compare it directly to the ensemble.

# Find the ensemble label (it is stored as the uppercased string "ENSEMBLE")
log.info("Unique Model labels in metric_mean: %s", metric_mean["Model"].unique().tolist())
ensemble_label = "ENSEMBLE"

individual = (
    metric_mean[metric_mean["Model"] != ensemble_label].groupby("Model")[["mae", "rmse", "corr"]].mean().round(3)
)
ensemble_row = metric_mean[metric_mean["Model"] == ensemble_label][["mae", "rmse", "corr"]].mean().round(3)

log.info("Individual models (mean across folds):")
log.info("\n%s", individual.to_string())
log.info("\nEnsemble — Mean strategy:")
log.info("\n%s", ensemble_row.to_frame().T.to_string(index=False))

Bar chart: individual models vs ensemble#

bar_data = individual["mae"].copy()
bar_data[ensemble_label] = ensemble_row["mae"]
bar_df = bar_data.sort_values().reset_index()
bar_df.columns = ["Model", "MAE"]

p = plot_metrics_bar(
    bar_df,
    metric_col="MAE",
    model_col="Model",
    lower_is_better=True,
    title="Individual models vs Ensemble (Mean strategy) — MAE (lower is better)",
    x_label="MAE (mean across folds)",
    horizontal=True,
)
p

8. Strategy comparison table#

Aggregate the ensemble row from each strategy into a single comparison table.

from twiga.core.plot.gt import twiga_report


def ensemble_summary(metric_df, strategy_label):
    row = metric_df[metric_df["Model"] == ensemble_label][["mae", "rmse", "corr", "wmape", "smape"]].mean().round(3)
    return {
        "Strategy": strategy_label,
        "MAE": row["mae"],
        "RMSE": row["rmse"],
        "Corr": row["corr"],
        "WMAPE": row["wmape"],
        "SMAPE": row["smape"],
    }


comparison = pd.DataFrame(
    [
        ensemble_summary(metric_mean, "Mean"),
        ensemble_summary(metric_median, "Median"),
        ensemble_summary(metric_weighted, "Weighted"),
    ]
)

log.info("Ensemble strategy comparison:")
log.info("\n%s", comparison.to_string(index=False))
twiga_report(
    comparison.rename(columns={"Strategy": "Model"}),
    ["MAE", "RMSE", "Corr", "SMAPE"],
    minimize_cols=["MAE", "RMSE", "SMAPE"],
    maximize_cols=["Corr"],
)

9. Forecast plot: ensemble vs best individual model#

Visual comparison over the first 7 days of the test set.

n_steps = 7 * 48  # 7 days at 30-min resolution

# Best individual model by MAE
best_model = individual["mae"].idxmin()

df_ensemble = pred_mean[pred_mean["Model"] == ensemble_label].iloc[:n_steps].copy()
df_ensemble["Model"] = "ENSEMBLE"
df_best = pred_mean[pred_mean["Model"] == best_model].iloc[:n_steps].copy()
df_best["Model"] = f"Best: {best_model}"

combined_plot = pd.concat([df_ensemble, df_best], ignore_index=True)

p = plot_forecast_grid(
    combined_plot,
    actual_col="Actual",
    forecast_col="forecast",
    model_col="Model",
    n_samples_per_model=n_steps,
    title="Ensemble vs best individual model — first 7 days of test set",
    y_label="Net Load (kW)",
)
p

10. When ensembles help vs don’t#

Ensembles help when:

  • Models have diverse error patterns (e.g. ML + NN combined)

  • Individual models are accurate but not on the same days

  • You want robustness without hyperparameter tuning a single model

Ensembles don’t help much when:

  • All models overfit in the same way (same family, same features)

  • All models are underfitting - averaging several underfit models is still underfit, because averaging reduces variance, not bias

  • Models make near-identical errors (pairwise error correlation > 0.9)

Quick error correlation check#

Pivot per-model errors and compute pairwise correlation. High correlation (> 0.9) between two models means combining them adds little.

# Compute per-model absolute errors relative to actuals
indiv_preds = pred_mean[pred_mean["Model"] != ensemble_label].copy()
indiv_preds["error"] = (indiv_preds["forecast"] - indiv_preds["Actual"]).abs()

# Reset index to get timestamp as column, then pivot
indiv_reset = indiv_preds.reset_index()
ts_col = indiv_reset.columns[0]  # first column is the timestamp index
pivot = indiv_reset.pivot_table(index=ts_col, columns="Model", values="error")

log.info("Pairwise error correlation (individual models):")
log.info("\n%s", pivot.corr().round(2).to_string())

Error correlation heatmap#

from twiga.core.plot import correlation_heatmap

corr_matrix = pivot.corr()

p = correlation_heatmap(
    corr_matrix,
    title="Pairwise absolute error correlation\n(lower = more diverse = more ensemble benefit)",
)
p

11. Mixing ML and NN#

Tree-based ML models and NN models have complementary inductive biases - a key reason this combination produces lower-correlated errors than within-family ensembles.

from great_tables import GT, md
import pandas as pd

from twiga.core.plot.gt import twiga_gt

model_family_df = pd.DataFrame(
    {
        "Model family": ["LightGBM / XGBoost / CatBoost", "MLPF / MLPGAM", "Linear Regression"],
        "Inductive bias": [
            "Axis-aligned tree splits, tabular feature interactions",
            "Smooth distributed representations, sequential temporal patterns",
            "Global linear relationship, interpretable coefficients",
        ],
        "Strength": [
            "Non-monotonic splits, fast training, handles missing values natively",
            "Learns temporal structure and latent embeddings across horizons",
            "Fast, interpretable baseline; regularises ensemble toward linear fit",
        ],
        "Diversity vs. others": ["High (vs NN)", "High (vs ML)", "Medium"],
    }
)

twiga_gt(
    GT(model_family_df)
    .tab_header(
        title=md("**ML vs NN — Inductive Bias & Ensemble Diversity**"),
        subtitle="Different learning mechanisms → lower error correlation → more ensemble benefit",
    )
    .cols_label(**{c: md(f"**{c}**") for c in model_family_df.columns})
    .tab_source_note("With max_epochs=2, NN models will underfit; increase to 50–100 for production"),
    n_rows=len(model_family_df),
)

12. Strategy comparison: twiga_gt summary#

from great_tables import GT, md
import pandas as pd

from twiga.core.plot.gt import twiga_gt

strategy_guide_df = pd.DataFrame(
    {
        "Strategy": ["`mean`", "`median`", "`weighted`"],
        "Behaviour": [
            "Equal-weight average of all model predictions",
            "Median across models — robust to outlier predictions",
            "Validation-MAE-proportional weights; better models contribute more",
        ],
        "Use when": [
            "All models are well-calibrated and errors are roughly independent",
            "One or more models may occasionally spike — robustness is more important than optimality",
            "Models differ significantly in quality and validation set is representative",
        ],
        "Pitfall": [
            "One bad model drags the average down",
            "Ignores all available quality information",
            "Miscalibrated if validation period is unrepresentative of test",
        ],
    }
)

twiga_gt(
    GT(strategy_guide_df)
    .tab_header(
        title=md("**Ensemble Strategy Selection Guide**"),
        subtitle="Choose based on model quality spread and robustness requirements",
    )
    .cols_label(**{c: md(f"**{c}**") for c in strategy_guide_df.columns})
    .tab_source_note("Twiga Forecast · evaluate_point_forecast(ensemble_strategy=...)"),
    n_rows=len(strategy_guide_df),
)

Wrapping up#

What you did

  • Registered 6 models (LinearReg, LightGBM, XGBoost, CatBoost, MLPF, MLPGAM) in one TwigaForecaster

  • Applied mean, median, and weighted ensemble strategies in a single evaluate_point_forecast() call

  • Compared ensemble vs. best individual model with a styled metric table and bar chart

  • Visualised forecast plots to see where the ensemble outperforms individual models

  • Diagnosed ensemble benefit by computing pairwise error correlations and a heatmap

  • Reviewed inductive-bias differences between ML and NN model families

Key takeaways

  1. Diversity beats quantity - two uncorrelated models ensemble better than five correlated ones. Check pairwise error correlation before adding more models.

  2. Mixing ML and NN is the most reliable diversity strategy: tree splits vs. smooth learned representations produce complementary errors.

  3. Mean is the safest default; switch to weighted only when models differ substantially in quality and your validation set is representative.

  4. Median is preferable when you suspect occasional catastrophic failures in one model - it ignores extreme values.

  5. If pairwise error correlation is > 0.9 for all pairs, tuning a single model (NB11) will outperform any ensemble strategy.


What’s next?#

Notebook 13 shows how to register a completely custom model class - including any sklearn-compatible estimator - in the Twiga model registry, giving you full control over the model architecture while keeping the rest of the pipeline unchanged.

# ruff: noqa: E501, E701, E702
from IPython.display import HTML

_TEAL = "#107591"
_TEAL_MID = "#069fac"
_TEAL_LIGHT = "#e8f5f8"
_TEAL_BEST = "#d0ecf1"
_TEXT_DARK = "#2d3748"
_TEXT_MUTED = "#718096"
_WHITE = "#ffffff"

steps = [
    {
        "num": "06",
        "title": "Backtesting & Evaluation",
        "desc": "Rolling-window backtesting · fold-level metrics",
        "tags": ["backtesting", "evaluation"],
        "active": False,
    },
    {
        "num": "10",
        "title": "Conformal Prediction",
        "desc": "CQR · CRC — coverage-guaranteed intervals",
        "tags": ["conformal", "CQR"],
        "active": False,
    },
    {
        "num": "11",
        "title": "Hyperparameter Tuning",
        "desc": "Optuna TPE · search spaces · resumable SQLite",
        "tags": ["optuna", "HPO", "tuning"],
        "active": False,
    },
    {
        "num": "12",
        "title": "Ensemble Strategies",
        "desc": "Mean · median · inverse-error weighted — reduce forecast variance",
        "tags": ["ensemble", "weighted", "aggregation"],
        "active": True,
    },
    {
        "num": "13",
        "title": "Custom Models",
        "desc": "Register your own model class in the Twiga registry",
        "tags": ["custom", "registry", "sklearn"],
        "active": False,
    },
]
track_name = "Advanced Track"
footer = 'Next: extend the library further with <span style="color:#107591;font-weight:600;">Custom Models</span> (13) or add explainability with <span style="color:#107591;font-weight:600;">SHAP Attribution</span> (16).'


def _b(t, bg, fg):
    return f'<span style="display:inline-block;background:{bg};color:{fg};font-size:10px;font-weight:600;padding:2px 7px;border-radius:10px;margin:2px 2px 0 0;">{t}</span>'


ch = ""
for i, s in enumerate(steps):
    a = s["active"]
    cb = _TEAL if a else _WHITE
    cbo = _TEAL if a else "#d1ecf1"
    nb = _TEAL_MID if a else _TEAL_LIGHT
    nf = _WHITE if a else _TEAL
    tf = _WHITE if a else _TEXT_DARK
    df = "#cce8ef" if a else _TEXT_MUTED
    bb = "#0d5f75" if a else _TEAL_BEST
    bf = "#b8e4ed" if a else _TEAL
    yh = (
        f'<span style="float:right;background:{_TEAL_MID};color:{_WHITE};font-size:10px;font-weight:700;padding:2px 10px;border-radius:12px;">★ you are here</span>'
        if a
        else ""
    )
    bdg = "".join(_b(t, bb, bf) for t in s["tags"])
    ch += f'<div style="background:{cb};border:2px solid {cbo};border-radius:12px;padding:16px 20px;display:flex;align-items:flex-start;gap:16px;box-shadow:{"0 4px 14px rgba(16,117,145,.25)" if a else "0 1px 4px rgba(0,0,0,.06)"};"><div style="min-width:44px;height:44px;background:{nb};color:{nf};border-radius:50%;display:flex;align-items:center;justify-content:center;font-size:15px;font-weight:800;flex-shrink:0;">{s["num"]}</div><div style="flex:1;"><div style="font-size:15px;font-weight:700;color:{tf};margin-bottom:4px;">{s["title"]}{yh}</div><div style="font-size:12.5px;color:{df};margin-bottom:8px;line-height:1.5;">{s["desc"]}</div><div>{bdg}</div></div></div>'
    if i < len(steps) - 1:
        ch += f'<div style="display:flex;justify-content:center;height:32px;"><svg width="24" height="32" viewBox="0 0 24 32" fill="none"><line x1="12" y1="0" x2="12" y2="24" stroke="{_TEAL_MID}" stroke-width="2" stroke-dasharray="4 3"/><polygon points="6,20 18,20 12,30" fill="{_TEAL_MID}"/></svg></div>'

HTML(
    f'<div style="font-family:Inter,\'Segoe UI\',sans-serif;max-width:640px;margin:8px 0;"><div style="background:linear-gradient(135deg,{_TEAL} 0%,{_TEAL_MID} 100%);border-radius:12px 12px 0 0;padding:14px 20px;display:flex;align-items:center;gap:10px;"><svg width="22" height="22" viewBox="0 0 24 24" fill="none" stroke="{_WHITE}" stroke-width="2"><path d="M12 2L2 7l10 5 10-5-10-5z"/><path d="M2 17l10 5 10-5"/><path d="M2 12l10 5 10-5"/></svg><span style="color:{_WHITE};font-size:14px;font-weight:700;">Twiga Learning Path — {track_name}</span></div><div style="border:2px solid {_TEAL_LIGHT};border-top:none;border-radius:0 0 12px 12px;padding:20px 20px 16px;background:#f9fdfe;display:flex;flex-direction:column;">{ch}<div style="margin-top:16px;font-size:11.5px;color:{_TEXT_MUTED};text-align:center;border-top:1px solid {_TEAL_LIGHT};padding-top:12px;">{footer}</div></div></div>'
)