Neural Network Models#



What you’ll build

Five neural network forecasters - an MLP-fusion model (MLPF), its MLPGAM and MLPGAF variants, an RNN, and N-HiTS - trained with PyTorch Lightning on the MLVS-PT net-load dataset, benchmarked against the LightGBM baseline from NB05, and compared on MAE and training speed.

Prerequisites

  • 01 - Getting Started (DataPipelineConfig, ForecasterConfig, TwigaForecaster.fit)

  • 03 - Feature Engineering (understanding the (B, L, F) tensor)

  • 05 - ML Point Forecasting (ML baseline to beat)

  • 06 - Backtesting & Evaluation (metric interpretation)

  • Python: basic PyTorch awareness (you will not need to write any PyTorch yourself)

Learning objectives

By the end of this notebook you will be able to:

  1. Explain when neural networks outperform gradient-boosted trees and when they do not

  2. Configure MLPFConfig and NHiTSConfig including embedding types and sequence dimensions

  3. Train a neural network forecaster with PyTorch Lightning using early stopping

  4. Compare neural and ML model metrics fairly using a shared DataPipelineConfig

  5. Interpret training curves and understand what 5-epoch results mean vs. fully converged results

Key concept - why neural networks?

Gradient boosting (LightGBM, XGBoost, CatBoost) is hard to beat on tabular data with hand-crafted features. Neural networks earn their keep when:

  • The dataset is large - NNs scale better with data volume; tree models plateau.

  • Raw sequences matter - NNs can learn from the full look-back window without manual lag selection.

  • You need probabilistic outputs - distribution heads (NB08 - 09) attach naturally to NN backbones.

  • Transfer learning is on the table - pretrained NN weights can be fine-tuned on new sites.

With only max_epochs=5 in this tutorial, LightGBM will likely win. That is intentional - it illustrates the training-budget trade-off. With ≥ 50 epochs and a proper learning-rate schedule the NN models typically match or surpass tree-based models on this dataset.

1. Setup#

import os
import warnings

from great_tables import GT, md
from IPython.display import clear_output
from lets_plot import LetsPlot
import pandas as pd

LetsPlot.setup_html()

from twiga.core.plot import (
    plot_forecast,
    plot_forecast_grid,
    plot_metrics_bar,
)
from twiga.core.plot.gt import twiga_gt, twiga_report
from twiga.core.utils import configure, get_logger

warnings.filterwarnings("ignore")

configure()
log = get_logger("tutorials")

Load data#

The dataset covers Madeira, Portugal (32.37°N, 16.27°W) at 30-minute resolution. We only load the columns we need.

data = pd.read_parquet("../data/MLVS-PT.parquet")
data = data[["timestamp", "NetLoad(kW)", "Ghi", "Temperature"]]
data["timestamp"] = pd.to_datetime(data["timestamp"])
data = data.drop_duplicates(subset="timestamp").reset_index(drop=True)
# Restrict to 2019-2020 to keep tutorial execution fast
data = data[(data["timestamp"] >= "2019-01-01") & (data["timestamp"] < "2021-01-01")].reset_index(drop=True)

log.info("Shape: %s", data.shape)
twiga_gt(GT(data.head().round(2)))

Train / val / test splits#

splits_df = pd.DataFrame(
    {
        "Split": ["Train", "Validation", "Test"],
        "Period": ["before 2020-01-01", "2020-01-01-2020-06-30", "2020-07-01 onwards"],
        "Purpose": ["Model learning", "Early-stopping / overfitting guard", "Final honest evaluation"],
    }
)

twiga_gt(
    GT(splits_df)
    .tab_header(title=md("**Data Splits**"), subtitle="Chronological - no shuffling, no overlap")
    .cols_label(**{c: md(f"**{c}**") for c in splits_df.columns})
    .tab_source_note("Twiga Forecast"),
    n_rows=len(splits_df),
)
train_df = data[data["timestamp"] < "2020-01-01"].reset_index(drop=True)
val_df = data[(data["timestamp"] >= "2020-01-01") & (data["timestamp"] < "2020-07-01")].reset_index(drop=True)
test_df = data[data["timestamp"] >= "2020-07-01"].reset_index(drop=True)

log.info(
    f"train : {train_df.shape[0]:,} rows  ({train_df['timestamp'].min().date()} → {train_df['timestamp'].max().date()})"
)
log.info(f"val   : {val_df.shape[0]:,} rows  ({val_df['timestamp'].min().date()} → {val_df['timestamp'].max().date()})")
log.info(
    f"test  : {test_df.shape[0]:,} rows  ({test_df['timestamp'].min().date()} → {test_df['timestamp'].max().date()})"
)

2. Data config#

DataPipelineConfig is identical to previous notebooks - same target, resolution, location, features, and horizon. This ensures that all models are evaluated on the exact same problem setup.

from sklearn.preprocessing import RobustScaler, StandardScaler

from twiga.core.config import DataPipelineConfig, ForecasterConfig

data_config = DataPipelineConfig(
    target_feature="NetLoad(kW)",
    period="30min",
    latitude=32.371666,
    longitude=-16.274998,
    calendar_features=["hour", "day_night"],
    exogenous_features=["Ghi"],
    forecast_horizon=48,
    stride=48,
    lookback_window_size=48,
    input_scaler=StandardScaler(),
    target_scaler=RobustScaler(),
)

train_config = ForecasterConfig(project_name="neural-network-tutorial")

data_config

3. NN Config Dimensions: Auto-populated#

ML model configs (e.g. LIGHTGBMConfig) need no knowledge of input shape - the library infers it at training time.

NN configs are different. A neural network must know:

  • num_target_feature: number of target variables to forecast

  • forecast_horizon: how many steps ahead to predict

  • lookback_window_size: length of the input sequence

  • num_historical_features, num_calendar_features, num_exogenous_features, num_future_covariates

In the current API, all of these default to 0 and are auto-populated by TwigaForecaster from DataPipelineConfig.
You simply construct the config with any training hyperparameters you want to override:

from twiga.models.nn import MLPFConfig

# Dims are filled automatically - just set training knobs
mlpf_config = MLPFConfig(max_epochs=5, rich_progress_bar=False)

The legacy MLPFConfig.from_data_config(data_config) class method still works and is useful when you need a standalone config object outside a TwigaForecaster (e.g., inspection or debugging).
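If you want to see what gets filled in, here is a quick sketch (attribute names taken from the bullet list above; treat the exact fields as illustrative):

# Illustrative only: build a standalone config from the data config and
# inspect the auto-populated dimensions.
standalone = MLPFConfig.from_data_config(data_config)
log.info("forecast_horizon      : %s", standalone.forecast_horizon)
log.info("lookback_window_size  : %s", standalone.lookback_window_size)
log.info("num_exogenous_features: %s", standalone.num_exogenous_features)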

Key concept - sequence embedding and the (B, L, F) tensor

Every Twiga NN model receives inputs as a 3-D tensor of shape (B, L, F):

| Axis | Meaning | Example (this notebook) |
|------|---------|--------------------------|
| B | Batch size - number of windows processed simultaneously | 32 windows |
| L | Lookback length - time steps in the input sequence | 48 steps = 24 h |
| F | Feature count - target + calendar + exogenous features per step | ~5 features |

Before entering the MLP encoder, each time step can optionally be projected into a richer latent space via a value embedding (e.g. LinearEmb applies a shared linear layer across all L steps), and positional information is injected via a positional embedding (e.g. LearnPosEmb adds a trained vector to each position). Section 9 of this notebook shows how to toggle these with config knobs.
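The following standalone PyTorch sketch (not Twiga's internal code) mimics those two steps so you can see the shapes involved:

import torch
import torch.nn as nn

# Illustration only: a LinearEmb-style value embedding shared across all L
# steps, plus a LearnPosEmb-style trainable vector added to each position.
B, L, F, D = 32, 48, 5, 64                    # batch, lookback, features, latent dim
x = torch.randn(B, L, F)                      # one batch of lookback windows
value_emb = nn.Linear(F, D)                   # applied independently per time step
pos_emb = nn.Parameter(torch.zeros(1, L, D))  # one trainable vector per position
z = value_emb(x) + pos_emb                    # (B, L, D), ready for the MLP encoder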

4. MLPF: Plain MLP backbone#

MLPF (MLP Fusion) is the baseline neural architecture. It encodes past and future covariates with separate MLP branches and fuses them via attention, weighted sum, or addition before predicting the horizon.

Key config knob: combination_type ("attn-comb" | "weighted-comb" | "addition-comb")
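As a quick illustration, you could select the fusion strategy when building the config (passing combination_type as a constructor kwarg is an assumption based on the other knobs shown in this notebook):

from twiga.models.nn import MLPFConfig

# Illustrative: pick one of the fusion strategies named above.
mlpf_attn_config = MLPFConfig(max_epochs=5, rich_progress_bar=False, combination_type="attn-comb")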

Key concept - Lightning training loop

Twiga NN models are trained with PyTorch Lightning, which manages the boilerplate (device placement, gradient steps, logging) so you only set high-level knobs (a standalone Lightning sketch follows this list):

  • max_epochs - how many full passes over the training set. More epochs = more learning time, but also more risk of overfitting. In production, 50 - 200 epochs is typical.

  • Early stopping - Lightning monitors the validation loss after each epoch. If it does not improve for patience consecutive epochs the run terminates early, saving the best checkpoint automatically.

  • Checkpoint - the weights at the epoch with the lowest validation loss are saved to checkpoints/<project_name>/<model>/best_*.ckpt. You can reload them at any time (Section 10).

  • rich_progress_bar=False - we disable the progress bar here to keep notebook output clean. Set it to True to watch loss values epoch-by-epoch during development.
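Twiga wires all of this up internally. For orientation, a standalone Lightning equivalent looks roughly like the following; the monitored metric name "val_loss" and the patience value are assumptions here, not Twiga's actual settings:

import pytorch_lightning as pl
from pytorch_lightning.callbacks import EarlyStopping, ModelCheckpoint

# Roughly what Twiga configures for you: stop when validation loss stalls,
# keep the single best checkpoint.
callbacks = [
    EarlyStopping(monitor="val_loss", patience=5, mode="min"),
    ModelCheckpoint(monitor="val_loss", save_top_k=1, mode="min"),
]
trainer = pl.Trainer(max_epochs=5, callbacks=callbacks, enable_progress_bar=False)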

from twiga import TwigaForecaster
from twiga.models.nn import MLPFConfig

mlpf_config = MLPFConfig(max_epochs=5, rich_progress_bar=False)

forecaster_mlpf = TwigaForecaster(
    data_params=data_config,
    model_params=[mlpf_config],
    train_params=train_config,
)
forecaster_mlpf.fit(train_df=train_df, val_df=val_df)
clear_output()
pred_mlpf, metric_mlpf = forecaster_mlpf.evaluate_point_forecast(test_df=test_df)
log.info("MLPF-mean metrics across folds:")
def get_metric_table(metric_df):
    res = metric_df.groupby("Model")[["mae", "corr", "nbias", "rmse", "wmape", "smape"]].mean().round(2).reset_index()
    res = res.rename(
        columns={"mae": "MAE", "corr": "Corr", "wmape": "WMAPE", "smape": "SMAPE", "nbias": "NBIAS", "rmse": "RMSE"}
    )

    metric_name = ["MAE", "Corr", "SMAPE", "RMSE"]
    minimize_cols = ["MAE", "SMAPE", "RMSE"]
    maximize_cols = ["Corr"]

    return twiga_report(res, metric_name, minimize_cols, maximize_cols)
get_metric_table(metric_mlpf)

Reading the MLPF metrics

With max_epochs=5, MLPF is severely under-trained. A Pearson correlation near 0 means the forecast is almost uncorrelated with the actuals - the model has not yet learned the daily cycle. This is expected at 5 epochs and will improve significantly with more training. Use it as a baseline lower bound, not a performance ceiling.

5. MLPGAM: MLP + Group Additive Model#

MLPGAM augments the MLP backbone with a Group Additive Model (GAM) branch. The GAM branch learns per-feature additive effects (similar in spirit to classical GAMs), which are then combined with the MLP’s global representation. This often improves interpretability and generalisation on structured tabular-temporal data.

The key addition is a Lasso penalty on the final projection weights, encouraging sparse feature selection.
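In isolation, a Lasso penalty looks like this minimal PyTorch sketch (the layer shapes and penalty strength are arbitrary illustrations, not Twiga's defaults):

import torch
import torch.nn as nn

# Minimal sketch of a Lasso (L1) penalty on a projection layer's weights.
proj = nn.Linear(64, 48)                    # stand-in for the final projection
pred = proj(torch.randn(8, 64))
mse = nn.functional.mse_loss(pred, torch.randn(8, 48))
lam = 1e-4                                  # penalty strength (arbitrary)
loss = mse + lam * proj.weight.abs().sum()  # L1 term pushes weights toward zero
loss.backward()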

from twiga.models.nn import MLPGAMConfig

mlpgam_config = MLPGAMConfig(max_epochs=5, rich_progress_bar=False)

forecaster_mlpgam = TwigaForecaster(
    data_params=data_config,
    model_params=[mlpgam_config],
    train_params=train_config,
)
forecaster_mlpgam.fit(train_df=train_df, val_df=val_df)
clear_output()
pred_mlpgam, metric_mlpgam = forecaster_mlpgam.evaluate_point_forecast(test_df=test_df)
clear_output()
get_metric_table(metric_mlpgam)

6. MLPGAF: MLP + Gated Attention Fusion#

MLPGAF replaces the simple combination step with a Gated Attention Fusion (GAF) mechanism. A learned gate decides, for each position and feature group, how much weight to give to each input stream. This can capture non-linear inter-feature interactions that the plain MLP fusion misses.
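Gating of this kind can be sketched in a few lines of plain PyTorch (purely illustrative, not Twiga's internal implementation):

import torch
import torch.nn as nn

# Illustrative gated fusion of two feature streams a and b: a learned sigmoid
# gate decides, per position and feature, how to mix them.
B, L, D = 32, 48, 64
a, b = torch.randn(B, L, D), torch.randn(B, L, D)
gate_layer = nn.Linear(2 * D, D)
gate = torch.sigmoid(gate_layer(torch.cat([a, b], dim=-1)))
fused = gate * a + (1 - gate) * b           # (B, L, D)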

from twiga.models.nn import MLPGAFConfig

mlpgaf_config = MLPGAFConfig(max_epochs=5, rich_progress_bar=False)

forecaster_mlpgaf = TwigaForecaster(
    data_params=data_config,
    model_params=[mlpgaf_config],
    train_params=train_config,
)
forecaster_mlpgaf.fit(train_df=train_df, val_df=val_df)
clear_output()
pred_mlpgaf, metric_mlpgaf = forecaster_mlpgaf.evaluate_point_forecast(test_df=test_df)
clear_output()
get_metric_table(metric_mlpgaf)

7. N-HiTS: Hierarchical interpolation#

N-HiTS (Neural Hierarchical Interpolation for Time Series) decomposes the forecast horizon into multiple scales using a stack of MLP blocks, each operating at a different temporal resolution. Long-range trends are captured by blocks with low sampling rates; short-range patterns by blocks with high sampling rates. The outputs are summed (residual connections) to produce the final forecast.
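The mechanism can be caricatured in a toy sketch (not the real N-HiTS implementation; a single linear layer stands in for each block's MLP):

import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy multi-rate decomposition: each "block" sees a downsampled view of the
# past, predicts a coarse forecast, and the coarse forecasts are interpolated
# back to the full horizon and summed.
L_in, H = 48, 48
x = torch.randn(1, 1, L_in)                        # (batch, channel, lookback)
forecast = torch.zeros(1, 1, H)
for pool in (1, 2, 4):                             # per-block sampling rates
    coarse_in = F.avg_pool1d(x, kernel_size=pool)  # low-resolution view
    block = nn.Linear(coarse_in.shape[-1], H // pool)  # stand-in for the block MLP
    forecast = forecast + F.interpolate(block(coarse_in), size=H, mode="linear")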

from twiga.models.nn import NHITSConfig

nhits_config = NHITSConfig(max_epochs=5, rich_progress_bar=False)

forecaster_nhits = TwigaForecaster(
    data_params=data_config,
    model_params=[nhits_config],
    train_params=train_config,
)
forecaster_nhits.fit(train_df=train_df, val_df=val_df)
clear_output()

pred_nhits, metric_nhits = forecaster_nhits.evaluate_point_forecast(test_df=test_df)
clear_output()

get_metric_table(metric_nhits)

Reading the N-HiTS metrics

N-HiTS typically converges faster than plain MLPF because its multi-scale decomposition provides an implicit curriculum: coarse blocks learn long-range trends early while fine-grained blocks refine short-term patterns. Even at 5 epochs you should see a non-trivial correlation and an MAE well below that of a naive mean forecast. MLPGAM and N-HiTS are the recommended starting architectures for new energy datasets.

8. RNN#

RNN is a recurrent baseline: rather than flattening the lookback window, it consumes the (B, L, F) tensor step by step, carrying a hidden state across time.
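To make that sequential consumption concrete, here is a standalone sketch with a GRU (illustrative only; the cell type and sizes are not necessarily what RNNConfig uses):

import torch
import torch.nn as nn

# A recurrent encoder consuming the (B, L, F) tensor one step at a time.
rnn = nn.GRU(input_size=5, hidden_size=64, batch_first=True)
out, h = rnn(torch.randn(32, 48, 5))  # out: (32, 48, 64), h: (1, 32, 64)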

from twiga.models.nn import RNNConfig

rnn_config = RNNConfig(max_epochs=5, rich_progress_bar=False)

forecaster_rnn = TwigaForecaster(
    data_params=data_config,
    model_params=[rnn_config],
    train_params=train_config,
)
forecaster_rnn.fit(train_df=train_df, val_df=val_df)
clear_output()

pred_rnn, metric_rnn = forecaster_rnn.evaluate_point_forecast(test_df=test_df)
clear_output()

get_metric_table(metric_rnn)

9. Embedding options#

All MLP-based configs expose two embedding knobs that control how raw input features are represented before entering the network. These are config-level settings - no architecture changes required.

from great_tables import GT, md
import pandas as pd

from twiga.core.plot.gt import twiga_gt

value_emb_df = pd.DataFrame(
    {
        "value_embed_type": ['"LinearEmb"', '"ConvEmb"', '"PatchEmb"', "None"],
        "What it does": [
            "Simple linear projection per time step",
            "1-D convolution — captures local temporal patterns",
            "Splits the sequence into non-overlapping patches (requires patch_len and stride)",
            "No value embedding; raw features passed directly",
        ],
    }
)

pos_emb_df = pd.DataFrame(
    {
        "embedding_type": ['"LearnPosEmb"', '"RotaryEmb"', '"TimeEmb"', "None"],
        "What it does": [
            "Learnable positional encoding (trained end-to-end)",
            "Rotary positional encoding (RoPE) — encodes relative positions",
            "Projects calendar/time features into embedding space",
            "No positional encoding",
        ],
    }
)

print("Value embedding (value_embed_type)")
display(
    twiga_gt(
        GT(value_emb_df)
        .tab_header(title=md("**Value Embedding**"), subtitle="How raw features are projected before the MLP encoder")
        .cols_label(**{c: md(f"**{c}**") for c in value_emb_df.columns})
        .tab_source_note("Twiga Forecast"),
        n_rows=len(value_emb_df),
    )
)

print("Positional embedding (embedding_type)")
display(
    twiga_gt(
        GT(pos_emb_df)
        .tab_header(
            title=md("**Positional Embedding**"), subtitle="How position information is injected into the sequence"
        )
        .cols_label(**{c: md(f"**{c}**") for c in pos_emb_df.columns})
        .tab_source_note("Twiga Forecast"),
        n_rows=len(pos_emb_df),
    )
)
from twiga.models.nn import MLPGAMConfig

mlpgam_config_emb = MLPGAMConfig(
    max_epochs=5,
    rich_progress_bar=False,
    value_embed_type="LinearEmb",
    embedding_type="LearnPosEmb",
)

log.info("value_embed_type : %s", mlpgam_config_emb.value_embed_type)
log.info("embedding_type   : %s", mlpgam_config_emb.embedding_type)
train_config.project_name = "MLPGAM_Linear_emb"
forecaster_mlpgam_emb = TwigaForecaster(
    data_params=data_config,
    model_params=[mlpgam_config_emb],
    train_params=train_config,
)
forecaster_mlpgam_emb.fit(train_df=train_df, val_df=val_df)
clear_output()

pred_mlpgam_emb, metric_mlpgam_emb = forecaster_mlpgam_emb.evaluate_point_forecast(test_df=test_df)
clear_output()

get_metric_table(metric_mlpgam_emb)

10. Checkpoint loading#

After training, Twiga saves a Lightning checkpoint automatically. You can reload it at any time - this is useful when you want to avoid retraining or when resuming a session.

forecaster_mlpf.model.load_checkpoint()
log.info("Model in eval mode: %s", not forecaster_mlpf.model.model.training)

11. Results comparison: all NN models#

Concatenate per-fold metrics from each model and display a unified comparison table.

# Assign human-readable model labels
metric_mlpgam_emb["Model"] = "MLPGAM+Emb"

all_metrics = pd.concat(
    [metric_mlpf, metric_mlpgam, metric_mlpgaf, metric_rnn, metric_nhits, metric_mlpgam_emb],
    ignore_index=True,
)


get_metric_table(all_metrics)

Forecast plot: first 7 days of test set#

all_preds = pd.concat([pred_mlpf, pred_mlpgam, pred_nhits, pred_mlpgaf, pred_rnn], ignore_index=True)

p = plot_forecast_grid(
    all_preds,
    actual_col="Actual",
    forecast_col="forecast",
    model_col="Model",
    n_samples_per_model=7 * 48,
    y_label="Net Load (kW)",
    title="NN point forecasts — first 7 days of test set (max_epochs=5)",
    fig_width=920,
    panel_height=400,
)
p

12. NN vs ML comparison#

With only 5 epochs, NNs may not outperform a well-tuned gradient boosting model. The cell below adds a LightGBM baseline so you can see the gap and understand how much more training the NNs would need.

With max_epochs >= 50 and a proper learning-rate schedule, the NN models typically match or surpass tree-based models on this dataset.

from twiga.models.ml import LIGHTGBMConfig

lgbm_config = LIGHTGBMConfig()

forecaster_lgbm = TwigaForecaster(
    data_params=data_config,
    model_params=[lgbm_config],
    train_params=train_config,
)
forecaster_lgbm.fit(train_df=train_df, val_df=val_df)
clear_output()

pred_lgbm, metric_lgbm = forecaster_lgbm.evaluate_point_forecast(test_df=test_df)
clear_output()

get_metric_table(metric_lgbm)
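To put the tree baseline next to the neural models, the fold metrics can simply be concatenated and passed through the same helper:

# Side-by-side: NN models from Section 11 plus the LightGBM baseline.
get_metric_table(pd.concat([all_metrics, metric_lgbm], ignore_index=True))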

Wrapping up#

What you did

  • Understood when and why neural networks beat gradient boosting (and vice versa)

  • Learned the (B, L, F) tensor format consumed by all Twiga NN models

  • Trained five NN architectures - MLPF, MLPGAM, MLPGAF, RNN, and N-HiTS - using the Lightning training loop

  • Configured value embeddings (LinearEmb, ConvEmb, PatchEmb) and positional embeddings (LearnPosEmb, RotaryEmb) via config knobs

  • Compared NN vs ML (LightGBM) performance on the same test data and understood the epoch-budget trade-off

  • Reloaded a saved Lightning checkpoint without retraining

Key takeaways

  1. At low epoch counts, gradient boosting almost always wins - NNs need sufficient training time to beat tabular baselines.

  2. MLPGAM and N-HiTS tend to converge faster than plain MLPF; they are the recommended starting architectures for energy data.

  3. All NN dims (forecast_horizon, lookback_window_size, feature counts) are auto-populated by TwigaForecaster - you only need to set training hyperparameters in the config.

  4. Lightning checkpointing is automatic: the best-validation-loss weights are saved and can be reloaded with .load_checkpoint().

  5. Value and positional embeddings are orthogonal knobs - experiment with them independently before combining.


What’s next?#

08 - Quantile Regression - Add probabilistic outputs to your forecasts by training QR-LightGBM, QR-XGBoost, and FPQR models that produce calibrated prediction intervals instead of single point values.

# ruff: noqa: E501, E701, E702
from IPython.display import HTML

_TEAL = "#107591"
_TEAL_MID = "#069fac"
_TEAL_LIGHT = "#e8f5f8"
_TEAL_BEST = "#d0ecf1"
_TEXT_DARK = "#2d3748"
_TEXT_MUTED = "#718096"
_WHITE = "#ffffff"

steps = [
    {
        "num": "05",
        "title": "ML Point Forecasting",
        "desc": "CatBoost · XGBoost · LightGBM — ML baseline to beat",
        "tags": ["catboost", "xgboost", "lightgbm"],
        "active": False,
    },
    {
        "num": "06",
        "title": "Backtesting & Evaluation",
        "desc": "Rolling-window backtesting · fold-level metrics",
        "tags": ["backtesting", "evaluation"],
        "active": False,
    },
    {
        "num": "07",
        "title": "Neural Networks",
        "desc": "MLPF · N-HiTS · Lightning training · sequence embeddings",
        "tags": ["neural network", "pytorch", "lightning"],
        "active": True,
    },
    {
        "num": "08",
        "title": "Quantile Regression",
        "desc": "First probabilistic step — prediction intervals",
        "tags": ["probabilistic", "quantile", "intervals"],
        "active": False,
    },
    {
        "num": "09",
        "title": "Parametric Distributions",
        "desc": "Normal · Laplace · Gamma heads — NLL training",
        "tags": ["parametric", "NLL", "distributions"],
        "active": False,
    },
]
track_name = "Neural Network Track"
footer = 'Next: add uncertainty to your NN with <span style="color:#107591;font-weight:600;">Quantile Regression</span> (08) or <span style="color:#107591;font-weight:600;">Parametric Distributions</span> (09).'


def _b(t, bg, fg):
    return f'<span style="display:inline-block;background:{bg};color:{fg};font-size:10px;font-weight:600;padding:2px 7px;border-radius:10px;margin:2px 2px 0 0;">{t}</span>'


ch = ""
for i, s in enumerate(steps):
    a = s["active"]
    cb = _TEAL if a else _WHITE
    cbo = _TEAL if a else "#d1ecf1"
    nb = _TEAL_MID if a else _TEAL_LIGHT
    nf = _WHITE if a else _TEAL
    tf = _WHITE if a else _TEXT_DARK
    df = "#cce8ef" if a else _TEXT_MUTED
    bb = "#0d5f75" if a else _TEAL_BEST
    bf = "#b8e4ed" if a else _TEAL
    yh = (
        f'<span style="float:right;background:{_TEAL_MID};color:{_WHITE};font-size:10px;font-weight:700;padding:2px 10px;border-radius:12px;">★ you are here</span>'
        if a
        else ""
    )
    bdg = "".join(_b(t, bb, bf) for t in s["tags"])
    ch += f'<div style="background:{cb};border:2px solid {cbo};border-radius:12px;padding:16px 20px;display:flex;align-items:flex-start;gap:16px;box-shadow:{"0 4px 14px rgba(16,117,145,.25)" if a else "0 1px 4px rgba(0,0,0,.06)"};"><div style="min-width:44px;height:44px;background:{nb};color:{nf};border-radius:50%;display:flex;align-items:center;justify-content:center;font-size:15px;font-weight:800;flex-shrink:0;">{s["num"]}</div><div style="flex:1;"><div style="font-size:15px;font-weight:700;color:{tf};margin-bottom:4px;">{s["title"]}{yh}</div><div style="font-size:12.5px;color:{df};margin-bottom:8px;line-height:1.5;">{s["desc"]}</div><div>{bdg}</div></div></div>'
    if i < len(steps) - 1:
        ch += f'<div style="display:flex;justify-content:center;height:32px;"><svg width="24" height="32" viewBox="0 0 24 32" fill="none"><line x1="12" y1="0" x2="12" y2="24" stroke="{_TEAL_MID}" stroke-width="2" stroke-dasharray="4 3"/><polygon points="6,20 18,20 12,30" fill="{_TEAL_MID}"/></svg></div>'

HTML(
    f'<div style="font-family:Inter,\'Segoe UI\',sans-serif;max-width:640px;margin:8px 0;"><div style="background:linear-gradient(135deg,{_TEAL} 0%,{_TEAL_MID} 100%);border-radius:12px 12px 0 0;padding:14px 20px;display:flex;align-items:center;gap:10px;"><svg width="22" height="22" viewBox="0 0 24 24" fill="none" stroke="{_WHITE}" stroke-width="2"><path d="M12 2L2 7l10 5 10-5-10-5z"/><path d="M2 17l10 5 10-5"/><path d="M2 12l10 5 10-5"/></svg><span style="color:{_WHITE};font-size:14px;font-weight:700;">Twiga Learning Path — {track_name}</span></div><div style="border:2px solid {_TEAL_LIGHT};border-top:none;border-radius:0 0 12px 12px;padding:20px 20px 16px;background:#f9fdfe;display:flex;flex-direction:column;">{ch}<div style="margin-top:16px;font-size:11.5px;color:{_TEXT_MUTED};text-align:center;border-top:1px solid {_TEAL_LIGHT};padding-top:12px;">{footer}</div></div></div>'
)