Backtesting & Time-Based Cross-Validation
Source Files
twiga/core/backtester.py
twiga/forecaster/core.py
Standard k-fold cross-validation is unsuitable for time series because it ignores temporal ordering: a model could be trained on future observations and evaluated on past ones, which leaks information into training and inflates the scores. Twiga implements time-based cross-validation through the TimeBasedCV class, which generates chronologically ordered train/test splits.
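The leakage can be seen on a toy index. The sketch below is not Twiga code; it uses only the standard library, and the helper names kfold_splits and trains_on_future are made up for illustration. It builds unshuffled k-fold splits over ten time-ordered steps and reports, per fold, whether any training step lies after the earliest test step.

```python
def kfold_splits(n_samples, n_folds):
    """Yield (train_idx, test_idx) the way unshuffled k-fold does: each
    contiguous block is the test set once, and everything else trains."""
    fold_size = n_samples // n_folds
    for fold in range(n_folds):
        test_idx = list(range(fold * fold_size, (fold + 1) * fold_size))
        held_out = set(test_idx)
        train_idx = [i for i in range(n_samples) if i not in held_out]
        yield train_idx, test_idx


def trains_on_future(train_idx, test_idx):
    """True if any training step comes after the earliest test step."""
    return max(train_idx) > min(test_idx)


# Ten observations in time order, five folds.
leaks = [trains_on_future(tr, te) for tr, te in kfold_splits(10, 5)]
print(leaks)  # [True, True, True, True, False]

# A chronological split trains strictly on the past.
chrono_train, chrono_test = list(range(8)), [8, 9]
print(trains_on_future(chrono_train, chrono_test))  # False
```

Only the last k-fold block happens to respect time order; a time-based splitter makes that property hold for every fold.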
How It Works
graph TD
subgraph "Rolling Window Strategy"
A["Fold 1: Train [t0..t3] → Test [t3..t4]"]
B["Fold 2: Train [t1..t4] → Test [t4..t5]"]
C["Fold 3: Train [t2..t5] → Test [t5..t6]"]
end
subgraph "Expanding Window Strategy"
D["Fold 1: Train [t0..t3] → Test [t3..t4]"]
E["Fold 2: Train [t0..t4] → Test [t4..t5]"]
F["Fold 3: Train [t0..t5] → Test [t5..t6]"]
end
Rolling window: Training window has a fixed size and slides forward each fold
Expanding window: Training window starts from the beginning and grows each fold
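The two strategies in the diagram can be reproduced with integer time steps. This is a minimal sketch under stated assumptions, not the TimeBasedCV implementation: window_folds is a hypothetical helper, boundaries are half-open step ranges, folds are anchored at the first step, and the stride defaults to the test size.

```python
def window_folds(n_steps, train_size, test_size, window="rolling", stride=None):
    """Yield (train_start, train_end, test_start, test_end) step boundaries.

    Assumption: ranges are half-open, so train covers [train_start, train_end).
    """
    if window not in ("rolling", "expanding"):
        raise ValueError(f"unsupported window: {window!r}")
    stride = test_size if stride is None else stride
    if stride <= 0:
        raise ValueError("stride must be positive")
    train_end = train_size
    while train_end + test_size <= n_steps:
        # Expanding keeps the origin fixed; rolling keeps the length fixed.
        train_start = 0 if window == "expanding" else train_end - train_size
        yield (train_start, train_end, train_end, train_end + test_size)
        train_end += stride


rolling = list(window_folds(6, train_size=3, test_size=1, window="rolling"))
expanding = list(window_folds(6, train_size=3, test_size=1, window="expanding"))
print(rolling)    # [(0, 3, 3, 4), (1, 4, 4, 5), (2, 5, 5, 6)]
print(expanding)  # [(0, 3, 3, 4), (0, 4, 4, 5), (0, 5, 5, 6)]
```

Both strategies test on the same periods; they differ only in how much history each fold trains on.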
Class Hierarchy
classDiagram
class TimeBasedSplit {
<<abstract>>
+split_freq: str
+train_size: int
+test_size: int
+gap: int
+stride: int
+window: str
+train_delta: relativedelta
+forecast_delta: relativedelta
+gap_delta: relativedelta
+stride_delta: relativedelta
#_splits_from_period()
+split()*
}
class TimeBasedCV {
+date_column: str
+num_splits: int
+split(data, start_dt, end_dt)
+get_splits(data)
+set_split_scheme()
+get_scheme()
+plot_split_scheme()
}
class SplitState {
+train_start: Timestamp
+train_end: Timestamp
+forecast_start: Timestamp
+forecast_end: Timestamp
}
TimeBasedSplit <|-- TimeBasedCV
TimeBasedSplit ..> SplitState : creates
Configuration
Backtesting behavior is controlled by ForecasterConfig parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| split_freq | | | Unit for train/test/gap/stride sizes |
| train_size | | | Training window length (in split_freq units) |
| test_size | | | Test window length (in split_freq units) |
| gap | | | Gap between training end and test start |
| stride | | | Step size between folds (defaults to test_size) |
| window | | | Window strategy (rolling or expanding) |
| num_splits | | | Maximum number of splits (None = all possible) |
Example Configuration
from twiga.core.config import ForecasterConfig

# Monthly backtesting: 6 months train, 1 month test, expanding window
config = ForecasterConfig(
    split_freq="months",
    train_size=6,
    test_size=1,
    gap=0,
    window="expanding",
)

# Daily backtesting: 14 days train, 7 days test, rolling window
config = ForecasterConfig(
    split_freq="days",
    train_size=14,
    test_size=7,
    gap=0,
    window="rolling",
    stride=7,  # move 7 days between folds
)
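To get a feel for how many folds the daily configuration produces, the boundaries can be enumerated by hand. The sketch below is standard-library only and does not call Twiga; daily_folds is a hypothetical helper, and it assumes that end bounds are exclusive and that the first fold starts at the first available date.

```python
from datetime import date, timedelta


def daily_folds(start, end, train_days, test_days, stride_days, gap_days=0):
    """List (train_start, train_end, test_start, test_end) for a rolling window.

    Assumption: every end bound is exclusive, and `end` is the first date
    for which no data is available.
    """
    if stride_days <= 0:
        raise ValueError("stride_days must be positive")
    folds = []
    train_start = start
    while True:
        train_end = train_start + timedelta(days=train_days)
        test_start = train_end + timedelta(days=gap_days)
        test_end = test_start + timedelta(days=test_days)
        if test_end > end:
            break  # the next test window would run past the data
        folds.append((train_start, train_end, test_start, test_end))
        train_start += timedelta(days=stride_days)
    return folds


# 35 days of data with 14 days train, 7 days test, stride 7.
folds = daily_folds(date(2024, 1, 1), date(2024, 2, 5),
                    train_days=14, test_days=7, stride_days=7)
for i, (tr_s, tr_e, te_s, te_e) in enumerate(folds, start=1):
    print(f"Fold {i}: train {tr_s}..{tr_e}, test {te_s}..{te_e}")
```

Under these assumptions, five weeks of data yield three folds, each testing on the week after its two training weeks.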
Using TimeBasedCV Directly
The TimeBasedCV class can be used independently for custom splitting logic:
from twiga.core.backtester import TimeBasedCV

cv = TimeBasedCV(
    split_freq="months",
    train_size=6,
    test_size=1,
    gap=0,
    window="expanding",
    date_column="timestamp",
)

# data is a pandas DataFrame with a "timestamp" column
for train_df, test_df, scheme, fold_idx in cv.split(data):
    print(f"Fold {fold_idx + 1}:")
    print(f" Train: {scheme['train_period'][0]} to {scheme['train_period'][1]}")
    print(f" Test: {scheme['test_period'][0]} to {scheme['test_period'][1]}")
The split() method yields tuples of (train_df, test_df, scheme_dict, fold_index) where scheme_dict contains:
{
    "train_idx": np.ndarray,  # indices into original DataFrame
    "test_idx": np.ndarray,
    "train_period": (start_dt, end_dt),
    "test_period": (start_dt, end_dt),
}
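Because every fold carries its own periods and indices, a custom loop can sanity-check each scheme before training. The helper below is a hypothetical sketch, not part of Twiga; it relies only on the four keys listed above and uses plain lists and datetimes in place of NumPy arrays and pandas timestamps.

```python
from datetime import datetime


def check_scheme(scheme):
    """Raise ValueError if a fold's train and test sets overlap in time or rows."""
    train_start, train_end = scheme["train_period"]
    test_start, test_end = scheme["test_period"]
    if not (train_start <= train_end <= test_start <= test_end):
        raise ValueError("test period must not start before training ends")
    shared = set(scheme["train_idx"]) & set(scheme["test_idx"])
    if shared:
        raise ValueError(f"{len(shared)} rows appear in both train and test")
    return True


# Illustrative scheme with made-up values.
example_scheme = {
    "train_idx": [0, 1, 2, 3],
    "test_idx": [4, 5],
    "train_period": (datetime(2024, 1, 1), datetime(2024, 1, 5)),
    "test_period": (datetime(2024, 1, 5), datetime(2024, 1, 7)),
}
print(check_scheme(example_scheme))  # True
```

The comparison uses <= so that it accepts both inclusive and exclusive period ends; tighten it if the exact convention matters for your data.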
Backtesting with TwigaForecaster
The TwigaForecaster.backtesting() method runs the full train → evaluate cycle over each fold:
from twiga.core.config import DataPipelineConfig, ForecasterConfig
from twiga.forecaster.core import TwigaForecaster
from twiga.models.ml.xgboost_model import XGBOOSTConfig

data_config = DataPipelineConfig(
    target_feature="load_mw",
    period="1h",
    lookback_window_size=168,
    forecast_horizon=48,
)

train_config = ForecasterConfig(
    split_freq="months",
    train_size=3,
    test_size=1,
    window="expanding",
)

forecaster = TwigaForecaster(
    data_params=data_config,
    model_params=[XGBOOSTConfig()],
    train_params=train_config,
)

predictions_df, metrics_df = forecaster.backtesting(
    data=full_dataset,
    train_ratio=1.0,
    verbose=True,
    ensemble_strategy="mean",
)
What Happens per Fold
For each fold, backtesting():
1. Calls self.fit(train_df) - fits the data pipeline and all models
2. Calls self.evaluate_point_forecast(test_df) - generates predictions and computes metrics
3. Adds a Folds column to track which fold produced each result
4. Concatenates all results across folds
sequenceDiagram
participant B as backtesting()
participant CV as TimeBasedCV.split()
participant F as fit()
participant E as evaluate_point_forecast()
B->>CV: Generate train/test splits
loop For each fold
CV-->>B: (train_df, test_df, scheme, fold_idx)
B->>F: fit(train_df)
F-->>B: Models trained
B->>E: evaluate_point_forecast(test_df)
E-->>B: (predictions_df, metrics_df)
B->>B: Append fold results
end
B-->>B: pd.concat(all_predictions), pd.concat(all_metrics)
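The per-fold loop can be sketched without Twiga. In the example below, backtest, fit_mean and evaluate_mean are hypothetical stand-ins for backtesting(), fit() and evaluate_point_forecast(); plain lists and dicts replace DataFrames, and numbering folds from 0 is an assumption rather than documented behaviour.

```python
def backtest(folds, fit, evaluate):
    """Run fit -> evaluate on each fold and tag every result with its fold."""
    all_predictions, all_metrics = [], []
    for fold_idx, (train, test) in enumerate(folds):
        model = fit(train)
        predictions, metrics = evaluate(model, test)
        # The "Folds" key mirrors the Folds column described above.
        all_predictions.extend({"Folds": fold_idx, **row} for row in predictions)
        all_metrics.append({"Folds": fold_idx, **metrics})
    return all_predictions, all_metrics


def fit_mean(train):
    """Toy model: forecast the training mean."""
    return sum(train) / len(train)


def evaluate_mean(model, test):
    """Score the constant forecast with mean absolute error."""
    predictions = [{"actual": y, "forecast": model} for y in test]
    mae = sum(abs(y - model) for y in test) / len(test)
    return predictions, {"mae": mae}


series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
# Expanding-window folds: train on everything before the test step.
toy_folds = [(series[:n], series[n:n + 1]) for n in (3, 4, 5)]
predictions, metrics = backtest(toy_folds, fit_mean, evaluate_mean)
print(metrics)
```

Refitting from scratch inside the loop is what keeps each fold's score free of information from its own test period.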
Aggregating Results
# Average metrics across folds
avg_metrics = metrics_df.groupby("Model")[["mae", "rmse", "smape"]].mean().round(3)
# Metrics per fold
fold_metrics = metrics_df.groupby(["Model", "Folds"])[["mae", "rmse"]].mean()
SplitState
The SplitState class holds the time boundaries for a single split:
class SplitState:
    train_start: pd.Timestamp
    train_end: pd.Timestamp
    forecast_start: pd.Timestamp
    forecast_end: pd.Timestamp
The gap between train_end and forecast_start is controlled by the gap parameter.
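How the four boundaries relate can be shown with a small stand-in. SplitBounds and make_bounds below are hypothetical names, not Twiga's SplitState; the sketch assumes day units, standard-library datetimes instead of pandas timestamps, and a forecast period that starts exactly gap days after training ends.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class SplitBounds:
    """Stand-in for SplitState with the same four fields."""
    train_start: datetime
    train_end: datetime
    forecast_start: datetime
    forecast_end: datetime


def make_bounds(train_start, train_days, test_days, gap_days=0):
    """Derive all four boundaries from the training start and the sizes."""
    train_end = train_start + timedelta(days=train_days)
    forecast_start = train_end + timedelta(days=gap_days)  # gap sits here
    forecast_end = forecast_start + timedelta(days=test_days)
    return SplitBounds(train_start, train_end, forecast_start, forecast_end)


bounds = make_bounds(datetime(2024, 1, 1), train_days=14, test_days=7, gap_days=2)
print(bounds.train_end)       # 2024-01-15 00:00:00
print(bounds.forecast_start)  # 2024-01-17 00:00:00
print(bounds.forecast_end)    # 2024-01-24 00:00:00
```

A non-zero gap is useful when the most recent observations would not yet be available at forecast time.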
Visualizing Splits
TimeBasedCV provides a built-in visualization method (requires the plots dependency group):
cv.plot_split_scheme(data, train_ratio=1.0)
API Reference
- class twiga.core.backtester.TimeBasedCV(split_freq, test_size, train_size=None, gap=0, stride=None, window='rolling', date_column='timestamp', num_splits=None)
Bases: TimeBasedSplit
Concrete time-based cross-validation implementation for pandas DataFrames.
This class creates splits based on a datetime column and returns train/test indices, along with the corresponding time periods.
- Variables:
- duration_in_units(start, end, split_freq)
Compute the duration between start and end in the specified split_freq units.
For ‘days’, ‘minutes’, ‘hours’, and ‘weeks’, a simple conversion based on timedelta is used. For ‘months’ and ‘years’, relativedelta is used to account for variable lengths.
- Parameters:
- Return type:
- Returns:
int – Duration in the specified units.
- Raises:
ValueError – If split_freq is unsupported.
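The two conversion paths can be illustrated as follows. This is a rough sketch, not the library's code: it uses plain calendar arithmetic in place of relativedelta so that it runs on the standard library alone, it assumes start <= end, and its results at month-end boundaries may differ from what relativedelta reports.

```python
from datetime import datetime, timedelta

# Units with a fixed length convert by integer division of timedeltas.
_FIXED_UNITS = {
    "minutes": timedelta(minutes=1),
    "hours": timedelta(hours=1),
    "days": timedelta(days=1),
    "weeks": timedelta(weeks=1),
}


def duration_in_units(start, end, split_freq):
    """Whole number of split_freq units between start and end."""
    if split_freq in _FIXED_UNITS:
        return (end - start) // _FIXED_UNITS[split_freq]
    if split_freq in ("months", "years"):
        months = (end.year - start.year) * 12 + (end.month - start.month)
        # Do not count a final month that has not fully elapsed.
        if (end.day, end.time()) < (start.day, start.time()):
            months -= 1
        return months if split_freq == "months" else months // 12
    raise ValueError(f"unsupported split_freq: {split_freq!r}")


print(duration_in_units(datetime(2024, 1, 1), datetime(2024, 1, 15), "days"))   # 14
print(duration_in_units(datetime(2024, 1, 1), datetime(2024, 7, 1), "months"))  # 6
```

Months and years need calendar-aware handling because their length in days varies, which is why a fixed timedelta is not enough for them.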
- get_scheme()
Return the current split configuration.
- Return type:
- Returns:
dict – A dictionary containing the train/test split indices and periods.
- Raises:
ValueError – If the split scheme has not been initialized.
- plot_split_scheme(data=None, train_ratio=1.0, start_dt=None, end_dt=None, title='Cross-validation split scheme', colors=None, alpha=0.88, x_ticks=6, font_size=10, line_width=0.8, x_axis_angle=30, legend_pos='top')
Visualize the time series cross-validation split scheme.
Renders a Gantt-style plot with one horizontal bar per fold, colour-coded by segment (Train / Val / Test), styled with the Twiga theme.
- Parameters:
data (DataFrame | None) – Input DataFrame containing temporal data. Used to derive the split scheme when it has not been pre-computed.
train_ratio (float) – Proportion of training indices used for training; the remainder becomes a validation segment.
start_dt (Timestamp | None) – Optional start timestamp passed to set_split_scheme.
end_dt (Timestamp | None) – Optional end timestamp passed to set_split_scheme.
title (str) – Plot title.
colors (dict[str, str] | None) – Custom colour mapping for segments. Keys must be title-case: "Train", "Val", "Test".
alpha (float) – Bar transparency (0–1).
x_ticks (int) – Number of date ticks on the x-axis.
font_size (int) – Base font size in points.
line_width (float) – Axis line stroke width.
x_axis_angle (int) – Rotation angle for x-axis tick labels.
legend_pos (str) – Legend position - "top", "bottom", "left", "right", or "none".
- Returns:
A Lets-Plot ggplot object.
- Raises:
ValueError – If train_ratio is outside [0, 1] or the split scheme is not initialised and no data is provided.
Example
>>> splitter = TimeBasedCV(split_freq="days", test_size=5, train_size=20, date_column="date")
>>> splitter.set_split_scheme(data["date"])
>>> splitter.plot_split_scheme(data, train_ratio=0.8, title="CV Scheme")
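The effect of train_ratio on a single fold can be sketched as below. split_train_val is a hypothetical helper, not a Twiga function; it assumes the validation segment is taken from the end of the training indices and that the cut point is rounded down, neither of which is confirmed by the reference above.

```python
def split_train_val(train_idx, train_ratio=1.0):
    """Split one fold's training indices into (train, validation) parts."""
    if not 0.0 <= train_ratio <= 1.0:
        raise ValueError("train_ratio must be within [0, 1]")
    cut = int(len(train_idx) * train_ratio)  # assumption: round down
    # Validation takes the most recent indices so it stays after training.
    return train_idx[:cut], train_idx[cut:]


# 20 training steps with train_ratio=0.8, as in the example above.
train_part, val_part = split_train_val(list(range(20)), train_ratio=0.8)
print(len(train_part), len(val_part))  # 16 4
```

With the default train_ratio=1.0 the validation part is empty and the plot shows only Train and Test segments.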
- set_split_scheme(time_values, start_dt=None, end_dt=None)
Calculate split indices from time series data.
The method sorts the datetime series, determines the time range to use, and computes indices for training and forecast periods based on the provided parameters. If num_splits is set, it adjusts train_size accordingly.
- split(data, start_dt=None, end_dt=None)
Generate validated train/test splits.
- Parameters:
- Yields:
tuple – (train_df, test_df, scheme, split_key).
- Raises:
ValueError – If the required date column is missing or if the computed indices exceed data bounds.
- class twiga.core.backtester.TimeBasedSplit(split_freq, train_size, test_size, gap=0, stride=None, window='rolling')
Bases: ABC
Abstract base class implementing core time-based splitting logic.
This class validates split parameters and provides properties to compute time deltas for the training period, forecast period, gap, and stride.
- Variables:
split_freq (str) – Time unit for splits (e.g., ‘days’, ‘months’).
train_size (int) – Training period length in split_freq units.
test_size (int) – Forecast period length in split_freq units.
gap (int) – Gap between train and forecast periods.
stride (int) – Step size between splits.
window (str) – Window type (‘rolling’ or ‘expanding’).
- __init__(split_freq, train_size, test_size, gap=0, stride=None, window='rolling')
Initialize time-based split parameters.
- Parameters:
split_freq (str) – Time unit for splits (e.g., ‘days’, ‘months’).
train_size (int) – Training period length in split_freq units.
test_size (int) – Forecast period length in split_freq units.
gap (int) – Gap between training and forecast periods (default: 0).
stride (int | None) – Step size between splits (default: test_size).
window (str) – Window type (‘rolling’ or ‘expanding’) (default: ‘rolling’).
- Raises:
ValueError – If any parameter is invalid. In particular, train_size must be a positive integer that is greater than or equal to test_size.
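A validation routine along these lines might look as follows. Only the rule that train_size is a positive integer no smaller than test_size comes from the description above; validate_split_params is a hypothetical helper, and the accepted frequency names and the gap and stride checks are assumptions.

```python
# Assumption: the same units that duration_in_units is documented to handle.
SUPPORTED_FREQS = ("minutes", "hours", "days", "weeks", "months", "years")
SUPPORTED_WINDOWS = ("rolling", "expanding")


def validate_split_params(split_freq, train_size, test_size,
                          gap=0, stride=None, window="rolling"):
    """Check the split parameters and return them with stride resolved."""
    if split_freq not in SUPPORTED_FREQS:
        raise ValueError(f"unsupported split_freq: {split_freq!r}")
    for name, value in (("train_size", train_size), ("test_size", test_size)):
        if not isinstance(value, int) or value <= 0:
            raise ValueError(f"{name} must be a positive integer")
    if train_size < test_size:
        raise ValueError("train_size must be >= test_size")
    if not isinstance(gap, int) or gap < 0:  # assumed rule
        raise ValueError("gap must be a non-negative integer")
    stride = test_size if stride is None else stride  # documented default
    if not isinstance(stride, int) or stride <= 0:  # assumed rule
        raise ValueError("stride must be a positive integer")
    if window not in SUPPORTED_WINDOWS:
        raise ValueError(f"unsupported window: {window!r}")
    return {"split_freq": split_freq, "train_size": train_size,
            "test_size": test_size, "gap": gap, "stride": stride,
            "window": window}


params = validate_split_params("days", train_size=14, test_size=7)
print(params["stride"])  # 7
```

Failing fast in the constructor means a bad configuration is reported before any fold is computed.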
- property forecast_delta: relativedelta
Calculate forecast period duration.
- property gap_delta: relativedelta
Calculate gap duration.
- abstractmethod split(data)
Generate train/test splits from data.
- property stride_delta: relativedelta
Calculate stride duration.
- property train_delta: relativedelta
Calculate training period duration.