# Testing

**Source files:** `tests/unit/`, `pyproject.toml` (`[tool.pytest.ini_options]`), `.github/workflows/check_pullrequest.yaml`

Twiga uses pytest for testing with a minimum coverage requirement of 80%. Tests are organized as unit tests mirroring the source package structure.
## Test Structure

```text
tests/
└── unit/
    ├── config/                 # Configuration validation
    │   └── test_config_base.py
    ├── conformal/              # Conformal prediction methods
    │   ├── base.py
    │   ├── split.py
    │   ├── cqr.py
    │   ├── residual_conformal
    │   └── test_residual_fitting.py
    ├── data/                   # Data pipeline & feature engineering
    │   ├── test_autores.py
    │   ├── test_data_loader.py
    │   ├── test_data_pipeline.py
    │   ├── test_feature.py
    │   ├── test_feature_engeering.py
    │   ├── test_processing.py
    │   └── test_temporal.py
    ├── eval/                   # Backtesting
    │   └── test_backtester.py
    ├── forecaster/             # Forecaster orchestration
    │   ├── test_abstract_forecaster.py
    │   ├── test_base_forecaster.py
    │   ├── test_core_forecaster.py
    │   ├── test_ensemble.py
    │   ├── test_registry.py
    │   └── test_utils.py
    ├── metrics/                # Evaluation metrics
    │   ├── test_point_metrics.py
    │   ├── test_core_metrics_interval.py
    │   └── test_core_metrics_prob.py
    ├── models/                 # Model implementations
    │   ├── ml/
    │   │   ├── test_base_regressor.py
    │   │   ├── test_catboost.py
    │   │   ├── test_catboostgauss.py
    │   │   ├── test_lightgbm.py
    │   │   ├── test_lineareg.py
    │   │   └── test_xgboost.py
    │   └── nn/
    │       ├── core/
    │       │   ├── test_base.py
    │       │   ├── test_base_arch.py
    │       │   ├── test_base_model.py
    │       │   ├── test_embending.py
    │       │   └── test_linear.py
    │       ├── mlpf/
    │       │   ├── test_mlpf.py
    │       │   ├── test_mlpf_model.py
    │       │   ├── test_mlpfgam.py
    │       │   └── test_mlpfgam_model.py
    │       ├── nhits/
    │       │   ├── test_nhits.py
    │       │   └── test_nhits_model.py
    │       ├── ganf/
    │       │   └── test_ganf_network.py
    │       └── prob/
    │           └── test_mplfqr_model.py
    ├── quantile/               # Quantile distributions & losses
    │   ├── test_fpquantile.py
    │   ├── test_quantile.py
    │   └── test_quantile_loss.py
    ├── plot/                   # Visualisation suite
    │   ├── conftest.py         # Shared fixtures (forecast_df, acf_series, ...)
    │   ├── test_timeseries.py
    │   ├── test_exploration.py
    │   ├── test_distribution.py
    │   ├── test_stats_plots.py
    │   └── test_residuals.py
    └── stats/                  # Statistical analysis
        ├── test_corr.py        # rank_correlation, compute_anova_f, compute_chi2
        ├── test_mutual_information.py
        ├── test_ppscore.py
        └── test_xicorr.py
```
## Test Categories

| Category | Tests | Covers |
|---|---|---|
| `config` | 1 file | Pydantic config validation, search space inference |
| `conformal` | 5 files | Split conformal, CQR, CRC, residual fitting |
| `data` | 9 files | Pipeline, autoregressive features, temporal features, data loader, feature selection, AssociationAnalyzer |
| `eval` | 1 file | Time-based cross-validation splits |
| `forecaster` | 6 files | Abstract, base, core forecaster, ensemble, registry, utilities |
| `metrics` | 3 files | Point, interval, and probabilistic metrics |
| `models/ml` | 6 files | BaseRegressor, CatBoost, XGBoost, LightGBM, Linear, Gaussian CatBoost |
| `models/nn` | 10 files | Base classes, embeddings, MLPF, MLPGAM, NHITS, GANF, quantile |
| `plot` | 5 files + conftest | lets-plot forecast, exploration, distribution, stats, and residual plots |
| `quantile` | 3 files | Quantile distributions, losses, interpolation |
| `stats` | 4 files | Association, mutual information, PPS, xi-correlation |
## Running Tests

### Full Test Suite

```bash
uv run pytest
```

This uses the default configuration from `pyproject.toml`:

```toml
[tool.pytest.ini_options]
norecursedirs = ["examples"]
addopts = "--cov=twiga --cov-report=term-missing --cov-report=xml --cov-config=.coveragerc --cov-fail-under=80"
```
Default behavior:

- Measures coverage of the `twiga` package
- Reports missing lines in terminal output
- Generates `coverage.xml` for CI upload
- Fails if coverage drops below 80%
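The `--cov-config=.coveragerc` flag delegates coverage settings to a separate file whose contents are not shown here. As a rough, hypothetical sketch of what such a file often contains (the section names follow the standard coverage.py format, but the specific paths and exclusion patterns are assumptions, not Twiga's actual config):

```ini
# Hypothetical .coveragerc sketch -- the project's real file may differ.
[run]
source = twiga
omit =
    */tests/*
    */examples/*

[report]
exclude_lines =
    pragma: no cover
    if TYPE_CHECKING:
```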
Run Specific Tests#
# Single test file
uv run pytest tests/unit/metrics/test_point_metrics.py
# Single test category
uv run pytest tests/unit/models/ml/
# Pattern matching
uv run pytest -k "test_catboost"
# Verbose output
uv run pytest -v tests/unit/forecaster/
### Coverage Report

```bash
# Terminal report with missing lines
uv run pytest --cov=twiga --cov-report=term-missing

# HTML report for detailed browsing
uv run pytest --cov=twiga --cov-report=html
# Open htmlcov/index.html in a browser
```
## CI/CD Pipeline

The GitHub Actions workflow (`.github/workflows/check_pullrequest.yaml`) runs on every push to `main` and all pull requests.
Pipeline Architecture#
graph TD
A[Push / Pull Request] --> B{Event Type}
B -->|PR or non-main push| C[pre-commit Job]
B -->|Any| D[test-and-lint Job]
C --> C1[Checkout + Python 3.11]
C1 --> C2[Install uv]
C2 --> C3[Cache pre-commit envs]
C3 --> C4[Run pre-commit hooks]
D --> D1[Checkout + Python 3.11]
D1 --> D2[Install uv]
D2 --> D3[Cache .venv dependencies]
D3 --> D4["uv sync --all-extras --dev"]
D4 --> D5[Ruff linting + format check]
D5 --> D6[ty type checking]
D6 --> D7["pytest --cov-fail-under=80"]
D7 --> D8[Upload coverage to Codecov]
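The `.venv` caching step in the diagram is typically implemented with `actions/cache` keyed on the lockfile. The following is only an illustrative sketch under the assumption that the workflow caches against `uv.lock`; the actual `check_pullrequest.yaml` may use different paths or cache keys:

```yaml
# Hypothetical sketch of the dependency-cache step; not the literal
# contents of check_pullrequest.yaml.
- name: Cache .venv dependencies
  uses: actions/cache@v4
  with:
    path: .venv
    key: venv-${{ runner.os }}-py3.11-${{ hashFiles('uv.lock') }}

- name: Install dependencies
  run: uv sync --all-extras --dev
```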
### Jobs

#### 1. `pre-commit`

Runs on PRs and non-main pushes. Executes all pre-commit hooks defined in `.pre-commit-config.yaml`:

- Commitizen validation
- YAML/JSON/TOML checks
- Large file detection
- Merge conflict detection
- Codespell
- Ruff linting & formatting
- Notebook cleaning
- Interrogate docstring check
#### 2. `test-and-lint`

Runs on all events:

| Step | Command |
|---|---|
| Install dependencies | `uv sync --all-extras --dev` |
| Lint | `uv run ruff check` |
| Format check | `uv run ruff format --check` |
| Type check | `uv run ty check` |
| Test | `uv run pytest --cov-fail-under=80` |
| Coverage upload | Codecov with `coverage.xml` |
## Writing Tests

### Conventions

- Test files follow the naming pattern `test_*.py`
- Test directory structure mirrors the source package structure
- Use `pytest` fixtures for shared setup
- The `examples/` directory is excluded from test discovery (`norecursedirs = ["examples"]`)
- Plain `assert` statements (ruff rule `S101`) are allowed in test files via a `ruff` per-file ignore
Example Test#
import numpy as np
import pytest
from twiga.core.metrics.point import mae, rmse
class TestPointMetrics:
"""Tests for point forecast metrics."""
def test_mae_perfect_forecast(self):
y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.0, 2.0, 3.0])
assert mae(y_true, y_pred) == 0.0
def test_mae_with_errors(self):
y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.5, 2.5, 3.5])
assert mae(y_true, y_pred) == pytest.approx(0.5)
def test_rmse_greater_than_mae(self):
y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.1, 2.5, 3.0])
assert rmse(y_true, y_pred) >= mae(y_true, y_pred)
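The last test encodes a property that holds for any error vector: RMSE is always at least MAE (a consequence of the power-mean inequality), so it is a cheap invariant to assert against any metric implementation. A standalone sketch with stdlib-only reference implementations (these `mae`/`rmse` helpers are illustrative, not Twiga's actual functions):

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error over paired sequences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error over paired sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


y_true = [1.0, 2.0, 3.0]
y_pred = [1.1, 2.5, 3.0]

# Power-mean inequality: the quadratic mean of |errors| >= arithmetic mean.
assert rmse(y_true, y_pred) >= mae(y_true, y_pred)
```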
### Testing Plot Functions

Twiga uses two complementary approaches for plot tests:

**1. PlotSpec assertion (lets-plot functions)**

All lets-plot functions return a `PlotSpec` object. Tests import it from the internal module and assert the return type:

```python
from lets_plot.plot.core import PlotSpec

from twiga.core.plot import plot_forecast


def test_returns_plot_spec(forecast_df):
    p = plot_forecast(forecast_df, actual_col="Actual", forecast_col="forecast")
    assert isinstance(p, PlotSpec)
```
**2. pytest-mpl image comparison (Matplotlib functions)**

Matplotlib-based functions (e.g., `plot_prediction` in `matplot_theme.py`) use pytest-mpl for pixel-level regression testing:

```python
import matplotlib.pyplot as plt
import pytest

from twiga.core.plot.matplot_theme import plot_prediction


@pytest.mark.mpl_image_compare(baseline_dir="baseline", tolerance=10, style="default")
def test_plot_prediction_basic():
    fig, ax = plt.subplots(figsize=(6, 3))
    plot_prediction(ax, true=[1, 2, 3, 4, 5], mu=[1.1, 2.0, 2.9, 4.2, 5.1])
    return fig
```
Workflow for pytest-mpl:

1. Generate baseline images once:

   ```bash
   uv run pytest --mpl-generate-path=tests/unit/plot/baseline tests/unit/plot/test_residuals.py
   ```

2. Commit the `baseline/` directory to version control.

3. On subsequent runs, compare against the baselines:

   ```bash
   uv run pytest --mpl tests/unit/plot/test_residuals.py
   ```

**Note:** pytest-mpl image comparison tests are skipped unless `--mpl` is explicitly passed. A plain `uv run pytest` therefore passes even when baselines are missing, so generate baselines before reviewing plot changes.
## Testing Models

When adding a new model, create tests that verify:

1. **Config validation** - required fields, type checking, search space defaults
2. **Fit/predict cycle** - model trains without errors and produces correct output shapes
3. **Forecast format** - `forecast()` returns the expected dictionary structure
4. **Feature formatting** - `format_features()` produces correct array shapes
5. **Serialization** - model can be saved and loaded (if applicable)
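The fit/predict-cycle check boils down to one assertion pattern: train on small data, predict, and compare output shape to input shape. A self-contained sketch of that pattern, where `DummyModel` is a hypothetical stand-in (predicting the training mean), not a Twiga class:

```python
class DummyModel:
    """Hypothetical stand-in for a model backend; predicts the training mean."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        # One prediction per input row, mirroring the expected output shape.
        return [self.mean_] * len(X)


def test_fit_predict_shapes():
    X = [[0.0], [1.0], [2.0]]
    y = [1.0, 2.0, 3.0]
    preds = DummyModel().fit(X, y).predict(X)
    assert len(preds) == len(X)                       # output shape matches input
    assert all(isinstance(p, float) for p in preds)   # numeric predictions


test_fit_predict_shapes()
```

Real model tests would assert the same properties against the actual fit/predict API, plus the `forecast()` dictionary keys and `format_features()` array shapes listed above.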
## Coverage Requirements

| Metric | Value |
|---|---|
| Overall line coverage | ≥ 80% |
| Coverage tool | `pytest-cov` |
| CI enforcement | `--cov-fail-under=80` |
| Reporting | Codecov (XML upload) |
**Warning:** Pull requests that reduce coverage below 80% will fail the CI pipeline. Always add tests when introducing new functionality.
## Useful Commands

| Task | Command |
|---|---|
| Run all tests | `uv run pytest` |
| Run with verbose output | `uv run pytest -v` |
| Run specific file | `uv run pytest tests/unit/metrics/test_point_metrics.py` |
| Run by keyword | `uv run pytest -k "test_catboost"` |
| Show coverage in terminal | `uv run pytest --cov=twiga --cov-report=term-missing` |
| Generate HTML coverage | `uv run pytest --cov=twiga --cov-report=html` |
| Run linter | `uv run ruff check` |
| Run formatter | `uv run ruff format` |
| Run type checker | `uv run ty check` |
| Generate plot baselines | `uv run pytest --mpl-generate-path=tests/unit/plot/baseline tests/unit/plot/test_residuals.py` |
| Run image comparison tests | `uv run pytest --mpl tests/unit/plot/test_residuals.py` |
| Run all pre-commit hooks | `uv run pre-commit run --all-files` |