Forecast Results#

Source Files
  • twiga/forecaster/result.py - ForecastResult, ForecastCollection, ForecastKind, RawPrediction

When you call forecast(), predict(), or evaluate_*_forecast(), the forecaster returns its results as typed data structures. Understanding these containers helps you work with predictions, convert them to DataFrames, and extract specific forecast kinds.

ForecastKind#

Defines the type of forecast output:

from twiga.forecaster.result import ForecastKind

print(ForecastKind.POINT)           # "point"
print(ForecastKind.PARAMETRIC)      # "parametric"
print(ForecastKind.QUANTILE)        # "quantile"
print(ForecastKind.SAMPLES)         # "samples"
print(ForecastKind.INTERVAL)        # "interval"

| Kind | Description | Required Fields |
|---|---|---|
| POINT | Single point prediction (mean or median) | loc |
| PARAMETRIC | Mean and standard deviation | loc, scale |
| QUANTILE | Multiple quantile levels | loc, quantiles, quantile_levels |
| SAMPLES | Monte Carlo samples from the predictive distribution | loc, samples |
| INTERVAL | Lower and upper bounds (conformal or otherwise) | loc, lower, upper |
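
The required fields listed above lend themselves to a quick completeness check before a container is built. The following is a minimal sketch, not part of twiga: `Kind` is a stand-in enum that mirrors ForecastKind so the example runs without the library, and `REQUIRED_FIELDS` and `missing_fields` are hypothetical names introduced for illustration.

```python
from enum import Enum


class Kind(str, Enum):
    """Stand-in for ForecastKind so the sketch runs without twiga installed."""

    POINT = "point"
    PARAMETRIC = "parametric"
    QUANTILE = "quantile"
    SAMPLES = "samples"
    INTERVAL = "interval"


# Required fields per kind, copied from the table above.
REQUIRED_FIELDS = {
    Kind.POINT: ("loc",),
    Kind.PARAMETRIC: ("loc", "scale"),
    Kind.QUANTILE: ("loc", "quantiles", "quantile_levels"),
    Kind.SAMPLES: ("loc", "samples"),
    Kind.INTERVAL: ("loc", "lower", "upper"),
}


def missing_fields(kind, fields):
    """Return required field names that are absent or None (hypothetical helper)."""
    required = REQUIRED_FIELDS[Kind(kind)]  # accepts the enum member or its string value
    return [name for name in required if fields.get(name) is None]


print(missing_fields("parametric", {"loc": [0.0]}))  # ['scale']
print(missing_fields(Kind.POINT, {"loc": [0.0]}))    # []
```

Because the enum values are plain strings, the lookup works with either `Kind.PARAMETRIC` or `"parametric"`.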

RawPrediction#

An intermediate typed container for model output before inverse-scaling. It sits between the model’s raw output and the final ForecastResult.

Fields#

@dataclass
class RawPrediction:
    loc: np.ndarray                              # Required: (B, H, T)
    kind: ForecastKind = ForecastKind.POINT
    lower: np.ndarray | None = None              # For INTERVAL
    upper: np.ndarray | None = None              # For INTERVAL
    scale: np.ndarray | None = None              # For PARAMETRIC
    quantiles: np.ndarray | None = None          # For QUANTILE
    quantile_levels: list[float] | np.ndarray | None = None  # For QUANTILE
    conf_level: list[float] | np.ndarray | None = None       # Coverage levels for conformal methods
    samples: np.ndarray | None = None            # For SAMPLES

Creating a RawPrediction#

from twiga.forecaster.result import RawPrediction, ForecastKind
import numpy as np

# From a point forecast (ndarray)
raw = RawPrediction.from_model_output(np.random.randn(10, 24, 1))
# kind = POINT, loc shape = (10, 24, 1)

# From a dict with mean and scale
raw = RawPrediction.from_model_output({
    "loc": np.random.randn(10, 24, 1),
    "scale": np.abs(np.random.randn(10, 24, 1)),
})
# kind = PARAMETRIC

# From a dict with quantiles
raw = RawPrediction.from_model_output({
    "loc": np.random.randn(10, 24, 1),
    "quantiles": np.random.randn(10, 5, 24, 1),
    "quantile_levels": [0.1, 0.25, 0.5, 0.75, 0.9],
})
# kind = QUANTILE

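# From a conformal tuple (lower, loc, upper); each array has shape (B, H, T)
point_array = np.random.randn(10, 24, 1)
lower_array = point_array - 1.0   # placeholder bounds for illustration
upper_array = point_array + 1.0
raw = RawPrediction.from_model_output((
    lower_array,
    point_array,
    upper_array,
))
# kind = INTERVAL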
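
The dispatch rules that from_model_output() documents (ndarray, three-element tuple, or dict with a "loc" key) can be sketched in isolation. This is an illustration only, not the library's implementation: `infer_kind` is a hypothetical function, and the order in which the dict keys are checked is an assumption.

```python
import numpy as np


def infer_kind(output):
    """Guess the forecast kind from a raw model output (hypothetical helper)."""
    if isinstance(output, np.ndarray):
        return "point"
    if isinstance(output, tuple):
        if len(output) != 3:
            raise ValueError("tuple output must be (lower, loc, upper)")
        return "interval"
    if isinstance(output, dict):
        if "loc" not in output:
            raise ValueError('dict output must contain "loc"')
        # Assumed precedence; the library may check these keys in another order.
        if output.get("quantiles") is not None:
            return "quantile"
        if output.get("samples") is not None:
            return "samples"
        if output.get("scale") is not None:
            return "parametric"
        return "point"
    raise TypeError(f"unsupported output type: {type(output).__name__}")


loc = np.zeros((10, 24, 1))
print(infer_kind(loc))                                      # point
print(infer_kind({"loc": loc, "scale": np.ones_like(loc)}))  # parametric
print(infer_kind((loc - 1.0, loc, loc + 1.0)))              # interval
```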

ForecastResult#

The main output container for a single model’s forecast. It stores predictions, optional ground truth, and metadata, and provides conversion and evaluation methods.

Fields#

| Field | Type | Description |
|---|---|---|
| timestamps | np.ndarray (B, H, T) | Forecast timestamps |
| loc | np.ndarray (B, H, T) | Point predictions |
| targets | list[str] | Target variable names |
| model_name | str | Human-readable model identifier |
| kind | ForecastKind | Type of forecast |
| ground_truth | np.ndarray or None | Optional actual values |
| scale | np.ndarray or None | For PARAMETRIC: standard deviation |
| quantiles | np.ndarray or None | For QUANTILE: (B, N_q, H, T) |
| quantile_levels | list[float], np.ndarray or None | For QUANTILE: probability levels |
| samples | np.ndarray or None | For SAMPLES: (B, N_samples, H, T) |
| lower | np.ndarray or None | For INTERVAL: lower bound |
| upper | np.ndarray or None | For INTERVAL: upper bound |
| inference_time | float | Inference duration in seconds |

Creating a ForecastResult#

from twiga.forecaster.result import ForecastResult, ForecastKind
import numpy as np

# Point forecast
result = ForecastResult(
    timestamps=timestamps_array,  # (B, H, T)
    loc=predictions_array,        # (B, H, T)
    targets=["load_mw"],
    model_name="xgboost",
    kind=ForecastKind.POINT,
)

# Quantile forecast
result = ForecastResult(
    timestamps=timestamps_array,
    loc=median_quantile,          # (B, H, T)
    targets=["load_mw"],
    model_name="qr_xgboost",
    kind=ForecastKind.QUANTILE,
    quantiles=all_quantiles,      # (B, 5, H, T)
    quantile_levels=[0.1, 0.25, 0.5, 0.75, 0.9],
)

# Parametric forecast
result = ForecastResult(
    timestamps=timestamps_array,
    loc=mean_array,
    targets=["load_mw"],
    model_name="mlpf_normal",
    kind=ForecastKind.PARAMETRIC,
    scale=std_array,              # (B, H, T)
)

# With ground truth
result = ForecastResult(
    timestamps=timestamps_array,
    loc=predictions_array,
    targets=["load_mw"],
    model_name="xgboost",
    kind=ForecastKind.POINT,
    ground_truth=actual_values,   # (B, H, T)
)
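
The arrays passed in the examples above (timestamps_array, predictions_array, all_quantiles, and so on) are placeholders. One way to build inputs with the expected shapes is sketched below; the hourly frequency, the non-overlapping windows, and the random values are all assumptions made for illustration.

```python
import numpy as np

B, H, T = 3, 24, 1  # forecast windows, horizon steps, targets
rng = np.random.default_rng(0)

# Hourly timestamps; window b starts b * H hours after `start` (assumed layout).
start = np.datetime64("2024-01-01T00:00", "ns")
offsets = np.arange(B)[:, None] * H + np.arange(H)[None, :]   # (B, H)
timestamps_array = (start + offsets * np.timedelta64(1, "h"))[:, :, None]
timestamps_array = timestamps_array.repeat(T, axis=2)          # (B, H, T)

# Placeholder point predictions and actuals.
predictions_array = rng.normal(1200.0, 50.0, size=(B, H, T))
actual_values = predictions_array + rng.normal(0.0, 10.0, size=(B, H, T))

# Quantiles sorted along the quantile axis so they never cross.
quantile_levels = [0.1, 0.25, 0.5, 0.75, 0.9]
all_quantiles = np.sort(
    rng.normal(1200.0, 50.0, size=(B, len(quantile_levels), H, T)), axis=1
)                                                              # (B, N_q, H, T)
median_quantile = all_quantiles[:, quantile_levels.index(0.5)]  # (B, H, T)

print(timestamps_array.shape, all_quantiles.shape, median_quantile.shape)
```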

Converting to DataFrame#

Convert a result to a tidy DataFrame in long or wide format:

# Long format (default)
df = result.to_dataframe(fmt="long")
# Columns: timestamp, target, model, forecast, [actual], [scale|lower/upper|q_level/quantile_forecast]

# Wide format for quantiles
df = result.to_dataframe(fmt="wide")
# Columns: timestamp, target, model, forecast, [actual], q_0.10, q_0.25, q_0.50, q_0.75, q_0.90
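
To make the long layout concrete, the sketch below flattens (B, H, T) arrays into one row per (window, step, target). It is an illustration of the idea with pandas, not the code behind to_dataframe(); `to_long` is a hypothetical helper, and the row order is an assumption.

```python
import numpy as np
import pandas as pd


def to_long(timestamps, loc, targets, model_name):
    """Flatten (B, H, T) arrays into a long DataFrame (hypothetical helper)."""
    B, H, T = loc.shape
    return pd.DataFrame(
        {
            "timestamp": timestamps.reshape(-1),
            # In C order the last axis (targets) varies fastest, so tile the names.
            "target": np.tile(targets, B * H),
            "model": model_name,
            "forecast": loc.reshape(-1),
        }
    )


B, H, T = 2, 3, 2
steps = np.timedelta64(1, "D") * np.arange(B * H)
timestamps = (np.datetime64("2024-01-01", "ns") + steps).reshape(B, H, 1).repeat(T, axis=2)
loc = np.arange(B * H * T, dtype=float).reshape(B, H, T)

df = to_long(timestamps, loc, ["load_mw", "temp_c"], "xgboost")
print(df.head())
```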

Evaluating a Result#

If ground truth is attached, evaluate it directly:

metrics_df = result.evaluate()
# Returns a DataFrame of kind-appropriate metrics (e.g. MAE and RMSE for a point forecast)

Or pass new ground truth:

metrics_df = result.evaluate(ground_truth=new_actual_values)
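
For intuition about what point metrics look like on (B, H, T) arrays, here is a minimal NumPy sketch that computes MAE and RMSE per target. It does not reproduce evaluate(), which forwards to twiga.core.metrics.evaluate_forecast(); `point_metrics` is a hypothetical helper, and averaging over all windows and steps is an assumption.

```python
import numpy as np


def point_metrics(loc, ground_truth):
    """Per-target MAE and RMSE over all windows and steps (hypothetical helper)."""
    err = loc - ground_truth                     # (B, H, T)
    return {
        "mae": np.abs(err).mean(axis=(0, 1)),    # shape (T,)
        "rmse": np.sqrt((err ** 2).mean(axis=(0, 1))),
    }


ground_truth = np.zeros((2, 3, 1))
loc = np.full((2, 3, 1), 2.0)                    # every prediction is off by 2
print(point_metrics(loc, ground_truth))
```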

ForecastCollection#

Container for multiple ForecastResult objects, one per model.

Fields#

@dataclass
class ForecastCollection:
    results: dict[str, ForecastResult]

Creating and Using a Collection#

from twiga.forecaster.result import ForecastCollection

collection = ForecastCollection()
collection.add(result_model_1)
collection.add(result_model_2)

# Access by model name
result = collection["xgboost"]

# Iterate
for result in collection:
    print(result.model_name, result.loc.shape)

# Get all model names
names = collection.model_names
# Output: ["xgboost", "mlpf_normal", ...]

# Check if a model exists
if "qr_xgboost" in collection:
    print("Found")
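
The usage above implies dict-like behaviour keyed by model_name. The stand-in below mimics that behaviour so it can be explored without twiga; `Result` and `Collection` are hypothetical classes written for this sketch, and the real ForecastCollection may differ in detail.

```python
from dataclasses import dataclass, field


@dataclass
class Result:
    """Stand-in for ForecastResult with only the fields this sketch needs."""

    model_name: str
    loc: list


@dataclass
class Collection:
    """Stand-in for ForecastCollection (behaviour inferred from the usage above)."""

    results: dict = field(default_factory=dict)

    def add(self, result):
        # Adding a result with an existing model_name replaces the old entry.
        self.results[result.model_name] = result

    def __getitem__(self, name):
        return self.results[name]

    def __iter__(self):
        return iter(self.results.values())

    def __contains__(self, name):
        return name in self.results

    @property
    def model_names(self):
        return list(self.results)


collection = Collection()
collection.add(Result("xgboost", [1.0, 2.0]))
collection.add(Result("mlpf_normal", [1.1, 2.1]))
collection.add(Result("xgboost", [1.5, 2.5]))  # replaces the first entry

print(collection.model_names)       # ['xgboost', 'mlpf_normal']
print(collection["xgboost"].loc)    # [1.5, 2.5]
print("qr_xgboost" in collection)   # False
```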

Converting to DataFrame#

Combine all results into one DataFrame:

# Long format
df = collection.to_dataframe(fmt="long")
# Rows: all forecasts from all models

# Wide format
df = collection.to_dataframe(fmt="wide")
# Columns: timestamp, target, model, forecast, q_0.10, q_0.25, ...

Evaluating All Models#

metrics_df = collection.evaluate()
# Returns a combined metrics DataFrame with a "Model" column identifying each model

Shape Conventions#

All arrays follow this shape convention:

  • B = batch size (number of forecast sequences)

  • H = horizon (number of forecast steps)

  • T = number of targets (usually 1 for univariate forecasting)

  • N_q = number of quantiles

  • N_samples = number of Monte Carlo samples

| Array | Shape | Meaning |
|---|---|---|
| loc | (B, H, T) | Point forecast for each step and target |
| ground_truth | (B, H, T) | Actual observed values |
| scale | (B, H, T) | Parametric std (one value per step) |
| quantiles | (B, N_q, H, T) | Quantile forecasts in order: [q_0.1, q_0.5, q_0.9, …] |
| samples | (B, N_samples, H, T) | MC samples; summarized to quantiles for DataFrames |
| lower, upper | (B, H, T) | Interval bounds |
| timestamps | (B, H, T) | Forecast datetime for each step (stored as nanoseconds) |
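
The note that samples are summarized to quantiles can be illustrated with NumPy. The sketch below reduces a (B, N_samples, H, T) array to the (B, N_q, H, T) quantile layout; the levels 0.1, 0.5, 0.9 follow the SAMPLES columns documented for to_dataframe(), while the use of np.quantile with its default interpolation is an assumption about how the summary might be computed.

```python
import numpy as np

rng = np.random.default_rng(0)
B, N_samples, H, T = 4, 200, 24, 1
samples = rng.normal(size=(B, N_samples, H, T))

levels = [0.1, 0.5, 0.9]
# np.quantile puts the quantile axis first: (N_q, B, H, T).
q = np.quantile(samples, levels, axis=1)
# Move it to position 1 to match the documented (B, N_q, H, T) convention.
quantiles = np.moveaxis(q, 0, 1)

# A point forecast derived from the same samples (median over the sample axis).
loc = np.median(samples, axis=1)    # (B, H, T)

print(quantiles.shape, loc.shape)   # (4, 3, 24, 1) (4, 24, 1)
```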

DataFrame Output Examples#

Point Forecast#

   timestamp   target    model  forecast  actual
0 2024-01-01  load_mw  xgboost    1234.5  1250.2
1 2024-01-02  load_mw  xgboost    1256.3  1270.1

Quantile Forecast (Wide)#

   timestamp   target  model  forecast  actual  q_0.10  q_0.25  q_0.50  q_0.75  q_0.90
0 2024-01-01  load_mw  qrxgb    1234.5  1250.2  1150.0  1200.0  1234.5  1270.0  1320.0
1 2024-01-02  load_mw  qrxgb    1256.3  1270.1  1180.0  1220.0  1256.3  1290.0  1340.0
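
The wide column names follow a q_<level> pattern with two decimals. A small sketch of how such columns could be derived from a (B, N_q, H, T) array is shown below; the naming format is inferred from the output above, and the flattening order is an assumption.

```python
import numpy as np

quantile_levels = [0.1, 0.25, 0.5, 0.75, 0.9]
B, H, T = 2, 3, 1
quantiles = np.arange(B * len(quantile_levels) * H * T, dtype=float).reshape(
    B, len(quantile_levels), H, T
)

# One flattened (B * H * T,) column per quantile level.
wide_columns = {
    f"q_{level:.2f}": quantiles[:, i].reshape(-1)
    for i, level in enumerate(quantile_levels)
}

print(list(wide_columns))  # ['q_0.10', 'q_0.25', 'q_0.50', 'q_0.75', 'q_0.90']
```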

Quantile Forecast (Long)#

   timestamp   target  model  forecast  actual  q_level  quantile_forecast
0 2024-01-01  load_mw  qrxgb    1234.5  1250.2     0.10             1150.0
1 2024-01-01  load_mw  qrxgb    1234.5  1250.2     0.50             1234.5
2 2024-01-01  load_mw  qrxgb    1234.5  1250.2     0.90             1320.0

Interval Forecast#

   timestamp   target      model  forecast  actual  lower  upper
0 2024-01-01  load_mw  conformal    1234.5  1250.2   1150   1320
1 2024-01-02  load_mw  conformal    1256.3  1270.1   1180   1340
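
Once the lower, upper, and actual columns are available, simple interval diagnostics follow directly. The sketch below computes empirical coverage and mean interval width for the two rows shown above; these two diagnostics are chosen for illustration and are not necessarily what evaluate_interval_forecast() reports.

```python
import numpy as np

# Values copied from the two example rows above.
actual = np.array([1250.2, 1270.1])
lower = np.array([1150.0, 1180.0])
upper = np.array([1320.0, 1340.0])

inside = (actual >= lower) & (actual <= upper)
coverage = inside.mean()               # fraction of actuals inside the interval
mean_width = (upper - lower).mean()    # average interval width

print(coverage, mean_width)  # 1.0 165.0
```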

Accessing Results from TwigaForecaster#

Methods that return results:

# Returns a ForecastCollection with one result per model
collection = forecaster.forecast(test_df)
result = collection["xgboost"]

# Returns results_df and metrics_df
results_df, metrics_df = forecaster.evaluate_point_forecast(test_df)

results_df, metrics_df = forecaster.evaluate_interval_forecast(test_df)

results_df, metrics_df = forecaster.evaluate_quantile_forecast(test_df)

API Reference#

ForecastKind#

class twiga.forecaster.result.ForecastKind(*values)#

Bases: StrEnum

Supported forecast output types.

Values are strings and can be used directly as dict keys.

INTERVAL = 'interval'#
PARAMETRIC = 'parametric'#
POINT = 'point'#
QUANTILE = 'quantile'#
SAMPLES = 'samples'#

RawPrediction#

class twiga.forecaster.result.RawPrediction(loc, kind=ForecastKind.POINT, lower=None, upper=None, scale=None, quantiles=None, quantile_levels=None, conf_level=None, samples=None)#

Bases: object

Typed intermediate container for unscaled model output.

Sits between the model’s forward() / forecast() call and the inverse-scaling step, replacing the untyped np.ndarray | dict | tuple union. Only loc is required; the remaining fields are populated according to the model’s output kind.

Variables:
  • loc – Point predictions (mean/median), shape (B, H, T).

  • kind – Determines which optional arrays are expected.

  • scale – Parametric std-dev / scale, same shape as loc.

  • quantiles – shape (B, N_q, H, T).

  • quantile_levels – Corresponding probability levels.

  • samples – shape (B, N_samples, H, T).

  • conf_level – Coverage levels for conformal methods.

conf_level: list[float] | ndarray | None = None#
classmethod from_model_output(output)#

Build a RawPrediction from a raw model forecast() output.

Supports:
  • np.ndarray - point forecast.

  • tuple[lower, loc, upper] - conformal interval (3-element).

  • dict with key "loc" - parametric / quantile / sample / point.

Parameters:

output (ndarray | dict[str, ndarray] | tuple[ndarray, ...]) – Raw model output in one of the supported formats.

Return type:

RawPrediction

Returns:

A typed RawPrediction instance.

Raises:
  • TypeError – If output is not a supported type.

  • ValueError – If a dict lacks "loc", or a tuple does not have exactly three elements.

kind: ForecastKind = 'point'#
loc: ndarray#
lower: ndarray | None = None#
quantile_levels: list[float] | ndarray | None = None#
quantiles: ndarray | None = None#
samples: ndarray | None = None#
scale: ndarray | None = None#
upper: ndarray | None = None#

ForecastResult#

class twiga.forecaster.result.ForecastResult(timestamps, loc, targets, model_name, kind, ground_truth=None, scale=None, quantiles=None, quantile_levels=None, conf_level=None, samples=None, lower=None, upper=None, inference_time=0.0)#

Bases: object

Container for one model’s forecast output.

Variables:
  • timestamps – shape (n_batch, n_horizon, n_targets)

  • loc – point predictions (mean/median), shape (n_batch, n_horizon, n_targets)

  • targets – ordered list of target variable names

  • model_name – human-readable model identifier

  • kind – determines which optional arrays are expected and how to convert

  • ground_truth – optional, same shape as loc

  • scale – parametric std-dev / scale, same shape as loc

  • quantiles – shape (n_batch, n_q, n_horizon, n_targets)

  • quantile_levels – corresponding probability levels (e.g. [0.1, 0.5, 0.9])

  • samples – shape (n_batch, n_samples, n_horizon, n_targets)

  • lower – lower bound, same shape as loc

  • upper – upper bound, same shape as loc

  • inference_time – inference duration in seconds

  • conf_level

  • metric_name

conf_level: list[float] | ndarray | None = None#
evaluate(ground_truth=None, **kwargs)#

Evaluate forecast against ground truth using kind-appropriate metrics.

Forwards to twiga.core.metrics.evaluate_forecast().

Parameters:
  • ground_truth (ndarray | None) – shape (n_batch, n_horizon, n_targets). When omitted the ground_truth stored on the result is used.

  • **kwargs – forwarded to the underlying evaluate function.

Return type:

DataFrame

Returns:

DataFrame of per-day, per-target metrics.

Raises:

ValueError – if no ground truth is available.

ground_truth: ndarray | None = None#
inference_time: float = 0.0#
kind: ForecastKind#
loc: ndarray#
lower: ndarray | None = None#
model_name: str#
quantile_levels: list[float] | ndarray | None = None#
quantiles: ndarray | None = None#
samples: ndarray | None = None#
scale: ndarray | None = None#
targets: list[str]#
timestamps: ndarray#
to_dataframe(fmt='long')#

Convert forecast to tidy DataFrame.

Always includes: timestamp, target, model, forecast. Optional: actual (when ground_truth is present).

Additional columns depend on forecast kind:

  • POINT: no extra columns

  • PARAMETRIC: scale

  • INTERVAL: lower, upper

  • QUANTILE (fmt="wide"): q_0.10, q_0.50, …

  • QUANTILE (fmt="long"): q_level, quantile_forecast

  • SAMPLES: q_0.10, q_0.50, q_0.90 (empirical quantiles)

Parameters:

fmt (str) – "long" (default) or "wide" - only affects QUANTILE

Return type:

DataFrame

Returns:

pandas DataFrame in long or wide format

Raises:

ValueError – if fmt is invalid

upper: ndarray | None = None#

ForecastCollection#

class twiga.forecaster.result.ForecastCollection(results=<factory>)#

Bases: object

Collection of ForecastResult objects from multiple models.

add(result)#

Add or replace result using its model_name as key.

Return type:

None

evaluate(**kwargs)#

Evaluate all models and return a combined metrics DataFrame.

Calls ForecastResult.evaluate() on each result and concatenates the output, adding a "Model" column derived from each result’s model_name. Ground truth must be attached to each result (i.e. forecast() must have been called with test data that contains the target column).

Parameters:

**kwargs – Forwarded to each ForecastResult.evaluate() call (e.g. metric_names, freq).

Return type:

DataFrame

Returns:

Combined metrics DataFrame with a "Model" column.

Raises:

ValueError – If the collection is empty or any result lacks ground truth.

property model_names: list[str]#
results: dict[str, ForecastResult]#
to_dataframe(fmt='long')#

Concatenate all model forecasts into one DataFrame.

Parameters:

fmt (str) – passed to each ForecastResult.to_dataframe()

Return type:

DataFrame

Returns:

Combined DataFrame in the requested format

Raises:

ValueError – if collection is empty

See Also#