Forecast Results

Source Files

twiga/forecaster/result.py - ForecastResult, ForecastCollection, ForecastKind, RawPrediction

When you call forecast() or predict(), the forecaster returns its results as typed data structures. The evaluate_*_forecast() methods return DataFrames instead (see Accessing Results from TwigaForecaster). Understanding these containers helps you work with predictions, convert them to DataFrames, and extract specific forecast kinds.

ForecastKind

Defines the type of forecast output:

from twiga.forecaster.result import ForecastKind

print(ForecastKind.POINT)           # point
print(ForecastKind.PARAMETRIC)      # parametric
print(ForecastKind.QUANTILE)        # quantile
print(ForecastKind.SAMPLES)         # samples
print(ForecastKind.INTERVAL)        # interval

Kind	Description	Required Fields
`POINT`	Single point prediction (mean or median)	`loc`
`PARAMETRIC`	Mean and standard deviation	`loc`, `scale`
`QUANTILE`	Multiple quantile levels	`loc`, `quantiles`, `quantile_levels`
`SAMPLES`	Monte Carlo samples from the predictive distribution	`loc`, `samples`
`INTERVAL`	Lower and upper bounds (conformal or otherwise)	`loc`, `lower`, `upper`
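The "Required Fields" column can be read as a simple lookup from kind to field names. The sketch below encodes that lookup and reports which required fields are missing. It is a minimal illustration built only from the table above. `Kind`, `REQUIRED_FIELDS`, and `missing_fields` are hypothetical names that are not part of twiga. `Kind` is a local stand-in that mirrors the string values of ForecastKind.

```python
from enum import Enum


class Kind(str, Enum):
    """Hypothetical stand-in mirroring ForecastKind's string values."""
    POINT = "point"
    PARAMETRIC = "parametric"
    QUANTILE = "quantile"
    SAMPLES = "samples"
    INTERVAL = "interval"


# Required fields per kind, transcribed from the table above.
REQUIRED_FIELDS = {
    Kind.POINT: {"loc"},
    Kind.PARAMETRIC: {"loc", "scale"},
    Kind.QUANTILE: {"loc", "quantiles", "quantile_levels"},
    Kind.SAMPLES: {"loc", "samples"},
    Kind.INTERVAL: {"loc", "lower", "upper"},
}


def missing_fields(kind, **fields):
    """Return the required fields that are absent or None for this kind."""
    present = {name for name, value in fields.items() if value is not None}
    return sorted(REQUIRED_FIELDS[Kind(kind)] - present)


print(missing_fields("interval", loc=[1.0], lower=[0.5]))
```

Whether twiga itself validates these fields when a result is constructed is not documented on this page. Treat the check as a way to read the table, not as library behaviour.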

RawPrediction

RawPrediction is an intermediate typed container for model output before inverse-scaling. It sits between the model’s raw output and the final ForecastResult.

Fields

@dataclass
class RawPrediction:
    loc: np.ndarray                              # Required: (B, H, T)
    kind: ForecastKind = ForecastKind.POINT
    lower: np.ndarray | None = None              # For INTERVAL
    upper: np.ndarray | None = None              # For INTERVAL
    scale: np.ndarray | None = None              # For PARAMETRIC
    quantiles: np.ndarray | None = None          # For QUANTILE
    quantile_levels: list[float] | np.ndarray | None = None  # For QUANTILE
    conf_level: list[float] | np.ndarray | None = None       # Conformal coverage levels
    samples: np.ndarray | None = None            # For SAMPLES
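The field comments above imply a set of shape relationships. `loc`, `scale`, `lower`, and `upper` are all (B, H, T). `quantiles` and `samples` add a second axis, giving (B, N_q, H, T) and (B, N_samples, H, T). The sketch below checks those relationships with NumPy. `check_shapes` is a hypothetical helper written for this page, not a twiga function. Requiring `len(quantile_levels)` to equal N_q is an assumption that follows from the layout.

```python
import numpy as np


def check_shapes(loc, scale=None, lower=None, upper=None,
                 quantiles=None, quantile_levels=None, samples=None):
    """Check optional arrays against the (B, H, T) layout of loc.

    Hypothetical helper for illustration; not part of twiga's API.
    """
    if loc.ndim != 3:
        raise ValueError(f"loc must be (B, H, T), got {loc.shape}")
    B, H, T = loc.shape
    # scale, lower and upper share loc's shape exactly.
    for name, arr in (("scale", scale), ("lower", lower), ("upper", upper)):
        if arr is not None and arr.shape != loc.shape:
            raise ValueError(f"{name} must match loc {loc.shape}, got {arr.shape}")
    # quantiles insert the N_q axis at position 1.
    if quantiles is not None:
        if quantiles.ndim != 4 or quantiles.shape[0] != B or quantiles.shape[2:] != (H, T):
            raise ValueError(f"quantiles must be (B, N_q, H, T), got {quantiles.shape}")
        if quantile_levels is not None and len(quantile_levels) != quantiles.shape[1]:
            raise ValueError("len(quantile_levels) must equal N_q")
    # samples insert the N_samples axis at position 1.
    if samples is not None:
        if samples.ndim != 4 or samples.shape[0] != B or samples.shape[2:] != (H, T):
            raise ValueError(f"samples must be (B, N_samples, H, T), got {samples.shape}")
```

The check returns None on success and raises ValueError on the first mismatch.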

Creating a RawPrediction

from twiga.forecaster.result import RawPrediction, ForecastKind
import numpy as np

# From a point forecast (ndarray)
raw = RawPrediction.from_model_output(np.random.randn(10, 24, 1))
# kind = POINT, loc shape = (10, 24, 1)

# From a dict with mean and scale
raw = RawPrediction.from_model_output({
    "loc": np.random.randn(10, 24, 1),
    "scale": np.abs(np.random.randn(10, 24, 1)),
})
# kind = PARAMETRIC

# From a dict with quantiles
raw = RawPrediction.from_model_output({
    "loc": np.random.randn(10, 24, 1),
    "quantiles": np.random.randn(10, 5, 24, 1),
    "quantile_levels": [0.1, 0.25, 0.5, 0.75, 0.9],
})
# kind = QUANTILE

# From a conformal tuple (lower, loc, upper)
point_array = np.random.randn(10, 24, 1)
raw = RawPrediction.from_model_output((
    point_array - 1.0,   # lower bound
    point_array,         # loc
    point_array + 1.0,   # upper bound
))
# kind = INTERVAL
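The API reference for from_model_output lists three accepted input types. An ndarray is a point forecast. A 3-element (lower, loc, upper) tuple is an interval. A dict with a "loc" key can be parametric, quantile, sample, or point. Any other type raises TypeError, and a malformed dict or tuple raises ValueError. The sketch below reproduces that dispatch and returns only the inferred kind name. `infer_kind` is hypothetical. The order in which the dict keys are checked (scale, then quantiles, then samples) is an assumption, not documented twiga behaviour.

```python
import numpy as np


def infer_kind(output):
    """Infer the forecast kind from a raw model output (hypothetical sketch).

    Mirrors the documented input formats of RawPrediction.from_model_output.
    The precedence among dict keys is an assumption.
    """
    if isinstance(output, np.ndarray):
        return "point"
    if isinstance(output, tuple):
        if len(output) != 3:
            raise ValueError("tuple output must be (lower, loc, upper)")
        return "interval"
    if isinstance(output, dict):
        if "loc" not in output:
            raise ValueError('dict output must contain "loc"')
        if output.get("scale") is not None:
            return "parametric"
        if output.get("quantiles") is not None:
            return "quantile"
        if output.get("samples") is not None:
            return "samples"
        return "point"
    raise TypeError(f"unsupported output type: {type(output).__name__}")
```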

ForecastResult

The main output container for a single model’s forecast. It stores the predictions, optional ground truth, and metadata, and provides methods for conversion to DataFrames and for evaluation.

Fields

Field	Type	Description
`timestamps`	`np.ndarray` (B, H, T)	Forecast timestamps
`loc`	`np.ndarray` (B, H, T)	Point predictions
`targets`	`list[str]`	Target variable names
`model_name`	`str`	Human-readable model identifier
`kind`	`ForecastKind`	Type of forecast
`ground_truth`	`np.ndarray` or `None`	Optional actual values
`scale`	`np.ndarray` or `None`	For PARAMETRIC: standard deviation
`quantiles`	`np.ndarray` or `None`	For QUANTILE: (B, N_q, H, T)
`quantile_levels`	`list[float]` or `None`	For QUANTILE: probability levels
`samples`	`np.ndarray` or `None`	For SAMPLES: (B, N_samples, H, T)
`lower`	`np.ndarray` or `None`	For INTERVAL: lower bound
`upper`	`np.ndarray` or `None`	For INTERVAL: upper bound
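`conf_level`	`list[float]`, `np.ndarray`, or `None`	For INTERVAL: coverage levels for conformal methods
`inference_time`	`float`	Inference duration in seconds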

Creating a ForecastResult

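from twiga.forecaster.result import ForecastResult, ForecastKind
import numpy as np

# timestamps_array, predictions_array, median_quantile, all_quantiles,
# mean_array, std_array and actual_values are NumPy arrays prepared
# beforehand, with the shapes noted in the comments below.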

# Point forecast
result = ForecastResult(
    timestamps=timestamps_array,  # (B, H, T)
    loc=predictions_array,        # (B, H, T)
    targets=["load_mw"],
    model_name="xgboost",
    kind=ForecastKind.POINT,
)

# Quantile forecast
result = ForecastResult(
    timestamps=timestamps_array,
    loc=median_quantile,          # (B, H, T)
    targets=["load_mw"],
    model_name="qr_xgboost",
    kind=ForecastKind.QUANTILE,
    quantiles=all_quantiles,      # (B, 5, H, T)
    quantile_levels=[0.1, 0.25, 0.5, 0.75, 0.9],
)

# Parametric forecast
result = ForecastResult(
    timestamps=timestamps_array,
    loc=mean_array,
    targets=["load_mw"],
    model_name="mlpf_normal",
    kind=ForecastKind.PARAMETRIC,
    scale=std_array,              # (B, H, T)
)

# With ground truth
result = ForecastResult(
    timestamps=timestamps_array,
    loc=predictions_array,
    targets=["load_mw"],
    model_name="xgboost",
    kind=ForecastKind.POINT,
    ground_truth=actual_values,   # (B, H, T)
)
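The snippets above assume the input arrays already exist. The sketch below prepares arrays with the documented shapes using only NumPy. Several choices are assumptions made for illustration: the sizes, the hourly frequency, one forecast origin per day, and passing timestamps as `datetime64[ns]` values (the shape table only says timestamps are stored as nanoseconds). `loc` for the quantile forecast is taken as the 0.5 quantile, matching the `median_quantile` name used above.

```python
import numpy as np

# Illustrative sizes: 10 forecast sequences, 24 hourly steps, 1 target.
B, H, T = 10, 24, 1
levels = [0.1, 0.25, 0.5, 0.75, 0.9]
rng = np.random.default_rng(0)

# Hourly timestamps; each sequence starts one day after the previous one.
# Assumption: datetime64[ns] values broadcast to (B, H, T).
origins = np.datetime64("2024-01-01T00:00", "ns") + np.arange(B) * np.timedelta64(1, "D")
steps = np.arange(H) * np.timedelta64(1, "h")
timestamps_array = np.broadcast_to(
    (origins[:, None] + steps[None, :])[:, :, None], (B, H, T)
)

# Parametric inputs: mean and standard deviation, both (B, H, T).
mean_array = rng.normal(1000.0, 50.0, size=(B, H, T))
std_array = np.full((B, H, T), 25.0)

# Quantiles stacked on axis 1 in the order of `levels`: (B, N_q, H, T).
# Sorting along axis 1 keeps the quantiles from crossing.
all_quantiles = np.sort(rng.normal(1000.0, 50.0, size=(B, len(levels), H, T)), axis=1)
median_quantile = all_quantiles[:, levels.index(0.5)]  # (B, H, T)
```

predictions_array and actual_values can be generated the same way as mean_array.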

Converting to DataFrame

Convert a result to a tidy DataFrame in long (default) or wide format:

# Long format (default)
df = result.to_dataframe(fmt="long")
# Columns: timestamp, target, model, forecast, [actual], [scale|lower/upper|q_level/quantile_forecast]

# Wide format (fmt only changes the layout of QUANTILE forecasts)
df = result.to_dataframe(fmt="wide")
# Columns: timestamp, target, model, forecast, [actual], q_0.10, q_0.25, q_0.50, q_0.75, q_0.90
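In long format, each (sequence, step, target) cell of the (B, H, T) arrays becomes one row. For a POINT forecast, that row holds the timestamp, target name, model name, forecast, and, when ground truth is present, actual. The sketch below flattens the arrays into plain dict records, which can be passed to `pandas.DataFrame(...)`. `point_to_long_records` is a hypothetical helper, not twiga's to_dataframe implementation, and it covers only the POINT case.

```python
import numpy as np


def point_to_long_records(timestamps, loc, targets, model_name, ground_truth=None):
    """Flatten (B, H, T) arrays into one record per (sequence, step, target).

    Hypothetical sketch of a long-format layout for POINT forecasts.
    """
    records = []
    # np.ndindex walks the indices in C order, so the target index varies fastest.
    for b, h, t in np.ndindex(loc.shape):
        row = {
            "timestamp": timestamps[b, h, t],
            "target": targets[t],
            "model": model_name,
            "forecast": float(loc[b, h, t]),
        }
        if ground_truth is not None:
            row["actual"] = float(ground_truth[b, h, t])
        records.append(row)
    return records
```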

Evaluating a Result

If ground truth is attached to the result, evaluate the forecast directly:

metrics_df = result.evaluate()
# Returns a DataFrame of per-day, per-target metrics suited to result.kind
# (point metrics such as MAE and RMSE for a POINT forecast)

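Alternatively, pass ground truth explicitly with shape (B, H, T). The ground_truth stored on the result is used only when this argument is omitted: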

metrics_df = result.evaluate(ground_truth=new_actual_values)
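For a POINT forecast, the metrics reduce to comparing `loc` with the ground truth cell by cell. The sketch below computes per-target MAE and RMSE over (B, H, T) arrays with NumPy, averaging over the batch and horizon axes, and raises ValueError when no ground truth is available. This is a simplification. The real evaluate() forwards to twiga.core.metrics.evaluate_forecast(), which reports per-day metrics. The exact metric set and grouping are not reproduced here, and `point_metrics` is a hypothetical name.

```python
import numpy as np


def point_metrics(loc, ground_truth, targets):
    """Per-target MAE and RMSE over (B, H, T) arrays (illustrative only)."""
    if ground_truth is None:
        raise ValueError("no ground truth available")
    err = loc - ground_truth                      # (B, H, T)
    mae = np.abs(err).mean(axis=(0, 1))           # (T,)
    rmse = np.sqrt((err ** 2).mean(axis=(0, 1)))  # (T,)
    return {
        name: {"MAE": float(mae[i]), "RMSE": float(rmse[i])}
        for i, name in enumerate(targets)
    }
```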

ForecastCollection

Container for multiple ForecastResult objects, one per model.

Fields

@dataclass
class ForecastCollection:
    results: dict[str, ForecastResult]

Creating and Using a Collection

from twiga.forecaster.result import ForecastCollection

collection = ForecastCollection()
collection.add(result_model_1)
collection.add(result_model_2)

# Access by model name
result = collection["xgboost"]

# Iterate
for result in collection:
    print(result.model_name, result.loc.shape)

# Get all model names
names = collection.model_names
# Output: ["xgboost", "mlpf_normal", ...]

# Check if a model exists
if "qr_xgboost" in collection:
    print("Found")
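The usage above behaves like a dictionary keyed by model_name. add() inserts a result or replaces one with the same name. Indexing looks a result up by name, iteration yields the results, and `in` checks for a model name. The sketch below reproduces those semantics with a plain dict. `Result` and `Collection` are hypothetical stand-ins, not twiga's classes. Which dunder methods ForecastCollection actually defines is not listed in the API reference below.

```python
from dataclasses import dataclass, field


@dataclass
class Result:
    """Minimal hypothetical stand-in for ForecastResult."""
    model_name: str


@dataclass
class Collection:
    """Hypothetical sketch of the collection behaviour shown above."""
    results: dict = field(default_factory=dict)

    def add(self, result):
        # A result whose model_name already exists replaces the old one.
        self.results[result.model_name] = result

    def __getitem__(self, name):
        return self.results[name]

    def __iter__(self):
        # Iterate over results, not names.
        return iter(self.results.values())

    def __contains__(self, name):
        return name in self.results

    @property
    def model_names(self):
        return list(self.results)
```

Because Python dicts preserve insertion order, replacing a result keeps the model in its original position in model_names.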

Converting to DataFrame

Combine all results into one DataFrame:

# Long format
df = collection.to_dataframe(fmt="long")
# Rows: all forecasts from all models

# Wide format
df = collection.to_dataframe(fmt="wide")
# Columns: timestamp, target, model, forecast, [actual], q_0.10, q_0.25, ...

Evaluating All Models

metrics_df = collection.evaluate()
# Returns one DataFrame concatenating each model's metrics, with a "Model"
# column identifying the model (every result must have ground truth attached)

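ForecastCollection.evaluate() calls evaluate() on each result and concatenates the outputs, tagging every row with a "Model" column. It raises ValueError if the collection is empty or any result lacks ground truth. The sketch below models that combination step with plain dict rows. `combine_metrics` is hypothetical. Representing a result without ground truth as `None` is an assumption made so the sketch can run without twiga.

```python
def combine_metrics(per_model):
    """Concatenate per-model metric rows, adding a "Model" column.

    Hypothetical sketch: `per_model` maps model_name -> list of metric rows,
    or None when that model's result has no ground truth attached.
    """
    if not per_model:
        raise ValueError("collection is empty")
    combined = []
    for model_name, rows in per_model.items():
        if rows is None:
            raise ValueError(f"{model_name} has no ground truth")
        combined.extend({**row, "Model": model_name} for row in rows)
    return combined
```

With pandas DataFrames, the same step would be `pd.concat([df.assign(Model=name) for name, df in ...])`. A model with several per-day rows contributes several rows to the combined output, not just one.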
Shape Conventions

All arrays follow this shape convention:

B = batch size (number of forecast sequences)
H = horizon (number of forecast steps)
T = number of targets (usually 1 for univariate forecasting)
N_q = number of quantiles
N_samples = number of Monte Carlo samples

Array	Shape	Meaning
`loc`	(B, H, T)	Point forecast for each step and target
`ground_truth`	(B, H, T)	Actual observed values
`scale`	(B, H, T)	Parametric std (one value per step and target)
`quantiles`	(B, N_q, H, T)	Quantile forecasts stacked along axis 1 in the order of quantile_levels, e.g. [q_0.1, q_0.5, q_0.9]
`samples`	(B, N_samples, H, T)	MC samples; summarized to quantiles for DataFrames
`lower`, `upper`	(B, H, T)	Interval bounds
`timestamps`	(B, H, T)	Forecast datetime for each step (stored as nanoseconds)
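Two rows of the table involve a conversion. Samples are summarized into quantiles for DataFrames, and timestamps are stored as nanoseconds. The sketch below shows both with NumPy. `np.quantile` along the sample axis yields an (N_q, B, H, T) array, and `np.moveaxis` moves the quantile axis to position 1 to match the (B, N_q, H, T) layout. The sample data, the levels, and the use of NumPy's default linear interpolation are assumptions. This page does not say which interpolation twiga uses. The sample mean is shown only as one possible (B, H, T) summary, since for SAMPLES the model supplies `loc` itself.

```python
import numpy as np

# Hypothetical samples: B=2 sequences, 1000 draws, H=3 steps, T=1 target.
rng = np.random.default_rng(42)
samples = rng.normal(0.0, 1.0, size=(2, 1000, 3, 1))

# Empirical quantiles along the sample axis give (N_q, B, H, T);
# moving axis 0 to position 1 yields the (B, N_q, H, T) layout.
levels = [0.1, 0.5, 0.9]
quantiles = np.moveaxis(np.quantile(samples, levels, axis=1), 0, 1)

# One possible point summary with the (B, H, T) layout of loc.
loc = samples.mean(axis=1)

# Timestamps stored as int64 nanoseconds convert back to datetime64[ns].
ns = np.array([1704067200000000000], dtype=np.int64)
as_datetime = ns.astype("datetime64[ns]")
```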

DataFrame Output Examples

Point Forecast

  timestamp  target  model  forecast  actual
0 2024-01-01  load_mw xgboost  1234.5  1250.2
1 2024-01-02  load_mw xgboost  1256.3  1270.1

Quantile Forecast (Wide)

  timestamp  target  model  forecast  actual  q_0.10  q_0.25  q_0.50  q_0.75  q_0.90
0 2024-01-01  load_mw   qrxgb  1234.5  1250.2  1150.0  1200.0  1234.5  1270.0  1320.0
1 2024-01-02  load_mw   qrxgb  1256.3  1270.1  1180.0  1220.0  1256.3  1290.0  1340.0

Quantile Forecast (Long)

  timestamp  target  model  forecast  actual  q_level  quantile_forecast
2024-01-01  load_mw   qrxgb  1234.5  1250.2     0.10             1150.0
2024-01-01  load_mw   qrxgb  1234.5  1250.2     0.50             1234.5
2024-01-01  load_mw   qrxgb  1234.5  1250.2     0.90             1320.0
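The long and wide quantile tables hold the same information. The wide form collects the long rows that share a timestamp, target, and model into one row, with one `q_<level>` column per quantile level, formatted to two decimals. The sketch below performs that pivot on plain dict rows. `long_to_wide` is a hypothetical helper, not the to_dataframe implementation. With pandas, `DataFrame.pivot_table` could do the same job.

```python
def long_to_wide(rows):
    """Pivot long quantile rows into one wide row per (timestamp, target, model).

    Hypothetical sketch; column names follow the q_0.10 pattern shown above.
    """
    wide = {}
    for row in rows:
        key = (row["timestamp"], row["target"], row["model"])
        out = wide.setdefault(key, {
            "timestamp": row["timestamp"],
            "target": row["target"],
            "model": row["model"],
            "forecast": row["forecast"],
        })
        if "actual" in row:
            out["actual"] = row["actual"]
        # 0.1 -> "q_0.10", 0.9 -> "q_0.90"
        out[f"q_{row['q_level']:.2f}"] = row["quantile_forecast"]
    return list(wide.values())
```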

Interval Forecast

  timestamp  target  model  forecast  actual  lower  upper
0 2024-01-01  load_mw  conformal  1234.5  1250.2  1150  1320
1 2024-01-02  load_mw  conformal  1256.3  1270.1  1180  1340

Accessing Results from TwigaForecaster

TwigaForecaster methods return results in the following forms:

# Returns a ForecastCollection with one result per model
collection = forecaster.forecast(test_df)
result = collection["xgboost"]

# Each evaluate_*_forecast() method returns a (results_df, metrics_df) pair
results_df, metrics_df = forecaster.evaluate_point_forecast(test_df)

results_df, metrics_df = forecaster.evaluate_interval_forecast(test_df)

results_df, metrics_df = forecaster.evaluate_quantile_forecast(test_df)

API Reference

ForecastKind

class twiga.forecaster.result.ForecastKind(*values)

Bases: StrEnum

Supported forecast output types.

Values are strings and can be used directly as dict keys.

INTERVAL = 'interval'

PARAMETRIC = 'parametric'

POINT = 'point'

QUANTILE = 'quantile'

SAMPLES = 'samples'

RawPrediction

class twiga.forecaster.result.RawPrediction(loc, kind=ForecastKind.POINT, lower=None, upper=None, scale=None, quantiles=None, quantile_levels=None, conf_level=None, samples=None)

Bases: object

Typed intermediate container for unscaled model output.

Sits between the model’s forward() / forecast() call and the inverse-scaling step, replacing the untyped np.ndarray | dict | tuple union. Only loc is required; the remaining fields are populated according to the model’s output kind.

Variables:

loc – Point predictions (mean/median), shape (B, H, T).
kind – Determines which optional arrays are expected.
scale – Parametric std-dev / scale, same shape as loc.
quantiles – shape (B, N_q, H, T).
quantile_levels – Corresponding probability levels.
samples – shape (B, N_samples, H, T).
lower – Lower interval bound, same shape as loc.
upper – Upper interval bound, same shape as loc.
conf_level – Coverage levels for conformal methods.

conf_level: list[float] | ndarray | None = None

classmethod from_model_output(output)

Build a RawPrediction from a raw model forecast() output.

Supports:

np.ndarray - point forecast.
tuple[lower, loc, upper] - conformal interval (3-element).
dict with key "loc" - parametric / quantile / sample / point.

Parameters:

output (ndarray | dict[str, ndarray] | tuple[ndarray, ...]) – Raw model output in one of the supported formats.

Return type:

RawPrediction

Returns:

A typed RawPrediction instance.

Raises:

TypeError – If output is not a supported type.
ValueError – If a dict lacks "loc", or a tuple does not have exactly three elements.

kind: ForecastKind = 'point'

loc: ndarray

lower: ndarray | None = None

quantile_levels: list[float] | ndarray | None = None

quantiles: ndarray | None = None

samples: ndarray | None = None

scale: ndarray | None = None

upper: ndarray | None = None

ForecastResult

class twiga.forecaster.result.ForecastResult(timestamps, loc, targets, model_name, kind, ground_truth=None, scale=None, quantiles=None, quantile_levels=None, conf_level=None, samples=None, lower=None, upper=None, inference_time=0.0)

Bases: object

Container for one model’s forecast output.

Variables:

timestamps – shape (n_batch, n_horizon, n_targets)
loc – point predictions (mean/median), shape (n_batch, n_horizon, n_targets)
targets – ordered list of target variable names
model_name – human-readable model identifier
kind – determines which optional arrays are expected and how to convert
ground_truth – optional, same shape as loc
scale – parametric std-dev / scale, same shape as loc
quantiles – shape (n_batch, n_q, n_horizon, n_targets)
quantile_levels – corresponding probability levels (e.g. [0.1, 0.5, 0.9])
samples – shape (n_batch, n_samples, n_horizon, n_targets)
lower – lower bound, same shape as loc
upper – upper bound, same shape as loc
inference_time – inference duration in seconds
conf_level – coverage levels for conformal methods
metric_name

conf_level: list[float] | ndarray | None = None

evaluate(ground_truth=None, **kwargs)

Evaluate forecast against ground truth using kind-appropriate metrics.

Forwards to twiga.core.metrics.evaluate_forecast().

Parameters:

ground_truth (ndarray | None) – shape (n_batch, n_horizon, n_targets). When omitted, the ground_truth stored on the result is used.
**kwargs – forwarded to the underlying evaluate function.

Return type:

DataFrame

Returns:

DataFrame of per-day, per-target metrics.

Raises:

ValueError – if no ground truth is available.

ground_truth: ndarray | None = None

inference_time: float = 0.0

kind: ForecastKind

loc: ndarray

lower: ndarray | None = None

model_name: str

quantile_levels: list[float] | ndarray | None = None

quantiles: ndarray | None = None

samples: ndarray | None = None

scale: ndarray | None = None

targets: list[str]

timestamps: ndarray

to_dataframe(fmt='long')

Convert forecast to tidy DataFrame.

Always includes: timestamp, target, model, forecast. Optional: actual (when ground_truth is present).

Additional columns depend on forecast kind:

POINT: no extra columns
PARAMETRIC: scale
INTERVAL: lower, upper
QUANTILE (fmt="wide"): q_0.10, q_0.50, …
QUANTILE (fmt="long"): q_level, quantile_forecast
SAMPLES: q_0.10, q_0.50, q_0.90 (empirical quantiles)

Parameters:

fmt (str) – "long" (default) or "wide"; only affects QUANTILE forecasts.

Return type:

DataFrame

Returns:

pandas DataFrame in long or wide format.

Raises:

ValueError – if fmt is invalid.

upper: ndarray | None = None

ForecastCollection

class twiga.forecaster.result.ForecastCollection(results=<factory>)

Bases: object

Collection of ForecastResult objects from multiple models.

add(result)

Add or replace result using its model_name as key.

Return type:

None

evaluate(**kwargs)

Evaluate all models and return a combined metrics DataFrame.

Calls ForecastResult.evaluate() on each result and concatenates the output, adding a "Model" column derived from each result’s model_name. Ground truth must be attached to each result (i.e. forecast() must have been called with test data that contains the target column).

Parameters:

**kwargs – Forwarded to each ForecastResult.evaluate() call (e.g. metric_names, freq).

Return type:

DataFrame

Returns:

Combined metrics DataFrame with a "Model" column.

Raises:

ValueError – If the collection is empty or any result lacks ground truth.

property model_names: list[str]

results: dict[str, ForecastResult]

to_dataframe(fmt='long')

Concatenate all model forecasts into one DataFrame.

Parameters:

fmt (str) – passed to each ForecastResult.to_dataframe()

Return type:

DataFrame

Returns:

Combined DataFrame in the requested format ("long" by default).

Raises:

ValueError – if the collection is empty.
