Forecast Results#
Source Files
twiga/forecaster/result.py - ForecastResult, ForecastCollection, ForecastKind, RawPrediction
When you call forecast(), predict(), or evaluate_*_forecast(), the forecaster returns forecast results as typed data structures. Understanding these containers helps you work with predictions, convert to DataFrames, and extract specific forecast kinds.
ForecastKind#
Defines the type of forecast output:
from twiga.forecaster.result import ForecastKind
print(ForecastKind.POINT) # "point"
print(ForecastKind.PARAMETRIC) # "parametric"
print(ForecastKind.QUANTILE) # "quantile"
print(ForecastKind.SAMPLES) # "samples"
print(ForecastKind.INTERVAL) # "interval"
| Kind | Description | Required Fields |
|---|---|---|
| POINT | Single point prediction (mean or median) | loc |
| PARAMETRIC | Mean and standard deviation | loc, scale |
| QUANTILE | Multiple quantile levels | loc, quantiles, quantile_levels |
| SAMPLES | Monte Carlo samples from the predictive distribution | loc, samples |
| INTERVAL | Lower and upper bounds (conformal or otherwise) | loc, lower, upper |
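The kind-to-fields relationship can be expressed as a small lookup. The sketch below is a stand-in, not twiga's own code: `Kind` mirrors the string values printed above, while `REQUIRED_FIELDS` and `missing_fields` are hypothetical names, and the field lists are inferred from the field comments on RawPrediction later on this page.

```python
from enum import Enum


class Kind(str, Enum):
    """Stand-in for ForecastKind, using the string values shown above."""

    POINT = "point"
    PARAMETRIC = "parametric"
    QUANTILE = "quantile"
    SAMPLES = "samples"
    INTERVAL = "interval"


# Hypothetical lookup: fields that must be populated for each kind.
REQUIRED_FIELDS = {
    Kind.POINT: ("loc",),
    Kind.PARAMETRIC: ("loc", "scale"),
    Kind.QUANTILE: ("loc", "quantiles", "quantile_levels"),
    Kind.SAMPLES: ("loc", "samples"),
    Kind.INTERVAL: ("loc", "lower", "upper"),
}


def missing_fields(kind: Kind, provided: dict) -> list[str]:
    """Return the required field names that are absent or None."""
    return [name for name in REQUIRED_FIELDS[kind] if provided.get(name) is None]


print(missing_fields(Kind.PARAMETRIC, {"loc": [1.0, 2.0]}))  # ['scale']
```

A check like this is only a reading aid for the table; the library may validate its containers differently or not at all.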
RawPrediction#
An intermediate typed container for model output before inverse-scaling. Sits between the model’s output and the final ForecastResult.
Fields#
@dataclass
class RawPrediction:
    loc: np.ndarray  # Required: (B, H, T)
    kind: ForecastKind = ForecastKind.POINT
    lower: np.ndarray | None = None  # For INTERVAL
    upper: np.ndarray | None = None  # For INTERVAL
    scale: np.ndarray | None = None  # For PARAMETRIC
    quantiles: np.ndarray | None = None  # For QUANTILE
    quantile_levels: list[float] | np.ndarray = None  # For QUANTILE: probability levels
    conf_level: list[float] | np.ndarray = None  # Coverage levels for conformal methods
    samples: np.ndarray | None = None  # For SAMPLES
Creating a RawPrediction#
from twiga.forecaster.result import RawPrediction, ForecastKind
import numpy as np

# From a point forecast (ndarray)
raw = RawPrediction.from_model_output(np.random.randn(10, 24, 1))
# kind = POINT, loc shape = (10, 24, 1)

# From a dict with mean and scale
raw = RawPrediction.from_model_output({
    "loc": np.random.randn(10, 24, 1),
    "scale": np.abs(np.random.randn(10, 24, 1)),
})
# kind = PARAMETRIC

# From a dict with quantiles
raw = RawPrediction.from_model_output({
    "loc": np.random.randn(10, 24, 1),
    "quantiles": np.random.randn(10, 5, 24, 1),
    "quantile_levels": [0.1, 0.25, 0.5, 0.75, 0.9],
})
# kind = QUANTILE

# From a conformal tuple (lower, loc, upper); each array has shape (B, H, T)
point_array = np.random.randn(10, 24, 1)
lower_array = point_array - 1.0
upper_array = point_array + 1.0
raw = RawPrediction.from_model_output((
    lower_array,
    point_array,
    upper_array,
))
# kind = INTERVAL
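To make the dispatch rules concrete, here is a minimal sketch of how the kind could be inferred from the three supported output shapes. It is not the library's implementation: `infer_kind` is a hypothetical helper, and the precedence applied when a dict carries several optional keys is an assumption.

```python
import numpy as np


def infer_kind(output) -> str:
    """Guess the forecast kind from the structure of a raw model output."""
    if isinstance(output, np.ndarray):
        return "point"
    if isinstance(output, tuple):
        if len(output) != 3:
            raise ValueError("tuple output must be (lower, loc, upper)")
        return "interval"
    if isinstance(output, dict):
        if "loc" not in output:
            raise ValueError('dict output must contain "loc"')
        # Assumed precedence when several optional keys are present.
        for key, kind in (
            ("quantiles", "quantile"),
            ("samples", "samples"),
            ("scale", "parametric"),
        ):
            if output.get(key) is not None:
                return kind
        return "point"
    raise TypeError(f"unsupported output type: {type(output).__name__}")


loc = np.zeros((10, 24, 1))
print(infer_kind(loc))                              # point
print(infer_kind({"loc": loc, "scale": loc + 1}))   # parametric
print(infer_kind((loc - 1, loc, loc + 1)))          # interval
```

The error cases follow the API reference for from_model_output(): a TypeError for unsupported types, and a ValueError for a dict without "loc" or a tuple that does not have exactly three elements.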
ForecastResult#
The main output container for a single model’s forecast. Stores predictions, optional ground truth, metadata, and provides conversion methods.
Fields#
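| Field | Type | Description |
|---|---|---|
| timestamps | np.ndarray | Forecast timestamps |
| loc | np.ndarray | Point predictions |
| targets | list[str] | Target variable names |
| model_name | str | Human-readable model identifier |
| kind | ForecastKind | Type of forecast |
| ground_truth | np.ndarray or None | Optional actual values |
| scale | np.ndarray or None | For PARAMETRIC: standard deviation |
| quantiles | np.ndarray or None | For QUANTILE: (B, N_q, H, T) |
| quantile_levels | list[float], np.ndarray, or None | For QUANTILE: probability levels |
| samples | np.ndarray or None | For SAMPLES: (B, N_samples, H, T) |
| lower | np.ndarray or None | For INTERVAL: lower bound |
| upper | np.ndarray or None | For INTERVAL: upper bound |
| inference_time | float | Inference duration in seconds |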
Creating a ForecastResult#
from twiga.forecaster.result import ForecastResult, ForecastKind
import numpy as np

# Point forecast
result = ForecastResult(
    timestamps=timestamps_array,  # (B, H, T)
    loc=predictions_array,  # (B, H, T)
    targets=["load_mw"],
    model_name="xgboost",
    kind=ForecastKind.POINT,
)

# Quantile forecast
result = ForecastResult(
    timestamps=timestamps_array,
    loc=median_quantile,  # (B, H, T)
    targets=["load_mw"],
    model_name="qr_xgboost",
    kind=ForecastKind.QUANTILE,
    quantiles=all_quantiles,  # (B, 5, H, T)
    quantile_levels=[0.1, 0.25, 0.5, 0.75, 0.9],
)

# Parametric forecast
result = ForecastResult(
    timestamps=timestamps_array,
    loc=mean_array,
    targets=["load_mw"],
    model_name="mlpf_normal",
    kind=ForecastKind.PARAMETRIC,
    scale=std_array,  # (B, H, T)
)

# With ground truth
result = ForecastResult(
    timestamps=timestamps_array,
    loc=predictions_array,
    targets=["load_mw"],
    model_name="xgboost",
    kind=ForecastKind.POINT,
    ground_truth=actual_values,  # (B, H, T)
)
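The array names in the examples above (timestamps_array, predictions_array, actual_values) are placeholders. The sketch below builds synthetic arrays with the documented (B, H, T) shape using NumPy only. The values are random, and storing timestamps as datetime64[ns] is an assumption based on the "stored as nanoseconds" note in the shape conventions.

```python
import numpy as np

B, H, T = 3, 24, 1  # batch size, horizon, number of targets

# Hourly timestamps; each forecast sequence starts one day after the previous one.
start = np.datetime64("2024-01-01T00:00", "ns")
hours = np.arange(B)[:, None] * 24 + np.arange(H)[None, :]              # (B, H)
timestamps_array = start + hours[:, :, None] * np.timedelta64(1, "h")   # (B, H, 1)
timestamps_array = np.repeat(timestamps_array, T, axis=2)               # (B, H, T)

# Synthetic predictions and noisy "actuals" with matching shapes.
rng = np.random.default_rng(0)
predictions_array = rng.normal(loc=1200.0, scale=50.0, size=(B, H, T))
actual_values = predictions_array + rng.normal(scale=20.0, size=(B, H, T))

print(timestamps_array.shape, timestamps_array.dtype)
print(predictions_array.shape, actual_values.shape)
```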
Converting to DataFrame#
Convert a result to a tidy DataFrame in long (default) or wide format:
# Long format (default)
df = result.to_dataframe(fmt="long")
# Columns: timestamp, target, model, forecast, [actual], [scale|lower/upper|q_level/quantile_forecast]
# Wide format for quantiles
df = result.to_dataframe(fmt="wide")
# Columns: timestamp, target, model, forecast, [actual], q_0.10, q_0.25, q_0.50, q_0.75, q_0.90
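As an illustration of what a long-format conversion involves, the sketch below flattens (B, H, T) arrays into one entry per (batch, step, target). It uses NumPy only and is not twiga's implementation; the target names, the model name, and the row ordering are assumptions.

```python
import numpy as np

B, H, T = 2, 3, 2
targets = ["load_mw", "temp_c"]  # hypothetical target names
model_name = "xgboost"

loc = np.arange(B * H * T, dtype=float).reshape(B, H, T)

# One daily timestamp per (batch, step), shared by all targets.
steps = np.arange(B * H).reshape(B, H, 1)
timestamps = np.broadcast_to(
    np.datetime64("2024-01-01", "ns") + steps * np.timedelta64(1, "D"), (B, H, T)
)

# Flatten in C order: the target index varies fastest, then step, then batch.
columns = {
    "timestamp": timestamps.reshape(-1),
    "target": np.tile(targets, B * H),
    "model": np.full(B * H * T, model_name),
    "forecast": loc.reshape(-1),
}

for name, values in columns.items():
    print(name, values[:4])
```

Passing a dict of equal-length columns like this to pandas.DataFrame gives the tidy layout described above, with one row per timestamp and target.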
Evaluating a Result#
If ground truth is attached, evaluate it directly:
metrics_df = result.evaluate()
# Returns DataFrame with point metrics (MAE, RMSE, etc.)
Or pass new ground truth:
metrics_df = result.evaluate(ground_truth=new_actual_values)
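For intuition about the point metrics mentioned above, MAE and RMSE can be computed directly from loc and ground-truth arrays. This is a plain NumPy sketch with a hypothetical `point_metrics` helper. It aggregates over all batches and steps, whereas the library's evaluate() is documented as returning per-day, per-target metrics.

```python
import numpy as np


def point_metrics(loc: np.ndarray, ground_truth: np.ndarray) -> dict[str, np.ndarray]:
    """MAE and RMSE per target, averaged over the batch and horizon axes."""
    error = loc - ground_truth  # (B, H, T)
    return {
        "mae": np.abs(error).mean(axis=(0, 1)),          # (T,)
        "rmse": np.sqrt((error ** 2).mean(axis=(0, 1))),  # (T,)
    }


loc = np.array([[[1.0], [2.0]], [[3.0], [4.0]]])    # (B=2, H=2, T=1)
truth = np.array([[[1.0], [4.0]], [[3.0], [2.0]]])
metrics = point_metrics(loc, truth)
print(metrics["mae"])   # [1.]
print(metrics["rmse"])
```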
ForecastCollection#
Container for multiple ForecastResult objects, one per model.
Fields#
@dataclass
class ForecastCollection:
    results: dict[str, ForecastResult]  # keyed by model name
Creating and Using a Collection#
from twiga.forecaster.result import ForecastCollection

collection = ForecastCollection()
collection.add(result_model_1)
collection.add(result_model_2)

# Access by model name
result = collection["xgboost"]

# Iterate
for result in collection:
    print(result.model_name, result.loc.shape)

# Get all model names
names = collection.model_names
# Output: ["xgboost", "mlpf_normal", ...]

# Check if a model exists
if "qr_xgboost" in collection:
    print("Found")
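The access patterns above (indexing by name, iteration, membership tests) are what a dict-backed container provides. The following stand-in shows one way such a container could be written. `Result` and `Collection` are hypothetical stand-ins rather than the twiga classes, and the overwrite-on-duplicate behaviour of `add` is an assumption.

```python
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class Result:
    """Minimal stand-in for ForecastResult; only the name matters here."""

    model_name: str


@dataclass
class Collection:
    """Dict-backed stand-in for ForecastCollection."""

    results: dict[str, Result] = field(default_factory=dict)

    def add(self, result: Result) -> None:
        # Assumption: adding a result with an existing name overwrites it.
        self.results[result.model_name] = result

    def __getitem__(self, name: str) -> Result:
        return self.results[name]

    def __contains__(self, name: object) -> bool:
        return name in self.results

    def __iter__(self) -> Iterator[Result]:
        return iter(self.results.values())

    @property
    def model_names(self) -> list[str]:
        return list(self.results)


collection = Collection()
collection.add(Result("xgboost"))
collection.add(Result("mlpf_normal"))
print(collection.model_names)       # ['xgboost', 'mlpf_normal']
print("qr_xgboost" in collection)   # False
```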
Converting to DataFrame#
Combine all results into one DataFrame:
# Long format
df = collection.to_dataframe(fmt="long")
# Rows: all forecasts from all models
# Wide format
df = collection.to_dataframe(fmt="wide")
# Columns: timestamp, target, model, forecast, q_0.10, q_0.25, ...
Evaluating All Models#
metrics_df = collection.evaluate()
# Returns a combined metrics DataFrame with a "Model" column identifying each model
Shape Conventions#
All arrays follow this shape convention:
B = batch size (number of forecast sequences)
H = horizon (number of forecast steps)
T = number of targets (usually 1 for univariate forecasting)
N_q = number of quantiles
N_samples = number of Monte Carlo samples
| Array | Shape | Meaning |
|---|---|---|
| loc | (B, H, T) | Point forecast for each step and target |
| ground_truth | (B, H, T) | Actual observed values |
| scale | (B, H, T) | Parametric std (one value per step) |
| quantiles | (B, N_q, H, T) | Quantile forecasts in order: [q_0.1, q_0.5, q_0.9, …] |
| samples | (B, N_samples, H, T) | MC samples; summarized to quantiles for DataFrames |
| lower, upper | (B, H, T) | Interval bounds |
| timestamps | (B, H, T) | Forecast datetime for each step (stored as nanoseconds) |
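The sample and quantile shapes are related by a reduction over the sample axis. As a sketch of how Monte Carlo samples might be summarized into the (B, N_q, H, T) quantile layout, the example below uses np.quantile. The levels 0.1, 0.5, and 0.9 match the empirical quantiles listed for SAMPLES in the API reference, but the code is illustrative and not the library's own.

```python
import numpy as np

B, N_samples, H, T = 4, 200, 24, 1
rng = np.random.default_rng(0)
samples = rng.normal(size=(B, N_samples, H, T))

levels = [0.1, 0.5, 0.9]

# np.quantile places the quantile axis first: (N_q, B, H, T).
q = np.quantile(samples, levels, axis=1)

# Move it to position 1 to match the (B, N_q, H, T) convention.
quantiles = np.moveaxis(q, 0, 1)

# The median quantile can serve as the point forecast, shape (B, H, T).
loc = quantiles[:, levels.index(0.5)]

print(quantiles.shape)  # (4, 3, 24, 1)
print(loc.shape)        # (4, 24, 1)
```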
DataFrame Output Examples#
Point Forecast#
timestamp target model forecast actual
0 2024-01-01 load_mw xgboost 1234.5 1250.2
1 2024-01-02 load_mw xgboost 1256.3 1270.1
Quantile Forecast (Wide)#
timestamp target model forecast actual q_0.10 q_0.25 q_0.50 q_0.75 q_0.90
0 2024-01-01 load_mw qrxgb 1234.5 1250.2 1150.0 1200.0 1234.5 1270.0 1320.0
1 2024-01-02 load_mw qrxgb 1256.3 1270.1 1180.0 1220.0 1256.3 1290.0 1340.0
Quantile Forecast (Long)#
timestamp target model forecast actual q_level quantile_forecast
0 2024-01-01 load_mw qrxgb 1234.5 1250.2 0.10 1150.0
1 2024-01-01 load_mw qrxgb 1234.5 1250.2 0.50 1234.5
2 2024-01-01 load_mw qrxgb 1234.5 1250.2 0.90 1320.0
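The long and wide layouts hold the same information. The sketch below pivots the long rows shown above into the wide layout with pandas, as an illustration of the relationship between the two formats rather than a description of how to_dataframe(fmt="wide") is implemented.

```python
import pandas as pd

# The three long-format rows from the example above.
long_df = pd.DataFrame({
    "timestamp": pd.to_datetime(["2024-01-01"] * 3),
    "target": "load_mw",
    "model": "qrxgb",
    "forecast": 1234.5,
    "actual": 1250.2,
    "q_level": [0.10, 0.50, 0.90],
    "quantile_forecast": [1150.0, 1234.5, 1320.0],
})

# Everything except the quantile level and value identifies a forecast row.
keys = ["timestamp", "target", "model", "forecast", "actual"]
wide_df = long_df.pivot(index=keys, columns="q_level", values="quantile_forecast")

# Rename the level columns to the q_0.10 style and restore the key columns.
wide_df.columns = [f"q_{level:.2f}" for level in wide_df.columns]
wide_df = wide_df.reset_index()

print(wide_df)
```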
Interval Forecast#
timestamp target model forecast actual lower upper
0 2024-01-01 load_mw conformal 1234.5 1250.2 1150 1320
1 2024-01-02 load_mw conformal 1256.3 1270.1 1180 1340
Accessing Results from TwigaForecaster#
Methods that return results:
# Returns a ForecastCollection with one result per model
collection = forecaster.forecast(test_df)
result = collection["xgboost"]
# Returns results_df and metrics_df
results_df, metrics_df = forecaster.evaluate_point_forecast(test_df)
results_df, metrics_df = forecaster.evaluate_interval_forecast(test_df)
results_df, metrics_df = forecaster.evaluate_quantile_forecast(test_df)
API Reference#
ForecastKind#
RawPrediction#
- class twiga.forecaster.result.RawPrediction(loc, kind=ForecastKind.POINT, lower=None, upper=None, scale=None, quantiles=None, quantile_levels=None, conf_level=None, samples=None)#
Bases: object
Typed intermediate container for unscaled model output.
Sits between the model’s forward()/forecast() call and the inverse-scaling step, replacing the untyped np.ndarray | dict | tuple union. Only loc is required; the remaining fields are populated according to the model’s output kind.
- Variables:
loc – Point predictions (mean/median), shape (B, H, T).
kind – Determines which optional arrays are expected.
scale – Parametric std-dev / scale, same shape as loc.
quantiles – shape (B, N_q, H, T).
quantile_levels – Corresponding probability levels.
samples – shape (B, N_samples, H, T).
conf_level – Coverage levels for conformal methods.
- classmethod from_model_output(output)#
Build a RawPrediction from a raw model forecast() output.
Supports:
* np.ndarray - point forecast.
* tuple[lower, loc, upper] - conformal interval (3-element).
* dict with key "loc" - parametric / quantile / sample / point.
- Parameters:
output (ndarray | dict[str, ndarray] | tuple[ndarray, ...]) – Raw model output in one of the supported formats.
- Return type:
RawPrediction
- Returns:
A typed RawPrediction instance.
- Raises:
TypeError – If output is not a supported type.
ValueError – If a dict lacks "loc", or a tuple does not have exactly three elements.
- kind: ForecastKind = 'point'#
ForecastResult#
- class twiga.forecaster.result.ForecastResult(timestamps, loc, targets, model_name, kind, ground_truth=None, scale=None, quantiles=None, quantile_levels=None, conf_level=None, samples=None, lower=None, upper=None, inference_time=0.0)#
Bases: object
Container for one model’s forecast output.
- Variables:
timestamps – shape (n_batch, n_horizon, n_targets)
loc – point predictions (mean/median), shape (n_batch, n_horizon, n_targets)
targets – ordered list of target variable names
model_name – human-readable model identifier
kind – determines which optional arrays are expected and how to convert
ground_truth – optional, same shape as loc
scale – parametric std-dev / scale, same shape as loc
quantiles – shape (n_batch, n_q, n_horizon, n_targets)
quantile_levels – corresponding probability levels (e.g. [0.1, 0.5, 0.9])
samples – shape (n_batch, n_samples, n_horizon, n_targets)
lower – lower bound, same shape as loc
upper – upper bound, same shape as loc
inference_time – inference duration in seconds
conf_level
metric_name
- evaluate(ground_truth=None, **kwargs)#
Evaluate forecast against ground truth using kind-appropriate metrics.
Forwards to twiga.core.metrics.evaluate_forecast().
- Parameters:
- Return type:
DataFrame
- Returns:
DataFrame of per-day, per-target metrics.
- Raises:
ValueError – if no ground truth is available.
- kind: ForecastKind#
- to_dataframe(fmt='long')#
Convert forecast to tidy DataFrame.
Always includes: timestamp, target, model, forecast. Optional: actual (when ground_truth is present).
Additional columns depend on forecast kind:
POINT: no extra columns
PARAMETRIC: scale
INTERVAL: lower, upper
QUANTILE (fmt="wide"): q_0.10, q_0.50, …
QUANTILE (fmt="long"): q_level, quantile_forecast
SAMPLES: q_0.10, q_0.50, q_0.90 (empirical quantiles)
- Parameters:
fmt (str) – "long" (default) or "wide"; only affects QUANTILE
- Return type:
DataFrame
- Returns:
pandas DataFrame in long or wide format
- Raises:
ValueError – if fmt is invalid
ForecastCollection#
- class twiga.forecaster.result.ForecastCollection(results=<factory>)#
Bases: object
Collection of ForecastResult objects from multiple models.
- evaluate(**kwargs)#
Evaluate all models and return a combined metrics DataFrame.
Calls ForecastResult.evaluate() on each result and concatenates the output, adding a "Model" column derived from each result’s model_name. Ground truth must be attached to each result (i.e. forecast() must have been called with test data that contains the target column).
- Parameters:
**kwargs – Forwarded to each ForecastResult.evaluate() call (e.g. metric_names, freq).
- Return type:
DataFrame
- Returns:
Combined metrics DataFrame with a "Model" column.
- Raises:
ValueError – If the collection is empty or any result lacks ground truth.
- results: dict[str, ForecastResult]#
- to_dataframe(fmt='long')#
Concatenate all model forecasts into one DataFrame.
- Parameters:
fmt (str) – passed to each ForecastResult.to_dataframe()
- Return type:
DataFrame
- Returns:
Combined long-format DataFrame
- Raises:
ValueError – if collection is empty
See Also#
Metrics - Evaluate forecasts
Forecaster - Main API for training and prediction
Result Plotting - Visualize forecast results