This part of the project documentation focuses on an information-oriented approach. Use it as a reference for the technical implementation of the mlpForecaster project code.

MLPBlock

MLPBlock(
    in_size=1,
    latent_dim=32,
    features_start=16,
    expansion_factor=1,
    residual=False,
    num_layers=4,
    context_size=96,
    activation=nn.ReLU(),
    bn=True,
)

Bases: Module

Multi-Layer Perceptron (MLP) block with configurable layers and options.

Attributes:

mlp_network (ModuleList) –

List of layers in the MLP network.
in_size (int) –

Size of the input after flattening.
context_size (int) –

Size of the context.
residual (bool) –

If True, adds residual connections.

Parameters:

in_size (int, default: 1 ) –

Size of the input. Defaults to 1.
latent_dim (int, default: 32 ) –

Dimensionality of the latent space. Defaults to 32.
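features_start (int, default: 16 ) –

Number of features in the initial layer. Defaults to 16.
expansion_factor (int, default: 1 ) –

Factor by which each hidden layer multiplies the feature width. Forced to 1 when residual is True. Defaults to 1.
residual (bool, default: False ) –

If True, adds residual connections. Defaults to False.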
num_layers (int, default: 4 ) –

Number of layers in the MLP. Defaults to 4.
context_size (int, default: 96 ) –

Size of the context. Defaults to 96.
activation (Module, default: ReLU() ) –

Activation function. Defaults to ReLU().
bn (bool, default: True ) –

If True, adds batch normalization. Defaults to True.

Source code in mlpforecast/net/layers.py

def __init__(
    self,
    in_size=1,
    latent_dim=32,
    features_start=16,
    expansion_factor=1,
    residual=False,
    num_layers=4,
    context_size=96,
    activation=nn.ReLU(),
    bn=True,
):
    """
    Multi-Layer Perceptron (MLP) block with configurable layers and options.

    Parameters:
        in_size (int, optional): Size of the input. Defaults to 1.
        latent_dim (int, optional): Dimensionality of the latent space. Defaults to 32.
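        features_start (int, optional): Number of features in the initial layer. Defaults to 16.
        expansion_factor (int, optional): Factor by which each hidden layer multiplies the feature width. Forced to 1 when residual is True. Defaults to 1.
        residual (bool, optional): If True, adds residual connections. Defaults to False.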
        num_layers (int, optional): Number of layers in the MLP. Defaults to 4.
        context_size (int, optional): Size of the context. Defaults to 96.
        activation (torch.nn.Module, optional): Activation function. Defaults to ReLU().
        bn (bool, optional): If True, adds batch normalization. Defaults to True.
    """
    super().__init__()

    # Calculate the size of the input after flattening
    self.in_size = in_size * context_size
    self.context_size = context_size
    self.residual = residual
    if residual:
        expansion_factor = 1

    # Initialize a list to store the layers of the MLP
    layers = [
        nn.Sequential(
            create_linear(self.in_size, features_start, bn=False),
            activation,
        )
    ]
    feats = features_start

    # Create the specified number of layers in the MLP
    for i in range(num_layers - 1):
        layers.append(
            nn.Sequential(
                create_linear(feats, feats * expansion_factor, bn=bn), activation
            )
        )
        feats = feats * expansion_factor

    # Add the final layer with latent_dim and activation, without batch normalization
    layers.append(
        nn.Sequential(create_linear(feats, latent_dim, bn=False), activation)
    )

    # Create a ModuleList to store the layers
    self.mlp_network = nn.ModuleList(layers)
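
The constructor above fixes the width of every linear layer up front. As a rough illustration, the plain-Python sketch below reproduces that width arithmetic without PyTorch. `mlp_layer_widths` is a hypothetical helper written for this page and is not part of mlpforecast.

```python
def mlp_layer_widths(
    in_size=1,
    latent_dim=32,
    features_start=16,
    expansion_factor=1,
    residual=False,
    num_layers=4,
    context_size=96,
):
    """Return the (in_features, out_features) pair of each linear layer.

    Hypothetical helper mirroring the width arithmetic in MLPBlock.__init__.
    """
    if residual:
        # Residual additions need matching widths, so expansion is disabled.
        expansion_factor = 1

    # The first layer consumes the flattened (context_size * in_size) input.
    widths = [(in_size * context_size, features_start)]
    feats = features_start

    # num_layers - 1 hidden layers, each scaling the width by expansion_factor.
    for _ in range(num_layers - 1):
        widths.append((feats, feats * expansion_factor))
        feats *= expansion_factor

    # The final projection maps to the latent dimension.
    widths.append((feats, latent_dim))
    return widths


default_widths = mlp_layer_widths()
expanding_widths = mlp_layer_widths(
    in_size=2, expansion_factor=2, num_layers=3, context_size=4
)
print(default_widths)    # [(96, 16), (16, 16), (16, 16), (16, 16), (16, 32)]
print(expanding_widths)  # [(8, 16), (16, 32), (32, 64), (64, 32)]
```

With the defaults this yields five linear layers, one more than num_layers, because the input layer and the final projection bracket the num_layers - 1 hidden layers.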

forward

forward(x)

Forward pass of the MLP block.

Parameters:

x (Tensor) –

Input tensor.

Returns:

Tensor –

Output tensor after passing through the MLP block.

Source code in mlpforecast/net/layers.py

def forward(self, x):
    """
    Forward pass of the MLP block.

    Parameters:
        x (torch.Tensor): Input tensor.

    Returns:
        (torch.Tensor): Output tensor after passing through the MLP block.
    """
    # Flatten the input along dimensions 1 and 2
    if x.ndim == 3:
        x = x.flatten(1, 2)

    # Pass the input through each layer in the MLP
    x = self.mlp_network[0](x)
    for i in range(1, len(self.mlp_network) - 1):
        if self.residual:
            # Out-of-place add: an in-place += would overwrite tensors saved for backward
            x = x + self.mlp_network[i](x)
        else:
            x = self.mlp_network[i](x)

    x = self.mlp_network[-1](x)
    return x
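
The two tensor operations in this forward pass, flattening and the residual update, can be tried in isolation. The sketch below assumes PyTorch is installed and uses a stand-in hidden layer rather than the real MLPBlock; all sizes are arbitrary. The residual update is written out of place here so that tensors saved by autograd are not overwritten.

```python
import torch
from torch import nn

batch, context_size, channels = 8, 96, 3
x = torch.randn(batch, context_size, channels)

# Merge the time and channel axes, as forward() does for 3-D input.
flat = x.flatten(1, 2)
print(flat.shape)  # torch.Size([8, 288])

# A stand-in hidden layer whose input and output widths match.
hidden = nn.Sequential(nn.Linear(288, 288), nn.ReLU())

# Residual update: add the layer output to its own input.
out = flat + hidden(flat)

# Backpropagation succeeds and populates the layer's gradients.
out.sum().backward()
print(out.shape)  # torch.Size([8, 288])
```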

MLPForecastNetwork

MLPForecastNetwork(
    n_target_series: int,
    n_unknown_features: int,
    n_known_calendar_features: int,
    n_known_continuous_features: int,
    embedding_size: int = 28,
    embedding_type: str = None,
    combination_type: str = "attn-comb",
    expansion_factor: int = 2,
    residual: bool = False,
    hidden_size: int = 256,
    num_layers: int = 2,
    forecast_horizon: int = 48,
    input_window_size: int = 96,
    activation_function: str = "SiLU",
    out_activation_function: str = "Identity",
    dropout_rate: float = 0.25,
    alpha: float = 0.1,
    num_attention_heads: int = 4,
)

Bases: Module

Multilayer Perceptron (MLP) Forecast Network for time series forecasting.

Attributes:

n_out (int) –

Number of target series.
n_unknown (int) –

Number of unknown time-varying inputs: n_unknown_features plus the target series.
n_covariates (int) –

Number of known time-varying features.
n_channels (int) –

Number of channels in the input.
input_window_size (int) –

Size of the input window.
forecast_horizon (int) –

Number of future time steps to forecast.
out_activation (Module) –

Output activation function.
activation (Module) –

Activation function.
encoder (PastFutureEncoder) –

Encoder module.
horizon (PastFutureEncoder) –

Horizon encoder module for the known future covariates. Created only when n_covariates > 0.
combination_type (str) –

Type of combination to use.
alpha (float) –

Weight of the MSE term in the training loss; the L1 term is weighted by 1 - alpha.
attention (MultiheadAttention) –

Multi-head attention module. Created only when combination_type is 'attn-comb'.
gate (Linear) –

Linear layer for weighted combination. Created only when combination_type is 'weighted-comb'.
decoder (Sequential) –

Decoder module.
mu (Linear) –

Linear layer for output.

Parameters:

n_target_series (int) –

Number of target series.
n_unknown_features (int) –

Number of unknown time-varying features.
n_known_calendar_features (int) –

Number of known categorical time-varying features.
n_known_continuous_features (int) –

Number of known continuous time-varying features.
embedding_size (int, default: 28 ) –

Dimensionality of the embedding space. Defaults to 28.
embedding_type (str, default: None ) –

Type of embedding to use. Defaults to None. Options: 'PosEmb', 'RotaryEmb', 'CombinedEmb'.
combination_type (str, default: 'attn-comb' ) –

Type of combination to use. Defaults to 'attn-comb'. Options: 'attn-comb', 'weighted-comb', 'addition-comb'.
expansion_factor (int, default: 2 ) –

Expansion factor for the encoder. Defaults to 2.
residual (bool, default: False ) –

Whether to use residual connections in the encoder. Defaults to False.
hidden_size (int, default: 256 ) –

Dimensionality of the hidden layers. Defaults to 256.
num_layers (int, default: 2 ) –

Number of layers in the MLP. Defaults to 2.
forecast_horizon (int, default: 48 ) –

Number of future time steps to forecast. Defaults to 48.
input_window_size (int, default: 96 ) –

Size of the input window. Defaults to 96.
activation_function (str, default: 'SiLU' ) –

Activation function. Defaults to 'SiLU'.
out_activation_function (str, default: 'Identity' ) –

Output activation function. Defaults to 'Identity'.
dropout_rate (float, default: 0.25 ) –

Dropout probability. Defaults to 0.25.
alpha (float, default: 0.1 ) –

Weight of the MSE term in the training loss; the L1 term is weighted by 1 - alpha. Defaults to 0.1.
num_attention_heads (int, default: 4 ) –

Number of heads in the multi-head attention. Defaults to 4.

Source code in mlpforecast/net/layers.py

def __init__(
    self,
    n_target_series: int,
    n_unknown_features: int,
    n_known_calendar_features: int,
    n_known_continuous_features: int,
    embedding_size: int = 28,
    embedding_type: str = None,
    combination_type: str = "attn-comb",
    expansion_factor: int = 2,
    residual: bool = False,
    hidden_size: int = 256,
    num_layers: int = 2,
    forecast_horizon: int = 48,
    input_window_size: int = 96,
    activation_function: str = "SiLU",
    out_activation_function: str = "Identity",
    dropout_rate: float = 0.25,
    alpha: float = 0.1,
    num_attention_heads: int = 4,
):
    """
    Multilayer Perceptron (MLP) Forecast Network for time series forecasting.

    Args:
        n_target_series (int): Number of target series.
        n_unknown_features (int): Number of unknown time-varying features.
        n_known_calendar_features (int): Number of known categorical time-varying features.
        n_known_continuous_features (int): Number of known continuous time-varying features.
        embedding_size (int, optional): Dimensionality of the embedding space. Defaults to 28.
        embedding_type (str, optional): Type of embedding to use. Defaults to None. Options: 'PosEmb', 'RotaryEmb', 'CombinedEmb'.
        combination_type (str, optional): Type of combination to use. Defaults to 'attn-comb'. Options: 'attn-comb', 'weighted-comb', 'addition-comb'.
        expansion_factor (int, optional): Expansion factor for the encoder. Defaults to 2.
        residual (bool, optional): Whether to use residual connections in the encoder. Defaults to False.
        hidden_size (int, optional): Dimensionality of the hidden layers. Defaults to 256.
        num_layers (int, optional): Number of layers in the MLP. Defaults to 2.
        forecast_horizon (int, optional): Number of future time steps to forecast. Defaults to 48.
        input_window_size (int, optional): Size of the input window. Defaults to 96.
        activation_function (str, optional): Activation function. Defaults to 'SiLU'.
        out_activation_function (str, optional): Output activation function. Defaults to 'Identity'.
        dropout_rate (float, optional): Dropout probability. Defaults to 0.25.
        alpha (float, optional): Weight of the MSE term in the training loss; the L1 term is weighted by 1 - alpha. Defaults to 0.1.
        num_attention_heads (int, optional): Number of heads in the multi-head attention. Defaults to 4.
    """
    super().__init__()

    # Ensure valid activation and embedding types
    assert (
        activation_function in ACTIVATIONS
    ), f"Invalid activation_function. Please select from: {ACTIVATIONS}"
    assert (
        out_activation_function in ACTIVATIONS
    ), f"Invalid out_activation_function. Please select from: {ACTIVATIONS}"
    assert (
        embedding_type
        in [
            None,
            "PosEmb",
            "RotaryEmb",
            "CombinedEmb",
        ]
    ), "Invalid embedding type, choose from: None, 'PosEmb', 'RotaryEmb', 'CombinedEmb'"

    self.n_out = n_target_series
    self.n_unknown = n_unknown_features + self.n_out
    self.n_covariates = n_known_calendar_features + n_known_continuous_features
    self.n_channels = self.n_unknown + self.n_covariates
    self.input_window_size = input_window_size
    self.forecast_horizon = forecast_horizon
    self.out_activation = getattr(nn, out_activation_function)()
    self.activation = getattr(nn, activation_function)()

    self.encoder = PastFutureEncoder(
        embedding_size=embedding_size,
        embedding_type=embedding_type,
        latent_size=hidden_size,
        num_layers=num_layers,
        residual=residual,
        expansion_factor=expansion_factor,
        context_size=input_window_size,
        activation=self.activation,
        dropout_rate=dropout_rate,
        n_channels=self.n_channels,
    )

    if self.n_covariates > 0:
        self.horizon = PastFutureEncoder(
            embedding_size=embedding_size,
            embedding_type=embedding_type,
            latent_size=hidden_size,
            num_layers=num_layers,
            residual=residual,
            expansion_factor=expansion_factor,
            context_size=forecast_horizon,
            activation=self.activation,
            dropout_rate=dropout_rate,
            n_channels=self.n_covariates,
        )

    self.combination_type = combination_type
    self.alpha = alpha

    assert (
        combination_type
        in [
            "attn-comb",
            "weighted-comb",
            "addition-comb",
        ]
    ), "Invalid combination type, choose from: 'attn-comb', 'weighted-comb', 'addition-comb'"

    if combination_type == "attn-comb":
        self.attention = nn.MultiheadAttention(
            hidden_size, num_attention_heads, dropout=dropout_rate
        )

    if combination_type == "weighted-comb":
        self.gate = nn.Linear(2 * hidden_size, hidden_size)

    self.decoder = nn.Sequential(
        FeedForward(
            hidden_size,
            expansion_factor=1,
            dropout=dropout_rate,
            activation=self.activation,
            bn=True,
        )
    )

    self.mu = nn.Linear(hidden_size, self.n_out * forecast_horizon)
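
The four feature counts passed to the constructor collapse into a handful of derived sizes. The sketch below restates that bookkeeping in plain Python; `channel_layout` is a hypothetical helper used only for illustration, and the example counts are made up.

```python
def channel_layout(
    n_target_series,
    n_unknown_features,
    n_known_calendar_features,
    n_known_continuous_features,
    forecast_horizon=48,
):
    """Hypothetical helper mirroring the channel bookkeeping in __init__."""
    n_out = n_target_series
    # The target series are counted together with the unknown features.
    n_unknown = n_unknown_features + n_out
    n_covariates = n_known_calendar_features + n_known_continuous_features
    return {
        "n_out": n_out,
        "n_unknown": n_unknown,
        "n_covariates": n_covariates,
        "n_channels": n_unknown + n_covariates,
        # Width of the final `mu` projection before it is reshaped.
        "mu_out_features": n_out * forecast_horizon,
    }


layout = channel_layout(
    n_target_series=1,
    n_unknown_features=2,
    n_known_calendar_features=4,
    n_known_continuous_features=1,
)
print(layout)
```

Since forward() slices the horizon input as `x[:, input_window_size:, n_unknown:]`, the known covariates are expected to occupy the last n_covariates channels of the input.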

forecast

forecast(x: Tensor) -> dict

Generates forecasts for the input sequences.

Parameters:

x (Tensor) –

Input tensor.

Returns:

dict –

Dictionary containing the forecast predictions.

Source code in mlpforecast/net/layers.py

def forecast(self, x: torch.Tensor) -> dict:
    """
    Generates forecasts for the input sequences.

    Args:
        x (torch.Tensor): Input tensor.

    Returns:
        dict: Dictionary containing the forecast predictions.
    """
    with torch.no_grad():
        pred = self(x)

    return {"pred": pred}

forward

forward(x: Tensor) -> Tensor

Forward pass of the MLPForecastNetwork.

Parameters:

x (Tensor) –

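Input tensor of shape (batch, input_window_size + forecast_horizon, n_channels). The first input_window_size steps feed the encoder; the remaining steps supply the known covariates for the horizon encoder.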

Returns:

Tensor –

Output tensor of shape (batch, forecast_horizon, n_target_series).

Source code in mlpforecast/net/layers.py

def forward(self, x: torch.Tensor) -> torch.Tensor:
    """
    Forward pass of the MLPForecastNetwork.

    Args:
        x (torch.Tensor): Input tensor.

    Returns:
        torch.Tensor: Output tensor after processing through the network.
    """
    f = self.encoder(x[:, : self.input_window_size, :])

    if self.n_covariates > 0:
        h = self.horizon(x[:, self.input_window_size :, self.n_unknown :])
        if self.combination_type == "attn-comb":
            ph_hf = self.attention(h.unsqueeze(0), f.unsqueeze(0), f.unsqueeze(0))[
                0
            ].squeeze(0)
        elif self.combination_type == "weighted-comb":
            gate = self.gate(torch.cat((h, f), -1)).sigmoid()
            ph_hf = (1 - gate) * f + gate * h
        else:
            ph_hf = h + f
    else:
        ph_hf = f

    z = self.decoder(ph_hf)
    loc = self.out_activation(
        self.mu(z).reshape(z.size(0), self.forecast_horizon, self.n_out)
    )

    return loc
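
The three combination modes differ only in how the encoder output `f` and the horizon output `h` are merged. The sketch below assumes PyTorch is installed and replaces both encoders with random tensors of shape (batch, hidden_size); the layer sizes are arbitrary and the modules are freshly initialized, so only the shapes and the structure are meaningful.

```python
import torch
from torch import nn

torch.manual_seed(0)
batch, hidden_size, num_heads = 4, 16, 4

f = torch.randn(batch, hidden_size)  # stand-in for self.encoder(...) output
h = torch.randn(batch, hidden_size)  # stand-in for self.horizon(...) output

# "attn-comb": h is the query, f is both key and value. unsqueeze(0) adds a
# sequence axis of length 1 because MultiheadAttention expects
# (seq_len, batch, embed_dim) inputs by default.
attention = nn.MultiheadAttention(hidden_size, num_heads)
attn_out, _ = attention(h.unsqueeze(0), f.unsqueeze(0), f.unsqueeze(0))
attn_comb = attn_out.squeeze(0)

# "weighted-comb": a sigmoid gate interpolates element-wise between f and h.
gate_layer = nn.Linear(2 * hidden_size, hidden_size)
gate = gate_layer(torch.cat((h, f), dim=-1)).sigmoid()
weighted_comb = (1 - gate) * f + gate * h

# "addition-comb": plain element-wise sum.
addition_comb = h + f

for name, tensor in [
    ("attn-comb", attn_comb),
    ("weighted-comb", weighted_comb),
    ("addition-comb", addition_comb),
]:
    print(name, tuple(tensor.shape))
```

All three results keep the (batch, hidden_size) shape, so the decoder can consume any of them. One property worth knowing: with a key sequence of length 1, the softmax over keys is 1 for every head, so in this sketch `attn_comb` is determined by `f` alone and does not change with the query `h`.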

step

step(batch: tuple, metric_fn: callable) -> tuple

Training step for the MLPForecastNetwork.

Parameters:

batch (tuple) –

Tuple containing input and target tensors.
metric_fn (callable) –

Metric function to evaluate.

Returns:

tuple –

Tuple containing the loss and computed metric.

Source code in mlpforecast/net/layers.py

def step(self, batch: tuple, metric_fn: callable) -> tuple:
    """
    Training step for the MLPForecastNetwork.

    Args:
        batch (tuple): Tuple containing input and target tensors.
        metric_fn (callable): Metric function to evaluate.

    Returns:
        tuple: Tuple containing the loss and computed metric.
    """
    x, y = batch

    y_pred = self(x)

    loss = (
        self.alpha * F.mse_loss(y_pred, y, reduction="none").sum(dim=(1, 2)).mean()
        + (1 - self.alpha)
        * F.l1_loss(y_pred, y, reduction="none").sum(dim=(1, 2)).mean()
    )

    metric = metric_fn(y_pred, y)

    return loss, metric
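
The loss in step() blends a squared-error term and an absolute-error term, each summed over the horizon and series axes per sample and then averaged over the batch. The plain-Python sketch below restates that arithmetic on nested lists so the numbers can be checked by hand; `blended_loss` is a hypothetical name, not a function in mlpforecast.

```python
def blended_loss(y_pred, y_true, alpha=0.1):
    """Return alpha * squared-error term + (1 - alpha) * absolute-error term.

    Inputs are nested lists shaped (batch, horizon, n_out). Errors are summed
    over horizon and series for each sample, then averaged over the batch.
    """
    squared_sums, absolute_sums = [], []
    for pred_sample, true_sample in zip(y_pred, y_true):
        diffs = [
            p - t
            for pred_step, true_step in zip(pred_sample, true_sample)
            for p, t in zip(pred_step, true_step)
        ]
        squared_sums.append(sum(d * d for d in diffs))
        absolute_sums.append(sum(abs(d) for d in diffs))
    n = len(squared_sums)
    return alpha * sum(squared_sums) / n + (1 - alpha) * sum(absolute_sums) / n


# Two samples, horizon of 2, one target series.
y_pred = [[[1.0], [2.0]], [[0.0], [0.0]]]
y_true = [[[0.0], [0.0]], [[1.0], [1.0]]]

# Squared sums per sample: 5 and 2 (mean 3.5); absolute sums: 3 and 2 (mean 2.5).
loss = blended_loss(y_pred, y_true, alpha=0.1)
print(round(loss, 6))  # 2.6
```

Because the errors are summed rather than averaged within each sample, the loss magnitude grows with forecast_horizon and the number of target series.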

PastFutureEncoder

PastFutureEncoder(
    embedding_size: int = 28,
    embedding_type: str = None,
    latent_size: int = 64,
    num_layers: int = 2,
    residual: bool = False,
    expansion_factor: int = 2,
    context_size: int = 96,
    activation: Module = nn.ReLU(),
    dropout_rate: float = 0.25,
    n_channels: int = 1,
)

Bases: Module

Encoder module for the PastFutureNetwork.

Attributes:

encoder (MLPBlock) –

MLP block for the encoder.
norm (LayerNorm) –

Layer normalization.
dropout (Dropout) –

Dropout layer.
embedding (Module) –

Embedding layer.
embedding_type (str) –

Type of embedding to use.
rotary_embedding (RotaryEmbedding) –

Rotary positional embedding.
pos_embedding (PosEmbedding) –

Positional embedding.

Parameters:

embedding_size (int, default: 28 ) –

Dimensionality of the embedding space. Defaults to 28.
embedding_type (str, default: None ) –

Type of embedding to use. Defaults to None.
latent_size (int, default: 64 ) –

Dimensionality of the latent space. Defaults to 64.
num_layers (int, default: 2 ) –

Number of layers in the encoder. Defaults to 2.
residual (bool, default: False ) –

Whether to use residual connections in the encoder. Defaults to False.
expansion_factor (int, default: 2 ) –

Expansion factor for the encoder. Defaults to 2.
context_size (int, default: 96 ) –

Size of the context. Defaults to 96.
activation (Module, default: ReLU() ) –

Activation function. Defaults to nn.ReLU().
dropout_rate (float, default: 0.25 ) –

Dropout probability. Defaults to 0.25.
n_channels (int, default: 1 ) –

Number of channels in the input. Defaults to 1.

Source code in mlpforecast/net/layers.py

def __init__(
    self,
    embedding_size: int = 28,
    embedding_type: str = None,
    latent_size: int = 64,
    num_layers: int = 2,
    residual: bool = False,
    expansion_factor: int = 2,
    context_size: int = 96,
    activation: nn.Module = nn.ReLU(),
    dropout_rate: float = 0.25,
    n_channels: int = 1,
):
    """
    Initializes the PastFutureEncoder module.

    Args:
        embedding_size (int, optional): Dimensionality of the embedding space. Defaults to 28.
        embedding_type (str, optional): Type of embedding to use. Defaults to None.
        latent_size (int, optional): Dimensionality of the latent space. Defaults to 64.
        num_layers (int, optional): Number of layers in the encoder. Defaults to 2.
        residual (bool, optional): Whether to use residual connections in the encoder. Defaults to False.
        expansion_factor (int, optional): Expansion factor for the encoder. Defaults to 2.
        context_size (int, optional): Size of the context. Defaults to 96.
        activation (nn.Module, optional): Activation function. Defaults to nn.ReLU().
        dropout_rate (float, optional): Dropout probability. Defaults to 0.25.
        n_channels (int, optional): Number of channels in the input. Defaults to 1.
    """
    super().__init__()

    self.encoder = MLPBlock(
        in_size=embedding_size if embedding_type is not None else n_channels,
        latent_dim=latent_size,
        features_start=latent_size,
        expansion_factor=expansion_factor,
        residual=residual,
        num_layers=num_layers,
        context_size=context_size,
        activation=activation,
    )

    # Normalize the input using LayerNorm
    self.norm = nn.LayerNorm(n_channels)

    # Apply dropout to the input
    self.dropout = nn.Dropout(dropout_rate)

    # Store hyperparameters
    self.embedding_type = embedding_type

    # Embedding based on the specified type
    if embedding_type == "PosEmb":
        self.embedding = PosEmbedding(
            n_channels, embedding_size, window_size=context_size
        )
    elif embedding_type == "RotaryEmb":
        self.embedding = RotaryEmbedding(embedding_size)
    elif embedding_type == "CombinedEmb":
        self.pos_embedding = PosEmbedding(
            n_channels, embedding_size, window_size=context_size
        )
        self.rotary_embedding = RotaryEmbedding(embedding_size)

forward

forward(x: Tensor) -> Tensor

Forward pass of the PastFutureEncoder module.

Parameters:

x (Tensor) –

Input tensor.

Returns:

Tensor –

Output tensor after processing through the encoder.

Source code in mlpforecast/net/layers.py

def forward(self, x: torch.Tensor) -> torch.Tensor:
    """
    Forward pass of the PastFutureEncoder module.

    Args:
        x (torch.Tensor): Input tensor.

    Returns:
        torch.Tensor: Output tensor after processing through the encoder.
    """
    # Normalize the input
    x = self.norm(x)

    # Apply embedding based on the specified type
    if self.embedding_type == "CombinedEmb":
        x = self.pos_embedding(x) + self.rotary_embedding(x)
        # Apply dropout to the embedded input
        x = self.dropout(x)
    elif self.embedding_type in ["PosEmb", "RotaryEmb"]:
        x = self.embedding(x)
        # Apply dropout to the embedded input
        x = self.dropout(x)

    # Pass the input through the encoder
    x = self.encoder(x)

    return x
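
It can help to trace how the tensor shape changes through this forward pass. The sketch below does so with plain tuples instead of tensors; `encoder_shapes` is a hypothetical helper, and the trace assumes a 3-D input of shape (batch, context_size, n_channels).

```python
def encoder_shapes(
    batch,
    context_size,
    n_channels,
    embedding_type=None,
    embedding_size=28,
    latent_size=64,
):
    """Hypothetical shape trace for PastFutureEncoder.forward."""
    shapes = {"input": (batch, context_size, n_channels)}
    # LayerNorm(n_channels) normalizes the last axis and keeps the shape.
    shapes["norm"] = shapes["input"]
    # An embedding replaces the channel axis with embedding_size features.
    width = embedding_size if embedding_type is not None else n_channels
    shapes["embedded"] = (batch, context_size, width)
    # MLPBlock flattens the time and feature axes before its first layer.
    shapes["flattened"] = (batch, context_size * width)
    # The last MLPBlock layer projects to the latent size.
    shapes["output"] = (batch, latent_size)
    return shapes


plain = encoder_shapes(batch=32, context_size=96, n_channels=8)
embedded = encoder_shapes(batch=32, context_size=96, n_channels=8, embedding_type="PosEmb")
print(plain["flattened"])     # (32, 768)
print(embedded["flattened"])  # (32, 2688)
```

This is why the constructor passes `embedding_size` as the MLPBlock `in_size` when an embedding is configured and `n_channels` otherwise.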

FeedForward

FeedForward(
    dim,
    expansion_factor=2,
    dropout=0.0,
    activation=nn.GELU(),
    bn=True,
)

Creates a feedforward block composed of linear layers, activation function, and dropout.

Parameters:

dim (int) –

Dimensionality of the input.
expansion_factor (int, default: 2 ) –

Expansion factor for the intermediate hidden layer. Defaults to 2.
dropout (float, default: 0.0 ) –

Dropout probability. Defaults to 0.0 (no dropout).
activation (Module, default: GELU() ) –

Activation function. Defaults to GELU().
bn (bool, default: True ) –

If True, adds batch normalization. Defaults to True.

Returns:

Sequential –

Feedforward block.

Source code in mlpforecast/net/layers.py

def FeedForward(dim, expansion_factor=2, dropout=0.0, activation=nn.GELU(), bn=True):
    """
    Creates a feedforward block composed of linear layers, activation function, and dropout.

    Args:
        dim (int): Dimensionality of the input.
        expansion_factor (int, optional): Expansion factor for the intermediate hidden layer. Defaults to 2.
        dropout (float, optional): Dropout probability. Defaults to 0.0 (no dropout).
        activation (torch.nn.Module, optional): Activation function. Defaults to GELU().
        bn (bool, optional): If True, adds batch normalization. Defaults to True.

    Returns:
        (nn.Sequential): Feedforward block.
    """
    # Create a sequential block with linear layer, activation, and dropout
    block = nn.Sequential(
        create_linear(dim, dim * expansion_factor, bn),
        activation,
        nn.Dropout(dropout),
        create_linear(dim * expansion_factor, dim, bn),
        nn.Dropout(dropout),
    )

    return block

create_linear

create_linear(in_channels, out_channels, bn=False)

Creates a linear layer with optional batch normalization.

Parameters:

in_channels (int) –

Number of input channels.
out_channels (int) –

Number of output channels.
bn (bool, default: False ) –

If True, adds batch normalization. Defaults to False.

Returns:

Module –

Linear layer with optional batch normalization.

Source code in mlpforecast/net/layers.py

def create_linear(in_channels, out_channels, bn=False):
    """
    Creates a linear layer with optional batch normalization.

    Args:
        in_channels (int): Number of input channels.
        out_channels (int): Number of output channels.
        bn (bool, optional): If True, adds batch normalization. Defaults to False.

    Returns:
        (nn.Module): Linear layer with optional batch normalization.
    """
    # Create a linear layer
    m = nn.Linear(in_channels, out_channels)

    # Initialize the weights using Kaiming normal initialization with a ReLU nonlinearity
    nn.init.kaiming_normal_(m.weight, nonlinearity="relu")

    # Initialize the bias to zero if present
    if m.bias is not None:
        torch.nn.init.constant_(m.bias, 0)

    # Add batch normalization if requested
    if bn:
        # Create a batch normalization layer
        bn_layer = nn.BatchNorm1d(out_channels)

        # Combine the linear layer and batch normalization into a sequential module
        m = nn.Sequential(m, bn_layer)

    return m
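
For a sense of scale, the weight initialization used above can be computed by hand. The sketch below assumes the default fan-in mode of Kaiming-normal initialization, where the standard deviation is sqrt(2) / sqrt(fan_in) for a ReLU gain; `kaiming_normal_std` is a hypothetical helper name.

```python
import math


def kaiming_normal_std(fan_in):
    """Standard deviation of Kaiming-normal init with a ReLU gain.

    Assumes the default fan-in mode: std = sqrt(2) / sqrt(fan_in).
    """
    return math.sqrt(2.0) / math.sqrt(fan_in)


# For a linear layer, fan_in is the number of input features.
for fan_in in (96, 256):
    print(fan_in, round(kaiming_normal_std(fan_in), 4))
# 96 0.1443
# 256 0.0884
```

Wider inputs therefore start with proportionally smaller weights, which keeps the variance of the pre-activations roughly constant across layers.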

PosEmbedding

PosEmbedding(n_channels, d_model, window_size)

Bases: Module

Positional Embedding module that combines convolutional and sinusoidal embeddings.

Attributes:

emb (Conv1DLayer) –

Convolutional positional embedding module.
d_model (int) –

Dimension of the model.

Parameters:

n_channels (int) –

Number of input channels.
d_model (int) –

Dimension of the model.
window_size (int) –

Size of the input window.

Source code in mlpforecast/net/embending.py

def __init__(self, n_channels, d_model, window_size):
    """
    Initializes the PosEmbedding module.

    Args:
        n_channels (int): Number of input channels.
        d_model (int): Dimension of the model.
        window_size (int): Size of the input window.
    """
    super().__init__()
    # Convolutional embedding layer
    self.emb = Conv1DLayer(n_channels, d_model)

    # Sinusoidal positional embedding
    self.register_buffer("positional_embedding", sinusoids(window_size, d_model))

    # Dimension of the model
    self.d_model = d_model

forward

forward(x)

Forward pass of the PosEmbedding module.

Parameters:

x (Tensor) –

Input tensor.

Returns:

Tensor –

Output tensor after applying positional embedding.

Source code in mlpforecast/net/embending.py

def forward(self, x):
    """
    Forward pass of the PosEmbedding module.

    Args:
        x (torch.Tensor): Input tensor.

    Returns:
        (torch.Tensor): Output tensor after applying positional embedding.
    """
    # Apply convolutional embedding, ReLU activation, and scale by sqrt(d_model)
    x = F.relu(self.emb(x.permute(0, 2, 1)).permute(0, 2, 1)) * math.sqrt(
        self.d_model
    )

    # Add positional embedding
    x = (x + self.positional_embedding).to(x.dtype)
    return x
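
The double permute around the convolution is easy to misread, so here is a small shape check. It assumes PyTorch is installed and builds a bare `nn.Conv1d` with the same kernel size and padding as Conv1DLayer; the tensor sizes are arbitrary.

```python
import torch
from torch import nn

batch, window_size, n_channels, d_model = 2, 96, 5, 28
x = torch.randn(batch, window_size, n_channels)

# Same hyperparameters as Conv1DLayer: kernel_size=3 with padding=1
# preserves the sequence length.
conv = nn.Conv1d(n_channels, d_model, kernel_size=3, padding=1)

# Conv1d expects (batch, channels, time), so swap the last two axes,
# convolve, then swap back to (batch, time, features).
embedded = conv(x.permute(0, 2, 1)).permute(0, 2, 1)
print(embedded.shape)  # torch.Size([2, 96, 28])
```

The result has one d_model-wide vector per time step, which is the shape the (window_size, d_model) sinusoid buffer broadcasts against.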

Rotary

Rotary(dim, base=10000)

Bases: Module

Rotary positional embedding module.

Attributes:

seq_len_cached (int) –

Cached sequence length.
cos_cached (Tensor) –

Cached cosine values.
sin_cached (Tensor) –

Cached sine values.

Parameters:

dim (int) –

Dimension of the input embeddings.
base (int, default: 10000 ) –

Base value for frequency calculation. Defaults to 10000.

Source code in mlpforecast/net/embending.py

def __init__(self, dim, base=10000):
    """
    Initializes the Rotary positional embedding module.

    Args:
        dim (int): Dimension of the input embeddings.
        base (int, optional): Base value for frequency calculation. Defaults to 10000.
    """
    super().__init__()
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    self.register_buffer("inv_freq", inv_freq)
    self.seq_len_cached = None
    self.cos_cached = None
    self.sin_cached = None

forward

forward(inputs, seq_dim=1)

Forward pass of the rotary positional embedding module.

Parameters:

inputs (Tensor) –

Input tensor.
seq_dim (int, default: 1 ) –

Dimension representing the sequence length. Defaults to 1.

Returns:

Tensor –

Rotary positional embeddings.

Source code in mlpforecast/net/embending.py

def forward(self, inputs, seq_dim=1):
    """
    Forward pass of the rotary positional embedding module.

    Args:
        inputs (torch.Tensor): Input tensor.
        seq_dim (int, optional): Dimension representing the sequence length. Defaults to 1.

    Returns:
        (torch.Tensor): Rotary positional embeddings.
    """
    x = inputs.unsqueeze(2)
    seq_len = x.shape[seq_dim]
    if seq_len != self.seq_len_cached:
        self.seq_len_cached = seq_len
        t = torch.arange(x.shape[seq_dim], device=x.device).type_as(self.inv_freq)
        freqs = torch.einsum("i,j->ij", t, self.inv_freq)
        emb = torch.cat((freqs, freqs), dim=-1).to(x.device)
        self.cos_cached = emb.cos()[:, None, None, :]
        self.sin_cached = emb.sin()[:, None, None, :]

    cos_half = self.cos_cached.squeeze(2).permute(1, 0, 2) * x.squeeze(2).mean(
        -1
    ).unsqueeze(2)
    sin_half = self.sin_cached.squeeze(2).permute(1, 0, 2) * rotate_half(x).squeeze(
        2
    ).mean(-1).unsqueeze(2)
    return cos_half + sin_half
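
The cached cosine and sine tables are built from a grid of angles, one row per position. The following plain-Python sketch reproduces that grid for a tiny case; `rotary_angle_table` is a hypothetical counterpart of the cached computation and assumes an even `dim`.

```python
import math


def rotary_angle_table(seq_len, dim, base=10000):
    """Angles t * inv_freq, with the frequency half repeated to width dim.

    Hypothetical pure-Python counterpart of the table cached in
    Rotary.forward; assumes an even dim.
    """
    inv_freq = [1.0 / (base ** (i / dim)) for i in range(0, dim, 2)]
    table = []
    for t in range(seq_len):
        freqs = [t * w for w in inv_freq]
        table.append(freqs + freqs)  # mirrors torch.cat((freqs, freqs), dim=-1)
    return table


angles = rotary_angle_table(seq_len=3, dim=4)
cos_table = [[math.cos(a) for a in row] for row in angles]
sin_table = [[math.sin(a) for a in row] for row in angles]
print(angles[1])  # [1.0, 0.01, 1.0, 0.01]
```

Position 0 always has zero angles, so its cosine row is all ones and its sine row is all zeros.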

RotaryEmbedding

RotaryEmbedding(d_model)

Bases: Module

Rotary Embedding module.

Attributes:

emb (Rotary) –

Rotary positional embedding module.

Parameters:

d_model (int) –

Dimension of the model.

Source code in mlpforecast/net/embending.py

def __init__(self, d_model):
    """
    Initializes the RotaryEmbedding module.

    Args:
        d_model (int): Dimension of the model.
    """
    super().__init__()
    # Rotary embedding layer
    self.emb = Rotary(d_model)

forward

forward(x)

Forward pass of the RotaryEmbedding module.

Parameters:

x (Tensor) –

Input tensor.

Returns:

Tensor –

Output tensor after applying rotary embedding.

Source code in mlpforecast/net/embending.py

def forward(self, x):
    """
    Forward pass of the RotaryEmbedding module.

    Args:
        x (torch.Tensor): Input tensor.

    Returns:
        (torch.Tensor): Output tensor after applying rotary embedding.
    """
    x = self.emb(x)
    return x

Conv1DLayer

Conv1DLayer(in_channels, out_channels, bias=True)

Creates a 1D convolutional layer with specified input and output channels.

Parameters:

in_channels (int) –

Number of input channels.
out_channels (int) –

Number of output channels.
bias (bool, default: True ) –

If True, adds a learnable bias to the output. Default is True.

Returns:

Module –

1D convolutional layer.

Source code in mlpforecast/net/embending.py

def Conv1DLayer(in_channels, out_channels, bias=True):
    """
    Creates a 1D convolutional layer with specified input and output channels.

    Args:
        in_channels (int): Number of input channels.
        out_channels (int): Number of output channels.
        bias (bool, optional): If True, adds a learnable bias to the output. Default is True.

    Returns:
        (nn.Module): 1D convolutional layer.
    """
    # Create a 1D convolutional layer with specified parameters
    m = nn.Conv1d(in_channels, out_channels, kernel_size=3, padding=1, bias=bias)

    # Initialize weights using Kaiming normal initialization
    nn.init.kaiming_normal_(m.weight, nonlinearity="relu")

    # If bias is present, initialize it with zeros
    if m.bias is not None:
        m.bias.data.fill_(0.00)

    return m

rotate_half

rotate_half(x)

Rotate the input tensor along the last dimension by half.

Parameters:

x (Tensor) –

Input tensor.

Returns:

Tensor –

Rotated tensor.

Source code in mlpforecast/net/embending.py

def rotate_half(x):
    """
    Rotate the input tensor along the last dimension by half.

    Args:
        x (torch.Tensor): Input tensor.

    Returns:
        (torch.Tensor): Rotated tensor.
    """
    x1, x2 = x[..., : x.shape[-1] // 2], x[..., x.shape[-1] // 2 :]
    return torch.cat(
        (-x2, x1), dim=x1.ndim - 1
    )  # dim=-1 triggers a bug in torch < 1.8.0
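
A list-based version makes the rotation easy to verify by eye. `rotate_half_list` below is a hypothetical plain-Python analogue that operates on a flat list rather than on the last axis of a tensor.

```python
def rotate_half_list(values):
    """Split a flat list in half and return (-second_half) + first_half."""
    half = len(values) // 2
    first, second = values[:half], values[half:]
    return [-v for v in second] + first


once = rotate_half_list([1, 2, 3, 4])
twice = rotate_half_list(once)
print(once)   # [-3, -4, 1, 2]
print(twice)  # [-1, -2, -3, -4]
```

For an even-length input, applying the rotation twice negates every element, the same behavior as two successive 90-degree rotations.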

sinusoids

sinusoids(length, channels, max_timescale=10000)

Returns sinusoids for positional embedding.

Parameters:

length (int) –

Length of the sequence.
channels (int) –

Number of channels in the positional embeddings. It should be an even number.
max_timescale (int, default: 10000 ) –

Maximum timescale for the sinusoids. Defaults to 10000.

Returns:

Tensor –

Sinusoidal positional embeddings.

Source code in mlpforecast/net/embending.py

def sinusoids(length, channels, max_timescale=10000):
    """
    Returns sinusoids for positional embedding.

    Args:
        length (int): Length of the sequence.
        channels (int): Number of channels in the positional embeddings. It should be an even number.
        max_timescale (int, optional): Maximum timescale for the sinusoids. Defaults to 10000.

    Returns:
        (torch.Tensor): Sinusoidal positional embeddings.
    """
    assert channels % 2 == 0
    log_timescale_increment = np.log(max_timescale) / (channels // 2 - 1)
    inv_timescales = torch.exp(-log_timescale_increment * torch.arange(channels // 2))
    scaled_time = torch.arange(length)[:, np.newaxis] * inv_timescales[np.newaxis, :]
    return torch.cat([torch.sin(scaled_time), torch.cos(scaled_time)], dim=1)
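
The same table can be built with the standard library alone, which is handy for checking individual entries. `sinusoid_table` below is a hypothetical pure-Python sketch of the computation above. The `channels < 4` check is this sketch's own guard and is not in the original function: with `channels == 2` the divisor `channels // 2 - 1` would be zero.

```python
import math


def sinusoid_table(length, channels, max_timescale=10000):
    """Return a length x channels table: sines in the first half, cosines in the second."""
    if channels % 2 != 0 or channels < 4:
        raise ValueError("channels must be even and at least 4")
    half = channels // 2
    # Timescales are spaced geometrically between 1 and max_timescale.
    log_increment = math.log(max_timescale) / (half - 1)
    inv_timescales = [math.exp(-log_increment * i) for i in range(half)]
    return [
        [math.sin(t * s) for s in inv_timescales]
        + [math.cos(t * s) for s in inv_timescales]
        for t in range(length)
    ]


table = sinusoid_table(length=3, channels=4)
print(len(table), len(table[0]))  # 3 4
print(table[0])  # [0.0, 0.0, 1.0, 1.0]
```

Each row concatenates all sines followed by all cosines rather than interleaving them, matching the `torch.cat` call in the original.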