This part of the project documentation focuses on an information-oriented approach. Use it as a reference for the technical implementation of the mlpForecaster project code.

CorrelationAnalyzer

A class to calculate and visualize the correlation between variables in a data frame.

corr `staticmethod`

corr(
    data,
    variable_col,
    target_col,
    method="scatter",
    ties="auto",
    hue_col=None,
    n_sample=None,
)

Calculate the correlation between the target column and other variables in the data frame.

Parameters:

data (DataFrame) –

The data frame containing the data.
variable_col (list of str) –

List of column names to be used as independent variables.
target_col (str) –

The name of the dependent variable column.
method (str, default: 'scatter' ) –

The method to use for calculating the correlation:

- 'scatter' (default): Scatter plot.
- 'pearson': Pearson correlation.
- 'kendall': Kendall rank correlation.
- 'spearman': Spearman rank correlation.
- 'ppscore': Predictive Power Score (PPS).
- 'xicor': Xi correlation.
ties (str or bool, default: 'auto' ) –

How to handle ties in the Xi correlation calculation:

- 'auto' (default): Decide based on the uniqueness of y values.
- True: Assume ties are present.
- False: Assume no ties are present.
hue_col (str, default: None ) –

The column in `data` to use for color grouping in the scatter plot.
n_sample (int, default: None ) –

The number of samples to use for the scatter plot.

Returns:

DataFrame –

DataFrame containing the correlation between the target column and each variable. With method='scatter', the result of `scatter_plot` is returned instead.

Raises:

ValueError –

If the method is not supported.
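For the 'pearson', 'kendall', and 'spearman' options, the output is one correlation value per variable against the target. The sketch below shows one way to build such a table with pandas' `DataFrame.corrwith`. The helper name `correlation_table` and the output column names `variable` and `correlation` are hypothetical, and this is not necessarily how `_get_correlation` is implemented.

```python
import pandas as pd


def correlation_table(data, variable_col, target_col, method="pearson"):
    """Correlate each column in variable_col with target_col.

    Hypothetical helper for illustration; not part of mlpForecaster.
    """
    if method not in ("pearson", "kendall", "spearman"):
        raise ValueError(f"Unsupported method: {method}")
    # corrwith correlates every column of the left frame with the given Series.
    # pandas needs SciPy installed for 'kendall' and 'spearman' on this path.
    scores = data[variable_col].corrwith(data[target_col], method=method)
    # Turn the Series (indexed by column name) into a two-column DataFrame.
    return scores.rename("correlation").rename_axis("variable").reset_index()
```

Each row of the result pairs one independent variable with its correlation to the target, which mirrors the documented return value of `corr` for these three methods.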

Source code in mlpforecast/stats/corr.py

@staticmethod
def corr(
    data,
    variable_col,
    target_col,
    method="scatter",
    ties="auto",
    hue_col=None,
    n_sample=None,
):
    """
    Calculate the correlation between the target column and other variables in the data frame.

    Args:
        data (pandas.DataFrame): The data frame containing the data.
        variable_col (list of str): List of column names to be used as independent variables.
        target_col (str): The name of the dependent variable column.
        method (str, optional): The method to use for calculating the correlation:
            - 'scatter' (default): Scatter plot.
            - 'pearson': Pearson correlation.
            - 'kendall': Kendall rank correlation.
            - 'spearman': Spearman rank correlation.
            - 'ppscore': Predictive Power Score (PPS).
            - 'xicor': Xi correlation.
        ties (str or bool, optional): How to handle ties in Xi correlation calculation:
            - 'auto' (default): Decide based on the uniqueness of y values.
            - True: Assume ties are present.
            - False: Assume no ties are present.
        hue_col (str, optional): The column in `data` to use for color grouping in the scatter plot.
        n_sample (int, optional): The number of samples to use for the scatter plot.

    Returns:
        (pandas.DataFrame): DataFrame containing the correlation between the target column and each variable.
            With method='scatter', the result of `scatter_plot` is returned instead.

    Raises:
        ValueError: If the method is not supported.
    """
    if method == "scatter":
        return scatter_plot(
            data, variable_col, target_col, hue_col, n_sample=n_sample
        )
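    elif method in ["pearson", "kendall", "spearman"]:
        # Pass the chosen method through so Pearson, Kendall, and Spearman can be
        # told apart (assumes _get_correlation accepts a `method` argument).
        return CorrelationAnalyzer._get_correlation(
            data, variable_col, target_col, method=method
        )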
    elif method == "ppscore":
        return CorrelationAnalyzer._get_ppscore(data, variable_col, target_col)
    elif method == "xicor":
        return CorrelationAnalyzer._get_xicor_score(
            data, variable_col, target_col, ties
        )
    else:
        raise ValueError(
            f"Unsupported method: {method}. Choose from 'scatter', 'pearson', "
            "'kendall', 'spearman', 'ppscore', or 'xicor'."
        )
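The `ties` argument only affects the 'xicor' path. To illustrate what it controls, here is a minimal pure-Python sketch of Chatterjee's xi correlation, assuming the standard formulas: without ties, xi = 1 - 3 * sum|r[i+1] - r[i]| / (n^2 - 1); with ties, xi = 1 - n * sum|r[i+1] - r[i]| / (2 * sum l[i] * (n - l[i])), where the pairs are ordered by x, r[i] counts the y values <= y[i], and l[i] counts the y values >= y[i]. The function `xi_correlation` is hypothetical and is not the `_get_xicor_score` implementation; its 'auto' rule follows the documented behavior of checking whether the y values are unique.

```python
def xi_correlation(x, y, ties="auto"):
    """Chatterjee's xi correlation of y on x (hypothetical sketch, O(n^2))."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    if ties == "auto":
        # Documented 'auto' rule: assume ties when y has repeated values.
        ties = len(set(y)) < n
    # Order y by increasing x. Ties in x are broken by original position here;
    # Chatterjee's definition breaks them uniformly at random.
    order = sorted(range(n), key=lambda i: x[i])
    y_sorted = [y[i] for i in order]
    # r[i] = number of j with y[j] <= y_sorted[i]
    r = [sum(1 for v in y if v <= yi) for yi in y_sorted]
    diff_sum = sum(abs(r[i + 1] - r[i]) for i in range(n - 1))
    if not ties:
        return 1 - 3 * diff_sum / (n * n - 1)
    # l[i] = number of j with y[j] >= y_sorted[i]
    l = [sum(1 for v in y if v >= yi) for yi in y_sorted]
    denom = 2 * sum(li * (n - li) for li in l)
    if denom == 0:
        raise ValueError("xi is undefined when all y values are identical")
    return 1 - n * diff_sum / denom
```

For five points on a straight line, `xi_correlation([1, 2, 3, 4, 5], [1, 2, 3, 4, 5])` gives 0.5, the largest value attainable at that sample size, and both branches agree on this input. Note that xi is not symmetric: it measures how well y can be predicted as a function of x, so swapping the arguments can change the result.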

plot `staticmethod`

plot(ax, corr_df)

Plot the correlation data using a heatmap.

Parameters:

ax (Axes) –

The axes on which to plot the heatmap.
corr_df (DataFrame) –

DataFrame containing the correlation data with three columns: two for the pairs of items and one for the correlation values.
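The `corr_df` layout described above is a long, three-column table, while a heatmap draws a square grid. The sketch below converts between the two with pandas `melt` and `pivot`. The column names `var_x`, `var_y`, and `correlation` and both helper names are hypothetical; the code only prepares data and does not reproduce `plot_correlation`.

```python
import pandas as pd


def to_long(corr_matrix):
    """Flatten a square correlation matrix into three columns (hypothetical names)."""
    # Name the index so it becomes the first column, then melt the rest.
    return corr_matrix.rename_axis("var_x").reset_index().melt(
        id_vars="var_x", var_name="var_y", value_name="correlation"
    )


def to_matrix(corr_df):
    """Pivot a three-column frame back into the square grid a heatmap draws."""
    return corr_df.pivot(index="var_x", columns="var_y", values="correlation")
```

For example, `to_long(df.corr())` yields one row per pair of columns (including each column paired with itself), and `to_matrix` restores the grid.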

Source code in mlpforecast/stats/corr.py

@staticmethod
def plot(ax, corr_df):
    """
    Plot the correlation data using a heatmap.

    Args:
        ax (matplotlib.axes.Axes): The axes on which to plot the heatmap.
        corr_df (pandas.DataFrame): DataFrame containing the correlation data with three columns:
            two for the pairs of items and one for the correlation values.
    """
    return plot_correlation(ax, corr_df)
