
This part of the project documentation is information-oriented: use it as a reference for the technical implementation of the mlpForecaster project code.

CorrelationAnalyzer

A class to calculate and visualize the correlation between variables in a data frame.

corr (staticmethod)

corr(
    data,
    variable_col,
    target_col,
    method="scatter",
    ties="auto",
    hue_col=None,
    n_sample=None,
)

Calculate the correlation between the target column and other variables in the data frame.

Parameters:

  • data (DataFrame) –

    The data frame containing the data.

  • variable_col (list of str) –

    List of column names to be used as independent variables.

  • target_col (str) –

    The name of the dependent variable column.

  • method (str, default: 'scatter' ) –

    The method to use for calculating the correlation:
      - 'scatter' (default): Scatter plot.
      - 'pearson': Pearson correlation.
      - 'kendall': Kendall rank correlation.
      - 'spearman': Spearman rank correlation.
      - 'ppscore': Predictive Power Score (PPS).
      - 'xicor': Xi correlation.

  • ties (str or bool, default: 'auto' ) –

    How to handle ties in the Xi correlation calculation:
      - 'auto' (default): Decide based on the uniqueness of y values.
      - True: Assume ties are present.
      - False: Assume no ties are present.

  • hue_col (str, default: None ) –

    The column in data to use for color grouping.

  • n_sample (int, default: None ) –

    The number of samples to use for the scatter plot.
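The `ties` option only matters for the 'xicor' method. The sketch below shows what the two branches plausibly compute, assuming the library follows Chatterjee's standard definition of the xi coefficient. The helper `xi_correlation` is hypothetical (it is not `_get_xicor_score`), and the reading of 'auto' as "ties present when y has repeated values" is an assumption based on the parameter description above.

```python
import numpy as np


def xi_correlation(x, y, ties="auto"):
    """Chatterjee's xi coefficient measuring how well x predicts y.

    Hypothetical helper for illustration; not mlpforecast's _get_xicor_score.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = y.size
    if n < 2 or x.size != n:
        raise ValueError("x and y must have the same length, at least 2")
    # Assumption: 'auto' treats y as tied when it contains repeated values.
    if ties == "auto":
        ties = np.unique(y).size < n
    # Reorder y by increasing x. The stable sort breaks ties in x by original
    # position; Chatterjee's definition breaks them uniformly at random.
    y_by_x = y[np.argsort(x, kind="stable")]
    y_sorted = np.sort(y)
    # r_i = #{j : y_j <= y_(i)}: the rank of each y value, taken in x-order.
    r = np.searchsorted(y_sorted, y_by_x, side="right")
    total_jump = np.abs(np.diff(r)).sum()
    if not ties:
        # No-ties formula: 1 - 3 * sum|r_{i+1} - r_i| / (n^2 - 1)
        return 1.0 - 3.0 * total_jump / (n * n - 1)
    # l_i = #{j : y_j >= y_(i)}, used by the tie-corrected denominator.
    l = n - np.searchsorted(y_sorted, y_by_x, side="left")
    denominator = 2.0 * np.sum(l * (n - l))
    if denominator == 0:
        raise ValueError("xi is undefined when y is constant")
    # Ties formula: 1 - n * sum|r_{i+1} - r_i| / (2 * sum l_i (n - l_i))
    return 1.0 - n * total_jump / denominator
```

When y has no repeated values, both branches give the same result, so forcing `ties=True` only costs the extra counting step. For a perfectly monotone relation of length n the no-ties value is 1 - 3/(n + 1), which approaches 1 only as n grows; with 10 points it is 8/11.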

Returns:

  • DataFrame

    DataFrame containing the correlation between the target column and each variable.

Raises:

  • ValueError –

    If the method is not supported.

Source code in mlpforecast/stats/corr.py
@staticmethod
def corr(
    data,
    variable_col,
    target_col,
    method="scatter",
    ties="auto",
    hue_col=None,
    n_sample=None,
):
    """
    Calculate the correlation between the target column and other variables in the data frame.

    Args:
        data (pandas.DataFrame): The data frame containing the data.
        variable_col (list of str): List of column names to be used as independent variables.
        target_col (str): The name of the dependent variable column.
        method (str, optional): The method to use for calculating the correlation:
            - 'scatter' (default): Scatter plot.
            - 'pearson': Pearson correlation.
            - 'kendall': Kendall rank correlation.
            - 'spearman': Spearman rank correlation.
            - 'ppscore': Predictive Power Score (PPS).
            - 'xicor': Xi correlation.
        ties (str or bool, optional): How to handle ties in Xi correlation calculation:
            - 'auto' (default): Decide based on the uniqueness of y values.
            - True: Assume ties are present.
            - False: Assume no ties are present.
        hue_col (str, optional): The column in `data` to use for color grouping.
        n_sample (int, optional): The number of samples to use for the scatter plot.

    Returns:
        (pandas.DataFrame): DataFrame containing the correlation between the target column and each variable.

    Raises:
        ValueError: If the method is not supported.
    """
    if method == "scatter":
        return scatter_plot(
            data, variable_col, target_col, hue_col, n_sample=n_sample
        )
    elif method in ["pearson", "kendall", "spearman"]:
        return CorrelationAnalyzer._get_correlation(data, variable_col, target_col)
    elif method == "ppscore":
        return CorrelationAnalyzer._get_ppscore(data, variable_col, target_col)
    elif method == "xicor":
        return CorrelationAnalyzer._get_xicor_score(
            data, variable_col, target_col, ties
        )
    else:
        raise ValueError(
            f"Unsupported method: {method}. Choose from 'scatter', 'pearson', 'kendall', 'spearman', 'ppscore', or 'xicor'."
        )
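For the 'pearson', 'kendall' and 'spearman' methods, a table with one correlation per variable against the target can be approximated with plain pandas. This is a minimal sketch under assumptions: the helper name `correlation_table` and its output columns (`variable`, `target`, `correlation`) are hypothetical and are not necessarily the layout returned by `_get_correlation`. It computes the correlation matrix with `DataFrame.corr(method=...)` and keeps the target's column.

```python
import pandas as pd


def correlation_table(data, variable_col, target_col, method="pearson"):
    """Hypothetical helper: correlation of each variable with target_col."""
    if method not in ("pearson", "kendall", "spearman"):
        raise ValueError(f"Unsupported method: {method}")
    # Full correlation matrix over the variables plus the target.
    matrix = data[list(variable_col) + [target_col]].corr(method=method)
    # Keep the target's column and drop its trivial self-correlation.
    scores = matrix[target_col].drop(target_col)
    return pd.DataFrame(
        {
            "variable": scores.index,
            "target": target_col,
            "correlation": scores.to_numpy(),
        }
    )
```

The choice of method matters: on a monotone but nonlinear relation such as y = x**3, Spearman reports a perfect rank correlation of 1.0 while Pearson stays noticeably below it. In pandas, `method="kendall"` relies on SciPy being installed.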

plot (staticmethod)

plot(ax, corr_df)

Plot the correlation data using a heatmap.

Parameters:

  • ax (Axes) –

    The axes on which to plot the heatmap.

  • corr_df (DataFrame) –

    DataFrame containing the correlation data with three columns: two for the pairs of items and one for the correlation values.
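A three-column `corr_df` of this shape can be derived from a square correlation matrix by melting it into long form. This is a minimal sketch; the helper `to_long_form` and the column names `item_1`, `item_2` and `correlation` are hypothetical, since the description above only fixes the number and role of the columns.

```python
import pandas as pd


def to_long_form(corr_matrix):
    """Hypothetical helper: square correlation matrix -> (item_1, item_2, correlation) rows."""
    return (
        corr_matrix.rename_axis("item_1")  # name the row labels
        .reset_index()  # turn the row labels into an 'item_1' column
        .melt(id_vars="item_1", var_name="item_2", value_name="correlation")
    )
```

For a matrix over k variables this yields k * k rows, including the self-pairs on the diagonal; filter them out first if the heatmap should omit them.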

Source code in mlpforecast/stats/corr.py
@staticmethod
def plot(ax, corr_df):
    """
    Plot the correlation data using a heatmap.

    Args:
        ax (matplotlib.axes.Axes): The axes on which to plot the heatmap.
        corr_df (pandas.DataFrame): DataFrame containing the correlation data with three columns: \
                two for the pairs of items and one for the correlation values.
    """
    return plot_correlation(ax, corr_df)
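The body of `plot_correlation` is not shown here. As a rough sketch of how a heatmap can be drawn from a three-column long-form table with plain matplotlib, the hypothetical `plot_long_form_heatmap` below pivots the pairs back into a matrix and renders it with `Axes.imshow`. Treating the first two columns as the pair and the third as the value is an assumption drawn from the parameter description above.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def plot_long_form_heatmap(ax, corr_df):
    """Hypothetical helper: draw a long-form correlation table as a heatmap on ax."""
    # Assumption: first two columns hold the item pair, the third the value.
    row_col, col_col, value_col = corr_df.columns[:3]
    matrix = corr_df.pivot(index=row_col, columns=col_col, values=value_col)
    # Fixed color limits so that -1 and +1 always map to the colormap ends.
    image = ax.imshow(matrix.to_numpy(dtype=float), cmap="coolwarm", vmin=-1, vmax=1)
    ax.set_xticks(range(matrix.shape[1]))
    ax.set_xticklabels(matrix.columns, rotation=45, ha="right")
    ax.set_yticks(range(matrix.shape[0]))
    ax.set_yticklabels(matrix.index)
    # Annotate each non-missing cell with its correlation value.
    for i in range(matrix.shape[0]):
        for j in range(matrix.shape[1]):
            value = matrix.iat[i, j]
            if pd.notna(value):
                ax.text(j, i, f"{value:.2f}", ha="center", va="center")
    ax.figure.colorbar(image, ax=ax)
    return matrix
```

Pinning `vmin=-1` and `vmax=1` keeps colors comparable across figures. Without it, matplotlib rescales the colormap to each table's own range.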