rCCA and output canonical correlation value based on variate scores

Greetings,

Firstly, thanks for all the great work put into this package, which I’ve begun using for rCCA. I have a question somewhat related to this thread.

My understanding is that the canonical correlation (for example, for the first mode) would be the Pearson correlation between the canonical variate scores from the first mode. When I do rCCA without any regularization (both lambdas set to 0), my manual calculation matches the r-value output by the rcc function.

However, if I apply any regularization, then the r-value in the output of rCCA does not correspond to the Pearson correlation I manually calculate between the first pair of canonical variate scores. Is there something I am misunderstanding, or is there an operation I could apply to the canonical variates so that the manual calculation matches the r-value in the output of the rcc function?

Thank you in advance.

Sorry, just wanted to bump this in case anyone could provide some insight.

hi @psd,

Apologies for the late answer. We are still trying to figure out the answer to your question! The developer of the rCCA method left the team many moons ago, and we are trying to retrieve the paper that I think has some answers but is behind a paywall.

What we can tell you though is that the way CCA is classically solved is through a Cholesky decomposition, followed by a singular value decomposition (SVD) of (roughly) the product of the sample covariance matrices (see line 263 here for more details; it’s been shown in the literature too, such as here). The canonical correlation is then the corresponding singular value of this SVD (referred to as the eigenvalue in what follows).

In classic CCA, the correlation between the variates is equal to the eigenvalue of the SVD.
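For reference, the classical computation can be written out as follows. This is only a sketch: the notation ($S_{xx}$, $S_{yy}$, $S_{xy}$ for the sample covariance matrices, $R_x$, $R_y$ for their upper Cholesky factors, $a_k$, $b_k$ for the canonical coefficient vectors) is assumed here for illustration and is not taken from the mixOmics source.

```latex
\begin{aligned}
S_{xx} &= R_x^{\top} R_x, \qquad S_{yy} = R_y^{\top} R_y
  && \text{(Cholesky factorisations)} \\
K &= R_x^{-\top} S_{xy} R_y^{-1} = U D V^{\top}
  && \text{(SVD, } D = \operatorname{diag}(d_1, d_2, \dots)\text{)} \\
a_k &= R_x^{-1} u_k, \qquad b_k = R_y^{-1} v_k
  && \text{(canonical coefficients)} \\
a_k^{\top} S_{xx} a_k &= u_k^{\top} u_k = 1, \qquad
b_k^{\top} S_{yy} b_k = v_k^{\top} v_k = 1 \\
\operatorname{cor}(X a_k,\, Y b_k)
  &= \frac{a_k^{\top} S_{xy} b_k}
          {\sqrt{a_k^{\top} S_{xx} a_k \; b_k^{\top} S_{yy} b_k}}
   = u_k^{\top} K v_k = d_k
\end{aligned}
```

Because the variates have unit variance under this normalisation, the Pearson correlation between the $k$-th pair of variates coincides exactly with $d_k$.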

In regularised CCA, however, the two values differ, which is what you have noticed. If the regularisation is high (large value of lambda), the eigenvalue < cor(variates). I have the feeling that the eigenvalue (i.e. the output you get from the rcc function in mixOmics) would be closer to the true canonical correlation, but we can confirm this later after reading the paper.
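A short derivation may help to see where the gap comes from. It assumes the standard ridge form of rCCA, in which $\lambda_1 I$ and $\lambda_2 I$ are added to the sample covariance matrices $S_{xx}$ and $S_{yy}$ before the decomposition; that rcc reports exactly the quantity $\rho_k$ below is an assumption, and the symbols ($a_k$, $b_k$ for the coefficient vectors, $S_{xy}$ for the cross-covariance) are introduced here for illustration only.

```latex
\begin{aligned}
\rho_k &= \frac{a_k^{\top} S_{xy} b_k}
               {\sqrt{a_k^{\top} (S_{xx} + \lambda_1 I)\, a_k \;\;
                      b_k^{\top} (S_{yy} + \lambda_2 I)\, b_k}}
  && \text{(regularised value)} \\
r_k &= \operatorname{cor}(X a_k,\, Y b_k)
     = \frac{a_k^{\top} S_{xy} b_k}
            {\sqrt{a_k^{\top} S_{xx} a_k \;\; b_k^{\top} S_{yy} b_k}}
  && \text{(manual Pearson correlation)} \\
a_k^{\top} (S_{xx} + \lambda_1 I)\, a_k
    &= a_k^{\top} S_{xx} a_k + \lambda_1 \lVert a_k \rVert^2
    \;\ge\; a_k^{\top} S_{xx} a_k \\
\Rightarrow\quad
\rho_k &= r_k \,
  \sqrt{\frac{a_k^{\top} S_{xx} a_k}
             {a_k^{\top} S_{xx} a_k + \lambda_1 \lVert a_k \rVert^2}} \,
  \sqrt{\frac{b_k^{\top} S_{yy} b_k}
             {b_k^{\top} S_{yy} b_k + \lambda_2 \lVert b_k \rVert^2}}
  \;\le\; r_k
\end{aligned}
```

Under these assumptions the two values agree when $\lambda_1 = \lambda_2 = 0$, and the manual correlation $r_k$ exceeds $\rho_k$ by a factor that grows with the penalties. This is consistent with the eigenvalue < cor(variates) behaviour described above.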

Here is some code that shows this inflation of the ‘manual’ correlation. Feel free to change the value of lambda to see this for yourself.

# Load the mixOmics package
library(mixOmics)

# Load data
data(nutrimouse)
X <- nutrimouse$lipid
Y <- nutrimouse$gene

# Perform Canonical Correlation Analysis WITH regularization
reg_rcca <- rcc(X, Y, lambda1 = 0.1, lambda2 = 0.1)


reg_scores_X <- reg_rcca$variates$X[, 1]  # Canonical variate 1 from X (regularized)
reg_scores_Y <- reg_rcca$variates$Y[, 1]  # Canonical variate 1 from Y (regularized)

# Manually calculate Pearson correlation for the first canonical variates
manual_corr_reg <- cor(reg_scores_X, reg_scores_Y)        # Likely won't match reg_rcca output

cat("Regularized canonical correlation (from rCCA):", reg_rcca$cor[1], "\n")
# Regularized canonical correlation (from rCCA): 0.8391354 
cat("Manually calculated regularized correlation:", manual_corr_reg, "\n")
# Manually calculated regularized correlation: 0.9674422 

Kim-Anh

Hi Kim-Anh,

Thank you very much for the follow up!

Do you have any suggestions, then, as to how I could (within the rcc framework) extract canonical variate scores that, when correlated, would match the output canonical correlation (the eigenvalue of the SVD in regularized CCA)?

Thank you.

hi @psd,

I had a look at the original publication (details below). My understanding is that if you (re)calculate the correlation between the variates, your correlation will be largely inflated.
Instead, you should directly use the canonical correlation output by the rcc() method. This is what is reported in case study 2 of the paper:

From
Highlighting relationships between heterogeneous biological data through graphical displays based on regularised canonical correlation analysis. I. GONZÁLEZ, S. DÉJEAN, P. G. P. MARTIN, O. GONÇALVES, P. BESSE, and A. BACCINI. Journal of Biological Systems 2009 17:02, 173-199

Kim-Anh