rCCA and output canonical correlation value based on variate scores

psd · November 9, 2024, 7:11pm

Greetings,

Firstly, thanks for all the great work put into this package, which I’ve begun using for rCCA. I have a question somewhat related to this thread.

My understanding is that the canonical correlation (for example, for the first mode) is the Pearson correlation between the pair of canonical variate scores from that mode. When I run rCCA without any regularization (both lambdas set to 0), my manual calculation matches the r-value output by the rcc function.

However, if I apply any regularization, then the r-value in the output of rCCA does not correspond to the Pearson correlation I manually calculate between the first pair of canonical variate scores. Is there something I am misunderstanding, or is there some operation I can apply to the canonical variates so that the manual calculation matches the r-value in the output of the rCCA function?

Thank you in advance.

psd · November 22, 2024, 2:09am

Sorry, just wanted to bump this in case anyone could provide some insight.

kimanh.lecao · December 19, 2024, 9:37pm

hi @psd,

Apologies for the late answer. We are still trying to figure out the answer to your question! The developer of rcc left the team many moons ago, and we are trying to retrieve the paper that I think has some answers but is behind a paywall.

What we can tell you, though, is that CCA is classically solved through a Cholesky decomposition followed by a singular value decomposition (SVD) of (roughly) the product of the sample covariance matrices (see line 263 here for more details; it has been shown in the literature too, such as here). The canonical correlation is then the corresponding singular value of this SVD (loosely called the eigenvalue in this thread).

In a classic CCA, the correlation between a pair of variates is equal to the corresponding eigenvalue from the SVD.
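As a rough illustration of that equality, here is a minimal base-R sketch of unregularised CCA solved by Cholesky factors plus an SVD. It is not the mixOmics implementation: the simulated data and all variable names (`Rx`, `Ry`, `K`, `a`, `b`) are assumptions made for the example.

```r
# Classic (unregularised) CCA via Cholesky + SVD, base R only.
# Simulated data and variable names are illustrative, not taken from mixOmics.
set.seed(1)
n <- 200
X <- matrix(rnorm(n * 3), n, 3)
Y <- cbind(X[, 1] + rnorm(n), X[, 2] + rnorm(n))  # Y shares signal with X

Cxx <- var(X)
Cyy <- var(Y)
Cxy <- cov(X, Y)

# chol() returns the upper-triangular factor R with t(R) %*% R equal to its input
Rx <- chol(Cxx)
Ry <- chol(Cyy)

# Whitened cross-covariance: t(Rx)^-1 %*% Cxy %*% Ry^-1
K <- solve(t(Rx), Cxy) %*% solve(Ry)
s <- svd(K)

# Map the first pair of singular vectors back to canonical weights
a <- solve(Rx, s$u[, 1])
b <- solve(Ry, s$v[, 1])

rho    <- s$d[1]                               # first singular value
manual <- cor(drop(X %*% a), drop(Y %*% b))    # Pearson correlation of the variates

all.equal(rho, manual)
```

With no penalty, the weights satisfy a'Cxx a = 1 and b'Cyy b = 1, so the variates have unit variance and their covariance (the singular value) is also their correlation.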

In regularised CCA, however, the two values differ, which is what you have noticed. If the regularisation is strong (a large value of lambda), the eigenvalue is smaller than cor(variates). My feeling is that the eigenvalue (i.e. the canonical correlation output by the rcc function in mixOmics) is closer to the truth, but we can confirm this after reading the paper.

Here is some code to show this inflation of the ‘manual’ correlation; feel free to change the value of lambda to see it for yourself.

# Load the mixOmics package
library(mixOmics)

# Load data
data(nutrimouse)
X <- nutrimouse$lipid
Y <- nutrimouse$gene

# Perform Canonical Correlation Analysis WITH regularization
reg_rcca <- rcc(X, Y, lambda1 = 0.1, lambda2 = 0.1)

# Extract the first pair of canonical variate scores
reg_scores_X <- reg_rcca$variates$X[, 1]  # Canonical variate 1 from X (regularized)
reg_scores_Y <- reg_rcca$variates$Y[, 1]  # Canonical variate 1 from Y (regularized)

# Manually calculate Pearson correlation for the first canonical variates
manual_corr_reg <- cor(reg_scores_X, reg_scores_Y)        # Likely won't match reg_rcca output

cat("Regularized canonical correlation (from rCCA):", reg_rcca$cor[1], "\n")
# Regularized canonical correlation (from rCCA): 0.8391354 
cat("Manually calculated regularized correlation:", manual_corr_reg, "\n")
# Manually calculated regularized correlation: 0.9674422
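One possible explanation for the gap, sketched in base R below, assumes that the ridge penalty simply adds lambda to the diagonal of each sample covariance matrix before the Cholesky and SVD steps. That assumption, the simulated data, and all variable names are illustrative and have not been checked against the mixOmics source. Under that assumption, the reported value is a'Sxy b divided by the penalised norms, while the Pearson correlation divides by the unpenalised ones, which are smaller.

```r
# Ridge-regularised CCA in base R (illustrative sketch, not mixOmics code).
# Assumption: the penalty adds lambda to the diagonal of var(X) and var(Y).
set.seed(1)
n <- 40
X <- matrix(rnorm(n * 10), n, 10)
Y <- matrix(rnorm(n * 8), n, 8)
Y[, 1] <- Y[, 1] + X[, 1]          # inject some shared signal
lambda1 <- 0.1
lambda2 <- 0.1

Sxx <- var(X)
Syy <- var(Y)
Sxy <- cov(X, Y)
Cxx <- Sxx + diag(lambda1, ncol(X))  # penalised covariance of X
Cyy <- Syy + diag(lambda2, ncol(Y))  # penalised covariance of Y

Rx <- chol(Cxx)
Ry <- chol(Cyy)
s  <- svd(solve(t(Rx), Sxy) %*% solve(Ry))
a  <- solve(Rx, s$u[, 1])
b  <- solve(Ry, s$v[, 1])

rho_reg <- s$d[1]                              # regularised canonical correlation
pearson <- cor(drop(X %*% a), drop(Y %*% b))   # manual correlation of the variates

# Same numerator, different denominators
num         <- drop(t(a) %*% Sxy %*% b)
penalised   <- num / sqrt(drop(t(a) %*% Cxx %*% a) * drop(t(b) %*% Cyy %*% b))
unpenalised <- num / sqrt(drop(t(a) %*% Sxx %*% a) * drop(t(b) %*% Syy %*% b))

c(rho_reg = rho_reg, penalised = penalised,
  pearson = pearson, unpenalised = unpenalised)
```

In this sketch `penalised` reproduces the singular value and `unpenalised` reproduces the Pearson correlation, which is the larger of the two whenever lambda is positive. Because Pearson correlation is unchanged by rescaling, no rescaling of the variate scores can make `cor()` return the regularised value; it would have to be recomputed from the weights with the penalised denominators.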

Kim-Anh

psd · January 2, 2025, 8:40pm

Hi Kim-Anh,

Thank you very much for the follow-up!

Do you have any suggestions, then, as to how I could (within the rcc framework) extract canonical variate scores that, when correlated, would match the output canonical correlation (the eigenvalue of the SVD in regularized CCA)?

Thank you.

kimanh.lecao · January 2, 2025, 9:13pm

hi @psd,

I had a look at the original publication (details below). My understanding is that if you (re)calculate the correlation between the variates, your correlation will be substantially inflated.
Instead, you should directly use the canonical correlation output by the rcc() method. This is what is reported in case study 2 in the paper:

From
Highlighting relationships between heterogeneous biological data through graphical displays based on regularised canonical correlation analysis. I. GONZÁLEZ, S. DÉJEAN, P. G. P. MARTIN, O. GONÇALVES, P. BESSE, and A. BACCINI. Journal of Biological Systems 2009 17:02, 173-199

Kim-Anh

psd · March 5, 2025, 8:55pm

Hi Kim-Anh,

I have a follow-up question. I’m interested in doing some additional analyses on the canonical variate scores output by rcc (e.g., clustering my samples using these values). Would it be appropriate to use the canonical variate scores exactly as output by rcc (are they the correct values reflecting the regularized CCA relationship)? I ask because, if the manual correlation between the first pair of canonical variate scores is not the actual correlation produced by rcc (following regularization, as you noted), are these values in some way different from the actual canonical variate scores that underlie the correct rcc correlation? I hope this question makes sense.

Thank you in advance,
Paul

kimanh.lecao · March 6, 2025, 9:39pm

hi @psd

Yes, I believe you can do any follow-up analysis on the rcc components. Those are the ones you visualise on the plot.

Kim-Anh
