Questions regarding cross-validation in rCCA lambda selection

neurothew · June 30, 2023, 4:38pm

Hello,

I found this package while searching for an rCCA implementation in R. I am trying to use tune.rcc.R to find the optimal lambda that maximizes the output CV score.

Reading the source code in tune.rcc.R, I understand most of the Mfold function, but I have a question regarding its second-to-last line:

cv.score = cor(xscore, yscore, use = "pairwise")

As far as I understand, the whole leave-one-out CV procedure is largely based on the definition in Leurgans et al. (1993). Here is the original wording from p. 729 of the article:

The cross-validation score of a is then defined to be the squared correlation of the n pairs of numbers … We then choose the value of a that maximizes this correlation.

My question is, should the cv.score be the squared correlation, or just correlation? Sorry in advance if I misunderstood the 1993 article since my mathematical background is not very good.

Appreciate any response, thanks!

References
Leurgans, S. E., Moyeed, R. A., & Silverman, B. W. (1993). Canonical correlation analysis when the data are curves. Journal of the Royal Statistical Society: Series B (Methodological), 55(3), 725–740.

kimanh.lecao · July 6, 2023, 11:31pm

hi @neurothew,

You want the correlation to be positive, which is what the CCA will try to do anyway, so I think it makes no difference.
If the tuning is too cumbersome, you can have a look at the argument method = 'shrinkage' directly in rcc(). It will not calculate the best lambda per component but across all components.
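
A minimal sketch of the shrinkage route, assuming the mixOmics package is installed. The nutrimouse data shipped with mixOmics stands in for your own X and Y, and ncomp = 2 is an arbitrary choice for illustration:

```r
library(mixOmics)

# Example data bundled with mixOmics, used here only as a stand-in for your own X and Y.
data(nutrimouse)
X <- nutrimouse$lipid
Y <- nutrimouse$gene

# method = "shrinkage" estimates the regularisation parameters analytically,
# so no grid search with tune.rcc() is needed.
fit_shrink <- rcc(X, Y, ncomp = 2, method = "shrinkage")

# Regularisation parameters that were estimated for X and Y.
fit_shrink$lambda

# Canonical correlations of the fitted model.
head(fit_shrink$cor)
```

The same lambda values apply to every component, which is the trade-off against tuning over a grid.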

Kim-Anh

neurothew · July 8, 2023, 6:08am

Hi @kimanh.lecao, thanks for your reply!

I am asking because in my case, at certain lambda values the CV correlation is negative and larger in magnitude (e.g. -0.6), but the algorithm automatically picks the largest positive correlation over the grid (e.g. 0.3).

If we follow the logic of “squared correlation”, the comparison becomes 0.36 vs. 0.09, so the first case (cor = -0.6) would be selected. That is what confuses me.
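
The difference between the two rules can be sketched in base R. The grid and the CV correlations below are made-up numbers chosen to mirror the -0.6 vs. 0.3 case, not output from tune.rcc:

```r
# Hypothetical lambda grid and CV correlations (illustrative values only).
lambda_grid <- c(0.001, 0.01, 0.1, 1)
cv_cor      <- c(0.10, -0.60, 0.30, 0.05)

# Rule 1: maximise the raw correlation (what cor(xscore, yscore) followed by a max gives).
best_raw <- lambda_grid[which.max(cv_cor)]

# Rule 2: maximise the squared correlation (the wording in Leurgans et al., 1993).
best_squared <- lambda_grid[which.max(cv_cor^2)]

# The two rules disagree as soon as the largest-magnitude score is negative.
data.frame(lambda = lambda_grid, cor = cv_cor, cor_squared = cv_cor^2)
c(raw = best_raw, squared = best_squared)
```

With these numbers the raw rule selects lambda = 0.1 (cor = 0.3), while the squared rule selects lambda = 0.01 (cor = -0.6).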

Matthew

kimanh.lecao · July 13, 2023, 10:46pm

@neurothew,

I see … it could be just in this tuning function, although it seems odd to me that the correlation would be negative to start with. How about running rcc() for those lambda values and looking at the sample plots to see if they are inverted?
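
A rough sketch of that check, assuming mixOmics is installed. The nutrimouse data and the lambda values are placeholders, so substitute your own X, Y and the lambda pair that gave the negative CV score:

```r
library(mixOmics)

# Stand-in data; replace with your own X and Y.
data(nutrimouse)
X <- nutrimouse$lipid
Y <- nutrimouse$gene

# Placeholder lambda pair; replace with the grid point that gave the negative CV score.
lambda1 <- 0.064
lambda2 <- 0.008

fit <- rcc(X, Y, ncomp = 2, method = "ridge", lambda1 = lambda1, lambda2 = lambda2)

# Correlation between the X and Y variates, component by component.
# A negative value here would mean the two sets of scores are inverted.
variate_cor <- diag(cor(fit$variates$X, fit$variates$Y))
variate_cor

# Sample plot in the combined XY space, to inspect the orientation visually.
plotIndiv(fit, comp = 1:2, rep.space = "XY-variate")
```

Note that this looks at the full-data fit, whereas the score in tune.rcc comes from held-out samples, so the two can differ.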

We are not able to look into the code at the moment (no developer), and mixOmics focuses more on PLS than CCA (which you could try with PLS canonical mode).

Kim-Anh
