Questions regarding cross-validation in rCCA lambda selection

Hello,

I found this package while searching for an rCCA implementation in R. I am trying to use tune.rcc (defined in tune.rcc.R) to find the lambda that maximizes the cross-validation (CV) score.

Reading the source code of tune.rcc.R, I understand most of the Mfold function defined there, but I have a question about its second-to-last line:

cv.score = cor(xscore, yscore, use = "pairwise")
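For readers who do not use R: for two vectors, this line returns the signed Pearson correlation between xscore and yscore, computed only over the pairs in which neither value is missing. Below is a minimal Python sketch of the same computation. It assumes `None` stands in for R's `NA`, and `pairwise_cor` is a hypothetical helper, not part of mixOmics.

```python
import math
from typing import Optional, Sequence


def pairwise_cor(x: Sequence[Optional[float]], y: Sequence[Optional[float]]) -> float:
    """Signed Pearson correlation over pairs where both values are present.

    Sketch of R's cor(x, y, use = "pairwise") for two vectors: a pair is
    dropped if either entry is None (standing in for NA).
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return float("nan")  # too few complete pairs to estimate a correlation
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return float("nan")  # a constant vector has no defined correlation
    return sxy / math.sqrt(sxx * syy)


xscore = [1.0, 2.0, None, 4.0]  # the third pair is dropped
yscore = [2.0, 4.0, 5.0, 8.0]
# The three complete pairs lie on a line, so r is 1 (up to rounding).
print(pairwise_cor(xscore, yscore))
```

Because the value is signed and lies in [-1, 1], any negative score, however large in magnitude, ranks below every positive score when the grid is searched for a maximum.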

As far as I understand, the whole leave-one-out CV procedure is largely based on the definition in Leurgans et al. (1993). Here is the original wording from p. 729 of the article:

The cross-validation score of a is then defined to be the squared correlation of the n pairs of numbers … We then choose the value of a that maximizes this correlation.
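Reading the quoted definition together with the cv.score line, the criterion can be written as follows. This is a sketch under two assumptions: the article's n pairs correspond to the entries of xscore and yscore (written here as $s^x_i$ and $s^y_i$), and the article's tuning parameter $a$ plays the role of $\lambda$.

```latex
\mathrm{CV}(\lambda) = r(\lambda)^2,
\qquad
r(\lambda) = \frac{\sum_{i=1}^{n} (s^x_i - \bar{s}^x)(s^y_i - \bar{s}^y)}
                  {\sqrt{\sum_{i=1}^{n} (s^x_i - \bar{s}^x)^2 \; \sum_{i=1}^{n} (s^y_i - \bar{s}^y)^2}},
\qquad
\hat{\lambda} = \arg\max_{\lambda} \mathrm{CV}(\lambda) = \arg\max_{\lambda} \lvert r(\lambda) \rvert .
```

The last equality holds because $r^2$ is an increasing function of $\lvert r \rvert$. Maximizing the signed $r(\lambda)$ instead gives the same $\hat{\lambda}$ only when the largest $\lvert r(\lambda) \rvert$ on the grid belongs to a non-negative correlation.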

My question is: should cv.score be the squared correlation or just the correlation? Apologies in advance if I have misunderstood the 1993 article; my mathematical background is not very strong.

I'd appreciate any response, thanks!

References
Leurgans, S. E., Moyeed, R. A., & Silverman, B. W. (1993). Canonical correlation analysis when the data are curves. Journal of the Royal Statistical Society: Series B (Methodological), 55(3), 725–740.

Hi @neurothew,

You want the correlation to be positive, which is what CCA tries to achieve anyway, so I think it makes no difference.
If the tuning is too cumbersome, have a look at the argument method = 'shrinkage' in rcc() directly. It does not calculate the best lambda per component, but across all components.

Kim-Anh

Hi @kimanh.lecao, thanks for your reply!

I am asking because in my case, at certain lambda values the correlation is negative and larger in magnitude (e.g. -0.6), but the algorithm automatically picks the largest positive correlation over the grid (e.g. 0.3).

If we follow the “squared correlation” definition instead, the scores become 0.36 vs. 0.09, so the first case (cor = -0.6) would be selected. That is what confuses me.
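The two selection rules in this example can be compared directly. Below is a small Python sketch that uses the two correlations from the post with hypothetical lambda values (0.001 and 0.1 are placeholders; the thread does not say which lambdas produced them). `select_lambda` is an illustrative helper, not a mixOmics function.

```python
from typing import Dict


def select_lambda(cv_scores: Dict[float, float], squared: bool) -> float:
    """Return the grid value of lambda with the highest CV criterion.

    squared=False ranks by the signed correlation (what the cor line returns);
    squared=True ranks by the squared correlation (the Leurgans et al. wording).
    """
    if squared:
        return max(cv_scores, key=lambda lam: cv_scores[lam] ** 2)
    return max(cv_scores, key=lambda lam: cv_scores[lam])


# Hypothetical grid: the lambda keys are placeholders, the correlations are from the post.
cv_scores = {0.001: -0.6, 0.1: 0.3}

print(select_lambda(cv_scores, squared=False))  # 0.1   (0.3 > -0.6)
print(select_lambda(cv_scores, squared=True))   # 0.001 (0.36 > 0.09)
```

The two rules agree whenever every correlation on the grid is non-negative. They disagree only when a negative correlation has the largest magnitude, which is the case described above.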

Matthew

@neurothew,

I see… it could be specific to this tuning function, although it seems odd to me that the correlation would be negative in the first place. How about running rcc() with those lambda values and looking at the sample plots to see whether they are inverted?
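On the question of inverted plots, a canonical variate and its negative are equally valid. Inverting one set of scores flips the sign of the correlation but leaves its square unchanged. Below is a Python sketch of that identity on made-up score vectors. The numbers are illustrative only, not output from rcc(), and `pearson` is a hypothetical helper.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Illustrative held-out scores (made up).
xscore = [0.2, -1.1, 0.7, 1.5, -0.4]
yscore = [0.1, -0.8, 0.9, 1.2, -0.9]
yscore_inverted = [-v for v in yscore]  # same variate, opposite sign

r = pearson(xscore, yscore)
r_inv = pearson(xscore, yscore_inverted)
print(r, r_inv)            # equal magnitude, opposite signs
print(r ** 2, r_inv ** 2)  # identical squared correlations
```

So if, for some lambda, the held-out scores on one side come out inverted, the signed criterion drops from r to -r while the squared criterion is unaffected.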

We are not able to look into the code at the moment (we have no developer), and mixOmics focuses more on PLS than on CCA (you could try PLS in canonical mode instead).

Kim-Anh