Hi,

Thanks very much for this fantastic resource, which I’ve recently started exploring (apologies for the novice questions!). I am working with a longitudinal dataset of several hundred individuals, each with two omics data types (transcriptomics and metabolomics) measured at three or more timepoints. I have been exploring the use of rCCA to analyse these data.

I have run rCCA using the shrinkage method (for this specific analysis I am ignoring the longitudinal aspect, although I would like to explore it in the future) and found that the resulting canonical correlations are extremely high: the first 300 are all above 0.9. Each canonical variate explains only a small proportion of the variance in the transcriptomic data, but a larger proportion in the metabolomic data. The canonical correlations appear to be reproducibly high. Is it necessarily a problem that they are this high?
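
For context, here is a minimal sketch of why canonical correlations close to 1 can be a dimensionality effect rather than a biological one. It is written in Python/NumPy rather than R, uses hypothetical dimensions and pure noise, and computes plain unregularized CCA, not the shrinkage rCCA in mixOmics. Once one block has at least as many variables as there are individuals minus one, its centered columns span the whole centered sample space, so every sample canonical correlation with the other block equals 1, even when the two blocks are unrelated.

```python
import numpy as np


def orthonormal_basis(a, tol=1e-10):
    """Orthonormal basis for the column space of a (rank-revealing via SVD)."""
    u, s, _ = np.linalg.svd(a, full_matrices=False)
    return u[:, s > tol * s[0]]


def canonical_correlations(x, y):
    """Unregularized sample canonical correlations between x and y.

    They are the singular values of Qx^T Qy, where Qx and Qy are orthonormal
    bases for the column-centered data (cosines of the principal angles).
    """
    qx = orthonormal_basis(x - x.mean(axis=0))
    qy = orthonormal_basis(y - y.mean(axis=0))
    return np.linalg.svd(qx.T @ qy, compute_uv=False)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, p, q = 200, 1000, 100          # hypothetical: individuals, transcripts, metabolites
    x = rng.standard_normal((n, p))   # pure noise, unrelated to y
    y = rng.standard_normal((n, q))
    rho = canonical_correlations(x, y)
    print(rho.min(), rho.max())       # every value is 1 up to rounding error
```

Regularization is meant to pull such values below 1, but the sketch suggests that many correlations above 0.9 are not, on their own, evidence of shared signal. Comparing them with the same analysis run on data where the pairing of individuals has been broken may be more informative.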

Assuming they are robust, could you suggest how I should select the number of canonical variates to include in downstream analyses? I am interested in identifying the top features in each omics dataset while accounting for the covariance between the two datasets, and in visualising these features and the relationships between them in a network plot, potentially using Gephi.
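
One generic way to frame the choice of how many variates to keep is a permutation comparison, sketched here in Python/NumPy rather than mixOmics. The ridge-style regularization (C + λI on each correlation matrix), the λ values, the number of permutations and the 5% level are all assumptions; this is not what the shrinkage method in rcc does internally. The idea: shuffle the individuals in one block to break the pairing, recompute the canonical correlations many times, and keep the leading variates whose observed correlation exceeds the 95th percentile of the permuted values for the same variate. This component-wise comparison is a heuristic and ignores the dependence between successive variates.

```python
import numpy as np


def standardize(a):
    """Center each column and scale it to unit sample standard deviation."""
    return (a - a.mean(axis=0)) / a.std(axis=0, ddof=1)


def ridge_whitener(a, lam):
    """(C + lam * I)^(-1/2), where C is the sample correlation matrix of standardized a."""
    n, p = a.shape
    c = a.T @ a / (n - 1) + lam * np.eye(p)
    vals, vecs = np.linalg.eigh(c)
    return (vecs / np.sqrt(vals)) @ vecs.T


def canonical_correlations(xs, ys, wx, wy):
    """Singular values of Wx Cxy Wy: the ridge-regularized canonical correlations."""
    cxy = xs.T @ ys / (xs.shape[0] - 1)
    return np.linalg.svd(wx @ cxy @ wy, compute_uv=False)


def n_components_by_permutation(x, y, lam_x=0.1, lam_y=0.1,
                                n_perm=100, alpha=0.05, seed=0):
    """Count leading variates whose correlation beats its permutation quantile."""
    rng = np.random.default_rng(seed)
    xs, ys = standardize(x), standardize(y)
    # Permuting the rows of ys leaves its correlation matrix unchanged,
    # so both whitening matrices only need to be computed once.
    wx, wy = ridge_whitener(xs, lam_x), ridge_whitener(ys, lam_y)
    observed = canonical_correlations(xs, ys, wx, wy)
    null = np.array([
        canonical_correlations(xs, ys[rng.permutation(len(ys))], wx, wy)
        for _ in range(n_perm)
    ])
    threshold = np.quantile(null, 1 - alpha, axis=0)
    beats = observed > threshold
    # Stop at the first variate that fails to beat its own threshold.
    k = len(beats) if beats.all() else int(np.argmin(beats))
    return k, observed, threshold


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n = 200                              # hypothetical sample size
    z = rng.standard_normal((n, 1))      # one shared latent signal
    x = rng.standard_normal((n, 50))
    x[:, :10] += 5 * z                   # first 10 "transcripts" carry the signal
    y = rng.standard_normal((n, 20))
    y[:, :5] += 5 * z                    # first 5 "metabolites" carry it too
    k, observed, threshold = n_components_by_permutation(x, y)
    print(k, observed[:3].round(3), threshold[:3].round(3))
```

With real data the permutations would need to respect any structure that should be preserved (for example, repeated measures on the same individual), which this sketch does not attempt.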

I’d be very grateful for any advice you could offer. Thanks!

Julia