Using PLS to determine the concordance between omics

Dear mixOmics team,
Thanks for developing this great package. I have a question about using the DIABLO method to find correlated and discriminatory features in my urine and plasma proteome data. In the paper and in several posts, the authors recommend first running PLS to check the concordance between omics, where concordance is measured by the correlation between the first PLS components of the different omics.

My question is whether correlation alone can reflect the concordance between two omics.
I do not have much mathematical background, but my understanding is that PLS, when applied to two omics data sets, maximizes the covariance between the latent components of each omics, and that each latent component is a linear combination of the original features. Omics data have many more features than samples. I would therefore expect a high correlation between the PLS components of different omics, since the optimization pushes in that direction and the high-dimensional feature space makes the goal easy to achieve.
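To check this intuition, here is a minimal sketch in plain Python (not mixOmics code). It draws two completely unrelated noise matrices with far more features than samples and computes the first pair of PLS components by power iteration on the cross-product matrix. The sizes (15 samples, 500 features), the seed, and all function names are assumptions made up for the illustration.

```python
import math
import random

def center_columns(M):
    """Subtract each column's mean from that column."""
    n = len(M)
    means = [sum(row[j] for row in M) / n for j in range(len(M[0]))]
    return [[x - m for x, m in zip(row, means)] for row in M]

def mat_vec(M, w):
    """M (n x p) times a weight vector w (length p): one score per sample."""
    return [sum(a * b for a, b in zip(row, w)) for row in M]

def mat_t_vec(M, t):
    """Transpose of M times a sample-level vector t (length n): one value per feature."""
    return [sum(M[i][j] * t[i] for i in range(len(M))) for j in range(len(M[0]))]

def unit(w):
    """Rescale w to unit Euclidean norm."""
    norm = math.sqrt(sum(a * a for a in w))
    return [a / norm for a in w]

def pearson(a, b):
    """Pearson correlation between two equal-length vectors."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den

def first_pls_components(X, Y, n_iter=50):
    """Approximate the unit weight vectors u, v that maximise cov(Xu, Yv)
    by power iteration, and return the two components Xu and Yv."""
    v = unit([1.0] * len(Y[0]))
    for _ in range(n_iter):
        u = unit(mat_t_vec(X, mat_vec(Y, v)))
        v = unit(mat_t_vec(Y, mat_vec(X, u)))
    return mat_vec(X, u), mat_vec(Y, v)

random.seed(0)
n_samples, n_features = 15, 500  # assumed sizes: many more features than samples

# Two independent noise matrices: there is no shared signal by construction.
noise_x = center_columns(
    [[random.gauss(0, 1) for _ in range(n_features)] for _ in range(n_samples)]
)
noise_y = center_columns(
    [[random.gauss(0, 1) for _ in range(n_features)] for _ in range(n_samples)]
)

t_x, t_y = first_pls_components(noise_x, noise_y)
corr_noise = pearson(t_x, t_y)
print(f"correlation of first PLS components on pure noise: {corr_noise:.2f}")
```

If the printed correlation is high even though the two matrices share nothing, that supports the concern that, with many more features than samples, a high component correlation by itself is weak evidence of concordance.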

In my own data, the correlation between the first PLS components of the urine and plasma proteomes is 0.73. What makes me uncomfortable is that the first PLS component in urine explains 58% of the variance in the urine data, whereas the first PLS component in plasma explains only 8% of the variance in the plasma data. Can I still call the urine and plasma data concordant?

Based on the correlation, I would say the urine and plasma proteome data are moderately to highly concordant. But based on the variance explained, they seem to be very different.
Hope to see your comments on this.

Hi @ljiahao,

Your interpretation is overall correct. The correlation coefficient indicates that the two data sets carry similar information on the first PLS component. However, each data set has a different amount of variance (it seems like the plasma data are quite noisy), which explains the difference in the amount of explained variance.
Just remember that the components are at the sample level (the matching samples across the 2 data sets can be summarised ‘equivalently’ by a linear combination of proteins from either urine or plasma), whereas the amount of explained variance is at the variable level (calculated based on the total amount of variance in each data set).
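To make the distinction concrete, here is a minimal sketch in plain Python (not mixOmics code) on simulated data. One hidden sample-level signal drives every feature of a clean data set `X`, but only 5 of the 50 features of a noisy data set `Y`. The sizes, noise levels, and function names are assumptions for illustration. The explained variance is computed as the share of each data set's total sum of squares reproduced by regressing every feature on the component, which is one common definition and may differ in detail from what the package reports.

```python
import math
import random

def center_columns(M):
    """Subtract each column's mean from that column."""
    n = len(M)
    means = [sum(row[j] for row in M) / n for j in range(len(M[0]))]
    return [[x - m for x, m in zip(row, means)] for row in M]

def mat_vec(M, w):
    """M (n x p) times w (length p): one score per sample."""
    return [sum(a * b for a, b in zip(row, w)) for row in M]

def mat_t_vec(M, t):
    """Transpose of M times t (length n): one value per feature."""
    return [sum(M[i][j] * t[i] for i in range(len(M))) for j in range(len(M[0]))]

def unit(w):
    """Rescale w to unit Euclidean norm."""
    norm = math.sqrt(sum(a * a for a in w))
    return [a / norm for a in w]

def pearson(a, b):
    """Pearson correlation between two equal-length vectors."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den

def first_pls_components(X, Y, n_iter=50):
    """Power iteration for unit weights u, v maximising cov(Xu, Yv)."""
    v = unit([1.0] * len(Y[0]))
    for _ in range(n_iter):
        u = unit(mat_t_vec(X, mat_vec(Y, v)))
        v = unit(mat_t_vec(Y, mat_vec(X, u)))
    return mat_vec(X, u), mat_vec(Y, v)

def explained_variance(M, comp):
    """Share of the total sum of squares of centred M that is reproduced
    when every column of M is regressed on the component."""
    loads = mat_t_vec(M, comp)
    captured = sum(l * l for l in loads) / sum(c * c for c in comp)
    total = sum(a * a for row in M for a in row)
    return captured / total

random.seed(1)
n = 40
z = [random.gauss(0, 1) for _ in range(n)]  # hidden signal shared by both sets

# X: all 50 features follow the signal closely (low noise).
X = center_columns([[z[i] + random.gauss(0, 0.3) for _ in range(50)] for i in range(n)])
# Y: only the first 5 of 50 features carry the signal; the rest are pure noise.
Y = center_columns(
    [[(z[i] if j < 5 else 0.0) + random.gauss(0, 1) for j in range(50)] for i in range(n)]
)

t_x, t_y = first_pls_components(X, Y)
corr = pearson(t_x, t_y)
ev_x = explained_variance(X, t_x)
ev_y = explained_variance(Y, t_y)
print(f"correlation between components: {corr:.2f}")
print(f"variance explained in X: {ev_x:.0%}, in Y: {ev_y:.0%}")
```

In this setup the two components track the same sample-level signal, so they correlate, while the share of variance explained differs sharply because most of the variance in `Y` is unrelated to that signal. The two numbers answer different questions and do not need to agree.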

Hope that helps,