Dear mixOmics team,
Thanks for developing this great package. I have a question about using the DIABLO method to find correlated and discriminatory features in my urine and plasma proteome data. In the paper and in several posts, I’ve seen the authors recommend first running PLS to check the concordance between omics, where concordance is measured by the correlation between the first PLS components of the different omics.
My question is whether correlation alone is enough to reflect the concordance between two omics.
I do not have much mathematical background, but my understanding is that PLS, when applied to two omics datasets, maximizes the covariance between the latent components from each omics, and that each latent component is a linear combination of the original features. Omics data have many more features than samples. I would therefore expect a high correlation between the PLS components from different omics, since the optimization pushes in that direction and the high dimensionality makes that goal easy to reach.
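To make this concern concrete, here is a minimal sketch in plain Python. It is not mixOmics code, and all function names are my own, for illustration only. It assumes that the first pair of PLS weight vectors are the leading singular vectors of X^T Y, found here by power iteration, and that the columns are centered but not scaled. Both blocks are pure independent noise with 20 samples, so any correlation between the first components comes from the fitting itself and not from shared signal.

```python
import math
import random


def centered_noise(n, p, rng):
    """Return an n x p matrix of independent Gaussian noise, column-centered."""
    rows = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    for j in range(p):
        mean = sum(row[j] for row in rows) / n
        for row in rows:
            row[j] -= mean
    return rows


def scores(M, w):
    """Component scores M @ w (one value per sample)."""
    return [sum(a * b for a, b in zip(row, w)) for row in M]


def weights(M, t):
    """Unit-norm weight vector proportional to M^T @ t."""
    w = [sum(M[i][j] * t[i] for i in range(len(M))) for j in range(len(M[0]))]
    norm = math.sqrt(sum(x * x for x in w))
    return [x / norm for x in w]


def first_pls_correlation(X, Y, n_iter=100):
    """Correlation between the first pair of PLS components of X and Y.

    Alternating u ~ X^T (Y v) and v ~ Y^T (X u) is power iteration on
    X^T Y, so u and v approach its leading singular vectors.
    """
    v = [1.0 / math.sqrt(len(Y[0]))] * len(Y[0])
    for _ in range(n_iter):
        u = weights(X, scores(Y, v))
        v = weights(Y, scores(X, u))
    t, s = scores(X, u), scores(Y, v)
    # Columns are centered, so t and s have mean zero and the
    # correlation reduces to a cosine between the two score vectors.
    dot = sum(a * b for a, b in zip(t, s))
    return dot / math.sqrt(sum(a * a for a in t) * sum(b * b for b in s))


rng = random.Random(0)
n_samples = 20
results = {}
for n_features in (5, 50, 300):
    X = centered_noise(n_samples, n_features, rng)
    Y = centered_noise(n_samples, n_features, rng)
    results[n_features] = first_pls_correlation(X, Y)
    print(n_features, "features per block:", round(results[n_features], 2))
```

If my reasoning is right, the printed correlation should be high once the number of features is much larger than the number of samples, even though the two blocks share nothing. That is why I am unsure how much weight a correlation of this kind can carry on its own.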
In my own analysis, the correlation between the first PLS components of the urine and plasma proteomes is 0.73. What makes me uncomfortable is that the first PLS component in urine explains 58% of the variance in the urine data, whereas the first PLS component in plasma explains only 8% of the variance in the plasma data. Can I still call the urine and plasma data concordant in this case?
Based on the correlation, I would say the urine and plasma proteome data are moderately to highly concordant. But based on the variance explained, they seem to be very different.
Hope to see your comments on this.