Good morning!
Is PLS canonical correlation equivalent to PLS correlation? If it is not, which one is more suitable for computing a symmetrical PLS on biological data sets (gene, bacteriota, cytokines etc.)?
Thank you,
Leandro
Morning @Leandro,
Is PLS canonical correlation equivalent to PLS correlation?
This question doesn’t make a lot of sense. I believe maybe you’ve gotten confused by something along the way.
When you say ‘PLS correlation’, I’ll assume you are referring to the correlation between the components yielded by a PLS model. Therefore, if using canonical PLS, you could refer to these values as ‘PLS canonical correlations’, but I would say this is far from the best way to describe it.
‘Canonical correlation’ is a term we usually reserve for Canonical Correlation Analysis (or CCA - read more here). This is because CCA is concerned with maximising the correlation between components whereas PLS maximises covariance between components.
PLS has the canonical mode which works in a similar way to CCA - such that the two datasets are considered symmetrically - but is more appropriate than CCA for contexts of high dimensionality.
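To make the covariance versus correlation distinction concrete, here is a rough sketch of the two objectives for the first pair of components. The symbols are my own notation rather than anything taken from the mixOmics documentation: X and Y are the two centred data blocks, and u and v are the weight vectors that define the components Xu and Yv.

```latex
\begin{align*}
\text{PLS:}\quad  & \max_{\|u\| = \|v\| = 1} \operatorname{cov}(Xu,\, Yv) \\
\text{CCA:}\quad  & \max_{u,\, v} \operatorname{cor}(Xu,\, Yv) \\
\text{link:}\quad & \operatorname{cov}(Xu,\, Yv)
  = \operatorname{cor}(Xu,\, Yv)\,
    \sqrt{\operatorname{var}(Xu)}\,\sqrt{\operatorname{var}(Yv)}
\end{align*}
```

The last line shows why the two can disagree: PLS rewards components that are both correlated with each other and have high variance within their own block, whereas CCA only looks at the correlation.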
If it is not, which one is more suitable for computing a symmetrical PLS on biological data sets (gene, bacteriota, cytokines etc.)?
If you are looking at your data in a symmetrical sense, then canonical PLS is the method you’re looking for. Remember, however, that this method looks at the covariance between your components, not the correlation. If you have a small number of features and/or you are more concerned with the correlation between the resulting components, then (r)CCA is the better bet.
Hope this clarified a few things for you. I’d definitely recommend reading through the information on mixOmics.org to help with your understanding.
Cheers,
Max.
Good morning Max, thank you for your quick reply.
Sorry if I have not been clear, I was referring to PLSC which I’m actually studying in those pages:
Being a correlation and not a regression, PLSC is also symmetrical (just as PLS in canonical correlation mode) and because of its name (“correlation”) I thought that they may be the same algorithm.
Between PLSC and “PLS in canonical mode” which one do you suggest in order to analyze (generally) the type of data set I was talking about?
Leandro
Ah, my apologies. I thought you were referring to something within mixOmics and I was a little confused!
While I think these techniques are related to some degree, I wouldn’t go as far as to say that they’re the same. It really depends on which implementation of PLSC you are looking at, as different researchers and packages use slightly different variants of the same method. mixOmics’ PLS does employ SVD and maximises covariance between latent components (as described in the Wikipedia article and the bookdown you sent), but beyond this it’s very hard to say. Having said that, I think it’s fair to assume that the outputs of mixOmics’ PLS and whatever implementation of PLSC you use would be comparable.
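As a rough illustration of the shared core (SVD of the cross-covariance matrix), here is a minimal NumPy sketch. It is not the mixOmics code and not any particular PLSC package: the toy data, the `standardise` helper and all variable names are made up for the example, and it only covers the first pair of components. How later components are extracted (for example, whether the data are deflated between components) is exactly where implementations can differ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (hypothetical): 30 samples, two blocks driven by one shared latent signal.
n = 30
latent = rng.normal(size=(n, 1))
X = latent @ rng.normal(size=(1, 8)) + 0.5 * rng.normal(size=(n, 8))
Y = latent @ rng.normal(size=(1, 5)) + 0.5 * rng.normal(size=(n, 5))


def standardise(M):
    """Centre each column and scale it to unit sample variance."""
    M = M - M.mean(axis=0)
    return M / M.std(axis=0, ddof=1)


Xs, Ys = standardise(X), standardise(Y)

# Cross-covariance matrix between the blocks (X features by Y features).
R = Xs.T @ Ys / (n - 1)

# SVD: the first left/right singular vectors are the first weight vectors.
U, s, Vt = np.linalg.svd(R, full_matrices=False)
u1, v1 = U[:, 0], Vt[0, :]

# First pair of latent components (scores), one per block.
t1, w1 = Xs @ u1, Ys @ v1

# The covariance of the first pair equals the first singular value;
# the correlation is a different quantity (the one CCA would target).
cov_t1_w1 = np.cov(t1, w1)[0, 1]
cor_t1_w1 = np.corrcoef(t1, w1)[0, 1]

print(f"first singular value:      {s[0]:.4f}")
print(f"covariance of components:  {cov_t1_w1:.4f}")
print(f"correlation of components: {cor_t1_w1:.4f}")
```

No other pair of unit-length weight vectors gives a larger covariance than the first singular value, which is the sense in which these methods "maximise covariance between latent components". Both blocks are treated identically here, so the analysis is symmetric.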
Which method is more appropriate does not just depend on the type of data (e.g. gene expression), but more on the instruments and techniques used to measure it, how many features were recorded and how many samples there are. Hence, I cannot say with confidence which method I would suggest.
It may not be the most helpful answer, but I’d suggest exploring both. Begin with the package you are more comfortable and familiar with. Use this to set a benchmark of performance and evaluate the other package’s implementation against it. If you can’t get a better model, stick with what you started with.
Thank you so much Max for your availability, you have been really clear and helpful!
Best wishes,
Leandro