Questions on multiblock (s)PLS

Hi,

I plan to use the multiblock (s)PLS model to find multi-omic determinants of a univariate continuous outcome Y. After reading the documentation, I still have a few questions about this model:

  • What is the difference between RGCCA and multiblock PLS? Which part of the algorithm includes the regression on Y?
  • The perf() and tune() functions are not implemented for multiblock (s)PLS, right? Is there any quality indicator we could use (regression accuracy, correlation/covariance explained by each component/block, etc.)?

Thanks a lot for your help
Ines

hi @InesAmine,

I am not sure how easy to read this graphic is:

RGCCA does not have a Y response, whereas block.splsda does.
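One way to see where Y enters: in the generalised CCA framework that, as far as I understand, the mixOmics block methods build on, the outcome is treated as one more block, and the design matrix links it to the X blocks. The first components then maximise a design-weighted sum of covariances, sum over j<k of c_jk * cov(X_j a_j, X_k a_k), with unit-norm weight vectors a_j. The Python sketch below illustrates that idea under simplifying assumptions (Horst scheme, no sparsity, no deflation, no prediction step). It is not the mixOmics code, and `one_component_block_pls` is a hypothetical name.

```python
import numpy as np

def one_component_block_pls(blocks, design, n_iter=500, tol=1e-10):
    """Hypothetical helper: first component of the covariance criterion
    sum_{j<k} design[j, k] * cov(X_j a_j, X_k a_k) under the Horst scheme.
    By convention here, the last entry of `blocks` is the response block Y."""
    # Centre each block column-wise
    blocks = [B - B.mean(axis=0) for B in blocks]
    # Initialise each weight with the first right singular vector of its block
    weights = [np.linalg.svd(B, full_matrices=False)[2][0] for B in blocks]
    comps = [B @ a for B, a in zip(blocks, weights)]
    for _ in range(n_iter):
        old = [a.copy() for a in weights]
        for j, B in enumerate(blocks):
            # Inner component: design-weighted sum of the other blocks' components
            z = sum(design[j, k] * comps[k] for k in range(len(blocks)) if k != j)
            a = B.T @ z
            weights[j] = a / np.linalg.norm(a)   # unit-norm weight vector
            comps[j] = B @ weights[j]            # updated block component
        if max(np.linalg.norm(a - b) for a, b in zip(weights, old)) < tol:
            break
    return weights, comps

# Usage on simulated data: two X blocks sharing a latent signal with y
rng = np.random.default_rng(0)
n = 50
latent = rng.normal(size=n)
X1 = np.outer(latent, rng.normal(size=5)) + 0.5 * rng.normal(size=(n, 5))
X2 = np.outer(latent, rng.normal(size=8)) + 0.5 * rng.normal(size=(n, 8))
y = (latent + 0.1 * rng.normal(size=n)).reshape(-1, 1)
# X-Y links weighted 1, X-X link weighted 0.1 (an arbitrary illustrative choice)
design = np.array([[0.0, 0.1, 1.0], [0.1, 0.0, 1.0], [1.0, 1.0, 0.0]])
weights, comps = one_component_block_pls([X1, X2, y], design)
print(np.corrcoef(comps[0], comps[2])[0, 1], np.corrcoef(comps[1], comps[2])[0, 1])
```

Removing the Y block (and its row and column in the design) leaves an RGCCA-type criterion among the X blocks only. With a single X block linked to Y, the X weight reduces to the normalised vector X'y, i.e. the first PLS1 weight, which is a quick sanity check of the sketch.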
We have not included any perf/tune function at this stage because, methodologically, it requires a bit of work. When you enter the realm of block.spls, things get a bit more exploratory. I often advise choosing a set number of variables (e.g. 50 X variables) and inspecting the results on the sample plots and the variable plots. At this stage you want to generate hypotheses, but validating them numerically is difficult. As you mentioned, you can look at the correlation between components, the variance explained, and the regression accuracy (although the latter can be difficult to implement).
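The indicators mentioned above can be computed by hand from the block components. The Python sketch below uses generic definitions that are my assumption, not necessarily the ones mixOmics reports: variance explained as the share of a centred block's total variance captured by projecting it onto one component, Pearson correlations between block components, and an in-sample R² of y on the X components. The function names are hypothetical.

```python
import numpy as np

def variance_explained(X, t):
    """Share of the total variance of the centred block X captured by the
    rank-one projection of X onto the component t."""
    Xc = X - X.mean(axis=0)
    tc = t - t.mean()
    proj = np.outer(tc, tc @ Xc) / (tc @ tc)   # projection of Xc onto span(tc)
    return np.sum(proj ** 2) / np.sum(Xc ** 2)

def component_correlations(comps):
    """Pearson correlation matrix between components (one per block)."""
    return np.corrcoef(np.vstack(comps))

def in_sample_r2(y, comps):
    """R^2 of an ordinary least-squares fit of y on the X components
    (with intercept). In-sample only, hence optimistic."""
    T = np.column_stack([np.ones(len(y))] + list(comps))
    coef, *_ = np.linalg.lstsq(T, y, rcond=None)
    resid = y - T @ coef
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

# Usage on simulated data
rng = np.random.default_rng(0)
t = rng.normal(size=40)
X = np.outer(t, [1.0, 2.0, 3.0]) + rng.normal(size=(40, 3))
y = 2.0 * t + rng.normal(size=40)
print(variance_explained(X, t))
print(component_correlations([t, X @ np.array([1.0, 2.0, 3.0])]))
print(in_sample_r2(y, [t]))
```

The in-sample R² is optimistic because the components were fitted on the same samples; an honest estimate of regression accuracy would require refitting the whole multiblock model within each cross-validation fold.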

Kim-Anh

Hi Kim-Anh,
Thanks a lot for your help, it is clearer now.
Since then, I have another question regarding multiblock.pls: by default, what is the connection matrix in a prediction setting? A matrix with 1 everywhere except on the diagonal? Or stronger connections between the X blocks and Y than between the X blocks themselves?
Thanks a lot
Ines

hi @InesAmine,

This one:

Or stronger connections between the X blocks and Y than between the X blocks themselves?

But it depends on the data and also on what you would like to highlight from the integration between them. Do a few ‘dry runs’ and inspect the graphics.
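For the second option, a design matrix can be built with the Y block linked to every X block at weight 1 and the X blocks linked to each other at a smaller weight. The Python sketch below uses hypothetical helpers (`make_design` and `weighted_criterion` are not mixOmics functions); putting Y last and using 0.1 for the X-X links are illustrative assumptions. As far as I know, the `design` argument of the mixOmics block functions accepts a numeric matrix of this kind with entries between 0 and 1, but check its documentation for the exact block order and dimensions it expects.

```python
import numpy as np

def make_design(n_x_blocks, xx_weight=0.1, xy_weight=1.0):
    """Hypothetical helper: square design over n_x_blocks X blocks plus a final
    Y block. X-Y links get xy_weight, X-X links get xx_weight, diagonal is 0."""
    size = n_x_blocks + 1
    design = np.full((size, size), float(xx_weight))
    design[-1, :] = xy_weight   # links from Y to every X block
    design[:, -1] = xy_weight   # and symmetrically from every X block to Y
    np.fill_diagonal(design, 0.0)
    return design

def weighted_criterion(comps, design):
    """Sum over j<k of design[j, k] * cov(t_j, t_k), the quantity a
    covariance-based multiblock method tries to make large."""
    total = 0.0
    for j in range(len(comps)):
        for k in range(j + 1, len(comps)):
            total += design[j, k] * np.cov(comps[j], comps[k])[0, 1]
    return total

# Usage: three X blocks, weak X-X links, full X-Y links
print(make_design(3, xx_weight=0.1))
```

For the dry runs, one option is to keep the X-Y links at 1, try a few values of `xx_weight` between 0 and 1, and compare the resulting sample and variable plots. The criterion value itself is only comparable between fits that use the same design.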

Kim-Anh