Block PLS: variance explained by components

Hi everyone,

I am working on a multi-omics analysis with the block.pls() method. There are 5 X-blocks with different numbers of features, and the Y-block has 4 continuous variables. The aim of the study is to understand the mechanism rather than to make predictions.

As a first step, I would like to decide on the number of components by checking the variance explained by each component in each block. However, the variance explained in block Y summed to a value greater than 1, which I couldn’t understand:

> fit.block.pls <- block.pls(X, Y, ncomp = 20, design = "full", scale = TRUE,  mode = "regression")
> sum(fit.block.pls$prop_expl_var$X1)
[1] 0.1021845
> sum(fit.block.pls$prop_expl_var$X2)
[1] 0.1782524
> sum(fit.block.pls$prop_expl_var$X3)
[1] 0.7210476
> sum(fit.block.pls$prop_expl_var$X4)
[1] 0.6524423
> sum(fit.block.pls$prop_expl_var$X5)
[1] 0.8613584
> sum(fit.block.pls$prop_expl_var$Y)
[1] 7.309105

I read a previous post about sPLS-DA where the variance explained in Y was 1 at the first component. But I think the explanation does not apply here because my Y block has 4 continuous variables instead of 1 binary variable.

The variance explained by each component in Y looks like this:

Given this, is there no clear way to select the number of components? For the X-blocks it is much easier, because the plots show clear elbows at around component 8 (I could not paste a plot here because of the limit for new users).

Many thanks in advance for your help!

Sincerely,
CR

Hi @CRW

The reason might be that we calculate the variance explained per variable (in X or Y) and then add them up (I have to recheck this in the code but I am lacking time).

Either way, I don't think looking at the variance explained would help you choose the number of components, because that is not what a multi-block PLS is trying to do. It is trying to maximise the covariance between the components of Y and the components of X.
We have not implemented the Q2 for block.pls() unfortunately, but perhaps you could simply calculate cov(variates$X1, variates$Y) to make your decision. Alternatively, inspect the sample plots and stop adding components when there is not much interpretation to gain from them (i.e. your treatment groups are not really discernible).
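As a rough sketch of the covariance check, the base R snippet below pairs component h of each X-block with component h of Y and tabulates the covariances. The helper name component_cov() is hypothetical, and the toy score matrices are simulated stand-ins for fit$variates. It assumes the variates are stored as a named list of samples-by-components matrices with one entry called "Y", and that only the diagonal of cov(variates$X1, variates$Y) is of interest.

```r
# Hypothetical helper (not part of mixOmics): covariance between each
# X-block component and the matching Y component.
# `variates` is assumed to be a named list of n x ncomp score matrices,
# one of which is named "Y".
component_cov <- function(variates, y_name = "Y") {
  y_scores <- variates[[y_name]]
  x_names <- setdiff(names(variates), y_name)
  ncomp <- ncol(y_scores)
  out <- sapply(x_names, function(block) {
    # cov() of two matrices is ncomp x ncomp; its diagonal pairs
    # component h of the block with component h of Y
    diag(cov(variates[[block]], y_scores))
  })
  # Force a matrix even when ncomp = 1, and label rows and columns
  matrix(out, nrow = ncomp,
         dimnames = list(paste0("comp", seq_len(ncomp)), x_names))
}

# Toy stand-in for fit$variates: random scores, for illustration only
set.seed(1)
n <- 30
toy_variates <- list(
  X1 = matrix(rnorm(n * 3), n, 3),
  X2 = matrix(rnorm(n * 3), n, 3),
  Y  = matrix(rnorm(n * 3), n, 3)
)

cov_by_comp <- component_cov(toy_variates)
print(round(cov_by_comp, 3))

# With a fitted model, the call would presumably be:
# component_cov(fit.block.pls$variates)
```

Plotting the absolute values per block against the component index may show where the covariance levels off, which is one possible stopping point.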

Kim-Anh

Dear Kim-Anh,

Thank you for your reply! I used cov(variates$X1, variates$Y) as you suggested and it indeed gave a clue of the number of components to extract.

I look forward to more implementations in the package!

Best regards,
CR
