Interpretation of explained variance in mode="regression"

Hi mixOmics team and users,

I have a question about interpreting the proportion of explained variance when mode="regression". In this mode, the proportions of explained variance for Y, summed over the PLS components, come to more than 100%. That is not the case when mode="classic".

Using the linnerud dataset as an example:

library(mixOmics)
data("linnerud")
X = scale(linnerud$physiological)
Y = scale(linnerud$exercise)
# X and Y are already scaled above, so scale = FALSE in pls()
mod_mix_regression = mixOmics::pls(X = X, Y = Y, mode = "regression", scale = FALSE, ncomp = 3)
mod_mix_classic = mixOmics::pls(X = X, Y = Y, mode = "classic", scale = FALSE, ncomp = 3)
cumsum(mod_mix_classic$prop_expl_var$Y)
cumsum(mod_mix_regression$prop_expl_var$Y) # over 100%

My questions are:

  • I guess this comes from the way Y is deflated/normalized, but why was this choice made?
  • Is it still possible to interpret the proportion of explained variance? If so, how?

Best regards.

Hi @ggrignon,

You're right that the sum of explained variance is over 100% in regression mode for PLS but not in classic mode. In classic mode the components are orthogonal, so no two of them explain the same variance and the total cannot exceed 100%. In regression mode Y is not deflated across components: each component tries to explain the original Y data. Components can therefore explain overlapping variance, and the total can exceed 100%.
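To make the overlap argument concrete, here is a toy sketch in base R. It does not reproduce mixOmics internals: the data, the helper `expl_var`, and the score vectors are all made up for illustration, and the only assumption is that a component's explained variance is the share of Y's sum of squares captured by projecting Y onto that component.

```r
# Share of Y's total sum of squares captured by projecting Y onto a score vector u
# (hypothetical helper, not a mixOmics function).
expl_var <- function(Y, u) {
  sum(crossprod(Y, u)^2) / (sum(u^2) * sum(Y^2))
}

# Small deterministic, column-centred data set with three strongly related columns.
x  <- 1:10 - 5.5
e1 <- rep(c(1, -1), 5)
e2 <- rep(c(1, 1, -1, -1, 0), 2)
Y  <- cbind(x, x + 0.5 * e1, x + 0.5 * e2)

# Two correlated components: both point in almost the same direction,
# so each one captures most of Y on its own.
u1 <- Y %*% c(1, 1, 1)
u2 <- Y %*% c(1, 1, 0)
overlapping <- c(expl_var(Y, u1), expl_var(Y, u2))

# Orthogonal components: left singular vectors of Y.
orthogonal <- apply(svd(Y)$u, 2, function(u) expl_var(Y, u))

sum(overlapping)  # greater than 1: the shared variance is counted twice
sum(orthogonal)   # 1 up to floating-point error: nothing is counted twice
```

Each correlated component is individually valid (its share is at most 1), but adding the shares double-counts whatever the components have in common.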

You can use the explained variance to compare the importance of the individual PLS components in regression mode, but the cumulative sum of explained variance is not informative, because the components can explain overlapping variance.
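If a cumulative figure is still wanted, one possible workaround (an assumption on my side, not what `prop_expl_var` reports) is to regress Y on the first h X-variates jointly and compute the share of Y's sum of squares that the fit recovers. Because the components are used together rather than added up one by one, this quantity cannot exceed 1 and never decreases as h grows. The sketch below assumes the variates are available as `fit$variates$X`; `cum_r2` is a name introduced here for illustration.

```r
library(mixOmics)

data("linnerud")
X <- scale(linnerud$physiological)
Y <- scale(linnerud$exercise)

fit <- pls(X = X, Y = Y, mode = "regression", scale = FALSE, ncomp = 3)
Tx  <- fit$variates$X   # X-variates, one column per component

# Share of Y's sum of squares recovered by a joint least-squares fit
# on the first h variates (Y and the variates are centred, so no intercept).
cum_r2 <- sapply(seq_len(ncol(Tx)), function(h) {
  Th    <- Tx[, seq_len(h), drop = FALSE]
  Y_hat <- qr.fitted(qr(Th), Y)
  sum(Y_hat^2) / sum(Y^2)
})

cum_r2  # non-decreasing in h and bounded by 1
```

This answers a slightly different question (how much of Y the X-variates predict together) than the per-component values in `prop_expl_var$Y`, so the two should not be compared number for number.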

Hope that helps!
Eva