Interpretation variance explained in mode="regression"

ggrignon · May 14, 2025, 1:16pm

Hi mixOmics team and users,

I have a question about interpreting the proportion of explained variance when mode="regression". In this mode, the explained variance for Y summed over the PLS components exceeds 100%. That is not the case when mode="classic".

Here is an example on the linnerud dataset:

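# Load the example data shipped with mixOmics
data("linnerud", package = "mixOmics")
# Centre and scale both blocks up front, so pls() is called with scale = FALSE
X = scale(linnerud$physiological)
Y = scale(linnerud$exercise)
mod_mix_regression = mixOmics::pls(Y = Y, X = X, mode = "regression", scale = FALSE, ncomp = 3)
mod_mix_classic = mixOmics::pls(Y = Y, X = X, mode = "classic", scale = FALSE, ncomp = 3)
# Cumulative proportion of explained variance in Y for each mode
cumsum(mod_mix_classic$prop_expl_var$Y)
cumsum(mod_mix_regression$prop_expl_var$Y) # over 100%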

My questions are:

I guess this comes from the way Y is deflated/normalized, but why was this choice made?
Is it still possible to interpret the proportion of explained variance? If so, how?

Best regards.

evahamrud · May 16, 2025, 3:12am

Hi @ggrignon,

You’re right that the summed explained variance exceeds 100% in regression mode for PLS but not in classic mode. In classic mode the components are orthogonal, so they do not explain any shared variance and the total explained variance cannot exceed 100%. In regression mode Y is not deflated across components, as each component is trying to explain the original Y data. This means that components can explain overlapping variance in the data, so the total explained variance can exceed 100%.

You can use the explained variance to compare the importance of the PLS components in regression mode, but the cumulative sum of explained variance is not informative because the components can explain overlapping variance.
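
To illustrate the overlap argument, here is a minimal base-R sketch on simulated data. It assumes that each component's share is obtained by projecting the original data onto that component's scores on its own; that is an assumption about the calculation, not something confirmed in this thread. `prop_explained` is a hypothetical helper written for this illustration, not a mixOmics function.

```r
# Share of the total sum of squares of `data` captured by projecting it
# onto a single score vector `score` (hypothetical helper, base R only).
prop_explained <- function(data, score) {
  proj <- crossprod(score, data)            # 1 x p row vector: score' %*% data
  sum(proj^2) / sum(score^2) / sum(data^2)
}

set.seed(1)
Y <- scale(matrix(rnorm(60), nrow = 20, ncol = 3))  # simulated, not linnerud

# Orthogonal scores: the left singular vectors of Y
orth <- svd(Y)$u
orth_share <- apply(orth, 2, prop_explained, data = Y)

# Overlapping scores: the 2nd and 3rd are slightly perturbed copies of the 1st
base <- orth[, 1]
overlap <- cbind(base, base + 0.1 * orth[, 2], base + 0.1 * orth[, 3])
overlap_share <- apply(overlap, 2, prop_explained, data = Y)

sum(orth_share)     # orthogonal scores: the shares add up to 1
sum(overlap_share)  # overlapping scores: the shares add up to more than 1
```

Each individual share stays at or below 1 in both cases; only the sum across non-orthogonal components double-counts the variance they have in common.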

Hope that helps!
Eva

ggrignon · May 19, 2025, 4:30pm

Hi Eva,

Thanks for the answer. What is the practical advantage of regression mode over classic mode? In which cases should I use regression mode instead of classic mode?

Thanks.

evahamrud · May 30, 2025, 6:03am

Hi @ggrignon,

The choice of mode will depend on what your analysis aim is.

You should use regression mode when you want to predict an outcome from one dataset.
e.g. I have transcriptomics data from different tumour samples. Can I model tumour size based on transcriptomics data?

You should use canonical mode when you’re comparing two datasets of equal importance to find patterns they share across samples.
e.g. I have transcriptomics and proteomics data from different tumour samples. Do these datasets agree?
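
As a rough sketch of how the two modes are requested, the linnerud example from earlier in this thread can be fitted both ways. This assumes the mixOmics package is installed and a version that exposes `prop_expl_var`, as used in the original question; the object names `fit_reg` and `fit_can` are made up for illustration.

```r
# Assumes mixOmics is installed; linnerud ships with the package.
library(mixOmics)
data("linnerud")

X <- linnerud$physiological
Y <- linnerud$exercise

# Regression mode: X is treated as the predictor block, Y as the outcome
fit_reg <- pls(X, Y, ncomp = 2, mode = "regression")

# Canonical mode: X and Y are treated symmetrically
fit_can <- pls(X, Y, ncomp = 2, mode = "canonical")

# Per-component proportion of explained variance in Y under each mode
fit_reg$prop_expl_var$Y
fit_can$prop_expl_var$Y
```

Only the `mode` argument changes between the two calls, so the choice comes down to whether one block is an outcome to predict or both blocks are of equal interest.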

Cheers,
Eva
