Explained Variance

Hello!

First, thank you to all who have contributed to this package and this forum - it has been an invaluable resource for me.

Second, my question. In short, I am wondering if someone might provide an explanation of the “Explained Variance” generated from a PLS or sPLS model. (I am currently working with a PLS model in regression mode.)

The manual states “This function calculates the variance explained by variates.” I am wondering how it is calculated? How ought it be interpreted? Importantly, I am wondering how it relates to Q^2? (E.g., in some models I run that have a negative Q^2, the Explained Variance is positive.)

Some example output is provided below, in case it is of any help. Here, let X be metabolomics1 and Y be metabolomics2. (And yes, sadly I am aware that my model is a poor fit…)

tune.pls$Q2.total
Q2.total
1 comp 0.001666725
2 comp 0.007492810

explained_variance(pls$X, pls$Y, ncomp = 2)
comp 1 comp 2
0.008908663 0.017972628

explained_variance(pls$Y, pls$X, ncomp = 2)
comp 1 comp 2
0.010692957 0.006996459

pls$explained_variance$X
comp 1 comp 2
0.41078514 0.08263149

pls$explained_variance$Y
comp 1 comp 2
0.09326141 0.27158469

Any help would be much appreciated! Thank you.

Hi @bort,

Thanks for using mixOmics and sharing your questions with us.

Briefly, the Explained Variance of a component is simply the component’s variance divided by the total variance in the data. Typically, but not always, components with very low explained variance tend to be less important.
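
To make the ratio concrete, here is how I understand the calculation for a single data block. Please treat it as a sketch rather than a definitive statement of the implementation. The notation is my own: X is the (centred and scaled) data matrix, t_h is the h-th variate (component) for that block, and the F subscript denotes the Frobenius norm.

```latex
\mathrm{EV}_h \;=\; \frac{\lVert X^{\top} t_h \rVert^{2}}{\left(t_h^{\top} t_h\right)\,\lVert X \rVert_F^{2}},
\qquad
\lVert X \rVert_F^{2} \;=\; \sum_{i,j} x_{ij}^{2}
```

The numerator divided by t_h' t_h is the sum of squares of X projected onto t_h, so the ratio always lies between 0 and 1. That is why it can be positive even when Q2 is negative. Also, if I read the function's documentation correctly, the second argument of explained_variance() is expected to be the variates (for example pls$variates$X) rather than the other data block, which may be why the manual calls in the question do not match pls$explained_variance$X.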

Q2 measures how well the model predicts the data under cross-validation. A value of 1 means a perfect fit, 0 means the model performs only as well as using the mean of the training data as the prediction for the test data, and negative values indicate a rather poor model.
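
For reference, the usual definition of Q2 in the PLS literature, which I believe is the one behind Q2.total, compares the cross-validated prediction error with the residual error of the model that has one fewer component. The notation below is mine and is only meant as a sketch.

```latex
Q^2_h \;=\; 1 \;-\; \frac{\sum_{k} \mathrm{PRESS}_{kh}}{\sum_{k} \mathrm{RSS}_{k(h-1)}}
```

Here PRESS_kh is the cross-validated prediction error sum of squares for Y variable k using h components, and RSS_k(h-1) is the residual sum of squares for that variable with h-1 components. For the first component, RSS_k0 is the sum of squares around the mean, which gives the "as good as predicting the mean" reading of a value of 0. Q2 becomes negative whenever the held-out prediction error exceeds that reference error.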

Hope this helps. Please let us know if you need further clarification.

Best wishes,

Al

Hi Al,

Thank you very much for the quick response.

RE: Explained Variance. I think I can see how this would work for the X and Y components individually, but I’m unsure how the total variance would be calculated for the X-Y data in full. I am wondering if you could perhaps provide an example calculation or a citation? No worries if not!

RE: Q2. I believe I understand this, i.e., that Q2 is obtained from cross-validated predictive performance, whereas the explained variance is “inherent” to the data and the components.

Thank you again!

Best,
Brett

Hi @bort,

My pleasure.

RE Explained Variance: The pls functions report the explained variance of each component per dataset, that is, in the X and Y spaces separately. In the joint X-Y space, covariance is the relevant measure, and it is the criterion that the pls function maximises.

RE Q2: That’s correct. Q2 is a measure of model fit, while Explained Variance indicates the amount of variability captured by each component in each dataset. Note that we are maximising the covariance between the X and Y components, so the amount of variance explained by a component does not directly indicate its relevance.
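
To illustrate why variance alone is not the whole story, the covariance criterion can be written as below. This is a sketch in my own notation: a and b are the loading vectors, t = Xa and u = Yb are the resulting components, and for components after the first, X and Y would be replaced by their deflated versions.

```latex
\max_{\lVert a \rVert = \lVert b \rVert = 1} \operatorname{cov}(Xa,\, Yb),
\qquad
\operatorname{cov}(t, u) \;=\; \operatorname{cor}(t, u)\,\sqrt{\operatorname{var}(t)}\,\sqrt{\operatorname{var}(u)}
```

The covariance is the product of the correlation between the two components and their standard deviations. A component can therefore carry a large share of the variance in X while being only weakly correlated with Y, or the reverse.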

Best wishes,

Al